Pandas DataFrame.dropna() Function
Minahil Noor
January 30, 2023
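- pandas.DataFrame.dropna() Syntax
- Example Code: DataFrame.dropna() to Drop Rows
- Example Code: DataFrame.dropna() to Drop Columns
- Example Code: DataFrame.dropna() With how='all'
- Example Code: DataFrame.dropna() With a Specified Subset or Threshold
- Example Code: DataFrame.dropna() With inplace=True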
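The pandas.DataFrame.dropna() function removes null (missing) values from a DataFrame by dropping the rows or columns that contain them.
NaN (Not a Number) and NaT (Not a Time) represent null values. DataFrame.dropna() detects these values and filters the DataFrame accordingly.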
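As a small illustration of the missing-value markers described above, the sketch below builds a hypothetical frame (the column names `score`, `when`, and `label` are made up for this example) and uses `isna()` to show which cells pandas treats as missing before calling `dropna()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column carries a different missing-value marker.
frame = pd.DataFrame(
    {
        "score": [1.5, np.nan, 3.0],  # NaN in a numeric column
        "when": pd.to_datetime(["2023-01-30", None, "2023-02-01"]),  # NaT
        "label": ["a", "b", None],  # None in a text column
    }
)

# isna() flags NaN, NaT and None alike.
missing_mask = frame.isna()
print(missing_mask)

# dropna() keeps only the rows in which no cell is flagged.
cleaned = frame.dropna()
print(cleaned)
```

Only the first row has no missing cell, so it is the only row left in `cleaned`.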
pandas.DataFrame.dropna() Syntax
DataFrame.dropna(axis, how, thresh, subset, inplace)
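Parameters
axis | Selects whether rows or columns are dropped. With 0 or 'index', rows that contain missing values are dropped. With 1 or 'columns', columns that contain missing values are dropped. The default is 0
how | Decides when a row or column is dropped. It accepts one of two strings, 'any' or 'all', and defaults to 'any'. any: drop the row or column if it contains any missing value. all: drop the row or column only if all of its values are missing
thresh | An integer giving the minimum number of non-missing values a row or column must have in order to be kept
subset | An array-like of labels along the other axis that limits where missing values are looked for. When dropping rows, these are column names
inplace | A boolean. If set to True, the caller DataFrame is modified in place. The default is False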
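Return Value
It returns a filtered DataFrame from which rows or columns have been dropped according to the arguments passed. With inplace=True, it modifies the caller instead and returns None.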
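The following is a minimal sketch of the default return behavior, using a hypothetical two-column frame. The variable names `original` and `filtered` are chosen for illustration only.

```python
import pandas as pd

# Hypothetical data: one of three rows has a missing Attendance value.
original = pd.DataFrame(
    {
        "Attendance": [60, None, 80],
        "Name": ["Olivia", "John", "Laura"],
    }
)

# With the default inplace=False, dropna() returns a new DataFrame.
filtered = original.dropna()

print(len(original))  # the caller still has all of its rows
print(len(filtered))  # the result has the incomplete row removed
print(filtered is original)
```

The original row labels are preserved in the result, so `filtered` keeps the index values 0 and 2.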
Example Code: DataFrame.dropna() to Drop Rows
By default, axis is 0 (rows), so DataFrame.dropna() drops rows unless told otherwise.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
print(dataframe)
The example DataFrame is as follows.
Attendance Name Obtained Marks
0 60.0 Olivia NaN
1 NaN John 75.0
2 80.0 Laura 82.0
3 NaN Ben 64.0
4 95.0 Kevin NaN
All parameters of this function are optional. If we pass no arguments, the function drops every row that contains at least one missing value.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna()
print(dataframe1)
Output:
Attendance Name Obtained Marks
2 80.0 Laura 82.0
Every row that contains at least one missing value has been dropped.
Example Code: DataFrame.dropna() to Drop Columns
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1)
print(dataframe1)
Output:
Name
0 Olivia
1 John
2 Laura
3 Ben
4 Kevin
Because we set axis=1 in DataFrame.dropna(), it dropped every column that contains at least one missing value.
Example Code: DataFrame.dropna() With how='all'
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia NaN
1 NaN John 75.0
2 80.0 Laura 82.0
3 NaN Ben 64.0
4 95.0 Kevin NaN
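No column was dropped. With axis=1 and how set to all, a column is dropped only when all of its values are missing, and every column here holds at least one value.
If every value in a row or column along the specified axis is missing, DataFrame.dropna() drops it when how is set to all.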
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: None, 2: None, 3: None, 4: None},
}
)
print(dataframe)
print("--------")
dataframe1 = dataframe.dropna(axis=1, how="all")
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia None
1 NaN John None
2 80.0 Laura None
3 NaN Ben None
4 95.0 Kevin None
--------
Attendance Name
0 60.0 Olivia
1 NaN John
2 80.0 Laura
3 NaN Ben
4 95.0 Kevin
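The same how='all' rule applies to rows when axis is left at its default of 0. The sketch below uses a small hypothetical frame in which only the middle row is entirely empty, and compares the two accepted values of how.

```python
import pandas as pd

# Hypothetical data: row 1 is missing in every column,
# rows 0 and 2 are missing in just one column each.
frame = pd.DataFrame(
    {
        "Attendance": [60, None, None],
        "Obtained Marks": [None, None, 82],
    }
)

# how="all" drops a row only when all of its values are missing.
rows_all = frame.dropna(how="all")
print(rows_all)

# how="any" (the default) drops a row as soon as one value is missing.
rows_any = frame.dropna(how="any")
print(rows_any)
```

Here `rows_all` keeps rows 0 and 2, while `rows_any` is empty because every row has at least one missing value.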
Example Code: DataFrame.dropna() With a Specified Subset or Threshold
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(thresh=3)
print(dataframe1)
Output:
Attendance Name Obtained Marks
2 80.0 Laura 82.0
The value of thresh is 3, which means a row needs at least 3 non-null values to avoid being dropped.
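To make the threshold rule concrete, this sketch counts the non-missing values in each row of the same sample data (written with plain lists here) and compares `thresh` against a manual filter. The helper names `non_null_per_row`, `kept_two`, `kept_three`, and `manual_three` are illustrative.

```python
import pandas as pd

# Same sample data as above, written with plain lists.
dataframe = pd.DataFrame(
    {
        "Attendance": [60, None, 80, None, 95],
        "Name": ["Olivia", "John", "Laura", "Ben", "Kevin"],
        "Obtained Marks": [None, 75, 82, 64, None],
    }
)

# Count the non-missing cells in each row.
non_null_per_row = dataframe.notna().sum(axis=1)
print(non_null_per_row.tolist())

# thresh=2 keeps rows with at least 2 non-missing values.
kept_two = dataframe.dropna(thresh=2)
print(len(kept_two))

# thresh=3 gives the same rows as filtering on the per-row count.
kept_three = dataframe.dropna(thresh=3)
manual_three = dataframe[non_null_per_row >= 3]
print(kept_three.equals(manual_three))
```

Every row in this data has at least two non-missing values, so lowering the threshold to 2 keeps all five rows.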
We can also specify subset.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(subset=["Attendance", "Name"])
print(dataframe1)
Output:
Attendance Name Obtained Marks
0 60.0 Olivia NaN
2 80.0 Laura 82.0
4 95.0 Kevin NaN
Rows are dropped based on the Attendance and Name columns only. A row whose only missing value is in another column, such as Obtained Marks here, is kept.
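When columns are dropped with axis=1, subset takes row labels instead of column names. The sketch below is one way to try this on the same sample data. The choice of row labels 0 and 2 is arbitrary and only for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

# Only rows 0 and 2 are inspected. Obtained Marks is missing in row 0,
# so that column is dropped. Attendance is missing only in rows 1 and 3,
# which are not inspected, so it is kept.
by_rows = dataframe.dropna(axis=1, subset=[0, 2])
print(by_rows)

# Row 2 has no missing value, so no column is dropped.
by_row_two = dataframe.dropna(axis=1, subset=[2])
print(by_row_two)
```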
Example Code: DataFrame.dropna() With inplace=True
When inplace is set to True, DataFrame.dropna() modifies the caller DataFrame in place.
import pandas as pd
dataframe = pd.DataFrame(
{
"Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
"Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
"Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
}
)
dataframe1 = dataframe.dropna(inplace=True)
print(dataframe1)
Output:
None
With inplace=True, the method modified the caller DataFrame in place and returned None.
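Since the filtered data lives in the caller after an in-place call, a common pattern is to call the method without assigning its result and then inspect the caller. A minimal sketch with the same sample data:

```python
import pandas as pd

dataframe = pd.DataFrame(
    {
        "Attendance": {0: 60, 1: None, 2: 80, 3: None, 4: 95},
        "Name": {0: "Olivia", 1: "John", 2: "Laura", 3: "Ben", 4: "Kevin"},
        "Obtained Marks": {0: None, 1: 75, 2: 82, 3: 64, 4: None},
    }
)

rows_before = len(dataframe)

# Do not assign the result: the caller itself is filtered.
dataframe.dropna(inplace=True)

print(rows_before, len(dataframe))
print(dataframe)
```

After the call, `dataframe` holds only the row with no missing values.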