I'm working with boolean indexing in Pandas. The question is why the statement

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error.

Example:

a = pd.DataFrame({'x':[1,1],'y':[10,20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Originally posted by user2988577; translation licensed under CC BY-SA 4.0.

Logical operators for boolean indexing in Pandas

2 answers

Posted
2022-12-29

✓ Accepted

When you write

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to boolean values.

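NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value; in other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That is because it is unclear when the result should be True or False. Some users might assume they are True if they have non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit about which behavior you want, by checking the empty attribute or by calling the all() or any() method.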
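A minimal sketch of this behavior, using the DataFrame from the question. It assumes a reasonably recent pandas; the exact wording of the ValueError differs between versions, so only the word "ambiguous" is relied on. The names `left` and `right` are illustrative.

```python
import pandas as pd

# The DataFrame from the question.
a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
left = a['x'] == 1    # [True, True]
right = a['y'] == 10  # [True, False]

# `and` needs bool(left), which pandas refuses to decide for a Series.
message = None
try:
    _ = left and right
except ValueError as exc:
    message = str(exc)  # wording varies by version; it mentions "ambiguous"

# Stating the intended reduction explicitly is fine.
print(left.all(), right.all())  # True False  (is every element True?)
print(left.any(), right.any())  # True True   (is at least one element True?)
print(left.empty)               # False       (empty is an attribute, not a method)
```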

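In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

returns a boolean Series.

By the way, as alexpmil notes, the parentheses are mandatory, since & has a higher operator precedence than ==.

Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which is in turn equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series, and using and with two Series triggers the same ValueError as above. That is why the parentheses are mandatory.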
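The precedence problem can be reproduced with the DataFrame from the question. This sketch (variable names are illustrative) shows that `1 & a['y']` is computed first, and that the resulting chained comparison raises a ValueError instead of filtering.

```python
import pandas as pd

# The DataFrame from the question.
a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# Parenthesized: two boolean Series combined element-wise by &.
mask = (a['x'] == 1) & (a['y'] == 10)
print(a[mask])  # keeps only row 0

# Unparenthesized: & binds first, so Python computes 1 & a['y'] ...
inner = 1 & a['y']
print(inner.tolist())  # [0, 0] (10 and 20 are even, so their lowest bit is 0)

# ... and then evaluates a['x'] == inner == 10 as a chained comparison,
# which implicitly applies `and` to a Series and raises ValueError.
raised = False
try:
    a['x'] == 1 & a['y'] == 10
except ValueError:
    raised = True
print(raised)  # True
```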

Originally posted by unutbu; translation licensed under CC BY-SA 4.0.

Community wiki

Posted
2022-12-29

TL;DR: The logical operators in Pandas are `&`, `|` and `~`, and parentheses `(...)` are important!

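Python's and, or and not logical operators are designed to work with scalars. So Pandas had to do one better and override the bitwise operators to achieve a _vectorized_ (element-wise) version of this functionality.

So the following in Python (where exp1 and exp2 are expressions that evaluate to a boolean result)…

exp1 and exp2              # Logical AND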
exp1 or exp2               # Logical OR
not exp1                   # Logical NOT

…translates to…

exp1 & exp2                # Element-wise logical AND
exp1 | exp2                # Element-wise logical OR
~exp1                      # Element-wise logical NOT

in pandas.
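As a quick check of this translation table, here is a truth-table sketch in which two small, hypothetical boolean Series stand in for exp1 and exp2.

```python
import pandas as pd

# Hypothetical stand-ins covering all four True/False combinations.
exp1 = pd.Series([True, True, False, False])
exp2 = pd.Series([True, False, True, False])

both = exp1 & exp2    # element-wise AND
either = exp1 | exp2  # element-wise OR
negated = ~exp1       # element-wise NOT

print(both.tolist())     # [True, False, False, False]
print(either.tolist())   # [True, True, True, False]
print(negated.tolist())  # [False, False, True, True]
```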

If you get a ValueError while performing a logical operation, you need to use parentheses for grouping:

(exp1) op (exp2)

For example,

(df['col1'] == x) & (df['col2'] == y)

and so on.

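Boolean indexing: a common operation is to filter data with boolean masks computed from logical conditions. Pandas provides three operators: & for logical AND, | for logical OR, and ~ for logical NOT.

Consider the following setup: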

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

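Logical AND

For the df above, say you'd like to return all rows where A < 5 and B > 5. This is done by computing a mask for each condition separately and ANDing them.

Overloaded bitwise & operator

Before continuing, please take note of this particular excerpt of the docs, which states

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df.A > 2 & df.B < 3 as df.A > (2 & df.B) < 3, while the desired evaluation order is (df.A > 2) & (df.B < 3).

So, with this in mind, element-wise logical AND can be implemented with the bitwise operator &:

df['A'] < 5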

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

The subsequent filtering step is then simply

df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

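The parentheses override the default precedence of the bitwise operators, which bind more tightly than the comparison operators < and >. See the section on operator precedence in the Python docs.

If you don't use parentheses, the expression is evaluated incorrectly. For example, if you accidentally attempt something such as

df['A'] < 5 & df['B'] > 5

it is parsed as

df['A'] < (5 & df['B']) > 5

which becomes

df['A'] < something_you_dont_want > 5

which in turn becomes (see the Python docs on chained comparisons)

(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)

which becomes

# Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2

which throws

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So, don't make that mistake! [1]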
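The parse described above can be confirmed without running any pandas code, by asking Python's `ast` module how it groups the expression. A sketch; it assumes Python 3.9 or later for `ast.unparse`.

```python
import ast

# Parse (without evaluating) the unparenthesized expression from above.
compare = ast.parse("df['A'] < 5 & df['B'] > 5", mode="eval").body

# The top-level node is ONE chained comparison with two operators, < and >.
print(type(compare).__name__)                     # Compare
print([type(op).__name__ for op in compare.ops])  # ['Lt', 'Gt']

# Its middle operand is the bitwise expression 5 & df['B'].
middle = compare.comparators[0]
print(type(middle.op).__name__)  # BitAnd
print(ast.unparse(middle))       # 5 & df['B']

# With parentheses, the top-level node becomes a single & of two comparisons.
fixed = ast.parse("(df['A'] < 5) & (df['B'] > 5)", mode="eval").body
print(type(fixed).__name__, type(fixed.op).__name__)  # BinOp BitAnd
```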

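Avoiding parentheses grouping

The fix is actually quite simple. Most comparison operators have a corresponding bound method on Series and DataFrame. If the individual masks are built with these methods instead of the comparison operators, you no longer need parentheses to specify the evaluation order:

df['A'].lt(5)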

0     True
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool

df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

See the section on flexible comparisons in the pandas docs. To summarize, we have

╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛
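A small sketch confirming the table: each operator and its method produce identical masks. It rebuilds the df from above with literal values so that it does not depend on the random seed; the `pairs` mapping is just an illustrative way to loop over the table.

```python
import operator
import pandas as pd

# The df shown above, written out literally.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# Each comparison operator paired with its method name from the table.
pairs = {operator.gt: 'gt', operator.ge: 'ge',
         operator.lt: 'lt', operator.le: 'le',
         operator.eq: 'eq', operator.ne: 'ne'}

for op, method in pairs.items():
    via_operator = op(df['A'], 5)               # e.g. df['A'] > 5
    via_method = getattr(df['A'], method)(5)    # e.g. df['A'].gt(5)
    assert via_operator.equals(via_method), method

# Method calls bind before &, so no parentheses are needed here.
mask = df['A'].lt(5) & df['B'].gt(5)
print(df[mask])  # rows 1 and 3
```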

Another option for avoiding parentheses is to use DataFrame.query (or eval):

df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6

I have documented query and eval extensively in Dynamic Expression Evaluation in pandas using pd.eval().
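A sketch of `query` with the thresholds held in ordinary Python variables, which the query string references with the `@` prefix. The names `a_max` and `b_min` are hypothetical; all three filters below should select the same rows.

```python
import pandas as pd

# The df shown above, written out literally.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

a_max, b_min = 5, 5  # hypothetical thresholds

# In a query string, local variables are referenced with @, and both the
# `and` keyword and the & operator are accepted.
r1 = df.query('A < @a_max and B > @b_min')
r2 = df.query('(A < @a_max) & (B > @b_min)')

# The equivalent boolean-indexing expression, for comparison.
r3 = df[(df['A'] < a_max) & (df['B'] > b_min)]

print(r1.index.tolist())  # [1, 3]
```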

operator.and_

lets you perform this operation in a functional manner. Internally, it calls Series.__and__, which corresponds to the bitwise & operator.

import operator

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

You usually won't need this, but it is useful to know.
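Because `operator.and_` is an ordinary two-argument function, one place it earns its keep is folding an arbitrary list of masks with `functools.reduce`. A sketch; the third condition on column C is a hypothetical extra. Unlike `np.logical_and.reduce`, this keeps the result as a pandas Series.

```python
import functools
import operator
import pandas as pd

# The df shown above, written out literally.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# Hypothetical list of conditions; it could be built dynamically.
masks = [df['A'] < 5, df['B'] > 5, df['C'] < 8]

# Equivalent to (masks[0] & masks[1]) & masks[2].
combined = functools.reduce(operator.and_, masks)
print(df[combined])  # only row 3 satisfies all three conditions
```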

Generalizing: np.logical_and (and logical_and.reduce)

Another alternative is np.logical_and, which also does not need parentheses for grouping:

np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6

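np.logical_and is a ufunc (universal function), and most ufuncs have a reduce method. This makes it easier to generalize with logical_and when you have multiple masks to AND. For example, to AND masks m1, m2 and m3 with &, you would have to write

m1 & m2 & m3

However, a simpler option is

np.logical_and.reduce([m1, m2, m3])

This is powerful, because it lets you build more complex logic on top of it (for example, dynamically generating masks in a list comprehension and ANDing all of them):

import numpy as np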

cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m
# array([False,  True, False,  True, False])

df[m]
   A  B  C
1  3  7  9
3  4  7  6

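[1] I know I'm harping on this point, but please bear with me. This is a very, very common beginner's mistake, and it must be explained very thoroughly.

Logical OR

For the df above, say you'd like to return all rows where A == 3 or B == 7.

Overloaded bitwise |

df['A'] == 3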

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool

(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

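If you haven't already, please also read the section on Logical AND above; all of its caveats apply here.

Alternatively, this operation can be specified as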

df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

operator.or_

calls Series.__or__ under the hood.

operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

np.logical_or

For two conditions, use logical_or:

np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

For multiple masks, use logical_or.reduce:

np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6

Logical NOT

Given a mask such as

mask = pd.Series([True, True, False])

if you need to invert every boolean value (so that the end result is [False, False, True]), you can use any of the methods below.

Bitwise ~

~mask

0    False
1    False
2     True
dtype: bool

Again, expressions need to be parenthesized:

~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool
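With `~` the parentheses matter even more, because forgetting them does not raise an error. In this sketch (the df from above, written out literally), `~` binds more tightly than `==`, so without parentheses it is applied to the integer column as a bitwise NOT, and the comparison silently returns all False.

```python
import pandas as pd

# The df shown above, written out literally.
df = pd.DataFrame({'A': [5, 3, 3, 4, 8],
                   'B': [0, 7, 5, 7, 8],
                   'C': [3, 9, 2, 6, 1]})

# Parenthesized: invert the boolean mask.
good = ~(df['A'] == 3)

# Unparenthesized: ~ is applied to the integers first (~x == -(x + 1)) ...
inverted_ints = ~df['A']
# ... and only then compared with 3, which no element equals.
bad = ~df['A'] == 3

print(good.tolist())           # [True, False, False, True, True]
print(inverted_ints.tolist())  # [-6, -4, -4, -5, -9]
print(bad.tolist())            # [False, False, False, False, False]
```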

Internally, this calls

mask.__invert__()

0    False
1    False
2     True
dtype: bool

but don't call it directly.

operator.inv

calls __invert__ on the Series internally.

operator.inv(mask)

0    False
1    False
2     True
dtype: bool

np.logical_not

This is the NumPy variant.

np.logical_not(mask)

0    False
1    False
2     True
dtype: bool

Note: for boolean masks, np.logical_and can be substituted with np.bitwise_and, logical_or with bitwise_or, and logical_not with invert.
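The substitution holds for boolean masks; on integer data the logical and bitwise ufuncs diverge. A sketch with small illustrative arrays:

```python
import numpy as np

m1 = np.array([True, True, False, False])
m2 = np.array([True, False, True, False])

# On boolean arrays, each logical ufunc matches its bitwise counterpart.
same_and = np.array_equal(np.logical_and(m1, m2), np.bitwise_and(m1, m2))
same_or = np.array_equal(np.logical_or(m1, m2), np.bitwise_or(m1, m2))
same_not = np.array_equal(np.logical_not(m1), np.invert(m1))
print(same_and, same_or, same_not)  # True True True

# On integers they differ: bitwise ufuncs work on bits,
# logical ufuncs on truthiness (zero vs non-zero).
x = np.array([0, 1, 2])
print(np.invert(x).tolist())          # [-1, -2, -3]
print(np.logical_not(x).tolist())     # [True, False, False]
print(np.bitwise_and(x, 2).tolist())  # [0, 0, 2]
print(np.logical_and(x, 2).tolist())  # [False, True, True]
```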

Originally posted by cs95; translation licensed under CC BY-SA 4.0.
