pandas.read_excel error when using usecols

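I'm having trouble reading data from an Excel file. The file has column names that contain Unicode characters.

For automation reasons, I need to pass the usecols argument to pandas.read_excel.

The thing is, when I don't use the usecols argument, the data loads without errors.

Here is the code: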

import pandas as pd

df = pd.read_excel(file)
df.columns

Index([u'col1', u'col2', u'col3', u'col with unicode à', u'col4'], dtype='object')

If I use usecols:

COLUMNS = ['col1', 'col2', 'col with unicode à']
df = pd.read_excel(file, usecols = COLUMNS)

I get the following error:

 ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']

Passing encoding = 'utf-8' as an argument to read_excel does not solve the problem, and neither does encoding the elements of COLUMNS.
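A possible reading of the error, stated as an assumption rather than a confirmed diagnosis: the `\xc3\xa0` in the message is the UTF-8 byte sequence for `à`, and the Anaconda2 paths in the traceback suggest Python 2, where a plain `'...'` literal is a byte string while the header names reported by pandas are unicode. The sketch below uses a hypothetical helper, `to_text` (not part of pandas), that decodes byte-string names to text so they compare equal to the header names. It runs without pandas or an Excel file.

```python
def to_text(names, encoding='utf-8'):
    """Return the names with any byte strings decoded to text.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    return [n.decode(encoding) if isinstance(n, bytes) else n for n in names]


# Header names as pandas reported them (text / unicode strings).
header = [u'col1', u'col2', u'col3', u'col with unicode \xe0', u'col4']

# The requested names as UTF-8 byte strings, which is what a plain
# Python 2 literal containing "à" holds in a UTF-8 source file.
requested = [b'col1', b'col2', b'col with unicode \xc3\xa0']

COLUMNS = to_text(requested)

# After decoding, every requested name is present in the header.
missing = [c for c in COLUMNS if c not in header]
print(missing)  # prints []
```

With text names in hand, `pd.read_excel(file, usecols=COLUMNS)` could be retried; writing the literals as `u'...'` in the first place should have the same effect. Whether this removes the error on the pandas version in question has not been verified here.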

EDIT: here is the full error output.

  ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-541ccb88da6a> in <module>()
      2 df = pd.read_excel(file)
      3 cols = df.columns
----> 4 df = pd.read_excel(file, usecols = ['col1', 'col2', 'col with unicode à'])

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    373         convert_float=convert_float,
    374         mangle_dupe_cols=mangle_dupe_cols,
--> 375         **kwds)
    376
    377

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    716                                   convert_float=convert_float,
    717                                   mangle_dupe_cols=mangle_dupe_cols,
--> 718                                   **kwds)
    719
    720     @property

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    599                                     usecols=usecols,
    600                                     mangle_dupe_cols=mangle_dupe_cols,
--> 601                                     **kwds)
    602
    603                 output[asheetname] = parser.read(nrows=nrows)

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in TextParser(*args, **kwds)
   2154     """
   2155     kwds['engine'] = 'python'
-> 2156     return TextFileReader(*args, **kwds)
   2157
   2158

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    893             self.options['has_index_names'] = kwds['has_index_names']
    894
--> 895         self._make_engine(self.engine)
    896
    897     def close(self):

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
   1130                                  ' "c", "python", or' ' "python-fwf")'.format(
   1131                                      engine=engine))
-> 1132             self._engine = klass(self.f, **self.options)
   1133
   1134     def _failover_to_python(self):

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, **kwds)
   2236         self._col_indices = None
   2237         (self.columns, self.num_original_columns,
-> 2238          self.unnamed_cols) = self._infer_columns()
   2239
   2240         # Now self.columns has the set of columns that we will process.

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _infer_columns(self)
   2609                 columns = [names]
   2610             else:
-> 2611                 columns = self._handle_usecols(columns, columns[0])
   2612         else:
   2613             try:

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _handle_usecols(self, columns, usecols_key)
   2669                             col_indices.append(usecols_key.index(col))
   2670                         except ValueError:
-> 2671                             _validate_usecols_names(self.usecols, usecols_key)
   2672                     else:
   2673                         col_indices.append(col)

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _validate_usecols_names(usecols, names)
   1235         raise ValueError(
   1236             "Usecols do not match columns, "
-> 1237             "columns expected but not found: {missing}".format(missing=missing)
   1238         )
   1239

ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']
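
The last two frames of the traceback show a name lookup (`usecols_key.index(col)`) that falls through to `_validate_usecols_names` on `ValueError`. The following is a simplified stand-in for that lookup, not the pandas source: `find_col_indices` is a hypothetical name, and the logic is reduced to what the traceback excerpt shows. It illustrates how a byte-string name can fail to match a text header name that looks identical when printed.

```python
def find_col_indices(usecols, header):
    """Hypothetical stand-in for the lookup visible in the traceback.

    Returns the position of each requested name in the header, or raises
    ValueError listing the names that were not found.
    """
    col_indices = []
    missing = []
    for col in usecols:
        try:
            col_indices.append(header.index(col))
        except ValueError:
            missing.append(col)
    if missing:
        raise ValueError(
            "Usecols do not match columns, "
            "columns expected but not found: {missing}".format(missing=missing)
        )
    return col_indices


header = [u'col1', u'col2', u'col3', u'col with unicode \xe0', u'col4']

# Text names are found at their positions in the header.
print(find_col_indices([u'col1', u'col2', u'col with unicode \xe0'], header))  # [0, 1, 3]

# A UTF-8 byte string for the same name is not equal to the text name,
# so the lookup fails and the name is reported as missing.
try:
    find_col_indices([u'col1', u'col2', b'col with unicode \xc3\xa0'], header)
except ValueError as exc:
    print(exc)
```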

Originally posted by Giacomo Sachs; the translation is licensed under CC BY-SA 4.0.
