I'm new to this, so please bear with me.

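Below is some sample code whose only job is to merge the text files in a given folder and its subfolders. I occasionally get a traceback and I'm not sure where to look. I'd also like some help improving the code so that empty lines are not merged, and no blank lines show up in the merged/master file. It might be a good idea to do some cleanup before merging the files, or simply to ignore empty lines during the merge.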

Each text file in the folder has no more than 1000 lines, but the aggregated master file can easily exceed 10000 lines.

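import os
root = 'C:\\Dropbox\\ans7i\\'
files = [(path,f) for path,_,file_list in os.walk(root) for f in file_list]
out_file = open('C:\\Dropbox\\Python\\master.txt','w')
for path,f_name in files:
    in_file = open('%s/%s'%(path,f_name), 'r')

    # write out root/path/to/file (space) file_contents
    for line in in_file:
        out_file.write('%s/%s %s'%(path,f_name,line))
    in_file.close()

    # enter new line after each file
    out_file.write('\n')
out_file.close()  # flush and close before re-reading the master file

# second pass: drop lines that are empty or whitespace-only
# (blank source lines were already prefixed with their path above, so
# L.strip() is non-empty for them; only the '\n' separators get dropped)
with open('C:\\Dropbox\\Python\\master.txt', 'r') as f:
    lines = f.readlines()
with open('C:\\Dropbox\\Python\\master.txt', 'w') as f:
    f.write("".join(L for L in lines if L.strip()))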

Traceback (most recent call last):
  File "C:\Dropbox\Python\master.py", line 9, in <module>
    for line in in_file:
  File "C:\PYTHON32\LIB\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 972: character maps to <undefined>

Originally posted by user1582596; translation licensed under CC BY-SA 4.0.

python file-io python-3.x traceback python-unicode


2 answers


Community wiki

Posted on
2023-01-04

✓ Accepted

The error is raised because Python 3 opens the file with a default encoding that does not match the file's contents.
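To see the mismatch in isolation, here is a small sketch. The traceback shows `cp1252.py`, so that was the default on the asker's Windows machine; byte 0x81 is one of the few byte values cp1252 leaves undefined. The default printed on your machine depends on your OS and locale.

```python
import locale

# The encoding open() falls back to when no encoding= argument is given.
# It depends on the OS/locale (cp1252 on the asker's Windows setup).
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# 0x81 has no mapping in cp1252, so decoding it raises the same
# UnicodeDecodeError seen in the traceback.
error_message = None
try:
    b"\x81".decode("cp1252")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```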

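If all you are doing is copying file contents, you are better off using the shutil.copyfileobj() function and opening the files in binary mode. That way you avoid encoding issues entirely (as long as all your source files use the _same encoding_, of course, so you don't end up with a target file in mixed encodings):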

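import os
import shutil

# same file list as in the question
root = 'C:\\Dropbox\\ans7i\\'
files = [(path, f) for path, _, file_list in os.walk(root) for f in file_list]

with open('C:\\Dropbox\\Python\\master.txt', 'wb') as output:
    for path, f_name in files:
        with open(os.path.join(path, f_name), 'rb') as infile:
            shutil.copyfileobj(infile, output)
        output.write(b'\n')  # insert extra newline between files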

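I've also cleaned up the code a little to use context managers (so your files are closed automatically when you're done) and to build the full path for each file with os.path.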

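If you do need to process your input line by line, you have to tell Python what encoding to expect, so that it can decode the file contents into Python string objects: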

open(path, mode, encoding='UTF8')
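Applied to the question's merge loop, a line-by-line version might look like the sketch below. It also covers the asker's second request: blank lines are skipped as they are read, so no cleanup pass is needed. `merge_text_files` is a hypothetical helper name, and UTF-8 is an assumption about the source files; pass whatever encoding they actually use.

```python
import os


def merge_text_files(root, master_path, encoding="utf-8"):
    """Hypothetical helper: merge every file under `root` into `master_path`.

    Each non-blank line is written as "<source path> <line>"; empty and
    whitespace-only lines are skipped, so no blank lines reach the master file.
    Assumes every source file uses the same `encoding` (UTF-8 by default).
    """
    master_abs = os.path.abspath(master_path)
    with open(master_path, "w", encoding=encoding) as out_file:
        for path, _, file_list in os.walk(root):
            for f_name in sorted(file_list):
                full_path = os.path.join(path, f_name)
                if os.path.abspath(full_path) == master_abs:
                    continue  # never merge the master file into itself
                with open(full_path, "r", encoding=encoding) as in_file:
                    for line in in_file:
                        text = line.rstrip("\r\n")
                        if not text.strip():
                            continue  # skip empty / whitespace-only lines
                        out_file.write(f"{full_path} {text}\n")


# Example call with the asker's paths:
# merge_text_files("C:\\Dropbox\\ans7i", "C:\\Dropbox\\Python\\master.txt")
```

Filtering while reading means the path prefix is only added to lines that have content, which is why the question's second pass could not remove those prefixed blank lines.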

Note that this requires you to know in advance which encoding the files use.
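If you can't know the encoding in advance, one common workaround (my own suggestion, not part of the answer) is to try a few candidate encodings in order and, as a last resort, replace undecodable bytes instead of raising. `read_text` is a hypothetical name and the candidate list is an assumption; open() itself also accepts errors="replace" if you only need the fallback.

```python
def read_text(path, candidates=("utf-8", "cp1252")):
    """Hypothetical helper: decode a file by trying candidate encodings in order.

    Returns (text, encoding_used). A clean decode does not prove the guess is
    right (cp1252 accepts almost any byte), so order the candidates from most
    to least strict. If nothing fits, fall back to UTF-8 and replace
    undecodable bytes with U+FFFD instead of raising UnicodeDecodeError.
    """
    with open(path, "rb") as f:
        raw = f.read()  # fine for files of ~1000 lines
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacements)"
```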

If you have more questions about Python 3, files and encodings, read the Python Unicode HOWTO.

Originally posted by Martijn Pieters; translation licensed under CC BY-SA 3.0.

Community wiki

Posted on
2023-01-04

I ran into a similar problem when deleting files with the os module's remove function.

The change I needed to make was from:

file = open(filename)

to

file = open(filename, encoding="utf8")

that is, adding an encoding argument (encoding="utf-8").
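The thread spells the codec three ways ("UTF8", "utf8", "utf-8"). As a quick check, Python's codec registry resolves all of these spellings to the same codec, so any of them works in open():

```python
import codecs

# Codec names are case-insensitive and "utf8" is an alias of "utf-8",
# so every spelling below looks up the same codec.
spellings = ("utf8", "UTF8", "utf-8", "UTF-8")
for spelling in spellings:
    print(spelling, "->", codecs.lookup(spelling).name)
```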

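UTF-8 is one of the most commonly used encodings, and Python uses it by default in many places (for example, for source files). However, open() without an encoding argument uses the locale's preferred encoding, which on the asker's Windows system was cp1252. UTF stands for "Unicode Transformation Format", and the "8" means the encoding works with 8-bit values. … UTF-8 follows this rule: if a code point is < 128, it is represented by the byte with the same value.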
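A short illustration of that rule: an ASCII code point encodes to the single byte with the same value, while larger code points take two to four bytes. The sample characters are arbitrary, written as escapes so the snippet doesn't depend on the editor's file encoding.

```python
# "A" (U+0041), "é" (U+00E9), "中" (U+4E2D), "😀" (U+1F600)
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded!r} ({len(encoded)} byte(s))")
```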

Originally posted by Mani Singh; translation licensed under CC BY-SA 4.0.


Stack Overflow translation