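The oodocx module mentioned on the same page points users to an /examples folder that does not seem to exist.

I have read the python-docx 0.7.2 documentation, plus everything I could find on Stack Overflow on the subject, so please believe that I have done my "homework".

Python is the only language I know (beginner+, maybe intermediate), so please do not assume any knowledge of C, Unix, XML, etc.

Task: open an MS Word 2007+ document containing a single line of text (to keep things simple) and replace any "key" word from a dictionary that occurs in that line with its dictionary value. Then close the document, keeping everything else the same.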

The line of text is (for example) "We shall linger in the chambers of the sea."

from docx import Document

document = Document('/Users/umityalcin/Desktop/Test.docx')

Dictionary = {'sea': 'ocean'}

sections = document.sections
for section in sections:
    print(section.start_type)

# Now I would like to navigate to (focus on, get to, whatever) the section that
# holds my single line of text and run a find/replace using the dictionary above,
# then save the document in the usual way.

document.save('/Users/umityalcin/Desktop/Test.docx')

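I am not seeing anything in the documentation that allows me to do this. Maybe it is there, but I don't get it, because not everything is spelled out at my level.

I have followed other suggestions on this site and tried an earlier version of the module ( https://github.com/mikemaccana/python-docx ), which is supposed to have "methods like replace, advReplace", as follows: I open the source code in the Python interpreter and add the following at the end (this is to avoid clashes with the already installed version 0.7.2):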

document = opendocx('/Users/umityalcin/Desktop/Test.docx')
words = document.xpath('//w:r', namespaces=document.nsmap)
for word in words:
    if word in Dictionary.keys():
        print "found it", Dictionary[word]
        document = replace(document, word, Dictionary[word])
savedocx(document, coreprops, appprops, contenttypes, websettings,
    wordrelationships, output, imagefiledict=None)

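Running this produces the following error message:

NameError: name 'coreprops' is not defined

Maybe I am trying to do something that cannot be done, but I would appreciate your help if I am missing something simple.

If this matters, I am using the 64-bit version of Enthought's Canopy on OS X 10.9.3.

Originally posted by user2738815; translation licensed under CC BY-SA 4.0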

python text replace ms-word python-docx

2 answers

Community wiki, posted 2023-01-03

✓ Accepted

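UPDATE: There are a couple of paragraph-level functions that do a good job of this; they can be found on the GitHub site for python-docx.

One replaces a regex match with a replacement str. The replacement text is formatted the same as the first character of the matched text.
The other isolates a run so that some formatting can be applied to that word or phrase, for example highlighting each occurrence of "foobar" in the text, or making it bold or showing it in a larger font.

The current version of python-docx does not have a search() function or a replace() function. These are requested fairly often, but an implementation for the general case is quite tricky and it has not yet risen to the top of the backlog.

Several people have had success, though, getting done what they need with the facilities already present. Here is an example. It has nothing to do with sections, by the way :)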

for paragraph in document.paragraphs:
    if 'sea' in paragraph.text:
        print(paragraph.text)
        # Assigning to .text replaces the whole paragraph text.
        paragraph.text = 'new text containing ocean'
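
Applied to the dictionary from the question, the same loop might look like the sketch below. The helper name `replace_in_paragraphs` is hypothetical (it is not part of python-docx); it assumes only that each paragraph exposes a readable and writable `.text`, as python-docx paragraphs do. Like the loop above, it rewrites the whole text of a paragraph it changes.

```python
def replace_in_paragraphs(paragraphs, mapping):
    """Replace every key of `mapping` with its value in each paragraph's text.

    Hypothetical helper: `paragraphs` is any iterable of objects with a
    writable `.text` attribute, e.g. `document.paragraphs` in python-docx.
    Returns the number of paragraphs that were changed.
    """
    changed = 0
    for paragraph in paragraphs:
        original = paragraph.text
        updated = original
        for old, new in mapping.items():
            updated = updated.replace(old, new)
        # Assign only when something changed, so untouched paragraphs
        # are left exactly as they were.
        if updated != original:
            paragraph.text = updated
            changed += 1
    return changed
```

With python-docx this would presumably be called as `replace_in_paragraphs(document.paragraphs, {'sea': 'ocean'})` before `document.save(...)`.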

To search in tables as well, you would need something like this:

for table in document.tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                if 'sea' in paragraph.text:
                    paragraph.text = paragraph.text.replace("sea", "ocean")

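If you pursue this path, you will probably discover quite quickly what the complexities are. If you replace the entire text of a paragraph, that removes any character-level formatting, such as a word or phrase in bold or italic.

By the way, the code in @wnnnmaw's answer is for the legacy version of python-docx and will not work at all with versions after 0.3.0.

Originally posted by scanny; translation licensed under CC BY-SA 4.0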

Community wiki, posted 2023-01-03

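I needed something to replace regular expressions in docx. I started from scanny's answer. To handle styles I used the answer from "Python docx Replace string in paragraph while keeping style", added a recursive call to handle nested tables, and came up with something like this: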

import re
from docx import Document

def docx_replace_regex(doc_obj, regex, replace):
    """Replace regex matches in every run of doc_obj, including nested tables."""
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            # Work run by run (a run is a stretch of text with one style)
            # so that the existing formatting is kept.
            for run in p.runs:
                if regex.search(run.text):
                    run.text = regex.sub(replace, run.text)

    # Cells hold paragraphs and tables of their own, so recurse into them.
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)

regex1 = re.compile(r"your regex")
replace1 = r"your replace string"
filename = "test.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1, replace1)
doc.save('result1.docx')

To iterate over a dictionary:

for word, replacement in dictionary.items():
    word_re = re.compile(word)
    docx_replace_regex(doc, word_re, replacement)
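
One caveat with this loop: each dictionary key is compiled as a regular expression, so a key containing characters such as `.` or `(` is treated as a pattern, and a key like `sea` also matches inside `season`. Below is a minimal sketch of one way to build a literal, whole-word pattern from the keys using only the standard `re` module. The names `build_pattern` and `replace_words` are hypothetical, and whole-word matching is an assumption about what is wanted.

```python
import re

def build_pattern(mapping):
    """Compile one regex that matches any key of `mapping` as a whole word.

    Assumes every key starts and ends with a word character, because the
    pattern relies on the \\b word-boundary anchor.
    """
    # Longest keys first, so a key like 'sea level' is tried before 'sea'.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape makes each key match literally, not as a pattern.
    alternatives = "|".join(re.escape(key) for key in keys)
    return re.compile(r"\b(?:" + alternatives + r")\b")

def replace_words(text, mapping):
    """Replace whole-word occurrences of the keys in a single pass."""
    if not mapping:
        return text
    pattern = build_pattern(mapping)
    # A function replacement looks up each matched key in the dictionary
    # and avoids backslash handling in the replacement string.
    return pattern.sub(lambda match: mapping[match.group(0)], text)
```

Since `regex.sub` also accepts a function as the replacement, the compiled pattern and the same kind of lookup function could probably be passed to `docx_replace_regex` in place of the per-key loop.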

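Note that this solution replaces a regex match only if the whole match has the same style in the document.

Also, if text is edited after the document has been saved, text with the same style may end up in separate runs. For example, if you open a document that contains the string "testabcd", change it to "test1abcd" and save, then even though the style is the same there are three separate runs, "test", "1" and "abcd". In that case replacing test1 will not work.

This is for tracking changes in the document. To merge the text into one run, in Word go to "Options", "Trust Center", and under "Privacy Options" uncheck "Store random numbers to improve Combine accuracy", then save the document.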
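
The split-run case can also be handled in code rather than in Word. The sketch below is one possible approach, not something taken from python-docx: it searches the joined text of all runs and puts each replacement into the run where the match begins, so the replacement takes that run's formatting. The function name `replace_across_runs` is hypothetical, and the code assumes only that each run has a readable and writable `.text`; the `SimpleNamespace` objects at the end are stand-ins for real runs.

```python
import re
from types import SimpleNamespace

def replace_across_runs(runs, regex, replacement):
    """Replace regex matches in the joined text of `runs`, even when a
    match is split over several runs. `replacement` is a template string
    as accepted by re.sub. Returns the number of replacements made.
    """
    texts = [run.text for run in runs]
    full_text = "".join(texts)
    # owner[i] is the index of the run that holds character i of full_text.
    owner = [index for index, text in enumerate(texts) for _ in text]
    new_texts = [[] for _ in runs]
    position = 0
    count = 0
    for match in regex.finditer(full_text):
        if match.start() == match.end():
            continue  # skip empty matches; they belong to no run
        # Copy the unmatched characters before this match to their own runs.
        for i in range(position, match.start()):
            new_texts[owner[i]].append(full_text[i])
        # The whole replacement goes into the run where the match begins.
        new_texts[owner[match.start()]].append(match.expand(replacement))
        position = match.end()
        count += 1
    # Copy whatever follows the last match.
    for i in range(position, len(full_text)):
        new_texts[owner[i]].append(full_text[i])
    for run, original, parts in zip(runs, texts, new_texts):
        updated = "".join(parts)
        if updated != original:
            run.text = updated
    return count

# Stand-in runs reproducing the "test", "1", "abcd" split described above.
runs = [SimpleNamespace(text=t) for t in ("test", "1", "abcd")]
replace_across_runs(runs, re.compile("test1"), "demo")
print([run.text for run in runs])  # prints ['demo', '', 'abcd']
```

With python-docx one would presumably pass `paragraph.runs` for each paragraph. Runs that end up empty are left in place rather than removed.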

Originally posted by szum; translation licensed under CC BY-SA 3.0
