Tesseract-OCR on Windows 10 cannot open the trained data file

Code:

    # -*- coding: utf-8 -*-
    
    try:
        import Image
    except ImportError:
        from PIL import Image
    
    import pytesseract
    
    
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
    print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))

Error message:

    Traceback (most recent call last):
      File "D:/test.py", line 11, in <module>
        print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim'))
      File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string
        raise TesseractError(status, errors)
    pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')

The trained data file chi_sim.traineddata already exists in C:\Program Files (x86)\Tesseract-OCR\tessdata.

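2 answers

Download the Simplified Chinese trained data and move it into the tessdata directory (the path contains spaces, so it must be quoted):

    wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata
    mv chi_sim.traineddata "C:\Program Files (x86)\Tesseract-OCR\tessdata"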
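
For machines without wget, the same download step can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original answer: `download_traineddata` is a hypothetical helper name, the URL is the one quoted in the answer, and writing under Program Files usually requires an elevated prompt.

```python
import shutil
import urllib.request
from pathlib import Path

# URL copied from the answer above; the target directory is passed in by the caller.
CHI_SIM_URL = "https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata"


def download_traineddata(url, tessdata_dir):
    """Download a .traineddata file into tessdata_dir and return the saved path.

    Hypothetical helper: the file name is taken from the last URL segment.
    """
    target = Path(tessdata_dir) / url.rsplit("/", 1)[-1]
    # Stream the response to disk instead of loading it all into memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage (needs network access and write permission on the directory):
#     download_traineddata(CHI_SIM_URL, r"C:\Program Files (x86)\Tesseract-OCR\tessdata")
```

After the download, the file should sit next to the other `.traineddata` files that ship with the installation.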

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \Program Files (x86)\Tesseract-OCR\tessdata/chi_sim.traineddata')
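Solution:
Set the environment variable TESSDATA_PREFIX to
C:\Program Files (x86)\Tesseract-OCR\tessdata (for reference only; use your actual installation path).
Some Tesseract versions expect the parent directory of tessdata here instead, as the warning quoted further down states.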
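
If changing the system-wide environment is inconvenient, the variable can also be set from inside the script before pytesseract launches Tesseract, since the child process inherits the script's environment. The following is a minimal sketch under that assumption; `locate_traineddata` and `set_tessdata_prefix` are hypothetical helper names, and the check accepts both layouts because the expected value (tessdata itself or its parent) depends on the Tesseract version.

```python
import os
from pathlib import Path


def locate_traineddata(prefix, lang):
    """Return the path of <lang>.traineddata under prefix, or None if missing.

    Looks in prefix itself and in prefix/tessdata, since different Tesseract
    versions interpret TESSDATA_PREFIX differently.
    """
    name = lang + ".traineddata"
    for candidate in (Path(prefix) / name, Path(prefix) / "tessdata" / name):
        if candidate.is_file():
            return candidate
    return None


def set_tessdata_prefix(prefix):
    """Set TESSDATA_PREFIX for this process and every child process it starts."""
    os.environ["TESSDATA_PREFIX"] = str(prefix)


# Usage (requires pytesseract, Pillow, and a Tesseract installation):
#     prefix = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
#     if locate_traineddata(prefix, "chi_sim") is None:
#         raise FileNotFoundError("chi_sim.traineddata not found under " + prefix)
#     set_tessdata_prefix(prefix)
#     print(pytesseract.image_to_string(Image.open("d:/testimages/name.gif"), lang="chi_sim"))
```

Checking for the file first separates a wrong path from a genuinely missing language pack, which both produce the same "Error opening data file" message.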

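Alternatively, you can copy the script D:/test.py to the C: drive and run it there, but this is not recommended. It works because the data path in the error message has no drive letter, so Windows resolves it against the current drive.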
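
The drive-letter effect can be illustrated with `pathlib.PureWindowsPath`, which applies Windows path rules on any operating system. This is only a sketch of the path arithmetic, assuming the path printed in the error message is what Tesseract tried to open; `resolve_on_drive` is a hypothetical helper, not something Tesseract or pytesseract provides.

```python
from pathlib import PureWindowsPath

# Data path as it appears in the error message: rooted, but without a drive letter.
reported = PureWindowsPath(r"\Program Files (x86)\Tesseract-OCR\tessdata/chi_sim.traineddata")


def resolve_on_drive(current_drive, rooted_path):
    """Complete a drive-less rooted path with the given current drive, e.g. "D:"."""
    # Joining keeps the drive of the first segment and the rooted path of the second.
    return PureWindowsPath(current_drive + "\\", rooted_path)


print(reported.drive == "")               # the path carries no drive of its own
print(reported.is_absolute())             # so Windows does not treat it as absolute
print(resolve_on_drive("D:", reported))   # where the lookup lands when run from D:
print(resolve_on_drive("C:", reported))   # where it lands when run from C:
```

Run from D:, the lookup points at a Program Files folder on D: that does not contain the data, which matches the failure in the question.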

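In addition, if the environment variable is not set, running tesseract from a path outside the installation drive prints:
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
After setting the environment variable, the problem was resolved.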
