1. Installation
## Download the source code
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
## Possible error: Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
## Fix:
## yum install automake -y
## yum install libtool -y
./configure
## ./configure may fail with the following problems; fixes are linked below
## Problem 1: configure: error: Your compiler does not have the necessary C++17 support! Cannot proceed.
## Fix: https://segmentfault.com/a/1190000041832780
## Problem 2: configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
## Fix: https://segmentfault.com/a/1190000041833110
make && make install
ldconfig
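The errors noted in the build steps above mostly come down to missing build prerequisites, so it may help to check for them before running ./autogen.sh. The following is only a minimal sketch, not part of the official build: the function name `missing_tools` is hypothetical, and the tool list in the comment is an assumption derived from the errors above (libtoolize ships with libtool, aclocal with automake).

```shell
#!/bin/sh
# Print every command from the argument list that is not found in PATH.
# `missing_tools` is a hypothetical helper, not part of Tesseract.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed prerequisite list: git for the clone, libtoolize (from libtool)
# and aclocal (from automake) for autogen.sh, g++ and make for the build.
# Example: missing_tools git libtoolize aclocal g++ make
```

If the example call prints nothing, all of the listed tools are on the PATH. It does not check C++17 support or the Leptonica version; those are still reported by ./configure.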
Check the version
tesseract --version
tesseract 5.1.0-32-gf36c0
leptonica-1.82.0
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : zlib 1.2.7 : libwebp 0.3.0
Found SSE4.1
Found OpenMP 201511
Found libcurl/7.29.0 NSS/3.53.1 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
Test recognition on an image
tesseract tracking2.png result
On success it prints:
Estimating resolution as 146
The result is saved to result.txt in the current directory.
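For repeated use, the test command can be wrapped in a small function. This is a sketch under assumptions: `ocr_image` is a hypothetical name, and defaulting the language to eng is my choice, not something the command above requires. The `-l` option selects the language, and Tesseract appends .txt to the output base name itself.

```shell
#!/bin/sh
# Run Tesseract on one image and write <outputbase>.txt.
# `ocr_image` is a hypothetical wrapper.
# Usage: ocr_image IMAGE OUTPUTBASE [LANG]
ocr_image() {
    if [ "$#" -lt 2 ]; then
        echo "usage: ocr_image IMAGE OUTPUTBASE [LANG]" >&2
        return 2
    fi
    image=$1
    outbase=$2
    lang=${3:-eng}   # assumption: default to English
    if [ ! -f "$image" ]; then
        echo "ocr_image: no such file: $image" >&2
        return 1
    fi
    # Same call as the test above, with the language made explicit.
    tesseract "$image" "$outbase" -l "$lang"
}
```

With this loaded, `ocr_image tracking2.png result` does the same as the command above and fails early when the image path is wrong.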
If the command fails, see the fixes below:
Error loading the language data
...
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
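Download the language data
Official download: https://github.com/tesseract-ocr/tessdata
Upload the files to the /usr/local/share/tessdata/ directory on the Linux machine.
If you are developing in Java, the tess4j-5.2.1.jar package also bundles tessdata language files; you can extract them from the jar and upload them to that directory, but it only includes two languages, eng and osd.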
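Since the error message names the exact file Tesseract looked for, a quick check is whether that file exists after uploading. A minimal sketch: `has_traineddata` is a hypothetical helper, and the directory in the example is the one from the error message above. The download URL in the comment assumes the usual raw-file layout of the tessdata repository and should be verified before use.

```shell
#!/bin/sh
# Succeed if <dir>/<lang>.traineddata exists and is non-empty.
# `has_traineddata` is a hypothetical helper, not part of Tesseract.
has_traineddata() {
    dir=$1
    lang=$2
    [ -s "$dir/$lang.traineddata" ]
}

# Example (directory taken from the error message above):
#   has_traineddata /usr/local/share/tessdata eng || echo "eng is missing"
#
# Assumed download URL, fetching straight onto the server:
#   curl -L -o /usr/local/share/tessdata/eng.traineddata \
#       https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
#
# If the data lives elsewhere, point Tesseract at that directory:
#   export TESSDATA_PREFIX=/path/to/tessdata
#   tesseract --list-langs
```

`tesseract --list-langs` prints the languages Tesseract can actually load, which confirms the directory is being picked up.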
Error about missing modules
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
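These errors occur because the image libraries that Leptonica depends on (libjpeg, libpng, freetype, gd, giflib, libtiff, zlib) are not installed. Not all of them are needed; installing just libjpeg, libpng, and libtiff is enough.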
yum -y install 'libjpeg*' 'libpng*' 'libtiff*'
# Rebuild leptonica
cd leptonica-1.82.0
make clean
./configure && make && make install
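Leptonica reports the image libraries it was built with, and `tesseract --version` echoes that line. In the version output shown earlier it lists libjpeg, libpng, zlib, and libwebp but no libtiff, which is consistent with the pixReadMemTiff error. One way to confirm the rebuild picked up TIFF support is to look for libtiff in that output. A small sketch; `has_image_lib` is a hypothetical helper.

```shell
#!/bin/sh
# Read version text on stdin and succeed if it mentions the given library.
# `has_image_lib` is a hypothetical helper, not part of Tesseract or Leptonica.
has_image_lib() {
    grep -qF -- "$1"
}

# Example:
#   tesseract --version 2>&1 | has_image_lib libtiff && echo "TIFF support present"
```

If libtiff still does not appear, Leptonica's ./configure most likely did not find the TIFF development headers, and its output is the place to look.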
Official documentation: https://tesseract-ocr.github.io/tessdoc/Compiling.html
Other tutorials:
How to support OCR with tess4j in a Linux environment
Installing the latest Tesseract-OCR (5.0) on Linux (CentOS 7)