
1. Installation

## Download the source code
git clone https://github.com/tesseract-ocr/tesseract.git

cd tesseract

./autogen.sh
## Possible error: Unable to find a valid copy of libtoolize or glibtoolize in your PATH!
## Fix:
## yum install automake -y
## yum install libtool -y
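
Before running ./autogen.sh it can help to check up front which build tools are missing. The following is a minimal sketch; the helper name `missing_tools` and the list of tools it checks are assumptions for illustration, not part of the Tesseract build scripts.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tools checked here are an assumption based on the error message above.
missing=$(missing_tools automake libtoolize)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
    echo "On CentOS, try: yum install -y automake libtool"
fi
```

If nothing is printed, both tools were found and ./autogen.sh should get past this particular check.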

./configure
## ./configure may fail with the following errors; a fix is linked for each
## Error 1: configure: error: Your compiler does not have the necessary C++17 support! Cannot proceed.
## Fix: https://segmentfault.com/a/1190000041832780

## Error 2: configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
## Fix: https://segmentfault.com/a/1190000041833110
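
Independently of the linked articles, the Leptonica requirement can be checked by hand before rerunning ./configure. This is a rough sketch: `version_ge` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and it assumes Leptonica registered itself with pkg-config under the module name `lept` and was installed under the default /usr/local prefix.

```shell
# True (exit 0) when version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# A Leptonica built from source usually installs lept.pc under /usr/local
# (assumed prefix), which pkg-config may not search by default.
export PKG_CONFIG_PATH="/usr/local/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"

if command -v pkg-config >/dev/null 2>&1; then
    lept_version=$(pkg-config --modversion lept 2>/dev/null || true)
    if [ -z "$lept_version" ]; then
        echo "pkg-config cannot find Leptonica (lept.pc)"
    elif version_ge "$lept_version" 1.74; then
        echo "Leptonica $lept_version is new enough"
    else
        echo "Leptonica $lept_version is older than 1.74"
    fi
fi
```

If a sufficiently new Leptonica is installed but ./configure still reports Error 2, the PKG_CONFIG_PATH export above is the first thing worth trying in the same shell before running ./configure again.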

make && make install
ldconfig

Check the version

tesseract --version

tesseract 5.1.0-32-gf36c0
 leptonica-1.82.0
  libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : zlib 1.2.7 : libwebp 0.3.0
 Found SSE4.1
 Found OpenMP 201511
 Found libcurl/7.29.0 NSS/3.53.1 zlib/1.2.7 libidn/1.28 libssh2/1.8.0
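
For scripts that need only the version string from this output, a small filter is enough. The sketch below assumes the first line has the form `tesseract <version>` as shown above; `parse_tesseract_version` is a hypothetical helper name.

```shell
# Read `tesseract --version` output on stdin and print just the version string.
parse_tesseract_version() {
    awk 'NR == 1 && $1 == "tesseract" { print $2 }'
}

if command -v tesseract >/dev/null 2>&1; then
    # 2>&1 is a precaution in case a release prints version info to stderr.
    tesseract --version 2>&1 | parse_tesseract_version
fi
```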

Test image recognition

tesseract tracking2.png result
On success it prints:

Estimating resolution as 146

The recognized text is saved to result.txt in the current directory.
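
The same command can be wrapped in a small function that checks the input first and reports where the text was written. This is only a sketch: `ocr_image` is a hypothetical helper, and defaulting the language to `eng` is an assumption.

```shell
# Run OCR on one image and print the path of the text file Tesseract writes.
# Usage: ocr_image IMAGE OUTPUT_BASE [LANG]
ocr_image() {
    image=$1
    base=$2
    lang=${3:-eng}
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 1
    fi
    # Tesseract appends .txt to the output base name.
    tesseract "$image" "$base" -l "$lang" && echo "$base.txt"
}

# Example (not run automatically):
#   ocr_image tracking2.png result
#   cat result.txt
```

Passing `stdout` as the output base (`tesseract tracking2.png stdout`) prints the recognized text to the terminal instead of writing a file.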
If the command fails, see the fixes below:

Error: the language data cannot be loaded

...
Error opening data file /usr/local/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Download the language data
Official download location: https://github.com/tesseract-ocr/tessdata
Upload the files to the /usr/local/share/tessdata/ directory on the Linux machine.

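If you are developing in Java, the tess4j-5.2.1.jar package also contains tessdata language files. You can extract them from the jar and upload them to that directory, but it only includes two languages: eng and osd.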
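
Instead of uploading the files by hand, they can be fetched directly on the server. The sketch below assumes the tessdata repository's default branch is `main` and that `curl` is installed; `traineddata_url` and `install_language` are hypothetical helper names.

```shell
# Build the download URL for one language file in the tessdata repository.
# Assumes the repository's default branch is named "main".
traineddata_url() {
    echo "https://github.com/tesseract-ocr/tessdata/raw/main/$1.traineddata"
}

# Download one language into the tessdata directory (needs write permission there).
# Usage: install_language LANG [TESSDATA_DIR]
install_language() {
    lang=$1
    dir=${2:-/usr/local/share/tessdata}
    mkdir -p "$dir" && curl -fL -o "$dir/$lang.traineddata" "$(traineddata_url "$lang")"
}

# Example (not run automatically):
#   install_language eng
#   install_language chi_sim
#   tesseract --list-langs
```

If the files end up somewhere other than /usr/local/share/tessdata, set TESSDATA_PREFIX to that tessdata directory, as the error message suggests.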

Error: missing modules

Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made

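These errors appear because Leptonica was built without the image libraries it needs. The full set of optional dependencies is libjpeg, libpng, freetype, gd, giflib, libtiff and zlib, but not all of them are used; installing libjpeg, libpng and libtiff is enough.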

# Quote the patterns so the shell passes them to yum unexpanded
yum -y install 'libjpeg*' 'libpng*' 'libtiff*'
# Rebuild leptonica

cd leptonica-1.82.0
make clean
./configure && make && make install
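
After the rebuild, the library line of `tesseract --version` gives a quick way to confirm that the image libraries were picked up. In the version output shown earlier, libjpeg and libpng are listed but libtiff is not, which matches the pixReadMemTiff error. The sketch below assumes that each compiled-in library appears by name in that output; `has_image_lib` is a hypothetical helper.

```shell
# Succeeds when the `tesseract --version` text on stdin mentions the given library.
has_image_lib() {
    grep -q -- "$1"
}

if command -v tesseract >/dev/null 2>&1; then
    version_info=$(tesseract --version 2>&1 || true)
    for lib in libjpeg libpng libtiff; do
        if printf '%s\n' "$version_info" | has_image_lib "$lib"; then
            echo "$lib: found"
        else
            echo "$lib: MISSING"
        fi
    done
fi
```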

Official documentation: https://tesseract-ocr.github.io/tessdoc/Compiling.html
Other tutorials:
How to use tess4j for OCR on Linux
Installing the latest Tesseract-OCR (5.0) on Linux (CentOS 7)


YYGP