Today I'll share a practical skill: using Python to read ID card information in batches and write it to Excel.
Read
We take an ID card stored as an image as the example and use Baidu's text recognition (OCR) service to read the information. The Baidu interface provides a free quota that is generally enough for everyday use. Let's look at how to use it.
SDK installation
The Baidu Cloud SDK supports Python, Java and other languages. The Python version of the SDK supports Python 2.7+ and 3.x and is easy to install with pip:
pip install baidu-aip
Create application
To create an application, you need a Baidu or Baidu Cloud account. The registration and login address is:
https://login.bce.baidu.com/?redirect=http%3A%2F%2Fcloud.baidu.com%2Fcampaign%2Fcampus-2018%2Findex.html
After logging in, hover over your avatar and click User Center in the pop-up menu, as shown in the figure:
The first time you enter, you need to select the corresponding information, as shown in the figure:
After selecting, click Save.
Then hover over the > symbol on the left, select Artificial Intelligence, and click Text Recognition, as shown in the figure:
After clicking, you will see the following page:
Now click Create Application, which opens the page shown in the following figure:
From the figure above, we can see that Baidu text recognition OCR can recognize many types of documents, not just ID cards. If you have other recognition needs, you can implement them quickly with the same service.
Here we fill in the application name and application description. After filling them in, click Create Immediately.
After the creation is complete, return to the application list, as shown in the following figure:
We will need the AppID, API Key and Secret Key, so record them.
Code
The implementation is very simple and takes only a few lines of Python, as shown below:
from aip import AipOcr

APP_ID = 'your APP_ID'
API_KEY = 'your API_KEY'
SECRET_KEY = 'your SECRET_KEY'

# Create the client object
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

# Open the image file and read its contents
with open("idcard.jpg", "rb") as fp:
    image = fp.read()

# res = client.basicGeneral(image)  # normal mode
res = client.basicAccurate(image)  # high-precision mode
As the code above shows, the recognition function comes in normal and high-precision modes. To get a higher recognition rate, we use high-precision mode here.
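To make the choice between the two modes explicit, here is a minimal sketch of a small wrapper. The function name recognize and the FakeClient class are hypothetical and exist only so the example runs without network access; with the real SDK you would pass the AipOcr client instead, and only the basicGeneral and basicAccurate calls come from the code above.

```python
def recognize(client, image_bytes, accurate=True):
    """Run OCR in high-precision mode by default, or in normal mode."""
    if accurate:
        return client.basicAccurate(image_bytes)
    return client.basicGeneral(image_bytes)


class FakeClient:
    """Hypothetical stand-in for AipOcr so the sketch runs offline."""

    def basicGeneral(self, image):
        return {"mode": "normal", "size": len(image)}

    def basicAccurate(self, image):
        return {"mode": "accurate", "size": len(image)}


demo = recognize(FakeClient(), b"fake image bytes")
fallback = recognize(FakeClient(), b"fake image bytes", accurate=False)
```

Keeping the mode behind one flag makes it easy to switch to normal mode later, for example if the high-precision quota runs out.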
Take the following three fake ID cards that I found online as an example:
Because there are multiple ID card images, we need a function that traverses the directory. The implementation is as follows:
import os

def findAllFile(base):
    # Walk the directory tree and yield the full path of every file
    for root, ds, fs in os.walk(base):
        for f in fs:
            yield os.path.join(root, f)
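A directory of ID card pictures often contains other files as well, so it can help to keep only images during the traversal. The following is a minimal sketch of that idea. The name find_image_files and the extension list are assumptions for illustration, not part of the original code, and the temporary directory exists only to make the example self-contained.

```python
import os
import tempfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set of image formats


def find_image_files(base, exts=IMAGE_EXTS):
    """Yield the full path of every image file under base, recursively."""
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            if name.lower().endswith(exts):
                yield os.path.join(root, name)


# Small offline demonstration with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.jpg", "notes.txt", os.path.join("sub", "b.PNG")):
        with open(os.path.join(tmp, rel), "wb") as fh:
            fh.write(b"")
    found = sorted(os.path.relpath(p, tmp) for p in find_image_files(tmp))
```

Joining root with the file name, rather than the base directory, keeps the paths correct for files in subdirectories.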
The raw ID card information returned by the recognition function has the following format:
{'words_result': [{'words': '姓名韦小宝'}, {'words': '性别男民族汉'}, {'words': '出生1654年12月20日'}, {'words': '住址北京市东城区景山前街4号'}, {'words': '紫禁城敬事房'}, {'words': '公民身份证号码11204416541220243X'}], 'log_id': 1411522933129289151, 'words_result_num': 6}
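Before parsing a result like the one above, it is worth checking that the call actually succeeded. The sketch below assumes that a failed request comes back as a dictionary containing error_code and error_msg keys instead of words_result; treat that response shape as an assumption and check it against the Baidu documentation. The helper name extract_lines is hypothetical.

```python
def extract_lines(res):
    """Return the recognized text lines, or raise if the call failed."""
    if "error_code" in res:  # assumed shape of a failed response
        raise RuntimeError(
            "OCR failed: {} {}".format(res["error_code"], res.get("error_msg", ""))
        )
    return [item["words"] for item in res.get("words_result", [])]


sample = {
    "words_result": [{"words": "姓名韦小宝"}, {"words": "性别男民族汉"}],
    "words_result_num": 2,
    "log_id": 1411522933129289151,
}
lines = extract_lines(sample)
```

In a batch run, catching this error per image lets the remaining cards be processed even when one picture fails.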
Write
We use Pandas to write the ID card information. Before it can be written to Excel, the raw recognition result has to be preprocessed: each field of the card, from the name to the address, is stored in its own list. The processing code is as follows:
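# One list per Excel column; create these once, before looping over the images
names, genders, nations, births, address, ids = [], [], [], [], [], []

addr = ""  # the address may span several recognized lines
for tex in res["words_result"]:
    row = tex["words"]
    if "姓名" in row:  # name
        names.append(row[2:])
    elif "性别" in row:  # gender and ethnicity share one line
        genders.append(row[2:3])
        nations.append(row[5:])
    elif "出生" in row:  # date of birth
        births.append(row[2:])
    elif "住址" in row:  # first line of the address
        addr += row[2:]
    elif "公民身份证号码" in row:  # ID number
        ids.append(row[7:])
    else:  # any other line is a continuation of the address
        addr += row
# Store the complete address for this card
address.append(addr)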
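The same preprocessing can also be written as a function that returns one dictionary per card, which is easier to test on its own. This is a minimal sketch rather than the author's implementation: the names parse_idcard and FIELD_PREFIXES and the English keys are hypothetical, and it assumes each label appears at the start of its line, as in the sample result shown earlier.

```python
FIELD_PREFIXES = {
    "姓名": "name",
    "出生": "birth",
    "公民身份证号码": "id",
}


def parse_idcard(res):
    """Turn one recognition result into a flat dict of ID card fields."""
    card = {"name": "", "gender": "", "nation": "", "birth": "", "address": "", "id": ""}
    for item in res["words_result"]:
        row = item["words"]
        if row.startswith("性别"):
            # Gender and ethnicity share one line, e.g. "性别男民族汉".
            gender, _, nation = row[2:].partition("民族")
            card["gender"], card["nation"] = gender, nation
        elif row.startswith("住址"):
            card["address"] = row[2:]
        else:
            for prefix, key in FIELD_PREFIXES.items():
                if row.startswith(prefix):
                    card[key] = row[len(prefix):]
                    break
            else:
                # No known label: treat the line as an address continuation.
                card["address"] += row
    return card


sample = {"words_result": [
    {"words": "姓名韦小宝"}, {"words": "性别男民族汉"},
    {"words": "出生1654年12月20日"}, {"words": "住址北京市东城区景山前街4号"},
    {"words": "紫禁城敬事房"}, {"words": "公民身份证号码11204416541220243X"},
]}
card = parse_idcard(sample)
```

Splitting the gender line on the ethnicity label instead of using fixed positions also handles ethnicity names longer than one character.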
Once the fields are collected, the information can be written directly to Excel. The writing code is as follows:
import pandas as pd

# Each list becomes one column; all lists must have the same length
df = pd.DataFrame({"姓名": names, "性别": genders, "民族": nations,
                   "出生": births, "住址": address, "身份证号码": ids})
df.to_excel('idcards.xlsx', index=False)
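Building the DataFrame from separate lists requires every list to have the same length, which breaks if a field is missed on one card. As an alternative, here is a minimal sketch that builds the table from one dictionary per card. The name cards_to_frame and the COLUMNS constant are hypothetical; the column headers are the ones used above, and the to_excel call is left as a comment because it needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

COLUMNS = ["姓名", "性别", "民族", "出生", "住址", "身份证号码"]


def cards_to_frame(records):
    """Build a DataFrame with one row per card and a fixed column order."""
    frame = pd.DataFrame(records, columns=COLUMNS)
    # Keep ID numbers as text so a trailing "X" or a leading zero is preserved.
    frame["身份证号码"] = frame["身份证号码"].astype(str)
    return frame


records = [
    {"姓名": "韦小宝", "性别": "男", "民族": "汉", "出生": "1654年12月20日",
     "住址": "北京市东城区景山前街4号紫禁城敬事房", "身份证号码": "11204416541220243X"},
]
frame = cards_to_frame(records)
# frame.to_excel("idcards.xlsx", index=False)  # requires an Excel engine
```

With this layout, a card that lacks a field simply gets an empty cell in that column instead of shifting the rows of later cards.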
Take a look at the result:
At this point, we have implemented batch reading and writing of ID card information.