
Today I'll share a practical technique: using Python to read ID card information in batches and write it to Excel.

Read

We take ID cards stored as images as an example and read the information with Baidu's text recognition (OCR) service. The Baidu API includes a free quota that is generally enough for everyday use. Let's look at how to use it.

SDK installation

The Baidu Cloud SDK supports Python, Java, and other languages. The Python SDK is easy to install: just run pip install baidu-aip . It supports Python 2.7+ and 3.x.

Create application

To create an application, you need a Baidu or Baidu Cloud account. The registration and login page is: https://login.bce.baidu.com/?redirect=http%3A%2F%2Fcloud.baidu.com%2Fcampaign%2Fcampus-2018%2Findex.html . After logging in, hover over your avatar and click User Center in the pop-up menu, as shown in the figure:

On your first visit, you need to select the requested information, as shown in the figure:

After selecting, click Save.

Then hover over the > symbol on the left, select Artificial Intelligence, and click Text Recognition, as shown in the figure:

After clicking, you will see the following page:

Now click Create Application, which brings up the following page:

As the figure above shows, Baidu OCR can recognize many kinds of documents, not just ID cards. If you have other recognition needs, you can implement them quickly with the same service.

Here we fill in the application name and application description. After filling in, click Create immediately.

After the creation is complete, return to the application list, as shown in the following figure:

We will need the AppID, API Key, and Secret Key, so make a note of them.
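
Hard-coding these three values in a script makes them easy to leak. One option, sketched below, is to read them from environment variables. The variable names BAIDU_APP_ID, BAIDU_API_KEY, and BAIDU_SECRET_KEY and the helper load_credentials are illustrative assumptions, not something Baidu prescribes.

```python
import os


def load_credentials(env=None):
    """Return (app_id, api_key, secret_key) read from environment variables.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    env = os.environ if env is None else env
    keys = ("BAIDU_APP_ID", "BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    # Collect every variable that is unset or empty so the error names them all
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return tuple(env[k] for k in keys)
```

The returned tuple can be unpacked straight into the client constructor, for example AipOcr(*load_credentials()).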

Code

The implementation is simple and takes only a few lines of Python, as shown below:

from aip import AipOcr

APP_ID = 'your APP_ID'
API_KEY = 'your API_KEY'
SECRET_KEY = 'your SECRET_KEY'
# Create the client object
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)
# Open the image and read its contents as bytes
with open("idcard.jpg", "rb") as f:
    image = f.read()
# res = client.basicGeneral(image)  # normal mode
res = client.basicAccurate(image)  # high-precision mode

As the code above shows, recognition comes in a normal mode and a high-precision mode. For a higher recognition rate, we use the high-precision mode here.
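
When the free quota is used up or the credentials are wrong, the service replies with an error instead of recognized text. The sketch below assumes, as is usual for Baidu's AI APIs, that such a reply is a dict carrying error_code and error_msg. The helper name extract_lines is made up for illustration.

```python
def extract_lines(res):
    """Return the recognized text lines, or raise if the reply is an error.

    Assumption: an error reply contains "error_code" and "error_msg".
    """
    if "error_code" in res:
        raise RuntimeError(
            "OCR failed ({}): {}".format(res["error_code"], res.get("error_msg", ""))
        )
    # A successful reply lists one {"words": ...} entry per recognized line
    return [item["words"] for item in res.get("words_result", [])]
```

Checking the reply this way stops a failed request from silently producing an empty row later on.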

Take the following three fake ID cards that I found online as an example:

Because there are several ID card images, we need a function that walks the directory and yields each file. The code is as follows:

import os

def findAllFile(base):
    # Walk the directory tree and yield the full path of every file
    for root, ds, fs in os.walk(base):
        for f in fs:
            yield os.path.join(root, f)
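
A folder of scans often contains other files too, such as notes or the output spreadsheet. As a variation on the generator above, this sketch yields only image files. The name find_images and the extension list are assumptions for illustration.

```python
import os

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set of formats


def find_images(base):
    """Yield the full path of every image file under base.

    Files within each folder are visited in sorted order.
    """
    for root, _dirs, files in os.walk(base):
        for name in sorted(files):
            # str.endswith accepts a tuple, so one call covers all extensions
            if name.lower().endswith(IMAGE_EXTENSIONS):
                yield os.path.join(root, name)
```

Each yielded path can be opened in binary mode and passed to the recognition call in the same way as idcard.jpg.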

The raw ID card information returned by the recognition call has the following format:

{'words_result': [{'words': '姓名韦小宝'}, {'words': '性别男民族汉'}, {'words': '出生1654年12月20日'}, {'words': '住址北京市东城区景山前街4号'}, {'words': '紫禁城敬事房'}, {'words': '公民身份证号码11204416541220243X'}], 'log_id': 1411522933129289151, 'words_result_num': 6}

Write

The ID card information is written to Excel with Pandas. Before writing, we need to preprocess the raw recognition result: each field of the card, from name to address, is collected into its own list. The processing code is as follows:

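# One list per column; create these once, before looping over the images
names, genders, nations, births, address, ids = [], [], [], [], [], []

# Run the following for each recognition result res
addr = ""  # the address can span several OCR lines
for tex in res["words_result"]:
    row = tex["words"]
    if "姓名" in row:  # name
        names.append(row[2:])
    elif "性别" in row:  # gender and ethnicity share one line
        genders.append(row[2:3])
        nations.append(row[5:])
    elif "出生" in row:  # date of birth
        births.append(row[2:])
    elif "住址" in row:  # first line of the address
        addr += row[2:]
    elif "公民身份证号码" in row:  # ID number
        ids.append(row[7:])
    else:  # any other line is treated as a continuation of the address
        addr += row
address.append(addr)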
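
The same slicing logic can also be wrapped in a function that returns one record per card, which keeps the fields of each card together. This is a sketch, not the author's code: parse_id_card and the dictionary keys are hypothetical names, and the sample input is the recognition result shown earlier.

```python
def parse_id_card(res):
    """Turn one recognition result into a dict of ID card fields."""
    record = {"name": "", "gender": "", "nation": "",
              "birth": "", "address": "", "id": ""}
    for item in res["words_result"]:
        row = item["words"]
        if "姓名" in row:
            record["name"] = row[2:]
        elif "性别" in row:
            # Gender and ethnicity share one line, e.g. "性别男民族汉"
            record["gender"] = row[2:3]
            record["nation"] = row[5:]
        elif "出生" in row:
            record["birth"] = row[2:]
        elif "住址" in row:
            record["address"] += row[2:]
        elif "公民身份证号码" in row:
            record["id"] = row[7:]
        else:
            # Unlabelled lines are treated as a continuation of the address
            record["address"] += row
    return record


sample = {'words_result': [{'words': '姓名韦小宝'}, {'words': '性别男民族汉'},
                           {'words': '出生1654年12月20日'},
                           {'words': '住址北京市东城区景山前街4号'},
                           {'words': '紫禁城敬事房'},
                           {'words': '公民身份证号码11204416541220243X'}],
          'log_id': 1411522933129289151, 'words_result_num': 6}
card = parse_id_card(sample)
print(card["name"], card["address"])
```

Because every card produces a complete record, a field that OCR misses shows up as an empty string instead of shifting the columns out of alignment.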

After that, the information can be written to Excel in just a few lines:

import pandas as pd

df = pd.DataFrame({"姓名": names, "性别": genders, "民族": nations,
                   "出生": births, "住址": address, "身份证号码": ids})
df.to_excel('idcards.xlsx', index=False)
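
pd.DataFrame raises a ValueError when the lists passed to it have different lengths, which can happen here if OCR misses a line on one of the cards. A quick check before building the DataFrame makes the cause easier to spot. The helper check_columns below is a hypothetical name and uses only plain Python.

```python
def check_columns(columns):
    """Return the common length of all lists in columns, or raise ValueError."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Report every column's length so the short one is easy to find
        raise ValueError("column lengths differ: {}".format(lengths))
    return next(iter(lengths.values()), 0)
```

It could be called with the same dictionary that is passed to pd.DataFrame, so a mismatch is reported with the column names involved.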

Take a look at the result:

At this point, we have implemented batch reading and writing of ID card information.

The source code can be obtained by replying "ID" in the backend of the Python小二 account.


Python小二