Background

As work and production resume in an orderly manner, employees in the Beijing and Shanghai offices must submit the health code for their location (Beijing Health Kit, 健康宝, or Shanghai Suishen code, 随申码) together with their itinerary code every day so that routine nucleic acid screening can be completed. After receiving the uploaded pictures, the relevant departments must check each employee's nucleic acid test status and confirm whether the employee has been to a high-risk area. This review takes a great deal of time and manpower. The company therefore needs an artificial intelligence solution that can cope with the heavy workload of health data review while protecting the privacy of employees' personal health data.

The company's current difficulties fall into the following areas:

  • Rapidly growing number of pictures: the number of pictures to review, such as health codes and itinerary codes, grows substantially every day.
  • Traditional text recognition cannot meet the need: the source of a picture is not fixed. It may be a phone screenshot (with varying resolution) or a photo taken of a phone screen (at an arbitrary angle), so traditional text recognition cannot reliably read the text from fixed positions.
  • Weak machine learning research and development capability: developing a new recognition function for the three codes in house would take too long to meet current needs.
  • Personal health data security: health information is employees' private personal data, and its security should be ensured to the greatest extent possible.
  • High server operation and maintenance costs with a limited budget: images may be uploaded anytime and anywhere, which requires stable computing resources. A traditional architecture would increase operation and maintenance costs.

Sample image to be processed

image.png

Requirements analysis for text recognition of employee health information

The text to be extracted from the three codes (Health Kit, Suishen code, and itinerary code) is shown below:

image.png

As the figure shows, the text to be extracted sits in a relatively fixed area of each picture, so template-based text recognition can extract it through a predefined template.

The text recognition functions in AI Solution Kit from Amazon Web Services include general text recognition OCR (supporting Simplified and Traditional Chinese), custom template text recognition OCR, and license plate recognition. The custom template text recognition function (Custom OCR) uses predefined template information to recognize the text in fixed-format bills or forms automatically and return the results, which meets the needs of health information extraction.

Therefore, it is only necessary to create a template for each of the three types of screenshots, and the health information in the screenshots can then be extracted. To use it, deploy the solution on AWS and send image requests to the URL of the template text recognition function to obtain the required health information. No additional machine learning knowledge is needed and the development effort is minimal, which fits the company's current needs well.

Identify health information with a template OCR solution

The work can be divided into four steps: deploying the solution, creating custom templates, developing the calling logic, and completing text recognition with structured output.

image.png

The first step is to deploy AI Solution Kit. Following the deployment link and implementation guide ( https://awslabs.github.io/aws-ai-solution-kit/zh/ ), deployment can be completed within about 10 minutes. Because the solution is designed around a serverless architecture, you pay only for the calls you make, so there is no need to worry about additional expenses (cost estimate: https://awslabs.github.io/aws-ai-solution-kit/zh/deploy-custom-ocr/#_7 ). After the AWS CloudFormation stack is created successfully, the Outputs page of AWS CloudFormation shows the calling URL based on Amazon API Gateway; the corresponding key is CustomOCR.

image.png
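If you would rather look up the URL programmatically than in the console, the sketch below pulls the CustomOCR output out of a CloudFormation describe_stacks response. The stack name ai-solution-kit is a hypothetical placeholder (use the name you chose at deployment), and the sample response is hand-written for illustration rather than real output.

```python
def find_stack_output(describe_stacks_response, output_key):
    """Return the OutputValue for output_key, or None if it is absent."""
    for stack in describe_stacks_response.get("Stacks", []):
        for output in stack.get("Outputs", []):
            if output.get("OutputKey") == output_key:
                return output.get("OutputValue")
    return None


# Hand-written sample shaped like a describe_stacks response (not real data).
sample_response = {
    "Stacks": [{
        "StackName": "ai-solution-kit",  # hypothetical stack name
        "Outputs": [{
            "OutputKey": "CustomOCR",
            "OutputValue": "https://[API-ID].execute-api.[REGION-ID].amazonaws.com.cn/prod/custom-ocr/",
        }],
    }]
}

custom_ocr_url = find_stack_output(sample_response, "CustomOCR")
print(custom_ocr_url)

# With boto3 installed and credentials configured, the real response
# would come from something like:
#   import boto3
#   response = boto3.client("cloudformation").describe_stacks(
#       StackName="ai-solution-kit")
```

Keeping the lookup separate from the AWS call makes the helper easy to check without credentials.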

Next, we can test the generated API URL. First, we need to create a new itinerary code recognition template. Here we use the open source image editor GIMP ( https://www.gimp.org/downloads/ ) to obtain the coordinate points. As shown in the figure below, open the phone screenshot of the itinerary code in GIMP on a computer and move the mouse over the picture to see the X and Y values of the point under the cursor. Each recognition area is a box defined by four coordinate points recorded in clockwise order: upper left, upper right, lower right, lower left.

First move the mouse to the four corners of the date in the itinerary card (the rectangular box in the figure below), record the coordinates of the four corners (the values are displayed in the lower left corner of the GIMP window), and name the recognition area "更新时间" (update time).

image.png

In the same way, mark the recognition areas for "手机号码" (mobile phone number) and "途经地区" (areas passed through). The complete JSON data after marking is as follows:

[
    [[[116, 335], [410, 335], [410, 374], [116, 374]], "手机号码"],
    [[[176, 387], [452, 384], [452, 429], [176, 429]], "更新时间"],
    [[[53, 710], [465, 710], [465, 837], [54, 837]], "途经地区"]
]
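When an area is a plain upright rectangle, the four corners can be generated from its left, top, right, and bottom edges instead of being typed by hand. The helper below is a minimal sketch of that idea; rect_to_entry is a hypothetical name introduced here, not part of AI Solution Kit. Hand-marked boxes like the ones above may be slightly skewed, which this helper does not reproduce.

```python
import json


def rect_to_entry(left, top, right, bottom, name):
    """Build one template entry from an axis-aligned box.

    Corners are emitted clockwise: upper left, upper right,
    lower right, lower left.
    """
    if right <= left or bottom <= top:
        raise ValueError("expected left < right and top < bottom")
    corners = [[left, top], [right, top], [right, bottom], [left, bottom]]
    return [corners, name]


template = [
    rect_to_entry(116, 335, 410, 374, "手机号码"),
    rect_to_entry(53, 710, 465, 837, "途经地区"),
]
# ensure_ascii=False keeps the Chinese area names readable in the output.
print(json.dumps(template, ensure_ascii=False))
```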

Next, we need to encode the image in Base64, the format that the template OCR accepts. We can upload the image to Base64Guru ( https://base64.guru/converter/encode/image ) to convert it online. The encoded image is then combined with the recognition area JSON marked above into the complete JSON data, as follows:

{
    "type": "add",
    "img": "行程码图片的Base64编码",
    "template": [
        [[[116, 335], [410, 335], [410, 374], [116, 374]], "手机号码"],
        [[[176, 387], [452, 384], [452, 429], [176, 429]], "更新时间"],
        [[[53, 710], [465, 710], [465, 837], [54, 837]], "途经地区"]
    ]
}
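As an alternative to an online converter, the same JSON can be assembled locally so that the picture never leaves your machine before it is sent to your own API. The sketch below assumes only the request fields shown above ("type", "img", "template"); build_add_payload is a hypothetical helper name, and the image bytes are a stand-in so the example runs without a file.

```python
import base64
import json


def build_add_payload(image_bytes, template):
    """Return the JSON body for a template creation ("add") request."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"type": "add", "img": encoded, "template": template})


# Stand-in bytes so the sketch runs without an image file; in practice
# read the screenshot with open(path, "rb").read().
fake_image = b"\x89PNG placeholder bytes"
template = [
    [[[116, 335], [410, 335], [410, 374], [116, 374]], "手机号码"],
]
payload = build_add_payload(fake_image, template)
print(len(payload), "characters of JSON")
```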

Then we send the template creation request to the template OCR URL with the following Python code, which creates the itinerary code template.

import base64
import json

import requests

# Read the reference screenshot and encode it as Base64 text.
with open('jkb-template.png', 'rb') as img_file:
    base64_data = base64.b64encode(img_file.read())

# "type": "add" asks the service to create a new template.
payload = json.dumps({
    "type": "add",
    "img": str(base64_data, encoding="utf-8"),
    "template": [
        [[[116, 335], [410, 335], [410, 374], [116, 374]], "手机号码"],
        [[[176, 387], [452, 384], [452, 429], [176, 429]], "更新时间"],
        [[[53, 710], [465, 710], [465, 837], [54, 837]], "途经地区"]
    ]
})

# Replace [API-ID] and [REGION-ID] with the values from the CustomOCR output.
url = "https://[API-ID].execute-api.[REGION-ID].amazonaws.com.cn/prod/custom-ocr/"
response = requests.request("POST", url, data=payload)
print(json.loads(response.text))

Output result:

 {'template_id': '3e2183c63b139f6870c7d0ac53ffdc138bd21c95'}

The output shows that the template has been created and that its template ID (template_id) is '3e2183c63b139f6870c7d0ac53ffdc138bd21c95'. Record the template ID, because it is needed for text recognition later.
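One simple way to keep track of template IDs is a small JSON file that maps a label of your choice to each ID. This is only a sketch: the file name template_ids.json and the label "itinerary" are hypothetical, and the demonstration writes to a temporary directory.

```python
import json
import os
import tempfile


def load_template_ids(path):
    """Return the stored mapping, or an empty dict if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_template_id(path, code_name, template_id):
    """Add or update one code_name -> template_id entry in a JSON file."""
    registry = load_template_ids(path)
    registry[code_name] = template_id
    with open(path, "w", encoding="utf-8") as f:
        json.dump(registry, f, ensure_ascii=False, indent=2)


# Demonstration in a temporary directory; the file name is hypothetical.
registry_path = os.path.join(tempfile.mkdtemp(), "template_ids.json")
save_template_id(registry_path, "itinerary",
                 "3e2183c63b139f6870c7d0ac53ffdc138bd21c95")
print(load_template_ids(registry_path))
```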

To test the recognition effect, we use another itinerary code picture, this time a photo retaken of a phone screen:

image.png

Roughly a dozen lines of code are enough to extract the health information from the itinerary code:

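import base64
import json

import pandas as pd
import requests

# Read the picture to recognize and encode it as Base64 text.
with open('scan-xjm-1.jpeg', 'rb') as img_file:
    base64_data = base64.b64encode(img_file.read())

# template_id is the ID returned when the itinerary code template was created.
payload = json.dumps({
    "template_id": "3e2183c63b139f6870c7d0ac53ffdc138bd21c95",
    "img": str(base64_data, encoding="utf-8")
})

# Use the CustomOCR URL from your own CloudFormation Outputs page here.
url = "https://gqi4z1k9fl.execute-api.cn-northwest-1.amazonaws.com.cn/prod/custom-ocr/"
response = requests.request("POST", url, data=payload)

# Load the recognition result into a DataFrame for structured output.
df = pd.DataFrame.from_dict(json.loads(response.text))
print(df)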

image.png

As the result shows, the template text recognition function of AI Solution Kit handles screenshots at different phone resolutions and even photos retaken of a screen: it locates the recognition areas automatically and recognizes the text accurately to extract the required itinerary code information.
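Once the fields are extracted, the manual check for high-risk areas described in the background can also be scripted. The sketch below assumes the recognition result has already been reduced to a plain dictionary of field name to recognized text; the actual response layout of the service is not shown in this article, so that shape, the sample values, and the high-risk list are all hypothetical.

```python
def visited_high_risk(fields, high_risk_areas):
    """Return the high-risk areas mentioned in the recognized area text."""
    area_text = fields.get("途经地区", "")
    return [area for area in high_risk_areas if area in area_text]


# Hypothetical recognized fields and a hypothetical high-risk list.
fields = {
    "手机号码": "138****0000",
    "更新时间": "2022.05.01 09:00:00",
    "途经地区": "北京市，上海市",
}
high_risk_areas = ["上海市", "广州市"]

hits = visited_high_risk(fields, high_risk_areas)
print("needs review" if hits else "ok", hits)
```

A plain substring match is the simplest possible rule; OCR errors in place names would need a more tolerant comparison.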

Next, we create the Health Kit (健康宝) template in the same way as the itinerary code template. The JSON data for creating the Health Kit template is as follows:

{
    "type": "add",
    "img": "健康宝图像的Base64编码",
    "template": [
        [[[173, 177], [364, 173], [364, 210], [173, 210]], "日期"],
        [[[211, 206], [319, 210], [319, 233], [208, 231]], "时间"],
        [[[190, 575], [361, 575], [361, 628], [191, 629]], "状态"],
        [[[217, 668], [317, 671], [309, 709], [205, 710]], "核酸"],
        [[[386, 658], [424, 660], [424, 708], [386, 708]], "核酸时间"],
        [[[263, 810], [483, 828], [483, 864], [289, 867]], "姓名"],
        [[[220, 876], [482, 870], [479, 909], [222, 908]], "身份证号"],
        [[[277, 917], [478, 909], [482, 954], [272, 950]], "查询时间"],
        [[[272, 955], [483, 956], [475, 990], [269, 992]], "失效时间"]
    ]
}

image.png
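With nine hand-marked areas it is easy to mistype a point or record the corners in the wrong direction. The following sketch checks a template before it is submitted: each entry must hold four [x, y] points and a name, and the points must run clockwise. check_template is a hypothetical helper, not part of AI Solution Kit, and it verifies only the direction of the points, not that the first one is the upper left corner.

```python
def check_template(template):
    """Return a list of problems found in a template; empty means OK."""
    problems = []
    for index, entry in enumerate(template):
        if len(entry) != 2:
            problems.append(f"entry {index}: expected [points, name]")
            continue
        points, name = entry
        if not isinstance(name, str) or not name:
            problems.append(f"entry {index}: name must be a non-empty string")
        if len(points) != 4 or any(len(p) != 2 for p in points):
            problems.append(f"entry {index}: expected four [x, y] points")
            continue
        # Shoelace sum; with y growing downward (image coordinates) a
        # positive value means the points run clockwise on screen.
        area2 = sum(
            x1 * y2 - x2 * y1
            for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
        )
        if area2 <= 0:
            problems.append(f"entry {index}: points are not in clockwise order")
    return problems


template = [
    [[[173, 177], [364, 173], [364, 210], [173, 210]], "日期"],
    [[[217, 668], [317, 671], [309, 709], [205, 710]], "核酸"],
]
print(check_template(template))
```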

The code used to test and verify the template is as follows:

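import base64
import json

import pandas as pd
import requests

# Read the Health Kit picture and encode it as Base64 text.
with open('jkb-1.png', 'rb') as img_file:
    base64_data = base64.b64encode(img_file.read())

# template_id is the ID returned when the Health Kit template was created.
payload = json.dumps({
    "template_id": "158ad1b39a4cba9ce4a1cade1fae2bb0740ccb10",
    "img": str(base64_data, encoding="utf-8")
})

# Replace [API-ID] and [REGION-ID] with the values from the CustomOCR output.
url = "https://[API-ID].execute-api.[REGION-ID].amazonaws.com.cn/prod/custom-ocr/"
response = requests.request("POST", url, data=payload)

# Load the recognition result into a DataFrame for structured output.
df = pd.DataFrame.from_dict(json.loads(response.text))
print(df)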

Output result:

image.png

As the result shows, the template OCR can also handle the more complex recognition task of the Beijing Health Kit. Finally, let's create a template for the Suishen code (随申码). The JSON data for creating the template is as follows:

{
    "type": "add",
    "img": "随申码参考样图的Base64编码",
    "template": [
        [[[189, 256], [257, 261], [261, 293], [192, 296]], "姓名"],
        [[[140, 366], [356, 369], [350, 406], [138, 402]], "查询时间"],
        [[[208, 673], [285, 675], [284, 713], [211, 712]], "状态"],
        [[[108, 774], [181, 777], [181, 838], [114, 839]], "天数"]
    ]
}

image.png

The code for recognizing the Suishen code is as follows:

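import base64
import json

import pandas as pd
import requests

# Read the Suishen code picture and encode it as Base64 text.
with open('xsm_1.png', 'rb') as img_file:
    base64_data = base64.b64encode(img_file.read())

# template_id is the ID returned when the Suishen code template was created.
payload = json.dumps({
    "template_id": "531030f86c071571e54c19c9ac5c63751e97cddf",
    "img": str(base64_data, encoding="utf-8")
})

# Replace [API-ID] and [REGION-ID] with the values from the CustomOCR output.
url = "https://[API-ID].execute-api.[REGION-ID].amazonaws.com.cn/prod/custom-ocr/"
response = requests.request("POST", url, data=payload)

# Load the recognition result into a DataFrame for structured output.
df = pd.DataFrame.from_dict(json.loads(response.text))
print(df)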

Output result:

image.png
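The three recognition scripts differ only in the image file and the template ID, so they can be folded into one helper that picks the template by code type. The sketch below is one possible arrangement, not the authors' implementation: the labels "itinerary", "health_kit", and "suishen" and the function names are hypothetical, while the template IDs are the ones created in this article (yours will differ).

```python
import base64
import json

# Template IDs as created in this article; yours will differ.
TEMPLATE_IDS = {
    "itinerary": "3e2183c63b139f6870c7d0ac53ffdc138bd21c95",
    "health_kit": "158ad1b39a4cba9ce4a1cade1fae2bb0740ccb10",
    "suishen": "531030f86c071571e54c19c9ac5c63751e97cddf",
}


def build_recognition_payload(code_type, image_bytes):
    """Return the JSON request body for recognizing one image."""
    if code_type not in TEMPLATE_IDS:
        raise ValueError(f"unknown code type: {code_type!r}")
    return json.dumps({
        "template_id": TEMPLATE_IDS[code_type],
        "img": base64.b64encode(image_bytes).decode("utf-8"),
    })


def recognize(url, code_type, image_bytes):
    """POST one image to the Custom OCR URL and return the parsed JSON."""
    import requests  # imported here so the payload helper runs without it
    body = build_recognition_payload(code_type, image_bytes)
    response = requests.post(url, data=body, timeout=30)
    response.raise_for_status()
    return response.json()


# Stand-in bytes; in practice pass the contents of the uploaded picture.
payload = build_recognition_payload("suishen", b"placeholder image bytes")
print(json.loads(payload)["template_id"])
```

Calling recognize(url, "health_kit", image_bytes) with your own CustomOCR URL would then cover all three codes with the same function.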

Sample code

Please refer to the following link for the sample code in the text:

https://github.com/awslabs/aws-ai-solution-kit/tree/main/samples/custom-ocr-healthy-code

Summary

The template text recognition function in AI Solution Kit automatically deploys a pre-trained text recognition model and combines it with a large lexicon to strengthen Chinese-language recognition. Through predefined templates it can automatically correct and recognize structured information, which makes data entry more efficient. AWS CloudFormation automatically creates a RESTful API in Amazon API Gateway; after deploying the solution, users only need to send HTTP(S) requests with the required parameters to the URL that Amazon API Gateway creates. The solution is built on a serverless architecture with services such as AWS Lambda, so users do not need to operate or maintain any infrastructure and pay only for actual usage. User data is not stored persistently at any point, and computing and storage resources are destroyed after each API call completes. More interesting features in the open source AI Solution Kit are waiting for you to explore: https://github.com/awslabs/aws-ai-solution-kit

About the authors

Yan Yi
Innovative solutions architect at AWS, responsible for the architecture design of AWS-based cloud computing solutions, with extensive hands-on experience in application development, serverless, and big data.

Xiaoting He
Innovative solutions architect at AWS, responsible for the architecture design of AWS-based cloud computing solutions, with extensive hands-on experience in application development, artificial intelligence, and serverless.

