Architecture - Hands-on Tutorial | Building a Serverless Universal Text Recognition Function - 亚马逊云开发者

Preface

Serverless applications let you run code without configuring or managing servers. This article introduces an optical character recognition (OCR) solution built on a serverless architecture that can accurately recognize text in natural-scene images. It is implemented on Amazon Lambda and Amazon API Gateway. Amazon Lambda's pay-per-request pricing, automatic scaling, and ease of use make it a popular deployment choice for data science teams. Here, text recognition with a pre-trained model means that an Amazon Lambda function invokes a pre-trained OCR machine learning model, analyzes image files that contain text, extracts the text and its layout information, and returns the result as text. The solution applies to a wide range of scenarios, including printed text recognition, handwriting recognition, image content review, and extracting text from images in electronic publications.

OCR recognition effect

Architecture Introduction

Serverless architectures use an event-driven model. The Amazon Lambda service runs Lambda functions in response to events, and Lambda functions can be invoked directly from many integrated Amazon services, including API Gateway. This solution is designed and deployed with Amazon CloudFormation. After deployment, a user or program sends an API request to Amazon API Gateway. The request payload must contain either the URL of the image to be processed or the image as Base64-encoded data. After receiving the HTTP request, Amazon API Gateway forwards the request data to the corresponding Lambda function. The Lambda function completes inference by invoking the pre-trained model stored in Amazon EFS and returns the text recognition result (in JSON format) to the caller. Because the solution is serverless, users pay only for the calls they actually make.
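The request flow described above can be sketched as a small Lambda handler. This is a minimal illustration, not the solution's actual code: it assumes an API Gateway proxy-style event with the JSON payload in `event["body"]`, the Base64 field name `img` is hypothetical, and `run_ocr` is a placeholder standing in for the model stored on Amazon EFS.

```python
import base64
import json
import urllib.request


def load_image_bytes(body):
    """Return raw image bytes from a payload holding a URL or Base64 data."""
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    if "url" in body:
        with urllib.request.urlopen(body["url"], timeout=10) as resp:
            return resp.read()
    if "img" in body:  # hypothetical field name for Base64-encoded image data
        return base64.b64decode(body["img"])
    raise ValueError("request must contain 'url' or 'img'")


def run_ocr(image_bytes):
    """Placeholder for inference; a real function would call the model on EFS."""
    return []


def handler(event, context):
    """Parse the request, run recognition, and return a JSON response."""
    try:
        body = json.loads(event.get("body") or "{}")
        image_bytes = load_image_bytes(body)
    except (ValueError, OSError) as exc:
        # Covers malformed JSON, bad Base64 data, and download failures.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(run_ocr(image_bytes)),
    }
```

Keeping the input handling separate from `run_ocr` means the request parsing can be exercised locally without the model.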

Architecture Diagram

Amazon API Gateway: a fully managed service that helps developers easily create, publish, maintain, monitor, and secure APIs at any scale;

https://www.amazonaws.cn/api-gateway/

Amazon Lambda: a serverless computing service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes;

https://www.amazonaws.cn/lambda/

Amazon EFS: Amazon Elastic File System (Amazon EFS) provides scalable, elastic, highly available, and durable file storage for enterprise applications and applications delivered as a service. It also provides a common repository for your development environment, enabling you to share code and other files in a safe and orderly manner.

https://www.amazonaws.cn/efs/

Deployment Method

First, log in to your Amazon Web Services account in a browser. Then open the homepage of the Amazon Cloud Technology Solution Library, find the AI Solution Kit in the library, open its link, and switch to the "Text Recognition (OCR)" tab (you can switch the display language in the upper right corner of the page). Click the region button on the right side of the page to launch the corresponding deployment template in the console.

https://www.amazonaws.cn/solutions/ai-solution-kit/

The following steps deploy the general text recognition (OCR) solution, using the China (Beijing) region as an example.

Click the "Launch Scheme from the Amazon Cloud Technology China (Beijing) Regional Console" button to open the corresponding Amazon CloudFormation template. On the Create Stack page, check the Amazon S3 URL and click the Next button to continue.

On the Specify Stack Details page, you can modify parameters such as the stack name and the authentication method. Because this solution uses Amazon API Gateway to receive API requests, the aikitscustomAuthType-xxx parameter specifies how API calls are authenticated. If you want to expose the API without authentication (NONE) in the Beijing or Ningxia region, your Amazon Web Services account must have completed Internet Content Provider (ICP) filing, and ports 80 and 443 must be open. You can also choose a lightweight model (Lite) under "modelType". Click "Next" to continue.

Model description

On the Review page, review and confirm the settings. Check the box to confirm that the template will create Amazon Identity and Access Management (IAM) resources. Finally click the "Create Stack" button to start the deployment.

You can view the status of the stack in the Status column of the Amazon CloudFormation console. After about 15 minutes, the status changes to CREATE_COMPLETE, which indicates that the stack was created successfully. You can then find the record prefixed with aikitsInvokeURL in the Outputs tab of Amazon CloudFormation. Note down the corresponding invocation URL.
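If you prefer to read the invocation URL programmatically rather than from the console, the lookup could be sketched as below. This assumes boto3 is installed and credentials are configured; the stack name and region passed to `fetch_invoke_url` are whatever you chose at deployment, and both helper names are illustrative only.

```python
def find_invoke_url(outputs, prefix="aikitsInvokeURL"):
    """Return the value of the first stack output whose key starts with prefix."""
    for output in outputs:
        if output["OutputKey"].startswith(prefix):
            return output["OutputValue"]
    return None


def fetch_invoke_url(stack_name, region_name):
    """Look up the invocation URL of a deployed stack via CloudFormation."""
    import boto3  # imported lazily so find_invoke_url works without boto3

    client = boto3.client("cloudformation", region_name=region_name)
    stacks = client.describe_stacks(StackName=stack_name)["Stacks"]
    return find_invoke_url(stacks[0].get("Outputs", []))
```

Matching on the prefix rather than the full key reflects that the output name carries a generated suffix after aikitsInvokeURL.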

After completing these steps, you can open the Lambda function in the Lambda Functions console to check the inference model invocation logic. The directory structure is as follows, where infer_ocr_app.py is the logic code for the Lambda Function to invoke the pre-trained model:

Test the OCR Function

REST API interface

HTTP method: POST
Body request parameter

Request Body Example

{
  "url": "https://images-cn.ssl-images-amazon.cn/images/G/28/AGS/LIANG/Deals/2020/Dealpage_KV/1500300.jpg"
}

Sample image
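The request body can carry either an image URL or Base64-encoded image data. A small helper for building either form might look like the sketch below. The `url` field comes from the example above; the Base64 field name is not given here, so `img` is a hypothetical default that should be checked against the parameter table of your deployment.

```python
import base64
import json


def build_ocr_payload(image_url=None, image_bytes=None, base64_field="img"):
    """Build a JSON request body from exactly one of a URL or raw image bytes."""
    if (image_url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of image_url or image_bytes")
    if image_url is not None:
        return json.dumps({"url": image_url})
    # base64_field is a hypothetical name; adjust it to the deployed API.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({base64_field: encoded})
```

For a local file, pass its contents, for example `build_ocr_payload(image_bytes=open("photo.jpg", "rb").read())`, where photo.jpg is a placeholder file name.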

Test Steps

After the deployment is complete, you can test the API. First, download the Postman test tool.

Download address: https://www.postman.com/downloads

Create a new tab in Postman and paste the API invocation URL (aikitsInvokeURL) from the deployment step into the address bar. Select POST as the HTTP method. Open the Body configuration item and select the raw and JSON data types. Enter the test data from the request body example above in Body and click the Send button to see the corresponding results.

Return parameter description

If you selected the AWS_IAM authentication method when deploying the OCR solution, open the Authorization configuration in Postman before sending the request, select AWS Signature in the drop-down list, and fill in the AccessKey, SecretKey, and region of the corresponding account (such as cn-north-1 or cn-northwest-1).

You can also test with a Python program. Run the following command in a command-line window to install the authentication dependency aws_requests_auth with pip3.

pip3 install aws_requests_auth

Save the following Python source code locally with a .py file extension, for example ocr-iam-auth.py.

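import json

import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

# Sign requests with the credentials that botocore finds locally
# (botocore must be installed, for example as part of boto3).
# aws_host and aws_region must match the host and region of your invocation URL.
auth = BotoAWSRequestsAuth(aws_host='[YOUR-API-ID].execute-api.us-east-1.amazonaws.com',
                           aws_region='us-east-1',
                           aws_service='execute-api')

# Replace with the aikitsInvokeURL value from the CloudFormation outputs.
url = 'https://[YOUR-API-ID].execute-api.us-east-1.amazonaws.com/standard/ocr'
payload = {
    'url': 'https://images-cn.ssl-images-amazon.cn/images/G/28/AGS/LIANG/Deals/2020/Dealpage_KV/1500300.jpg'
}
# Send the image URL as a JSON body and print the parsed recognition result.
response = requests.post(url, data=json.dumps(payload), auth=auth)
print(response.json())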
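The host and region passed to BotoAWSRequestsAuth have to agree with the invocation URL. Rather than editing them by hand, they could be derived from the URL, as in this sketch. It assumes the default API Gateway host layout of API ID, `execute-api`, then region; the helper name is illustrative, and a custom domain name would not match this pattern.

```python
from urllib.parse import urlparse


def auth_params_from_url(invoke_url):
    """Derive aws_host, aws_region, and aws_service from an invocation URL."""
    host = urlparse(invoke_url).netloc
    parts = host.split(".")
    # Expected layout: <api-id>.execute-api.<region>.amazonaws.com[.cn]
    if len(parts) < 4 or parts[1] != "execute-api":
        raise ValueError("not a default API Gateway host: " + repr(host))
    return {"aws_host": host, "aws_region": parts[2], "aws_service": "execute-api"}
```

The returned dictionary can be unpacked directly, as in `BotoAWSRequestsAuth(**auth_params_from_url(url))`.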

On the Amazon CloudFormation console, switch to the Outputs tab, find the aikitsInvokeURL field, and copy its value to replace the url in the Python source code. Also replace [YOUR-API-ID] in the Python file with the API ID from that URL, and make sure aws_host and aws_region match the host and region in the URL. After saving, run python3 ocr-iam-auth.py to see the following recognition results:

% python3 ocr-iam-auth.py
[{'words': 'SALE', 'location': {'top': 38, 'left': 939, 'width': 73, 'height': 26}, 'score': 0.9969024062156677}, {'words': '镇店之宝', 'location': {'top': 89, 'left': 348, 'width': 194, 'height': 47}, 'score': 0.9993890523910522}, {'words': '同步全球天天低价', 'location': {'top': 156, 'left': 367, 'width': 192, 'height': 27}, 'score': 0.9992773532867432}, {'words': '海外购', 'location': {'top': 208, 'left': 348, 'width': 68, 'height': 28}, 'score': 0.9987406730651855}, {'words': '折', 'location': {'top': 241, 'left': 664, 'width': 30, 'height': 25}, 'score': 0.9383366107940674}]
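Each item in the response carries the recognized `words`, a `location` box, and a confidence `score`. A caller might post-process the list as sketched below. The 0.99 threshold and the top-then-left ordering are illustrative assumptions, not part of the solution.

```python
def extract_text(results, min_score=0.99):
    """Return recognized strings in reading order, dropping low-confidence items."""
    kept = [item for item in results if item["score"] >= min_score]
    # Order by vertical position first, then horizontal position.
    kept.sort(key=lambda item: (item["location"]["top"], item["location"]["left"]))
    return [item["words"] for item in kept]


# Two entries taken from the sample response above (scores shortened).
sample = [
    {"words": "折", "location": {"top": 241, "left": 664, "width": 30, "height": 25}, "score": 0.9383},
    {"words": "SALE", "location": {"top": 38, "left": 939, "width": 73, "height": 26}, "score": 0.9969},
]
print(extract_text(sample))
print(extract_text(sample, min_score=0.9))
```

Sorting by `top` alone can misorder words that share a visual line but differ by a few pixels, so a real application may want to group boxes into lines first.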

Conclusion

The general text recognition solution in the AI Solution Kit automatically deploys a pre-trained text recognition model and combines it with an accurate language model and a large vocabulary, which strengthens Chinese text recognition and improves the efficiency of text entry. Amazon CloudFormation automatically creates a RESTful API in Amazon API Gateway. After deploying the solution, users only need to send an HTTP(S) request with the required parameters to the URL created by Amazon API Gateway to obtain text recognition results. Because the solution is built on serverless services such as Amazon Lambda and Amazon API Gateway, users do not need to manage servers or runtimes in the cloud or on premises, and they pay only for what they actually use. No user data is stored anywhere in the serverless architecture, which protects user privacy. Stay tuned to the AI Solution Kit to try more out-of-the-box AI capabilities in the cloud.

The author of this article

Yan one

Innovation solutions architect at Amazon Cloud Technology, responsible for the architecture design of cloud computing solutions based on Amazon Cloud Technology, with extensive practical experience in application development, serverless, and big data.

He Xiaoting
