I have done a lot of PDF-related work in recent months. While things are quiet for a couple of days, I am writing down what I learned.
PDF stands for Portable Document Format. It was created by Adobe in 1992. Its defining feature is that it does not depend on the operating system: a PDF renders the same way on any platform.
As for this platform independence, it is more accurate to say that the format is simple, every platform follows the same standard, and the standard is open, so platform independence follows naturally.
Development history
PDF was Adobe's proprietary format from its creation until 2008, when it became an official ISO standard (ISO 32000).
Although PDF is now a standard, Adobe, as its originator, still has some proprietary features, such as XFA (XML Forms Architecture, Adobe's own form technology), which is not part of the ISO 32000 PDF standard.
China's domestic format - OFD
Similar to PDF, there is the OFD (Open Fixed-layout Document) format, which is regarded as China's "domestic PDF" standard. It was released in 2016 by the Standardization Administration of China.
Compared with PDF, OFD is simpler and easier to implement, and it supports the Chinese national cryptographic algorithms, but it is rarely used at present.
Fonts
In common Office formats, fonts are not embedded by default. This avoids storing the same font repeatedly, since the font only needs to be installed on the rendering device. The drawback is equally obvious: if the client device does not have the font, rendering fails, and falling back to a substitute font changes how the document looks.
PDF handles fonts differently. It embeds fonts by default, and it embeds only a subset: just the glyphs for the characters actually used in the file, not the entire font. This way, embedding fonts does not increase the file size by much.
Font encryption
Some publishers provide data as PDF but, for data security, do not want users to copy the text freely.
This is where embedded fonts become useful. With font obfuscation and encryption, the PDF embeds an obfuscated, encrypted font instead of the normal one. Even if the PDF file is published openly, any text a reader copies out of it comes out scrambled, which protects the data.
That said, OCR is now powerful enough to recognize the text even after font obfuscation; it just takes a little more effort.
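The post does not describe the exact obfuscation scheme, so the following is only a minimal sketch of one common approach, assumed here for illustration: the file stores remapped character codes, and the embedded font draws the real glyph for each code. The class name, method names, and the choice of the Unicode private-use range starting at U+E000 are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: names and the U+E000 range are assumptions.
final class FontObfuscationSketch {

    // Assign each distinct character a private-use code, in order of first use.
    // The obfuscated font would map that code to the real glyph shape.
    static Map<Integer, Integer> buildCodeMap(String text) {
        Map<Integer, Integer> codeMap = new LinkedHashMap<>();
        text.codePoints().forEach(cp -> codeMap.putIfAbsent(cp, 0xE000 + codeMap.size()));
        return codeMap;
    }

    // The text as it would be stored in the file: only the remapped codes.
    // Copying from the document yields these codes, not the readable text.
    static String storedText(String text, Map<Integer, Integer> codeMap) {
        StringBuilder stored = new StringBuilder();
        text.codePoints().forEach(cp -> stored.appendCodePoint(codeMap.get(cp)));
        return stored.toString();
    }

    public static void main(String[] args) {
        String visible = "hello";
        Map<Integer, Integer> codeMap = buildCodeMap(visible);
        String copied = storedText(visible, codeMap);
        // Prints: U+E000 U+E001 U+E002 U+E002 U+E003
        System.out.println(copied.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" ")));
    }
}
```

The reader still sees "hello" because the font draws the right shapes, but the copied codes carry no meaning without the mapping. OCR works on the rendered shapes, so it bypasses the mapping entirely.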
Electronic Signature & Digital Signature
Electronic Signature
The U.S. Electronic Signatures in Global and National Commerce Act (ESIGN, 2000) defines an "electronic signature" as "an electronic sound, symbol, or process, attached to or logically associated with a contract or other record and executed or adopted by a person with the intent to sign the record."
In practice, an electronic signature is often just an image of a handwritten signature attached to the electronic document, backed by some additional authentication step (PIN, password, email).
Digital Signature
Digital signatures are different from electronic signatures. A digital signature relies on a digital certificate issued by a PKI certificate authority (CA). The basic procedure is as follows:
- Use a digest algorithm (such as MD5 or one of the SHA family) to generate a digest of the content
- Sign (encrypt) the digest with an asymmetric algorithm, using the private key that belongs to the certificate
- Attach the signed digest and the signing certificate (which carries the public key) to the PDF file
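The first two steps can be sketched with the Java standard library alone. This is a minimal illustration, not a PDF signer: the class and method names are hypothetical, a freshly generated RSA key pair stands in for the key of a CA-issued certificate, and the third step (embedding the result and the certificate in the PDF) is left out because it needs a PDF library.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.Signature;

// Illustrative sketch only: class and method names are hypothetical.
final class SignSketch {

    // Step 1 on its own: compute a SHA-256 digest of the content.
    static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(content);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Steps 1 and 2 together: "SHA256withRSA" hashes the content and
    // signs the hash with the private key in a single operation.
    static byte[] sign(byte[] content, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(content);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: a throwaway key pair replaces the certificate's key pair.
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "invoice body".getBytes(StandardCharsets.UTF_8);
        KeyPair pair = newRsaKeyPair();
        // SHA-256 always yields 32 bytes.
        System.out.println("digest length: " + digest(content).length);
        // A 2048-bit RSA key yields a 256-byte signature.
        System.out.println("signature length: " + sign(content, pair.getPrivate()).length);
    }
}
```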
As the steps above show, a PDF digital signature is not the same thing as SSL encryption. A PDF digital signature signs the file: it establishes the identity of the signer and makes any tampering detectable. SSL, by contrast, encrypts the messages in transit.
The following figure shows the difference between encryption and digital signing with an asymmetric algorithm:
To sum up, asymmetric algorithms have two mainstream applications: encryption (encrypt with the public key, decrypt with the private key) and digital signatures (sign with the private key, verify with the public key).
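The encryption direction (public key encrypts, private key decrypts) can be sketched as follows. This is a minimal illustration using only the Java standard library; the class and method names are hypothetical, and the OAEP padding mode is my choice rather than anything the post prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;

// Illustrative sketch only: class and method names are hypothetical.
final class EncryptSketch {

    // Assumption: RSA with OAEP padding, picked here for the example.
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Anyone holding the public key can encrypt.
    static byte[] encrypt(byte[] plain, PublicKey publicKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, publicKey);
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can decrypt.
    static byte[] decrypt(byte[] encrypted, PrivateKey privateKey) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, privateKey);
            return cipher.doFinal(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newRsaKeyPair();
        byte[] secret = "short secret".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(secret, pair.getPublic());
        byte[] decrypted = decrypt(encrypted, pair.getPrivate());
        // Prints: short secret
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}
```

The signing direction swaps the roles of the keys: the private key produces the signature and the public key checks it, which is why a signature proves who signed but hides nothing.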
A PDF digital signature can also be "bound" to an image. The stamp image on an electronic invoice, for example, can serve as the appearance of the digital signature (Appearance).
It is of course possible to sign digitally without an appearance. Just remember one thing: a document with a signature image does not necessarily have a digital signature, and a document with a digital signature does not necessarily have a signature image. The two are not the same thing.
Digital signatures are not limited to PDF. Microsoft's Office suite supports them too, but in practice people rarely sign files in Office formats, so most digital signatures you come across are on PDFs.
Signature verification
The principle of PDF signature verification is also very simple:
- Verify that the PDF's signing certificate is trusted, using the client's root certificate store (Adobe's PDF software, for example, uses its own built-in root certificate list regardless of operating system)
- Verify the signed digest data with the public key of the certificate
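The second check, verifying the signed digest with the public key, can be sketched with the Java standard library. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, a generated key pair stands in for the certificate, and the certificate trust check is omitted because it depends on the client's root store.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Illustrative sketch only: class and method names are hypothetical.
final class VerifySketch {

    // Recomputes the digest of the content and checks it against the signature.
    static boolean verify(byte[] content, byte[] signature, PublicKey publicKey) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(publicKey);
            verifier.update(content);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper for the demo: produces a signature to verify.
    static byte[] sign(byte[] content, PrivateKey privateKey) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(privateKey);
            signer.update(content);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair pair = newRsaKeyPair();
        byte[] original = "amount: 100".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "amount: 900".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(original, pair.getPrivate());
        System.out.println(verify(original, signature, pair.getPublic())); // true
        System.out.println(verify(tampered, signature, pair.getPublic())); // false
    }
}
```

Changing even one character of the content changes its digest, so the signature no longer matches. That is how a viewer can tell that a signed file was modified.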
Certificate & Signature Algorithm
The type of certificate used for PDF digital signatures is different from an SSL certificate. An ordinary SSL certificate verifies the owner of a domain name, while the certificate used for PDF digital signatures is generally an organization certificate.
At present, the mainstream asymmetric algorithms for digital certificates are RSA and DSA (the algorithm specified by the DSS standard), with RSA by far the most widely used. With the trend toward localization in China, however, industries such as finance and insurance are slowly migrating to the Chinese national cryptographic algorithms (the SM series, such as SM2).
The choice of algorithm does not change the principle, though. Either way it is asymmetric cryptography with digital certificates; only the concrete signing, verification, encryption, and decryption algorithms differ.
Form Fields - AcroForm
Form fields are PDF forms, known in English as AcroForm. Yes, you read that right: PDF has a form technology similar to HTML forms, and a form can contain elements such as text fields, radio buttons, and checkboxes.
Once the PDF form has been designed, it can be filled in either manually with a tool or programmatically.
PDF libraries (Java)
PDF technology is still relatively closed, and the open source libraries can be quite painful to use. For commercial projects, consider buying a commercial SDK: the features are richer, the documentation is more complete, and you are spending money to save time.
Open Source & Free
- iText - versions up to 4.x are free; from version 5 onward the open source license is AGPL, and closed-source commercial use requires a paid license
- OpenPDF - a fork that fixes and maintains the old iText 2.x-era code base, so it is still essentially iText
- PDFBox - an open source PDF library from Apache. It is free, but in my experience its features are far behind iText's, so I do not recommend it.
There are also some more niche PDF libraries, which I do not recommend here. iText is currently the most widely used. PDFBox is fully open source and free, but both its features and its documentation are much thinner than iText's.
Commercial
- iText 7 - available for Java and C#, with full features and complete documentation; it covers basically every PDF requirement
- Aspose.PDF - SDKs for multiple languages, with powerful features and rich documentation
- Spire.PDF - SDKs for multiple languages, with powerful features and rich documentation; it covers not only PDF but the whole Office family, and it has distributors in China
- Adobe PDF Library SDK - Adobe's own PDF SDK, with the most complete feature set and multi-language support
- Datalogics PDF Java Toolkit - Datalogics is a reseller of the Adobe PDF Library and also offers a PDF SDK of its own
PDF tools
There are a lot of PDF tools on the market. Here are some mainstream GUI tools with full functions (reading, editing, converting, signing):
- Adobe Acrobat - from the originator of PDF; the most capable PDF tool, bar none
- Foxit - a long-established PDF tool from China
- Swift Office
- Smallpdf - an online PDF tool site with editing, conversion, and other functions; compatibility is very good, and there is a free daily quota