Suppose there is the following requirement:
There is a PDF file with dozens of pages, and specified pages need to be split out of it to generate a new PDF file.
This can be implemented with the open-source itextpdf library, whose official GitHub repository is https://github.com/itext/itextpdf .
The specific code is demonstrated below.
1. Introduce dependencies
At the time of writing, the latest version of itextpdf is 5.5.13.3, which can be looked up at https://search.maven.org/ .
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>
2. Code implementation
2.1 Specify page number extraction
package com.magic.itextpdf;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Objects;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;

/**
 * PDF utility class.
 */
public class PdfUtils {

    /**
     * Extracts the specified pages from a PDF file into a new PDF file.
     *
     * @param sourceFile        path of the source PDF file
     * @param targetFile        path of the target PDF file
     * @param extractedPageNums page numbers to extract (1-based)
     */
    public static void extract(String sourceFile, String targetFile, List<Integer> extractedPageNums) {
        Objects.requireNonNull(sourceFile);
        Objects.requireNonNull(targetFile);
        Objects.requireNonNull(extractedPageNums);
        PdfReader reader = null;
        Document document = null;
        FileOutputStream outputStream = null;
        try {
            // Read the source file
            reader = new PdfReader(sourceFile);
            // Create a new document
            document = new Document();
            // Create the target PDF file
            outputStream = new FileOutputStream(targetFile);
            PdfCopy pdfCopy = new PdfSmartCopy(document, outputStream);
            // Get the number of pages in the source file
            int pages = reader.getNumberOfPages();
            document.open();
            // Note: page numbers start at 1
            for (int page = 1; page <= pages; page++) {
                // Copy the page only if it is one of the specified page numbers
                if (extractedPageNums.contains(page)) {
                    pdfCopy.addPage(pdfCopy.getImportedPage(reader, page));
                }
            }
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        } finally {
            // Close the document first so the copy is finished before the reader is released
            if (document != null) {
                document.close();
            }
            if (reader != null) {
                reader.close();
            }
            if (outputStream != null) {
                try {
                    outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
The extract() method takes three parameters: the source PDF file path, the target PDF file path, and the page numbers to extract. The page numbers are passed as a List. For example, to extract the first page, call it as follows:
PdfUtils.extract("D:\\Test\\test.pdf", "D:\\Test\\test_out.pdf", Collections.singletonList(1));
To extract several pages at once, such as pages 1, 3, and 5, call it like this:
PdfUtils.extract("D:\\Test\\test.pdf", "D:\\Test\\test_out.pdf", Arrays.asList(1, 3, 5));
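Note that the loop walks the source pages in ascending order, so duplicates and the order of the list have no effect, and page numbers outside the document are silently skipped. If no requested page exists, nothing is copied, and iText 5 is generally reported to fail when closing a document that has no pages. The following is a minimal sketch of a pre-check that could be run before calling extract(), assuming the total page count is already known (for example from reader.getNumberOfPages()). The names PageSelection and normalize are hypothetical and are not part of itextpdf.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of itextpdf. */
final class PageSelection {

    private PageSelection() {
    }

    /**
     * Returns the requested page numbers sorted ascending and without
     * duplicates, keeping only those in the range 1..totalPages.
     */
    static List<Integer> normalize(List<Integer> requested, int totalPages) {
        TreeSet<Integer> unique = new TreeSet<>();
        for (Integer page : requested) {
            // PDF page numbers are 1-based; drop nulls and out-of-range values
            if (page != null && page >= 1 && page <= totalPages) {
                unique.add(page);
            }
        }
        if (unique.isEmpty()) {
            // Fail early instead of producing an empty target document
            throw new IllegalArgumentException(
                    "No requested page lies within 1.." + totalPages);
        }
        return new ArrayList<>(unique);
    }
}
```

For a two-page source, PageSelection.normalize(Arrays.asList(2, 1, 2, 7), 2) returns the list [1, 2], which is exactly the set of pages the loop would copy.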
Of course, if a PDF has more than 100 pages and pages 10-60 need to be extracted, passing every page number as above is tedious. In that case, the method can be overloaded to take a starting page number and an ending page number.
2.2 Start and end page number extraction
Overload the extract method as follows:
    /**
     * Extracts a continuous range of pages from a PDF file into a new PDF file.
     *
     * @param sourceFile  path of the source PDF file
     * @param targetFile  path of the target PDF file
     * @param fromPageNum starting page number (1-based, inclusive)
     * @param toPageNum   ending page number (inclusive)
     */
    public static void extract(String sourceFile, String targetFile, int fromPageNum, int toPageNum) {
        Objects.requireNonNull(sourceFile);
        Objects.requireNonNull(targetFile);
        PdfReader reader = null;
        Document document = null;
        FileOutputStream outputStream = null;
        try {
            // Read the source file
            reader = new PdfReader(sourceFile);
            // Create a new document
            document = new Document();
            // Create the target PDF file
            outputStream = new FileOutputStream(targetFile);
            PdfCopy pdfCopy = new PdfSmartCopy(document, outputStream);
            // Get the number of pages in the source file
            int pages = reader.getNumberOfPages();
            document.open();
            // Note: page numbers start at 1; limit the range to pages that exist
            int first = Math.max(1, fromPageNum);
            int last = Math.min(pages, toPageNum);
            for (int page = first; page <= last; page++) {
                pdfCopy.addPage(pdfCopy.getImportedPage(reader, page));
            }
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        } finally {
            // Close the document first so the copy is finished before the reader is released
            if (document != null) {
                document.close();
            }
            if (reader != null) {
                reader.close();
            }
            if (outputStream != null) {
                try {
                    outputStream.flush();
                    outputStream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
For consecutive page numbers, this method is simpler. For example, to extract pages 10-60, call it like this:
PdfUtils.extract("D:\\Test\\test.pdf", "D:\\Test\\test_out.pdf", 10, 60);
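When a job mixes single pages and ranges, neither overload fits directly. One option, sketched here under the assumption that a comma-separated string such as "1,3,10-12" is an acceptable input format, is to expand the string into a list and pass it to the List-based extract() method. The names PageSpec and parse are hypothetical and are not part of itextpdf.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of itextpdf. */
final class PageSpec {

    private PageSpec() {
    }

    /** Expands a spec such as "1,3,10-12" into the list 1, 3, 10, 11, 12. */
    static List<Integer> parse(String spec) {
        List<Integer> pages = new ArrayList<>();
        for (String part : spec.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dash = token.indexOf('-');
            if (dash < 0) {
                // A single page number
                pages.add(Integer.parseInt(token));
                continue;
            }
            // A range "from-to", both ends inclusive
            int from = Integer.parseInt(token.substring(0, dash).trim());
            int to = Integer.parseInt(token.substring(dash + 1).trim());
            if (from > to) {
                throw new IllegalArgumentException("Descending range: " + token);
            }
            for (int page = from; page <= to; page++) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

With this helper, a call such as PdfUtils.extract(sourceFile, targetFile, PageSpec.parse("1,3,10-60")) would cover pages 1, 3, and 10 through 60 in one pass. Malformed tokens surface as a NumberFormatException from Integer.parseInt.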
3. Test verification
Take a PDF file with 2 pages in total and use each of the two methods above to extract the first page. The code is as follows:
package com.magic.itextpdf;

import java.util.Collections;

public class Test {

    public static void main(String[] args) {
        PdfUtils.extract("D:\\Test\\test.pdf", "D:\\Test\\test_out_1.pdf", Collections.singletonList(1));
        PdfUtils.extract("D:\\Test\\test.pdf", "D:\\Test\\test_out_2.pdf", 1, 1);
    }
}
After running, two new files, test_out_1.pdf and test_out_2.pdf, are generated, and each contains the first page of the source file.
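Because extract() only prints a stack trace when something goes wrong, a quick programmatic sanity check on the generated files can be useful in a test. The sketch below relies only on the fact that a PDF file starts with the bytes "%PDF-"; it does not verify the page count or the content. The names PdfSanityCheck and looksLikePdf are hypothetical and are not part of itextpdf.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of itextpdf. */
final class PdfSanityCheck {

    // Every PDF file begins with this header marker
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSanityCheck() {
    }

    /** Returns true if the file exists and starts with the PDF header marker. */
    static boolean looksLikePdf(Path file) throws IOException {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            // Keep reading until the header buffer is full or the file ends
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == MAGIC.length && Arrays.equals(head, MAGIC);
        }
    }
}
```

In the test above, it could be called as PdfSanityCheck.looksLikePdf(Paths.get("D:\\Test\\test_out_1.pdf")) after each extract() call.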
4. Other methods
If you only need to process a single PDF file, the print function of WPS or of the Chrome browser is a convenient alternative: print only the required pages and save the result as a new PDF.