I recently got a task: recover the contents of some binary files.
/home/work/files # ll
total 312
-rw-------@ 1 honvid staff 30K 7 24 14:52 15158
-rw------- 1 honvid staff 46K 7 24 14:53 62770
-rw-------@ 1 honvid staff 73K 7 24 11:26 8686584
vi
Opening one in vi shows nothing but a pile of garbled characters:
^_<8b>^H^@^@^@^@^@^D^@í½^G`^\I<96>%&/mÊ{^?JõJ×àt¡^H<80>`^S$
……
……
……
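vi shows control bytes in caret notation: ^_ is 0x1F, ^H is 0x08, ^D is 0x04 and ^@ is 0x00, while <8b> is the raw byte 0x8B. Decoded by hand, the line above starts with the bytes 1f 8b 08 00 00 00 00 00 04 00. A hex dump of the first few bytes is easier to read than vi's rendering. Below is a minimal PHP sketch; hexHeader() is a hypothetical helper name used only for illustration.

```php
<?php
// Return the first $n bytes of a file as a lowercase hex string.
// hexHeader() is a hypothetical helper, not part of any library.
function hexHeader(string $filename, int $n = 16): string
{
    $fp = fopen($filename, 'rb');
    if ($fp === false) {
        throw new RuntimeException("cannot open $filename");
    }
    $bytes = fread($fp, $n); // may return fewer bytes for a short file
    fclose($fp);
    return bin2hex($bytes === false ? '' : $bytes);
}

// e.g. echo hexHeader('/home/work/files/15158'), PHP_EOL;
```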
unzip
/home/work/files # unzip 15158
Archive: 15158
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of 15158 or
15158.zip, and cannot find 15158.ZIP, period.
gzip
/home/work/files # gzip -d 15158
gzip: 15158: unknown suffix -- ignored
tar
/home/work/files # tar -xzvf 15158
tar: Unrecognized archive format
tar: Error exit delayed from previous errors.
lzma
/home/work/files # lzma -d 15158
lzma: 15158: File format not recognized
xz
/home/work/files # xz -d 15158
xz: 15158: File format not recognized
jar
An article I found suggested extracting it with the jar command.
/home/work/files # jar xvf 15158
java.util.zip.ZipException: zip END header not found
at java.base/java.util.zip.ZipFile$Source.zerror(ZipFile.java:1470)
at java.base/java.util.zip.ZipFile$Source.findEND(ZipFile.java:1371)
at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1378)
at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1209)
at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1172)
at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:719)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:239)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:169)
at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:140)
at jdk.jartool/sun.tools.jar.Main.extract(Main.java:1389)
at jdk.jartool/sun.tools.jar.Main.run(Main.java:410)
at jdk.jartool/sun.tools.jar.Main.main(Main.java:1681)
7za
Next I figured I might as well try a big, do-everything archive tool, and found P7ZIP.
I tested this on Alpine.
Install it as follows:
/home/work/files # apk add p7zip
(1/1) Installing p7zip (16.02-r3)
Executing busybox-1.29.3-r10.trigger
OK: 84 MiB in 64 packages
# the package is not small
/home/work/files # 7za
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)
Usage: 7za <command> [<switches>...] <archive_name> [<file_names>...]
[<@listfiles...>]
<Commands>
a : Add files to archive
b : Benchmark
d : Delete files from archive
e : Extract files from archive (without using directory names)
h : Calculate hash values for files
i : Show information about supported formats
l : List contents of archive
rn : Rename files in archive
t : Test integrity of archive
u : Update files to archive
x : eXtract files with full paths
<Switches>
-- : Stop switches parsing
-ai[r[-|0]]{@listfile|!wildcard} : Include archives
-ax[r[-|0]]{@listfile|!wildcard} : eXclude archives
-ao{a|s|t|u} : set Overwrite mode
-an : disable archive_name field
-bb[0-3] : set output log level
-bd : disable progress indicator
-bs{o|e|p}{0|1|2} : set output stream for output/error/progress line
-bt : show execution time statistics
-i[r[-|0]]{@listfile|!wildcard} : Include filenames
-m{Parameters} : set compression Method
-mmt[N] : set number of CPU threads
-o{Directory} : set Output directory
-p{Password} : set Password
-r[-|0] : Recurse subdirectories
-sa{a|e|s} : set Archive name mode
-scc{UTF-8|WIN|DOS} : set charset for for console input/output
-scs{UTF-8|UTF-16LE|UTF-16BE|WIN|DOS|{id}} : set charset for list files
-scrc[CRC32|CRC64|SHA1|SHA256|*] : set hash function for x, e, h commands
-sdel : delete files after compression
-seml[.] : send archive by email
-sfx[{name}] : Create SFX archive
-si[{name}] : read data from stdin
-slp : set Large Pages mode
-slt : show technical information for l (List) command
-snh : store hard links as links
-snl : store symbolic links as links
-sni : store NT security information
-sns[-] : store NTFS alternate streams
-so : write data to stdout
-spd : disable wildcard matching for file names
-spe : eliminate duplication of root folder for extract command
-spf : use fully qualified file paths
-ssc[-] : set sensitive case mode
-ssw : compress shared files
-stl : set archive timestamp from the most recently modified file
-stm{HexMask} : set CPU thread affinity mask (hexadecimal number)
-stx{Type} : exclude archive type
-t{Type} : Set type of archive
-u[-][p#][q#][r#][x#][y#][z#][!newArchiveName] : Update options
-v{Size}[b|k|m|g] : Create volumes
-w[{path}] : assign Work directory. Empty path means a temporary directory
-x[r[-|0]]{@listfile|!wildcard} : eXclude filenames
-y : assume Yes on all queries
Now for the important part.
/home/work/files # 7za x 15158
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C.UTF-8,Utf16=on,HugeFiles=on,64 bits,4 CPUs Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz (806E9),ASM,AES-NI)
Scanning the drive for archives:
1 file, 74830 bytes (74 KiB)
Extracting archive: 15158
WARNING:
15158
Can not open the file as [zip] archive
The file is open as [gzip] archive
--
Path = 15158
Open WARNING: Can not open the file as [zip] archive
Type = gzip
Headers Size = 10
Everything is Ok
Archives with Warnings: 1
Size: 149629
Compressed: 74830
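At first I thought the extraction had failed, because the first thing that caught my eye was a WARNING.
Looking more closely, though, the warning only says the file cannot be opened as a zip archive; the output then says Type = gzip and Everything is Ok. It is a gzip file.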
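7za also reports Headers Size = 10. RFC 1952 defines a fixed 10-byte gzip member header: the magic bytes ID1 = 0x1F and ID2 = 0x8B, the compression method CM (8 = deflate), a flag byte FLG, a 4-byte little-endian MTIME, then XFL and OS. Optional fields (extra field, original file name, comment) follow only when FLG bits are set. In the vi line, the byte after 1f 8b 08 is 00, so FLG = 0 and no optional fields follow, which is consistent with a 10-byte header. Below is a minimal PHP sketch that reads those fields; parseGzipHeader() is a hypothetical helper name.

```php
<?php
// Parse the fixed 10-byte gzip member header described in RFC 1952.
// parseGzipHeader() is a hypothetical helper, not part of any library.
function parseGzipHeader(string $data): ?array
{
    if (strlen($data) < 10) {
        return null; // too short to hold a gzip header
    }
    // C = unsigned byte, V = unsigned 32-bit little-endian integer
    $h = unpack('Cid1/Cid2/Ccm/Cflg/Vmtime/Cxfl/Cos', $data);
    if ($h['id1'] !== 0x1F || $h['id2'] !== 0x8B) {
        return null; // magic number does not match: not gzip
    }
    return $h; // keys: id1, id2, cm, flg, mtime, xfl, os
}

// e.g. var_dump(parseGzipHeader(file_get_contents('/home/work/files/15158')));
```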
So why did the gzip command fail earlier?
Verification
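- After adding a .gz suffix to the file name, gzip -d decompressed it successfully.
So the problem is the file name, not the content: gzip -d writes its output to the input name with the suffix removed, so it refuses an input whose name has no recognized suffix such as .gz (hence "unknown suffix -- ignored").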
- Reading the file with PHP also worked:
$filename = '/home/work/files/15158';
$file = file_get_contents($filename); // raw gzip bytes
echo gzdecode($file); // gzdecode() expects the compressed data, not the file name
This prints the decompressed content.
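file_get_contents plus gzdecode holds both the compressed and the decompressed data in memory. For larger files, PHP's zlib extension can also stream: gzopen/gzread read a gzip file chunk by chunk whatever its name, and the output path is given explicitly, so no suffix is needed. Below is a minimal sketch; gunzipTo() is a hypothetical helper name and the 8 KiB chunk size is an arbitrary choice. One caveat: gzopen also reads non-gzip files transparently as plain data, so this alone does not prove the input was compressed.

```php
<?php
// Stream-decompress a gzip file to an explicit output path.
// gunzipTo() is a hypothetical helper; the chunk size is arbitrary.
function gunzipTo(string $src, string $dst, int $chunk = 8192): int
{
    $in = gzopen($src, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $src");
    }
    $out = fopen($dst, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new RuntimeException("cannot open $dst");
    }
    $total = 0;
    while (!gzeof($in)) {
        $buf = gzread($in, $chunk);
        if ($buf === false || $buf === '') {
            break; // read error or nothing left
        }
        $written = fwrite($out, $buf);
        if ($written === false) {
            break; // write error
        }
        $total += $written;
    }
    gzclose($in);
    fclose($out);
    return $total; // number of decompressed bytes written
}

// e.g. gunzipTo('/home/work/files/15158', '/home/work/files/15158.out');
```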
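Going further
Later I came across a utility class for detecting file types. Its approach is to examine the file header.
With the built-in fileinfo functions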
$filename = '/home/work/files/15158';
// Open the magic database (returns a finfo object in PHP 8.1+, a resource in older versions)
$handle = finfo_open(FILEINFO_MIME_TYPE);
// Return information about a file
$fileInfo = finfo_file($handle, $filename);
finfo_close($handle);
var_dump($fileInfo);
## Output
string(18) "application/x-gzip"
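fileinfo can also classify bytes that are already in memory, through the object interface finfo::buffer(), which is handy after file_get_contents. Below is a minimal sketch; mimeOfBuffer() is a hypothetical helper name. The exact MIME string depends on the magic database in use: the output above shows application/x-gzip, and other versions may report application/gzip.

```php
<?php
// Detect the MIME type of an in-memory buffer with the fileinfo extension.
// mimeOfBuffer() is a hypothetical helper, not part of any library.
function mimeOfBuffer(string $data): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->buffer($data);
    return $mime === false ? 'unknown' : $mime;
}

// e.g. echo mimeOfBuffer(file_get_contents('/home/work/files/15158')), PHP_EOL;
```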
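With header signatures
This approach requires knowing the header signature (magic number) of each file type, which you can look up in a file signature database.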
function detectFileType(string $filename): string
{
    $file = @fopen($filename, "rb");
    if (!$file) throw new \Exception("cannot open file: $filename");
    $bin = fread($file, 15); // read only the first 15 bytes; each file type has its own header signature
    fclose($file);
    // header signature (hex) => file type
    $types = [
        ["FFD8FF", "jpg"], // both JFIF (FFD8FFE0) and Exif (FFD8FFE1) JPEGs start with FFD8FF
        ["89504E47", "png"],
        ["255044462D312E", "pdf"],
        ["504B0304", "zip"],
        ["52617221", "rar"],
        ["1F8B08", "gzip"]
    ];
    foreach ($types as $type) {
        $blen = strlen(pack("H*", $type[0])); // number of bytes in the signature
        $tbin = substr($bin, 0, $blen); // compare only that many leading bytes
        if (strtolower($type[0]) === bin2hex($tbin)) {
            return $type[1];
        }
    }
    return "unknown";
}

echo detectFileType('/home/work/files/15158');
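Tying this back to the original problem, a script can check the two-byte gzip magic number 1F 8B itself and call gzdecode only when it matches, instead of relying on the file name as gzip -d does. Below is a minimal sketch; readMaybeGzip() is a hypothetical helper name.

```php
<?php
// Return a file's contents, gunzipping them first if they start with
// the gzip magic number 1F 8B. readMaybeGzip() is a hypothetical helper.
function readMaybeGzip(string $filename): string
{
    $data = file_get_contents($filename);
    if ($data === false) {
        throw new RuntimeException("cannot read $filename");
    }
    if (strncmp($data, "\x1f\x8b", 2) !== 0) {
        return $data; // not gzip: return the bytes unchanged
    }
    $plain = gzdecode($data);
    if ($plain === false) {
        throw new RuntimeException("$filename looks like gzip but could not be decoded");
    }
    return $plain;
}

// e.g. echo readMaybeGzip('/home/work/files/15158');
```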