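Foreword
Hi everyone, I'm back. Let me ask you a question: have you always used a dedicated reader app to read epub files?
Admittedly, a reader app lets you take notes, track your reading time, and do all sorts of other things. But to some extent it can also box you in. It narrows what you think is possible, hardens into a fixed mindset, and breeds dependence.
Of course you can't give up reader apps altogether, but the less a reader constrains you, the more open your thinking can be. It's like being handed a hammer 🔨: if the hammer has solved most of your problems in the past, the habits and mindset it has trained into you may cloud your view of what a problem really is, and make you resist the fact that there is more than one way to solve it.
To keep your thinking open, one basic requirement is to regularly think about what lies outside the boundary.
To act without understanding, to practice out of habit without examining it, to follow a path all one's life without knowing where it leads: such is the way of the multitude.
-- Mencius, Jin Xin, Part I
So today, come along and loosen up some fixed ideas with me: if a device has no dedicated reader app, what else can we use to read an epub 😂?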
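A new way to read: in the browser
Almost any electronic device that can get online ships with a browser. In my view, the browser is just about the most powerful reading tool available today. Using a browser will shape how you think too, but it leaves you at a relatively high level.
Here are some of the advantages the browser has as a tool for reading epub:
1. Built-in translation
Modern browsers usually come with a translation feature that works between many languages. If you are reading English material, you can have the browser translate the whole page, or just look up individual words when you need to.
2. It has plenty of extensions, and you can write your own
Chrome, Firefox, Safari, Edge, Opera, Vivaldi, Brave and others all put real effort into supporting extensions, although there are still a few exceptions that have to be treated separately 😄.
3. When something goes wrong, the developer tools can tell you what happened
Almost every modern desktop browser includes developer tools. When the page you are viewing misbehaves, you can use them to find out why. And when something on the page catches your interest, they can give you the details as well.
4. Powerful rendering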
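An epub is essentially a bundle of web pages. chm used to work the same way, but epub is far more capable. Reading an epub in a browser is therefore a perfectly reasonable option in its own right. And if you agree with the following views of mine:
- an epub can be understood as a website running locally: it is a package of static resources and does not need to download anything else from the network;
- in principle, an epub can do whatever a static web page can do, just as static blogs built with hexo, Jekyll, Hugo or Gatsby can have polished looks and slick interactions;
- EPUB 3 is maintained by the W3C, the same body that published the HTML5 and CSS3 specifications (ES6, by contrast, is standardized by Ecma International), so EPUB 3 is likewise a first-class standard of the web platform
then you will find that reading epub in a browser is close to being the first choice 😂.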
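To make the "bundle of web pages" view a little more concrete, here is a minimal Python sketch. It builds a tiny epub-like zip archive in memory and then inspects it with nothing but the standard `zipfile` module. The file names (`OEBPS/content.opf`, `OEBPS/ch1.xhtml`, `OEBPS/style.css`) and the helper names are made up for illustration, and the archive is deliberately not a complete, valid epub.

```python
from io import BytesIO
from zipfile import ZipFile


def build_toy_epub():
    """Return the bytes of a tiny epub-like zip (illustrative, not a valid epub)."""
    buf = BytesIO()
    with ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/ch1.xhtml", "<html><body><p>Hello</p></body></html>")
        zf.writestr("OEBPS/style.css", "p { color: black; }")
    return buf.getvalue()


def list_web_resources(epub_bytes):
    """List the entries that a browser could render or load directly."""
    with ZipFile(BytesIO(epub_bytes)) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.endswith((".html", ".xhtml", ".css", ".js"))
        )


if __name__ == "__main__":
    print(list_web_resources(build_toy_epub()))
```

Everything listed is an ordinary web resource; the remaining entries are metadata that tell a reader where the pages are and in what order to show them.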
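My practice: an epub reading server
I've rambled on 👆 for quite a while; if I didn't bring you something concrete, I'd be all theory and no practice. So today I'm bringing you an original Python tool: serve_epub.py
version 0.0.1.
The tool takes any epub file and serves its contents as the root of a website (more precisely, the directory that holds its OPF package file), so that you can read the epub in a browser. Press the left and right arrow keys to turn pages. Many more features are planned, possibly even a bookshelf, so look forward to version 0.0.2.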
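The page-turning idea can be sketched on its own: look up the current page in the reading order (the spine), pick its neighbors with wrap-around, and inject a small script into the page before it is sent. The sketch below is a simplified illustration under stated assumptions, not the tool's actual code: the helper names `neighbors`, `build_nav_script` and `inject` are hypothetical, it works on `str` rather than `bytes`, and it uses `e.key` where the script uses `e.keyCode`.

```python
from urllib.parse import quote


def neighbors(spine, current):
    """Return (previous, next) pages for `current`, wrapping at both ends."""
    i = spine.index(current)
    n = len(spine)
    return spine[(i - 1) % n], spine[(i + 1) % n]


def build_nav_script(prev_path, next_path):
    """Build a <script> that maps the left/right arrow keys to page URLs."""
    return (
        "<script>"
        "document.addEventListener('keydown', function (e) {"
        f" if (e.key === 'ArrowLeft') window.location.href = '/{quote(prev_path)}';"
        f" else if (e.key === 'ArrowRight') window.location.href = '/{quote(next_path)}';"
        " });"
        "</script>"
    )


def inject(html, script):
    """Insert `script` before the last </body>, or append it if there is none."""
    i = html.rfind("</body>")
    return html + script if i == -1 else html[:i] + script + html[i:]


if __name__ == "__main__":
    spine = ["cover.xhtml", "ch 1.xhtml", "ch2.xhtml"]
    prev_path, next_path = neighbors(spine, "cover.xhtml")
    page = "<html><body><p>Cover</p></body></html>"
    print(inject(page, build_nav_script(prev_path, next_path)))
```

Because of the modulo arithmetic, pressing left on the first page jumps to the last one, and pressing right on the last page returns to the first.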
$ python serve_epub.py -h
usage: serve_epub.py [-h] [-H HOST] [-p PORT] [-o] path

📖 EPub reader server 📚
🖍️ TIPS: Press the left ⬅️ and right ➡️ keys to turn pages

positional arguments:
  path                  epub book path

options:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  hostname, default to '0.0.0.0'
  -p PORT, -P PORT, --port PORT
                        port, default to 8080
  -o, --open-browser    open browser to read book
https://www.bilibili.com/video/BV1AC4y1E7N8/?aid=748309408&ci...
Appendix: source code
#!/usr/bin/env python3
# coding: utf-8
__author__ = "ChenyangGao <https://chenyanggao.github.io/>"
__version__ = (0, 0, 1)
if __name__ != "__main__":
    print("must run as a main module")
    raise SystemExit(1)
from argparse import ArgumentParser, RawDescriptionHelpFormatter
parser = ArgumentParser(description="""\
📖 \x1b[38;5;4m\x1b[1mEPub reader server\x1b[0m 📚
🖍️ \x1b[38;5;1m\x1b[1mTIPS\x1b[0m: Press the left ⬅️ and right ➡️ keys to turn pages
""", formatter_class=RawDescriptionHelpFormatter)
parser.add_argument("path", help="epub book path")
parser.add_argument("-H", "--host", default="0.0.0.0", help="hostname, default to '0.0.0.0'")
parser.add_argument("-p", "-P", "--port", type=int, default=8080, help="port, default to 8080")
parser.add_argument("-o", "--open-browser", action="store_true", help="open browser to read book")
args = parser.parse_args()
import posixpath

from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from io import BytesIO
from re import compile as re_compile
from urllib.parse import quote, unquote, urlsplit
from xml.etree.ElementTree import fromstring
from zipfile import ZipFile

# Matches the value of encoding="..." or encoding='...' in an XML declaration.
CREB_XML_ENC = re_compile(br"(?<=\bencoding=\")[^\"]+|(?<=\bencoding=')[^']+")
# Match self-closing <item .../> and <itemref .../> elements in the OPF file.
CRE_OPF_ITEM = re_compile(r"<item\s[^>]+?/>")
CRE_OPF_ITEMREF = re_compile(r"<itemref\s[^>]+?/>")


def get_xml_encoding(content, /, default="utf-8"):
    "Read the encoding from the leading XML declaration, or return `default`."
    if isinstance(content, str):
        content = bytes(content, "utf-8")
    encoding = default
    # Scan the argument itself (the original read the global `content_opf` here).
    for xml_dec in BytesIO(content):
        xml_dec = xml_dec.strip()
        if not xml_dec:
            continue
        if not xml_dec.startswith(b"<?"):
            break
        match = CREB_XML_ENC.search(xml_dec)
        if match is None:
            break
        encoding = match[0].decode("ascii")
    return encoding


def get_opf_path(container_xml):
    "Find the path of the OPF package file declared in META-INF/container.xml."
    etree = fromstring(container_xml)
    for el in etree.iter():
        if (
            (el.tag == 'rootfile' or el.tag.endswith('}rootfile'))
            and el.attrib.get('media-type') == 'application/oebps-package+xml'
        ):
            return unquote(el.attrib['full-path'])
    raise FileNotFoundError('OPF file path not found.')


def opf_item_iter(content_opf):
    "Yield every manifest <item> of the OPF file as an Element."
    if isinstance(content_opf, bytes):
        encoding = get_xml_encoding(content_opf)
        content_opf = content_opf.decode(encoding)
    for m in CRE_OPF_ITEM.finditer(content_opf):
        yield fromstring(m[0])


def opf_itemref_iter(content_opf):
    "Yield every spine <itemref> of the OPF file as an Element."
    if isinstance(content_opf, bytes):
        encoding = get_xml_encoding(content_opf)
        content_opf = content_opf.decode(encoding)
    for m in CRE_OPF_ITEMREF.finditer(content_opf):
        yield fromstring(m[0])


class EpubHandler(BaseHTTPRequestHandler):

    def do_HEAD(self):
        path = urlsplit(unquote(self.path)).path.lstrip("/")
        if path == "":
            path = index_file
        elif path.endswith("/"):
            path = path.rstrip("/") + "/index.html"
        if path not in href_2_attr:
            self.send_response(404, "Not Found")
            # Without end_headers() the buffered status line is never sent.
            self.end_headers()
            return
        fullpath = posixpath.join(opf_root, path)
        filesize = zfile.NameToInfo[fullpath].file_size
        self.send_response(200)
        self.send_header("Content-Length", str(filesize))
        self.send_header("Content-Type", href_2_attr[path].get("media-type", "application/octet-stream"))
        self.send_header("Accept-Ranges", "bytes")
        self.end_headers()

    def do_GET(self):
        path = urlsplit(unquote(self.path)).path.lstrip("/")
        if path == "":
            path = index_file
        elif path.endswith("/"):
            path = path.rstrip("/") + "/index.html"
        if path not in href_2_attr:
            self.send_response(404, "Not Found")
            # Without end_headers() the buffered status line is never sent.
            self.end_headers()
            return
        fullpath = posixpath.join(opf_root, path)
        filesize = zfile.NameToInfo[fullpath].file_size
        prev_path = next_path = None
        if "Range" in self.headers:
            if filesize == 0:
                self.send_response(206)
                self.send_header("Content-Range", "bytes 0-0/0")
                start = size = 0
            else:
                try:
                    rng = self.get_range(filesize)
                except Exception:
                    rng = None
                if rng is None:
                    self.send_response(416, "Range Not Satisfiable")
                    self.send_header("Content-Range", f"bytes */{filesize}")
                    self.end_headers()
                    return
                start, size = rng
                self.send_response(206)
                self.send_header("Content-Range", f"bytes {start}-{start+size-1}/{filesize}")
        else:
            self.send_response(200)
            start, size = 0, filesize
            # Only full (non-range) responses get the page-turning script,
            # so Content-Length and Content-Range stay consistent.
            count_spines = len(spine_files)
            if count_spines > 1:
                if path.endswith((".html", ".xhtml")):
                    try:
                        index = spine_files.index(path)
                    except ValueError:
                        pass
                    else:
                        # Wrap around at both ends of the spine.
                        prev_path = spine_files[(index-1)%count_spines]
                        next_path = spine_files[(index+1)%count_spines]
        if prev_path is not None:
            inject_code = b'''
<script>
document.addEventListener('keydown', function(e) {
    if (e.keyCode == 37) { // press left key
        window.location.href = "/%s";
    } else if (e.keyCode == 39) { // press right key
        window.location.href = "/%s";
    }
});
</script>''' % (bytes(quote(prev_path), "utf-8"), bytes(quote(next_path), "utf-8"))
            # ZipFile.read() opens and closes the member for us.
            content = zfile.read(fullpath)
            # Insert the script just before the last </body>, or append it.
            index = content.rfind(b"</body>")
            if index == -1:
                content += inject_code
            else:
                content = content[:index] + inject_code + content[index:]
            self.send_header("Content-Length", str(len(content)))
        else:
            self.send_header("Content-Length", str(size))
        self.send_header("Content-Type", href_2_attr[path].get("media-type", "application/octet-stream"))
        self.send_header("Accept-Ranges", "bytes")
        self.end_headers()
        if prev_path is not None:
            self.wfile.write(content)
            return
        if size > 0:
            # Stream the requested slice in 64 KiB chunks.
            chunk_size = 1 << 16
            write = self.wfile.write
            with zfile.open(fullpath) as f:
                read = f.read
                if start:
                    f.seek(start)
                while size > chunk_size:
                    write(read(chunk_size))
                    size -= chunk_size
                write(read(size))

    def get_range(self, file_size):
        # Returns (start, size), or None if the range cannot be satisfied.
        # NOTE: Content-Type "multipart/byteranges" is currently not supported
        # Reference:
        #   - https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests
        #   - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range
        #   - https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Range
        #   - https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/206
        #   - https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/416
        range_header = self.headers.get("Range")
        if not range_header:
            return 0, file_size
        unit, rng = range_header.strip().split("=", 1)
        if unit != "bytes":
            return None
        start, end = rng.strip().split("-")
        if not end:
            # "bytes=N-": from offset N to the end of the file
            start = int(start)
            if start >= file_size:
                return None
            return start, file_size - start
        if not start:
            # "bytes=-N": the last N bytes of the file
            size = int(end)
            if size < 0:
                return None
            elif size >= file_size:
                size = file_size
            return file_size - size, size
        # "bytes=N-M": inclusive on both ends
        start, end = int(start), int(end)
        if end < 0 or end < start or start >= file_size:
            return None
        if end >= file_size:
            size = file_size - start
        else:
            size = end - start + 1
        return start, size


path = args.path
host = args.host
port = args.port
open_browser = args.open_browser

# The archive must stay open while the server runs, so everything below
# lives inside this `with` block.
with ZipFile(path) as zfile:
    opf_path = get_opf_path(zfile.read("META-INF/container.xml"))
    opf_root = posixpath.dirname(opf_path)
    content_opf = zfile.read(opf_path)
    itemlist = list(item.attrib for item in opf_item_iter(content_opf))
    href_2_attr = {unquote(item["href"]): item for item in itemlist}
    id_2_href = {item["id"]: unquote(item["href"]) for item in itemlist}
    spine_files = [id_2_href[itemref.attrib["idref"]] for itemref in opf_itemref_iter(content_opf)]

    # The first linear spine item is the start page; otherwise fall back
    # to an index file or to any HTML file in the manifest.
    for itemref in opf_itemref_iter(content_opf):
        if itemref.attrib.get("linear") != "no":
            index_file = id_2_href[itemref.attrib["idref"]]
            break
    else:
        if posixpath.join(opf_root, "index.html") in zfile.NameToInfo:
            index_file = "index.html"
        elif posixpath.join(opf_root, "index.xhtml") in zfile.NameToInfo:
            index_file = "index.xhtml"
        elif any(((file:=href).endswith((".html", ".xhtml"))) for href in id_2_href.values()):
            index_file = file
        else:
            raise RuntimeError("no mainpage found")

    if open_browser:
        import webbrowser
        from time import sleep
        from threading import Thread

        def open_browser():
            url = f"http://localhost:{port}"
            # Give the server a moment to start listening.
            sleep(1)
            webbrowser.open(url)

        Thread(target=open_browser).start()

    with ThreadingHTTPServer((host, port), EpubHandler) as httpd:
        host, port = httpd.socket.getsockname()[:2]
        url_host = f'[{host}]' if ':' in host else host
        print(
            f"Serving HTTP on {host} port {port} "
            f"(http://{url_host}:{port}/) ..."
        )
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print("\nKeyboard interrupt received, exiting.")

# TODO: support for injecting code (append to body): css, js, html
# TODO: Before injecting all code, inject some environment variables: e.g. item-list, spine-list
# TODO: Enhance fault tolerance, only 404 will be reported when encountering non-existent files, without errors such as IndexError or KeyError
# TODO: Check the opf file, if there are any files that do not exist, ignore them
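
The byte-range handling in `get_range` can be tried out away from the server. The sketch below restates that logic as a standalone pure function so it can be checked with a few asserts. `parse_range` is a name chosen here for illustration and is not part of serve_epub.py. Like the original it supports only a single `bytes` range; unlike the original it returns None for malformed input instead of raising, and it treats a zero-length suffix (`bytes=-0`) as unsatisfiable.

```python
def parse_range(header, file_size):
    """Return (start, size) for a single-range `bytes=` header, or None."""
    unit, _, spec = header.strip().partition("=")
    if unit != "bytes" or "-" not in spec:
        return None
    first, _, last = spec.strip().partition("-")
    try:
        if not first:
            # Suffix form "bytes=-N": the last N bytes of the file.
            size = min(int(last), file_size)
            return (file_size - size, size) if size > 0 else None
        start = int(first)
        # Open-ended form "bytes=N-": read up to the last byte.
        end = int(last) if last else file_size - 1
    except ValueError:
        # Non-numeric bounds or a multi-range list such as "0-1,5-6".
        return None
    if start >= file_size or end < start:
        return None
    end = min(end, file_size - 1)  # clamp to the end of the file
    return start, end - start + 1


if __name__ == "__main__":
    assert parse_range("bytes=0-99", 1000) == (0, 100)
    assert parse_range("bytes=900-", 1000) == (900, 100)
    assert parse_range("bytes=-100", 1000) == (900, 100)
    assert parse_range("bytes=1000-", 1000) is None
    print("ok")
```

Returning `(start, size)` rather than `(start, end)` matches how the handler streams the response: it seeks to `start` and then writes `size` bytes.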