Kar Design Document

  • Published: 2025-04-25 on joshleeb's blog
  • Purpose of Kar: A testbed for developing Pinto's core idea and exploring technical approaches for critical components like search.
  • Pinto Overview: A system for curating one's own subgraph of the web; still in the early stages of working out the specifics.
  • Kar as a Command Line Tool: Helps assess Pinto's core idea and figure out details from product and technical perspectives.
  • Goals of the Kar Program:

    • Install, refresh, and remove documents using minimal information (only a URL).
    • Query documents with full-text search and filters.
  • Non-Goals: Exploring or implementing an RSS reader, providing a document reading list, or using document content for offline viewing.
  • Design Proposal:

    • API inspired by Pacman with top-level operations: -S (sync), -Q (query), -R (remove), -V (version), -h (help).
    • Data stored in statedir (default: ${XDG_STATE_HOME}/kar/).
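The Pacman-style interface and the default statedir can be sketched with Python's argparse. This is a minimal sketch, not Kar's implementation (the language Kar is written in is not stated here). The function names, treating the remaining arguments as one positional list, and the ~/.local/state fallback (the default from the XDG Base Directory specification when XDG_STATE_HOME is unset) are assumptions. Pacman also accepts combined flags such as -Ss, which this sketch does not handle.

```python
import argparse
import os
from pathlib import Path


def default_statedir(env=None):
    """Resolve the default statedir, ${XDG_STATE_HOME}/kar/.

    Assumption: fall back to ~/.local/state when XDG_STATE_HOME is unset
    or empty, per the XDG Base Directory specification.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(base) / "kar"


def build_parser():
    """Hypothetical parser: exactly one top-level operation is required."""
    parser = argparse.ArgumentParser(prog="kar", add_help=False)
    ops = parser.add_mutually_exclusive_group(required=True)
    # Each flag stores the operation's name into the shared `op` field.
    ops.add_argument("-S", dest="op", action="store_const", const="sync")
    ops.add_argument("-Q", dest="op", action="store_const", const="query")
    ops.add_argument("-R", dest="op", action="store_const", const="remove")
    ops.add_argument("-V", dest="op", action="store_const", const="version")
    ops.add_argument("-h", dest="op", action="store_const", const="help")
    # Assumption: URLs or query terms follow the operation flag.
    parser.add_argument("args", nargs="*")
    return parser
```

For example, `build_parser().parse_args(["-S", "https://example.com"])` yields `op="sync"` with the URL in `args`, and passing both -S and -Q is rejected because the flags share a mutually exclusive group.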
  • Sync Operation:

    • Install new documents by downloading content and updating database and index.
    • Refresh existing documents to update content and search index.
    • Validates URLs according to the URL Standard and applies normalizations.
    • Follows redirects based on specific conditions.
    • Identifies duplicates based on normalized URLs and uses ext_equiv_url for extended equivalency.
    • Can rebuild the metadata database from downloaded content.
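A sketch of how sync might validate and normalize URLs, then deduplicate them and split them into installs and refreshes. Assumptions: Python's urllib.parse stands in for a real URL Standard parser (it does not implement that spec); the normalizations shown are illustrative, not the exact set Kar applies; and the `installed` set stands in for the metadata database. Redirect handling and ext_equiv_url are left out because their exact rules are not described here.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(raw):
    """Hypothetical normalizer: an approximation, not the URL Standard."""
    parts = urlsplit(raw.strip())
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported or missing scheme: {raw!r}")
    host = (parts.hostname or "").lower()  # note: userinfo is dropped
    if not host:
        raise ValueError(f"missing host: {raw!r}")
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    port = parts.port  # raises ValueError for a non-numeric port
    netloc = host if port in (None, DEFAULT_PORTS[scheme]) else f"{host}:{port}"
    path = parts.path or "/"
    # Drop the fragment: it points within a document, not at a new one.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def plan_sync(urls, installed):
    """Split URLs into (installs, refreshes), deduplicated by normalized URL.

    `installed` is a hypothetical stand-in for the metadata database.
    """
    installs, refreshes, seen = [], [], set()
    for raw in urls:
        url = normalize_url(raw)
        if url in seen:
            continue  # duplicate of a URL already planned in this run
        seen.add(url)
        (refreshes if url in installed else installs).append(url)
    return installs, refreshes
```

Here `HTTPS://Example.COM:443#top` and `https://example.com/` normalize to the same string, so only one of them would be downloaded.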
  • Query Operation:

    • Centers around full-text search of document content with additional filters and projections.
    • Supports filters like --url, --domain, and --limit.
    • Projections include --only-url.
    • Query results are printed as static text.
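The query filters and the --only-url projection can be illustrated with a toy in-memory inverted index. This is a hypothetical sketch: the Index class, AND semantics for multi-term queries, exact-match semantics for --url and --domain, and ordering results by URL (no relevance ranking) are all assumptions. A real full-text engine would add stemming, ranking, and on-disk storage.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase word tokens; no stemming or stop words."""
    return [t.lower() for t in TOKEN.findall(text)]


class Index:
    """Toy inverted index: each term maps to the set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, url, title, content):
        self.titles[url] = title
        for term in tokenize(content):
            self.postings[term].add(url)

    def query(self, text, url=None, domain=None, limit=None, only_url=False):
        terms = tokenize(text)
        # Assumption: AND semantics, so a hit must contain every term.
        if terms:
            hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        else:
            hits = set(self.titles)
        # Filters, loosely mirroring --url and --domain (exact match assumed).
        if url is not None:
            hits = {h for h in hits if h == url}
        if domain is not None:
            hits = {h for h in hits if urlsplit(h).hostname == domain}
        results = sorted(hits)[:limit]  # --limit; None keeps everything
        # Projection: --only-url prints just the URL, else URL and title.
        return results if only_url else [f"{u}\t{self.titles[u]}" for u in results]
```

Filters run after the full-text match, so each one only narrows the candidate set; the projection is applied last, when formatting the static text output.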
  • Remove Operation: Removes documents from the graph by updating multiple locations in the state directory; only supports hard deletion.
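Because removal is a hard delete that touches several places in the state directory, a sketch helps show why the order of updates matters. The layout used here (metadata.json, content/<id>.html, index.json) and the remove_document function are hypothetical, not Kar's actual statedir format.

```python
import json
from pathlib import Path


def remove_document(statedir: Path, url: str) -> bool:
    """Hard-delete `url` from a hypothetical statedir layout.

    Returns False if the URL is not installed.
    """
    meta_path = statedir / "metadata.json"
    metadata = json.loads(meta_path.read_text())
    entry = metadata.pop(url, None)
    if entry is None:
        return False

    # 1. Delete the downloaded content (tolerate it already being gone).
    (statedir / "content" / f"{entry['id']}.html").unlink(missing_ok=True)

    # 2. Remove the URL from every posting list; drop lists left empty.
    index_path = statedir / "index.json"
    index = json.loads(index_path.read_text())
    index = {term: [u for u in urls if u != url] for term, urls in index.items()}
    index = {term: urls for term, urls in index.items() if urls}
    index_path.write_text(json.dumps(index))

    # 3. Persist the metadata change last: if a crash happens earlier, the
    #    record still exists and the (idempotent) removal can be retried.
    meta_path.write_text(json.dumps(metadata))
    return True
```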
  • Appendix - Future Work:

    • Extend support for more mime types like text/plain and application/pdf.
    • Better handle rate limiting per domain.
    • Improve search architecture for better query performance.
    • Extend support for document display options like adding filters and projections.
    • Explore a browser extension to install documents with the active page's URL.
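For the per-domain rate limiting item, one simple approach is to enforce a minimum interval between requests to the same host. The DomainThrottle class below is a hypothetical sketch of that approach, not a planned design. A token bucket, or honoring server hints such as a Retry-After header, are alternatives it does not cover.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Hypothetical limiter: a minimum delay between requests to one host.

    Requests to different hosts never delay each other.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # injectable so tests can use a fake clock
        self.sleep = sleep
        self.next_allowed = {}  # host -> earliest start of its next request

    def wait(self, url):
        """Block until a request to `url`'s host is allowed, then reserve it."""
        host = urlsplit(url).hostname or ""
        now = self.clock()
        ready_at = self.next_allowed.get(host, now)
        if ready_at > now:
            self.sleep(ready_at - now)
            now = ready_at
        self.next_allowed[host] = now + self.min_interval
        return host
```

During a sync over many URLs, `wait(url)` would be called before each download, so a batch spread across several sites only slows down for hosts that appear repeatedly.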