Introduction

Normally we install Python libraries with pip or another package manager, because the libraries are published on PyPI and can be installed directly.

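Sometimes, however, a Python package has not been published on PyPI. The usual way to install it is to clone the repository, enter the directory, and install from there. This article shares another way: installing it directly with pip.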

pip+git

First, the usage summary from the official documentation:

python -m pip install [options] <requirement specifier> [package-index-options] ...
python -m pip install [options] -r <requirements file> [package-index-options] ...
python -m pip install [options] [-e] <vcs project url> ...
python -m pip install [options] [-e] <local project path> ...
python -m pip install [options] <archive url/path> ...

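As you can see, pip supports several installation forms: from a local path, from a requirements file, and from various version control systems (VCS). Git is the VCS most of us use day to day, so this means pip can install a package directly from a Git URL.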

Example:

# Install requests
❯ pip install git+https://github.com/psf/requests.git
Collecting git+https://github.com/psf/requests.git
  Cloning https://github.com/psf/requests.git to /private/var/folders/77/9j7dsr_n0ls1b840fbmpzpvr0000gn/T/pip-req-build-c5zgg0bv
  Running command git clone --filter=blob:none --quiet https://github.com/psf/requests.git /private/var/folders/77/9j7dsr_n0ls1b840fbmpzpvr0000gn/T/pip-req-build-c5zgg0bv
  Resolved https://github.com/psf/requests.git to commit 23540c93cac97c763fe59e843a08fa2825aa80fd
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: charset_normalizer<4,>=2 in /Users/b/Library/Caches/pypoetry/virtualenvs/test-Olh_IP0M-py3.10/lib/python3.10/site-packages (from requests==2.32.3) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /Users/b/Library/Caches/pypoetry/virtualenvs/test-Olh_IP0M-py3.10/lib/python3.10/site-packages (from requests==2.32.3) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/b/Library/Caches/pypoetry/virtualenvs/test-Olh_IP0M-py3.10/lib/python3.10/site-packages (from requests==2.32.3) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /Users/b/Library/Caches/pypoetry/virtualenvs/test-Olh_IP0M-py3.10/lib/python3.10/site-packages (from requests==2.32.3) (2024.12.14)

The log shows that pip runs the git clone command automatically and then carries out the usual build and install steps on the checkout.
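The same install can also be triggered from a Python script. The following is a minimal sketch, assuming pip is available in the current interpreter. The helper name `build_pip_vcs_command` is made up for illustration, and the actual install call is left commented out so that nothing gets installed by accident.

```python
import subprocess
import sys


def build_pip_vcs_command(repo_url):
    """Build a `python -m pip install git+<url>` command (hypothetical helper)."""
    # sys.executable makes sure pip runs under the same interpreter as this script.
    return [sys.executable, "-m", "pip", "install", f"git+{repo_url}"]


cmd = build_pip_vcs_command("https://github.com/psf/requests.git")
print(" ".join(cmd))

# Uncomment to actually run the installation:
# subprocess.run(cmd, check=True)
```

Calling pip through `sys.executable -m pip` rather than importing it keeps the script tied to the interpreter (and virtual environment) it is running in.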

Besides a plain Git URL, pip also accepts a git ref, such as a branch name, a commit hash, or a tag name:

MyProject @ git+https://git.example.com/MyProject.git@master
MyProject @ git+https://git.example.com/MyProject.git@v1.0
MyProject @ git+https://git.example.com/MyProject.git@da39a3ee5e6b4b0d3255bfef95601890afd80709
MyProject @ git+https://git.example.com/MyProject.git@refs/pull/123/head
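To make the structure of these lines concrete, here is a small sketch that splits such a requirement into its parts. It is a simplified illustration using only the standard library, not pip's own parser. The function name `split_vcs_requirement` is hypothetical, and the rule "the ref is whatever follows the last @ in the URL path" is an assumption that holds for the examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def split_vcs_requirement(line):
    """Split 'Name @ vcs+url[@rev]' into (name, vcs, repo_url, rev)."""
    name, sep, url = line.partition(" @ ")
    if not sep:
        raise ValueError(f"not a direct reference: {line!r}")

    scheme, netloc, path, query, _fragment = urlsplit(url.strip())
    vcs, plus, transport = scheme.partition("+")
    if not plus:
        raise ValueError(f"scheme has no 'vcs+' prefix: {scheme!r}")

    # Only look for '@' in the path, so 'user@host' in the netloc is untouched.
    rev = None
    if "@" in path:
        path, rev = path.rsplit("@", 1)

    repo_url = urlunsplit((transport, netloc, path, query, ""))
    return name.strip(), vcs, repo_url, rev


for example in (
    "MyProject @ git+https://git.example.com/MyProject.git@master",
    "MyProject @ git+https://git.example.com/MyProject.git@refs/pull/123/head",
    "MyProject @ git+https://git.example.com/MyProject.git",
):
    print(split_vcs_requirement(example))
```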

pip+other VCS

Git is not the only option. pip supports other version control systems as well. Here are the official examples:

python -m pip install -e 'git+https://git.repo/some_pkg.git#egg=SomePackage'          # from git
python -m pip install -e 'hg+https://hg.repo/some_pkg.git#egg=SomePackage'            # from mercurial
python -m pip install -e 'svn+svn://svn.repo/some_pkg/trunk/#egg=SomePackage'         # from svn
python -m pip install -e 'git+https://git.repo/some_pkg.git@feature#egg=SomePackage'  # from 'feature' branch
python -m pip install -e 'git+https://git.repo/some_repo.git#egg=subdir&subdirectory=subdir_path' # install a python package from a repo subdirectory
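The part after `#` in these URLs is a query-style fragment carrying options such as `egg` and `subdirectory`. As a rough sketch (the helper name `parse_vcs_fragment` is hypothetical, and pip's own parsing may differ in detail), the fragment can be read with the standard library like this:

```python
from urllib.parse import parse_qs, urlsplit


def parse_vcs_fragment(url):
    """Return the key=value pairs found in the URL fragment as a dict."""
    fragment = urlsplit(url).fragment
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


print(parse_vcs_fragment("git+https://git.repo/some_pkg.git@feature#egg=SomePackage"))
print(parse_vcs_fragment(
    "git+https://git.repo/some_repo.git#egg=subdir&subdirectory=subdir_path"
))
```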

For more usage, see the official documentation: https://pip.pypa.io/en/stable/topics/vcs-support/

Source code analysis

class Bazaar(VersionControl):
    name = "bzr"
    dirname = ".bzr"
    repo_name = "branch"
    schemes = (
        "bzr+http",
        "bzr+https",
        "bzr+ssh",
        "bzr+sftp",
        "bzr+ftp",
        "bzr+lp",
        "bzr+file",
    )
class Git(VersionControl):
    name = "git"
    dirname = ".git"
    repo_name = "clone"
    schemes = (
        "git+http",
        "git+https",
        "git+ssh",
        "git+git",
        "git+file",
    )
class Mercurial(VersionControl):
    name = "hg"
    dirname = ".hg"
    repo_name = "clone"
    schemes = (
        "hg+file",
        "hg+http",
        "hg+https",
        "hg+ssh",
        "hg+static-http",
    )
class Subversion(VersionControl):
    name = "svn"
    dirname = ".svn"
    repo_name = "checkout"
    schemes = ("svn+ssh", "svn+http", "svn+https", "svn+svn", "svn+file")

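Judging from the source, pip supports four version control systems, each with several schemes. A scheme can be thought of as the URL prefix. The source files are: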

src/pip/_internal/vcs/bazaar.py
src/pip/_internal/vcs/git.py
src/pip/_internal/vcs/mercurial.py
src/pip/_internal/vcs/subversion.py
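The scheme tuples above are enough to tell whether a URL is one that pip would treat as a VCS URL. The sketch below copies those tuples into a plain dictionary and looks up the URL's scheme. `VCS_SCHEMES` and `vcs_name_for_url` are names invented for this illustration and do not exist in pip.

```python
from urllib.parse import urlsplit

# Scheme tuples copied from the four backend classes quoted above.
VCS_SCHEMES = {
    "bzr": ("bzr+http", "bzr+https", "bzr+ssh", "bzr+sftp",
            "bzr+ftp", "bzr+lp", "bzr+file"),
    "git": ("git+http", "git+https", "git+ssh", "git+git", "git+file"),
    "hg": ("hg+file", "hg+http", "hg+https", "hg+ssh", "hg+static-http"),
    "svn": ("svn+ssh", "svn+http", "svn+https", "svn+svn", "svn+file"),
}


def vcs_name_for_url(url):
    """Return 'git', 'hg', 'svn' or 'bzr' for a VCS URL, otherwise None."""
    scheme = urlsplit(url).scheme
    for name, schemes in VCS_SCHEMES.items():
        if scheme in schemes:
            return name
    return None


print(vcs_name_for_url("git+https://github.com/psf/requests.git"))
print(vcs_name_for_url("https://github.com/psf/requests.git"))
```

A plain `https://` URL has no `git+` prefix, so it is not recognized as a VCS URL. This is why the prefix is required on the command line.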

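Each backend inherits from the VersionControl class and wraps the commands of its version control system to support switching branches, pinning a commit hash, checking out a tag, and so on. The last line of each file registers the backend with vcs: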

vcs.register(Git)

The registration scheme is neat because it makes extension easy: to support a new version control system, subclass VersionControl and implement the required methods.

Here is the registration code:

class VcsSupport:
    _registry: Dict[str, "VersionControl"] = {}
    schemes = ["ssh", "git", "hg", "bzr", "sftp", "svn"]

    # other code omitted...

    def register(self, cls: Type["VersionControl"]) -> None:
        if not hasattr(cls, "name"):
            logger.warning("Cannot register VCS %s", cls.__name__)
            return
        if cls.name not in self._registry:
            self._registry[cls.name] = cls()
            logger.debug("Registered VCS backend: %s", cls.name)
 
    # other code omitted...

    def get_backend_for_scheme(self, scheme: str) -> Optional["VersionControl"]:
        """
        Return a VersionControl object or None.
        """
        for vcs_backend in self._registry.values():
            if scheme in vcs_backend.schemes:
                return vcs_backend
        return None

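register first checks whether the class has a name attribute. If it does not, a warning is logged and the class is not registered. Otherwise, unless that name is already registered, the class is instantiated and the instance is stored in the vcs _registry dictionary, keyed by name.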
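The register-then-look-up pattern can be tried in isolation. The sketch below is a stripped-down stand-in for the classes quoted above, not pip's actual code: logging and type hints are left out, and the `Fossil` backend is purely hypothetical (pip does not support Fossil). It only shows how a new backend would plug into the registry.

```python
class VersionControl:
    """Stand-in base class holding only the attribute used in this sketch."""
    schemes = ()


class VcsSupport:
    def __init__(self):
        self._registry = {}

    def register(self, cls):
        # Classes without a name attribute are skipped.
        if not hasattr(cls, "name"):
            return
        # Each backend is instantiated once and stored under its name.
        if cls.name not in self._registry:
            self._registry[cls.name] = cls()

    def get_backend_for_scheme(self, scheme):
        for backend in self._registry.values():
            if scheme in backend.schemes:
                return backend
        return None


class Fossil(VersionControl):
    """Hypothetical backend, for illustration only."""
    name = "fossil"
    schemes = ("fossil+https", "fossil+file")


class Nameless(VersionControl):
    """Has no name attribute, so register() ignores it."""
    schemes = ("nameless+https",)


vcs = VcsSupport()
vcs.register(Fossil)
vcs.register(Nameless)

print(vcs.get_backend_for_scheme("fossil+https"))
print(vcs.get_backend_for_scheme("nameless+https"))
```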

def direct_url_from_link(
    link: Link, source_dir: Optional[str] = None, link_is_in_wheel_cache: bool = False
) -> DirectUrl:
    if link.is_vcs:
        vcs_backend = vcs.get_backend_for_scheme(link.scheme)
        assert vcs_backend
        url, requested_revision, _ = vcs_backend.get_url_rev_and_auth(
            link.url_without_fragment
        )
        # For VCS links, we need to find out and add commit_id.
        if link_is_in_wheel_cache:
            # If the requested VCS link corresponds to a cached
            # wheel, it means the requested revision was an
            # immutable commit hash, otherwise it would not have
            # been cached. In that case we don't have a source_dir
            # with the VCS checkout.
            assert requested_revision
            commit_id = requested_revision
        else:
            # If the wheel was not in cache, it means we have
            # had to checkout from VCS to build and we have a source_dir
            # which we can inspect to find out the commit id.
            assert source_dir
            commit_id = vcs_backend.get_revision(source_dir)
        return DirectUrl(
            url=url,
            info=VcsInfo(
                vcs=vcs_backend.name,
                commit_id=commit_id,
                requested_revision=requested_revision,
            ),
            subdirectory=link.subdirectory_fragment,
        )
    elif link.is_existing_dir():
        return DirectUrl(
            url=link.url_without_fragment,
            info=DirInfo(),
            subdirectory=link.subdirectory_fragment,
        )
    else:
        hash = None
        hash_name = link.hash_name
        if hash_name:
            hash = f"{hash_name}={link.hash}"
        return DirectUrl(
            url=link.url_without_fragment,
            info=ArchiveInfo(hash=hash),
            subdirectory=link.subdirectory_fragment,
        )

# The Git backend's override of get_url_rev_and_auth (src/pip/_internal/vcs/git.py):
    @classmethod
    def get_url_rev_and_auth(cls, url: str) -> Tuple[str, Optional[str], AuthInfo]:
        """
        Prefixes stub URLs like 'user@hostname:user/repo.git' with 'ssh://'.
        That's required because although they use SSH they sometimes don't
        work with a ssh:// scheme (e.g. GitHub). But we need a scheme for
        parsing. Hence we remove it again afterwards and return it as a stub.
        """
        # Works around an apparent Git bug
        # (see https://article.gmane.org/gmane.comp.version-control.git/146500)
        scheme, netloc, path, query, fragment = urlsplit(url)
        if scheme.endswith("file"):
            initial_slashes = path[: -len(path.lstrip("/"))]
            newpath = initial_slashes + urllib.request.url2pathname(path).replace(
                "\\", "/"
            ).lstrip("/")
            after_plus = scheme.find("+") + 1
            url = scheme[:after_plus] + urlunsplit(
                (scheme[after_plus:], netloc, newpath, query, fragment),
            )

        if "://" not in url:
            assert "file:" not in url
            url = url.replace("git+", "git+ssh://")
            url, rev, user_pass = super().get_url_rev_and_auth(url)
            url = url.replace("ssh://", "")
        else:
            url, rev, user_pass = super().get_url_rev_and_auth(url)

        return url, rev, user_pass

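At download time, pip uses the scheme to look up the matching backend instance, calls its methods to fetch the repository, and checks out the requested revision. Once the source tree is ready, the normal installation flow continues. That flow is long, so it is not analyzed step by step here.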
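The DirectUrl object built by direct_url_from_link ends up on disk: pip records it as a direct_url.json file in the installed package's .dist-info directory, in the format standardized by PEP 610. The sketch below reads that file with the standard library and summarizes the VCS part. Both helper names are made up for this illustration, and the sample dictionary is hand-written from the install log earlier in this article rather than read from a real installation.

```python
import json
from importlib import metadata


def read_direct_url(dist_name):
    """Return the parsed direct_url.json of an installed distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("direct_url.json")
    return json.loads(text) if text else None


def describe_vcs_install(info):
    """Summarize the vcs_info section of a direct_url.json payload."""
    vcs_info = info.get("vcs_info")
    if vcs_info is None:
        return "not a VCS install"
    requested = vcs_info.get("requested_revision") or "default branch"
    return (
        f"{vcs_info['vcs']} {info['url']} @ {vcs_info['commit_id']} "
        f"(requested: {requested})"
    )


# Hand-written sample mirroring the requests install shown earlier.
sample = {
    "url": "https://github.com/psf/requests.git",
    "vcs_info": {
        "vcs": "git",
        "commit_id": "23540c93cac97c763fe59e843a08fa2825aa80fd",
    },
}
print(describe_vcs_install(sample))
```

After a real `pip install git+...`, calling `read_direct_url("requests")` should return a similar dictionary. For packages installed from PyPI the file is absent and the function returns None.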

Summary

That is how to use pip to install a third-party Python library directly from GitHub.


LLLibra146