What is Yarn duplicate?

Developers who use Yarn as their package manager may find that several versions of the same package end up in the bundle when the app is built, even though the requested version ranges are compatible with each other.

For example, suppose there are the following dependencies:

[Figure: monorepo-4, dependency graph of app, lib-a, and lib-b]

When npm or pnpm installs a module that is already present, it checks whether the installed version satisfies the version range requested by the new dependent. If it does, the installation is skipped. If it does not, the module is installed under the node_modules of the dependent. In this example, lib-a reuses the lib-b@1.1.0 that app depends on.

With Yarn v1 as the package manager, however, lib-a gets its own separate copy of lib-b@1.2.0.
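
The behaviour described above can be modelled in a few lines. This is a deliberately simplified sketch, not Yarn's actual resolver: it assumes the lockfile is a plain map from "name@range" to a resolved version, that app pins lib-b@1.1.0 while lib-a requests lib-b@^1.1.0 (the figure is not reproduced here, so these ranges are an assumption), and the `install` helper is hypothetical.

```javascript
// Simplified model of a Yarn v1 lockfile: "name@range" -> resolved version.
// Hypothetical helper: `resolved` is the version the registry would return
// for this pattern at install time.
const install = (lock, pattern, resolved) => {
  // A pattern that is already locked keeps its locked version.
  if (!(pattern in lock)) lock[pattern] = resolved;
  return lock[pattern];
};

const lock = {};
install(lock, "lib-b@1.1.0", "1.1.0"); // requested by app (assumed exact pin)
install(lock, "lib-b@^1.1.0", "1.2.0"); // requested by lib-a (assumed range)

// Distinct patterns get distinct entries, even though 1.1.0 satisfies both.
const installedVersions = [...new Set(Object.values(lock))];
console.log(installedVersions); // [ '1.1.0', '1.2.0' ]
```

Because the lookup is keyed by the exact pattern string, nothing in this model ever asks whether an already installed version would satisfy a new range, which is the gap yarn-deduplicate fills.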

🤔 Think about it: if app depended on lib-b@^1.1.0 instead, would the problem go away?

[Figure: yarn-duplicate]

Suppose that when app installs lib-b@^1.1.0, the latest version of lib-b is 1.1.0. In that case lib-b@1.1.0 is locked in yarn.lock.

If lib-a is installed some time later, when the latest version of lib-b is 1.2.0, the duplicate still appears, so this problem is fairly common.

Although the company's monorepo has been migrated to Rush and pnpm, many projects still use Yarn as the underlying package manager and have no plan to migrate.

For such projects, we can use the yarn-deduplicate command-line tool to rewrite yarn.lock and remove the duplicates.

yarn-deduplicate — The Hero We Need

Basic use

Deduplicate yarn.lock with the default strategy:

npx yarn-deduplicate yarn.lock

Processing strategy

--strategy <strategy>

highest strategy

This is the default strategy. It tries to use the highest version that is already installed.

Example 1: given the following yarn.lock:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The revised results are as follows:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

Both library@^1.0.0 and library@^1.1.0 are locked to 1.3.0, the highest version currently installed.

Example 2:

Change library@^1.1.0 to library@1.1.0

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The revised results are as follows:

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

library@1.1.0 is left unchanged, while library@^1.0.0 is unified to 1.3.0, the highest version currently installed.

fewer strategy

This strategy tries to keep the number of installed versions as small as possible. Note that it minimizes the number of versions, not the version number itself. When several candidates tie, the highest version wins.

Example 1:

library@^1.0.0:
  version "1.0.0"

library@^1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The revised results are as follows:

library@^1.0.0, library@^1.1.0:
  version "1.3.0"

Note: this is the same result as with the highest strategy.

Example 2:

Change library@^1.1.0 to library@1.1.0

library@^1.0.0:
  version "1.0.0"

library@1.1.0:
  version "1.1.0"

library@^1.0.0:
  version "1.3.0"

The revised results are as follows:

library@^1.0.0, library@1.1.0:
  version "1.1.0"

Only 1.1.0 satisfies both requested ranges, so choosing it minimizes the number of installed versions.
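
To make the difference between the two strategies concrete, here is a minimal sketch that replays Example 2. It does not use the real semver package: `satisfies` is a simplified stand-in that only understands exact versions and caret ranges with a major version of at least 1, and `pickBest` is a hypothetical helper, not part of yarn-deduplicate.

```javascript
// Simplified stand-in for semver.satisfies: supports only exact versions
// ("1.1.0") and caret ranges ("^1.0.0") with a major version >= 1.
const parse = (v) => v.split(".").map(Number);
const compare = (a, b) => {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
};
const satisfies = (version, range) => {
  if (!range.startsWith("^")) return compare(version, range) === 0;
  const base = range.slice(1);
  return parse(version)[0] === parse(base)[0] && compare(version, base) >= 0;
};

// Data from Example 2: requested ranges and installed versions.
const ranges = ["^1.0.0", "1.1.0"];
const installed = ["1.0.0", "1.1.0", "1.3.0"];

// Hypothetical helper: pick the best installed version for each range.
const pickBest = (strategy) => {
  // How many requested ranges does each installed version satisfy?
  const count = (v) => ranges.filter((r) => satisfies(v, r)).length;
  const result = {};
  for (const range of ranges) {
    const candidates = installed.filter((v) => satisfies(v, range));
    candidates.sort((a, b) => {
      if (strategy === "fewer" && count(a) !== count(b)) {
        return count(b) - count(a); // most widely usable version first
      }
      return compare(b, a); // then highest version first
    });
    result[range] = candidates[0];
  }
  return result;
};

console.log(pickBest("highest")); // { '^1.0.0': '1.3.0', '1.1.0': '1.1.0' }
console.log(pickBest("fewer")); // { '^1.0.0': '1.1.0', '1.1.0': '1.1.0' }
```

With highest, two versions remain installed (1.1.0 and 1.3.0). With fewer, both ranges resolve to 1.1.0, so only one version remains.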

Incremental changes

Deduplicating everything in one go is fast but can be risky, so the tool also supports applying the change gradually.

--packages <package1> <package2> <packageN>

Deduplicate only the specified packages.

--scopes <scope1> <scope2> <scopeN>

Deduplicate only the packages under the specified scopes.

Diagnostic information

--list

Only print diagnostic information, without modifying yarn.lock.

How yarn-deduplicate works

Basic process

By checking the package.json of yarn-deduplicate, you can see that it depends on the following packages:

  • commander: a complete Node.js command-line solution;
  • @yarnpkg/lockfile: parses and writes yarn.lock files;
  • semver: the semantic versioner for npm, used to check whether an installed version satisfies a requested version range.

There are mainly two files in the source code:

  1. cli.js: the command-line layer. It parses the arguments and calls the methods in index.js.
  2. index.js: the main logic.

[Figure: yarn-duplicate-1]

The key piece is getDuplicatedPackages.

Get Duplicated Packages

First, let's clarify the idea behind getDuplicatedPackages.

Suppose we have the following yarn.lock, and the goal is to find the bestVersion of lodash@^4.17.15.

lodash@^4.17.15:
  version "4.17.21"

lodash@4.17.16:
  version "4.17.16"

  1. Parse yarn.lock: for lodash@^4.17.15, requestedVersion is ^4.17.15 and installedVersion is 4.17.21;
  2. Collect every installedVersion that satisfies requestedVersion (^4.17.15), namely 4.17.21 and 4.17.16;
  3. From those candidates, pick the bestVersion that fits the current strategy (with fewer, the bestVersion of lodash@^4.17.15 is 4.17.16; otherwise it is 4.17.21).

Type definition

const getDuplicatedPackages = (
  json: YarnLock,
  options: Options
): DuplicatedPackages => {
  // todo
};

// The object obtained by parsing yarn.lock
interface YarnLock {
  [key: string]: YarnLockVal;
}

interface YarnLockVal {
  version: string; // installedVersion
  resolved: string;
  integrity: string;
  dependencies: {
    [key: string]: string;
  };
}

// The structure looks like this
const yarnLockInstanceExample = {
  // ...
  "lodash@^4.17.15": {
    version: "4.17.21",
    resolved:
      "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c",
    integrity:
      "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
    dependencies: {
      "fake-lib-x": "^1.0.0", // lodash does not actually have any dependencies
    },
  },
  // ...
};

// Parsed from the command-line arguments
interface Options {
  includeScopes: string[]; // only process packages under these scopes; defaults to []
  includePackages: string[]; // only process these packages; defaults to []
  excludePackages: string[]; // packages to skip; defaults to []
  useMostCommon: boolean; // true when the strategy is fewer
  includePrerelease: boolean; // whether to consider prerelease versions; defaults to false
}

type DuplicatedPackages = PackageInstance[];

interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

The ultimate goal is to produce the list of PackageInstance objects.

Get yarn.lock data

const fs = require("fs");
const lockfile = require("@yarnpkg/lockfile");

const parseYarnLock = (file) => lockfile.parse(file).object;

// `file` comes from the command-line arguments, parsed by commander
const yarnLock = fs.readFileSync(file, "utf8");
const json = parseYarnLock(yarnLock);

Extract Packages

We need to filter out some packages based on the scoping options described earlier.

In addition, the keys in yarn.lock have the form lodash@^4.17.15, and lodash@4.17.16 may exist as another key. Keys in this form are inconvenient for looking up data.

Instead, we can use the package name (lodash) as the key and an array as the value, where each array item holds the information for one requested version. This makes the later steps easier.

interface ExtractedPackage {
  pkg: YarnLockVal;
  name: string;
  requestedVersion: string;
  installedVersion: string;
  satisfiedBy: Set<string>;
}

interface ExtractedPackages {
  [key: string]: ExtractedPackage[];
}

satisfiedBy stores every installedVersion that satisfies this package's requestedVersion. It defaults to new Set().

The bestVersion, that is, the installedVersion that best fits the strategy, is later picked from this set.

The specific implementation is as follows:

const extractPackages = (
  json,
  includeScopes = [],
  includePackages = [],
  excludePackages = []
) => {
  const packages = {};
  // Regex that matches the keys of the parsed yarn.lock object
  const re = /^(.*)@([^@]*?)$/;

  Object.keys(json).forEach((name) => {
    const pkg = json[name];
    const match = name.match(re);

    let packageName, requestedVersion;
    if (match) {
      [, packageName, requestedVersion] = match;
    } else {
      // No match means no version was specified, so it defaults to * (https://docs.npmjs.com/files/package.json#dependencies)
      packageName = name;
      requestedVersion = "*";
    }

    // Filter out packages according to the scoping options

    // If scopes were specified, only process packages under those scopes
    if (
      includeScopes.length > 0 &&
      !includeScopes.find((scope) => packageName.startsWith(`${scope}/`))
    ) {
      return;
    }

    // If packages were specified, only process those packages
    if (includePackages.length > 0 && !includePackages.includes(packageName))
      return;

    if (excludePackages.length > 0 && excludePackages.includes(packageName))
      return;

    packages[packageName] = packages[packageName] || [];
    packages[packageName].push({
      pkg,
      name: packageName,
      requestedVersion,
      installedVersion: pkg.version,
      satisfiedBy: new Set(),
    });
  });
  return packages;
};
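
As a quick check of how the key regex behaves, the sketch below applies the same pattern to a few sample keys. The `splitKey` helper and the `@babel/core` key are only illustrative. The point is that the greedy first group makes the split happen at the last "@", so scoped package names stay intact.

```javascript
// Same regex as in extractPackages: split on the last "@" of the key.
const re = /^(.*)@([^@]*?)$/;

// Hypothetical helper that mirrors the name/version extraction above.
const splitKey = (key) => {
  const match = key.match(re);
  return match
    ? { name: match[1], requestedVersion: match[2] }
    : { name: key, requestedVersion: "*" };
};

console.log(splitKey("lodash@^4.17.15"));
// { name: 'lodash', requestedVersion: '^4.17.15' }
console.log(splitKey("@babel/core@^7.12.0"));
// { name: '@babel/core', requestedVersion: '^7.12.0' }
console.log(splitKey("lodash"));
// { name: 'lodash', requestedVersion: '*' }
```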

After extracting the packages, we need to populate the satisfiedBy field and use it to compute bestVersion. That is the job of computePackageInstances.

Compute Package Instances

The related types are defined as follows:

interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}

const computePackageInstances = (
  packages: ExtractedPackages,
  name: string,
  useMostCommon: boolean,
  includePrerelease = false
): PackageInstance[] => {
  // todo
};

The implementation of computePackageInstances can be divided into three steps:

  1. Collect all installedVersion information for the current package;
  2. Populate the satisfiedBy field;
  3. Compute bestVersion from satisfiedBy.

Get installedVersion information

/**
 * versions records every installedVersion of the current package.
 * The satisfies field stores the package instances whose requestedVersion
 * is satisfied by this installedVersion; it starts as new Set().
 * Its size tells us which installedVersion satisfies the most
 * requestedVersions, which is what the fewer strategy relies on.
 */
interface Versions {
  [key: string]: { pkg: YarnLockVal; satisfies: Set<ExtractedPackage> };
}

// Entries for the current package name
const packageInstances = packages[name];

const versions = packageInstances.reduce((versions, packageInstance) => {
  if (packageInstance.installedVersion in versions) return versions;
  versions[packageInstance.installedVersion] = {
    pkg: packageInstance.pkg,
    satisfies: new Set(),
  };
  return versions;
}, {} as Versions);

Populate the satisfiedBy and satisfies fields

const semver = require("semver");

// Iterate over every installedVersion
Object.keys(versions).forEach((version) => {
  const satisfies = versions[version].satisfies;
  // Check each packageInstance in turn
  packageInstances.forEach((packageInstance) => {
    // A packageInstance's own installedVersion always satisfies its own requestedVersion
    packageInstance.satisfiedBy.add(packageInstance.installedVersion);
    if (
      semver.satisfies(version, packageInstance.requestedVersion, {
        includePrerelease,
      })
    ) {
      satisfies.add(packageInstance);
      packageInstance.satisfiedBy.add(version);
    }
  });
});

Compute bestVersion from satisfiedBy and satisfies

packageInstances.forEach((packageInstance) => {
  const candidateVersions = Array.from(packageInstance.satisfiedBy);
  // Sort the candidates so that the best one comes first
  candidateVersions.sort((versionA, versionB) => {
    // With the fewer strategy, sort by the size of each candidate's `satisfies` set
    if (useMostCommon) {
      if (versions[versionB].satisfies.size > versions[versionA].satisfies.size)
        return 1;
      if (versions[versionB].satisfies.size < versions[versionA].satisfies.size)
        return -1;
    }
    // With the highest strategy (or on a tie), prefer the highest version
    return semver.rcompare(versionA, versionB, { includePrerelease });
  });
  packageInstance.satisfiedBy = candidateVersions;
  packageInstance.bestVersion = candidateVersions[0];
});

return packageInstances;
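
The effect of the comparator can be checked on the lodash example from earlier. The following is a self-contained sketch under some assumptions: `rcompare` is a simplified stand-in for semver.rcompare that ignores prerelease tags, the `satisfies` sets hold range strings rather than package instances for brevity, and `sortCandidates` is a hypothetical wrapper around the same sorting logic.

```javascript
// Numeric comparison of "x.y.z" versions, highest first. A simplified
// stand-in for semver.rcompare that ignores prerelease tags.
const rcompare = (a, b) => {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pb[i] - pa[i];
  }
  return 0;
};

// For the lodash example: which requested ranges each installed version satisfies.
const versions = {
  "4.17.21": { satisfies: new Set(["^4.17.15"]) },
  "4.17.16": { satisfies: new Set(["^4.17.15", "4.17.16"]) },
};

// Hypothetical wrapper: returns a sorted copy, best candidate first.
const sortCandidates = (candidates, useMostCommon) =>
  [...candidates].sort((a, b) => {
    if (useMostCommon) {
      const diff = versions[b].satisfies.size - versions[a].satisfies.size;
      if (diff !== 0) return diff;
    }
    return rcompare(a, b);
  });

const candidates = ["4.17.21", "4.17.16"]; // satisfiedBy of lodash@^4.17.15
console.log(sortCandidates(candidates, false)[0]); // 4.17.21
console.log(sortCandidates(candidates, true)[0]); // 4.17.16
```

Version 4.17.16 satisfies two requested ranges while 4.17.21 satisfies only one, so the fewer strategy ranks it first even though it is the lower version.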

Complete getDuplicatedPackages

const getDuplicatedPackages = (
  json,
  {
    includeScopes,
    includePackages,
    excludePackages,
    useMostCommon,
    includePrerelease = false,
  }
) => {
  const packages = extractPackages(
    json,
    includeScopes,
    includePackages,
    excludePackages
  );
  return Object.keys(packages)
    .reduce(
      (acc, name) =>
        acc.concat(
          computePackageInstances(
            packages,
            name,
            useMostCommon,
            includePrerelease
          )
        ),
      []
    )
    .filter(
      ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
    );
};
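
The list returned by getDuplicatedPackages still has to be applied to the lockfile. The sketch below shows one way this could be done and is an assumption, not code taken from yarn-deduplicate: the hypothetical `applyDuplicates` helper repoints each duplicated key at the entry that already resolves to its bestVersion. The lockfile entries are trimmed to the version field for brevity.

```javascript
// Parsed lockfile in the YarnLock shape used above (fields trimmed for brevity).
const json = {
  "lodash@^4.17.15": { version: "4.17.21" },
  "lodash@4.17.16": { version: "4.17.16" },
};

// What getDuplicatedPackages would report for this lockfile with the fewer strategy.
const duplicates = [
  {
    name: "lodash",
    requestedVersion: "^4.17.15",
    installedVersion: "4.17.21",
    bestVersion: "4.17.16",
  },
];

// Hypothetical helper: point each duplicated key at the entry that already
// resolves to its bestVersion. Returns a new object; the input is not mutated.
const applyDuplicates = (lock, dups) => {
  const result = { ...lock };
  for (const { name, requestedVersion, bestVersion } of dups) {
    const target = Object.keys(lock).find(
      (key) => key.startsWith(`${name}@`) && lock[key].version === bestVersion
    );
    if (target) result[`${name}@${requestedVersion}`] = lock[target];
  }
  return result;
};

const fixed = applyDuplicates(json, duplicates);
console.log(fixed["lodash@^4.17.15"].version); // 4.17.16
```

After this step both keys share the same entry, so serializing the object back to yarn.lock would leave a single installed version of lodash.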

Conclusion

This article described the Yarn duplicate problem, introduced yarn-deduplicate as a solution, and walked through the relevant parts of its implementation. Looking forward to the arrival of Yarn v2.


海秋