What is Yarn duplicate
Developers who use Yarn as their package manager may find that several versions of the same package end up in the build output, even though those versions are compatible with each other.
For example, suppose there are the following dependencies:
When (p)npm installs a module that is already present, it checks whether the installed version satisfies the version range required by the new dependent. If it does, the install is skipped; if not, the module is installed under the node_modules of the dependent module. In other words, lib-a reuses the lib-b@1.1.0 that app depends on.
However, with Yarn v1 as the package manager, lib-a gets its own separate copy, lib-b@1.2.0.
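The reuse-or-nest rule described above can be sketched in a few lines. This is a simplified model, not npm's actual resolver: `placeDependency` is a hypothetical name, `satisfies` is a tiny stand-in for `semver.satisfies` that only understands exact versions and caret ranges with a major version of 1 or higher, and the `^1.1.0` range requested by lib-a is an assumption, since the original dependency listing is not shown.

```javascript
// Hypothetical stand-in for semver.satisfies: supports only exact versions
// and caret ranges whose major version is >= 1.
const parse = (v) => v.split(".").map(Number);
const satisfies = (version, range) => {
  if (!range.startsWith("^")) return version === range;
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range.slice(1));
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
};

// Reuse the already installed copy when it satisfies the requested range,
// otherwise install a nested copy under the dependent module.
const placeDependency = (installedVersion, requestedRange) =>
  satisfies(installedVersion, requestedRange)
    ? "reuse hoisted copy"
    : "install nested copy";

// app already has lib-b@1.1.0; lib-a is assumed to request lib-b@^1.1.0.
console.log(placeDependency("1.1.0", "^1.1.0")); // reuse hoisted copy
// A range that 1.1.0 cannot satisfy forces a nested copy.
console.log(placeDependency("1.1.0", "^1.2.0")); // install nested copy
```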
- difference between npm and yarn behavior with nested dependencies #3951
- Yarn installing multiple versions of the same package
- Yarn v2 supports package deduplication natively
🤔 Think about it: if the app project depends on lib-b@^1.1.0 instead, does the problem go away?
When app installs lib-b@^1.1.0 while the latest version of lib-b is 1.1.0, lib-b@1.1.0 is locked in yarn.lock.
If lib-a is installed some time later, when the latest version of lib-b is 1.2.0, the Yarn duplicate still appears, so this problem is fairly common.
Although the company's Monorepo project was migrated to Rush and pnpm, many projects still use Yarn as the underlying package management tool, and there is no migration plan.
For such projects, we can use the yarn-deduplicate command-line tool to modify yarn.lock and remove the duplicates.
yarn-deduplicate — The Hero We Need
Basic use
Modify yarn.lock according to the default strategy:
npx yarn-deduplicate yarn.lock
Processing strategy
--strategy <strategy>
highest strategy
This is the default strategy. It tries to use the highest version that is already installed.
Example 1: given the following yarn.lock:
library@^1.0.0:
version "1.0.0"
library@^1.1.0:
version "1.1.0"
library@^1.0.0:
version "1.3.0"
The revised results are as follows:
library@^1.0.0, library@^1.1.0:
version "1.3.0"
library@^1.0.0, library@^1.1.0 will be locked at 1.3.0 (the largest version currently installed).
Example 2:
Change library@^1.1.0 to library@1.1.0
library@^1.0.0:
version "1.0.0"
library@1.1.0:
version "1.1.0"
library@^1.0.0:
version "1.3.0"
The revised results are as follows:
library@1.1.0:
version "1.1.0"
library@^1.0.0:
version "1.3.0"
library@1.1.0 remains unchanged, library@^1.0.0 is unified to the currently installed largest version 1.3.0.
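To make the highest strategy concrete, here is a minimal sketch that replays Example 2. It is not the tool's implementation: `pickHighest` is a hypothetical name, and `satisfies` and `compareDesc` are small stand-ins for the `semver` package that only handle exact versions and caret ranges with a major version of 1 or higher.

```javascript
// Hypothetical stand-ins for semver.satisfies and semver.rcompare.
const parse = (v) => v.split(".").map(Number);
const satisfies = (version, range) => {
  if (!range.startsWith("^")) return version === range;
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range.slice(1));
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
};
const compareDesc = (a, b) => {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) if (pa[i] !== pb[i]) return pb[i] - pa[i];
  return 0;
};

// Lockfile entries from Example 2: [requested range, installed version].
const entries = [
  ["^1.0.0", "1.0.0"],
  ["1.1.0", "1.1.0"],
  ["^1.0.0", "1.3.0"],
];
const installed = [...new Set(entries.map(([, version]) => version))];

// highest: pick the largest installed version that satisfies the range.
const pickHighest = (range) =>
  installed.filter((version) => satisfies(version, range)).sort(compareDesc)[0];

console.log(pickHighest("^1.0.0")); // 1.3.0
console.log(pickHighest("1.1.0")); // 1.1.0
```

The exact range `1.1.0` can only be served by 1.1.0, which is why two versions remain installed under this strategy.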
fewer strategy
This strategy tries to use the fewest packages. Note that this means the smallest number of installed versions, not the lowest version. When several candidates lead to the same number of installed versions, the highest version is used.
Example 1:
library@^1.0.0:
version "1.0.0"
library@^1.1.0:
version "1.1.0"
library@^1.0.0:
version "1.3.0"
The revised results are as follows:
library@^1.0.0, library@^1.1.0:
version "1.3.0"
Note: this is the same result as with the highest strategy.
Example 2:
Change library@^1.1.0 to library@1.1.0
library@^1.0.0:
version "1.0.0"
library@1.1.0:
version "1.1.0"
library@^1.0.0:
version "1.3.0"
The revised results are as follows:
library@^1.0.0, library@1.1.0:
version "1.1.0"
Here, using 1.1.0 is the only way to minimize the number of installed versions.
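The same Example 2 can be replayed for the fewer strategy. The sketch below is illustrative only: `pickFewer` and `coverage` are hypothetical names, and `satisfies` and `compareDesc` are simplified stand-ins for the `semver` package (exact versions and caret ranges with a major version of 1 or higher).

```javascript
// Hypothetical stand-ins for semver.satisfies and semver.rcompare.
const parse = (v) => v.split(".").map(Number);
const satisfies = (version, range) => {
  if (!range.startsWith("^")) return version === range;
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range.slice(1));
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
};
const compareDesc = (a, b) => {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) if (pa[i] !== pb[i]) return pb[i] - pa[i];
  return 0;
};

// Lockfile entries from Example 2: [requested range, installed version].
const entries = [
  ["^1.0.0", "1.0.0"],
  ["1.1.0", "1.1.0"],
  ["^1.0.0", "1.3.0"],
];
const installed = [...new Set(entries.map(([, version]) => version))];
const ranges = [...new Set(entries.map(([range]) => range))];

// For each installed version, count how many requested ranges it can serve.
const coverage = Object.fromEntries(
  installed.map((v) => [v, ranges.filter((r) => satisfies(v, r)).length])
);

// fewer: prefer the version that serves the most ranges; break ties by
// taking the highest version.
const pickFewer = (range) =>
  installed
    .filter((v) => satisfies(v, range))
    .sort((a, b) => coverage[b] - coverage[a] || compareDesc(a, b))[0];

console.log(coverage["1.1.0"]); // 2, the only version serving both ranges
console.log(pickFewer("^1.0.0")); // 1.1.0
console.log(pickFewer("1.1.0")); // 1.1.0
```

Because 1.1.0 is the only version that serves both ranges, both entries collapse onto it and a single version remains installed.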
Progressive change
Deduplicating everything in one go is fast, but it can be risky, so the tool also supports gradual changes.
--packages <package1> <package2> <packageN>
Only process the specified packages
--scopes <scope1> <scope2> <scopeN>
Only process the packages under the specified scopes
Diagnostic information
--list
Only output diagnostic information
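As an illustration of what such diagnostic output could contain, the sketch below formats one line per duplicated entry. Everything here is an assumption made for the example: the record fields mirror the PackageInstance type defined later in this article, `describeDuplicate` is a hypothetical name, and the message wording is illustrative rather than a guaranteed copy of the tool's output.

```javascript
// Hypothetical duplicate record, shaped like the PackageInstance type
// defined later in this article.
const duplicates = [
  {
    name: "library",
    requestedVersion: "^1.0.0",
    installedVersion: "1.0.0",
    bestVersion: "1.3.0",
  },
];

// Build one human-readable line per duplicated entry.
const describeDuplicate = ({ name, requestedVersion, installedVersion, bestVersion }) =>
  `Package "${name}" wants ${requestedVersion} and could get ${bestVersion}, but got ${installedVersion}`;

const report = duplicates.map(describeDuplicate);
report.forEach((line) => console.log(line));
```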
Principle analysis of yarn-deduplicate
Basic process
Looking at the package.json of yarn-deduplicate, we can see that it depends on the following packages:
- commander: a complete Node.js command-line solution;
- @yarnpkg/lockfile: parses and writes yarn.lock files;
- semver: the semantic versioner for npm, used here to determine whether an installed version satisfies the version range requested in package.json.
There are two main files in the source code:
- cli.js: the command-line layer. It parses the arguments and calls the methods in index.js;
- index.js: the main logic.
The key piece is getDuplicatedPackages.
Get Duplicated Packages
First, let's clarify the idea behind getDuplicatedPackages.
Suppose we have the following yarn.lock, and the goal is to find the bestVersion of lodash@^4.17.15.
lodash@^4.17.15:
version "4.17.21"
lodash@4.17.16:
version "4.17.16"
- From yarn.lock we learn that lodash@^4.17.15 has a requestedVersion of ^4.17.15 and an installedVersion of 4.17.21;
- Collect every installedVersion that satisfies the requestedVersion (^4.17.15), namely 4.17.21 and 4.17.16;
- From those, pick the installedVersion that fits the current strategy as bestVersion (if the current strategy is fewer, the bestVersion of lodash@^4.17.15 is 4.17.16; otherwise it is 4.17.21).
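The three steps above can be traced on the lodash example with a short sketch. It is a simplified model under stated assumptions: `candidatesFor`, `servedRanges` and `bestVersion` are hypothetical helper names, the lockfile object keeps only the `version` field, and `satisfies` and `compareDesc` stand in for the `semver` package (exact versions and caret ranges with a major version of 1 or higher only).

```javascript
// Hypothetical stand-ins for semver.satisfies and semver.rcompare.
const parse = (v) => v.split(".").map(Number);
const satisfies = (version, range) => {
  if (!range.startsWith("^")) return version === range;
  const [maj, min, pat] = parse(version);
  const [rMaj, rMin, rPat] = parse(range.slice(1));
  if (maj !== rMaj) return false;
  if (min !== rMin) return min > rMin;
  return pat >= rPat;
};
const compareDesc = (a, b) => {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) if (pa[i] !== pb[i]) return pb[i] - pa[i];
  return 0;
};

// Step 1: read requestedVersion and installedVersion from the parsed lockfile.
const lock = {
  "lodash@^4.17.15": { version: "4.17.21" },
  "lodash@4.17.16": { version: "4.17.16" },
};
const entries = Object.entries(lock).map(([key, { version }]) => ({
  requestedVersion: key.slice(key.lastIndexOf("@") + 1),
  installedVersion: version,
}));

// Step 2: collect every installed version that satisfies a requested range.
const installed = entries.map((entry) => entry.installedVersion);
const candidatesFor = (range) => installed.filter((v) => satisfies(v, range));

// Step 3: choose the candidate that fits the strategy.
const servedRanges = (v) =>
  entries.filter((entry) => satisfies(v, entry.requestedVersion)).length;
const bestVersion = (range, useMostCommon) =>
  candidatesFor(range).sort(
    (a, b) =>
      (useMostCommon ? servedRanges(b) - servedRanges(a) : 0) || compareDesc(a, b)
  )[0];

console.log(candidatesFor("^4.17.15")); // 4.17.21 and 4.17.16
console.log(bestVersion("^4.17.15", true)); // fewer: 4.17.16
console.log(bestVersion("^4.17.15", false)); // highest: 4.17.21
```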
Type definition
const getDuplicatedPackages = (
  json: YarnLock,
  options: Options
): DuplicatedPackages => {
  // todo
};
// Object obtained by parsing yarn.lock
interface YarnLock {
  [key: string]: YarnLockVal;
}
interface YarnLockVal {
  version: string; // installedVersion
  resolved: string;
  integrity: string;
  dependencies: {
    [key: string]: string;
  };
}
// The structure looks like this
const yarnLockInstanceExample = {
  // ...
  "lodash@^4.17.15": {
    version: "4.17.21",
    resolved:
      "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c",
    integrity:
      "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
    dependencies: {
      "fake-lib-x": "^1.0.0", // lodash does not actually have any dependencies
    },
  },
  // ...
};
// Parsed from the command-line arguments
interface Options {
  includeScopes: string[]; // only process packages under these scopes; defaults to []
  includePackages: string[]; // only process these packages; defaults to []
  excludePackages: string[]; // packages to skip; defaults to []
  useMostCommon: boolean; // true when the strategy is fewer
  includePrerelease: boolean; // whether to consider prerelease versions; defaults to false
}
type DuplicatedPackages = PackageInstance[];
interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}
The ultimate goal is to obtain the PackageInstance list.
Get yarn.lock data
const fs = require("fs");
const lockfile = require("@yarnpkg/lockfile");
const parseYarnLock = (file) => lockfile.parse(file).object;
// `file` comes from the command-line arguments, parsed by commander
const yarnLock = fs.readFileSync(file, "utf8");
const json = parseYarnLock(yarnLock);
Extract Packages
We need to filter out some packages according to the range options passed on the command line.
Also, the keys of the parsed yarn.lock object have the form lodash@^4.17.15, and there may be another key such as lodash@4.17.16 for the same package. Keys in this form are inconvenient for lookups.
Instead, we can use the package name (lodash) as the key and an array as the value, where each array item describes one version entry. This makes the later processing easier.
interface ExtractedPackage {
  pkg: YarnLockVal;
  name: string;
  requestedVersion: string;
  installedVersion: string;
  satisfiedBy: Set<string>;
}
interface ExtractedPackages {
  [key: string]: ExtractedPackage[];
}
satisfiedBy stores every installedVersion that satisfies the requestedVersion of this package entry. Its default value is new Set().
The installedVersion that best fits the strategy, that is bestVersion, is later picked from this set.
The specific implementation is as follows:
const extractPackages = (
  json,
  includeScopes = [],
  includePackages = [],
  excludePackages = []
) => {
  const packages = {};
  // Regex that matches the keys of the yarn.lock object
  const re = /^(.*)@([^@]*?)$/;
  Object.keys(json).forEach((name) => {
    const pkg = json[name];
    const match = name.match(re);
    let packageName, requestedVersion;
    if (match) {
      [, packageName, requestedVersion] = match;
    } else {
      // No match means no version was specified, so it defaults to * (https://docs.npmjs.com/files/package.json#dependencies)
      packageName = name;
      requestedVersion = "*";
    }
    // Filter out packages according to the range options
    // If scopes are specified, only process packages under those scopes
    if (
      includeScopes.length > 0 &&
      !includeScopes.find((scope) => packageName.startsWith(`${scope}/`))
    ) {
      return;
    }
    // If packages are specified, only process those packages
    if (includePackages.length > 0 && !includePackages.includes(packageName))
      return;
    if (excludePackages.length > 0 && excludePackages.includes(packageName))
      return;
    packages[packageName] = packages[packageName] || [];
    packages[packageName].push({
      pkg,
      name: packageName,
      requestedVersion,
      installedVersion: pkg.version,
      satisfiedBy: new Set(),
    });
  });
  return packages;
};
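To see how the key-matching regex above behaves, the following sketch applies it to a few keys. `splitKey` is a hypothetical helper written for this illustration, and `@babel/core@^7.0.0` is simply an example of a scoped key.

```javascript
// Same regex as in extractPackages: the greedy first group backtracks to the
// last "@" in the key, so a leading "@" of a scoped package stays in the name.
const re = /^(.*)@([^@]*?)$/;

// Hypothetical helper that splits a yarn.lock key into name and range.
const splitKey = (key) => {
  const match = key.match(re);
  return match
    ? { packageName: match[1], requestedVersion: match[2] }
    : { packageName: key, requestedVersion: "*" };
};

console.log(splitKey("lodash@^4.17.15")); // name lodash, range ^4.17.15
console.log(splitKey("@babel/core@^7.0.0")); // name @babel/core, range ^7.0.0
console.log(splitKey("lodash")); // no "@", so the range falls back to *
```

Splitting on the last `@` is what lets the `--scopes` filter work, since the scope prefix remains part of `packageName`.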
After extracting the packages, we need to fill in the satisfiedBy field and compute bestVersion from it, that is, implement computePackageInstances.
Compute Package Instances
The related types are defined as follows:
interface PackageInstance {
  name: string; // package name, e.g. lodash
  bestVersion: string; // best version under the current strategy
  requestedVersion: string; // requested version, e.g. ^15.6.2
  installedVersion: string; // installed version, e.g. 15.7.2
}
const computePackageInstances = (
  packages: ExtractedPackages,
  name: string,
  useMostCommon: boolean,
  includePrerelease = false
): PackageInstance[] => {
  // todo
};
The implementation of computePackageInstances can be divided into three steps:
- Collect all installedVersion information for the current package;
- Fill in the satisfiedBy field;
- Compute bestVersion from satisfiedBy.
Get installedVersion information
/**
 * versions records the data of every installedVersion of the current package.
 * The satisfies field stores the requestedVersions that an installedVersion satisfies;
 * its initial value is new Set().
 * Its size tells us which installedVersion satisfies the most requestedVersions,
 * which is what the fewer strategy needs.
 */
interface Versions {
  [key: string]: { pkg: YarnLockVal; satisfies: Set<string> };
}
// Dependency information for the current package name
const packageInstances = packages[name];
const versions = packageInstances.reduce((versions, packageInstance) => {
  if (packageInstance.installedVersion in versions) return versions;
  versions[packageInstance.installedVersion] = {
    pkg: packageInstance.pkg,
    satisfies: new Set(),
  };
  return versions;
}, {} as Versions);
Fill in the satisfiedBy and satisfies fields
// Iterate over every installedVersion
Object.keys(versions).forEach((version) => {
  const satisfies = versions[version].satisfies;
  // Iterate over each packageInstance
  packageInstances.forEach((packageInstance) => {
    // A packageInstance's own installedVersion always satisfies its own requestedVersion
    packageInstance.satisfiedBy.add(packageInstance.installedVersion);
    if (
      semver.satisfies(version, packageInstance.requestedVersion, {
        includePrerelease,
      })
    ) {
      satisfies.add(packageInstance);
      packageInstance.satisfiedBy.add(version);
    }
  });
});
Compute bestVersion based on satisfiedBy and satisfies
packageInstances.forEach((packageInstance) => {
  const candidateVersions = Array.from(packageInstance.satisfiedBy);
  // Sort the candidates
  candidateVersions.sort((versionA, versionB) => {
    // With the fewer strategy, sort by the size of each candidate's `satisfies` set
    if (useMostCommon) {
      if (versions[versionB].satisfies.size > versions[versionA].satisfies.size)
        return 1;
      if (versions[versionB].satisfies.size < versions[versionA].satisfies.size)
        return -1;
    }
    // With the highest strategy (or on a tie under fewer), prefer the highest version
    return semver.rcompare(versionA, versionB, { includePrerelease });
  });
  packageInstance.satisfiedBy = candidateVersions;
  packageInstance.bestVersion = candidateVersions[0];
});
return packageInstances;
Complete getDuplicatedPackages
const getDuplicatedPackages = (
  json,
  {
    includeScopes,
    includePackages,
    excludePackages,
    useMostCommon,
    includePrerelease = false,
  }
) => {
  const packages = extractPackages(
    json,
    includeScopes,
    includePackages,
    excludePackages
  );
  return Object.keys(packages)
    .reduce(
      (acc, name) =>
        acc.concat(
          computePackageInstances(
            packages,
            name,
            useMostCommon,
            includePrerelease
          )
        ),
      []
    )
    .filter(
      ({ bestVersion, installedVersion }) => bestVersion !== installedVersion
    );
};
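The walkthrough stops at detecting duplicates. As a minimal sketch of how that result might then be applied to the parsed lockfile, the code below re-points each duplicated key at the entry that already holds bestVersion. This is an assumption about one reasonable approach, not the tool's confirmed implementation: `applyBestVersions` is a hypothetical name, the lockfile object keeps only the `version` field, and writing the result back to disk is left out.

```javascript
// Parsed lockfile in the shape of the YarnLock interface (only `version` kept).
const json = {
  "library@^1.0.0": { version: "1.0.0" },
  "library@^1.1.0": { version: "1.3.0" },
};

// A result in the shape returned by getDuplicatedPackages.
const duplicates = [
  {
    name: "library",
    requestedVersion: "^1.0.0",
    installedVersion: "1.0.0",
    bestVersion: "1.3.0",
  },
];

// Hypothetical helper: returns a copy of the lockfile object in which every
// duplicated key points at the entry that already resolves to bestVersion.
const applyBestVersions = (lock, dups) => {
  const result = { ...lock };
  for (const { name, requestedVersion, bestVersion } of dups) {
    const donorKey = Object.keys(lock).find(
      (key) => key.startsWith(`${name}@`) && lock[key].version === bestVersion
    );
    if (donorKey) result[`${name}@${requestedVersion}`] = lock[donorKey];
  }
  return result;
};

const fixed = applyBestVersions(json, duplicates);
console.log(fixed["library@^1.0.0"].version); // 1.3.0
console.log(json["library@^1.0.0"].version); // 1.0.0, the input is not mutated
```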
Conclusion
This article introduced the Yarn duplicate problem, presented yarn-deduplicate as a solution, and analyzed the relevant parts of its implementation. We look forward to the arrival of Yarn v2.