Not long ago, I released Zack.DotNetTrimmer, an open source tool for slimming down .NET Core programs. Compared with the trimmer built into .NET Core, Zack.DotNetTrimmer not only trims more effectively but also supports WPF and WinForms programs.
Many friends have asked how this open source project works, so this article explains the principles behind it.
Technique 1 - Detecting the assemblies and classes a program loads
Microsoft provides Diagnostics, a library for analyzing the runtime behavior of .NET Core programs. It exposes rich runtime information such as object instance creation, assembly loading, class loading, method calls, GC operations, file reads and writes, and network connections. The tool in Visual Studio that measures the time spent in each method is implemented using Diagnostics.
To use the Diagnostics library, we first install the two NuGet packages Microsoft.Diagnostics.NETCore.Client and Microsoft.Diagnostics.Tracing.TraceEvent, and then use the DiagnosticsClient class to connect to the process of the .NET Core program being analyzed. The code looks like this:
using Microsoft.Diagnostics.NETCore.Client;
using Microsoft.Diagnostics.Tracing;
using Microsoft.Diagnostics.Tracing.Parsers;
using Microsoft.Diagnostics.Tracing.Parsers.Clr;
using System.Diagnostics;
using System.Diagnostics.Tracing;

string filepath = @"E:\temp\test6\ConsoleApp1.exe"; // path of the program to analyze
ProcessStartInfo psInfo = new ProcessStartInfo(filepath);
psInfo.UseShellExecute = true;
// Start the program; Process.Start can return null, so fail early in that case
using Process p = Process.Start(psInfo)
    ?? throw new InvalidOperationException($"Could not start {filepath}");
var providers = new List<EventPipeProvider>() // events to listen for
{
    new EventPipeProvider("Microsoft-Windows-DotNETRuntime",
        EventLevel.Informational, (long)ClrTraceEventParser.Keywords.All)
};
var client = new DiagnosticsClient(p.Id); // attach DiagnosticsClient to the target process
using EventPipeSession session = client.StartEventPipeSession(providers, false); // start listening
var source = new EventPipeEventSource(session.EventStream);
source.Clr.All += (TraceEvent obj) =>
{
    if (obj is ModuleLoadUnloadTraceData moduleData) // assembly loading event
    {
        string path = moduleData.ModuleILPath; // path of the assembly
        Console.WriteLine($"Assembly Loaded:{path}");
    }
    else if (obj is TypeLoadStopTraceData typeData) // class loading event
    {
        string typeName = typeData.TypeName; // name of the class
        Console.WriteLine($"Type Loaded:{typeName}");
    }
};
source.Process(); // dispatch events to the handler above until the session ends
Each kind of event arrives in the source.Clr.All handler as an object of a different type, and all of these types inherit from TraceEvent. Here I handle the assembly loading event, ModuleLoadUnloadTraceData, and the class loading event, TypeLoadStopTraceData.
In this way, we learn which assemblies and types are loaded while the program runs. Any assembly or type that never shows up was not loaded, so it is a candidate for deletion.
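As a rough illustration of that last step, the sketch below compares the DLLs in an application folder with the paths collected from the assembly loading events. The helper name FindUnloadedAssemblies and the case-insensitive path comparison (reasonable for Windows paths) are my own assumptions for illustration, not code taken from Zack.DotNetTrimmer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not from Zack.DotNetTrimmer): returns the DLLs in appDir
// whose paths never appeared in an assembly loading event.
static List<string> FindUnloadedAssemblies(string appDir, IEnumerable<string> loadedPaths)
{
    // Assumption: paths are compared case-insensitively, as on Windows.
    var loaded = new HashSet<string>(
        loadedPaths.Where(p => !string.IsNullOrEmpty(p)).Select(p => Path.GetFullPath(p)),
        StringComparer.OrdinalIgnoreCase);

    return Directory.EnumerateFiles(appDir, "*.dll")
        .Select(p => Path.GetFullPath(p))
        .Where(p => !loaded.Contains(p))
        .OrderBy(p => p, StringComparer.OrdinalIgnoreCase)
        .ToList();
}
```

The loadedPaths argument would be filled from ModuleILPath in the event handler. A single run only exercises some code paths, so the result is best treated as a list of candidates rather than a list of files that are safe to delete.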
Technique 2 - Removing unused classes from an assembly
Zack.DotNetTrimmer can delete the IL of unused classes in an assembly. This feature uses the dnlib library to edit the assembly file. dnlib is an open source project that reads, writes, and edits .NET assembly files.
In dnlib, we call ModuleDefMD.Load to load an existing assembly, and the method returns a ModuleDefMD. ModuleDefMD represents the assembly's information; for example, its Types property contains the types in the assembly. We can modify the ModuleDefMD and the objects inside it, then call its Write method to save the modified assembly to disk.
For example, the following code changes all non-public types in an assembly to public and removes all attributes applied to their methods:
using dnlib.DotNet;

string filename = @"E:\temp\net6.0\AppToBeTested1.dll";
ModuleDefMD module = ModuleDefMD.Load(filename);
foreach (var typeDef in module.Types)
{
    if (typeDef.IsPublic == false)
    {
        typeDef.Attributes |= TypeAttributes.Public; // change the class's access level to public
    }
    foreach (var methodDef in typeDef.Methods)
    {
        methodDef.CustomAttributes.Clear(); // remove the attributes applied to the method
    }
}
module.Write(@"E:\temp\net6.0\1.dll"); // save the modified assembly
Here is the source code of the assembly under test:
internal class Class1
{
    [DisplayName("AAA")]
    public void AA()
    {
        Console.WriteLine("hello");
    }
}
The following is the decompilation result of the modified assembly:
public class Class1
{
    public void AA()
    {
        Console.WriteLine("hello");
    }
}
You can see that our modifications to the assembly have worked.
Once we know how to modify an assembly with dnlib, we can implement the removal of unused types: we only need to delete the corresponding type from the Types property of ModuleDefMD. In practice, however, this runs into problems, because the class we want to delete may still be referenced elsewhere. Even if those places only reference the class and are never actually executed, the Write method of ModuleDefMD validates the assembly to make sure the result is well-formed, and it throws a ModuleWriterException when validation fails, such as:
ModuleWriterException: 'A method was removed that is still referenced by this module.'
Removing a class therefore requires carefully inspecting the assembly and removing every reference to that class. Because the class definition itself takes up very little space in the file, and most of the space is taken up by the method bodies, I chose an alternative: instead of deleting the class, clear the bodies of its methods.
In dnlib, a method is represented by the MethodDef type, and its Body property, of type CilBody, represents the method body. If the method has a body (that is, it is not an abstract method or similar), the Instructions property of CilBody holds the IL instructions of the body. So my first thought was to clear the method body with the following code:
methodDef.Body.Instructions.Clear();
However, saving a ModuleDefMD cleaned this way can produce an assembly with an invalid structure. For example, if a method declares a return value and we simply empty its body, the method no longer returns anything, which makes it invalid. So I changed my approach: replace every method body with the IL that corresponds to the C# statement throw null;. Throwing an exception is valid in any method body regardless of the return type, so the result stays well-formed. I wrote the following code to clean up a method body:
method.Body.ExceptionHandlers.Clear();
method.Body.Instructions.Clear();
method.Body.Variables.Clear();
method.Body.Instructions.Add(new Instruction(OpCodes.Nop) { Offset = 0 });
method.Body.Instructions.Add(new Instruction(OpCodes.Ldnull) { Offset = 1 });
method.Body.Instructions.Add(new Instruction(OpCodes.Throw) { Offset = 2 });
The three instructions added in the last three lines are the IL equivalent of the C# statement throw null;.
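To show how the two techniques could fit together, here is a minimal sketch that walks all types of a module and stubs out the methods of every type whose name is not in the set of loaded type names. The helper name StubUnusedTypes is hypothetical, and the sketch assumes that the names reported by the type loading event match dnlib's FullName format, which may not hold for nested or generic types. It is not the actual Zack.DotNetTrimmer implementation.

```csharp
using System;
using System.Collections.Generic;
using dnlib.DotNet;
using dnlib.DotNet.Emit;

// Hypothetical helper (not from Zack.DotNetTrimmer): replaces the body of every
// method in types missing from loadedTypeNames with "throw null".
// Returns the number of method bodies that were replaced.
static int StubUnusedTypes(ModuleDef module, ISet<string> loadedTypeNames)
{
    int stubbed = 0;
    foreach (TypeDef type in module.GetTypes()) // includes nested types
    {
        // Assumption: loadedTypeNames uses the same format as TypeDef.FullName.
        if (type.IsGlobalModuleType || loadedTypeNames.Contains(type.FullName))
            continue;

        foreach (MethodDef method in type.Methods)
        {
            if (!method.HasBody)
                continue; // abstract, extern, etc.

            // A fresh body has no locals or exception handlers.
            var body = new CilBody();
            body.Instructions.Add(OpCodes.Ldnull.ToInstruction());
            body.Instructions.Add(OpCodes.Throw.ToInstruction());
            method.Body = body;
            stubbed++;
        }
    }
    return stubbed;
}
```

Building a new CilBody instead of clearing the old one has the same effect as the three Clear calls above. The leading nop is omitted here because ldnull followed by throw is already a complete body.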
For the complete source code, see the project on GitHub:
https://github.com/yangzhongke/Zack.DotNetTrimmer
Other issues with dnlib usage
While using dnlib I learned a few other things, which I record here to share with you.
▍ Takeaway 1: Problems when dnlib saves assemblies containing native code
The cleanup method described above works fine for most of the assemblies we write ourselves and for assemblies from third-party NuGet packages. However, I ran into a problem when applying it to .NET Core base assemblies such as PresentationCore.dll and System.Private.CoreLib.dll: even if I only load the assembly and call Write without making any changes, the saved assembly is significantly smaller. For example, I used the following code on PresentationFramework.dll:
using (var mod = ModuleDefMD.Load(@"E:\temp\PresentationFramework.dll"))
{
    mod.Write(@"E:\temp\PresentationFramework.New.dll");
}
The original PresentationFramework.dll is 15.9MB, while the newly saved file is only 5.7MB. After asking the author of dnlib, I learned that these assemblies contain native code (for example, code written in C++/CLI, or assemblies in ReadyToRun / NGEN / CrossGen format), and that the Write method discards this native code when saving, which is why the resulting assembly is significantly smaller. We can use the NativeWrite method instead of Write, because it preserves the native code.
However, according to Washi1337, the author of AsmResolver (an open source project similar to dnlib), NativeWrite tries to preserve the structure of the native code, so it cannot reduce the size of the assembly and may even increase it. In actual use, I also found that after these assemblies were modified, the program failed to start. The Windows event log showed that the CLR failed to start when the program launched. According to Washi1337, if the only native code in the assembly is ReadyToRun code, it is enough to remove the ILLibrary flag from the assembly so that the CLR skips the ReadyToRun native code and executes the IL directly, since an assembly optimized by ReadyToRun still contains the original IL. But after I did what Washi1337 suggested, the program still failed to start, and I don't know why. Because assemblies containing native code cannot be trimmed well anyway, I did not investigate further. Friends who are proficient in the CLR are welcome to share their experience.
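For reference, the choice between Write and NativeWrite could be sketched as follows. The helper names LooksLikeNativeImage and Save are hypothetical, and the detection rule (a module is treated as containing native code when it is not IL-only or when its ILLibrary flag is set) is an assumption based on the discussion above. This sketch only picks the writer; it does not solve the startup failure I described.

```csharp
using dnlib.DotNet;
using dnlib.DotNet.MD;

// Hypothetical helper: guesses from the CLR header flags whether a module
// carries native code (mixed-mode C++/CLI or a ReadyToRun image).
static bool LooksLikeNativeImage(ModuleDef module)
{
    bool mixedMode = !module.IsILOnly;
    // Assumption: ReadyToRun images have the ILLibrary flag set.
    bool readyToRun = (module.Cor20HeaderFlags & ComImageFlags.ILLibrary) != 0;
    return mixedMode || readyToRun;
}

// Hypothetical helper: keeps the native code when the module seems to have any,
// otherwise uses the normal managed writer.
static void Save(ModuleDefMD module, string outputPath)
{
    if (LooksLikeNativeImage(module))
        module.NativeWrite(outputPath);
    else
        module.Write(outputPath);
}
```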
▍ Takeaway 2: Other applications of dnlib
Since dnlib can modify assemblies, we can use it for many things, such as changing the default behavior of a program (you get the idea). We can also use dnlib to write our own code obfuscator or to implement static weaving for Aspect Oriented Programming (AOP).
What other uses for dnlib can you think of? You are welcome to share them.