Hands-on assembly language

Three articles in the assembly series have been published so far. Each one is the author's own summary, and I hope they are helpful to you.

you how to assemble Debug

I love love, this register is a bit

In the previous article we covered some basic assembly instructions and used a debugging tool called Debug to see how instructions and data are stored in memory. With that background, we can now make sense of a complete assembly program.

The execution process of the program

First, let us introduce the execution process of the program through a schematic diagram. Let's take a simple hello.c program in C language as an example.

This is the complete execution process of a hello world program. It involves several core components: the preprocessor, compiler, assembler, and linker. Let's go through them one by one.

In the preprocessing phase, the preprocessor modifies the source C program according to the directives that begin with #. For example, the #include <stdio.h> directive tells the preprocessor to read stdio.h and insert its contents into the program as text. The result is the file hello.i.
Then comes the compilation phase, in which the compiler translates hello.i into the text file hello.s, which contains an assembly-language program.

Assembly language is very useful because it gives the compilers for different high-level languages a common output language.

After compilation comes the assembly phase. In this step, the assembler translates hello.s into machine instructions, packages these instructions into a relocatable object program, and stores it in the file hello.o.
Last is the linking phase, in which the linker merges the translated pieces together to generate an executable file that can be run directly by the operating system.
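The four phases can be sketched as a chain of file names. The article does not name a toolchain, so using gcc here is an assumption, and the sketch only builds the command lines without running them.

```python
from pathlib import PurePath


def build_pipeline(source: str) -> list[tuple[str, list[str]]]:
    """Return (phase, command) pairs for the four phases, without running them."""
    stem = PurePath(source).stem                      # "hello.c" -> "hello"
    pre, asm, obj = f"{stem}.i", f"{stem}.s", f"{stem}.o"
    return [
        ("preprocessing", ["gcc", "-E", source, "-o", pre]),  # expand #include and friends
        ("compilation", ["gcc", "-S", pre, "-o", asm]),       # C text -> assembly text
        ("assembly", ["gcc", "-c", asm, "-o", obj]),          # assembly -> relocatable object
        ("linking", ["gcc", obj, "-o", stem]),                # object -> executable
    ]


for phase, command in build_pipeline("hello.c"):
    print(f"{phase:13} {' '.join(command)}")
```

Each phase consumes the file produced by the previous one, which is why the output name of one command is the input name of the next.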

So, generally speaking, an executable file contains two kinds of information:

Programs and data, which are the basic content of the executable file.
Descriptive information, such as how large the program is and how much memory it occupies, which is also a necessary part of the executable file.

`Know the assembly program`

As before, let's start with a piece of assembly code and then go through it step by step.

assume cs:code
code segment
        mov ax,1234H
        add ax,ax
        mov bx,1111H
        add bx,bx
code ends
end

Some parts of this code may be unfamiliar, but you should already know what the mov and add instructions mean (if you have read the author's previous article and studied it carefully).

The instructions that make up an assembly program fall into two types: assembly instructions and pseudo-instructions. Assembly instructions are the mov and add instructions we saw above. They have a concrete effect: mov moves data into a register or memory, and add adds registers or data. Assembly instructions such as mov and add have corresponding machine code in memory, which is eventually executed by the CPU. Pseudo-instructions have no such effect at run time. They only describe the program, for example by defining a segment, and they are processed by the compiler. They have no corresponding machine code in memory, so they are never executed by the CPU.
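To see what the four assembly instructions in the example do, here is a minimal Python sketch that models two 16-bit registers. It is only a model for illustration (the run function and the tuple format are made up here), not how the CPU or Debug works internally.

```python
def run(program: list[tuple[str, str, str]]) -> dict[str, int]:
    """Execute a list of (op, dest, src) tuples on two 16-bit registers."""
    regs = {"ax": 0, "bx": 0}

    def value(operand: str) -> int:
        # An operand is either a register name or a hex literal such as "1234H".
        if operand in regs:
            return regs[operand]
        return int(operand.rstrip("Hh"), 16)

    for op, dest, src in program:
        if op == "mov":
            regs[dest] = value(src)
        elif op == "add":
            regs[dest] = (regs[dest] + value(src)) & 0xFFFF  # keep only 16 bits
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs


result = run([
    ("mov", "ax", "1234H"),
    ("add", "ax", "ax"),
    ("mov", "bx", "1111H"),
    ("add", "bx", "bx"),
])
print({name: f"{val:04X}H" for name, val in result.items()})
# prints {'ax': '2468H', 'bx': '2222H'}
```

The mask with 0xFFFF is what makes the registers behave as 16-bit values: anything that overflows past the 16th bit is simply dropped.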

The code above contains three kinds of pseudo-instructions. The first is the segment definition:

code segment
    ......
code ends

segment and ends are a pair of pseudo-instructions that must always appear together; neither can be used without the other. The pair defines a segment: segment marks the beginning of the segment and ends marks its end. code is the name of the segment, and the name can be chosen freely.

An assembly program is made up of one or more segments, which store code or data, or serve as stack space. The segment in the example above contains code, so it is called a code segment.

Besides segments, an assembly program also needs assume, which is another pseudo-instruction. It assumes that a certain segment register is associated with a certain segment; in the example, assume cs:code associates the code segment with the cs register. We don't need to understand assume in depth for now. It is enough to know that, when programming, we use it to associate a segment that has a specific purpose with the relevant segment register.

end marks the end of an assembly program and is also a pseudo-instruction. When the compiler meets end while compiling the program, it stops compiling. So when we finish writing an assembly program, we must put end at the very end of the source to mark where the program stops.

Besides assembly instructions and pseudo-instructions, an assembly program also contains labels, such as code in the example above. A label placed in front of segment serves as the name of the segment, and after compiling and linking this name is ultimately treated as the segment address of that segment.

A reminder: be careful not to confuse end with ends. ends is paired with segment to mark the end of a segment, while end marks the end of the whole assembly program.

So in summary, a source program written in assembly language consists of pseudo-instructions and assembly instructions. Pseudo-instructions are processed by the compiler, while assembly instructions are translated into machine code and finally executed by the CPU.
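The split between the two kinds of lines can be sketched in a few lines of Python. This is a rough illustration that only knows the pseudo-instructions met in this article (assume, segment, ends, end); a real compiler recognizes many more.

```python
DIRECTIVE_WORDS = {"assume", "segment", "ends", "end"}

SOURCE = """
assume cs:code
code segment
        mov ax,1234H
        add ax,ax
        mov bx,1111H
        add bx,bx
code ends
end
"""


def classify(source: str) -> tuple[list[str], list[str]]:
    """Split source lines into (pseudo-instructions, assembly instructions)."""
    pseudo, real = [], []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        words = set(line.lower().split())
        # A line is a pseudo-instruction if it contains one of the known keywords.
        (pseudo if words & DIRECTIVE_WORDS else real).append(line)
    return pseudo, real


pseudo, real = classify(SOURCE)
print("handled by the compiler:", pseudo)
print("executed by the CPU:    ", real)
```

Only the second list ends up as machine code in the executable file; the first list disappears once the compiler has used it.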

From now on, we call the contents of the source file the source program, and the instructions and data in it that are ultimately executed and processed by the computer the program. The program first exists in the source program in the form of assembly instructions. After compiling and linking it is converted into machine code and stored in an executable file, as shown in the following figure.

So, in summary, writing an assembly program mainly involves the following steps:

First define a segment, named for example code or abc.
Write assembly instructions in the segment.
Indicate where the program ends.
Associate the segment label with a segment register.
Add the program return (explained below).

`Program return`

A complete program must have a return. Only after the program has executed its code, executed the return, and given up control of the CPU can the operating system hand the CPU to other programs. A program cannot occupy the CPU forever: that wastes resources, and a program that never returns can also cause a crash.

In assembly language, program return takes just two instructions:

mov ax,4c00H
int 21H

Here is what these two instructions mean:

mov ax,4c00H moves 4C00H into ax, so that AH = 4CH and AL = 00H. int 21H is the instruction that calls the DOS system interrupt. Together, the two lines call function 4CH of interrupt 21H, which exits the program safely and returns control to DOS, with AL as the return code.
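The reason 4c00H gives AH = 4CH is that AH and AL are simply the high and low bytes of ax. A short Python sketch of that split (split_ax is a name made up for this illustration):

```python
def split_ax(ax: int) -> tuple[int, int]:
    """Return (AH, AL), the high and low bytes of a 16-bit ax value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


ah, al = split_ax(0x4C00)
print(f"AH = {ah:02X}H, AL = {al:02X}H")
# prints AH = 4CH, AL = 00H
```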

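So far, we have met three things related to "ending": the end of a segment, the end of the assembly program, and the program return we just discussed. The differences between the three are as follows.

ends is a pseudo-instruction. It tells the compiler that a segment ends and is handled by the compiler at compile time.
end is a pseudo-instruction. It tells the compiler that the program ends and is handled by the compiler at compile time.
mov ax,4c00H and int 21H are assembly instructions. They make the program return and are executed by the CPU at run time.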

`Program error`

Generally speaking, errors in assembly language programs fall into two types: syntax errors and logic errors.

Syntax errors are simple. To put it bluntly, you have written an invalid assembly instruction, and the compiler reports it when the program is compiled.

Logic errors occur at run time. They are generally harder to notice and harder to troubleshoot. For example, the following code contains a logic error because it has no program return.

assume cs:code
code segment
        mov ax,1234H
        add ax,ax
        mov bx,1111H
        add bx,bx
code ends
end

Why? Because the code never returns. There are many similar logic errors, and they can only be discovered in specific scenarios.
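The compiler will not complain about this kind of mistake, but for this particular case a simple check is easy to sketch. The Python function below is a hypothetical helper that only looks for the exact two-line return used in this article.

```python
BROKEN = """assume cs:code
code segment
        mov ax,1234H
        add ax,ax
        mov bx,1111H
        add bx,bx
code ends
end"""

# Same program with the two return instructions added before "code ends".
FIXED = BROKEN.replace("code ends", "        mov ax,4c00H\n        int 21H\ncode ends")


def has_program_return(source: str) -> bool:
    """True if 'mov ax,4c00H' is immediately followed by 'int 21H'."""
    # Normalise each line: lower-case it and drop all whitespace; skip blank lines.
    lines = ["".join(l.lower().split()) for l in source.splitlines() if l.strip()]
    return any(a == "movax,4c00h" and b == "int21h" for a, b in zip(lines, lines[1:]))


print(has_program_return(BROKEN))  # prints False
print(has_program_return(FIXED))   # prints True
```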

`Write assembly`

Now we use an editor to write the assembly source program. As long as the source is saved as a text file, it can be processed by the compiler and then run by the CPU.

Any plain-text editor can be used to write assembly programs. For example, we can write one in the simplest text file (this article uses a win7 environment):

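assume cs:codeseg
codeseg segment
        mov ax,0123H
        mov bx,0456H
        add ax,bx       ; ax = 0123H + 0456H = 0579H
        add ax,ax       ; ax = 0579H + 0579H = 0AF2H

        mov ax,4c00H    ; program return, as described above
        int 21H
codeseg ends
end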

After writing it, save the file with the .asm suffix (here test.asm), which is the format for assembly source files.

`Compile`

A complete assembly workflow consists of writing, compiling, linking, and running, so next we need to compile the assembly program we just wrote. Before compiling, we need a suitable compiler. Here we use the masm 5.0 assembly compiler, masm.exe.

(To save you from hunting for it online, I downloaded masm and put it on a network disk. You can get it by replying in the background of the programmer cxuan public account.)

I ran into quite a few pitfalls while using masm 5.0, so here are some warnings to help you avoid them!
masm 5.0 is a stable version. I could not get the 6.x versions circulating on the Internet to run.
masm 5.0 runs normally in a win7 environment. I also tested it on win11, where it was not compatible. I have not tried other versions.

After the installation is complete, we open cmd and enter the downloaded and decompressed masm 5.0 folder.

Then type masm directly

After starting, masm first displays some version information and then prompts for the name of the source file to compile. Note the [.ASM] in the prompt: it tells us that the default file extension is .asm. For example, if the source file we want to compile is test.asm, we only need to type test here. If the source file does not have the .asm suffix, we must type its full name, for example test.txt.

Here we enter test, because the file we wrote has the .asm suffix.

After typing the source file name, press Enter. The program then prompts for the name of the object file, which is the final result of compiling the source program. The default suffix at the Object filename prompt is .obj. Because the .asm file is compiled into an .obj file of the same name by default, we don't need to specify a file name and can simply press Enter to have the .obj file generated.

After the object file name is settled, the Source listing prompt appears, asking for the name of the listing file. This file is an intermediate result that the compiler produces while turning the source program into the object file. We can tell the compiler not to generate it by simply pressing Enter. If we do ask for it, its suffix is .lst.

Next comes the Cross-reference prompt, asking for the name of the cross-reference file. Like the source listing, this file is an intermediate result produced by the compiler. It is not required, so we can again just press Enter. If we do ask for it, its suffix is .crf.

Finally, the compiler prints its result, listing any warnings and any errors that must be corrected. As can be seen from the above figure, our program has no warnings and no compilation errors.
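The default-name rules behind these prompts can be modeled in a few lines of Python. This is only a sketch of the behavior described above, with made-up function names. It is not how masm is implemented, and it ignores details such as upper-case file names.

```python
from pathlib import PureWindowsPath


def source_file(answer: str) -> str:
    """Apply the default .asm extension when the typed name has none."""
    return answer if PureWindowsPath(answer).suffix else answer + ".asm"


def output_files(source: str, listing: str = "", cross_ref: str = "") -> dict:
    """Model the three follow-up prompts; an empty answer means 'just press Enter'."""
    stem = PureWindowsPath(source).stem
    return {
        "object": stem + ".obj",                                  # always produced
        "listing": listing + ".lst" if listing else None,         # optional
        "cross_ref": cross_ref + ".crf" if cross_ref else None,   # optional
    }


print(source_file("test"))         # prints test.asm
print(source_file("test.txt"))     # prints test.txt
print(output_files("test.asm"))
```

Pressing Enter at every prompt therefore leaves us with exactly one new file, test.obj.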

When entering the source file name, include its path. If you run into the "unable to open input file" problem, it is best to put the assembly program directly on the C drive. I put mine on the desktop, which is C:\Users\Administrator\Desktop, and this error still occurred there.

`Link`

After compiling the source program into an object file, we need to link the object file to get an executable file. We got the .obj file in the previous step, and now we need to link it into an .exe file, which is the executable.

To do this we use the Microsoft Overlay Linker 3.60, whose file name is link.exe. It does not need to be downloaded separately (the package obtained through my public account includes both the compiler and the linker, and after decompression they are both in the masm folder).

Now we open the DOS prompt, cd into the folder that contains link.exe, and type link.

After starting, link displays some version information and then prompts for the name of the object file to be linked. Note that the default suffix here is .obj, so if the file to be linked is an .obj file you don't need to type the suffix. If it is not, you need to type the full name.

We have just compiled test.obj, so we link this file directly.

Type the name of the file to be linked (you still need to include the path where the .obj file is located) and press Enter.

After pressing Enter, three more prompts follow.

The first prompt asks for the name of the executable file to be generated, which is the final result of linking the program. The default name is TEST.EXE, so we don't need to specify a file name. You can also specify the directory in which the executable is generated here. We don't need that, so we continue.

The second prompt asks for the name of the map (image) file. This file is an intermediate result produced while the linker turns the object file into an executable file. We can let the linker skip this file and continue.

The third prompt asks for the names of library files. A library file contains subprograms that can be called. If the program calls a subprogram from a library, you need to name the library here. Otherwise you can leave it empty.

At the end there is a warning: no stack segment. I used to think that no executable was generated when this message appeared, but after checking carefully I found that it is only a warning, and the final executable is in the masm folder. Here is a screenshot.

This message just tells us that the program has no stack segment, and we can safely ignore it. Of course, if your program has real errors, linking will not produce an executable.

Linking is very useful. In the final analysis it serves three main purposes:

When a source program is very large, it can be split into several source files that are compiled separately. Once each has been compiled into its own object file, they are linked together into one executable file.
When the program calls a subroutine in a library file, the library file and the object file must be linked together to generate an executable file.
Some of the content in the machine code file generated by compilation cannot be executed directly. The linker converts this content into final executable information so that the compiled machine code file can become an executable file.
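The first and third purposes can be illustrated with a toy linker in Python. Everything here is made up for the illustration: the module format (code, defs, refs), the symbol names, and the cell values, which are arbitrary numbers rather than real 8086 machine code. It is a sketch of the idea, not of how link.exe works.

```python
def link(modules: list[dict]) -> list[int]:
    """Concatenate module code and patch every symbol reference with its final address."""
    image, symbols, pending = [], {}, []
    for module in modules:
        base = len(image)                      # where this module lands in the image
        for name, offset in module["defs"].items():
            symbols[name] = base + offset      # relocate each definition
        for offset, name in module["refs"].items():
            pending.append((base + offset, name))
        image.extend(module["code"])
    for position, name in pending:
        if name not in symbols:
            raise KeyError(f"unresolved symbol: {name}")
        image[position] = symbols[name]        # fill in the placeholder
    return image


# Cell 1 of main_obj is a placeholder for the address of "show",
# which is defined in a separately compiled module.
main_obj = {"code": [10, 0, 11], "defs": {"main": 0}, "refs": {1: "show"}}
lib_obj = {"code": [12, 11], "defs": {"show": 0}, "refs": {}}

print(link([main_obj, lib_obj]))
# prints [10, 3, 11, 12, 11]
```

Before linking, the placeholder in main_obj cannot be executed meaningfully because the address of show is not yet known. After linking, it holds 3, the position where show ended up in the combined image.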

`Execute application`

Now I have an asm file in my left hand, an obj file in my right hand, and an exe file in my mouth. After all that effort, I have finally turned the asm file into an exe file. I'm tired, but there is no time to rest: one last step remains, running it!

So we run the TEST.EXE file.

I'm a bit confused. Why is there nothing? Where is the output?

After thinking about it: of course, we did not call anything that outputs information to the console. We only moved data into registers and did some additions.

We can certainly output information to the console, but we will demonstrate that later.

`Briefly talk about the loading process of the program`

As we all know, for a program to be executed it must first be loaded into memory, after which the CPU fetches instructions from memory and executes them.

So, when we use DOS, who is responsible for loading executable programs into memory?

In DOS, there is a program called command.com, which is the shell of the DOS system.

After DOS starts, it first initializes itself and then runs command.com. Once command.com has finished its other startup tasks, it displays a prompt on the screen and waits for user input.

If the user enters a built-in command, such as cd or dir, command executes it itself and then waits for user input again.

If the user enters the name of a program to run, command finds the executable file by its file name, loads it into memory, and sets CS:IP to the program's entry point. command then pauses and the CPU executes the program. When the program finishes, it returns to command, and command waits for user input again.
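The decision that command makes for each line of input can be sketched in Python. This is a loose model: the list of built-in commands, the DISK table, and the CS:IP value are all invented for the illustration, and nothing is actually loaded or run.

```python
INTERNAL = {"cd", "dir"}                # assumed subset of built-in commands
DISK = {"test.exe": (0x0B3D, 0x0000)}   # file name -> made-up CS:IP entry point


def handle(line: str, disk: dict = DISK) -> str:
    """Describe what the shell does with one line of user input."""
    words = line.split()
    if not words:
        return "wait for input"
    name = words[0].lower()
    if name in INTERNAL:
        return f"command runs built-in {name}"
    for candidate in (name, name + ".exe"):
        if candidate in disk:
            cs, ip = disk[candidate]
            return (f"load {candidate}, set CS:IP = {cs:04X}:{ip:04X}, "
                    "run it, return to command")
    return "bad command or file name"


for typed in ("cd masm", "test", "nothere"):
    print(f"{typed!r}: {handle(typed)}")
```

The last step of the second case, returning to command, only happens if the program itself contains the program return described earlier.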

Therefore, the execution process of a complete assembly program is as follows.

If this article is well written and helpful to you, then I would like to ask for a like!


程序员cxuan
