Java Debugging Technology JPDA Architecture Interpretation

Original link: "Java Debugging Technology JPDA Architecture Interpretation" http://www.ytbean.com/posts/java-debug-internals/

JPDA overview

JPDA stands for Java Platform Debugger Architecture, the official architecture Java provides for debugging Java code; Oracle's website has a dedicated page describing it. It is a multi-layer architecture made up of the JVMTI interface specification, the JDWP communication protocol specification, and the JDI API layer.

[Figure: JPDA components]

Every programmer has used a debugger. In everyday IDE-based development, we set a breakpoint on a line of code and run the program in Debug mode. When execution reaches the breakpoint, the program pauses, and the developer can inspect the current value of each variable and add further breakpoints.

JVMTI
The JVM Tool Interface (JVMTI) defines a set of debugging-related interfaces that the VM implements. When we say the program is paused, what really happens is that the JVM stops at the breakpoint location while running the code. The JVM must therefore offer the outside world a way to tell it where the breakpoints are, for example which class they are in. That mechanism is JVMTI, a hook-like interface provided by the JVM. Through JVMTI, a client can instruct the JVM to perform certain operations, such as stopping at a breakpoint, and the JVM can notify interested external parties through these hooks when certain events occur while it is running.
Not just anyone can talk to these hooks: the JVM requires the caller to be a JVMTI agent. The JDK ships native JVMTI agents for each operating system. On Windows an agent is a DLL file, and on Unix-like systems it is a shared object (.so) file. A JVMTI agent runs in the same process as the JVM, and therefore on the same machine.
JDWP
To make the JVM do something through JVMTI, you first have to talk to the agent, which relays the request on your behalf. For this purpose, the JVMTI agent contains a built-in module called the "communication backend" that receives external requests.
Any third party that wants to talk to the JVMTI agent therefore talks to this communication backend, and communication requires a protocol. That protocol is JDWP (Java Debug Wire Protocol).
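The JDWP specification requires a handshake as soon as the transport connection is open: the debugger sends the 14 ASCII bytes `JDWP-Handshake`, and the target VM echoes the same 14 bytes back before any packets are exchanged. Below is a minimal sketch of the debugger side. It assumes the caller already holds the connection's input and output streams. The class name `JdwpHandshake` is hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class JdwpHandshake {
    // The 14 ASCII bytes both sides exchange before any JDWP packet is sent
    static final byte[] HANDSHAKE = "JDWP-Handshake".getBytes(StandardCharsets.US_ASCII);

    // Debugger side: send the magic string, then expect the target VM to echo it back.
    // Returns true if the reply matches.
    public static boolean perform(InputStream in, OutputStream out) throws IOException {
        out.write(HANDSHAKE);
        out.flush();
        byte[] reply = new byte[HANDSHAKE.length];
        // readFully blocks until all 14 bytes arrive, or throws EOFException
        new DataInputStream(in).readFully(reply);
        return Arrays.equals(reply, HANDSHAKE);
    }
}
```

With a real connection, pass the streams of a `java.net.Socket` connected to a debuggee that was started with `server=y`.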
JDI
When Java programmers debug with IDEs such as Eclipse and IntelliJ IDEA, the IDE communicates with the JVMTI agent of the target JVM through JDWP. Implementing JDWP directly is tedious, so Java also provides a basic library called JDI (Java Debug Interface) in the com.sun.jdi package. JDI implements JDWP and wraps the details of communicating with the JVMTI agent in a Java API, which makes it easy for third parties to talk to the agent. Mirroring the agent's communication backend, JDI contains a communication front-end module, which encodes and decodes JDWP messages and sends and receives them.

Three-tier model

JPDA's abstraction is organized into three layers:

Layer 1: the debugger, whose API is defined by JDI;
Layer 2: the communication layer, whose protocol is defined by JDWP;
Layer 3: the debuggee, where JVMTI defines how to interact with the target JVM.

[Figure: Interaction between JPDA modules]

Why does JPDA use three layers? There are several reasons:

The debugger may run remotely, and JDWP is a low-level binary protocol that is costly to implement. JDI therefore implements JDWP once and exposes it to the outside as an API.
JDI does more than implement JDWP. It also provides services such as event queues, caching, and connection initialization, all of which are available through the JDI API.
JVMTI decouples debugging from any particular JVM. Each JVM only needs to implement the JVMTI specification, so JDWP does not have to assume which kind of JVM it is talking to.

Integration options

Can we implement JDWP ourselves and talk to the JVMTI agent without using JDI? Of course.

Can we skip JDWP entirely and write native C/C++ code that runs as a JVMTI agent inside the JVM process and works with the target JVM directly? That is also possible.

The right choice depends on the requirements:

If we only need to build a debugger, such as the one in an IDE, we can use JDI directly.
If our debugger is not written in Java, we need to implement JDWP ourselves.
If JDI/JDWP do not provide what we need, for example stack analysis, we can implement it directly with JVMTI.

JVMTI offers the most complete functionality. JDWP exposes only part of it, and JDI covers only debugging-related features. In terms of functionality, each layer is contained in the one beneath it.

Communication mechanism

The debugger and the debugged JVM need some way to communicate. The communication mechanism consists of two main parts:

Connector
Communication method (Transport)

A connector establishes the connection between the debugger and the JVM being debugged. JPDA implements connectors in the JDI layer.

The transport determines how data is exchanged between the debugger and the debugged JVM, and JDWP specifies the format of the messages exchanged.

Connector

There are three types of connectors:

Listening: the debugger listens for a connection from the JVM being debugged;
Attaching: the debugger connects to a debuggee JVM that is already running;
Launching: the debugger starts the debuggee JVM itself, as a separate process, and connects to it;
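JDI exposes these three kinds of connectors through `com.sun.jdi.VirtualMachineManager`. The following small sketch prints the connectors of each kind that the running JDK provides. On a typical JDK the output should include `com.sun.jdi.CommandLineLaunch`, `com.sun.jdi.SocketAttach`, and `com.sun.jdi.SocketListen`, but the exact list varies by platform. The sketch assumes the `jdk.jdi` module is available (or `tools.jar` on JDK 8). The class name is hypothetical.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachineManager;
import com.sun.jdi.connect.Connector;
import java.util.List;
import java.util.stream.Collectors;

public class ListConnectors {
    // Extracts the names of the given connectors
    static List<String> names(List<? extends Connector> connectors) {
        return connectors.stream().map(Connector::name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        VirtualMachineManager vmm = Bootstrap.virtualMachineManager();
        System.out.println("Launching: " + names(vmm.launchingConnectors()));
        System.out.println("Attaching: " + names(vmm.attachingConnectors()));
        System.out.println("Listening: " + names(vmm.listeningConnectors()));
    }
}
```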

Transport

The debugger and the debugged JVM can exchange data in two ways:

Socket-based: a network connection, mainly used for remote debugging, where the debugger and the debugged JVM run on different machines;

Shared memory: communication through operating-system shared memory, mainly used when the debugger and the debugged JVM run on the same machine (in the JDK, this transport is available only on Windows);
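Each JDI connector reports the transport it uses through `Connector.transport().name()`, which returns names such as `dt_socket` or `dt_shmem`. The sketch below groups connectors by transport name. The class and method names are hypothetical, and the null check is a defensive assumption.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.connect.Connector;
import java.util.List;
import java.util.stream.Collectors;

public class ConnectorsByTransport {
    // Names of all connectors whose transport has the given name, e.g. "dt_socket"
    static List<String> connectorsFor(String transportName) {
        return Bootstrap.virtualMachineManager().allConnectors().stream()
                .filter(c -> c.transport() != null
                        && transportName.equals(c.transport().name()))
                .map(Connector::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("dt_socket: " + connectorsFor("dt_socket"));
        // Expected to be empty on platforms without the shared-memory transport
        System.out.println("dt_shmem:  " + connectorsFor("dt_shmem"));
    }
}
```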

Configuration

The JVM to be debugged must be started with JVM options that load the JDWP agent. The debugger, whether an IDE, jdb, or a JDI program, then connects to that agent through a connector.

For JDK 5 and later, the option format is: -agentlib:jdwp={sub-options}

For versions earlier than JDK 5, the format is: -Xdebug -Xrunjdwp:{sub-options}

The sub-options include:

transport: the transport to use, either dt_socket (socket network communication) or dt_shmem (shared memory communication)
address: the transport address. For dt_socket it has the form [{host}:]{port}; for dt_shmem it is the name of a shared-memory region. With server=y the agent listens at this address; with server=n it connects to a debugger at this address.
server: y means the debuggee's JDWP agent listens for a debugger to attach; n means the agent connects to a debugger that is already listening at address.
suspend: y means the JVM suspends before the main class is loaded, so the Java application does not start until a debugger is connected; n means the Java application starts immediately. The default is y.

Here "application" is meant relative to the JVM. If the JVM is viewed as a platform, the code we write is the Java application running on it. With suspend=y, the JVM itself has started but our application code has not yet run; this is what "the Java application has not started" means above.
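The sub-options are joined into one comma-separated string after `-agentlib:jdwp=`. The following sketch assembles that string from a map of sub-options. The class name `JdwpOptions` is hypothetical, and the values shown simply repeat the socket example in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class JdwpOptions {
    // Joins sub-options into the JDK 5+ form: -agentlib:jdwp=key=value,key=value,...
    static String agentlib(Map<String, String> subOptions) {
        StringJoiner joiner = new StringJoiner(",", "-agentlib:jdwp=", "");
        subOptions.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps insertion order
        opts.put("transport", "dt_socket");
        opts.put("address", "localhost:7007");
        opts.put("server", "y");   // the agent listens for a debugger to attach
        opts.put("suspend", "y");  // hold the application until a debugger connects
        // prints -agentlib:jdwp=transport=dt_socket,address=localhost:7007,server=y,suspend=y
        System.out.println(agentlib(opts));
    }
}
```

The resulting string is passed to `java` before the main class name.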

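Configuration examples:

The debuggee listens for a debugger on socket port 7007 (server=y) and waits for it before running the application (suspend=y). Because the address is localhost, only debuggers on the same machine can connect; to accept remote debuggers on JDK 9 and later, use address=*:7007.

```
-agentlib:jdwp=transport=dt_socket,address=localhost:7007,server=y,suspend=y
```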

The debuggee listens on shared memory and lets the application start immediately (suspend=n):
```
-agentlib:jdwp=transport=dt_shmem,server=y,suspend=n
```

The debuggee connects out to a debugger that is already listening at localhost:7007 (server=n):

```
-agentlib:jdwp=transport=dt_socket,address=localhost:7007,server=n,suspend=y
```
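In this mode the debugger must be listening first. Below is a sketch of that debugger side using JDI's `com.sun.jdi.SocketListen` connector. Port 7007 matches the example above; the class and method names are hypothetical.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.IllegalConnectorArgumentsException;
import com.sun.jdi.connect.ListeningConnector;
import java.io.IOException;
import java.util.Map;

public class ListenForDebuggee {
    // Finds the socket-based listening connector shipped with the JDK
    static ListeningConnector socketListener() {
        for (ListeningConnector c : Bootstrap.virtualMachineManager().listeningConnectors()) {
            if (c.name().equals("com.sun.jdi.SocketListen")) {
                return c;
            }
        }
        throw new IllegalStateException("SocketListen connector not available");
    }

    // Waits for a debuggee started with server=n,address=localhost:<port>
    public static VirtualMachine waitForDebuggee(int port)
            throws IOException, IllegalConnectorArgumentsException {
        ListeningConnector connector = socketListener();
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("port").setValue(String.valueOf(port));
        connector.startListening(args);
        try {
            return connector.accept(args); // blocks until the debuggee connects
        } finally {
            connector.stopListening(args);
        }
    }

    public static void main(String[] args) throws Exception {
        VirtualMachine vm = waitForDebuggee(7007);
        System.out.println("Connected to " + vm.name() + " " + vm.version());
        vm.resume();  // the debuggee was started with suspend=y
        vm.dispose(); // detach; the debuggee keeps running
    }
}
```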

The debuggee connects to a debugger that is listening on the shared-memory address <mysharedmem> (server defaults to n):
```
-agentlib:jdwp=transport=dt_shmem,address=<mysharedmem>
```

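The debuggee listens on shared memory and, when an uncaught exception is thrown (onuncaught=y), launches the debugger program d:\bin\debugstub.exe (launch=...):

```
-agentlib:jdwp=transport=dt_shmem,server=y,onuncaught=y,launch=d:\bin\debugstub.exe
```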

When the debuggee listens on shared memory without an explicit address, it prints the generated shared-memory address to the console, and the debugger must be configured with that address in order to connect.

`JDI`

`Features`

Provides debugging-related Java APIs;
Inspects the state of a running JVM, including classes, arrays, interfaces, primitive types, and the number of instances of these types;
Controls execution, for example suspending and resuming threads;
Sets breakpoints and watches for events such as exceptions, class loading, and thread creation;
Provides different connector implementations, such as a socket-based remote connector and a shared-memory-based local connector;

`Technology Architecture`

An event mechanism
An encoder/decoder (codec) for the JDWP protocol

`Usage`

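To use JDI on JDK 8 and earlier, add the tools.jar that ships with the JDK to the classpath. Since JDK 9, tools.jar no longer exists and JDI is provided by the jdk.jdi module. The JDI classes live in the com.sun.jdi package and its subpackages.

The basic steps are roughly as follows:

Get the VirtualMachineManager via Bootstrap.virtualMachineManager() and choose a Connector from it;
Use the Connector to obtain a VirtualMachine instance for the target JVM;
Use the VirtualMachine's EventRequestManager to register for the events we are interested in.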
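The first two steps can be sketched with the standard `com.sun.jdi.SocketAttach` connector. This example attaches to a debuggee started with `transport=dt_socket,server=y,address=localhost:7007`. Host, port, and class name are illustrative assumptions.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.IllegalConnectorArgumentsException;
import java.io.IOException;
import java.util.Map;

public class AttachToDebuggee {
    // Step 1: pick the socket attaching connector from the VirtualMachineManager
    static AttachingConnector socketAttach() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if (c.name().equals("com.sun.jdi.SocketAttach")) {
                return c;
            }
        }
        throw new IllegalStateException("SocketAttach connector not available");
    }

    // Step 2: use the connector to obtain a VirtualMachine mirror of the debuggee
    public static VirtualMachine attach(String host, int port)
            throws IOException, IllegalConnectorArgumentsException {
        AttachingConnector connector = socketAttach();
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(String.valueOf(port));
        return connector.attach(args);
    }

    public static void main(String[] args) throws Exception {
        VirtualMachine vm = attach("localhost", 7007);
        // Step 3 would continue with vm.eventRequestManager()
        System.out.println("Attached to " + vm.name() + ", " + vm.allThreads().size() + " threads");
        vm.dispose();
    }
}
```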

Event mechanism code example:

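```
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.event.*;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.MethodEntryRequest;

public class MethodEntryListener {
    // Set to false from another thread to stop the loop
    private volatile boolean running = true;

    // vm is assumed to have been obtained from a Connector beforehand
    public void listen(VirtualMachine vm) throws InterruptedException {
        EventRequestManager em = vm.eventRequestManager();
        // Request an event whenever a method of a class in package mypckg is entered
        MethodEntryRequest meR = em.createMethodEntryRequest();
        meR.addClassFilter("mypckg.*");
        meR.enable();

        EventQueue eventQ = vm.eventQueue();
        while (running) {
            // Blocks until the target VM delivers the next set of events
            EventSet eventSet = eventQ.remove();
            EventIterator eventIterator = eventSet.eventIterator();
            while (eventIterator.hasNext()) {
                Event event = eventIterator.nextEvent();
                if (event instanceof MethodEntryEvent) {
                    // process this event
                }
            }
            // Resume the threads suspended by this event set once all its events are handled
            eventSet.resume();
        }
    }
}
```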

`JDWP`

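`Message format`

Request message (command packet): length (4 bytes, size of the whole packet including the header), id (4 bytes, used to match the reply), flags (1 byte, 0 for a command), command set (1 byte), command (1 byte), followed by the data.

Response message (reply packet): length (4 bytes), id (4 bytes, the same as the request's), flags (1 byte, 0x80 marks a reply), error code (2 bytes, 0 on success), followed by the data.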
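JDWP packets share an 11-byte header: length and id, then flags, then either command set and command (requests) or a 2-byte error code (replies), all big-endian. The following minimal sketch uses `java.nio.ByteBuffer` to encode a command packet and read the error code of a reply. Command set 1 / command 1 is VirtualMachine.Version in the JDWP specification. The class name `JdwpPacket` is hypothetical.

```java
import java.nio.ByteBuffer;

public class JdwpPacket {
    static final int HEADER_SIZE = 11;          // length(4) + id(4) + flags(1) + 2 bytes
    static final byte REPLY_FLAG = (byte) 0x80;

    // Encodes a command packet; ByteBuffer is big-endian by default, as JDWP requires
    static byte[] command(int id, int commandSet, int command, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + data.length);
        buf.putInt(HEADER_SIZE + data.length);  // length includes the header itself
        buf.putInt(id);
        buf.put((byte) 0);                      // flags: 0 for a command packet
        buf.put((byte) commandSet);
        buf.put((byte) command);
        buf.put(data);
        return buf.array();
    }

    // Reads the error code from a complete reply packet; 0 means success
    static int replyErrorCode(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int length = buf.getInt();
        int id = buf.getInt();
        byte flags = buf.get();
        if (flags != REPLY_FLAG || length != packet.length) {
            throw new IllegalArgumentException("not a complete reply packet (id " + id + ")");
        }
        return buf.getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        // VirtualMachine (command set 1) / Version (command 1) takes no arguments
        byte[] versionCmd = command(1, 1, 1, new byte[0]);
        System.out.println(versionCmd.length); // 11: header only
    }
}
```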

`Command sets`

| Command set | Commands |
| --- | --- |
| VirtualMachine | Version, ClassesBySignature, Suspend, Resume, etc. |
| ReferenceType | Signature, ClassLoader, Fields, Methods, etc. |
| ClassType | Superclass, SetValues, InvokeMethod, NewInstance |
| ArrayType | NewInstance |
| InterfaceType | |
| Method | LineTable, VariableTable, Bytecodes, IsObsolete, etc. |
| Field | |
| ObjectReference | ReferenceType, GetValues, SetValues, MonitorInfo, etc. |
| StringReference | Value |
| ThreadReference | Name, Suspend, Resume, Status, ThreadGroup, Frames, etc. |
| ThreadGroupReference | Name, Parent, Children |
| etc. | |
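Commands that take arguments carry them in the packet's data section. The sketch below builds a VirtualMachine.ClassesBySignature request (command set 1, command 2 in the JDWP specification) for the JNI signature `Ljava/lang/String;`. It assumes JDWP's string encoding, a 4-byte length followed by the UTF-8 bytes. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ClassesBySignatureCommand {
    static final int VIRTUAL_MACHINE_SET = 1;   // VirtualMachine command set
    static final int CLASSES_BY_SIGNATURE = 2;  // ClassesBySignature command

    // JDWP strings: a 4-byte big-endian length followed by that many UTF-8 bytes
    static byte[] jdwpString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    // Builds a complete command packet asking for all loaded classes with this signature
    static byte[] build(int id, String signature) {
        byte[] data = jdwpString(signature);
        ByteBuffer buf = ByteBuffer.allocate(11 + data.length);
        buf.putInt(11 + data.length).putInt(id).put((byte) 0) // length, id, flags
           .put((byte) VIRTUAL_MACHINE_SET).put((byte) CLASSES_BY_SIGNATURE).put(data);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] packet = build(42, "Ljava/lang/String;");
        System.out.println(packet.length); // 33 = 11-byte header + 4-byte length + 18 bytes
    }
}
```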

`JVMTI`

JVMTI was introduced in Java 5, replacing JVMDI and JVMPI, both of which have since been removed from the JDK.

`Interface definition`

JVMTI defines a set of interfaces for debugging that the JVM must implement. They fall broadly into three categories:

Information queries, such as getting the current heap memory usage;
Actions, such as setting a breakpoint;
Notifications, such as notifying listeners when a breakpoint is hit.

`Agent`

Agents are native code: they can be written in C, C++, or any native language that supports C calling conventions.
The functions, events, data types, and constants are declared in the header file jvmti.h.
The agent runs in the same process as the target JVM.
Multiple agents can run at the same time, each independent of the others.
The JDK already ships a debugging agent that implements JDWP: jdwp.dll on Windows and libjdwp.so on Linux.

[Figure: Agent lifecycle sequence diagram]

Each agent has startup functions that the JVM calls. For an agent loaded when the JVM starts, the JVM calls Agent_OnLoad; for an agent attached to a JVM that is already running, it calls Agent_OnAttach instead.

When the agent is about to be unloaded, the JVM calls Agent_OnUnload.

An agent is loaded at JVM startup through one of these JVM options:

-agentlib:{agent-lib-name}={options}. For example, with -agentlib:myagent the JVM searches the PATH for myagent.dll on Windows, and the LD_LIBRARY_PATH for libmyagent.so on Unix-like platforms.
-agentpath:{path-to-agent}={options}. This form gives the absolute path of the agent library, for example: -agentpath:d:\myagent\MyAgent.dll