
Comparison of Dalvik and Java bytecode

This post describes the main similarities and differences between Dalvik and Java bytecode. Understanding these differences is important if you want to analyze the behavior of Android applications, including malicious behavior.

Android applications are usually written in Java and executed in the Dalvik Virtual Machine (DVM), which differs from the classic Java Virtual Machine (JVM). The DVM was developed by Google and optimized for the constraints of mobile operating systems, the Android platform in particular. The bytecode that runs on Dalvik is produced by the dx tool, which translates Java .class files. The JVM, by contrast, executes the .class files directly. To reverse engineer an Android application, you need to understand the Dalvik bytecode format and have solid knowledge of static and dynamic analysis. William Enck and his co-authors summarize the differences between JVM and DVM bytecode in "A Study of Android Application Security".

  1. Android application architecture

JVM bytecode consists of one or more .class files, each containing one Java class. At runtime, the JVM dynamically loads the bytecode of each class from the corresponding .class file. Dalvik bytecode consists of a single .dex file that contains all classes of the application. The following figure shows how the .dex file is generated. After the Java compiler has produced the JVM bytecode, the Dalvik dx compiler consumes all .class files, recompiles them into Dalvik bytecode, and merges the result into one .dex file. This process involves the translation, reconstruction, and interpretation of three basic elements of the application: the constant pool, the class definitions, and the data segment. The constant pool describes all constants, including references, method names, and numeric constants. The class definitions contain access flags, class names, and so on. The data segment contains all method code executed by the target VM, related information about classes and methods (such as the number of DVM registers used, the local variable table, and the size of the operand stack), and the instance variables.

[Figure: generation of the .dex file from .class files by the dx compiler]
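To make the two container formats concrete, the following minimal sketch tells them apart by their leading bytes. It relies on two well-known facts: a .class file starts with the magic number 0xCAFEBABE, and a .dex file starts with the ASCII bytes `dex\n` followed by a version string. The class name `BytecodeContainerSniffer` and its `identify` method are hypothetical names chosen for this illustration, not part of any Android or JDK API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class BytecodeContainerSniffer {
    // Every JVM .class file starts with the 4-byte magic 0xCAFEBABE.
    private static final byte[] CLASS_MAGIC =
            {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
    // Every .dex file starts with "dex\n", followed by a version and a NUL byte.
    private static final byte[] DEX_MAGIC =
            "dex\n".getBytes(StandardCharsets.US_ASCII);

    // Returns "class", "dex", or "unknown" based on the first bytes of a file.
    static String identify(byte[] header) {
        if (startsWith(header, CLASS_MAGIC)) return "class";
        if (startsWith(header, DEX_MAGIC)) return "dex";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        // Sample headers built in memory; a real tool would read them from disk.
        byte[] dexHeader = "dex\n035\0".getBytes(StandardCharsets.US_ASCII);
        byte[] classHeader =
                {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(identify(dexHeader));
        System.out.println(identify(classHeader));
    }
}
```

In practice the header bytes would come from the first few bytes of a file inside an APK or JAR; the in-memory arrays above only keep the sketch self-contained.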

  2. Register architecture

The DVM is register-based, while the JVM is stack-based. In JVM bytecode, local variables are kept in a local variable table and pushed onto the operand stack before an opcode operates on them. The JVM can also work directly on the stack without explicitly storing values in the local variable table. In Dalvik bytecode, local variables are assigned to any of the 2^16 (65,536) available registers. Dalvik opcodes do not access elements on a stack; they operate directly on registers.
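A small method makes the contrast visible. The sketch below shows a trivial `add` method together with the instructions each VM would typically execute for it. The JVM listing is what `javap -c` prints for this method; the Dalvik listing is an approximation, since the exact registers and instruction variants depend on the tool chain. The class name `AddExample` is a hypothetical name used only here.

```java
final class AddExample {
    // JVM (stack-based), as printed by `javap -c`:
    //   iload_0      // push local variable 0 (a) onto the operand stack
    //   iload_1      // push local variable 1 (b)
    //   iadd         // pop both values, push a + b
    //   ireturn      // pop the result and return it
    //
    // Dalvik (register-based), roughly what a dex disassembler shows
    // (assumed output; register names vary between tools):
    //   add-int v0, p0, p1   // v0 = p0 + p1, no stack traffic
    //   return v0
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

The JVM version needs four short instructions, two of which only move data onto the stack, while the Dalvik version needs two longer instructions that name their registers explicitly.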

  3. Instruction set

Dalvik has 218 opcodes and Java has 200, but the two sets are fundamentally different in nature. For example, Java has more than a dozen opcodes dedicated to moving data between the operand stack and the local variable table, whereas Dalvik has none. Dalvik instructions tend to be longer than Java instructions because most of them encode their source and destination registers. For a comprehensive overview of Dalvik opcodes, see the blog posts by Gabor Paller and by the Android developers.

  4. Constant pool structure

In JVM bytecode, constants such as the names of referenced methods are duplicated across the constant pools of the individual .class files. Dalvik uses a single constant pool that all classes reference, so the dx compiler eliminates this duplication. In addition, dx removes some constants by inlining their values directly into the bytecode. As a result, integer, long integer, and single- and double-precision floating-point constants disappear from the constant pool during dx compilation.
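The effect of a single shared pool can be pictured as a deduplicating merge of the per-class pools. The sketch below is a deliberate simplification: it treats every pool as a flat list of strings, whereas a real .dex file keeps separate tables for strings, types, fields, and methods. The class `SharedPoolSketch`, its methods, and the sample entries are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model: each class has its own list of constants (JVM style),
// which is merged into one deduplicated pool (Dalvik style).
final class SharedPoolSketch {
    // Total number of entries when every class keeps its own pool.
    static int perClassEntryCount(Map<String, List<String>> poolsByClass) {
        int total = 0;
        for (List<String> pool : poolsByClass.values()) {
            total += pool.size();
        }
        return total;
    }

    // One shared pool in which each distinct constant is stored once.
    static Set<String> mergeIntoSharedPool(Map<String, List<String>> poolsByClass) {
        Set<String> shared = new LinkedHashSet<>();
        for (List<String> pool : poolsByClass.values()) {
            shared.addAll(pool);
        }
        return shared;
    }

    public static void main(String[] args) {
        // Made-up constants for a caller and the class it calls.
        Map<String, List<String>> pools = Map.of(
                "Caller", List.of("Callee", "run", "()V", "java/lang/Object"),
                "Callee", List.of("Callee", "run", "()V", "java/lang/Object", "Code"));
        System.out.println(perClassEntryCount(pools));         // 9
        System.out.println(mergeIntoSharedPool(pools).size()); // 5
    }
}
```

The caller and the callee both store the callee's name and method descriptor, so the two separate pools hold nine entries in total, while the shared pool holds only the five distinct ones.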

  5. Ambiguous primitive types

JVM bytecode uses different opcodes for integer and single-precision floating-point constants, and likewise for long integer and double-precision floating-point constants. Dalvik uses the same opcodes for integer and floating-point constants of the same width, so the opcode alone does not reveal the type.
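The ambiguity is easy to reproduce in plain Java: one 32-bit pattern yields two very different values depending on the type it is read as. The sketch below uses the standard `Float.intBitsToFloat` method; the class name `AmbiguousConstant`, the helper names, and the sample instruction in the comment are assumptions made for this illustration.

```java
final class AmbiguousConstant {
    // Reads the 32-bit pattern as an int.
    static int asInt(int rawBits) {
        return rawBits;
    }

    // Reads the same 32-bit pattern as an IEEE 754 single-precision float.
    static float asFloat(int rawBits) {
        return Float.intBitsToFloat(rawBits);
    }

    public static void main(String[] args) {
        // For example, the literal of a 32-bit Dalvik constant load
        // (assumed sample value).
        int raw = 0x3f800000;
        System.out.println(asInt(raw));   // 1065353216
        System.out.println(asFloat(raw)); // 1.0
    }
}
```

A decompiler therefore has to look at how the register is used later, for example by a float addition or an int comparison, before it can decide which literal to print.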

  6. Null references

Dalvik bytecode has no dedicated null type. Instead, Dalvik uses a zero-value constant. The constant 0 is therefore ambiguous (integer zero or null reference), and its meaning must be correctly determined.

  7. Comparison of object references

JVM bytecode uses typed opcodes for comparing object references and for comparing a reference with null. Dalvik reduces both to simpler integer comparisons: a comparison of two values and a comparison of a value with zero. The type information of the compared operands must therefore be recovered during decompilation.
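The three methods below illustrate which comparison opcodes are involved. The JVM opcode names are taken from the JVM instruction set and the Dalvik names from the Dalvik instruction set; which of the paired variants (for example `if-eq` or `if-ne`) a compiler actually emits depends on how it lays out the branch, so the comments list both. The class `ReferenceComparison` is a hypothetical name used only here.

```java
final class ReferenceComparison {
    // JVM: ifnull / ifnonnull (typed null check on a reference).
    // Dalvik: if-eqz / if-nez, the same opcodes used to test an int against 0.
    static boolean isNull(Object o) {
        return o == null;
    }

    // JVM: if_acmpeq / if_acmpne (comparison of two references).
    // Dalvik: if-eq / if-ne, the same opcodes used to compare two ints.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // JVM: if_icmpeq / if_icmpne (comparison of two ints).
    // Dalvik: if-eq / if-ne again, so the operand type must be inferred.
    static boolean sameInt(int a, int b) {
        return a == b;
    }

    public static void main(String[] args) {
        Object shared = new Object();
        System.out.println(isNull(null));
        System.out.println(sameObject(shared, shared));
        System.out.println(sameInt(7, 8));
    }
}
```

On the JVM the opcode alone tells a decompiler whether a reference or an int is being compared; in Dalvik bytecode `sameObject` and `sameInt` look alike at the comparison, which is why the operand types have to be recovered from context.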

  8. Storage of primitive types in arrays

Dalvik uses ambiguously typed opcodes to operate on arrays of primitive types, while the JVM uses explicitly typed opcodes. The array type information must therefore be recovered for a correct conversion.
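As a sketch of what this means for array reads, the comments below pair each Java element access with the load opcode each VM defines for it. The opcode names come from the two instruction sets; the class `ArrayAccess` and its methods are hypothetical names chosen for this illustration.

```java
final class ArrayAccess {
    // JVM: iaload. Dalvik: aget.
    static int firstInt(int[] values) {
        return values[0];
    }

    // JVM: faload. Dalvik: aget (same opcode as for int[]).
    static float firstFloat(float[] values) {
        return values[0];
    }

    // JVM: laload. Dalvik: aget-wide.
    static long firstLong(long[] values) {
        return values[0];
    }

    // JVM: daload. Dalvik: aget-wide (same opcode as for long[]).
    static double firstDouble(double[] values) {
        return values[0];
    }

    public static void main(String[] args) {
        System.out.println(firstInt(new int[] {42}));
        System.out.println(firstFloat(new float[] {1.5f}));
        System.out.println(firstLong(new long[] {42L}));
        System.out.println(firstDouble(new double[] {1.5}));
    }
}
```

Dalvik distinguishes only the element width here, so a decompiler has to trace where the array was created or how the loaded value is used to decide between `int[]` and `float[]`, or between `long[]` and `double[]`.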
