Saturday, February 15, 2014

[infosecinstitute] Java Bytecode Reverse Engineering

Abstract
This article shows how to crack a Java executable by disassembling its bytecode. Disassembling Java bytecode means translating a compiled class file back into a readable form, ranging from a listing of JVM instructions to reconstructed Java source code. Because it exposes implementation details, disassembly is a long-standing concern in the software industry, where it contributes to revenue loss through software piracy. Security engineers have developed countermeasures against Java bytecode disassembly, including software watermarking and code obfuscation. A large part of this paper is dedicated to tactics that are commonly considered reverse engineering. The methods presented here, however, are intended for professional software developers, and each technique is demonstrated on a custom-built application. We are not encouraging any kind of malicious hacking by presenting this article; rather, its contents help to pinpoint vulnerabilities in code and to learn the methods developers can use to shield their intellectual property from reverse engineering. We shall explain disassembly in terms of obtaining sensitive information and cracking a Java executable without having the original source code.

Prerequisite
I presume that the reader has a thorough understanding of programming, debugging, and compiling in Java on platforms such as Linux and Windows and, of course, some knowledge of the JVM's inner workings. Apart from that, the following tools are required for bytecode reverse engineering:
  • JDK (javac, javap)
  • Eclipse
  • JVM
  • JAD
Java Bytecode
Engineers usually construct software in a high-level language such as Java, which is comprehensible to them but cannot be executed by the machine directly. This textual form of a computer program, known as source code, must be converted into a form that can be executed. Java source code is compiled into an intermediate language known as Java bytecode, which is not executed directly by the CPU but by a Java virtual machine (JVM). Compilation is the act of transforming a high-level language into a lower-level one such as machine code or bytecode. We do not need to understand Java bytecode to write Java programs, but doing so can assist debugging and can help to improve performance and memory consumption.
The JVM is essentially a simple stack-based machine whose runtime data can be separated into several areas: the JVM stacks, the heap, the program counter (pc) register, the method area, and the native method stacks. An advantage of the virtual machine architecture is portability: any machine that implements the Java virtual machine specification is able to execute Java bytecode, in the spirit of "Write once, run anywhere." Java bytecode is not strictly tied to the Java language, and many compilers and other tools are available that produce it, such as the Eclipse IDE, NetBeans, and the Jasmin bytecode assembler. Another advantage of the Java virtual machine is the runtime type safety of programs. The Java virtual machine specification defines the required behavior of a Java virtual machine but does not specify any implementation details. Therefore, a Java virtual machine can be implemented in different ways for diverse platforms as long as it adheres to the specification.
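To make the stack-based model concrete, here is a minimal sketch that imitates an operand stack in plain Java. It is only an illustration, not how a real JVM is implemented, and the class name OperandStackDemo is hypothetical. It walks through the sequence iload_0, iload_1, iadd, ireturn, which javac typically emits for a static method that returns the sum of its two int parameters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of stack-based evaluation; not a real JVM.
class OperandStackDemo {

    // Imitates: static int add(int a, int b) { return a + b; }
    static int add(int a, int b) {
        int[] locals = {a, b};                       // local variable slots 0 and 1
        Deque<Integer> stack = new ArrayDeque<>();   // the operand stack

        stack.push(locals[0]);                       // iload_0: push local 0
        stack.push(locals[1]);                       // iload_1: push local 1
        int right = stack.pop();                     // iadd: pop two ints ...
        int left = stack.pop();
        stack.push(left + right);                    // ... and push their sum
        return stack.pop();                          // ireturn: return the top of the stack
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Every instruction takes its inputs from the stack and leaves its result there, which is why the listings later in this article consist mostly of load, invoke, and store steps.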
Sample Cracked Application
The following Java console application, "LoginTest", was developed to demonstrate Java bytecode disassembling. It validates users through a simple user name and password login mechanism. Suppose we obtained this application as an unregistered user and therefore do not possess its source code. As a result, we do not know a valid user name and password, which are provided only to registered users.
Even without the source code or a set of login credentials, we can still manage to log in by disassembling the bytecode, which exposes sensitive information related to user login.
Disassemble Bytecode
Disassembling is the reverse of compiling: it transforms a low-level representation back into a higher-level one, which is feasible for Java because the structure of bytecode is standardized and well documented. We run a disassembler to obtain a readable instruction listing for a given class file (a decompiler goes one step further and regenerates Java source), just as running a compiler yields bytecode from source code. Disassembling is used to ascertain the implementation logic in the absence of documentation and source code, which is why vendors often explicitly prohibit disassembling and reverse engineering in their license agreements. Here are some of the reasons to disassemble or decompile:
  • Fixing critical bugs in the software for which no source code exists.
  • Troubleshooting software or a JAR file that does not have proper documentation.
  • Recovering the source code that was accidentally lost.
  • Learning the implementation of a mechanism.
  • Learning to protect your code from reverse engineering.
Disassembling Java bytecode is quite simple compared with disassembling a native C/C++ binary. The first step is to compile the Java source file, which has the *.java extension, with the javac utility; this produces a *.class file that holds the bytecode. Then, using javap, a utility built into the JDK, we can disassemble the bytecode in that *.class file. The javap utility writes its listing to standard output, which can be redirected to a file such as a *.bc file.
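The two steps described above (javac, then javap) can also be driven from Java itself. The following is a minimal sketch, assuming a full JDK 11 or later is installed; it uses the standard java.util.spi.ToolProvider lookup to find the javac and javap tools. The class name CompileAndDisassemble, the helper names, and the tiny Hello class it compiles are all hypothetical and are not part of the LoginTest application.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

// Sketch: compile a tiny class and disassemble it without leaving Java.
class CompileAndDisassemble {

    // Runs a JDK tool such as "javac" or "javap" and returns everything it printed.
    static String runTool(String name, String... args) {
        ToolProvider tool = ToolProvider.findFirst(name)
                .orElseThrow(() -> new IllegalStateException(name + " not found; a full JDK is required"));
        StringWriter captured = new StringWriter();
        PrintWriter writer = new PrintWriter(captured, true);
        int exitCode = tool.run(writer, writer, args);
        writer.flush();
        if (exitCode != 0) {
            throw new IllegalStateException(name + " failed:\n" + captured);
        }
        return captured.toString();
    }

    // Writes Hello.java to a temporary directory, compiles it, and returns the javap -c listing.
    static String disassembleHello() {
        try {
            Path dir = Files.createTempDirectory("bytecode-demo");
            Path source = dir.resolve("Hello.java");
            Files.writeString(source,
                    "public class Hello {\n"
                  + "    public static void main(String[] args) {\n"
                  + "        System.out.println(\"hi\");\n"
                  + "    }\n"
                  + "}\n");
            // Step 1: Hello.java -> Hello.class
            runTool("javac", "-d", dir.toString(), source.toString());
            // Step 2: Hello.class -> instruction listing
            return runTool("javap", "-c", "-cp", dir.toString(), "Hello");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(disassembleHello());
    }
}
```

On the command line, the same result is obtained by running javac and then javap -c by hand and redirecting the output to a file.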
Opening a *.class file does not by itself give access to the implementation logic. If we open the class file generated by javac in Notepad or any other text editor, we find binary data that is incomprehensible as text. The following figure displays the .class file's data:
Since opening the class file in a text editor is not useful, we can instead inspect it with a hex editor such as WinHex, which shows the implementation logic as hexadecimal bytes along with the strings used in the application. Although it is possible to reverse engineer a Java application or reveal its sensitive information this way, the task is laborious: unless we can map the hex bytes to the corresponding bytecode instructions, we cannot obtain much information.
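The bytes a hex editor shows are not random: every class file begins with a fixed header. As a rough sketch of reading it programmatically, the code below parses the first four fields (magic number, minor version, major version, constant pool count) with a DataInputStream, which reads big-endian values as the class file format requires. The class name ClassHeaderDump is hypothetical, and main assumes that the class file of java.lang.Object is readable as a resource on your JDK.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Sketch: read the fixed-size header at the start of a .class file.
class ClassHeaderDump {

    // Returns {magic, minorVersion, majorVersion, constantPoolCount}.
    static long[] readHeader(InputStream classBytes) {
        try (DataInputStream in = new DataInputStream(classBytes)) {
            long magic = in.readInt() & 0xFFFFFFFFL;   // u4 magic, 0xCAFEBABE in a valid class file
            int minor = in.readUnsignedShort();        // u2 minor_version
            int major = in.readUnsignedShort();        // u2 major_version
            int poolCount = in.readUnsignedShort();    // u2 constant_pool_count
            return new long[] {magic, minor, major, poolCount};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: this resource lookup succeeds on the JDK in use.
        InputStream in = Object.class.getResourceAsStream("/java/lang/Object.class");
        long[] header = readHeader(in);
        System.out.printf("magic=0x%08X minor=%d major=%d constant_pool_count=%d%n",
                header[0], header[1], header[2], header[3]);
    }
}
```

Everything after this header (the constant pool, fields, methods, and their code) is what javap decodes for us in the next section.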
Reversing Bytecode
It is relatively easy to disassemble the bytecode of a Java application compared to other binaries. The javap utility that ships with the JDK plays a significant role in disassembling Java bytecode and helps to reveal sensitive information. It takes the name of a compiled class as an argument, as follows:
Drive:> javap LoginTest
Run without options, javap does not show the code behind the class file; it displays only the declarations of the class's non-private members, as follows:
Compiled from "LoginTest.java"
public class LoginTest {
  public LoginTest();
  public static void main(java.lang.String[]);
  static boolean verify(java.lang.String, char[]);
}
The bytecode of every method in the class is displayed when javap is run with the -c switch, as follows:
Drive:> javap -c LoginTest
This command dumps the bytecode of the program as a sequence of opcode instructions. The meaning of each instruction used in this program is explained in a later section of this paper. The most important part of the listing, from which we can obtain critical information, begins at offset 62 of the main method.
Compiled from "LoginTest.java"
public class LoginTest {
  public LoginTest();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return

  public static void main(java.lang.String[]);
    Code:
       0: invokestatic  #2                  // Method java/lang/System.console:()Ljava/io/Console;
       3: astore_1
       4: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
       7: ldc           #4                  // String Login Verification
       9: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      12: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      15: ldc           #6                  // String ************************
      17: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      20: aload_1
      21: ldc           #7                  // String Enter username:
      23: iconst_0
      24: anewarray     #8                  // class java/lang/Object
      27: invokevirtual #9                  // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
      30: pop
      31: aload_1
      32: invokevirtual #10                 // Method java/io/Console.readLine:()Ljava/lang/String;
      35: astore_2
      36: aload_1
      37: ldc           #11                 // String Enter password:
      39: iconst_0
      40: anewarray     #8                  // class java/lang/Object
      43: invokevirtual #9                  // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
      46: pop
      47: aload_1
      48: invokevirtual #12                 // Method java/io/Console.readPassword:()[C
      51: astore_3
      52: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      55: ldc           #13                 // String -------------------------
      57: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      60: aload_2
      61: aload_3
      62: invokestatic  #14                 // Method verify:(Ljava/lang/String;[C)Z
      65: ifeq          79
      68: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      71: ldc           #15                 // String Status::Login Succesfull
      73: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      76: goto          87
      79: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      82: ldc           #16                 // String Status::Login Failed
      84: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      87: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      90: ldc           #13                 // String -------------------------
      92: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      95: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
      98: ldc           #17                 // String !!!Thank you!!!
     100: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     103: return
}
At offset 62, we can see that the login mechanism is implemented by a method called verify, which checks the user-entered username and password. If verify returns true, the "Status::Login Succesfull" message is printed; otherwise, the ifeq instruction at offset 65 branches to offset 79 and "Status::Login Failed" is printed:
62: invokestatic  #14                 // Method verify:(Ljava/lang/String;[C)Z
65: ifeq          79
68: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
71: ldc           #15                 // String Status::Login Succesfull
73: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
76: goto          87
79: getstatic     #3                  // Field java/lang/System.out:Ljava/io/PrintStream;
82: ldc           #16                 // String Status::Login Failed
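Read as Java, the instructions at offsets 62 to 82 amount to an ordinary if/else. The sketch below is my own reading of that branch, not the original source; the class name BranchSketch and the method statusFor are hypothetical, and the boolean parameter stands in for the value that verify leaves on the stack.

```java
// Sketch of the control flow implied by offsets 62 to 82; not the original source.
class BranchSketch {

    // 'verified' stands in for the result of invokestatic verify at offset 62.
    static String statusFor(boolean verified) {
        if (verified) {                          // 65: ifeq 79 jumps away when the value is 0 (false)
            return "Status::Login Succesfull";   // 71: ldc #15 (spelling as stored in the binary)
        } else {
            return "Status::Login Failed";       // 82: ldc #16
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor(true));
        System.out.println(statusFor(false));
    }
}
```

The goto at offset 76 is simply how the compiler skips the else branch once the success message has been printed.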
At this point we still do not know the username and password. If we analyze the instructions of the verify method, however, we find that both are hard-coded in the code itself, as the operands of its two ldc instructions:
  static boolean verify(java.lang.String, char[]);
    Code:
       0: new           #18                 // class java/lang/String
       3: dup
       4: aload_1
       5: invokespecial #19                 // Method java/lang/String."<init>":([C)V
       8: astore_2
       9: aload_0
      10: ldc           #20                 // String ajay
      12: invokevirtual #21                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      15: ifeq          29
      18: aload_2
      19: ldc           #22                 // String test
      21: invokevirtual #21                 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
      24: ifeq          29
      27: iconst_1
      28: ireturn
      29: iconst_0
      30: ireturn
}
We finally come to the conclusion that this program accepts ajay as the username and test as the password; both appear as operands of the ldc instructions at offsets 10 and 19.
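For readers who prefer source code to opcodes, here is a hedged reconstruction of verify based only on the listing. The original source may well differ in detail; the class name VerifySketch and the variable names are guesses, since the listing does not show local variable names.

```java
// Hedged reconstruction of verify from its bytecode; the original source may differ.
class VerifySketch {

    static boolean verify(String user, char[] password) {
        String pass = new String(password);   // 0-8: new String(char[]), stored in local 2
        if (user.equals("ajay")               // 9-15: compare the username, ifeq 29 on mismatch
                && pass.equals("test")) {     // 18-24: compare the password, ifeq 29 on mismatch
            return true;                      // 27-28: iconst_1, ireturn
        }
        return false;                         // 29-30: iconst_0, ireturn
    }

    public static void main(String[] args) {
        System.out.println(verify("ajay", "test".toCharArray()));
        System.out.println(verify("ajay", "guess".toCharArray()));
    }
}
```

Both ifeq instructions jump to the same offset, 29, which is the usual compiled form of a short-circuit && condition.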
Now launch the application once again and enter these credentials. Bingo! We have successfully subverted the login authentication mechanism without even having the source code.
Bytecode Instruction Specification
As in assembly programming, Java machine code is represented by bytecode opcodes, the instructions that the JVM executes on any platform. Each opcode is one byte long, which allows at most 256 distinct opcodes, each with its own mnemonic. Java bytecode instructions fall into these major categories:
  • Load and store
  • Method invocation and return
  • Control transfer
  • Arithmetical operation
  • Type conversion
  • Object manipulation
  • Operand stack management
We shall discuss only the opcode instructions relevant to the previous Java binary. The following table lists each opcode's hex value and meaning:
Java Opcode     Hex value   Meaning
aload           19          Load a reference onto the stack from a local variable
aload_0         2a          Load a reference onto the stack from local variable 0
aload_1         2b          Load a reference onto the stack from local variable 1
aload_2         2c          Load a reference onto the stack from local variable 2
anewarray       bd          Create a new array of references of length count, with the component type identified by a class reference index into the constant pool
astore          3a          Store a reference into a local variable
astore_0        4b          Store a reference into local variable 0
astore_1        4c          Store a reference into local variable 1
astore_2        4d          Store a reference into local variable 2
dup             59          Duplicate the value on top of the stack
getstatic       b2          Get the value of a static field of a class, where the field is identified by a field reference index into the constant pool
goto            a7          Jump to another instruction at the given branch offset
invokespecial   b7          Invoke an instance method on object objectref (for example, a constructor), where the method is identified by a method reference index into the constant pool
invokestatic    b8          Invoke a static method, where the method is identified by a method reference index into the constant pool
invokevirtual   b6          Invoke a virtual method on object objectref, where the method is identified by a method reference index into the constant pool
ifeq            99          If the value is 0, branch to the instruction at the given branch offset
iconst_0        03          Load the int value 0 onto the stack
iconst_1        04          Load the int value 1 onto the stack
ireturn         ac          Return an integer from a method
ldc             12          Push the constant at the given constant pool index onto the stack
pop             57          Discard the top value on the stack
return          b1          Return void from a method
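To connect the hex values in the table with the listings shown earlier, here is a rough sketch of a tiny disassembler. It knows only a handful of the opcodes above, treats every operand as a big-endian constant pool index, and is in no way a substitute for javap. The class name MiniDisassembler and its helpers are hypothetical. The input bytes 2a b7 00 01 b1 are the encoding of the LoginTest constructor (aload_0, invokespecial #1, return).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: turn raw code bytes back into mnemonics using part of the opcode table.
class MiniDisassembler {

    // opcode byte -> mnemonic
    private static final Map<Integer, String> NAMES = new HashMap<>();
    // opcode byte -> number of operand bytes that follow the opcode
    private static final Map<Integer, Integer> OPERAND_BYTES = new HashMap<>();

    private static void op(int opcode, String name, int operandBytes) {
        NAMES.put(opcode, name);
        OPERAND_BYTES.put(opcode, operandBytes);
    }

    static {
        op(0x03, "iconst_0", 0);
        op(0x04, "iconst_1", 0);
        op(0x12, "ldc", 1);
        op(0x2a, "aload_0", 0);
        op(0x2b, "aload_1", 0);
        op(0x59, "dup", 0);
        op(0xac, "ireturn", 0);
        op(0xb1, "return", 0);
        op(0xb6, "invokevirtual", 2);
        op(0xb7, "invokespecial", 2);
        op(0xb8, "invokestatic", 2);
    }

    // Produces one "offset: mnemonic [#index]" line per instruction.
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = NAMES.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException(
                        String.format("unknown opcode 0x%02x at offset %d", opcode, pc));
            }
            int width = OPERAND_BYTES.get(opcode);
            StringBuilder line = new StringBuilder(pc + ": " + name);
            if (width > 0) {
                int index = 0;
                for (int i = 1; i <= width; i++) {
                    index = (index << 8) | (code[pc + i] & 0xFF);   // operands are big-endian
                }
                line.append(" #").append(index);
            }
            lines.add(line.toString());
            pc += 1 + width;   // the next instruction starts after the operand bytes
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] constructor = {0x2a, (byte) 0xb7, 0x00, 0x01, (byte) 0xb1};
        disassemble(constructor).forEach(System.out::println);
    }
}
```

The offsets it prints (0, 1, 4) match those in the javap listing of the constructor, because invokespecial occupies three bytes: one for the opcode and two for the constant pool index.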
In Brief
This paper illustrated how to disassemble Java bytecode in order to reveal sensitive information when the source of a Java binary is not available. We have seen how to carry out such reverse engineering using JDK utilities. The article also covered the importance of bytecode disassembling and of JVM internals in this context, and explained the meaning of the essential bytecode opcodes. Finally, we saw how to subvert login authentication in a Java console application by applying disassembly tactics. In the forthcoming paper, we shall explain how to patch Java bytecode in the context of reverse engineering.
Reference
http://resources.infosecinstitute.com/demystifying-java-internals-introduction/
