Chapter 37

The Java Virtual Machine



This chapter examines the Java virtual machine (JVM). You will learn about the structure of a .class file, look at the virtual machine architecture, and be given a reference for the JVM instruction set. After you have completed this chapter, you will be able to diagram the internal structure of a .class file and will understand the machine architecture of the JVM.

Overview

When Java was created, the goal was a machine-independent programming language that could be compiled into a portable binary format. In theory, that is exactly what was achieved. Java code is portable to any system that has a Java interpreter. Strictly speaking, however, Java is not machine independent: it targets one specific machine, the Java virtual machine.

The JVM concept allows a layer of translation between the executable program and the machine-specific code. In a non-Java compiler, the source code is compiled into machine-specific assembly code. In doing this, the executable limits itself to the confines of that machine architecture. Compiling Java code creates an executable using JVM assembly directives. The difference between the two approaches is fundamental to the portability of the executable. Non-Java executables communicate directly with the platform's instruction set. Java executables communicate with the JVM instruction set, which is then translated into platform-specific instructions.

Structure of .class Files

Every machine has a certain form for its executable file. Java is no exception. The Java compiler creates its executable files in the form of .class.

.class files are composed of 8-bit values (bytes) that can be read in pairs to form 16-bit values, or in 4-byte groups to form 32-bit values. The bytes are arranged in big-endian order, where the first byte contains the highest-order bits of the value and the last byte contains the lowest-order bits.
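The byte-combining rule can be sketched in a few lines of Java. This is only an illustration: the class name BigEndian and the helpers u2 and u4 are hypothetical names chosen for this example, not part of any Java API.

```java
final class BigEndian {
    // Combine two bytes into an unsigned 16-bit value, high-order byte first.
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Combine four bytes into a 32-bit value, high-order byte first.
    static int u4(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24)
             | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8)
             |  (b[off + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(Integer.toHexString(u2(bytes, 0))); // cafe
        System.out.println(Integer.toHexString(u4(bytes, 0))); // cafebabe
    }
}
```

The mask with 0xFF matters because Java bytes are signed; without it, a byte such as 0xCA would be sign-extended and corrupt the higher-order bits.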

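A .class file itself is broken into 15 separate regions: magic, version, constant_pool_count, constant_pool, access_flags, this_class, super_class, interfaces_count, interfaces, fields_count, fields, methods_count, methods, attributes_count, and attributes.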

The regions are not padded or aligned with one another. Each region can be of either fixed or variable size. Regions that contain variable amounts of information are preceded by a field specifying the size of the variable region. The following sections provide more information about these regions.

Magic

The magic region must contain a value of 0xCAFEBABE.

Version

version holds the version number of the compiler that created the .class file. This is used to specify incompatible changes to either the format of the .class file or to the bytecodes.
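As a minimal sketch, the first two regions could be checked as follows. The class ClassFileHeader and its parse method are hypothetical names for this example, and the code assumes that the version region is stored as a 2-byte minor number followed by a 2-byte major number, which is the layout of the published class file format rather than something stated in this chapter.

```java
import java.nio.ByteBuffer;

final class ClassFileHeader {
    static final int MAGIC = 0xCAFEBABE;

    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    // Reads the magic and version regions from the start of a .class file.
    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        if (in.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic number: not a .class file");
        }
        // Assumption: 2-byte minor version, then 2-byte major version.
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }
}
```

A loader would typically reject the file at this point if the magic value is wrong or the version is one it does not understand, before reading any further regions.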

Constant_pool

constant_pool_count specifies the size of the next region. As noted previously, there is no alignment or padding. Instead, size fields are used to denote the extents of different variable regions. These fields are 2 bytes in length.

constant_pool contains an array of constant_pool_count - 1 items that store string constants, class names, field names, and all constants referenced in the body of the code.

The first byte in every entry of constant_pool contains a type that specifies the content of the entry.

Table 37.1 identifies the items that are contained in the constant pool.

Table 37.1. Constant types.

Constant Type                  Value
CONSTANT_Asciiz                1
CONSTANT_Unicode               2
CONSTANT_Integer               3
CONSTANT_Float                 4
CONSTANT_Long                  5
CONSTANT_Double                6
CONSTANT_Class                 7
CONSTANT_String                8
CONSTANT_Fieldref              9
CONSTANT_Methodref             10
CONSTANT_InterfaceMethodref    11
CONSTANT_NameAndType           12

CONSTANT_Asciiz and CONSTANT_Unicode are represented by a 1-byte reference tag, a 2-byte length specifier, and an array of bytes that is of the specified length.

CONSTANT_Integer and CONSTANT_Float contain a 1-byte tag and a 4-byte value.

CONSTANT_Long and CONSTANT_Double are used to store 8-byte values. The structure begins with a 1-byte tag and includes a 4-byte value containing the high bytes, and a 4-byte value containing the low bytes.

CONSTANT_Class holds a 1-byte tag as well as a 2-byte index into the constant_pool that contains the string name of the class.

CONSTANT_String represents an object of type String. The structure contains two fields, a 1-byte tag, and a 2-byte index into constant_pool, which holds the actual string value encoded using a modified UTF scheme. constant_pool stores only 8-bit values, with the capability of combining them to form 8- and 16-bit characters.

CONSTANT_Fieldref, CONSTANT_Methodref, and CONSTANT_InterfaceMethodref represent their data with a 1-byte tag and two 2-byte indexes into constant_pool. The first index references the class; the second references the name and type.

CONSTANT_NameAndType contains information about constants not associated with a class. The first byte is the tag, followed by two 2-byte indexes into constant_pool specifying the name and signature of the constant.
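Because every entry starts with a tag byte, a reader can step through constant_pool without any alignment information. The following sketch computes the size of one entry from its tag, using the tag values from Table 37.1. The class and method names are hypothetical, and the code assumes that every constant_pool index is 2 bytes wide, as in the published class file format.

```java
final class ConstantPoolSketch {
    // Returns the number of bytes occupied by the entry that starts at
    // offset off, including its 1-byte tag.
    static int entrySize(byte[] pool, int off) {
        int tag = pool[off] & 0xFF;
        switch (tag) {
            case 1:   // CONSTANT_Asciiz
            case 2: { // CONSTANT_Unicode: tag + 2-byte length + that many bytes
                int length = ((pool[off + 1] & 0xFF) << 8) | (pool[off + 2] & 0xFF);
                return 1 + 2 + length;
            }
            case 3:   // CONSTANT_Integer
            case 4:   // CONSTANT_Float: tag + 4-byte value
                return 1 + 4;
            case 5:   // CONSTANT_Long
            case 6:   // CONSTANT_Double: tag + high 4 bytes + low 4 bytes
                return 1 + 8;
            case 7:   // CONSTANT_Class
            case 8:   // CONSTANT_String: tag + one 2-byte index
                return 1 + 2;
            case 9:   // CONSTANT_Fieldref
            case 10:  // CONSTANT_Methodref
            case 11:  // CONSTANT_InterfaceMethodref
            case 12:  // CONSTANT_NameAndType: tag + two 2-byte indexes (assumed width)
                return 1 + 2 + 2;
            default:
                throw new IllegalArgumentException("unknown constant tag " + tag);
        }
    }
}
```

A full parser would call entrySize repeatedly, advancing the offset by the returned value until all entries have been consumed.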

Access_flags

The access_flags section is a 2-byte field whose individual bits act as flags describing various properties of fields, classes, and methods. A 2-byte field has room for 16 such flags, of which 11 are defined. Table 37.2 lists the values of the access flags.

Table 37.2. Access flags.

Access Flag         Value
ACC_PUBLIC          0x0001
ACC_PRIVATE         0x0002
ACC_PROTECTED       0x0004
ACC_STATIC          0x0008
ACC_FINAL           0x0010
ACC_SYNCHRONIZED    0x0020
ACC_THREADSAFE      0x0040
ACC_TRANSIENT       0x0080
ACC_NATIVE          0x0100
ACC_INTERFACE       0x0200
ACC_ABSTRACT        0x0400
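Each flag in Table 37.2 occupies a single bit, so several flags can be combined in one access_flags value and tested individually with a bitwise AND. A minimal sketch, in which the class AccessFlags and its decode method are hypothetical names for this example:

```java
import java.util.ArrayList;
import java.util.List;

final class AccessFlags {
    // Flag names from Table 37.2; the name at position i has the value 1 << i.
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_PRIVATE", "ACC_PROTECTED", "ACC_STATIC",
        "ACC_FINAL", "ACC_SYNCHRONIZED", "ACC_THREADSAFE", "ACC_TRANSIENT",
        "ACC_NATIVE", "ACC_INTERFACE", "ACC_ABSTRACT"
    };

    // Returns the names of all flags that are set in the given value.
    static List<String> decode(int accessFlags) {
        List<String> set = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((accessFlags & (1 << bit)) != 0) {
                set.add(NAMES[bit]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // 0x0019 = 0x0001 | 0x0008 | 0x0010
        System.out.println(decode(0x0019)); // [ACC_PUBLIC, ACC_STATIC, ACC_FINAL]
    }
}
```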

This_class

this_class is a 2-byte index into constant_pool specifying the information about the current class.

Interfaces

interfaces_count is a 2-byte value denoting the size of the interfaces array.

The interfaces array contains indexes into the constant_pool specifying the interfaces that the current class implements.

Fields

fields_count is a 2-byte value denoting the size of the fields array.

The fields array contains complete information about the fields of a class. This array contains, for each element, a 2-byte value of access_flags, two 2-byte indexes into constant_pool, a 2-byte attributes_count, and an array of attributes.

The first index, name_index, refers to the name of the field. The second, signature_index, refers to the signature of the field. The attributes array stores any additional attributes of the field. Currently, only one attribute type is supported, ConstantValue, which indicates that the field is a static constant value.

Methods

methods_count supplies the number of methods stored in the methods array. This number only includes the methods declared in the current class.

The methods field contains an array of elements containing complete information about the method. The information is stored with a 2-byte access_flags value, a 2-byte name_index referencing the name of the method in the constant_pool, a 2-byte signature_index referencing signature information found in the constant_pool, a 2-byte attributes_count containing the number of elements in the attributes array, and an attributes array.

Currently, the only value that can be found in the attributes array is the Code structure, which provides the information needed to properly execute the specified method. To facilitate this, the Code structure provides the following information.

Contained in the first 2 bytes is attribute_name_index, which provides an index into the constant_pool identifying the attribute as a Code structure.

The next 4 bytes, named attribute_length, provide the length of the Code structure, not including attribute_name_index and attribute_length itself.

Actual Code-specific information begins with the next three fields, followed by the method's operation codes (opcodes). The 2-byte max_stack contains the maximum number of entries on the operand stack during the method's execution. The 2-byte max_locals specifies the total number of local variables for the method. The 4-byte code_length is the total length of the next field, the code field containing the opcodes.

After the code field, the Code structure provides detailed exception information for the method. This starts with exception_table_length and exception_table, which describe each exception handler in the method code. start_pc and end_pc give the starting and ending positions of the code range in which the exception handler, pointed to by handler_pc, is active. catch_type, which follows handler_pc, denotes the type of exception handled.
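The way a virtual machine might consult this table can be sketched as follows. This is an illustration under stated assumptions, not the actual JVM algorithm: the class and method names are hypothetical, start_pc is treated as inclusive and end_pc as exclusive, a catch_type of 0 is treated as matching any exception, and types are compared by simple equality, whereas a real virtual machine would also accept subclasses of the caught type.

```java
final class ExceptionTableSketch {
    // One exception_table entry, with the four fields described above.
    static final class Handler {
        final int startPc, endPc, handlerPc, catchType;

        Handler(int startPc, int endPc, int handlerPc, int catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler_pc of the first entry that covers pc and accepts
    // thrownType, or -1 if this method has no matching handler.
    static int findHandler(Handler[] table, int pc, int thrownType) {
        for (Handler h : table) {
            boolean inRange = pc >= h.startPc && pc < h.endPc;       // assumed half-open range
            boolean typeMatches = h.catchType == 0                    // assumed "catch anything"
                               || h.catchType == thrownType;          // simplified type check
            if (inRange && typeMatches) {
                return h.handlerPc;
            }
        }
        return -1;
    }
}
```

Searching the entries in order means that an earlier, more specific handler wins over a later, more general one that covers the same range.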

The remainder of the Code structure is devoted to information that is used for debugging purposes.

line_number is the 2-byte line number of the method's first line of code.

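LocalVariableTable_attribute contains a structure used by the debugger to determine the value of local variables. The structure consists of three header fields (a 2-byte attribute_name_index, a 4-byte attribute_length, and a 2-byte local_variable_table_length) and a local_variable_table structure.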

The first two fields of the structure, attribute_name_index and attribute_length, are used to describe the structure. The third contains the length of the local_variable_table.

local_variable_table contains the following five 2-byte fields, in order: start_pc, length, name_index, signature_index, and slot.

start_pc and length denote the range of code offsets, beginning at start_pc and extending for length bytes, within which the variable holds a value.

name_index and signature_index are indexes into constant_pool, where the variable's name and signature can be found.

slot denotes the position in the local method frame where the variable can be found.

Attributes

attributes_count is the size of the attributes array, which contains attribute structures. Currently, the only attribute structure is the SourceFile structure.

The SourceFile structure consists of three fields. The 2-byte attribute_name_index indexes into constant_pool to the entry containing the string SourceFile. The 4-byte attribute_length must contain a value of 2. The 2-byte sourcefile_index indexes into the constant_pool to the entry containing the source filename.

Virtual Machine Architecture

The Java virtual machine's architecture revolves around the concept of non-machine-specific implementation. It assumes no specific platform architecture, but it does require certain facilities: registers, a stack, a garbage-collected heap, and a method area.

Whether these facilities exist in hardware or software makes no difference to the JVM. As long as they exist, the JVM can function correctly.

JVM Registers

The registers serve the same purpose as normal microprocessors' register devices, the main difference being the functions provided by each register. JVM is a stack-based machine, meaning it does not define registers for the passing of variables and instructions. This was a conscious decision when designing the JVM, and the result is a model requiring fewer registers. These registers are as follows:

JVM Stack

The Java stack is a 32-bit model used to supply the JVM with needed operation data as well as store return values. Like normal programming languages, the stack is broken into separate stack frames, containing information about the method associated with the frame. The Java stack frame comprises three separate regions:

Garbage-Collected Heap

All objects are allocated from the garbage-collected heap. The heap is also responsible for performing garbage collection, because Java does not allow the programmer to deallocate space explicitly. The JVM does not mandate any particular method of garbage collection.

Method Area

The method area contains the binary method retrieved from the methods section of the class file. This includes the method's code as well as all symbol information.

Instruction Set

The instruction set is the set of operation codes that are executed by the JVM. When Java source code is compiled, the compiler converts the Java source code into the language of the JVM, the instruction set.

The JVM instruction set currently comprises more than 160 instructions, each identified by an opcode held in an 8-bit field. For many operations, the JVM pops operands off the stack and pushes the result back onto the stack. If an operand embedded in the instruction stream is wider than 8 bits, the JVM stores it in big-endian order as a sequence of bytes, so instructions need no alignment beyond 8 bits.
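The pop-operate-push cycle can be illustrated with a toy interpreter. This is a sketch for illustration only: the class TinyStackMachine is hypothetical and handles just five instructions. The opcode values used (bipush 0x10, sipush 0x11, iadd 0x60, imul 0x68, ireturn 0xAC) are taken from the published JVM instruction set rather than from this chapter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    static final int BIPUSH = 0x10, SIPUSH = 0x11, IADD = 0x60, IMUL = 0x68, IRETURN = 0xAC;

    // Executes the bytecode and returns the value left for ireturn.
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xFF; // every opcode is one byte
            switch (opcode) {
                case BIPUSH: // 1-byte operand, sign-extended to 32 bits
                    stack.push((int) code[pc++]);
                    break;
                case SIPUSH: { // 2-byte operand stored high byte first
                    int high = code[pc++];
                    int low = code[pc++] & 0xFF;
                    stack.push((high << 8) | low);
                    break;
                }
                case IADD: { // pop two operands, push their sum
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a + b);
                    break;
                }
                case IMUL: { // pop two operands, push their product
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a * b);
                    break;
                }
                case IRETURN:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // bipush 2, bipush 3, iadd, bipush 4, imul, ireturn
        byte[] code = {0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xAC};
        System.out.println(run(code)); // 20
    }
}
```

Note that neither iadd nor imul names its operands; they are implied by the state of the stack, which is why a stack machine needs so few registers.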

Because the JVM instruction set contains more than 160 operations, the following sections break them down into categories for quicker reference.

Pushing Constants onto the Stack

The instructions introduced in this section are used to push constants onto the stack. In all these instructions, if the value pushed onto the stack is less than 32 bits, the value is expanded into a 32-bit form to fit properly onto the stack:

Pushing Local Variables onto the Stack

In a stack-based computer, multiple registers are replaced by a stack from which operands are popped as needed and onto which results are pushed as generated. The following instructions push a method's local variables onto the stack for later use:

Storing Stack Values into Local Variables

As described earlier, each method frame has a local variable region. When the method comes to the top of the stack, the base offset of the local variable gets placed into the vars register. These instructions provide methods for storing information into the local variables of the current stack frame:

Managing Arrays

The garbage-collection heap is responsible for the allocation and deallocation of referenced data. The following instructions allocate, deallocate, and store data to the garbage-collection heap:

Table 37.3. Variable types specified by the type parameter.

Variable Type    Value
T_ARRAY          0x0001
T_BOOLEAN        0x0004
T_CHAR           0x0005
T_FLOAT          0x0006
T_DOUBLE         0x0007
T_BYTE           0x0008
T_SHORT          0x0009
T_INT            0x000A
T_LONG           0x000B
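To show how a type parameter from Table 37.3 selects the kind of array to allocate, the following sketch maps the primitive type codes to Java array types. The class NewArraySketch and its method are hypothetical, T_ARRAY is deliberately left unsupported, and the code uses java.lang.reflect.Array only as a convenient stand-in for what the virtual machine does internally.

```java
import java.lang.reflect.Array;

final class NewArraySketch {
    // Allocates a primitive array whose element type is selected by a
    // type code from Table 37.3.
    static Object newArray(int type, int length) {
        Class<?> component;
        switch (type) {
            case 0x0004: component = boolean.class; break; // T_BOOLEAN
            case 0x0005: component = char.class;    break; // T_CHAR
            case 0x0006: component = float.class;   break; // T_FLOAT
            case 0x0007: component = double.class;  break; // T_DOUBLE
            case 0x0008: component = byte.class;    break; // T_BYTE
            case 0x0009: component = short.class;   break; // T_SHORT
            case 0x000A: component = int.class;     break; // T_INT
            case 0x000B: component = long.class;    break; // T_LONG
            default:
                throw new IllegalArgumentException("unsupported type code " + type);
        }
        return Array.newInstance(component, length);
    }
}
```

For example, newArray(0x000A, 3) yields an int[] of length 3 with every element initialized to zero.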

Stack Instructions

With the existence of any stack, there must be some fundamental operations to operate the stack. The following instructions do just that:

Arithmetic Instructions

All computers need to function as a calculator at some point. The capability to do fundamental computations is inherent to all computing devices, and the JVM is no exception. The following instructions provide the JVM with arithmetic operations:

Logical Instructions

The following instructions implement logical operations:

Conversion Operations

The following instructions provide the capability to convert data types:

Control Transfer Instructions

Conditional statements allow the computer to execute boolean logic. In doing so, they give the computer the capability to make simple decisions based on a true-or-false comparison. The following instructions support conditional decisions and alter program flow of control:

Function Return Instructions

The following instructions are used to return a value from a function call:

After the value has been returned, the JVM resumes execution at the instruction following the function call. The returned value is then the top element(s) of the caller's stack.

Table Jumping Instructions

The jump table stores the offset information when the program execution jumps to a non-sequential location. This information allows the program to resume execution at the next logical offset. The program jump is achieved by adding the new opcode offset to the current pc value. The following instructions provide the capability to jump to locations in the table:
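The offset arithmetic can be sketched as follows. This is a simplified model, not the encoding of any particular instruction: the class JumpTableSketch, its method, and its parameters (a low bound, an array of offsets, and a default offset) are assumptions made for this example.

```java
final class JumpTableSketch {
    // Selects an offset from the table by index and returns the new pc.
    // offsets[0] corresponds to index == low; any index outside the table
    // falls back to defaultOffset.
    static int jump(int pc, int low, int[] offsets, int defaultOffset, int index) {
        long position = (long) index - low; // long arithmetic avoids int overflow
        int offset = (position >= 0 && position < offsets.length)
                   ? offsets[(int) position]
                   : defaultOffset;
        return pc + offset; // the jump is relative to the current pc
    }

    public static void main(String[] args) {
        int[] offsets = {28, 36, 44};
        System.out.println(jump(100, 1, offsets, 52, 2)); // 136
        System.out.println(jump(100, 1, offsets, 52, 9)); // 152
    }
}
```

Because the offsets are relative to the current pc, the same compiled table works no matter where the method's code is loaded.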

Manipulating Object Fields

The following instructions provide the capability to access and modify members of an object:

Method Invocation

The following instructions provide the capability to execute a method of an object:

Exception Handling

The athrow instruction implements Java exception handling capabilities:

Object Utility Operations

The following instructions provide some object operations that don't fall into any other category:

Monitors

Due to the multithreaded nature of the JVM, there is a great need for a mechanism to access shared memory resources. The following instructions provide the capability to lock and unlock a memory object:
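At the source level, this locking is most commonly expressed with a synchronized block, which Java compilers typically translate into a matching pair of monitor enter and exit instructions around the protected code. The following sketch uses hypothetical names (SharedCounter, runDemo) to show several threads updating one shared value safely.

```java
final class SharedCounter {
    private final Object lock = new Object();
    private int value;

    void increment() {
        synchronized (lock) { // acquire the monitor of lock
            value++;
        }                     // release it, even if an exception is thrown
    }

    int get() {
        synchronized (lock) {
            return value;
        }
    }

    // Starts threadCount threads that each increment the counter
    // incrementsPerThread times, waits for them, and returns the total.
    static int runDemo(int threadCount, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.get();
    }
}
```

Without the synchronized blocks, concurrent increments could interleave and lose updates, so the final total would not reliably equal threadCount multiplied by incrementsPerThread.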

The breakpoint Instruction

The breakpoint instruction calls the breakpoint handler to notify the debugger of a breakpoint.

Summary

This chapter diagrams the internals of a .class file, discusses the JVM architecture, and provides insight into the JVM instruction set.