The Java^TM Virtual Machine Specification

CHAPTER 3

Structure of the Java Virtual Machine

This book specifies an abstract machine. It does not document any particular implementation of the Java Virtual Machine, including Sun's.

To implement the Java Virtual Machine correctly, you need only be able to read the Java class file format and correctly perform the operations specified therein. Implementation details that are not part of the Java Virtual Machine's specification would unnecessarily constrain the creativity of implementors, and will only be provided to make the exposition clearer. For example, the memory layout of runtime data areas, the garbage-collection algorithm used, and any optimizations of the bytecodes (for example, translating them into machine code) are left to the discretion of the implementor.

3.1 Data Types

Like the Java language, the Java Virtual Machine operates on two kinds of types: primitive types and reference types. There are, correspondingly, two kinds of values that can be stored in variables, passed as arguments, returned by methods, and operated upon: primitive values and reference values.

The Java Virtual Machine expects that nearly all type checking is done at compile time, not by the Java Virtual Machine itself. In particular, data need not be tagged or otherwise be inspectable to determine types. Instead, the instruction set of the Java Virtual Machine distinguishes its operand types using instructions intended to operate on values of specific types. For instance, iadd, ladd, fadd, and dadd are all Java Virtual Machine instructions that add two numeric values, but they require operands whose types are int, long, float, and double, respectively. For a summary of type support in the Java Virtual Machine's instruction set, see §3.11.1.

The Java Virtual Machine contains explicit support for objects. An object is either a dynamically allocated class instance or an array. A reference to an object is considered to have Java Virtual Machine type reference. Values of type reference can be thought of as pointers to objects. More than one reference may exist to an object. Although the Java Virtual Machine performs operations on objects, it never addresses them directly. Objects are always operated on, passed, and tested via values of type reference.

3.2 Primitive Types and Values

The primitive data types supported by the Java Virtual Machine are the numeric types and the returnAddress type. The numeric types consist of the integral types:

byte, whose values are 8-bit signed two's-complement integers
short, whose values are 16-bit signed two's-complement integers
int, whose values are 32-bit signed two's-complement integers
long, whose values are 64-bit signed two's-complement integers
char, whose values are 16-bit unsigned integers representing Unicode version 1.1.5 characters (§2.1)

and the floating-point types:

float, whose values are 32-bit IEEE 754 floating-point numbers
double, whose values are 64-bit IEEE 754 floating-point numbers

The values of the returnAddress type are pointers to the opcodes of Java Virtual Machine instructions. Only the returnAddress type is not a Java language type.

3.2.1 Integral Types and Values

The values of the integral types of the Java Virtual Machine are the same as those for the integral types of the Java language (§2.4.1):

For byte, from -128 to 127 (-2^7 to 2^7-1), inclusive
For short, from -32768 to 32767 (-2^15 to 2^15-1), inclusive
For int, from -2147483648 to 2147483647 (-2^31 to 2^31-1), inclusive
For long, from -9223372036854775808 to 9223372036854775807 (-2^63 to 2^63-1), inclusive
For char, from '\u0000' to '\uffff'; char is unsigned, so '\uffff' represents 65535 when used in expressions, not -1
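
As a quick check of these ranges, the following Java sketch prints them from the standard wrapper-class constants and shows the difference between widening an unsigned char and a signed byte. The class and method names are illustrative and not part of the specification.

```java
// Illustrative sketch; IntegralRanges and its methods are hypothetical names.
class IntegralRanges {
    // Widening a char to int zero-extends, because char is unsigned.
    static int widenChar(char c) {
        return c;
    }

    // Widening a byte to int sign-extends, because byte is signed.
    static int widenByte(byte b) {
        return b;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println("char 0xffff as int: " + widenChar((char) 0xffff)); // 65535
        System.out.println("byte 0xff as int:   " + widenByte((byte) 0xff));   // -1
    }
}
```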

3.2.2 Floating-Point Types and Values

The values of the floating-point types of the Java Virtual Machine are the same as those for the floating-point types of the Java language (§2.4.1). The floating-point types float and double represent single-precision 32-bit and double-precision 64-bit format IEEE 754 values as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std. 754-1985 (IEEE, New York).

The IEEE 754 standard includes not only positive and negative sign-magnitude numbers, but also positive and negative zeroes, positive and negative infinities, and a special Not-a-Number (hereafter abbreviated NaN) value that is used to represent the result of certain operations such as dividing zero by zero. Such values exist for both float and double types.

The finite nonzero values of type float are of the form s · m · 2^e, where s is +1 or -1, m is a positive integer less than 2^24, and e is an integer between -149 and 104, inclusive. The largest positive finite floating-point literal of type float is 3.40282347e+38F. The smallest positive nonzero floating-point literal of type float is 1.40129846e-45F.

The finite nonzero values of type double are of the form s · m · 2^e, where s is +1 or -1, m is a positive integer less than 2^53, and e is an integer between -1074 and 971, inclusive. The largest positive finite floating-point literal of type double is 1.79769313486231570e+308. The smallest positive nonzero floating-point literal of type double is 4.94065645841246544e-324.
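
A minimal check of this form, assuming Java 6 or later for Math.scalb: the largest and smallest positive finite values of each type are built from an integer significand m and a power of two, then compared with the standard constants. The class and method names are illustrative. The double exponents used here (971 and -1074) are the ones that reproduce Double.MAX_VALUE and Double.MIN_VALUE.

```java
// Illustrative sketch; FloatForm and its methods are hypothetical names.
class FloatForm {
    // m = 2^24 - 1, e = 104
    static float largestFloat() {
        return Math.scalb(16777215.0f, 104);
    }

    // m = 1, e = -149
    static float smallestFloat() {
        return Math.scalb(1.0f, -149);
    }

    // m = 2^53 - 1, e = 971
    static double largestDouble() {
        return Math.scalb(9007199254740991.0, 971);
    }

    // m = 1, e = -1074
    static double smallestDouble() {
        return Math.scalb(1.0, -1074);
    }

    public static void main(String[] args) {
        System.out.println(largestFloat() == Float.MAX_VALUE);
        System.out.println(smallestFloat() == Float.MIN_VALUE);
        System.out.println(largestDouble() == Double.MAX_VALUE);
        System.out.println(smallestDouble() == Double.MIN_VALUE);
    }
}
```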

Floating-point positive zero and floating-point negative zero compare as equal, but there are other operations that can distinguish them; for example, dividing 1.0 by 0.0 produces positive infinity, but dividing 1.0 by -0.0 produces negative infinity.
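
The behavior of the two zeroes can be observed directly from Java code. The sketch below is only an illustration; the class and method names are ours.

```java
// Illustrative sketch; SignedZero and its methods are hypothetical names.
class SignedZero {
    // Positive and negative zero compare as equal.
    static boolean zerosCompareEqual() {
        return 0.0 == -0.0;
    }

    // Division exposes the sign of a zero divisor.
    static double divideOneBy(double divisor) {
        return 1.0 / divisor;
    }

    public static void main(String[] args) {
        System.out.println(zerosCompareEqual()); // true
        System.out.println(divideOneBy(0.0));    // Infinity
        System.out.println(divideOneBy(-0.0));   // -Infinity
    }
}
```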

Except for NaN, floating-point values are ordered. When arranged from smallest to largest, they are negative infinity, negative finite values, negative zero, positive zero, positive finite values, and positive infinity.

NaN is unordered, so numerical comparisons have the value false if either or both of their operands are NaN. A test for numerical equality has the value false if either operand is NaN, and a test for numerical inequality has the value true if either operand is NaN. In particular, a test for numerical equality of a value against itself has the value false if and only if the value is NaN.
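
These comparison rules suggest a simple self-equality test for NaN. The following sketch is illustrative only; the class and method names are ours.

```java
// Illustrative sketch; NanComparisons and its methods are hypothetical names.
class NanComparisons {
    // A value is NaN if and only if it is not equal to itself.
    static boolean isNaNBySelfTest(double value) {
        return value != value;
    }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0; // dividing zero by zero produces NaN
        System.out.println(nan < 1.0);            // false
        System.out.println(nan > 1.0);            // false
        System.out.println(nan == nan);           // false
        System.out.println(nan != nan);           // true
        System.out.println(isNaNBySelfTest(nan)); // true
        System.out.println(isNaNBySelfTest(1.0)); // false
    }
}
```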

IEEE 754 defines a large number of distinct NaN values but fails to specify which NaN values are produced in various situations. To avoid portability problems, the Java Virtual Machine coalesces these NaN values together into a single conceptual NaN value.

3.2.3 The `returnAddress` Type and Values

The returnAddress type is used by the Java Virtual Machine's jsr, ret, and jsr_w instructions. The values of the returnAddress type are pointers to the opcodes of Java Virtual Machine instructions. Unlike the numeric primitive types, the returnAddress type does not correspond to any Java data type.

3.2.4 There Is No `boolean` Type

Although Java defines a boolean type, the Java Virtual Machine does not have instructions dedicated to operations on boolean values. Instead, a Java expression that operates on boolean values is compiled to use the int data type to represent boolean variables.

Although the Java Virtual Machine has support for the creation of arrays of type boolean (see the description of the newarray instruction), it does not have dedicated support for accessing and modifying elements of boolean arrays. Arrays of type boolean are accessed and modified using the byte array instructions.¹

For more information on the treatment of boolean values in the Java Virtual Machine, see Chapter 7, "Compiling for the Java Virtual Machine."

3.3 Reference Types and Values

There are three kinds of reference types: class types, array types, and interface types, whose values are references to dynamically created class instances, arrays, or class instances or arrays that implement interfaces, respectively. A reference value may also be the special null reference, a reference to no object, which will be denoted here by null. The null reference initially has no runtime type, but may be cast to any type (§2.4).

3.4 Words

No mention has been made of the storage requirements for values of the various Java Virtual Machine types, only the ranges those values may take. The Java Virtual Machine does not mandate the size of its data types. Instead, the Java Virtual Machine defines an abstract notion of a word that has a platform-specific size. A word is large enough to hold a value of type byte, char, short, int, float, reference, or returnAddress, or to hold a native pointer. Two words are large enough to hold values of the larger types, long and double. Java's runtime data areas are all defined in terms of these abstract words.

A word is usually the size of a pointer on the host platform. On a 32-bit platform, a word is 32 bits, pointers are 32 bits, and longs and doubles naturally take up two words. A naive 64-bit implementation of the Java Virtual Machine may waste half of a word used to store a 32-bit datum, but may also be able to store all of a long or a double in one of the two words allotted to it.

The choice of a specific word size, although platform-specific, is made at the implementation level, not as part of the Java Virtual Machine's design. It is not visible outside the implementation or to code compiled for the Java Virtual Machine.

Throughout this book, all references to a word datum are to this abstract notion of a word.

3.5 Runtime Data Areas

3.5.1 The `pc` Register

A Java Virtual Machine can support many threads of execution at once (§2.17). Each Java Virtual Machine thread has its own pc (program counter) register. At any point, each Java Virtual Machine thread is executing the code of a single method, the current method (§3.6) for that thread. If that method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed. If the method currently being executed by the thread is native, the value of the Java Virtual Machine's pc register is undefined. The Java Virtual Machine's pc register is one word wide, the width guaranteed to hold a returnAddress or a native pointer on the specific platform.

3.5.2 Java Stack

Each Java Virtual Machine thread (§2.17) has a private Java stack, created at the same time as the thread. A Java stack stores Java Virtual Machine frames (§3.6). The Java stack is equivalent to the stack of a conventional language such as C: it holds local variables and partial results, and plays a part in method invocation and return. Because the stack is never manipulated directly except to push and pop frames, it may actually be implemented as a heap, and Java frames may be heap allocated. The memory for a Java stack does not need to be contiguous.

The Java Virtual Machine specification permits Java stacks to be of either a fixed or a dynamically varying size. If the Java stacks are of a fixed size, the size of each Java stack may be chosen independently when that stack is created. A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of Java stacks, as well as, in the case of dynamically expanding or contracting Java stacks, control over the maximum and minimum Java stack sizes.

The following exceptional conditions are associated with Java stacks:

If the computation in a thread requires a larger Java stack than is permitted, the Java Virtual Machine throws a StackOverflowError.
If Java stacks can be dynamically expanded, and Java stack expansion is attempted but insufficient memory can be made available to effect the expansion, or if insufficient memory can be made available to create the initial Java stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.
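
The first condition can be provoked from Java code by a runaway recursion. The sketch below is a hedged illustration: the class and method names are ours, and the depth reached depends entirely on the implementation and its configured stack size.

```java
// Illustrative sketch; StackDepthProbe is a hypothetical name.
class StackDepthProbe {
    private static int depth;

    // Each invocation creates a new frame; the recursion never returns normally.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns the number of nested invocations reached before the overflow.
    static int probe() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time the error is caught here, the frames of the
            // runaway recursion have been discarded.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Overflowed after " + probe() + " nested invocations");
    }
}
```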

In Sun's JDK 1.0.2 implementation of the Java Virtual Machine, the Java stacks are discontiguous and are independently expanded as required by the computation. The Java stacks do not contract, but are reclaimed when their associated thread terminates or is killed. Expansion is subject to a size limit for any one Java stack. The Java stack size limit may be set on virtual machine start-up using the "-oss" flag. The Java stack size limit can be used to limit memory consumption or to catch runaway recursions.

3.5.3 Heap

The Java Virtual Machine has a heap that is shared among all threads (§2.17). The heap is the runtime data area from which memory for all class instances and arrays is allocated.

The Java heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (typically a garbage collector); objects are never explicitly deallocated. The Java Virtual Machine assumes no particular type of automatic storage management system, and the storage management technique may be chosen according to the implementor's system requirements. The Java heap may be of a fixed size, or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the Java heap does not need to be contiguous.

A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the heap, as well as, if the heap can be dynamically expanded or contracted, control over the maximum and minimum heap size.

The following exceptional condition is associated with the Java heap:

If a computation requires more Java heap than can be made available by the automatic storage management system, the Java Virtual Machine throws an OutOfMemoryError.

Sun's JDK 1.0.2 implementation of the Java Virtual Machine dynamically expands its Java heap as required by the computation, but never contracts its heap. Its initial and maximum sizes may be specified on virtual machine start-up using the "-ms" and "-mx" flags, respectively.

3.5.4 Method Area

The Java Virtual Machine has a method area that is shared among all threads (§2.17). The method area is analogous to the storage area for compiled code of a conventional language, or to the "text" segment in a UNIX process. It stores per-class structures such as the constant pool, field and method data, and the code for methods and constructors, including the special methods (§3.8) used in class and instance initialization and interface type initialization.

The method area is created on virtual machine start-up. Although the method area is logically part of the garbage-collected heap, simple implementations may choose to neither garbage collect nor compact it. This version of the Java Virtual Machine specification does not mandate the location of the method area or the policies used to manage compiled code. The method area may be of a fixed size, or may be expanded as required by the computation and may be contracted if a larger method area becomes unnecessary. The memory for the method area does not need to be contiguous.

A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the method area, as well as, in the case of a varying-size method area, control over the maximum and minimum method area size.

The following exceptional condition is associated with the method area:

If memory in the method area cannot be made available to satisfy an allocation request, the Java Virtual Machine throws an OutOfMemoryError.

Sun's JDK 1.0.2 implementation of the Java Virtual Machine dynamically expands its method area as required by the computation, but never contracts it. No user control over the maximum or minimum size of the method area is provided.

3.5.5 Constant Pool

A constant pool is a per-class or per-interface runtime representation of the constant_pool table in a Java class file (§4.4). It contains several kinds of constants, ranging from numeric literals known at compile time to method and field references that must be resolved at run time. The constant pool serves a function similar to that of a symbol table for a conventional programming language, although it contains a wider range of data than a typical symbol table.

Each constant pool is allocated from the Java Virtual Machine's method area (§3.5.4). The constant pool for a class or interface is created when a Java class file for the class or interface is successfully loaded (§2.16.2) by a Java Virtual Machine.

The following exceptional condition is associated with the creation of the constant pool for a class or interface:

When loading a class file, if the creation of the constant pool requires more memory than can be made available in the method area of the Java Virtual Machine, the Java Virtual Machine throws an OutOfMemoryError.

Constant pool resolution, a runtime operation performed on entries in the constant pool, has its own set of associated exceptions. See Chapter 5 for information about the runtime management of the constant pool.

3.5.6 Native Method Stacks

An implementation of the Java Virtual Machine may use conventional stacks, colloquially called "C stacks," to support native methods, methods written in languages other than Java. A native method stack may also be used to implement an emulator for the Java Virtual Machine's instruction set in a language such as C. Implementations that do not support native methods, and that do not themselves rely on conventional stacks, need not supply native method stacks. If supplied, native method stacks are typically allocated on a per thread basis when each thread is created.

The Java Virtual Machine specification permits native method stacks to be of either a fixed or a dynamically varying size. If the native method stacks are of a fixed size, the size of each native method stack may be chosen independently when that stack is created. In any case, a Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the native method stacks. In the case of varying-size native method stacks, it may also make available control over the maximum and minimum method stack sizes.

The following exceptional conditions are associated with native method stacks:

If the computation in a thread requires a larger native method stack than is permitted, the Java Virtual Machine throws a StackOverflowError.
If native method stacks can be dynamically expanded, and native method stack expansion is attempted but insufficient memory can be made available, or if insufficient memory can be made available to create the initial native method stack for a new thread, the Java Virtual Machine throws an OutOfMemoryError.

Sun's JDK 1.0.2 implementation of the Java Virtual Machine allocates fixed-size native method stacks of a single size. The size of its native method stacks may be set on virtual machine start-up using the "-ss" flag. The native method stack size limit can be used to limit memory consumption or to catch runaway recursions in native methods.

Sun's implementation does not currently check for native method stack overflow.

3.6 Frames

A Java Virtual Machine frame is used to store data and partial results, as well as to perform dynamic linking, to return values for methods, and to dispatch exceptions.

A new frame is created each time a Java method is invoked. A frame is destroyed when its method completes, whether that completion is normal or abnormal (by throwing an exception). Frames are allocated from the Java stack (§3.5.2) of the thread creating the frame. Each frame has its own set of local variables (§3.6.1) and its own operand stack (§3.6.2). The memory space for these structures can be allocated simultaneously, since the sizes of the local variable area and operand stack are known at compile time and the size of the frame data structure depends only upon the implementation of the Java Virtual Machine.

Only one frame, the frame for the executing method, is active at any point in a given thread of control. This frame is referred to as the current frame, and its method is known as the current method. The class in which the current method is defined is the current class. Operations on local variables and the operand stack always are with reference to the current frame.

A frame ceases to be current if its method invokes another method or if its method completes. When a method is invoked, a new frame is created and becomes current when control transfers to the new method. On method return, the current frame passes back the result of its method invocation, if any, to the previous frame. The current frame is then discarded as the previous frame becomes the current one. Java Virtual Machine frames may be naturally thought of as being allocated on a stack, with one stack per Java thread (§2.17), but they may also be heap allocated.

Note that a frame created by a thread is local to that thread and cannot be directly referenced by any other thread.

3.6.1 Local Variables

On each Java method invocation, the Java Virtual Machine allocates a Java frame (§3.6), which contains an array of words known as its local variables. Local variables are addressed as word offsets from the base of that array.

Local variables are always one word wide. Two local variables are reserved for each long or double value. These two local variables are addressed by the index of the first of the variables.

For example, a local variable with index n and containing a value of type double actually occupies the two words at local variable indices n and n+1. The Java Virtual Machine does not require n to be even. (In intuitive implementation terms, 64-bit values need not be 64-bit aligned in the local variables array.) Implementors are free to decide the appropriate way to divide a 64-bit data value between two local variables.
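
The indexing rule can be sketched as a small slot allocator. Everything here is an assumption made for illustration: the class and method names are ours, the type letters are borrowed from the instruction mnemonics (l for long, d for double), and the starting index is simply passed in by the caller.

```java
// Illustrative sketch; LocalSlots is a hypothetical name.
class LocalSlots {
    // Returns the local variable index of each value, in order.
    // 'l' (long) and 'd' (double) reserve two local variables; every
    // other type letter reserves one.
    static int[] assignIndices(int firstIndex, String typeLetters) {
        int[] indices = new int[typeLetters.length()];
        int next = firstIndex;
        for (int i = 0; i < typeLetters.length(); i++) {
            char type = typeLetters.charAt(i);
            indices[i] = next; // a two-word value is addressed by its first index
            next += (type == 'l' || type == 'd') ? 2 : 1;
        }
        return indices;
    }

    public static void main(String[] args) {
        // int at 0, double at 1 (and 2), long at 3 (and 4), reference at 5
        System.out.println(java.util.Arrays.toString(assignIndices(0, "idla")));
    }
}
```

Note that the double in this example starts at the odd index 1, which is permitted because n need not be even.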

3.6.2 Operand Stacks

On each Java method invocation, the Java Virtual Machine allocates a Java frame (§3.6), which contains an operand stack. Most Java Virtual Machine instructions take values from the operand stack of the current frame, operate on them, and return results to that same operand stack. The operand stack is also used to pass arguments to methods and receive method results.

For example, the iadd instruction adds two int values together. It requires that the int values to be added be the top two words of the operand stack, pushed there by previous instructions. Both of the int values are popped from the operand stack. They are added, and their sum is pushed back onto the stack. Subcomputations may be nested on the operand stack, resulting in values that can be used by the encompassing computation.
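
The effect of iadd on the operand stack, and the nesting of subcomputations, can be modeled with an ordinary stack. This is a sketch only: the class is hypothetical, it models int values rather than words, and imul (integer multiply) is included just to give the nested sums an enclosing computation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch; OperandStackDemo is a hypothetical name.
class OperandStackDemo {
    private final Deque<Integer> stack = new ArrayDeque<>();

    void push(int value) {
        stack.push(value);
    }

    int pop() {
        return stack.pop();
    }

    // Models iadd: pop two int values, push their sum.
    void iadd() {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(value1 + value2);
    }

    // Models imul: pop two int values, push their product.
    void imul() {
        int value2 = stack.pop();
        int value1 = stack.pop();
        stack.push(value1 * value2);
    }

    // Evaluates (1 + 2) * (3 + 4) with two nested subcomputations.
    static int nestedExample() {
        OperandStackDemo s = new OperandStackDemo();
        s.push(1);
        s.push(2);
        s.iadd(); // stack: 3
        s.push(3);
        s.push(4);
        s.iadd(); // stack: 3, 7
        s.imul(); // stack: 21
        return s.pop();
    }

    public static void main(String[] args) {
        System.out.println(nestedExample()); // 21
    }
}
```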

Each entry on the operand stack is one word wide. Values of types long and double are pushed onto the operand stack as two words. The Java Virtual Machine does not require 64-bit values on the operand stack to be 64-bit aligned. Implementors are free to decide the appropriate way to divide a 64-bit data value between two operand stack words.

Values from the operand stack must be operated upon in ways appropriate to their types. It is incorrect, for example, to push two int values and then treat them as a long, or to push two float values and then add them with an iadd instruction. A small number of Java Virtual Machine instructions (the dup instructions and swap) operate on runtime data areas as raw values of a given width without regard to type; these instructions must not be used to break up or rearrange the words of 64-bit data. These restrictions on operand stack manipulation are enforced, in the Sun implementation, by the class file verifier (§4.9).

3.6.3 Dynamic Linking

A Java Virtual Machine frame contains a reference to the constant pool for the type of the current method to support dynamic linking of the method code. The class file code for a method refers to methods to be invoked and variables to be accessed via symbolic references. Dynamic linking translates these symbolic method references into concrete method references, loading classes as necessary to resolve as-yet-undefined symbols, and translates variable accesses into appropriate offsets in storage structures associated with the runtime location of these variables.

This late binding of methods and variables makes it less likely that changes in other classes used by a method will break its code.

3.6.4 Normal Method Completion

A method invocation completes normally if that invocation does not cause an exception (§2.15, §3.9) to be thrown, either directly from the Java Virtual Machine or as a result of executing an explicit throw statement. If the invocation of the current method completes normally, then a value may be returned to the invoking method. This occurs when the invoked method executes one of the return instructions (§3.11.8), the choice of which must be appropriate for the type of the value being returned (if any).

The Java Virtual Machine frame is used in this case to restore the state of the invoker, including its local variables and operand stack, with the program counter of the invoker appropriately incremented to skip past the method invocation instruction. Execution then continues normally in the invoking method's frame with the returned value (if any) pushed onto the operand stack of that frame.

3.6.5 Abnormal Method Completion

A method invocation completes abnormally if execution of a Java Virtual Machine instruction within the method causes the Java Virtual Machine to throw an exception (§2.15, §3.9), and that exception is not handled within the method. Evaluation of an explicit throw statement also causes an exception to be thrown and, if the exception is not caught by the current method, results in abnormal method completion. A method invocation that completes abnormally never returns a value to its invoker.

3.6.6 Additional Information

A Java Virtual Machine frame may be extended with additional implementation-specific information, such as debugging information.

3.7 Representation of Objects

The Java Virtual Machine does not require any particular internal structure for objects. In Sun's current implementation of the Java Virtual Machine, a reference to a class instance is a pointer to a handle that is itself a pair of pointers: one to a table containing the methods of the object and a pointer to the Class object that represents the type of the object, and the other to the memory allocated from the Java heap for the object data.

Other Java Virtual Machine implementations may use techniques such as inline caching rather than method table dispatch, and they may or may not use handles.

3.8 Special Initialization Methods

At the level of the Java Virtual Machine, every constructor (§2.12) appears as an instance initialization method that has the special name <init>. This name is supplied by a Java compiler. Because the name <init> is not a valid identifier, it cannot be used directly by a Java programmer. Instance initialization methods may only be invoked within the Java Virtual Machine by the invokespecial instruction, and they may only be invoked on uninitialized class instances. An instance initialization method takes on the access permissions (§2.7.8) of the constructor from which it was derived.

At the level of the Java Virtual Machine, a class or interface is initialized (§2.16.4) by invoking its class or interface initialization method with no arguments. The initialization method of a class or interface has the special name <clinit>. This name is supplied by a Java compiler. Because the name <clinit> is not a valid identifier, it cannot be used directly by a Java programmer. Class and interface initialization methods are invoked implicitly by the Java Virtual Machine; they are never invoked directly from Java code or directly from any Java Virtual Machine instruction, but are only invoked indirectly as part of the class initialization process.

3.9 Exceptions

In general, throwing an exception results in an immediate dynamic transfer of control that may exit multiple Java statements and multiple constructor invocations, static and field initializer evaluations, and method invocations until a catch clause (§2.15.2) is found that catches the thrown value.

If no such catch clause is found in the current method, then the current method invocation completes abnormally (§3.6.5). Its operand stack and local variables are discarded and its frame is popped, reinstating the frame of the invoking method. The exception is then rethrown in the context of the invoker's frame, and so on continuing up the method invocation chain. If no suitable catch clause is found before the top of the method invocation chain is reached, the execution of the thread that threw the exception is terminated.

At the level of the Java Virtual Machine, each catch clause describes the Java Virtual Machine instruction range for which it is active, describes the types of exceptions that it is to handle, and gives the address of the code to handle it. An exception matches a catch clause if the instruction that caused the exception is in the appropriate instruction range, and the exception type is the same type as or a subclass of the class of exception that the catch clause handles. If a matching catch clause is found, the system branches to the specified handler. If no handler is found, the process is repeated until all the nested catch clauses of the current method have been exhausted.

The order of the catch clauses in the list is important. The Java Virtual Machine execution continues at the first matching catch clause. Because Java code is structured, it is always possible to arrange all the exception handlers for one method in a single list. For any possible program counter value, this list can be searched to find the proper exception handler, that is, the innermost exception handler that both contains the program counter value and can handle the exception being thrown.
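
The search described above can be sketched as a linear scan that stops at the first matching entry. This is an illustration under stated assumptions, not the layout used by any implementation: the Entry class, its field names, the half-open instruction range, and the sample addresses are all hypothetical.

```java
// Illustrative sketch; HandlerLookup, Entry, and all addresses are hypothetical.
class HandlerLookup {
    static final class Entry {
        final int startPc;   // first covered instruction address (inclusive)
        final int endPc;     // end of the covered range (exclusive, by assumption)
        final int handlerPc; // address of the handler code
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler address of the first entry whose range contains pc
    // and whose type matches the thrown exception, or -1 if none matches.
    static int findHandler(Entry[] table, int pc, Throwable thrown) {
        for (Entry e : table) {
            if (pc >= e.startPc && pc < e.endPc && e.catchType.isInstance(thrown)) {
                return e.handlerPc;
            }
        }
        return -1; // uncaught in this method; the invoker is searched next
    }

    // An inner try nested in an outer try; the inner entry is listed first.
    static Entry[] sampleTable() {
        return new Entry[] {
            new Entry(4, 10, 20, ArithmeticException.class),
            new Entry(0, 16, 30, RuntimeException.class),
        };
    }

    public static void main(String[] args) {
        Entry[] table = sampleTable();
        System.out.println(findHandler(table, 5, new ArithmeticException()));   // 20
        System.out.println(findHandler(table, 5, new IllegalStateException())); // 30
        System.out.println(findHandler(table, 12, new ArithmeticException()));  // 30
        System.out.println(findHandler(table, 5, new Exception()));             // -1
    }
}
```

Listing the inner entry first is what makes "first match" coincide with "innermost handler" in this sketch.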

If there is no matching catch clause, the current method is said to have an uncaught exception. The execution state of the invoker, the method that invoked this method, is restored. The propagation of the exception continues as though the exception had occurred in the invoker at the instruction that invoked the method actually raising the exception.

Java supports more sophisticated forms of exception handling through its try-finally and try-catch-finally statements. In such forms, the finally statement is executed even if no matching catch clause is found. The way the Java Virtual Machine supports implementation of these forms is discussed in Chapter 7, "Compiling for the Java Virtual Machine."
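
At the Java language level, the behavior of finally when no catch clause matches looks like this. The class name is ours and the example is only a sketch of the observable ordering.

```java
// Illustrative sketch; FinallyDemo is a hypothetical name.
class FinallyDemo {
    // Records the order in which the clauses run.
    static String run() {
        StringBuilder log = new StringBuilder();
        try {
            try {
                throw new IllegalStateException("boom");
            } catch (ArithmeticException e) {
                log.append("inner-catch "); // not reached: the type does not match
            } finally {
                log.append("finally ");     // runs although no inner catch matched
            }
        } catch (IllegalStateException e) {
            log.append("outer-catch");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // finally outer-catch
    }
}
```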

3.10 The `class` File Format

Compiled code to be executed by the Java Virtual Machine is stored in a binary file which has a platform-independent format, the class file format. Given the aims of the Java Virtual Machine, the definition of this file format is of importance equal to its other components. The class file format precisely defines the contents of such a file, including details such as byte ordering that might be taken for granted in a platform-specific object file format.

Chapter 4, "The class File Format," covers the class file format in detail.

3.11 Instruction Set Summary

A Java Virtual Machine instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation. Many instructions have no operands and consist only of an opcode.

Ignoring exceptions, the inner loop of the Java Virtual Machine execution is effectively


    do {
        fetch an opcode;
        if (operands) fetch operands;
        execute the action for the opcode;
    } while (there is more to do);
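
The loop above can be sketched in Java for a tiny subset of the instruction set. This is a hedged illustration rather than a real implementation: the class is hypothetical, the operand stack is a fixed-size int array, and only bipush, iadd, and ireturn are handled (with their actual opcode values 0x10, 0x60, and 0xac).

```java
// Illustrative sketch; TinyInterpreter is a hypothetical name.
class TinyInterpreter {
    static final int BIPUSH = 0x10;  // push a signed byte operand as an int
    static final int IADD = 0x60;    // pop two ints, push their sum
    static final int IRETURN = 0xac; // return the int on top of the stack

    static int run(byte[] code) {
        int[] stack = new int[16]; // fixed-size operand stack, for the sketch only
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xff; // fetch an opcode
            switch (opcode) {
                case BIPUSH:
                    stack[sp++] = code[pc++]; // fetch the one-byte operand (sign-extended)
                    break;
                case IADD: {
                    int value2 = stack[--sp];
                    int value1 = stack[--sp];
                    stack[sp++] = value1 + value2;
                    break;
                }
                case IRETURN:
                    return stack[--sp]; // nothing more to do
                default:
                    throw new IllegalArgumentException(
                            "unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        byte[] code = { BIPUSH, 2, BIPUSH, 3, IADD, (byte) IRETURN };
        System.out.println(run(code)); // 5
    }
}
```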

The number and size of the additional operands are determined by the opcode. If an additional operand is more than one byte in size, then it is stored in big-endian order (high-order byte first). For example, an unsigned 16-bit index into the local variables is stored as two unsigned bytes byte1 and byte2 such that its value is

    (byte1 << 8) | byte2
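
In Java, where byte is signed, each operand byte must be masked before the expression above is applied. A minimal sketch, with a hypothetical class and method name:

```java
// Illustrative sketch; BigEndianOperand and readU2 are hypothetical names.
class BigEndianOperand {
    // Reads an unsigned 16-bit big-endian value starting at the given offset.
    static int readU2(byte[] code, int offset) {
        int byte1 = code[offset] & 0xff;     // high-order byte, made unsigned
        int byte2 = code[offset + 1] & 0xff; // low-order byte, made unsigned
        return (byte1 << 8) | byte2;
    }

    public static void main(String[] args) {
        System.out.println(readU2(new byte[] {0x01, 0x02}, 0));               // 258
        System.out.println(readU2(new byte[] {(byte) 0xff, (byte) 0xfe}, 0)); // 65534
    }
}
```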

The bytecode instruction stream is only single-byte aligned. The two exceptions are the tableswitch and lookupswitch instructions, which are padded to force internal alignment of some of their operands on 4-byte boundaries.
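
The amount of padding can be computed from the address of the opcode. The sketch below assumes that the padding immediately follows the opcode and that alignment is measured from the start of the method's code; both details come from the individual instruction descriptions rather than from this section, and the class and method names are ours.

```java
// Illustrative sketch; SwitchPadding is a hypothetical name.
class SwitchPadding {
    // opcodeAddress is the offset of the tableswitch or lookupswitch opcode
    // from the start of the method's code (an assumption of this sketch).
    // Returns the number of padding bytes (0 to 3) that follow the opcode.
    static int paddingAfterOpcode(int opcodeAddress) {
        int firstOperandAddress = opcodeAddress + 1;
        return (4 - (firstOperandAddress % 4)) % 4;
    }

    public static void main(String[] args) {
        for (int address = 0; address < 4; address++) {
            System.out.println(address + " -> " + paddingAfterOpcode(address));
        }
    }
}
```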

The decision to limit the Java Virtual Machine opcode to a byte and to forgo data alignment within compiled code reflects a conscious bias in favor of compactness, possibly at the cost of some performance in naive implementations. A one-byte opcode precludes certain implementation techniques that could improve the performance of a Java Virtual Machine emulator, and it limits the size of the instruction set. Not assuming data alignment means that immediate data larger than a byte must be constructed from bytes at run time on many machines.

3.11.1 Types and the Java Virtual Machine

Most of the instructions in the Java Virtual Machine instruction set encode type information about the operations they perform. For instance, the iload instruction loads the contents of a local variable, which must be an int, onto the operand stack. The fload instruction does the same with a float value. The two instructions may have identical implementations, but have distinct opcodes.

For the majority of typed instructions, the instruction type is represented explicitly in the opcode mnemonic by a letter: i for an int operation, l for long, s for short, b for byte, c for char, f for float, d for double, and a for reference. Some instructions for which the type is unambiguous do not have a type letter in their mnemonic. For instance, arraylength always operates on an object that is an array. Some instructions, such as goto, an unconditional control transfer, do not operate on typed operands.

Given the Java Virtual Machine's one-byte opcode size, encoding types into opcodes places pressure on the design of its instruction set. If each typed instruction supported all of the Java Virtual Machine's runtime data types, there would be more instructions than could be represented in a byte. Instead, the instruction set of the Java Virtual Machine provides a reduced level of type support for certain operations. In other words, the instruction set is intentionally not orthogonal. Separate instructions can be used to convert between unsupported and supported data types as necessary.

Table 3.1 summarizes the type support in the instruction set of the Java Virtual Machine. Only instructions that exist for multiple types are listed. A specific instruction, with type information, is built by replacing the T in the instruction template in the opcode column by the letter in the type column. If the type column for some instruction template and type is blank, then no instruction exists supporting that type of operation. For instance, there is a load instruction for type int, iload, but there is no load instruction for type byte.

Note that most instructions in Table 3.1 do not have forms for the integral types byte, char, and short. When writing to its local variables or operand stacks, the Java Virtual Machine internally sign-extends values of types byte and short to type int, and zero-extends values of type char to type int. Thus, most operations on values of types byte, char, and short are correctly performed by instructions operating on values of type int. The Java Virtual Machine also treats values of Java type boolean specially, as noted in §3.2.4.

*opcode*	*`byte`*	*`short`*	*`int`*	*`long`*	*`float`*	*`double`*	*`char`*	*`reference`*
Tipush	bipush	sipush
Tconst			iconst	lconst	fconst	dconst		aconst
Tload			iload	lload	fload	dload		aload
Tstore			istore	lstore	fstore	dstore		astore
Tinc			iinc
Taload	baload	saload	iaload	laload	faload	daload	caload	aaload
Tastore	bastore	sastore	iastore	lastore	fastore	dastore	castore	aastore
Tadd			iadd	ladd	fadd	dadd
Tsub			isub	lsub	fsub	dsub
Tmul			imul	lmul	fmul	dmul
Tdiv			idiv	ldiv	fdiv	ddiv
Trem			irem	lrem	frem	drem
Tneg			ineg	lneg	fneg	dneg
Tshl			ishl	lshl
Tshr			ishr	lshr
Tushr			iushr	lushr
Tand			iand	land
Tor			ior	lor
Txor			ixor	lxor
i2T	i2b	i2s		i2l	i2f	i2d	i2c
l2T			l2i		l2f	l2d
f2T			f2i	f2l		f2d
d2T			d2i	d2l	d2f
Tcmp				lcmp
Tcmpl					fcmpl	dcmpl
Tcmpg					fcmpg	dcmpg
if_TcmpOP			if_icmpOP					if_acmpOP
Treturn			ireturn	lreturn	freturn	dreturn		areturn
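The sign-extension and zero-extension described in the note on types byte, char, and short can be observed from Java source, whose widening conversions to int follow the same rules. This is a sketch only; the class and method names are hypothetical.

```java
final class IntPromotion {
    static int fromByte(byte b)   { return b; }  // sign-extends 8 bits to 32
    static int fromShort(short s) { return s; }  // sign-extends 16 bits to 32
    static int fromChar(char c)   { return c; }  // zero-extends 16 bits to 32
}
```

For the bit pattern 0xFFFF, for instance, the short interpretation widens to -1 while the char interpretation widens to 65535.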

The mapping between Java storage types and Java Virtual Machine computational types is summarized by Table 3.2.

*Java (Storage) Type*	*`Size in Bits`*	*`Computational Type`*
`byte`	8	`int`
`char`	16	`int`
`short`	16	`int`
`int`	32	`int`
`long`	64	`long`
`float`	32	`float`
`double`	64	`double`

The exception to this mapping is in the case of arrays. Arrays of type boolean, byte, char, and short can be directly represented by the Java Virtual Machine. Arrays of type byte, char, and short are accessed using instructions specialized to those types. Arrays of type boolean are accessed using byte array instructions.

The remainder of this chapter summarizes the Java Virtual Machine instruction set.

3.11.2 Load and Store Instructions

The load and store instructions transfer values between the Java Virtual Machine's local variables and operand stack:

Load a local variable onto the operand stack: iload, iload_<n>, lload, lload_<n>, fload, fload_<n>, dload, dload_<n>, aload, aload_<n>.
Store a value from the operand stack into a local variable: istore, istore_<n>, lstore, lstore_<n>, fstore, fstore_<n>, dstore, dstore_<n>, astore, astore_<n>.
Load a constant onto the operand stack: bipush, sipush, ldc, ldc_w, ldc2_w, aconst_null, iconst_m1, iconst_<i>, lconst_<l>, fconst_<f>, dconst_<d>.
Gain access to more local variables using a wider index, or to a larger immediate operand: wide.

Instructions that access fields of objects and elements of arrays also transfer data to and from the operand stack (§3.6.2).

Instruction mnemonics shown above with trailing letters between angle brackets (for instance, iload_<n>) denote families of instructions (with members iload_0, iload_1, iload_2, and iload_3 in the case of iload_<n>). Such families of instructions are specializations of an additional generic instruction (iload) that takes one operand. For the specialized instructions the operand is implicit and does not need to be stored or fetched. The semantics are otherwise the same (iload_0 means the same thing as iload with the operand 0). The letter between the angle brackets specifies the type of the implicit operand for that family of instructions: for <n> a natural number, for <i> an int, for <l> a long, for <f> a float, and for <d> a double. Forms for type int are used in many cases to perform operations on values of type byte, char, and short (§3.11.1).
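The relationship between a generic instruction and its specialized family can be sketched for iload as follows. The opcode values are those assigned to iload and iload_0 through iload_3; the names LoadFamily and localIndex are hypothetical, and the wide prefix is not handled.

```java
final class LoadFamily {
    static final int ILOAD = 0x15, ILOAD_0 = 0x1a, ILOAD_3 = 0x1d;

    /** Returns the local variable index used by the iload-style instruction at code[pc]. */
    static int localIndex(byte[] code, int pc) {
        int opcode = code[pc] & 0xFF;
        if (opcode == ILOAD) {
            return code[pc + 1] & 0xFF;   // explicit operand: an unsigned byte index
        }
        if (opcode >= ILOAD_0 && opcode <= ILOAD_3) {
            return opcode - ILOAD_0;      // implicit operand encoded in the opcode itself
        }
        throw new IllegalArgumentException("not an iload-style opcode: " + opcode);
    }
}
```

Under this sketch, iload with operand 2 and iload_2 yield the same index, although the former occupies two bytes and the latter one.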

This notation for instruction families is used throughout The Java Virtual Machine Specification.

3.11.3 Arithmetic Instructions

The arithmetic instructions compute a result that is typically a function of two values on the operand stack, pushing the result back on the operand stack. There are two main kinds of arithmetic instructions, those operating on integer values and those operating on floating-point values. Within each of these kinds, the arithmetic instructions are specialized to Java Virtual Machine numeric types. There is no direct support for integer arithmetic on byte, short, and char types (§3.11.1); those operations are handled by instructions operating on type int. Integer and floating-point instructions also differ in their behavior on overflow, underflow, and divide-by-zero. The arithmetic instructions are as follows:

Add: iadd, ladd, fadd, dadd.
Subtract: isub, lsub, fsub, dsub.
Multiply: imul, lmul, fmul, dmul.
Divide: idiv, ldiv, fdiv, ddiv.
Remainder: irem, lrem, frem, drem.
Negate: ineg, lneg, fneg, dneg.
Shift: ishl, ishr, iushr, lshl, lshr, lushr.
Bitwise OR: ior, lor.
Bitwise AND: iand, land.
Bitwise exclusive OR: ixor, lxor.
Local variable increment: iinc.

The semantics of the Java operators on integer and floating-point values (§2.4.2, §2.4.3) are directly supported by the semantics of the Java Virtual Machine instruction set.

The Java Virtual Machine does not indicate overflow or underflow during operations on integer data types. The only integer operations that can throw an exception are the integer divide instructions (idiv and ldiv) and the integer remainder instructions (irem and lrem), which throw an ArithmeticException if the divisor is zero.
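Because the Java operators are directly supported by these instructions, the behavior can be observed from Java source. The following is a sketch with hypothetical names: integer overflow wraps silently, and only a zero divisor raises an exception.

```java
final class IntegerArithmeticDemo {
    static int add(int a, int b)    { return a + b; }  // wraps on overflow
    static int divide(int a, int b) { return a / b; }  // throws only if b is zero

    /** Returns true if computing a % b throws ArithmeticException. */
    static boolean remainderThrows(int a, int b) {
        try {
            int unused = a % b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

Note that dividing the most negative int by -1 overflows but does not throw; the result is again the most negative int.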

Java Virtual Machine operations on floating-point numbers behave exactly as specified in IEEE 754. In particular, the Java Virtual Machine requires full support of IEEE 754 denormalized floating-point numbers and gradual underflow, which make it easier to prove desirable properties of particular numerical algorithms.

The Java Virtual Machine requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode, known as round-to-nearest.

The Java Virtual Machine uses round-towards-zero when converting a floating-point value to an integer. This results in the number being truncated; any bits of the significand that represent the fractional part of the operand value are discarded. Round-towards-zero chooses as its result the type's value closest to, but no greater in magnitude than, the infinitely precise result.

The Java Virtual Machine's floating-point operators produce no exceptions. An operation that overflows produces a signed infinity, an operation that underflows produces a signed zero, and an operation that has no mathematically definite result produces NaN. All numeric operations with NaN as an operand produce NaN as a result.
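These outcomes can be checked from Java source, as in the sketch below. The class and method names are hypothetical; the operands are passed as parameters only to keep the example uniform.

```java
final class FloatingPointDemo {
    static double times(double a, double b) { return a * b; }
    static double over(double a, double b)  { return a / b; }

    /** Distinguishes -0.0 from +0.0, which compare equal with ==. */
    static boolean isNegativeZero(double v) {
        return v == 0.0 && 1.0 / v == Double.NEGATIVE_INFINITY;
    }
}
```

For example, doubling the largest finite double yields positive infinity, halving the smallest positive denormalized value yields zero, and dividing zero by zero yields NaN, all without an exception being thrown.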

3.11.4 Type Conversion Instructions

The type conversion instructions allow conversion between Java Virtual Machine numeric types. These may be used to implement explicit conversions in user code, or to mitigate the lack of orthogonality in the instruction set of the Java Virtual Machine.

The Java Virtual Machine directly supports the following widening numeric conversions, a subset of Java's widening primitive conversions (§2.6.2):

int to long, float, or double
long to float or double
float to double

The widening numeric conversion instructions are i2l, i2f, i2d, l2f, l2d, and f2d. The mnemonics for these opcodes are straightforward given the naming conventions for typed instructions and the punning use of 2 to mean "to." For instance, the i2d instruction converts an int value to a double. Widening numeric conversions do not lose information about the overall magnitude of a numeric value. Indeed, conversions widening from the int type to the long type and from float to double do not lose any information at all; the numeric value is preserved exactly. Conversion of an int or a long value to float, or of a long value to double, may lose precision, that is, may lose some of the least significant bits of the value; the resulting floating-point value is a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode.

According to this rule, a widening numeric conversion of an int to a long simply sign-extends the two's-complement representation of the int value to fill the wider format. A widening numeric conversion of a char to an integral type zero-extends the representation of the char value to fill the wider format.

Despite the fact that loss of precision may occur, widening numeric conversions never result in a runtime exception.
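The difference between exact and inexact widening can be seen in a short Java sketch. The method names below merely echo the instruction mnemonics; they are hypothetical helpers, not the instructions themselves.

```java
final class WideningDemo {
    static long   i2l(int v)   { return v; }  // exact: sign-extends
    static float  i2f(int v)   { return v; }  // may round: float has a 24-bit significand
    static double l2d(long v)  { return v; }  // may round: double has a 53-bit significand
    static double f2d(float v) { return v; }  // exact
}
```

The int value 16777217 (2^24 + 1) has no exact float representation and is rounded to 16777216.0, while the same value converts to long or double exactly.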

Note that widening numeric conversions do not exist from integral types byte, char, and short to type int. As noted in §3.11.1, values of type byte, char, and short are internally widened to type int, making these conversions implicit.

The Java Virtual Machine also directly supports the following narrowing numeric conversions, a subset of Java's narrowing primitive conversions (§2.6.3):

int to byte, short, or char
long to int
float to int or long
double to int, long, or float

The narrowing numeric conversion instructions are i2b, i2c, i2s, l2i, f2i, f2l, d2i, d2l, and d2f. A narrowing numeric conversion can result in a value of different sign, or of a different order of magnitude, or both; it may thereby lose precision.

A narrowing numeric conversion of an int or long to an integral type T simply discards all but the N lowest-order bits, where N is the number of bits used to represent type T. This may cause the resulting value not to have the same sign as the input value.
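A sketch of this bit-discarding behavior in Java follows. The helper names echo the instruction mnemonics and are hypothetical.

```java
final class IntNarrowingDemo {
    static byte  i2b(int v)  { return (byte) v; }   // keeps the low 8 bits
    static short i2s(int v)  { return (short) v; }  // keeps the low 16 bits
    static char  i2c(int v)  { return (char) v; }   // keeps the low 16 bits, unsigned
    static int   l2i(long v) { return (int) v; }    // keeps the low 32 bits
}
```

The int value 200 (0xC8), for instance, narrows to the byte value -56, because bit 7 of the retained bits becomes the sign bit.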

In a narrowing numeric conversion of a floating-point value to an integral type T, where T is either int or long, the floating-point value is converted to type T as follows:

If the floating-point value is NaN, the result of the conversion is an int or long 0.
Otherwise, if the value of the floating-point value is greater than or equal to the smallest value and less than or equal to the largest value representable in type T, then the floating-point value is rounded to an integer value V, rounding towards zero using IEEE 754 round-towards-zero mode. Then there are two cases:
If T is long and this integer value can be represented as a long, then the result is the long value V.
If T is of type int and this integer value can be represented as an int, then the result is the int value V.
Otherwise either:
The value must be too small (a negative value of large magnitude or negative infinity), and the result is the smallest representable value of type int or long.
The value must be too large (a positive value of large magnitude or positive infinity), and the result is the largest representable value of type int or long.
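The cases above can be exercised from Java source, whose casts from floating-point to integral types follow the same rules. The helper names in this sketch are hypothetical.

```java
final class FloatToIntegralDemo {
    static int  f2i(float v)  { return (int) v; }
    static int  d2i(double v) { return (int) v; }
    static long d2l(double v) { return (long) v; }
}
```

NaN converts to 0, in-range values are truncated towards zero, and out-of-range values (including the infinities) are clamped to the smallest or largest value of the target type.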

A narrowing numeric conversion from double to float behaves in accordance with IEEE 754. The result is correctly rounded using IEEE 754 round-to-nearest mode. A value too small to be represented as a float is converted to a positive or negative zero of type float; a value too large to be represented as a float is converted to a positive or negative infinity. A double NaN is always converted to a float NaN.

Despite the fact that overflow, underflow, or loss of precision may occur, narrowing conversions among numeric types never result in a runtime exception.
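The double-to-float case can be sketched in the same way. The helper name d2f is hypothetical and simply wraps a Java cast.

```java
final class DoubleToFloatDemo {
    static float d2f(double v) { return (float) v; }

    /** Distinguishes -0.0f from +0.0f, which compare equal with ==. */
    static boolean isNegativeZero(float v) {
        return v == 0.0f && 1.0f / v == Float.NEGATIVE_INFINITY;
    }
}
```

A magnitude such as 1e-60 is too small for float and becomes a zero of the same sign; 1e60 is too large and becomes an infinity of the same sign. Neither conversion throws.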

3.11.5 Object Creation and Manipulation

Although both class instances and arrays are objects, the Java Virtual Machine creates and manipulates class instances and arrays using distinct sets of instructions:

Create a new class instance: new.
Create a new array: newarray, anewarray, multianewarray.
Access fields of classes (static fields, known as class variables) and fields of class instances (non-static fields, known as instance variables): getfield, putfield, getstatic, putstatic.
Load an array component onto the operand stack: baload, caload, saload, iaload, laload, faload, daload, aaload.
Store a value from the operand stack as an array component: bastore, castore, sastore, iastore, lastore, fastore, dastore, aastore.
Get the length of an array: arraylength.
Check properties of class instances or arrays: instanceof, checkcast.

3.11.6 Operand Stack Management Instructions

A number of instructions are provided for the direct manipulation of the operand stack: pop, pop2, dup, dup2, dup_x1, dup2_x1, dup_x2, dup2_x2, swap.
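The effect of a few of these instructions can be sketched on a simplified stack of int values. The class OperandStack below is hypothetical; it assumes every value occupies one stack entry and therefore does not model pop2, the dup2 forms, or dup_x2.

```java
final class OperandStack {
    private final int[] slots = new int[32]; // hypothetical fixed capacity
    private int size;

    void push(int v) { slots[size++] = v; }
    int  pop()       { return slots[--size]; }

    /** dup: ..., v  becomes  ..., v, v */
    void dup() {
        int v = pop();
        push(v); push(v);
    }

    /** swap: ..., v2, v1  becomes  ..., v1, v2 */
    void swap() {
        int v1 = pop(), v2 = pop();
        push(v1); push(v2);
    }

    /** dup_x1: ..., v2, v1  becomes  ..., v1, v2, v1 */
    void dupX1() {
        int v1 = pop(), v2 = pop();
        push(v1); push(v2); push(v1);
    }
}
```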

3.11.7 Control Transfer Instructions

The control transfer instructions conditionally or unconditionally cause the Java Virtual Machine to continue execution with an instruction other than the one following the control transfer instruction. They are:

Conditional branch: ifeq, iflt, ifle, ifne, ifgt, ifge, ifnull, ifnonnull, if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge, if_acmpeq, if_acmpne, lcmp, fcmpl, fcmpg, dcmpl, dcmpg.
Compound conditional branch: tableswitch, lookupswitch.
Unconditional branch: goto, goto_w, jsr, jsr_w, ret.

The Java Virtual Machine has distinct sets of instructions to conditionally branch on comparison with data of int, long, float, double, and reference types. Comparison with data of byte, char, and short types is done using an int comparison instruction (§3.11.1). Because of this added emphasis on int comparisons, the Java Virtual Machine includes a larger complement of conditional branch instructions for type int than for other types. The Java Virtual Machine has distinct conditional branch instructions that test for the null reference, and thus is not required to specify a concrete value for null (§3.3).

All int and long conditional control transfer instructions perform signed comparisons. Floating-point comparison is performed in accordance with IEEE 754.
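The consequence of signed comparison can be illustrated from Java source. In the sketch below the class and method names are hypothetical, and Integer.compareUnsigned is used only to show the contrasting unsigned interpretation, which the conditional control transfer instructions do not provide.

```java
final class ComparisonDemo {
    /** Signed comparison, as performed by the int conditional branches. */
    static boolean signedLess(int a, int b) { return a < b; }

    /** Unsigned interpretation of the same bits, shown for contrast only. */
    static boolean unsignedLess(int a, int b) { return Integer.compareUnsigned(a, b) < 0; }

    /** IEEE 754 ordering: every ordered comparison involving NaN is false. */
    static boolean floatLess(float a, float b) { return a < b; }
}
```

The bit pattern 0xFFFFFFFF is -1 as a signed int and so compares less than 1, whereas under an unsigned reading it would be the largest 32-bit value.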

3.11.8 Method Invocation and Return Instructions

Four instructions invoke methods:

Invoke an instance method of an object, dispatching on the (virtual) type of the object: invokevirtual. This is the normal method dispatch in Java.
Invoke a method that is implemented by an interface, searching the methods implemented by the particular runtime object to find the appropriate method: invokeinterface.
Invoke an instance method requiring special handling, either an instance initialization method <init>, a private method, or a superclass method: invokespecial.
Invoke a class (static) method in a named class: invokestatic.

The method return instructions, which are distinguished by return type, are ireturn (used to return values of type byte, char, short, or int), lreturn, freturn, dreturn, and areturn. In addition, the return instruction is used to return from methods declared to be void.
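The following Java sketch shows one call of each kind. The comments name the instruction that the list above associates with each call; the instructions a particular compiler actually emits may differ, and all class and method names here are hypothetical.

```java
final class InvocationKinds {
    interface Greeter { String greet(); }

    static class Base implements Greeter {
        public String greet() { return "base"; }
    }

    static class Derived extends Base {
        @Override public String greet() {
            return "derived+" + super.greet();  // superclass method: invokespecial
        }
    }

    static String twice(String s) { return s + s; }  // a class (static) method

    static String demo() {
        Base viaClass = new Derived();      // instance initialization <init>: invokespecial
        Greeter viaInterface = viaClass;
        String a = viaClass.greet();        // dispatch on the runtime class: invokevirtual
        String b = viaInterface.greet();    // call through an interface: invokeinterface
        return twice(a.equals(b) ? a : "mismatch");  // invokestatic
    }
}
```

Both greet calls reach Derived.greet, because dispatch depends on the class of the object rather than on the declared type of the variable.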

3.11.9 Throwing and Handling Exceptions

An exception is thrown programmatically using the athrow instruction. Exceptions can also be thrown by various Java Virtual Machine instructions if they detect an abnormal condition.

3.11.10 Implementing `finally`

The implementation of the finally keyword uses the jsr, jsr_w, and ret instructions. See Section 4.9.6, "Exceptions and finally" and Section 7.13, "Compiling finally."

3.11.11 Synchronization

The Java Virtual Machine supports method- and block-level synchronization using a single mechanism (monitors) in different ways. Synchronized methods are handled as part of method invocation and return (see Section 3.11.8, "Method Invocation and Return Instructions"). Synchronization of code blocks, however, has explicit support in the instruction set: monitorenter, monitorexit.
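In Java source the two forms look as follows. This is a sketch with hypothetical names; both forms acquire the monitor of the same object, so the two methods exclude each other.

```java
final class SynchronizedCounter {
    private int value;

    // Method-level: the monitor is handled as part of method invocation and return.
    synchronized void incrementViaMethod() { value++; }

    // Block-level: the block is bracketed by monitorenter and monitorexit.
    void incrementViaBlock() {
        synchronized (this) { value++; }
    }

    synchronized int get() { return value; }

    /** Hypothetical driver: two threads each increment one shared counter n times. */
    static int runTwoThreads(int n) {
        SynchronizedCounter c = new SynchronizedCounter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < n; i++) c.incrementViaMethod(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < n; i++) c.incrementViaBlock(); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```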

3.12 Public Design, Private Implementation

Thus far this book has sketched the public view of the Java Virtual Machine: the class file format and the instruction set. These components are vital to the platform- and implementation-independence of the Java Virtual Machine. The implementor may prefer to think of them as a means to securely communicate fragments of programs between two platforms, rather than as a blueprint to be followed exactly.

It is important to understand where the line between the public design and the private implementation lies. The Java Virtual Machine must be able to read class files, and it must exactly implement the semantics of the Java Virtual Machine code therein. One way of doing this is to take this document as a specification and to implement that specification literally. But it is also perfectly feasible and desirable for the implementor to modify or optimize the implementation within the constraints of this specification. So long as the class file format can be read, and the semantics of its code are maintained, the implementor may implement these semantics in any way. What is "under the hood" is the implementor's business, as long as the correct external interface is carefully maintained.²

The implementor can use this flexibility to tailor Java Virtual Machine implementations for high performance, low memory use, or portability. What makes sense in a given implementation depends on the goals of that implementation. The range of implementation options includes the following:

Verifying properties of Java Virtual Machine code at linking time (§2.16.3) to reduce the need for runtime checks while ensuring that the code is safe and that the semantics of the Java language are preserved (as done by Sun's class file verifier; see Section 4.9, "Verification of class Files").
Translating the Java Virtual Machine code at load time or during execution (the subject of Chapter 9, "An Optimization") into the instruction set of another virtual machine.
Translating the Java Virtual Machine code at load time or during execution into the native instruction set of the host CPU (sometimes referred to as Just-In-Time or JIT code generation).

The existence of a precisely defined virtual machine and object file format need not significantly restrict the creativity of the implementor. The Java Virtual Machine is designed to support many different implementations, providing new and interesting solutions while retaining compatibility between implementations.

¹ In Sun's JDK 1.0.2 release, boolean arrays are effectively byte arrays, using 8 bits per boolean element.

² There are some exceptions: debuggers and JIT code generators can require access to elements of the Java Virtual Machine that are normally considered to be "under the hood." Sun is working with other Java Virtual Machine implementors and tools vendors to standardize interfaces to the Java Virtual Machine for use by such tools.
