IPL3. Session 6. 19 July 03. Performance and monitoring.
Overview of Java program execution. There are two phases: first, source code is compiled
into platform-independent bytecode; second, at runtime, some JVM reads bytecode, translates
it to platform-specific instructions, and executes them.
The source is typically Java source, but it need not be. See for example Kawa
(http://www.gnu.org/software/kawa), a system for compiling Scheme code to Java
bytecode, and Qexo (http://www.gnu.org/software/qexo), a Kawa derivative that
compiles XQuery into
Java bytecode. (There was an interesting session on Kawa at the recent JavaOne
conference--see Kawa: Compiling Programming Languages to the Java Virtual Machine
(session TS-2907), slides in PDF accessible from
http://servlet.java.sun.com/javaone/sf2003/conf/sessions/46-all-regular.en.jsp).
[Innocent query: why would you want to run code from some other language in a
JVM?]
The idea of starting from something other than vanilla Java source shouldn't seem offensive
to you at this point: recall that with JSPs we start off with non-Java source and end
up executing bytecode (though in this case the original source is first translated to
Java code and then compiled down, rather than being transformed directly to bytecode).
Of course it's also possible in principle to code directly in JVM instructions. The
bottom line is that you need somehow to end up with a block of bytecode that
conforms to the .class file format spelled out in the
JVM spec (http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html;
see especially chapter 4), but--in terms of the subsequent execution of the code--how
you get to that point is up to you. [Incidentally note that in the Sun JDK, javap -c
will disassemble bytecode for you].
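To make the javap -c remark concrete, here is a minimal sketch of something to disassemble. The class name Add is made up for the example, and the instruction sequence mentioned in the comment is what I'd expect from a typical javac for this method; the exact layout of javap's output varies between JDK versions.

```java
// Add.java -- a hypothetical class, used only to have something to disassemble.
//
// Compile and disassemble with:
//     javac Add.java
//     javap -c Add
//
// For the static method below, the disassembly should contain a sequence
// along the lines of: iload_0, iload_1, iadd, ireturn.
public class Add {
    // Adds two ints; in a static method the arguments live in local slots 0 and 1.
    public static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

The point is only that the same .class file would satisfy the JVM whether it came from javac, from Kawa, or from a hand-written assembler.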
JVM
A JVM is obliged to read .class-file-format bytecode and act on it, but how it
does so is up to the implementation. See in particular the beginning of chapter 3 of
the JVM spec: "To implement the Java virtual machine correctly, you need only be able
to read the class file format and correctly perform the operations specified therein.
Implementation details that are not part of the Java virtual machine's specification
would unnecessarily constrain the creativity of implementors. For example, the memory
layout of run-time data areas, the garbage-collection algorithm used, and any internal
optimization of the Java virtual machine instructions (for example, translating them
into machine code) are left to the discretion of the implementor".
Execution. Ultimately we have to translate down to machine instructions. There
are several options:
Interpret. Read the sequence of bytecode; for each JVM instruction, translate
to machine code and execute. Interpretation is relatively simple; it's also slow.
Early JVMs were interpreter-based.
Just-in-time (JIT) compilation. At runtime, compile down to platform-specific
machine instructions and execute the compiled code. Execution is faster than in
interpreted mode, but notice that we're compiling on the user's dime. Sun JVMs
immediately prior to 1.3 were JIT-based. Other JVMs still use JIT. Execution in
the .NET CLR (Common Language Runtime) is JIT-based.
Mixed mode. Start executing in interpretive mode. Only go to the trouble
of compiling when you have evidence that particular blocks of code are going to
be executed "often". Sun JVMs since 1.3 have used mixed mode execution technology
referred to as "HotSpot".
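The interpret and mixed-mode ideas above can be sketched with a toy stack machine. Everything here is an assumption made for illustration: the opcodes are invented (they are not JVM instructions), the threshold is arbitrary, and the "hot" flag merely marks the point at which a real mixed-mode VM might decide compilation is worthwhile. It says nothing about how HotSpot actually does its bookkeeping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack machine: NOT the JVM instruction set, and NOT HotSpot's internals.
public class ToyMixedMode {
    // Hypothetical opcodes, invented for this sketch.
    public static final int PUSH = 0; // followed by one operand: the value to push
    public static final int ADD = 1;
    public static final int MUL = 2;

    // Hypothetical stand-in for a compile threshold.
    public static final int HOT_THRESHOLD = 3;

    private int invocations = 0;

    // Pure interpretation: decode one instruction at a time and act on it.
    public static int interpret(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
        return stack.pop();
    }

    // Mixed-mode bookkeeping: count executions so a "hot" block can be identified.
    public int run(int[] code) {
        invocations++;
        return interpret(code);
    }

    // True once the block has run often enough that compiling it might pay off.
    public boolean isHot() {
        return invocations >= HOT_THRESHOLD;
    }

    public static void main(String[] args) {
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL}; // (2 + 3) * 4
        ToyMixedMode vm = new ToyMixedMode();
        for (int i = 0; i < 4; i++) {
            System.out.println(vm.run(program) + " hot=" + vm.isHot());
        }
    }
}
```

A pure JIT would in effect treat every block as hot from the first call; a pure interpreter would never look at the counter.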
Note:
The structure of a bytecoded program doesn't necessarily map directly to
the structure of the machine instructions actually executed. E.g. what started
out as method calls in the Java source can get inlined in the translation process.
The dynamic and polymorphic nature of Java makes optimization tricky.
Just as a placeholder for the JFluid discussion to come: there's no reason
a JVM has to execute exactly and only the bytecode that it reads in from disk.
Memory management ("The representation method outlined ...[previously]... solves
the problem of implementing list structure, provided that we have an infinite amount
of memory". Abelson and Sussman, SICP, p. 540).
Java, like many other programming languages, involves both stack- and heap-based
memory allocation. What distinguishes it from C, say, is that the JVM is obliged
to support a programming model in which programmers allocate from the heap (using
"new"), but don't--can't, aren't supposed to--manually deallocate it; rather, it's
the JVM's responsibility to reclaim stale memory.
The issues involved here include:
Memory exhaustion (if we don't reclaim effectively, we'll run out of memory)
Memory fragmentation (even if we do reclaim,
we may end up with unusably splintered blocks of memory; and if we compact to
defragment, we'll need some way of ensuring that references to objects on the heap
remain valid)
Memory throughput (can we reclaim stale memory fast enough?; put differently,
what's the relationship between different GC systems and memory size
requirements)
Relationship between normal program execution and GC activity (is normal
program execution suspended when GC happens?--it certainly makes things easier
for the collector if it is; but how frequent are these pauses, anyway, and
how long do they last?; and what's the relationship between these frequencies
and lengths and the size of the heap?)
Generational collection. The basic idea here is that full heap scavenges at each
GC event are too expensive (and also, that they're not really necessary).
Instead of having an undifferentiated heap, divide it into zones. For simplicity's
sake let's say we have just two: nursery/young and tenured/old.
All allocation happens in the nursery (and can be done quickly--we just advance
a pointer).
When the nursery is exhausted, it--but not the tenured/old zone--is
garbage collected, and surviving objects are promoted to the tenured zone.
But what's supposed to be the advantage here? The scheme is based on
observations about the distribution of object lifetimes--and especially on
the fact that most objects die young. This means, on the one hand, that
nursery GCs won't have many surviving objects to deal with; and on the
other that we don't have, for each GC event, to traverse the set of
objects that, having avoided infant mortality, are likely to be around for
a while.
Of course in this model the tenured/old zone can eventually fill, so
it also involves the idea of less frequent "major collections".
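Here is a toy model of the two-zone scheme just described, as a sketch only. The class and method names are invented, and the model cheats in one big way: a real collector discovers liveness by tracing references from roots, whereas here each object simply carries a flag that the "program" clears when the object dies. Major collections are left out.

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of a two-zone generational heap (hypothetical names throughout).
public class ToyGenerationalHeap {
    public static final class Obj {
        public boolean reachable = true; // cleared by the "program" when the object dies
    }

    private final int nurseryCapacity;
    private final List<Obj> nursery = new ArrayList<>();
    private final List<Obj> tenured = new ArrayList<>();
    private int minorCollections = 0;

    public ToyGenerationalHeap(int nurseryCapacity) {
        this.nurseryCapacity = nurseryCapacity;
    }

    // All allocation happens in the nursery; a full nursery triggers a minor collection.
    public Obj allocate() {
        if (nursery.size() >= nurseryCapacity) {
            minorCollect();
        }
        Obj o = new Obj();
        nursery.add(o);
        return o;
    }

    // Minor collection: only the nursery is examined; survivors are promoted.
    public void minorCollect() {
        for (Obj o : nursery) {
            if (o.reachable) {
                tenured.add(o);
            }
        }
        nursery.clear();
        minorCollections++;
    }

    public int nurserySize() { return nursery.size(); }
    public int tenuredSize() { return tenured.size(); }
    public int minorCollections() { return minorCollections; }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap(4);
        for (int i = 0; i < 12; i++) {
            Obj o = heap.allocate();
            if (i % 4 != 0) {
                o.reachable = false; // most objects die young
            }
        }
        System.out.println("minor collections: " + heap.minorCollections()
                + ", tenured: " + heap.tenuredSize()
                + ", nursery: " + heap.nurserySize());
    }
}
```

Notice that minorCollect() never touches the tenured list, which is the whole attraction: the cost of a minor collection depends on the nursery, not on the long-lived population.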
Sun's current JVMs, as we'll see, offer various GC options, but they're
all based on a generational model. So, again by way of comparison, is GC in
the .NET CLR.
Naturally real-world generational collectors are far more complicated
than I outline here--e.g. in the case of the Sun JVMs, the nursery is
actually divided into several subzones: allocation always happens in
"eden"; there are also two "survivor spaces"; when eden overflows, live
objects from it and from one of the survivor spaces--let's call it the
from-space--are copied to the other survivor space--let's call it the
to-space; when the to-space overflows, objects are promoted to the tenured
generation; and in the next minor collection, from-space and to-space swap
roles. Similarly the tenured generation is split into a main section and
one called "permanent", reserved for holding class data.
Paleczny, Vick, and Click, The
Java HotSpot Server Compiler (http://www.usenix.org/events/jvm01/paleczny.html; paper
read at the Usenix JVM2001 conference)
Command-line options
We've seen some of these already (-cp/-classpath, -enableassertions, -Dsystemproperty=value...),
but there's in fact a reasonably large set.
They're usable when you're not actually invoking a program from the command line, but the
way of doing this will then be program specific (see, for example, in the case of Tomcat, the
discussion of JAVA_OPTS in catalina.sh).
There's an options hierarchy--there are standard and non-standard (-X) ones, where the
non-standard aren't guaranteed to be supported in all implementations (in fact, among the
non-standard options, some (-XX) are more non-standard than others--in terms of both their
functional stability and their availability). For information about the options, try:
man java; java -help; java -X; java -Xrunhprof:help
And see also
http://java.sun.com/docs/hotspot/VMOptions.html (although, particularly for the -XX options,
the regular documentation is spotty, and every now and then you'll see a reference to an
option you've never heard of before)
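If you want to see from inside a program what options it was started with, something like the following sketch works. Two caveats: the property name demo.mode is made up for the example, and the java.lang.management API used in jvmArguments() arrived in Java 5, so it is not available on the 1.4-era JVMs discussed here.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class ShowOptions {
    // Returns the value set via -Dkey=value, or the fallback if it was not set.
    public static String propertyOrDefault(String key, String fallback) {
        return System.getProperty(key, fallback);
    }

    // The options passed to the JVM itself (not the arguments handed to main).
    // Requires Java 5 or later.
    public static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        // "demo.mode" is a hypothetical property name used only for illustration.
        System.out.println("demo.mode = " + propertyOrDefault("demo.mode", "(unset)"));
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```

Try it with, say, java -Ddemo.mode=test -Xmx64m ShowOptions and compare against a run with no options.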
Compiler options
HotSpot executes in mixed (interpreted/compiled) mode by default. You can force pure
interpretation with -Xint.
The Sun JVM in fact includes two compilers, one optimized for clients (optimized, i.e., for startup
and footprint), the other for servers (optimized for throughput; the server compiler does
much more extensive code analysis and optimization than the client one does). The client
compiler is the default; you specify the other with -server. Other differences--e.g. in
default heap size--are associated with the -server and -client options.
There are some non-standard options relating to compilation, two of which are
informational in character: -XX:+PrintCompilation makes the compiler print a message to
console when it compiles a method, and -XX:+CITime prints total compilation time. HotSpot's
threshold for how often code needs to be run to qualify for compilation can be tweaked via
-XX:CompileThreshold=n.
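A small program to try these options on, offered as a sketch: the class is invented, and what the compiler chooses to compile (and when) depends on the JVM version and on the -client/-server choice, so I make no promise about what -XX:+PrintCompilation will print.

```java
public class HotLoop {
    // Small method called many times, so it is a plausible compilation candidate.
    static long square(long x) {
        return x * x;
    }

    // Sum of i*i for i in 1..n.
    public static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Try: java -XX:+PrintCompilation HotLoop
        //  or: java -Xint HotLoop   (pure interpretation, for comparison)
        long start = System.currentTimeMillis();
        long result = 0;
        for (int rep = 0; rep < 50; rep++) {
            result = sumOfSquares(1_000_000);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("result=" + result + " elapsed ms=" + elapsed);
    }
}
```

Comparing elapsed times between the default mode and -Xint gives a rough feel for what compilation buys you, though a loop this simple is a very crude benchmark.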
Heap
Global sizing and utilization: -Xmx and -Xms (max and min heap sizes); -XX:MaxHeapFreeRatio=
and -XX:MinHeapFreeRatio= (bounds on overall heap utilization)
Relative sizing: -XX:NewRatio= (what young/tenured distribution do we want to see in the
heap); -XX:NewSize= and -XX:MaxNewSize= (absolute starting size and upper bound for the nursery);
-XX:SurvivorRatio= (ratio of eden to the survivor spaces); -XX:TargetSurvivorRatio= (how much
of the survivor spaces are we trying to occupy)
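To check what the global sizing options actually did, you can ask the runtime. A minimal sketch (the class name is mine; the Runtime methods are standard). Note that these figures cover the heap as a whole and tell you nothing about the young/tenured split.

```java
public class HeapInfo {
    // Upper bound the JVM will try to use for the heap (influenced by -Xmx).
    public static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently reserved by the JVM (starts near -Xms and can grow toward -Xmx).
    public static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the committed heap currently occupied: live objects plus uncollected garbage.
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println("max       = " + maxBytes() / 1024 + " KB");
        System.out.println("committed = " + committedBytes() / 1024 + " KB");
        System.out.println("used      = " + usedBytes() / 1024 + " KB");
    }
}
```

Run it once with defaults and once with, for instance, java -Xms32m -Xmx64m HeapInfo to see the numbers move.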
Garbage collection algorithms. All presume a generational heap.
The default is: single-threaded stop-and-copy in young. Mark-and-compact in old.
"Incremental" GC in the old generation can be forced with -Xincgc. The idea here is to shorten the long pauses
associated with major collections by doing a little bit of GC work in the old generation
every minor collection. Makes for shorter major collections, but at the expense of
lengthened minor ones (and of reduced overall throughput). For information about the
incremental collector (and a visualization tool for it) see
http://research.sun.com/projects/gcspy/printezis-garthwaite-ismm2002.pdf.
-XX:+UseParallelGC yields multithreaded parallel copying in the young generation. Indicated
for multiprocessor systems. Doesn't affect how GC is done in the tenured generation.
-XX:+UseConcMarkSweepGC yields "mostly concurrent" mark-and-sweep collection in the old
generation. The idea is to devote one CPU to executing a GCing thread while application
threads remain active on other CPUs (except for two brief intervals when GC has to stop the world).
As an additional GC control, and independently of which collection algorithm is in force,
-XX:+DisableExplicitGC makes a JVM ignore System.gc() calls.
-verbose by itself displays info about class loading (with -XX:+TraceClassLoading and
-XX:+TraceClassUnloading you get more)
-verbose:gc displays info relating to GC (and still more, with -XX:+PrintGCDetails,
-XX:+PrintGCTimeStamps, -XX:+PrintHeapAtGC, -XX:+PrintTenuringDistribution)
-Xloggc:file, new in 1.4, writes GC-related info to a file.
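To have something for -verbose:gc to report on, here is a sketch of a program that allocates a lot of short-lived garbage while keeping a small fraction alive. The class name and the sizes are arbitrary choices of mine; how many collections you see depends entirely on heap sizing and the collector in use.

```java
import java.util.ArrayList;
import java.util.List;

public class GcChurn {
    // Allocates `count` 1 KB arrays, keeping every `keepEvery`-th one alive.
    // Returns the number of arrays still referenced at the end.
    public static int churn(int count, int keepEvery) {
        List<byte[]> survivors = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] block = new byte[1024];
            if (i % keepEvery == 0) {
                survivors.add(block); // long-lived: likely to end up tenured
            }
            // otherwise the block becomes garbage as soon as the iteration ends
        }
        return survivors.size();
    }

    public static void main(String[] args) {
        // Try: java -verbose:gc GcChurn
        //  or: java -verbose:gc -XX:+PrintGCDetails GcChurn
        int kept = churn(2_000_000, 1000);
        System.out.println("arrays kept alive: " + kept);
    }
}
```

Shrinking the heap (e.g. with -Xmx16m) should make the collections more frequent and easier to spot in the output.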
jvmstat is a lightweight performance monitoring toolkit. That's "lightweight" as in
subtle enough that you could use it in production environments. The data accessible
currently through jvmstat relate principally to memory usage. The standard Sun JVM
from 1.4.1 has jvmstat hooks built in. The jvmstat version 1.0 tools--the tools you
use to extract data from an instrumented JVM--currently only work with 1.4.1 (but the
2.0 ones, available RSN, will be compatible with 1.4.2).
Basics of using jvmstat
Start a program running in a 1.4.1 JVM, specifying the
-XX:+UsePerfData option.
Use jvmps (one of the jvmstat tools) to find what jvmstat calls
a uni (unique numerical identifier). On Linux for example this will just be
a pid.
Run something with the form jvmstat -gcutil uni interval count (we'll see
that there are options other than -gcutil), e.g.
...where the interval is specified in ms; where S0 = survivor 0, S1 = survivor 1, E =
eden, O = old, P = permanent, GC = number of minor collections, OGC = number of
major collections, TGCT = total time spent in garbage collection; and where not
much happened in this 10 second interval beyond the nursery filling up a little bit.
The IPC (inter-process communication) mechanism by which the jvmstat tool and
the instrumented JVM talk is named shared memory (rather than, say, network sockets).
Remote monitoring. With jvmstat you can also monitor JVMs running on remote computers,
as follows:
On the monitored system:
Run a target program in a 1.4.1 JVM, as above
Also run perfagent (another of the jvmstat tools).
perfagent is an RMI-based service that watches out for JVMs. It requires an
RMI registry, and will either use an existing one or else fall back to an internal
one if it finds none running.
perfagent requires permissions--Java not OS permissions--to access JVMs,
so there's a little bit of config you have to do before you can run it.
On the monitoring system:
jvmps -hostid --your jvmps talks to a remote perfagent,
which reports on JVMs that it can see.
jvmstat -gcutil vmid interval count --where a vmid is
a "virtual machine identifier". The syntax is roughly URL-like: in a simple case
it would be something like 1738@bob.marlboro.edu (i.e., uni + "@" + DNS name
for the host).
jvmstat has different output options (raw, formatted) for a set of instrumented
objects (obtainable via jvmstat -list). The basic data obtainable relate
to space (reserved and consumed, in heap zones), events (e.g. number of compiles), and
times (e.g. time spent gc'ing the nursery). For formatted output the choices are
basically:
-gcold, -gcnew --capacity and utilization, for these zones separately; number of
related events; elapsed time; -gcnew has some additional info on object tenuring and
desired survivor usage
-gc --for each zone, capacity and utilization; summary number of events and
elapsed time
-gcutil --zone utilization percentages; number of events and elapsed time
-compiler --number of compilations and elapsed time
-class --i.e., class loader; how many classes have we loaded, and how long did
that take?
There's also a GUI tool that shows these same data--invoke with
visualgc vmid interval
I might mention that there are some other interesting GC visualization tools out
there. See:
JFluid is a profiling tool. It shows what methods are responsible for what percentage
of the running time of a program.
It's a dynamic profiling tool in the sense that the instrumenting code is inserted
at runtime, and doesn't involve modification to either program source or to bytecode.
In fact you can not only tell JFluid what code to monitor when you start the target
program but also change the instrumented regions during program execution.
The technology that's involved here is called bytecode instrumentation: JFluid
can swap out some "real" methods of a program and replace them with versions to which
timing code has been attached; and as I note, it's able to swap the real ones back in
and possibly replace other code repeatedly as a program runs.
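As a rough source-level analogy for "replace a method with a version to which timing code has been attached", here is a sketch using java.lang.reflect.Proxy. To be clear, this is not how JFluid works: JFluid rewrites bytecode inside a modified JVM and needs no interface or cooperation from the target code, whereas a dynamic proxy only works through an interface. The Greeter interface and all the names are invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class TimingProxyDemo {
    // Hypothetical interface standing in for "some method of the target program".
    public interface Greeter {
        String greet(String name);
    }

    // Wraps every call on the target with timing code and a call counter.
    public static final class TimingHandler implements InvocationHandler {
        private final Object target;
        private int calls = 0;
        private long totalNanos = 0;

        public TimingHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real method threw
            } finally {
                totalNanos += System.nanoTime() - start;
                calls++;
            }
        }

        public int calls() { return calls; }
        public long totalNanos() { return totalNanos; }
    }

    // Returns a Greeter whose calls are routed through the timing handler.
    public static Greeter instrument(TimingHandler handler) {
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] {Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        Greeter real = name -> "hello, " + name;
        TimingHandler handler = new TimingHandler(real);
        Greeter timed = instrument(handler);
        for (int i = 0; i < 3; i++) {
            System.out.println(timed.greet("bob"));
        }
        System.out.println(handler.calls() + " calls, " + handler.totalNanos() + " ns total");
    }
}
```

The analogy to keep in mind is the swap: callers hold a reference to something that behaves like the real method but reports on itself, and the real version can be put back at any time.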
The components of JFluid are:
A twiddled (1.4.2) JVM, in which the target app runs. Not, note, a standard
JVM as in the case of jvmstat; there is talk of incorporating the JFluid mods into
the official 1.5 release of Sun's JVM, but whether that will happen remains to be seen.
There are twiddled JVMs available for Solaris, Linux, and Windows.
A controlling, GUI-based client. The client talks to the twiddled JVM over a
socket connection, issuing various commands--profile the call chain rooted at a
certain method, show your current results...
Running JFluid
There's a little one-time config you need to do to tell the startup script
for the client (jfluid.sh) where certain directories are.
Then for each app you want to examine, run jfluid.sh from some app-specific
directory (JFluid is going to write a per-application config file called jfluid.opt
into the current directory). The first time you run jfluid.sh from a given directory
jfluid.opt won't exist and you'll be prompted for some information--name of the
class in whose main the program begins, classpaths... The degree of difficulty of
specifying the necessary information will vary from target program to target program.
Next, from the Instrumentation menu, choose the area of the program you want
to instrument--the root of a method call graph. You can select an arbitrary method,
you can instrument everything from main() down, you can even select an arbitrary
block of code, from program source, at sub-method granularity. [As of this
week's release of JFluid 1.3, instead of instrumenting method call timings you
can monitor memory allocation--JFluid will show you, for each of the classes loaded
in the JVM, how many objects of those classes have been created and the reverse
call graphs of those instantiations--who created those objects.]
Then do Run from the Run menu to start program execution. To display current
profiling counters, do Profile | Get latest results.
Alternatively, you can attach JFluid to a running program. That is,
rather than starting the target app from within JFluid, start it as you normally
would (making sure however that it runs within JFluid's special JVM), then run
the JFluid client and within it do a Run | Attach. At this point JFluid will
prompt you for some information (e.g. the process id of the JVM in which the
target app is running) and then connect to that JVM. From this point on you'll
use the client as you would in the first case: specify an instrumentation,
examine results, reset the instrumentation, etc.
JFluid walks on water, but some aspects of it are not entirely intuitive.
One is that when you instrument a method as the root of a call chain, the
instrumentation doesn't take effect for a current invocation of the method.
This means, for example, that if you attach to a running program,
instrumenting main() isn't going to do you any good. Another is that, for
threaded programs, you'll want to consider the virtues of Profile | Edit settings |
Instrumentation | Instrument threads started after root method.
JAMon (Java Application Monitor) is designed to be a low-impact performance monitor.
It reports on the number of calls to monitored code and provides various aggregate timing
statistics.
It's intended primarily for monitoring J2EE apps. In the web app case, JAMon includes
a JSP that displays monitoring results and through which JAMon can be controlled.
Use of JAMon
There's a jar that you need on your classpath (e.g. to be placed in .../WEB-INF/lib).
And then the code blocks you want monitored need to be marked up as follows:
You can fetch the output of an individual Monitor object by calling its toString()
method. The getReport() method of the MonitorFactory class will give you an HTMLized
account of all visible monitors, and it's essentially this same report that's visible
in JSP form through JAMonAdmin.jsp.
A crucial notion in JAMon is that monitors are named--i.e., you pass an
identifying String to MonitorFactory.start() when you create and start a monitor. It's
on this basis, in the first instance, that JAMon can create aggregate statistics--all
monitors with the same name are treated as tokens of the same type, for reporting
purposes. So for example if you're interested in the distribution of elapsed times
in some servlet, you put a start() call at the beginning of your doGet(), handing in
some String, say "MyServlet"; you put a stop() call at the end of doGet(); you deploy
your web app and let calls to the servlet accrue; and then you check JAMonAdmin.jsp,
and there'll be a row for "MyServlet" giving summary statistics about all the calls
to the servlet.
But the real power here, it seems to me, lies in the idea that you can hand any
String to MonitorFactory.start()--not just literals but dynamically generated ones.
That is, any string expression computable at runtime can be the basis of a monitor
type. There's a nice example of this in the JAMon documentation: maybe we'd like
summary data about web app behavior based on client identity, e.g. client IP address.
We don't know in advance what IP addresses clients are going to contact us from, so we
can't define the monitors in advance. But no matter--we can extract client data from
the HttpServletRequest object on the fly and use it to generate the monitor name.
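The named-monitor idea, including dynamically generated names, can be sketched with a small stand-in. This is not the JAMon API and not its implementation: ToyMonitors and everything in it are invented so that the example runs without the JAMon jar or a servlet container. In a real servlet the address would come from the request object (e.g. via getRemoteAddr()); here a hard-coded array of made-up addresses plays that role.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A stand-in for the idea of named monitors; this is NOT the JAMon API.
public class ToyMonitors {
    // Aggregate statistics for all monitors sharing one name.
    public static final class Stats {
        private int hits = 0;
        private long totalNanos = 0;

        public int hits() { return hits; }

        public double averageMillis() {
            return hits == 0 ? 0.0 : totalNanos / 1e6 / hits;
        }
    }

    // One running measurement; stop() folds it into the stats for its name.
    public final class Running {
        private final String name;
        private final long startedAt = System.nanoTime();

        private Running(String name) {
            this.name = name;
        }

        public void stop() {
            record(name, System.nanoTime() - startedAt);
        }
    }

    private final Map<String, Stats> statsByName = new LinkedHashMap<>();

    // Any string computable at runtime can serve as the monitor name.
    public Running start(String name) {
        return new Running(name);
    }

    private synchronized void record(String name, long nanos) {
        Stats s = statsByName.computeIfAbsent(name, k -> new Stats());
        s.hits++;
        s.totalNanos += nanos;
    }

    // Returns the aggregate for a name, or null if no monitor with that name has stopped.
    public synchronized Stats stats(String name) {
        return statsByName.get(name);
    }

    public static void main(String[] args) {
        ToyMonitors monitors = new ToyMonitors();
        String[] clientAddresses = {"10.0.0.1", "10.0.0.2", "10.0.0.1"}; // made-up addresses
        for (String ip : clientAddresses) {
            Running all = monitors.start("MyServlet");
            Running perClient = monitors.start("MyServlet.client." + ip);
            // ... the work being timed would go here ...
            perClient.stop();
            all.stop();
        }
        System.out.println("MyServlet hits: " + monitors.stats("MyServlet").hits());
        System.out.println("10.0.0.1 hits: " + monitors.stats("MyServlet.client.10.0.0.1").hits());
    }
}
```

One thing the sketch makes visible: with dynamically generated names the set of monitors grows with the set of distinct values, so a name built from something unbounded (like client address on a public site) is worth thinking about before you deploy it.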