Profiling
This tool allows measuring a variety of metrics concerning program
execution, including how many times a particular function or control
structure (including conditionals, pattern matchers and loops) are
executed. The results are recorded in a file. By examining this
information, you may be able to locate either algorithmic errors or
crucial locations for optimization.
In order for the profiler to do its work, it is necessary to compile
the code using a special mode that adds profiling instructions.
There are two profiling modes: one for the bytecode compiler,
and the other for the native-code compiler. There are also two
commands used to analyze the results. Analysis of native code will
retrieve the time spent in each function.
Profiling an application therefore proceeds in three stages:
-
compilation in profiling mode;
- program execution;
- presentation of measurements.
Compilation Commands
The commands to compile in profiling mode are the following:
-
ocamlcp -p options
for the bytecode compiler;
- ocamlopt -p options for the native-code compiler.
These compilers produce the same type of files as the usual
commands (7). The different options are
described in figure 10.1.
f |
function call |
i |
branch of if |
l |
while and for loops |
m |
branches of match |
t |
branches of try |
a |
all options |
Figure 10.1: Options of the profiling commands
These indicate which control structures must be taken into account. By
default, the fm options are activated.
Program Execution
Bytecode Compiler
The execution of a program compiled in profiling mode will, if it terminates, produce a
file named ocamlprof.dump which contains the information wanted.
We resume the example of the product of a list of integers. We write
the following file f1.ml:
let
rec
interval
a
b
=
if
b
<
a
then
[]
else
a::(interval
(a+
1
)
b);;
exception
Found_zero
;;
let
mult_list
l
=
let
rec
mult_rec
l
=
match
l
with
[]
->
1
|
0
::_
->
raise
Found_zero
|
n::x
->
n
*
(mult_rec
x)
in
try
mult_rec
l
with
Found_zero
->
0
;;
and the file f2.ml which uses the functions of f1.ml:
let
l1
=
F1.interval
1
3
0
;;
let
l2
=
F1.interval
3
1
6
0
;;
let
l3
=
l1
@
(0
::l2);;
print_int
(F1.mult_list
l1);;
print_newline();;
print_int
(F1.mult_list
l3);;
print_newline();;
The compilation of these files in profiling mode is shown in the following:
ocamlcp -i -p a -c f1.ml
val profile_f1_ : int array
val interval : int -> int -> int list
exception Found_zero
val mult_list : int list -> int
With the -p
option, the compiler adds a new function
(profile_f1_) for the initialization of the counters in module
F1. It is the same for file f2.ml:
ocamlcp -i -p a -o f2.exe f1.cmo f2.ml
val profile_f2_ : int array
val l1 : int list
val l2 : int list
val l3 : int list
Native Compiler
The native code compilation gives the following result:
$ ocamlopt -i -p -c f1.ml
val interval : int -> int -> int list
exception Found_zero
val mult_list : int list -> int
$ ocamlopt -i -p -o f2nat.exe f1.cmx f2.ml
Only the -p option without argument is used. The execution of
f2nat.exe produces a file named gmon.out which is in a
format that can be handled by the usual Unix commands (see page ??).
Presentation of the Results
Since the information gathered by the two profiling modes
differs, their presentation follows suit. In the first (bytecode) mode
comments on the number of passages through the control structures are
added to the program text. In the second (native) mode,
the time spent in its body and the number of calls is associated with
each function.
Bytecode Compiler
The ocamlprof command gives the analysis of the measurement
results. It uses the information contained in the file camlprof.dump. This command takes the source of the program on
entry, then reads the measurements file and produces a new program
text with the desired counts added as comments.
For our example this gives:
ocamlprof f1.ml
let rec interval a b =
(* 62 *) if b < a then (* 2 *) []
else (* 60 *) a::(interval (a+1) b);;
exception Found_zero ;;
let mult_list l =
(* 2 *) let rec mult_rec l = (* 62 *) match l with
[] -> (* 1 *) 1
| 0::_ -> (* 1 *) raise Found_zero
| n::x -> (* 60 *) n * (mult_rec x)
in
try mult_rec l with Found_zero -> (* 1 *) 0
;;
These counters reflect the calculations done in F2 quite
well. There are two calls of mult_list and 62 of the
auxiliary function mult_rec. Examination of the different
branches of the pattern matching show 60 passages through the common case,
one through the pattern [] and the only match where the head is 0,
raising an exception, which can be seen in the counter of the
try statement.
The ocamlprof command accepts two options. The first -f
file indicates the name of the file to contain the
measurements. The second -F string specifies a string
to add to the comments associated with the control structures treated.
Native Compilation
To get the time spent in the calls of the functions for multiplying
the elements of a list, we write the following file f3.ml:
let
l1
=
F1.interval
1
3
0
;;
let
l2
=
F1.interval
3
1
6
0
;;
let
l3
=
l1
@
(0
::l2);;
for
i=
0
to
1
0
0
0
0
0
do
F1.mult_list
l1;
F1.mult_list
l3
done;;
print_int
(F1.mult_list
l1);;
print_newline();;
print_int
(F1.mult_list
l3);;
print_newline();;
This is the same file as f2.ml with a loop of 100000 iterations.
Execution of the program creates the file gmon.out. This is in a
format readable by gprof, a command that can be found on
Unix systems. The following call to gprof prints
information about the time spent and the call graph. Since the output
is rather long, we show only the first page which contains the name of
the functions that are called at least once and the time spent in each.
$ gprof f3nat.exe
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
92.31 0.36 0.36 200004 1.80 1.80 F1_mult_rec_45
7.69 0.39 0.03 200004 0.15 1.95 F1_mult_list_43
0.00 0.39 0.00 2690 0.00 0.00 oldify
0.00 0.39 0.00 302 0.00 0.00 darken
0.00 0.39 0.00 188 0.00 0.00 gc_message
0.00 0.39 0.00 174 0.00 0.00 aligned_malloc
0.00 0.39 0.00 173 0.00 0.00 alloc_shr
0.00 0.39 0.00 173 0.00 0.00 fl_allocate
0.00 0.39 0.00 34 0.00 0.00 caml_alloc3
0.00 0.39 0.00 30 0.00 0.00 caml_call_gc
0.00 0.39 0.00 30 0.00 0.00 garbage_collection
...
The main lesson is that almost all of the execution time is spent in
the function F1_mult_rec_45, which corresponds to the
function F1.mult_rec in file f1.ml. On the other hand
we recognize a lot of other functions that are called. The first on
the list are memory management functions in the runtime library
(9).