Previous Contents Next

Profiling

This tool allows measuring a variety of metrics concerning program execution, including how many times a particular function or control structure (including conditionals, pattern matchers and loops) are executed. The results are recorded in a file. By examining this information, you may be able to locate either algorithmic errors or crucial locations for optimization.

In order for the profiler to do its work, it is necessary to compile the code using a special mode that adds profiling instructions. There are two profiling modes: one for the bytecode compiler, and the other for the native-code compiler. There are also two commands used to analyze the results. Analysis of native code will retrieve the time spent in each function.

Profiling an application therefore proceeds in three stages:
  1. compilation in profiling mode;
  2. program execution;
  3. presentation of measurements.

Compilation Commands

The commands to compile in profiling mode are the following: These compilers produce the same type of files as the usual commands (7). The different options are described in figure 10.1.

f function call
i branch of if
l while and for loops
m branches of match
t branches of try
a all options

Figure 10.1: Options of the profiling commands


These indicate which control structures must be taken into account. By default, the fm options are activated.

Program Execution

Bytecode Compiler

The execution of a program compiled in profiling mode will, if it terminates, produce a file named ocamlprof.dump which contains the information wanted.

We resume the example of the product of a list of integers. We write the following file f1.ml:

let rec interval a b =
if b < a then []
else a::(interval (a+1) b);;

exception Found_zero ;;

let mult_list l =
let rec mult_rec l = match l with
[] -> 1
| 0::_ -> raise Found_zero
| n::x -> n * (mult_rec x)
in
try mult_rec l with Found_zero -> 0
;;


and the file f2.ml which uses the functions of f1.ml:

let l1 = F1.interval 1 30;;
let l2 = F1.interval 31 60;;
let l3 = l1 @ (0::l2);;

print_int (F1.mult_list l1);;
print_newline();;

print_int (F1.mult_list l3);;
print_newline();;


The compilation of these files in profiling mode is shown in the following:
ocamlcp -i -p a -c f1.ml
val profile_f1_ : int array
val interval : int -> int -> int list
exception Found_zero
val mult_list : int list -> int
With the -p option, the compiler adds a new function (profile_f1_) for the initialization of the counters in module F1. It is the same for file f2.ml:
ocamlcp -i -p a -o f2.exe f1.cmo f2.ml
val profile_f2_ : int array
val l1 : int list
val l2 : int list
val l3 : int list

Native Compiler

The native code compilation gives the following result:
$ ocamlopt -i -p  -c f1.ml
val interval : int -> int -> int list
exception Found_zero
val mult_list : int list -> int
$ ocamlopt -i -p -o f2nat.exe f1.cmx f2.ml
Only the -p option without argument is used. The execution of f2nat.exe produces a file named gmon.out which is in a format that can be handled by the usual Unix commands (see page ??).

Presentation of the Results

Since the information gathered by the two profiling modes differs, their presentation follows suit. In the first (bytecode) mode comments on the number of passages through the control structures are added to the program text. In the second (native) mode, the time spent in its body and the number of calls is associated with each function.

Bytecode Compiler

The ocamlprof command gives the analysis of the measurement results. It uses the information contained in the file camlprof.dump. This command takes the source of the program on entry, then reads the measurements file and produces a new program text with the desired counts added as comments.

For our example this gives:
ocamlprof f1.ml

let rec interval a b = 
  (* 62 *) if b < a then (* 2 *) []
  else (* 60 *) a::(interval (a+1) b);;

exception Found_zero ;; 

let mult_list l = 
 (* 2 *) let rec mult_rec l = (* 62 *) match l with 
    [] -> (* 1 *) 1
  | 0::_ -> (* 1 *) raise Found_zero
  | n::x -> (* 60 *) n * (mult_rec x)
 in
  try mult_rec l with Found_zero -> (* 1 *) 0
;; 
These counters reflect the calculations done in F2 quite well. There are two calls of mult_list and 62 of the auxiliary function mult_rec. Examination of the different branches of the pattern matching show 60 passages through the common case, one through the pattern [] and the only match where the head is 0, raising an exception, which can be seen in the counter of the try statement.

The ocamlprof command accepts two options. The first -f file indicates the name of the file to contain the measurements. The second -F string specifies a string to add to the comments associated with the control structures treated.

Native Compilation

To get the time spent in the calls of the functions for multiplying the elements of a list, we write the following file f3.ml:

let l1 = F1.interval 1 30;;
let l2 = F1.interval 31 60;;
let l3 = l1 @ (0::l2);;

for i=0 to 100000 do
F1.mult_list l1;
F1.mult_list l3
done;;

print_int (F1.mult_list l1);;
print_newline();;

print_int (F1.mult_list l3);;
print_newline();;
This is the same file as f2.ml with a loop of 100000 iterations.

Execution of the program creates the file gmon.out. This is in a format readable by gprof, a command that can be found on Unix systems. The following call to gprof prints information about the time spent and the call graph. Since the output is rather long, we show only the first page which contains the name of the functions that are called at least once and the time spent in each.
$ gprof  f3nat.exe 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 92.31      0.36     0.36   200004     1.80     1.80  F1_mult_rec_45
  7.69      0.39     0.03   200004     0.15     1.95  F1_mult_list_43
  0.00      0.39     0.00     2690     0.00     0.00  oldify
  0.00      0.39     0.00      302     0.00     0.00  darken
  0.00      0.39     0.00      188     0.00     0.00  gc_message
  0.00      0.39     0.00      174     0.00     0.00  aligned_malloc
  0.00      0.39     0.00      173     0.00     0.00  alloc_shr
  0.00      0.39     0.00      173     0.00     0.00  fl_allocate
  0.00      0.39     0.00       34     0.00     0.00  caml_alloc3
  0.00      0.39     0.00       30     0.00     0.00  caml_call_gc
  0.00      0.39     0.00       30     0.00     0.00  garbage_collection
...
The main lesson is that almost all of the execution time is spent in the function F1_mult_rec_45, which corresponds to the function F1.mult_rec in file f1.ml. On the other hand we recognize a lot of other functions that are called. The first on the list are memory management functions in the runtime library (9).






Previous Contents Next