17.2 Debugging

Since Python's development cycle is so fast, the most effective way to debug is often to edit your code to make it output relevant information at key points. Python has many ways to let your code explore its own state in order to extract information that may be relevant for debugging. The inspect and traceback modules specifically support such exploration, which is also known as reflection or introspection.

Once you have obtained debugging-relevant information, statement print is often the simplest way to display it. You can also log debugging information to files. Logging is particularly useful for programs that run unattended for a long time, as is typically the case for server programs. Displaying debugging information is like displaying other kinds of information, as covered in Chapter 10 and Chapter 16, and similarly for logging it, as covered in Chapter 10 and Chapter 11. Python 2.3 will also include a module specifically dedicated to logging. As covered in Chapter 8, rebinding attribute excepthook of module sys lets your program log detailed error information just before your program is terminated by a propagating exception.
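
As a minimal sketch of the sys.excepthook technique just mentioned, the following rebinds the hook so that the formatted traceback goes to a log. The names make_logging_hook and error_log are hypothetical, and an in-memory io.StringIO stands in for a real log file. The sketch assumes Python 3, where io.StringIO accepts ordinary strings, and calls the hook by hand to simulate what Python does for an uncaught exception.

```python
import io
import sys
import traceback

def make_logging_hook(log):
    """Return an excepthook that writes the full traceback to file-like log."""
    def hook(exc_type, exc_value, exc_tb):
        log.write(''.join(traceback.format_exception(exc_type, exc_value, exc_tb)))
    return hook

error_log = io.StringIO()              # hypothetical stand-in for a log file
sys.excepthook = make_logging_hook(error_log)
try:
    try:
        1 / 0
    except ZeroDivisionError:
        # Simulate an uncaught exception by calling the hook directly.
        sys.excepthook(*sys.exc_info())
finally:
    sys.excepthook = sys.__excepthook__    # restore the default hook
```

In a real program you would leave the hook installed and let Python call it; the restore step is here only so that the sketch has no lasting side effects.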

Python also offers hooks enabling interactive debugging. Module pdb supplies a simple text-mode interactive debugger. Other interactive debuggers for Python are part of integrated development environments (IDEs), such as IDLE and various commercial offerings. However, I do not cover IDEs in this book.

17.2.1 The inspect Module

The inspect module supplies functions to extract information from all kinds of objects, including the Python call stack (which records all function calls currently executing) and source files. At the time of this writing, module inspect is not yet available for Jython. The most frequently used functions of module inspect are as follows.

getargspec, formatargspec

getargspec(f)

f is a function object. getargspec returns a tuple with four items (arg_names, extra_args, extra_kwds, arg_defaults). arg_names is the sequence of names of f's formal arguments. extra_args is the name of the special formal argument of the form *args, or None if f has no such special argument. extra_kwds is the name of the special formal argument of the form **kwds, or None if f has no such special argument. arg_defaults is the tuple of default values for f's arguments, or None if no argument of f has a default value. You can deduce other details about f's signature from getargspec's results. For example, when arg_defaults is not None, f has len(arg_names)-len(arg_defaults) mandatory arguments, and the names of f's optional arguments are the strings that are the items of the list slice arg_names[-len(arg_defaults):].

formatargspec accepts one to four arguments that are the same as the items of the tuple that getargspec returns, and returns a formatted string that displays this information. Thus, formatargspec(*getargspec(f)) returns a formatted string with f's formal arguments (i.e., f's signature) in parentheses, as used in the def statement that created f.
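
The arithmetic on getargspec's results can be sketched without calling getargspec at all, by working on values shaped like the ones it returns. The helper name split_arguments and the sample signature def f(a, b, c=1, d=2) are hypothetical. The sketch treats an arg_defaults of None (no argument has a default) as an empty tuple, a case the slice expression alone does not cover.

```python
def split_arguments(arg_names, arg_defaults):
    """Split formal argument names into (mandatory, optional) lists."""
    defaults = arg_defaults or ()              # None means no defaults at all
    n_mandatory = len(arg_names) - len(defaults)
    return list(arg_names[:n_mandatory]), list(arg_names[n_mandatory:])

# Values as getargspec would report them for: def f(a, b, c=1, d=2)
mandatory, optional = split_arguments(['a', 'b', 'c', 'd'], (1, 2))
# mandatory == ['a', 'b'] and optional == ['c', 'd']
```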

getargvalues, formatargvalues

getargvalues(f)

f is a frame object, for example the result of a call to the function _getframe in module sys (covered in Chapter 8) or to function currentframe in module inspect. getargvalues returns a tuple with four items (arg_names, extra_args, extra_kwds, locals). arg_names is the sequence of names of f's function's formal arguments. extra_args is the name of the special formal argument of form *args, or None if f's function has no such special argument. extra_kwds is the name of the special formal argument of form **kwds, or None if f's function has no such special argument. locals is the dictionary of local variables for f. Since arguments, in particular, are local variables, the value of each actual argument can be obtained from locals by indexing the locals dictionary with the argument's name.

formatargvalues accepts one to four arguments that are the same as the items of the tuple that getargvalues returns, and returns a formatted string that displays this information. formatargvalues(*getargvalues(f)) returns a formatted string with f's actual arguments in parentheses, in named (keyword) form, as used in the call statement that created f. For example:

import inspect
def f(x=23): return inspect.currentframe(  )
print inspect.formatargvalues(*inspect.getargvalues(f(  )))
# prints: (x=23)

currentframe

currentframe(  )

Returns the frame object for the current function (the caller of currentframe). formatargvalues(*getargvalues(currentframe( ))), for example, returns a formatted string with the actual arguments of the calling function.
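
Since getargvalues also returns the frame's local variables, a function can collect its own actual arguments by name. The following is a minimal sketch. The function name snapshot_args is hypothetical, and the code assumes an interpreter in which currentframe returns a real frame object, as CPython does.

```python
import inspect

def snapshot_args(x, y=10):
    """Return a dict mapping each formal argument name to its actual value."""
    frame = inspect.currentframe()
    arg_names, extra_args, extra_kwds, local_vars = inspect.getargvalues(frame)
    # Arguments are local variables, so index the locals by argument name.
    return dict((name, local_vars[name]) for name in arg_names)

values = snapshot_args(1)    # y takes its default value
# values == {'x': 1, 'y': 10}
```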

getdoc

getdoc(obj)

Returns the docstring for obj, with tabs expanded to spaces and redundant whitespace stripped from each line.
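
A short sketch of the cleanup that getdoc performs follows. The function area and its docstring are hypothetical, chosen only because the docstring's continuation lines are indented in the source.

```python
import inspect

def area(width, height):
    """Return the area of a rectangle.

    Both arguments are plain numbers. These continuation lines
    are indented in the source only because the docstring is.
    """
    return width * height

clean = inspect.getdoc(area)
first_line = clean.splitlines()[0]
# No line of the cleaned docstring starts with the source indentation.
indented = [line for line in clean.splitlines() if line.startswith(' ')]
```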

getfile, getsourcefile

getfile(obj)

Returns the name of the file that defined obj, and raises TypeError when unable to determine the file. For example, getfile raises TypeError if obj is built-in. getfile returns the name of a binary or source file. getsourcefile returns the name of a source file, and raises TypeError when it can determine only a binary file, not the corresponding source file.
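
Because getfile raises TypeError for built-in objects, diagnostic code usually wraps the call. The following sketch shows one way to do so. The helper name defining_file is hypothetical, and textwrap.dedent serves only as an example of a Python-coded function from the standard library.

```python
import inspect
import textwrap

def defining_file(obj):
    """Return the file that defined obj, or None when it cannot be determined."""
    try:
        return inspect.getfile(obj)
    except TypeError:
        return None

builtin_file = defining_file(len)               # len is built-in: no file
python_file = defining_file(textwrap.dedent)    # a Python-coded function
```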

getmembers

getmembers(obj, filter=None)

Returns all attributes (members) of obj as a list of (name, value) pairs, sorted by name. When filter is not None, returns only attributes for which callable filter returns a true result when called on the attribute's value, like:

[ (n, v) for n, v in getmembers(obj) if filter(v) ]

getmodule

getmodule(obj)

Returns the module object that defined obj, or None if unable to determine it.

getmro

getmro(c)

Returns a tuple of bases and ancestors of class c in method resolution order. c is the first item in the tuple. Each class appears only once in the tuple.
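
As a small illustration, consider a diamond-shaped hierarchy. The class names are hypothetical, and every class derives explicitly from object so that the result is the same on any Python version that has new-style classes.

```python
import inspect

class Base(object): pass
class Left(Base): pass
class Right(Base): pass
class Bottom(Left, Right): pass

order = [c.__name__ for c in inspect.getmro(Bottom)]
# order == ['Bottom', 'Left', 'Right', 'Base', 'object']
```

Base is reachable through both Left and Right, yet it appears only once, after both of them.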

getsource, getsourcelines

getsource(obj)

Returns a single multiline string that is the source code for obj, and raises IOError if unable to determine or fetch it. getsourcelines returns a pair: the first item is the source code for obj (a list of lines), and the second item is the line number of the list's first line in the source file it comes from.
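
The relation between the two functions can be sketched as follows, using textwrap.dedent from the standard library as the object to examine. The sketch assumes the standard library's source files are installed, since otherwise both calls raise an exception.

```python
import inspect
import textwrap

source_text = inspect.getsource(textwrap.dedent)
lines, first_line_number = inspect.getsourcelines(textwrap.dedent)

# Joining the list of lines gives back the single multiline string.
same_text = (''.join(lines) == source_text)
```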

isbuiltin, isclass, iscode, isframe, isfunction, ismethod, ismodule, isroutine

isbuiltin(obj)

Each of these functions accepts a single argument obj and returns True if obj belongs to the type indicated in the function name. Accepted objects are, respectively: built-in (C-coded) functions, class objects, code objects, frame objects, Python-coded functions (including lambda expressions), methods, modules, and, for isroutine, all methods or functions, either C-coded or Python-coded. These functions are often used as the filter argument to getmembers.
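
A minimal sketch of using one of these predicates as the filter argument to getmembers follows. The class Sample is hypothetical. Names starting with an underscore are skipped, since every class inherits a number of special methods that would otherwise clutter the result.

```python
import inspect

class Sample(object):
    limit = 3                      # a plain data attribute, not a routine
    def run(self): pass
    def stop(self): pass

public_routines = [name
                   for name, value in inspect.getmembers(Sample, inspect.isroutine)
                   if not name.startswith('_')]
# public_routines == ['run', 'stop']

checks = (inspect.isclass(Sample), inspect.isbuiltin(len), inspect.isfunction(len))
# checks == (True, True, False): len is C-coded, so isfunction rejects it
```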

stack

stack(context=1)

Returns a list of six-item tuples. The first tuple is about stack's caller, the second tuple is about the caller's caller, and so on. Each tuple's items, in order, are: frame object, filename, line number, function name, list of context source code lines around the current line, and index of current line within the list.
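
One common diagnostic use of stack is finding out which function called the current one. The following is a minimal sketch with hypothetical function names. It relies only on the item order given above, in which index 3 of each tuple is the function name.

```python
import inspect

def calling_function_name():
    """Return the name of the function that called this one."""
    # Item 0 of the list describes this function; item 1 describes its caller.
    caller_record = inspect.stack()[1]
    return caller_record[3]

def process_order():
    return calling_function_name()

name = process_order()
# name == 'process_order'
```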

For example, suppose that at some point in your program you execute a statement such as:

x.f(  )

and unexpectedly receive an AttributeError informing you that object x has no attribute named f. This means that object x is not as you expected, so you want to determine more about x as a preliminary to ascertaining why x is that way and what you should do about it. Change the statement to:

try: x.f(  )
except AttributeError:
    import sys, inspect
    sys.stderr.write('x is type %s(%r)\n'%(type(x),x))
    sys.stderr.write("x's methods are: ")
    for n, v in inspect.getmembers(x, callable):
        sys.stderr.write('%s '%n)
    sys.stderr.write('\n')
    raise

This example uses sys.stderr (covered in Chapter 8), since it's displaying diagnostic information related to an error, not program results. Function getmembers of module inspect obtains the names of all methods available on x in order to display them. Of course, if you need this kind of diagnostic functionality often, you should package it up into a separate function, such as:

import sys, inspect
def show_obj_methods(obj, name, show=sys.stderr.write):
    show('%s is type %s(%r)\n'%(name,type(obj),obj))
    show("%s's methods are: "%name)
    for n, v in inspect.getmembers(obj, callable):
        show('%s '%n)
    show('\n')

And then the example becomes just:

try: x.f(  )
except AttributeError:
    show_obj_methods(x, 'x')
    raise

Good program structure and organization are just as necessary in code intended for diagnostic and debugging purposes as they are in code that implements your program's functionality. See also Section 6.6.4 in Chapter 6 for a good technique to use when defining diagnostic and debugging functions.

17.2.2 The traceback Module

The traceback module lets you extract, format, and output information about tracebacks as normally produced by uncaught exceptions. By default, module traceback reproduces the formatting Python uses for tracebacks. However, module traceback also lets you exert fine-grained control. The module supplies many functions, but in typical use you will use only one of them.

print_exc

print_exc(limit=None, file=sys.stderr)

Call print_exc from an exception handler, or from a function directly or indirectly called by an exception handler. print_exc outputs to file-like object file the traceback information that Python outputs to stderr for uncaught exceptions. When limit is not None, print_exc outputs only limit traceback nesting levels. For example, when, in an exception handler, you want to output a diagnostic message just as if the exception had propagated, but actually stop the exception from propagating any further (so that your program keeps running, and no further handlers are involved), call traceback.print_exc( ).
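
The following sketch shows this pattern: the handler records the traceback and the program keeps running. The names safe_divide and log are hypothetical, and an in-memory io.StringIO is passed as the file argument where a real program might pass an open log file or accept the default. The sketch assumes Python 3, where io.StringIO accepts ordinary strings.

```python
import io
import traceback

def safe_divide(a, b, log):
    """Return a / b, or None after logging the traceback if b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        # Same text Python would print for an uncaught exception.
        traceback.print_exc(file=log)
        return None

log = io.StringIO()
result = safe_divide(1, 0, log)     # the exception does not propagate
report = log.getvalue()
```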

17.2.3 The pdb Module

The pdb module exploits the Python interpreter's debugging and tracing hooks to implement a simple, command-line-oriented interactive debugger. pdb lets you set breakpoints, single-step through source code, examine stack frames, and so on.

To run some code under pdb's control, you import pdb and then call pdb.run, passing as the single argument a string of code to execute. To use pdb for post-mortem debugging (meaning debugging of code that terminated by propagating an exception at an interactive prompt), call pdb.pm( ) without arguments. When pdb starts, it first reads text files named .pdbrc in your home directory and in the current directory. Such files can contain any pdb commands, but most often they use the alias command in order to define useful synonyms and abbreviations for other commands.
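
pdb.pm( ) is meant for use at an interactive prompt. For a script, a similar effect can be sketched with pdb.post_mortem, a function of module pdb that is not otherwise described in this section and that accepts a traceback object. The wrapper name call_with_post_mortem is hypothetical. The debugger starts only if the wrapped call raises, so the call shown here completes without any prompt.

```python
import pdb
import sys

def call_with_post_mortem(func, *args):
    """Call func(*args); on an exception, open pdb where it was raised."""
    try:
        return func(*args)
    except Exception:
        # Enter the debugger on the traceback of the exception being handled.
        pdb.post_mortem(sys.exc_info()[2])
        raise

value = call_with_post_mortem(int, '42')    # no exception, so no debugger
```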

When pdb is in control, it prompts you with the string '(Pdb) ', and you can enter pdb commands. Command help (which you can also enter in the abbreviated form h) lists all available commands. Call help with an argument (separated by a space) to get help about any specific command. You can abbreviate most commands to the first one or two letters, but you must always enter commands in lowercase: pdb, like Python itself, is case-sensitive. Entering an empty line repeats the previous command. The most frequently used pdb commands are the following.

! statement

Executes Python statement statement in the currently debugged context.

alias, unalias

alias [ name [ command ] ]

alias without arguments lists currently defined aliases. alias name outputs the current definition of the alias name. In the full form, command is any pdb command, with arguments, and may contain %1, %2, and so on to refer to arguments passed to the new alias name being defined, or %* to refer to all such arguments together. Command unalias name removes an alias.

args, a

args

Lists all actual arguments passed to the function you are currently debugging.

break, b

break [ location [ ,condition ] ]

break without arguments lists currently defined breakpoints and the number of times each breakpoint has triggered. With an argument, break sets a breakpoint at the given location. location can be a line number or a function name, optionally preceded by filename: to set a breakpoint in a file that is not the current one or at the start of a function whose name is ambiguous (i.e., a function that exists in more than one file). When condition is present, condition is an expression to evaluate (in the debugged context) each time the given line or function is about to execute; execution breaks only when the expression returns a true value. When setting a new breakpoint, break returns a breakpoint number, which you can then use to refer to the new breakpoint in any other breakpoint-related pdb command.

clear, cl

clear [ breakpoint-numbers ]

Clears (removes) one or more breakpoints. clear without arguments removes all breakpoints after asking for confirmation. To deactivate a breakpoint without removing it, see disable.

condition

condition breakpoint-number [ expression ]

condition n expression sets or changes the condition on breakpoint n. condition n, without expression, makes breakpoint n unconditional.

continue, c, cont

continue

Continues execution of the code being debugged, up to a breakpoint if any.

disable

disable [ breakpoint-numbers ]

Disables one or more breakpoints. disable without arguments disables all breakpoints (after asking for confirmation). This differs from clear in that the debugger remembers the breakpoint, and you can reactivate it via enable.

down, d

down

Moves one frame down in the stack (i.e., toward the most recent function call). Normally, the current position in the stack is at the bottom (i.e., at the function that was called most recently and is now being debugged). Therefore, command down can't go further down. However, command down is useful if you have previously executed command up, which moves the current position upward.

enable

enable [ breakpoint-numbers ]

Enables one or more breakpoints. enable without arguments enables all breakpoints after asking for confirmation.

ignore

ignore breakpoint-number [ count ]

Sets the breakpoint's ignore count (to 0, if count is omitted). Triggering a breakpoint whose ignore count is greater than 0 just decrements the count. Execution stops, presenting you with an interactive pdb prompt, only when you trigger a breakpoint whose ignore count is 0. For example, say that module fob.py contains the following code:

def f(  ):
    for i in range(1000):
        g(i)

def g(i):
    pass

Now, consider the following interactive pdb session:

>>> import pdb
>>> import fob
>>> pdb.run('fob.f(  )')
> <string>(0)?(  )
(Pdb) break fob.g
Breakpoint 1 at C:\mydir\fob.py:6
(Pdb) ignore 1 500
Will ignore next 500 crossings of breakpoint 1.
(Pdb) continue
> <string>(1)?(  )
(Pdb) continue
> C:\mydir\fob.py(6)g(  )
-> pass
(Pdb) print i
500

The ignore command, as pdb shows in response to it, asks pdb to ignore the next 500 hits on breakpoint 1, which we just set at fob.g in the previous break statement. Therefore, when execution finally stops, function g has already been called 500 times, as we show by printing its argument i, which indeed is now 500. Note that the ignore count of breakpoint 1 is now 0; if we give another continue and print i, i will then show as 501. In other words, once the ignore count is decremented back to 0, execution stops every time the breakpoint is hit. If we want to skip some more hits, we need to give pdb another ignore command, in order to set the ignore count of breakpoint 1 at some value greater than 0 yet again.
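
The counting in this session can be checked with a small model written in plain Python. This is not pdb's implementation: the class ModelBreakpoint is a hypothetical stand-in that models only the ignore count, following the rule that a hit decrements a positive count and otherwise stops execution.

```python
class ModelBreakpoint(object):
    """Hypothetical model of a breakpoint's ignore count (not pdb's own code)."""
    def __init__(self):
        self.ignore = 0

    def hit(self):
        """Return True when execution would stop at this hit."""
        if self.ignore > 0:
            self.ignore -= 1
            return False
        return True

bp = ModelBreakpoint()
bp.ignore = 500                  # models: (Pdb) ignore 1 500
stops = [i for i in range(1000) if bp.hit()]
# stops[0] == 500 and stops[1] == 501, matching the session above
```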

list, l

list [ first [ , last ] ]

list without arguments lists 11 lines centered on the current one, or the next 11 lines if the previous command was also a list. By giving arguments to the list command, you may explicitly specify the first and last lines to list within the current file. The list command deals with physical lines, including comments and empty lines, not with logical lines.

next, n

next

Executes the current line, without stepping into any function called from the current line. However, hitting breakpoints in functions called directly or indirectly from the current line does stop execution.

p expression

Evaluates expression in the current context and displays the result.

quit, q

quit

Immediately terminates both pdb and the program being debugged.

return, r

return

Executes the rest of the current function, stopping only at breakpoints if any.

step, s

step

Executes the current line, stepping into any function called from the current line.

tbreak

tbreak [ location [ ,condition ] ]

Like break, but the breakpoint is temporary (i.e., pdb automatically removes the breakpoint as soon as the breakpoint is triggered).

up, u

up

Moves one frame up in the stack (i.e., away from the most recent function call and toward the calling function).

where, w

where

Shows the stack of frames and indicates the current one (i.e., in what frame's context command ! executes statements, command args shows arguments, command p evaluates expressions, etc.).

17.2.4 Debugging in IDLE

IDLE, the Interactive DeveLopment Environment that comes with Python, offers debugging functionality similar to that of pdb, although not quite as powerful. Thanks to IDLE's GUI, however, you may find the functionality easier to access. For example, instead of having to ask for source lists and stack lists explicitly with such pdb commands as list and where, you just activate one or more of four checkboxes in the Debug Control window to see source, stack, locals, and globals always displayed in the same window at each step.

To start IDLE's interactive debugger, use menu Debug → Debugger in IDLE's *Python Shell* window. IDLE opens the Debug Control window, outputs [DEBUG ON] in the shell window, and gives you another >>> prompt in the shell window. Keep using the shell window as you normally would; any command you give at the shell window's prompt now runs under the debugger. To deactivate the debugger, use Debug → Debugger again; IDLE then toggles the debug state, closes the Debug Control window, and outputs [DEBUG OFF] in the shell window. To control the debugger when the debugger is active, use the GUI controls in the Debug Control window. You can toggle the debugger away only when it is not busy actively tracking code: otherwise, IDLE disables the Quit button in the Debug Control window.