13.4 Garbage Collection
Python's garbage
collection normally proceeds transparently and automatically, but you
can choose to exert some direct control. The general principle is
that Python collects each object x at some
time after x becomes unreachable, that is,
when no chain of references can reach x by
starting from a local variable of a function that is executing, nor
from a global variable of a loaded module. Normally, an object
x becomes unreachable when there are no
references at all to x. However, a group
of objects can also be unreachable when they reference each other.
CPython (Classic Python) keeps in each object x a count,
known as a reference count, of how many references to x are
outstanding. When x's reference count drops to 0, CPython
immediately collects x. Function getrefcount of module sys
accepts any object and returns its reference count (at least
1, since getrefcount itself has
a reference to the object it's examining). Other
versions of Python, such as Jython, rely on different garbage
collection mechanisms, supplied by the platform they run on (e.g.,
the JVM). Modules gc and
weakref therefore apply only to
CPython.
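A minimal sketch of reference counting under CPython, assuming a hypothetical class Thing. Absolute counts differ between CPython versions, so the example only compares counts taken before and after adding a reference:

```python
import sys

class Thing:
    """Hypothetical class used only for illustration."""
    pass

x = Thing()
before = sys.getrefcount(x)    # includes getrefcount's own temporary reference

alias = x                      # one more reference to the same object
after = sys.getrefcount(x)

del alias                      # drop the extra reference again
restored = sys.getrefcount(x)

print(after - before)          # difference caused by the extra reference
print(restored == before)
```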
When Python garbage-collects x and there
are no references at all to x, Python then
finalizes x (i.e., calls x.__del__()) and
makes the memory that x occupied available
for other uses. If x held any references
to other objects, Python removes the references, which in turn may
make other objects collectable by leaving them unreachable.
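The following sketch illustrates finalization under CPython, where collection by reference count is immediate. The class Resource and the list log are hypothetical names introduced for the example. Dropping the only reference to the outer object also releases the inner object it held:

```python
log = []

class Resource:
    """Hypothetical class that records its own finalization."""
    def __init__(self, name, helper=None):
        self.name = name
        self.helper = helper   # optional reference to another Resource

    def __del__(self):
        log.append(self.name)

outer = Resource("outer", Resource("inner"))
del outer                      # last reference gone: CPython finalizes outer,
                               # then inner becomes unreachable too

print(sorted(log))
```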
13.4.1 The gc Module
The gc module exposes
the functionality of Python's garbage collector.
gc deals only with objects that are unreachable in
a subtle way, being part of mutual reference loops. In such a loop,
each object in the loop refers to others, keeping the reference
counts of all objects positive. However, an outside reference no
longer exists to the whole set of mutually referencing objects.
Therefore, the whole group, also known as cyclic garbage, is
unreachable, and therefore garbage collectable. Looking for such
cyclic garbage loops takes time, which is why module
gc exists.
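As a rough illustration of cyclic garbage, the sketch below builds a two-object reference loop from a hypothetical class Node, drops every outside reference, and then asks the collector to find the loop. gc.collect returns the number of unreachable objects it found; the exact number depends on the CPython version, so only a lower bound is checked:

```python
import gc

class Node:
    """Hypothetical class used to build a reference loop."""
    pass

was_enabled = gc.isenabled()
gc.disable()                   # keep automatic runs from interfering
gc.collect()                   # start from a clean state

a, b = Node(), Node()
a.partner = b                  # a refers to b ...
b.partner = a                  # ... and b refers back to a
del a, b                       # no outside references remain, yet the
                               # reference counts are still positive

found = gc.collect()           # the cyclic collector reclaims the loop
if was_enabled:
    gc.enable()

print(found >= 2)
```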
gc
exposes functions you can use to
help you keep garbage collection times under control. These functions
can sometimes help you track down a memory leak—objects that
are not getting collected even though there should be no more
references to them—by letting you discover what other objects
are in fact holding on to references to them.
collect()
Forces a full cyclic collection run to happen immediately.
disable()
Suspends automatic garbage collection.
enable()
Re-enables automatic garbage collection previously suspended with
disable.
garbage
A read-only attribute that lists the uncollectable but unreachable
objects. Objects end up uncollectable if any object in a cyclic
garbage loop has a __del__ special method, as there may be no safe
order in which Python can finalize such objects.
get_debug()
Returns an integer, a bit string corresponding to the garbage
collection debug flags set with set_debug.
get_objects()
New as of Python 2.2.
Returns a list whose items are all the objects currently tracked by
the cyclic garbage collector.
get_referrers(*objs)
Returns a list whose items are all the container objects, currently
tracked by the cyclic garbage collector, that refer to any one or
more of the arguments.
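A small sketch of how get_referrers can help track down what is holding on to an object. The class Payload and the dictionary registry are hypothetical stand-ins for a leaking object and the forgotten container that keeps it alive:

```python
import gc

class Payload:
    """Hypothetical object that we suspect is being kept alive."""
    pass

target = Payload()
registry = {"forgotten": target}    # hypothetical container holding a reference

referrers = gc.get_referrers(target)

# The list may contain other containers too (for example, a namespace
# dictionary), so look for the one we care about by identity.
held_by_registry = any(r is registry for r in referrers)
print(held_by_registry)
```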
get_threshold()
Returns a three-item tuple (thresh0, thresh1, thresh2) corresponding
to the garbage collection thresholds set with set_threshold.
isenabled()
Returns True if cyclic garbage collection is currently enabled, and
False if it is currently disabled.
set_debug(flags)
Sets the debugging flags for garbage collection.
flags is an integer, a bit string composed
by ORing (with Python's normal bitwise-OR operator
|) zero or more of the following constants exposed
by module gc:
- DEBUG_COLLECTABLE: Prints information on collectable objects found
  during collection
- DEBUG_INSTANCES: Meaningful only if DEBUG_COLLECTABLE and/or
  DEBUG_UNCOLLECTABLE are also set: prints information on objects found
  during collection that are instances of classic Python classes
- DEBUG_LEAK: The set of debugging flags that make the garbage
  collector print all information that can help you diagnose memory
  leaks, equivalent to the inclusive-OR of all other constants (except
  DEBUG_STATS, which serves a different purpose)
- DEBUG_OBJECTS: Meaningful only if DEBUG_COLLECTABLE and/or
  DEBUG_UNCOLLECTABLE are also set: prints information on objects found
  during collection that are not instances of classic Python classes
- DEBUG_SAVEALL: Saves all collectable objects to list garbage
  (uncollectable ones are always saved there) to help diagnose leaks
- DEBUG_STATS: Prints statistics during collection to help tune the
  thresholds
- DEBUG_UNCOLLECTABLE: Prints information on uncollectable objects
  found during collection
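The sketch below shows one way the debug flags might be used, assuming a hypothetical class Node. With DEBUG_SAVEALL set, objects found in a cyclic garbage loop are appended to gc.garbage instead of being freed, so they can be inspected afterward. The previous flags are restored at the end:

```python
import gc

class Node:
    """Hypothetical class used to build a reference loop."""
    pass

old_flags = gc.get_debug()
gc.collect()                        # clear out any pre-existing cyclic garbage
gc.set_debug(gc.DEBUG_SAVEALL)      # keep collectable objects in gc.garbage

a, b = Node(), Node()
a.other, b.other = b, a             # mutual references form a loop
del a, b
gc.collect()

# gc.garbage may also hold helper objects, so filter by type
saved = [obj for obj in gc.garbage if isinstance(obj, Node)]

gc.set_debug(old_flags)             # restore the previous debug flags
del gc.garbage[:]                   # empty the list in place

print(len(saved))
```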
set_threshold(thresh0[,thresh1[,thresh2]])
Sets the thresholds that control how frequently cyclic garbage
collection cycles run. If you set thresh0
to 0, garbage collection is disabled. Garbage
collection is an advanced topic, and the details of the generational
garbage collection approach used in Python and its thresholds are
beyond the scope of this book.
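A brief sketch of reading and changing the thresholds. The value 1000 is an arbitrary choice for illustration, not a recommendation, and how the second and third thresholds are interpreted varies between CPython versions, so the example inspects only the first one:

```python
import gc

old = gc.get_threshold()            # three-item tuple of current thresholds

gc.set_threshold(1000)              # change only thresh0 (arbitrary value)
raised_first = gc.get_threshold()[0]

gc.set_threshold(*old)              # put the original thresholds back
print(raised_first)
```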
When you know you have no cyclic garbage loops in your program, or when
you can't afford the delay of a cyclic garbage collection run at some
crucial time, you can suspend automatic garbage collection by calling
gc.disable(). You can enable collection again later by calling
gc.enable(). You can test whether automatic collection is currently
enabled by calling gc.isenabled(), which returns True or False. To
control when the time needed for collection is spent, you can call
gc.collect() to force a full cyclic collection run to happen
immediately. An idiom for wrapping some time-critical code is therefore:
import gc
gc_was_enabled = gc.isenabled()
if gc_was_enabled:
    gc.collect()
    gc.disable()
# insert some time-critical code here
if gc_was_enabled:
    gc.enable()
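One possible variation on this idiom, assuming a Python version that provides contextlib, packages the same steps as a context manager so that collection is re-enabled even if the time-critical code raises an exception. The name gc_paused is hypothetical:

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Hypothetical helper: collect now, then suspend automatic collection."""
    was_enabled = gc.isenabled()
    if was_enabled:
        gc.collect()
        gc.disable()
    try:
        yield
    finally:
        # Re-enable only if collection was enabled to begin with
        if was_enabled:
            gc.enable()

state_before = gc.isenabled()
with gc_paused():
    state_inside = gc.isenabled()   # automatic collection is suspended here
    total = sum(range(1000))        # stand-in for time-critical code
state_after = gc.isenabled()
```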
The other functionality in module gc is more
advanced and rarely used, and can be grouped into two areas.
Functions get_threshold and
set_threshold and the debug flag
DEBUG_STATS can help you fine-tune garbage
collection to optimize your program's performance.
The rest of gc's functionality is
there to help you diagnose memory leaks in your program. While
gc itself can automatically fix many such leaks,
your program will be faster if it can avoid creating them in the
first place.
13.4.2 The weakref Module
Careful design can often avoid reference
loops. However, at times you need certain objects to know about each
other, and avoiding mutual references would distort and complicate
design. For example, a container has references to its items, yet it
can often be useful for an object to know about some main container
that holds it. The result is a reference loop: due to the mutual
references, the container and items keep each other alive, even when
all other objects forget about them. Weak references solve this
problem by letting you have objects that mutually reference each
other as long as both are alive, but do not keep each other
alive.
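A minimal sketch of the container example, assuming hypothetical classes Container and Item. The container holds ordinary references to its items, while each item reaches its container only through a weak reference, so no reference loop forms. The immediate disappearance of the container relies on CPython's reference counting:

```python
import weakref

class Item:
    """Hypothetical item that knows its container only weakly."""
    def __init__(self, name):
        self.name = name
        self._container_ref = None

    def container(self):
        # Returns the container, or None if there is none (any more)
        if self._container_ref is None:
            return None
        return self._container_ref()

class Container:
    """Hypothetical container holding ordinary references to its items."""
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        item._container_ref = weakref.ref(self)

box = Container()
pen = Item("pen")
box.add(pen)

knows_box = pen.container() is box  # the item can reach its container
del box                             # the item does not keep the container alive
box_gone = pen.container() is None
```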
A weak reference is a
special object w that refers to some other
object x without incrementing
x's reference count. When
x's reference count goes
down to 0, Python finalizes and collects
x, then informs
w of
x's demise. The weak
reference w can now either disappear or
become invalid in a controlled way. At any time, a given weak
reference w refers to either the same
target object x as when
w was created, or to nothing at all: a
weak reference is never re-targeted. Not all types of objects support
being the target x of a weak reference
w, but class instances and functions do.
Module weakref exposes functions and types to let
you create and manage weak references.
getweakrefcount(x)
Returns len(getweakrefs(x)).
getweakrefs(x)
Returns a list of all weak references and proxies whose target is
x.
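A short sketch of these two functions, assuming a hypothetical class Target. One weak reference and one proxy are created for the same object, and both show up in the list:

```python
import weakref

class Target:
    """Hypothetical class whose instances support weak references."""
    pass

x = Target()
r = weakref.ref(x)          # a weak reference to x
p = weakref.proxy(x)        # a weak proxy to x

count = weakref.getweakrefcount(x)
refs = weakref.getweakrefs(x)

# Compare by identity: equality on a proxy is forwarded to the target
has_ref = any(w is r for w in refs)
has_proxy = any(w is p for w in refs)
```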
proxy(x[,f])
Returns a weak proxy p of type ProxyType (CallableProxyType,
if x is callable), with object x as the target. In most contexts,
using p is just like using x, except that if you use p after x has
been deleted, Python raises ReferenceError. p is never hashable
(therefore you cannot use p as a dictionary key), even when x is.
If f is present, it must be callable with one argument, and is the
finalization callback for p (i.e., right before finalizing x, Python
calls f(p)). Note that when f is called, x is no longer reachable
from p.
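The following sketch illustrates proxy behavior under CPython, assuming a hypothetical class Config. The proxy forwards attribute access while the target is alive, cannot be hashed, and raises ReferenceError once the target is gone:

```python
import weakref

class Config:
    """Hypothetical target class."""
    def __init__(self):
        self.level = 3

cfg = Config()
p = weakref.proxy(cfg)

level_via_proxy = p.level       # attribute access is forwarded to cfg

try:
    hash(p)
    hashable = True
except TypeError:
    hashable = False            # proxies are never hashable

del cfg                         # the target is collected immediately in CPython

try:
    p.level
    raised = False
except ReferenceError:
    raised = True               # using a proxy whose target is gone
```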
ref(x[,f])
Returns a weak reference w of type ReferenceType, with object
x as the target. w is callable: calling w() returns x if x is still
alive, otherwise w() returns None. w is hashable if x is hashable.
You can compare weak references for equality (==, !=), but not for
order (<, >, <=, >=). Two weak references x and y are equal if their
targets are alive and equal, or if x is y. If f is present, it must
be callable with one argument, and is the finalization callback for
w (i.e., right before finalizing x, Python calls f(w)). Note that
when f is called, x is no longer reachable from w.
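A minimal sketch of ref with a finalization callback, assuming a hypothetical class Session and a hypothetical callback on_finalize. It relies on CPython collecting the target as soon as its last ordinary reference goes away:

```python
import weakref

class Session:
    """Hypothetical target class."""
    pass

notified = []

def on_finalize(wr):
    # Receives the weak reference itself; its target is already unreachable
    notified.append(wr)

s = Session()
w = weakref.ref(s, on_finalize)

alive_target = w() is s         # calling w returns the target while it lives
del s                           # last ordinary reference goes away
dead_target = w() is None       # now calling w returns None
callback_got_w = len(notified) == 1 and notified[0] is w
```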
class WeakKeyDictionary(adict={})
A WeakKeyDictionary d is a mapping that references its keys weakly.
When the reference count of a key k in d goes to 0, item d[k]
disappears. adict is used to initialize the mapping.
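As an illustration, the sketch below attaches extra data to objects through a WeakKeyDictionary, assuming a hypothetical class Widget. The entry vanishes when its key object is collected, which happens immediately under CPython's reference counting:

```python
import weakref

class Widget:
    """Hypothetical class; instances are hashable and weakly referenceable."""
    pass

notes = weakref.WeakKeyDictionary()

w1 = Widget()
w2 = Widget()
notes[w1] = "needs repaint"
notes[w2] = "hidden"

size_before = len(notes)
del w1                          # the entry keyed by w1 disappears with it
size_after = len(notes)
remaining = notes[w2]
```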
class WeakValueDictionary(adict={})
A WeakValueDictionary d is a mapping that references its values
weakly. When the reference count of a value v in d goes to 0, all
items of d such that d[k] is v disappear. adict is used to initialize
the mapping.
WeakKeyDictionary and
WeakValueDictionary are useful when you need to
non-invasively associate additional data with objects without
changing the objects. Weak mappings are also useful to non-invasively
record transient associations between objects and to build caches. In
each case, the specific consideration that can make a weak mapping
preferable to a normal dictionary is that an object that is otherwise
garbage-collectable is not kept alive just by being used in a weak
mapping.
A typical use could be a class that keeps track of its instances, but
does not keep them alive just in order to keep track of them:
import weakref

class Tracking:
    # Maps a serial number to each live instance without keeping it alive
    _instances_dict = weakref.WeakValueDictionary()
    _num_generated = 0

    def __init__(self):
        Tracking._num_generated += 1
        Tracking._instances_dict[Tracking._num_generated] = self

    def instances():
        # Class attributes are not in scope here, so name the class
        return Tracking._instances_dict.values()
    instances = staticmethod(instances)