10.4 Auxiliary Modules for File I/O

File objects supply all functionality that is strictly needed for file I/O. There are some auxiliary Python library modules, however, that offer convenient supplementary functionality, making I/O even easier and handier in several important special cases.

10.4.1 The fileinput Module

The fileinput module lets you loop over all the lines in a list of text files. Performance is quite good, comparable to the performance of direct iteration on each file, since fileinput uses internal buffering to minimize I/O. Therefore, you can use module fileinput for line-oriented file input whenever you find the module's rich functionality convenient, without worrying about performance. The input function is the main function of module fileinput, and the module also provides a FileInput class that supports the same functionality as the module's functions.

close

close(  )

Closes the whole sequence, so that iteration stops and no file remains open.

FileInput

class FileInput(files=None,inplace=0,backup='',bufsize=0)

Creates and returns an instance f of class FileInput. Arguments are the same as for fileinput.input, and methods of f have the same names, arguments, and semantics as functions of module fileinput. f also supplies a method readline, which reads and returns the next line. You can use class FileInput explicitly, rather than the single implicit instance used by the functions of module fileinput, when you want to nest or otherwise mix loops that read lines from more than one sequence of files.
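As a rough illustration of mixing two sequences of files, the following sketch pairs up lines from two independent FileInput instances. It assumes a current Python 3 interpreter rather than the Python 2.2 this section targets, and the helper write_temp and the temporary files are hypothetical, created only to keep the example self-contained.

```python
import fileinput
import os
import tempfile


def write_temp(text):
    """Hypothetical helper: write text to a new temporary file, return its path."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    return path


left_path = write_temp('a1\na2\n')
right_path = write_temp('b1\nb2\n')

# Two explicit instances, each with its own state, so the loops can be mixed.
left = fileinput.FileInput(files=[left_path])
right = fileinput.FileInput(files=[right_path])

pairs = []
for left_line in left:
    # readline returns the next line of the other sequence ('' at its end)
    right_line = right.readline()
    pairs.append((left_line.rstrip('\n'), right_line.rstrip('\n')))

left.close()
right.close()
os.remove(left_path)
os.remove(right_path)
```

The module-level functions could not do this, since they all share a single implicit instance.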

filelineno

filelineno(  )

Returns the number of lines read so far from the file now being read. For example, returns 1 if the first line has just been read from the current file.

filename

filename(  )

Returns the name of the file being read, or None if no line has been read yet.

input

input(files=None,inplace=0,backup='',bufsize=0)

Returns the sequence of lines in the files, suitable for use in a for loop. files is a sequence of filenames to open and read one after the other, in order. Filename '-' means standard input (sys.stdin). If files is a string, it's a single filename to open and read. If files is None, input uses sys.argv[1:] as the list of filenames. If the sequence of filenames is empty, input reads sys.stdin.

The sequence object that input returns is an instance of class FileInput; that instance is also the global state of module fileinput, so all other functions of module fileinput operate on the same shared state. Each function of module fileinput corresponds directly to a method of class FileInput.

When inplace is false (the default), input just reads the files. When inplace is true, however, input moves each file being read (except standard input) to a backup file, and redirects standard output (sys.stdout) to write to the file being read. This operation lets you simulate overwriting files in-place. If backup is a string starting with a dot, input uses backup as the extension of the backup files and does not remove the backup files. If backup is an empty string (the default), input uses extension .bak, and deletes each backup file when the file is closed.

bufsize is the size of the internal buffer that input uses to read lines from the input files. If bufsize is 0, input uses a buffer of 8192 bytes.
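The in-place behavior described above can be sketched as follows. This is a minimal example, assuming a current Python 3 interpreter (where print is a function and inplace accepts True); the temporary file and the '.orig' backup extension are arbitrary choices made for the illustration.

```python
import fileinput
import os
import tempfile

# Create a small throwaway file to rewrite (hypothetical sample data).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('alpha\nbeta\n')

# With inplace true, whatever is printed inside the loop is written to
# the file being read. backup starts with a dot, so the backup is kept.
for line in fileinput.input(files=[path], inplace=True, backup='.orig'):
    print(line.rstrip('\n').upper())
fileinput.close()

with open(path) as f:
    rewritten = f.read()
with open(path + '.orig') as f:
    original = f.read()

os.remove(path)
os.remove(path + '.orig')
```

After the loop, the file holds the uppercased lines and the file with the '.orig' extension holds the original contents.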

isfirstline

isfirstline(  )

Returns True or False, just like filelineno( ) == 1.

isstdin

isstdin(  )

Returns True if the file now being read is sys.stdin, otherwise False.

lineno

lineno(  )

Returns the total number of lines read so far since the call to input.

nextfile

nextfile(  )

Closes the file now being read, so that the next line to be read will be the first one of the following file.
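The state functions of module fileinput are easiest to follow side by side. The sketch below, assuming a current Python 3 interpreter and two hypothetical temporary files, records what filename, lineno, filelineno, and isfirstline report after each line is read.

```python
import fileinput
import os
import tempfile

# Build two small sample files (hypothetical data for the illustration).
paths = []
for text in ('one\ntwo\n', 'three\n'):
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'w') as f:
        f.write(text)
    paths.append(path)

records = []
for line in fileinput.input(files=paths):
    records.append((
        paths.index(fileinput.filename()),  # which file is being read
        fileinput.lineno(),                 # lines read since the call to input
        fileinput.filelineno(),             # lines read from the current file
        fileinput.isfirstline(),            # same as filelineno() == 1
        line.rstrip('\n'),
    ))
fileinput.close()

for path in paths:
    os.remove(path)
```

Note that lineno keeps counting across files, while filelineno restarts at 1 for each new file.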

10.4.2 The linecache Module

The linecache module lets you read a given line (specified by number) from a file with a given name. The module keeps an internal cache, so if you need to read several lines from a file, the operation is cheaper than opening and examining the file each time. Module linecache exposes the following functions.

checkcache

checkcache(  )

Ensures that the module's cache holds no stale data, but rather reflects what's on the filesystem. Call checkcache when the files you're reading may have changed on the filesystem, if you need to ensure that future calls to getline return updated information.

clearcache

clearcache(  )

Drops the module's cache so that the memory can be reused for other purposes. Call clearcache when you don't need to perform any more reading for now.

getline

getline(filename,lineno)

Reads and returns line number lineno (the first line is number 1) from the text file named filename, including the trailing \n. On any error, getline does not raise an exception, but rather returns the empty string ''. If filename is not found, getline also looks for the file in the directories listed in sys.path.
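A short sketch of the three linecache functions working together, assuming a current Python 3 interpreter; the temporary file and its contents are hypothetical and exist only to make the example self-contained.

```python
import linecache
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first\nsecond\nthird\n')

second = linecache.getline(path, 2)             # line numbers start at 1
past_end = linecache.getline(path, 99)          # error: no such line
no_file = linecache.getline(path + '.nope', 1)  # error: no such file

# Rewrite the file with different contents (and a different size), then
# ask linecache to discard any stale entries before reading again.
with open(path, 'w') as f:
    f.write('FIRST\nSECOND CHANGED\n')
linecache.checkcache()
updated = linecache.getline(path, 2)

linecache.clearcache()  # done reading for now, release the cache
os.remove(path)
```

Both error cases yield the empty string rather than an exception, so callers that need to distinguish a missing file from an empty line must check for the file themselves.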

10.4.3 The struct Module

The struct module lets you pack binary data into a string, and then unpack the bytes of such a string back into the data they represent. Such operations can be useful for various kinds of low-level programming. Most often, you use module struct to interpret data records from binary files having some specified format or to prepare records to be written to such binary files. The module's name comes from C's keyword struct, which is usable for related purposes. On any error, functions of module struct raise exceptions that are instances of exception class struct.error, the only class that the module supplies.

Operations of module struct rely on struct format strings, which are ordinary strings that follow a specified syntax. The first character of a format string can specify the byte order, size, and alignment of packed data:

@

Native byte order, native data sizes, and native alignment for the current platform; this is the default, if the first character is none of the characters listed here (note that format P in Table 10-2 is available only for this kind of format string)

=

Native byte order for the current platform, but standard size and alignment

<

Little-endian byte order (like Intel platforms), standard size and alignment

> or !

Big-endian byte order (network-standard), standard size and alignment

Standard sizes are indicated in Table 10-2. Standard alignment means that there is no forced alignment and that explicit pad bytes are used if needed. Native sizes and alignment are whatever the platform's C compiler uses. Native byte order is either little-endian or big-endian, depending on the current platform.
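To make the byte-order characters concrete, this sketch packs the same unsigned short under each prefix. It assumes a current Python 3 interpreter, where struct.pack returns a bytes object rather than the plain string of Python 2.2.

```python
import struct
import sys

little = struct.pack('<H', 1)    # low-order byte first
big = struct.pack('>H', 1)       # high-order byte first
network = struct.pack('!H', 1)   # same layout as '>'
native = struct.pack('=H', 1)    # platform byte order, standard size

# sys.byteorder tells which of the two layouts '=' (and '@') produce here.
expected_native = little if sys.byteorder == 'little' else big
```

Using an explicit '<' or '>' is usually the safer choice for file formats, since the resulting bytes do not depend on the machine that wrote them.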

After the optional leading character, a format string is made up of one or more format characters that can be preceded by an optional count (an integer represented by its decimal digits). The possible format characters are shown in Table 10-2. For most format characters, the count indicates repetition (e.g., '3h' is exactly the same as 'hhh'). When the format character is s or p, indicating a string, the count is not a repetition, but rather the total number of bytes occupied by the string. Whitespace can be freely and innocuously used between formats, but not between a count and its format character.
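The rules for counts and whitespace can be checked directly with calcsize and pack. This is a small sketch assuming a current Python 3 interpreter; the particular format strings are arbitrary examples.

```python
import struct

# A count before most format characters is plain repetition.
same_size = struct.calcsize('<3h') == struct.calcsize('<hhh')
same_bytes = struct.pack('<3h', 1, 2, 3) == struct.pack('<hhh', 1, 2, 3)

# Whitespace between formats is ignored: 3 shorts + 2 pad bytes + 5-byte string.
spaced_size = struct.calcsize('<3h 2x 5s')

# Whitespace between a count and its format character is an error.
try:
    struct.calcsize('<3 h')
    count_space_rejected = False
except struct.error:
    count_space_rejected = True
```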

Table 10-2. Format characters for struct

Character   C type           Python type      Standard size
B           unsigned char    int              1 byte
b           signed char      int              1 byte
c           char             str (length 1)   1 byte
d           double           float            8 bytes
f           float            float            4 bytes
H           unsigned short   int              2 bytes
h           signed short     int              2 bytes
I           unsigned int     long             4 bytes
i           signed int       int              4 bytes
L           unsigned long    long             4 bytes
l           signed long      int              4 bytes
P           void*            int              N/A
p           char[ ]          string           N/A
s           char[ ]          string           N/A
x           padding byte     no value         1 byte

Format s denotes a fixed-length string, exactly as long as its count (the Python string is truncated or padded with copies of the null character '\0', if needed). Format p denotes a Pascal-like string: the first byte is the number of significant characters, and the characters start from the second byte. The count indicates the total number of bytes, including the length byte.
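The difference between formats s and p shows up clearly in the packed bytes. The sketch below assumes a current Python 3 interpreter, where string data for struct must be given as bytes objects (in Python 2.2, ordinary strings play this role).

```python
import struct

# Format s: fixed length, padded with null bytes or truncated as needed.
padded = struct.pack('5s', b'ab')
truncated = struct.pack('5s', b'abcdefgh')

# Format p: the first byte holds the number of significant characters,
# and the count (5 here) includes that length byte.
pascal = struct.pack('5p', b'ab')
(recovered,) = struct.unpack('5p', pascal)
```

Unpacking with p returns only the significant characters, while unpacking with s would return all the bytes, padding included.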

Module struct supplies the following functions.

calcsize

calcsize(fmt)

Returns the size in bytes of the structure corresponding to format string fmt.

pack

pack(fmt,*values)

Packs the given values according to format string fmt and returns the resulting string. values must match in number and types the values required by fmt.

unpack

unpack(fmt,str)

Unpacks binary string str according to format string fmt and returns a tuple of values. len(str) must be equal to struct.calcsize(fmt).
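The three functions fit together naturally when handling fixed-size records. The following is a minimal round-trip sketch, assuming a current Python 3 interpreter (bytes rather than strings); the record layout in RECORD_FORMAT is hypothetical, invented for this illustration.

```python
import struct

# Hypothetical record: unsigned short id, 4-byte tag, double value,
# little-endian with standard sizes and no alignment padding.
RECORD_FORMAT = '<H4sd'

record_size = struct.calcsize(RECORD_FORMAT)   # 2 + 4 + 8 bytes
packed = struct.pack(RECORD_FORMAT, 7, b'temp', 21.5)
record_id, tag, value = struct.unpack(RECORD_FORMAT, packed)

# unpack insists on exactly calcsize(fmt) bytes; anything else is an error.
try:
    struct.unpack(RECORD_FORMAT, packed[:-1])
    wrong_length_rejected = False
except struct.error:
    wrong_length_rejected = True
```

When reading such records from a binary file, calcsize gives the number of bytes to read for each record before passing them to unpack.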

10.4.4 The xreadlines Module

The xreadlines module will be deprecated in Python 2.3. You should avoid it in Python 2.2, since directly iterating on a file object is at least as fast. If you need to support Python 2.1, module xreadlines and the xreadlines method of file objects are a good choice in terms of input performance. Module fileinput, covered earlier in this chapter, is a good compromise if your code needs to support many different versions of Python, and still get good performance. The xreadlines module supplies one function.

xreadlines

xreadlines(f)

Accepts argument f, which must be a file object or a file-like object with a readlines method like that of file objects. Returns a sequence object x that is usable in a for statement or as the argument to built-in functions such as filter. x represents the same sequence of strings as f.readlines( ), but x does so in a lazy way, limiting memory consumption. xreadlines is to readlines much like xrange is to range.
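Module xreadlines is not available in Python 3, so it cannot be demonstrated on a current interpreter. The sketch below instead shows the alternative this section recommends, direct iteration on a file object, which is similarly lazy. It assumes Python 3, and the temporary file is hypothetical sample data.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('short\nmuch longer line\nmid\n')

# Iterating on the file object yields one line at a time, so the whole
# file never has to be held in memory as it would with readlines.
line_count = 0
longest = ''
with open(path) as f:
    for line in f:
        line_count += 1
        if len(line) > len(longest):
            longest = line

os.remove(path)
```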
