10.4 Auxiliary Modules for File I/O
File objects supply all the functionality that
is strictly needed for file I/O. Some auxiliary modules of the
Python library, however, offer convenient supplementary
functionality that makes I/O easier in several
important special cases.
10.4.1 The fileinput Module
The fileinput module
lets you loop over all the lines in a list of text files. Performance
is quite good, comparable to the performance of direct iteration on
each file, since fileinput uses internal buffering
to minimize I/O. Therefore, you can use module
fileinput for line-oriented file input whenever
you find the module's rich functionality convenient,
without worrying about performance. The input
function is the main function of module fileinput,
and the module also provides a FileInput class
that supports the same functionality as the module's
functions.
close( )
Closes the whole sequence, so that iteration stops and no file
remains open.
class FileInput(files=None,inplace=0,backup='',bufsize=0)
Creates and returns an instance f of class
FileInput. Arguments are the same as for
fileinput.input, and methods of
f have the same names, arguments, and
semantics as functions of module fileinput.
f also supplies a method
readline, which reads and returns the next line.
You can use class FileInput explicitly, rather
than the single implicit instance used by the functions of module
fileinput, when you want to nest or otherwise mix
loops that read lines from more than one sequence of files.
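As an illustration of mixing loops over two sequences of files, here is a minimal sketch written for a current Python 3 interpreter. The helper write_temp and the sample file contents are assumptions made up for this example; only FileInput, readline, and close come from the module.

```python
import fileinput
import os
import tempfile


def write_temp(text):
    """Hypothetical helper: write text to a new temporary file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    return path


names_path = write_temp("alice\nbob\n")
scores_path = write_temp("10\n20\n")

# Two explicit instances: each keeps its own state, unlike the single
# implicit instance shared by the module-level functions.
names = fileinput.FileInput(files=[names_path])
scores = fileinput.FileInput(files=[scores_path])

pairs = []
for name in names:
    # readline advances the second sequence by exactly one line.
    score = scores.readline()
    pairs.append((name.strip(), int(score)))

names.close()
scores.close()
os.remove(names_path)
os.remove(scores_path)

print(pairs)
```

Because each instance tracks its own position, advancing scores inside the loop over names does not disturb either sequence.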
filelineno( )
Returns the number of lines read so far from the file now being read.
For example, returns 1 if the first line has just
been read from the current file.
filename( )
Returns the name of the file being read, or None
if no line has been read yet.
input(files=None,inplace=0,backup='',bufsize=0)
Returns the sequence of lines in the files, suitable for use in a
for loop. files is a
sequence of filenames to open and read one after the other, in order.
Filename '-' means standard input
(sys.stdin). If files
is a string, it's a single filename to open and
read. If files is None,
input uses sys.argv[1:] as the
list of filenames. If the sequence of filenames is empty,
input reads sys.stdin.
The sequence object that input returns is an
instance of class FileInput; that instance is also
the global state of module fileinput, so all other
functions of module fileinput operate on the same
shared state. Each function of module fileinput
corresponds directly to a method of class
FileInput.
When inplace is false (the default),
input just reads the files. When
inplace is true, however,
input moves each file being read (except standard
input) to a backup file, and redirects standard output
(sys.stdout) to write to the file being read. This
operation lets you simulate overwriting files in-place. If
backup is a string starting with a dot,
input uses backup as
the extension of the backup files and does not remove the backup
files. If backup is an empty string (the
default), input uses extension
.bak, and deletes each backup file when the file
is closed.
bufsize is the size of the internal buffer
that input uses to read lines from the input
files. If bufsize is 0,
input uses a buffer of 8192 bytes.
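The in-place behavior described for input can be sketched as follows, assuming a current Python 3 interpreter (where the flag is usually written as True and bufsize is no longer passed). The temporary file, its contents, and the .orig extension are assumptions chosen for this example.

```python
import fileinput
import os
import tempfile

# Create a small sample file to rewrite (contents are made up).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("first\nsecond\n")

# With inplace true, standard output is redirected to the file being
# read, so print writes the replacement text. backup=".orig" starts with
# a dot, so the backup copy is kept next to the file.
for line in fileinput.input(files=[path], inplace=True, backup=".orig"):
    print(line.rstrip("\n").upper())
fileinput.close()

with open(path) as handle:
    rewritten = handle.read()
with open(path + ".orig") as handle:
    original = handle.read()

os.remove(path)
os.remove(path + ".orig")
```

After the loop, the file holds the uppercased lines and the backup file holds the original text.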
isfirstline( )
Returns True if the line just read is the first line of its file,
otherwise False, just like filelineno( )==1.
isstdin( )
Returns True if the file now being read is
sys.stdin, otherwise False.
lineno( )
Returns the total number of lines read so far since the call to
input.
nextfile( )
Closes the file now being read, so that the next line to be read will
be the first one of the following file.
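The module-level functions that report on the shared state can be seen together in a short sketch, written for a current Python 3 interpreter. The two temporary files and their contents are assumptions for the example; each tuple records the index of the file being read, then the values of lineno, filelineno, and isfirstline, then the line itself.

```python
import fileinput
import os
import tempfile

# Build two small sample files (contents are made up for this sketch).
paths = []
for text in ("a1\na2\n", "b1\n"):
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    paths.append(path)

report = []
for line in fileinput.input(files=paths):
    report.append((
        paths.index(fileinput.filename()),  # which file is being read
        fileinput.lineno(),                 # cumulative line count
        fileinput.filelineno(),             # line count within this file
        fileinput.isfirstline(),            # same truth value as filelineno() == 1
        line.rstrip("\n"),
    ))
fileinput.close()

for path in paths:
    os.remove(path)

for row in report:
    print(row)
```

Note that lineno keeps counting across files, while filelineno restarts at 1 for each new file.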
10.4.2 The linecache Module
The
linecache module lets you read a given line
(specified by number) from a file with a given name. The module keeps
an internal cache, so if you need to read several lines from a file,
the operation is cheaper than opening and examining the file each
time. Module linecache exposes the following
functions.
checkcache( )
Ensures that the module's cache holds no stale data,
but rather reflects what's on the filesystem. Call
checkcache when the files you're
reading may have changed on the filesystem, if you need to ensure
that future calls to getline return updated
information.
clearcache( )
Drops the module's cache so that the memory can be
reused for other purposes. Call clearcache when
you don't need to perform any more reading for now.
getline(filename,lineno)
Reads and returns the lineno line (the first line is line 1) from the
text file named filename, including the
trailing \n. For any error,
getline does not raise exceptions, but rather
returns the empty string ''. If
filename is not found,
getline also looks for the file in the directories
listed in sys.path.
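A minimal sketch of the three linecache functions working together, written for a current Python 3 interpreter. The temporary file and its contents are assumptions for the example.

```python
import linecache
import os
import tempfile

# Sample file for the sketch (contents are made up).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("alpha\nbeta\ngamma\n")

second = linecache.getline(path, 2)    # includes the trailing \n
missing = linecache.getline(path, 99)  # no such line: '' instead of an exception

# Change the file on disk after it has been cached.
with open(path, "w") as handle:
    handle.write("alpha\nBETA changed\ngamma\n")

linecache.checkcache()                 # discard cache entries that are now stale
fresh = linecache.getline(path, 2)     # reflects the new file contents

linecache.clearcache()                 # release the cache when done reading
os.remove(path)
```

Without the call to checkcache, getline would be free to keep answering from the lines it cached before the file changed.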
10.4.3 The struct Module
The
struct module lets you pack binary data into a
string, and then unpack the bytes of such a string back into the data
they represent. Such operations can be useful for various kinds of
low-level programming. Most often, you use module
struct to interpret data records from binary files
having some specified format or to prepare records to be written to
such binary files. The module's name comes from
C's keyword struct, which is
usable for related purposes. On any error, functions of module
struct raise exceptions that are instances of
exception class struct.error, the only class that
the module supplies.
Operations of module struct rely on struct format
strings, which are ordinary strings that follow a specified syntax.
The first character of a format string can specify the byte order,
size, and alignment of packed data:
- @ : Native byte order, native data sizes, and native alignment for
  the current platform; this is the default, if the first character is
  none of the characters listed here (note that format P in Table 10-2
  is available only for this kind of format string)
- = : Native byte order for the current platform, but standard size
  and alignment
- < : Little-endian byte order (like Intel platforms), standard size
  and alignment
- > or ! : Big-endian byte order (network-standard), standard size
  and alignment
Standard sizes are indicated in Table 10-2.
Standard alignment means that there is no forced alignment and that
explicit pad bytes are used if needed. Native sizes and alignment are
whatever the platform's C compiler uses. Native byte
order is either little-endian or big-endian, depending on the current
platform.
After the optional leading character, a format string is made up of
one or more format characters that can be preceded by an optional
count (an integer represented by its decimal digits). The possible
format characters are shown in Table 10-2. For most
format characters, the count indicates repetition (e.g.,
'3h' is exactly the same as
'hhh'). When the format character is
s or p, indicating a string,
the count is not a repetition, but rather the total number of bytes
occupied by the string. Whitespace can be freely and innocuously used
between formats, but not between a count and its format character.
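The effect of the leading character and of counts can be checked with a short sketch, written for a current Python 3 interpreter (where packed data is a bytes object rather than the string of older versions). The particular formats chosen are assumptions for illustration only.

```python
import struct

# Standard sizes and no forced alignment: 1 byte + 4 bytes.
packed_size = struct.calcsize("<bi")

# Native sizes and alignment depend on the platform's C compiler, so the
# result may include padding and is not fixed by this sketch.
native_size = struct.calcsize("@bi")

# A count repeats most format characters: '3h' is the same as 'hhh'.
same_size = struct.calcsize("<3h") == struct.calcsize("<hhh")

# Whitespace between formats is ignored.
spaced_size = struct.calcsize("<b i")

# The same value packed with explicit byte orders.
little = struct.pack("<h", 1)
big = struct.pack(">h", 1)
network = struct.pack("!h", 1)

print(packed_size, native_size, same_size, spaced_size, little, big)
```

Choosing an explicit leading character such as < or > keeps the layout identical across platforms, which matters when the data is written to a file or sent over a network.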
Table 10-2. Format characters for struct
Character | C type         | Python type    | Standard size
B         | unsigned char  | int            | 1 byte
b         | signed char    | int            | 1 byte
c         | char           | str (length 1) | 1 byte
d         | double         | float          | 8 bytes
f         | float          | float          | 4 bytes
H         | unsigned short | int            | 2 bytes
h         | signed short   | int            | 2 bytes
I         | unsigned int   | long           | 4 bytes
i         | signed int     | int            | 4 bytes
L         | unsigned long  | long           | 4 bytes
l         | signed long    | int            | 4 bytes
P         | void*          | int            | N/A
p         | char[ ]        | string         | N/A
s         | char[ ]        | string         | N/A
x         | padding byte   | no value       | 1 byte
Format s denotes
a fixed-length string, exactly as long as its count (the Python
string is truncated or padded with copies of the null character
'\0', if needed). Format p
denotes a Pascal-like string: the first byte is the number of
significant characters, and the characters start from the second
byte. The count indicates the total number of bytes, including the
length byte.
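The difference between formats s and p can be seen in a brief sketch, written for a current Python 3 interpreter, where string data is passed as bytes objects. The sample values are assumptions for illustration.

```python
import struct

# 's' with a count of 5: a short value is padded with null bytes.
padded = struct.pack("5s", b"ab")

# 's' with a count of 3: a long value is truncated.
truncated = struct.pack("3s", b"abcdef")

# 'p' with a count of 5: one length byte, the characters, then padding,
# for 5 bytes in total.
pascal = struct.pack("5p", b"ab")

# Unpacking 'p' uses the length byte to recover only the significant bytes.
(text,) = struct.unpack("5p", pascal)

print(padded, truncated, pascal, text)
```

Unpacking an s field, by contrast, returns all the bytes of the field, padding included, so the caller decides whether to strip trailing null bytes.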
Module struct supplies the following functions.
calcsize(fmt)
Returns the size in bytes of the structure corresponding to format
string fmt.
pack(fmt,*values)
Packs the given values according to format string
fmt and returns the resulting string.
values must match in number and types the
values required by fmt.
unpack(fmt,str)
Unpacks binary string str according to
format string fmt and returns a tuple of
values. len(str) must be equal to
struct.calcsize(fmt).
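A round trip through pack and unpack might look like the following sketch, written for a current Python 3 interpreter. The record layout RECORD_FORMAT and the helpers pack_record and unpack_record are hypothetical names invented for this example, not part of module struct.

```python
import struct

# Hypothetical record layout: unsigned short id, 10-byte name, double balance,
# little-endian with standard sizes (2 + 10 + 8 bytes).
RECORD_FORMAT = "<H10sd"


def pack_record(record_id, name, balance):
    """Pack one record into a fixed-size bytes object."""
    return struct.pack(RECORD_FORMAT, record_id, name.encode("ascii"), balance)


def unpack_record(data):
    """Unpack one record, stripping the null padding from the name."""
    record_id, raw_name, balance = struct.unpack(RECORD_FORMAT, data)
    return record_id, raw_name.rstrip(b"\0").decode("ascii"), balance


data = pack_record(7, "ada", 12.5)
size_matches = len(data) == struct.calcsize(RECORD_FORMAT)
record = unpack_record(data)

# Data of the wrong length raises struct.error.
try:
    struct.unpack(RECORD_FORMAT, data[:-1])
    error_raised = False
except struct.error:
    error_raised = True

print(record, size_matches, error_raised)
```

Keeping the format string in one named constant helps ensure that the packing and unpacking sides agree on the layout.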
10.4.4 The xreadlines Module
The
xreadlines module will be deprecated in Python
2.3. You should avoid it in Python 2.2, since directly iterating on a
file object is at least as fast. If you need to support Python 2.1,
module xreadlines and the
xreadlines method of file objects are a good
choice in terms of input performance. Module
fileinput, covered earlier in this chapter, is a
good compromise if your code needs to support many different versions
of Python, and still get good performance. The
xreadlines module supplies one function.
xreadlines(f)
Accepts argument f, which must be a file
object or a file-like object with a readlines
method like that of file objects. Returns a sequence object
x that is usable in a
for statement or as the argument to built-in
functions such as filter.
x represents the same sequence of strings
as f.readlines( ), but
x does so in a lazy way, limiting memory
consumption. xreadlines is to
readlines much like xrange is
to range.
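Module xreadlines is not available in current Python releases, so the following sketch instead uses direct iteration on a file object, the alternative recommended above, to illustrate the contrast between eager and lazy reading. The io.StringIO object is an assumption that stands in for an open text file.

```python
import io

# Stand-in for an open text file (contents are made up).
source = io.StringIO("one\ntwo\nthree\n")

# readlines builds the whole list of lines in memory at once.
eager = source.readlines()

# Direct iteration yields one line at a time, on demand.
source.seek(0)
iterator = iter(source)
first = next(iterator)       # only the first line has been read so far
remaining = list(iterator)   # consume the rest

print(eager, first, remaining)
```

Both approaches produce the same strings; the lazy form simply avoids holding every line in memory at the same time.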