10.2 Filesystem Operations
Using the os
module, you can manipulate the filesystem in a variety of ways:
creating, copying, and deleting files and directories, comparing
files, and examining filesystem information about files and
directories. This section documents the attributes and methods of the
os module that you use for these purposes, and
also covers some related modules that operate on the filesystem.
10.2.1 Path-String Attributes of the os Module
A file or directory is identified by a
string, known as its path, whose syntax depends
on the platform. On both Unix-like and Windows platforms, Python
accepts Unix syntax for paths, with slash (/) as
the directory separator. On non-Unix-like platforms, Python also
accepts platform-specific path syntax. On Windows, for example, you
can use backslash (\) as the separator. However,
you do need to double up each backslash to \\ in
normal string literals or use raw-string syntax as covered in Chapter 4. In the rest of this chapter, for brevity,
Unix syntax is assumed in both explanations and
examples.
Module os supplies attributes that provide details
about path strings on the current platform. You should typically use
the higher-level path manipulation operations covered in Section 10.2.4 later in this chapter,
rather than lower-level string operations based on these attributes.
However, the attributes may still be useful at times:
- curdir: The string that denotes the current directory ('.' on Unix and Windows)
- defpath: The default search path used if the environment lacks a PATH environment variable
- linesep: The string that terminates text lines ('\n' on Unix, '\r\n' on Windows)
- extsep: The string that separates the extension part of a file's name from the rest of the name ('.' on Unix and Windows)
- pardir: The string that denotes the parent directory ('..' on Unix and Windows)
- pathsep: The separator between paths in lists of paths, such as those used for the environment variable PATH (':' on Unix, ';' on Windows)
- sep: The separator of path components ('/' on Unix, '\\' on Windows)
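As a minimal sketch of where these attributes help, the hypothetical function path_entries below splits a PATH-style string into its directories with os.pathsep, falling back to os.defpath when PATH is unset. It assumes a Python 3 interpreter.

```python
import os

def path_entries(path_value=None):
    """Hypothetical helper: split a PATH-style string into non-empty entries.

    When path_value is None, read the PATH environment variable,
    falling back to os.defpath if PATH is not set.
    """
    if path_value is None:
        path_value = os.environ.get('PATH', os.defpath)
    # os.pathsep is ':' on Unix and ';' on Windows
    return [entry for entry in path_value.split(os.pathsep) if entry]

# Build a sample value portably, including an empty entry that gets dropped
sample = os.pathsep.join(['/usr/local/bin', '', '/usr/bin'])
print(path_entries(sample))   # ['/usr/local/bin', '/usr/bin']
```

This sketch simply drops empty entries; note that on Unix an empty PATH entry traditionally means the current directory.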
10.2.2 Permissions
Unix-like platforms associate nine bits
with each file or directory, three each for the
file's owner (user), its group, and anybody else,
indicating whether the file or directory can be read, written, and
executed by the specified subject. These nine bits are known as the
file's permission
bits, part of the file's
mode (a bit string that also includes other bits
describing the file). These bits are often displayed in octal
notation, which groups three bits in each digit. For example, a mode
of 0664 indicates a file that can be read and
written by its owner and group, but only read, not written, by
anybody else. When any process on a Unix-like system creates a file
or directory, the operating system applies to the specified mode a
bit mask known as the process's
umask, which can remove some of the permission
bits.
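The arithmetic behind octal modes and the umask fits in a few lines. The helper names rwx_string and apply_umask are hypothetical; the code assumes Python 3, whose 0o prefix for octal literals replaces the older spelling 0664 used in the text, and the usual rule that each bit set in the umask is cleared from the requested mode.

```python
def rwx_string(mode):
    """Render the nine permission bits of mode as in 'ls -l' (e.g. 'rw-r--r--')."""
    chars = []
    for shift in (6, 3, 0):            # owner, group, others: one octal digit each
        digit = (mode >> shift) & 7
        chars.append('r' if digit & 4 else '-')
        chars.append('w' if digit & 2 else '-')
        chars.append('x' if digit & 1 else '-')
    return ''.join(chars)

def apply_umask(requested_mode, umask):
    """Clear every bit set in umask, keeping only the nine permission bits."""
    return requested_mode & ~umask & 0o777

print(rwx_string(0o664))                       # rw-rw-r--
print(oct(apply_umask(0o666, 0o022)))          # 0o644
print(rwx_string(apply_umask(0o666, 0o022)))   # rw-r--r--
```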
Non-Unix-like platforms handle file
and directory permissions in very different ways. However, the
functions in Python's standard library that deal
with permissions accept a mode argument
according to the Unix-like approach described in the previous
paragraph. The implementation on each platform maps the nine
permission bits in a way appropriate for the given platform. For
example, on versions of Windows that distinguish only between
read-only and read-write files and do not distinguish file ownership,
a file's permission bits show up as either
0666 (read-write) or 0444
(read-only). On such a platform, when a file is created, the
implementation looks only at bit 0200, making the
file read-write if that bit is 1 or read-only if
that bit is 0.
10.2.3 File and Directory Functions of the os Module
The os module
supplies several functions to query and set file and directory
status.
access(path,mode)
Returns True if file
path has all of the permissions encoded in
integer mode, otherwise
False. mode can be
os.F_OK to test for file existence, or one or more
of os.R_OK, os.W_OK, and
os.X_OK (with the bitwise-OR operator
| joining them if more than one) to test
permissions to read, write, and execute the file.
access does not use the standard interpretation
for its mode argument, covered in Section 10.2.2 earlier in this chapter.
access tests only if this specific
process's real user and group identifiers have the
requested permissions on the file. If you need to study a
file's permission bits in more detail, see function
stat in this section.
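A minimal sketch of combining the access flags, assuming Python 3. The function access_summary is hypothetical, and tempfile.mkstemp (not covered in this section) is used only to create a sample file.

```python
import os
import tempfile

def access_summary(path):
    """Return 'missing', or an 'rwx'-style string of what this process may do."""
    if not os.access(path, os.F_OK):
        return 'missing'
    summary = ''
    for flag, letter in ((os.R_OK, 'r'), (os.W_OK, 'w'), (os.X_OK, 'x')):
        summary += letter if os.access(path, flag) else '-'
    return summary

fd, name = tempfile.mkstemp()   # a fresh file owned by this process
os.close(fd)
print(access_summary(name))     # result depends on platform and user
os.remove(name)
print(access_summary(name))     # missing
```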
chdir(path)
Sets the current working directory to path.
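chmod(path,mode)
Changes the permissions of file path, as
encoded in integer mode.
mode uses the standard interpretation of permission
bits covered in Section 10.2.2 earlier in this chapter (for
example, 0644 for a file that its owner can read and write and
that anybody else can only read). On Unix-like platforms,
mode can also include other special bits, such as the
set-user-id flag; on other platforms, the implementation maps
the bits as that section describes. Do not pass os.R_OK,
os.W_OK, or os.X_OK as mode: those
constants are test flags for function access, not
permission bits.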
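A minimal sketch of clearing write permission with chmod, assuming Python 3. make_read_only and WRITE_BITS are hypothetical names; the stat module constants S_IWUSR, S_IWGRP, and S_IWOTH name the owner, group, and other write bits, and stat.S_IMODE is covered in Section 10.2.5. tempfile.mkstemp only creates a sample file.

```python
import os
import stat
import tempfile

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH   # 0o200 | 0o020 | 0o002

def make_read_only(path):
    """Clear every write permission bit of path, keeping all other permission bits."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current & ~WRITE_BITS)

fd, name = tempfile.mkstemp()
os.close(fd)
os.chmod(name, 0o644)        # owner read-write, everybody else read-only
make_read_only(name)
print(oct(stat.S_IMODE(os.stat(name).st_mode)))   # 0o444
os.chmod(name, 0o644)        # restore write permission so removal works everywhere
os.remove(name)
```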
getcwd()
Returns the path of the current working directory.
listdir(path)
Returns a list whose items are the names of all files and
subdirectories found in directory path.
The returned list is in arbitrary order, and does not include the
special directory names '.' and
'..'.
The dircache module also supplies a function named
listdir, which works like
os.listdir, with two enhancements. First,
dircache.listdir returns a sorted list. Further,
dircache caches the list it returns, so repeated
requests for lists of the same directory are faster if the
directory's contents have not changed in the
meantime. dircache automatically detects changes,
so the list that dircache.listdir returns is
always up to date.
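The dircache module is not available in Python 3. A minimal sketch of a sorted view built from os.listdir alone, assuming Python 3 (sorted_entries is a hypothetical name; tempfile.mkdtemp only creates a sample directory):

```python
import os
import tempfile

def sorted_entries(path):
    """Return (name, is_directory) pairs for path's entries, sorted by name."""
    entries = []
    for name in sorted(os.listdir(path)):     # listdir's own order is arbitrary
        full_path = os.path.join(path, name)
        entries.append((name, os.path.isdir(full_path)))
    return entries

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'beta'))
open(os.path.join(root, 'alpha.txt'), 'w').close()
print(sorted_entries(root))   # [('alpha.txt', False), ('beta', True)]

# Clean up the sample directory
os.remove(os.path.join(root, 'alpha.txt'))
os.rmdir(os.path.join(root, 'beta'))
os.rmdir(root)
```

Unlike dircache, this sketch does no caching: every call asks the operating system again.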
makedirs(path,mode=0777)
mkdir(path,mode=0777)
makedirs creates all directories that are part of
path and do not yet exist.
mkdir creates only the rightmost directory of
path. Both functions use
mode as permission bits of directories
they create. Both functions raise OSError if
creation fails or if a file or directory named
path already exists.
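Since both functions raise OSError when path already exists, a common pattern is to tolerate only that case. A minimal sketch, assuming Python 3 (ensure_dir is a hypothetical name; tempfile.mkdtemp only provides a scratch directory):

```python
import errno
import os
import tempfile

def ensure_dir(path, mode=0o777):
    """Create path and any missing parents; succeed quietly if it is already a directory."""
    try:
        os.makedirs(path, mode)
    except OSError as e:
        # Re-raise unless the only problem is that the directory already exists
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise

base = tempfile.mkdtemp()
target = os.path.join(base, 'a', 'b', 'c')
ensure_dir(target)
ensure_dir(target)              # second call: no exception
print(os.path.isdir(target))    # True
```

A regular file already named path still raises, since isdir is false for it.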
remove(path)
unlink(path)
Removes the file named path (see
rmdir later in this section to remove a
directory). unlink is a synonym of
remove.
removedirs(path)
Loops from right to left over the directories that are part of
path, removing each one. The loop ends
when a removal attempt raises an exception, generally because a
directory is not empty. removedirs does not
propagate the exception as long as it has removed at least one
directory.
rename(source,dest)
Renames the file or directory named source
to dest.
renames(source,dest)
Like rename, except that
renames attempts to create all intermediate
directories needed for dest. After the
renaming, renames tries to remove empty
directories from path source using
removedirs. It does not propagate any resulting
exception, since it's not an error if the starting
directory of source does not become empty
after the renaming.
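A minimal sketch of renames creating and pruning directories, assuming Python 3; the directory and file names are arbitrary, and tempfile.mkdtemp only provides a scratch area.

```python
import os
import tempfile

root = tempfile.mkdtemp()
old_dir = os.path.join(root, 'incoming', 'batch1')
os.makedirs(old_dir)
old_path = os.path.join(old_dir, 'report.txt')
open(old_path, 'w').close()

new_path = os.path.join(root, 'archive', 'done', 'report.txt')
# renames creates archive/done, moves the file, then removes the
# now-empty batch1 and incoming directories via removedirs
os.renames(old_path, new_path)

print(os.path.exists(new_path))                         # True
print(os.path.exists(os.path.join(root, 'incoming')))   # False
```

Pruning stops at root, which still contains archive, so root itself survives.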
rmdir(path)
Removes the directory named path (raises
OSError if it is not empty).
stat(path)
Returns a value x that is a tuple of 10
integers that provide information about a file or subdirectory
path. See Section 10.2.5 later in this chapter for
details about using the returned tuple. In Python 2.2 and later,
x is of type
stat_result. You can still use
x as a tuple, but you can also access
x's items as read-only
attributes x.st_mode,
x.st_ino, and so on,
using as attribute names the lowercase versions of the names of
constants listed later in Table 10-1.
A module named statcache also supplies a function
named stat, like os.stat but
with an enhancement: the returned tuple (or
stat_result instance) is cached, so repeated
requests about the same file run faster. statcache
cannot detect changes automatically, so you should use it only for
stable files that do not change in the time between
stat requests.
tempnam(dir=None,prefix=None)
tmpnam()
Returns an absolute path usable as the name of a new temporary file.
If dir is None, the
path uses the directory normally used for temporary files on the
current platform; otherwise the path uses
dir. If prefix
is not None, it should be a short string to be
prefixed to the temporary file's name.
tempnam never returns the name of any already
existing file. Your program must create the temporary file, use the
file, and remove the file when done, as in the following snippet:
import os
def work_on_temporary_file(workfun):
    nam = os.tempnam()
    fil = open(nam, 'w+')    # create the file, open for writing and reading
    try:
        workfun(fil)
    finally:
        fil.close()
        os.remove(nam)
tmpnam is a synonym for
tempnam. However, tmpnam does
not accept arguments, and always behaves like
tempnam(None,None). tempnam and
tmpnam are potential weaknesses in your
program's security, and recent versions of Python
emit a warning the first time your program calls these functions to
alert you to this fact. See Chapter 17 for
information about ways in which your program can interact with
warnings.
utime(path,times)
Sets the accessed and modified times of file or directory
path. If times
is None, utime uses the current
time. Otherwise, times must be a pair of
numbers (in seconds since the epoch, as covered in Chapter 12) in the order
(accessed,
modified).
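A minimal sketch of setting and reading back file times, assuming Python 3; the two instants are arbitrary, and tempfile.mkstemp only creates a sample file.

```python
import os
import tempfile

fd, name = tempfile.mkstemp()
os.close(fd)

accessed, modified = 1000000000, 1100000000   # arbitrary instants, seconds since the epoch
os.utime(name, (accessed, modified))          # note the (accessed, modified) order
info = os.stat(name)
print(info.st_mtime == modified)              # True

os.utime(name, None)                          # both times become the current time
print(os.stat(name).st_mtime > modified)      # True
os.remove(name)
```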
10.2.4 The os.path Module
The os.path module
supplies functions to analyze and transform path strings.
abspath(path)
Returns a normalized absolute path equivalent to
path, just like:
os.path.normpath(os.path.join(os.getcwd(),path))
For example, os.path.abspath(os.curdir) always
returns the same string as os.getcwd().
basename(path)
Returns the base name part of path, just
like
os.path.split(path)[1].
For example, os.path.basename('b/c/d.e') returns
'd.e'.
commonprefix(list)
Accepts a list of strings and returns the longest string that is a
prefix of all items in the list. Unlike other functions in
os.path, commonprefix works on
arbitrary strings, not just on paths.
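Because commonprefix compares character by character, its result need not end at a path separator. A minimal sketch, assuming Python 3, with a hypothetical helper common_components that compares whole components instead (it assumes '/'-separated paths):

```python
import os

paths = ['/usr/lib/python', '/usr/local/bin']
print(os.path.commonprefix(paths))   # /usr/l

def common_components(paths, sep='/'):
    """Hypothetical helper: longest prefix made of whole sep-separated components."""
    split_paths = [p.split(sep) for p in paths]
    common = []
    for parts in zip(*split_paths):
        if all(part == parts[0] for part in parts):
            common.append(parts[0])
        else:
            break
    return sep.join(common)

print(common_components(paths))      # /usr
```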
dirname(path)
Returns the directory part of path, just
like
os.path.split(path)[0].
For example, os.path.dirname('b/c/d.e') returns
'b/c'.
exists(path)
Returns True when path
names an existing file or directory, otherwise
False. In other words,
os.path.exists(x)
always returns the same result as
os.access(x,os.F_OK).
expandvars(path)
Returns a copy of string path, replacing
each substring of the form $name or ${name}
with the value of environment variable
name. A reference to a name that does not
exist in the environment is left unchanged.
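A minimal sketch of both substitution forms, assuming Python 3; DEMO_ROOT is a hypothetical variable set only for this example. The braces let a name be followed directly by other characters.

```python
import os

os.environ['DEMO_ROOT'] = '/opt/demo'            # hypothetical variable, set just for this example
print(os.path.expandvars('$DEMO_ROOT/bin'))      # /opt/demo/bin
print(os.path.expandvars('${DEMO_ROOT}lib'))     # /opt/demolib
```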
getatime(path)
getmtime(path)
getsize(path)
Each of these functions returns an attribute from the result of
os.stat(path),
respectively the attributes st_atime,
st_mtime, and st_size. See
Section 10.2.5 later in this chapter
for more information about these attributes.
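A minimal sketch that sums getsize over the regular files directly inside a directory, assuming Python 3 (total_file_bytes is a hypothetical name; tempfile.mkdtemp only creates sample data). Files are written in binary mode so their sizes are exact on every platform.

```python
import os
import tempfile

def total_file_bytes(dirpath):
    """Sum os.path.getsize over the regular files directly inside dirpath."""
    total = 0
    for name in os.listdir(dirpath):
        full_path = os.path.join(dirpath, name)
        if os.path.isfile(full_path):          # skip subdirectories
            total += os.path.getsize(full_path)
    return total

d = tempfile.mkdtemp()
with open(os.path.join(d, 'a.bin'), 'wb') as f:
    f.write(b'abc')                            # 3 bytes
with open(os.path.join(d, 'b.bin'), 'wb') as f:
    f.write(b'defgh')                          # 5 bytes
os.mkdir(os.path.join(d, 'sub'))
print(total_file_bytes(d))                     # 8
```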
isabs(path)
Returns True when path
is absolute. A path is absolute when it starts with a slash
/, or, on some non-Unix-like platforms, with a
drive designator followed by os.sep. When
path is not absolute,
isabs returns False.
isfile(path)
Returns True when path
names an existing regular file (in Unix, however,
isfile also follows symbolic links), otherwise
False.
isdir(path)
Returns True when path
names an existing directory (in Unix, however,
isdir also follows symbolic links), otherwise
False.
islink(path)
Returns True when path
names a symbolic link. Otherwise (always, on platforms that
don't support symbolic links)
islink returns False.
ismount(path)
Returns True when path
names a mount point. Otherwise (always, on platforms that
don't support mount points)
ismount returns False.
join(path,*paths)
Returns a string that joins the argument strings with the appropriate
path separator for the current platform. For example, on Unix,
exactly one slash character / separates adjacent
path components. If any argument is an absolute path,
join ignores all previous components. For example:
print os.path.join('a/b', 'c/d', 'e/f')
# on Unix prints: a/b/c/d/e/f
print os.path.join('a/b', '/c/d', 'e/f')
# on Unix prints: /c/d/e/f
The second call to os.path.join ignores its first
argument 'a/b', since its second argument
'/c/d' is an absolute path.
normcase(path)
Returns a copy of path with case
normalized for the current platform. On case-sensitive filesystems
(as typical in Unix), path is returned
unchanged. On case-insensitive filesystems, all letters in the
returned string are lowercase. On Windows,
normcase also converts each /
to a \.
normpath(path)
Returns a normalized pathname equivalent to
path, removing redundant separators and
path-navigation aspects. For example, on Unix,
normpath returns 'a/b' when
path is any of 'a//b',
'a/./b', or 'a/c/../b'.
normpath converts path separators as appropriate
for the current platform. For example, on Windows, the returned
string uses \ as the separator.
split(path)
Returns a pair of strings
(dir,base)
such that
join(dir,base)
equals path.
base is the last pathname component and
never contains a path separator. If path
ends in a separator, base is
''. dir is the leading
part of path, up to the last path
separator, shorn of trailing separators. For example,
os.path.split('a/b/c/d') returns the pair
('a/b/c','d').
splitdrive(path)
Returns a pair of strings
(drv,pth)
such that
drv+pth
equals path.
drv is either a drive specification or
''. drv is always
'' on platforms that do not support drive
specifications, such as Unix. For example, on Windows,
os.path.splitdrive('c:d/e') returns the pair
('c:','d/e').
splitext(path)
Returns a pair of strings
(root,ext)
such that
root+ext
equals path.
ext either is '', or
starts with a '.' and has no other
'.' or path separator. For example,
os.path.splitext('a/b.c') returns the pair
('a/b','.c').
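A minimal sketch combining split, splitext, and normpath, assuming Python 3. replace_extension is a hypothetical helper; paths are built with join so the comparisons hold on any platform.

```python
import os

def replace_extension(path, new_ext):
    """Hypothetical helper: swap the extension found by splitext for new_ext."""
    root, ext = os.path.splitext(path)
    return root + new_ext

path = os.path.join('data', 'raw', 'sample.csv')
head, tail = os.path.split(path)
print(head == os.path.join('data', 'raw'))    # True
print(tail)                                   # sample.csv
print(os.path.splitext(tail))                 # ('sample', '.csv')
print(replace_extension(path, '.json') == os.path.join('data', 'raw', 'sample.json'))   # True
# normpath removes the 'tmp/..' detour
print(os.path.normpath(os.path.join('data', 'tmp', '..', 'raw')) == os.path.join('data', 'raw'))   # True
```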
walk(path,func,arg)
Calls
func(arg,dirpath,namelist)
for each directory in the tree whose root is directory
path, starting with
path itself. In each such call to
func, dirpath
is the path of the directory being visited, and
namelist is the list of
dirpath's contents as
returned by os.listdir.
func may modify
namelist in-place (e.g., with
del) to avoid visiting certain parts of the tree:
walk further calls func
only for subdirectories remaining in
namelist after
func returns, if any.
arg is provided only for
func's convenience:
walk just receives arg,
and passes arg back to
func each time walk
calls func. A typical use of
os.path.walk is to print all files and
subdirectories in a tree:
import os
def print_tree(tree_root_dir):
    def printall(junk, dirpath, namelist):
        for name in namelist:
            print os.path.join(dirpath, name)
    os.path.walk(tree_root_dir, printall, None)
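os.path.walk does not exist in Python 3. To experiment with the callback protocol described above, including pruning namelist in place, here is a minimal sketch of a hypothetical stand-in, walk_tree, built only from os.listdir and os.path functions. It is not the library's implementation; it assumes top-down visiting and does not descend into symbolic links. tempfile.mkdtemp only creates a sample tree.

```python
import os
import tempfile

def walk_tree(path, func, arg):
    """Hypothetical stand-in for os.path.walk(path, func, arg)."""
    namelist = os.listdir(path)
    func(arg, path, namelist)          # func may delete items from namelist in place
    for name in namelist:              # only the names that survived are visited
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            walk_tree(full_path, func, arg)

def collect(found, dirpath, namelist):
    # Prune any subdirectory named 'skip', then record what remains
    if 'skip' in namelist:
        namelist.remove('skip')
    for name in namelist:
        found.append(os.path.join(dirpath, name))

root = tempfile.mkdtemp()
for sub in ('keep', 'skip'):
    os.mkdir(os.path.join(root, sub))
    open(os.path.join(root, sub, 'f.txt'), 'w').close()

found = []
walk_tree(root, collect, found)
print(sorted(os.path.relpath(p, root) for p in found))
```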
10.2.5 The stat Module
Accessing
items in the tuple returned by os.stat by their
numeric indices is not advisable. The order of the
tuple's 10 items is guaranteed, but using numeric
literals to index into the tuple is not readable. The
stat module supplies attributes whose values are
indices into the tuple returned by os.stat. Table 10-1 lists the attributes of module
stat and the meaning of corresponding items.
Table 10-1. Items of a stat tuple
Index | Attribute | Meaning
0 | ST_MODE | Protection and other mode bits
1 | ST_INO | Inode number
2 | ST_DEV | Device ID
3 | ST_NLINK | Number of hard links
4 | ST_UID | User ID of owner
5 | ST_GID | Group ID of owner
6 | ST_SIZE | Size in bytes
7 | ST_ATIME | Time of last access
8 | ST_MTIME | Time of last modification
9 | ST_CTIME | Time of last status change
In Python 2.2 and later, os.stat returns an instance of type
stat_result, whose 10 items are also accessible as
attributes named st_mode,
st_ino, and so on—the lowercase versions of
the stat attributes listed in Table 10-1.
For example, to print the size in bytes of file
path, you can use any of:
import os, stat
print os.path.getsize(path)
print os.stat(path)[6]
print os.stat(path)[stat.ST_SIZE]
print os.stat(path).st_size # only in Python 2.2 and later
Time values are in seconds since the epoch, as covered in Chapter 12 (int on most platforms,
float on the Macintosh). Platforms unable to give
a meaningful value for an item use a dummy value for that item.
Module stat also supplies functions that examine
the ST_MODE item to determine the kind of file.
os.path also supplies functions for such tasks,
which operate directly on the file's
path. The functions supplied by
stat are faster when performing several tests on
the same file: they require only one os.stat call
at the start of a series of tests, while the functions in
os.path ask the operating system for the
information at each test. Each function returns
True if mode denotes a
file of the given kind, otherwise False.
- S_ISDIR(mode): Is the file a directory
- S_ISCHR(mode): Is the file a special device-file of the character kind
- S_ISBLK(mode): Is the file a special device-file of the block kind
- S_ISREG(mode): Is the file a normal file (not a directory, special device-file, and so on)
- S_ISFIFO(mode): Is the file a FIFO (i.e., a named pipe)
- S_ISLNK(mode): Is the file a symbolic link
- S_ISSOCK(mode): Is the file a Unix-domain socket
Except for stat.S_ISDIR and
stat.S_ISREG, the other functions are meaningful
only on Unix-like systems, since most other platforms do not keep
special files such as devices in the same namespace as regular files.
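A minimal sketch of classifying a file with a single os.stat call, assuming Python 3 (file_kind is a hypothetical name; tempfile.mkdtemp only creates samples). Because os.stat follows symbolic links, this sketch never reports a link, so it does not test S_ISLNK.

```python
import os
import stat
import tempfile

def file_kind(path):
    """Classify path using one os.stat call (symbolic links are followed)."""
    mode = os.stat(path).st_mode
    if stat.S_ISDIR(mode):
        return 'directory'
    if stat.S_ISREG(mode):
        return 'regular file'
    if stat.S_ISFIFO(mode):
        return 'FIFO'
    if stat.S_ISCHR(mode):
        return 'character device'
    if stat.S_ISBLK(mode):
        return 'block device'
    if stat.S_ISSOCK(mode):
        return 'socket'
    return 'other'

d = tempfile.mkdtemp()
f = os.path.join(d, 'note.txt')
open(f, 'w').close()
print(file_kind(d))   # directory
print(file_kind(f))   # regular file
```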
Module stat supplies two more functions that
extract relevant parts of a file's
mode
(x[ST_MODE], or
x.st_mode, in the
result x of function
os.stat).
S_IFMT(mode)
Returns those bits of mode that describe
the kind of file (i.e., those bits that are examined by functions
S_ISDIR, S_ISREG, etc.).
S_IMODE(mode)
Returns those bits of mode that can be set
by function os.chmod (i.e., the permission bits
and, on Unix-like platforms, other special bits such as the
set-user-id flag).
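A minimal sketch separating the two parts of a mode, assuming Python 3. stat.S_IFREG is the module's constant for the regular-file kind; tempfile.mkstemp only creates a sample file.

```python
import os
import stat
import tempfile

fd, name = tempfile.mkstemp()
os.close(fd)
os.chmod(name, 0o640)

mode = os.stat(name).st_mode
kind_bits = stat.S_IFMT(mode)     # the bits that say what kind of file this is
perm_bits = stat.S_IMODE(mode)    # the bits that os.chmod can set
print(kind_bits == stat.S_IFREG)  # True: a regular file
print(oct(perm_bits))             # 0o640 on Unix-like platforms
os.remove(name)
```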
10.2.6 The filecmp Module
The filecmp module
supplies functionality to compare files and directories.
cmp(f1,f2,shallow=True,use_statcache=False)
Compares the files named by path strings
f1 and f2. If
the files seem equal, cmp returns
True, otherwise False. If
shallow is true, files are deemed equal if
their stat tuples are equal. If
shallow is false, cmp
reads and compares files with equal stat tuples.
If use_statcache is false,
cmp obtains file information via
os.stat; if
use_statcache is true,
cmp calls statcache.stat
instead. cmp remembers what files have already
been compared and does not repeat comparisons unless some file has
changed, but use_statcache makes
cmp believe that no file ever changes.
cmpfiles(dir1,dir2,common,shallow=True,use_statcache=False)
Loops on sequence common. Each item of
common is a string naming a file present
in both directories dir1 and
dir2. cmpfiles returns
a tuple with three lists of strings:
(equal,diff,errs).
equal is the list of names of files equal
in both directories, diff the list of
names of files that differ between directories, and
errs the list of names of files that could
not be compared (not existing in both directories or no permission to
read them). Arguments shallow and
use_statcache are just as for function
cmp.
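A minimal sketch of cmp and cmpfiles, assuming Python 3. It passes only shallow, since current Python versions no longer accept use_statcache. write_file is a hypothetical helper, and tempfile.mkdtemp only creates the two sample directories.

```python
import filecmp
import os
import tempfile

def write_file(dirpath, name, data):
    with open(os.path.join(dirpath, name), 'wb') as f:
        f.write(data)

left, right = tempfile.mkdtemp(), tempfile.mkdtemp()
write_file(left, 'same.txt', b'identical\n')
write_file(right, 'same.txt', b'identical\n')
write_file(left, 'diff.txt', b'version one\n')
write_file(right, 'diff.txt', b'version two\n')   # same size, different bytes
write_file(left, 'only_left.txt', b'x')           # missing from right

# shallow=False forces a comparison of the files' contents
print(filecmp.cmp(os.path.join(left, 'same.txt'),
                  os.path.join(right, 'same.txt'), shallow=False))   # True
equal, diff, errs = filecmp.cmpfiles(
    left, right, ['same.txt', 'diff.txt', 'only_left.txt'], shallow=False)
print(equal, diff, errs)   # ['same.txt'] ['diff.txt'] ['only_left.txt']
```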
class dircmp(dir1,dir2,ignore=('RCS','CVS','tags'),hide=('.','..'))
Creates a new directory-comparison instance object, comparing
directories named dir1 and
dir2, ignoring names listed in
ignore, and hiding names listed in
hide. A dircmp instance
d exposes three methods:
- d.report(): Outputs to sys.stdout a comparison between dir1 and dir2
- d.report_partial_closure(): Outputs to sys.stdout a comparison between dir1 and dir2 and their common immediate subdirectories
- d.report_full_closure(): Outputs to sys.stdout a comparison between dir1 and dir2 and their common subdirectories, recursively
A dircmp instance d
supplies several attributes, computed just in time (i.e., only if and
when needed, thanks to a __getattr__ special
method) so that using a dircmp instance suffers no
unnecessary overhead. d's
attributes are:
- d.common: Files and subdirectories that are in both dir1 and dir2
- d.common_dirs: Subdirectories that are in both dir1 and dir2
- d.common_files: Files that are in both dir1 and dir2
- d.common_funny: Names that are in both dir1 and dir2 for which os.stat reports an error or returns different kinds for the versions in the two directories
- d.diff_files: Files that are in both dir1 and dir2 but with different contents
- d.funny_files: Files that are in both dir1 and dir2 but could not be compared
- d.left_list: Files and subdirectories that are in dir1
- d.left_only: Files and subdirectories that are in dir1 and not in dir2
- d.right_list: Files and subdirectories that are in dir2
- d.right_only: Files and subdirectories that are in dir2 and not in dir1
- d.same_files: Files that are in both dir1 and dir2 with the same contents
- d.subdirs: A dictionary whose keys are the strings in common_dirs; the corresponding values are instances of dircmp for each subdirectory
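A minimal sketch reading a few dircmp attributes, assuming Python 3 with default ignore and hide arguments. write_file is a hypothetical helper, and tempfile.mkdtemp only creates the sample directories.

```python
import filecmp
import os
import tempfile

def write_file(dirpath, name, data):
    with open(os.path.join(dirpath, name), 'wb') as f:
        f.write(data)

a, b = tempfile.mkdtemp(), tempfile.mkdtemp()
write_file(a, 'common.txt', b'one')
write_file(b, 'common.txt', b'three')     # different size, so different contents
write_file(a, 'old.txt', b'x')
write_file(b, 'new.txt', b'y')
os.mkdir(os.path.join(a, 'sub'))
os.mkdir(os.path.join(b, 'sub'))

d = filecmp.dircmp(a, b)
print(d.left_only)       # ['old.txt']
print(d.right_only)      # ['new.txt']
print(d.diff_files)      # ['common.txt']
print(d.common_dirs)     # ['sub']
print(list(d.subdirs))   # ['sub']
```

Each attribute is computed only when first accessed.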
10.2.7 The shutil Module
The
shutil module (an abbreviation for shell
utilities) supplies functions to copy files and to remove
an entire directory tree.
copy(src,dst)
Copies the contents of file src, creating
or overwriting file dst. If
dst is a directory, the target is a file
with the same base name as src in
directory dst. copy
also copies permission bits, but not last-access and modification
times.
copy2(src,dst)
Like copy, but also copies times of last access
and modification.
copyfile(src,dst)
Copies the contents only of file src,
creating or overwriting file dst.
copyfileobj(fsrc,fdst,bufsize=16384)
Copies file object fsrc, which must be
open for reading, to file object fdst,
which must be open for writing. Copies no more than
bufsize bytes at a time if
bufsize is greater than
0. File objects are covered later in this chapter.
copymode(src,dst)
Copies permission bits of file or directory
src to file or directory
dst. Both src
and dst must exist. Does not modify
dst's contents, nor any
other aspect of file or directory status.
copystat(src,dst)
Copies permission bits and times of last access and modification of
file or directory src to file or directory
dst. Both src
and dst must exist. Does not modify
dst's contents, nor any
other aspect of file or directory status.
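A minimal sketch of the difference between copy and copy2, assuming Python 3: the source gets an arbitrary past modification time, which only copy2 carries over. tempfile.mkdtemp only provides a scratch directory.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src.txt')
with open(src, 'w') as f:
    f.write('payload\n')
fixed_time = 1000000000          # arbitrary past instant, in seconds since the epoch
os.utime(src, (fixed_time, fixed_time))

plain_copy = os.path.join(workdir, 'plain.txt')
full_copy = os.path.join(workdir, 'full.txt')
shutil.copy(src, plain_copy)     # contents and permission bits
shutil.copy2(src, full_copy)     # contents, permission bits, and access/modification times

print(os.path.getmtime(full_copy) == fixed_time)    # True
print(os.path.getmtime(plain_copy) == fixed_time)   # False: the new file gets the current time
```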
copytree(src,dst,symlinks=False)
Copies the whole directory tree rooted at
src into the destination directory named
by dst. dst
must not already exist, as copytree creates it.
copytree copies each file by using function
copy2. When symlinks is
true, copytree creates symbolic links in the new
tree when it finds symbolic links in the source tree. When
symlinks is false,
copytree follows each symbolic link it finds, and
copies the linked-to file with the link's name. On
platforms that do not have the concept of a symbolic link, such as
Windows, copytree ignores argument
symlinks.
rmtree(path,ignore_errors=False,onerror=None)
Removes the directory tree rooted at path.
When ignore_errors is true,
rmtree ignores errors. When
ignore_errors is false and
onerror is None, any
error raises an exception. When onerror is
not None, it must be callable with parameters
func, path, and
excp. func is
the function raising an exception (os.remove or
os.rmdir), path the
path passed to func, and
excp the tuple of information that
sys.exc_info() returns. If
onerror raises any exception
x, rmtree terminates,
and exception x propagates.
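A minimal sketch of backing up a tree with copytree and then discarding the copy with rmtree, assuming Python 3 and default arguments; the directory names are arbitrary, and tempfile.mkdtemp only provides a scratch area.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, 'project')
os.makedirs(os.path.join(src, 'docs'))
with open(os.path.join(src, 'docs', 'readme.txt'), 'w') as f:
    f.write('hello\n')

backup = os.path.join(base, 'project-backup')   # must not exist yet: copytree creates it
shutil.copytree(src, backup)
print(os.path.isfile(os.path.join(backup, 'docs', 'readme.txt')))   # True

shutil.rmtree(backup)            # removes the copy and everything inside it
print(os.path.exists(backup))    # False
print(os.path.isdir(src))        # True: the original tree is untouched
```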
10.2.8 File Descriptor Operations
The os module supplies
functions to handle file
descriptors, integers that the operating system
uses as opaque handles to refer to open files. Python file objects,
covered in the next section, are almost invariably better for
input/output tasks, but sometimes working at file-descriptor level
lets you perform some operation more rapidly or elegantly. Note that
file objects and file descriptors are not interchangeable in any way.
You can get the file descriptor n of a
Python file object f by calling
n=f.fileno(). You can wrap a new Python file object
f around an open file descriptor
fd by calling
f=os.fdopen(fd).
On Unix-like and Windows platforms, some file descriptors are
preallocated when a process starts: 0 is the file
descriptor for the process's standard input,
1 for the process's standard
output, and 2 for the process's
standard error.
os provides the following functions for working
with file descriptors.
close(fd)
Closes file descriptor fd.
dup(fd)
Returns a file descriptor that duplicates file descriptor
fd.
dup2(fd,fd2)
Duplicates file descriptor fd to file
descriptor fd2. If file descriptor
fd2 is already open,
dup2 first closes fd2.
fdopen(fd,mode='r',bufsize=-1)
Returns a Python file object wrapping file descriptor
fd. mode and
bufsize have the same meaning as for
Python's built-in open, covered
in the next section.
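A minimal sketch of moving between descriptors and file objects, assuming Python 3: os.pipe supplies two raw descriptors, fdopen wraps each one, and fileno gives the wrapped descriptor back.

```python
import os

read_fd, write_fd = os.pipe()            # two raw file descriptors

writer = os.fdopen(write_fd, 'w')        # wrap the write end in a file object
same_fd = writer.fileno() == write_fd    # fileno() returns the wrapped descriptor
writer.write('spam\n')
writer.close()                           # closing the file object also closes write_fd

reader = os.fdopen(read_fd)              # mode defaults to 'r'
text = reader.read()                     # reads until the write end is closed
reader.close()
print(same_fd, repr(text))               # True 'spam\n'
```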
fstat(fd)
Returns a tuple x
(x is a stat_result
instance in Python 2.2 and later), with information about the file
open on file descriptor fd. Section 10.2.5 earlier in this chapter
covers the format of x's
contents.
lseek(fd,pos,how)
Sets the current position of file descriptor
fd to the signed integer byte offset
pos, and returns the resulting byte offset
from the start of the file. how indicates
the reference (point 0): when
how is 0, the reference
is the start of the file; when 1, the current
position; and when 2, the end of the file. In
particular,
lseek(fd,0,1)
returns the current position's byte offset from the
start of the file, without affecting the current position. Normal
disk files support such seeking operations, but calling
lseek on a file that does not support seeking
(e.g., a file open for output to a terminal) raises an exception.
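A minimal sketch of the three how values, assuming Python 3, where os.write takes bytes and os.read returns bytes. tempfile.mkstemp is used only because it returns an already open descriptor.

```python
import os
import tempfile

fd, name = tempfile.mkstemp()      # an open descriptor plus the file's path
os.write(fd, b'hello world')
current = os.lseek(fd, 0, 1)       # how=1: relative to the current position, so 11
os.lseek(fd, 6, 0)                 # how=0: absolute offset 6 from the start
chunk = os.read(fd, 5)             # b'world'
size = os.lseek(fd, 0, 2)          # how=2: relative to the end, so the file size, 11
os.close(fd)
os.remove(name)
print(current, chunk, size)        # 11 b'world' 11
```

Later Python versions also name these how values os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END.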
open(file,flags,mode=0777)
Returns a file descriptor, opening or creating a file named
file. If open creates
the file, it uses mode as the
file's permission bits.
flags is an int,
normally obtained by bitwise ORing one or more of the following
attributes of os:
- O_RDONLY, O_WRONLY, O_RDWR: Opens file for read-only, write-only, or read-write respectively (mutually exclusive: exactly one of these attributes must be in flags)
- O_NDELAY, O_NONBLOCK: Opens file in non-blocking (no-delay) mode, if the platform supports this
- O_APPEND: Appends any new data to file's previous contents
- O_DSYNC, O_RSYNC, O_SYNC: Sets synchronization mode accordingly, if the platform supports this
- O_NOCTTY: When file is a terminal device, prevents it from becoming the process's controlling terminal, if the platform supports this
- O_CREAT: Creates file, if file does not already exist
- O_EXCL: Raises an exception if file already exists (meaningful together with O_CREAT)
- O_TRUNC: Throws away previous contents of file (incompatible with O_RDONLY)
- O_BINARY: Opens file in binary rather than text mode on non-Unix platforms (on Unix and Unix-like platforms, where the two modes are identical, os may not supply this attribute at all)
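Combining O_CREAT with O_EXCL makes the existence check and the creation a single call, a common basis for lock files. A minimal sketch, assuming Python 3 and a local filesystem; create_exclusively is a hypothetical name, and getattr supplies 0 where os lacks O_BINARY.

```python
import os
import tempfile

def create_exclusively(path, data, mode=0o600):
    """Create path containing data; raise OSError if path already exists."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    flags |= getattr(os, 'O_BINARY', 0)    # O_BINARY exists only on some platforms
    fd = os.open(path, flags, mode)
    try:
        while data:
            written = os.write(fd, data)   # may write fewer bytes than requested
            data = data[written:]
    finally:
        os.close(fd)

target = os.path.join(tempfile.mkdtemp(), 'lockfile')
create_exclusively(target, b'owner data\n')
try:
    create_exclusively(target, b'second attempt\n')
except OSError:
    print('already exists, as expected')
```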
pipe()
Creates a pipe and returns a pair of file descriptors
(r,w)
open for reading and writing respectively.
read(fd,n)
Reads up to n bytes from file descriptor
fd and returns them as a string. Reads and
returns m bytes, where m<n, when only m more bytes are currently
available for reading from the file. In particular, returns the empty
string when no more bytes are currently available from the file,
typically because the file is ended.
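write(fd,str)
Writes bytes from string str to file
descriptor fd, and returns the number of
bytes actually written. This is normally len(str), but it
can be less (for example, on a non-blocking file descriptor),
so check the result when every byte must be written.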
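A minimal sketch tying pipe, write, and read together, assuming Python 3, where os.read returns bytes. pipe_roundtrip is a hypothetical name. It loops on write's return value in case of partial writes and reads until the empty result that signals end of file; because it is single-threaded and writes everything before reading, it assumes the payload fits in the pipe's buffer.

```python
import os

def pipe_roundtrip(payload):
    """Send payload (bytes) through an OS pipe and read it back."""
    read_fd, write_fd = os.pipe()
    try:
        try:
            remaining = payload
            while remaining:
                written = os.write(write_fd, remaining)   # may be less than len(remaining)
                remaining = remaining[written:]
        finally:
            os.close(write_fd)            # the reader now sees end of file after the data
        chunks = []
        while True:
            chunk = os.read(read_fd, 64)  # up to 64 bytes; b'' means no more data
            if not chunk:
                break
            chunks.append(chunk)
        return b''.join(chunks)
    finally:
        os.close(read_fd)

message = b'spam and eggs\n' * 10          # 140 bytes, well within a pipe's buffer
print(pipe_roundtrip(message) == message)  # True
```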