If you are already familiar with SQL or other RDBMS packages, you probably already have a solid understanding of many of the relational database concepts put forth in this chapter. However, each RDBMS handles tables differently at the system level. This section takes a closer look at tables as they are implemented in PostgreSQL.
PostgreSQL defines a series of system columns in all tables, which are normally invisible to the user (e.g., they will not be shown by queries unless explicitly requested). These columns contain meta-data about the content of the table's rows. Many of these contain data that can help to differentiate between tuples (an individual state of a row) when working with transaction blocks. (See Chapter 7 for more about transactions.)
As a result of these system-defined columns, in addition to the user-defined columns of a table, any inserted row will have values in each of the columns described in Table 3-25.
Table 3-25. System columns
Column | Description |
---|---|
oid (object identifier) | The unique object identifier of a row. PostgreSQL automatically adds this 4-byte number to all rows. It is never re-used within the same table. |
tableoid (table object identifier) | The oid of the table that contains a row. The name and oid of a table are related by the pg_class system table. |
xmin (transaction minimum) | The transaction identifier of the inserting transaction of a tuple. |
cmin (command minimum) | The command identifier, starting at 0, associated with the inserting transaction of a tuple. |
xmax (transaction maximum) | The transaction identifier of a tuple's deleting transaction. If a tuple is visible (has not been deleted) this is set to zero. |
cmax (command maximum) | The command identifier associated with the deleting transaction of a tuple. Like xmax, if a tuple is visible, this is set to zero. |
ctid (tuple identifier) | The identifier which describes the physical location of the tuple within the database. A pair of numbers are represented by the ctid: the block number, and tuple index within that block. |
As described in the Section called Understanding Tables," each database consists of tables, and each table consists of at least one named column. These tables may contain rows, but do not necessarily at any given time.
One table management concern can be how to distinguish between two rows whose column values are identical. A very useful PostgreSQL feature is that every row has its own object identifier number, or OID, which is unique within that table. In other words, no two rows within the same table will ever have the same OID. This means that even if a table were designed in such a way that two rows might be identical, there is still a programmatic way to discern between them: via the OID. This is demonstrated in Example 3-31.
Example 3-31. Differentiating rows via the OID
testdb=# SELECT * FROM my_list; todos ---------------------------------- Correct redundancies in my list. Correct redundancies in my list. (2 rows) testdb=# SELECT *,oid FROM my_list; todos | oid ----------------------------------+--------- Correct redundancies in my list. | 3391263 Correct redundancies in my list. | 3391264 (2 rows) testdb=# DELETE FROM my_list testdb-# WHERE oid = 3391264; DELETE 1 testdb=# SELECT *,oid FROM my_list; todos | oid ----------------------------------+--------- Correct redundancies in my list. | 3391263 (1 row)
Before you start creating any tables, we suggest that you take some extra time to plan out your intended database objects by deciding the names, types, and purposes of all columns within each table. This can help you to be consistent with table naming structures, which in turn helps you more easily read and construct "legible" queries and statements.
In addition to taking the somewhat semantic considerations just described (names, types, and purposes), it is important to be sure that each table's relationship to each other table is clearly defined. This can be an important point of table design, as you do not wish to redundantly represent large amounts of data, nor do you want to end up omitting important data from one table by misunderstanding the needs that must be satisfied by your implementation.
As an example, consider again the Book Town books table, from Table 3-1. This table holds an internal Book Town identification number for each book, the title, author identification number, and a subject identification number. Notice that rather than storing the name of the author, and rather than storing a text representation of the subject of the book, simple identification integers are stored. These identification numbers are used to create relationships to two other tables: the authors, and subjects tables, whose partial contents are shown in Table 3-26 and Table 3-27.
Table 3-26. The authors table
id | last_name | first_name |
---|---|---|
1809 | Geisel | Theodor Seuss |
1111 | Denham | Ariel |
15990 | Bourgeois | Paulette |
2031 | Brown | Margaret Wise |
25041 | Margery Williams | Bianco |
16 | Alcott | Louisa May |
115 | Poe | Edgar Allen |
Table 3-27. The subjects table
id | subject | location |
---|---|---|
0 | Arts | Creativity St |
2 | Children's Books | Kids Ct |
3 | Classics | Academic Rd |
4 | Computers | Productivity Ave |
6 | Drama | Main St |
9 | Horror | Black Raven Dr |
15 | Science Fiction | Main St |
By keeping the author and subject-specific data separate from the books table, the data is stored more efficiently. When multiple books need to be correlated with a particular subject, only the subject_id needs to be stored, rather than all of the data associated with that subject. This also makes for simpler maintenance of data associated with book subjects, such as the location in the store. Such data can be updated in a single, small table, rather than having to update all affected book records with such a modification. The same general principle applies to the authors table, and its relationship to the books table via the author_id.
Thoughtful planning can also help to avoid mistakes in choosing appropriate data types. For example, in the editions table, ISBN numbers are associated with Book Town book identification numbers. At first glance, it might seem that the ISBN number could be represented with a column of type integer. The design oversight in this case would be that not only can ISBNs sometimes contain character data, but a value of type integer would lose any leading zeroes in the ISBN (e.g., 0451160916 would become 451160916).
For all of these reasons, good table design is not an issue to be overlooked in database administration.