Extending PostgreSQL

Practical PostgreSQL
Prev	Chapter 7. Advanced Features	Next

Extending PostgreSQL

PostgreSQL users have the option of extending the set of functions and operators available. If you have a common SQL or programmatic routine, custom functions can be an effective way to more succinctly and efficiently accomplish your tasks. Likewise, custom operators can be created to call these functions (or existing built-in functions) in order to make more efficient and legible SQL statements.

Functions and operators each exist as database objects, and are thus tied to a specific database. Creating a function while connected to the booktown database, for example, creates a function object available only to users connected to booktown.

If you intend to re-use some general functions or operators in multiple databases, you should create them in the template1 database. This will clone the function and operator objects from template1 when a new database is created.

The following sections cover the creation, use, and removal of custom functions and operators.

Creating New Functions

PostgreSQL supports a variation of the SQL99 CREATE FUNCTION command. It is not directly compatible with the standard, but it does allow for a variety of means to extend PostgreSQL by creating your own customized functions (see Chapter 5 for more on functions in general).

Here is the syntax for CREATE FUNCTION:

  CREATE FUNCTION name ( [ argumenttype [, ...] ] )
                  RETURNS returntype
                  AS 'definition'
                  LANGUAGE 'languagename'
                  [ WITH ( attribute [, ...] ) ]

CREATE FUNCTION name ( [ argumenttype [, ...] ] )

name is the name of the new function to be created. The parenthetically grouped argumenttype expression defines the data types of the arguments that the function requires when called, separated by commas. Leaving this expression blank results in a function which accepts no arguments (though the parentheses are still required in both definition and usage).

RETURNS returntype

The returntype is the single data type of the value which is returned by the function.

AS 'definition'

definition is the programmatic definition of the function itself. For procedural languages, such as PL/pgSQL, this is the literal code used to define the function. For compiled C functions, this is the absolute system path which to links the file containing the object code.

LANGUAGE 'languagename'

languagename is the name of the language which the function is written in. The language may be any supported procedural language (such as plpgsql, or plperl, assuming it has been added to the database), C, or SQL.

[ WITH ( attribute [, ...] ) ]

As of PostgreSQL 7.1.x, two possible values exist for attribute ; iscachable, and isstrict:

iscachable: This attribute lets the optimizer know if it is acceptable to pre-evaluate a call to a function with arguments that have already been evaluated once. This can be useful for functions which are programmatically expensive, but not terribly dynamic (e.g., functions with which the same input arguments will invariably return the same results).
isstrict: Causes the function to always return a NULL value whenever any of its arguments are NULL values. The function is actually not executed in such a case, when isstrict is defined.

Note: Functions may be overloaded (i.e., share the same name as an existing function) by defining them as accepting different arguments. In this way you can have a single function name that can perform several operations, depending on the number and type of the input arguments.

Creating SQL functions

The simplest kind of function to add to PostgreSQL is a pure SQL function, as it requires no external programming knowledge or experience. A SQL function is merely defined as a standard SQL statement with support for inline arguments passed as positional parameters.

A positional parameter is a reference used in a SQL function definition to one of the calling arguments. It is called positional because it is referenced by the order in which the arguments are passed to the function. The syntax of a positional parameter is a dollar sign followed by a number (e.g., $1). The number represents the ordered position in the arguments passed to the function, starting with 1.

Example 7-46 creates a function named isbn_to_title, which returns the title of a book when passed the ISBN number of the book. It accepts a single argument of type text, and returns its result as the same type.

Example 7-46. Creating a SQL function

booktown=# CREATE FUNCTION isbn_to_title(text) RETURNS text
booktown-#                 AS 'SELECT title FROM books
booktown'#                                  JOIN editions AS e (isbn, id)
booktown'#                                  USING (id)
booktown'#                                  WHERE isbn = $1'
booktown-#                 LANGUAGE 'SQL';
CREATE

Notice the $1 in Example 7-46; when the select statement executes, the complete value of the first argument to isbn_to_title replaces this positional parameter. Notice that the positional parameter does not need to be bound by single quotes, as the quotes are part of the argument passed. Each other element of the function definition is either a standard SQL keyword or identifier.

The CREATE message indicates that the function was successfully created. Example 7-47 calls the isbn_to_title function with a single text parameter of 0929605942. The title, from the books table, returns that ISBN number as per the SQL defined in Example 7-46.

Example 7-47. Using a SQL function

booktown=# SELECT isbn_to_title('0929605942');
    isbn_to_title
---------------------
 The Tell-Tale Heart
(1 row)

Once created, any user may access the function, presuming that they have the permission to execute the SQL involved. For example, the isbn_to_title function requires read access to the editions and books tables (see Chapter 10 for more information on user privileges).

Creating C functions

PostgreSQL is written in C and can dynamically load compiled C code for use on the fly, without recompilation of the base software. Only superusers are allowed to use CREATE FUNCTION to link to a C function, as functions can make system-level calls and potentially provide a security hole.

Documenting the entire PostgreSQL API is outside the scope of this book, but for an experienced programmer, some basic C functions can very easily be developed, compiled and linked through loadable shared object code.

The GNU C Compiler, gcc, supports a flag called -shared, which creates a dynamically loadable piece of object code. The most basic syntax to create such a function with gcc is:

$ gcc -shared input.c -o output.so

In this syntax, input.c is the name of the file containing the C code to be compiled, and output.so is the shared object file to build.

Example 7-48 is an extremely simple pair of C functions. They define two C functions called is_zero(int) and is_zero_two(int, int). The first function returns true (1) if the passed argument to it is 0; otherwise, it returns false (0). The second function returns true if at least one of the passed arguments is 0.

Example 7-48. is_zero.c, a simple C function

/* is_zero.c
 * A pair of simple zero-checking functions.
 */

int is_zero(int);
int is_zero_two(int, int);

int is_zero(int incoming) {
  /* Return true only if the incoming value is 0. */
  if (incoming == 0) return 1;
  else return 0;
}

int is_zero_two(int left, int right) {
  /* Return true only if either of the values are 0. */
  if (left == 0 || right == 0) return 1;
  else return 0;
}

Warning

No PostgreSQL-specific headers are included in this extremely basic example. They are not required in this case because of the obvious parallels between the example C and SQL data types. For more advanced examples of the internal PostgreSQL API and data structures, check the contrib directory within the PostgreSQL source path.

Example 7-49 compiles the file is_zero.c, with the -shared flag, and outputs the shared object code to a file called is_zero.so. The location of that file is then passed as the definition of the function to the CREATE FUNCTION command, and the function type is defined as C.

Example 7-49. Creating a C function

[jworsley@cmd ~]$ gcc -shared is_zero.c -o is_zero.so
[jworsley@cmd ~]$ psql -U manager booktown
Welcome to psql, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

booktown=# CREATE FUNCTION is_zero(int4) RETURNS Boolean 
booktown-#                 AS '/home/jworsley/is_zero.so' LANGUAGE 'C';
CREATE

The CREATE FUNCTION command in Example 7-49 creates a function named is_zero(), which accepts a single argument of type int4 and returns a value of type boolean. This function references the C function is_zero(int) implemented in the object code located at /home/jworsley/is_zero.so (since C has no Boolean type, PostgreSQL must transform the integer value returned by the function to a Boolean value). In this case, 0 is translated to false, and 1 is translated to true.

By default, PostgreSQL looks for a function in the shared object code with the same name as the function being created within PostgreSQL. This works well for the is_zero(integer) function, as its names matches the compiled symbol name of the is_zero(int) function within the file is_zero.so. In order to avoid a C name-collision with is_zero(int), the second function in the shared object is defined as is_zero_two(int, int). To load this function into PostgreSQL with the same name (as an overloaded function, with two arguments instead of one), pass the literal C function name (also called the link symbol) as a second string constant following the location of the shared object filename.

This name should not contain parentheses or arguments, and should be separated from the filename definition by a comma, as in this syntax:

CREATE FUNCTION name ( [ argumenttype [, ...] ] )
                RETURNS returntype
                AS 'definition', 'link_symbol'
                LANGUAGE 'C'
                [ WITH ( attribute [, ...] ) ]

Example 7-50 loads the same shared object code, but specifies the function symbol name as is_zero_two so that it knows which function to use for this overloaded function.

Example 7-50. Overloading a C function

booktown=# CREATE FUNCTION is_zero(int4, int4) RETURNS Boolean
booktown-#                 AS '/home/jworsley/is_zero.so', 'is_zero_two'
booktown-#                 LANGUAGE 'C';
CREATE

Like a SQL function, any user may call the C function once it has been created. As C functions can make direct modifications to the filesystem (where permissions allow) and affect other system level events, care must be taken in designing functions free from potential misuse. Example 7-51 makes several calls to the is_zero function defined in Example 7-49, and to its overloaded function, created in Example 7-50.

Example 7-51. Using a C function

booktown=# SELECT is_zero(0) AS zero, is_zero(1) AS one,
booktown-#        is_zero(6, 0) AS one_zero, is_zero(11,12) AS neither;
 zero | one | one_zero | neither
------+-----+----------+---------
 t    | f   | t        | f
(1 row)

Destroying functions

Functions may be destroyed either by their owner or by a superuser with the DROP FUNCTION SQL command. Here is the syntax for DROP FUNCTION:

  DELETE FUNCTION name ( [ argumenttype [, ...] ] );

For example, Example 7-52 drops the isbn_to_title(text) function. Note that the argument types are required to be specified, even though the function itself is not overloaded.

Example 7-52. Dropping a function

booktown=# DROP FUNCTION isbn_to_title(text);
DROP

The DROP server message indicates that the function was successfully dropped. Like most DROP SQL commands, this action is permanent, so be sure that you wish to drop your function before you execute this command.

Creating New Operators

PostgreSQL allows the creation of custom operators in addition to custom functions. Operators are sometimes called syntactic sugar for functions. This is because, technically, an operator is just an alternate syntax for an existing function. For example, the addition operator (+) actually calls one of several built-in functions, including numeric_add(). For example:

booktown=# SELECT 1 + 2 AS by_operator, numeric_add(1,2) AS by_function;
 by_operator | by_function
-------------+-------------
           3 |           3
(1 row)

An operator definition defines what data types it operates on, and which side of the operator to expect a value of the given data type to be found on (left, right, or both). It also defines the function that is called, passing the values that are being operated on as arguments to that function.

Creating an operator

The CREATE OPERATOR SQL command creates a new operator. Here is the syntax for CREATE OPERATOR:

  CREATE OPERATOR name ( PROCEDURE = functionname
                  [, LEFTARG = type1 ]
                  [, RIGHTARG = type2 ]
                  [, COMMUTATOR = commutatorop ]
                  [, NEGATOR = negatorop ]
                  [, RESTRICT = restrictproc ]
                  [, JOIN = joinproc ]
                  [, HASHES ]
                  [, SORT1 = leftsortop ]
                  [, SORT2 = rightsortop ] )

In this syntax, name is the name of the new operator, and functionname is the name of the function to be called by the operator. The remaining clauses are all optional, though at least one of the LEFTARG or RIGHTARG clauses must be applied. Note that the operator name may only consist of the following accepted characters:

 + - * / < > = ~ ! @ # % ^ & | ` ? $

Note: See the reference entry on CREATE OPERATOR for more information on the remaining optional clauses, and further restrictions on the operator name.

Specifying only the LEFTARG data type creates an operator that operates only on a value (e.g., a constant or identifier) to its left. Conversely, specifying only the RIGHTARG data type creates an operator that operates only on a value to its right. Specifying both a LEFTARG and RIGHTARG type results in an operator that operates on a value to both the left and right.

The factorial operator (!) is an example of a built-in operator that affects values to its left, while the addition operator (+) is a good example of an operator that affects values both on the left and right of the operator. Note that the functionname must accept the appropriate number of arguments as implied by the use of the LEFTARG and RIGHTARG keywords (either one or two arguments). Furthermore, the function's accepted argument types should match the relevant operator types defined by CREATE OPERATOR for each respective value to be operated on.

Example 7-53 creates an operator named !#, which passes the value to its left to the is_zero() function (defined in Example 7-49). This means that the syntax of value !# will be effectively identical to using the functional syntax of is_zero(value).

Example 7-53. Creating a user-defined operator

booktown=# CREATE OPERATOR !# (PROCEDURE = is_zero,
booktown(#                    LEFTARG = integer);
CREATE

The CREATE message returned by Example 7-53 indicates that the operator was successfully created. As with functions, any user connected to the database will be able to use the new operator. The operator is owned by the user who creates it, meaning that no other user may remove it (unless they have superuser rights). Example 7-54 demonstrates the use of the new !# operator to check for books that are out of stock in Book Town's stock table.

Example 7-54. Using a user-defined operator

booktown=# SELECT * FROM stock WHERE stock !#;
    isbn    | cost  | retail | stock
------------+-------+--------+-------
 0394900014 | 23.00 |  23.95 |     0
 0451198492 | 36.00 |  46.95 |     0
 0451457994 | 17.00 |  22.95 |     0
(3 rows)

Overloading an operator

Operators may become overloaded in much the same way as functions. This means that an operator is created with the same name as an existing operator, but affects a different set of defined types. More than one operator may have the same name, although two operators may not share the same name if they accept the same argument definitions. As long as a function exists to accept the number and type of arguments implied by the type of operator defined, though, the operator may be overloaded.

Example 7-53 overloads the !# operator. The first CREATE OPERATOR statement creates a similar operator to the one created in Example 7-53. However, it specifies a RIGHTARG clause rather than a LEFTARG clause, resulting in a version of the operator with the same name which operates on an argument of type integer to the right of the operator, rather than the left. The second statement creates a third variant of the !# operator, which operates on both an argument to the left and right of the operator, simultaneously.

Example 7-55. Overloading a user-defined operator

booktown=# CREATE OPERATOR !# (PROCEDURE = is_zero,
booktown(#                     RIGHTARG = integer);
CREATE
booktown=# CREATE OPERATOR !# (PROCEDURE = is_zero,
booktown(#                     LEFTARG = integer,
booktown(#                     RIGHTARG = integer);
CREATE

Example 7-55 overloads the !# operator with the same is_zero() function because the function itself was overloaded in Example 7-50 in the Section called Creating C functions," earlier in this chapter. As there are two copies of the is_zero() function--one that accepts one argument, and one that accepts two--the !# operator can be safely overloaded to accept both a single argument (on either the left or right side), as well as to accept parameters from both sides.

Once overloaded, PostgreSQL's parser can correctly interpret each defined operator in your SQL statements. Example 7-56 demonstrates three uses of the same operator, with different left and right arguments. Each use is valid, as the operator was overloaded in Example 7-55.

Example 7-56. Using an overloaded operator

booktown=# SELECT isbn, stock FROM stock
booktown-#                    WHERE stock !#
booktown-#                    AND !# stock
booktown-#                    AND stock !# stock;
    isbn    | stock
------------+-------
 0394900014 |     0
 0451198492 |     0
 0451457994 |     0
(3 rows)

Dropping an operator

An operator can be dropped with the DROP OPERATOR command. An operator may only be dropped by the user who created it, or by a PostgreSQL superuser.

Warning

The DROP OPERATOR command applies to built-in operators as well as user-defined operators, so take care with this command if executing it as a superuser!

As operators are defined by their arguments as well as their name, DROP OPERATOR requires that you specify the left and right argument types of the operator. If there is no type for a given side, specify the NONE keyword.

Here is the syntax for DROP OPERATOR:

  DROP OPERATOR name ( { lefttype | NONE } , { righttype | NONE } )

Example 7-57 drops the variant of the !# operator that affects both left and right arguments.

Example 7-57. Dropping an operator

booktown=# DROP OPERATOR !# (integer, integer);
DROP

Example 7-58 uses very similar syntax to Example 7-57, but drops the overloaded operator that affects only arguments to the right of the operator.

Example 7-58. Dropping an overloaded operator

booktown=# DROP OPERATOR !# (NONE, integer);
DROP

Prev	Home	Next
Transactions and Cursors	Up	Administrating PostgreSQL