

Overview of the Pyrex Language 

This document informally describes the extensions to the Python language made by Pyrex. Some day there will be a reference manual covering everything in more detail.
 



Python functions vs. C functions

There are two kinds of function definition in Pyrex:

Python functions are defined using the def statement, as in Python. They take Python objects as parameters and return Python objects.

C functions are defined using the new cdef statement. They take either Python objects or C values as parameters, and can return either Python objects or C values.

Within a Pyrex module, Python functions and C functions can call each other freely, but only Python functions can be called from outside the module by interpreted Python code. So, any functions that you want to "export" from your Pyrex module must be declared as Python functions.

Parameters of either type of function can be declared to have C data types, using normal C declaration syntax. For example,

def spam(int i, char *s):
    ...
cdef int eggs(unsigned long l, float f):
    ...
When a parameter of a Python function is declared to have a C data type, it is passed in as a Python object and automatically converted to a C value, if possible. Automatic conversion is currently only possible for numeric types and string types; attempting to use any other type for the parameter of a Python function will result in a compile-time error.

C functions, on the other hand, can have parameters of any type, since they're passed in directly using a normal C function call.

Python objects as parameters and return values

If no type is specified for a parameter or return value, it is assumed to be a Python object. (Note that this is different from the C convention, where it would default to int.) For example, the following defines a C function that takes two Python objects as parameters and returns a Python object:
cdef spamobjs(x, y):
    ...
Reference counting for these objects is performed automatically according to the standard Python/C API rules (i.e. borrowed references are taken as parameters and a new reference is returned).

The name object can also be used to explicitly declare something as a Python object. This can be useful if the name being declared would otherwise be taken as the name of a type, for example,

cdef ftang(object int):
    ...
declares a parameter called int which is a Python object. You can also use object as the explicit return type of a function, e.g.
 
cdef object ftang(object int):
    ...
It is probably a good idea to always be explicit about object parameters in C functions, in the interests of clarity.

C variable and type definitions

The cdef statement is also used to declare C variables, either local or module-level:
cdef int i, j, k
cdef float f, g[42], *h
and C struct, union or enum types:
cdef struct Grail:
    int age
    float volume
cdef union Food:
    char *spam
    float *eggs
cdef enum CheeseType:
    cheddar, edam, 
    camembert
cdef enum CheeseState:
    hard = 1
    soft = 2
    runny = 3
There is currently no special syntax for defining a constant, but you can use an anonymous enum declaration for this purpose, for example,
cdef enum:
    tons_of_spam = 3
Note that the words struct, union and enum are used only when defining a type, not when referring to it. For example, to declare a variable pointing to a Grail you would write
cdef Grail *gp
and not
cdef struct Grail *gp # WRONG
There is also a ctypedef statement for giving names to types, e.g.
ctypedef unsigned long ULong
ctypedef int *IntPtr
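Pyrex compiles these declarations to C, so it can help to see roughly what they correspond to there. The following is a hand-written C sketch, not the code Pyrex actually generates. The helper functions grail_age and first_egg are hypothetical and exist only to exercise the types.

```c
/* From "cdef struct Grail": a tag-style struct */
struct Grail {
    int age;
    float volume;
};

/* From "cdef union Food" */
union Food {
    char *spam;
    float *eggs;
};

/* From "cdef enum CheeseType": values default to 0, 1, 2 */
enum CheeseType { cheddar, edam, camembert };

/* From "cdef enum CheeseState": explicit values */
enum CheeseState { hard = 1, soft = 2, runny = 3 };

/* From "cdef enum: tons_of_spam = 3": an anonymous enum used as a constant */
enum { tons_of_spam = 3 };

/* From "ctypedef unsigned long ULong" and "ctypedef int *IntPtr" */
typedef unsigned long ULong;
typedef int *IntPtr;

/* Hypothetical helper. In Pyrex the parameter would be written "Grail *gp",
   but in C the struct keyword is still needed because Grail is a tag name. */
int grail_age(struct Grail *gp) {
    return gp->age;
}

/* Hypothetical helper: reads the union through its eggs member. */
float first_egg(union Food *food) {
    return food->eggs[0];
}
```

The C version has to repeat "struct" at every use. Pyrex lets you drop it and writes it into the generated code for you.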

Scope rules

Pyrex determines whether a variable belongs to a local scope, the module scope, or the built-in scope completely statically. As with Python, assigning to a variable which is not otherwise declared implicitly declares it to be a Python variable residing in the scope where it is assigned. Unlike Python, however, a name which is referred to but not declared or assigned is assumed to reside in the built-in scope, not the module scope. Names added to the module dictionary at run time will not shadow such names.


Statements and expressions

Control structures and expressions follow Python syntax for the most part. When applied to Python objects, they have the same semantics as in Python (unless otherwise noted). Most of the Python operators can also be applied to C values, with the obvious semantics.

If Python objects and C values are mixed in an expression, conversions are performed automatically between Python objects and C numeric or string types.

Reference counts are maintained automatically for all Python objects, and all Python operations are automatically checked for errors, with appropriate action taken.

Differences between C and Pyrex expression syntax

Pyrex also includes some C operations which have no direct Python equivalent. Some of them are expressed differently in Pyrex than in C.
  • There is no -> operator in Pyrex. Instead of p->x, use p.x.
  • There is no unary * (dereference) operator in Pyrex. Instead of *p, use p[0].
  • There is an & operator, with the same semantics as in C.
  • Type casts are written <type>value, for example:
      cdef char *p
      cdef float *q
      p = <char*>q
  • The null C pointer is called NULL, not 0 (and NULL is a reserved word).
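For readers coming from C, the following hand-written sketch shows the C operator that each Pyrex form above stands for. The Point struct and the helper functions are hypothetical and are there only for illustration.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical struct used only for this illustration. */
struct Point {
    int x;
    int y;
};

/* Pyrex p.x (p a pointer to a struct) is C p->x. */
int get_x(struct Point *p) {
    return p->x;
}

/* Pyrex p[0] = v is C *p = v. */
void store(int *p, int v) {
    *p = v;
}

/* Pyrex &pt is C &pt: the address-of operator is the same. */
int x_via_address(void) {
    struct Point pt = { 3, 4 };
    return get_x(&pt);
}

/* Pyrex <char*>q is the C cast (char *)q; a NULL float pointer
   simply becomes a NULL char pointer. */
char *as_chars(float *q) {
    return (char *)q;
}
```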

Integer for-loops

You should be aware that a for-loop such as
for i in range(n):
    ...
won't be very fast, even if i and n are declared as C integers, because range is a Python function. For iterating over ranges of integers, Pyrex has another form of for-loop:
for i from 0 <= i < n:
    ...
If the loop variable and the lower and upper bounds are all C integers, this form of loop will be much faster, because Pyrex will translate it into pure C code.

Some things to note about the for-from loop:

  • The target expression must be a variable name.
  • The name between the lower and upper bounds must be the same as the target name.
  • The direction of iteration is determined by the relations. If they are both from the set {<, <=} then it is upwards; if they are both from the set {>, >=} then it is downwards. (Any other combination is disallowed.)
Like other Python looping statements, break and continue may be used in the body, and the loop may have an else clause.
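Here is a rough illustration of what "pure C code" means in this case. These hand-written C loops correspond to an upward and a downward for-from loop. They are a sketch of the idea, not Pyrex's exact output, and the function names are hypothetical.

```c
/* Upward loop.  Pyrex:  for i from 0 <= i < n:  */
long sum_below(int n) {
    long total = 0;
    int i;
    for (i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Downward loop.  Pyrex:  for i from n > i >= 0:
   starts at n - 1 because the upper relation is strict. */
int last_index_of(const int *a, int n, int target) {
    int i;
    for (i = n - 1; i >= 0; i--) {
        if (a[i] == target)
            return i;   /* leaves the loop early, much like break */
    }
    return -1;          /* reached only if the loop ran to completion */
}
```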


Error return values

If you don't do anything special, a function declared with cdef that does not return a Python object has no way of reporting Python exceptions to its caller. If an exception is detected in such a function, a warning message is printed and the exception is ignored.

If you want a C function that does not return a Python object to be able to propagate exceptions to its caller, you need to declare an exception value for it. Here is an example:

cdef int spam() except -1:
    ...
With this declaration, whenever an exception occurs inside spam, it will immediately return with the value -1. Furthermore, whenever a call to spam returns -1, an exception will be assumed to have occurred and will be propagated.
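The following C sketch illustrates the calling convention that except -1 sets up. It avoids the Python/C API so that it stays self-contained. A hypothetical global flag, error_set, stands in for Python's error indicator, and the hypothetical functions double_tons and quadruple_tons play the roles of spam and its caller.

```c
/* Hypothetical stand-in for the Python error indicator (real Pyrex
   output uses the Python/C API for this). */
int error_set = 0;

/* Plays the role of "cdef int spam() except -1": on failure it sets the
   indicator and returns -1; a successful call never returns -1. */
int double_tons(int tons) {
    if (tons < 0) {
        error_set = 1;   /* "raise an exception" */
        return -1;
    }
    return tons * 2;
}

/* A caller only needs to compare against the exception value and pass it on. */
int quadruple_tons(int tons) {
    int once = double_tons(tons);
    if (once == -1)
        return -1;       /* propagate; the indicator is already set */
    return double_tons(once);
}
```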

When you declare an exception value for a function, you should never explicitly return that value. If all possible return values are legal and you can't reserve one entirely for signalling errors, you can use an alternative form of exception value declaration:

cdef int spam() except? -1:
    ...
The "?" indicates that the value -1 only indicates a possible error. In this case, Pyrex generates a call to PyErr_Occurred if the exception value is returned, to make sure it really is an error.

There is also a third form of exception value declaration:

cdef int spam() except *:
    ...
This form causes Pyrex to generate a call to PyErr_Occurred after every call to spam, regardless of what value it returns. If you have a function returning void that needs to propagate errors, you will have to use this form, since there isn't any return value to test.
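This C sketch uses the same kind of hypothetical stand-in for the error indicator: err_flag, with err_occurred playing the part of PyErr_Occurred. It shows why except? -1 needs the extra check. Because -1 can be an ordinary result, the caller consults the indicator before treating it as an error. A void function declared except * has no value to test, so the indicator is checked after every call. All names here are hypothetical.

```c
/* Hypothetical stand-in for the Python error indicator. */
int err_flag = 0;

/* Plays the part of PyErr_Occurred. */
int err_occurred(void) {
    return err_flag;
}

/* Like "cdef int decrement(int x) except? -1": -1 is a legal result
   (for x == 0), so returning -1 only possibly signals an error. */
int decrement(int x) {
    if (x < 0) {         /* hypothetical error condition */
        err_flag = 1;
        return -1;
    }
    return x - 1;
}

/* Like "cdef void check(int x) except *": there is no return value to test. */
void check(int x) {
    if (x > 100)
        err_flag = 1;
}

/* Caller for except? -1: returns 1 if an error propagated, 0 otherwise. */
int use_decrement(int x, int *out) {
    int r = decrement(x);
    if (r == -1 && err_occurred())
        return 1;        /* real error: propagate */
    *out = r;            /* -1 here is an ordinary value */
    return 0;
}

/* Caller for except *: the indicator is checked after every call. */
int use_check(int x) {
    check(x);
    return err_occurred() ? 1 : 0;
}
```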

Some things to note:

  • Currently, exception values can only be declared for functions returning an integer, float or pointer type, and the value must be a literal, not an expression (although it can be negative). The only possible pointer exception value is NULL. Void functions can only use the except * form.
  • The exception value specification is part of the signature of the function. If you're passing a pointer to a function as a parameter or assigning it to a variable, the declared type of the parameter or variable must have the same exception value specification (or lack thereof). Here is an example of a pointer-to-function declaration with an exception value:
      int (*grail)(int, char *) except -1
  • You don't need to (and shouldn't) declare exception values for functions which return Python objects. Remember that a function with no declared return type implicitly returns a Python object.


External declarations

By default, C functions and variables declared at the module level are local to the module (i.e. they have the C static storage class). They can also be declared extern to specify that they are defined elsewhere, for example:
cdef extern int spam_counter
cdef extern void order_spam(int tons)
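In C terms, the difference is just the storage class. This hand-written sketch keeps everything in one file so that it compiles on its own. spam_counter and order_spam are declared extern and then defined further down, standing in for definitions that would normally live in some other C file. local_tally and orders_placed are hypothetical: they represent an ordinary module-level (static) variable and a function that reads it.

```c
/* A module-level "cdef int local_tally" is static: visible only in this file. */
static int local_tally = 0;

/* "cdef extern int spam_counter" and "cdef extern void order_spam(int tons)":
   declarations only; the definitions are expected to exist elsewhere. */
extern int spam_counter;
extern void order_spam(int tons);

/* Stand-ins for the definitions that would normally be in another C file. */
int spam_counter = 0;

void order_spam(int tons) {
    spam_counter += tons;
    local_tally++;
}

/* Hypothetical accessor for the file-local variable. */
int orders_placed(void) {
    return local_tally;
}
```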

Referencing C header files

When you use an extern definition on its own as above, Pyrex includes a declaration for it in the generated C file. This can cause problems if the declaration doesn't exactly match the declaration that will be seen by other C code. If you're wrapping an existing C library, for example, it's important that the generated C code is compiled with exactly the same declarations as the rest of the library.

To achieve this, you can tell Pyrex that the declarations are to be found in a C header file, like this:

cdef extern from "spam.h":
    int spam_counter
    void order_spam(int tons)
The cdef extern from clause does three things:
  1. It directs Pyrex to place a #include statement for the named header file in the generated C code.
  2. It prevents Pyrex from generating any C code for the declarations found in the associated block.
  3. It treats all declarations within the block as though they started with cdef extern.
It's important to understand that Pyrex does not itself read the C header file, so you still need to provide Pyrex versions of any declarations from it that you use. However, the Pyrex declarations don't always have to exactly match the C ones, and in some cases they shouldn't or can't. In particular:
  1. Don't use const. Pyrex doesn't know anything about const, so just leave it out. Most of the time this shouldn't cause any problem, although on rare occasions you might have to use a cast (see footnote 1).
  2. Leave out any platform-specific extensions to C declarations such as __declspec().
  3. If the header file declares a big struct and you only want to use a few members, you can just declare the members you're interested in.
  4. If the header file uses typedef names such as size_t to refer to platform-dependent flavours of numeric types, you will need a corresponding ctypedef statement, but you don't need to match the type exactly; just use something of the right general kind (int, float, etc.). For example,
        ctypedef int size_t
     will work okay whatever the actual size of a size_t is (provided the header file defines it correctly).
  5. If the header file uses macros to define constants, translate them into a dummy enum declaration.
  6. If the header file defines a function using a macro, declare it as though it were an ordinary function, with appropriate argument and result types.
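Declaring macros this way works because Pyrex only needs to emit their names, and the C preprocessor does the rest. The following C sketch uses a hypothetical header that defines SPAM_MAX and SPAM_TWICE. On the Pyrex side these would be declared as an enum constant and as an ordinary int function respectively. The function capped_double is also hypothetical.

```c
/* What a hypothetical spam.h might contain. */
#define SPAM_MAX 100
#define SPAM_TWICE(x) ((x) * 2)

/* Generated code can name SPAM_MAX and call SPAM_TWICE(...) exactly as if
   they were an enum constant and a function; the preprocessor expands
   both before the compiler proper sees them. */
int capped_double(int x) {
    int r = SPAM_TWICE(x);
    return r > SPAM_MAX ? SPAM_MAX : r;
}
```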
A few more tricks and tips:
  • If you want to include a C header because it's needed by another header, but don't want to use any declarations from it, put pass in the extern-from block:
      cdef extern from "spam.h":
          pass
  • If you want to include some external declarations, but don't want to specify a header file (because it's included by some other header that you've already included), you can put * in place of the header file name:
      cdef extern from *:
          ...

Styles of struct, union and enum declaration

There are two main ways that structs, unions and enums can be declared in C header files: using a tag name, or using a typedef. There are also some variations based on various combinations of these.

It's important to make the Pyrex declarations match the style used in the header file, so that Pyrex can emit the right sort of references to the type in the code it generates. To make this possible, Pyrex provides two different syntaxes for declaring a struct, union or enum type. The style introduced above corresponds to the use of a tag name. To get the other style, you prefix the declaration with ctypedef, as illustrated below.

The following table shows the various possible styles that can be found in a header file, and the corresponding Pyrex declaration that you should put in the cdef extern from block. Struct declarations are used as an example; the same applies equally to union and enum declarations.

Note that in all the cases below, you refer to the type in Pyrex code simply as Foo, not struct Foo.

Case 1: tag name only
  C code:
      struct Foo {
        ...
      };
  Corresponding Pyrex code:
      cdef struct Foo:
        ...
  Comments: Pyrex will refer to the type as struct Foo in the generated C code.

Case 2: typedef only
  C code:
      typedef struct {
        ...
      } Foo;
  Corresponding Pyrex code:
      ctypedef struct Foo:
        ...
  Comments: Pyrex will refer to the type simply as Foo in the generated C code.

Case 3: tag and typedef with different names
  C code:
      typedef struct foo {
        ...
      } Foo;
  Corresponding Pyrex code (either of):
      cdef struct foo:
        ...
      ctypedef foo Foo #optional
    or
      ctypedef struct Foo:
        ...
  Comments: If the C header uses both a tag and a typedef with different names, you can use either form of declaration in Pyrex (although if you need to forward reference the type, you'll have to use the first form).

Case 4: tag and typedef with the same name
  C code:
      typedef struct Foo {
        ...
      } Foo;
  Corresponding Pyrex code:
      cdef struct Foo:
        ...
  Comments: If the header uses the same name for the tag and the typedef, you won't be able to include a ctypedef for it -- but then, it's not necessary.
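To see why the style matters, here is a C sketch of the four header styles from the table. The names Foo1 to Foo4 (and foo3) are hypothetical renamings that let all four coexist in one file. The comments note how each type may be spelled in C, and getting that spelling right is what Pyrex has to do in its generated code.

```c
/* Case 1: tag only. Must be written "struct Foo1"; plain "Foo1" is not a type name. */
struct Foo1 { int a; };

/* Case 2: typedef of an anonymous struct. Must be written "Foo2";
   "struct Foo2" would name a different, incomplete struct type. */
typedef struct { int a; } Foo2;

/* Case 3: tag and typedef with different names. "struct foo3" and "Foo3"
   are the same type. */
typedef struct foo3 { int a; } Foo3;

/* Case 4: tag and typedef with the same name. "struct Foo4" and "Foo4"
   are the same type. */
typedef struct Foo4 { int a; } Foo4;

int sum_all(void) {
    struct Foo1 v1 = { 1 };
    Foo2 v2 = { 2 };
    struct foo3 v3a = { 3 };
    Foo3 v3b = { 4 };
    struct Foo4 v4a = { 5 };
    Foo4 v4b = { 6 };
    return v1.a + v2.a + v3a.a + v3b.a + v4a.a + v4b.a;
}
```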

Accessing Python/C API routines

One particular use of the cdef extern from statement is for gaining access to routines in the Python/C API. For example,
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *s, int len)
will allow you to create Python strings containing null bytes.


Public Declarations

You can make C variables and functions defined in a Pyrex module accessible to external C code (or another Pyrex module) using the public keyword, as follows:
cdef public int spam # public variable declaration

cdef public void grail(int num_nuns): # public function declaration
    ...

If there are any public declarations in a Pyrex module, a .h file is generated containing equivalent C declarations for inclusion in other C code.


Extension Types

As well as creating normal user-defined classes with the Python class statement, Pyrex also lets you create new built-in Python types, known as extension types. You define an extension type using the cdef class statement. Here's an example:
cdef class Shrubbery:

    cdef int width, height

    def __init__(self, w, h):
        self.width = w
        self.height = h

    def describe(self):
        print "This shrubbery is", self.width, \
            "by", self.height, "cubits."

As you can see, a Pyrex extension type definition looks a lot like a Python class definition. Within it, you use the def statement to define methods that can be called from Python code. You can even define many of the special methods such as __init__ as you would in Python.

The main difference is that you can use the cdef statement to define attributes. The attributes may be Python objects (either generic or of a particular extension type), or they may be of any C data type. So you can use extension types to wrap arbitrary C data structures and provide a Python-like interface to them.

Some other differences between extension types and Python classes:

  • The set of attributes of an extension type is fixed at compile time; you can't add attributes to an extension type instance at run time simply by assigning to them, as you could with a Python class instance. (You can subclass the extension type in Python and add attributes to instances of the subclass, however.)
  • Attributes defined with cdef are only accessible from Pyrex code, not from Python code. (A way of defining Python-accessible attributes is planned, but not yet implemented. In the meantime, use accessor methods.)
  • To access the cdef-attributes of an extension type instance, the Pyrex compiler must know that you have an instance of that type, and not just a generic Python object. It knows this already in the case of the "self" parameter of the methods of that type, but in other cases you will have to tell it by means of a declaration. For example,
      def widen_shrubbery(Shrubbery sh, extra_width):
          sh.width = sh.width + extra_width
  • Some of the __xxx__ special methods behave differently from their Python counterparts, and some of them are named differently as well. They are covered in the separate documentation on special methods of extension types.
Special methods of extension types

This section has a whole separate page devoted to it.

Subclassing extension types

Pyrex extension types can be subclassed in Python. They cannot currently inherit from other built-in or extension types, but this may be possible in a future version.

Forward-declaring extension types

Extension types can be forward-declared, like struct and union types. This will be necessary if you have two extension types that need to refer to each other, e.g.
cdef class Shrubbery # forward declaration

cdef class Shrubber:
    cdef Shrubbery work_in_progress

cdef class Shrubbery:
    cdef Shrubber creator
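Plain C structs face the same chicken-and-egg problem, which may be a helpful way to picture it. In this C analogy the struct names mirror the example but are otherwise hypothetical, as is the helper creator_knows_work. The tag-only forward declaration lets Shrubber hold a pointer to a Shrubbery before Shrubbery is fully defined. The mutual references work as pointers, much as object-typed attributes of extension types are held by reference at the C level.

```c
struct Shrubbery;                      /* forward declaration */

struct Shrubber {
    struct Shrubbery *work_in_progress;
};

struct Shrubbery {
    struct Shrubber *creator;
    int width;
};

/* Hypothetical helper: follows the references in both directions. */
int creator_knows_work(struct Shrubbery *s) {
    return s->creator->work_in_progress == s;
}
```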

External extension types

Extension types can be declared extern. In conjunction with the cdef extern from statement, and together with a slight addition to the extension class syntax, this provides a way of gaining access to the internals of pre-existing Python objects. For example, the following declarations will let you get at the C-level members of the built-in complex object.
cdef extern from "complexobject.h":

    ctypedef struct Py_complex:
        double real
        double imag

    ctypedef class complex [type PyComplex_Type, object PyComplexObject]:
        cdef Py_complex cval

Note the use of ctypedef class. This is because, in the Python header files, the PyComplexObject struct is declared with
typedef struct {
    ...
} PyComplexObject;
Here is an example of a function which uses the complex type declared above.
def spam(complex c):
    print "Real:", c.cval.real
    print "Imag:", c.cval.imag
When declaring an external extension type, you don't declare any methods. Declaration of methods is not required in order to call them, because the calls are Python method calls. Also, as with structs inside a cdef extern from block, you only need to declare those C members which you wish to access.

Name specification clause

The part of the class declaration in square brackets is a special feature only available for extern extension types. The reason for it is that Pyrex needs to know the C names of the struct representing an instance of the type, and of the Python type-object for the type. It knows these names for non-extern extension types, because it generates them itself, but in the case of an extern extension type, you need to tell it what they are.

Both the type and object parts are optional. If you don't specify the object part, Pyrex assumes it's the same as the name of the class. For instance, the class declaration could also be written

ctypedef class PyComplexObject [type PyComplex_Type]:
    ...
but then you would have to write the function as
def spam(PyComplexObject c):
    ...
You can also omit the type part of the specification, but this will severely limit what you can do with the type, because Pyrex needs the type object in order to perform type tests. A type test is required every time an argument is passed to a Python function declared as taking an argument of that type (such as spam() above), or a generic Python object is assigned to a variable declared to be of that type. Without access to the type object, Pyrex won't allow you to do any of those things. Supplying the type object name is therefore recommended if at all possible.

Final remarks

There is one more subtlety to the above example that should be mentioned. By calling the extension type "complex", we're creating a module-level variable called "complex" that shadows the built-in name "complex". This isn't a problem, because they both have the same value, i.e. the type-object of the built-in complex type. In the Pyrex module, the name "complex" can be used both as a constructor of complex objects, and as a type name for declaring variables and arguments of type complex.

If we call the class something else, however, such as "PyComplexObject" as in the second version above, we would have to use "PyComplexObject" as the type name. Both "complex" and "PyComplexObject" would work as constructors ("complex" because it's a built-in name), but only "PyComplexObject" would work as a type name for declaring variables and arguments.


Limitations

Pyrex is not quite a full superset of Python. The following restrictions apply:
  • Function definitions (whether using def or cdef) cannot be nested within other function definitions.
  • Class definitions can only appear at the top level of a module, not inside a function.
  • The import * form of import is not allowed anywhere (other forms of the import statement are fine, though).
  • Generators cannot be defined in Pyrex.
The above restrictions will most likely remain, since removing them would be difficult and they're not really needed for Pyrex's intended applications.

There are also some temporary limitations which may eventually be lifted:
  • Class and function definitions cannot be placed inside control structures.
  • In-place operators (+=, etc.) are not yet supported.
  • List comprehensions are not yet supported.
There are probably also some other gaps which I can't think of at the moment.


Footnotes

1. A problem with const could arise if you have something like
cdef extern from "grail.h":
    char *nun
where grail.h actually contains
extern const char *nun;
and you do
cdef void oral(char *s):
    # something that doesn't change s
...
oral(nun)
which will cause the compiler to complain. You can work around it by casting away the constness:
oral(<char *>nun)
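Here is a minimal C sketch of the same situation. Two parts are assumptions for illustration: the definition of nun stands in for whatever grail.h and its library actually provide, and oral here returns the string length only so that the result can be checked. Passing nun directly would draw a compiler diagnostic, typically a warning about discarding the const qualifier. The explicit cast, which is what <char *>nun becomes in C, avoids it.

```c
#include <string.h>

/* What grail.h actually declares: extern const char *nun;
   This definition is a stand-in so the sketch compiles on its own. */
const char *nun = "Ni!";

/* Like "cdef void oral(char *s)": takes a non-const pointer but does not
   modify the string (it returns the length only for checking purposes). */
size_t oral(char *s) {
    return strlen(s);
}

size_t call_oral(void) {
    /* oral(nun);  would warn: passing it discards the const qualifier */
    return oral((char *)nun);   /* Pyrex: oral(<char *>nun) */
}
```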