PEP: Base multidimensional array type for Python core
Version: 1.1
Last Modified: 23-Jun-2006
Author: Travis Oliphant, Karol Langner
Status: Draft
Type: Standards Track
Created: 23-Sep-2005
Python-Version: 2.5

Introduction

Multidimensional arrays are often used in scientific and engineering programming, but they have uses in other areas as well, as evidenced by the popularity of spreadsheets and image-processing applications. Python, however, has no default multidimensional array object.

Since 1995, Numeric has been filling the need for multidimensional arrays for many users as a standard optional Python library. The only thing holding it back from the standard library has been stability and, in particular, the desire of the user community to have a faster release cycle for the array.

While the functionality of the array object has changed somewhat as it progressed from Numeric to numarray to scipy core (numpy), the interface has not changed significantly. That interface allows Python users to interact with an array of bytes described by a certain shape, memory layout, and type description. In fact, this interface has recently been formalized by the creation of a description called the "array_interface", which any Python object can consume and/or export. To make the array_interface more useful, however, objects need a mechanism for using the interface quickly at the C level.

It would be beneficial to the community as a whole to place this common interface into Python itself. This would allow a wider Python community to quickly interact with and use the data in multidimensional arrays without requiring or depending on a third-party package. It would also allow Python to work seamlessly with more capable multidimensional array objects that scientific users could install.

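
To make concrete what "an array of bytes described by a certain shape, memory layout, and type description" means, here is a minimal Python sketch. It is an illustration only, not part of the proposal: the helper names c_contiguous_strides and read_element are hypothetical, and the example assumes a C-style layout of little-endian 32-bit integers. Only the standard struct module is used.

```python
import struct


def c_contiguous_strides(shape, itemsize):
    """Byte strides for a C-style layout (last dimension varies fastest).

    Hypothetical helper for illustration; not an API defined by this PEP.
    """
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def read_element(buffer, index, strides, fmt):
    """Read one element by mapping an N-dimensional index to a byte offset."""
    offset = sum(i * s for i, s in zip(index, strides))
    return struct.unpack_from(fmt, buffer, offset)[0]


# Type description: little-endian 32-bit signed integer (assumed for the example).
fmt = "<i"
# Shape: 2 rows by 3 columns.
shape = (2, 3)
# Raw bytes: the integers 0..5 packed back to back.
raw = struct.pack("<6i", *range(6))

strides = c_contiguous_strides(shape, struct.calcsize(fmt))  # (12, 4)
value = read_element(raw, (1, 2), strides, fmt)              # 5
```

The same three pieces of information (shape, strides, and an element type) are all a consumer needs to interpret the buffer without copying it, which is the property the shared interface relies on.
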

This PEP proposes adding two new builtin types to the Python language: a generic multidimensional array type (basearray) and an associated type describing the type of data it carries (datatype). The basearray type would have a C structure similar to the one that has remained constant in Numeric, and few other features. Together, these objects would implement the array interface specification introduced to Numeric and numarray in April 2005, and encourage the use of this interface both in extensions and in Python code in general.

Purpose of basearray

The obvious purpose of basearray is to provide a base multidimensional array type for the Python distribution. This, however, is also the means by which other goals, more important in the long run, can be achieved.

By providing a "minimal" base type, more capable array types can be created as subtypes of basearray (such as the one contained in numpy). Other Python users could write extension modules that enhance the basic structure of basearray without having to install an entire scientific computing package.

Standardized allocation and interpretation of memory for multidimensional arrays, combined with a generic way to share information about arrays and the memory they are stored in (such as the array_interface), will allow all objects dealing with multidimensional data to communicate, even if they are not subtypes of basearray. For example, the inclusion of basearray would allow extension modules such as PyOpenGL, wxPython, and PIL to make use of array-like data stored in a multidimensional array object without making unnecessary copies.

Finally, the addition of basearray would pave the way for a more capable multidimensional array object to be gradually added to the Python distribution, if specific features were deemed desirable by the broader community.

Code Description

To facilitate the acceptance of this PEP, the proposed basearray type does not have a fully-filled type-object structure.
In other words, basearray is above all a placeholder and base type from which other, more capable array objects can be derived. Besides serving as a base type, the object exports and consumes the array interface.

Alongside basearray and datatype (a descriptor of the type of data an array carries), an array iterator type is defined to facilitate some procedures. There are also two auxiliary structures and a number of C-API functions. Ultimately, there are two files to be added: basearray.c and basearray.h.

C structures defined

PyBaseArrayObject

    void *data;                  /* Pointer to raw data buffer. */
    int ndim;                    /* Number of dimensions. */
    Py_intptr_t *dimensions;     /* Size in each dimension. */
    Py_intptr_t *strides;        /* Bytes to jump to get to the next element
                                    in each dimension. */
    PyDataTypeObject *datatype;  /* Pointer to data type object describing
                                    the data. */
    int flags;                   /* Flags describing the data buffer. */
    PyObject *base;              /* This object should be decref'd upon
                                    deletion of the array. For creation from
                                    a buffer object, it points to an object
                                    that should be decref'd upon deletion. */
    PyObject *weakreflist;       /* For weak references. */

The flags variable is the bit-wise OR of:

    CONTIGUOUS -- set if the array is C-style contiguous in memory, with the last dimension varying the fastest.
    FORTRAN    -- set if the array is Fortran-style contiguous in memory, with the first dimension varying the fastest.
    OWN_DATA   -- set if this array owns the data buffer and should deallocate it when the array is deallocated.
    WRITEABLE  -- set if the memory can be written to.
    ALIGNED    -- set if the memory (for each stride) is aligned properly for the type.

PyDataTypeObject

Describes the type of data the array carries. There are instances of this object for a fixed set of built-in Python types.

    PyTypeObject *typeobj;   /* A Python "scalar" type corresponding to the
                                array elements. */
    char kind;               /* A character representing the kind of data
                                for this type. */
    char byteorder;          /* '>' (big-endian), '<' (little-endian),
                                '|' (not applicable), or '=' (native). */
    char hasobject;          /* Non-zero only if there are object arrays
                                in fields. */
    int elsize;              /* The size, in bytes, of each element this
                                data-type represents. */
    int alignment;           /* The alignment needed for the C type this
                                data-type object represents. For a C type T,
                                this is offsetof(struct {char c; T v;}, v). */
    struct _arr_ *subarray;  /* Non-NULL if the type is a C-contiguous array
                                of some other type. */
    PyObject *fields;        /* A dictionary of names and fields. A field is
                                a segment of a larger type that has its own
                                data-type object. The dictionary is keyed by
                                strings representing field names and returns
                                a tuple of the form
                                (data-type, offset[, label]), where data-type
                                is the data-type object describing the field,
                                offset is the offset in bytes to the start of
                                that field, and label is the optional
                                user-name of this field. The key may be a
                                Python-acceptable name (one that could be
                                used in attribute access, for example),
                                whereas label is the definer's "official
                                title", which may contain spaces, units, and
                                so forth. */
    PyObject *names;         /* An ordered tuple of field names, or NULL if
                                no fields are defined. */
    PyDataTypeFuncs *funcs;  /* Table of functions specific to each
                                data-type. */

PyDataTypeFuncs

Carries pointers to functions specific to a given datatype object.

    PyDataType_GetItemFunc *getitem;
    PyDataType_SetItemFunc *setitem;

PyBaseArrayIterObject

This iterator structure is useful for looping over a basearray.

    int nd_m1;                   /* Number of dimensions minus 1. */
    intp index;                  /* Current 1D index into the array. */
    intp size;                   /* Total size of the array. */
    intp coordinates[MAX_DIMS];  /* N-dimensional index into the array. */
    intp dims_m1[MAX_DIMS];      /* Size of the array minus 1 in each
                                    dimension. */
    intp strides[MAX_DIMS];      /* Bytes to jump to get to the next element
                                    in each dimension. */
    intp backstrides[MAX_DIMS];  /* Bytes to jump from the end of a dimension
                                    to its beginning. */
    intp factors[MAX_DIMS];      /* Shape factors, for computing an ND index
                                    from a 1D index. */
    PyBaseArrayObject *ao;       /* The underlying basearray object this
                                    iterator refers to. */
    char *dataptr;               /* Pointer to the element indicated by this
                                    iterator. */
    Bool contiguous;             /* True when *ao has the CONTIGUOUS flag
                                    set. */

PyBaseArrayDims

Auxiliary structure used for interpreting the shape and strides of Python objects when they are converted to useful C objects.

    intp *ptr;  /* Pointer to a list of intp representing array shape or
                   strides. */
    int len;    /* The length of the list of integers pointed to above. */

PyBaseArrayChunk

Auxiliary structure for representing a memory segment, the equivalent of the Python buffer object.

    PyObject *base;  /* The Python object this memory segment comes from. */
    void *ptr;       /* Pointer to the beginning of the memory segment. */
    intp len;        /* Length of the segment in bytes. */
    int flags;       /* Any data flags that should be used to interpret
                        the memory. */

Reference Implementation

The files basearray.c and basearray.h are available from the public svn server:

    http://svn.scipy.org/svn/PEP/

As of revision 10, all changes by Karol Langner were made as part of the Google Summer of Code 2006 project "Base multidimensional array type for Python core".

Copyright

This document is placed in the public domain.