You are viewing the version of this documentation from Perl blead. This is the main development branch of Perl. (git commit d3c72e2bf329edd64ef1988f8149d0404472802d)

NAME

perlxs - the XS Language Reference Manual

SYNOPSIS

/* This is a simple example of an XS file. The first half of an XS
 * file is uninterpreted C code; all lines are passed through
 * unprocessed. */

=pod
Except that any POD is stripped.
=cut

/* Standard boilerplate: */

/* For efficiency, always define PERL_NO_GET_CONTEXT: not enabled by
 * default for backwards compatibility. For details, see "How multiple
 * interpreters and concurrency are supported" in perlguts. */
#define PERL_NO_GET_CONTEXT

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"

/* Any general C code here; for example: */

#define FOO 1
static int
my_helper_function(int i) { /* do stuff */ }

/* The first MODULE line starts the XS half of the file: */

MODULE = Foo::Bar PACKAGE = Foo::Bar

  # Indented '#' are XS code comments.
  # C preprocessor directives are still allowed and are passed
  # through:
#define BAR 2

  # File-scoped XS directives
PROTOTYPES: DISABLE

  # A simple XSUB: generate a wrapper for the strlen() C library
  # function.

int
strlen(char *s)

=pod
A more complex example:
C<multi16(i,j)>: do a 16-bit multiply
=cut

unsigned int
multi16(unsigned int i, \
        unsigned int j)
  CODE:
    i = i & 0xFFFF;
    j = j & 0xFFFF;
    RETVAL = (i * j) & 0xFFFF;
  OUTPUT:
    RETVAL

DESCRIPTION

This is the reference manual for the XS language. XS is a type of template language from which a C source file is generated. That file contains functions which are written in C, but which can be called from Perl and which behave just like Perl subs. These are known as external or extension subs, or XSUBs for short.

Note that this POD file was heavily rewritten and modernised in 2025. Various old practices, such as "K&R" XSUB function signature declarations, are no longer encouraged; but much old code will still be using them, so be cautious when using old code as examples for writing new code.

Version numbers

Unless otherwise specified, the syntax described in this document is valid at least as far back as the 1.9508 version of the XS parser utility xsubpp, which was bundled with Perl release 5.8.0.

In xsubpp version 2.09_01, the bulk of the code for this utility was split out into a separate module, ExtUtils::ParseXS. This module inherited the version numbering scheme of xsubpp, and since then, the latter just uses the version number of the ParseXS.pm which gets loaded. This splitting out means that a newer version of ExtUtils::ParseXS can be installed via CPAN into an older Perl installation, allowing newer XS syntax to be used with older Perls.

(Between 1.98_01 and 2.09_01, ExtUtils::ParseXS existed as a separate fork of xsubpp, with some changes ported back and forwards with the perl distribution's xsubpp, in a confusing manner.)

This document refers to changes in XS syntax by reference to xsubpp version numbers. This should be understood as usually mapping directly to the same version number of ExtUtils::ParseXS. To determine which version of this module was bundled with which release of Perl, you can use the corelist utility which is usually a part of the Perl installation, e.g.

corelist -a ExtUtils::ParseXS

THE FORMAL SYNTAX OF AN XS FILE

This is a BNF-like description of the syntax of an XS file. It is intended to be human-readable rather than machine-readable, and doesn't try to accurately specify where line breaks can occur.

Key:

   foo         BNF token.
   "bar"       Literal terminal symbol.
   /.../       Terminal symbol defined by a pattern.
   [Foo::Bar]  Terminal symbol defined by way of an example.
   * + ? | ( ) These have their usual regex-style meanings.
   // ...      BNF Comments.


XS_file           = C_file_part ( module_decl  XS_file_part )+

C_file_part       = (
                       // Lines of C code (including /* ... */),
                       // which are all passed through uninterpreted.
                    |
                       pod // These are stripped.
                    )*

pod               = /^=/ .. /^=cut\s*$/

module_decl       = blank_line
                    // NB: all on one line:
                    "MODULE =" [Foo::Bar] "PACKAGE =" [Foo::Bar]
                                            ( "PREFIX =" [foo_] )?

blank_line        = /^\s*$/

XS_file_part      = ( file_scoped_decls* xsub )*

file_scoped_decls =
                    blank_line
                    // Any valid CPP directive: these are passed through:
                  | "#if" | "#  if" | "#define" | // etc
                  | #comment // anything not recognised as CPP directive
                  | pod
                  | "SCOPE:"               enable
                  | "EXPORT_XSUB_SYMBOLS:" enable
                  | "PROTOTYPES:"          enable
                  | "VERSIONCHECK:"        enable
                  | "FALLBACK:"            ("TRUE" | "FALSE" | "UNDEF")
                  | "INCLUDE:"             [foo.xs]
                  | "INCLUDE_COMMAND:"     [... some command line ...]
                  | "REQUIRE:"             [1.23] // min xsubpp version
                  | "BOOT:"
                        code_block
                  | "TYPEMAP: <<"[EOF]
                       // Heredoc with typemap declarations.
                    [EOF]

enable            = ( "ENABLE" | "DISABLE" )

code_block        = // Lines of C and/or blank lines terminated by the
                    // next keyword or XSUB start. POD is stripped.

xsub              = blank_line // not *always* necessary
                    xsub_decl
                    ( cases | xbody )

xsub_decl         = return_type
                    xsub_name "(" parameters ")" "const" ?

return_type       = "NO_OUTPUT" ? "extern \"C\"" ? "static" ? C_type

C_type            = [const char *]  // etc: any valid C type

C_expression      = [foo(ix) + 1]   // etc: any valid C expression

xsub_name         = [foo] | [X::Y::foo] // simple name or C++ name

parameters        =   empty
                    | parameter ( "," parameter )*

empty             = /\s*/

parameter         = (
                      in_out_decl ?
                      C_type ?
                      /\w+/   // variable name
                      // Default or optional value:
                      ( "=" ( C_expression | "NO_INIT" ) )?

                      // Pseudo-param: foo must match another param name:
                    | C_type "length(" [foo] ")"
                    | "..."
                    )

in_out_decl       = "IN" | "OUT" | "IN_OUT" | "OUTLIST" | "IN_OUTLIST"

cases             = (
                       "CASE:" ( C_expression | empty )
                       xbody
                    )+

xbody             = implicit_input     ?
                    xbody_input_part   *
                    xbody_init_part    *
                    xbody_code_part
                    xbody_output_part  * // Not after PPCODE.
                    xbody_cleanup_part * // Not after PPCODE.

implicit_input    = ( blank_line | input_line )+

xbody_input_part  =
                      "INPUT:" ( blank_line | input_line )*
                    | "PREINIT:"
                          code_block
                    | xbody_generic_key
                    | c_args
                    | interface_macro
                    | "SCOPE:" enable  // Only in xsubpp 3.58 onwards.

input_line        = C_type
                    "&" ?
                    /\w+/   // variable name
                    // Optional initialiser:
                    (
                      ( "=" | ";" ) "NO_INIT"
                    |
                      // Override or add to the default typemap.
                      // The expression is eval()ed as a
                      // double-quotish string.
                      "=" [ a_typemap_override($arg) ]
                    |
                      ("+" | ";")  [ a_deferred_initialiser($arg) ]
                    )?
                    ";" ?

xbody_init_part   =   "INIT:"
                          code_block
                    | xbody_generic_key
                    | c_args
                    | interface
                    | interface_macro

xbody_code_part   =
                      autocall
                    | "CODE:"
                          code_block
                    | "PPCODE:"
                          code_block
                    | // Only recognised if immediately following
                      // an INPUT section:
                      "NOT_IMPLEMENTED_YET:"

                    // Implicit call to wrapped library function.
autocall          = empty

xbody_output_part =
                     xbody_postcall *
                     xbody_output *

xbody_postcall    =   "POSTCALL:"
                          code_block
                    | xbody_generic_key

xbody_output      =   "OUTPUT:"
                       ( blank_line
                       | output_line
                       | "SETMAGIC:" enable
                       )*
                    | xbody_generic_key

                    // Variable name with optional expression which
                    // overrides the typemap
output_line       = /\w+/ ( [ sv_setfoo(ST(0), RETVAL) ] )?

xbody_cleanup_part = "CLEANUP:"
                          code_block
                    | xbody_generic_key

                    // Text to use as the arguments for an autocall;
                    // may be spread over multiple lines:
c_args            = "C_ARGS:" [foo, bar, baz]

                    // Comma-separated list of Perl subroutine names
                    // which use the XSUB, over one or more lines:
interface         = "INTERFACE:"  [foo, bar, Bar::baz]

interface_macro   =
                    "INTERFACE_MACRO:"
                      [GET_MACRO_NAME]
                      [SET_MACRO_NAME] ?


                      // These can appear anywhere in an XSUB.
xbody_generic_key =   pod
                    | alias
                    | "PROTOTYPE:" ( enable | [$$@] )

                      // Whitespace-separated list of overload types,
                      // over one or more lines:
                    | "OVERLOAD:"  [ cmp eq <=> etc ]

                      // Whitespace-separated list of attribute names,
                      // over one or more lines:
                    | "ATTRS:" [foo bar baz]


alias             = "ALIAS:"
                      // One or more lines; each with zero or more
                      // {alias_name, op, index} triplets:
                      (
                        [bar]       "="  [5]
                      | [Foo::baz]  "="  [A_CPP_DEFINE]
                      | [Bar::boz]  "=>" [Foo::baz]
                      )*

OVERVIEW OF XS AND XSUBS

Initial and Further Reading

This document is structured on the assumption that you are already familiar with the very basics of XS and XSUBs; in particular, the code examples may make use of common keywords that are only described later in the file. But once you have that basic familiarity, then this document may be read through in order.

It is in two main parts. First there is a long overview part, which explains (in great detail) what XS and XSUBs are, how the perl interpreter calls XSUBs, and how data is passed to and from an XSUB. There is much more detail here than is strictly necessary for writing simple XSUBs, but this document is intended to be comprehensive. Then comes the reference manual proper, which has a section for each keyword and other parts of an XSUB declaration and definition, plus a few more general topics, such as using typemaps and storing static data.

If necessary, read perlxstut first for a gentler tutorial introduction. In addition, you may find other Perl documents useful, such as perlguts and perlxstypemap.

An Introduction to XS and XSUBs

Formally, an XSUB is a compiled function, typically written in C or C++, which can be called from Perl as if it was a Perl function. A collection of them are compiled into a .so or .dll library file and are usually dynamically loaded at use Foo::Bar time (but can in principle be statically linked into the perl interpreter).

From Perl, an XSUB looks just like any other sub and is called in the same way. In the most general case, an XSUB can be passed and return arbitrary lists of values. More commonly, such as when XSUBs are being used as thin wrappers to call existing C library functions, they might take a fixed list of arguments and return a single result (or zero items for a void function).

An XS file is a template file format which contains a mixture of C code and XSUB declarations. It is used to generate XSUBs where most boilerplate code is handled automatically: e.g. converting argument values between C and Perl.

Note that this document refers to both the thing in the XS file, and to the C function generated from it, as an XSUB. It should be clear from the context which is being referred to.

When XSUBs are being used as a thin wrapper between Perl and the functions in a particular C library, the XSUB definitions in the XS file are often just a couple of lines, consisting of a declaration of the name, parameters and return type. The XS parser will do almost all the heavy lifting for you.

XS is optional; in principle you can write your own C code directly, or use other systems such as Inline::C or SWIG. For creating simple bindings to existing compiled libraries, there is also the libffi interface via CPAN modules like FFI::Platypus or FFI::Raw. Note that creating XS may initially take more effort than those, but it is lightweight in terms of dependencies.

XSUBs have three main roles. They can be used as thin wrappers for C library functions, e.g. Digest::SHA. They can be used to write functions which are faster than pure Perl or easier to do in C, e.g. List::Util. Or they can be used to extend the Perl interpreter itself, e.g. threads.

XS has extensive support for the first role, and makes writing the second need less boilerplate code. This document doesn't cover the third role, which often requires extensive knowledge of the Perl interpreter's internals.

The h2xs utility bundled with Perl can in principle be used to generate an initial XS file from a C header file, which (with possibly only minor edits) can be used to wrap an entire C API. But note that this utility is rather old and may not handle more modern C header code.

h2xs (as well as other tools) can also be used to generate an initial "empty" skeleton distribution even when not deriving from a header file (see perlxstut for more details).

Typemaps are sets of rules which map C types such as int to logical XS types such as T_IV, and from there to INPUT and OUTPUT templates such as $var = ($type)SvIV($arg) and sv_setiv($arg, (IV)$var) which, after variable expansion, generate C code which converts back and forth between Perl arguments and C auto variables.

There is a standard system typemap file which contains rules for common C and Perl types, but you can add your own typemap file in addition, and from xsubpp 3.01 onwards you can also add typemap declarations inline within the XS file. You can either just add mappings from new C types to existing XS types to make use of existing templates, or you can add new templates too. As an example of the former, if you're using a C header file which has:

typedef int my_int;

then adding this typemap entry:

my_int T_IV

is sufficient for the XS parser to know to use the existing T_IV templates when processing an XSUB which has a my_int parameter type. See "Using Typemaps" and perlxstypemap for more information.

An XS file is parsed by ExtUtils::ParseXS, or by the xsubpp utility (which is a thin wrapper over the module), and generates a .c file. xsubpp is typically called at build time from the Makefile of a distribution (as generated by ExtUtils::MakeMaker); or ExtUtils::ParseXS can be used directly, e.g. by Module::Build. The C file is then compiled into a .so or .dll, again at module build and install time.

The Structure of an XS File

An XS file has two parts, which are parsed and treated completely differently: the C half and the XS half.

Anything before the first MODULE directive line is treated as pure C (except for any sections of POD, which are discarded). All such lines, including C preprocessor directives and C code comments, are passed through unprocessed into the destination C file. XS comments (as described below) aren't recognised by the XS parser, and are just passed through unprocessed.

It is possible that machine-generated C code inserted in this section could include an equal sign character in column one, which would be misinterpreted as POD; if this is a risk, make sure that this hypothetical code generator includes a leading space character.

This half of the file is the place to put things which will be of use to the XSUB code further down: such as #include, #define, typedef, and static C functions. Note that you should (in general) avoid declaring static data in XS files; see "Safely Storing Static Data in XS" for details and workarounds.

After the first MODULE line, the rest of the file is interpreted as XS syntax. Further MODULE keywords may appear where needed to change the current package (in a similar fashion to a single Perl Foo.pm file having multiple package statements).

This second half consists mostly of a series of XSUB definitions. Between these XSUBs, there can be a few file-scope keywords (including further MODULE lines), POD, C preprocessor directives, XS (#) comments, and blank lines. See "File-scoped XS Keywords and Directives" for more details.

The XS half of the file can be thought of as being parsed in two stages. In the initial processing step, the XS parser does some basic text processing, such as stripping POD.

Once that basic textual preprocessing has been performed, the main XS parsing takes place. XS syntax is very line-orientated. XS lines and sections mostly start with a keyword of the form:

/^\s*[A-Z_]+:/
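As a rough illustration of what that pattern accepts, here is a minimal sketch in plain C. It is not how the XS parser is implemented (the parser is written in Perl); the function name xs_keyword_len is hypothetical, and only spaces and tabs are treated as leading whitespace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does `line` look like /^\s*[A-Z_]+:/ ?
 * Returns the length of the keyword (excluding the colon),
 * or 0 if the line does not start with a keyword. */
static size_t
xs_keyword_len(const char *line)
{
    const char *p = line;
    const char *start;

    assert(line != NULL);

    while (*p == ' ' || *p == '\t')     /* \s* (spaces and tabs only) */
        p++;

    start = p;
    while ((*p >= 'A' && *p <= 'Z') || *p == '_')   /* [A-Z_]+ */
        p++;

    if (p == start || *p != ':')
        return 0;
    return (size_t)(p - start);
}
```

Note that the pattern only describes the shape of a keyword; a line such as "FOO_BAR:" matches it even though the parser knows no such keyword.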

It is best to position file-scoped keywords at column one, while XSUB-scoped keywords are best indented. This may avoid surprises with edge cases in the XS parser.

Keywords can be either single line, e.g. PROTOTYPES: ENABLE, or multi-line. The latter consume lines until the next keyword, or until the possible start of a new XSUB (/\n\n\S/), or to EOF. Multi-line keywords treat the rest of the text on the line which follows the keyword as the first line of data. The exception to this is keywords which introduce a block of code, such as CODE: or BOOT:, which silently ignore the rest of the first line. (Yes, this is an implementation flaw.)

It is best to include a blank line between each file-scoped item, and before the start of each XSUB. While some items are processed correctly if they are on the line immediately preceding the start of an XSUB, the parser is inconsistent in their handling.

An XSUB ends when /\n\n\S/ is encountered: i.e. a blank line followed by something on column one. (This is why it's recommended to indent XSUB-scoped keywords.) If the thing at column one matches any of the items which can appear in between XSUBs (such as file-scoped keywords) then it, and any subsequent lines, are processed as such. Anything starting on column one which isn't otherwise recognised, is interpreted as the first line of the next XSUB definition. In particular it is interpreted as the return type of the XSUB: this can lead to weird errors when something is unexpectedly interpreted as the start of a new XSUB, such as /* */, which isn't valid in the XS half of the file apart from within code blocks.
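The /\n\n\S/ rule can be sketched in plain C as follows. This is a toy model only, with a hypothetical function name; the real parser works line by line in Perl and also has to decide what the thing at column one actually is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the offset of the character matching
 * the \S in the first occurrence of /\n\n\S/ within `text`,
 * or -1 if there is no such occurrence. */
static long
next_xsub_start(const char *text)
{
    const char *p = text;

    assert(text != NULL);

    while ((p = strstr(p, "\n\n")) != NULL) {
        char c = p[2];
        if (c != '\0' && c != ' '  && c != '\t' && c != '\n'
                      && c != '\r' && c != '\f' && c != '\v')
            return (long)(p + 2 - text);
        p++;    /* matches may overlap: "\n\n\n" contains two "\n\n" */
    }
    return -1;
}
```

In this model a blank line followed by an indented line does not end the XSUB, which is the reason for the advice to indent XSUB-scoped keywords.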

Some multi-line keywords, such as C_ARGS, are treated as just a single uninterpreted multi-line string. Others, such as OUTPUT, have a specific per-line syntax, where each line within the section is parsed. Finally, code blocks such as CODE are just copied as-is to the output C file (possibly sandwiched between #line directives to ensure that compiler error messages report from the correct location).

The XS parser doesn't recognise C comments, so don't use them apart from in C code (e.g. not in an XSUB signature). More generally, the XS parser doesn't understand C syntax or semantics; it just uses crude regexes to parse the XS file. For example the parser can handle an XSUB declaration like this:

int
foo(int a, char *b = "),")

Here, the parser just extracts out everything between (...) and splits on commas, with just enough intelligence to ignore commas etc within matching pairs of double-quotes. The parser doesn't understand C type declaration syntax; for example it typically just extracts everything before what appears to be a parameter name, and assumes that it must be the type. That "type" will later be looked up in a typemap, and if no entry is found, will only then raise an error. So it will fail to correctly parse the following parameters:

int
foo(int a /* not-a-comment */, this is seen as a type!! b)
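The "split on commas, but respect double quotes" behaviour can be modelled with a short C function. This is a sketch under stated assumptions rather than the parser's actual logic: count_params is a hypothetical name, and like the real parser it knows nothing about C comments or C type syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the comma-separated fields in `params`,
 * ignoring any commas which appear inside double-quoted strings. */
static size_t
count_params(const char *params)
{
    size_t      commas    = 0;
    int         in_quotes = 0;
    int         seen_text = 0;
    const char *p;

    assert(params != NULL);

    for (p = params; *p; p++) {
        if (*p == '\\' && in_quotes && p[1]) {
            p++;                    /* skip an escaped character, e.g. \" */
            continue;
        }
        if (*p == '"') {
            in_quotes = !in_quotes;
        }
        else if (*p == ',' && !in_quotes) {
            commas++;
            continue;
        }
        if (*p != ' ' && *p != '\t')
            seen_text = 1;
    }

    /* N separators give N+1 fields, unless the list is entirely empty. */
    return (commas || seen_text) ? commas + 1 : 0;
}
```

Both of the declarations shown above yield two fields in this model: the quoted "),", is kept within one field, while the text of the would-be comment simply becomes part of the first field.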

In addition, the XS parser has historically been very permissive, even to the point of accepting nonsense as input. Since around xsubpp releases 3.54-3.61, more things are likely to warn or raise errors during XS parsing, rather than silently generating a non-compilable C code file.

As mentioned earlier, an XSUB definition typically starts with /\n\n\S/ and continues until the next /\n\n\S/. The XSUB definition consists of a declaration, followed by an optional body. The declaration gives the function's name, parameters and return type, and is intended to mimic a C function declaration. It is usually two lines long.

The XSUB's body consists of a series of keywords. The main C code of an XSUB is specified by a CODE or PPCODE section. In the absence of this, a short body is generated automatically, which consists of a call to a C function with the same name and arguments as the XSUB. In this way, the XSUB becomes a short wrapper function between Perl and the C library function, with the wrapper handling the conversion between Perl and C arguments. This is referred to in this document as autocall.

Other keywords can be used to modify the code generated for the XSUB, or to alter how it is registered with the interpreter (e.g. adding attributes).

So that is the basic structure of an XSUB. What a real XSUB looks like will be covered later in "The Anatomy of an XSUB", but first a slight digression follows.

Overview of how data is passed to and from an XSUB

This section contains a basic background on how XSUBs are invoked, what their arguments consist of, and how XSUB arguments are passed to and from Perl. It is essentially a summary of some relevant sections within perlguts; see that document for a more detailed exploration.

Note that most of the information in this section isn't needed to create basic XSUBs; but for more complex needs or for debugging, it helps to understand what's happening behind the scenes.

Perl OPs

(This next paragraph is definitely only for background on debugging.)

An OP is a data structure within the perl interpreter. It is used to hold the nodes within a tree structure created when the perl source is compiled. It usually represents a single operation within the perl source, such as an add, or a function call. The structure has various flags and data, and a pointer to a C function (called the PP function) which is used to implement the actions of that OP. The main loop of the perl interpreter consists of calling the PP function associated with the current OP (PL_op) and then updating it, typically to PL_op->op_next.

In particular, the OP_ENTERSUB OP, via a call to pp_entersub(), performs (or at least starts) a function call, including any calls to XS functions.
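The shape of that main loop can be sketched with a toy model in plain C. None of these names (toy_op, toy_state, toy_runops and so on) exist in the perl source; the sketch only assumes, as described above, that each OP carries a function pointer and a pointer to the next OP.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the interpreter state and the OP structure. */
struct toy_state { long acc; };

struct toy_op {
    /* Plays the role of the PP function: returns the next OP to run. */
    struct toy_op *(*pp)(struct toy_op *, struct toy_state *);
    struct toy_op  *next;   /* plays the role of op_next */
    long            arg;    /* per-OP data */
};

static struct toy_op *
pp_add(struct toy_op *op, struct toy_state *st)
{
    st->acc += op->arg;
    return op->next;
}

static struct toy_op *
pp_double(struct toy_op *op, struct toy_state *st)
{
    st->acc *= 2;
    return op->next;
}

/* The main loop: call the current OP's function, then move on to
 * whichever OP it returned, until there are no more OPs. */
static long
toy_runops(struct toy_op *start)
{
    struct toy_state st = { 0 };
    struct toy_op   *op = start;

    while (op != NULL) {
        assert(op->pp != NULL);
        op = op->pp(op, &st);
    }
    return st.acc;
}

/* Build and run a three-OP chain: add 3, double, add 1. */
static long
toy_demo(void)
{
    struct toy_op c = { pp_add,    NULL, 1 };
    struct toy_op b = { pp_double, &c,   0 };
    struct toy_op a = { pp_add,    &b,   3 };

    return toy_runops(&a);
}
```

Because each function returns the next OP rather than the loop always following the next pointer, an OP such as a function call is free to redirect execution elsewhere.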

SVs and the Perl interpreter's argument stack

Almost all runtime data within the Perl interpreter, including all Perl variables, are stored in an SV structure. These SVs can hold data of many different types, including integers (IV - integer value), strings (PV - pointer value), references (RV), arrays (AV), elements of arrays, subroutines (CV - code value) etc. These will be discussed in more detail below.

Perl has an argument stack, which is a C array of SV pointers. Most of the run-time actions of the PP functions consist of pushing SV pointers onto the stack or popping them off and processing them. There is a companion mark stack, which is an array of integers which are argument stack offsets. These marks serve to delineate the stack into frames.

Consider this subroutine call:

@a = foo(1, $x);

The various OPs executed by the Perl interpreter up until the function is called will: push a mark indicating the start of a new argument stack frame; push an SV containing the integer value 1; push the SV currently associated with the variable $x; and push the *foo typeglob. Then the PP function pp_entersub() associated with the OP_ENTERSUB will pop that typeglob, extract the &foo CV from it, and see whether it is a normal CV or an XSUB CV.

For a normal Perl subroutine call, pp_entersub() will then: pop the topmost mark off the mark stack; pop the SV pointers between that mark and the top of the stack and store them in @_; then set PL_op to the first OP pointed to by the &foo CV. Those OPs will then be run by the main loop, until the OPs associated with the last statement of the function (or an explicit return) leave any return values as SV pointers on the stack.

For an XSUB sub, pp_entersub() will instead note the value of the topmost mark (but not pop it) and call the C function pointed to from the CV; this is the XSUB which has been generated by the XS parser. The XSUB itself is responsible for popping the mark stack, doing any processing of its arguments on the stack, and then pushing return values. But note that for straightforward XSUBs, this is usually all done by boilerplate code generated by the XS parser. Exactly what is done automatically and what can be overridden and handled manually if needed, is one of the themes of this document. Finally, pp_entersub() will do any post-processing of the returned values; for example discarding all but the top-most stack item if the function call was in scalar context.
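The interplay between the argument stack and the mark stack can be illustrated with a toy model in plain C. This is a sketch only: the structure and function names are hypothetical, plain long values stand in for SVs, and the real interpreter uses macros and global state rather than an explicit structure.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 32

/* Toy model: an argument stack of value pointers, plus a mark stack
 * holding offsets into the argument stack. */
struct toy_stacks {
    long *args[STACK_MAX];   /* argument stack */
    int   sp;                /* index of topmost item; -1 when empty */
    int   marks[STACK_MAX];  /* mark stack */
    int   nmarks;
};

static void
push_mark(struct toy_stacks *s)
{
    assert(s->nmarks < STACK_MAX);
    s->marks[s->nmarks++] = s->sp;  /* new frame starts just above here */
}

static void
push_arg(struct toy_stacks *s, long *v)
{
    assert(s->sp + 1 < STACK_MAX);
    s->args[++s->sp] = v;
}

/* Plays the role of an XSUB: pop the mark, sum the arguments in the
 * frame, then leave a single return value at the base of the frame. */
static void
toy_xsub_sum(struct toy_stacks *s, long *result)
{
    int mark, items, i;

    assert(s->nmarks > 0);
    mark  = s->marks[--s->nmarks];
    items = s->sp - mark;

    *result = 0;
    for (i = 1; i <= items; i++)
        *result += *s->args[mark + i];

    s->args[mark + 1] = result;     /* overwrite the first argument slot */
    s->sp = mark + 1;               /* the frame now holds one item */
}

/* Equivalent in spirit to calling sum(1, $x) with $x == 41. */
static long
toy_call_demo(void)
{
    struct toy_stacks s = { {0}, -1, {0}, 0 };
    long one = 1, x = 41, ret;

    push_mark(&s);
    push_arg(&s, &one);
    push_arg(&s, &x);
    toy_xsub_sum(&s, &ret);

    assert(s.sp == 0 && s.nmarks == 0);
    return *s.args[s.sp];
}
```

The number of arguments is never passed explicitly; the callee derives it from the difference between the stack pointer and the popped mark.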

An SV's reference count

Perl uses reference counting as its garbage collection method. One of the always-present fields in an SV is its reference count, accessible as SvREFCNT(sv).

Usually an SV's reference count is incremented each time a pointer to the SV is stored somewhere, and decremented any time such a pointer is removed. When the reference count reaches zero, any destructor associated with that SV is called, then the SV is freed. Mismanaging reference counts can lead to SVs leaking or being prematurely freed.

When relying on XS to generate all the boilerplate code, reference count bookkeeping is usually handled for you automatically. Once you start handling this yourself, then there are some specific considerations.

Functions which create a new SV, such as newSViv(i), return an SV that has an initial SvREFCNT() of one. This is actually one too high, since there are not yet any pointers to this SV stored anywhere. The expectation is that the SV will shortly be embedded somewhere - such as stored in an array - which will take "ownership" of that one count. If the program calls croak() or similar before the new SV has been embedded, then it will leak. Note that croak() can be trapped by eval(), so it's possible that croak() could be called many times, leaking each time. Note also that many things may indirectly trigger a croak(). For example accessing the value of an SV associated with a tied variable may trigger a call to its FETCH() method, which could call die. So a new SV needs to be embedded quickly.

Since such a new SV already has a reference count of one, it should be embedded in a way which doesn't increase its reference count. For example, this modifies sv to be a reference to a newly-created SV holding an integer value, i.e. the perl equivalent of $sv = \99:

sv_setrv_noinc(sv, newSViv(99));

The _noinc variant is used here as it doesn't increment the reference count of the integer-valued SV when creating a reference to it.

Where appropriate, reference counts can be adjusted with SvREFCNT_inc() and SvREFCNT_dec() and their variants.
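The bookkeeping described in this section can be modelled in a few lines of plain C. This is a toy sketch, not the real SV layout or API: the names toy_sv, toy_new, toy_inc and toy_dec are hypothetical stand-ins for an SV, a constructor such as newSViv(), and the SvREFCNT_inc() and SvREFCNT_dec() macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted value. */
struct toy_sv {
    unsigned refcnt;
    long     value;
};

static int toy_live_count = 0;  /* number of values not yet freed */

/* A newly created value starts with a reference count of one. */
static struct toy_sv *
toy_new(long value)
{
    struct toy_sv *sv = malloc(sizeof *sv);

    assert(sv != NULL);
    sv->refcnt = 1;
    sv->value  = value;
    toy_live_count++;
    return sv;
}

static struct toy_sv *
toy_inc(struct toy_sv *sv)
{
    sv->refcnt++;
    return sv;
}

static void
toy_dec(struct toy_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        toy_live_count--;       /* a destructor would run here */
        free(sv);
    }
}

/* Store the same value in two slots, then clear both slots.
 * Returns (live values after the first clear) * 10
 *       + (live values after the second clear). */
static int
toy_refcnt_demo(void)
{
    struct toy_sv *slot_a, *slot_b;
    int live_mid;

    slot_a = toy_new(99);       /* slot_a owns the initial count */
    slot_b = toy_inc(slot_a);   /* second pointer stored: count is 2 */

    toy_dec(slot_a);            /* first pointer removed: count is 1 */
    live_mid = toy_live_count;  /* still alive via slot_b */
    toy_dec(slot_b);            /* last pointer removed: freed */

    return live_mid * 10 + toy_live_count;
}
```

Note that slot_a is assigned without an increment, mirroring the way a _noinc function takes over the initial count of a new SV.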

An exception to this system is the argument stack. Pointers on the argument stack to SVs do not contribute to the reference count of that SV. The code typically generated by XS takes advantage of this. For example when ready to return a single value, the XSUB just stores a new SV pointer at the base of the current stack frame, overwriting the old value, then resets the argument stack pointer to the base of the frame plus one, and returns. All the original values on the stack are discarded, without adjusting any reference counts.

This can be a problem if the XSUB is returning a new SV. Since this SV isn't embedded anywhere apart from on the stack (which doesn't hold a reference count to it), then if the code croaks, the SV on the stack will leak. To avoid this, there is a separate temps stack in the Perl interpreter. Items on this stack are reference counted. Typically the temps stack is reset at the start of each statement, back to some particular level. Each SV above this level has its reference count decremented. Putting an SV on the temps stack is referred to as mortalising it. It is common to create a new SV and mortalise it at the same time; here are some examples:

SV *sv_99  = sv_2mortal(newSViv(99));
SV *sv_abc = newSVpvn_flags("abc", 3, SVs_TEMP);
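What mortalising achieves can be shown with a toy temps stack in plain C. This is a sketch under stated assumptions: all of the names below are hypothetical, and the real temps stack is managed by the interpreter rather than by file-scoped variables.

```c
#include <assert.h>
#include <stdlib.h>

#define TMPS_MAX 16

struct tmp_sv { unsigned refcnt; long value; };

static struct tmp_sv *tmps[TMPS_MAX];   /* the toy temps stack */
static int tmps_ix   = -1;              /* index of topmost entry */
static int tmp_freed = 0;               /* how many values were freed */

static struct tmp_sv *
tmp_new(long value)
{
    struct tmp_sv *sv = malloc(sizeof *sv);

    assert(sv != NULL);
    sv->refcnt = 1;
    sv->value  = value;
    return sv;
}

static void
tmp_dec(struct tmp_sv *sv)
{
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        tmp_freed++;
        free(sv);
    }
}

/* "Mortalise": the temps stack takes ownership of one reference count. */
static struct tmp_sv *
tmp_mortal(struct tmp_sv *sv)
{
    assert(tmps_ix + 1 < TMPS_MAX);
    tmps[++tmps_ix] = sv;
    return sv;
}

/* Reset the temps stack back to `floor`, releasing everything above it. */
static void
tmp_free_to(int floor)
{
    while (tmps_ix > floor) {
        struct tmp_sv *sv = tmps[tmps_ix--];
        tmp_dec(sv);
    }
}

/* Returns the number of values freed during the demonstration. */
static int
tmp_demo(void)
{
    int floor        = tmps_ix;     /* level at the start of the "statement" */
    int freed_before = tmp_freed;
    struct tmp_sv *kept;

    tmp_mortal(tmp_new(99));        /* never stored anywhere else */
    kept = tmp_mortal(tmp_new(7));
    kept->refcnt++;                 /* pretend it was also stored in an array */

    tmp_free_to(floor);             /* end of the "statement" */

    /* The first value is gone; the second survives with a count of one. */
    assert(kept->refcnt == 1);
    tmp_dec(kept);                  /* the "array" lets go of it too */

    return tmp_freed - freed_before;
}
```

A mortal value which nothing else has claimed is freed when the temps stack is reset, while one which has been stored elsewhere (with its count incremented) merely loses the count held by the temps stack.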

Many OPs have an SV attached to them called a PADTMP. This SV has a long lifetime which is the same as the sub which the OP is a part of (typically it is created when a named sub is compiled and freed when that sub is deleted, often at the end of the execution of the program), and usually has a reference count of one. It is used by many OPs to avoid having to create (and later free) a temporary SV to return a value. For example the ADD op in $a + $b typically extracts the integer values of its two arguments, calculates their sum, sets its PADTMP to that value and pushes it onto the stack. The OP_ENTERSUB which typically invokes an XSUB usually has a PADTMP attached to it, and when returning a value, the XSUB's boilerplate code generated by XS will usually try to use it to return the value, rather than creating a fresh mortal SV on each call.

Note that there is a highly experimental perl interpreter build option, PERL_RC_STACK, under which the argument stack is reference counted, but that is currently beyond the scope of this document.

The IV, NV etc types

An IV (integer value) is a typedef in the perl interpreter's header files that maps to a C integer. The exact integer type and size will depend on the build configuration of the interpreter. It is guaranteed to be large enough to hold a pointer. A UV is the same but unsigned. An NV (numeric value) is a floating-point value; usually a double. These types are used widely within the perl interpreter.

A PV (pointer value) "type" is often used informally within documentation and within the names of structure fields etc to refer to a string pointer (char*), but it is not actually a declared type. Similarly, RV (reference value) is informally a pointer to another SV.

There are also SSize_t and Size_t, which are large enough to hold signed and unsigned integer values representing the number of items in a C array. STRLEN is used specifically for variables which store the number of characters in a string (it is typically just an alias for Size_t).

The SV scalar value structure

(Again there is a lot of detail in this section that you may not need to know when just creating simple XSUBs, but is useful background for debugging.)

As mentioned above, almost all runtime data within the perl interpreter is stored in an SV (scalar value) structure. The head of an SV structure consists of three or four fields: a reference count; a type and flags; a pointer to a body; and since perl 5.10.0, a general payload field. There are around 17 types, and the type indicates what body (if any) is pointed to from the SV's head. The body type only indicates what sorts of data the body is capable of holding; the actual "type" of the SV (IV, NV, PV, RV etc) is mostly indicated by what flags are set.

Simple SVs may not have a body. Undefined values typically don't have one. Also, some IV, NV, and RV values are stored directly in the payload field. In this case the body pointer is faked up to point back to the head, but with a suitable offset so that an attempt to access the IV field (for example) within the "body" actually reads the IV value in the head's payload field.

For SVs which have a body, the payload field in the head is usually used to store one common value which would otherwise have to be stored in the body and require a further pointer indirection to access. For example, the char* pointer of a perl string SV is stored in the head, while the length is stored in the body.

The fields of an SV (both in the head and in the body) are usually accessed via macros, which has allowed various rearrangements of the head and body fields over the years while maintaining backwards compatibility. Always use the macros. For example, SvIVX(sv) directly accesses the IV field of the SV (which may be in the head or body depending on the SV's type). If the SV has a valid integer value, then the SVf_IOK flag will be set, which can be tested with the macro SvIOK(sv).

The body of an SV may be upgraded to a "bigger" one during the SV's lifetime, but it is not usually downgraded. For example, during the course of executing this perl code:

my $x;
$x = "1";
$u = $x + 1;
undef $x;

Initially the SV has no body and none of the SVf_IOK, SVf_NOK, SVf_POK or SVf_ROK flags are set, indicating that it has no IV, NV, PV or RV value. The complete lack of those flags indicates an undefined value. After the string is assigned to it, its body type is set to SVt_PV, and it is given the corresponding body. The string pointer and length are stored in the body (or perhaps the pointer in the payload word), and the SVf_POK flag is set, indicating that the SV holds a valid string value.

When Perl wants to use that SV as an integer, it uses a macro like SvIV(sv) to return the integer value. Unlike the direct SvIVX() macro, this first checks SvIOK(sv), and if not true, calls a function which calculates the integer value from its current string value. The effect of this call is to update the SV's type and body to SVt_PVIV, which is capable of holding both a string and an integer value, and then to set the SVf_IOK flag in addition to the SVf_POK flag.

Finally, the undef frees the string and turns off the SVf_IOK and SVf_POK flags, but leaves the body type as SVt_PVIV. (Hence why an SV's current Perl-level type should be determined by its flags, not its body type.)
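
The flag transitions described above can be observed from Perl using the core B module. The following is only a sketch: describe_sv() is a hypothetical helper written for this illustration, and the exact body types and flags reported may differ between perl versions and build configurations.

```perl
use strict;
use warnings;
use B qw(svref_2object SVf_IOK SVf_NOK SVf_POK SVf_ROK);

# Hypothetical helper (not part of any API): report the body type, as a
# B class name, and which public "OK" flags are set on the scalar that
# $ref points to.
sub describe_sv {
    my ($ref) = @_;
    my $obj   = svref_2object($ref);
    my $flags = $obj->FLAGS;
    my @names;
    push @names, "IOK" if $flags & SVf_IOK;
    push @names, "NOK" if $flags & SVf_NOK;
    push @names, "POK" if $flags & SVf_POK;
    push @names, "ROK" if $flags & SVf_ROK;
    return ref($obj) . " flags=(" . join(",", @names) . ")";
}

my $x;
my @stages = (describe_sv(\$x));     # freshly declared
$x = "1";
push @stages, describe_sv(\$x);      # after the string assignment
my $u = $x + 1;
push @stages, describe_sv(\$x);      # after being used as an integer
undef $x;
push @stages, describe_sv(\$x);      # after undef
print "$_\n" for @stages;
```

The last line printed is the interesting one: no flags are set, yet the reported class still reflects the larger body that the SV was upgraded to.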

Note that you should never directly access fields using macros like SvIVX() (the X implies direct) unless you have just tested for the corresponding flag, e.g. SvIOK(). In general, always use macros such as SvIV(), which will do any checking and conversion for you.

There is a further complication with SVs: they can have one or more items of magic attached to them. These are small payloads, along with a pointer to a jump table of pointers to functions with get/set etc actions. They are used to implement things like $1, $. and tied variables. The idea is that macros like SvIV() will first check whether the SV has get magic (using SvGMAGICAL(sv)); and if so call its get method first. For example, for a tied variable, this C-level get function will call the perl-level FETCH() method and assign the return value of that to the SV. Only then will SvIV() do its SvIOK() check.

When presented with an unknown SV, you should always process its get magic before examining the values of the SV's flags.

In total, the SvIV(sv) macro does roughly the equivalent of:

if (SvGMAGICAL(sv))
    mg_get(sv);   /* do FETCH() etc; update the SV's value / flags */
if (!SvIOK(sv))
    sv_2iv(sv);   /* convert undef to 0, "1" to 1 etc */
return SvIVX(sv); /* use the raw value */

You will see soon that XS's typemap templates mostly use high-level macros like SvIV(), so this is usually all handled automatically for you. Only if you start to do your own type conversions will you need to worry about these details.

Forgetting to test for, and to call, get magic will typically appear to work fine until the first time someone passes a tied variable or similar to your XSUB, and FETCH() doesn't get called. Accessing fields with SvPVX() etc without testing for SvPOK() first may access a field in a body which doesn't exist and possibly trigger a SEGV.

Magic should only be called once per "use"; for example if a tied scalar is passed as an argument to your XSUB, you would expect FETCH() to only be called once. Normally this is easy because you (or the typemap code) make a single SvIV() call. Occasionally you may have explicitly called mg_get() first, perhaps in order to check some flags; if so, you can skip a second magic call with variants like SvIV_nomg(). For example:

SvGETMAGIC(sv); /* this calls mg_get() if SvGMAGICAL() */
if (SvNOK(sv)) {
    /* special-case: do something with a floating-point value */
}
else {
    /* fall-back to treating it as an integer value */
    IV i = SvIV_nomg(sv);
}
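
The "once per use" expectation can be checked from the Perl side with a tied scalar that counts its FETCH() calls. This is a minimal sketch using a built-in operator rather than an XSUB; the CountingScalar class is hypothetical and exists only for this illustration. The same counting technique can be pointed at your own XSUB to confirm that it triggers get magic exactly once per argument.

```perl
use strict;
use warnings;

# Hypothetical tied-scalar class which counts how often FETCH() runs.
package CountingScalar;

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;
    return $self->{value};
}

sub STORE {
    my ($self, $value) = @_;
    $self->{value} = $value;
}

package main;

my $counter = tie my $t, 'CountingScalar', 41;
my $sum = $t + 1;    # a single "use" of $t
print "sum=$sum fetches=$counter->{fetches}\n";
```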

A Perl reference is just another type of scalar. It is indicated by SvROK() being true, and the pointer to the referent SV is accessed using SvRV().

The equivalent of SvIV() for strings is SvPV() (and variants, such as SvPVutf8):

STRLEN len;
char *pv = SvPV(sv, len);

which both retrieves a string pointer and sets len to its length. (SvPV is a macro, which is how it can update len without needing an explicit &len.) Note that there is no guarantee that after this call SvPOK(sv) is true, nor that pv == SvPVX(sv). For example, sv may be a reference to a blessed object with an overloaded stringify ("") method. In that case, behind the scenes there may be a temporary SV containing the result of the call to the method, with pv pointing to that SV's string buffer; sv remains a reference. Similarly, a non-overloaded reference to an array may return a temporary string like "ARRAY(0x12345678)".
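
The Perl-level counterpart of this behaviour is ordinary string interpolation: obtaining a string from a reference does not turn the reference into a string. A small sketch follows; the Temperature class is hypothetical and is only there to supply an overloaded "" method.

```perl
use strict;
use warnings;

# Hypothetical class whose objects stringify via an overloaded "" method.
package Temperature {
    use overload '""' => sub { my ($self) = @_; return "$self->{degrees}C" };

    sub new {
        my ($class, $degrees) = @_;
        return bless { degrees => $degrees }, $class;
    }
}

my $obj  = Temperature->new(21);
my $text = "$obj";       # calls the overloaded method
my $aref = [ 8, 9 ];
my $addr = "$aref";      # something like ARRAY(0x...)

# Both originals are still references after being stringified.
print "text=$text, obj is still a ", ref($obj), "\n";
print "addr=$addr, aref is still an ", ref($aref), " ref\n";
```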

If you need to coerce an SV to a string (e.g. before directly modifying its string buffer) then use SvPV_force() or one of its variants. For example if used on an array reference, the SV will be converted from a reference into a plain string SV with an SvPVX() value of "ARRAY(0x12345678)", and the array's reference count decremented.

Once an SV has been coerced into a PV (SvPOK(sv) is true), then SvLEN(sv) represents the size of the allocated buffer, while SvCUR(sv) represents the current length (in bytes) of the string. Note that with Unicode, SvCUR(sv) may not necessarily equal the value returned by the Perl built-in length(sv), which is the length in characters. That can be obtained using the sv_len_utf8(sv) function. See "Unicode and UTF-8" below for more details.

The SV structure can also be used to store things which aren't simple scalar values: in particular, arrays, hashes and code values. There are typedefs for AV, HV and CV structures (plus a few others). These structures are identical to SVs and can generally be used interchangeably with suitable casting, e.g. SV *ret = (SV*)av. The main feature of these non-scalar SVs is that the value of the type field in these cases, SVt_PVAV, SVt_PVHV, SVt_PVCV etc, actually does indicate the Perl type, rather than just indicating what sort of body they have.

An important thing to note is that AVs and HVs are never directly pushed onto the stack when calling and returning from subroutines and XSUBs. Instead where necessary, references (RVs) to them are pushed: either automatically via suitable typemaps, or using newRV() or similar. You will likely first spot such an error when you start getting "Bizarre copy of ..." error messages.

Unicode and UTF-8

A simple Perl string SV uses what is sometimes referred to as byte encoding: each character is represented using a single byte. But when a Perl string contains code points >= 0x100, each character of the string is stored as a variable number of bytes using the UTF-8 encoding scheme, with the SvUTF8(sv) flag being set to indicate this. Other strings may or may not be using UTF-8 encoding, depending on the history of the string. For example, with:

my $s = "A\x80";
$s .= "\x{100}";
chop $s;

the string starts off in byte encoding, with SvCUR(sv) == 2, sv_len_utf8(sv) == 2 and with each byte representing one character. When the extra character is appended, the string gets upgraded to UTF-8, with SvCUR(sv) == 5, sv_len_utf8(sv) == 3 and the second and third characters each using two bytes of storage. Once the third character is removed, the string stays in UTF-8 encoding, with SvCUR(sv) == 3, sv_len_utf8(sv) == 2 and the second character using two bytes. So such a string SV when passed to an XSUB has two possible representations; and which will be used is somewhat unpredictable.
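
The three states described above can be reproduced from Perl. In this sketch, utf8::is_utf8() reports the SvUTF8 flag, length() gives the character count (as sv_len_utf8() would), and bytes::length() gives the byte count (as SvCUR() would); state() is a hypothetical helper used only for formatting.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without enabling byte semantics

# Hypothetical helper: summarise the internal representation of a string.
sub state {
    my ($str) = @_;
    return sprintf "utf8=%d chars=%d bytes=%d",
        (utf8::is_utf8($str) ? 1 : 0), length($str), bytes::length($str);
}

my $s = "A\x80";
my $before = state($s);     # byte encoded
$s .= "\x{100}";
my $upgraded = state($s);   # upgraded to UTF-8
chop $s;
my $after = state($s);      # same characters as at the start, still UTF-8

print "$before\n$upgraded\n$after\n";
```

Note that the first and last lines describe strings which compare equal in Perl, even though their byte counts differ.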

Unfortunately XS currently has no support for UTF-8. All the standard typemap entries, such as char *, assume that the buffer of a string SV is just an array of bytes to be manipulated by the XSUB or passed on uninterpreted to a C function. If it is necessary for the XSUB to control the UTF-8 status of an argument, then it is best to declare the parameter as type SV* and do your own manipulation of it. Similarly for returning string values.

An SV's string representation can be forced to bytes using SvPVbyte() and variants; if the string contains any characters not representable in a single byte, then that call croaks with a Wide character error. Conversely, SvPVutf8() and variants will force the string to UTF-8.
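
Perl exposes roughly analogous operations as utf8::downgrade() and utf8::upgrade(). They are not the same thing as the C macros (they modify the scalar in place rather than returning a buffer pointer), but they give a quick way to see the same success and failure cases. A sketch:

```perl
use strict;
use warnings;

my $s = "A\x80";
utf8::upgrade($s);                      # force the UTF-8 representation
my $is_utf8_after_upgrade = utf8::is_utf8($s) ? 1 : 0;

utf8::downgrade($s);                    # fits in bytes, so this succeeds
my $is_utf8_after_downgrade = utf8::is_utf8($s) ? 1 : 0;

my $wide = "\x{100}";                   # not representable in one byte
my $ok   = eval { utf8::downgrade($wide); 1 };
my $err  = $@;                          # a "Wide character" error

print "upgrade: $is_utf8_after_upgrade, downgrade: $is_utf8_after_downgrade\n";
print "wide downgrade failed: $err" unless $ok;
```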

See perlunicode for more details.

The Anatomy of an XSUB

The previous section has explained how arguments are pushed onto the stack, what those arguments look like, and how XSUBs are called. We will now look at what happens inside an XSUB function once called; in particular, how it retrieves values from its arguments on the stack and later returns a value or values on the stack; and how XS and typemaps automate most of this.

This section will provide both an overview of what an XSUB looks like in XS, and what sort of C code is generated for it. The majority of the rest of this document will then describe in more detail the various parts of an XSUB mentioned here. Note that the various keywords within an XSUB's definition usually correspond closely (and in the same order) to what C code is generated for the XSUB. Most of the boilerplate code generated for an XSUB is concerned with getting argument values off the stack at the start, then returning zero or one result values on the stack at the end.

A typical XSUB definition might look like:

MODULE = Foo::Bar PACKAGE = Foo::Bar

short
baz(int a, char *b = "")
  PREINIT:
    long z = ...;
  CODE:
    ... do stuff ...;
    RETVAL = some_function(a, b, z);
  OUTPUT:
    RETVAL

The first two lines of an XSUB are its declaration, which must be preceded by a blank line. It gives the XSUB's return type, its name, and its parameters (including any default values). While it is modelled on C syntax, it is actually XS syntax (so for example /* ... */ isn't recognised). The return type and name must both start on column one, although the XS parser actually allows both to be on the same line, such as

short baz(...)

This XSUB definition will be translated into a C function whose start may look something like this (the exact details may vary across XS parser releases):

void
XS_Foo__Bar_baz(pTHX_ CV* cv)
{
    dVAR; dXSARGS;
    if (items < 1 || items > 2)
       croak_xs_usage(cv,  "a, b= \"\"");

Note that the first line of the function is actually specified using a macro such as XS_EXTERNAL(), but for explanatory purposes, what is shown above is one possible expansion of that macro, depending on the Perl version and XS configuration.

The important thing to note is that the XSUB's arguments are not passed as arguments of the C function; they are still on the Perl argument stack. Nor is the XSUB's return value returned by the C function.

The C function's name is based on the XSUB's name plus the current XS package (with s/:/_/g). Apart from debugging, you don't generally need to know this name.
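
As a rough illustration of that naming rule, the following Perl sketch derives the C name from a package and XSUB name. The helper xsub_c_name() is hypothetical and only mirrors the s/:/_/g description above; the real parser may treat other cases differently.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate the C function name generated for
# an XSUB, following the s/:/_/g rule described in the text.
sub xsub_c_name {
    my ($package, $name) = @_;
    (my $c_package = $package) =~ s/:/_/g;
    return "XS_${c_package}_$name";
}

my $c_name = xsub_c_name("Foo::Bar", "baz");
print "$c_name\n";    # XS_Foo__Bar_baz
```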

The function's parameters are the CV associated with this XSUB (i.e. &Foo::Bar::baz) and, on MULTIPLICITY/threaded builds, a pointer to the current Perl interpreter context. You won't need to directly use these most of the time.

The first few lines of code in the C function are standard boilerplate added to all XSUBs. Note that the naming convention for Perl interpreter macros is that ones starting with a d are declarations; they go in places where a variable can be declared, and typically declare one or more variables and possibly their initialisations.

dVAR is mostly a no-op; it used to be needed for some obscure Perl interpreter configurations and is still emitted for backwards compatibility.

dXSARGS pops one index off the mark stack and sets up some auto variables to allow the arguments on the stack to be accessed: specifically, the variable items is declared, which indicates how many arguments were passed, and some hidden variables are also declared which are used by the macro ST(n) to retrieve a pointer to argument n from the stack (counting from 0). The stack pointer is not actually decremented yet.

For a generic list-processing XSUB, these argument-accessing variables and macros may be used directly. But more commonly, for an XSUB which has a fixed signature (as in the example above), the parser will declare an auto C variable for each parameter, and (using the system or a user typemap) assign them values extracted from ST(0) etc. It will also declare a variable called RETVAL with the XSUB's return type (unless that is void), which is typically assigned to by the coder and then whose value is automatically returned. Continuing the example above, the generated code for the input part of the XSUB is similar to:

{
    long z = ...;
    short RETVAL;
    int   a = (int)SvIV(ST(0));
    char *b;

    if (items < 2)
        b = "";
    else
        b = (char *)SvPV_nolen(ST(1));

This consists of declarations for a, b, z and RETVAL, plus code to initialise them. The part of the code which extracts a value from an SV on the stack, such as (int)SvIV(ST(0)), is derived from a typemap entry. For a simple entry such as the one for a, the code may be added as part of the declaration of the variable itself; otherwise the initialisation may be done as a separate statement after all the variable declarations (such as for b).

Variable declarations appear in the order they appear in INPUT and PREINIT blocks, followed by RETVAL and then any parameters defined completely within the signature (i.e. which don't use an INPUT section to specify their type).

Note that INPUT sections are generally obsolete these days, and PREINIT is rarely needed. Perls before 5.36 used C89 compiler semantics, which didn't allow variable declarations after statements. CPAN modules, depending on if/how they set compiler flags, may still default to C89. To work around this, the PREINIT keyword allows you to inject additional variable declaration code early in the function.

Following on from the input part, the main body of the function is output; this is copied exactly as-is from the CODE or PPCODE section, if present. If neither is present, the parser will assume that this XSUB is just wrapping a C library function of the same name as the XSUB, and will automatically generate some code like the following:

RETVAL = baz(a, b);

The INIT and POSTCALL keywords may be used to add code just before and after the main code; typically only useful for autocall.

PPCODE is the same as CODE except that after argument processing, the stack pointer is reset to the base of the frame, and the coder becomes responsible for pushing any return values onto the stack. No further keywords can follow PPCODE. This is typically used for XSUBs which need to return a list or have other complex requirements beyond just returning a single value.

For CODE and autocall, unless the return type is void, the parser will generate code to return the value of RETVAL. This is automatic in the case of autocall, but for CODE you have to ask the parser to do so with OUTPUT: RETVAL. The code generated in either case may look something like

{
    SV *RETVALSV = sv_newmortal();
    sv_setiv(RETVALSV, (IV)RETVAL);
    ST(0) = RETVALSV;
}

A temporary SV will be created, set to the value of RETVAL (again, using a typemap template), then placed on the stack. In practice, various optimisations may be used; in particular, the PADTMP target SV which is attached to the calling OP_ENTERSUB may be used instead of allocating and freeing an SV for each call, as explained earlier.

XSUB parameters declared as OUT or OUTLIST will cause additional output code to be generated which respectively: updates the value of one of the passed arguments; or pushes the value of that parameter onto the stack (in addition to RETVAL).

Finally, (apart from PPCODE), a macro like this is added to the end of the C function:

XSRETURN(1);

This resets the stack pointer to one above the base of the frame (so the top item on the stack is ST(0)), then does return.

For a void XSUB, XSRETURN_EMPTY is used instead.

Returning Values from an XSUB

An XSUB's declared return type is typically a C type such as int or char*. XS is very good at automating this common case of returning a single C-ish value: behind the scenes it creates a temporary SV; then, using an appropriate typemap template, sets that SV to the value of RETVAL and returns that SV on the stack.

But sometimes you want to return a Perl-ish value rather than a C-ish value, for example, Perl's undef value or a Perl array reference. Or you may want to return multiple values, or update one of the passed arguments. The following subsections describe various such cases.

Note that XSUBs are somewhat like Perl lvalue subs, in that they return the actual SV to the caller, while normal Perl subs return a temporary copy of each return value. When returning a C value like int this doesn't matter, since the XSUB is returning a temporary SV anyway; but when returning your own SV, it could in theory make a visible difference. For example,

sub foo { $_[0]++ }
foo(an_xsub_which_returns_element_0_of_an_array(\@a));

would increment $a[0].

Returning undef / TRUE / FALSE / empty list

Sometimes you need to return an undefined value, e.g. to indicate failure. It's possible to return early from a CODE block with an undefined value, bypassing the normal creation of a temporary SV and the setting of its value. For example:

int
file_size(char *filename)
  CODE:
    RETVAL = file_size(filename);
    if (RETVAL == -1)
        XSRETURN_UNDEF;
  OUTPUT:
    RETVAL

The XSRETURN_UNDEF macro causes the address of the special Perl SV PL_sv_undef to be stored at ST(0) (which is the same value that the Perl function undef returns), and then makes the XSUB return immediately.

If using autocall, then you can instead return early in a POSTCALL section:

int
file_size(char *filename)
  POSTCALL:
    if (RETVAL == -1)
        XSRETURN_UNDEF;

There are similar macros

XSRETURN_YES
XSRETURN_NO
XSRETURN_EMPTY

which allow you to return Perl's true and false values, or to return an empty list.

If your XSUB will always explicitly return a special SV and won't ever require typemap conversions (e.g. it always returns via XSRETURN_YES or XSRETURN_NO), then just declare the return type as SV*.

Note that any early return from an XSUB should always be via one of the XSRETURN macros and not directly via return; the former will do any bookkeeping associated with the argument stack.

Returning an SV*

More generally, you may want to create and return an SV yourself, rather than relying on the boilerplate XSUB code to generate a temporary SV and set it to a C-ish value. Here you would declare the return type as SV*. For example:

SV*
abc(bool uc)
  CODE:
    RETVAL = newSVpv(uc ? "ABC" : "abc", 3);
  OUTPUT:
    RETVAL

There is some special processing which happens when using a return type such as SV*. First, consider that for a C return type like int, the typemap template which sets the temporary SV's value may look something like:

sv_setiv($arg, (IV)$var);

which after expansion may look like:

sv_setiv(RETVALSV, (IV)RETVAL);

where the temporary SV has previously been assigned to RETVALSV.

Now, if you declare an XSUB with a return type of SV*, you might expect the typemap template to look something like:

sv_setsv($arg, (SV*)$var);

This Perl library function copies the value of one SV to another (the XS user's equivalent of the Perl $a = $b).

However, the design decision was made that for the SV* type in particular, the typemap template would be

$arg = $var;

Here is where the special processing comes in. The XS compiler, in the case of an output template beginning $arg = ..., skips creating a temporary SV, and just returns the SV in RETVAL directly. So the typemap template would be expanded to

ST(0) = RETVAL;

This is faster than copying.

But in addition, for any $arg = ... template (not just the template for SV*), the XS compiler makes one further assumption: that the expression to the right of the assign evaluates to an SV with a reference count one too high, and so in addition, the XS compiler emits:

sv_2mortal(RETVAL);

or similar, which causes the reference count of the SV to be decremented by one at (typically) the start of the next statement. This makes sense if the SV is newly created with one of the newSVfoo() family of functions: see the discussion on this in "An SV's reference count".

However, if the SV comes from elsewhere, for example via a Perl array lookup, then its reference count doesn't need to be adjusted, and so the mortalising will cause it to be prematurely freed. In this case, you need to artificially increase the SV's reference count: typically using SvREFCNT_inc(), as shown below.

The previous example showed creating a new SV using newSVpv(); here's an example where the SV pre-exists in an array:

SV*
lookup(int i)
  CODE:
    {
        SV** svp = av_fetch(some_array_AV, i, 0);
        if (!svp)
            XSRETURN_UNDEF;
        /* compensate for the implicit mortalisation */
        RETVAL = SvREFCNT_inc(*svp);
    }
  OUTPUT:
    RETVAL

Finally, note that some very old (pre-1996) XS documentation suggested that you could return your own SV using code like:

void
foo(...)
  CODE:
    ST(0) = some_SV;

This is very wrong, as the void declaration tells the XS code to expect to return zero items on the stack. There is still some code like this in the wild, and to work around it, the XS compiler does a very special and ugly hack for a void XSUB when it sees ST(0) being assigned to within a CODE block: it pretends that the XSUB was actually declared as returning SV* and so emits XSRETURN(1) rather than XSRETURN_EMPTY. But don't rely on this: it is likely to warn eventually. If your XSUB is doing its own setting of ST(0), then always declare the return type as SV*.

The mark stack isn't used when returning arguments; instead, the caller of the XSUB (usually the OP_ENTERSUB) notes the offset of the base of the argument stack frame before calling the XSUB and the offset of the stack pointer on return, and can deduce the number of returned arguments from that.

Returning AV* etc refs

Sometimes you want to return a non-scalar SV, such as an AV, HV or CV. However, these aren't allowed directly on the argument stack. You are supposed to instead return a reference to the AV: a bit like a Perl sub returning \@foo.

The standard typemaps can create this reference for you automatically. So for example an XSUB with a return type of AV* will actually create and return an RV scalar which references the AV in RETVAL. So the XS equivalent of Perl's return [8,9] might be:

AV *
array89()
  CODE:
    RETVAL = newAV();
    /* see text below for why this line is needed */
    sv_2mortal((SV*)RETVAL);
    av_store(RETVAL, 0, newSViv(8));
    av_store(RETVAL, 1, newSViv(9));
  OUTPUT:
    RETVAL

Note that the RETVAL variable is declared as type AV*, but what is actually returned to the caller is a temporary SV which is a reference to RETVAL. The standard output typemap template for the AV* type looks like:

$arg = newRV((SV*)$var);

This means it creates a new RV which refers to the AV. Because of the rule for $arg = ... typemaps, the RV will be correctly mortalised before being returned. However, the newRV() function increments the reference count of the thing being referred to (the RETVAL AV in this case). Since the AV has just been created by newAV() with a reference count one too high, it will leak. This is why the sv_2mortal() is required. Conversely for a pre-existing AV, the mortalisation isn't required.

Since xsubpp 3.06, there are a set of alternative XS types which can be used for AVs etc which don't increment the reference count of the AV when being pointed to from the new RV. These can be enabled by mapping the AV* etc C types to these new XS types:

TYPEMAP: <<EOF
AV*   T_AVREF_REFCOUNT_FIXED
HV*   T_HVREF_REFCOUNT_FIXED
CV*   T_CVREF_REFCOUNT_FIXED
SVREF T_SVREF_REFCOUNT_FIXED
EOF

Or alternatively you could just declare the return type as SV* and handle the RV generation yourself:

SV *
create_array_ref()
  CODE:
    RETVAL = newRV_noinc((SV*)newAV());
  OUTPUT:
    RETVAL

If instead you want to return a flattened array (the equivalent of Perl's return @a) then you would have to push the elements of the array individually onto the stack in a PPCODE block. See "Returning a list" below.

Finally, the C SVREF type in the standard typemap is a way of creating and returning a reference to a scalar. This is in contrast to the SV* type, which just returns a scalar.

Note that unlike SV etc, SVREF isn't a standard built-in Perl type: it exists purely as an entry in a typemap. So in this case you have to tell the C compiler that SVREF is just another name for SV*:

typedef SV *SVREF;

Then in an XSUB like

SVREF
foo()
  CODE:
    RETVAL = newSViv(9);
    sv_2mortal(RETVAL);
  OUTPUT:
    RETVAL

RETVAL will be declared with type SVREF (hence the need for the typedef above), and the XSUB will return a reference to the RETVAL SV. This XSUB is the equivalent of the perl my $x = 9; return \$x.

Updating arguments and returning multiple values

By using the IN_OUT and similar parameter modifiers, XS provides limited support for returning extra values in addition to (or instead of) RETVAL, either by updating the values of passed arguments (OUT), or by returning some of the parameters (and pseudo-parameters) as extra return values (OUTLIST). For returning an arbitrary list of values, see the next section.

Here are a couple of simple XS examples with their approximate perl equivalents:

# Update a passed argument

void                          sub inc9 {
inc9(IN_OUT int i)                my $i = $_[0];
  CODE:                           $i += 9;
    i += 9;                       $_[0] = $i;
                              }

# Return (2*$i, 3*$i)

void                         sub mul23 {
mul23(int i, \                   my $i = $_[0];
      OUTLIST int x, \           my ($x, $y);
      OUTLIST int y)             $x = $i * 2;
  CODE:                          $y = $i * 3;
    x = i * 2;                   return $x, $y;
    y = i * 3;               }

See "Updating and returning parameter values: the IN_OUT etc keywords" for the full details.

Returning a list

If you want to return a list, i.e. an arbitrary number of items on the stack, you generally have to forgo the convenience of some of the boilerplate code generated by XS, which is biased towards returning a single value. Instead you will have to create and push the SVs yourself. The PPCODE keyword is specifically intended for this purpose. Here is a simple example which does the same as the Perl-level return 1..$n:

void
one_to_n(int n)
  PPCODE:
    {
        int i;
        if (n < 1)
            Perl_croak_nocontext(
                "one_to_n(): argument %d must be >= 1", n);
        EXTEND(SP, n);
        for (i = 1; i <= n; i++)
            mPUSHi(i);
    }

The PPCODE keyword causes the argument stack pointer to be initially reset to the base of the frame (discarding any passed arguments), and suppresses any automatic return code generation. The return type of the XSUB is ignored, except that declaring it void suppresses the declaration of a RETVAL variable.

The EXTEND() macro makes sure that there are at least that many free slots on the stack (its first argument should always be SP). The mPUSHi() macro creates a new SV, mortalises it, sets its value to the integer i, and pushes it on the stack.

Here's another example, which flattens the array passed as an argument: the equivalent of this Perl:

sub flatten { my $aref = $_[0]; @$aref }

In this example, the SVs being pushed aren't freshly created with a reference count one too high, so don't need mortalising.

void
flatten(AV *av)
  PPCODE:
    {
        int i;
        int max_ix = AvFILL(av);
        SV **svp;
        EXTEND(SP, max_ix + 1);
        for (i = 0; i <= max_ix; i++)  {
            svp = av_fetch(av, i, 0);
            PUSHs(svp ? *svp : &PL_sv_undef);
        }
    }

This function actually expects to be passed a reference to an array: the input typemap entry for AV* automatically takes care of dereferencing the argument and croaking if it's not actually a reference. The PUSHs() macro simply pushes an SV onto the stack, without any mortalising or copying. Any "holes" in the array are filled with undefs.

Note that there is an XPUSHs() macro which combines a push with an EXTEND(SP, 1); but if you know at the start how many items are to be pushed, it is more efficient to do a single large extend first.

Note also that there is always guaranteed to be one allocated slot on the stack when an XSUB is called, even if it has no arguments. So for the particular case of returning a single value, no extend is necessary.

Bootstrapping

In addition to the XS_Foo__Bar_baz() C function generated for each XSUB declaration, a boot_Foo__Bar() C function is also automatically generated, one for each XS file. This XSUB function is called once when the module is first loaded. For each declared XSUB in the file, a line similar to the following is added to the boot function:

newXS("Foo::Bar::baz", XS_Foo__Bar_baz);

(the exact details of the code will vary across releases and configurations). This call creates a CV, flags it as being an XSUB, adds a pointer from it to XS_Foo__Bar_baz(), then adds the CV to the *Foo::Bar::baz typeglob in the Perl interpreter's symbol table. It is the XS equivalent of the Perl-level

*Foo::Bar::baz  = sub { ... }
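
Whether a given sub was installed as an XSUB can be checked from Perl with the core B module. This is a sketch that assumes List::Util::sum is implemented in XS, as it is in the version bundled with perl; plain_perl_sub() is a hypothetical sub added for contrast.

```perl
use strict;
use warnings;
use B qw(svref_2object);
use List::Util ();

sub plain_perl_sub { return 1 }

# B::CV's XSUB method returns the address of the C function for an
# XSUB, or 0 for a sub which has a Perl body.
my $xs_addr   = svref_2object(\&List::Util::sum)->XSUB;
my $perl_addr = svref_2object(\&plain_perl_sub)->XSUB;

printf "List::Util::sum is %s\n", $xs_addr   ? "an XSUB" : "a Perl sub";
printf "plain_perl_sub is %s\n",  $perl_addr ? "an XSUB" : "a Perl sub";
```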

For some XSUBs, additional lines may be added by the parser to the boot XSUB to handle things like aliases or overloading.

You can add your own additional lines to the boot XSUB using the BOOT keyword.

A typical Perl module like Foo/Bar.pm should have code in it similar to:

package Foo::Bar;
our $VERSION = '1.01';
require XSLoader;
XSLoader::load();

This causes the Bar.so or Bar.dll file to be dynamically linked in and then the boot_Foo__Bar() function to be called. This boilerplate code is typically created automatically with h2xs when you first create the skeleton of a new distribution. See perlxstut for more details.

REFERENCE MANUAL

This part of the document explains what each XS keyword does. They are arranged in the approximate order in which they might appear within an XS file, and then might appear within an XSUB declaration. Related keywords are grouped together.

The MODULE Declaration

MODULE = Foo::Bar  PACKAGE = Foo::Bar
MODULE = Foo::Bar  PACKAGE = Foo::Bar::Baz
MODULE = Foo::Bar  PACKAGE = Foo::Bar      PREFIX = foobar_

The MODULE keyword is used to start the XS half of the file, and to specify the package of the functions which are being defined. The MODULE keyword must start on column one. All text preceding the first MODULE keyword is considered C code and is passed through to the output with POD stripped, but otherwise untouched.

It is usually necessary to include a blank line before each MODULE declaration.

For the first such declaration, the MODULE and PACKAGE values are typically the same. In subsequent entries, the PACKAGE value varies, while the MODULE value is kept unchanged. In fact, only the MODULE value from the last such declaration is used, and specifies the name of the boot XSUB which is called when the module is loaded (typically via use Foo::Bar).

The value of the PACKAGE keyword is analogous to the Perl package keyword, and determines which package any subsequent XSUBs will be created in. It is permissible to have the same PACKAGE value appear more than once, again similarly to Perl.

In theory the PACKAGE keyword is optional, and defaults to ''. This means that any subsequent XSUBs will be placed in the main:: package. In practice, you should always specify the package.

The optional PREFIX value is stripped from the XSUB's name when generating the XSUB's Perl name. It is typically used to simplify creating autocall XSUBs. It addresses the issue that while Perl has package names, C only has function name prefixes. Consider a C library called foobar, which has functions such as foobar_read() and foobar_write(). We want to make these accessible from a Perl module called Foo::Bar. In the presence of PREFIX = foobar_, any such prefix of each XSUB name will be stripped off when determining the XSUB's Perl name. For example:

MODULE = Foo::Bar  PACKAGE = Foo::Bar  PREFIX = foobar_

char* foobar_read(int n)

int   foobar_write(char *text, int n)

This will insert two XSUBs into the Perl namespace, called Foo::Bar::read() and Foo::Bar::write(), which when called, will themselves call the C functions foobar_read() and foobar_write().
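The name mapping can be pictured with a small C sketch. The real work is done by the XS parser, which is written in Perl; the helper perl_name_for() below is hypothetical and only illustrates the rule that a leading PREFIX, when present, is removed, while a name which doesn't start with the prefix is left unchanged.

```c
#include <string.h>

/* Hypothetical helper (not part of any Perl API): return the part of
 * c_name that follows prefix, or c_name itself if it doesn't start
 * with prefix. */
const char *
perl_name_for(const char *c_name, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(c_name, prefix, n) == 0)
        return c_name + n;   /* prefix matched: skip over it */
    return c_name;           /* no prefix: name is unchanged */
}
```

With a prefix of "foobar_", the name "foobar_read" maps to "read", which is then qualified with the current PACKAGE to give Foo::Bar::read.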

File-scoped XS Keywords and Directives

After the first MODULE keyword, everything else in the file consists of XSUB definitions, plus anything that comes between the XSUBs. The XSUBs will be explained further down, but this section addresses the in-between stuff, which can consist of any of the following.

The following file-scoped keywords are supported. Note that SCOPE can technically be a file-scoped keyword too, but is described further down as an XSUB keyword.

The REQUIRE: Keyword

REQUIRE: 3.58

The REQUIRE keyword is used to indicate the minimum version of the ExtUtils::ParseXS XS compiler (and its xsubpp wrapper) needed to compile the XS module. It is expected to be a floating-point number of the form /\d+\.\d+/. It is analogous to the way that use v5.xx indicates the minimum version of the perl interpreter needed in a Perl program.

The VERSIONCHECK: Keyword

VERSIONCHECK: DISABLE | ENABLE

Version checking (enabled by default) checks that the version compiled into the .so or .dll file matches the .pm file's $VERSION value, and if not, dies with an error message like:

Foo::Bar object version 1.03 does not match bootstrap parameter 1.04

Typically, when a module is built for the first time, the value of the $VERSION variable in the .pm file is copied to the generated Makefile as XS_VERSION, and from there, via a -DXS_VERSION=... compiler option, is baked into the boot XSUB. When the module is loaded and the boot code called, the versions are compared, and it croaks if there's a mismatch. This usually indicates that the .so and .pm files are from different installs: for example someone copied over a more recent version of the .pm file but forgot to copy or rebuild the .so.

If the version of the PM module is a floating point number, it will be stringified before the comparison, with a possible loss of precision (currently chopping to nine decimal places), so it may not match the version of the XS module any more. Quoting the $VERSION declaration to make it a string is recommended if long version numbers are used.

There is rarely any good reason to disable this check.

Note that this module version checking is completely unrelated to the REQUIRE keyword, which is a check against the version of the XS compiler.

The VERSIONCHECK keyword corresponds to xsubpp's -versioncheck and -noversioncheck options. This keyword overrides the command line options.

The PROTOTYPES: Keyword

PROTOTYPES: DISABLE | ENABLE

When prototypes are enabled (they are disabled by default), any subsequent XSUBs will be given a Perl prototype. The prototype string is usually generated from the XSUB's parameter list. This keyword may be used multiple times in an XS module to enable and disable prototypes for different parts of the module.

For example, these two XSUB declarations:

int add1(int a, int b)

PROTOTYPES: ENABLE

int add2(int a, int b)

behave similarly to the Perl-level:

sub add1     { ... }
sub add2($$) { ... }

Note also that prototypes can be overridden on a per-XSUB basis with the XSUB-level PROTOTYPE keyword.

In general, XSUB prototypes (similarly to perl sub prototypes) are of very limited use and are typically only used to mimic the behaviour of Perl builtins. For example there is no way to implement a push @a, ...; style function without a way of telling the Perl interpreter not to flatten @a. Outside of these narrow uses, it is generally a mistake to use prototypes.

In the early days of XS it was thought that using prototypes was probably a Good Thing, and prototypes were enabled by default. This was soon changed to disabled by default, and a warning was added if you haven't explicitly indicated your preference: so in the absence of any PROTOTYPES keyword, you will get this nagging warning:

Please specify prototyping behavior for Foo.xs (see perlxs manual)

So 99% of the time you will want to add

PROTOTYPES: DISABLE

to the start of the XS half of your .xs file.

The PROTOTYPES keyword corresponds to xsubpp's -prototypes and -noprototypes options.

See "Prototypes" in perlsub for more information about Perl prototypes.

The EXPORT_XSUB_SYMBOLS: Keyword

EXPORT_XSUB_SYMBOLS: ENABLE | DISABLE

This keyword has been available since xsubpp 3.04, and is disabled by default.

Before 3.04, the C function which implemented an XSUB was exported. Since 3.04, it is declared static by default. The old behaviour can be restored by enabling this keyword. You are very unlikely to have a need for it.

The INCLUDE: Keyword

INCLUDE: const-xs.inc
INCLUDE: some_command |

This keyword can be used to pull in the contents of another file to the "XS" part of an XS file. Unlike a top-level XS file, included files don't have a "C" first half, and the entire contents of the file are treated as XS, as if it had all been inserted at that line.

One common use of INCLUDE is to include constant definitions generated by ExtUtils::Constant.

If the parameters to the INCLUDE keyword are followed by a pipe (|) then the XS parser will interpret the parameters as a command. This feature is mildly deprecated in favour of the INCLUDE_COMMAND: directive, as documented below. The latter can be used to ensure that the perl (if any) used in the command is the same as the one running the XS parser.

The INCLUDE_COMMAND: Keyword

INCLUDE_COMMAND: $^X -e '...'

Since xsubpp 2.2205.

Similar to INCLUDE: some_command| except that the | is implicit, and it converts the special token $^X, if present, to the path of the perl interpreter which is running the XS parser.

The TYPEMAP: Keyword

TYPEMAP: <<EOF
myint  T_MYIV
INPUT
    T_MYIV
        $var = ($type)my_SvIV($arg)
OUTPUT
    T_MYIV
        my_sv_setiv($arg, (IV)$var);
EOF

Since xsubpp 3.01.

Typemaps are mappings and code templates which allow the XS parser to automatically generate code snippets which convert between Perl and C values. The TYPEMAP keyword can be used to embed typemap declarations directly into your XS code, instead of (or in addition to) typemaps in a separate file. Multiple such embedded typemaps will be processed in order of appearance in the XS code. Typemaps are processed in the order:

The most recently applied entries take precedence, so for example you can use TYPEMAP: to individually override specific TYPEMAP, INPUT, or OUTPUT entries in the system typemap. In general, typemap changes affect any subsequent XSUBs within the file, until further updates.

Note however that, due to a quirk in parsing, it is possible for a TYPEMAP: block which comes immediately after an XSUB to affect any entries used by that XSUB, as if the block had appeared just before the XSUB. If all such typemap blocks are placed near the start of an XS file, then this won't be an issue. Indeed, it can only be a possible issue if you want typemap meanings to change during the course of an XS file (which is rare).

The TYPEMAP keyword syntax is intended to mimic Perl's "heredoc" syntax, and the keyword must be followed by one of these three forms:

<<  FOO
<< 'FOO'
<< "FOO"

where FOO can be just about any sequence of characters, which must be matched at the start of a subsequent line.

See "Using Typemaps" and perlxstypemap for more details on writing typemaps.

The BOOT: Keyword

BOOT:
    # Print a message when the module is loaded
    printf("Hello from the bootstrap!\n");

The BOOT keyword is used to add code to the extension's bootstrap function. This function is generated by the XS parser and normally holds the statements necessary to register any XSUBs with Perl. It is usually called once, at use Foo::Bar time.

This keyword should appear on a line by itself. All subsequent lines will be interpreted as lines of C code to pass through, including C preprocessor directives, but excluding POD and # comments; until the next keyword or possible start of a new XSUB (/\n\n\S/).

The FALLBACK: Keyword

MODULE = Foo PACKAGE = Foo::Bar

FALLBACK: TRUE | FALSE | UNDEF

Since xsubpp 2.09_01.

It defaults to UNDEF for each package. It sets the default fallback handling behaviour for overloaded methods in the current package (i.e. Foo::Bar in the example above). It is analogous to the Perl-level:

package Foo::Bar;
use overload "fallback" => 1 | 0 | undef;

It only has any effect if there ends up being at least one XSUB in the current package with the OVERLOAD keyword present. See "fallback" in overload for more details.

The Structure of an XSUB

Following any file-scoped XS keywords and directives, an XSUB may appear. The start of an XSUB is usually indicated by a blank line followed by something starting on column one which isn't otherwise recognised as an XSUB keyword or file-scoped directive.

An XSUB definition consists of a declaration (typically two lines), followed by an optional body. The declaration specifies the XSUB's name, parameters and return type. The body consists of sections started by keywords, which may specify how its parameters and any return value should be processed, and what the main C code body of the XSUB consists of. Other keywords can change the behaviour of the XSUB, or affect how it is registered with Perl, e.g. with extra named aliases. In the absence of an explicit main C code body specified by the CODE or PPCODE keywords, the parser will generate a body automatically; this is referred to as autocall in this document.

Nothing can appear between keyword sections apart from POD, XS comments, and trailing blank lines, all of which are stripped out before the main parsing takes place. Anything else will either raise an error, or be interpreted as the start of a new XSUB.

An XSUB's body can be thought of as having up to five parts. These are, in order of appearance, the Input, Init, Code, Output and Cleanup parts. There is no formal syntax to define this structure; it's just an understanding that certain keywords may only appear in certain parts and thus may only appear after certain other keywords etc.

An XSUB Declaration

# A simple declaration:

int
foo1(int i, char *s)

# All on one line; plus a default parameter value:

int foo2(int i, char *s = "")

# Complex parameters; plus variable argument count:

int
foo3(OUT int i, IN_OUTLIST char *s, STRLEN length(s), ...)

# No automatic argument processing:

void
foo4(...)
    PPCODE:

# C++ method; plus various return type qualifiers:

NO_OUTPUT extern "C" static int
X::Y::foo5(int i, char *s) const

An XSUB declaration consists of a return type, name, parameters, and optional NO_OUTPUT, extern "C", static and const keywords.

An XSUB's return type and the NO_OUTPUT keyword

The return type can be any valid C type, including void. When non-void, it serves two purposes. First, it causes a C auto variable of that type to be declared, called RETVAL. Second, it (usually) makes the XSUB return a single SV whose value is set to RETVAL's value at the time of return. In addition, a non-void autocall XSUB will call the underlying C library function and assign its return value to RETVAL.

In addition the return type can be a Perl package name; see "Fully-qualified type names and Perl objects" for details.

If the return type is prefixed with the NO_OUTPUT keyword, then the RETVAL variable is still declared, but code to return its value is suppressed. It is typically useful when making an autocall function interface more Perl-like, especially when the C return value is just an error condition indicator. For example,

NO_OUTPUT int
delete_file(char *name)
  # implicit autocall code here: RETVAL = delete_file(name);
  POSTCALL:
    if (RETVAL != 0)
        croak("Error %d while deleting file '%s'", RETVAL, name);

Here the generated XS function returns nothing on success, and will die() with a meaningful error message on error. The XSUB's return type of int is only meaningful for declaring RETVAL and for doing the autocall.

The return type can also include the extern "C" and static modifiers, which if present must be in that order, and come between any NO_OUTPUT keyword and the return type. The extern declaration must be written exactly as shown, i.e. with a single space and with double quotes around the C. These two modifiers are mainly of use for XSUBs written in C++. A C++ XSUB declaration is also allowed to have a trailing const keyword, which mimics the C++ syntax. See "Using XS With C++" for more details.

An XSUB's name

The name of the XSUB is usually put on the line following the return type, in which case it must be on column one. It is permissible for both the type and name to be on the same line.

The name can be any valid Perl subroutine name. The PACKAGE value from the most recent MODULE declaration is used to give the XSUB its fully-qualified Perl name.

If the name includes the package separator, ::, then it is treated as a C++ method declaration, and various extra bits of processing take place, such as declaring an implicit THIS parameter. The XSUB's Perl package name is still determined by the current XS package, and not the C++ class name. See "Using XS With C++" for more details.

An XSUB's parameter list

Following the XSUB's name, there is a comma-separated list of parameters within parentheses. Although this looks superficially the same as a C function declaration, it is different. In particular, it is parsed by the XS compiler, which is a simple regex-based text processor and which doesn't understand the full C type syntax; nor does it recognise C-style comments.

In fact all it does is extract the text between the (...) and split on commas, while having enough intelligence to ignore commas and a closing parenthesis within a quoted string. Once each parameter declaration is extracted, it is processed, as described below in "An XSUB Parameter".

Each parameter declaration usually generates a C auto variable declaration of the same name, along with initialisation code which assigns the value of the corresponding passed argument to that variable. Under some circumstances code can also be generated to return the value too.

Note that the original XS syntax required the type for each parameter to be specified separately in one or more INPUT sections, mimicking pre-C89 "K&R" C syntax. To support this, directly after the declaration there is an implicit INPUT section, without a need to include the actual keyword. You will see this pattern very frequently in older XS code.

Old style with an implicit INPUT keyword (a common pattern):

int
foo(a, b)
    long  a
    char *b
  CODE:
    ...

Old style with explicit INPUT keyword (unusual):

int
foo(a, b)
  INPUT:
    long  a
    char *b
  CODE:
    ...

New style (recommended for new code):

int
foo(long a, char *b)
  CODE:
    ...

Generally there is no reason to use the old style any more, apart from a few obscure features that can be specified on an INPUT line but not in the signature.

An XSUB Parameter

Some examples of valid XSUB parameter declarations:

char *foo             # parameter with type
Foo::Bar foo          # parameter with Perl package type
char *foo = "abc"     # default value
char *foo = NO_INIT   # doesn't complain if arg missing
OUT char *foo         # caller's arg gets updated
IN_OUTLIST char *foo  # parameter value gets returned
int length(foo)       # pseudo-parameter that gets the length of foo
foo                   # placeholder, or parameter without type
SV*                   # placeholder
...                   # ellipsis: zero or more further arguments

The most straightforward type of declaration in an XSUB's parameter list consists of just a C type followed by a parameter name, such as char *foo. This has two main effects. First, it causes a C auto variable of that name to be declared; and second, the variable is initialised to the value of the passed argument which corresponds to that parameter. For example,

void
foo(int i, char *s)

is roughly equivalent to the Perl:

sub foo {
    my $i = int($_[0]);
    my $s = "$_[1]";
    ...
}

and the generated C code may look something like:

if (items != 2)
   croak_xs_usage(cv,  "i, s");

{
    int   i = (int)SvIV(ST(0));
    char *s = (char *)SvPV_nolen(ST(1));
    foo(i, s); /* autocall */
    ...
}

In addition to the variable declaration and initialisation, the name of the parameter will usually be used in the usage message and in any autocall, as shown above. These variables are accessible for any user code in a CODE block or similar. Their values aren't normally returned.

There are several variations on this basic pattern, which are explained in the following subsections.

Fully-qualified type names and Perl objects

Foo::Bar
foo(Foo::Bar self, ...)

Normally the type of an XSUB's parameter or return value is a valid C type, such as "char *". However you can also use Perl package names. When a type name includes a colon, it undergoes some extra processing; in particular, the actual type as emitted into the C file is transformed using s/:/_/g (unless xsubpp has been invoked with -hiertype), so that a legal C type is present. The complete effects for a type of Foo::Bar are as follows.

The type string Foo::Bar is looked up in the typemap as-is to find the logical XS type; then the INPUT and OUTPUT typemap templates are expanded with the $ntype variable set to "Foo::Bar" and the $type variable set to "Foo__Bar". The declaration of the corresponding auto variables uses the modified type string, so the example above might result in these declarations in the C code:

Foo__Bar  RETVAL;
Foo__Bar  self   = ...;
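As a rough illustration of the s/:/_/g step, the following C sketch performs the same character substitution. The XS parser itself does this in Perl; the function name mangle_type_name() and its buffer-based interface are assumptions made only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst, replacing every ':' with
 * '_' (the equivalent of s/:/_/g). Returns 0 on success, or -1 if
 * dst (of size dstlen bytes) is too small to hold the result and
 * its terminating NUL. */
int
mangle_type_name(const char *src, char *dst, size_t dstlen)
{
    size_t i;

    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dstlen)
            return -1;
        dst[i] = (src[i] == ':') ? '_' : src[i];
    }
    if (i >= dstlen)
        return -1;
    dst[i] = '\0';
    return 0;
}
```

Given "Foo::Bar", this produces "Foo__Bar": each of the two colons becomes its own underscore, which is why the C type name contains a double underscore.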

With the appropriate XS typemap entries and C typedefs, this can be used to assist in declaring XSUBs which are passed and return Perl objects. See "T_PTROBJ and opaque handles" for an example of this using the common T_PTROBJ typemap type.

Note that any check on whether the passed Perl object is of the correct class is down to the implementation in the particular typemap: for example, T_PTROBJ will croak unless the passed SV argument is blessed into the Foo::Bar or derived class.

XSUB Parameter Placeholders

Sometimes you want to skip an argument. There are two supported techniques for efficiently declaring a placeholder. Both of these will completely skip any declaration and initialisation of a C auto variable, but will still consume an argument.

A bare parameter name is treated as a placeholder if it has no type specified: neither in the signature, nor in any following INPUT section. For example:

void
foo(int a, b, char *c)
  CODE:
    ...

is roughly equivalent to the Perl:

sub foo {
    my $a = int($_[0]);
    my $c = "$_[2]";
    ...
}

A parameter containing just the specific type SV* and no name is treated specially. A bug in the XS parser meant that it used to skip any parameter declaration which wasn't parsable. This inadvertently made many things de facto placeholder declarations. A common usage was SV*, which is now officially treated as a placeholder for backwards compatibility. Any other bare types without a parameter name are errors since xsubpp 3.57. Note that the SV* text will appear in any usage() error message. For example,

void
foo(int a, SV*, char *c)

may croak with:

Usage: Foo::Bar::foo(a, SV*, c) at ...

Placeholders can't be used with autocall unless you use C_ARGS to override the missing argument. For example:

void
foo(int a, b, char *c)
    C_ARGS: a, c

Updating and returning parameter values: the IN_OUT etc keywords

IN         int i
           int i
IN_OUT     int i
IN_OUTLIST int i
OUT        int i
OUTLIST    int i

Normally a parameter declaration causes a C auto variable of the same name to be declared and initialised to the value of the corresponding passed argument. These modifiers can make parameters also update or return values, and can also cause the initialisation to be skipped. They come at the start of a parameter declaration.

The basic XSUB syntax is designed around the simplest kind of C function: one which takes a fixed number of read-only parameters and returns a single value. These modifiers address functions which don't fit that pattern.

The usual way to make a more complex C function API is to pass pointers to variables, which the C function will use to set or update the variables. For example, a couple of hypothetical C functions might be called as:

int time = ....;      // an integer in the range 0..86399
int hour, min, sec;
parse_time(time, &hour, &min, &sec); // set    hour, min, sec
increment_time(&hour, &min, &sec);   // update hour, min, sec

The XS IN_OUT etc modifiers allow you to write XSUBs which can wrap such functions with autocall, and in general update passed arguments or return multiple values.
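To make the running example concrete, here is a minimal sketch of how the two hypothetical C functions might be implemented. The bodies are assumptions for illustration only; all that matters for the XS discussion is that both functions communicate results through pointer parameters.

```c
/* Hypothetical implementation: split a count of seconds since
 * midnight (0..86399) into hours, minutes and seconds, storing the
 * results via the three pointers. */
void
parse_time(int time, int *hour, int *min, int *sec)
{
    *hour = time / 3600;
    *min  = (time % 3600) / 60;
    *sec  = time % 60;
}

/* Hypothetical implementation: advance the time by one second,
 * updating the three values in place and wrapping at midnight. */
void
increment_time(int *hour, int *min, int *sec)
{
    if (++*sec == 60) {
        *sec = 0;
        if (++*min == 60) {
            *min = 0;
            if (++*hour == 24)
                *hour = 0;
        }
    }
}
```

Here parse_time() only writes through its pointers, which corresponds to OUT or OUTLIST parameters, while increment_time() both reads and writes them, which corresponds to IN_OUT or IN_OUTLIST.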

The rules of these parameter modifiers are:

The approximate perl equivalents for these modifiers are given in the examples below, where the perl code real_foo(\$i) stands in for the C autocall foo(&i).

IN int i             sub foo {
   int i                  my $i = $_[N];
                          real_foo($i);
                     }

IN_OUT int i         sub foo {
                          my $i = $_[N];
                          real_foo(\$i);
                          $_[N] = $i;
                     }

IN_OUTLIST int i     sub foo {
                          my $i = $_[N];
                          real_foo(\$i);
                          return ..., $i, ...;
                     }

OUT int i            sub foo {
                          my $i;
                          real_foo(\$i);
                          $_[N] = $i;
                     }

OUTLIST int i        sub foo {
                          my $i; # NB $_[N] is not consumed
                          real_foo(\$i);
                          return ..., $i, ...;
                     }

Together, they allow you to wrap C functions which use pointers to return extra values; either preserving the C-ish API in perl (OUT), or providing a more Perl-like API (OUTLIST). For example, wrapping the parse_time() function from the example above could be done using OUT:

void
parse_time(int time, \
           OUT int hour, OUT int min, OUT int sec)

which could be called from perl as:

my ($hour, $min, $sec);
# set ($hour, $min, $sec) to (23,59,59):
parse_time(86399, $hour, $min, $sec);

Or by using OUTLIST:

void
parse_time(int time, \
           OUTLIST int hour, OUTLIST int min, OUTLIST int sec)

which could be called from perl as:

# set ($hour, $min, $sec) to (23,59,59):
my ($hour, $min, $sec) = parse_time(86399);

Default Parameter Values

int
foo(int i, char *s = "abc")

int
bar(int i, int j = i + ')', char *s = "abc,)")

int
baz(int i, char *s = NO_INIT)

Optional parameters can be indicated by appending = C_expression to the parameter declaration. The C expression will be evaluated if not enough arguments are supplied. Parameters with default values should come after any mandatory parameters (although this is currently not enforced by the XS compiler). The value can be any valid compile-time or run-time C expression (but see below), including the values of any parameters declared to its left. The special value NO_INIT indicates that the parameter is kept uninitialised if there isn't a corresponding argument.

The XS parser's handling of default expressions is rather simplistic. It just wants to extract parameter declarations (including any optional trailing default value) from a comma-separated list, but it doesn't understand C syntax. It can handle commas and closing parentheses within a quoted string, but currently not an escaped quote such as '\'' or "\"". Neither can it handle balanced parentheses such as int j = (i+1).

Due to an implementation flaw, default value expressions are currently evalled in double-quoted context during parsing, in a similar fashion to typemap templates. So for example char *s = "$arg" is expanded to char *s = "ST(0)" or similar. This behaviour may be fixed at some point; in the meantime, it is best to avoid the $ and @ characters within default value expressions.

The length(param_name) pseudo-parameter

int
foo(char *s, int length(s))

It is common for a C function to take a string pointer and length as two arguments, while in Perl, string-valued SVs combine both the string and length in a single value. To simplify generating the autocall code in such situations, the length(foo) pseudo-parameter acts as the length of the parameter foo. It doesn't consume an argument or appear in the XSUB's usage message, but it is passed to the autocalled C function. For example, this XS:

void
foo(char *s, short length(s), int t)

translates to something similar to this C code:

if (items != 2)
   croak_xs_usage(cv, "s, t");

{
    STRLEN STRLEN_length_of_s;
    short  XSauto_length_of_s;
    char * s = (char *)SvPV(ST(0), STRLEN_length_of_s);
    int    t = (int)SvIV(ST(1));

    XSauto_length_of_s = STRLEN_length_of_s;
    foo(s, XSauto_length_of_s, t);
}

and might be called from Perl as:

foo("abcd", 9999);

The exact C code generated will vary over releases, but the important things to note are:

Ellipsis: variable-length parameter lists

int
foo(char *s, ...)

An XSUB can have a variable-length parameter list by specifying an ellipsis as the last parameter, similar to C function declarations. Its main effect is to disable the error check for too many parameters. Any declared parameters will still be processed as normal, but the programmer will have to access any extra arguments manually, making use of the ST(n) macro to access the nth item on the stack, and the items variable, which indicates the total number of passed arguments, including any fixed arguments. Note that ST(0) is the first passed argument, while the first ellipsis argument is ST(i) where i is the number of fixed arguments preceding the ellipsis.

Note that currently XS doesn't provide any mechanism to autocall variable-length C functions, so the ellipsis should only be used on XSUBs which have a body.

For example, consider this Perl subroutine which returns the sum of all of its arguments which are within a specified range:

sub minmax_sum {
    my $min = shift;
    my $max = shift;
    my $RETVAL = 0;
    $RETVAL += $_ for grep { $min <= $_ && $_ <= $max } @_;
    return $RETVAL;
}

This XSUB provides equivalent functionality:

int
minmax_sum(int min, int max, ...)
  CODE:
    {
        int i = 2; /* skip the two fixed arguments */
        RETVAL = 0;

        for (; i < items; i++) {
            int val  = (int)SvIV(ST(i));
            if (min <= val && val <= max)
                RETVAL += val;
        }
    }
  OUTPUT:
    RETVAL

It is possible to write an XSUB which both accepts and returns a list. For example, this XSUB does the equivalent of the Perl map { $_*3 } ...

void
triple(...)
  PPCODE:
    SP += items;
    {
        int i;
        for (i = 0; i < items; i++) {
            int val  = (int)SvIV(ST(i));
            ST(i) = sv_2mortal(newSViv(val*3));
        }
    }

Note that the PPCODE keyword, in comparison to CODE, resets the local copy of the argument stack pointer, and relies on the coder to place any return values on the stack. The example above reclaims the passed arguments by setting SP back to the top of the stack, then replaces the items on the stack one by one.

The XSUB Input Part

Following an XSUB's declaration part, the body of the XSUB follows. The first part of the body is the input part, and is mainly concerned with declaring auto variables and assigning to them values extracted from the passed parameters. The two main keywords associated with this activity are PREINIT and INPUT. The first allows you to inject extra variable declaration lines, while the latter used to be needed to specify the type of each parameter, but is now mainly of historical interest. This is also the place for the rarely-used SCOPE keyword.

Note that the keywords described in "XSUB Generic Keywords" and "Sharing XSUB bodies" may also appear in this part, plus the C_ARGS and INTERFACE_MACRO keywords.

The PREINIT: Keyword

PREINIT:
    int i;
    char *prog_name = get_prog_name();

This keyword allows extra variables to be declared and possibly initialised immediately before the declarations of auto variables generated from any parameter declarations or INPUT lines. Any lines following PREINIT until the next keyword (except POD and XS comments) are copied out as-is to the C code file. Multiple PREINIT keywords are allowed.

It is sometimes needed because in traditional C, all variable declarations must come before any statements. While this is no longer a restriction in the perl interpreter source since Perl 5.36.0, the C compiler flags used when compiling XS code may be different and so, depending on the compiler, it may still be necessary to preserve the correct ordering.

Any variable declarations generated by INPUT and lines from PREINIT are output in the same order they appear in the XS source, followed by any variable declarations generated from the XSUB's parameter declarations. These may be followed by statements to initialise those variables. Thus, any variable declarations in a later INIT or CODE block may be flagged as a declaration-after-statement.

PREINIT code shouldn't assume that any variables declared earlier have already been initialised; initialisation is deferred if the initialisation code (typically obtained from a typemap) isn't of the simple form "type var = init;", or if the parameter has a default value.

For example:

void
foo(int i = 0)
    PREINIT:
        int j = 1;
    CODE:
        bar(i, j);

might be translated into C code similar to:

{
    int j = 1;
    int i;

    if (items < 1)
        i = 0;
    else {
        i = (int)SvIV(ST(0));
    }
    bar(i, j);
}

Usually you could dispense with PREINIT by just wrapping the code in CODE blocks in braces, but it may be necessary if the ordering of the variable initialisations is sensitive, e.g. if it is affected by some changing global state.
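The brace-wrapping technique relies on an ordinary C rule: a nested block may begin with its own declarations, even under compilers which reject declarations after statements. The following plain C sketch shows the idea outside of XS; the function scaled_sum() is hypothetical.

```c
/* Hypothetical example: add two numbers, then double the result. */
int
scaled_sum(int a, int b)
{
    int total;          /* declarations come first (C89 ordering) */

    total = a + b;      /* first statement */
    {
        /* A nested block can start with fresh declarations, so this
         * is not a declaration-after-statement. */
        int factor = 2;
        total *= factor;
    }
    return total;
}
```

Within an XSUB, the same effect is obtained by enclosing the contents of a CODE block in a pair of braces and declaring any extra variables at the top of that block.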

The INPUT: Keyword

void
foo(a, b, c, d, e, int f)
  # implicit INPUT section
    int   a
  # explicit INPUT section
  INPUT:
    long  &b
    int    c = ($type)MySvIV($arg)
    int    d = NO_INIT
    int    e + if (some_condition) { $var += 1 }
  ...

Immediately following an XSUB's declaration, there is an implicit INPUT section, i.e. the parser behaves as if there was a literal "INPUT:\n" line injected before the first line of the body. This can be followed by zero or more explicit INPUT sections, possibly interleaved with other keywords and sections such as PREINIT.

When XS was first created, it was modelled on the syntax of pre-ANSI C, which required the types of parameters to be separately specified. It was later updated to allow parameter types to be specified in the parameter list, like ANSI C. Thus there is rarely any good reason to use INPUT sections now; but you will often encounter them in older code.

Each INPUT line, at a minimum, specifies the type of a parameter listed in the XSUB's signature, e.g.

char *s

In addition, the variable name may be prefixed with & to indicate that a pointer to the variable should be passed to any autocall function; and may have a postfix initialisation modifier starting with one of the three characters = + ;.

Note that if a variable name doesn't match any of the declared parameters, then it might be treated as an auto variable declaration (depending on the perl version and on whether it has an initialisation override). This misfeature may be deprecated at some point in the future, so don't rely on it: use a PREINIT section if necessary. These two examples are mostly equivalent, with the first form being preferred:

void
foo(int a)
  PREINIT:
     short b = 1;

void
foo(a)
     int a
     short b = 1;

The & variable modifier in INPUT

The & variable modifier has the single effect that the corresponding argument passed to an autocall function will have the variable name prefixed with &. Combined with OUTPUT: foo, this allows the address of a variable to be passed to a wrapped function, which updates that variable's value; on return, the XSUB updates the caller's arg with that value. The modern equivalent of this is to declare the parameter as IN_OUT. These two XSUBs are equivalent:

void
foo(IN_OUT int i)

void
foo(i)
    int &i
  OUTPUT:
    i

and they both wrap a C function called foo() that takes a single int * argument which (presumably) updates the integer pointed to. They both generate C code similar to this:

int i = (int)SvIV(ST(0));
foo(&i);
sv_setiv(ST(0), (IV)i);

Altering variable initialisation in INPUT

Normally each declared parameter causes a C auto variable of the same name to be declared, and for code to be planted which initialises that variable to the value of the corresponding passed argument. The initialisation code is usually obtained by expanding the typemap template corresponding to the parameter's type. It is possible to override, augment, or skip that initialisation code, by appending one of the three characters = + ; and an initialiser expression, to the INPUT line.

void
foo(a,b,c,d,e,f,g)

    # Use the standard typemap entry:
    int a
    # and with optional trailing semicolon
    int b;

    # Override the typemap entry:
    int c = ($type)MySvIV($arg)

    # Skip the initialisation entirely:
    int d = NO_INIT
    int e ; NO_INIT

    # Add deferred initialisation code
    # *in addition* to the standard init:
    int f + if (some_condition) { $var += 1 }

    # Add deferred initialisation code
    # *instead of* the standard init:
    int g ; if (some_condition) { $var += 1 }

Any override code is passed through template expansion in the same way that typemap templates are, with $var, $arg, $type etc being expanded. Deferred initialisation code is placed after all variable declarations.

In modern XS where INPUT is not often used, some of these initialiser effects can be achieved in other ways:

The SCOPE: Keyword and typemap entry

# XSUB-scoped
void
foo(int i)
  SCOPE: ENABLE
  CODE:
    ...

# file-scoped
SCOPE: ENABLE
void
bar(int i)
  CODE:
    ...

# typemap entry
TYPEMAP: <<EOF
INPUT
T_MYINT
   $var = my_int($arg); /* SCOPE */
EOF

The SCOPE keyword can be used to enable scoping for a particular XSUB (disabled by default). Its effect is to wrap the main body of the XSUB (including most parameter and return value processing) within an { ENTER; and LEAVE; } pair. This has the effect of clearing any accumulated savestack entries at the end of the code body. If disabled, then the savestack will usually be cleared by the caller anyway, so this is a rarely-used keyword.

The SCOPE keyword may be either XSUB-scoped or file-scoped (this refers to the scope of the keyword within the XS file, not to the scope generated by the keyword). For the first, it may appear anywhere in the input part of the XSUB. For the latter, it may appear anywhere in file scope, but due to a long-standing parser bug, the keyword's state is reset at the start of each XSUB, so it will only have any effect if it appears just before an XSUB declaration and as part of the same paragraph (i.e. with no intervening blank lines), such as in the example above. It will only affect the single following XSUB.

The XSUB-scoped form has been available since xsubpp 1.9506, but was broken in release 2.21 and fixed in 3.58. The file-scoped form has been available since 2.21.

To support potentially complex type mappings, if an INPUT typemap entry contains a code comment like /* SCOPE */, then scoping will be automatically enabled for any XSUB which uses that typemap entry. This currently only works for parameters whose type is specified using old-style INPUT lines rather than an ANSI-style declaration, i.e. not for foo(int i). In fact, the XS parser, when looking for a SCOPE comment in a typemap, is currently very lax: it's actually a case-insensitive match of any code comment which contains the text "scope" plus anything else. But you shouldn't rely on this; always use the form shown here. Even better, just don't use it at all.

The XSUB Init Part

Following an XSUB's input part, an optional init part follows. This consists solely of the INIT keyword described below, plus the keywords described in "XSUB Generic Keywords" and "Sharing XSUB bodies", plus the C_ARGS keyword.

The INIT: Keyword

The INIT keyword allows arbitrary initialisation code to be inserted after any variable declarations (and their initialisations), but before the main body of code. It is primarily intended for use when the main body is an autocall to a C function. For example these two XSUBs are equivalent:

int
foo(int i)
  INIT:
    if (i < 0)
        XSRETURN_UNDEF;

int
foo(int i)
  CODE:
    if (i < 0)
        XSRETURN_UNDEF;
    RETVAL = foo(i);
  OUTPUT:
    RETVAL

Any lines following INIT until the next keyword (except POD and XS comments) are copied out as-is to the C code file. Multiple INIT keywords are allowed.

The XSUB Code Part

Following an XSUB's optional init part, an optional code part follows. This consists mainly of the CODE or PPCODE keywords, which provide the code block for the main body of the XSUB. These two keywords are similar, except that PPCODE can be thought of as acting at a lower level; it resets the stack pointer to the base of the stack frame and then relies on the programmer to push any return values; whereas CODE will (with prompting) automatically generate code to return the value of RETVAL.

There is also a rarely-used NOT_IMPLEMENTED_YET keyword which generates a body which croaks.

Only one of these keywords may appear in this part, and at most once; and no other keywords are recognised in this part (although such keywords could instead be processed in the tail or head of the preceding and following init and output parts).

In the absence of any of those three keywords, the XS compiler will generate an autocall: a call to the C function of the same name as the XSUB.

Auto-calling a C function

In the absence of any explicit main body code via CODE, PPCODE or NOT_IMPLEMENTED_YET, the XS parser will generate a body for you automatically (this is referred to as autocall in this document). In its most basic form, the parser assumes that the XSUB will be a simple wrapper for a C function of the same name, with the same parameters and return type as the XSUB. So for example, these two XSUB definitions are equivalent, but the first is an autocall with less boilerplate needed:

int
foo(char *s, short flags)

int
foo(char *s, short flags)
  CODE:
    RETVAL = foo(s, flags);
  OUTPUT:
    RETVAL

Note that the XSUB C function and the wrapped C function are two different entities; the first will have a name like XS_Foo__Bar_foo; when Perl code calls the 'Perl' function Foo::Bar::foo(), behind the scenes the Perl interpreter calls XS_Foo__Bar_foo(), which extracts the string and short int values from the two passed argument SVs, calls foo(), then stuffs its return value into an SV and returns that to the Perl caller.

The two basic types of generated autocall code are:

foo(a, b, c);

RETVAL = foo(a, b, c);

depending on whether the XSUB is declared void or not. The variables passed to the function are usually just the names of the XSUB's parameters, in the same order. Parameters with default values are included, while ellipses are ignored. So for example

int
foo(int a, int b = 0, ...)

generates this autocall code:

RETVAL = foo(a, b);

There are various keywords which can be used to modify the basic behaviour of an autocall.

The C_ARGS: Keyword

void foo1(int a, int b, int c)
  C_ARGS: b, a

void foo2(int a, int b)
  C_ARGS: a < 0 ? 0 : a,
          b,
          0

Normally the arguments for an autocall are generated automatically, based on the XSUB's parameter declarations. The C_ARGS keyword allows you to override this and manually specify the text that will be placed between the parentheses in the autocall. This is useful when the ordering and nature of parameters varies between Perl and C, without a need to write a CODE or PPCODE section.

The C_ARGS section consists of all lines of text until the next keyword or to the end of the XSUB, and is used without modification (except that any POD or XS comments will be stripped).
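As a rough illustration of the two examples above, the plain C below mimics the autocalls that would result. It is a sketch under assumptions: the C signatures of foo1() and foo2() are guessed from the C_ARGS text, the xs_ wrapper names and the seen[] array are invented, and the real generated code also converts the arguments from Perl SVs.

```c
#include <assert.h>

int seen[3];   /* records what the hypothetical C functions received */

/* Hypothetical library functions being wrapped. */
void foo1(int x, int y)        { seen[0] = x; seen[1] = y; seen[2] = -1; }
void foo2(int x, int y, int z) { seen[0] = x; seen[1] = y; seen[2] = z; }

/* Autocall for foo1: the C_ARGS text "b, a" goes between the parens. */
void xs_foo1(int a, int b, int c)
{
    (void)c;   /* accepted from Perl but not passed on */
    foo1(b, a);
}

/* Autocall for foo2: the C_ARGS text may hold arbitrary expressions. */
void xs_foo2(int a, int b)
{
    foo2(a < 0 ? 0 : a,
          b,
          0);
}

/* Usage: exercise both wrappers and check what reached the C side. */
void demo(void)
{
    xs_foo1(1, 2, 3);
    assert(seen[0] == 2 && seen[1] == 1);
    xs_foo2(-5, 7);
    assert(seen[0] == 0 && seen[1] == 7 && seen[2] == 0);
}
```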

The CODE: Keyword

int
abs_double(int i)
  CODE:
    if (i < 0)
        i = -i;
    RETVAL = i * 2;
  OUTPUT:
    RETVAL

The CODE keyword is the usual mechanism for providing your own code as the main body of the XSUB. It is typically used when the XSUB, rather than wrapping a library function, is providing general functionality which can be more easily or efficiently implemented in C than in Perl. Alternatively, it can still be used to wrap a library function for cases which are too complex for autocall to handle.

Note that on entry to the CODE block of code, the values of any passed arguments will have been assigned to auto variables, but the original SVs will still be on the stack and accessible via ST(i) if necessary.

Similarly to autocall XSUBs, a RETVAL variable is declared if the return value of the XSUB is not void. Unlike autocall, you have to explicitly tell the XS compiler to generate code to return the value of RETVAL, by using the OUTPUT keyword. (Requiring this was probably a bad design decision, but we're stuck with it now.) Newer XS parsers will warn if RETVAL is seen in the CODE section without a corresponding OUTPUT section.

A CODE XSUB will typically return just the RETVAL value (or possibly more items with the OUTLIST parameter modifiers). To take complete control over returning values, you can use the PPCODE keyword instead. Note that it is possible for a CODE section to do this too, by doing its own stack manipulation and then doing an XSRETURN(n) to return directly while indicating that there are n items on the stack. This bypasses the normal XSRETURN(1) etc that the XS parser will have planted after the CODE lines. But it is usually cleaner to use PPCODE instead.

Any lines following CODE until the next keyword (except POD and XS comments) are copied out as-is to the C code file. Multiple CODE keywords are not allowed.

The PPCODE: Keyword

# XS equivalent of: sub one_to_n { my $n = $_[0]; 1..$n }

void
one_to_n(int n)
  PPCODE:
    {
        int i;
        if (n < 1)
            Perl_croak_nocontext(
                "one_to_n(): argument %d must be >= 1", n);
        EXTEND(SP, n);
        for (i = 1; i <= n; i++)
            mPUSHi(i);
    }

The PPCODE keyword is similar to the CODE keyword, except that on entry it resets the stack pointer to the base of the current stack frame, and it doesn't generate any code to return RETVAL or similar: pushing return values onto the stack is left to the programmer. In this way it can be viewed as a lower-level alternative to CODE, when you want to take full control of manipulating the argument stack. The "PP" in its name stands for "PUSH/PULL", reflecting the low-level stack manipulation. PPCODE is typically used when you want to return several values or even an arbitrary list, compared with CODE, which normally returns just the value of RETVAL.

The PPCODE keyword must be the last keyword in the XSUB. Any lines following PPCODE until the end of the XSUB (except POD and XS comments) are copied out as-is to the C code file. Multiple PPCODE keywords are not allowed.

Typically you declare a PPCODE XSUB with a return type of void; any other return type will cause a RETVAL auto variable of that type to be declared, which will be otherwise unused.

On entry to the PPCODE block of code, the values of any declared parameters will have already been assigned to auto variables, but the original SVs will still be on the stack and initially accessible via ST(i) if necessary. But the default assumption for a PPCODE block is that you have already finished processing any supplied arguments, and that you want to push a number of return values onto the stack. The simple one_to_n() example shown above is based on that assumption. But more complex strategies are possible.

There are basically two ways to access and manipulate the stack in a PPCODE block. First, by using the ST(i) macro, to get, modify, or replace the ith item in the current stack frame, and secondly to push (usually temporary) return values onto the stack. The first uses the hidden ax variable, which is set on entry to the XSUB, and is the index of the base of the current stack frame. This remains unchanged throughout execution of the XSUB. The second approach uses the local stack pointer, SP (more on that below), which on entry to the PPCODE block points to the base of the stack frame. Macros like mPUSHi() store a temporary SV at that location, then increment SP. On return from a PPCODE XSUB, the current value of SP is used to indicate to the caller how many values are being returned.

In general these two ways of accessing the stack should not be mixed, or confusion is likely to arise. The PUSH strategy is most useful when you have no further use for the passed arguments, and just want to generate and return a list of values, as in the one_to_n() example above. The ST(i) strategy is better when you still need to access the passed arguments. In the example below,

# XS equivalent of: sub triple { map { $_ * 3} @_ }

void
triple(...)
  PPCODE:
    SP += items;
    {
        int i;
        for (i = 0; i < items; i++) {
            int val  = (int)SvIV(ST(i));
            ST(i) = sv_2mortal(newSViv(val*3));
        }
    }

SP is first incremented to reclaim the passed arguments which are still on the stack; then one by one, each passed argument is retrieved, and then each stack slot is replaced with a new mortal value. When the loop is finished, the current stack frame contains a list of mortals, which is then returned to the caller, with SP indicating how many items are returned. Note that in this example the SP += items could have been done at the end instead. However, if the code was doing a mixture of updating the passed arguments and pushing extra return values, then setting it early (before the first push) would be important.

Before pushing return values onto the stack (or storing values at ST(i) locations higher than the number of passed arguments), it is necessary to ensure there is sufficient space on the stack. This can be achieved either through the EXTEND(SP, n) macro as shown in the one_to_n() example above, or by using the 'X' variants of the push macros, such as mXPUSHi(), which can be used to check and extend the stack by one each time. Doing a single EXTEND in advance is more efficient. EXTEND will ensure that there is at least enough space on the stack for n further items to be pushed.

If using the PUSH strategy, it is useful to understand in more detail how pushing and the local stack pointer, SP, are implemented. The generated C file will have access to (among others) the following macro definitions or similar:

#define dSP       SV **sp = PL_stack_sp
#define SP        sp
#define PUSHs(s)  *++sp = (s)
#define mPUSHi(i) sv_setiv(PUSHs(sv_newmortal()), (IV)(i))
#define PUTBACK   PL_stack_sp = sp
#define SPAGAIN   sp = PL_stack_sp
#define dXSARGS   dSP; ....

The global (or per-interpreter) variable PL_stack_sp is a pointer to the current top-most entry on the stack, equal initially to &ST(items-1). On entry to the XSUB, the dXSARGS at its top will cause the sp variable to be declared and initialised. This becomes a local copy of the argument stack pointer. The standard stack manipulation macros such as PUSHs all use this local copy.

The XS parser will usually emit two lines of C code similar to these around the PP code block lines:

SP -= items;
... PP lines ...
PUTBACK; return;

This has the effect of resetting the local copy of the stack pointer (but not the stack pointer itself) back to the base of the current stack frame, discarding any passed arguments. The original arguments are still on the stack. PUSHs() etc will, starting at the base of the stack frame, progressively overwrite any original arguments. Finally, the PUTBACK sets the real stack pointer to the copy, making the changes permanent, and also allowing the caller to determine how many arguments were returned.
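The interplay between the local copy and the real stack pointer can be modelled in plain C. The following is a toy sketch, not perl's implementation: the stack holds long values rather than SV pointers, it cannot grow, the first argument is located via the local pointer rather than via ax, and all the toy_ names are invented for illustration.

```c
#include <assert.h>

#define TOY_STACK_MAX 16

long  toy_stack[TOY_STACK_MAX];   /* holds plain longs, not SV pointers */
long *toy_stack_sp = toy_stack;   /* stand-in for PL_stack_sp; slot 0 is
                                   * a sentinel, so this means "empty" */

/* The caller pushes an argument before calling the toy XSUB. */
void toy_push_arg(long v)
{
    assert(toy_stack_sp - toy_stack < TOY_STACK_MAX - 1);
    *++toy_stack_sp = v;
}

/* Toy PPCODE-style body: reads its first argument n, then returns the
 * list 1..n. The result is the number of values left on the stack. */
int toy_one_to_n(int items)
{
    long *sp = toy_stack_sp;   /* like dSP: take a local copy */
    long *base;
    long  n, i;

    assert(items >= 1);
    n = sp[1 - items];         /* stands in for ST(0) */

    sp -= items;               /* like "SP -= items": local copy only */
    base = sp;
    assert(toy_stack_sp == base + items);   /* real pointer unmoved */

    /* Crude stand-in for EXTEND(SP, n): this toy stack cannot grow. */
    assert(n >= 0 && (sp - toy_stack) + n < TOY_STACK_MAX);

    for (i = 1; i <= n; i++)
        *++sp = i;             /* like PUSHs(): overwrites the old args */

    toy_stack_sp = sp;         /* like PUTBACK: publish the local copy */
    return (int)(sp - base);
}
```

After toy_push_arg(3), a call to toy_one_to_n(1) leaves the three values 1, 2 and 3 in slots 1 to 3, with the first of them overwriting the slot which held the argument.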

Any functions called from the XSUB will only see the value of PL_stack_sp and not SP. So when calling out to a function which manipulates the stack, you may need to resynchronise the two; for example:

PUTBACK;
push_contents_of_array(av);
SPAGAIN;

The EXTEND(SP,n) and mXPUSHfoo() macros will update both PL_stack_sp and SP if the extending causes the stack to be reallocated.

Note that there are several mPUSHfoo() macros, which generally create a temporary SV, set its value to the argument, and push it onto the stack. These are:

mPUSHs(sv)         mortalise and push an SV
mPUSHi(iv)         create+push mortal and set to the integer val
mPUSHu(uv)         create+push mortal and set to the unsigned val
mPUSHn(n)          create+push mortal and set to the num (float) val
mPUSHp(str, len)   create+push mortal and set to the string+length
mPUSHpvs("string") create+push mortal and set to the literal string
                     (perl 5.38.0 onwards)

The NOT_IMPLEMENTED_YET: Keyword

void
foo(int a)
    NOT_IMPLEMENTED_YET:

This keyword, as a fourth alternative to CODE, PPCODE and autocall, generates a main body for the XSUB consisting solely of the C code:

Perl_croak(aTHX_ "Foo::Bar::foo: not implemented yet");

The current implementation is quite buggy in terms of parsing and where the keyword can appear within an XSUB, so it's generally better to avoid it. It is documented here for completeness.

The XSUB Output Part

Following an XSUB's code part, any results may be post-processed and returned. Two keywords in particular support this: POSTCALL, which allows for a block of code to be added after any autocall in order to post-process return values from the call, and OUTPUT, which tells the parser to generate code to return the value of RETVAL or to update the values of one or more passed arguments.

These two optional keywords should each only be used once at most, and in that order; but due to a parsing bug (kept for backwards compatibility), they can appear in either order any number of times. But don't do that.

Note that the keywords described in "XSUB Generic Keywords" and "Sharing XSUB bodies" may also appear in this part.

The POSTCALL: Keyword

The POSTCALL keyword allows a block of code to be inserted directly after any autocall or CODE code block (although it's really only of use with autocall). It's typically used for cleaning up the return value from the autocall. For example these two XSUBs are equivalent:

int
foo(int a)
   POSTCALL:
     if (RETVAL < 0)
        RETVAL = 0;

int
foo(int a)
   CODE:
     RETVAL = foo(a);
     if (RETVAL < 0)
        RETVAL = 0;
  OUTPUT:
    RETVAL

The OUTPUT: Keyword

# Common usage:

OUTPUT:
  RETVAL

# Rare usage:

OUTPUT:
  arg0
  SETMAGIC: DISABLE
  arg1
  SETMAGIC: ENABLE
  arg2 sv_setfoo(ST(2), arg2);

The OUTPUT keyword can be used to indicate that the value of RETVAL should be returned to the caller on the stack, and/or that the values of certain passed Perl arguments should be updated with the current values of the corresponding parameter variables. Each non-blank line of the OUTPUT block should contain the name of one variable, with optional setting code, or a SETMAGIC: keyword with a value of ENABLE or DISABLE.

The common usage is to list just the RETVAL variable:

int
foo()
  CODE:
    RETVAL = ...;
  OUTPUT:
    RETVAL

It is needed for XSUBs containing a CODE block to tell the XS compiler to generate C code which will return the value of RETVAL to the caller. For autocall XSUBs, this is done automatically without the need for the OUTPUT keyword.

The second usage of OUTPUT is to specify parameters to be updated; this usage has been almost completely replaced by using the OUT parameter modifier. For example these two XSUBs have identical behaviours, but the second is the preferred form:

int
foo1(a)
   INPUT:
     int &a
   OUTPUT:
     a

int
foo2(IN_OUT int a)

They both cause output C code similar to this to be planted (with the first part derived from a typemap):

sv_setiv(ST(0), (IV)a);
SvSETMAGIC(ST(0));

which updates the value of the passed SV with the current value of a, and then calls the SV's set magic, if any: which will, for example, cause a tied variable to have its STORE() method called.

You can skip the planting of the SvSETMAGIC() magic call with SETMAGIC: DISABLE; in the example at the start of this section, arg0 and arg2 will have set magic, while arg1 won't. The SETMAGIC setting remains in force until another SETMAGIC, or notionally until the end of the current OUTPUT block. In fact the current setting will carry over into any further OUTPUT declarations within the same XSUB, or since xsubpp 3.58, only into any declarations within the same CASE branch.

The current setting of SETMAGIC is ignored for RETVAL, which is usually setting the value of a fresh temporary SV which won't have any attached magic anyway.

Finally, it is possible to override the typemap entry used to set the value of the temporary SV or passed argument from the RETVAL or other variables. Normally, in an XSUB like:

int
foo(int abc)
  OUTPUT:
    abc

the int type (via a two-stage lookup in the system typemap) will yield this output typemap entry:

sv_setiv($arg, (IV)$var);

which, after variable expansion, may yield

sv_setiv(ST(0), (IV)abc);

or similar. This can be overridden; for example

int
foo(int abc)
  OUTPUT:
    abc   my_setiv(ST(0), (IV)abc);

But importantly, unlike the similar syntax in INPUT lines, the override text is not variable expanded. It is thus tricky to ensure that the right arguments are used (such as ST(0)). Basically this feature has a design flaw and should probably be avoided. Since xsubpp 3.01 it's been possible to have locally defined typemaps using the TYPEMAP keyword which is probably a better way to modify how values are returned, for example,

typedef int myint;
...
TYPEMAP: <<EOF
myint   T_MYINT
INPUT
T_MYINT
    $var = ($type)my_getiv($arg)
OUTPUT
T_MYINT
    my_setiv($arg, (IV)$var);
EOF

int
foo2(IN_OUT myint abc)

The XSUB Cleanup Part

Following an XSUB's output part, where code will have been planted to return the value of RETVAL and OUT/OUTLIST parameters, it's possible to inject some final clean-up code by using the CLEANUP keyword.

Note that the keywords described in "XSUB Generic Keywords" and "Sharing XSUB bodies" may also appear in this part.

The CLEANUP: Keyword

char *
foo(int a)
  CODE:
    RETVAL = get_foo(a);
  OUTPUT:
    RETVAL
  CLEANUP:
    free(RETVAL); /* assuming get_foo() returns a malloced buffer */

The CLEANUP keyword allows a block of code to be inserted directly after any output code which has been generated automatically or via the OUTPUT keyword. It can be used when an XSUB requires special clean-up procedures before it terminates. The code specified for the clean-up block will be added as the last statements in the XSUB before the final XSRETURN(1); or similar.
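The reason that freeing RETVAL in CLEANUP is safe in the example above is that the output code has, by then, typically copied the string into the returned SV. The plain C sketch below models that ordering only; get_foo() here is a hypothetical implementation, and wrap_foo() with its out buffer is an invented stand-in for the XSUB and the SV's own string buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library function: returns a malloc()ed string. */
char *get_foo(int a)
{
    char *buf = malloc(32);
    if (buf)
        snprintf(buf, 32, "foo-%d", a);
    return buf;
}

/* Models the XSUB: 'out' plays the role of the returned SV's buffer.
 * Returns 1 on success, 0 if nothing could be copied. */
int wrap_foo(int a, char *out, size_t outlen)
{
    char *RETVAL;

    assert(out != 0);
    RETVAL = get_foo(a);                    /* CODE */
    if (!RETVAL || outlen == 0) {
        free(RETVAL);
        return 0;
    }
    snprintf(out, outlen, "%s", RETVAL);    /* OUTPUT: value is copied */
    free(RETVAL);                           /* CLEANUP: after the copy */
    return 1;
}
```

If the free() were done before the copy, as it would be if placed at the end of the CODE block, the output step would read freed memory.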

XSUB Generic Keywords

There are a few per-XSUB keywords which can appear anywhere within the body of an XSUB. This is because they affect how the XSUB is registered with the Perl interpreter, rather than affecting how the C code of the XSUB itself is generated. These are described in the following subsections. In addition there are a few more generic keywords which are described later under "Sharing XSUB bodies".

On aesthetic grounds, it is best to use these keywords near the start of the XSUB.

The PROTOTYPE: Keyword

int
foo1(int a, int b = 0)
  # this XSUB gets an auto-generated '$;$' prototype
  PROTOTYPE: ENABLE

int
foo2(int a, int b)
  # this XSUB doesn't get a prototype
  PROTOTYPE: DISABLE

int
foo3(SV* a, int b)
  # this XSUB gets the specified prototype:
  PROTOTYPE: \@$

int
foo4(int a, int b)
  # this XSUB gets a blank () prototype
  PROTOTYPE:

While the file-scoped PROTOTYPES keyword turns automatic prototype generation on or off for all subsequent XSUBs, the per-XSUB PROTOTYPE keyword overrides the setting for just the current XSUB. See the PROTOTYPES section for details of what a prototype is, and why you rarely need one.

This keyword's value can be either one of ENABLE/DISABLE to turn on or off automatic prototype generation, or it can specify an explicit prototype string, including the empty prototype.

The OVERLOAD: Keyword

MODULE = Foo PACKAGE = Foo::Bar

SV*
subtract(SV* a, SV* b, bool swap)
  OVERLOAD: - -=
  CODE:
    ...

The OVERLOAD keyword allows you to declare that this XSUB acts as an overload method for the specified operators in the current package. The example above is approximately equivalent to this Perl code:

package Foo::Bar;

sub subtract { ... }

use overload
    '-'  => \&subtract,
    '-=' => \&subtract;

The rest of the line following the keyword, plus any further lines until the next keyword, are interpreted as a space-separated list of overloaded operators. There is no check that they are valid operator names. The names and symbols will eventually end up within double-quoted strings in the C file, so double-quotes need to be escaped; in particular:

OVERLOAD: \"\"

This could be regarded as an implementation bug, but we're stuck with it now.

XSUBs used for overload methods are invoked with the same arguments as Perl subroutines would be: for example, an overloaded binary operator will trigger a call to the XSUB method with the first argument being an overloaded object representing one of the two operands of the binary operator; the second being the other operand (which may or may not be an object); and third, a swap flag. See overload for the full details of how these functions will be called, with what arguments. Note that swap can in fact be undef in addition to false, to indicate an assign overload such as +=. If this difference is important to your code, then declare swap as type SV* so that you can use SvOK() and SvTRUE() on it.

Bitwise operator methods sometimes take extra arguments: in particular under use feature 'bitwise'. So you may want to use an ellipsis (something like (lobj, robj, swap, ...)) to skip them.

The net effect of the OVERLOAD keyword is to add some extra code to the boot XSUB to register this XSUB as the handler for the specified overload actions, in the same way that use overload does for Perl methods.

See also the file-scoped FALLBACK keyword for details of how to set the fallback behaviour for the current package.

Note that OVERLOAD shouldn't be mixed with the ALIAS keyword; the value of ix will be undefined for any overload method call.

The "T_PTROBJ and opaque handles" section contains a fully-worked example of using the T_PTROBJ typemap to wrap a simple arithmetic library. The result of that wrapper allows you to write Perl code such as:

my $i2  = My::Num->new(2);
my $i7  = My::Num->new(7);
my $i13 = My::Num->new(13);

my $x = $i13->add($i7)->divide($i2);
printf "val=%d\n", $x->val();

Using overloading, we would like to be able to write those last two lines more simply as:

my $x = ($i13 + $i7)/$i2;
printf "val=%d\n", $x;

The following additions and modifications to that example XS code show how to add overloading:

FALLBACK: UNDEF

int
mynum_val(My::Num x, ...)
  OVERLOAD: 0+

My::Num
mynum_add(My::Num x, My::Num y, bool swap)
  OVERLOAD: +
  C_ARGS: x, y
  INIT:
    if (swap) {
        mynum* tmp = x; x = y; y = tmp;
    }

 # ... and three similar XSUBs for
 # mynum_subtract, mynum_multiply, mynum_divide ...

The FALLBACK line isn't actually necessary as this is the default anyway, but is included to remind you that the keyword can be used.

Overloading is added to the mynum_val() method so that it automatically returns the value of an object when used in a numeric context (such as for the printf above). The ellipsis is added to ignore the extra two arguments passed to an overload method.

The mynum_add() method from the T_PTROBJ example which, via aliasing, handled all four of the arithmetic operations, is now split into four separate XSUBs, since ALIAS and OVERLOAD don't mix.

The main change to each arithmetic XSUB, apart from adding the OVERLOAD keyword, is that there is an extra swap parameter. There's no real need to use it for addition and multiplication, but it is important for the non-commutative subtraction and division operations.
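To see why swap matters for a non-commutative operation, consider this plain C sketch of a subtraction handler. It assumes the calling convention described in overload: the overloaded object is always the first argument, and swap is true when that object was actually the right-hand operand. toy_subtract() and its int arguments are illustrative stand-ins, not part of the My::Num example.

```c
#include <assert.h>

/* Toy overload handler for '-': 'self' is always the overloaded
 * object's value, 'other' is the remaining operand, and 'swap' is true
 * when the object was the right-hand operand. */
int toy_subtract(int self, int other, int swap)
{
    assert(swap == 0 || swap == 1);
    if (swap) {
        int tmp = self; self = other; other = tmp;
    }
    return self - other;
}
```

Under that convention, an expression such as $i13 - 7 corresponds to toy_subtract(13, 7, 0), giving 6, while 7 - $i13 corresponds to toy_subtract(13, 7, 1), giving -6. Ignoring swap would return 6 in both cases.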

That example uses the T_PTROBJ typemap to process the second argument, which in the most general usage may not be an object. For example the second and third of these lines will croak with an Expected foo to be of type My::Num, got scalar error:

$i13 + My::Num->new(7);
$i13 + 7;
$i13 + "7";

If it is necessary to handle this, then you may need to create your own typemap: for example, something similar to T_PTROBJ, but with an INPUT template along the lines of:

T_MYNUM
    SV *sv = $arg;
    SvGETMAGIC(sv);
    if (!SvROK(sv)) {
        sv = sv_newmortal();
        sv_setref_pv(sv, "$ntype", mynum_new(SvIV($arg)));
    }
    ....

Finally, although not directly related to XS, the following could be added to Num.pm to allow integer literals to be used directly:

sub import {
    overload::constant integer =>
        sub {
            my $str = shift;
            return My::Num->new($str);
        };
}

which then allows these lines:

my $i2  = My::Num->new(2);
my $i7  = My::Num->new(7);
my $i13 = My::Num->new(13);

to be rewritten more cleanly as:

my $i2  = 2;
my $i7  = 7;
my $i13 = 13;

The ATTRS: Keyword

MODULE = Foo::Bar PACKAGE = Foo::Bar

SV*
debug()
  ATTRS: lvalue
  PPCODE:
    # return $Foo::Bar::DEBUG, creating it if not already present
    # (NB: XPUSHs() not needed here as the stack always has one
    # allocated slot available when an XSUB is called):
    PUSHs(GvSV(gv_fetchpvs("Foo::Bar::DEBUG", GV_ADD, SVt_IV)));

The ATTRS keyword allows you to apply subroutine attributes to an XSUB in a similar fashion to Perl subroutines. The XSUB in the example above is equivalent to this Perl:

sub debug :lvalue { return $Foo::Bar::DEBUG }

and both can be called like this:

use Foo::Bar;
Foo::Bar::debug() = 99;
print "$Foo::Bar::DEBUG\n"; # prints 99

This keyword consumes all lines until the next keyword. The contents of each line are interpreted as space-separated attributes. The attributes are applied at the time the XS module is loaded. This:

void
foo(...)
  ATTRS: aaa
         bbb(x,y) ccc

is approximately equivalent to:

use attributes Foo::Bar, \&foo, 'aaa';
use attributes Foo::Bar, \&foo, 'bbb(x,y)';
use attributes Foo::Bar, \&foo, 'ccc';

User-defined attributes, just like with Perl subs, will trigger a call to MODIFY_CODE_ATTRIBUTES(), as described in attributes.

Note that not all built-in subroutine attributes necessarily make sense applied to XSUBs.

Currently the parsing of white-space is crude: bbb(x, y) is misinterpreted as two separate attributes, 'bbb(x,' and 'y)'.
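
The effect of that white-space splitting can be reproduced with a small stand-alone C sketch. This is not the XS parser's own code (the parser is written in Perl); split_attrs() is a hypothetical helper used only to show how a naive split treats the embedded space.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any Perl API: split a mutable
 * line on spaces and tabs, storing up to max token pointers in
 * out[]. Returns the number of tokens stored. */
size_t
split_attrs(char *line, char **out, size_t max)
{
    size_t n = 0;
    char *tok = strtok(line, " \t");

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, " \t");
    }
    return n;
}
```

Given the line "aaa bbb(x, y) ccc", this yields four tokens rather than three, because the space after the comma splits the second attribute in two.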

The ATTRS keyword can't currently be used in conjunction with ALIAS or INTERFACE; in this case, the attributes are just silently ignored.

Sharing XSUB bodies

Sometimes you want to write several XSUBs which are very similar: they all have the same signature, have the same generated code to convert arguments and return values between Perl and C, and may only differ in a few lines in the main body or in which C library function they wrap. It is in fact possible to share the same XSUB function among multiple Perl CVs. For example, &Foo::Bar::add and &Foo::Bar::subtract could be two separate CVs in the Perl namespace which both point to the same XSUB, XS_Foo__Bar_add() say. But each CV holds some sort of unique identifier which can be accessed by the XSUB so that it can determine whether it should behave as add or subtract.

Both the ALIAS and INTERFACE keywords (described below) allow multiple CVs to share the same XSUB. The difference between them is that ALIAS is intended for when you supply the main body of the XSUB yourself (e.g. using CODE): it sets an integer variable, ix (derived from the passed CV), which you can use in a switch() statement or similar. Conversely, INTERFACE is intended for use with autocall; information stored in the CV indicates which C library function should be autocalled.

Finally, there is the CASE keyword, which allows the whole body of an XSUB (not just the CODE part) to have alternate cases. It can be thought of as a switch() analogue which works at the top-most XS level rather than at the C level. The value the CASE acts on could be items for example, or it could be used in conjunction with the ALIAS keyword and switch on the value of ix.

The ALIAS: Keyword

int add(int x, int y)
  ALIAS:
    # implicit: add = 0
    subtract = 1
    multiply = 2  divide = 3
  CODE:
    switch (ix) { ... }

Note that this keyword can appear anywhere within the body of an XSUB.

The ALIAS keyword allows a single XSUB to have two or more Perl names and to know which of those names was used when it was invoked. Each alias is given an integer index value, with the main name of the XSUB being index 0. This index is accessible via the variable ix which is initialised based on which CV (i.e. which Perl subroutine) was called.
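
The dispatch on ix can be modelled outside XS with a plain C sketch. The names arith_dispatch() and OP_* are hypothetical and chosen only for illustration; in a real XSUB, ix is set up by the generated code rather than passed as a parameter.

```c
/* Hypothetical index values mirroring the ALIAS example above. */
enum { OP_ADD = 0, OP_SUBTRACT = 1, OP_MULTIPLY = 2, OP_DIVIDE = 3 };

/* One shared body; ix selects the behaviour, as it would in an
 * aliased XSUB. Returns 0 for an unknown index or for division
 * by zero. */
int
arith_dispatch(int ix, int x, int y)
{
    switch (ix) {
    case OP_ADD:      return x + y;
    case OP_SUBTRACT: return x - y;
    case OP_MULTIPLY: return x * y;
    case OP_DIVIDE:   return y != 0 ? x / y : 0;
    default:          return 0;
    }
}
```

Calling arith_dispatch(OP_SUBTRACT, 13, 7) plays the role of Perl code calling subtract(13, 7) on the shared XSUB.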

Note that an XSUB may be shared by multiple CVs, and each CV may have multiple names. Given the add XSUB definition above, and given this Perl code:

use Foo::Bar;
BEGIN { *addition = *add }

Then in the Foo::Bar namespace, the entries add and addition point to the same CV, which has index 0 stored in it; while subtract points to a second CV with index 1, and so on. All four CVs point to the same C function, XS_Foo__Bar_add().

The alias name can be either a simple function name or can include a package name. The alias value to the right of the = may be either a literal positive integer or a word (which is expected to be a CPP define or enum constant).

The rest of the line following the ALIAS keyword, plus any further lines until the next keyword, are assumed to contain zero or more alias name and value pairs.

A warning will be produced if you create more than one alias to the same index value. If you want multiple aliases with the same value, then a backwards-compatible way of achieving this is via separate CPP defines to the same value, e.g.

#define DIVIDE   3
#define DIVISION 3

ALIAS:
  divide   = DIVIDE
  division = DIVISION

Since xsubpp 3.51, alias values may refer to other alias names (or to the main function name) by using => rather than the = symbol:

ALIAS:
  divide   =  3
  division => divide

Both alias names and => values may be fully-qualified:

ALIAS:
  red         =  1
  COLOR::red  => red
  COLOUR::red => COLOR::red

Note that any PREFIX is applied to the main name of the XSUB, but not to any aliases.

See "T_PTROBJ and opaque handles" for a fully-worked example using aliases.

See INTERFACE below for an alternative to ALIAS which is more suited for autocall. Note that ALIAS should not be used together with any of ATTRS, INTERFACE, or OVERLOAD.

The INTERFACE: Keyword

MODULE = Foo::Bar PACKAGE = Foo::Bar PREFIX = foobar_

int
arith(int a, int b)
    INTERFACE: foobar_add    foobar_subtract
               foobar_divide foobar_multiply

This keyword can appear anywhere within the initialisation part of an XSUB.

This keyword provides similar functionality to ALIAS, but is intended for XSUBs which use autocall. It allows a single XSUB to have multiple names in the Perl namespace which, when invoked, will call the correct wrapped C library function.

In the example above there is a single C XSUB function created (called XS_Foo__Bar_arith), plus four CVs in the Perl namespace called Foo::Bar::add etc. Calling Foo::Bar::add() from Perl invokes XS_Foo__Bar_arith() with some indication of which C function to call, which is then autocalled. ALIAS achieves this by storing an index value in each CV and making it available via the ix variable, while INTERFACE currently achieves this by storing a C function pointer in each CV. So the Foo::Bar::add() CV holds a pointer to the foobar_add() C function. The action of the XSUB is to extract the parameter values from the passed arguments and the function pointer from the CV, then call the underlying C function.

Note that storing a function pointer in the CV is an implementation detail which could change in the future. See "The INTERFACE_MACRO: Keyword" for details of how to customise the setting and retrieving of this value in the CV.
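
As a rough model of the current mechanism, the following plain C sketch stores a function pointer in a stand-in for a CV and calls through it. The fake_cv struct and call_via_cv() are hypothetical; a real CV is an internal Perl structure and is not laid out like this.

```c
/* Stand-ins for the wrapped C library functions. */
int foobar_add(int a, int b)      { return a + b; }
int foobar_subtract(int a, int b) { return a - b; }

/* Hypothetical model of a CV: a Perl-visible name plus the stored
 * function pointer. */
typedef int (*arith_fn)(int, int);
typedef struct {
    const char *perl_name;
    arith_fn    func;
} fake_cv;

/* Model of the single shared XSUB: fetch the pointer from the "CV"
 * that was invoked, then call the underlying C function. */
int
call_via_cv(const fake_cv *cv, int a, int b)
{
    return cv->func(a, b);
}
```

Two fake_cv values, one holding foobar_add and one holding foobar_subtract, then share the single call_via_cv() body while producing different results.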

The rest of the line following the INTERFACE keyword, plus any further lines until the next keyword, are assumed to contain zero or more interface names, separated by white space (or commas).

An interface name is always used as-is for the name of the wrapped C function. If the name contains a package separator, then it will be used as-is to generate the Perl name; otherwise any prefix is stripped and the current package name is prepended. The following shows how a few such interface names would be processed (assuming the current PACKAGE and PREFIX are Foo::Bar and foobar_):

Interface name     Perl function name   C function name
--------------     ------------------   ----------------
abc                Foo::Bar::abc        abc
foobar_abc         Foo::Bar::abc        foobar_abc
X::Y::foobar_def   X::Y::foobar_def     X::Y::foobar_def
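
The naming rule in the table can be expressed as a short C sketch. The function interface_perl_name() is a hypothetical helper written for this illustration; it is not how the XS parser (which is written in Perl) implements the rule.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the Perl name for an interface name
 * into buf. Names containing "::" are used as-is; otherwise the
 * prefix (if present) is stripped and "package::" is prepended. */
void
interface_perl_name(char *buf, size_t buflen, const char *name,
                    const char *package, const char *prefix)
{
    size_t plen = strlen(prefix);

    if (strstr(name, "::") != NULL) {
        snprintf(buf, buflen, "%s", name);
        return;
    }
    if (plen > 0 && strncmp(name, prefix, plen) == 0)
        name += plen;
    snprintf(buf, buflen, "%s::%s", package, name);
}
```

With package "Foo::Bar" and prefix "foobar_", the three interface names from the table map to the Perl names shown in its second column.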

Unlike ALIAS, the XSUB name is used only as the name of the generated C function; in the example above, it doesn't cause a Perl function called arith() to be created.

See "T_PTROBJ and opaque handles" for a complete example using INTERFACE with the T_PTROBJ typemap. But note that before xsubpp 3.60, INTERFACE would not work properly on XSUBs used with Perlish return types (as used by T_PTROBJ), such as

Foo::Bar
foo(...)
    ....

This has mostly been fixed in 3.60 onwards, but may generate invalid C code (in particular, invalid function pointer casts) for XSUBs having a C_ARGS keyword, unless the value of C_ARGS is a simple list of parameter names.

Note that INTERFACE should not be used together with either of ALIAS or ATTRS.

The INTERFACE_MACRO: Keyword

int
arith(int a, int b)
    INTERFACE:       add subtract divide multiply
    INTERFACE_MACRO: MY_FUNC_GET
                     MY_FUNC_SET

Note that this keyword is deprecated since it assumes a particular implementation for the INTERFACE keyword, which might change in future.

This keyword can appear anywhere within the input or initialisation parts of an XSUB.

By default, the C code generated by the INTERFACE keyword plants calls to two macros, XSINTERFACE_FUNC_SET and XSINTERFACE_FUNC, which are used respectively to set (at boot time) a field in the CV to the address of the C function to use, and to retrieve (at run time) that value from the CV.

The INTERFACE_MACRO keyword allows you to override the names of the two macros to be used for this purpose. The rest of the line following the INTERFACE_MACRO keyword, plus any further lines until the next keyword, should contain (in total) two words which are taken to be macro names.

The get macro takes three parameters: the return type of the function, the CV which holds the function's pointer value, and the field within the CV which has the pointer value. It should return a C function pointer. The setter macro has two parameters: the CV, and the function pointer.

Suppose that in the example above, pointers to the multiply(), divide(), add() and subtract() functions are kept in a global C array called arith_ptrs[] with offsets specified by the enum values multiply_off, divide_off, add_off and subtract_off. Then one could use:

#define MY_FUNC_GET(ret, cv, f) \
    ((XSINTERFACE_CVT_ANON(ret))arith_ptrs[CvXSUBANY(cv).any_i32])
#define MY_FUNC_SET(cv, f) \
    CvXSUBANY(cv).any_i32 = CAT2(f, _off)

to store an array index in the CV, rather than storing the actual function pointer.
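
The token-pasting idea behind MY_FUNC_SET can be tried in isolation with plain C. In this sketch, PASTE2 is a hypothetical stand-in for Perl's CAT2() macro, and FUNC_INDEX() and call_by_index() are illustrative names; no Perl headers are involved.

```c
int add(int a, int b)      { return a + b; }
int subtract(int a, int b) { return a - b; }
int multiply(int a, int b) { return a * b; }
int divide(int a, int b)   { return a / b; }

/* Offsets into the table; names follow the <function>_off pattern. */
enum { multiply_off, divide_off, add_off, subtract_off };

typedef int (*arith_fn)(int, int);
arith_fn arith_ptrs[] = { multiply, divide, add, subtract };

/* Hypothetical stand-in for CAT2(): paste two tokens together. */
#define PASTE2(a, b) a ## b

/* Map a function name to its table index at compile time. */
#define FUNC_INDEX(f) PASTE2(f, _off)

/* Look up the function stored at a given index and call it. */
int
call_by_index(int idx, int a, int b)
{
    return arith_ptrs[idx](a, b);
}
```

Here FUNC_INDEX(add) expands to add_off, so call_by_index(FUNC_INDEX(add), 13, 7) reaches add() through the table rather than through a stored pointer.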

The CASE: Keyword

int
foo(int a, int b = NO_INIT, int c = NO_INIT)
    CASE: items == 1
        C_ARGS: 0, a
    CASE: items == 2
        C_ARGS: b, a
    CASE:
        CODE:
            RETVAL = b > c ? foo(b, a) : bar(b, a);
        OUTPUT:
            RETVAL

The CASE keyword allows an XSUB to effectively have multiple bodies, but with only a single Perl name (unlike ALIAS, which has multiple names). Which body is run depends on which CASE expression is the first to evaluate to true. Unlike C's case keyword, execution doesn't fall through to the next branch, so there is no XS equivalent of the break keyword. The expression for the last CASE is optional, and if not present, acts as a default branch.

The example above translates to approximately this C code:

if (items < 1 || items > 3) { croak("..."); }

if (items == 1) {
    int RETVAL;
    int a = (int)SvIV(ST(0)); int b = /* etc */
    RETVAL = foo(0, a);
    /* ... return RETVAL as ST(0) ... */
}
else if (items == 2) {
    int RETVAL;
    int a = (int)SvIV(ST(0)); int b = /* etc */
    RETVAL = foo(b, a);
    /* ... return RETVAL as ST(0) ... */
}
else {
    int RETVAL;
    int a = (int)SvIV(ST(0)); int b = /* etc */
    RETVAL = b > c ? foo(b, a) : bar(b, a);
    /* ... return RETVAL as ST(0) ... */
}

XSRETURN(1);

Each CASE keyword precedes an entire normal XSUB body, including all keywords from PREINIT to CLEANUP. Generic XSUB keywords can be placed within any CASE body. The code generated for each if/else branch includes nearly all the code that would usually be generated for a complete XSUB body, including argument processing and return value stack processing.

Note that the CASE expressions are outside of the scope of any parameter variable declarations, so those values can't be used. Typical values which are in scope and might be used are the items variable which indicates how many arguments were passed (see "Ellipsis: variable-length parameter lists") and, in the presence of ALIAS, the ix variable.

Here's another example, this time in conjunction with ALIAS to wrap the same C function as two separate Perl functions, the second of which (perhaps for backwards compatibility reasons) takes its arguments in the reverse order. This is a somewhat contrived example, but demonstrates how the ALIAS keyword must be within one of the CASE branches (it doesn't matter which), as CASE must always appear in the outermost scope of the XSUB's body:

int
foo(int a, int b)
    CASE: ix == 0
    CASE: ix == 1
        ALIAS: foo_rev = 1
        C_ARGS: b, a

Note that using old-style parameter declarations in conjunction with INPUT allows the types of the parameters to vary between branches:

int
foo(a, int b = 0)
    CASE: items == 1
        INPUT:
            short a
    CASE: items == 2
        INPUT:
            long a

In practice, CASE produces bloated code with all the argument and return value processing duplicated within each branch, is not often all that useful, and can often be better written just by using a switch statement within a CODE block.

Using Typemaps

This section describes the basic facts about using typemaps. For full information on creating your own typemaps plus a comprehensive list of what standard typemaps are available, see the perlxstypemap document.

Typemaps are sets of rules which map C types such as int to logical XS types such as T_IV, and from there to INPUT and OUTPUT templates such as $var = ($type)SvIV($arg) and sv_setiv($arg, (IV)$var) which, after variable expansion, generate C code to convert back and forth between Perl arguments and C auto variables.

There is a standard system typemap file bundled with Perl for common C and Perl types, but in addition, you can add your own typemap file. From xsubpp 3.01 onwards you can also include extra typemap declarations in-line within the XS file.

Locations and ordering of typemap processing

Typemap definitions are processed in order, with more recent entries overriding any earlier ones. Definitions are read in first from files and then from TYPEMAP sections in the XS file.

When considering how files are located and read in, note that the XS parser will initially change directory to the directory containing the Foo.xs file that is about to be processed, which will affect any subsequent relative paths. Then any typemap files are located and read in. The files come from two sources: standard and explicit.

Standard typemap files are always called typemap and are searched for in a standard set of locations (relative to @INC and to the current directory), and any matched files are read in. These paths are, in order of processing:

"$_/ExtUtils/typemap" for reverse @INC

../../../../lib/ExtUtils/typemap
../../../../typemap
../../../lib/ExtUtils/typemap
../../../typemap
../../lib/ExtUtils/typemap
../../typemap
../lib/ExtUtils/typemap
../typemap
typemap

Note that searching @INC in reverse order means that typemap files found earlier in @INC are processed later, and thus have higher priority.

Explicit typemap files are specified either via xsubpp -typemap foo ... command line switches, or programmatically by an array passed as:

ExtUtils::ParseXS::process_file(..., typemap => ['foo',...]);

These files are read in order, and the parser dies if any explicitly listed file is not found.

Prior to xsubpp 2.09_01, @INC wasn't searched, and standard files were searched for and processed before any explicit ones. From 2.09_01 onwards, standard files were processed after any explicit ones. From 3.60 onwards, explicit files are again processed last, and thus take priority over standard files. Separately, in 3.01 onwards, TYPEMAP sections are then processed in order after all files have been processed.

Note also that ExtUtils::MakeMaker usually invokes xsubpp with two -typemap arguments: the first being the system typemap and the second being the module's typemap file, if any. This compensates for older Perls not searching @INC.

For a typical distribution, all this complication usually results in the typemap file bundled with Perl being read in first, then the typemap file included with the distribution adding to (and overriding) any standard definitions, then any TYPEMAP: entries in the XS file overriding everything.
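
The "later definitions override earlier ones" rule can be modelled with a small C sketch. The tm_entry struct and tm_lookup() are hypothetical and are not part of ExtUtils::Typemaps or any other real API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one C-type-to-XS-type mapping. */
typedef struct {
    const char *ctype;
    const char *xstype;
} tm_entry;

/* Entries are listed in the order in which they were read in.
 * Scanning from the end makes the most recently read definition
 * win. Returns NULL if the C type has no mapping. */
const char *
tm_lookup(const tm_entry *entries, size_t n, const char *ctype)
{
    while (n-- > 0) {
        if (strcmp(entries[n].ctype, ctype) == 0)
            return entries[n].xstype;
    }
    return NULL;
}
```

If the entries are ordered as system typemap, then distribution typemap, then TYPEMAP blocks from the XS file, a lookup returns whichever of those defined the type last.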

Reusing, redefining and adding typemap entries

Both typemap files and TYPEMAP blocks can have up to three sections: TYPEMAP (which is implicit at the start of the file or block) and INPUT and OUTPUT. There is no requirement for all three sections to be present. Whatever is present is added to the global state for that section, either adding a new entry or redefining an existing entry.

Probably the simplest use of an additional typemap entry is to map a new C type to an existing XS type; for example, given this C type:

typedef enum { red, green, blue } colors;

then adding the following C-to-XS type-mapping entry to the typemap would be sufficient if you just want to treat such enums as simple integers when used as parameter and return types:

colors T_IV

Or you could override just an existing INPUT or OUTPUT template; for example:

OUTPUT
T_IV
    my_sv_setiv($arg, (IV)$var);

For a completely novel type you might want to add an entry to all three sections:

foo   T_FOO

INPUT
T_FOO
        $var = ($type)get_foo_from_sv($arg);

OUTPUT
T_FOO
        set_sv_to_foo($arg, $var);

Common typemaps

This section gives an overview of what common typemap entries are available for use. See the perlxstypemap document for a complete list, or examine the typemap file which is bundled with the Perl distribution. Also, see "T_PTROBJ and opaque handles" for a detailed dive into one particular typemap which is particularly useful for mapping between Perl objects and C handles. See "Returning Values from an XSUB" for a general discussion about returning one or more values from an XSUB, where typemaps can sometimes be of use (and sometimes aren't).

Standard signed C int types such as int, long and short, are all mapped to the T_IV XS type. Integer-like Perl types such as IV and I32 are also mapped to this. If a parameter is declared as something mapping to T_IV, then the IV value of the passed SV will be extracted (perhaps first converting a string value like "123" to an IV), then that value will be cast to the final C type, with the usual C rules for casting between integer types. Conversely, when returning a value, the C value is first cast to IV, then the SV is set to that IV value.

Similarly, common C and Perl unsigned types map to T_UV, and values are converted back and forth via (UV) casts. A few unsigned types such as U16 and U32 are instead mapped to T_U_SHORT and T_U_LONG XS types, but these have the same effect as T_UV.
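
The effect of the final cast to a narrower C type can be seen in plain C. In this sketch uint64_t stands in for UV (an assumption; the real width of UV depends on how Perl was built), and narrow_to_u16() is a hypothetical name.

```c
#include <stdint.h>

/* Model of an unsigned parameter: the value extracted from the SV
 * is a wide unsigned integer, which is then cast to the narrower
 * declared C type. Unsigned narrowing is well-defined in C: the
 * value is reduced modulo 2^16. */
uint16_t
narrow_to_u16(uint64_t uv)
{
    return (uint16_t)uv;
}
```

For instance, narrow_to_u16(70000) yields 4464, since 70000 - 65536 = 4464; no error or warning is raised for the out-of-range argument.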

The unsigned char type is treated similarly to other T_UV types, but char is treated as a string rather than an integer. A char parameter will treat its passed argument as a string and set the auto variable to the first byte of that string (which may produce weird results with UTF-8 strings). Returning a char value will return a one-character string to the Perl caller.
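
The following plain C sketch shows why a char parameter can give surprising results with UTF-8 input. The function first_byte() is a hypothetical model of what the conversion keeps, not the typemap code itself.

```c
/* Model of a char parameter: only the first byte of the string
 * buffer is kept; any remaining bytes are ignored. */
char
first_byte(const char *s)
{
    return s[0];
}
```

For the two-byte UTF-8 encoding of U+00E9 (the bytes 0xC3 0xA9), first_byte() returns only the lead byte 0xC3, which is not a complete character on its own.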

The char * type and its common variants are mapped to T_PV. For a passed parameter, the INPUT template obtains (via SvPV() or similar) a string buffer representing that SV. This buffer may be part of the SV if that SV has a string value (or if it can be converted to a string value), or it may be a temporary buffer otherwise. For example, an SV holding a reference to an array might return a temporary string buffer with the value "ARRAY(0x12345678)". When an XSUB has a return type which maps to T_PV, the temporary SV which is to be returned gets assigned the current value of RETVAL, with the string's length being determined by strlen() or its equivalent.
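
One consequence of the length being determined by strlen() is that a returned buffer containing an embedded NUL byte is cut short. This plain C sketch models that; returned_length() is a hypothetical name for illustration.

```c
#include <string.h>

/* Model of returning a char * value: the length copied into the
 * returned SV is whatever strlen() reports, so anything after the
 * first NUL byte is lost. */
size_t
returned_length(const char *retval)
{
    return strlen(retval);
}
```

A six-byte array initialised from "ab\0cd" therefore reports a length of only 2.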

See "Unicode and UTF-8" for the difficulties associated with handling UTF-8 strings.

The float, double and NV types map to T_FLOAT, T_DOUBLE and T_NV XS types, which all operate by converting to and from an SV via sv_setnv() and SvNV(sv) with suitable casting.

The SV* type maps to T_SV, which basically does no processing and allows you to access the actual passed SV argument.

T_PTROBJ and opaque handles

A common interface arrangement for C libraries is that some sort of create function creates and returns a handle, which is a pointer to some opaque data. Other function calls are then passed that handle as an argument, until finally some sort of destroy function frees the handle and its data.

The T_PTROBJ typemap is one common method for mapping Perl objects to such C library handles; see "T_PTROBJ" in perlxstypemap. Behind the scenes, it uses blessed scalar objects with the scalar's integer value set to the address of the handle. The INPUT code template of the T_PTROBJ typemap first checks that the passed SV argument is an RV referring to an object blessed into the Perl class (or a derived class) associated with the parameter's declared type, and then retrieves the pointer from the integer value of that object. The OUTPUT template creates a new blessed RV-to-SV with the handle address stored in it.
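
The underlying idea of keeping a handle's address in an integer slot can be sketched in plain C. Perl code would use its own macros (such as INT2PTR) and its IV type for this; here intptr_t and the names demo_handle, handle_to_int() and int_to_handle() are stand-ins chosen for illustration.

```c
#include <stdint.h>

/* Stand-in for an opaque library handle. */
typedef struct { int i; } demo_handle;

/* Store a handle's address as an integer, much as the blessed
 * scalar's integer value would hold it. */
intptr_t
handle_to_int(demo_handle *h)
{
    return (intptr_t)h;
}

/* Recover the handle from the stored integer. */
demo_handle *
int_to_handle(intptr_t iv)
{
    return (demo_handle *)iv;
}
```

The round trip gives back the original pointer, which is what allows later method calls to find the same C data from the Perl object.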

For illustration, we'll create a minimal C library called mynum, which we'll then proceed to wrap using XS. This library just stores an integer in its opaque data. In real life you would be wrapping an existing library which stores something more interesting, such as a complex number or a multiple precision integer.

The following sample library code might go in the initial 'C' part of the XS file:

typedef struct { int i; } mynum;

mynum* mynum_new(int i)
{
    mynum* x = (mynum*)malloc(sizeof(mynum));
    x->i = i;
    return x;
}

void   mynum_destroy  (mynum *x)
                      { free((void*)x); }

int    mynum_val      (mynum *x)
                      { return x->i; }

mynum* mynum_add      (mynum *x, mynum *y)
                      { return mynum_new(x->i + y->i); }

mynum* mynum_subtract (mynum *x, mynum *y)
                      { return mynum_new(x->i - y->i); }

mynum* mynum_multiply (mynum *x, mynum *y)
                      { return mynum_new(x->i * y->i); }

mynum* mynum_divide   (mynum *x, mynum *y)
                      { return mynum_new(x->i / y->i); }

The mynum struct holds the opaque handle data. The mynum_new() function creates a numeric value and returns a handle to it. The other functions then take such handles as arguments, including a destroy function to free a handle's data.

The following XS code shows an example of how this library might be wrapped and be made accessible from Perl via My::Num objects:

typedef mynum *My__Num;

MODULE = My::Num PACKAGE = My::Num PREFIX = mynum_

PROTOTYPES: DISABLE

TYPEMAP: <<EOF
My::Num T_PTROBJ
EOF

My::Num
mynum_new(class, int i)
    C_ARGS: i

void
DESTROY(My::Num x)
    CODE:
        mynum_destroy(x);

int
mynum_val(My::Num x)

My::Num
mynum_add(My::Num x, My::Num y)
  ALIAS: subtract = 1
         multiply = 2
         divide   = 3
  CODE:
    switch (ix) {
        case 0: RETVAL = mynum_add(x, y);      break;
        case 1: RETVAL = mynum_subtract(x, y); break;
        case 2: RETVAL = mynum_multiply(x, y); break;
        case 3: RETVAL = mynum_divide(x, y);   break;
    }
  OUTPUT:
    RETVAL

The XSUBs in this example are mostly declared with parameter and return types of My::Num which, as explained in "Fully-qualified type names and Perl objects", is looked up as-is in the typemap, but has s/:/_/g applied to the type name to convert it to the My__Num C type when used in the declaration of the XSUB's auto variables.
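
The name translation is a simple character substitution, sketched here in plain C. The function mangle_type_name() is a hypothetical helper; the XS parser performs the equivalent substitution in Perl.

```c
/* Hypothetical helper: replace every ':' in a type name with '_',
 * in place, mirroring s/:/_/g. */
void
mangle_type_name(char *name)
{
    for (; *name != '\0'; name++) {
        if (*name == ':')
            *name = '_';
    }
}
```

Applied to "My::Num" this gives "My__Num", which is why the C half of the file needs a typedef of that name.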

Going through this code in order: while still in the 'C' half of the XS file, we add a typedef which says that the My__Num C type is equivalent to a pointer to a handle from that arithmetic library.

Next, the MODULE line includes a mynum_ prefix, which means that the names of the XSUBs in the Perl namespace will be My::Num::new() etc rather than My::Num::mynum_new().

Then a TYPEMAP declaration is used to map the My::Num pseudo-type to the T_PTROBJ XS type.

Next comes the new() class method. This will be called from Perl as, for example, My::Num->new(99). Its first parameter will be the class name, which we don't use here, and the second parameter is the value to initialise the object to. The XSUB autocalls the library mynum_new() function with just the i value. This returns a handle, which the T_PTROBJ OUTPUT map converts into a blessed scalar ref containing the handle.

Next, the DESTROY() method is just a thin wrapper around mynum_destroy(), while val() returns the integer value of the object.

Finally, four binary functions are defined, sharing the same XSUB body via aliases. As an alternative, the code for the main XSUB could be simplified using the INTERFACE keyword rather than using aliasing:

My::Num
arithmetic_interface(My::Num x, My::Num y)
    INTERFACE:
        mynum_add
        mynum_subtract
        mynum_multiply
        mynum_divide

but note that INTERFACE only supports Perlish return types such as My::Num from xsubpp 3.60 onwards.

This XS module might be accessed from Perl using code like this:

use My::Num;

my $i2  = My::Num->new(2);
my $i7  = My::Num->new(7);
my $i13 = My::Num->new(13);

my $x = $i13->add($i7)->divide($i2);
printf "val=%d\n", $x->val();  # prints "val=10"

See "The OVERLOAD: Keyword" for an example of how to extend this using overloading so that the expression could be written more simply as ($i13 + $i7)/$i2.

Note that, as a very special case, the XS compiler translates the XS typemap name using s/OBJ$/REF/ when looking up INPUT typemap entries for an XSUB named DESTROY. So for such subs, the T_PTRREF typemap entry will be used instead. This typemap is similar to T_PTROBJ, except that the class of the object isn't set or checked.

Using XS With C++

MODULE = Foo::Bar PACKAGE = Foo::Bar

# Class methods

X::Y*
X::Y::new(int i)

static int
X::Y::foo(int i)

# Object methods

int
X::Y::bar(int i)

int
X::Y::bar2(int i) const

void
X::Y::DESTROY()

# C-linkage function

extern "C" int
baz(int i)

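XS provides limited support for generating C++ (as opposed to C) output files. Any XSUB whose name includes :: is treated as a C++ method. This triggers two main changes in the way the XSUB's code is generated: an implicit first parameter is added (called CLASS for new() and for static methods, and THIS for object methods), and any autocall is made as a C++ method call on that class or object rather than as a plain C function call.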

This is mostly just syntactic sugar. The bar XSUB declaration above could be written longhand as:

int
bar(X::Y* THIS, int i)
    CODE:
        RETVAL = THIS->bar(i);
    OUTPUT:
        RETVAL

Note that the type of THIS (and, since xsubpp 3.55, CLASS) can be overridden with a line in an INPUT section:

int
X::Y::bar(int i)
    X::Y::Z *THIS

Finally, a plain C XSUB declaration can be prefixed with extern "C" to give that XSUB C linkage.

Some of the methods above might be called from Perl using code like this:

{
    my $obj = Foo::Bar->new(1);
    $obj->bar(2);
    # implicit $obj->DESTROY();
}

This example uses Foo::Bar rather than X::Y to emphasise that the name of the C++ class needn't follow the Perl package name.

The call to new() will pass the string "Foo::Bar" as the first argument, which can be used to allow multiple Perl classes to share the same new() method. In the simple worked example below, the package name is hard-coded and that parameter is unused. The new() method is expected to return a Perl object which in some way has a pointer to the underlying C++ object embedded within it. This is similar to the "T_PTROBJ and opaque handles" example of wrapping a C library which uses a handle, although with a subtle difference, as explained below.

Calling bar() passes this Perl object as the first argument, which the typemap will use to extract the C++ object pointer and assign to the THIS auto variable.

A complete C++ example

First, you need to tell MakeMaker or similar that the generated file should be compiled using a C++ compiler. For basic experimentation you may be able to get by with just adding these two lines to the WriteMakefile() method call in Makefile.PL:

CC => 'c++',
LD => '$(CC)',

but for portability in production use, you may want to use something like ExtUtils::CppGuess to automatically generate the correct options for ExtUtils::MakeMaker or Module::Build based on which C++ compiler is available.

Then create a .xs file like this:

#define PERL_NO_GET_CONTEXT

#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"

namespace Paint {
    class color {
        int c_R;
        int c_G;
        int c_B;
    public:
        color(int r, int g, int b) { c_R = r; c_G = g; c_B = b; }
        ~color()                   { printf("destructor called\n"); }
        int blue()                 { return c_B; }
        void set_blue(int b)       { c_B = b; }
        // and similar for red, green
    };
}

typedef Paint::color Paint__color;

MODULE = Foo::Bar PACKAGE = Foo::Bar

PROTOTYPES: DISABLE

TYPEMAP: <<EOF
Paint::color * T_PKG_OBJ

INPUT
T_PKG_OBJ
        SvGETMAGIC($arg);
        if (SvROK($arg) && sv_derived_from($arg, "$Package")) {
            IV tmp = SvIV((SV*)SvRV($arg));
            $var = INT2PTR($type,tmp);
        }
        else {
            const char* refstr = SvROK($arg)
                ? "" : SvOK($arg) ? "scalar " : "undef";
            Perl_croak_nocontext(
                "%s: Expected %s to be of type %s; got %s%"
                SVf " instead",
                        ${$ALIAS?\q[GvNAME(CvGV(cv))]:\qq["$pname"]},
                        "$var", "$Package",
                        refstr, $arg
                );
        }

T_PKG_REF
        SvGETMAGIC($arg);
        if (SvROK($arg)) {
            IV tmp = SvIV((SV*)SvRV($arg));
            $var = INT2PTR($type,tmp);
        }
        else
            Perl_croak_nocontext("%s: %s is not a reference",
                        ${$ALIAS?\q[GvNAME(CvGV(cv))]:\qq["$pname"]},
                        "$var")

OUTPUT
T_PKG_OBJ
        sv_setref_pv($arg, "$Package", (void*)$var);

EOF

Paint::color *
Paint::color::new(int r, int g, int b)

int
Paint::color::blue()

void
Paint::color::set_blue(int b)

void
Paint::color::DESTROY()

In the C part of the XS file (or in this case, the C++ part), a trivial example C++ class is defined. This would more typically be a pre-existing library with just the appropriate #include. The example includes a namespace to make it clearer when something is a namespace, class name or Perl package. The Perl package is called Foo::Bar rather than Paint::color to again distinguish it. You could however call the Perl package Paint::color if you desired.

A single typedef follows to allow for XS-mangled class names, as explained in "Fully-qualified type names and Perl objects".

Then the MODULE line starts the XS part of the file.

Then there follows a full definition of a new typemap called T_PKG_OBJ. This is actually a direct copy of the T_PTROBJ typemap found in the system typemap file, except that all occurrences of $ntype have been replaced with $Package. It serves the same basic purpose as T_PTROBJ: embedding a pointer within a new blessed Perl object, and later, retrieving that pointer from the object. The difference is in terms of what package the object is blessed into. T_PTROBJ expects the type name (Paint::color) to already be a pointer type, but with a C++ XSUB, the implicit THIS argument is automatically declared to be of type Paint::color * (so Paint::color itself isn't necessarily a pointer type). In addition, when the Perl and C++ class names differ we want the object to be blessed using the Perl package name, not the C++ class name. In this example, the actual values of the two variables when the typemap template is being evalled, are:

$ntype   = "Paint::colorPtr";
$Package = "Foo::Bar";

The typemap also includes an INPUT definition for T_PKG_REF, which is an exact copy of T_PTRREF. This is needed because, as an optimisation, the XS parser automatically renames an INPUT typemap using s/OBJ$/REF/ if the name of the XSUB is DESTROY, on the grounds that it's not necessary to check that the object is the right class.

Finally the XS file includes a few XSUBs which are wrappers around the class's methods.

This class might be used like this:

use Foo::Bar;

my $color = Foo::Bar->new(0x10, 0x20, 0xff);
printf "blue=%d\n", $color->blue();   # prints "blue=255"
$color->set_blue(0x80);
printf "blue=%d\n", $color->blue();   # prints "blue=128"

Safely Storing Static Data in XS

You should generally avoid declaring static variables and similar mutable data within an XS file. The Perl interpreter binary is commonly configured to allow multiple interpreter structures, with a complete set of interpreter state per interpreter struct. In this case, you usually need your "static" data to be per-interpreter rather than a single shared per-process value.

This becomes more important in the presence of multiple threads; either via use threads or where the Perl interpreter is embedded within another application (such as a web server) which may manage its own threads and allocate interpreters to threads as it sees fit.

A macro framework is available to XS code to allow a single C struct to be declared and safely accessed. Behind the scenes, the struct will be allocated per interpreter or thread; on non-threaded Perl interpreter builds, the macros gracefully degrade to a single global instance. These macros have MY_CXT ("my context") as part of their names.

It is therefore strongly recommended that these macros be used by all XS modules that make use of static data.
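
To make the underlying problem concrete, here is a minimal sketch in plain C that does not use the Perl API at all. The toy_interp struct and the two bump functions are hypothetical stand-ins invented for illustration. The sketch only shows the general point: a C static is shared by everything in the process, while data reached through an interpreter struct is private to that interpreter.

```c
/* Hypothetical stand-in for an interpreter struct; not a Perl type. */
typedef struct {
    int counter;            /* this interpreter's own copy of the data */
} toy_interp;

/* One variable for the whole process: every interpreter sees it. */
static int shared_counter = 0;

/* Increments the process-wide value, whichever interpreter calls it. */
int
bump_shared(toy_interp *interp)
{
    (void)interp;           /* the caller's identity makes no difference */
    return ++shared_counter;
}

/* Increments only the calling interpreter's own value. */
int
bump_private(toy_interp *interp)
{
    return ++interp->counter;
}
```

With two toy_interp values, calls to bump_shared() keep counting upwards across both, whereas bump_private() counts separately for each. The MY_CXT macros aim to give XS code the second behaviour without the module having to manage the storage itself.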

When creating a new skeleton Foo.xs file, you can use the --global option of h2xs to also include a skeleton set of macros, e.g.

h2xs -A --global -n Foo::Bar

Below is a complete example module that makes use of the macros. It tracks the names of up to three blind mice.

#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "ppport.h"

#define MAX_NAME_LEN 100

/* Global Data */

#define MY_CXT_KEY "BlindMice::_guts" XS_VERSION

typedef struct {
    int count;
    char name[3][MAX_NAME_LEN+1];
} my_cxt_t;

START_MY_CXT

MODULE = BlindMice  PACKAGE = BlindMice

PROTOTYPES: DISABLE

BOOT:
{
    MY_CXT_INIT;
    MY_CXT.count = 0;
}

int
AddMouse(char *name)
  PREINIT:
    dMY_CXT;
  CODE:
    if (strlen(name) > MAX_NAME_LEN)
        croak("Mouse name too long\n");

    if (MY_CXT.count >= 3) {
        warn("Already have 3 blind mice");
        RETVAL = 0;
    }
    else {
        RETVAL = ++MY_CXT.count;
        strcpy(MY_CXT.name[MY_CXT.count - 1], name);
    }
  OUTPUT:
    RETVAL

char *
get_mouse_name(int mouse_num)
  PREINIT:
    dMY_CXT;
  CODE:
    if (mouse_num < 1 || mouse_num > MY_CXT.count)
        croak("There are only %d blind mice.", MY_CXT.count);
    else
        RETVAL = MY_CXT.name[mouse_num - 1];
  OUTPUT:
    RETVAL

void
CLONE(...)
  CODE:
    MY_CXT_CLONE;

The main points from this example are:

MY_CXT macros reference

MY_CXT_KEY

This macro is used to define a unique key to refer to the static data for an XS module. The suggested naming scheme, as used by h2xs, is to use a string that consists of a concatenation of the module name, the string ::_guts and the module version number:

#define MY_CXT_KEY "MyModule::_guts" XS_VERSION

my_cxt_t

The "static" values should be stored within a struct typedef which must always be called my_cxt_t. The other *MY_CXT* macros assume the existence of the my_cxt_t typedef name. For example:

typedef struct {
    int some_value;
    int some_other_value;
} my_cxt_t;

START_MY_CXT

This macro contains hidden boilerplate code. Always place the START_MY_CXT macro directly after the declaration of my_cxt_t.

MY_CXT_INIT

The MY_CXT_INIT macro initializes storage for the my_cxt_t struct.

It must be called exactly once, typically in a BOOT: section. If you are maintaining multiple interpreters, it should be called once in each interpreter instance, except for interpreters cloned from existing ones. (But see "MY_CXT_CLONE" below.)

dMY_CXT

Use the dMY_CXT macro (a declaration) at the start of all the XSUBs (and other functions) that access MY_CXT.

MY_CXT

Use the MY_CXT macro to access members of the my_cxt_t struct. For example, if my_cxt_t is

typedef struct {
    int index;
} my_cxt_t;

then use this to access the index member:

dMY_CXT;
MY_CXT.index = 2;

aMY_CXT/pMY_CXT

dMY_CXT may be quite expensive to calculate. To avoid the overhead of invoking it in each function, it is possible to pass the declaration on to other functions using the argument/parameter aMY_CXT/pMY_CXT macros, e.g.:

void sub2(pMY_CXT);   /* declare sub2 before its first use */

void sub1() {
    dMY_CXT;
    MY_CXT.index = 1;
    sub2(aMY_CXT);
}

void sub2(pMY_CXT) {
    MY_CXT.index = 2;
}

Analogously to pTHX, there are equivalent forms for when the macro is the first or last in multiple arguments, where an underscore is expanded to a comma where appropriate, i.e. _aMY_CXT, aMY_CXT_, _pMY_CXT and pMY_CXT_. These allow for the possibility that those macros might optimise away any actual argument without leaving a stray comma.
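
As a rough illustration of how the comma variants behave, the following self-contained sketch uses stand-in macros with a TOY_ prefix rather than the real ones from the Perl headers. The stand-ins, the toy_cxt_t struct and the three functions are all assumptions made for this example. They model a build where the context is passed as a pointer argument. On a build where the macros optimise the argument away, the comma forms would presumably expand to nothing.

```c
/* Hypothetical stand-ins for the MY_CXT argument macros; the real
 * definitions come from the Perl headers and vary by build. */
typedef struct {
    int index;
} toy_cxt_t;

#define TOY_pMY_CXT   toy_cxt_t *toy_cxtp   /* sole parameter    */
#define TOY_pMY_CXT_  TOY_pMY_CXT,          /* first of several  */
#define TOY__pMY_CXT  ,TOY_pMY_CXT          /* last of several   */
#define TOY_aMY_CXT   toy_cxtp              /* sole argument     */
#define TOY_aMY_CXT_  TOY_aMY_CXT,          /* first of several  */
#define TOY__aMY_CXT  ,TOY_aMY_CXT          /* last of several   */
#define TOY_MY_CXT    (*toy_cxtp)

/* Context first: the trailing-underscore form supplies the comma. */
void
set_index(TOY_pMY_CXT_ int value)
{
    TOY_MY_CXT.index = value;
}

/* Context last: the leading-underscore form supplies the comma. */
int
add_to_index(int delta TOY__pMY_CXT)
{
    TOY_MY_CXT.index += delta;
    return TOY_MY_CXT.index;
}

/* A caller that already holds the context passes it along unchanged. */
int
set_then_add(TOY_pMY_CXT)
{
    set_index(TOY_aMY_CXT_ 40);
    return add_to_index(2 TOY__aMY_CXT);
}
```

Because the comma is part of the macro rather than written by hand, the same source would still compile if the macros expanded to nothing: no stray comma would be left in the parameter or argument list.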

MY_CXT_CLONE

When a new interpreter is created as a copy of an existing one (e.g. via threads->create()), then by default, both interpreters share the same physical my_cxt_t structure. Calling MY_CXT_CLONE (typically via the package's CLONE() function) causes a byte-for-byte copy of the structure to be taken (but not a deep copy), and any future dMY_CXT will cause the copy to be accessed instead.

This is typically used within the CLONE method which is called each time an interpreter is copied (usually when creating a new thread). Other code can be added to CLONE() to deep copy items within the structure.
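
The difference between a byte-for-byte copy and a deep copy can be sketched in plain C. This example is an illustration only: toy_cxt_t, shallow_clone() and deepen_clone() are hypothetical names, and it uses standard C library calls purely so that it can stand alone. Real XS code would call MY_CXT_CLONE and would normally prefer the Perl API equivalents described in perlclib.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical context struct that owns a heap-allocated string. */
typedef struct {
    int   count;
    char *label;
} toy_cxt_t;

/* Byte-for-byte copy: count is duplicated, but both structs now hold
 * the same label pointer. */
void
shallow_clone(toy_cxt_t *dst, const toy_cxt_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Follow-up step: give the copy its own label buffer.
 * Returns 0 on success, -1 if allocation fails. */
int
deepen_clone(toy_cxt_t *dst)
{
    char  *fresh;
    size_t len;

    if (dst->label == NULL)
        return 0;
    len = strlen(dst->label) + 1;
    fresh = malloc(len);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, dst->label, len);
    dst->label = fresh;
    return 0;
}
```

After shallow_clone() alone, freeing or modifying the label through one struct would affect the other. That is the situation any pointer members of my_cxt_t are in after a byte-for-byte copy, and why extra code in CLONE() may be needed.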

MY_CXT_INIT_INTERP(my_perl)
dMY_CXT_INTERP(my_perl)

These are variants of the MY_CXT_INIT and dMY_CXT macros which take an explicit perl interpreter as an argument.

Note that these macros will only work together within the same source file; that is, a dMY_CXT in one source file will access a different structure than a dMY_CXT in another source file.

EXAMPLES

Fairly complete examples of XS files can be found elsewhere in this document, while "SYNOPSIS" contains an overview of an XS file and perlxstut contains various worked examples.

You can of course look at existing XS distributions on CPAN for inspiration, although bear in mind that many of these will have been created before this document was rewritten in 2025, and so may not follow current best practices.

Note that when wrapping a real library, you'll often need to add a line like this to the .xs file:

#include <foobar.h>

and add entries like:

LIBS  => ['-lfoo', '-lbar'],

to Makefile.PL or similar. And don't forget to add test scripts under t/.

CAVEATS

Use of standard C library functions

Often, the Perl API contains functions which you should use instead of the standard C library ones. See perlclib.

Event loops and control flow

Some modules have an event loop, waiting for user input. It is highly unlikely that two such modules would work adequately together in a single Perl application.

In general, the perl interpreter views itself as the center of the universe as far as the Perl program goes. XS code is viewed as a helpmate, there to accomplish things that perl doesn't do, or doesn't do fast enough, but always subservient to perl. The more closely XS code adheres to this model, the less likely it is that conflicts will occur.

XS VERSION

This document covers features supported by xsubpp 3.61.

AUTHOR DIAGNOSTICS

As of xsubpp 3.49, a few parser warnings are disabled by default. While developing, you can set $ENV{AUTHOR_WARNINGS} to true in your environment or in your Makefile.PL, set $ExtUtils::ParseXS::AUTHOR_WARNINGS to true via code, or pass author_warnings=>1 into process_file() explicitly. Currently this enables stricter alias checking, but more warnings might be added in the future. These warnings are helpful only to the author of the XS file: the diagnostics produced do not include installation-specific details, so they are useful only to the maintainer of the XS code itself.

AUTHOR

Originally written by Dean Roehrich <roehrich@cray.com>. Completely rewritten in 2025.

Maintained since 1996 by The Perl Porters, <perl5-porters@perl.org>.