Adding extension module examples to the packaging user guide

Continuing the discussion from PEP 803: Stable ABI for Free-Threaded Builds (packaging thread):

As discussed, this should be a new topic. My comment from the original thread:

In particular, the backends @rgommers mentioned are:

  • CMake
  • Meson
  • scikit-build-core
  • meson-python
  • maturin

To be honest, I thought scikit-build-core was the build backend corresponding to CMake, and the same for meson-python and Meson. And maturin is a slightly different case, as it’s for Rust rather than C extensions. So there’s only really two build backends for C code here, I think?

Nevertheless, I think having examples in the packaging guide that take an extremely simple C extension[1], and describe how to package it up using setuptools, scikit-build-core, and meson-python, would be really useful. By having a common description, with “tabbed” sections where you can pick your backend and see the differences, you get a great idea of all the common tasks, with a clear view of just how much (or ideally, how little) difference the backends make.

With pure Python, I think this approach has been very effective, making it very clear that 99% of building a Python package is backend-agnostic, and even the remaining 1% is very similar. This guides people to be less afraid of just choosing a backend based on what UI they prefer, and other “superficial” details :slightly_smiling_face:

To be clear, I’d strongly recommend not covering things like building for multiple platforms, bundling dependent libraries, cibuildwheel and auditwheel, in the basic page. By all means add them in a later section, but we should be showing that “write some C code and make it into an extension” isn’t fundamentally a hard problem[2].

Maybe even having an example of using Cython to wrap a C library would be useful. Again, that’s a common task that should be seen as relatively simple. But that might be a different section. Let’s keep the requirements for “Hello, world” minimal.

In case it’s not obvious, I’m very frustrated by the mystique that seems to have grown up around building C extensions, making it seem arcane and difficult. Being able to easily write your main code in Python, and the performance-critical bits in C, was what made Python so popular, and we’ve lost a lot of that these days.


  1. Having the same for Rust would also be great, but with only one backend, and PyO3 being the core interface library, I imagine anyone writing a Rust extension already knows to look at the maturin and PyO3 docs. ↩︎

  2. As opposed to publishing that code for a wide audience, which is… :scream: ↩︎

15 Likes

I’ve considered doing this a bit differently: given a simple extension, list the raw compiler command line you’d use, on all the tier-1 platforms (i.e. intentionally a very limited list). Then (or rather before that), link to an external guide.

I’d consider it CPython’s job to document how you’d build without a tool like setuptools or maturin – even if the nitty-gritty platform specifics are left as the proverbial exercise to the reader.

The packaging user guide (https://packaging.python.org) is acting as that “external guide” in this context. I’d expect the raw compiler command line to be something the core documentation (specifically Extending and Embedding the Python Interpreter) covered.

3 Likes

Agreed that this is overdue. I think it’d be good to identify the pages of the packaging guide that need updating in this thread, and the essence of what to change - surely there’ll be lots of opinions.

You are correct, that’s why I wrote build backend/system in the other thread. The functionality is split across a (thin) build backend and a (much larger) build system. In addition, the situation is a little more complex for CMake-using projects, since there are at least three common ways of using CMake with a build backend:

  • scikit-build-core: the modern, recommended way
  • scikit-build: the legacy build backend; it still relies on setuptools
  • Using setuptools with a custom CMakeBuildExt(build_ext)-type class that handles calling CMake internally. Not recommended, but quite commonly used.

And there’s also cmeel as another CMake-using backend. Furthermore, there are other less well-known backends for C/C++ like enscons (based on SCons) and pymsbuild (good for MSVC/Windows).

For Rust, maturin is the recommended build backend, but there’s also setuptools-rust.

For Fortran, meson-python or one of the ways of using CMake are your only two choices; setuptools has no support.

Agreed. I think having at least basic guidance for a simple package that uses either C or C++, possibly via Cython/pybind11/nanobind, is necessary (the advice and tradeoffs will be the same for all those cases). I’d also add short notes on Rust/Fortran, but perhaps not give examples.

I would consider using the stable ABI by default now. Either that, or planning to do so once pybind11 has support and Python 3.12 is a commonly used minimum supported version. It has matured, and it avoids the “you need to build 60 wheels” problem - that has gotten really out of hand.
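To make the stable-ABI suggestion concrete, here is a sketch of what opting in looks like at the C level. The 0x030C0000 value encodes the hypothetical 3.12 baseline mentioned above; the exact value is the part to double-check against the C API docs:

```c
/* Sketch: opting an extension into the stable ABI.
 * Defining Py_LIMITED_API before including Python.h restricts the
 * module to the limited API; 0x030C0000 means "3.12 or newer".
 * The resulting wheel is tagged abi3 and works on every CPython
 * >= 3.12, instead of needing one wheel per minor version. */
#define Py_LIMITED_API 0x030C0000
#include "Python.h"
```

The build backend then only needs to be told to produce an abi3-tagged wheel; how that is spelled differs per backend.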

I do think there are many differences well beyond “superficial”. meson-python and scikit-build-core are similar in performance, usability and stability - and both way, way better than setuptools in all of those respects. It’s long overdue to state that in the official guide - new projects should pick one of those two backends. I remember from the pure Python version that there was a big bikeshed about what the default backend should be in the tabbed selector (it ended up being hatchling, see here), so I’ll avoid making a proposal for now beyond “not setuptools”. In addition, we should consider whether more niche backends like enscons, cmeel and pymsbuild should be included - or even whether their authors want that.

5 Likes

I have an old project with a few Cython extensions; I’ve modernized it many times, but I could never get rid of setup.py, because every documentation or tutorial I found about “modern” tools only targeted pure Python projects. I was wondering where the “glue language” aspect went, when everything seems so Python-centric…

4 Likes

I added a bit of documentation for both scikit-build and Meson when I was documenting the limited API support: The Limited API and Stable ABI — Cython 3.3.0a0 documentation. Hopefully that’s enough to get started, although it’s largely just copied from their own tutorials with some small changes to enable the limited API.

But you’re right that most of the documentation is on setuptools, and that probably isn’t what most people are using now (hopefully). I’m not-very-secretly hoping that someone with a little more knowledge of these tools will come and improve it… Or maybe we can just link to an improved Python packaging guide.

But I believe that both of these options are relatively straightforward to use with Cython, their own documentation isn’t too bad, and people are happily using them. But as far as I’m concerned, that largely happens out of sight for me.

3 Likes

At least for the Rust side, what about explaining what maturin and setuptools-rust are and why you need them, and then linking to their docs for actually using them, similar to Building and distribution - PyO3 user guide? packaging.python.org feels like the wrong location for tool-specific documentation to me, especially since there are so many things that need to be explained and documented. It’s valuable to have an official page that tells you which tools exist and how they fit into the Python packaging architecture (e.g., there are build backends and there are build frontends), but I’m not sure we should “vendor” e.g. the maturin getting-started guide from the PyO3 docs, or what maturin’s README explains.

For context, this is the current guidance (Tool recommendations - Python Packaging User Guide):

For packages with extension modules, it is best to use a build system with dedicated support for the language the extension is written in, for example:

  • Setuptools – natively supports C and C++ (with third-party plugins for Go and Rust),
  • meson-python – C, C++, Fortran, Rust, and other languages supported by Meson,
  • scikit-build-core – C, C++, Fortran, and other languages supported by CMake,
  • Maturin – Rust, via Cargo.

An overview of the C/C++ build options with more context and more opinionated guidance would really help beginners. From my experience supporting users, I’d especially like to recommend a non-setuptools solution as the first option. Many (new) users still seem to regard setuptools as the default and then struggle with its complexity or accidentally implement broken workflows, in ways that other tools often prevent. [1]


  1. Disclosure: I’m the former maintainer of setuptools-rust and the author of maturin, a non-setuptools competitor to setuptools-rust. ↩︎

3 Likes

That’s my ulterior motive as well. I don’t write C code much these days, but I would like a simple tutorial that showed me how I could wrap a C library as a Python extension, just for my own use, without needing to fight with setuptools. The classic “Hello, world” idea of giving you just enough to find out how to lay out your files, what commands to run, etc.

Let me frame this from an end user perspective, so that the experts have something to start from. Let’s say I have a trivially simple C function in adder.c:

int add_10(int n) {
    return n + 10;
}

I want to create a Python module adder that contains this function, so adder.add_10(5) gives 15.

I can get the boilerplate I need to make a module from the Python C API documentation, but (a) it’s pretty daunting, and (b) I’d just as happily write a little Cython wrapper. If I’m doing that, do I just write an adder.pyx file containing the following?

cdef extern from "adder.c":
    int add_10(int n)

If not, what do I need? I tried reading the Cython documentation, and it was surprisingly hard to find the answer. Maybe that’s OK, because Cython is a language of its own, and “wrapping C functions” is not its core feature. But in the PUG, starting with a C function and wrapping it is (I claim) the baseline we should be starting from, so maybe the PUG needs to cover that aspect of Cython specifically.
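For what it’s worth, here is one plausible (untested!) sketch of an answer, using the adder.c and add_10 names from above, and assuming only adder.pyx is handed to the build system so the C file isn’t also compiled separately. The extern block alone isn’t enough, since cdef-declared C functions aren’t callable from Python; a def wrapper is also needed:

```cython
# adder.pyx - hypothetical sketch, not verified against current Cython
cdef extern from "adder.c":   # textually pulls adder.c into this module
    int add_10(int n)

def add_10_py(int n):
    # a plain def function is what actually exposes the C function to Python
    return add_10(n)
```

Whether this (and the wrapper naming) is the idiomatic approach is exactly the kind of question the tutorial should settle.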

Anyway, now I have my C code wrapped, and I need to write a pyproject.toml. Most of that will be standard, with just the build_backend value being different from pure Python.
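To make that concrete, a sketch of the only backend-specific part; the backend strings below are the documented entry points of each project, and everything else in pyproject.toml looks the same as for a pure Python package:

```toml
# pyproject.toml - only the [build-system] table differs per backend
[build-system]
requires = ["meson-python"]
build-backend = "mesonpy"

# With scikit-build-core the table would instead read:
#   requires = ["scikit-build-core"]
#   build-backend = "scikit_build_core.build"
# and with setuptools:
#   requires = ["setuptools"]
#   build-backend = "setuptools.build_meta"
```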

… and what else? How do I tell my build backend that it needs to invoke Cython? How do I tell it where my C compiler is? I’m on Windows, and I have Visual C installed, but typically it’s not on my PATH, because I don’t use it often and the vcvars scripts don’t interact well with my normal shell startup - I’d argue that’s pretty normal; the average Python user will install a C compiler to write a C extension, but won’t necessarily have a full C development workflow set up.

Get that sorted, and then I’m at the point where I can run py -m build. And that’s all I really need for a “Packaging C code” tutorial.

Note that I didn’t say anything here about which build backend I’d choose. I’d want the tutorial to show the various options, and I’d pick based on which one looks easiest. If I have to read the CMake documentation and write a CMakeLists.txt file, I’m picking something else[1]. If I have to declare where my Python interpreter is installed, and where the header files or libs are, I’m looking for a simpler backend. Like it or not, from what I recall setuptools worked all of that out for you, and that’s the baseline I’m looking at. I want to use a more modern build backend, but it’s more important to me that I don’t need to learn a bunch of things up front that I don’t need yet. I’ll learn them later, when it matters.

In my experience, extension building (and more generally, C build system) documentation tends to be written by experts who really care about all the options and details, but who forget that the new user really isn’t interested - all they want is to get something trivial working right now. That’s the beauty of the “Hello, world” approach - no-one actually needs a program to display “Hello, world”, but once you’ve written that, you have established all of the groundwork to learn the rest of the details at your own pace, and based on what you are interested in.


  1. I’m willing to copy/paste a boilerplate one to start with, if I must. But I don’t want to have to understand it - that’ll come later, if at all. ↩︎

9 Likes

I’d 100% support a section in the PUG that does a “Hello, world” type extension in Rust. IMO it should be separate from the C and Python ones, though.

So the process for a reader goes:

  1. You want to write a package for Python. Great, you’re in the right place. First question, what language do you want to code in?
  2. Pick the page for your language. We’ll assume you know how to write code in your language, so let’s start with a trivial function, because you can extrapolate from there to more complex cases. So now, you need to know what you put “around” that function to make it a Python module. Here’s the basic structure, including all the files you need (notably pyproject.toml).
  3. Depending on your language, you may have a choice of build backends. Picking one is largely a matter of personal choice - for the basic example they should all be essentially the same.

And that’s it. You have a working extension, even if it doesn’t do much (yet). When you want to go deeper, look at your build backend’s documentation for thoughts on where to go next. Also, if you weren’t writing your package in Python, there’s probably some “wrapper” involved (Cython, PyO3, pybind11, …), go to your wrapper’s documentation when you want to get more complex.

4 Likes

This would actually be a fantastic addition.

I’m not sure how much CPython folks are exposed to the C code wrapping situation, but everyone is quite confused about cffi, ctypes, Cython, pybind11/nanobind + extern C, and many other ways that still rank very high in Google searches.

Typically, folks write or find the C library first and then they are left with “how am I going to link it to Python now”. Then they find many options, each slightly outdated, but enough to throw wrenches into the gearbox.

Just to explain it a bit more concretely, use cases involve “I want to pass a dict or a set from Python and let C read those and spit out some result”. It is not immediately “how am I going to use (insert some tool name)”. But the current situation out there is to first define the tool and then let you fit your code into that tool. Hence the examples would serve greatly if they include actual items from the CPython API and not the umpteenth look at Fibonacci. In particular, things like PyArg_ParseTuple, PyErr_xxx, PyDict_GetItemString are not so easy to pick up on. Not to mention the borrowed and strong reference details surrounding these.

Hence what is really puzzling many is not “which tool” but “how do I link C and Python together”. In other words, as nicely put above, “how do I make the glue”. That’s why the adder example above does not reduce the confusion as intended, because then it becomes a “picking a tool” issue. But something like this does a lot in terms of reducing the confusion; it doesn’t even need an actual C function, because people would already know their own:

#define PY_SSIZE_T_CLEAN
#include "Python.h"
#include "libIwanttolinkto.h"

typedef struct {
    ....
} the_struct_C_lib_exposes_t;

double func_I_want_to_use(struct....., const int a, double x, double y);

static PyObject* myfunc(PyObject* Py_UNUSED(dummy), PyObject* args)
{
    PyObject* input_dict = NULL;
    double result = 0.0;
    ....
    if (!PyArg_ParseTuple(args, "O!...",
            &PyDict_Type, &input_dict,  // O!
            ....
        )) {
        return NULL;
    }
    ...
    result = func_I_want_to_use(......);
    ...
    return Py_BuildValue(....);
}

static char doc_myfunc[] = "This is my wrapped function...";

static struct PyMethodDef mylib_module_methods[] = {
    {"myfunc", myfunc, METH_VARARGS, doc_myfunc},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef_Slot mylib_module_slots[] = {
    {Py_mod_multiple_interpreters, Py_MOD_PER_INTERPRETER_GIL_SUPPORTED},
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},
    {0, NULL},
};

static struct PyModuleDef moduledef = {
    .m_base = PyModuleDef_HEAD_INIT,
    .m_name = "mylib",
    .m_size = 0,
    .m_methods = mylib_module_methods,
    .m_slots = mylib_module_slots,
};


PyMODINIT_FUNC
PyInit_mylib(void)
{
    return PyModuleDef_Init(&moduledef);
}

This is pretty much the canonical example for creating an extension module that I would have loved to see back in the day. From the CPython side, you can insert as many recent additions as you like, such as multi-phase init or subinterpreter support, into this example. OK, this is good; then where do I say “compile this” and somehow make `from mylib import myfunc` work?

We can start telling folks it used to be only setuptools and here is how to do it, OR! “this is quite a laborious thing to do, so why not use other tools if your functions are simple enough; here is Cython…” and so on.
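For the “here is how it used to be done” half, the classic setuptools route for the mylib example above would look roughly like this. A sketch only: mylib.c is the hypothetical file holding the module code from the earlier post, and the metadata values are placeholders:

```python
# setup.py - classic setuptools build script (sketch).
# setuptools locates the Python headers and the platform's C compiler
# by itself, which is why this route needed so little configuration.
from setuptools import Extension, setup

setup(
    name="mylib",
    version="0.1",
    ext_modules=[Extension("mylib", sources=["mylib.c"])],
)
```

Running a build frontend (historically `python setup.py bdist_wheel`, nowadays `python -m build`) then compiles mylib.c and produces a wheel from which `from mylib import myfunc` works.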

All this is to say, the examples should be for the non-initiated, and not a notepad excerpt for the experts to remember once in a while.

Wholeheartedly agree. But the gap between the non-initiated and the daily C extension author is huge, and I think we are suffering a bit from expert blindness.

5 Likes

(personal opinion, though I am a member of CPython’s C API working group)

  • We want you to use those.
  • However, we can’t be responsible for their bugs and shortcomings.
  • We do need to document, and test, how to write an extension “manually”. This indirectly documents how to write a wrapper/generator.
  • The “raw” C API should not be hidden. We don’t expect every wrapper to document every single function it (re-)exposes; that’s CPython’s job.

I’ll start rewriting the tutorial. I’d love your initial reviews (hopefully) next week.

4 Likes

Those are the responsibility of the core Python documentation to explain, not the packaging guide. The reason I’m asking for a trivial example is because we shouldn’t be replicating the core docs, we should focus on packaging.

I’m definitely not an expert here - that’s why I’m saying what I want and hoping the experts can help provide it :slightly_smiling_face: But I also want to ensure that we end up with focused documents that don’t overwhelm the reader (me!) with more than they need at any one time.

1 Like

Yes, this is absolutely a problem-focused issue not a tool-focused issue.

The matrix of tools is pretty complex, but I think it’s totally reasonable for the CPython core docs to demonstrate using “raw” C code and “raw” compiler commands, and the packaging docs to cover more of the options. We should point from the core docs to the packaging docs here as well, since core’s job is to provide the reference docs, but most users probably shouldn’t be using them (though some will).

As far as packaging docs go, an example like Ilhan’s implemented across the ~top 5 (popularity/active) languages/libraries (probably C, pybind, nanobind, Cython, PyO3) and ~top 5 backends (probably setuptools, Meson, Maturin, scikit-build-core, ???), also showing some kind of compatibility matrix for which backend can handle which languages, ought to be enough to illustrate the selection criteria users should go through.

FYI: Modernising "Building C and C++ Extensions" · Issue #108064 · python/cpython · GitHub (it’s not active, but it’s there for this task)

3 Likes

I agree again. But packaging means using these tools. I can’t create a wheel if I cannot make this glue work. That is the missing link, as I tried to demonstrate.

It does not need to demonstrate Cython etc. But it needs to start from a point where the link is established first; then you can talk about packaging that code. Currently, whoever is able to figure things out will not need the packaging example. But people get stuck at the glue step and never arrive at packaging. So the context is, unfortunately, entirely missing.

1 Like

Just to be clear here, while I totally support better documentation of ways to implement an extension module, that is not the topic of this thread, as I intended it. Part of that is probably my fault - by suggesting a trivial C function plus a Cython wrapper as my canonical “trivial example”, I opened the door to questions around “how do I write a C extension”. That was my mistake. Maybe we should simply say that the starting point is your extension source code, in a file called ext.c, and leave it at that[1].

The problem I want this topic to focus on is, given I have the extension source, how do I package it into a wheel? At the moment, all we have is Packaging binary extensions - Python Packaging User Guide, which is far too wordy, goes off on all sorts of tangents[2], and doesn’t actually address the practical question of what files I need to write, and what I should put in them.

So what’s wanted is a document similar to Packaging Python Projects - Python Packaging User Guide but for native extensions. In fact, much of the content can simply be references back to the pure Python case, as the process is identical (everything in pyproject.toml that isn’t related to the build backend, for a start!). But the critical new information, that simply isn’t available at the moment, is “how do I tell the build system about my source code?”
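As one concrete data point for what that critical new information could look like: with meson-python, the answer lives in a meson.build file next to pyproject.toml. A sketch, using the ext.c placeholder name from earlier in this post and based on Meson’s Python module, so treat the details as indicative rather than authoritative:

```meson
# meson.build - how the build system learns about the extension source
project('ext', 'c')

py = import('python').find_installation(pure: false)
py.extension_module(
    'ext',           # importable module name
    'ext.c',         # the extension source file
    install: true,   # ship the built module in the wheel
)
```

The equivalent for scikit-build-core would be a CMakeLists.txt, and for setuptools an ext_modules entry; the tutorial’s job would be to show these side by side.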

It’s not a “missing link”. The problem is that there’s nothing at the moment for it to link to.

Step 1 is have an idea/problem that needs a native extension. You probably have a core algorithm that you know how to write in C.
Step 2 is find out how to integrate that C code with Python, interacting with Python types, exposing a Python API, etc. That’s what the core Python documentation covers.
Step 3 is to work out how to package that code for distribution. At the moment there’s nothing that tells you how to do that. The packaging user guide only talks about pure Python code, and maybe you can get a hint that what you want is a build backend that handles native code. But beyond that, you’re on your own.

My point is that we need to provide something covering step 3. You’re focusing on step 2, and while I agree it’s hard to find the information you need, it’s not missing. The core Python documentation may not be ideal, but at least it exists!

Please - can we split “improving the documentation of the Python C API” into its own thread? This thread was explicitly created to discuss documenting how to package extensions, and that goal is getting lost.


  1. I’d be disappointed, because there are non-trivial questions around how to integrate pre-processing steps like Cython, but I’d rather leave that undocumented than have the whole effort get sidetracked onto improving the tutorials for the C API ↩︎

  2. Seriously, it discusses “maybe use PyPy rather than writing a C extension”. How is that about packaging an extension? ↩︎

I think we are agreeing again. However, every time this argument is invoked, namely

This is not about X but Y.

We are adding more confusion to the pile. There is no need for topic hygiene when we are writing tutorials and similar material. I think that is an artificial division that makes things exceedingly complicated, which is probably why we have at least 5 tools that attempt to do essentially the same thing.

Every time I engage with any discussion, it comes down to the same thing: the topic is not this but that. In fact, I have to claim that the topic is precisely all of this working in harmony. You might think otherwise, but I can guarantee you that nobody can come into the Python ecosystem, figure out the C library details, then write the glue code I provided or the Cython pieces you want to add, then figure out the build system and the backend, and then sort out the PyPI and conda details. This is simply impossible with the tutorials we have. It is always secondary sources and lots of back and forth on StackOverflow, Reddit, LLMs etc. That’s why people don’t move to modern practices: those sources dominate any changes CPython implements, and it takes a decade for them to move.

Having said that, I am known to lose all the discussions on this front. But then I would not find it surprising, afterwards, when you say there is a mystique around these topics. This is why. The examples you have in mind are these days covered by Numba and other JIT compilers, because it is orders of magnitude easier to use Numba instead of navigating the extension maze we are forcing folks through.

3 Likes

Sorry, I don’t mean to discourage you from contributing.

There’s a very practical difference I’m trying to emphasise (and apparently failing). The topics target different documents.

  • Discussions about writing native extensions will end up in the core Python documentation, on docs.python.org, maintained by the core developers.
  • Discussions about packaging native extensions will end up in the packaging user guide, on packaging.python.org, maintained by the packaging community.

My concern is that we keep discussions distinct so that the people maintaining each set of documents can focus on the discussions that they are concerned about, and not have to pick out relevant information from a “combined” discussion.

It looks like @encukou has the “writing native extensions” side of the discussion under control, which is why I’d rather we focus more on the “packaging native extensions” side. I’m not dismissing your concerns, just noting that they’ve been picked up and are being worked on.

3 Likes

Nice! Is there any chance of including that in the packaging user guide?

No problem, I didn’t take it as such either.

I understand, but what I am failing to communicate is that these are also not separate subjects, as folks tend to believe. The packaging and the native extensions are all in the same basket. Just because, say, SciPy or NumPy figured it out does not mean everyone else now understands where binaries live or how things are used.

Consequently, folks start telling tales about how everything is a mess in Python, this is bad, that is slow, etc. This is one of those cases where we are consistently failing everyone for documentation hygiene’s sake.

The absence of such a combined discussion is the problem I’m trying to draw your attention to. If we want more people writing extension modules, or generating more glue, then these need to come together or sit very close to each other. Not just for C but also for Rust, Go etc.

Next to the Scientific Python pages, there is also pypackaging from a few years back.

That so many communities have penned this many packaging tutorials is telling, in my opinion.

1 Like