How fast can construct small list of strings in C for Python?

How fast can construct small list of strings in C for Python?

Daniel Lemire's blog

Python is probably the most popular programming language in the world right now. Python is easy to extend using C code.

You may want to return from Python a small data structure. When crossing from C to Python, there is an overhead. Thus, if performance is a concern, you do not want to return lots of small values such as hundreds of individual small strings.

However, you may want to return a list of strings such as

['zero elephant', 'one elephant is having fun', 'two elephants are having fun',
'three elephants are meeting with the president', 'four elephants',
'five elephants in an hexagon', 'six elephants are playing the saxophone',
'seven elephants are visiting the school', 'eight elephants are at Church',
'nine elephants are having a party']

I am going to assume that these strings are in an array called ‘numbers’ in my C code. Of course, if the strings are known at compile time, I could just precomputed the array and return it. However, I want to generate the content dynamically.

A reasonable C function which will return an list is as follows:

PyObject* all_strings(PyObject* self) {
  size_t len = sizeof(numbers) / sizeof(numbers[0]);
  PyObject* pyList = PyList_New(len);
  for (size_t i = 0; i < len; ++i) {
    PyObject* pyString = PyUnicode_FromString(numbers[i]);
    PyList_SET_ITEM(pyList, i, pyString);
  }
  return pyList;
}

It looks a bit complicated, but it is not. It is just a application of the Python C functions. The key element is the PyUnicode_FromString function which automatically takes a pointer to an UTF-8 string, and converts it into a Python string. In our case, all our strings are ASCII, so the function does more work than needed. In this instance, I know ahead of time the size of the list (hence PyList_New(len)), but I could also have constructed an empty list (PyList_New(0)) and appended strings to it as needed.

Can we do better? We might. Python introduced lower-level string construction functions. You start by constructing a string that cannot be resized, and you must tell it whether the string content fits in one byte (ASCII or Latin1), or whether it requires more storage per character (UTF-16 without surrogate pairs) or whether you need the full Unicode range. In my case, it easy: my strings are pure ASCII. In the general case, you would need to ascertain the string type (a functionality we plan to add to the simdutf library).

The code becomes slightly more complicated, but barely so…

PyObject* all_strings_new(PyObject* self) {
  size_t len = sizeof(numbers) / sizeof(numbers[0]);
  PyObject* pyList = PyList_New(len);
  for (size_t i = 0; i < len; ++i) {
    const char * str = numbers[i];
    size_t len = strlen(str);
    PyObject* py_string = PyUnicode_New(len, 127);
    Py_UCS1* data = PyUnicode_1BYTE_DATA(py_string);
    memcpy(data, str, len);
    PyList_SET_ITEM(pyList, i, py_string);
  }
  return pyList;
}

I wrote a small benchmark. You need Python and a development environment to build it. I ran the benchmark on my Apple M2 laptop, using LLVM 15 and Python 3.12. Your results will vary.

I my case, I get a very significant boost (2x) from the lower-level approach.

My data structure contains 295 bytes of strings, so the lower-level approach generates strings at over 2 GB/s, whereas the conventional approach has half the speed.

Generated by RSStT. The copyright belongs to the original author.

Source

Report Page