Reflection-based JSON in C++ at Gigabytes per Second

Reflection-based JSON in C++ at Gigabytes per Second

Daniel Lemire's blog

JSON (JavaScript Object Notation) is a popular format for storing and transmitting data. It uses human-readable text to represent structured data in the form of attribute–value pairs and arrays. E.g., {"age":5, "name":"Daniel", toys:["wooden dog", "little car"]}. Ingesting and producing JSON documents can be a performance bottleneck. Thankfully, a few JSON parsers such as simdjson have shown that we can process JSON at high speeds, reaching gigabytes per second.

However, producing and ingesting JSON data can remain a chore in C++. The programmer often needs to address potential errors such as unexpected content.

Yet, often, the programmer only needs to map the content to and from a native C/C++ data structure. E.g., our JSON example might correspond to the following C++ structure:

struct kid {
  int age;
  std::string name;
  std::vector<std::string> toys;
};

Hence, we would often like to write simple code such as

kid k; store_json(k, out)

or

kid k; load_json(k, in)

Ideally, we would like this code to be fast and safe. That is, we would like to write and read our data structures at gigabytes per second. Further, we want proper validation especially since the JSON data might come from the Internet.

A library such as simdjson already allows elegant deserialization, e.g., we can make the following example work:

struct kid {
  int age;
  std::string name;
  std::vector<std::string> toys;
};


void demo() {
  auto json_str = 
R"({"age": 12, "name": "John", "toys": ["car", "ball"]})"_padded;
  simdjson::ondemand::parser parser;
  auto doc = parser.iterate(json_str);
  kid k = doc.get<kid>();
}

However, for the time being, it requires a bit of effort on the part of the programmer, as they need to write a custom function.

Similarly, going from the C++ structures to JSON can be made convenient, but typically only after the programmer has done a bit of work, writing custom code. E.g., in the popular JSON for Modern C++ (nlohmann/json) library, we need to provide glue functions:

void to_json(json&j, const person&p) {
  j = json{{"name", p.name}, {"address", p.address}, 
    {"age", p.age}};
}


void from_json(const json&j, person&p) {
  j.at("name").get_to(p.name);
  j.at("address").get_to(p.address);
  j.at("age").get_to(p.age);
}

We know that this extra effort (writing data-structure aware code) is unnecessary because other programming languages (Java, C#, Zig, Rust, Python, etc.) allow us to ‘magically’ serialize and deserialize with almost no specialized code.

One key missing feature in C++ is sufficiently powerful reflection. Reflection in programming languages refers to a mechanism that allows code to introspect its own structure. That is, if you cannot conveniently tell C++ that if a class has a std::string name attribute, then you should automatically search for a string value matching the key name in the JSON.

Thankfully, C++ is soon getting reflection in C++26. And it is getting powerful reflection: reflective metaprogramming. That is, future versions of C++ will allow

us to solve the serialization and deserialization problem we described at compile time: a software library can automate the production and consumption of JSON to and from a native data structure. Such a feature will simplify the life of the C++ programmer. Furthermore, because it can be done automatically as part of well tested library at compile time, it can generate fast and safe code.

Herb Sutter wrote about this exiciting development:

This is huge, because reflection (including generation) will be by far the most impactful feature C++ has ever added since C++98, and it will dominate the next decade and more of C++ usage.

Though many programming languages have reflection, the reflective metaprogramming provided by the upcoming C++ standard is unique to our knowledge. Some programming languages have long had powerful reflection capabilities, but they are mostly a runtime feature: they do not allow the equivalent of generating code. Other programming languages allow compile-time logic generation but they do not always allow optimal performance. We expect that the C++ approach might be especially appropriate for high performance.

Though current C++ compilers do not yet support reflection, we have access to prototypical compilers. In particular, engineers from Bloomberg maintain a fork of LLVM, the popular compiler framework. Though such an implementation should not be used in production and is subject to bugs and other limitations, it should be sufficient to test the performance: how fast can serialization-based JSON processing be in C++? Importantly, we are not immediately concerned with compilation speed: the Bloomberg fork is deliberately suboptimal in this respect.

To test out this new C++ feature, we wrote a prototype that can serialize and deserialize data structures to and from JSON strings automatically. Given an instance of `kid`, we can convert it to a JSON string without any effort, the compile-time reflection does the work:

kid k{12, "John", {"car", "ball"}};
std::print("My JSON is {}\n", simdjson::json_builder::to_json_string(k));

Benchmark Results with Speed-up

Automatically converting your C++ data types into JSON strings is convenient. But is it fast? To find out, we have used 2 instances for benchmarking, an “Artificial instance” that was obtained using the JSON Generator Website and the other one is the twitter.json file that is used by many other serialization libraries for benchmarking.

We test on two systems (Apple and Intel/Linux) with the experimental LLVM Bloomberg compiler.

M3 MAX Macbook Pro:

Intel Ice Lake:

Our benchmark showed that we are roughly 20× faster than nlohmann/json. This was achieved by combining the new reflection capabilities with some bit twiddling tricks and a string_builder class to help minimize the overhead of memory allocations. Importantly, our implementation requires little code. The C++26 reflection capabilities do all the hard work. In our benchmarks, we further compare against hand-written functions as well as an existing C++ reflection library (reflect-cpp). Hand-written code can be slightly faster than our reflection-based code, but the difference is small (10% to 20%). We can be twice as fast as reflect-cpp although it is also a very fast library.

Conclusion

Our results suggest that C++26 with reflection is a powerful tool that should allow C++ programmers to easily serialize and deserialize data structures at gigabytes per second. In the near future, the simdjson library will adopt reflection for both serialization and deserialization. Indeed, we expect that this will help users get high performance in some important cases with little effort.

Credit

Joint work with Francisco Geiman Thiesen.

We are grateful to @the-moisrex for teaching us about tag dispatching and initiating its adoption in the simdson library.

Appendix: complete code example

struct kid {
  int age;
  std::string name;
  std::vector<std::string> toys;
};


/**
 * The following function would print:
 *
 * I am 12 years old
 * I have a car
 * I have a ball
 * My name is John
 * My JSON is {"age":12,"name":"John","toys":["car","ball"]}
 *
 */
void demo() {
  auto json_str = R"({"age": 12, "name": "John", "toys": ["car", "ball"]})"_padded;
  simdjson::ondemand::parser parser;
  auto doc = parser.iterate(json_str);
  kid k = doc.get<kid>();
  std::print("I am {} years old\n", k.age);
  for (constauto &toy : k.toys) {
    std::print("I have a {}\n", toy);
  }
  std::print("My name is {}\n", k.name);
  std::print("My JSON is {}\n",    simdjson::json_builder::to_json_string(k));
}

References:

  1. Langdale, Geoff, and Daniel Lemire. Parsing gigabytes of JSON per second, The VLDB Journal 28.6 (2019): 941-960.
  2. Keiser, John, and Daniel Lemire. On‐demand JSON: A better way to parse documents?, Software: Practice and Experience 54.6 (2024): 1074-1086.
  3. Wyatt Childers, Peter Dimov, Dan Katz, Barry Revzin, Andrew Sutton, Faisal Vali, Daveed Vandevoorde, Reflection for C++26, WG21 P2996R3, 2024-05-22


Generated by RSStT. The copyright belongs to the original author.

Source

Report Page