Christoph Breitkopf: Functional Valhalla?

Christoph Breitkopf: Functional Valhalla?

Planet Haskell (noreply@blogger.com (bokesan))

Pointer-rich data layouts lead to suboptimal performance on modern hardware. For an excellent introduction to this, see the article The Road to Valhalla. While it is specifically about Java, many parts of the article also apply to other languages. To summarize some of the key points of the article:

  • In 1990, a main memory fetch was about as expensive as an arithmetic operation. Now, it might be a hundred times slower.
  • A pointer-rich data layout involving indirections between data at different locations is not ideal for today’s hardware.
  • A language should make flat (cache-efficient) and dense (memory-efficient) memory layouts possible without compromising abstraction or type safety.

Consider a vector of records (or tuples, structures, product types - I’ll stay with “record” in this article). A pointer-rich layout has each record allocated separately in the heap, with a vector containing pointers to the records. For example, given a “Point” record of two numbers:

The flat and dense layout has the records directly in the array:

(Note that there is another flat layout, namely, using one vector per field of the record. This is better suited to instruction-level parallelism or specialized hardware (e.g., GPUs), especially when the record fields have different sizes. But it is less suited for general-purpose computing, as reading a single vector element requires one memory access per field, whereas the “vector of records” layout above requires only one access per record. Such a layout can be easily implemented in any language that has arrays of native types, whether in the language itself or in a library (e.g., OCaml’s Owl library). Thus, in this article, I will only consider the “array of records” layout above.)

Functional language considerations

Things should be much easier in functional languages than in Java: we have purity, referential transparency, and everything is a value. So it should be simple enough to store these values in memory in their native representation. But there are reasons that that is often not the case in practice:

  • Lazyness: a value can be a computation that produces a value only when needed.
  • Layout polymorphism: unless we replicate the code for every type (as, for example, Rust does), we need to be able to store every possible value in the same kind of slot.
  • Dynamically typed languages require type information at runtime.
  • Functional languages often have automatic memory management, which may require runtime type information.
  • Many of our languages are not purely functional, but contain impure features.
  • Pure languages often lack traditional vectors or arrays, since making them perform well in immutable code is not easy.
  • Historical reasons: Graph reduction was a common implementation technique for lazy languages, and graphs involve pointers.
  • Implementation restrictions: not being mainstream, fewer resources are devoted to implementation and optimization.

Many implementations can not even lay out native types flat in records, so a Point record of IEEE 754 double-precision numbers may actually look like this in memory:

The (very short) List

So, given a record type, which functional languages allow a collection of values of that type to have a flat, linear memory layout? The number of programming languages that claim to be “functional” is huge, so the ones listed here are just a selection based on my preferences - mainly languages that allow that layout, and some I have some experience with and can speculate on how easy or hard it would be to add that as a library or extension.

Since the Point record can be misleading in its simplicity when it comes to the question of whether the functionality could be implemented as a library, I’ll point out that there are records where the layout is a bit more interesting:

  • Records containing different types with different storage sizes, for example, one 64-bit float and one 32-bit integer. On most architectures, this will require 4 bytes of padding between elements.
  • Records containing native values along with something that has to be represented as a pointer, for example, a reference-type or a lazy value. In a flat layout, this means that every nth element will be a pointer, requiring special support from the memory management system, either by providing layout information or by using a conservative GC that treats everything as a potential pointer.

Pure languages:
Clean

Yes: Clean has unboxed arrays of records in the base language.

Caveat: it does not have integer types of specific sizes and only one floating-point type, making it harder to reduce memory usage by using the smallest type just large enough to support the required value range. It seems possible to implement such types in a library (the mTask system does that).

Futhark

No. Futhark does not intend to be a general-purpose language, so this is not surprising.

I mention it here because it does have arrays of records, but, since it targets GPUs and related hardware, it uses the “record of arrays” layout mentioned above.

Haskell

Yes. Not in the base language, but there is library support via Data.Vector.Unboxed. Types that implement the Unbox type class can be used in these vectors. Many basic types and tuples have an Unbox instance. However, when you care about efficiency, you probably do not want to use tuples but rather a data type with strict fields, i.e., not:

type Point = (Double, Double)

but:

data Point = Point !Double !Double

Writing an Unbox instance for such a type is not trivial. The vector-th-unbox library makes it easier, but requires Template Haskell. Unboxed vectors are implemented by marshalling the values to byte arrays, so records with pointer fields are not supported.

Impure Languages
F#

Yes, even records with pointer fields. Records have structural equality, and you can use structs or the [<Struct>] attribute to get a flat layout.


And that’s all I could find. Unless I follow Wikipedia's list of functional programming languages, which contains languages such as C++, C#, Rust, or Swift, that allow the flat layout, but don’t really fit my idea of a functional language. But SML, OCaml, Erlang (Elixir, Gleam), Scala? Not that I could see (but please correct me if I’m wrong).

Rolling your own

Since there is a library implementation for Haskell, maybe that’s a possibility for other languages?

You should be able to implement flat layouts in any language that supports byte vectors. More interesting is how well such a library fits into the language, and whether a user of the library has to write code or annotations for user-defined record types, or whether the library can handle part or all of that automagically.

I’ll only mention my beloved Lisp/Scheme here. Lisp’s uniform syntax and macro system are a bonus here, but the lack of static typing makes things harder.

In Scheme, R6RS (and R7RS with the help of some SRFIs) has byte-vectors and marshalling to/from them in the standard library. But Scheme does not have type annotations, so you either need to offer a macro to define records with typed fields or to define how to marshal the fields of a regular (sealed) record. Since you can shadow standard procedures in a library, you can write code that looks like regular Scheme code, but, perhaps surprisingly, loses identity when storing/retrieving values from records:

(let ((vec (make-typed-vector 'point 1000))
(pt (make-point x y)))
(vector-set! vec 0 pt)
(eq? (vector-ref vec 0) pt))
⇒ #f

(But then, you probably shouldn’t be using eq? when doing functional programming in Scheme).

The same approach is possible in Common Lisp. In contrast to Scheme, it does have optional type annotations, and, together with a helper library for accessing the innards of floats and either the meta-object protocol to get type information or (probably better) a macro to define typed records, an implementation should be reasonably straightforward. Making it play nice with inheritance and the dynamic nature of Common Lisp (e.g., adding slots to classes or even changing an object's class at runtime) would be a much harder undertaking.

Conclusion

Of the functional languages I looked at, only F# fully supports flat and dense memory layouts. Among the pure languages, Haskell and Clean come close.

The question is how important this really is. There’s a good argument to be made for turning to more specialized languages like Futhark if you mainly care about performance. On the other hand, having a uniform codebase in one language also has advantages.

Then, the performance story has changed, too. While the points Project Valhalla raises remain true in principle, processor designers are aware of this as well. They are doing their best to hide memory latency with techniques such as out-of-order execution or humongous caches. Thus, on a modern CPU, the effects of a pointer-rich layout are often only observable with large working set sizes.

Still, given the plethora of imperative language that can get you to Valhalla, support for this in the functional landscape seems lacking. In the future, I hope to see more languages or libraries that will make this possible.

Generated by RSStT. The copyright belongs to the original author.

Source

Report Page