This article discusses the general topic of serialization in .NET and gives an overview of the implementation details in FsPickler.
Serialization in the .NET framework is something often considered as being problematic or even broken. This could be tracked to the following reasons:
There is no runtime or type system support for serialization. Rather, serializers are library implementations that mostly depend on the reflection system to extrapolate serialization rules for each type (resulting in runtime errors if this is not possible).
There is confusion as to which is the prefered methodology when defining serialization semantics for types. The BCL itself comes with a multitude of approaches most of which are dated, unsafe and inefficient by design.
The BCL comes with the following patterns for defining serializable types:
Field-based serialization: objects are pickled by serializing the contents of their fields. This behaviour is tuned by applying an assortment of attributes such as
IDeserializationCallbackinterface. This is arguably the most problematic approach, since it is sensitive to internal implementation details. Incidentally, this is the pattern of choice for most F# types.
Property-based or DataContract serialization: objects are pickled by serializing the contents of specially flagged properties. Properties should be settable and are usually required to be public. In most cases, a parameterless constructor is also necessary. Property-based serialization is probably the most popular approach.
ISerializable types: possibly intended as a type-safe replacement of field-based serialization. It fails to live up to this role, mostly due to the nature of interfaces; it still requires a special constructor for deserialization which, if missing, would result in a runtime error. It requires separate implementations for serialization and deserialization, often leading to bugs and introducing lots of boilerplate code. Its dynamic nature also makes it impossible to reason about the components it serializes a priori. It is however an indispensible pattern since it is used by many BCL types, most important being exceptions, whose metadata is impossible to serialize otherwise.
The BCL ships with a couple of general-purpose serializer libraries but these suffer either from obsolescence (e.g. BinaryFormatter) or performance (NetDataContractSerializer). Many third-party libraries have emerged attempting to address these shortcomings, some with success. However, most of them seem to suffer when it comes to properly supporting serialization of .NET objects. In particular many of them:
offer sketchy or no support for reflection types, like
- require library-specific dependencies for serializable type definitions.
- do not support inaccessible types/constructors/properties.
- fail to address subtype polymorphism.
- do not support types like multi-dimensional arrays or delegates.
- fail to properly handle cyclic objects.
- have limited or nonexistent support for F# types.
The need for a new serialization library was realized while developing for MBrace. MBrace is a framework for distributed computation and big data that is essentially built on the idea of a distributed continuation monad. To be able to distribute continuations, you need to be able to serialize closures. The inherently intrinsic and arbitrary nature of closures means that they are not supported by most serialization libraries. Initially we started using NetDataContractSerializer as our library of choice, but that eventually took its toll on overall performance, so we decided to build a new library from scratch.
FsPickler was created with two goals in mind: performance and completeness in supported objects. Importantly, it was conceived as a .NET serializer, rather than an F# serializer: if it is a .NET object designed to be serialized, it should be serializable in FsPickler. In that sense, it was designed with a goal to embrace the imperfect world of .NET serialization in its totality, albeit with an eye for correctness.
At this point, we should acknowledge Anton Tayanovskyy and his great F# snippet Union-Friendly Generic Binary Serializer which served as the initial inspiration for this library.
FsPickler is founded on the functional programming notion of picklers and pickler combinators. The concept was originally described in Andrew Kennedy's Pickler Combinators. A pickler is essentially the type:
1: 2: 3: 4: 5:
Read functions represent serialization and deserialization
implementations for values of type
'T. When compared to other .NET serializers,
picklers prove exceptionally fast since they directly handle the serialized value
without any reliance on reflection-based resolution.
Pickler combinators are functions that operate on picklers, generating picklers of more complex types. For instance, the following could be a pickler combinator implementation for rank-1 arrays:
1: 2: 3: 4: 5:
Given a sufficient range of primitive picklers and pickler combinators, it is possible to define pickling rules for most essentially serializable .NET types. However, these would still need to be declared manually by their implementor.
FsPickler attempts to solve this by providing an automated pickler generation framework: picklers are generated at runtime and on demand using a combination of reflection and dynamic IL generation. Picklers are cached for future use, hence the cost of generation has a constant price.
Moreover, it extends the concept of picklers so that features particular to the .NET framework are accommodated: object orientation and subtype polymorphism, null references and cyclic objects.
It should be mentioned for the sake of completeness that a similar effort has been undertaken by the Scala community. There are differences of approach however, since this mostly relies on the metaprogramming facility offered by scala.
So what qualifies as a serializable type in FsPickler?
The short answer would be any type
'T for which we present an instance of
To make the question more meaningful: for what types does the library auto-generate picklers?
This is something that can only be answered at runtime, typically by calling
Primitive types and
string are serializable but pointers, COM objects and MarshalByRef types are not.
A managed class or struct is serializable if and only if it carries the
Serializable flag or attribute
and moreover satisfies either of the following:
- is array/nullable/tuple/enum of serializable element types.
ISerializableinterface and has a matching constructor implementation.
DataContractattribute and all
DataMembermembers are of serializable type. Properties carrying the
DataMemberattribute must be settable. Parameterless constructors or public properties are not required.
- is an F# union or record type in which all contained fields are serializable.
CustomPicklerattribute and an accompanying pickler factory method.
is neither of the above and all its contained fields are of serializable type. Parameterless constructors or public fields are not required.
We now discuss how the functional programming concept of picklers can be reconciled with the object-oriented .NET framework. It is useful to note here that reference type serialization essentially means object graph serialization. Object graphs in .NET can often contain null references or cycles, which ought to be fully supported. FsPickler automatically augments reference types so that such occurences are handled effectively.
Object graphs are serialized by performing depth-first traversal;
this approach improves performance by eliminating unnecessary
boxings in object queues but is sensitive to stack overflow conditions.
for keeping track of traversed references and detecting cycles.
Cyclic objects are re-instantiated lazily through shallow field copying.
FsPickler discriminates between reference types that are sealed and those that are not sealed. A value of non-sealed type can either be instance of this type or of a proper subtype. In the context of open hierarchies, it is clear that a type-fixed pickler implementation simply cannot cater to arbitrary subtype instances.
For this reason, picklers will only be used to serialize values whose reflected type is identical to the pickler's. When coming across an object of non-sealed type, FsPickler will examine its reflected type; if found to differ, a new pickler will be generated on-the-fly for that particular subtype.
The above implies that
obj is serializable in FsPickler, but
will only be used with
obj() instances. Similarly, picklers for interfaces
or abstract classes are possible, but on their own are incapable of serializing anything.
This is how serialization of F# closures is made possible:
for every internal closure type generated by the F# compiler,
a special pickler is generated for handling it.
Conceptually, pickler implementations found in the library can roughly be separated into the following categories:
Atomic picklers: self containing pickler implementations for types like primitives, strings,
System.Guid. These do little other than delegating serialization to the underlying pickle format implementation.
Pickler combinators: as described earlier, these are parametric functions generating picklers out of simpler components. Examples are option, list, array and tuple picklers.
Pickler generators: for nominal types that are not expressions on known types, a different generation strategy is required. The type structure is determined using reflection and a new pickler is constructed from that information. FsPickler can emit dynamic methods to ensure that no reflection or boxing penalty is incurred during actual serialization. Pickler generators can be further separated into the following kinds:
- Field-based pickler generation.
- DataContract pickler generation.
- ISerializable pickler generation.
- F# union/record/exception pickler generation.
So how does FsPickler generate a pickler (or fail trying) for a given type parameter? I will attempt to explain by giving a simplified example:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30:
The above declarations define a notion of "type shape", a partitioning of types according
to certain traits satisfied. This pattern gives naturally rise to a "type shape visitor",
which we will be making use of later. Given an arbitrary type
'T, we can use reflection to extract
and construct the shape it fits in, albeit in a packaged manner:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19:
We can now use the type shape visitor to plug in all required pickler combinators. This will result in a generator function that recursively calls appropriate pickler combinators in a strongly typed manner:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20:
generate function can now be used for auto-generating derivative picklers:
1: 2: 3: