Records in C# v9.0 and .Net 5.0. Steve Love explains a new way to define types in C#.
The release of .Net 5.0 in November 2020 was a major upgrade, bringing .Net Core and .Net Framework together under a single banner with a number of improvements, updates, and fixes. The release also represents an update to the C# language, and .Net 5.0 brings C# to version 9.0. One of the flagship features of C# v9.0 is the record, a new way to create user-defined types.
C# has supported classes and structs since it was introduced in the early 2000s, and the general concept of user-defined types goes back to the 1960s. A reasonable question to ask, then, is why does C# need a new way to create user-defined types?
In this article we’ll look at what records are useful for, and how to use them in our own code. We’ll contrast them with classes and structs, the other main ways of creating user-defined types in C#, with some common use-cases and examples. We will also look at some of the performance characteristics of records at a high level.
Let’s begin with some examples of their main features.
What is a record?
Records are a light-weight way to define a type that has value semantics for the purposes of comparing two variables for equality. Here is the full definition of a simple record type:
public record Point(int X, int Y);
The Point type defined here is almost as simple as it could be. The record definition itself hides the fact that a Point has two int properties named X and Y, and a constructor taking two int parameters to initialize those property values. The syntax shown here is known as a positional record. The X and Y parameters in the record declaration correspond to properties of the same name and type in the Point record. How we create an instance of a Point and access its properties will be familiar to any C# developer:
var coord = new Point(2, 3);
Assert.That(coord.X, Is.EqualTo(2));
Assert.That(coord.Y, Is.EqualTo(3));
Those generated properties have no set accessors (the compiler gives them init-only accessors instead, which can be used only while the object is being initialized), so Point is immutable. Once we’ve created an instance of Point, its value can never be changed. This is in keeping with the common recommendation to make all value types, and value-like types, immutable. Some familiar examples include string, which is a class but has value-like semantics, and System.DateTime, which is a struct.
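To make the hidden parts more concrete, here is a rough longhand approximation of what the positional declaration gives us. It is only a sketch: the name PointLonghand is invented for illustration, and the real compiler output contains more than this (a positional record also gets a Deconstruct method, for instance).

```csharp
// Hypothetical longhand approximation of: public record Point(int X, int Y);
// PointLonghand is an invented name, not something the compiler produces.
public record PointLonghand
{
    public PointLonghand(int x, int y)
    {
        X = x;
        Y = y;
    }

    // init-only accessors: assignable in a constructor or an object
    // initializer, read-only afterwards.
    public int X { get; init; }
    public int Y { get; init; }
}

public static class LonghandDemo
{
    public static bool BehavesLikePositional()
    {
        var a = new PointLonghand(2, 3);
        var b = new PointLonghand(2, 3);

        // a.X = 5;   // would not compile: X can only be set during initialization

        // Because it is still declared as a record, equality is value based.
        return a == b && a.X == 2 && a.Y == 3;
    }
}
```

Because PointLonghand is still declared with the record keyword, it keeps the compiler-generated equality; only the properties and constructor have been spelled out by hand.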
Equality
When we compare two record instances to determine if they’re equal, their values are compared. For two Point variables, this means that both the X and Y properties match. This is similar behaviour to comparing two struct instances, but differs from most class types. We use reference variables to manipulate class instances on the heap, and by default two references compare equal only if they refer to the same instance in memory.
The following code shows two Point values being compared for equality in different ways. Firstly, they’re compared using ==, which gives a value-based comparison. Next, they’re compared using ReferenceEquals, a method defined on the object base class that returns true if two variables refer to the same instance. Note that deliberately performing a reference comparison gives a different result from a direct equality check:
public record Point(int X, int Y);

var coord = new Point(2, 3);
var copy = new Point(2, 3);

Assert.That(coord == copy, Is.True);
Assert.That(ReferenceEquals(coord, copy), Is.False);
Two instances of the Point record compare equal if all of their respective properties compare equal, irrespective of whether they refer to the same object in memory. This is a defining characteristic of value types. More generally, two record instances compare equal if they are the same type and all their properties also compare equal.
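The "same type" part of that rule is easy to overlook. The following sketch uses two invented records, Celsius and Fahrenheit, that have identical properties but are unrelated types; they are assumptions made purely for illustration.

```csharp
// Hypothetical records with the same shape but different types.
public record Celsius(double Degrees);
public record Fahrenheit(double Degrees);

public static class SameTypeDemo
{
    // Same type, same property values: equal.
    public static bool SameTypeSameValues()
        => new Celsius(20) == new Celsius(20);

    // Different types, same property values: not equal.
    // We go through Equals(object) because == is not defined
    // between two unrelated record types.
    public static bool DifferentTypeSameValues()
        => new Celsius(20).Equals(new Fahrenheit(20));
}
```

Here SameTypeSameValues returns true and DifferentTypeSameValues returns false, even though both comparisons involve the same Degrees value.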
Copying
Records are in fact reference types. They are allocated on the heap, are subject to garbage collection, and under normal circumstances are copied by reference. If we assign one Point variable to another, as shown below, we have two references to the same Point instance:
var coord = new Point(2, 3);
var copy = coord;

Assert.That(ReferenceEquals(coord, copy), Is.True);
There are times when we need to copy a value, but change only some of its properties. Records have an associated feature called non-destructive mutation which allows us to create a new instance from an existing one, but with some altered properties. When we assign one record variable to another, we can add a with clause to the assignment, as shown here:
var coord = new Point(2, 3); var copy = coord with { Y = 30 };
In this example, copy is an independent instance of Point. It’s a copy of the original coord record, except that it has a different value for the Y property. The X property of copy is taken from the corresponding X property of coord, and is unchanged in copy. Again, we can confirm this with a few tests:
Assert.That(coord.Y, Is.EqualTo(3));
Assert.That(copy.Y, Is.EqualTo(30));
Assert.That(coord.X, Is.EqualTo(copy.X));
Assert.That(ReferenceEquals(coord, copy), Is.False);
Since there are only two properties in this example, the benefit of using the with syntax in this way isn’t immediately obvious. For records with several properties, however, this approach may be significantly more compact than the alternative of creating a new instance, and passing in a mixture of values and properties from an existing record to the constructor.
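To illustrate the difference with a slightly larger type, the sketch below uses an invented Address record with four properties. Both methods produce the same result; the record and the method names are assumptions for the purpose of the example.

```csharp
// Hypothetical record with more properties than Point.
public record Address(string Street, string City, string PostCode, string Country);

public static class WithDemo
{
    // The longhand alternative: every unchanged property has to be
    // passed through to the constructor by hand.
    public static Address ChangeStreetWithConstructor(Address original, string street)
        => new Address(street, original.City, original.PostCode, original.Country);

    // The with expression names only the property that changes.
    public static Address ChangeStreetWithClause(Address original, string street)
        => original with { Street = street };
}
```

If a property is later added to Address, the with version needs no change, whereas the constructor version has to be updated.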
Deconstruction
We’ve examined construction of records so far, so in the interest of balance let’s have a look at deconstruction. This is the process of capturing the component properties of a record into individual variables, like this:
var coord = new Point(12, 34);
var (x, y) = coord;

Assert.That(x, Is.EqualTo(coord.X));
Assert.That(y, Is.EqualTo(coord.Y));
Here, the coord variable is being deconstructed into the named variables x and y. We probably wouldn’t use deconstruction directly like this; it’s more useful when we call a method that returns a record instance but we want separate variables. Note that the names of the individual variables can be different from the property names in the record. We can use any valid variable name for those variables.
Sometimes we don’t need to capture all the components because we’re only interested in a subset of the values. Instead of creating a variable that is never used, we can use the underscore as a placeholder like this:
public Point ParsePoint(string coordinate)
{
    // ...
}

var (_, height) = ParsePoint("3,2");
Assert.That(height, Is.EqualTo(2));
In this example, only the second component of the record – the Y property – is copied to the named height variable. The placeholder, known as a discard, tells the compiler to ignore the X property of the Point record. For records with more than two properties we can use the underscore identifier to discard multiple values.
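As a brief sketch of discarding more than one value, consider an invented three-property record named Colour (an assumption for this example) from which we only want the last component:

```csharp
// Hypothetical record with three positional properties.
public record Colour(int Red, int Green, int Blue);

public static class DiscardDemo
{
    public static int BlueOnly(Colour colour)
    {
        // Each underscore is an independent discard, so the same
        // placeholder can appear more than once.
        var (_, _, blue) = colour;
        return blue;
    }
}
```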
String representation
Record types have a built-in, consistent string representation available using the ToString method. This method is available to all types, since it’s defined on the object base class. However, unless we override it for ourselves in classes and structs, it returns just the name of the type, qualified with its namespace.
Calling ToString on a record instance, however, returns the type name along with the names and values of all of the properties, like this:
var coord = new Point(12, 34);
Console.WriteLine(coord);
This gives the output:
Point { X = 12, Y = 34 }
The Console.WriteLine method calls ToString to obtain the representation. We might also use string interpolation to embellish the output with the variable name like this:
Console.WriteLine($"{nameof(coord)} = {coord}");
Giving the output:
coord = Point { X = 12, Y = 34 }
Logging is the obvious candidate for output like this, but there are other potential benefits because the format is easy to parse in order to re-create an object. Records don’t provide that facility for us, though; we have to write it ourselves.
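As a minimal sketch of what writing it ourselves might involve, the following hypothetical PointParser class uses a regular expression to read back the representation of a Point. The class, its Parse method and the exact pattern are assumptions for illustration, and the pattern only handles the plain integer format shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public record Point(int X, int Y);

// Hypothetical helper: records do not generate anything like this for us.
public static class PointParser
{
    // Matches text of the form: Point { X = 12, Y = 34 }
    private static readonly Regex Format =
        new Regex(@"^Point \{ X = (-?[0-9]+), Y = (-?[0-9]+) \}$");

    public static Point Parse(string text)
    {
        var match = Format.Match(text);
        if (!match.Success)
            throw new FormatException($"Not a Point representation: {text}");

        return new Point(
            int.Parse(match.Groups[1].Value),
            int.Parse(match.Groups[2].Value));
    }
}
```

A round trip such as PointParser.Parse(coord.ToString()) should then give a record that compares equal to coord.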
Inheritance
We can inherit one record type from another in exactly the same way as we can with classes, and the semantics are broadly the same as for class inheritance. For example, we can use a base-class reference to an inherited record, and we can cast from one type to another. In the following example, we derive a Point3d record from our Point record:
public record Point(int X, int Y);
public record Point3d(int X, int Y, int Z) : Point(X, Y);

var coord3d = new Point3d(2, 3, 4);
var coord = (Point)coord3d;

Point point = new Point3d(2, 3, 4);
var point3d = point as Point3d;
If we attempt to cast from a base type to a more derived type when the conversion isn’t valid, we get an InvalidCastException, just as with classes:
var coord = new Point(2, 3);
Assert.That(() => (Point3d)coord, Throws.TypeOf<InvalidCastException>());
Also in exactly the same way as we would with a derived class, we can use an instance of a derived record as an argument to a method expecting a base class record, as shown here:
public double LineDistance(Point a, Point b)
{
    // ...
}

var pointa = new Point3d(2, 5, 4);
var pointb = new Point3d(6, 8, 4);

Assert.That(LineDistance(pointa, pointb), Is.EqualTo(5));
Records can only inherit from other records, so we can’t inherit a record type from a class, nor can a class derive from a record.
We can seal a record to prevent it from being inherited. Once again, the syntax is identical to that we’d use for a class, as shown here:
public sealed record Speed(double Amount);
It’s common for value types to be sealed when they’re modelled as classes. The built-in string class is a case in point, and all struct types are implicitly sealed. Conceptually, values are fine-grained bits of information, often representing transient and even trivial bits of data. Values differ from entities most clearly in that value types place great importance on their state rather than their identity.
Value semantics
When we say that value types place great importance on their state rather than their identity, what we really mean is that we determine if two values are equal according to the value they represent. This differs from entity types where they compare equal if they represent the same instance in memory. This latter behaviour is often called reference semantics, where we can have more than one reference to an object instance.
Struct instances are all independent of each other. They have true value semantics in that we can’t generally have two variables representing the same instance. Structs are copied by value, so when we assign one to another we get a whole new distinct instance. However, since two values compare equal if they have the same state, it makes no difference that they’re distinct instances.
Records live in a middle ground. Under the covers, records are really classes and so instances are copied by reference. When we assign one record variable to another, we get a new reference to the same instance in memory, unless we explicitly ask for a copy using the with syntax.
However, records are value-like in that when we compare them for equality, it’s their state that’s compared, not their identity.
This behaviour of comparing values instead of identities is much the same as for the string type. string is a class, and so is a reference type. Strings are copied by reference, so the contents of a string variable aren’t usually copied. However, strings have value-like behaviour for the purposes of equality comparisons. The string class overrides the Equals method, and implements the IEquatable<string> interface, which defines a type-specific overload for the Equals method.
String also has an operator== definition, which replaces the default reference comparison that == performs for classes. When we compare two string variables using either the Equals method or using ==, we’re determining if the two strings have the same value, whether or not they refer to the same string instance.
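We can see this for ourselves with a small sketch. The class and method names are invented for the example; the second string is deliberately built at run time from a character array so that it is a separate object from the literal.

```csharp
public static class StringEqualityDemo
{
    public static (bool SameValue, bool SameInstance) Compare()
    {
        string first = "record";

        // Constructed from characters at run time, so this is a
        // distinct string object with the same contents.
        string second = new string(new[] { 'r', 'e', 'c', 'o', 'r', 'd' });

        bool sameValue = first == second && first.Equals(second);
        bool sameInstance = object.ReferenceEquals(first, second);
        return (sameValue, sameInstance);
    }
}
```

The two strings have the same value but are not the same instance, which is exactly the combination we saw for two Point records created separately with the same property values.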
Equal by value
We can emulate value-based comparison in our own classes by overriding the Equals(object?) method, implementing IEquatable<T> for our type, and by providing both operator== and operator!=. There are some subtleties and potential pitfalls to be aware of in those implementations, including the need to handle null values correctly, and making sure we correctly handle any possible base class implementations. If we were to implement our Point type as a class, it might look something like Listing 1.
public class Point : IEquatable<Point>
{
    public Point(int x, int y) => (X, Y) = (x, y);

    public int X { get; }
    public int Y { get; }

    public bool Equals(Point? other)
        => !ReferenceEquals(other, null) &&
           GetType() == other.GetType() &&
           X == other.X && Y == other.Y;

    public override bool Equals(object? obj)
        => Equals(obj as Point);

    public override int GetHashCode()
        => HashCode.Combine(X, Y);

    public static bool operator==(Point? left, Point? right)
        => left?.Equals(right) ?? ReferenceEquals(right, null);

    public static bool operator!=(Point? left, Point? right)
        => !(left == right);
}
Listing 1
Note that if we override the Equals(object?) method, we also need to override GetHashCode. If we only override one or the other, we’ll get a warning from the compiler. The reason it’s important is that two objects that compare equal should also have equal hash codes. If we fail to observe this rule, we risk being unable to find objects that are used as keys in collections that depend on hash codes for lookup, such as Dictionary and HashSet.
The overridden GetHashCode method in the class shown above might not be the most efficient implementation, but it does guarantee that if two instances of Point are equal, they will also definitely have the same hash code.
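The following sketch shows the rule paying off in a hash-based lookup. It uses a cut-down, sealed class named GridRef, invented for the example so that it stays self-contained; it overrides Equals and GetHashCode in the same style as Listing 1.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical sealed class with consistent Equals and GetHashCode.
public sealed class GridRef : IEquatable<GridRef>
{
    public GridRef(int x, int y) => (X, Y) = (x, y);

    public int X { get; }
    public int Y { get; }

    public bool Equals(GridRef? other)
        => other is not null && X == other.X && Y == other.Y;

    public override bool Equals(object? obj) => Equals(obj as GridRef);

    // Built from the same properties that Equals compares, so equal
    // instances always produce equal hash codes.
    public override int GetHashCode() => HashCode.Combine(X, Y);
}

public static class HashLookupDemo
{
    public static bool FindsEqualKey()
    {
        var names = new Dictionary<GridRef, string>
        {
            [new GridRef(2, 3)] = "home"
        };

        // A different instance with the same value as the stored key.
        var probe = new GridRef(2, 3);

        return probe.GetHashCode() == new GridRef(2, 3).GetHashCode()
            && names.ContainsKey(probe);
    }
}
```

The dictionary finds the entry even though the probe is a different object from the key that was stored. Without the GetHashCode override, the lookup would most likely look in the wrong bucket and miss it.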
With record types, the compiler provides the implementations for each of those members. The code generated by the compiler takes all of the fields declared in the record into account to provide a value-based equality comparison. When we create our own record types, we’re freed from the need to provide all of this boilerplate code just to be able to compare the values of two variables.
What about structs?
Instead of using a class, we can also model our own types using a struct. All structs derive implicitly from the System.ValueType class, which provides the necessary overrides to give structs value semantics when we compare them for equality. In addition to the Equals method, ValueType also overrides GetHashCode in a way that ensures that equal instances have matching hash codes.
We might therefore choose to model our Point type as a struct like Listing 2.
public struct Point
{
    public Point(int x, int y) => (X, Y) = (x, y);

    public int X { get; }
    public int Y { get; }
}
Listing 2
This is significantly simpler than our class definition for Point, and only a little more verbose than the record version. There are limitations to structs, however.
The first thing to note is that we can’t compare two struct instances with == unless we provide our own implementation of operator==. The implementation of that is straightforward enough, however, and with a matching operator!= it looks very much like the version for the class implementation (Listing 3).
public static bool operator==(Point? left, Point? right)
    => left?.Equals(right) ?? !right.HasValue;

public static bool operator!=(Point? left, Point? right)
    => !(left == right);
Listing 3
Struct instances can’t normally be null, but our implementations of == and != here also cater for nullable Point values.
Much more significant are the implementations of the Equals and GetHashCode methods provided by the ValueType base class. Those implementations must cater for every possible struct type, and must therefore be very general. Structs can contain any number of fields, and there is no restriction on the types of those fields. How, then, can the base class implementation work correctly in all cases?
ValueType implementations
For GetHashCode, the answer is straightforward. The hash code for a value is calculated from the first non-null field in the struct. If there are no non-null fields, the hash code is 0. This has the correct behaviour in that any two equal values will always have the same hash code. It’s not necessarily the most efficient implementation, because two values can differ in all their other fields, but will have the same hash code if just the first fields are equal. This might slow down lookups requiring hash codes when we have large numbers of values to be compared.
The Equals method needs to be a bit more sophisticated, because comparing only the first field will not be correct in all cases. To determine if two values are equal, all the fields must be compared. In order for this to work for any value type, the implementation of ValueType.Equals uses reflection to discover the fields, and compares the two values by calling Equals on each field. See [Tepliakov] for more information on how Equals and GetHashCode are implemented.
Reflection is a wonderfully powerful tool used in a variety of circumstances, but one thing it most certainly is not is fast. Fortunately, there are optimizations that remove both the need for reflection and the restriction of calculating hash codes from only the first field. In fact, our Point struct would most likely benefit from this optimization because it has two int fields.
Where a struct has only built-in integral type fields, the Equals method can perform a simple bit-wise comparison of two values, and GetHashCode uses bit-masks and bit-shifting on the raw memory representation to calculate a hash code very quickly.
The optimization gets disabled in a wide variety of relatively common cases, however. If a struct contains any field that’s a reference or a floating-point value, or a field whose type itself overrides either the Equals or GetHashCode method, the slower algorithm must be used.
For the incorrigibly curious, the reference implementation of ValueType.Equals can be found in [Equals]. The key optimization is the call to CanCompareBits, and for the gory details (in C++), see [DotNetCoreRuntime].
The bottom line here really is that we need to override both Equals and GetHashCode for struct types if we need to be sure about the performance of the implementation. These methods are generated for record types by the compiler. There is no base-class implementation that needs to cater for every possible combination of fields. The code is injected directly into a record, almost exactly as if we’d hand-written it ourselves.
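As a sketch of what overriding both methods might look like for our Point struct, the version below implements IEquatable<Point> so that comparisons never go through the general ValueType code. This is one reasonable shape for it rather than the only one, and the StructEqualityDemo helper is invented for the example.

```csharp
#nullable enable
using System;

public readonly struct Point : IEquatable<Point>
{
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }

    // Type-specific comparison: no boxing and no reflection.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    public override bool Equals(object? obj)
        => obj is Point other && Equals(other);

    // Uses every field that Equals compares.
    public override int GetHashCode() => HashCode.Combine(X, Y);

    public static bool operator ==(Point left, Point right) => left.Equals(right);
    public static bool operator !=(Point left, Point right) => !left.Equals(right);
}

// Hypothetical helper used only to exercise the struct.
public static class StructEqualityDemo
{
    public static bool EqualValuesMatch()
    {
        var a = new Point(2, 3);
        var b = new Point(2, 3);
        return a == b && a.Equals((object)b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

Because a struct is implicitly sealed, there is no need for the run-time type check we used in the class version of Point.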
All structs are implicitly sealed, which means implementing equality for a struct is relatively straightforward. Records can inherit from other records, and this makes implementing equality more complicated. To see exactly why that is, let’s look at a naïve implementation for a derived class.
Equality and inheritance
Earlier we saw a class called Point that had an override of the Equals method taking an object parameter, and a type-specific overload of Equals. Here is the Point class again, along with a Point3d class that inherits from it (see Listing 4).
public class Point : IEquatable<Point>
{
    public Point(int x, int y) => (X, Y) = (x, y);

    public int X { get; }
    public int Y { get; }

    public bool Equals(Point? other)
        => !ReferenceEquals(other, null) &&
           GetType() == other.GetType() &&
           X == other.X && Y == other.Y;

    public override bool Equals(object? obj)
        => Equals(obj as Point);

    // ...
}

public class Point3d : Point, IEquatable<Point3d>
{
    public Point3d(int x, int y, int z)
        : base(x, y) => Z = z;

    public int Z { get; }

    public bool Equals(Point3d? other)
        => !ReferenceEquals(other, null) &&
           Z == other.Z &&
           base.Equals(other);

    public override bool Equals(object? obj)
        => Equals(obj as Point3d);

    // ...
}
Listing 4
The implementation of the IEquatable<T> interface in each of these classes, that is, the Equals method taking a Point or Point3d rather than object, follows Microsoft’s advice on correctly defining equality for a class as shown in [MSDN2015]. For brevity, they’re not exactly the same, but they are equivalent to those shown online.
The key points here are that the derived class determines that the properties specific to it are equal, and if they are, it defers to the base class to perform its own comparison. The base class checks that both values being compared are exactly the same type before also comparing its individual properties. The type check is required to catch the following comparison:
var point = new Point(2, 3);
var point3d = new Point3d(2, 3, 4);

Assert.That(point.Equals(point3d), Is.False);
Here we’re comparing a Point variable with an instance of Point3d. The Equals method actually being used here is the one defined on the Point base class. The point3d variable will be implicitly converted to a Point reference. The comparison fails the type check in Point.Equals because the run-time types of the two objects being compared aren’t exactly the same.
Even though the X and Y properties match in both objects, the two objects don’t have the same value. A Point3d instance has an extra property named Z that will not be considered by the base class Equals method.
We wouldn’t usually directly assign a derived type to a base class reference like this. It would more usually occur when we call a method taking parameters of the base class type.
Base class comparisons
Standing in for a real method taking parameters of Point type in this example is a simple method named AreEqual (Listing 5).
bool AreEqual(Point left, Point right)
{
    return left.Equals(right);
}

var p1 = new Point3d(2, 3, 1);
var p2 = new Point3d(2, 3, 500);

Assert.That(p1.Equals(p2), Is.False);
Assert.That(AreEqual(p1, p2), Is.False);
Listing 5
In this example, we create two Point3d instances that differ in their Z property. We confirm they do indeed compare not equal when we call the Equals method. On the last line we call the AreEqual method, which takes two parameters of the base class type.
This test fails because the call to AreEqual actually returns true. This time, both objects are exactly the same type, and neither one is null. More than that, their X and Y properties both match. However, the comparison of Z properties never happens when the objects are compared using their base class type.
If we change the AreEqual method to take object parameters instead of Point, the test will pass, because object.Equals is a virtual method call. However, in keeping with the advice given on MSDN, the type-specific overload of Equals is not virtual. When we use a Point variable to call the Equals method, the Point implementation will be called, irrespective of whether the variable actually refers to a more derived type.
We can resolve this problem by making Point.Equals virtual, and adding an override for it to the Point3d class. There are some subtleties to doing this, however, and it’s very easy to get wrong.
Records, as we noted earlier, can inherit from other records as long as the base record isn’t sealed. Moreover, records behave correctly with inheritance and don’t exhibit the problems demonstrated here. The key is in how equality is implemented for records.
Compiler-generated Equals
The code generated by the compiler to implement equality diverges from that recommended in [MSDN2015] – quite rightly, since that implementation isn’t sufficient, as we’ve demonstrated. Let’s begin with the base type Point. Again, for the sake of brevity, the code in Listing 6 isn’t exactly the same as that created by the compiler, but it’s equivalent.
public class Point : IEquatable<Point>
{
    public Point(int x, int y) => (X, Y) = (x, y);

    public int X { get; }
    public int Y { get; }

    protected virtual Type EqualityContract
        => typeof(Point);

    public virtual bool Equals(Point? other)
        => !ReferenceEquals(other, null) &&
           EqualityContract == other.EqualityContract &&
           X == other.X && Y == other.Y;

    public override bool Equals(object? obj)
        => Equals(obj as Point);

    // ...
}
Listing 6
There are two things of note here. The first is the synthesized EqualityContract property. This is used in the Equals method to confirm that both the invoking object and the argument are exactly the same type. It replaces the call to object.GetType for this purpose.
The GetType method is available to any type, but it’s a non-virtual method that involves a native system call. The EqualityContract property is virtual, but makes use of the typeof operator, whose operand is resolved at compile time. The result of both GetType and EqualityContract under these circumstances is identical, but EqualityContract uses information available to the compiler, whereas GetType calculates the required Type to return at run time.
The second thing to note is that the type-specific implementation of the Equals method is itself virtual. The importance of this becomes apparent when we look at the equivalent code in the derived Point3d class.
Inheriting Equals
Listing 7 is the equivalent code for Point3d that derives from the Point type.
public class Point3d : Point, IEquatable<Point3d>
{
    public Point3d(int x, int y, int z)
        : base(x, y) => Z = z;

    public int Z { get; }

    protected override Type EqualityContract
        => typeof(Point3d);

    public sealed override bool Equals(Point? other)
        => Equals((object?)other);

    public virtual bool Equals(Point3d? other)
        => base.Equals(other as Point) && Z == other.Z;

    public override bool Equals(object? obj)
        => Equals(obj as Point3d);

    // ...
}
Listing 7
Not only does Point3d provide its own type-specific implementation for the IEquatable<T> interface, it also overrides the base class’s type-specific Equals. The override invokes the Equals method taking object? as its argument. This in turn resolves to the Point3d.Equals(object?) method, which attempts to cast its parameter to a Point3d.
We should also note that the override of Equals(Point?) is sealed in the Point3d class. This means that if we were to inherit from Point3d – for the sake of the argument let’s call the new type Point4d – that more derived type cannot override that method. Sealing a method has the effect of preventing a derived type from further customising its implementation, but the method is still available for more derived types to call. Our potential Point4d type could still override the Equals(Point3d?) method, however.
Testing Equals for records
There are other minor differences between our Point.Equals implementation and that shown previously, but the main point is that if we were to model our Point and Point3d types as classes, there is quite a lot of boilerplate we need to provide in order for equality to work correctly.
Using records to model these types saves a great deal of code that would otherwise have to not only be written, but tested too. We previously saw a test for equality for our original class implementation of Point and Point3d that failed. Here it is once more:
bool AreEqual(Point left, Point right)
{
    return left.Equals(right);
}

var p1 = new Point3d(2, 3, 1);
var p2 = new Point3d(2, 3, 500);

Assert.That(p1.Equals(p2), Is.False);
Assert.That(AreEqual(p1, p2), Is.False);
Where Point3d is a record that inherits from a Point record, this test now passes. There is more than just equality to consider when we inherit from a value type, however.
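For completeness, here is a self-contained sketch of the record version, with the AreEqual method wrapped in a hypothetical static class named RecordEqualityDemo so that it can be compiled on its own.

```csharp
public record Point(int X, int Y);
public record Point3d(int X, int Y, int Z) : Point(X, Y);

public static class RecordEqualityDemo
{
    // Both parameters are declared as the base record type. The
    // compiler-generated Equals(Point?) is virtual, so the Point3d
    // override is the one that runs for Point3d arguments.
    public static bool AreEqual(Point left, Point right)
        => left.Equals(right);
}
```

With these definitions, two Point3d values that differ only in Z compare not equal through AreEqual, and a Point never compares equal to a Point3d, whichever of the two is on the left.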
Style over substitutability
Although the compiler generates code to correctly perform an equality comparison for records that inherit from one another, it can’t generate code for any of the other operations we might need to implement. For example, if we wanted to implement the IComparable interface for our Point and Point3d types, we’d have to implement it ourselves.
Would it make sense for us to compare a Point3d instance to determine if it was less than an instance of a Point? What about the other way around? What compromises might we have to make?
Inheritance and virtual methods work well for entity types where we want to customize or embellish the behaviour of a base class. We also get the benefit of substitutability between the base type and derived type. An instance of a derived type can be used anywhere a base type reference is needed. This allows us to write code in terms of a base type that can be used seamlessly by objects that inherit from that base type.
Entities are the higher-order objects in our designs. They usually represent the persistent information about a system, and the processing of that information in collaboration with other entities. Identity is often important for entities, because we often need to use a specific instance. By contrast, values place no importance on identity. One value is as good as any other value with the same state.
The benefits of inheritance are much less clear for values, which is the reason that structs don’t – indeed cannot – take part in inheritance relationships. It’s also the reason that value-like classes such as string are sealed. Substitutability doesn’t work so well for values; it’s not fair to say that a Point3d is substitutable for a Point because they have different values, and the value is what really matters for a value type.
Has-A versus Is-A
Inheritance is commonly employed to re-use the characteristics of a type and build on it. When we derive a type from a non-abstract base, such as when inheriting Point3d from Point, we’re really inheriting the implementation. Substitutability between types works best when the implementation doesn’t matter. What we really want is to represent the same interface.
More formally the distinction is between class inheritance and type inheritance. By deriving a Point3d from a Point we’re using class inheritance. In order to make it work correctly, we must alter the interface.
However, a much simpler solution would be to discard the inheritance altogether, and simply have Point3d contain an instance of a Point. We get all the benefits of re-using the implementation of Point, but have none of the difficulties of substitutability. Furthermore, we’d make both classes sealed and the implementations of both would be more straightforward. Perhaps even better, we could make them structs instead of classes.
Consider the struct in Listing 8.
public readonly struct Point3d : IEquatable<Point3d>
{
    public Point3d(int x, int y, int z)
        => (xy, this.z) = (new Point(x, y), z);

    public int X => xy.X;
    public int Y => xy.Y;
    public int Z => z;

    public bool Equals(Point3d other)
        => xy.Equals(other.xy) && z == other.z;

    public override bool Equals(object? obj)
        => obj is Point3d other && Equals(other);

    public override int GetHashCode()
        => HashCode.Combine(xy, z);

    public static bool operator==(Point3d? left, Point3d? right)
        => left?.Equals(right) ?? !right.HasValue;

    public static bool operator!=(Point3d? left, Point3d? right)
        => !(left == right);

    private readonly Point xy;
    private readonly int z;
}
Listing 8
Here we have a Point3d type modelled as a struct that contains an instance of a Point as a field. We have no need to consider the case where a base class parameter might really be a Point3d because that’s not possible. The only overridden methods are those necessary to provide the basic equality and hash code calculations from the object base class.
We can’t use an instance of Point3d anywhere that a Point is needed. We might provide an explicit conversion – or projection – to a Point that could be used to invoke a method expecting Point variables. In all other respects, the behaviour of this struct matches all the expected behaviour from a Point3d that inherits from a Point.
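One way such a projection might look is sketched below, using cut-down versions of the two structs with the equality members omitted for brevity. The explicit conversion operator is the point of interest. The body of LineDistance is an assumption (a two-dimensional Euclidean distance, which is consistent with the earlier test), as is the ProjectionDemo class that hosts it.

```csharp
using System;

public readonly struct Point
{
    public Point(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }
}

public readonly struct Point3d
{
    public Point3d(int x, int y, int z)
    {
        xy = new Point(x, y);
        this.z = z;
    }

    public int X => xy.X;
    public int Y => xy.Y;
    public int Z => z;

    // Explicit projection: callers must ask for the Point deliberately.
    public static explicit operator Point(Point3d value) => value.xy;

    private readonly Point xy;
    private readonly int z;
}

// Hypothetical host for a method that expects Point arguments.
public static class ProjectionDemo
{
    // Assumed implementation: straight-line distance in the X-Y plane.
    public static double LineDistance(Point a, Point b)
    {
        double dx = b.X - a.X;
        double dy = b.Y - a.Y;
        return Math.Sqrt(dx * dx + dy * dy);
    }

    public static double Distance2d(Point3d a, Point3d b)
        => LineDistance((Point)a, (Point)b);
}
```

Making the conversion explicit rather than implicit keeps the loss of the Z component visible at the call site, which seems a reasonable trade for a type whose whole purpose is its value.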
The one possible objection to this is that structs are copied and passed by value, whereas records are copied and passed by reference. Since a Point3d contains an instance of another struct, we might expect its performance to suffer as a result of needing to copy the whole instance rather than just the reference.
As with all such questions, we must invoke the wisdom, or at least the objectivity, of a performance profiler.
Performance of structs and records
Our Point3d struct doesn’t do much other than being a value. Similarly, the most important aspect of the record equivalent is its value. Therefore the most obvious thing to compare between the two is how equality is implemented. Just as important as the Equals implementation is the GetHashCode method. We should, then, measure the performance characteristics of both methods.
One simple way to do that is to employ a HashSet, which will use GetHashCode to determine where to look for a key, and then use Equals to determine an exact match. A hash set is a unique collection of keys, so a useful test would be to attempt to introduce duplicate keys so that we can be sure a full lookup of a value takes place.
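To make that lookup sequence visible, here is a minimal sketch using a hypothetical CountingKey struct (not part of the article's code) that counts how often its equality members are invoked. It assumes the default equality comparer, which is what HashSet uses when none is supplied.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical key type that records calls to its equality members.
public readonly struct CountingKey : IEquatable<CountingKey>
{
    public static int HashCalls;
    public static int EqualsCalls;

    public CountingKey(int value) => Value = value;
    public int Value { get; }

    public bool Equals(CountingKey other)
    {
        EqualsCalls++;
        return Value == other.Value;
    }

    public override bool Equals(object? obj) => obj is CountingKey other && Equals(other);

    public override int GetHashCode()
    {
        HashCalls++;
        return Value;
    }
}

public static class HashSetDemo
{
    // Adds the same value twice and reports whether the second add was accepted.
    public static bool AddDuplicate()
    {
        CountingKey.HashCalls = 0;
        CountingKey.EqualsCalls = 0;

        var set = new HashSet<CountingKey>();
        set.Add(new CountingKey(42));        // hashed to find a slot; nothing to compare with yet
        return set.Add(new CountingKey(42)); // hashed again, then compared with the stored key
    }
}
```

AddDuplicate returns false because the second key is rejected, and afterwards both counters are non-zero: the duplicate could only be detected by hashing it and then calling Equals against the key already stored.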
The following simple test creates a list of Point3d objects, and we deliberately introduce duplicate values. We use the source list to populate a HashSet using the ToHashSet method, which simply discards any values that have already been added to the collection. (See Listing 9.)
const int N = 50000000;
const int Filter = 10000;

var source = Enumerable.Range(0, N)
    .Select(i => new Point3d(0, 0, i % Filter))
    .ToList();

var unique = source.ToHashSet();

Assert.That(unique.Count, Is.EqualTo(Filter));
Assert.That(unique.Contains(new Point3d(0, 0, Filter - 1)), Is.True);
Listing 9
The number of elements is intentionally very large in order to scale up the relative cost of each method call and make the differences observable. All the following results were obtained by profiling a test using the dotTrace profiler from JetBrains (https://www.jetbrains.com/profiler/) using a straightforward wall-clock time report. In each case, the test was profiled using a Release build.
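Where a profiler is not to hand, a much coarser wall-clock figure can be had from System.Diagnostics.Stopwatch. The sketch below is not how the article's results were produced: it gives a single headline time with none of the per-method breakdown shown in the figures. Timing.Measure and Timing.UniqueCount are hypothetical helpers, and value tuples stand in for Point3d to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    // Same shape as Listing 9, but parameterized and using tuples instead of Point3d.
    public static int UniqueCount(int n, int filter) =>
        Enumerable.Range(0, n)
            .Select(i => (0, 0, i % filter))
            .ToList()
            .ToHashSet()
            .Count;
}
```

A call such as Timing.Measure(() => Timing.UniqueCount(50000000, 10000)) gives the overall elapsed time. As with the profiled runs, it only means much for a Release build, and a single run includes JIT compilation time.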
Profile results
Figure 1 contains the results from running this test using our Point3d record, which inherits from a Point record. The same test was profiled using our Point3d struct, which contains an instance of a Point struct; those results are also in Figure 1.
Records
► 5.52% Hashset_of_records • 5,622 ms • TestRecords.Hashset_of_records()
    ► 3.00% ToList • 3,055 ms • System.Linq.Enumerable.ToList(IEnumerable)
      2.51% ToHashSet • 2,554 ms • System.Linq.Enumerable.ToHashSet(IEnumerable)
        ► 0.47% Equals • 478 ms • Point3d.Equals(Point3d)
        ► 0.35% GetHashCode • 357 ms • Point3d.GetHashCode()
Structs
► 2.95% Hashset_of_structs • 3,002 ms • TestStructs.Hashset_of_structs()
      2.28% ToHashSet • 2,325 ms • System.Linq.Enumerable.ToHashSet(IEnumerable)
        ► 0.09% GetHashCode • 94 ms • Point.GetHashCode()
    ► 0.66% ToList • 677 ms • System.Linq.Enumerable.ToList(IEnumerable)
Figure 1
The headline time shows that the test using structs took a little over half the time of the test using records (3,002 ms against 5,622 ms). Note that the ToHashSet call is somewhat slower for records, but calls to Equals and GetHashCode are much slower than for structs. In fact, the cost of Equals for the struct type doesn’t even register, which means the JIT compiler probably inlined the code.
The Equals method for records is relatively expensive owing to the number of virtual calls it makes, in this case to the EqualityContract property.
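To see where those virtual calls come from, the following is a hand-written approximation of the equality members the compiler synthesizes for a record and a record derived from it. It is a sketch, not the exact generated code: the class names PointLike and Point3dLike are invented for illustration, and the real hash code calculation differs in detail.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Approximates what the compiler generates for: public record Point(int X, int Y);
public class PointLike : IEquatable<PointLike>
{
    public PointLike(int x, int y) => (X, Y) = (x, y);
    public int X { get; init; }
    public int Y { get; init; }

    // Virtual property identifying the run-time type that takes part in equality.
    protected virtual Type EqualityContract => typeof(PointLike);

    // Two virtual EqualityContract calls are made before any field is compared.
    public virtual bool Equals(PointLike? other) =>
        other is not null
        && EqualityContract == other.EqualityContract
        && EqualityComparer<int>.Default.Equals(X, other.X)
        && EqualityComparer<int>.Default.Equals(Y, other.Y);

    public override bool Equals(object? obj) => Equals(obj as PointLike);
    public override int GetHashCode() => HashCode.Combine(EqualityContract, X, Y);
}

// Approximates: public record Point3d(int X, int Y, int Z) : Point(X, Y);
public class Point3dLike : PointLike, IEquatable<Point3dLike>
{
    public Point3dLike(int x, int y, int z) : base(x, y) => Z = z;
    public int Z { get; init; }

    protected override Type EqualityContract => typeof(Point3dLike);

    // The base comparison runs first, then the field added by this type.
    public virtual bool Equals(Point3dLike? other) =>
        base.Equals(other) && EqualityComparer<int>.Default.Equals(Z, other!.Z);

    // A comparison made through the base-class signature still reaches the derived check.
    public sealed override bool Equals(PointLike? other) => Equals(other as Point3dLike);
    public override bool Equals(object? obj) => Equals(obj as Point3dLike);
    public override int GetHashCode() => HashCode.Combine(base.GetHashCode(), Z);
}
```

The EqualityContract comparison is what stops a PointLike from ever comparing equal to a Point3dLike that happens to share its X and Y values, in either direction. The struct in Listing 8 needs none of this machinery because nothing can inherit from it.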
The remainder of the time difference between the struct and record versions is most likely down to the fact that the struct instances are copied by value, but for the records, only the references are copied from the source to the hash set. The difference of about 200 ms is negligible, considering the huge number of elements we were using.
However, copying by value versus copying by reference has another, less obvious implication, which goes some way towards explaining the significant difference in the cost of the call to ToList.
The impact of the managed heap
Records are reference types, allocated on the heap, and are subject to garbage collection in the same way that class instances are. We deliberately introduced duplicate values in our source list, and when the ToHashSet method discards those duplicates they become unreachable, and so are eligible for garbage collection. Struct instances are never individually garbage collected; they simply go out of scope when they’re no longer needed.
Adding such a large number of elements to the list would certainly put some pressure on memory, and very likely use up enough space to cause several garbage collections. We can see this by digging into the ToList call (see Figure 2).
3.00% ToList • 3,055 ms • System.Linq.Enumerable.ToList(IEnumerable)
    2.76% <Hashset_of_records>b__13_0 • 2,809 ms • <Hashset_of_records>b__13_0(Int32)
        1.60% [Garbage collection] • 1,633 ms
        <0.01% [Thread suspended] • 5.8 ms
      ► <0.01% Point3d..ctor • 5.7 ms • Point3d..ctor(Int32, Int32, Int32)
Figure 2
The cost of the garbage collection here isn’t that of objects actually being collected; it’s most likely the cost of tracing references to each object to determine if they can be collected.
In fact, since we’re putting so much pressure on memory here, it’s likely that even the discarded objects stay in memory for much longer than necessary because they’ll survive successive garbage collections caused by the huge number of memory requests being made.
All of which demonstrates that while copying objects by reference might be cheaper than copying by value, the associated cost of inhabiting the managed heap can offset that benefit and even overwhelm it.
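The difference in heap usage can be observed without a profiler. The sketch below uses GC.GetAllocatedBytesForCurrentThread to compare filling an array with class instances against filling one with struct values. HeapPoint, InlinePoint and AllocationDemo are hypothetical names, and a plain class stands in for a record on the assumption that the two allocate in the same way.

```csharp
using System;

public sealed class HeapPoint { public int X, Y, Z; }  // hypothetical reference type
public struct InlinePoint { public int X, Y, Z; }      // hypothetical value type

public static class AllocationDemo
{
    // Bytes allocated on the managed heap by the current thread while the action runs.
    public static long AllocatedBy(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static long ForClasses(int n) => AllocatedBy(() =>
    {
        var items = new HeapPoint[n];                // one array of references...
        for (int i = 0; i < n; i++)
            items[i] = new HeapPoint { Z = i };      // ...plus n separately allocated objects
        GC.KeepAlive(items);
    });

    public static long ForStructs(int n) => AllocatedBy(() =>
    {
        var items = new InlinePoint[n];              // one array holding every value inline
        for (int i = 0; i < n; i++)
            items[i] = new InlinePoint { Z = i };    // overwrites a slot; no new heap object
        GC.KeepAlive(items);
    });
}
```

For the same n, ForClasses reports noticeably more allocated bytes than ForStructs, because each class instance carries its own object header and needs a reference in the array as well. Every one of those objects is also something the garbage collector may have to trace.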
Summary
The new record types in C# v9.0 provide us with a very compact way of defining value-like types without the need to manually write all the boilerplate code to perform equality correctly. The syntax we’ve explored in this article relates to positional records, which is the most compact representation that allows the compiler the greatest flexibility to generate code on our behalf.
We can choose to write our own version of almost any of the methods generated by the compiler if we wish. The exceptions to this are that we can’t provide our own operator== or operator!=. If we want to customize the behaviour of equality, we need to write our own type-specific Equals method for the type. The compiler-generated operator== just forwards to the Equals method anyway.
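As an illustration, here is a sketch of a record that supplies its own type-specific Equals. The Username record is hypothetical; it treats values that differ only by letter case as equal, and provides a matching GetHashCode so that equal values hash identically.

```csharp
#nullable enable
using System;

// Hypothetical record whose equality ignores letter case.
public sealed record Username(string Value)
{
    // Our type-specific Equals is used in place of the one the compiler would synthesize.
    public bool Equals(Username? other) =>
        other is not null
        && string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    // Customized to agree with Equals: equal values must produce equal hash codes.
    public override int GetHashCode() =>
        StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
}
```

Because the compiler-generated operator== forwards to Equals, new Username("Steve") == new Username("STEVE") picks up the customized comparison without any operator being written by hand. The record is sealed here, which is why Equals does not need to be virtual.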
Any method we write ourselves prevents the compiler from synthesizing its own version; it simply uses the version we provide.
However, since the compiler provides efficient and correct implementations for each of those methods, there seems to be little benefit in writing our own. If we feel the need to have more control over equality, we may as well just use a struct. Where we just need a simple representation of a value, records work very well and the associated facility of non-destructive mutation with the with keyword is a very useful way of handling those values.
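A brief reminder of what non-destructive mutation looks like, using the Point record from the start of the article. WithDemo.MoveToY is a hypothetical helper added only to give the expression somewhere to live.

```csharp
public record Point(int X, int Y);

public static class WithDemo
{
    // Returns a copy of p with a different Y; the original instance is left unchanged.
    public static Point MoveToY(Point p, int y) => p with { Y = y };
}
```

The with expression copies the original and then applies the listed property changes to the copy, so any other code holding a reference to the original Point never sees it change.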
Just because we can inherit one record from another doesn’t mean that we should. Values in general make poor parents, and so records, like structs and other value-like types such as string, should be sealed to prevent further derivation.
We also need to understand that records really are classes under the hood; when we create a record type, the compiler injects a class definition for us. Records are therefore reference types, and so live on the managed heap. This means they are garbage collected, and we might therefore consider using a struct anyway if we’re very sensitive to performance.
An overview of records in C# v9.0, and more detail on what methods the compiler provides can be found at [MSDN2020].
References
[Equals] https://referencesource.microsoft.com/#mscorlib/system/valuetype.cs,22
[DotNetCoreRuntime] https://github.com/dotnet/runtime/blob/01116d4e145d17adefc1237d55b1e3574919b1c1/src/coreclr/vm/comutilnative.cpp#L1738
[MSDN2015] https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/statements-expressions-operators/how-to-define-value-equality-for-a-type
[MSDN2020] https://docs.microsoft.com/en-us/dotnet/csharp/whats-new/csharp-9#record-types
[Tepliakov] https://devblogs.microsoft.com/premier-developer/performance-implications-of-default-struct-equality-in-c/
Steve Love has been a professional programmer for over 20 years and is still finding new ways to be lazy.