In my recent blog posts I've been benchmarking different ways to cast in C#. If you have a keen eye, you may have noticed that the results do not match up. In one experiment, a direct cast (no cast at all) takes 15 ns, while in another experiment, a more complicated generic cast takes only 3.5 ns. That's a big discrepancy; what's going on?

If you look at the code, it's obvious. In one experiment, we're benchmarking the cost of the cast and nothing else. In the other, we're benchmarking the cost of the cast plus two method calls. Consequently, the results from different experiments are not comparable. It doesn't have to be this way.

The solution is to use what I'm calling an additive baseline and a multiplicative baseline.

The Problem

Let's use the CovariantCastingBenchmarks class from my variant casting experiment to demonstrate the problem. Here are the results from running that experiment using BenchmarkDotNet. Just to be different, I've run the benchmarks on .NET Core.

BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows 10.0.14393
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128910 Hz, Resolution=319.6001 ns, Timer=TSC
dotnet cli version=1.0.3
  [Host]     : .NET Core 4.6.25009.03, 64bit RyuJIT
  DefaultJob : .NET Core 4.6.25009.03, 64bit RyuJIT

                Method |       Mean |    StdDev | Scaled | Scaled-StdDev |
---------------------- |----------- |---------- |------- |-------------- |
                Direct | 16.2276 ns | 0.0310 ns |   1.00 |          0.00 |
 ExplicitCovariantCast | 80.6373 ns | 0.0351 ns |   4.97 |          0.01 |
  DynamicCovariantCast | 29.4733 ns | 0.1694 ns |   1.82 |          0.01 |

These results show that dynamic covariant casting is approximately 2.74x faster than explicit covariant casting (80.6373 / 29.4733). But is that correct? It's not.

The benchmark is measuring the performance of some function, but the operation we're trying to benchmark is just one component of that function. The test is fair because it has been designed so that every method has the same overhead. However, the analysis was wrong, and therefore the results are misleading. The results don't reflect the cost of the operation; instead, they reflect the cost of the operation plus overhead.

The Solution

We need to remove the overhead from our analysis.

The direct method (the baseline), which involves no casting, reflects the cost of the overhead alone. By subtracting the runtime of the direct method from the runtime of every method, we remove the cost of the overhead and find the cost of the operations themselves.
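For example, ExplicitCovariantCast normalizes to 80.6373 ns - 16.2276 ns = 64.4097 ns.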

                Method |       Mean |
---------------------- |----------- |
                Direct |     0.0 ns |
 ExplicitCovariantCast | 64.4097 ns |
  DynamicCovariantCast | 13.2457 ns |

From these results, we see that dynamic covariant casting is about 4.86x faster than explicit covariant casting (64.4097 / 13.2457), not 2.74x. This is the correct result.

Additive Baselines

What happened to the baseline? It turns out that Direct was really a no-op, because it was all overhead. I'm calling this an additive baseline.

When benchmarking, you should try to remove the cost of any overhead by creating an additive baseline that has the overhead and nothing else.

Note that in many benchmarking situations there is no overhead and therefore no need for an additive baseline. For example, if you're benchmarking a function that you're trying to optimize, there'll be no overhead.
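
To make this concrete, here's a minimal sketch of what an additive baseline could look like for the casting benchmarks. The typed field name is my invention, and I'm assuming the Cast helper and Simple class shown in the next section; the experiment's actual Direct method may differ.

using BenchmarkDotNet.Attributes;

public partial class CovariantCastingBenchmarks
{
    // A typed field, so that calling Cast requires no cast at all.
    static Simple typedSimple = new Simple();

    // Additive baseline: the field load and method call that every
    // benchmark pays, and nothing else. Its mean is pure overhead.
    [Benchmark(Baseline = true)]
    public void Direct() => Cast(typedSimple);
}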

Multiplicative Baselines

With the overhead removed using the additive baseline, we still need another baseline with which to compare our methods. This is what we normally think of as a baseline, what BenchmarkDotNet calls a baseline, and what I'm calling a multiplicative baseline to distinguish it from an additive baseline.

In our casting example, we're comparing the cost of different casting methods. So the obvious choice for the baseline is a simple explicit cast. Let's add one of those to the CovariantCastingBenchmarks:

using BenchmarkDotNet.Attributes;

public partial class CovariantCastingBenchmarks
{
    static object simple = new Simple();

    // Multiplicative baseline: the same overhead as every other benchmark,
    // plus a plain explicit cast.
    [Benchmark]
    public void NormalCast() => Cast((Simple)simple);

    static void Cast(Simple input) => input.ToString();

    class Simple { }
}

If we incorporate the results of NormalCast, normalize them by subtracting the additive baseline (Direct), and finally compute the scale factor against the multiplicative baseline (NormalCast), we get these results:

                Method |       Mean |    StdDev | Normalized | Scaled |
---------------------- |----------- |---------- |----------- |------- |
                Direct | 16.2276 ns | 0.0310 ns |     0.0 ns |   0.00 |
            NormalCast | 16.5152 ns | 0.0073 ns |  0.2876 ns |   1.00 |
 ExplicitCovariantCast | 80.6373 ns | 0.0351 ns | 64.4097 ns | 223.96 |
  DynamicCovariantCast | 29.4733 ns | 0.1694 ns | 13.2457 ns |  46.06 |
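
The two new columns are easy to compute by hand. Here's a minimal sketch (the class and method names are mine, not BenchmarkDotNet's):

static class BaselineMath
{
    // Normalized: strip out the overhead measured by the additive baseline.
    public static double Normalize(double mean, double additiveBaselineMean)
        => mean - additiveBaselineMean;

    // Scaled: compare against the normalized multiplicative baseline.
    public static double Scale(double normalized, double normalizedBaseline)
        => normalized / normalizedBaseline;
}

// For example, ExplicitCovariantCast:
//   Normalize(80.6373, 16.2276) = 64.4097 ns
//   Scale(64.4097, 0.2876)      ≈ 223.96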

BenchmarkDotNet

Unless I've overlooked it, BenchmarkDotNet doesn't have an in-built option to create the Normalized and Scaled columns shown in the last set of results.

However, BenchmarkDotNet supports custom columns. So, by adding an attribute to identify the additive baseline, you could create the Normalized and Scaled columns with an implementation of IColumn.
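
As a sketch, the attribute could be a bare marker (a hypothetical design of mine, not an existing BenchmarkDotNet API):

using System;

// Hypothetical marker attribute: identifies the benchmark whose mean is
// pure overhead. A custom IColumn would locate the [AdditiveBaseline]
// method, subtract its mean from every benchmark's mean to produce
// Normalized, and divide by the normalized multiplicative baseline to
// produce Scaled.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public sealed class AdditiveBaselineAttribute : Attribute { }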

If you like this idea and want to contribute to BenchmarkDotNet, I've created the issue for you.