Previously, we saw that covariant and contravariant casting is slow: 100x slower than normal casting. It turns out to be slower even than casting to dynamic and using dynamic dispatch: about 3x slower. This is significant because widely used interfaces such as IEnumerable<T> are covariant.

A number of readers found the results in my last post curious and decided to dig deeper. Kévin Gosse made the surprising discovery that using dynamic was "three times faster than the explicit cast". In this post, I verify Kévin's results and discuss when you might be able to use dynamic to optimize performance.

TL;DR

The first dynamic call is about 1200x slower than the first explicit cast. However, on subsequent calls, covariant and contravariant casting is more than 3x slower than casting to dynamic and using dynamic dispatch.

Benchmark Code

To verify the results, I created two classes based on Kévin's code, one to test covariant casting, the other to test contravariant casting.

Each class benchmarks the cost of four operations:

  1. Direct Casting
  2. Implicit Casting
  3. Explicit Casting
  4. Dynamic Casting

Direct casting and implicit casting do not involve any casting at run time, as the types are already compatible at compile time. This can be verified by checking the IL. Explicit casting involves casting to a covariant or contravariant type, depending on which is being tested. Dynamic casting involves casting to dynamic and then using dynamic dispatch to call the method.

Here's the code:

using BenchmarkDotNet.Attributes;

public class CovariantCastingBenchmarks
{
    static ICovariant<string> specificCovariant = new Covariant<string>();
    static ICovariant<object> generalCovariant = specificCovariant;

    [Benchmark(Baseline = true)]
    public void Direct() => SpecificCovariant(specificCovariant);
    
    [Benchmark]
    public void Implicit() => GeneralCovariant(specificCovariant);

    [Benchmark]
    public void Explicit() => SpecificCovariant((ICovariant<string>)generalCovariant);

    [Benchmark]
    public void Dynamic() => SpecificCovariant((dynamic)generalCovariant);

    interface ICovariant<out T> { }
    class Covariant<T> : ICovariant<T> { }
    static void SpecificCovariant(ICovariant<string> input) => input.ToString();
    static void GeneralCovariant(ICovariant<object> input) => input.ToString();
}

public class ContravariantCastingBenchmarks
{
    static IContravariant<object> generalContravariant = new Contravariant<object>();
    static IContravariant<string> specificContravariant = generalContravariant;

    [Benchmark(Baseline = true)]
    public void Direct() => GeneralContravariant(generalContravariant);

    [Benchmark]
    public void Implicit() => SpecificContravariant(generalContravariant);

    [Benchmark]
    public void Explicit() => GeneralContravariant((IContravariant<object>)specificContravariant);

    [Benchmark]
    public void Dynamic() => GeneralContravariant((dynamic)specificContravariant);

    interface IContravariant<in T> { }
    class Contravariant<T> : IContravariant<T> { }
    static void SpecificContravariant(IContravariant<string> input) => input.ToString();
    static void GeneralContravariant(IContravariant<object> input) => input.ToString();
}
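
The difference between the implicit and explicit cases can also be seen without reading IL. The following is a minimal sketch, not part of the benchmarks: the class and method names (VarianceConversionDemo, ImplicitKeepsReference and so on) are made up for illustration. It shows that a variance conversion never creates a new object, and that the explicit direction needs a run-time type check because it can fail.

```csharp
using System;

public static class VarianceConversionDemo
{
    // Hypothetical types for illustration; they mirror the benchmark's interface.
    public interface ICovariant<out T> { }
    public class Covariant<T> : ICovariant<T> { }

    // Implicit conversion: checked by the compiler, same object afterwards.
    public static bool ImplicitKeepsReference()
    {
        ICovariant<string> specific = new Covariant<string>();
        ICovariant<object> general = specific;
        return object.ReferenceEquals(specific, general);
    }

    // Explicit conversion: checked at run time, still the same object.
    public static bool ExplicitKeepsReference()
    {
        ICovariant<object> general = new Covariant<string>();
        var specific = (ICovariant<string>)general;
        return object.ReferenceEquals(specific, general);
    }

    // The explicit direction can fail, which is why a run-time check is needed.
    public static bool ExplicitCanFail()
    {
        ICovariant<object> general = new Covariant<object>();
        try
        {
            var specific = (ICovariant<string>)general;
            GC.KeepAlive(specific); // use the result so the cast is not dead code
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ImplicitKeepsReference());
        Console.WriteLine(ExplicitKeepsReference());
        Console.WriteLine(ExplicitCanFail());
    }
}
```

All three lines print True: the conversions only change the static type of the reference, and the cost measured by Explicit is the cost of the run-time check.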

Results

I ran the benchmarks on both 64-bit with RyuJIT and 32-bit with LegacyJIT. As the relative performance was very similar, I'm only showing the 64-bit with RyuJIT results:

BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128908 Hz, Resolution=319.6003 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
  DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0

Covariant Casting Benchmarks
==============================================================
   Method |        Mean |    StdDev | Scaled | Scaled-StdDev |
--------- |------------ |---------- |------- |-------------- |
   Direct |  15.0372 ns | 0.0711 ns |   1.00 |          0.00 |
 Implicit |  14.6883 ns | 0.0059 ns |   0.98 |          0.00 |
 Explicit | 114.5109 ns | 0.0360 ns |   7.62 |          0.03 |
  Dynamic |  34.4756 ns | 0.2480 ns |   2.29 |          0.02 |

Contravariant Casting Benchmarks
==============================================================
   Method |        Mean |    StdDev | Scaled | Scaled-StdDev |
--------- |------------ |---------- |------- |-------------- |
   Direct |  15.0462 ns | 0.0627 ns |   1.00 |          0.00 |
 Implicit |  14.7959 ns | 0.0803 ns |   0.98 |          0.01 |
 Explicit | 111.4398 ns | 0.0429 ns |   7.41 |          0.03 |
  Dynamic |  34.3615 ns | 0.0600 ns |   2.28 |          0.01 |

These results confirm Kévin's finding: dynamic is more than three times faster than explicit covariant casting (34.5 ns versus 114.5 ns) and more than three times faster than explicit contravariant casting (34.4 ns versus 111.4 ns).

Optimizing Performance by using Dynamic

This makes it look like you should always prefer to use dynamic over explicit covariant and contravariant casts. However, these benchmark results do not provide the complete picture.

BenchmarkDotNet calculates the mean runtime by calling the benchmarked method numerous times to reduce the variance that results from background activity on your computer. This is usually what you want, but it hides the huge cost of the first dynamic call, which is not reflected in the results.

The cost of the first call to explicitly cast a covariant or contravariant type is the same as the millionth call. The first dynamic call, by contrast, costs massively more than every call after it. On my computer, the first dynamic call was about 1200x slower than the first call to Explicit.
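
If you want to see the first-call cost on your own machine, a rough sketch like the following can help. It is not the code I used for the figure above; FirstCallCost, TimeOnce and Calls are names invented for this example, and it assumes a project where dynamic is available (on .NET Framework that means a reference to Microsoft.CSharp).

```csharp
using System;
using System.Diagnostics;

public static class FirstCallCost
{
    // Hypothetical types mirroring the benchmark classes.
    public interface ICovariant<out T> { }
    public class Covariant<T> : ICovariant<T> { }

    static readonly ICovariant<object> general = new Covariant<string>();

    // Counts invocations so the calls have an observable effect.
    public static int Calls;

    static void SpecificCovariant(ICovariant<string> input) => Calls++;

    public static void Explicit() => SpecificCovariant((ICovariant<string>)general);

    public static void Dynamic() => SpecificCovariant((dynamic)general);

    // Times exactly one invocation, in Stopwatch ticks.
    public static long TimeOnce(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.ElapsedTicks;
    }

    public static void Main()
    {
        Console.WriteLine($"Explicit, first call:  {TimeOnce(Explicit)} ticks");
        Console.WriteLine($"Explicit, second call: {TimeOnce(Explicit)} ticks");
        Console.WriteLine($"Dynamic, first call:   {TimeOnce(Dynamic)} ticks");
        Console.WriteLine($"Dynamic, second call:  {TimeOnce(Dynamic)} ticks");
    }
}
```

Treat the output as an order-of-magnitude indication only. A single explicit cast can be shorter than one Stopwatch tick, and the first call of any method also includes JIT compilation, so the exact ratio will vary between machines and runtimes.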

Therefore, if you're only performing a few casts, don't try to optimize covariant or contravariant casting by switching to dynamic. On the other hand, if you're casting millions of times, dynamic is worth investigating.
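
To put a rough number on "a few", here is a back-of-envelope sketch using the covariant results above. It assumes the first dynamic call costs about 1200 times one explicit call and that every later call costs the benchmarked mean; both are simplifications, and BreakEven is a name made up for this example.

```csharp
using System;

public static class BreakEven
{
    // Smallest number of calls n for which
    //   firstDynamicNs + (n - 1) * dynamicNs <= n * explicitNs
    public static long Calls(double explicitNs, double dynamicNs, double firstDynamicNs)
    {
        if (explicitNs <= dynamicNs)
            throw new ArgumentException("dynamic must be faster per call for a break-even point to exist");

        double n = (firstDynamicNs - dynamicNs) / (explicitNs - dynamicNs);
        return (long)Math.Ceiling(n);
    }

    public static void Main()
    {
        const double explicitNs = 114.5109; // Explicit mean from the covariant table
        const double dynamicNs = 34.4756;   // Dynamic mean from the covariant table

        // Assumption: the first dynamic call costs about 1200 explicit calls.
        double firstDynamicNs = 1200 * explicitNs;

        Console.WriteLine(Calls(explicitNs, dynamicNs, firstDynamicNs));
    }
}
```

With these inputs the estimate is roughly 1,700 calls through the same call site before dynamic starts to pay off. The figure is only as good as the 1200x assumption, so measure your own code before relying on it.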

If you find yourself applying the dynamic optimization, remember that the DLR improves performance after the first call by caching the delegate it creates. If you make many different dynamic calls, you might find cached items expiring and then the large cost of the first call will apply again.
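
The caching is tied to what the call site has seen before, so a new runtime type at the same call site has to be bound again. The sketch below illustrates this under the assumption that the cached binding is matched on the runtime type of the argument; CallSiteCacheDemo, Handle and Dispatch are hypothetical names, not part of the benchmarks.

```csharp
using System;
using System.Diagnostics;

public static class CallSiteCacheDemo
{
    static string Handle(int value) => "int";
    static string Handle(string value) => "string";
    static string Handle(double value) => "double";

    // A single dynamic call site: the overload is chosen at run time
    // from the runtime type of the argument.
    public static string Dispatch(dynamic value) => Handle(value);

    // Times exactly one trip through the call site, in Stopwatch ticks.
    public static long TimeOnce(object value)
    {
        var stopwatch = Stopwatch.StartNew();
        Dispatch(value);
        stopwatch.Stop();
        return stopwatch.ElapsedTicks;
    }

    public static void Main()
    {
        // Each type appears twice: once unseen by the call site, once already bound.
        object[] inputs = { 1, 2, "a", "b", 1.5, 2.5 };
        foreach (var input in inputs)
            Console.WriteLine($"{input.GetType().Name,-6} {TimeOnce(input)} ticks");
    }
}
```

You would typically expect the first call for each type to be noticeably slower than the second, with the very first call slowest of all. If your dynamic call sites see many different types, this repeated binding cost is worth measuring before you commit to the optimization.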

Conclusion

Covariant and contravariant casting is very slow. It is 100x slower than normal casting and 3x slower than using dynamic.

The first dynamic call is 1200x slower than the first covariant or contravariant cast. So, don't try to optimize by switching to dynamic unless you're casting many times.