As we saw in my previous post, there are three ways to cast safely in C# 7. In this post, I micro-benchmark the three methods of safe casting and dive into the IL to understand the differences.

The three methods of safe casting (from my previous post) are:

  • as (Safe Casting with as)
    • Convert with as, then compare to null
  • Classic is (Safe Casting with is)
    • Check with is, then use the cast operator explicitly
  • Modern is (Safe Casting with is and type patterns)
    • Convert with is using C# 7's type patterns
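
As a quick refresher, here is what each form looks like in code (a minimal sketch; obj is just a placeholder local and is not part of the benchmark code below):

object obj = "woof";

// as: convert with as, then compare to null
string viaAs = obj as string;
if(viaAs != null)
{
    // use viaAs
}

// classic is: check with is, then cast explicitly
if(obj is string)
{
    string viaClassicIs = (string)obj;
    // use viaClassicIs
}

// modern is: convert with is using a C# 7 type pattern
if(obj is string viaModernIs)
{
    // use viaModernIs
}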

Last time, I mentioned that a problem with classic is is that input is accessed twice, something both as and modern is avoid. Once the code is compiled in release mode and optimized, is there any difference in performance? Does the difference show up in the bytecode? Is there any difference between as and modern is? These are the questions I'll investigate and answer in this post.

TL;DR

The performance of modern is and as is practically identical, and both are about twice as fast as classic is.

Modern is might have a razor-thin performance edge over as.

Benchmark Code

To compare the performance of the three methods, I wrote some simple code for each method and a baseline. The baseline performs the operations common to all three methods: a comparison and a method call. This lets us separate the cost of that boilerplate from the cost of the safe casting operations we actually want to benchmark.

If you want to repeat my experiments for yourself, here is the code for the entire program:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace Experiments
{
    public class Casting
    {
        private readonly object input = "woof";
        private readonly string input2 = "woof";

        [Benchmark]
        public int As()
        {
            string text = input as string;
            if(text != null)
            {
                return text.Length;
            }

            return 0;
        }

        [Benchmark]
        public int ClassicIs()
        {
            if(input is string)
            {
                string text = (string)input;
                return text.Length;
            }

            return 0;
        }

        [Benchmark]
        public int ModernIs()
        {
            if(input is string text)
            {
                return text.Length;
            }

            return 0;
        }

        [Benchmark]
        public int Baseline()
        {
            if(input2 != null)
                return input2.Length;

            return 0;
        }
    }

    public class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Casting>();
        }
    }
}

Benchmark Results

To run the benchmark, I used BenchmarkDotNet, a great library that is simple to use. It takes care of all the nitty-gritty of running a benchmark properly and even calculates statistics to help you analyze the results.

You can use BenchmarkDotNet in three easy steps:

  1. Add BenchmarkDotNet to your project using NuGet.
  2. Add [Benchmark] attributes to the methods you want to benchmark.
  3. Run the benchmark using BenchmarkRunner.Run<ClassName>().

Here are the results for the different safe casting methods:

BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128910 Hz, Resolution=319.6001 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
  DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0

    Method |      Mean |    StdDev |
---------- |---------- |---------- |
 ClassicIs | 2.0814 ns | 0.0006 ns |
  ModernIs | 0.9003 ns | 0.0004 ns |
        As | 0.9081 ns | 0.0107 ns |
  Baseline | 0.1127 ns | 0.0002 ns |

From these results, we see that modern is and as are nearly identical and that they're about twice as fast as classic is. So feel free to cast safely using is with type patterns; there is no performance penalty for its succinct syntax.

IL Code Analysis and Comparison

Mark Stoddard asked that I compare the bytecode for the different methods. So, we'll now use ILSpy to look at what differences exist between the three approaches to safe casting at the IL code level.

Here are the lines of bytecode that are unique to each of the three safe casting methods. The remaining code is boilerplate that is shared by all three methods and the baseline method. You can find the full IL code for each method in the appendix at the end of this post.

As
-----------
IL_0001: isinst [mscorlib]System.String
IL_0006: stloc.0
IL_0007: ldloc.0

Classic Is
-----------
IL_0001: isinst [mscorlib]System.String
IL_0009: castclass [mscorlib]System.String

Modern Is
-----------
IL_0001: isinst [mscorlib]System.String
IL_0006: dup
IL_0007: stloc.0

The first thing we notice is that all three methods use isinst to check the type. The difference is in what they do with the result of isinst, which pops the reference on top of the stack and pushes back either null or that reference cast to the target type. as and modern is store this result using stloc, but classic is throws it away. Consequently, classic is needs an expensive castclass call that the other methods do not, which is why it is so much slower than the rest.

as and modern is are almost identical. as stores the result using stloc and then loads it back onto the stack using ldloc, ready for the branch, whereas modern is uses dup to duplicate the result on the stack and then stores it using stloc, which leaves the duplicate on the stack ready for the branch. So the only difference is that as uses ldloc to get the value onto the stack, while modern is uses dup.
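
To make the mapping between the C# and the IL concrete, here is the benchmark code again, annotated with my reading of which of the IL instructions above each line compiles to:

// Classic is: the type test is effectively performed twice.
public int ClassicIs()
{
    if(input is string)              // isinst; the result is only used for the branch (brfalse.s)
    {
        string text = (string)input; // castclass: a second type test on input
        return text.Length;
    }

    return 0;
}

// as: a single type test whose result is stored, then reloaded for the branch.
public int As()
{
    string text = input as string;   // isinst + stloc.0
    if(text != null)                 // ldloc.0 + brfalse.s
    {
        return text.Length;
    }

    return 0;
}

// Modern is: a single type test whose result is duplicated, then stored.
public int ModernIs()
{
    if(input is string text)         // isinst + dup + stloc.0 + brfalse.s
    {
        return text.Length;
    }

    return 0;
}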

Why does Modern is use dup in place of ldloc?

You might wonder whether there is any reason at all for the difference between as and modern is, given that they are equivalent and their performance is nearly identical. Well, it seems that, as you might imagine, dup (duplicating the value on top of the stack) is ever so slightly faster than ldloc (loading the value of a local variable onto the stack).

We see this difference in the earlier benchmark results as a razor-thin edge of 0.0078 nanoseconds in favor of modern is over as; please don't read too much into this, as it is well within the margin of error for the benchmark.

The earlier benchmark results were run on 64-bit with RyuJIT. If we run them on 32-bit with LegacyJIT, the difference is more pronounced, but still a tiny 0.0276 nanoseconds in favor of modern is over as. This minuscule difference is still not particularly significant; it is within 3 standard deviations.

BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128910 Hz, Resolution=319.6001 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
  DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0

    Method |      Mean |    StdDev |
---------- |---------- |---------- |
 ClassicIs | 1.5004 ns | 0.0005 ns |
  ModernIs | 0.7412 ns | 0.0104 ns |
        As | 0.7688 ns | 0.0002 ns |
  Baseline | 0.1882 ns | 0.0006 ns |
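
If you want to reproduce this 32-bit LegacyJIT run yourself, one option is to pass a custom config to BenchmarkDotNet instead of changing the project's platform settings. This is only a hedged sketch using the library's predefined Job.LegacyJitX86 job; the exact config API differs between BenchmarkDotNet versions (newer versions use AddJob rather than Add), so check the documentation for your version:

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

namespace Experiments
{
    public class LegacyJitX86Config : ManualConfig
    {
        public LegacyJitX86Config()
        {
            // Ask BenchmarkDotNet to run the benchmarks in a 32-bit process with the legacy JIT.
            Add(Job.LegacyJitX86);
        }
    }

    // Then pass the config when running the benchmarks:
    // BenchmarkRunner.Run<Casting>(new LegacyJitX86Config());
}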

Conclusion

You should use modern is for safe casting. Compared to classic is, it's about twice as fast and much more succinct. Compared to as, it's more succinct and might have a razor-thin performance advantage.

Appendix - Full IL Code

Here are the full IL code listings for each method.

As
-----------
IL_0000: ldarg.0
IL_0001: isinst [mscorlib]System.String
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: brfalse.s IL_0011

IL_000a: ldloc.0
IL_000b: callvirt instance int32 [mscorlib]System.String::get_Length()
IL_0010: ret

IL_0011: ldc.i4.0
IL_0012: ret
Classic Is
-----------
IL_0000: ldarg.0
IL_0001: isinst [mscorlib]System.String
IL_0006: brfalse.s IL_0014

IL_0008: ldarg.0
IL_0009: castclass [mscorlib]System.String
IL_000e: callvirt instance int32 [mscorlib]System.String::get_Length()
IL_0013: ret

IL_0014: ldc.i4.0
IL_0015: ret
Modern Is
-----------
IL_0000: ldarg.0
IL_0001: isinst [mscorlib]System.String
IL_0006: dup
IL_0007: stloc.0
IL_0008: brfalse.s IL_0011

IL_000a: ldloc.0
IL_000b: callvirt instance int32 [mscorlib]System.String::get_Length()
IL_0010: ret

IL_0011: ldc.i4.0
IL_0012: ret
Baseline
-----------
IL_0000: ldarg.0
IL_0001: brfalse.s IL_000a

IL_0003: ldarg.0
IL_0004: callvirt instance int32 [mscorlib]System.String::get_Length()
IL_0009: ret

IL_000a: ldc.i4.0
IL_000b: ret

Addendum A - BenchmarkDotNet Baseline

Update (12th April 2017): As Kristian Hellang points out in the comments, BenchmarkDotNet includes the option to label one of the benchmarked methods as a baseline. We do that by setting the Baseline parameter to true in the Benchmark attribute:

[Benchmark(Baseline=true)]
public int Baseline()
{
    if(input2 != null)
        return input2.Length;

    return 0;
}

Doing so causes BenchmarkDotNet to generate results that include the columns Scaled and Scaled-StdDev:

BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7 CPU 970 3.20GHz, ProcessorCount=12
Frequency=3128909 Hz, Resolution=319.6002 ns, Timer=TSC
  [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0
  DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1637.0

    Method |      Mean |    StdDev | Scaled | Scaled-StdDev |
---------- |---------- |---------- |------- |-------------- |
 ClassicIs | 1.5005 ns | 0.0002 ns |   8.02 |          0.01 |
  ModernIs | 0.7678 ns | 0.0002 ns |   4.10 |          0.00 |
        As | 0.7694 ns | 0.0006 ns |   4.11 |          0.00 |
  Baseline | 0.1872 ns | 0.0002 ns |   1.00 |          0.00 |

Addendum B - Assembly Code Analysis

Update (18th April 2017): Following George Pollard's suggestion, I dug into the assembly code to see if the difference in the IL between modern is and as persisted. It did not: the JIT optimized the difference away, so on my computer the two have identical performance. Although it's highly unlikely, your results may differ; read on to find out why.

To access the assembly code from Visual Studio: set a breakpoint in each method, run with the debugger attached, and then use Go To Disassembly (ALT+G) when your code hits a breakpoint. To ensure you get the optimized assembly code: enable Optimize code on the Build tab of the project properties, then, under Options > Debugging > General, untick both Enable Just My Code and Suppress JIT optimization on module load (Managed only).

I examined the assembly code for modern is and as on both x86 and x64. While there were subtle differences between the x86 and x64 code, in both cases the fully optimized assembly code was identical for modern is and as. So, despite the difference in the IL, it did not persist through to the assembly level; the JIT optimized it away.

It should be noted that C#'s JIT (just-in-time) compiler is different from an ahead-of-time compiler like you would use in C++. When you compile a C++ program, you target a specific processor and operating system and the compiler generates an executable that is optimized for and only runs on that platform. The JIT compiles your C# program at runtime, so it can be optimized for and run on any platform that is supported by the JIT; even platforms that did not exist when you compiled your C# program into IL.

The result is that if you view the assembly code of modern is and as under a different JIT compiler or on a different platform, you might find differences, because their IL is different and so might be translated differently. However, this is extremely unlikely: as we've already seen, their IL is equivalent, so a good optimizer should compile both down to the fastest assembly code for a given platform, which should be the same for both.