Null-coalescing operator custom implicit conversion behaviour

Note: this appears to have been fixed in Roslyn

This question arose when writing my answer to this one, which talks about the associativity of the null-coalescing operator.

Just as a reminder, the idea of the null-coalescing operator is that an expression of the form

x ?? y

first evaluates x, then:

  • If the value of x is null, y is evaluated and that is the end result of the expression
  • If the value of x is non-null, y is not evaluated, and the value of x is the end result of the expression, after a conversion to the compile-time type of y if necessary
Now usually there's no need for a conversion, or it's just from a nullable type to a non-nullable one - usually the types are the same, or just from (say) int? to int. However, you can create your own implicit conversion operators, and those are used where necessary.
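
For example (a minimal illustration of my own, not part of the original question), the most common case is just the int? to int unwrapping:

    int? maybeValue = 5;
    int fallback = 3;

    // maybeValue is non-null, so its value is unwrapped from int? to int and
    // becomes the result; fallback is not evaluated. A user-defined implicit
    // operator is invoked in the same position when a conversion is needed.
    int result = maybeValue ?? fallback;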

    For the simple case of x ?? y, I haven't seen any odd behaviour. However, with (x ?? y) ?? z I see some confusing behaviour.

    Here's a short but complete test program - the results are in the comments:

    using System;
    
    public struct A
    {
        public static implicit operator B(A input)
        {
            Console.WriteLine("A to B");
            return new B();
        }
    
        public static implicit operator C(A input)
        {
            Console.WriteLine("A to C");
            return new C();
        }
    }
    
    public struct B
    {
        public static implicit operator C(B input)
        {
            Console.WriteLine("B to C");
            return new C();
        }
    }
    
    public struct C {}
    
    class Test
    {
        static void Main()
        {
            A? x = new A();
            B? y = new B();
            C? z = new C();
            C zNotNull = new C();
    
            Console.WriteLine("First case");
            // This prints
            // A to B
            // A to B
            // B to C
            C? first = (x ?? y) ?? z;
    
            Console.WriteLine("Second case");
            // This prints
            // A to B
            // B to C
            var tmp = x ?? y;
            C? second = tmp ?? z;
    
            Console.WriteLine("Third case");
            // This prints
            // A to B
            // B to C
            C? third = (x ?? y) ?? zNotNull;
        }
    }
    

    So we have three custom value types, A, B and C, with conversions from A to B, A to C, and B to C.

    I can understand both the second case and the third case... but why is there an extra A to B conversion in the first case? In particular, I'd really have expected the first case and second case to be the same thing - it's just extracting an expression into a local variable, after all.

    Any takers on what's going on? I'm extremely hesitant to cry "bug" when it comes to the C# compiler, but I'm stumped as to what's going on...

    EDIT: Okay, here's a nastier example of what's going on, thanks to configurator's answer, which gives me further reason to think it's a bug.

    EDIT: The sample doesn't even need two null-coalescing operators now...

    using System;
    
    public struct A
    {
        public static implicit operator int(A input)
        {
            Console.WriteLine("A to int");
            return 10;
        }
    }
    
    class Test
    {
        static A? Foo()
        {
            Console.WriteLine("Foo() called");
            return new A();
        }
    
        static void Main()
        {
            int? y = 10;
    
            int? result = Foo() ?? y;
        }
    }
    

    The output of this is:

    Foo() called
    Foo() called
    A to int
    

    The fact that Foo() gets called twice here is hugely surprising to me - I can't see any reason for the expression to be evaluated twice.


    Thanks to everyone who contributed to analyzing this issue. It is clearly a compiler bug. It appears to only happen when there is a lifted conversion involving two nullable types on the left-hand side of the coalescing operator.

    I have not yet identified where precisely things go wrong, but at some point during the "nullable lowering" phase of compilation -- after initial analysis but before code generation -- we reduce the expression

    result = Foo() ?? y;
    

    from the example above to the moral equivalent of:

    A? temp = Foo();
    result = temp.HasValue ? 
        new int?(A.op_implicit(Foo().Value)) : 
        y;
    

    Clearly that is incorrect; the correct lowering is

    result = temp.HasValue ? 
        new int?(A.op_implicit(temp.Value)) : 
        y;
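
    To make the intended behaviour concrete, here is a hand-desugared, runnable sketch of that correct lowering (my own illustration, reusing the A struct and Foo() method from the example above); Foo() is evaluated exactly once:

    using System;

    public struct A
    {
        public static implicit operator int(A input)
        {
            Console.WriteLine("A to int");
            return 10;
        }
    }

    class Test
    {
        static A? Foo()
        {
            Console.WriteLine("Foo() called");
            return new A();
        }

        static void Main()
        {
            int? y = 10;

            // Hand-written equivalent of "int? result = Foo() ?? y;" under the
            // correct lowering: the left operand is captured in a temporary, and
            // only that temporary is read when applying the A-to-int conversion.
            A? temp = Foo();
            int? result = temp.HasValue
                ? new int?((int)temp.Value)
                : y;

            // Output:
            // Foo() called
            // A to int
            // 10
            Console.WriteLine(result);
        }
    }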
    

    My best guess based on my analysis so far is that the nullable optimizer is going off the rails here. We have a nullable optimizer that looks for situations where we know that a particular expression of nullable type cannot possibly be null. Consider the following naive analysis: we might first say that

    result = Foo() ?? y;
    

    is the same as

    A? temp = Foo();
    result = temp.HasValue ? 
        (int?) temp : 
        y;
    

    and then we might say that

    conversionResult = (int?) temp 
    

    is the same as

    A? temp2 = temp;
    conversionResult = temp2.HasValue ? 
        new int?(op_Implicit(temp2.Value)) : 
        (int?) null
    

    But the optimizer can step in and say "whoa, wait a minute, we already checked that temp is not null; there's no need to check it for null a second time just because we are calling a lifted conversion operator". We'd then optimize it away to just

    new int?(op_Implicit(temp2.Value)) 
    

    My guess is that we are somewhere caching the fact that the optimized form of (int?)Foo() is new int?(op_implicit(Foo().Value)) but that is not actually the optimized form we want; we want the optimized form of Foo()-replaced-with-temporary-and-then-converted.

    Many bugs in the C# compiler are a result of bad caching decisions. A word to the wise: every time you cache a fact for use later, you are potentially creating an inconsistency should something relevant change. In this case the relevant thing that has changed post initial analysis is that the call to Foo() should always be realized as a fetch of a temporary.

    We did a lot of reorganization of the nullable rewriting pass in C# 3.0. The bug reproduces in C# 3.0 and 4.0 but not in C# 2.0, which means that the bug was probably my bad. Sorry!

    I'll get a bug entered into the database and we'll see if we can get this fixed up for a future version of the language. Thanks again everyone for your analysis; it was very helpful!

    UPDATE: I rewrote the nullable optimizer from scratch for Roslyn; it now does a better job and avoids these sorts of weird errors. For some thoughts on how the optimizer in Roslyn works, see my series of articles which begins here: https://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/


    This is most definitely a bug.

    public class Program {
        static A? X() {
            Console.WriteLine("X()");
            return new A();
        }
        static B? Y() {
            Console.WriteLine("Y()");
            return new B();
        }
        static C? Z() {
            Console.WriteLine("Z()");
            return new C();
        }
    
        public static void Main() {
            C? test = (X() ?? Y()) ?? Z();
        }
    }
    

    This code will output:

    X()
    X()
    A to B (0)
    X()
    X()
    A to B (0)
    B to C (0)
    

    That made me think that the first part of each ?? coalesce expression is evaluated twice. This code proved it:

    B? test = (X() ?? Y());
    

    outputs:

    X()
    X()
    A to B (0)
    

    This seems to happen only when the expression requires a conversion between two nullable types; I've tried various permutations with one of the sides being a string, and none of them caused this behaviour.


    If you take a look at the generated code for the left-grouped case, it actually does something like this (csc /optimize-):

    C? first;
    A? atemp = a;
    B? btemp = (atemp.HasValue ? new B?(a.Value) : b);
    if (btemp.HasValue)
    {
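        // Note how the whole (atemp.HasValue ? new B?(a.Value) : b) expression is
        // evaluated again here - this re-evaluation is what produces the extra
        // "A to B" conversion.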
        first = new C?((atemp.HasValue ? new B?(a.Value) : b).Value);
    }
    

    Another find: if you actually use first, the generated code takes a shortcut and returns c if both a and b are null. Yet if a or b is non-null, it re-evaluates a as part of the implicit conversion to B before returning whichever of a or b is non-null.

    From the C# 4.0 Specification, §6.1.4:

  • If the nullable conversion is from S? to T?:
      • If the source value is null (HasValue property is false), the result is the null value of type T?.
      • Otherwise, the conversion is evaluated as an unwrapping from S? to S, followed by the underlying conversion from S to T, followed by a wrapping (§4.1.10) from T to T?.

    This appears to explain the second unwrapping-wrapping combination.
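
    To spell that out (my own hand-expansion, assuming the B and C structs from the question are in scope), a lifted B? to C? conversion behaves roughly like this:

    B? b = new B();

    // Lifted B? -> C? conversion as described in §6.1.4: unwrap to B, apply the
    // user-defined B-to-C conversion, then wrap the result back up as C?.
    C? c = b.HasValue
        ? new C?((C)b.Value)
        : (C?)null;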


    The C# 2008 and 2010 compilers produce very similar code; however, this looks like a regression from the C# 2005 compiler (8.00.50727.4927), which generates the following code for the above:

    A? a = x;
    B? b = a.HasValue ? new B?(a.GetValueOrDefault()) : y;
    C? first = b.HasValue ? new C?(b.GetValueOrDefault()) : z;
    

    I wonder if this is not due to the additional magic given to the type inference system?
