Null-coalescing operator custom implicit conversion behaviour

Note: this appears to have been fixed in Roslyn

This question arose when writing my answer to this one, which talks about the associativity of the null-coalescing operator.

Just as a reminder, the idea of the null-coalescing operator is that an expression of the form

x ?? y

first evaluates x, then:

  • If the value of x is null, y is evaluated and that is the end result of the expression
  • If the value of x is non-null, y is not evaluated, and the value of x is the end result of the expression, after a conversion to the compile-time type of y if necessary
Now usually there's no need for a conversion, or it's just from a nullable type to a non-nullable one - usually the types are the same, or just from (say) int? to int. However, you can create your own implicit conversion operators, and those are used where necessary.
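
For example (a minimal illustration of my own, not part of the original question), the most common case is just the int? to int unwrapping:

    int? maybeValue = 5;
    int fallback = 3;

    // maybeValue is non-null, so its value is unwrapped from int? to int and
    // becomes the result; fallback is not evaluated. A user-defined implicit
    // operator is invoked in the same position when a conversion is needed.
    int result = maybeValue ?? fallback;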

    For the simple case of x ?? y, I haven't seen any odd behaviour. However, with (x ?? y) ?? z I see some confusing behaviour.

    Here's a short but complete test program - the results are in the comments:

    using System;
    
    public struct A
    {
        public static implicit operator B(A input)
        {
            Console.WriteLine("A to B");
            return new B();
        }
    
        public static implicit operator C(A input)
        {
            Console.WriteLine("A to C");
            return new C();
        }
    }
    
    public struct B
    {
        public static implicit operator C(B input)
        {
            Console.WriteLine("B to C");
            return new C();
        }
    }
    
    public struct C {}
    
    class Test
    {
        static void Main()
        {
            A? x = new A();
            B? y = new B();
            C? z = new C();
            C zNotNull = new C();
    
            Console.WriteLine("First case");
            // This prints
            // A to B
            // A to B
            // B to C
            C? first = (x ?? y) ?? z;
    
            Console.WriteLine("Second case");
            // This prints
            // A to B
            // B to C
            var tmp = x ?? y;
            C? second = tmp ?? z;
    
            Console.WriteLine("Third case");
            // This prints
            // A to B
            // B to C
            C? third = (x ?? y) ?? zNotNull;
        }
    }
    

    So we have three custom value types, A, B and C, with conversions from A to B, A to C, and B to C.

    I can understand both the second case and the third case... but why is there an extra A to B conversion in the first case? In particular, I'd really have expected the first case and second case to be the same thing - it's just extracting an expression into a local variable, after all.

    Any takers on what's going on? I'm extremely hesitant to cry "bug" when it comes to the C# compiler, but I'm stumped as to what's going on...

    EDIT: Okay, here's a nastier example of what's going on, thanks to configurator's answer, which gives me further reason to think it's a bug.

    EDIT: The sample doesn't even need two null-coalescing operators now...

    using System;
    
    public struct A
    {
        public static implicit operator int(A input)
        {
            Console.WriteLine("A to int");
            return 10;
        }
    }
    
    class Test
    {
        static A? Foo()
        {
            Console.WriteLine("Foo() called");
            return new A();
        }
    
        static void Main()
        {
            int? y = 10;
    
            int? result = Foo() ?? y;
        }
    }
    

    The output of this is:

    Foo() called
    Foo() called
    A to int
    

    The fact that Foo() gets called twice here is hugely surprising to me - I can't see any reason for the expression to be evaluated twice.


    Thanks to everyone who contributed to analyzing this issue. It is clearly a compiler bug. It appears to only happen when there is a lifted conversion involving two nullable types on the left-hand side of the coalescing operator.

    I have not yet identified where precisely things go wrong, but at some point during the "nullable lowering" phase of compilation -- after initial analysis but before code generation -- we reduce the expression

    result = Foo() ?? y;
    

    from the example above to the moral equivalent of:

    A? temp = Foo();
    result = temp.HasValue ? 
        new int?(A.op_implicit(Foo().Value)) : 
        y;
    

    Clearly that is incorrect; the correct lowering is

    result = temp.HasValue ? 
        new int?(A.op_implicit(temp.Value)) : 
        y;
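
    To make the intended behaviour concrete, here is a hand-desugared, runnable sketch of that correct lowering (my own illustration, reusing the A struct and Foo() method from the example above); Foo() is evaluated exactly once:

    using System;

    public struct A
    {
        public static implicit operator int(A input)
        {
            Console.WriteLine("A to int");
            return 10;
        }
    }

    class Test
    {
        static A? Foo()
        {
            Console.WriteLine("Foo() called");
            return new A();
        }

        static void Main()
        {
            int? y = 10;

            // Hand-written equivalent of "int? result = Foo() ?? y;" under the
            // correct lowering: the left operand is captured in a temporary, and
            // only that temporary is read when applying the A-to-int conversion.
            A? temp = Foo();
            int? result = temp.HasValue
                ? new int?((int)temp.Value)
                : y;

            // Output:
            // Foo() called
            // A to int
            // 10
            Console.WriteLine(result);
        }
    }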
    

    My best guess based on my analysis so far is that the nullable optimizer is going off the rails here. We have a nullable optimizer that looks for situations where we know that a particular expression of nullable type cannot possibly be null. Consider the following naive analysis: we might first say that

    result = Foo() ?? y;
    

    is the same as

    A? temp = Foo();
    result = temp.HasValue ? 
        (int?) temp : 
        y;
    

    and then we might say that

    conversionResult = (int?) temp 
    

    is the same as

    A? temp2 = temp;
    conversionResult = temp2.HasValue ? 
        new int?(op_Implicit(temp2.Value)) : 
        (int?) null
    

    But the optimizer can step in and say "whoa, wait a minute, we already checked that temp is not null; there's no need to check it for null a second time just because we are calling a lifted conversion operator". We'd then optimize it away to just

    new int?(op_Implicit(temp2.Value)) 
    

    My guess is that we are somewhere caching the fact that the optimized form of (int?)Foo() is new int?(op_implicit(Foo().Value)) but that is not actually the optimized form we want; we want the optimized form of Foo()-replaced-with-temporary-and-then-converted.

    Many bugs in the C# compiler are a result of bad caching decisions. A word to the wise: every time you cache a fact for use later, you are potentially creating an inconsistency should something relevant change. In this case the relevant thing that has changed post initial analysis is that the call to Foo() should always be realized as a fetch of a temporary.

    We did a lot of reorganization of the nullable rewriting pass in C# 3.0. The bug reproduces in C# 3.0 and 4.0 but not in C# 2.0, which means that the bug was probably my bad. Sorry!

    I'll get a bug entered into the database and we'll see if we can get this fixed up for a future version of the language. Thanks again everyone for your analysis; it was very helpful!

    UPDATE: I rewrote the nullable optimizer from scratch for Roslyn; it now does a better job and avoids these sorts of weird errors. For some thoughts on how the optimizer in Roslyn works, see my series of articles which begins here: https://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/


    This is most definitely a bug.

    public class Program {
        static A? X() {
            Console.WriteLine("X()");
            return new A();
        }
        static B? Y() {
            Console.WriteLine("Y()");
            return new B();
        }
        static C? Z() {
            Console.WriteLine("Z()");
            return new C();
        }
    
        public static void Main() {
            C? test = (X() ?? Y()) ?? Z();
        }
    }
    

    This code will output:

    X()
    X()
    A to B (0)
    X()
    X()
    A to B (0)
    B to C (0)
    

    That made me think that the first part of each ?? coalesce expression is evaluated twice. This code proved it:

    B? test = (X() ?? Y());
    

    outputs:

    X()
    X()
    A to B (0)
    

    This seems to happen only when the expression requires a conversion between two nullable types; I've tried various permutations with one of the sides being a string, and none of them caused this behaviour.


    If you take a look at the generated code for the left-grouped case, it actually does something like this (csc /optimize-):

    C? first;
    A? atemp = a;
    B? btemp = (atemp.HasValue ? new B?(a.Value) : b);
    if (btemp.HasValue)
    {
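        // Note how the whole (atemp.HasValue ? new B?(a.Value) : b) expression is
        // evaluated again here - this re-evaluation is what produces the extra
        // "A to B" conversion.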
        first = new C?((atemp.HasValue ? new B?(a.Value) : b).Value);
    }
    

    Another find: if you actually use first, the generated code takes a shortcut and returns c if both a and b are null. Yet if a or b is non-null, it re-evaluates a as part of the implicit conversion to B before returning whichever of a or b is non-null.

    From the C# 4.0 Specification, §6.1.4:

  • If the nullable conversion is from S? to T?:
      • If the source value is null (HasValue property is false), the result is the null value of type T?.
      • Otherwise, the conversion is evaluated as an unwrapping from S? to S, followed by the underlying conversion from S to T, followed by a wrapping (§4.1.10) from T to T?.

    This appears to explain the second unwrapping-wrapping combination.
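
    To spell that out (my own hand-expansion, assuming the B and C structs from the question are in scope), a lifted B? to C? conversion behaves roughly like this:

    B? b = new B();

    // Lifted B? -> C? conversion as described in §6.1.4: unwrap to B, apply the
    // user-defined B-to-C conversion, then wrap the result back up as C?.
    C? c = b.HasValue
        ? new C?((C)b.Value)
        : (C?)null;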


    The C# 2008 and 2010 compilers produce very similar code; however, this looks like a regression from the C# 2005 compiler (8.00.50727.4927), which generates the following code for the above:

    A? a = x;
    B? b = a.HasValue ? new B?(a.GetValueOrDefault()) : y;
    C? first = b.HasValue ? new C?(b.GetValueOrDefault()) : z;
    

    I wonder if this is not due to the additional magic given to the type inference system?
