Linq to SQL creating extra sub select when doing join

2018-06-14 08:08:21

I have a simple parent child relationship that I would like to load with LINQ to SQL. I want to load the children at the same time as the parent. The generated SQL is doing too much work. It is trying to count the children as well as join to them. I will not update these objects. I will not add children to the parent. I'm only interested in reading it. I have simplified the tables down to the bare minimum. In reality I have more columns. LINQ to SQL is generating the following SQL

SELECT [t0].[parentId] AS [Id], [t0].[name], [t1].[childId] AS [Id2], 
[t1].[parentId], [t1].[name] AS [name2], 
( SELECT COUNT(*)
    FROM [dbo].[linqchild] AS [t2]
    WHERE [t2].[parentId] = [t0].[parentId]
) AS [value]
FROM [dbo].[linqparent] AS [t0]
LEFT OUTER JOIN [dbo].[linqchild] AS [t1] ON [t1].[parentId] = [t0].[parentId]
ORDER BY [t0].[parentId], [t1].[childId]

I don't know why the SELECT COUNT(*) ... is there. I'd rather it went away. Both the parent and child tables will have millions of rows in them in production. The extra query is costing a great deal of time. It seems unecessary. Is there a way to make it go away? I'm also not sure where the ORDER BY is coming from either.

The classes look like this.

[Table(Name = "dbo.linqparent")]
public class LinqParent
{
    [Column(Name = "parentId", AutoSync = AutoSync.OnInsert, IsPrimaryKey = true, IsDbGenerated = true, CanBeNull = false)]
    public long Id { get; set; }

    [ Column( Name = "name", CanBeNull = false ) ]
    public string name { get; set; }

    [Association(OtherKey = "parentId", ThisKey = "Id", IsForeignKey = true)]
    public IEnumerable<LinqChild> Kids { get; set; }
}


[Table(Name = "dbo.linqchild")]
public class LinqChild
{
    [Column(Name = "childId", AutoSync = AutoSync.OnInsert, IsPrimaryKey = true, IsDbGenerated = true, CanBeNull = false)]
    public long Id { get; set; }

    [ Column( Name = "parentId", CanBeNull = false ) ]
    public long parentId { get; set; }

    [Column(Name = "name", CanBeNull = false)]
    public string name { get; set; }
}

I'm using something like the following to query, there would be a where clause in production and an index that matches.

using (DataContext context = new DataContext(new DatabaseStringFinder().ConnectionString, new AttributeMappingSource()) { ObjectTrackingEnabled = false, DeferredLoadingEnabled = false })
{
    var loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<LinqParent>(f => f.Kids);
    context.LoadOptions = loadOptions;
    var table = context.GetTable<LinqParent>();
    context.Log = Console.Out;

    // do something with table.
}

Unfortunately, no. ORM's are never the most performant solution; you'll always get better performance if you write your own SQL (or use stored procedures), but that's the tradeoff that gets made.

What you're seeing is standard practice with ORM's; rather than using a multiple result query (which seems to me to be the most efficient way, but I'm not an ORM library author), the ORM will flatten the entire graph into a single query and bring back all of the information it needs--including information that helps it determine what bits of data are duplicated--to rebuild the graph.

This is also where the ORDER BY comes from, as it requires that linked entities be in contiguous blocks.

The query that is being generated is not all that inefficient. If you look at the estimated execution plan you will see that the count(*) expense is very minimal. The order by clause should be ordering by your primary key which is probably your clustered index, so it also should have very little impact on performance.

One thing to make sure of when testing performance on your LINQ queries, is to make sure that the context.Log is not being set. Setting this to Console.Out will cause a huge performance hit.

Hope this helps.

Edit:

After looking a little closer at the execution plan, I see that even though my Count(*) was just a clustered index scan, it was still 33% of my execution, so I agree it is kind of annoying to have this extra sub-select in the sql. If this really is the performance bottle neck, then you might want to consider creating a view or stored proc to return your results.

链接地址: http://www.djcxy.com/p/40808.html

上一篇: 如何深入克隆实体并避免导航属性导致的错误？

下一篇: Linq to SQL在进行连接时创建额外的子选择