LINQ to combine (or join) two DataTables into one

2018-06-19 01:58:19

I'm having a problem getting the correct data from two DataTables into one using LINQ in C#.
The data in my DataTables comes from reading Excel files (not from a DB).

I tried the LINQ query below, but the returned row count is not what I want (my goal is to retrieve all the data; I'm checking the row count only as an easy way to verify whether the result is correct).

In dt1, I have 2645 records.
In dt2, I have 2600 records.

The returned row count is 2600 (it looks like it is applying right-join logic).

var v1 = from d1 in dt1.AsEnumerable()
         from d2 in dt2.AsEnumerable()
         .Where(x => x.Field<string>(X_ITEM_CODE) == d1.Field<string>(X_NO) 
         || x.Field<string>(X_ITEM_KEY) == d1.Field<string>(X_NO))
         select dt1.LoadDataRow(new object[] 
         { 
             // Shortcut indexer access instead of Field<string>, for testing purposes.
             d1[X_NO],
             d2[X_ITEM_CODE] == null ? "" : d2[X_ITEM_CODE] ,
             d2[X_ITEM_KEY] == null ? "" : d2[X_ITEM_KEY],
             d2[X_COST],
             d2[X_DESC],
             d2[X_QTY]== null ? 0 : d2[X_QTY]
         }, false);

         dt1 = v1.CopyToDataTable();
         Console.WriteLine(dt1.Rows.Count);

I tried to use 'join', but my problem is that the X_NO value can be in either X_ITEM_CODE or X_ITEM_KEY, and I can only put one condition in ON xxx equals yyy.

I would still like to use 'join' if it can handle the condition above. Please give me some guidance. Thanks.

[Additional Info]
I already tried a foreach loop + dt1.Select(xxxx) + dt1.Rows.Add(xxx). It works well, but takes around 2 minutes to complete the job.
I'm looking for a faster way, and the LINQ code above seems faster than my foreach loop, so I want to give LINQ a chance.
For demo purposes I only show a few columns in the example above; my actual tables have 12 columns.

I was afraid my post would become very long if I included my foreach loop, so I left it out when I first posted this question.
Anyway, the code and sample data are below. Anyone with edit rights who thinks it is too long, kindly remove any unnecessary or unrelated code or lines.

DataRow[] drs = null;
DataRow drO = null;

foreach (DataRow drY in dt2.Rows)
{
    drs = null;
    drs = dt1.Select(X_NO + "='" + drY[X_ITEM_KEY] + "' OR " + X_NO + "='" + drY[X_ITEM_CODE] + "'");
    if (drs.Length > 0)
    {
        // drs.Length will always be 1 here because there are no duplicates.
        drs[0][X_ITEM_CODE] = drY[X_ITEM_CODE];
        drs[0][X_ITEM_KEY] = drY[X_ITEM_KEY];
        drs[0][X_COST] = clsD.GetInt(drY[X_COST]);      // If null, return 0.
        drs[0][X_DESC] = clsD.GetStr(drY[X_DESC]);      // If null, return "".
        drs[0][X_QTY] = clsD.GetInt(drY[X_QTY]);
    }
    else
    {
        // Not found by ITEM CODE or KEY, so add it.
        drO = dt1.NewRow();     // The row must be created by the table it is added to.
        drO[X_ITEM_CODE] = drY[X_ITEM_CODE];
        drO[X_ITEM_KEY] = drY[X_ITEM_KEY];
        drO[X_COST] = clsD.GetInt(drY[X_COST]);
        drO[X_DESC] = clsD.GetStr(drY[X_DESC]);
        drO[X_QTY] = clsD.GetInt(drY[X_QTY]);
        dt1.Rows.Add(drO);
    }
}
// Note: I haven't put the else branch above into my LINQ test yet.
// Without the else branch, dt1 will still have the same record count.

[dt1 data]
X_NO,X_ITEM_CODE,X_ITEM_KEY,COST,DESC,QTY,....
AA060210A,,,,,,....
AB060220A,,,,,,....
AC060230A,,,,,,....
AD060240A,,,,,,....

[dt2 data]
X_ITEM_CODE,X_ITEM_KEY,COST,DESC,QTY
AA060210A,AA060211A,100.00,PART1,10000
AB060221A,AB060220A,120.00,PART2,500
AC060232A,AC060230A,150.00,PART3,100
AD060240A,AD060243A,4.50,PART4,15250

[Update 2]
I tried the 'join' below and it returns nothing. So can I assume that join will not help either?

var vTemp1 = from d1 in dt1.AsEnumerable()
             join d2 in dt2.AsEnumerable()
             on 1 equals 1
             where (d1[X_NO] == d2[X_ITEM_CODE] || d1[X_NO] == d2[X_ITEM_KEY])
             select dt1.LoadDataRow(new object[] 
             { 
                d1[X_NO],
                d2[X_ITEM_CODE] == null ? "" : d2[X_ITEM_CODE] ,
                d2[X_ITEM_KEY] == null ? "" : d2[X_ITEM_KEY],
                d2[X_COST],
                d2[X_DESC],
                d2[X_QTY]== null ? 0 : d2[X_QTY]
             }, false);

Console.WriteLine(vTemp1.Count()); // Returns zero.

LINQ supports only equijoins, so the join operator cannot express this OR condition directly. And a LINQ query that builds a Cartesian product and filters it with where will not give you any performance improvement.
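A side note on why the Update 2 query likely returns zero: the indexer `d1[X_NO]` returns `object`, so `==` there compares references rather than string contents. The snippet below is only a minimal sketch of that behaviour; the class name and the `Fresh` helper are made up for illustration and stand in for values read from two different tables.

```csharp
using System;

static class ReferenceEqualityDemo
{
    // Hypothetical helper: builds a new string instance at run time,
    // so it is not the same object as any other string with equal contents.
    public static object Fresh(string s) => new string(s.ToCharArray());

    static void Main()
    {
        object a = Fresh("AA060210A");
        object b = Fresh("AA060210A");

        Console.WriteLine(a == b);                 // False: object == object compares references
        Console.WriteLine(Equals(a, b));           // True: string.Equals compares contents
        Console.WriteLine((string)a == (string)b); // True: string == compares contents
    }
}
```

Reading the values with `Field<string>(...)`, as the first query in the question does, avoids this because the comparison is then made between strings.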

What you really need (LINQ or not) is a fast lookup by the dt1[X_NO] field. Since, as you said, it is unique, you can build and use a dictionary for that:

var dr1ByXNo = dt1.AsEnumerable().ToDictionary(dr => dr.Field<string>(X_NO));

and then modify your process like this:

foreach (DataRow drY in dt2.Rows)
{
    if (dr1ByXNo.TryGetValue(drY.Field<string>(X_ITEM_KEY), out drO) ||
        dr1ByXNo.TryGetValue(drY.Field<string>(X_ITEM_CODE), out drO))
    {
        // drO is the existing dt1 row whose X_NO matched the key or the code.
        drO[X_ITEM_CODE] = drY[X_ITEM_CODE];
        drO[X_ITEM_KEY] = drY[X_ITEM_KEY];
        drO[X_COST] = clsD.GetInt(drY[X_COST]);      // If null, return 0.
        drO[X_DESC] = clsD.GetStr(drY[X_DESC]);      // If null, return "".
        drO[X_QTY] = clsD.GetInt(drY[X_QTY]);
    }
    else
    {
        // Not found by ITEM CODE or KEY, so add it.
        drO = dt1.NewRow();     // The row must be created by the table it is added to.
        drO[X_ITEM_CODE] = drY[X_ITEM_CODE];
        drO[X_ITEM_KEY] = drY[X_ITEM_KEY];
        drO[X_COST] = clsD.GetInt(drY[X_COST]);
        drO[X_DESC] = clsD.GetStr(drY[X_DESC]);
        drO[X_QTY] = clsD.GetInt(drY[X_QTY]);
        dt1.Rows.Add(drO);
    }
}

Since you are adding new records to dt1 during the process, depending on your requirements you might need to add the following at the end of the else block (after the dt1.Rows.Add(drO); line):

dr1ByXNo.Add(drO.Field<string>(X_NO), drO);

I didn't include it because I don't see your code setting the X_NO field of the new record, so as written the call above would throw an exception (a null or duplicate key).
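Putting the pieces together, here is a minimal self-contained sketch of the dictionary approach, run against the four sample rows from the question. Several things in it are assumptions rather than the original code: the column-name constants are guessed from the sample headers, every column is treated as a string, the `clsD` helpers are left out, and `MergeSketch`, `NewTable`, `MergeInto` and `RunSample` are hypothetical names. It also guards against null keys, since `TryGetValue` throws on a null key. On .NET Framework, `AsEnumerable` and `Field<T>` need a reference to System.Data.DataSetExtensions.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class MergeSketch
{
    // Assumed column names, taken from the sample headers in the question.
    const string X_NO = "X_NO", X_ITEM_CODE = "X_ITEM_CODE", X_ITEM_KEY = "X_ITEM_KEY";
    const string X_COST = "COST", X_DESC = "DESC", X_QTY = "QTY";

    static readonly string[] Dt2Columns = { X_ITEM_CODE, X_ITEM_KEY, X_COST, X_DESC, X_QTY };

    // Hypothetical helper: creates a table whose columns are all strings.
    static DataTable NewTable(params string[] columns)
    {
        var table = new DataTable();
        foreach (var name in columns) table.Columns.Add(name, typeof(string));
        return table;
    }

    public static void MergeInto(DataTable dt1, DataTable dt2)
    {
        // One pass over dt1 builds the index; X_NO is assumed unique and non-null.
        Dictionary<string, DataRow> byXNo =
            dt1.AsEnumerable().ToDictionary(r => r.Field<string>(X_NO));

        foreach (DataRow src in dt2.Rows)
        {
            string key = src.Field<string>(X_ITEM_KEY);
            string code = src.Field<string>(X_ITEM_CODE);

            // Try the key first, then the code, skipping null values.
            DataRow target = null;
            bool found = (key != null && byXNo.TryGetValue(key, out target))
                      || (code != null && byXNo.TryGetValue(code, out target));

            if (!found)
            {
                // No match on either column: append a new row (X_NO stays empty,
                // so the row is not added to the index).
                target = dt1.NewRow();
                dt1.Rows.Add(target);
            }

            // Copy the dt2 values into the matched or newly added dt1 row.
            foreach (var column in Dt2Columns) target[column] = src[column];
        }
    }

    public static DataTable RunSample()
    {
        var dt1 = NewTable(X_NO, X_ITEM_CODE, X_ITEM_KEY, X_COST, X_DESC, X_QTY);
        foreach (var no in new[] { "AA060210A", "AB060220A", "AC060230A", "AD060240A" })
            dt1.Rows.Add(no);   // Only X_NO is filled; the other columns stay empty.

        var dt2 = NewTable(Dt2Columns);
        dt2.Rows.Add("AA060210A", "AA060211A", "100.00", "PART1", "10000");
        dt2.Rows.Add("AB060221A", "AB060220A", "120.00", "PART2", "500");
        dt2.Rows.Add("AC060232A", "AC060230A", "150.00", "PART3", "100");
        dt2.Rows.Add("AD060240A", "AD060243A", "4.50", "PART4", "15250");

        MergeInto(dt1, dt2);
        return dt1;
    }

    static void Main()
    {
        foreach (DataRow row in RunSample().Rows)
            Console.WriteLine(string.Join(",", row.ItemArray));
    }
}
```

With the sample rows, every dt2 row matches a dt1 row through either its key or its code, so dt1 keeps four rows. Each dt2 row costs at most two dictionary lookups instead of a scan of dt1, which is where the speed-up over dt1.Select(...) would come from.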
