Find the one matching set in a number of sets

I have a table (@t1) that contains a number of sets. I want to find a perfect match for @t2 in @t1.

In this example, the desired result is 1.

(Set 1 matches perfectly, set 2 contains three elements while @t2 only contains two elements, set 3 contains less elements than @t2, set 4 contains NULL elements which are not allowed in @t2 and set 5 contains the right number of elements but one of the elements is not equal.)

DECLARE @t1 TABLE (id INT, data INT);
DECLARE @t2 TABLE (data INT PRIMARY KEY);

INSERT INTO @t1 (id, data)
VALUES
(1, 1),
(1, 2),
(2, 1),
(2, 2),
(2, 3),
(3, 1),
(4, NULL),
(4, NULL),
(5, 1),
(5, 3);

INSERT @t2 (data)
VALUES
(1),
(2);

I have a query that may be getting the job done, but it looks somehow wretched too me.

WITH t1 AS
(
    SELECT id, data
    FROM @t1
    WHERE data IS NOT NULL
),
t1_count AS
(
    SELECT id, RCount = COUNT(*)
    FROM @t1
    WHERE data IS NOT NULL
    GROUP BY id
)
SELECT t1.id
FROM t1
JOIN t1_count ON t1.id = t1_count.id
FULL JOIN @t2 t2 ON t1.data = t2.data
WHERE t1_count.RCount = (SELECT RCount = COUNT(*) FROM @t2)
GROUP BY t1.id
HAVING COUNT(t1.data) = COUNT(t2.data);

EDIT (GarethD's comment):

WITH t1 AS
(
    SELECT
        id,
        data,
        RCount = COUNT(*) OVER(PARTITION BY id)
    FROM @t1
    WHERE data IS NOT NULL
)
SELECT t1.id
FROM t1
FULL JOIN @t2 t2 ON t1.data = t2.data
WHERE t1.RCount = (SELECT RCount = COUNT(*) FROM @t2)
GROUP BY t1.id
HAVING COUNT(t1.data) = COUNT(t2.data);

What you want is called Exact Relational Division . Unfortunately, SQL Server has no native operator for this, but it is a well-documented problem. One possible solution (idea taken from an article by Joe Celko) is to compare counts, similar to what you are already doing:

SELECT t1.id
  FROM @t1 AS t1 LEFT JOIN @t2 AS t2 ON t1.data = t2.data
 GROUP BY t1.id
HAVING COUNT(t1.data) = (SELECT COUNT(data) FROM @t2)
   AND COUNT(t2.data) = (SELECT COUNT(data) FROM @t2);

Note that both HAVING comparisons are necessary:

  • The first one ensures that t1 has exactly the required number of rows and
  • the second one ensures that these rows only contain values from t2 (otherwise, t2.data would be NULL via the LEFT JOIN. Recall that COUNT(x) only counts non-null values of x).

  • One way to solve this is to concatenate the values for each ID and the values in the second table and compare them. You can apply ordering of the concatenation as well.

    For example, the following code will concatenate the values from the second table:

    DECLARE @Test VARCHAR(MAX) = (
    
                                    SELECT ',' + data
                                    FROM @t2 
                                    ORDER BY data
                                    FOR XML PATH(''), TYPE
                               ).value('.', 'VARCHAR(MAX)')
    
    SELECT @test -- 12
    

    and the following will do this for the first table:

    SELECT id
          ,(
    
                SELECT ',' + data
                FROM @t1 
                WHERE id = t1.id
                ORDER BY data
                FOR XML PATH(''), TYPE
           ).value('.', 'VARCHAR(MAX)')
    FROM @t1 t1
    GROUP BY id
    

    You can filter the values easily with where clause:

    SELECT *
    FROM
    (
        SELECT id
              ,(
    
                    SELECT ',' + data
                    FROM @t1 
                    WHERE id = t1.id
                    ORDER BY data
                    FOR XML PATH(''), TYPE
               ).value('.', 'VARCHAR(MAX)') data
        FROM @t1 t1
        GROUP BY id
    ) DS
    WHERE data = @test
    

    Concatenating values in T-SQL does not look nice , but you can apply an aggregation concatenation function using SQL CLR (check this article) and you will have something like this then:

    DECLARE @Test VARCHAR(MAX);
    
    SELECT @Test = [dbo].[Concatenate] (data) 
    FROM @t2 
    
    SELECT id
    FROM @t1 
    GROUP BY id
    HAVING [dbo].[Concatenate] (data) = @Test
    

    Maybe this one seems simpler for you?

    SELECT dat.id
    FROM (
        SELECT t1.id as id, t2.data as data2, sourcecount.cnt as scnt, dataCount.cnt as dcnt, COUNT(*) OVER(PARTITION BY t1.id) as mcnt
        FROM @t1 as t1
        INNER JOIN (SELECT t.id, COUNT(*) as cnt FROM @t1 as t  GROUP BY t.id) as sourceCount
                ON t1.id = sourceCount.id
        INNER JOIN @t2 as t2
                ON t1.data = t2.data
        CROSS JOIN (SELECT COUNT(*) as cnt FROM @t2) as dataCount
    ) as dat
    WHERE dat.scnt = dat.dcnt AND dat.mcnt = dat.dcnt
    GROUP BY dat.id
    

    Which produces by the way the same execution plan as yours, but is maybe a bit more readable.

    Best regards, Ionic

    链接地址: http://www.djcxy.com/p/85672.html

    上一篇: 如何使用Drive.API从Google Drive下载文件?

    下一篇: 找到多个组中的一个匹配组