Find the one matching set in a number of sets

2018-06-30 17:04:39

I have a table (@t1) that contains a number of sets. I want to find a perfect match for @t2 in @t1.

In this example, the desired result is 1.

(Set 1 matches perfectly, set 2 contains three elements while @t2 only contains two elements, set 3 contains less elements than @t2, set 4 contains NULL elements which are not allowed in @t2 and set 5 contains the right number of elements but one of the elements is not equal.)

DECLARE @t1 TABLE (id INT, data INT);
DECLARE @t2 TABLE (data INT PRIMARY KEY);

INSERT INTO @t1 (id, data)
VALUES
(1, 1),
(1, 2),
(2, 1),
(2, 2),
(2, 3),
(3, 1),
(4, NULL),
(4, NULL),
(5, 1),
(5, 3);

INSERT @t2 (data)
VALUES
(1),
(2);

I have a query that may be getting the job done, but it looks somehow wretched too me.

WITH t1 AS
(
    SELECT id, data
    FROM @t1
    WHERE data IS NOT NULL
),
t1_count AS
(
    SELECT id, RCount = COUNT(*)
    FROM @t1
    WHERE data IS NOT NULL
    GROUP BY id
)
SELECT t1.id
FROM t1
JOIN t1_count ON t1.id = t1_count.id
FULL JOIN @t2 t2 ON t1.data = t2.data
WHERE t1_count.RCount = (SELECT RCount = COUNT(*) FROM @t2)
GROUP BY t1.id
HAVING COUNT(t1.data) = COUNT(t2.data);

EDIT (GarethD's comment):

WITH t1 AS
(
    SELECT
        id,
        data,
        RCount = COUNT(*) OVER(PARTITION BY id)
    FROM @t1
    WHERE data IS NOT NULL
)
SELECT t1.id
FROM t1
FULL JOIN @t2 t2 ON t1.data = t2.data
WHERE t1.RCount = (SELECT RCount = COUNT(*) FROM @t2)
GROUP BY t1.id
HAVING COUNT(t1.data) = COUNT(t2.data);

What you want is called Exact Relational Division . Unfortunately, SQL Server has no native operator for this, but it is a well-documented problem. One possible solution (idea taken from an article by Joe Celko) is to compare counts, similar to what you are already doing:

SELECT t1.id
  FROM @t1 AS t1 LEFT JOIN @t2 AS t2 ON t1.data = t2.data
 GROUP BY t1.id
HAVING COUNT(t1.data) = (SELECT COUNT(data) FROM @t2)
   AND COUNT(t2.data) = (SELECT COUNT(data) FROM @t2);

Note that both HAVING comparisons are necessary:

The first one ensures that t1 has exactly the required number of rows and

the second one ensures that these rows only contain values from t2 (otherwise, t2.data would be NULL via the LEFT JOIN. Recall that COUNT(x) only counts non-null values of x).

One way to solve this is to concatenate the values for each ID and the values in the second table and compare them. You can apply ordering of the concatenation as well.

For example, the following code will concatenate the values from the second table:

DECLARE @Test VARCHAR(MAX) = (

                                SELECT ',' + data
                                FROM @t2 
                                ORDER BY data
                                FOR XML PATH(''), TYPE
                           ).value('.', 'VARCHAR(MAX)')

SELECT @test -- 12

and the following will do this for the first table:

SELECT id
      ,(

            SELECT ',' + data
            FROM @t1 
            WHERE id = t1.id
            ORDER BY data
            FOR XML PATH(''), TYPE
       ).value('.', 'VARCHAR(MAX)')
FROM @t1 t1
GROUP BY id

You can filter the values easily with where clause:

SELECT *
FROM
(
    SELECT id
          ,(

                SELECT ',' + data
                FROM @t1 
                WHERE id = t1.id
                ORDER BY data
                FOR XML PATH(''), TYPE
           ).value('.', 'VARCHAR(MAX)') data
    FROM @t1 t1
    GROUP BY id
) DS
WHERE data = @test

Concatenating values in T-SQL does not look nice , but you can apply an aggregation concatenation function using SQL CLR (check this article) and you will have something like this then:

DECLARE @Test VARCHAR(MAX);

SELECT @Test = [dbo].[Concatenate] (data) 
FROM @t2 

SELECT id
FROM @t1 
GROUP BY id
HAVING [dbo].[Concatenate] (data) = @Test

Maybe this one seems simpler for you?

SELECT dat.id
FROM (
    SELECT t1.id as id, t2.data as data2, sourcecount.cnt as scnt, dataCount.cnt as dcnt, COUNT(*) OVER(PARTITION BY t1.id) as mcnt
    FROM @t1 as t1
    INNER JOIN (SELECT t.id, COUNT(*) as cnt FROM @t1 as t  GROUP BY t.id) as sourceCount
            ON t1.id = sourceCount.id
    INNER JOIN @t2 as t2
            ON t1.data = t2.data
    CROSS JOIN (SELECT COUNT(*) as cnt FROM @t2) as dataCount
) as dat
WHERE dat.scnt = dat.dcnt AND dat.mcnt = dat.dcnt
GROUP BY dat.id

Which produces by the way the same execution plan as yours, but is maybe a bit more readable.

Best regards, Ionic

链接地址: http://www.djcxy.com/p/85672.html

上一篇: 如何使用Drive.API从Google Drive下载文件？

下一篇: 找到多个组中的一个匹配组