Adding N dynamic columns in a SQL query

I have a table called datarecords which contains 7 fixed columns that are always required in the select query. A user can add as many custom columns as they want. I am storing this information in a table called datacolumn, and the values are stored in another table called datavalue.

Now I want to create a query which brings the 7 fixed columns from datarecord, then adds the custom columns and pulls the corresponding values from these tables, since each data record has a matching row in the datavalue table.


You can try to PIVOT the custom attributes from rows into columns, but you'll find that even with support for PIVOT in Microsoft SQL Server, you need to know the attributes in advance of writing the query, and the query code needs to specify all the attributes. There's no way in SQL to ask for all the custom attributes to magically fill as many columns as necessary.

You can retrieve an arbitrary number of custom attributes only by fetching them row by row, as they are stored in the database. Then write application code to loop over the results. If you want, you can write a class to map the multiple rows of custom attributes into fields of an object in your application.
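A minimal sketch of that application-side approach, using Python with an in-memory SQLite database (the table and column names here are assumptions based on the question, not a known schema):

```python
import sqlite3

# Hypothetical EAV schema modeled on the question's datarecord /
# datacolumn / datavalue tables; adjust names to your actual design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datarecord (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE datacolumn (id INTEGER PRIMARY KEY, col_name TEXT);
    CREATE TABLE datavalue  (record_id INTEGER, column_id INTEGER, value TEXT);
    INSERT INTO datarecord VALUES (1, 'first'), (2, 'second');
    INSERT INTO datacolumn VALUES (10, 'color'), (11, 'size');
    INSERT INTO datavalue VALUES (1, 10, 'red'), (1, 11, 'XL'), (2, 10, 'blue');
""")

# Fetch one row per (record, attribute) pair, exactly as stored.
rows = conn.execute("""
    SELECT r.id, r.name, c.col_name, v.value
    FROM datarecord r
    LEFT JOIN datavalue v  ON v.record_id = r.id
    LEFT JOIN datacolumn c ON c.id = v.column_id
    ORDER BY r.id, c.col_name
""")

# Fold the attribute rows into one dict per record in application code.
records = {}
for rec_id, name, col_name, value in rows:
    rec = records.setdefault(rec_id, {"id": rec_id, "name": name})
    if col_name is not None:
        rec[col_name] = value

print(records[1])  # {'id': 1, 'name': 'first', 'color': 'red', 'size': 'XL'}
```

The same fold works regardless of how many custom columns a user defines, which is exactly what a single SQL result set cannot give you.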

It's awkward and inelegant to query non-relational data using SQL. This is because SQL is designed to assume each logical entity of the same type has a fixed number of columns, and that you know the columns before you write the query. If your entity has variable attributes, it can't be stored as a relation, by definition.

Many people try to extend this using the design you're using, but they find it's hard to manage and doesn't scale well. This design is usually called the Entity-Attribute-Value model, or key-value pairs. For more details on the pitfalls of the EAV design, see my book SQL Antipatterns.

If you need to support custom attributes, here are a few alternatives:

  • Store all the custom attributes together in a BLOB, with some internal structure to delimit field names and values (Serialized LOB). You can optionally create inverted indexes to help you look up rows where a given field has a given value (see How FriendFeed Uses MySQL).

  • Use a document-oriented database such as MongoDB or Solr for the dynamic data.

  • Use ALTER TABLE to add conventional columns to the table when users need custom attributes. This means you either need to enforce the same set of custom attributes for all users, or else store all users' custom attributes and hope your table doesn't get too wide (Single Table Inheritance), or create a separate table per user, either for all columns (Concrete Table Inheritance) or for just the custom columns (Class Table Inheritance).
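To illustrate the first alternative, here is a small sketch of the Serialized LOB pattern in Python with SQLite, using JSON as the internal structure (the `record` table and `custom` column are illustrative names, not from the question):

```python
import json
import sqlite3

# Fixed columns stay relational; all custom attributes are packed
# into one serialized text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT, custom TEXT)")

custom = {"color": "red", "size": "XL"}
conn.execute("INSERT INTO record VALUES (1, 'first', ?)", (json.dumps(custom),))

# Reading back: deserialize in the application. Note the database cannot
# filter or index on these fields without extra work such as an inverted index.
name, blob = conn.execute("SELECT name, custom FROM record WHERE id = 1").fetchone()
attrs = json.loads(blob)
print(name, attrs["color"])  # first red
```

The trade-off is that queries like "find all records where color = red" require either scanning and deserializing every row or maintaining a separate index table.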


  • EDIT: See note at bottom for more detail.

I am facing the same problem, and I found a solution, but it is slow. Maybe someone else has a way to speed it up. In my code, I have a table with three columns: Col1, Col2, Col3. Col1 is my record ID, Col2 is the name of my dynamic column, and Col3 is the value in that column. So to represent a record with ID 1, two columns 2 and 3, and values 4 and 5 in those columns, I would have the following:

    Col1, Col2, Col3
    1, 2, 4
    1, 3, 5
    

Then we pivot over Col2 and select the MAX (or MIN or AVG; it doesn't matter, since the Col2/Col3 combinations are unique) of Col3 in the pivot. To accomplish the pivot with a variable number of columns, we generate the SQL dynamically. This works well for small input data (I believe because of the derived table inside the FROM clause of the dynamic SQL). Once your dataset gets large, the aggregate function starts taking a long time to execute. A very long time. It looks like this starts at around 1000 rows, so maybe there's a hint or another method that makes this faster.

    As a note, since the values for Col2 and Col3 map 1:1, I also tried dynamically generating a SELECT statement like the following:

    SELECT Col1,
       MAX(CASE WHEN Col2 = '4' THEN Col3 END) [4],
       MAX(CASE WHEN Col2 = '5' THEN Col3 END) [5],
       MAX(CASE WHEN Col2 = '6' THEN Col3 END) [6] -- ... these were dynamically generated
    FROM #example
    GROUP BY Col1
    

    This was just as slow for my dataset. Your mileage may vary. Here is a full example of how this works, written for SQL Server (2005+ should run this).

    --DROP TABLE #example
    CREATE TABLE #example
    (
        Col1 INT,
        Col2 INT,
        Col3 INT
    )
    
    INSERT INTO #example VALUES (2,4,10)
    INSERT INTO #example VALUES (2,5,20)
    INSERT INTO #example VALUES (2,6,30)
    INSERT INTO #example VALUES (2,7,40)
    INSERT INTO #example VALUES (2,8,50)
    INSERT INTO #example VALUES (3,4,11)
    INSERT INTO #example VALUES (3,5,22)
    INSERT INTO #example VALUES (3,6,33)
    INSERT INTO #example VALUES (3,7,44)
    INSERT INTO #example VALUES (3,8,55)
    
    DECLARE @columns VARCHAR(100)
    SET @columns = ''
    
    -- Build a comma-separated list of bracketed column names
    -- from the distinct Col2 values
    SELECT @columns = @columns + '[' + CAST(Col2 AS VARCHAR(10)) + '],'
    FROM (SELECT DISTINCT Col2 FROM #Example) a
    
    -- Strip the trailing comma
    SELECT @columns = SUBSTRING(@columns, 0, LEN(@columns) )
    
    DECLARE @dsql NVARCHAR(MAX)
    
    SET @dsql = '
    select Col1, ' + @columns + '
    from
        (select Col1, Col2, Col3 FROM #example e) a
    PIVOT
    (
        MAX(Col3)
        FOR Col2 IN (' + @columns + ')
    ) p'
    
    print @dsql
    EXEC sp_executesql @dsql
    

    EDIT: Because of the unique situation in which I am doing this, I managed to get my speed-up using two tables (one with the entities and another with the attribute-value pairs), and creating a clustered index on the attribute-value pairs which includes all columns (ID, Attribute, Value). I recommend you approach this another way if you need fast inserts, large numbers of columns, many data rows, etc. I have some known certainties about the size and growth rates of my data, and my solution is suited to that scope.

    There are many other solutions better suited to this problem. For example, if you need fast inserts and single-record reads (or if slow reads don't matter), you should consider packing an XML string into a field and serializing/deserializing it in the database consumer. If you need ultra-fast writes and ultra-fast reads, and data columns are added very rarely, then you may consider altering your table. This is a bad solution in most practice, but it may fit some problems. If your columns change frequently enough, but you also need fast reads and writes are not an issue, then my solution may work for you up to a certain dataset size.
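For the XML-packing option mentioned above, the serialize/deserialize step in the database consumer can be sketched like this in Python (the element names are illustrative; any schema that delimits field names and values works):

```python
import xml.etree.ElementTree as ET

# Serialize the custom columns to an XML string before writing it
# into a single text/XML field.
def pack(attrs):
    root = ET.Element("attrs")
    for key, value in attrs.items():
        ET.SubElement(root, "attr", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Parse the field back into a dict after reading the record.
def unpack(xml_text):
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.text for el in root.findall("attr")}

packed = pack({"color": "red", "size": "XL"})
print(unpack(packed))  # {'color': 'red', 'size': 'XL'}
```

Writes and single-record reads stay cheap because the round trip touches one field, but, as noted, cross-record queries on these attributes become the application's problem.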
