High performance Custom user fields

looking for examples/tutorial for custom user fields, not via EAV

EAV is going to be problematic for various reasons such as performance

  • there are many base entities/tables with over 100000 records each
  • there will likely be over a dozen attributes
  • the records are to be displayed in a flat ui grid incl. custom fields so flattening them would be an issue while maintaining performance
  • Looking at enabling this via DDL where all custom fields would go into a matching table such as

    <tablename>_custom_<userid>
    

    and all user attributes would map to a column each and all their metadata stored in a metadata table

    the retrieval would be simpler where the query would simply be

    select  * 
    from <tablename> A, tableName_custom_userid B 
    where B.KeyField = A.KeyField --( perhaps using outer join, haven't gone that far yet )
    

    Wondering if there are any gotchas down the road that i need to be aware of ?

    of course any samples/pointers would be helpful to kickstart the effort

    specifically would appreciate any advice on using DDL for Sql Server compact 4


    One technique I have seen used is to use a sort of 'hard-coded' EAV pattern. Don't hang up! It worked well with the dataset sizes you were talking about and didn't actually use EAV - it was only EAV-esque.

    The idea is to have a set of tables to store these custom attributes within it, with some triggers (described below) on them. The custom attributes tablesets store metadata about the attribute (what table it goes with, data type, constraints, etc). You can get very fancy with this but I did not haev the need.

    The triggers on your meta-tables are there to re-generate views that rollup base+extension into first class objects within the DB. So instead of table person + employee extension table, you have an employee view that includes both. When you drop a new value into the custom attributes tables, the triggers will re-roll the views and include the new stuff. If you wanted to go nuts, you could also have the triggers re-write stored procedures as well. Depending on how your mid-tier code is structured, you would still be forced to re-code some, however this would be the case anyway should you be applying rules that read the data.

    In testing, I found that for the relatively small # of records you're talking about, performance was somewhat slower but followed roughly the same pattern of degradation (2x the number of records, ~2x as slow).

    -- edits --

    How I saw it done, you had a table that represented your first class objects, so a row for 'person' and a row for 'employee,' etc. We'll call that FCO. Then you had a secondary table that stored what tables represented the FCO. We'll call that Srcs.. For person, there would be one row, which is the person table. For Employee, there would be two rows, the person table and the Employee extension. There is a third table, called Attribs, which stores the columns from the tables that constitute the FCO. For simplicity, we'll say Employee has ID, Name and Address, and Employee has Hire Date and Department, and obviously PersonID referring back to Person table. So, 2 rows in FCO table (person and employee), 3 rows in Src table, 8 rows in Attribs.

    The view, we'll call it vw_Employee, selects PersonID, Name, Address, Hire Date, Department from the two tables. It is built by a SQL stored procedure we'll call OnMetadataChange.

    This SP is fired (by trigger or batch process), and its purpose is to generate the CREATE VIEW statements. It will iterate through every First Class Object, collect which fields from which tables constitute the view, and will issue a CREATE statement based on that. So OnMetadataChange produces a DROP and CREATE for each view, it generates a dynamic SQL statement that is executed once per entry in FCO table. It is preferable to do this with Triggers but not necessary. Hopefully your FCO definitions won't change too often, and when they do, there will probably be a code release as well. You can run your OnMetadataChange SP at that time.

    The end result is a 2-layer database. The views constitute the First Class Object layer, which is meaningful to the application. The application only uses views. The tables constitute the 'physical' layer, which the application shouldn't care about. The meta-tables are essentially your mapping between the FCO layer and the physical layer. It takes some time to set it up, but it's quite effective, and gives you many of the benefits of EAV, while at the same time giving you the concrete benefits of 3nf tables (indexability, etc).

    If you'd like I can throw some sample SQL out there.


    Part of the problem you are having is that you are trying to store schema-less data in a SQL database, which is not its strength. There are three approaches that would make your life far easier:

    1) Have a column which stores the serialized custom fields, with whatever format is mst convenient. For example, this column could store xml. Upsides are that you can use SQL Server Compact and pulling back a record is trivial. Downsides are that you always have to pull/push the entire xml blob to do an update, and it is difficult to impossible to query on any custom fields.

    2) Upgrade to SQL Server Express, and use XML columns. This is nearly the same as the first suggestion, except that any server ready version of SQL Server has native support for XML data. These columns can have indexes added and fields within the data can be used in queries.

    3) Use a Schema-less Database, like MongoDB or CouchDB. These databases are all about storing schemaless data, so your custom fields will be no different than any other field. As such, you can index and query custom fields. Upsides are that custom data is incredibly easy to work with, downsides are that you would have to spend some time rethinking how you store data to fit within their model.

    If you do not need to query based on custom fields, or if you can query custom fields within business logic, then the first option can work for you. In any other case, I would err towards something with more capabilities than compact. If cost is the deciding factor, both SQL Server Express and MongoDB are free.

    链接地址: http://www.djcxy.com/p/68630.html

    上一篇: 如何从MySQL中的表中删除列

    下一篇: 高性能自定义用户字段