Strategy to optimize this large SQL insert via C#?
I have about 1.5 million files for which I need to insert records into the database. Each record will be inserted with a key that includes the name of the file.
The catch: The files are not uniquely identified currently.
So, what we'd like to do is, for each file: insert a record, let the database generate the record's ID, and then rename the file so that its name (and therefore the record's key) includes that new ID.

The best thing I can think to do is a per-file loop: run an individual INSERT that returns the new row's ID, then use that ID for the file.

As far as I can tell, that works out to roughly 3 million round trips to the database: an INSERT plus an identity read for each of the 1.5 million files.
I can't get around the actual file part, but for the rest, is there a better strategy I'm not seeing?
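Roughly, the loop I have in mind looks like this (dbo.Files, the column names, and the paths here are placeholders, not my actual schema):

    using System;
    using System.Data.SqlClient;
    using System.IO;

    var connectionString = "...";          // placeholder
    var sourceDir = @"C:\incoming";        // placeholder

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        foreach (var path in Directory.EnumerateFiles(sourceDir))
        {
            var name = Path.GetFileName(path);
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.Files (Name) VALUES (@name); SELECT SCOPE_IDENTITY();",
                conn))
            {
                cmd.Parameters.AddWithValue("@name", name);
                // Two statements per file: the INSERT plus the identity read.
                int id = Convert.ToInt32(cmd.ExecuteScalar());

                // Rename the file so its name carries the newly generated ID.
                File.Move(path, Path.Combine(sourceDir, id + "_" + name));
            }
        }
    }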
If you make the client application generate the IDs, you can use a straightforward SqlBulkCopy to insert all rows at once. It will be done in seconds.
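As a rough sketch, assuming a hypothetical dbo.Files(Id, Name) table and a block of IDs already reserved for the client, the bulk copy could look like this:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    var connectionString = "...";                 // placeholder
    var fileNames = new List<string>();           // the 1.5M names, gathered earlier

    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Name", typeof(string));

    int nextId = 1;                               // first ID of the reserved range
    foreach (var fileName in fileNames)
        table.Rows.Add(nextId++, fileName);

    // KeepIdentity tells the server to keep the Id values we supply.
    using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.KeepIdentity))
    {
        bulk.DestinationTableName = "dbo.Files";
        bulk.ColumnMappings.Add("Id", "Id");
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.BatchSize = 100000;                  // commit in chunks, not one giant batch
        bulk.WriteToServer(table);
    }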
If you want to keep the IDENTITY property of the column, you can run a DBCC CHECKIDENT(RESEED) to advance the identity counter by 1.5M, which gives you a guaranteed gap that you can insert into. If the number of rows is not known up front, you can perform the insert in smaller chunks of maybe 100k at a time until you are done.
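A sketch of reserving that gap, again assuming dbo.Files as the target (note that the read-then-reseed pair is not atomic, so it should run while nothing else is inserting):

    using System.Data.SqlClient;

    var connectionString = "...";  // placeholder

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        const int rowCount = 1500000;

        // Read the current identity value...
        int seed;
        using (var read = new SqlCommand(
            "SELECT CAST(IDENT_CURRENT('dbo.Files') AS int);", conn))
        {
            seed = (int)read.ExecuteScalar();
        }

        // ...then advance the counter past the range we want to own.
        using (var reseed = new SqlCommand(
            "DBCC CHECKIDENT('dbo.Files', RESEED, " + (seed + rowCount) + ");", conn))
        {
            reseed.ExecuteNonQuery();
        }

        // The client can now safely assign IDs seed+1 .. seed+rowCount itself.
    }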
You will cut the number of SQL statements in half by not relying on the database to generate the ID for each row. Do everything locally (including the assignment of an ID) and then do a single batch of inserts at the end, with IDENTITY_INSERT ON. That setting makes SQL Server use your IDs for this batch of records.
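For illustration, a sketch under the same dbo.Files(Id, Name) assumption; in practice you would send the batch in chunks of a few thousand statements rather than one giant command:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Text;

    var connectionString = "...";                          // placeholder
    var idsToNames = new Dictionary<int, string>();        // locally assigned ID -> name

    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        // IDENTITY_INSERT is per session, so it must share the inserts' connection.
        using (var cmdOn = new SqlCommand("SET IDENTITY_INSERT dbo.Files ON;", conn))
            cmdOn.ExecuteNonQuery();

        var sql = new StringBuilder();
        foreach (var kv in idsToNames)
            sql.AppendFormat("INSERT INTO dbo.Files (Id, Name) VALUES ({0}, '{1}');",
                             kv.Key, kv.Value.Replace("'", "''")); // naive escaping

        using (var cmdInsert = new SqlCommand(sql.ToString(), conn))
            cmdInsert.ExecuteNonQuery();

        using (var cmdOff = new SqlCommand("SET IDENTITY_INSERT dbo.Files OFF;", conn))
            cmdOff.ExecuteNonQuery();
    }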
If this is still too slow (and with 1.5 million rows it might be), the next step would be to output your data to a text file (XML, comma-delimited, or whatever) and then do a bulk-import operation on that file.
That's as fast as you will be able to make it, I think.
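A sketch of that route, assuming a share the SQL Server service account can read and the same placeholder table; KEEPIDENTITY makes BULK INSERT keep the IDs written into the file:

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;

    var connectionString = "...";                          // placeholder
    var dataFile = @"\\server\share\files.csv";            // placeholder
    var idsToNames = new Dictionary<int, string>();        // locally assigned ID -> name

    // Dump the rows to a comma-delimited file.
    using (var writer = new StreamWriter(dataFile))
    {
        foreach (var kv in idsToNames)
            writer.WriteLine("{0},{1}", kv.Key, kv.Value);
    }

    // Import the whole file in a single server-side operation.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var cmd = new SqlCommand(
            @"BULK INSERT dbo.Files FROM '\\server\share\files.csv'
              WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', KEEPIDENTITY);", conn))
        {
            cmd.CommandTimeout = 0; // large loads can exceed the 30-second default
            cmd.ExecuteNonQuery();
        }
    }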