Variation in speed of SQL inserts
The script I'm working on is designed to update a database table which records the country of use and the status of all IP addresses (or almost all of them). Currently I'm keeping it simple and am only fetching data from the 5 RIRs (Regional Internet Registries) and saving that to my database.
Initially the speeds were impractical, but they have been improved significantly by reducing the amount of information in the log and by grouping the SQL inserts into batches of 1000 rows executed as a single query. However, when running the script now I get very large variations in the speed of the SQL inserts, and I was wondering if anyone knew why.
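Roughly, the batching looks like this — a simplified sketch rather than my exact code, assuming PDO; the table name ip_data is a placeholder, and the columns are the ones listed further down:

```php
<?php
// Build one multi-row INSERT per batch of 1000 rows instead of one query per row.
// $pdo is an existing PDO connection; $rows is an array of 7-value rows matching
// the columns listed below (the table name "ip_data" is assumed for illustration).
$batchSize = 1000;

foreach (array_chunk($rows, $batchSize) as $batch) {
    $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?, ?, ?, ?, ?, ?)'));
    $sql = "INSERT INTO ip_data (registry, code, type, start, value, date, status)
            VALUES $placeholders";

    $pdo->prepare($sql)->execute(array_merge(...$batch)); // flatten rows into one parameter list
}
```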
Here are some of the speeds I've recorded. In the tests I separated out the time taken to execute the iterations of the script in PHP from the time taken to apply the SQL statements. I've not included the PHP times in the list below as their effect was negligible: no more than 1 second even for the largest blocks of data.
Test Speeds (number of data rows being inserted remains the same throughout)
Test 1 Total SQL executing time: 33 seconds
Test 2 Total SQL executing time: 72 seconds
Test 3 Total SQL executing time: 78 seconds
Other tests continued to fluctuate between ~30 seconds and ~80 seconds.
I have two questions:
1) Should I accept these disparities as the way of the world, or is there a reason for them?
2) I felt nervous about lumping the ~185000 row inserts into one query. Is there any reason I should avoid using one query for these inserts? I've not worked with this amount of data being saved at one time before.
Thank you
__
The database table is as follows.
Storage Engine - InnoDB
Columns:
id - int, primary key
registry - varchar(7)
code - varchar(2)
type - varchar(4)
start - varchar(15)
value - int
date - datetime
status - varchar(10)
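Expressed as DDL, the table looks roughly like this (a sketch: the table name ip_data and AUTO_INCREMENT on id are placeholders I've added here, the column types are as listed above):

```php
<?php
// Hypothetical CREATE TABLE matching the column list above; adjust the table
// name and key options to the real schema.
$pdo->exec("
    CREATE TABLE ip_data (
        id       INT AUTO_INCREMENT PRIMARY KEY,
        registry VARCHAR(7),
        code     VARCHAR(2),
        type     VARCHAR(4),
        start    VARCHAR(15),
        value    INT,
        date     DATETIME,
        status   VARCHAR(10)
    ) ENGINE=InnoDB
");
```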
1) Should I accept these disparities as the way of the world, or is there a reason for them?
Variations in speed may be due to competing processes using disk I/O, so your inserts end up waiting for resources. If this is a production server and not an isolated testing server, then almost certainly some other processes are requesting access to the disk at the same time.
2) I felt nervous about lumping the ~185000 row inserts into one query. Is there any reason I should avoid using one query for these inserts? I've not worked with this amount of data being saved at one time before.
You should also divide the inserts into groups of X inserts, and insert each group as a transaction.
Determining the value of X in any way other than experimentally is hard.
Grouping inserts into transactions ensures data is written (committed) to disk only after each transaction, not after each (auto-committed) insert.
This has a good effect on disk I/O, but if you group too many inserts into one transaction it can have a bad effect on available memory. If the amount of uncommitted data is too big for the available memory, the DBMS will start writing the data to an internal log (on disk).
So X depends on the number of inserts, the amount of data associated with each insert, the allowed memory/user/session parameters, and many other things.
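A minimal sketch of that transaction grouping, assuming PDO and a hypothetical ip_data table with the columns shown in the question; the group size of 5000 is only a starting point for X and should be tuned experimentally:

```php
<?php
// Wrap each group of X rows in an explicit transaction so InnoDB commits to disk
// once per COMMIT instead of once per auto-committed INSERT.
// $pdo is an existing PDO connection; $rows is an array of 7-value rows.
$rowsPerTransaction = 5000;   // "X" - tune experimentally
$rowsPerInsert      = 1000;   // rows per multi-row INSERT inside each transaction

foreach (array_chunk($rows, $rowsPerTransaction) as $group) {
    $pdo->beginTransaction();
    try {
        foreach (array_chunk($group, $rowsPerInsert) as $batch) {
            $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?, ?, ?, ?, ?, ?)'));
            $stmt = $pdo->prepare(
                "INSERT INTO ip_data (registry, code, type, start, value, date, status)
                 VALUES $placeholders"
            );
            $stmt->execute(array_merge(...$batch)); // flatten rows into one parameter list
        }
        $pdo->commit();      // data is committed to disk here, once per group
    } catch (Exception $e) {
        $pdo->rollBack();    // undo the whole group on failure
        throw $e;
    }
}
```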
There are some cool (free) tools from Percona that help you monitor DB activity.
You can also look at vmstat, e.g. run: watch -n .5 'vmstat'
This shows the amount and variation of data being written to disk by the normal activity of the production environment.
Start your script up and wait until you notice a step up in the number of bytes being written to disk. If the step up is a pretty much constant value (above normal production use) then the server is thrashing and swapping; if it is rhythmical then it is only writing for commits.