In this case MyISAM is dramatically faster than InnoDB in mysql
I have been writing the results from an algorithm that calculates distances between customers in a InnoDB table. For example if my customers were A, B, C and D, the table in the database looks like this, among other columns:
From | To | Distance
A B 344
A C 274
A D 182
B C 338
And so on... It is a lot of rows I think I will hit 50 million.
The other columns are product_type and value. Those tell me how much the customer B (customer_to in the columns) buys of that product_type. That means that I have each pair multiple times depending on how many product_types the customer B buys.
I needed to to a query to group each customer with the products his neighbors buys and the value. The query looks like this:
select customer_from, product_type, avg(value) as opportunity
from customer_distances
where distance < 500
group by customer_from, product_type
order by opportunity desc;
The innodb table could not answer me that query. Despite I changed the net_read_timeout to 28800, the mysql connection was lost during the query.
I tough it had something to do with innodb build for transactional processing and not for intensive queries. So I created a new table with MyIsam as engine and insert-select all the records from the innodb table.
As expected, the select was very fast (70 segs) and all other selects like count( distinct customer_from), where almost instantaneous.
Just for curiosity I tried to continue the process of inserting the distances in the myisam table. It was a surprise for me when the program started to run at least 100 times faster than when it was working on the innodb table -for INSERTS!
For each customer the program inserts something like 3000 rows (one for each neighbor for each product_type. Something like 300 neighbors and 10 product_types per customer). With the innodb table inserting a single customer took something between 40 and 60 seconds (aprox. 3000 rows). With the myisam table, it takes 1 second to insert 3 customers (9000 rows aprox).
Some extra information:
So in summary the question is: Why is MyISAM that fast with insert statements? What do you think?
EDIT 1: I'm adding the create statements for both tables, the innodb and myisam. EDIT 2: I deleted some unuseful information and formated a little bit here and there.
/* INNODB TABLE */
CREATE TABLE `customer_distances` (
`customer_from` varchar(50) NOT NULL,
`customer_from_type` varchar(50) DEFAULT NULL,
`customer_from_segment` varchar(50) DEFAULT NULL,
`customer_from_district` int(11) DEFAULT NULL,
`customer_from_zone` int(11) DEFAULT NULL,
`customer_from_longitud` decimal(15,6) DEFAULT NULL,
`customer_from_latitud` decimal(15,6) DEFAULT NULL,
`customer_to` varchar(50) NOT NULL,
`customer_to_type` varchar(50) DEFAULT NULL,
`customer_to_segment` varchar(50) DEFAULT NULL,
`customer_to_district` int(11) DEFAULT NULL,
`customer_to_zone` int(11) DEFAULT NULL,
`customer_to_longitud` decimal(15,6) DEFAULT NULL,
`customer_to_latitud` decimal(15,6) DEFAULT NULL,
`distance` decimal(10,2) DEFAULT NULL,
`product_business_line` varchar(50) DEFAULT NULL,
`product_type` varchar(50) NOT NULL,
`customer_from_liters` decimal(10,2) DEFAULT NULL,
`customer_from_dollars` decimal(10,2) DEFAULT NULL,
`customer_from_units` decimal(10,2) DEFAULT NULL,
`customer_to_liters` decimal(10,2) DEFAULT NULL,
`customer_to_dollars` decimal(10,2) DEFAULT NULL,
`customer_to_units` decimal(10,2) DEFAULT NULL,
`liters_opportunity` decimal(10,2) DEFAULT NULL,
`dollars_opportunity` decimal(10,2) DEFAULT NULL,
`units_oportunity` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/* MYISAM TABLE */
CREATE TABLE `customer_distances` (
`customer_from` varchar(50) NOT NULL,
`customer_from_type` varchar(50) DEFAULT NULL,
`customer_from_segment` varchar(50) DEFAULT NULL,
`customer_from_district` int(11) DEFAULT NULL,
`customer_from_zone` int(11) DEFAULT NULL,
`customer_from_longitud` decimal(15,6) DEFAULT NULL,
`customer_from_latitud` decimal(15,6) DEFAULT NULL,
`customer_to` varchar(50) NOT NULL,
`customer_to_type` varchar(50) DEFAULT NULL,
`customer_to_segment` varchar(50) DEFAULT NULL,
`customer_to_district` int(11) DEFAULT NULL,
`customer_to_zone` int(11) DEFAULT NULL,
`customer_to_longitud` decimal(15,6) DEFAULT NULL,
`customer_to_latitud` decimal(15,6) DEFAULT NULL,
`distance` decimal(10,2) DEFAULT NULL,
`product_business_line` varchar(50) DEFAULT NULL,
`product_type` varchar(50) NOT NULL,
`customer_from_liters` decimal(10,2) DEFAULT NULL,
`customer_from_dollars` decimal(10,2) DEFAULT NULL,
`customer_from_units` decimal(10,2) DEFAULT NULL,
`customer_to_liters` decimal(10,2) DEFAULT NULL,
`customer_to_dollars` decimal(10,2) DEFAULT NULL,
`customer_to_units` decimal(10,2) DEFAULT NULL,
`liters_opportunity` decimal(10,2) DEFAULT NULL,
`dollars_opportunity` decimal(10,2) DEFAULT NULL,
`units_oportunity` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Inserts
INSERT
immediately. This can be remedied by clumping 100-1000 rows at a time. autocommit
and BEGIN..COMMIT
. Select
Tuning
key_buffer_size
to 20% of RAM and innodb_buffer_pool_size=0
. key_buffer_size
to only 10M and innodb_buffer_pool_size
to 70% of RAM. Normalization and saving space
DECIMAL(10,2)
is not the best in most cases. Consider FLOAT
for non-money (such as distance
). Consider fewer digits; that handles up to 99,999,999.99, and takes 5 bytes. customer_from
and customer_to
. Have a Customers
table, with both in it. latidud DECIMAL(6,4)
and longitud (7,4)
, for a total of 7 bytes. (These give 16m/52ft resolution.) Result
After those suggestions, the 50M-row table will be very much smaller, and run very much faster in both engines. Then run the comparison again.
链接地址: http://www.djcxy.com/p/52238.html上一篇: Google提交时自定义搜索