In this case MyISAM is dramatically faster than InnoDB in mysql

I have been writing the results from an algorithm that calculates distances between customers in a InnoDB table. For example if my customers were A, B, C and D, the table in the database looks like this, among other columns:

From | To    | Distance
  A     B        344
  A     C        274
  A     D        182
  B     C        338

And so on... It is a lot of rows I think I will hit 50 million.

The other columns are product_type and value. Those tell me how much the customer B (customer_to in the columns) buys of that product_type. That means that I have each pair multiple times depending on how many product_types the customer B buys.

I needed to to a query to group each customer with the products his neighbors buys and the value. The query looks like this:

select customer_from, product_type, avg(value) as opportunity
from customer_distances
where distance < 500
group by customer_from, product_type
order by opportunity desc; 

The innodb table could not answer me that query. Despite I changed the net_read_timeout to 28800, the mysql connection was lost during the query.

I tough it had something to do with innodb build for transactional processing and not for intensive queries. So I created a new table with MyIsam as engine and insert-select all the records from the innodb table.

As expected, the select was very fast (70 segs) and all other selects like count( distinct customer_from), where almost instantaneous.

Just for curiosity I tried to continue the process of inserting the distances in the myisam table. It was a surprise for me when the program started to run at least 100 times faster than when it was working on the innodb table -for INSERTS!

For each customer the program inserts something like 3000 rows (one for each neighbor for each product_type. Something like 300 neighbors and 10 product_types per customer). With the innodb table inserting a single customer took something between 40 and 60 seconds (aprox. 3000 rows). With the myisam table, it takes 1 second to insert 3 customers (9000 rows aprox).

Some extra information:

  • The mysql database is in my PC (localhost).
  • The program written in java and is running from my pc.
  • I'm using prepared statements and I only change the data between each row and the next. This is related to this question Why is myisam storage engine is faster than Innodb storage engine
  • So in summary the question is: Why is MyISAM that fast with insert statements? What do you think?

    EDIT 1: I'm adding the create statements for both tables, the innodb and myisam. EDIT 2: I deleted some unuseful information and formated a little bit here and there.

    /* INNODB TABLE */
    CREATE TABLE `customer_distances` (
      `customer_from` varchar(50) NOT NULL,
      `customer_from_type` varchar(50) DEFAULT NULL,
      `customer_from_segment` varchar(50) DEFAULT NULL,
      `customer_from_district` int(11) DEFAULT NULL,
      `customer_from_zone` int(11) DEFAULT NULL,
      `customer_from_longitud` decimal(15,6) DEFAULT NULL,
      `customer_from_latitud` decimal(15,6) DEFAULT NULL,
      `customer_to` varchar(50) NOT NULL,
      `customer_to_type` varchar(50) DEFAULT NULL,
      `customer_to_segment` varchar(50) DEFAULT NULL,
      `customer_to_district` int(11) DEFAULT NULL,
      `customer_to_zone` int(11) DEFAULT NULL,
      `customer_to_longitud` decimal(15,6) DEFAULT NULL,
      `customer_to_latitud` decimal(15,6) DEFAULT NULL,
      `distance` decimal(10,2) DEFAULT NULL,
      `product_business_line` varchar(50) DEFAULT NULL,
      `product_type` varchar(50) NOT NULL,
      `customer_from_liters` decimal(10,2) DEFAULT NULL,
      `customer_from_dollars` decimal(10,2) DEFAULT NULL,
      `customer_from_units` decimal(10,2) DEFAULT NULL,
      `customer_to_liters` decimal(10,2) DEFAULT NULL,
      `customer_to_dollars` decimal(10,2) DEFAULT NULL,
      `customer_to_units` decimal(10,2) DEFAULT NULL,
      `liters_opportunity` decimal(10,2) DEFAULT NULL,
      `dollars_opportunity` decimal(10,2) DEFAULT NULL,
      `units_oportunity` decimal(10,2) DEFAULT NULL,
      PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    /* MYISAM TABLE */
    CREATE TABLE `customer_distances` (
      `customer_from` varchar(50) NOT NULL,
      `customer_from_type` varchar(50) DEFAULT NULL,
      `customer_from_segment` varchar(50) DEFAULT NULL,
      `customer_from_district` int(11) DEFAULT NULL,
      `customer_from_zone` int(11) DEFAULT NULL,
      `customer_from_longitud` decimal(15,6) DEFAULT NULL,
      `customer_from_latitud` decimal(15,6) DEFAULT NULL,
      `customer_to` varchar(50) NOT NULL,
      `customer_to_type` varchar(50) DEFAULT NULL,
      `customer_to_segment` varchar(50) DEFAULT NULL,
      `customer_to_district` int(11) DEFAULT NULL,
      `customer_to_zone` int(11) DEFAULT NULL,
      `customer_to_longitud` decimal(15,6) DEFAULT NULL,
      `customer_to_latitud` decimal(15,6) DEFAULT NULL,
      `distance` decimal(10,2) DEFAULT NULL,
      `product_business_line` varchar(50) DEFAULT NULL,
      `product_type` varchar(50) NOT NULL,
      `customer_from_liters` decimal(10,2) DEFAULT NULL,
      `customer_from_dollars` decimal(10,2) DEFAULT NULL,
      `customer_from_units` decimal(10,2) DEFAULT NULL,
      `customer_to_liters` decimal(10,2) DEFAULT NULL,
      `customer_to_dollars` decimal(10,2) DEFAULT NULL,
      `customer_to_units` decimal(10,2) DEFAULT NULL,
      `liters_opportunity` decimal(10,2) DEFAULT NULL,
      `dollars_opportunity` decimal(10,2) DEFAULT NULL,
      `units_oportunity` decimal(10,2) DEFAULT NULL,
      PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
    

    Inserts

  • InnoDB, by default, "commits" each INSERT immediately. This can be remedied by clumping 100-1000 rows at a time.
  • Batching inserts will speed up both MyISAM and InnoDB - perhaps by 10x.
  • Learn about autocommit and BEGIN..COMMIT .
  • Select

  • InnoDB consumes more disk space than MyISAM -- typically 2x-3x; this impacts table scans, which you are probably
  • For that query, a composite index on (customer_from, product_type, distance) would probably help both engines.
  • Tuning

  • When running just MyISAM, set key_buffer_size to 20% of RAM and innodb_buffer_pool_size=0 .
  • When running just InnoDB, set key_buffer_size to only 10M and innodb_buffer_pool_size to 70% of RAM.
  • Normalization and saving space

  • Smaller --> more cacheable --> less I/O --> faster (in either engine)
  • DECIMAL(10,2) is not the best in most cases. Consider FLOAT for non-money (such as distance ). Consider fewer digits; that handles up to 99,999,999.99, and takes 5 bytes.
  • It is usually not a good idea to have replicated columns, such as the 10 columns of customer_from and customer_to . Have a Customers table, with both in it.
  • Each of your latitud and longitud are 7 bytes and have unnecessary resolution. Suggest latidud DECIMAL(6,4) and longitud (7,4) , for a total of 7 bytes. (These give 16m/52ft resolution.)
  • Result

    After those suggestions, the 50M-row table will be very much smaller, and run very much faster in both engines. Then run the comparison again.

    链接地址: http://www.djcxy.com/p/52238.html

    上一篇: Google提交时自定义搜索

    下一篇: 在这种情况下,MyISAM比MySQL中的InnoDB快得多