slow SQLite read speed (100 records a second)

I have a large SQLite database (~134 GB) that has multiple tables each with 14 columns, about 330 million records, and 4 indexes. The only operation used on the database is "Select *" as I need all the columns(No inserts or updates). When I query the database, the response time is slow when the result set is big (takes 160 seconds for getting ~18,000 records).

I have improved the use of indexes multiple times and this is the fastest response time I got.

I am running the database as a back-end database for a web application on a server with 32 GB of RAM.

is there a way to use RAM (or anything else) to speed up the query process?

Here is the code that performs the query.

async.each(proteins,function(item, callback) {

   `PI[item] = [];                      // Stores interaction proteins for all query proteins
    PS[item] = [];                      // Stores scores for all interaction proteins
    PIS[item] = [];                     // Stores interaction sites for all interaction proteins
    var sites = {};                     // a temporarily holder for interaction sites

var query_string = 'SELECT * FROM ' + organism + PIPE_output_table +
        ' WHERE ' + score_type + ' > ' + cutoff['range'] + ' AND (protein_A = "' + item + '" OR protein_B = "' + item '") ORDER BY PIPE_score DESC';

db.each(query_string, function (err, row) {

        if (row.protein_A == item) {
            PI[item].push(row.protein_B);

            // add 1 to interaction sites to represent sites starting from 1 not from 0
            sites['S1AS'] = row.site1_A_start + 1;
            sites['S1AE'] = row.site1_A_end + 1;
            sites['S1BS'] = row.site1_B_start + 1;
            sites['S1BE'] = row.site1_B_end + 1;

            sites['S2AS'] = row.site2_A_start + 1;
            sites['S2AE'] = row.site2_A_end + 1;
            sites['S2BS'] = row.site2_B_start + 1;
            sites['S2BE'] = row.site2_B_end + 1;

            sites['S3AS'] = row.site3_A_start + 1;
            sites['S3AE'] = row.site3_A_end + 1;
            sites['S3BS'] = row.site3_B_start + 1;
            sites['S3BE'] = row.site3_B_end + 1;

            PIS[item].push(sites);
            sites = {};
        }
}

The query you posted uses no variables.

It will always return the same thing: all the rows with a null score whose protein column is equal to its protein_a or protein_b column. You're then having to filter all those extra rows in Javascript, fetching a lot more rows than you need to.

Here's why...


If I'm understanding this query correctly, you have WHERE Score > [Score] . I've never encountered this syntax before, so I looked it up.

[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.

An identifier is something like a column or table name, not a variable.

This means that this...

SELECT * FROM [TABLE]
WHERE Score > [Score] AND
      (protein_A = [Protein] OR protein_B = [Protein])
ORDER BY [Score] DESC;

Is the same as this...

SELECT * FROM `TABLE`
WHERE Score > Score AND
      (protein_A = Protein OR protein_B = Protein)
ORDER BY Score DESC;

You never pass any variables to the query. It will always return the same thing.

This can be seen here when you run it.

db.each(query_string, function (err, row) {

Since you're checking that each protein is equal to itself (or something very like itself), you're likely fetching every row. And it's why you have to filter all the rows again. And that is one of the reasons why your query is so slow.

    if (row.protein_A == item) {

BUT! WHERE Score > [Score] will never be true, a thing cannot be greater than itself except for null ! Trinary logic is weird. So only if Score is null can that be true.

So you're returning all the rows whose score is null and the protein column is equal to protein_a or protein_b . This is a lot more rows than you need, I guess you have a lot of rows with null scores.


Your query should incorporate variables (I'm assuming you're using node-sqlite3) and pass in their values when you execute the query.

var query = "                                              
    SELECT * FROM `TABLE`                                  
    WHERE Score > $score AND                               
          (protein_A = $protein OR protein_B = $protein)   
    ORDER BY Score DESC;                                   
";
var stmt = db.prepare(query);
stmt.each({$score: score, $protein: protein}, function (err, row) {
    PI[item].push(row.protein_B);
    ...
});
链接地址: http://www.djcxy.com/p/19822.html

上一篇: SQLite插入包装在一个事务中:没有速度改进

下一篇: 缓慢的SQLite读取速度(每秒100条记录)