Slow SQLite read speed (100 records a second)

2018-06-06 09:01:42

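I have a large SQLite database (~134 GB) with multiple tables, each with 14 columns, about 330 million records, and 4 indexes. The only operation used on the database is "Select *", because I need all the columns (no inserts or updates). When I query the database, the response time is slow if the result set is large (fetching about 18,000 records takes 160 seconds).

I have improved the use of the indexes several times, and this is the fastest response time I have been able to get.

I am running the database as the back-end database of a web application, on a server with 32 GB of RAM.

Is there a way to use RAM (or anything else) to speed up the query process?

Here is the code that performs the query.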

async.each(proteins,function(item, callback) {

    PI[item] = [];                      // Stores interaction proteins for all query proteins
    PS[item] = [];                      // Stores scores for all interaction proteins
    PIS[item] = [];                     // Stores interaction sites for all interaction proteins
    var sites = {};                     // a temporary holder for interaction sites

    var query_string = 'SELECT * FROM ' + organism + PIPE_output_table +
        ' WHERE ' + score_type + ' > ' + cutoff['range'] + ' AND (protein_A = "' + item + '" OR protein_B = "' + item + '") ORDER BY PIPE_score DESC';

    db.each(query_string, function (err, row) {

        if (row.protein_A == item) {
            PI[item].push(row.protein_B);

            // add 1 to interaction sites to represent sites starting from 1 not from 0
            sites['S1AS'] = row.site1_A_start + 1;
            sites['S1AE'] = row.site1_A_end + 1;
            sites['S1BS'] = row.site1_B_start + 1;
            sites['S1BE'] = row.site1_B_end + 1;

            sites['S2AS'] = row.site2_A_start + 1;
            sites['S2AE'] = row.site2_A_end + 1;
            sites['S2BS'] = row.site2_B_start + 1;
            sites['S2BE'] = row.site2_B_end + 1;

            sites['S3AS'] = row.site3_A_start + 1;
            sites['S3AE'] = row.site3_A_end + 1;
            sites['S3BS'] = row.site3_B_start + 1;
            sites['S3BE'] = row.site3_B_end + 1;

            PIS[item].push(sites);
            sites = {};
        }
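    }, function (err) {
        // Completion callback of db.each: tell async.each this protein is done
        callback(err);
    });
});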

The query you posted does not use any variables.

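It will always return the same result, whatever protein or cutoff you have in mind, because every comparison is made between columns of the same row rather than against your values. Whatever narrowing you wanted then has to happen in JavaScript instead of in the database.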

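Here's why...

If I understand the query correctly, you have WHERE Score > [Score]. I had never come across this syntax before, so I looked it up.

"[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility."

An identifier is something like a column or table name, not a variable.

This means that this...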

SELECT * FROM [TABLE]
WHERE Score > [Score] AND
      (protein_A = [Protein] OR protein_B = [Protein])
ORDER BY [Score] DESC;

Is the same as this...

SELECT * FROM `TABLE`
WHERE Score > Score AND
      (protein_A = Protein OR protein_B = Protein)
ORDER BY Score DESC;

You never pass any variables to the query. It will always return the same thing.

You can see it here, where you run it.

db.each(query_string, function (err, row) {

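Since you are checking whether each protein is equal to itself (or to something very much like itself), you are probably fetching every row. That is why you have to filter all the rows again, and it is one of the reasons your query is so slow.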

    if (row.protein_A == item) {

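But! WHERE Score > [Score] can never be true, because a value cannot be greater than itself. NULL is no exception: under SQL's three-valued logic, NULL > NULL evaluates to NULL rather than true, and WHERE keeps only the rows for which the condition is true.

So, as written, the score condition rejects every row instead of applying your cutoff, and the protein conditions compare columns with each other instead of with the protein you are looking for. Neither does the filtering you intended.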
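To make the NULL case concrete, here is a small JavaScript model of how SQL evaluates `>` when either side may be NULL. It is only an illustrative sketch, not SQLite's implementation: `sqlGreaterThan` and `whereKeeps` are hypothetical names, the two rows are made up, and JavaScript's `null` stands in for SQL NULL.

```javascript
// Model of SQL's three-valued ">": the result is true, false, or null (unknown).
function sqlGreaterThan(a, b) {
    if (a === null || b === null) {
        return null;            // any comparison involving NULL is unknown
    }
    return a > b;
}

// WHERE keeps a row only when the condition is exactly true.
function whereKeeps(condition) {
    return condition === true;
}

// Hypothetical rows: one with a score, one with a NULL score.
var rows = [{ Score: 0.9 }, { Score: null }];

// "WHERE Score > Score" compares each row's column with itself.
var kept = rows.filter(function (row) {
    return whereKeeps(sqlGreaterThan(row.Score, row.Score));
});

console.log(kept.length); // 0
```

In this model neither row survives the filter: the first comparison is false and the second is unknown.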

Your query should contain variables (I'm assuming you are using node-sqlite3), and you pass in their values when you execute the query.

// A quoted JavaScript string cannot span lines, so the SQL is concatenated
var query =
    'SELECT * FROM `TABLE` ' +
    'WHERE Score > $score AND ' +
    '      (protein_A = $protein OR protein_B = $protein) ' +
    'ORDER BY Score DESC;';
var stmt = db.prepare(query);
stmt.each({$score: score, $protein: protein}, function (err, row) {
    PI[item].push(row.protein_B);
    ...
});
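One thing bound parameters cannot do is stand in for a table or column name, since SQLite only binds values. The code in the question builds the table name from `organism + PIPE_output_table` and picks the score column through `score_type`, so those pieces would still have to be spliced into the SQL text. A hedged sketch of one way to do that is to check them against a fixed list first. `ALLOWED_TABLES`, `ALLOWED_SCORE_COLUMNS`, `buildQuery`, and every sample name and value below are assumptions made up for illustration; the real table and column names are not shown in the question.

```javascript
// Hypothetical allow-lists; replace with the real table and column names.
var ALLOWED_TABLES = ['human_PIPE_output', 'yeast_PIPE_output'];
var ALLOWED_SCORE_COLUMNS = ['PIPE_score', 'SW_score'];

// Returns the SQL text plus the values to bind for one protein.
function buildQuery(table, scoreColumn, cutoff, protein) {
    if (ALLOWED_TABLES.indexOf(table) === -1) {
        throw new Error('Unknown table: ' + table);
    }
    if (ALLOWED_SCORE_COLUMNS.indexOf(scoreColumn) === -1) {
        throw new Error('Unknown score column: ' + scoreColumn);
    }
    // Identifiers are spliced in only after the checks above;
    // the cutoff and the protein stay as bound parameters.
    var sql =
        'SELECT * FROM "' + table + '" ' +
        'WHERE "' + scoreColumn + '" > $cutoff AND ' +
        '(protein_A = $protein OR protein_B = $protein) ' +
        'ORDER BY PIPE_score DESC';
    return { sql: sql, params: { $cutoff: cutoff, $protein: protein } };
}

var q = buildQuery('human_PIPE_output', 'PIPE_score', 0.5, 'P12345');
console.log(q.sql);
console.log(q.params);
```

With node-sqlite3 the result would then be run as `db.prepare(q.sql).each(q.params, rowCallback)`, in the same way as the prepared statement above.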
