Node.js: Count the number of lines in a file

2018-06-18 13:00:37

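I have large text files, ranging from 30 MB to 10 GB. How can I count the number of lines in a file using Node.js?

I have these constraints:

The whole file should not have to be loaded into memory

The task should not require a child process

A solution that does not use wc: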

var count = 0;
require('fs').createReadStream(process.argv[2])
  .on('data', function(chunk) {
    // chunk is a Buffer; 10 is the byte value of '\n'
    for (var i = 0; i < chunk.length; ++i)
      if (chunk[i] === 10) count++;
  })
  .on('end', function() {
    console.log(count);
  });

It is slower than wc, but not by as much as you might expect: 0.6 s for a 140 MB+ file, including the Node.js load and startup time.

>time node countlines.js video.mp4 
619643

real    0m0.614s
user    0m0.489s
sys     0m0.132s

>time wc -l video.mp4 
619643 video.mp4

real    0m0.133s
user    0m0.108s
sys     0m0.024s

>wc -c video.mp4
144681406  video.mp4

You could do this with wc, as the comments suggest:

var exec = require('child_process').exec;

// -l makes wc print only the line count, followed by the file name
exec('wc -l /path/to/file', function (error, results) {
    if (error) throw error;
    console.log(results);
});
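The raw wc output contains the file name as well as the count, so it usually needs a little parsing before it is useful as a number. The following is a minimal sketch, assuming a POSIX wc is available on the PATH. The names parseWcOutput and countLinesWithWc are hypothetical helpers, not part of the original answer, and execFile is used here instead of exec so that the path is not interpreted by a shell.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: pull the leading integer out of `wc -l` output,
// e.g. "619643 video.mp4\n" -> 619643.
function parseWcOutput(stdout) {
  const match = /^\s*(\d+)/.exec(stdout);
  if (!match) throw new Error('Unexpected wc output: ' + stdout);
  return Number(match[1]);
}

// Hypothetical wrapper: runs `wc -l <filePath>` without going through a shell
// and resolves with the line count as a number.
function countLinesWithWc(filePath) {
  return new Promise((resolve, reject) => {
    execFile('wc', ['-l', filePath], (error, stdout) => {
      if (error) return reject(error);
      try {
        resolve(parseWcOutput(stdout));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}
```

It could be called as countLinesWithWc('/path/to/file').then(console.log, console.error). Note that this still spawns a child process, so it does not meet the constraint stated in the question.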

We can use indexOf to let the VM find the newlines:

const fs = require('fs');

function countFileLines(filePath) {
  return new Promise((resolve, reject) => {
    let lineCount = 0;
    fs.createReadStream(filePath)
      .on("data", (buffer) => {
        let idx = -1;
        lineCount--; // The loop runs one extra time, when indexOf returns -1
        do {
          idx = buffer.indexOf(10, idx + 1); // 10 is the byte value of '\n'
          lineCount++;
        } while (idx !== -1);
      })
      .on("end", () => resolve(lineCount))
      .on("error", reject);
  });
}

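This solution uses .indexOf to find the position of the first newline. It increments lineCount, then finds the next position. The second parameter to .indexOf tells it where to start looking for the next newline, so we skip over large parts of the buffer instead of inspecting every byte ourselves. The while loop runs once for every newline, plus one final time when no further newline is found.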
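To make the "once per newline, plus one" behaviour concrete, here is a small sketch that runs the same do/while pattern on a single in-memory buffer. The function name countNewlines and the iterations counter are assumptions added purely for illustration.

```javascript
// Count newline bytes (decimal 10) in one buffer using the same do/while
// pattern as above. `iterations` is tracked only to show how often the
// loop body runs.
function countNewlines(buffer) {
  let idx = -1;
  let count = -1; // offsets the final iteration, where indexOf returns -1
  let iterations = 0;
  do {
    idx = buffer.indexOf(10, idx + 1);
    count++;
    iterations++;
  } while (idx !== -1);
  return { count, iterations };
}

const result = countNewlines(Buffer.from('a\nbb\nccc\n'));
console.log(result); // { count: 3, iterations: 4 }
```

The newlines sit at byte offsets 1, 4 and 8, so indexOf succeeds three times and the fourth call returns -1, which ends the loop.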

We are letting the Node runtime do the searching for us. That search is implemented at a lower level and should be faster.

On my system this is about twice as fast as running a for loop over the buffer length on a large file (111 MB).
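The comparison can be reproduced roughly with a sketch like the one below. It is only an approximation of the measurement described above: it uses a synthetic in-memory buffer rather than a real 111 MB file, and the helper names (countWithLoop, countWithIndexOf, time) are hypothetical. The ratio between the two approaches will likely depend on line length, Node.js version and hardware, so no particular result is implied.

```javascript
// Byte-by-byte scan, as in the first answer.
function countWithLoop(buffer) {
  let count = 0;
  for (let i = 0; i < buffer.length; ++i) {
    if (buffer[i] === 10) count++;
  }
  return count;
}

// indexOf-based scan, equivalent to the do/while version above.
function countWithIndexOf(buffer) {
  let count = 0;
  let idx = buffer.indexOf(10);
  while (idx !== -1) {
    count++;
    idx = buffer.indexOf(10, idx + 1);
  }
  return count;
}

// Hypothetical timing helper: returns the line count and elapsed milliseconds.
function time(fn, buffer) {
  const start = process.hrtime.bigint();
  const lines = fn(buffer);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { lines, ms };
}

// Synthetic 16 MiB buffer filled with a repeating 20-byte line (an assumption;
// real files have different line lengths and may give different ratios).
const sample = Buffer.alloc(16 * 1024 * 1024, 'nineteen characters\n');
const loopResult = time(countWithLoop, sample);
const indexOfResult = time(countWithIndexOf, sample);
console.log('for loop:', loopResult);
console.log('indexOf: ', indexOfResult);
```

Both functions should report the same number of lines; only the elapsed time is expected to differ.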
