Iterate through lines in a file with Node.js and CoffeeScript

2018-06-18 13:04:46

I'm using CoffeeScript with Node.js and the following function to iterate through the lines in a file:

each_line_in = (stream, func) ->
    fs.stat stream.path, (err, stats) ->
        previous = []
        stream.on 'data', (d) ->
            start = cur = 0
            for c in d
                cur++
                if c == 10
                    previous.push(d.slice(start, cur))
                    func previous.join('')
                    previous = []
                    start = cur
            previous.push(d.slice(start, cur)) if start != cur

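Is there a better way to do this without reading the entire file into memory? By "better" I mean more concise, built into Node.js, faster, or more correct. If I were writing Python, I would do this: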

def each_line_in(file_obj, func):
    [ func(l) for l in file_obj ]

I saw this question, which uses Peteris Krumins' "lazy" module, but I'd like to accomplish this without adding an external dependency.

This is a fairly efficient approach:

fs = require 'fs'

eachLineIn = (filePath, func) ->

  blockSize = 4096
  buffer = Buffer.alloc(blockSize)
  fd = fs.openSync filePath, 'r'
  lastLine = ''

  callback = (err, bytesRead) ->
    throw err if err
    # Decode this block before the buffer is reused by the next read.
    lines = buffer.toString('utf8', 0, bytesRead).split '\n'
    lines[0] = lastLine + lines[0]
    [completeLines..., lastLine] = lines
    func(line) for line in completeLines
    if bytesRead is blockSize
      fs.read fd, buffer, 0, blockSize, null, callback
    else
      # Short read: treat as end of file and emit any unterminated last line.
      func(lastLine) if lastLine
      fs.closeSync fd
    return

  fs.read fd, buffer, 0, blockSize, 0, callback
  return

You should benchmark on your own hardware and OS to find the best blockSize value for large files.
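A rough way to run such a benchmark is sketched below in plain JavaScript rather than CoffeeScript. The helper names countNewlines and benchmarkBlockSizes are made up for illustration, and a single timed pass like this is only indicative, since the OS file cache can skew repeated runs.

```javascript
const fs = require('fs');

// Count '\n' bytes by reading the file synchronously in blocks of blockSize.
// (Hypothetical helper, not part of the original answer.)
function countNewlines(filePath, blockSize) {
  const buffer = Buffer.alloc(blockSize);
  const fd = fs.openSync(filePath, 'r');
  let count = 0;
  try {
    let bytesRead;
    // readSync returns 0 once the end of the file is reached.
    while ((bytesRead = fs.readSync(fd, buffer, 0, blockSize, null)) > 0) {
      for (let i = 0; i < bytesRead; i++) {
        if (buffer[i] === 10) count++; // 10 is the byte value of '\n'
      }
    }
  } finally {
    fs.closeSync(fd);
  }
  return count;
}

// Time one pass over the file for each candidate block size.
function benchmarkBlockSizes(filePath, sizes) {
  return sizes.map((blockSize) => {
    const start = process.hrtime.bigint();
    const lines = countNewlines(filePath, blockSize);
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    return { blockSize, lines, elapsedMs };
  });
}
```

Calling benchmarkBlockSizes(somePath, [4096, 65536, 1048576]) returns one record per block size, which can be compared by elapsedMs.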

Note that this assumes the lines are delimited by \n only. If you're not sure what your files use, you should split with a regex instead, e.g.:

.split(/\r\n|\r|\n/)
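To make the line-ending handling concrete, here is a small JavaScript sketch (the function names are illustrative only). It also shows why the pattern is best written without a capturing group: split() inserts each capture, or undefined when the group did not take part in the match, into the result array.

```javascript
// Split on \r\n, a lone \r, or a lone \n. \r\n is listed first so that it is
// consumed as one separator instead of two.
const splitLines = (text) => text.split(/\r\n|\r|\n/);

// The same pattern with a capturing group around \r\n: split() now also
// returns the captured separators (or undefined) between the lines.
const splitWithGroup = (text) => text.split(/(\r\n)|\r|\n/);
```

One caveat when splitting block by block: a \r\n pair that straddles two blocks is seen as a \r at the end of one block and a \n at the start of the next, which yields a spurious empty line.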

Here is a concise version that uses a ReadStream, e.g. stream = fs.createReadStream(filepath)

for_each_line = (stream, func) ->
  last = ""
  stream.on('data', (chunk) ->
    lines = (last + chunk).split("\n")
    [lines..., last] = lines
    for line in lines
      func(line)
  )
  stream.on('end', () ->
    func(last)
  )

The createReadStream options let you set the buffer size and encoding as needed.
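As a sketch of what passing those options might look like, the JavaScript below mirrors the stream version but opens the stream itself. The name forEachLineInFile and the chunkSize parameter are assumptions made for this example; encoding and highWaterMark are the actual option names. Setting encoding makes the chunks arrive as strings, and the stream then takes care of multi-byte characters that fall across a chunk boundary.

```javascript
const fs = require('fs');

// Call func once per line of the file at filePath.
// Returns a promise that resolves after the last line has been delivered.
function forEachLineInFile(filePath, func, chunkSize = 64 * 1024) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(filePath, {
      encoding: 'utf8',         // chunks arrive as strings, not Buffers
      highWaterMark: chunkSize, // maximum number of bytes read per chunk
    });
    let last = '';
    stream.on('data', (chunk) => {
      const lines = (last + chunk).split('\n');
      last = lines.pop(); // possibly incomplete final piece
      for (const line of lines) func(line);
    });
    stream.on('end', () => {
      func(last); // empty string if the file ends with '\n'
      resolve();
    });
    stream.on('error', reject);
  });
}
```

Usage would be along the lines of forEachLineInFile('input.txt', console.log).then(() => console.log('done')), where input.txt is a placeholder path.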

This strips the '\n', but it can be added back if needed. It also handles the last line, although that line will be empty if the file ends with '\n'.
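Both adjustments can be sketched as follows, in JavaScript and independent of any stream so that the chunk handling is easy to follow. createLineSplitter and its keepNewline option are hypothetical names for this illustration, not part of Node.js.

```javascript
// Build a splitter: feed it string chunks with push(), then call end()
// once after the final chunk.
function createLineSplitter(onLine, { keepNewline = false } = {}) {
  let last = '';
  return {
    push(chunk) {
      const lines = (last + chunk).split('\n');
      last = lines.pop(); // carry the unfinished piece over to the next chunk
      for (const line of lines) onLine(keepNewline ? line + '\n' : line);
    },
    end() {
      // Skip the empty remainder that is left when the input ends with '\n'.
      if (last !== '') onLine(last);
      last = '';
    },
  };
}
```

With a stream, push would be wired to the 'data' event and end to the 'end' event.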

The three versions do not differ much in timing.
