Efficient way to iterate over list of files

I am searching for an efficient way to iterate over thousands of files in one or more directories.

The only way to iterate over files in a directory seems to be the File.list*() functions. These functions load the entire list of files into an array or collection and only then let the user iterate over it. This seems to be impractical in terms of time/memory consumption. I tried looking at commons-io and other similar tools, but they all ultimately call File.list*() somewhere inside. JDK7's walkFileTree() came close, but I don't have control over when to pick the next element.

I have over 150,000 files in a directory, and after many -Xms/-Xmx trial runs I got rid of the out-of-memory errors. But the time it takes to fill the array hasn't changed.

I wish to make some sort of an Iterable class that uses opendir()/closedir() like functions to lazily load file names as required. Is there a way to do this?

Update:

Java 7 NIO.2 supports file iteration via java.nio.file.DirectoryStream, which implements Iterable. As for JDK6 and below, the only option is the File.list*() methods.
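
A minimal sketch of the DirectoryStream approach mentioned in the update, assuming Java 7 or later. The class name LazyDirectoryListing and the method countEntries are hypothetical and only there for illustration; Files.newDirectoryStream and DirectoryStream are the actual JDK API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the JDK or of the original post.
final class LazyDirectoryListing {

    /** Visits the entries of dir one at a time and returns how many were seen. */
    static long countEntries(Path dir) throws IOException {
        long count = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // Process 'entry' here; no array of all names is built by this code.
                count++;
            }
        }
        return count;
    }
}
```

The caller decides when to ask for the next entry, which is the control that walkFileTree() does not give. The stream is not recursive, so subdirectories would have to be opened separately.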


Here is an example of how to iterate over directory entries without having to store 150,000 of them in an array. Add error/exception/shutdown/timeout handling as necessary. This technique uses a secondary thread to feed a small blocking queue.

Usage is:

FileWalker z = new FileWalker(new File(""), 1024); // start path, queue size
Iterator<Path> i = z.iterator();
while (i.hasNext()) {
  Path p = i.next();
}

The example:

import java.io.File;
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FileWalker implements Iterable<Path>, Iterator<Path> {
  final BlockingQueue<Path> bq;

  FileWalker(final File fileStart, final int size) throws InterruptedException {
    bq = new ArrayBlockingQueue<Path>(size);
    // Producer thread: walks the tree and blocks whenever the queue is full.
    Thread thread = new Thread(new Runnable() {
      public void run() {
        try {
          Files.walkFileTree(fileStart.toPath(), new FileVisitor<Path>() {
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
              return FileVisitResult.CONTINUE;
            }
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
              try {
                bq.put(file); // waits for free space in the queue
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop walking
                return FileVisitResult.TERMINATE;
              }
              return FileVisitResult.CONTINUE;
            }
            public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
              return FileVisitResult.CONTINUE; // skip unreadable entries
            }
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
              return FileVisitResult.CONTINUE;
            }
          });
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    });
    thread.setDaemon(true);
    thread.start();
    thread.join(200); // give the producer a short head start
  }

  // Single-use: the walker is its own iterator.
  public Iterator<Path> iterator() {
    return this;
  }

  // Heuristic: reports "no more files" if nothing arrives within 2 seconds.
  public boolean hasNext() {
    long dropDeadMS = System.currentTimeMillis() + 2000;
    while (System.currentTimeMillis() < dropDeadMS) {
      if (bq.peek() != null) {
        return true;
      }
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  // Blocks until a path is available; returns null if interrupted.
  public Path next() {
    try {
      return bq.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public void remove() {
    throw new UnsupportedOperationException();
  }
}

"This seems to be impractical in terms of time/memory consumption."

Even 150,000 files won't consume an impractical amount of memory.

"I wish to make some sort of an Iterable class that uses opendir()/closedir() like functions to lazily load file names as required. Is there a way to do this?"

You would need to write or find a native code library in order to access those C functions. It is probably going to introduce more problems than it solves. My advice would be to just use File.list() and increase the heap size.


Actually, there's another hacky alternative. Use Runtime.exec() (or ProcessBuilder) to run the ls command (or the Windows equivalent) and write your iterator to read and parse the command's output text. That avoids the nastiness associated with using native libraries from Java.
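
A rough sketch of that idea, assuming a Unix-like system with ls on the PATH and file names that contain no newlines. The class name LsIterator is hypothetical. Note that plain ls sorts its output, so the ls process itself still reads the whole directory before printing; only the JVM side is lazy here.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over the output of "ls -1"; not from the original post.
final class LsIterator implements Iterator<String>, Closeable {
    private final Process process;
    private final BufferedReader reader;
    private String nextLine; // look-ahead; null once the output is exhausted

    LsIterator(File dir) throws IOException {
        ProcessBuilder pb = new ProcessBuilder("ls", "-1"); // one name per line
        pb.directory(dir);                                  // list this directory
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // keep stderr out of the names
        process = pb.start();
        // Uses the platform default charset to decode file names.
        reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        nextLine = reader.readLine();
    }

    @Override
    public boolean hasNext() {
        return nextLine != null;
    }

    @Override
    public String next() {
        if (nextLine == null) {
            throw new NoSuchElementException();
        }
        String current = nextLine;
        try {
            nextLine = reader.readLine(); // read just one more line
        } catch (IOException e) {
            throw new IllegalStateException("failed to read ls output", e);
        }
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }

    @Override
    public void close() throws IOException {
        reader.close();
        process.destroy();
    }
}
```

Call close() when done, especially when abandoning the iteration early, so the child process does not linger.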


Can you group your loading by file type to narrow down the batch size?
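
One possible way to do that on Java 7 or later is the glob overload of Files.newDirectoryStream, sketched below. The class name GlobBatch, the method listByGlob, and the example pattern are hypothetical; whether batching by extension fits depends on how the files are named.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK or of the original post.
final class GlobBatch {

    /** Collects only the entries of dir whose file name matches the glob, e.g. "*.txt". */
    static List<Path> listByGlob(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<Path>();
        // Non-matching entries are filtered out during iteration and never stored.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

Each call still scans the whole directory, so this reduces the size of each batch held in memory rather than the total scan time.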
