
I have used recursion to search for a particular type of file (.pdf files in this example). My recursive algorithm visits every sub-folder, but I found that it lacks performance when there are too many sub-folders, sub-sub-folders, sub-sub-sub-folders, and so on. I want to know if there is a better algorithm for file searching.

Below is my recursive code for file searching. I have used .pdf files as an example:

import java.io.File;
public class FInd {
    public static void main(String[] args) {
        File f = new File("D:/");
        find(f);    
    }
    public static void find(File f){
        File[] list = f.listFiles();
        if (list == null) return;   // listFiles() returns null for folders that cannot be read
        for (int i = 0; i < list.length; i++) {
            // parentheses matter here: both the .pdf and .PDF checks should apply only to files
            if (list[i].isFile() && (list[i].getName().contains(".pdf")
                    || list[i].getName().contains(".PDF")))
                System.out.println(list[i].getAbsolutePath());
            if (list[i].isDirectory()) find(list[i]);
        }
    }
}

This code is about as fast as, or somewhat faster than, the search option in the file explorer. I want to know if there is any faster algorithm than this.

  • Recursion is not an algorithm, it's an implementation choice. It seems that you have a search space and you need to explore it to find the file. So, unless there is a name-wise relationship between folders, you need to explore the whole space. Commented Apr 5, 2017 at 16:32
  • stackoverflow.com/questions/4852531/… Commented Apr 5, 2017 at 16:32
  • Use Files.walkFileTree if you are using JDK 7 or later (a sketch of that API follows these comments) docs.oracle.com/javase/tutorial/essential/io/find.html Commented Apr 5, 2017 at 16:38
  • Possible duplicate of Better file search algorithm than creating a list of files Commented Apr 5, 2017 at 16:40
  • I don't think changing the search pattern (aka algorithm) will make a difference to the overall performance of scanning the entire directory hierarchy. Performance is entirely constrained by disk performance for actually reading the directory structure. There is no way to improve that, other than replacing the drive with an SSD. Commented Apr 5, 2017 at 16:41
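
For reference, a minimal sketch of the Files.walkFileTree approach mentioned in the comments above (the class name WalkFind is an assumption; the D:/ root and the .pdf filter are carried over from the question):

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class WalkFind {
    public static void main(String[] args) throws IOException {
        Files.walkFileTree(Paths.get("D:/"), new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // case-insensitive extension check, matching the .pdf/.PDF test in the question
                if (file.getFileName().toString().toLowerCase().endsWith(".pdf")) {
                    System.out.println(file.toAbsolutePath());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE;   // skip unreadable entries instead of aborting the walk
            }
        });
    }
}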

4 Answers


Try the iterative way:

import java.io.File;
import java.util.Stack;

public class Find {
    public static void main(String[] args) {

        File f = new File("D:/");

        Stack stack = new Stack<File>();

        stack.push(f);

        while (!stack.empty())
        {
            f = (File) stack.pop();
            File[] list = f.listFiles();
            if (list == null) continue;   // folder could not be read
            for (int i = 0; i < list.length; i++) {
                if (list[i].isFile() && (list[i].getName().contains(".pdf")
                        || list[i].getName().contains(".PDF")))
                    System.out.println(list[i].getAbsolutePath());
                if (list[i].isDirectory()) stack.push(list[i]);   // visit the sub-folder later instead of recursing
            }
        }
    }
}

2 Comments

Thank you. This was helpful. Just a small correction: f = stack.pop(); File []list = f2.listFiles(); should be replaced with f = (File) stack.pop(); File []list = f.listFiles();
I'm glad it was helpful. I will edit the answer (I have no idea why I put f2 instead of f :p).

The problem with threading is that launching threads has a cost, so the gain in file browsing speed has to outweigh the additional cost of spawning one thread per folder.

This is a simple method that uses a loop (the classical replacement for recursion):

static boolean avoidRecursion(String target){
    File currentDir = new File(System.getProperty("user.home"));
    Stack<File> dirs = new Stack<File>();
    dirs.push(currentDir);

    do{
        File[] entries = dirs.pop().listFiles();
        if (entries == null)          // folder could not be read
            continue;
        for(File f : entries){
            if (f.isDirectory())
                dirs.push(f);         // visit later instead of recursing into it
            else{
                if (f.getName().equals(target))
                    return true;
            }
        }
    }while(!dirs.isEmpty());
    return false;
}

Measure both approaches and choose the option that is faster.
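
For example, a rough way to time the two (assuming the recursive find from the question and the avoidRecursion method above are available in the same class; a proper benchmark harness such as JMH would give more reliable numbers):

// Not a rigorous benchmark: the first run is usually slower because the OS disk
// cache is cold, so run each approach a few times, and make sure both methods
// scan the same root folder for a fair comparison.
long start = System.nanoTime();
find(new File("D:/"));                     // recursive version from the question
long recursiveMs = (System.nanoTime() - start) / 1_000_000;

start = System.nanoTime();
avoidRecursion("example.pdf");             // iterative version above ("example.pdf" is a placeholder)
long iterativeMs = (System.nanoTime() - start) / 1_000_000;

System.out.println("recursive: " + recursiveMs + " ms, iterative: " + iterativeMs + " ms");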

1 Comment

This looks interesting. I will give it a try

Probably you could use multithreading...

For each folder you enter, you start a new thread... Even if you have more threads than your CPU has cores, it is not a problem, since Windows can run many more threads...
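
One sketch of that idea, using a bounded ForkJoinPool so each sub-folder becomes a small task rather than a raw thread (the class name ParallelFind, the pool size of 4, and the D:/ root are assumptions):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelFind extends RecursiveAction {
    private final File dir;

    ParallelFind(File dir) { this.dir = dir; }

    @Override
    protected void compute() {
        File[] list = dir.listFiles();
        if (list == null) return;                        // folder could not be read
        List<ParallelFind> subTasks = new ArrayList<ParallelFind>();
        for (File f : list) {
            if (f.isDirectory()) {
                ParallelFind task = new ParallelFind(f);
                task.fork();                             // hand the sub-folder to the pool
                subTasks.add(task);
            } else if (f.getName().toLowerCase().endsWith(".pdf")) {
                System.out.println(f.getAbsolutePath());
            }
        }
        for (ParallelFind task : subTasks) task.join();  // wait for the sub-folder tasks
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);         // bounded worker count, not one thread per folder
        pool.invoke(new ParallelFind(new File("D:/")));
    }
}

As the comments below point out, on a spinning disk this may well end up slower than the single-threaded scan, because the disk itself is the bottleneck.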

4 Comments

Unlikely to improve performance. More likely to reduce performance, since the performance bottleneck is reading from disk, and trying to read from two places on the disk at the same time will just kill performance. (disclaimer: assuming a non-SSD disk)
Also, you should be very careful and limit the number of threads. Creating thousands of threads is likely to hang up your entire OS.
@JaroslawPawlak you are right, but keep in mind that each thread stops after finishing a folder :)... Mostly, folders do not contain that many files, which is why the threads finish really quickly too...
But he could also create a specific number of threads and manage those...

Use the Files.walk() method, which returns a Java 8 Stream. You can parallelize that calculation quite easily by using a parallel stream.

Use the following convenient idiom in a try-with-resources statement:

try(Stream<Path> vals = Files.walk(rootPath)){ .... }

For rootPath, you could use Paths.get("root location") to actually get a Path to the root location.
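
Putting that together, a minimal sketch (the class name StreamFind is an assumption; the D:/ root and the .pdf filter are carried over from the question):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamFind {
    public static void main(String[] args) throws IOException {
        Path rootPath = Paths.get("D:/");              // root location from the question
        try (Stream<Path> vals = Files.walk(rootPath)) {
            vals.parallel()                            // parallel stream, as described above
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                .forEach(p -> System.out.println(p.toAbsolutePath()));
        }
        // note: Files.walk() reports unreadable folders by throwing an UncheckedIOException
        // while the stream is consumed, so you may want to catch that as well
    }
}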

Comments
