You need the Executor threadpool and some classes:
An Fsearch class. This holds your container for the results. It also has a factory method that returns an Ffolder, counting up a 'foldersOutstanding' counter as it does so, and an OnComplete method that counts the folders back in by counting down 'foldersOutstanding'.
An Ffolder class to represent a folder. It is passed its folder path and the Fsearch instance as ctor parameters, and it should have a run method that iterates that folder.
Create an Fsearch, load it up with the root folder and fire it into the pool. It creates an Ffolder, passing the root path and itself, and loads that onto the pool. Then it waits on a 'searchComplete' event.
That first Ffolder iterates its folder, creating (or depooling) a DFile for each 'ordinary' file and pushing it into the Fsearch container. If it finds a folder, it gets another Ffolder from the Fsearch, loads it with the new path and loads that onto the pool as well.
When an Ffolder has finished iterating its own folder, it calls the 'OnComplete' method of the Fsearch. OnComplete counts down 'foldersOutstanding' and, when it is decremented to zero, all the folders have been scanned and all the files processed. The thread that did this final decrement signals the searchComplete event so that the Fsearch can continue. The Fsearch could then call some 'OnSearchComplete' event that it was passed when it was created.
It goes almost without saying that the Fsearch callbacks must be thread-safe.
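For what it's worth, the scheme above could be sketched in Java roughly like this. It is only a sketch under assumptions: the class names follow the description, but submitFolder, search and the plain queue of Paths (standing in for DFiles) are names I made up for illustration, and here the calling thread waits for completion rather than an Fsearch task running on the pool.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative Fsearch: owns the results and the outstanding-folder count. */
class Fsearch {
    private final ExecutorService pool;
    private final AtomicInteger foldersOutstanding = new AtomicInteger();
    private final CountDownLatch searchComplete = new CountDownLatch(1);
    final Queue<Path> results = new ConcurrentLinkedQueue<>(); // thread-safe container

    Fsearch(ExecutorService pool) { this.pool = pool; }

    /** Factory: counts the folder out BEFORE it is queued on the pool. */
    void submitFolder(Path dir) {
        foldersOutstanding.incrementAndGet();
        try {
            pool.execute(new Ffolder(dir, this));
        } catch (RejectedExecutionException e) {
            onComplete(); // pool refused the task: count it straight back in
        }
    }

    /** Counts a folder back in; whoever makes the final decrement signals. */
    void onComplete() {
        if (foldersOutstanding.decrementAndGet() == 0) searchComplete.countDown();
    }

    /** Starts at root and blocks the caller until every folder is scanned. */
    Queue<Path> search(Path root) throws InterruptedException {
        submitFolder(root);
        searchComplete.await();
        return results;
    }
}

/** Illustrative Ffolder: scans exactly one folder on a pool thread. */
class Ffolder implements Runnable {
    private final Path dir;
    private final Fsearch search;

    Ffolder(Path dir, Fsearch search) { this.dir = dir; this.search = search; }

    @Override public void run() {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // Symlinks are not followed, so link cycles cannot loop forever.
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    search.submitFolder(entry);   // subfolder: another task
                } else {
                    search.results.add(entry);    // 'ordinary' file
                }
            }
        } catch (IOException | RuntimeException e) {
            // Unreadable folder: this sketch just skips it.
        } finally {
            search.onComplete(); // always count back in, even on failure
        }
    }
}
```

Usage would be something like `new Fsearch(Executors.newFixedThreadPool(8)).search(rootPath)`. The important detail is that a subfolder is counted out before its parent counts itself back in, so the counter cannot touch zero while work remains.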
Such an exercise does not have to be academic.
The container in the Fsearch, where all the DFiles go, could be a producer-consumer queue. Other threads could start processing the DFiles as the search is in progress, instead of waiting until the end.
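A rough Java sketch of that idea, assuming a single consumer thread and a sentinel object to mark the end of the search. ResultPipe, push, close and consume are hypothetical names for illustration, and Path again stands in for a DFile.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative producer-consumer container for search results. */
class ResultPipe {
    // Sentinel ("poison pill") telling the consumer that the search is over.
    // It is compared by identity, so it never clashes with a real result.
    private static final Path END = Paths.get("");
    private final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
    final AtomicLong processed = new AtomicLong();

    /** Called by the folder tasks for each file they find. */
    void push(Path file) { queue.add(file); }

    /** Called once, by whichever thread makes the final decrement. */
    void close() { queue.add(END); }

    /** Consumer loop: run on its own thread while the search is in progress. */
    void consume() throws InterruptedException {
        for (Path p = queue.take(); p != END; p = queue.take()) {
            processed.incrementAndGet(); // stand-in for real per-file work
        }
    }
}
```

With several consumer threads you would need one sentinel per consumer, or some other shutdown signal.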
I have done this before (but not in Java) - it works OK. A design like this can easily do multiple searches in parallel - it's fun to issue an Fsearch for several hard drive roots at once - the clattering noise is impressive.
Forgot to say - the big gain from such a design is when searching several networked drives with high latency. They can all be searched in parallel. The speedup over a miserable single-threaded sequential search is many times. By the time a single-thread search has finished queueing up the DFiles for one drive, the multi-search has searched four drives and already had most of its DFiles processed.
NOTE:
1) If implemented strictly as above, the threadpool thread that executes the Fsearch is blocked on the 'searchComplete' event until the search is over, so 'using up' one thread. There must therefore be more threadpool threads than live Fsearch instances, else there will be no threads left over to do the actual searching (yes, of course that happened to me:).
2) Unlike a single-thread search, results don't come back in any sort of predictable or repeatable order. If, for example, you signal your results to a GUI thread as they come in and try to display them in a TreeView, the path through the treeview component will likely be different for each result, so updating the visual treeview will be lengthy. This can result in the Windows GUI input queue getting full (10000 limit) because the GUI cannot keep up. If you are using object pools for the Ffolder etc, the pool can empty instead, slugging performance. And if the GUI thread then tries to get an Ffolder from the empty pool to issue a new search, and so blocks, you get all-round deadlock with all Ffolder instances stuck in Windows messages (yes, of course that happened to me:). It's best to not let such things happen!
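On note 1: one possible way round the blocked thread (not what the example code below does) is to have nothing wait at all, and let the thread that makes the final decrement fire the completion callback directly. A minimal Java sketch, assuming hypothetical names OutstandingCounter, countOut and countIn:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative counter that fires a callback instead of waking a waiter. */
class OutstandingCounter {
    private final AtomicInteger outstanding = new AtomicInteger();
    private final Runnable onSearchComplete;

    OutstandingCounter(Runnable onSearchComplete) {
        this.onSearchComplete = onSearchComplete;
    }

    /** Call before a folder task is submitted to the pool. */
    void countOut() { outstanding.incrementAndGet(); }

    /** Call when a folder task finishes; the last one in runs the callback. */
    void countIn() {
        if (outstanding.decrementAndGet() == 0) onSearchComplete.run();
    }
}
```

The callback then runs on a pool thread, so it has to be thread-safe and should be short. In exchange, no pool thread sits idle for the length of the search.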
Example - something like this I found - it's quite old Windows/C++ Builder code but it still works - I tried it on my Rad Studio 2009, removed all the legacy/proprietary gunge and added some extra comments. All it does here is count up the folders and files, just as an example. There are only a couple of 'runnable' classes. The myPool->submit() method loads a runnable onto the pool and its run() method gets executed. The base ctor has an 'OnComplete' EventHandler (TNotifyEvent) delegate parameter - that gets fired by the pool thread when the run() method returns.
//******************************* CLASSES ********************************
class DirSearch; // forward dec.
class ScanDir:public PoolTask{
    String FmyDirPath;
    DirSearch *FmySearch;
    TStringList *filesAndFolderNames;
public: // Counts for FmyDirPath only
    int fileCount,folderCount;
    ScanDir(String thisDirPath,DirSearch *mySearch);
    void run(); // an override - called by pool thread
};
class DirSearch:public PoolTask{
    TNotifyEvent FonComplete;
    int dirCount;
    TEvent *searchCompleteEvent;
    CRITICAL_SECTION countLock;
public:
    String FdirPath;
    int totalFileCount,totalFolderCount; // Count totals for all ScanDir's
    DirSearch(String dirPath, TNotifyEvent onComplete);
    ScanDir* getScanDir(String path); // get a ScanDir and inc's count
    void run(); // an override - called by pool thread
    void __fastcall scanCompleted(TObject *Sender); // called by ScanDir's
};
//******************************* METHODS ********************************
// ctor - just calls base ctor and initializes stuff..
// (initializers listed in the order they actually run: base, then members)
ScanDir::ScanDir(String thisDirPath,DirSearch *mySearch):
    PoolTask(0,mySearch->scanCompleted),
    FmyDirPath(thisDirPath),FmySearch(mySearch),
    fileCount(0),folderCount(0){}
void ScanDir::run() // an override - called by pool thread
{
    // fileCount=0;
    // folderCount=0;
    filesAndFolderNames=listAllFoldersAndFiles(FmyDirPath); // gets files
    for (int index = 0; index < filesAndFolderNames->Count; index++)
    { // for all files in the folder..
        if((int)filesAndFolderNames->Objects[index]&faDirectory){
            folderCount++; //do count and, if it's a folder, start another ScanDir
            String newFolderPath=FmyDirPath+"\\"+filesAndFolderNames->Strings[index];
            ScanDir* newScanDir=FmySearch->getScanDir(newFolderPath);
            myPool->submit(newScanDir);
        }
        else fileCount++; // inc 'ordinary' file count
    }
    delete(filesAndFolderNames); // don't leak the TStringList of filenames
}
DirSearch::DirSearch(String dirPath, TNotifyEvent onComplete):
    PoolTask(0,onComplete),
    FonComplete(onComplete),dirCount(0),FdirPath(dirPath),
    totalFileCount(0),totalFolderCount(0)
{
    InitializeCriticalSection(&countLock); // thread-safe count
    searchCompleteEvent=new TEvent(NULL,false,false,"",false); // an event
    // for DirSearch to wait on till all ScanDir's done
}
ScanDir* DirSearch::getScanDir(String path)
{ // up the dirCount while providing a new ScanDir
    EnterCriticalSection(&countLock);
    dirCount++;
    LeaveCriticalSection(&countLock);
    return new ScanDir(path,this);
}
void DirSearch::run() // called on pool thread
{
    ScanDir *firstScanDir=getScanDir(FdirPath); // get first ScanDir for top
    myPool->submit(firstScanDir); // folder and set it going
    searchCompleteEvent->WaitFor(INFINITE); // wait for them all to finish
}
/* NOTE - this is a DirSearch method, but it's called by the pool threads
running the ScanDirs when they complete. The 'DirSearch' pool thread is stuck
on the searchCompleteEvent, waiting for all the ScanDirs to complete, at which
point the dirCount will be zero and the searchCompleteEvent signalled.
*/
void __fastcall DirSearch::scanCompleted(TObject *Sender){ // a ScanDir done
    ScanDir* thiscan=(ScanDir*)Sender; // get the instance that completed back
    EnterCriticalSection(&countLock); // thread-safe
    totalFileCount+=thiscan->fileCount; // add ScanDir counts to totals
    totalFolderCount+=thiscan->folderCount;
    bool allDone=(--dirCount==0); // another one gone.. test it while still
    LeaveCriticalSection(&countLock); // locked so only one thread sees zero
    delete(thiscan); // another one bites the dust..
    if(allDone) searchCompleteEvent->SetEvent(); // if all done, signal
}
..and here it is, working:
