I am trying to get the files created in a directory between 7 AM and 7 PM.

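import os
from datetime import datetime

# ROOT_FOLDER, date_start and date_end (datetime objects) are defined elsewhere
ERROR_FOLDER = os.path.join(ROOT_FOLDER, "Folder", "archive")

files = []
for name in os.listdir(ERROR_FOLDER):  # "name" avoids shadowing the csv module
    path = os.path.join(ERROR_FOLDER, name)
    # getctime() is creation time on Windows, metadata-change time on Unix
    filetime = datetime.fromtimestamp(os.path.getctime(path))
    if date_start < filetime < date_end:
        files.append(name)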

It works OK-ish, taking 13 seconds to return the result from 6,000 items. Is there a faster way you would suggest? Also, how can I turn it into a list comprehension?

 [os.path.basename(x) for x in glob.glob(os.path.join(ROOT_FOLDER,"Excavation","Archive","*")) if (date_start < filetime < date_end)]

This doesn't seem to work.

  • Try a list comprehension instead of append. It's generally faster, but I am not sure which part is the bottleneck here. Commented Feb 21, 2021 at 19:23
  • Comparing timestamps instead of datetime objects could also help. It looks like you are comparing full dates anyway, not only the hours. Commented Feb 21, 2021 at 19:30
  • @Valentino [os.path.basename(x) for x in glob.glob(os.path.join(ROOT_FOLDER,"Excavation","Archive","*")) if (date_start < filetime < date_end)] I tried this but it is not working. Can you see any issue here? Commented Feb 21, 2021 at 19:35
  • Better to add the code to your question; it is not really readable in a comment. And please add the error you get. Just "not working" is not really useful for debugging. Commented Feb 21, 2021 at 20:12

2 Answers

1

Use os.scandir() instead of os.listdir(), and get the file times from the entries it returns rather than calling getctime() on each file.

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
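A minimal sketch of that suggestion, assuming date_start and date_end are datetime objects as in the question. The function name files_created_between is hypothetical, and the is_file() filter is an addition that the original loop did not have. As with getctime(), st_ctime is the creation time on Windows but the metadata-change time on Unix.

```python
import os
from datetime import datetime


def files_created_between(folder, date_start, date_end):
    """Return names of regular files whose ctime lies strictly between the bounds."""
    names = []
    with os.scandir(folder) as entries:
        for entry in entries:
            # Assumption: only regular files are wanted, so skip subdirectories
            if not entry.is_file():
                continue
            # stat() can reuse data from the directory scan where the OS provides it
            filetime = datetime.fromtimestamp(entry.stat().st_ctime)
            if date_start < filetime < date_end:
                names.append(entry.name)
    return names
```

How much this saves depends on the platform: per the documentation quoted above, stat() still costs a system call per entry on Unix.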


1

One inefficiency is that you convert N file times to datetime objects just to compare them with two bounds. Convert the two bounds to timestamps once instead, so that you avoid those N conversions.
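As a rough sketch of this, assuming date_start and date_end are naive local-time datetime objects as in the question (the function name files_between_timestamps is hypothetical), the bounds are converted once and each file time is compared as a plain float. It is also written as a list comprehension, where the file time is computed per item inside the condition.

```python
import os


def files_between_timestamps(folder, date_start, date_end):
    """Return names in folder whose ctime lies strictly between the two bounds."""
    # Convert the two bounds once, rather than converting every file time
    ts_start = date_start.timestamp()
    ts_end = date_end.timestamp()
    return [
        name
        for name in os.listdir(folder)
        if ts_start < os.path.getctime(os.path.join(folder, name)) < ts_end
    ]
```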

Another is that you build this list at all. Just yield each path in turn instead of storing them in a list; they are (I assume) processed in sequence anyway.
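One way this could look, as a sketch only: a generator that takes the bounds as float timestamps (ts_start and ts_end, hypothetical names) and yields each matching path as soon as it is found.

```python
import os


def iter_files_between(folder, ts_start, ts_end):
    """Lazily yield paths in folder whose ctime lies strictly between the bounds."""
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if ts_start < os.path.getctime(path) < ts_end:
            # Hand the path to the caller immediately instead of collecting a list
            yield path
```

The caller can then loop with `for path in iter_files_between(...)` and start processing the first match before the rest have been checked.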

This also opens the way to another optimization: scanning the directory asynchronously. That lets one thread of execution block on the hard-disk I/O while a second thread is already processing the results.
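A possible sketch of that idea using a background thread and a queue. All names here (scan_into_queue, process_matches, handle, _DONE) are hypothetical, and the bounds are assumed to be float timestamps. Whether this pays off depends on how expensive the per-file processing is compared with the scan.

```python
import os
import queue
import threading

_DONE = object()  # sentinel marking the end of the scan


def scan_into_queue(folder, ts_start, ts_end, out):
    """Producer: scan the directory and put matching paths on the queue."""
    try:
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            if ts_start < os.path.getctime(path) < ts_end:
                out.put(path)
    finally:
        out.put(_DONE)  # always signal completion, even if the scan fails


def process_matches(folder, ts_start, ts_end, handle):
    """Consumer: call handle(path) for each match while the scan is still running."""
    out = queue.Queue()
    worker = threading.Thread(
        target=scan_into_queue, args=(folder, ts_start, ts_end, out), daemon=True
    )
    worker.start()
    while True:
        item = out.get()  # blocks until the scanner produces something
        if item is _DONE:
            break
        handle(item)
    worker.join()
```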
