I am currently using os.walk to traverse all subfolders and files in a massive network drive directory. However, whenever my VPN disconnects, the for loop fails. When I re-run my code the next day, I would like to resume from the last file that was processed. What modifications should I make to my code below?

import os

directory = '//DirectoryName/FolderName'

# s3_client, bucket, and Target_File are assumed to be defined elsewhere.
for root, dirs, files in os.walk(os.path.normpath(directory), topdown=False):
    for name in files:
        Source_File = os.path.join(root, name)
        # This uploads the file to the S3 bucket.
        s3_client.upload_file(Source_File, bucket, Target_File)

The directory is really massive: it has hundreds of subfolders and thousands of files in total.

  • Keep track of the files you already processed in a separate file Commented Sep 16, 2022 at 17:56
  • Are you sure what you are doing is legal? Commented Sep 16, 2022 at 17:57
  • @treuss, what do you mean? I am doing this work as part of my job. Commented Sep 16, 2022 at 18:40
  • @rdas, that is a good point. But how do I resume from where I left off the previous day? Commented Sep 16, 2022 at 18:40
  • Read that file at the start of the script, loading all the file names into a set or something similar. Then, when walking the directory tree, skip any files that are already in the set. Commented Sep 16, 2022 at 18:44
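
The approach suggested in the comments could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the asker's environment: the helper names `load_processed` and `process_tree`, the log file, and the `process_file` callback are all hypothetical, and the log is assumed to live outside the directory being walked.

```python
import os


def load_processed(log_path):
    """Return the set of paths recorded by earlier runs (empty on the first run)."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as log:
        return {line.rstrip("\n") for line in log if line.strip()}


def process_tree(directory, log_path, process_file):
    """Walk directory and call process_file on each path not yet in the log.

    process_file is a hypothetical callback; in the question it would wrap
    the s3_client.upload_file call.
    """
    done = load_processed(log_path)
    # Append mode keeps the entries written by previous runs.
    with open(log_path, "a", encoding="utf-8") as log:
        for root, dirs, files in os.walk(os.path.normpath(directory), topdown=False):
            for name in files:
                source_file = os.path.join(root, name)
                if source_file in done:
                    continue  # already handled in an earlier run
                process_file(source_file)  # may raise if the connection drops
                # Record the path only after success, and flush right away so
                # an interrupted run does not lose entries.
                log.write(source_file + "\n")
                log.flush()
                done.add(source_file)
```

With the names from the question, a call might look like `process_tree(directory, 'processed.log', lambda path: s3_client.upload_file(path, bucket, Target_File))`. Because a path is logged only after its upload returns, a file that was in progress when the VPN dropped is retried on the next run rather than skipped.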
