0

I have a script that I have almost 100% complete however there is just one more step that I can not figure out. My script currently checks the destination to see if the file already exists and if it does then the file from the source location is not moved over. The problem I am running into is that the code doesn't check all of the subdirectories as well just the root directory.

I am using os.walk to go through all of the files in the source folder but am not sure how to os.walk the destination folder and the source folder in conjunction with each other.

import time
import sys
import logging
import logging.config


def main():
    purge_files

def move_files(src_file):

    try:
        #Attempt to move files to dest
        shutil.move(src_file, dest)
        #making use of the OSError exception instead of FileExistsError due to older version of python not contaning that exception 
    except OSError as e:
        #Log the files that have not been moved to the console
        logging.info(f'Files File already exists: {src_file}')
        print(f'File already exists: {src_file}')
        #os.remove to delete files that are already in dest repo
        os.remove(src_file)
        logging.warning(f'Deleting: {src_file}')

def file_loop(files, root):

    for file in files:
        #src_file is used to get the full path of everyfile
        src_file = os.path.join(root,file)

        #The two variables below are used to get the files creation date
        t = os.stat(src_file)
        c = t.st_ctime
        #If the file is older then cutoff code within the if statement executes

        if c<cutoff:

            move_files(src_file)
        #Log the file names that are not older then the cutoff and continue loop
        else:
            logging.info(f'File is not older than 14 days: {src_file}')
            continue

def purge_files():

    logging.info('invoke purge_files method')
    #Walk through root directory and all subdirectories
       for root, subdirs, files in os.walk(source):
          dst_dir = root.replace(source, dest)

           #Loop through files to grab every file
           file_loop(files, root)

       return files, root, subdirs


files, root, subdirs = purge_files()

I expected the output to move all files from source to dest. Before the files are moved I expected to check all files in the dest location including subdir of dest if any of them are the same file as the source then they wont be moved to the dest. I do not want the folders that are in source. I just want all the files to be moved to the root directory.

0

2 Answers 2

1

I can see that you have written a large part of the code already, but as it is currently posted it contains quite some errors:

  • The code is incorrectly indented, making it invalid Python code.
  • Some import statements are missing (e.g. for shutil).
  • You are referring to undefined variables (e.g. source).

If I copy-paste your code into my IDE, I get 26 errors from pep8 and pylint, and after fixing the indentation error I get 49 errors. It makes me wonder if this is your actual code or if you made copy-paste errors. Anyway, using an IDE will definitely help you validate your code and catch errors earlier. Try it!

Since I cannot run your code, I can't exactly tell why it's not working, but I can give you some pointers.

One thing that raises much questions is the following line:

dst_dir = root.replace(source, dest)

Apart from the bad indentation, the variable dst_dir is not used anywhere. So what's the point of this statement? Also note that this replaces all occurences of source in root. For trivial cases this will be no problem, but it's not very robust in all circumstances. So use path operations from the standard library when possible and try to avoid performing manual string operations on paths. In Python 3.4 the Pathlib module was introduced. I recommend using that.

Using os.walk() can be quite convenient in some cases, but may not be the best solution for your use case. Maybe using os.listdir() recursively will be much easier, especially since the destination directory will be flat (i.e. a fixed directory without sub directories).

A possible implementation (using pathlib and os.listdir()) could be as follows:

import logging
import os
import pathlib
import shutil
import time

SOURCE_DIR_PATH = pathlib.Path('C:\\Temp')
DESTINATION_DIR_PATH = pathlib.Path('D:\\archive')

CUTOFF_DAYS = 14
CUTOFF_TIME = time.time() - CUTOFF_DAYS * 24 * 3600  # two weeks


def move_file(src_file_path, dst_dir_path):
    logging.debug('Moving file %s to directory %s', src_file_path,
                  dst_dir_path)
    return  # REMOVE THIS LINE TO ACTUALLY PERFORM FILE OPERATIONS
    try:
        shutil.move(str(src_file_path), str(dst_dir_path))
    except OSError:
        logging.info('File already exists in destination directory: %s',
                     src_file_path)
        logging.warning('Deleting file %s', src_file_path)
        src_file_path.unlink()


def move_files(src_file_paths, dst_dir_path):
    for src_file_path in src_file_paths:
        if src_file_path.stat().st_ctime < CUTOFF_TIME:
            logging.info('Moving file older than %d days: %s', CUTOFF_DAYS,
                         src_file_path)
            move_file(src_file_path, dst_dir_path)
        else:
            logging.info('Not moving file less than %d days old: %s',
                         CUTOFF_DAYS, src_file_path)


def purge_files(src_dir_path, dst_dir_path):
    logging.info('Scanning directory %s', src_dir_path)
    names = os.listdir(src_dir_path)
    paths = [src_dir_path.joinpath(name) for name in names]
    file_paths = [path for path in paths if path.is_file()]
    dir_paths = [path for path in paths if path.is_dir()]
    # Cleanup files
    move_files(file_paths, dst_dir_path)
    # Cleanup directories, recursively.
    for dir_path in dir_paths:
        purge_files(dir_path, dst_dir_path)


def main():
    logging.basicConfig(format='%(message)s', level=logging.DEBUG)
    purge_files(SOURCE_DIR_PATH, DESTINATION_DIR_PATH)


if __name__ == '__main__':
    main()

I tested this code and it worked.

Note that I used the same error handling for move_file as in your example. However, I don't think it's quite robust. What if two files with the same name exist in the source directory (in different subdirectories, or during different times)? Than the second file will be deleted without backup. Also, in case of other errors (such as "disk full" or "network error"), the code just assumes that the file was already backed up and the original is deleted. I don't know your use case, but I would seriously consider rewriting this function.

However, I hope these suggestions and example code will get you on the right track.

Sign up to request clarification or add additional context in comments.

1 Comment

I agree 100% this was my first post on stack and I did not realize how bad it copied over. I currently use visual studio enterprise as my IDE. I was able to complete the project by completely restarting from the ground up. It is located at github.com/TheMuellTrain/purge_files if you would like to take a look. I would love some feedback.
-1

You might want to clean up your code, it is full of bugs. E.g. 'purge_files' instead of 'purge_files()' in main, indentation errors inside purge_files etc. Also the seemingly random newlines between the code makes it a bit awkward to read (for me at least) :)

EDIT: I quickly went over your code and changed a few things. Mainly variable names. I noticed that you had a few variables with undescriptive names ('i', 't', etc.) together with a comment describing what that variable meant. If you just change the variable name to something more descriptive, you don't need the comments and your code is even easier to rad. Note that I didn't test this code and tbh it probably does not even run (since that was not my goal but rather to show some of the style changes that I would suggest) :)

import os 
import shutil
import time
import errno
import time
import sys
import logging
import logging.config


# NOTE: It is a convention to write constants in all caps
SOURCE = r'C:\Users\Desktop\BetaSource'
DEST = r'C:\Users\Desktop\BetaDest'
#Gets the current time from the time module
now = time.time()
#Timer of when to purge files
cutoff = now - (14 * 86400)
all_sources = []
all_dest_dirty = []
logging.basicConfig(level = logging.INFO,
                    filename = time.strftime("main-%Y-%m-%d.log"))


def main():
    # NOTE: Why is this function called / does it exist? It only sets a global
    # 'dest_files' which is never used...
    dest_files()
    purge_files()


# I used the dess_files function to get all of the destination files
def dest_files():
    for root, subdirs, files in os.walk(DEST):
        for file in files:
            # NOTE: Is it really necessary to use a global here?
            global all_dirty
            all_dirty.append(files)


def purge_files():
    logging.info('invoke purge_files method')
    # I removed all duplicates from dest because cleaning up duplicates in
    # dest is out of the scope
    # NOTE: This is the perfect usecase for a set
    all_dest_clean = set(all_dest_dirty)
    # os.walk used to get all files in the source location 
    for source_root, source_subdirs, source_files in os.walk(SOURCE):
        # looped through every file in source_files
        for file in source_files:
            # appending all_sources to get the application name from the
            # file path
            all_sources.append(os.path.abspath(file).split('\\')[-1]) 
            # looping through each element of all_source
            for source in all_sources:
                # logical check to see if file in the source folder exists
                # in the destination folder
                if source not in all_dest_clean:
                    # src is used to get the path of the source file this
                    # will be needed to move the file in shutil.move
                    src =  os.path.abspath(os.path.join(source_root, source))
                    # the two variables used below are to get the creation
                    # time of the files
                    metadata = os.stat(src)
                    creation_time = metadata.st_ctime
                    # logical check to see if the file is older than the cutoff
                    if creation_time < cutoff:
                        logging.info(f'File has been succesfully moved: {source}')
                        print(f'File has been succesfully moved: {source}')
                        shutil.move(src,dest)
                        # removing the already checked source files for the
                        # list this is also used in other spots within the loop
                        all_sources.remove(source)
                    else:
                        logging.info(f'File is not older than 14 days: {source}')
                        print(f'File is not older than 14 days: {source}')
                        all_sources.remove(source)
                else:
                    all_sources.remove(source)
                    logging.info(f'File: {source} already exists in the destination')
                    print(f'File: {source} already exists in the destination')


if __name__ == '__main__':
    main()

1 Comment

I agree 100% this was my first post on stack and I did not realize how bad it copied over. I currently use visual studio enterprise as my IDE. I was able to complete the project by completely restarting from the ground up. It is located at github.com/TheMuellTrain/purge_files if you would like to take a look. I would love some feedback.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.