Okay I spent just a little while with your code @Valentyn, and I think it is mostly unharmed...
sql_utils/__init__.py
import os


def walked_sql_paths(config, sub_dir):
    """
    Generates `tuple`s of `(sql_abs, sql_rel)`, the absolute file path
    and the name of the directory that holds the file

    - `config` should contain a 'MAIN_DIR' key with a value similar to
      - `/home/Mentor`
      - `/home/StudentName`
    - `sub_dir` should contain a string such as `Homework`
    """
    ## ... I am guessing that ya might be doing something
    ##     else with the `config` object, if this is not
    ##     the case, then this could be simplified to
    ##     only taking a `path` argument instead.
    target_dir = os.path.join(config['MAIN_DIR'], sub_dir)
    for dirpath, subdirs, files in os.walk(target_dir):
        for item in files:
            if not item.endswith('.sql'):
                continue

            sql_abs = os.path.join(dirpath, item)
            sql_rel = os.path.basename(dirpath)
            yield sql_abs, sql_rel
That stuff between """ (triple quotes) is a "docstring", and is accessible via either help(walked_sql_paths) or print(walked_sql_paths.__doc__). The "dunder" or "Magic Method" stuff I'll not cover here as that's a whole 'nother-can-o-worms. What is important is that accessible documentation is something that Python allows for, while code that doesn't require it is something to strive for.
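To see the docstring lookup in isolation, here's a tiny sketch. The `greet` function is made up purely for illustration and is not part of your code.

```python
def greet(name):
    """Return a short greeting for `name`."""
    return "Hello, {0}!".format(name)


# The docstring is stored on the function object itself
doc_text = greet.__doc__
print(doc_text)

# In an interactive session, help(greet) shows the same text
# along with the function's signature.
```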
I'm using yield in the above for loop so that it hands back partial results to whatever calls next() or the __next__() method (for loops and other consumers call these implicitly). Generators are a cheap way of optimizing code as well as ensuring that users experience less herky-jerky loading between results; in other words, even if things are taking a while this'll usually feel faster.
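If the lazy behaviour is new to ya, this small sketch shows it without touching the file system. The `countdown` generator is a stand-in I made up, not something from your project.

```python
def countdown(start):
    """Yield start, start - 1, ... down to 1, one value per request."""
    while start > 0:
        print("computing {0}".format(start))
        yield start
        start -= 1


gen = countdown(3)   # Nothing runs yet, no "computing" line is printed
first = next(gen)    # Runs the body up to the first yield
rest = list(gen)     # list() keeps calling next() until StopIteration
```

Each value is only worked out when something asks for it, which is why the first result can show up before the whole directory tree has been walked.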
The assignments of sql_abs and sql_rel are first for readability, and second for making it easy to later do something like yield sql_rel, sql_abs instead. Otherwise there's little reason to prefer it over the answer posted by @Peilonrayz.
Here's one way of using the above modified code...
from sql_utils import walked_sql_paths


## ... setting of `mentors` and `students` `_config` objects
##     and other stuff I am guessing will go here...

students_paths = walked_sql_paths(config=students_config,
                                  sub_dir='Students Homework')

mentors_paths = walked_sql_paths(config=mentors_config,
                                 sub_dir='Homework')

for s_paths, m_paths in zip(students_paths, mentors_paths):
    ## Each item is a `(sql_abs, sql_rel)` tuple
    s_abs_path, s_rel_path = s_paths
    m_abs_path, m_rel_path = m_paths
    if not s_rel_path == m_rel_path:
        print("Warning, continuing past -> {s_rel_path} and {m_rel_path} mismatch!".format(
            s_rel_path=s_rel_path,
            m_rel_path=m_rel_path))
        continue

    print("Time to compare -> Mentors {m_abs_path} with Students {s_abs_path}".format(
        m_abs_path=m_abs_path,
        s_abs_path=s_abs_path))
I'm using zip to zip-up the two generators in the above for loop because it's a built-in that seems to do what ya want.
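One thing worth knowing about zip is that it stops as soon as the shortest input runs out. The sketch below uses made-up file names, and itertools.zip_longest is an alternative from the standard library that I'm bringing in for comparison, not something the code above uses.

```python
from itertools import zip_longest

students = iter(['1.sql', '2.sql'])
mentors = iter(['1.sql', '2.sql', '3.sql'])

# zip() stops when the shorter input runs out, so '3.sql' is never paired
zipped = list(zip(students, mentors))

# zip_longest() pads the shorter input with `fillvalue` instead
students = iter(['1.sql', '2.sql'])
mentors = iter(['1.sql', '2.sql', '3.sql'])
padded = list(zip_longest(students, mentors, fillvalue=None))
```

So if a student has fewer files than the Mentor, plain zip will quietly skip the leftovers rather than warn about them.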
Hopefully none of this is mind-blowing because, like I stated in your question's comments @Valentyn, you were really close to something that I'd not be able to add to.
Looking at the folder structure a bit closer, it looks like things could get just a bit fancier with the loops. What's your preference on ordering?
My thoughts would be to iterate over Students_Homework/ of students and then zip-up between sub-folders, in which case it may be possible to cache the Mentor's folders on the first pass. However, that would not scale nicely if there are lots of sub-directories... Another thought would be to iterate over the Mentor's 1-n folders and zip-up on each student in turn. Feel free to comment with a preference as to which might be more helpful.
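As a rough picture of the caching idea, here's a minimal sketch that groups the Mentor's files by folder once and then checks a student against that. The `group_by_folder` helper and the sample paths are hypothetical, and I'm assuming the input is `(abs_path, folder_name)` pairs like the ones walked_sql_paths yields.

```python
import os
from collections import defaultdict


def group_by_folder(path_pairs):
    """Map folder name -> sorted list of file names found in that folder."""
    grouped = defaultdict(list)
    for abs_path, rel_dir in path_pairs:
        grouped[rel_dir].append(os.path.basename(abs_path))
    return {folder: sorted(names) for folder, names in grouped.items()}


# Made-up sample data standing in for real walked_sql_paths() output
mentor_pairs = [('/home/Mentor/Homework/1/1.sql', '1'),
                ('/home/Mentor/Homework/2/5.sql', '2')]
student_pairs = [('/home/Student1/Students Homework/1/1.sql', '1')]

mentor_files = group_by_folder(mentor_pairs)     # built once and reused
student_files = group_by_folder(student_pairs)   # rebuilt for each student

# Files the Mentor has that this student did not turn in
missing = {folder: [name for name in names
                    if name not in student_files.get(folder, [])]
           for folder, names in mentor_files.items()}
```

This trades memory for simplicity, so it suits a modest number of folders better than a huge tree.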
Thoughts on the future: using try/except you can code for cases where Student3 didn't turn in the 5.sql file expected in 2's folder, so here's some skeleton-code that'll hopefully get ya a little closer to fault tolerance...
def safety_zipper(*iters, search_strs):
    """
    Allows for doing something _clever_ where things could have gone painfully wrong

    - `iters`, iterators that each output `tuple`s of length two `(abs_path, rel_path)`
    - `search_strs`, `list` of `str`ings to search for matches on `rel_path`

    Yields `list` of `tuple`s `(abs_path, rel_path)`
    """
    for search_str in search_strs:
        partial_results = []
        for thing in iters:
            try:
                path_tuple = next(thing)
            except StopIteration:
                ## Note carrying on can be dangerous, I only do it
                ##  because the parent loop will exit, eventually.
                ##  There is no `path_tuple` to report on here.
                print("Warning nothing left to match with {search_str}".format(
                    search_str=search_str))
            else:  ## No error so do things with next thing
                ## Uncomment the following if useful
                # abs_path = path_tuple[0]
                rel_path = path_tuple[1]
                if search_str == rel_path:
                    partial_results.append(path_tuple)
                    continue

                ## Deal with mismatches in a clever way here, such
                ##  as if a student is late to turn in an assignment.
            finally:
                ## Finally runs regardless, well so long as another
                ##  exception is not raised before reaching here.
                ##  Only included for completeness and in-case ya
                ##  wanted to do something fancy here too.
                pass

        yield partial_results
... I'll warn that the above is not complete, but essentially it'll allow for catching cases where Student directories or files do not match those of the Mentor's file paths. It may have to be stacked to be able to check for differences in both directories and files, and pre-loading search_strs list would either require foreknowledge or pre-parsing a chunk of Mentor's file paths to populate.
But whatever's downstream will have a much cleaner input and require much less edge-case detection.
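For the pre-parsing option, a minimal sketch could look like the following. The `folder_names` helper and the sample pairs are hypothetical, and I'm assuming `(abs_path, folder_name)` pairs as input.

```python
def folder_names(path_pairs):
    """Return unique folder names from the pairs, in first-seen order."""
    seen = []
    for _abs_path, rel_dir in path_pairs:
        if rel_dir not in seen:
            seen.append(rel_dir)
    return seen


# Made-up sample data standing in for the Mentor's walked paths
mentor_pairs = [('/home/Mentor/Homework/1/1.sql', '1'),
                ('/home/Mentor/Homework/1/2.sql', '1'),
                ('/home/Mentor/Homework/2/5.sql', '2')]

search_strs = folder_names(mentor_pairs)
```

Keep in mind that feeding a generator through a helper like this uses it up, so the Mentor's paths would need to be walked again (or stored in a list) before zipping.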
for _ in <iterable> could do without iter = 0 stuff if ya use enumerate(), eg. for i, thing in enumerate(<iterable>)... though why you're pre-building an empty nested list is a little beyond me, I'd use a list of tuples or a dictionary... Looks like the find_mentors_sql and find_students_sql could be merged into one function, and I think walk_path = os.path.join(p1, p2) you'll find more portable than walking p1 + '\\p2'... finally for this comment, those one-liner for loops with extend hurt the code's readability.
"Is there a more elegant implementation?", because I spent a little more time with the code.
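The enumerate() and os.path.join() suggestions can be sketched like so. The file and folder names are made up for illustration.

```python
import os

files = ['1.sql', '2.sql', '3.sql']

# enumerate() hands back the counter, so no manual `iter = 0` bookkeeping
numbered = []
for i, name in enumerate(files):
    numbered.append((i, name))

# os.path.join() picks the right separator for the current platform,
# unlike hard-coding a backslash into the string
walk_path = os.path.join('Homework', '1')
```

A list of tuples like `numbered` also avoids pre-building an empty nested list up front.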