Comparing each item from dir with each item from another dir

Question

The task is to compare students' homework SQL files with mentors' SQL files.

I've written two functions that each return a two-dimensional list (in each row, the 1st element is an absolute path and the 2nd a relative one).

Then I'm going to compare the relative paths of the students' and mentors' files and, if they are equal, execute the SQL files (located via their absolute paths).

Is there a more elegant implementation?

The folder structure of the mentors' dir:

Homework (folder)
  ├ 1 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ 2 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ n (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql

The folder structure of the students' dir:

├Students Homework (folder)
 ├Student1(folder)
  ├ 1 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ 2 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ n (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
 ├Student2(folder)
  ├ 1 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ 2 (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql
  ├ n (folder)
  | ├ 1.sql
  | ├ 2.sql
  | └ n.sql

"Mentors" function:

from os import path, walk


def find_mentors_sql(config):

    mentors_sql_abs = []
    mentors_sql_rel = []

    for dirpath, subdirs, files in walk(config["MAIN_DIR"] + '\\Homework'):
        mentors_sql_abs.extend(path.join(dirpath, x) for x in files if x.endswith(".sql"))
        mentors_sql_rel.extend(path.join(path.basename(dirpath), x) for x in files if x.endswith(".sql"))

    mentors_sql = [[0] * 2 for i in range(len(mentors_sql_abs))]

    iter = 0
    for _ in mentors_sql_abs:
        mentors_sql[iter][0] = mentors_sql_abs[iter]
        iter += 1

    iter1 = 0
    for _ in mentors_sql_rel:
        mentors_sql[iter1][1] = mentors_sql_rel[iter1]
        iter1 += 1

    return mentors_sql

"Students" function (the logic is similar to the previous one):

def find_students_sql(config):

    students_sql_abs = []
    students_sql_rel = []

    for dirpath, subdirs, files in walk(config["MAIN_DIR"] + '\\Students Homework'):
        students_sql_abs.extend(path.join(dirpath, x) for x in files if x.endswith(".sql"))
        students_sql_rel.extend(path.join(path.basename(dirpath), x) for x in files if x.endswith(".sql"))

    students_sql = [[0] * 2 for i in range(len(students_sql_abs))]

    iter = 0
    for _ in students_sql:
        students_sql[iter][0] = students_sql_abs[iter]
        iter += 1

    iter1 = 0
    for _ in students_sql:
        students_sql[iter1][1] = students_sql_rel[iter1]
        iter1 += 1

    return students_sql

Getting close to perfect there... Quick tips: for _ in <iterable> could do without the iter = 0 stuff if ya use enumerate(), e.g. for i, thing in enumerate(<iterable>). Though why you're pre-building an empty nested list is a little beyond me; I'd use a list of tuples or a dictionary. It looks like find_mentors_sql and find_students_sql could be merged into one function, and I think you'll find walk_path = os.path.join(p1, p2) more portable than walking p1 + '\\p2'. Finally for this comment, those one-liner for loops with extend hurt the code's readability.
– S0AndS0, Commented Apr 28, 2019 at 18:22
@S0AndS0 Comments are for seeking clarification to the question, and may be deleted. Please put all suggestions for improvements in answers.
– 200_success, Commented Apr 28, 2019 at 20:17
Wups @200_success, I'll attempt to do better in the future. For now, my attempt at an answer will try to address "Is there a more elegant implementation?", because I spent a little more time with the code.
– S0AndS0, Commented Apr 28, 2019 at 21:35

S0AndS0 · Accepted Answer · 2019-04-29 18:25:31Z

Okay, I spent just a little while with your code @Valentyn, and I think it is mostly unharmed...

`sql_utils/__init__.py`

import os


def walked_sql_paths(config, sub_dir):
    """
    Generates `tuple` of absolute and relative file paths

    - `config` should contain a 'MAIN_DIR' key with a value similar to
        - `/home/Mentor`
        - `/home/StudentName`
    - `sub_dir` should contain a string such as `Homework`
    """
    ## ... I am guessing that ya might be doing something
    ##     else with the `config` object, if this is not
    ##     the case, then this could be simplified to
    ##     only taking a `path` argument instead.
    target_dir = os.path.join(config['MAIN_DIR'], sub_dir)

    for dirpath, subdirs, files in os.walk(target_dir):
        for item in files:
            if not item.endswith('.sql'):
                continue

            sql_abs = os.path.join(dirpath, item)
            sql_rel = os.path.join(os.path.basename(dirpath), item)
            yield sql_abs, sql_rel

That stuff between """ (triple quotes) is a "docstring", and it is accessible via either help(walked_sql_paths) or print(walked_sql_paths.__doc__). The "dunder" or "Magic Method" stuff I'll not cover here, as that's a whole other can of worms. What is important is that Python allows for accessible documentation, while code that doesn't need it is something to strive for.
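A small, self-contained way to see this for yourself; the stub function below is hypothetical and only borrows the first line of the docstring above. inspect.getdoc() returns the docstring with its indentation cleaned up, which is roughly what help() displays.

```python
import inspect


def walked_sql_paths_stub(config, sub_dir):
    """
    Generates `tuple` of absolute and relative file paths
    """
    ...  # hypothetical stub: only the docstring matters here


# The raw docstring string; it may still carry the leading newline and indentation.
raw = walked_sql_paths_stub.__doc__
# inspect.getdoc() strips the surrounding blank lines and common indentation.
clean = inspect.getdoc(walked_sql_paths_stub)
print(repr(clean))
```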

I'm using yield in the above for loop so that it hands back results one at a time to whatever calls next() (or the __next__() method) on it; for loops and other constructs do this implicitly. Generators are a cheap way of optimizing code, as well as of ensuring that users experience less herky-jerky loading between results; in other words, even if things are taking a while, this'll usually feel faster.
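A minimal illustration of that lazy behaviour, using a made-up count_up generator rather than the answer's code: nothing runs until next() asks for a value, and a for loop or list() simply keeps calling next() until the generator is exhausted.

```python
def count_up(limit):
    """Yield 0, 1, ..., limit - 1 one at a time, announcing each value."""
    for n in range(limit):
        print("producing", n)
        yield n


gen = count_up(3)   # creating the generator runs none of its body yet
first = next(gen)   # runs the body up to the first `yield`
rest = list(gen)    # list() keeps calling next() until StopIteration
print(first, rest)
```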

The assignments of sql_abs and sql_rel are first for readability, and second for making it easy to later do something like yield sql_rel, sql_abs instead. Otherwise there's little reason to prefer it over the answer posted by @Peilonrayz.

Here's one way of using the above modified code...

from sql_utils import walked_sql_paths


## ... setting of `mentors` and `students` `_config` objects
##     and other stuff I am guessing will go here...


students_paths = walked_sql_paths(config=students_config,
                                  sub_dir='Students Homework')

mentors_paths = walked_sql_paths(config=mentors_config,
                                 sub_dir='Homework')


## Each item is an (absolute path, relative path) tuple, as yielded by walked_sql_paths
for (s_abs_path, s_rel_path), (m_abs_path, m_rel_path) in zip(students_paths, mentors_paths):
    if s_rel_path != m_rel_path:
        print("Warning, continuing past -> {s_rel_path} and {m_rel_path} mismatch!".format(
            s_rel_path=s_rel_path,
            m_rel_path=m_rel_path))
        continue

    print("Time to compare -> Mentors {m_abs_path} with Students {s_abs_path}".format(
        m_abs_path=m_abs_path,
        s_abs_path=s_abs_path))

I'm using zip to zip-up the two generators in the above for loop because it's a built-in that seems to do what ya want.
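One caveat worth knowing, shown here with made-up relative paths: zip() stops at the shortest input, so if a student skips a file, the pairs after the gap drift out of alignment and the mentor's leftover files are silently dropped. itertools.zip_longest() at least makes the leftovers visible.

```python
from itertools import zip_longest

mentor_rel = ["1/1.sql", "1/2.sql", "2/1.sql"]
student_rel = ["1/1.sql", "2/1.sql"]  # hypothetical student who skipped 1/2.sql

# zip() stops at the shorter list: the second pair is already misaligned
# and the mentor's third file never shows up.
pairs = list(zip(mentor_rel, student_rel))
print(pairs)

# zip_longest() pads the shorter input with fillvalue instead of stopping.
padded = list(zip_longest(mentor_rel, student_rel, fillvalue=None))
print(padded)
```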

Hopefully none of this is mind-blowing, because like I stated in your question's comments @Valentyn, you were really close to something that I'd not be able to add to.

Looking at the folder structure a bit closer, it looks like things could get just a bit fancier with the loops. What's your preference on ordering?

My thoughts would be to iterate over the student folders in Students Homework/ and then zip-up between sub-folders, in which case it may be possible to cache the Mentor's folders on the first pass. However, that would not scale nicely if there are lots of sub-directories... Another thought would be to iterate over the Mentor's 1-n folders and zip-up on each student in turn. Feel free to comment with a preference as to which might be more helpful.
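A rough sketch of the "cache the Mentor's folders" idea, using a dict lookup instead of zip (a swapped-in technique, so the files no longer have to line up one-to-one). index_by_rel and pair_with_mentor are hypothetical helpers, the paths are made up, and in real use each student's list would come from walking that student's folder.

```python
def index_by_rel(paths):
    """Map rel_path -> abs_path for an iterable of (abs_path, rel_path) tuples."""
    return {rel: abs_path for abs_path, rel in paths}


def pair_with_mentor(mentor_by_rel, student_paths):
    """Pair each student file with the mentor file sharing its relative path.

    Returns (pairs, unmatched): pairs holds (mentor_abs, student_abs) tuples,
    unmatched holds student relative paths with no mentor counterpart.
    """
    pairs, unmatched = [], []
    for student_abs, rel in student_paths:
        mentor_abs = mentor_by_rel.get(rel)
        if mentor_abs is None:
            unmatched.append(rel)
        else:
            pairs.append((mentor_abs, student_abs))
    return pairs, unmatched


# Made-up paths in the (abs_path, rel_path) order walked_sql_paths() yields.
mentor_by_rel = index_by_rel([
    ("/home/Mentor/Homework/1/1.sql", "1/1.sql"),
    ("/home/Mentor/Homework/1/2.sql", "1/2.sql"),
])  # built once, then reused for every student

students = {
    "Student1": [("/home/S/Student1/1/2.sql", "1/2.sql"),
                 ("/home/S/Student1/1/9.sql", "1/9.sql")],
    "Student2": [("/home/S/Student2/1/1.sql", "1/1.sql")],
}
results = {name: pair_with_mentor(mentor_by_rel, paths)
           for name, paths in students.items()}
print(results["Student1"])
```

Like the original code, the relative key only keeps the immediate folder name, so this assumes the homework folder names (1, 2, ... n) are unique.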

Thoughts on the future: using try/except you can code for cases where, say, Student3 didn't turn in the 5.sql file expected in folder 2, so here's some skeleton code that'll hopefully get ya a little closer to fault tolerance...

def safety_zipper(*iters, search_strs):
    """
    Allows for doing something _clever_ where things could have gone painfully wrong

    - `iters`, iterators (e.g. generators) that each output `tuple`s of length two `(rel_path, abs_path)`
    - `search_strs`, `list` of `str`ings to search for matches on `rel_path`

    Yields `list` of `tuple`s `(rel_path, abs_path)`
    """
    for search_str in search_strs:
        partial_results = []
        for thing in iters:
            try:
                path_tuple = next(thing)
            except StopIteration:
                ## This iterator is exhausted. Note passing can be dangerous,
                ##  I only do it because the parent loop will exit, eventually
                print("Warning, ran out of paths before finding {search_str}".format(
                    search_str=search_str))
                pass
            else:  ## No error so do things with next thing
                ## Uncomment the following if useful
                # abs_path = path_tuple[1]
                rel_path = path_tuple[0]
                if search_str == rel_path:
                    partial_results.append(path_tuple)
                    continue

                ## Deal with mismatches in a clever way here, such
                ##  as if a student is late to turn in an assignment.

            finally:
                ## Finally runs regardless, well so long as another
                ##  exception is not raised before reaching here.
                ##  Only included for completeness and in case ya
                ##  wanted to do something fancy here too.
                pass

        yield partial_results

... I'll warn that the above is not complete, but essentially it'll allow for catching cases where Student directories or files do not match the Mentor's file paths. It may have to be stacked to check for differences in both directories and files, and pre-loading the search_strs list would require either foreknowledge or pre-parsing a chunk of the Mentor's file paths.

But whatever's downstream will have a much cleaner input and require much less edge-case detection.
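If the immediate goal is only to spot missing or extra files, a plain set difference on the relative paths may be simpler than a try/except zipper. This is a swapped-in technique rather than the answer's skeleton; compare_rel_paths is a hypothetical helper and the paths are made up.

```python
def compare_rel_paths(mentor_rels, student_rels):
    """Return (missing, unexpected) as sorted lists of relative paths.

    missing:    files the mentor has that the student did not hand in
    unexpected: files the student has that the mentor does not
    """
    mentor_set, student_set = set(mentor_rels), set(student_rels)
    return sorted(mentor_set - student_set), sorted(student_set - mentor_set)


missing, unexpected = compare_rel_paths(
    ["1/1.sql", "1/2.sql", "2/5.sql"],
    ["1/1.sql", "1/2.sql", "2/6.sql"],
)
print(missing, unexpected)
```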

Peilonrayz · Accepted Answer · 2020-03-06 00:09:08Z


It's recommended to use enumerate rather than a manual counter (iter, renamed i below), _ and indexing.

for i, abs_path in enumerate(mentors_sql_abs):
    mentors_sql[i][0] = abs_path

It's better to use zip rather than build mentors_sql manually.
Your function could further be simplified if you don't extend mentors_sql_* and just yield the values.
Please only use one string delimiter, either ' or ".
x is a pretty bad variable name; I would use file. Even in a comprehension it's pretty poor, as x isn't shorthand for anything.
The only difference between the two functions is the path you walk, so you can change your input to account for this and use one function.
I don't see the need for returning both relative and absolute paths, and so won't comment on it too much. You may want to return one and convert when needed.
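For the zip point above, a minimal sketch with made-up sample paths: zip() pairs the i-th absolute path with the i-th relative path, which replaces both the pre-built [[0] * 2 ...] list and the two index-counting loops. It produces tuples; the list comprehension variant is only needed if the caller really wants mutable inner lists.

```python
mentors_sql_abs = ["/home/Mentor/Homework/1/1.sql", "/home/Mentor/Homework/1/2.sql"]
mentors_sql_rel = ["1/1.sql", "1/2.sql"]

# One (abs, rel) tuple per file, no manual indexing.
mentors_sql = list(zip(mentors_sql_abs, mentors_sql_rel))
print(mentors_sql)

# Same pairing, but as inner lists like the original [[abs, rel], ...] shape.
mentors_sql_lists = [list(pair) for pair in zip(mentors_sql_abs, mentors_sql_rel)]
```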

from os import path, walk


def find_sql(root):
    # `root` rather than `path`, so the os.path module is not shadowed
    for dirpath, subdirs, files in walk(root):
        for file in files:
            if file.endswith('.sql'):
                yield (
                    path.join(dirpath, file),
                    path.join(path.basename(dirpath), file)
                )


mentors = find_sql(path.join(config['MAIN_DIR'], 'Homework'))
students = find_sql(path.join(config['MAIN_DIR'], 'Students Homework'))
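Following the last point (return one path and convert when needed), here is a minimal sketch that rebuilds the relative form from an absolute path on demand. rel_from_abs is a hypothetical helper name; it mirrors path.join(path.basename(dirpath), file) by keeping only the immediate parent folder and the file name.

```python
import os


def rel_from_abs(abs_path):
    """Return '<parent folder>/<file name>' for an absolute .sql path."""
    folder, file_name = os.path.split(abs_path)
    return os.path.join(os.path.basename(folder), file_name)


# Built with os.path.join so the separator matches the current OS.
example = os.path.join(os.sep, "home", "Mentor", "Homework", "2", "1.sql")
print(rel_from_abs(example))
```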

answered Apr 28, 2019 at 21:10 by Peilonrayz, edited Mar 6, 2020 at 0:09


iter is also a built-in being shadowed here (although you don't use it in the end).
– Graipher, Commented Apr 29, 2019 at 11:53
I agree in general. But when actually iterating over something, calling the iterating variable iter can be quite confusing IMO.
– Graipher, Commented Apr 29, 2019 at 12:00
@Graipher I agree iter is a terrible variable name. It doesn't matter if it's shadowing a builtin or not, it's just bad.
– Peilonrayz ♦, Commented Mar 6, 2020 at 0:09
Working through a back log of about a year of comments? In any case, I agree, although I'm also not a big fan of my usual choice of it.
– Graipher, Commented Mar 6, 2020 at 0:16
@Graipher Something like that, yeah. Hadn't seen you around either, so had to give you a ping ;P Hmm, I've seen `it` used for iterator/iterable. I think i is ok but not great; at least it's kinda a standard.
– Peilonrayz ♦, Commented Mar 6, 2020 at 0:19
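A small illustration of the shadowing point from these comments; both function names are made up for the example. Assigning to iter anywhere in a function makes it a local name for the whole function, so the built-in iter() is no longer reachable there.

```python
def first_file_shadowed(files):
    iter = 0                  # local name hides the built-in iter() in this function
    return next(iter(files))  # so this tries to call the int 0 and fails


def first_file(files):
    return next(iter(files))  # the built-in iter() is reachable here


try:
    first_file_shadowed(["1.sql", "2.sql"])
except TypeError as exc:
    error = str(exc)

print(error)
print(first_file(["1.sql", "2.sql"]))
```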
