5
import sys
import glob
import os.path

list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

for file_name in list_of_files:
    print(file_name)

f= open(file_name, 'r')
lst = []
for line in f:
   line.strip()
   line = line.replace("\n" ,'')
   line = line.replace("//" , '')
   lst.append(line)
f.close()

f=open(os.path.join('/Users/Emily/UpdatedTopics',
os.path.basename(file_name)) , 'w')

for line in lst:
   f.write(line)
f.close()

I was able to read my files and do some pre-processing. The problem I'm facing is that when I write the files out, I can only see one file. I should get 500 files.

  • You are only working on one file_name, the last one used in your first for-loop. If you want to work on all of the names, your logic needs to be inside the loop. Commented Dec 5, 2016 at 1:57
  • @EmilyG The next step in the evolution of your code is to organize the logic into small, single-purpose functions. For example, as you loop over the input file paths, (a) call a read_file() function to read the lines of text, (b) pass those lines to a clean_lines() function that cleans up the lines and returns a new list of lines, (c) pass the input file path to an output_file_path() function that returns the output file path, and finally (d) pass the output file path and the cleaned up lines to a write_file() function that writes an output file. Good luck! Commented Dec 5, 2016 at 2:33
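
The four functions suggested in the comment above could look roughly like the sketch below. The function names come from the comment; their signatures, the process_all() driver, and the choice to write one cleaned line per output line are assumptions made for illustration (the code in the question writes the lines back without newlines).

```python
import os


def read_file(path):
    """Return the lines of a text file as a list of strings."""
    with open(path, 'r') as f:
        return f.readlines()


def clean_lines(lines):
    """Strip surrounding whitespace and remove '//' from each line."""
    return [line.strip().replace('//', '') for line in lines]


def output_file_path(input_path, output_dir):
    """Map an input path to a file of the same name inside output_dir."""
    return os.path.join(output_dir, os.path.basename(input_path))


def write_file(path, lines):
    """Write the cleaned lines to path, one per line (an assumption)."""
    with open(path, 'w') as f:
        for line in lines:
            f.write(line + '\n')


def process_all(input_paths, output_dir):
    """Hypothetical driver: every step runs inside the loop, once per file."""
    for input_path in input_paths:
        lines = clean_lines(read_file(input_path))
        write_file(output_file_path(input_path, output_dir), lines)
```

With the paths from the question, the call would be process_all(glob.glob('/Users/Emily/Topics/*.txt'), '/Users/Emily/UpdatedTopics'), after importing glob.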

3 Answers

4

As currently written, the only file that gets processed is the last file in the list of file names. You need to indent the rest of the code so that it sits inside the loop and runs once for each file.

import sys
import glob
import os.path

list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

for file_name in list_of_files:
    print(file_name)

    # This needs to be done *inside the loop*
    f= open(file_name, 'r')
    lst = []
    for line in f:
       line = line.strip()  # strip() returns a new string, so assign it back
       line = line.replace("\n" ,'')
       line = line.replace("//" , '')
       lst.append(line)
    f.close()

    f=open(os.path.join('/Users/Emily/UpdatedTopics',
    os.path.basename(file_name)) , 'w')

    for line in lst:
       f.write(line)
    f.close()
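
A variant of the same loop using with blocks is sketched below, so the files are closed even if something fails part-way through. The function name clean_topic_files and its two parameters are hypothetical and only make the directories configurable; the clean-up itself mirrors the replace() calls above.

```python
import glob
import os


def clean_topic_files(input_dir, output_dir):
    """Clean every .txt file in input_dir and write it to output_dir."""
    written = []
    for file_name in glob.glob(os.path.join(input_dir, '*.txt')):
        out_name = os.path.join(output_dir, os.path.basename(file_name))
        # Both files are closed automatically when the with block ends,
        # even if an exception is raised inside it.
        with open(file_name, 'r') as src, open(out_name, 'w') as dst:
            for line in src:
                # Same clean-up as above: drop the newline and any '//'
                dst.write(line.replace('\n', '').replace('//', ''))
        written.append(out_name)
    return written
```

Calling clean_topic_files('/Users/Emily/Topics', '/Users/Emily/UpdatedTopics') should then produce one output file per input file, assuming the output directory already exists.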

4

Python uses indentation instead of curly braces to help group code. Right now the way your code is indented, Python is interpreting it like this:

# get list of files
list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

# loop through all file names
for file_name in list_of_files:
    # print the name of file
    print(file_name)

# PROBLEM: you remove your indentation so we are no longer in
# our for loop.  Now we take the last value of file_name (or the
# last file in the list) and open it and then continue the script
f= open(file_name, 'r')
...

Notice that we leave the for loop because of the change in indentation. The rest of your script therefore runs only once, on the last value assigned to file_name in the loop, which is the last file in the list.
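
The effect can be reproduced without touching any files. In this small sketch the file names and the two lists are made up for illustration; the only difference between the two append calls is their indentation.

```python
file_names = ['a.txt', 'b.txt', 'c.txt']  # hypothetical names

inside = []
outside = []

for file_name in file_names:
    # Indented: part of the loop body, runs once per name
    inside.append(file_name)

# Dedented: runs once, after the loop has finished.
# file_name still holds the last value the loop assigned to it.
outside.append(file_name)

print(inside)   # ['a.txt', 'b.txt', 'c.txt']
print(outside)  # ['c.txt']
```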

1

Try this

import os
path = "/Users/Emily/Topics/"
for root,dirs,files in os.walk(path):
   for dir in dirs:
       write_files = [os.path.join(dir) + ".txt"]
       for wf in write_files:
           with open(wf,"w") as outfile:
               pass  # placeholder: write the processed text to outfile here
