My input files are HTML files with no extension. The desired output is every regex-matched URL from all files under root_dir, with the results joined into a single file. My regex works, and I can output results from a single file.

import re
with open('/Users/files/filename') as f:
    for line in f:
        urls = re.findall(r"([\w%~\+-=]*\.mp3)", line)
        print(*urls)

I could use glob to walk the directory, but I am unsure how to combine it with the regex:

import glob
import re
root_dir = '/Users/files/'
for filename in glob.iglob(root_dir + '**/*.*', recursive=True):
        urls = re.findall (r"([\w%~\+-=]*\.mp3)", line);
        print (*urls)
4 Comments
  • No regex recursion in Python, yes? Commented Jun 12, 2020 at 21:59
  • Your first code sample works by opening a directory, open('/Users/files/')? Your second code sample is missing an open(filename) statement. Commented Jun 12, 2020 at 22:09
  • @Han-KwangNienhuys No, it does not open a directory; I fixed the code to show that. Commented Jun 12, 2020 at 22:53
  • @Edward So you are saying there is no way to use regex recursively in Python? Commented Jun 12, 2020 at 22:54

1 Answer

Use

import re, glob                                 # Import the libraries

root_dir = r'/Users/files'                      # Set root directory
save_to_file = r'/Users/urls_extracted.txt'     # File path to save results to
all_files = glob.glob("{}/*".format(root_dir))  # Get a glob with filepaths

with open(save_to_file, 'w') as fw:             # Open stream to write to
  for filename in all_files:                    # Iterate over the files
    with open(filename, 'r') as fr:             # Open file to read from  
      for url in re.findall(r"[\w%~+\-=]*\.mp3", fr.read()): # Get all matches and iterate over them
        fw.write("{}\n".format(url))            # Write each URL to write stream
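
The snippet above reads only the entries directly inside root_dir, and open() raises an error if one of them is a folder. Since the question asks for all files under root_dir, here is a minimal sketch of a recursive variant. It assumes pathlib is acceptable in place of glob; the function name collect_urls and the constant MP3_PATTERN are hypothetical names chosen for illustration, not part of the original answer.

```python
import re
from pathlib import Path

# Same pattern as in the answer, compiled once (the name is hypothetical).
MP3_PATTERN = re.compile(r"[\w%~+\-=]*\.mp3")


def collect_urls(root_dir, save_to_file):
    """Write every match found in files under root_dir to save_to_file.

    Returns the number of matches written. Subfolders are searched too.
    """
    count = 0
    out_path = Path(save_to_file).resolve()
    with open(save_to_file, "w") as fw:
        # rglob("*") walks the whole tree; sorting keeps the output order stable.
        for path in sorted(Path(root_dir).rglob("*")):
            # Skip folders, and skip the output file if it lives under root_dir.
            if not path.is_file() or path.resolve() == out_path:
                continue
            # errors="ignore" is an assumption: it drops undecodable bytes
            # instead of failing on files that are not valid text.
            text = path.read_text(errors="ignore")
            for url in MP3_PATTERN.findall(text):
                fw.write(f"{url}\n")
                count += 1
    return count
```

With the paths from the answer, the call would be collect_urls('/Users/files', '/Users/urls_extracted.txt').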

Note that the dash must be escaped inside the character class if you mean a literal - character and not a range. Unescaped, \+-= is read as the range of characters from + to =.
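
To make the difference concrete, here is a small sketch comparing the two character classes on a made-up HTML fragment (the sample string and variable names are assumptions for illustration). In ASCII, the range from + to = also covers the characters , . / : ; < and the digits, so the unescaped pattern matches considerably more than the escaped one.

```python
import re

# Hypothetical sample input; not taken from the asker's files.
sample = 'src="http://example.com/audio/track-01.mp3"'

# Unescaped dash: \+-= is a range from '+' to '=', which includes '.', '/' and ':'.
unescaped = re.findall(r"[\w%~\+-=]*\.mp3", sample)

# Escaped dash: only the literal characters '+', '-' and '=' are allowed.
escaped = re.findall(r"[\w%~+\-=]*\.mp3", sample)

print(unescaped)  # ['http://example.com/audio/track-01.mp3']
print(escaped)    # ['track-01.mp3']
```

If the full URL is what you want, it may be clearer to list the extra characters (for example : / and .) explicitly in the class rather than rely on the accidental range.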

2 Comments

Thank you, this worked well. Is there any way you could include some comment describing the code above or is it too cumbersome?
@x0rz0r Added comments
