Appending multiple for-loop outputs to a list

Question

I am using RegEx to extract some data from a txt file. I've written the for-loops below to extract emails and birthdates and (tried) to append the outputs to a list. But when I print my list, only the lines appended by the first loop are there. The birthdate RegEx works fine when run by itself. I'm sure I'm doing something very basic wrong.

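import re

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")

results = []  # not named "list", so the built-in list() is not shadowed

for i in f:
    # (?i) must come first in the pattern on Python 3.11+
    if re.findall(r"(?i)([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
        results.append(i)

for k in f:
    if re.findall(r'\d\d-\d\d-\d\d\d\d', k):
        results.append(k)

print(results)
f.close()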

Not an answer, but I notice that you are using the case-insensitive modifier (?i) in your first pattern, so you could get rid of A-Z. Also, in your second regex, \d\d\d\d is better written as \d{4}.
– JvdV, Commented Apr 10, 2020 at 14:17
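A quick check of JvdV's suggestion, as a sketch: with (?i) in effect, a-z already matches uppercase letters, so dropping the A-Z ranges finds the same text, and \d{4} means exactly four digits, just like \d\d\d\d. The flag sits at the very start of each pattern because Python 3.11 and later reject global flags such as (?i) anywhere else (Python 3.6 to 3.10 only emit a DeprecationWarning). The sample lines are made up for illustration.

```python
import re

# Question's email pattern, with (?i) moved to the front so it compiles on Python 3.11+
EMAIL_ORIGINAL = re.compile(r"(?i)[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]")
# With (?i) in effect, a-z already matches A-Z, so the uppercase ranges can go
EMAIL_SIMPLIFIED = re.compile(r"(?i)[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9-.]")

# \d{4} means "exactly four digits", the same as \d\d\d\d
DATE_ORIGINAL = re.compile(r"\d\d-\d\d-\d\d\d\d")
DATE_SIMPLIFIED = re.compile(r"\d{2}-\d{2}-\d{4}")

# Hypothetical sample lines
samples = [
    "Contact: Jane.Doe@Example.COM",
    "born 01-02-1990",
    "no match here",
]

for line in samples:
    # Each simplified pattern finds exactly what its original finds
    assert EMAIL_ORIGINAL.findall(line) == EMAIL_SIMPLIFIED.findall(line)
    assert DATE_ORIGINAL.findall(line) == DATE_SIMPLIFIED.findall(line)
```

One thing the simplification does not change: in both versions the class after the dot matches a single character. That is enough to decide whether a line contains an address, but findall returns a truncated match such as Jane.Doe@Example.C; adding + after that class would capture the whole domain.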
Does this answer your question? Read multiple times lines of the same file Python
– azro, Commented Apr 10, 2020 at 14:17
Your iterator f has already reached the end of the file (EOF) when you enter the second loop. So you either need to call f.seek(0) before the second loop, or just combine the two regexes with | (alternation); I think a single regex with both alternatives should work just fine.
– hardhypochondria, Commented Apr 10, 2020 at 14:18
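Both of hardhypochondria's suggestions can be sketched as follows. The io.StringIO sample stands in for scrape.txt (an assumption so the sketch runs anywhere; a file opened with open() supports seek(0) the same way), and the patterns are the question's, with (?i) moved to the front and the redundant A-Z ranges dropped. The function names are hypothetical.

```python
import io
import re

# In-memory stand-in for scrape.txt (hypothetical sample content)
SAMPLE = "born 01-02-1990\nann@example.com\nnothing here\n"

EMAIL = re.compile(r"(?i)[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9-.]")
DATE = re.compile(r"\d{2}-\d{2}-\d{4}")
# Both patterns joined with | so one search tests for either kind of match
EITHER = re.compile(r"(?i)[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9-.]|\d{2}-\d{2}-\d{4}")


def two_passes_with_seek(f):
    """Keep the two loops, but rewind the file before the second one."""
    results = []
    for line in f:
        if EMAIL.search(line):
            results.append(line)
    f.seek(0)  # move the file position back to the start
    for line in f:
        if DATE.search(line):
            results.append(line)
    return results


def one_pass_with_alternation(f):
    """A single loop that tests each line once against the combined pattern."""
    return [line for line in f if EITHER.search(line)]


print(two_passes_with_seek(io.StringIO(SAMPLE)))
print(one_pass_with_alternation(io.StringIO(SAMPLE)))
```

The two versions do not return quite the same thing: the seek(0) version collects all email lines before all date lines, and a line matching both patterns is added twice, while the alternation version keeps the file order and adds each matching line once.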

Lydia van Dyke · Accepted Answer · 2020-04-10 15:48:01Z

1

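You are trying to read the same file twice. A file object is an iterator: the first for-loop consumes it up to the end of the file, so the second for-loop has nothing left to read and does nothing. Have a look at this to understand: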

with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    print(list(f))  # reads every line
    print("second time:")
    print(list(f))  # the file position is already at the end

Output:

['1234567890abcdefghijklmopqrstuvwxyz'] # or whatever your content is :)
second time:
[]
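To see why the second call comes back empty, here is a small sketch that uses an in-memory io.StringIO object as a stand-in for the real file (an assumption made so it runs without scrape.txt; file objects returned by open() iterate the same way). It shows that a file object is its own iterator, while a list hands out a fresh iterator every time it is looped over.

```python
import io

# In-memory stand-in for the opened file (hypothetical content)
f = io.StringIO("first line\nsecond line\n")

# A file object is its own iterator: iter(f) hands back the very same object,
# so every loop over f continues from wherever the previous one stopped.
assert iter(f) is f

first = list(f)   # consumes every line
second = list(f)  # nothing left: the position is already at the end
print(first)      # ['first line\n', 'second line\n']
print(second)     # []

# A list is not an iterator; each loop gets a fresh iterator from the start.
lines = ["first line\n", "second line\n"]
assert iter(lines) is not lines
print(len(list(lines)), len(list(lines)))  # 2 2
```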

To fix this, you can store the lines of the file in a list (as long as you are not dealing with huge files, of course):

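import re

with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    content = list(f)  # read the lines once; a list can be looped over repeatedly

results = []

for i in content:
    if re.search(r"(?i)[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]", i):
        results.append(i)

for k in content:
    if re.search(r"\d\d-\d\d-\d\d\d\d", k):
        results.append(k)

print(results)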

In your specific example, though, it would be cleaner (and faster) to do all the processing in a single for-loop. Either way, the mistake was trying to read the same file twice without resetting it in between (for example with f.seek(0)).
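As a sketch of the single-loop version, assuming the simplified patterns suggested in the comments: the hypothetical helper scan_lines below tests each line against both patterns in one pass, keeps the two kinds of match in separate lists, and accepts any iterable of lines, so it also works directly on an open file without first copying it into memory.

```python
import re

EMAIL = re.compile(r"(?i)[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9-.]")
BIRTHDATE = re.compile(r"\d{2}-\d{2}-\d{4}")


def scan_lines(lines):
    """Hypothetical helper: one pass, returns (email_lines, birthdate_lines)."""
    emails, birthdates = [], []
    for line in lines:
        if EMAIL.search(line):
            emails.append(line)
        if BIRTHDATE.search(line):
            birthdates.append(line)
    return emails, birthdates


# Hypothetical sample lines; an open file object works the same way
sample = [
    "ann@example.com\n",
    "born 01-02-1990\n",
    "bob@example.org, 03-04-1985\n",
    "nothing\n",
]
emails, birthdates = scan_lines(sample)
print(emails + birthdates)
```

To run it on the real file, call scan_lines(f) inside a with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f: block. Concatenating emails + birthdates reproduces the order the question's two loops were aiming for (all email lines first, and a line with both an email and a date appears twice); appending to a single list inside the loop would follow file order instead.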

edited Apr 10, 2020 at 15:48

answered Apr 10, 2020 at 14:20

4 Comments

abhinonymous Over a year ago

A note of caution: if the file is large, storing it as a list can make the list HUGE.

Lydia van Dyke Over a year ago

True. I just hoped the list of emails and birthdays is not in the order of millions.

Lydia van Dyke Over a year ago

@abhinonymous: added a note about this.

abhinonymous Over a year ago

Imagine doing that over a wiki dump; I'm sure someone has done that at some point :)

abhinonymous · Accepted Answer · 2020-04-10 14:21:02Z

1

Try testing each line against both patterns in a single loop:

import re

results = []

with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    for i in f:  # one pass: test every line against both patterns
        if re.findall(r"(?i)([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
            results.append(i)
        if re.findall(r'\d\d-\d\d-\d\d\d\d', i):
            results.append(i)

print(results)

In your code, after the first for-loop, f is positioned at the end of the file, so the second for-loop has no lines left to read and its body never runs.

So, to make your code do what you intended, you can close the file after the first loop and reopen it before the second one, so that reading starts from the beginning of the file again:

import re

results = []

# First pass: the file is closed automatically when the with-block ends
with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    for i in f:
        if re.findall(r"(?i)([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
            results.append(i)

# Second pass: reopening the file starts reading from the beginning again
with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    for k in f:
        if re.findall(r'\d\d-\d\d-\d\d\d\d', k):
            results.append(k)

print(results)

edited Apr 10, 2020 at 14:21

answered Apr 10, 2020 at 14:17

1 Comment

azro Over a year ago

Please, when answering, explain to the OP what their error is and how your code fixes it. The main goal of SO is to help people learn, not to hand out code that just works.
