Python file search using regex

Question

I have a file that has many lines. Each line starts with {"id": followed by the id number in quotes (e.g. {"id": "106"). I am trying to use regex to search the whole document line by line and print the lines that match 5 different id values. To do this I made a list with the ids and want to iterate through the list, matching only lines that start with {"id": "(id number from list)". I am really confused about how to do this. Here is what I have so far:

f= "bdata.txt"    
statids = ["85", "106", "140", "172" , "337"] 
x= re.findall('{"id":', statids, 'f')
for line in open(file):
            print(x)

The error code I keep getting is: TypeError: unsupported operand type(s) for &: 'str' and 'int'

I need the whole line to be matched so I can split it and put it into a class.

Any advice? Thanks for your time.

If you're not married to regex, a simple if line.startswith('{"id":') and int(line[6:]) in statids:
– Adam Smooch, Commented Oct 19, 2021 at 20:56
You'll also want to open your file for reading properly - you may want to google Python3's open()
– Adam Smooch, Commented Oct 19, 2021 at 20:57
Is this a json file? Why not just load the file as a dictionary and filter by id
– RJ Adriaansen, Commented Oct 19, 2021 at 20:57
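
The non-regex suggestion in the comments can be sketched roughly as follows. This is only an illustration, not the commenter's exact code: statids holds strings, so the sketch compares text rather than converting to int, and it relies on str.startswith accepting a tuple of prefixes. The helper name matching_lines and the in-memory sample list are made up for the example; with a real file you would pass the open file object instead.

```python
statids = ["85", "106", "140", "172", "337"]

# One exact prefix per wanted id. The closing quote is part of the prefix,
# so "85" cannot accidentally match a line whose id is "850".
prefixes = tuple(f'{{"id": "{i}"' for i in statids)


def matching_lines(lines):
    """Yield each line (trailing newline removed) that starts with a wanted prefix."""
    for line in lines:
        if line.startswith(prefixes):
            yield line.rstrip("\n")


# Hypothetical sample data standing in for the contents of bdata.txt.
sample = ['{"id": "100" hello\n', '{"id": "850" no\n', '{"id": "85" bye\n']
result = list(matching_lines(sample))
print(result)
```

Because matching_lines takes any iterable of strings, the same function should work with `with open("bdata.txt") as file: ... matching_lines(file)`.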

Arvind Kumar Avinash · Accepted Answer · 2021-10-19 21:15:31Z

2

You can retrieve the id from the line using the regex ^\{"id": "(\d+)", where group 1 gives you the id. Then you can check whether that id is present in statids.

Demo:

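import re

statids = ["85", "106", "140", "172", "337"]

with open("bdata.txt") as file:
    for line in file:
        # Raw string so that \{ and \d reach the regex engine unchanged.
        search = re.search(r'^\{"id": "(\d+)"', line)
        if search:
            # Group 1 holds the digits captured between the quotes.
            line_id = search.group(1)
            if line_id in statids:
                print(line.rstrip())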

For the following sample content in the file:

{"id": "100" hello
{"id": "106" world
{"id": "2" hi
{"id": "85" bye
{"id": "10" ok
{"id": "140" good
{"id": "165" fine
{"id": "172" great
{"id": "337" morning
{"id": "16" evening

the output will be:

{"id": "106" world
{"id": "85" bye
{"id": "140" good
{"id": "172" great
{"id": "337" morning

edited Oct 19, 2021 at 21:15

answered Oct 19, 2021 at 21:08

Arvind Kumar Avinash


1 Comment

SGT Anonymous Over a year ago

That worked! So now how would I take those lines and insert them into a class? I only need part of the lines. I have the class created already.
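
One possible way to approach this follow-up is sketched below. The asker's class is not shown, so Record, its fields record_id and rest, and the helper parse_line are all hypothetical stand-ins; the sketch also assumes that everything after the quoted id is the part of the line worth keeping.

```python
import re
from dataclasses import dataclass

# Group 1: the id digits. Group 2: whatever follows the closing quote.
LINE_RE = re.compile(r'^\{"id": "(\d+)"\s*(.*)$')


@dataclass
class Record:
    """Hypothetical stand-in for the asker's existing class."""
    record_id: str
    rest: str


def parse_line(line):
    """Return a Record for a matching line, or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return Record(record_id=m.group(1), rest=m.group(2))


print(parse_line('{"id": "106" world\n'))
# Record(record_id='106', rest='world')
```

If only some fields of the remainder are needed, rest could be split further (for example with str.split) before building the object.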

Bryant Novas · Accepted Answer · 2021-10-19 21:32:31Z

0

I think the issue here is the way you're using re.findall. According to the docs, the first argument must be the regular expression and the second the string you want to match it against. In your case I think this is how you should do it:

import re

# f and statids are the variables defined in the question.
pattern = re.compile(r'^\{"id": "(' + "|".join(statids) + r')"')
with open(f) as file:
    for line in file:
        match = pattern.search(line)  # returns a match object or None
        if match:
            print(match.group(0))

In the regex, the pipe operator "|" works the same as "or", so joining all the ids into one string with | between them matches any line whose id is one of them. match.group(0) returns the text that matched; print line.rstrip() instead if you want the whole line.
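
As a rough, self-contained illustration of the alternation idea, the sketch below anchors the pattern at the start of the line and includes the surrounding quotes, so that an id such as "106" does not also match "1060". The sample lines are invented for the example, and re.escape is an extra precaution (not part of the answer above) in case an id ever contains a regex metacharacter.

```python
import re

statids = ["85", "106", "140", "172", "337"]

# Build 85|106|140|172|337, escaping each id defensively.
alternation = "|".join(re.escape(i) for i in statids)
pattern = re.compile(r'^\{"id": "(?:' + alternation + r')"')

# Hypothetical sample lines standing in for the file contents.
lines = [
    '{"id": "100" hello',
    '{"id": "106" world',
    '{"id": "1060" nope',
    '{"id": "85" bye',
]

matches = [line for line in lines if pattern.match(line)]
print(matches)
# ['{"id": "106" world', '{"id": "85" bye']
```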

edited Oct 19, 2021 at 21:32

answered Oct 19, 2021 at 21:17

Bryant Novas
