I have a file with many lines. Each line starts with {"id": followed by the id number in quotes (e.g. {"id": "106"). I am trying to use regex to search the whole document line by line and print the lines that match 5 different id values. To do this, I made a list of the ids and want to iterate through it, matching only lines that start with {"id": "(id number from list)". I am really confused about how to do this. Here is what I have so far:

f= "bdata.txt"    
statids = ["85", "106", "140", "172" , "337"] 
x= re.findall('{"id":', statids, 'f')
for line in open(file):
            print(x)

The error I keep getting is: TypeError: unsupported operand type(s) for &: 'str' and 'int'

I need the whole line to be matched so I can split it and put it into a class.

Any advice? Thanks for your time.

  • If you're not married to regex, a simple check would do: test line.startswith('{"id": "') and then check whether the quoted id that follows is in statids. Commented Oct 19, 2021 at 20:56
  • You'll also want to open your file for reading properly - you may want to google Python3's open() Commented Oct 19, 2021 at 20:57
  • Is this a json file? Why not just load the file as a dictionary and filter by id Commented Oct 19, 2021 at 20:57
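
The comments above hint at two regex-free routes. A minimal sketch of both follows, assuming every line begins exactly with {"id": " and, for the second function, that each line is a complete JSON object (the question does not confirm this). The function names and the sample lines are made up for illustration.

```python
import json

STATIDS = {"85", "106", "140", "172", "337"}  # ids from the question, kept as strings
PREFIX = '{"id": "'  # assumed exact start of every line


def matches_by_prefix(line, ids=STATIDS):
    """Return True if the line starts with {"id": "<one of ids>"."""
    if not line.startswith(PREFIX):
        return False
    # The id is everything between the prefix and the next double quote.
    id_value, quote, _rest = line[len(PREFIX):].partition('"')
    return quote == '"' and id_value in ids


def matches_by_json(line, ids=STATIDS):
    """Return True if the line is a JSON object whose "id" is in ids."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False  # not valid JSON, so this approach does not apply
    if not isinstance(record, dict):
        return False
    id_value = record.get("id")
    return isinstance(id_value, str) and id_value in ids


# Hypothetical sample lines; "1060" must not be mistaken for "106".
sample = ['{"id": "106", "name": "a"}', '{"id": "1060", "name": "b"}', 'junk']
prefix_hits = [s for s in sample if matches_by_prefix(s)]
json_hits = [s for s in sample if matches_by_json(s)]
```

The prefix check works on any line that merely starts the right way, while the JSON route only helps if the file really is one JSON object per line.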

2 Answers

You can retrieve the id from the line using the regex ^\{"id": "(\d+)", where group 1 captures the id. Then you can check whether that id is present in statids.

Demo:

import re

statids = ["85", "106", "140", "172", "337"]

with open("bdata.txt") as file:
    for line in file:
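        # Raw string so the backslashes reach the regex engine unchanged
        search = re.search(r'^\{"id": "(\d+)"', line)
        if search:
            line_id = search.group(1)  # avoid shadowing the built-in id()
            if line_id in statids:
                print(line.rstrip())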

For the following sample content in the file:

{"id": "100" hello
{"id": "106" world
{"id": "2" hi
{"id": "85" bye
{"id": "10" ok
{"id": "140" good
{"id": "165" fine
{"id": "172" great
{"id": "337" morning
{"id": "16" evening

the output will be:

{"id": "106" world
{"id": "85" bye
{"id": "140" good
{"id": "172" great
{"id": "337" morning

1 Comment

That worked! So now how would I take those lines and insert them into a class? I only need part of the lines. I have the class created already.
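
One way to approach this follow-up is sketched below. The asker's class and the full line format are not shown, so Record, its fields, and parse_line are hypothetical stand-ins: the sketch keeps only the id and whatever text follows it on the line.

```python
import re
from dataclasses import dataclass

STATIDS = {"85", "106", "140", "172", "337"}
# Group 1 is the id, group 2 is the remainder of the line (format assumed).
LINE_RE = re.compile(r'^\{"id": "(\d+)"\s*(.*)$')


@dataclass
class Record:  # hypothetical stand-in for the asker's existing class
    id: str
    rest: str


def parse_line(line, ids=STATIDS):
    """Return a Record for a wanted id, or None if the line does not qualify."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None or m.group(1) not in ids:
        return None
    return Record(id=m.group(1), rest=m.group(2))


# Lines taken from the sample content in the answer above.
lines = ['{"id": "106" world\n', '{"id": "2" hi\n', '{"id": "85" bye\n']
records = [r for r in map(parse_line, lines) if r is not None]
```

If only part of each line is needed, the second capture group is the place to narrow down what gets stored.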

I think the issue here is the way you're using re.findall. According to the docs, the first argument is the regular expression, the second is the string to search, and the optional third is a flags value. Passing the list statids as the string and 'f' as the flags is the likely source of the TypeError. In your case I think this is how you should do it:

import re

# Anchor at the line start and require the closing quote so "85" cannot match "850"
pattern = re.compile(r'^\{"id": "(?:' + "|".join(statids) + r')"')
with open(f) as file:
    for line in file:
        if pattern.match(line):
            print(line.rstrip())

In the regex, the pipe operator | works like "or", so joining all the ids into one string with | between them matches a line that has any one of the ids. pattern.match returns a match object when the line starts with the pattern and None otherwise, so only the matching lines are printed.
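
To illustrate the alternation idea, here is a small sketch run on in-memory strings rather than the file. It assumes the {"id": "<number>" prefix from the question, and build_pattern is a made-up helper name. Anchoring with ^ and requiring the closing quote keeps "85" from also matching "850", and re.escape guards against ids that contain regex metacharacters.

```python
import re


def build_pattern(ids):
    """Compile a regex matching lines that start with {"id": "<one of ids>"."""
    alternatives = "|".join(re.escape(i) for i in ids)
    return re.compile(r'^\{"id": "(?:' + alternatives + r')"')


pattern = build_pattern(["85", "106", "140", "172", "337"])

# Hypothetical test lines: only the first starts with a wanted id.
lines = [
    '{"id": "85" bye',
    '{"id": "850" no',
    '{"id": "16" evening',
    'x {"id": "106"',
]
matched = [line for line in lines if pattern.match(line)]
print(matched)
```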
