
I am trying to find and extract a pattern from a JSON file. When I test it on a dict like the one below, the pattern is found and printed, because json.dumps turns the dict into a string:

    import json
    import re

    my_mi = {"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}
    new = json.dumps(my_mi)  # a dict in, a JSON-formatted str out
    my_id = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', new)
    print(my_id)

The problem is that when I try to use the actual JSON file, I can't convert it into something the regex will work with. The following throws the error "TypeError: <open file 'resTwo.json', mode 'r' at 0x1109eee40> is not JSON serializable":

    with open("resTwo.json", "r") as input_file:
        new = json.dumps(input_file)  # this line raises the TypeError

        my_id = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', new)
        print(my_id)

I thought json.dumps converted its argument into a string, so the regex would then work as in the test example. Why doesn't it?
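`json.dumps` works in the opposite direction from what is needed here: it takes a Python object such as a dict, list or str and serializes it to a JSON string. A file object is not one of the types it knows how to serialize, hence the TypeError (the exact wording differs between Python 2 and 3). To get a file's contents as a string, read it. A minimal Python 3 sketch, assuming the file holds the same JSON as the test dict; the temporary file only stands in for `resTwo.json`:

```python
import json
import os
import re
import tempfile

ID_PATTERN = r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}'

# json.dumps takes a Python object (here a dict) and returns a JSON-formatted str
record = {"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}
as_text = json.dumps(record)

# Write the same JSON to a temporary file that stands in for resTwo.json
path = os.path.join(tempfile.mkdtemp(), "resTwo.json")
with open(path, "w") as f:
    f.write(as_text)

with open(path) as input_file:
    try:
        json.dumps(input_file)  # a file object is not a type json can serialize
    except TypeError as exc:
        print("json.dumps(input_file) failed:", exc)
    file_text = input_file.read()  # read() returns the file's contents as a str

print(re.findall(ID_PATTERN, file_text))
```

Once `file_text` is a plain string, the regex from the test example works on it unchanged.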

2 Answers


The rows returned by a csv reader object are lists, but re.findall expects a string as its second argument.

Either specify which field you want the regex to match on, or add another for loop to iterate through each of the fields (i.e. iterate over the row).
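A sketch of both options in Python 3. The two-column CSV content is hypothetical and `io.StringIO` stands in for the real file; the point is that each row is a list of strings, so `re.findall` has to be given one element of the row rather than the row itself:

```python
import csv
import io
import re

ID_PATTERN = re.compile(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}')

# Hypothetical CSV content; io.StringIO stands in for the real file
data = io.StringIO("name,href\nclip,/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata\n")
reader = csv.reader(data)
next(reader)  # skip the header row

for row in reader:
    # row is a list of str, one element per column
    # Option 1: match on a single known field
    by_field = ID_PATTERN.findall(row[1])
    # Option 2: iterate over every field in the row and collect all matches
    by_all_fields = [m for field in row for m in ID_PATTERN.findall(field)]
    print(by_field, by_all_fields)
```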


4 Comments

So the string I want is in row[0], but when I print it out it looks like this: {"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}} So if I want the regex to match on that field, do I need to convert it to a string first? Even if I iterate through the rows, they still aren't in the format findall expects (a string), correct? I'd like more explanation of how to get the data into a form the regex can match. Thanks.
That doesn't look much like a CSV file.
It's JSON, but the script it comes from saves it as a CSV file. If it were JSON to begin with, would there be a simpler solution? I tried working with it as JSON but didn't manage to turn it into a string, or anything else the regex could work with.
I'm going to rework the question as JSON rather than CSV, because that makes more sense given its syntax.
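If the file really is JSON, `json.dumps` is not needed at all: `json.load` (no "s") parses an open file into Python dicts, lists and strings, and the href is then an ordinary string that the regex can match. A minimal Python 3 sketch, assuming every file has the `_links` → `self` → `href` structure shown in the comment above; `io.StringIO` stands in for `open("resTwo.json")`:

```python
import io
import json
import re

ID_PATTERN = re.compile(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}')

# io.StringIO stands in for open("resTwo.json")
input_file = io.StringIO(
    '{"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}'
)

# json.load parses the file object into nested dicts
data = json.load(input_file)
href = data["_links"]["self"]["href"]  # a plain str

media_id = ID_PATTERN.findall(href)
print(media_id)

# Alternative without a regex, assuming the ID is always the path segment after "media"
segments = href.strip("/").split("/")
by_path = segments[segments.index("media") + 1]
print(by_path)
```

Parsing first means the regex only ever sees the href, not the whole document, and the path-splitting variant avoids the regex entirely as long as the URL layout stays the same.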

I solved it with this:

    import re
    with open("resTwo.json", "r") as input_file:
        for value in input_file:  # each line of the file is already a str
            mediaid = re.findall(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}', value)
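This works because iterating over a file object yields each line as a str, which is exactly what `re.findall` needs. One caveat: `mediaid` is reassigned on every pass, so after the loop it only holds the matches from the last line. A sketch that collects matches from every line instead and compiles the pattern once; `find_media_ids` is a hypothetical helper name, and the second record in the sample data is made up for illustration:

```python
import io
import re

ID_PATTERN = re.compile(r'\w{1,9}\-\w{1,5}\-\w{1,5}\-\w{1,5}\-\w{1,13}')


def find_media_ids(lines):
    """Collect regex matches from every line (each line is already a str)."""
    found = []
    for value in lines:
        found.extend(ID_PATTERN.findall(value))
    return found


# Hypothetical two-record sample; with a real file, pass open("resTwo.json") instead
sample = io.StringIO(
    '{"_links": {"self": {"href": "/xx-beta/media/111ee111-1e11-11a1-b111-111bb11b0ada/metadata"}}}\n'
    '{"_links": {"self": {"href": "/xx-beta/media/222ee222-2e22-22a2-b222-222bb22b0ada/metadata"}}}\n'
)
media_ids = find_media_ids(sample)
print(media_ids)
```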
