I have a description field embedded in JSON, and I'm unable to use JSON libraries to parse this data.

I use {0,23} in an attempt to extract the first 23 characters of the string. How can I extract the entire value associated with description?

    import re

    description = "\"description\" : \"this is a tesdt \n another test\" "

    re.findall(r'description(?:\w+){0,23}', description, re.IGNORECASE)

The code above returns just ['description'].

  • 1
    There are no characters matching \w immediately after description so this is completely expected. Perhaps you are looking for .{0,23}? Commented Apr 23, 2018 at 16:35
  • 1
    Even if you are unable to import json (but why??) using regex for this seems misdirected, especially if you are unfamiliar with regex. Commented Apr 23, 2018 at 16:38
  • 2
    It may be helpful to know why you can't use any JSON libraries. Commented Apr 23, 2018 at 16:43
  • 1
    In case the problem with JSON libraries is that the JSON is embedded in a larger document like a webpage and you don't know how to parse only the JSON, check out github.com/alexmojaki/jsonfinder Commented Apr 23, 2018 at 17:02
  • 1
    This is a typical bad question. "Have some problem (which is not demonstrated in the question) and I want to solve it with a regex". A regex is obviously the wrong approach here. Commented Apr 23, 2018 at 17:10
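
For the case raised in the comments, where the JSON is embedded in a larger document, here is a minimal sketch that uses only the standard library's json.JSONDecoder.raw_decode to pull out objects that start at a '{'. The helper name find_json_objects and the sample page string are made up for illustration, and this assumes the json module is actually available in your environment.

```python
import json


def find_json_objects(text):
    """Yield every JSON object that can be decoded starting at a '{' in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON document beginning at index `start`
            # and returns the parsed value plus the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; keep scanning after it.
            pos = start + 1
            continue
        yield obj
        pos = end


# Hypothetical input: JSON surrounded by unrelated text.
page = 'junk before {"description": "this is a test \\n another test"} junk after'

descriptions = [
    obj["description"]
    for obj in find_json_objects(page)
    if "description" in obj
]
print(descriptions)
```

This only looks at top-level keys of each decoded object; nested structures would need a recursive walk.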

2 Answers

1

You could try this code out:

import re

description = "description\" : \"this is a tesdt \n another test\" "

result = re.findall(r'(?<=description")(?:\s*:\s*)(".*?")', description, re.IGNORECASE | re.DOTALL)[0]

print(result)

Which gives you the result of:

"this is a tesdt 
 another test"

Which is essentially:

\"this is a tesdt \n another test\"

And is what you have asked for in the comments.


Explanation:

(?<=description") is a positive look-behind that tells the regex to match the text preceded by description"
(?:\s*:\s*) is a non-capturing group that tells the regex that description" will be followed by zero-or-more whitespace characters, a colon (:) and again zero-or-more whitespace characters.
(".*?") is the actual match desired, which consists of a double quote ("), as few characters as possible up to the next double quote, and that closing double quote ("). The quantifier is unbounded rather than {0,23} because the value here is 30 characters long, so a limit of 23 would never reach the closing quote. re.DOTALL lets . match the newline inside the value.
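
To see how the quantifier affects the match, here is a small sketch that runs a bounded and an unbounded lazy quantifier against the same sample string. The variable names bounded and lazy are just for illustration, and the look-behind is dropped to keep the patterns short.

```python
import re

description = "description\" : \"this is a tesdt \n another test\" "

# Bounded: allows at most 23 characters between the two double quotes.
# The value in the sample is 30 characters long, so nothing matches.
bounded = re.findall(r'description"\s*:\s*(".{0,23}?")', description, re.DOTALL)

# Lazy and unbounded: everything up to the first following double quote.
lazy = re.findall(r'description"\s*:\s*(".*?")', description, re.DOTALL)

print(bounded)
print(lazy)
```

Note that the lazy version stops at the first double quote it sees, so it would cut off a value that contains an escaped quote (\").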


10 Comments

How do I match until a double quote is met?
@blue-sky You'll have to elaborate on that, because in your sample input, description is immediately followed by a double quotation mark.
@hek2mgl I apologize for using regex here. I have heard, from comments on my previous answers, that using a JSON library is better than regex in such cases. However, I know absolutely nothing about JSON or its libraries, and I am accustomed to using regex. The question seemed simple enough, so I used regex in my answer.
@hek2mgl I thought that since the OP tagged the question with regex, he would be familiar with it, and I also saw some comments telling him to use JSON Libraries. So I thought I might as well add whatever limited information I knew as an answer, solve his problem, and then he would also be able to later learn about JSON Parsing.
@hek2mgl Perhaps you could add an answer using JSON, and tell him how it would be easier to use that instead of regex. I'm sure that would be better appreciated :)
0
# First just creating some test JSON

import json

data = {
    'items': [
        {
            'description': 'A "good" thing',

            # This is ignored because I'm assuming we only want the exact key 'description'
            'full_description': 'Not a good thing'
        },
        {
            'description': 'Test some slashes: \\ \\\\ \" // \\/ \n\r',
        },
    ]
}

j = json.dumps(data)

print(j)

# The actual code

import re

pattern = r'"description"\s*:\s*("(?:\\"|[^"])*?")'
descriptions = [

    # I'm using json.loads just to parse the matched string to interpret
    # escapes properly. If this is not acceptable then ast.literal_eval
    # will probably also work
    json.loads(d)
    for d in re.findall(pattern, j)]

# Testing that it works

assert descriptions == [item['description'] for item in data['items']]
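
One edge case worth knowing about: the pattern above treats any backslash followed by a double quote as an escaped quote, even when that backslash is itself escaped, so a value that ends in a backslash can make the match run past its closing quote. Below is a sketch of a stricter variant that consumes escapes in pairs (\\. matches a backslash plus whatever follows it). The sample data and the names naive, strict and found are made up for illustration.

```python
import json
import re

# Hypothetical test data: the first description ends with a backslash.
data = {
    "items": [
        {"description": "ends with a backslash \\", "note": "a \"quoted\" note"},
        {"description": "say \"hi\"\n"},
    ]
}
text = json.dumps(data)

# Pattern from the answer above, for comparison.
naive = r'"description"\s*:\s*("(?:\\"|[^"])*?")'

# Stricter variant: either an escape sequence (backslash + any character)
# or any character that is neither a double quote nor a backslash.
strict = r'"description"\s*:\s*("(?:\\.|[^"\\])*")'

naive_first = re.findall(naive, text)[0]
strict_matches = re.findall(strict, text)

# json.loads is used only to interpret the escapes inside each matched string.
found = [json.loads(m) for m in strict_matches]

print(naive_first)     # runs past the closing quote of the first value
print(strict_matches)
assert found == [item["description"] for item in data["items"]]
```

This is still a regex over JSON text, so it shares the general caveat from the comments: it does not know about nesting, and it will also match a "description" key at any depth.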

1 Comment

Honestly, what's the point here? You encourage the OP to parse json with regular expressions?
