
I have a bytes string like this:

msg = b'@\x06string\x083http://schemas.microsoft.com/2003/10/Serialization/\x9a\x05\x18{"PUID":"9279565","Title":"Risk Manager","Description":"<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,"}\x01'

The substring {"PUID":"9279565","Title":"Risk Manager","Description":"<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,"} is valid JSON, so I came up with the following code to strip the surrounding garbage bytes from msg:

>>> x1 = msg.split(b'{"', 1)[1]
>>> x1
b'PUID":"9279565","Title":"Risk Manager","Description":"<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,"}\x01'
>>> x2 = x1[::-1].split(b'}"', 1)[1][::-1]
>>> x2
b'PUID":"9279565","Title":"Risk Manager","Description":"<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,'
>>> final_msg = b'{"%s"}'%x2
>>> final_msg
b'{"PUID":"9279565","Title":"Risk Manager","Description":"<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,"}'
>>> import json
>>> json.loads(final_msg)
{'Description': "<strong>Risk Manager </strong><br />\\n<br />\\nLentech, Inc. is currently seekinga Risk Manager inGreenbelt,'", 'Title': 'Risk Manager', "b'PUID": '9279565'}

This feels like a bad way of doing what is required, and I would like to know a cleaner, more efficient way of achieving the same result. I think a regex could help here, but I have very limited knowledge of regular expressions.

Thanks in advance

  • Azure Service Bus? Commented Jul 7, 2017 at 8:16
  • Yes, you are right :) Commented Jul 7, 2017 at 8:20
  • Please always explain your context to prevent XY problems. Commented Jul 7, 2017 at 8:24
  • There is nothing wrong with what you are doing. You just got a messy response (probably not intended to be consumed as JSON), so you have to deal with messy ways of extracting the data you need. Commented Jul 7, 2017 at 8:41
  • I already asked about the problem here: stackoverflow.com/questions/44647351/…. We have decided to go with the 3rd case, as using the HTTP protocol has its own limitations. Commented Jul 7, 2017 at 8:42

2 Answers

1

There you go:

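import re
# msg is bytes, so the pattern must be bytes as well; the greedy .* runs from the first '{' to the last '}'
final_msg = re.search(rb"\{.*\}", msg, re.DOTALL).group(0)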

1 Comment

Just be aware that this won't work with a nested dictionary or with multiple JSON objects in one string.
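For the multiple-objects case raised in this comment, one option is to let the json module find where each object ends instead of guessing with a pattern. The following is only a sketch: the helper name extract_json_objects is hypothetical, and it assumes the JSON payload is UTF-8 while the surrounding framing bytes can be discarded. It relies on json.JSONDecoder.raw_decode, which parses one value starting at a given index and returns the value together with the index where it stopped.

```python
import json


def extract_json_objects(raw: bytes):
    """Yield each JSON object embedded in raw (hypothetical helper)."""
    # Assumption: the payload is UTF-8; undecodable framing bytes are replaced.
    text = raw.decode("utf-8", errors="replace")
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode returns (parsed value, index just past the value)
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # This '{' did not start a valid object; try the next one
            pos = text.find("{", pos + 1)
            continue
        yield obj
        # Continue searching after the object that was just parsed
        pos = text.find("{", end)
```

Because the parser tracks the matching braces itself, nested dictionaries are returned as part of their parent object, and several objects in one message come back one by one, for example via list(extract_json_objects(msg)).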
0

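First, decode the bytes to a string. Calling str(msg) on a bytes object returns its repr (including the b'...' prefix and doubled backslashes), so use decode() instead:

# errors='replace' stops undecodable framing bytes such as \x9a from raising
msg = msg.decode('utf-8', errors='replace')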

After that, you can write a generator function that uses enumerate to yield the indices of the symbols you are searching for:

def gen_index(a_string):
    # First pass: yield the index of every '{'
    for i, symbol in enumerate(a_string):
        if symbol == '{':
            yield i
    # Second pass: yield the index of every '}'
    for j, symbol in enumerate(a_string):
        if symbol == '}':
            yield j

 >>> import json
 >>> a = list(gen_index(msg))  # indices of every '{' followed by indices of every '}'
 >>> # Slice from the first occurrence of '{' to the last occurrence of '}' and parse it
 >>> json_output = json.loads(msg[a[0]:a[-1] + 1])
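Since only the first '{' and the last '}' are actually used, the same slice can be taken directly with find and rfind, without building a list of indices. This is a minimal sketch rather than a drop-in replacement: the function name slice_outer_braces is made up for illustration, and it assumes the message contains exactly one top-level JSON object and Python 3.6 or later, where json.loads accepts bytes.

```python
import json


def slice_outer_braces(raw: bytes):
    """Parse the span from the first '{' to the last '}' (hypothetical helper)."""
    start = raw.find(b"{")   # index of the first opening brace, or -1
    end = raw.rfind(b"}")    # index of the last closing brace, or -1
    if start == -1 or end < start:
        raise ValueError("no JSON object found in message")
    # json.loads accepts bytes on Python 3.6+, so no explicit decode is needed
    return json.loads(raw[start:end + 1])
```

Working on the bytes directly also avoids having to decode the binary framing around the JSON at all.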

1 Comment

Hopefully this takes care of the edge case where there is a dictionary nested inside the JSON. It might work.
