I have some data that are not properly saved in an old database. I am moving the system to a new database and reformatting the old data as well. The old data looks like this:

a:10:{
    s:7:"step_no";s:1:"1";
    s:9:"YOUR_NAME";s:14:"Firtname Lastname";
    s:11:"CITIZENSHIP"; s:7:"Indian";
    s:22:"PROPOSE_NAME_BUSINESS1"; s:12:"ABC Limited";
    s:22:"PROPOSE_NAME_BUSINESS2"; s:15:"XYZ Investment";
    s:22:"PROPOSE_NAME_BUSINESS3";s:0:"";
    s:22:"PROPOSE_NAME_BUSINESS4";s:0:"";
    s:23:"PURPOSE_NATURE_BUSINESS";s:15:"Some dummy content";
    s:15:"CAPITAL_COMPANY";s:24:"20 Million Capital";
    s:14:"ANOTHER_AMOUNT";s:0:"";
}

I want the new format to be proper JSON so I can read it in Python, just like this:

data = {
    "step_no": "1",
    "YOUR_NAME":"Firtname Lastname",
    "CITIZENSHIP":"Indian",
    "PROPOSE_NAME_BUSINESS1":"ABC Limited",
    "PROPOSE_NAME_BUSINESS2":"XYZ Investment",
    "PROPOSE_NAME_BUSINESS3":"",
    "PROPOSE_NAME_BUSINESS4":"",
    "PURPOSE_NATURE_BUSINESS":"Some dummy content",
    "CAPITAL_COMPANY":"20 Million Capital",
    "ANOTHER_AMOUNT":""
}

I am thinking that using a regex to strip out the unwanted parts and reformatting the content using the names in caps would work, but I don't know how to go about this.

1 Answer

Regexes would be the wrong approach here. There is no need, and the format is a little more complex than you assume it is.
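As a rough illustration of that extra complexity: every string in this format is prefixed with its length in bytes, so a value is free to contain quotes and semicolons. The sketch below uses a made-up `note` entry and a hypothetical `read_string()` helper (neither is from your data or from any library) to compare a naive regex with a length-aware read:

```python
import re

# Made-up value containing the same '";' sequence that terminates a string token.
value = 'say "hi"; ok'
# The length prefix counts bytes, not characters.
payload = 'a:1:{s:4:"note";s:%d:"%s";}' % (len(value.encode("utf-8")), value)

# Naive approach: capture everything between s:N:" and the next ";
naive = re.findall(r's:\d+:"(.*?)";', payload)

def read_string(data, pos):
    """Read one s:N:"..."; token starting at pos; return (text, next_pos)."""
    assert data.startswith(b"s:", pos)
    colon = data.index(b":", pos + 2)
    length = int(data[pos + 2:colon])
    start = colon + 2            # skip the colon and the opening quote
    end = start + length         # trust the declared byte length
    return data[start:end].decode("utf-8"), end + 2  # skip closing quote and semicolon

raw = payload.encode("utf-8")
key, pos = read_string(raw, raw.index(b"s:"))
val, _ = read_string(raw, pos)

print(naive)       # the regex cuts the value short at the embedded ";
print([key, val])  # the length-aware read recovers the full value
```

This only handles string tokens; nested arrays, integers, booleans and objects add further cases, which is why a real parser is the safer choice.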

You have data in the PHP serialize format. You can trivially deserialise it in Python with the phpserialize library:

import phpserialize
import json

def fixup_php_arrays(o):
    if isinstance(o, dict):
        if isinstance(next(iter(o), None), int):
            # PHP has no lists, only mappings; produce a list for
            # a dictionary with integer keys to 'repair'
            return [fixup_php_arrays(o[i]) for i in range(len(o))]
        return {k: fixup_php_arrays(v) for k, v in o.items()}
    return o

json.dumps(fixup_php_arrays(phpserialize.loads(yourdata, decode_strings=True)))

Note that PHP strings are byte strings, not Unicode text, so especially in Python 3 you'd have to decode your key-value pairs after the fact if you want to be able to re-encode to JSON. The decode_strings=True flag takes care of this for you. The default codec is UTF-8; pass in a charset argument to pick a different one.
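To see why the decoding step matters, here is a minimal sketch that does not use phpserialize at all. The `raw` dictionary is a hand-written stand-in for an undecoded result, and `decode_bytes()` is a hypothetical helper showing what decoding "after the fact" would look like:

```python
import json

# Hand-written stand-in for a deserialized PHP array whose strings
# were left as bytes (an assumption for illustration only).
raw = {b"step_no": b"1", b"CITIZENSHIP": b"Indian"}

try:
    json.dumps(raw)
except TypeError as exc:
    # The json module refuses bytes keys (and bytes values).
    print("json.dumps failed:", exc)

def decode_bytes(o, charset="utf-8"):
    """Recursively decode bytes keys and values to str."""
    if isinstance(o, bytes):
        return o.decode(charset)
    if isinstance(o, dict):
        return {decode_bytes(k, charset): decode_bytes(v, charset)
                for k, v in o.items()}
    return o

decoded = decode_bytes(raw)
print(json.dumps(decoded))
```

Letting the library decode for you is simpler; the manual version is mainly useful if different fields turn out to use different codecs.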

PHP also uses arrays for sequences, so you may have to convert any decoded dict object with integer keys to a list first, which is what the fixup_php_arrays() function does.
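Your sample has no nested sequences, so here is a small sketch of the conversion on a made-up nested structure. It uses a slightly more defensive variant of the function that only converts a dictionary whose keys are exactly 0..n-1, so sparse PHP arrays stay dictionaries instead of raising a KeyError; that stricter check is my assumption about what you'd want, not something PHP requires:

```python
import json

def fixup_php_arrays(o):
    """Recursively turn dicts keyed 0..n-1 into lists; leave other dicts alone."""
    if isinstance(o, dict):
        keys_are_ints = bool(o) and all(isinstance(k, int) for k in o)
        if keys_are_ints and sorted(o) == list(range(len(o))):
            return [fixup_php_arrays(o[i]) for i in range(len(o))]
        return {k: fixup_php_arrays(v) for k, v in o.items()}
    return o

# Made-up example of a decoded PHP array containing a nested sequence
# and a sparse array (keys 2 and 5).
decoded = {
    "step_no": "1",
    "names": {0: "ABC Limited", 1: "XYZ Investment"},
    "sparse": {2: "x", 5: "y"},
}

fixed = fixup_php_arrays(decoded)
print(json.dumps(fixed, indent=4))
```

Here `names` becomes a JSON array, while `sparse` stays an object (json.dumps writes its integer keys as strings).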

Demo (with repaired data; many string lengths were off and whitespace had been added):

>>> import phpserialize, json
>>> from pprint import pprint
>>> data = b'a:10:{s:7:"step_no";s:1:"1";s:9:"YOUR_NAME";s:18:"Firstname Lastname";s:11:"CITIZENSHIP";s:6:"Indian";s:22:"PROPOSE_NAME_BUSINESS1";s:11:"ABC Limited";s:22:"PROPOSE_NAME_BUSINESS2";s:14:"XYZ Investment";s:22:"PROPOSE_NAME_BUSINESS3";s:0:"";s:22:"PROPOSE_NAME_BUSINESS4";s:0:"";s:23:"PURPOSE_NATURE_BUSINESS";s:18:"Some dummy content";s:15:"CAPITAL_COMPANY";s:18:"20 Million Capital";s:14:"ANOTHER_AMOUNT";s:0:"";}'
>>> pprint(phpserialize.loads(data, decode_strings=True))
{'ANOTHER_AMOUNT': '',
 'CAPITAL_COMPANY': '20 Million Capital',
 'CITIZENSHIP': 'Indian',
 'PROPOSE_NAME_BUSINESS1': 'ABC Limited',
 'PROPOSE_NAME_BUSINESS2': 'XYZ Investment',
 'PROPOSE_NAME_BUSINESS3': '',
 'PROPOSE_NAME_BUSINESS4': '',
 'PURPOSE_NATURE_BUSINESS': 'Some dummy content',
 'YOUR_NAME': 'Firstname Lastname',
 'step_no': '1'}
>>> print(json.dumps(phpserialize.loads(data, decode_strings=True), sort_keys=True, indent=4))
{
    "ANOTHER_AMOUNT": "",
    "CAPITAL_COMPANY": "20 Million Capital",
    "CITIZENSHIP": "Indian",
    "PROPOSE_NAME_BUSINESS1": "ABC Limited",
    "PROPOSE_NAME_BUSINESS2": "XYZ Investment",
    "PROPOSE_NAME_BUSINESS3": "",
    "PROPOSE_NAME_BUSINESS4": "",
    "PURPOSE_NATURE_BUSINESS": "Some dummy content",
    "YOUR_NAME": "Firstname Lastname",
    "step_no": "1"
}