converting text file to json in python

Question

I have multiple documents that together are approximately 400 GB and I want to convert them to json format in order to drop to elasticsearch for analysis.

Each file is approximately 200 MB.

Original file looked like:

IUGJHHGF@BERLIN:lhfrjy
0t7yfudf@WARSAW:qweokm246
0t7yfudf@CRACOW:Er747474
0t7yfudf@cracow:kui666666
000t7yf@Vienna:1йй2ц2й2цй2цц3у

It has the characters that are not only English. key1 is always separated with @, where city was separated either by ; or :

After I have parsed it with code:

#!/usr/bin/env python

# coding: utf8
import json


with open('2') as f:
   for line in f:
      s1 = line.find("@")
      rest = line[s1+1:]
      if rest.find(";") != -1:
         if rest.find(":") != -1:
            print "FOUND BOTH : ; "
            s2 = -0
         else:
            s2 = s1+1+rest.find(";")
      elif rest.find(":") != -1:
         s2 = s1+1+rest.find(":")
      else:
         print "FOUND NO : ; "
         s2 = -0

      key1 = line[:s1]
      city = line[s1+1:s2]
      description = line[s2+1:len(line)-1]

All file looks like:

RRS12345 Cracow Sunflowers
RRD12345 Berin Data

After that parsing I want to have the output:

  {  
   "location_data":[  
      {  
         "key1":"RRS12345",
         "city":"Cracow",
         "description":"Sunflowers"
      },
      {  
         "key1":"RRD123dsd45",
         "city":"Berlin",
         "description":"Data"
      },
      {  
         "key1":"RRD123dsds45",
         "city":"Berlin",
         "description":"1йй2ц2й2цй2цц3у"
      }
   ]
}

How can I convert it to the required json format quickly, where we do not have only English characters?

Can you show what you tried and describe how exactly it failed? — glibdud
– glibdud, Commented May 23, 2018 at 12:10
Do you need to use Python in particular, or would a faster non-Python solution do? — AKX
– AKX, Commented May 23, 2018 at 12:10
Do any of the cities have spaces in their names? Or spaces in the descriptions? If so, what does that look like? — PaulMcG
– PaulMcG, Commented May 23, 2018 at 12:29
I could do theoretically print at the end of the script that I have wrote and force that json syntax manually, but that is just so dump solution. — creed
– creed, Commented May 23, 2018 at 12:43

Vitalii Safin · Accepted Answer · 2018-05-23 12:17:12Z

3

import json


def process_text_to_json():
    location_data = []
    with open("file.txt") as f:
        for line in f:
            line = line.split()
            location_data.append({"key1": line[0], "city": line[1], "description": line[2]})

    location_data = {"location_data": location_data}
    return json.dumps(location_data)

Output sample:

{"location_data": [{"city": "Cracow", "key1": "RRS12345", "description": "Sunflowers"}, {"city": "Berin", "key1": "RRD12345", "description": "Data"}, {"city": "Cracow2", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin2", "key1": "RRD12346", "description": "Data"}, {"city": "Cracow3", "key1": "RRS12346", "description": "Sunflowers"}, {"city": "Berin3", "key1": "RRD12346", "description": "Data"}]}

answered May 23, 2018 at 12:17

Vitalii Safin

764 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Rakesh · Accepted Answer · 2018-05-23 12:10:35Z

0

Iterate over each line and form your dict.

Ex:

d = {"location_data":[]}
with open(filename, "r") as infile:
    for line in infile:
        val = line.split()
        d["location_data"].append({"key1": val[0], "city": val[1], "description": val[2]})

print(d)

answered May 23, 2018 at 12:10

Rakesh

82.9k17 gold badges85 silver badges122 bronze badges

Collectives™ on Stack Overflow

converting text file to json in python

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related