
My JSON file looks like this, one object per line (but with many more lines like these):

{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter.", "timestamp": "2018-01-20T18:56:35Z", "url": "http://proarslausitz.de/1.html"}
{"text": "Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}

I want to create a .txt file containing only the value of each "text" field. So the result would be just:

Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.\nKunst. Und so weiter. Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende.

Nothing else: no keys, no other fields. I think the encoding (because of the umlauts) will be easy to sort out afterwards. As for extracting the text, I know I can do:

import json

json_object = json.loads(json_object_string)
print(json_object["text"])

But that only handles a single line. Do I need to iterate over the lines? How can I merge all the texts into a single .txt file?
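On the encoding point: `json.loads` already decodes JSON escape sequences, so `\u00f6` becomes `ö` and `\n` becomes a real line break in the resulting Python string. What remains is to write the output file with an explicit encoding. A minimal sketch, assuming UTF-8 output is wanted (the output name `out.txt` is hypothetical):

```python
import json

# One line of the input exactly as it appears in the file (JSON escapes intact).
raw = r'{"text": "Bildnummer: 79800031\nVektorgrafikSkalieren Sie ohne Aufl\u00f6sungsverlust auf jede beliebige. Ende."}'

record = json.loads(raw)
# json.loads has already turned \u00f6 into "ö" and \n into a newline character.
text = record["text"]

# An explicit encoding keeps the umlaut intact regardless of the platform default.
with open("out.txt", "w", encoding="utf-8") as f:
    f.write(text)
```

Note that the written file therefore contains an actual line break where the JSON had `\n`, not the two characters backslash and n.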

Comment: just iterate over the lines (commented Nov 29, 2021 at 0:46)
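Following that suggestion: the input has one JSON object per line (the JSON Lines layout), so each line can be parsed on its own with `json.loads` and its `"text"` value written out. A minimal sketch, assuming hypothetical file names, that blank lines and records without a `"text"` key should be skipped, and that texts are separated by a space as in the desired output above:

```python
import json

def extract_texts(src_path, dst_path, sep=" "):
    """Write the "text" field of every line-delimited JSON record in src_path to dst_path.

    Texts are separated by `sep`; blank lines and records without "text" are skipped.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        first = True
        for line in src:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)  # parse one object per line
            text = record.get("text")
            if text is None:
                continue  # no "text" field in this record
            if not first:
                dst.write(sep)
            dst.write(text)
            first = False
```

Usage, with placeholder names: `extract_texts("input.jsonl", "output.txt")`. Reading line by line keeps memory use flat even for a very large file, since only one record is parsed at a time.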

2 Answers

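import json

# Load the whole document (assumes one JSON object, not one object per line); "input.json" is a placeholder
with open("input.json", encoding="utf-8") as js_file:
    js = json.load(js_file)

with open("file.txt", 'w', encoding="utf-8") as txt_file:
    for item in js['...']:
        txt_file.write(item['text'] + "\n")  # one text per line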

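Replace '...' with the name of the top-level key that holds the list of records. This assumes the file is a single JSON object containing that list; for a file with one object per line, as in the question, parse each line separately instead. The `with` block closes the file automatically, so no explicit `close()` call is needed.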


I'm not sure there is a way to "vectorize" copying values out of JSON, and even if there were, a plain loop gets the job done just fine in my opinion. To go through every record of that long JSON and write each "text" value into a text file, I would do it like this:

import json

# Sample data wrapped in a JSON array; escape sequences (\n, \u00f6) are left out
# because they are not the focus of the problem
test = '[{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.Kunst. Und so weiter.", "timestamp": "2018-01-20T18:56:35Z", "url": "http://proarslausitz.de/1.html"}, {"text": "Bildnummer: 79800031VektorgrafikSkalieren Sie ohne Auflösungsverlust auf jede beliebige. Ende.", "url": "http://www.shutterstock.com/de/pic.mhtml?id=79800031&src=lznayUu4-IHg9bkDAflIhg-1-15"}]'

# json.loads parses the JSON string into a Python list of dicts
test_json = json.loads(test)

# open a new text file (UTF-8, so umlauts survive) to put the text into
with open("json_output.txt", 'w', encoding="utf-8") as file:
    for line in test_json:
        # write() adds no line break by itself, so append one after each text;
        # a record without a "text" key produces an empty line
        file.write(line.get("text", "") + "\n")
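If you would rather build the whole output first and write it once, `str.join` over a generator does the same job. It still iterates under the hood, so it is a stylistic choice rather than real vectorization. A minimal sketch; the helper name `join_texts` is hypothetical and the sample data is the answer's list with the other fields dropped:

```python
import json

def join_texts(records, sep="\n"):
    """Return the "text" values of a list of dicts joined by `sep`, skipping dicts without "text"."""
    return sep.join(r["text"] for r in records if "text" in r)

test = '[{"text": "Home - Homepage des Kunstvereins Pro Ars Lausitz e.V.Kunst. Und so weiter."}, {"text": "Bildnummer: 79800031VektorgrafikSkalieren Sie ohne Auflösungsverlust auf jede beliebige. Ende."}]'

output = join_texts(json.loads(test))

# A single write call for the whole result.
with open("json_output.txt", "w", encoding="utf-8") as f:
    f.write(output)
```

Passing `sep=" "` instead gives the space-separated form shown in the question.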
