
I have a requirement where I need to build a custom JSON string from the columns of a PySpark DataFrame. So I wrote a UDF like the one below, which returns the JSON as a string for each row.

The parameter "entities" is an array of JSON objects.

import json

def halResponse(entities, admantx, copilot_id):
    json_resp = "{\"analyzedContent\": {" + json.dumps(entities) + "}}"
    return json_resp

But in the response I am not getting proper JSON, i.e. instead of proper key: value pairs I am getting only the values (actual values replaced with * for security purposes), without the keys.
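A likely cause: a Spark `Row` subclasses `tuple`, so `json.dumps` serializes it as a plain JSON array of values and the field names are lost. A minimal pure-Python sketch of the same effect (the `Entity` shape here is hypothetical, just for illustration):

```python
import json
from collections import namedtuple

# A namedtuple, like a Spark Row, is still a tuple, so json.dumps
# emits a JSON array of values -- the field names disappear.
Entity = namedtuple("Entity", ["name", "score"])
e = Entity(name="acme", score=0.9)

print(json.dumps(e))            # -> ["acme", 0.9]

# Converting to a dict first (Row has .asDict(), namedtuple has
# ._asdict()) preserves the keys:
print(json.dumps(e._asdict()))  # -> {"name": "acme", "score": 0.9}
```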

Here is the sample response I get:

  "analyzedContents": [
    {
      "entities": [
        [
          "******",
          *,
          *********,
          [
            [
              "***********",
              "***********",
              "***********",
              [
                "*****************"
              ],
              **********
            ]
          ],
          "**************"
        ]
      ]
    }
  ]
}

Please help me resolve this issue. After the fix, I should get the sample response below:

  "analyzedContents": [
    {
      "entities": [
        [
          "key":******",
          "key":*,
          "key":*********,
          [
            [
              "key":"***********",
              "key":"***********",
              "key":"***********",
              [
                "key":"*****************"
              ],
              "key":**********
            ]
          ],
          "key":"**************"
        ]
      ]
    }
  ]
}
  • try using F.to_json spark.apache.org/docs/latest/api/python/… Commented Dec 23, 2020 at 13:39
  • And how can that JSON be converted to a string? Commented Dec 23, 2020 at 13:40
  • it is already a string, no further conversion needed. Commented Dec 23, 2020 at 13:40
  • But I am getting 'TypeError: can only concatenate str (not "NoneType") to str' when I concatenate it in the UDF Commented Dec 23, 2020 at 13:57
  • could you edit your question and show the UDF and the code that you used (with F.to_json)? Commented Dec 23, 2020 at 13:58

1 Answer


Try this without using a UDF:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'response',
    F.concat(
        F.lit("{\"analyzedContent\": {"),
        F.to_json(F.col("entities")),
        F.lit("}}")
    )
)
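`F.to_json` serializes the struct column with its field names intact, which is why the keys come back. If a UDF is still required for some reason, the same principle applies there: build the payload as Python objects and serialize once with `json.dumps`, instead of concatenating string fragments. A sketch (assuming each entity reaches the UDF as a plain dict, e.g. after calling `asDict()` on the Row; the extra parameters from the original UDF are omitted here):

```python
import json

def hal_response(entities):
    # Build the structure as Python objects and serialize in one call;
    # json.dumps then emits proper "key": value pairs with correct
    # quoting and escaping throughout.
    return json.dumps({"analyzedContent": {"entities": entities}})

print(hal_response([{"name": "acme", "type": "org"}]))
# -> {"analyzedContent": {"entities": [{"name": "acme", "type": "org"}]}}
```

This also sidesteps the `TypeError: can only concatenate str (not "NoneType") to str` mentioned in the comments: `json.dumps` renders `None` as JSON `null` rather than failing on string concatenation.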