
I have a requirement where I need to build a custom JSON string from the columns of a PySpark DataFrame. So I wrote a UDF like the one below, which returns the JSON as a string for each row.

The parameter "entities" is an array of JSON objects.

import json

def halResponse(entities, admantx, copilot_id):
    json_resp = "{\"analyzedContent\": {" + json.dumps(entities) + "}}"
    return json_resp

But in the response I am not getting proper JSON, i.e. instead of proper key: value pairs I am getting only the values (actual values replaced with * for security purposes), without the keys.
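A likely cause: a Spark `Row` subclasses `tuple`, so `json.dumps` serializes it as a plain JSON array of values and the field names are lost. A minimal pure-Python sketch of the same effect (the `Entity` shape here is hypothetical, just for illustration):

```python
import json
from collections import namedtuple

# A namedtuple, like a Spark Row, is still a tuple, so json.dumps
# emits a JSON array of values -- the field names disappear.
Entity = namedtuple("Entity", ["name", "score"])
e = Entity(name="acme", score=0.9)

print(json.dumps(e))            # -> ["acme", 0.9]

# Converting to a dict first (Row has .asDict(), namedtuple has
# ._asdict()) preserves the keys:
print(json.dumps(e._asdict()))  # -> {"name": "acme", "score": 0.9}
```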

Here is the sample response I get:

  "analyzedContents": [
    {
      "entities": [
        [
          "******",
          *,
          *********,
          [
            [
              "***********",
              "***********",
              "***********",
              [
                "*****************"
              ],
              **********
            ]
          ],
          "**************"
        ]
      ]
    }
  ]
}

Please help me resolve this issue. After the fix, I should get the sample response below:

  "analyzedContents": [
    {
      "entities": [
        [
          "key":******",
          "key":*,
          "key":*********,
          [
            [
              "key":"***********",
              "key":"***********",
              "key":"***********",
              [
                "key":"*****************"
              ],
              "key":**********
            ]
          ],
          "key":"**************"
        ]
      ]
    }
  ]
}
  • try using F.to_json spark.apache.org/docs/latest/api/python/… Commented Dec 23, 2020 at 13:39
  • And how can that JSON be converted to a string? Commented Dec 23, 2020 at 13:40
  • it is already a string, no further conversion needed. Commented Dec 23, 2020 at 13:40
  • But I am getting 'TypeError: can only concatenate str (not "NoneType") to str' when I concatenate it in the UDF Commented Dec 23, 2020 at 13:57
  • could you edit your question and show the UDF and the code that you used (with F.to_json)? Commented Dec 23, 2020 at 13:58

1 Answer


Try this without using a UDF:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'response',
    F.concat(
        F.lit("{\"analyzedContent\": {"),
        F.to_json(F.col("entities")),
        F.lit("}}")
    )
)
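`F.to_json` serializes the struct column with its field names intact, which is why the keys come back. If a UDF is still required for some reason, the same principle applies there: build the payload as Python objects and serialize once with `json.dumps`, instead of concatenating string fragments. A sketch (assuming each entity reaches the UDF as a plain dict, e.g. after calling `asDict()` on the Row; the extra parameters from the original UDF are omitted here):

```python
import json

def hal_response(entities):
    # Build the structure as Python objects and serialize in one call;
    # json.dumps then emits proper "key": value pairs with correct
    # quoting and escaping throughout.
    return json.dumps({"analyzedContent": {"entities": entities}})

print(hal_response([{"name": "acme", "type": "org"}]))
# -> {"analyzedContent": {"entities": [{"name": "acme", "type": "org"}]}}
```

This also sidesteps the `TypeError: can only concatenate str (not "NoneType") to str` mentioned in the comments: `json.dumps` renders `None` as JSON `null` rather than failing on string concatenation.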