This is the schema of my data, and I want to extract the 'from' field from it. I tried df3 = df.select(df.transcript.data.from.alias("Type")) and got an invalid syntax error.

How can I extract this field?

root
 |-- contactId: long (nullable = true)
 |-- mediaLegId: string (nullable = true)
 |-- transcript: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- action: string (nullable = true)
 |    |    |-- data: struct (nullable = true)
 |    |    |    |-- chatId: string (nullable = true)
 |    |    |    |-- customerInfo: struct (nullable = true)
 |    |    |    |    |-- customerIdentifierToken: string (nullable = true)
 |    |    |    |    |-- customerIdentifierType: string (nullable = true)
 |    |    |    |    |-- customerName: string (nullable = true)
 |    |    |    |    |-- initialQuestion: string (nullable = true)
 |    |    |    |-- entryPoint: string (nullable = true)
 |    |    |    |-- from: string (nullable = true)
 |    |    |    |-- lang: string (nullable = true)
 |    |    |    |-- parkDuration: long (nullable = true)
 |    |    |    |-- parkNote: string (nullable = true)
 |    |    |    |-- participant: struct (nullable = true)
 |    |    |    |    |-- disconnectReason: string (nullable = true)
 |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |-- participantId: string (nullable = true)
 |    |    |    |    |-- preferences: struct (nullable = true)
 |    |    |    |    |    |-- language: string (nullable = true)
 |    |    |    |    |-- state: string (nullable = true)
 |    |    |    |    |-- userName: string (nullable = true)
 |    |    |    |-- reconnected: boolean (nullable = true)
 |    |    |    |-- relatedData: string (nullable = true)
 |    |    |    |-- text: string (nullable = true)
 |    |    |    |-- timestamp: long (nullable = true)
 |    |    |    |-- transcriptText: string (nullable = true)
 |    |    |    |-- transferNote: string (nullable = true)

Can you share sample data? (Commented Jan 28, 2020 at 12:04)

1 Answer

Try it like this:

from pyspark.sql import functions as F

df.select(F.explode("transcript").alias('transcript')).select('transcript.*').select("data.*").select("from").show()
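
Some background on the original error may help: from is a reserved word in Python, so the parser rejects df.transcript.data.from before Spark ever sees the expression. The sketch below uses only the standard library to show this. The Node class is a hypothetical stand-in for a Column-like object, not PySpark's API, and the helper name parses is made up for illustration.

```python
import keyword


class Node:
    """Hypothetical stand-in for a Column-like object (not PySpark)."""

    def __init__(self, path):
        self.path = path

    def __getattr__(self, name):
        # Called only for attributes that are not found normally.
        return Node(f"{self.path}.{name}")

    def __getitem__(self, name):
        # Bracket access takes a plain string, so keywords are allowed.
        return Node(f"{self.path}.{name}")


def parses(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<sketch>", "eval")
        return True
    except SyntaxError:
        return False


is_keyword = keyword.iskeyword("from")
attribute_ok = parses("df.transcript.data.from")
bracket_ok = parses('df.transcript.data["from"]')
getattr_ok = parses('getattr(df.transcript.data, "from")')

df = Node("df")
print(is_keyword, attribute_ok, bracket_ok, getattr_ok)
print(df.transcript.data["from"].path)
```

PySpark Column objects also accept bracket access, so df.transcript.data["from"] should get past the parser. Without explode, though, transcript is still an array, so that expression would yield an array of strings per row rather than one value per transcript element.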

4 Comments

Thanks for the response, Amit. This works if I need only that column. Is there any way to use it in combination with other attributes as well? I tried this: sample1_2 =df.select(df.contactId.alias("ContactId"),\ df.select(F.explode("transcript").alias('transcript')).select('transcript.*').select("data.*").select("from"),\ df.transcript.data.participant.participantId.alias("participantId"),\ df.transcript.data.text.alias("text") ).toDF('contactId','Type','participantId','Transcript')
I got this error: TypeError: Invalid argument, not a string or column: DataFrame[from: string] of type <class 'pyspark.sql.dataframe.DataFrame'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
Can you provide sample data?
Try this: df.select("contactId", F.explode("transcript").alias('transcript')).select("contactId", 'transcript.*').select("contactId", "data.*").select("contactId", "from", "text", "participant.*").select("contactId", "from", "text", "participantId").show()
