I want to add two new columns holding the first and second values of the Services array, but I'm getting this error:

Field name should be String Literal, but it's 0;

production_target_datasource_df.withColumn("newcol",production_target_datasource_df["Services"].getItem(0))
    +------------------+--------------------+
    |               cid|            Services|
    +------------------+--------------------+
    |845124826013182686|     [112931, serv1]|
    |845124826013182686|     [146936, serv1]|
    |845124826013182686|      [32718, serv2]|
    |845124826013182686|      [28839, serv2]|
    |845124826013182686|       [8710, serv2]|
    |845124826013182686|    [2093140, serv3]|
    +------------------+--------------------+
  • Edit your question to include the output of production_target_datasource_df.printSchema(). Commented Jun 13, 2019 at 13:56
  • What have you tried so far? Do you have any code to show? Commented Jun 13, 2019 at 16:34

2 Answers

You don't have to use .getItem(0).

production_target_datasource_df["Services"][0] is enough, provided Services is an array column.

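# Constructing your table:
from pyspark.sql import Row, SparkSession

# Reuse the active session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(cid=1, Services=["2", "serv1"]),
    Row(cid=1, Services=["3", "serv1"]),
    Row(cid=1, Services=["4", "serv2"]),
])
df.show()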
+---+----------+
|cid|  Services|
+---+----------+
|  1|[2, serv1]|
|  1|[3, serv1]|
|  1|[4, serv2]|
+---+----------+

# Adding the two columns (integer indexing works here because Services is an array):
new_df = df.withColumn("first_element", df.Services[0])
new_df = new_df.withColumn("second_element", new_df.Services[1])
new_df.show()

+---+----------+-------------+--------------+
|cid|  Services|first_element|second_element|
+---+----------+-------------+--------------+
|  1|[2, serv1]|            2|         serv1|
|  1|[3, serv1]|            3|         serv1|
|  1|[4, serv2]|            4|         serv2|
+---+----------+-------------+--------------+

1 Comment

AnalysisException: "Field name should be String Literal, but it's 0;"

As the error says, you need to pass a string, not 0. So which string should you pass?

If you follow @pault's advice and run printSchema(), you will see which keys (field names) correspond to the values in the list.

The documentation of getItem can also help you figure this out.

Another way to find out what to pass is simply to pass any string, for example:

production_target_datasource_df.withColumn("newcol",production_target_datasource_df["Services"].getItem('0'))

and the logs will tell you what keys were expected.

Hope this helps ;)
