I have a requirement to pivot a table. I have tried it in PySpark SQL but am facing issues.
Store   Week   xCount
-----   ----   ------
102     1      96
101     1      138
105     1      37
109     1      59
101     2      282
102     2      212
105     2      78
109     2      97
105     3      60
102     3      123
101     3      220
109     3      87
I would like it to come out as a pivot table, like this:
Store   1     2     3
-----   ---   ---   ---
101     138   282   220
102     96    212   123
105     37    78    60
109     59    97    87
The code I am using is below:
from pyspark import SparkContext, SparkConf
from pyspark.sql import Row
from pyspark.sql.context import SQLContext

conf = SparkConf().setAppName("FileSystem").setMaster("local")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# read the comma-separated file and build typed Rows
a = sc.textFile("C:/Users/ashok.darsi/Desktop/Data1.txt")
b = a.map(lambda x: x.split(",")).map(lambda x: Row(store=int(x[0]), week=int(x[1]), xcount=int(x[2])))
c = sqlContext.createDataFrame(b)

c.registerTempTable("d")
e = sqlContext.sql("select * from d pivot(max(xcount)) for week in ([1],[2],[3])) as pivt")  # raises the ParseException below
e.show()
The error I am facing:

pyspark.sql.utils.ParseException:
mismatched input '(' expecting {<EOF>, ',', 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}(line 1, pos 21)

== SQL ==
select * from d pivot(max(xcount)) for week in ([1],[2],[3])) as pivt
---------------------^^^
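From what I can tell from the Spark documentation, the PIVOT clause is only supported in Spark SQL from version 2.4 onwards, and the FOR ... IN (...) part has to sit inside the PIVOT parentheses; the [1] bracket style is SQL Server syntax rather than Spark's. A minimal sketch of the query rewritten to that documented form, assuming Spark 2.4+ and the same temp table d as above:

# pivot week values 1, 2, 3 into columns, taking max(xcount) per store
e = sqlContext.sql("select * from d pivot (max(xcount) for week in (1, 2, 3))")
e.show()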
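If the Spark version in use is older than 2.4 (which the parser error suggests it may be), the DataFrame API avoids SQL entirely: groupBy().pivot() has been available since Spark 1.6. A sketch using the same DataFrame c built above; passing the week values explicitly saves Spark a pass over the data to collect them:

# one row per store, one column per week, max(xcount) in each cell
pivoted = c.groupBy("store").pivot("week", [1, 2, 3]).max("xcount")
pivoted.show()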