I am testing some basic queries from Spark 1.5.1 against Cassandra 2.1.12. I am hitting a weird issue when I split the data in the action column of my table: splitting on '=' parses correctly, whereas splitting on '|' returns individual characters. Why is that?

Moreover, the value of the action column is not shown completely. How do I view the complete value of a column on stdout?

    import org.apache.spark.sql.cassandra.CassandraSQLContext
    import org.apache.spark.sql.cassandra._
    import org.apache.spark.sql

    val csc = new CassandraSQLContext(sc)
    csc.setKeyspace("test")

    val maxDF = csc.sql("select action, split(action, '=')[0], split(action, '=')[1], split(action, '=')[2] from testdata" )

    maxDF.show

Output of Splitting '='

    scala> maxDF.show
    +--------------------+------+-----------+---------+
    |              action|   _c1|        _c2|      _c3|
    +--------------------+------+-----------+---------+
    | car=10.288|city=262|   car|10.288|city|      262|
    |kms=0-|year=0-|bu...|   kms|    0-|year|0-|budget|
    |city=40|pc=40|car=10|  city|      40|pc|   40|car|
    |city=40|pc=40|car...|  city|      40|pc|   40|car|
    |city=40|pc=40|car...|  city|      40|pc|   40|car|
    |                pn=1|    pn|          1|     null|
    | city=10|pc=10|car=9|  city|      10|pc|   10|car|
    |city=10|pc=10|car...|  city|      10|pc|   10|car|
    |city=10|pc=10|car...|  city|      10|pc|   10|car|
    |city=10|pc=10|car...|  city|      10|pc|   10|car|
    |city=10|pc=10|car...|  city|      10|pc|   10|car|
    |  city=10|pc=10|pn=1|  city|      10|pc|    10|pn|
    |   year=0-|so=1|sc=0|  year|      0-|so|     1|sc|
    |year=0-|so=1|sc=0...|  year|      0-|so|     1|sc|
    |             year=8-|  year|         8-|     null|
    |budget=6-12|city=...|budget|  6-12|city|    10|pc|
    |budget=6-12|city=...|budget|  6-12|city|    10|pc|
    |budget=6-12|city=...|budget|  6-12|city|    10|pc|
    |budget=6-12|city=...|budget|  6-12|city|    10|pc|
    |car=9.266|city=24...|   car| 9.266|city|   246|pc|
    +--------------------+------+-----------+---------+
    only showing top 20 rows

Output of splitting '|'

    val maxDF = csc.sql("select action, split(action, '|')[0], split(action, '|')[1], split(action, '|')[2] from testdata" )

    maxDF.show

    +--------------------+---+---+---+
    |              action|_c1|_c2|_c3|
    +--------------------+---+---+---+
    | car=10.288|city=262|   |  c|  a|
    |kms=0-|year=0-|bu...|   |  k|  m|
    |city=40|pc=40|car=10|   |  c|  i|
    |city=40|pc=40|car...|   |  c|  i|
    |city=40|pc=40|car...|   |  c|  i|
    |                pn=1|   |  p|  n|
    | city=10|pc=10|car=9|   |  c|  i|
    |city=10|pc=10|car...|   |  c|  i|
    |city=10|pc=10|car...|   |  c|  i|
    |city=10|pc=10|car...|   |  c|  i|
    |city=10|pc=10|car...|   |  c|  i|
    |  city=10|pc=10|pn=1|   |  c|  i|
    |   year=0-|so=1|sc=0|   |  y|  e|
    |year=0-|so=1|sc=0...|   |  y|  e|
    |             year=8-|   |  y|  e|
    |budget=6-12|city=...|   |  b|  u|
    |budget=6-12|city=...|   |  b|  u|
    |budget=6-12|city=...|   |  b|  u|
    |budget=6-12|city=...|   |  b|  u|
    |car=9.266|city=24...|   |  c|  a|
    +--------------------+---+---+---+
1
  • maybe try split(action,"\=") or something similar... Commented Mar 17, 2016 at 7:28

2 Answers

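In a regular expression, the vertical pipe "|" is the alternation operator: it separates a series of alternative patterns. The second argument of split is treated as a regular expression, so a bare '|' means "the empty pattern or the empty pattern". That matches the empty string at every position, which is why the value is split between every character. To split on a literal pipe, escape it.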

Use split(action, '\\|')
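
The same behaviour can be reproduced outside Spark, because Scala's String.split also takes a regular expression. This is a minimal sketch using the value from the first row of the question's output; the names sample, escaped, charClass and quoted are illustrative only and do not come from the thread.

```scala
import java.util.regex.Pattern

// Sample value taken from the first row of the question's output.
val sample = "car=10.288|city=262"

// Escaped pipe: the regex \| matches a literal '|'.
val escaped = sample.split("\\|").toList

// Character class: [|] also matches a literal '|' and needs no backslash.
val charClass = sample.split("[|]").toList

// Pattern.quote turns any delimiter string into a literal-matching regex.
val quoted = sample.split(Pattern.quote("|")).toList

println(escaped) // List(car=10.288, city=262)
```

All three forms give the same two fields. The character-class form is worth knowing because it contains no backslashes, so it reads the same no matter how many layers of string escaping sit between the source code and the regex engine.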

3 Comments

Could you please explain in more detail what you mean by a series of alternatives? I am new to Spark, so I am not aware of it. Also, how can I view the complete value of a column? Currently we see ... for lengthy strings. And can we write this output to a file, e.g. CSV or another format?
'\\|' has the same problem for me. I had to use '\\\\|'
@Naresh Alternatives in a regex pattern. As for the second part, that is just how it is displayed. The sql call returns a DataFrame, so you can use this example to save to CSV.
1

split(action, '\\|') still has the same problem for me. I had to use split(action, '\\\\|')
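
A possible explanation for the extra backslashes is that the pattern passes through more than one layer of escaping: the Scala string literal consumes one level, and the SQL layer may consume another before the regex engine sees the pattern. Whether the second step happens appears to depend on the context and parser in use; that is an assumption here, not something the thread confirms. The sketch below only shows the SQL text that each Scala literal produces, so you can see what the parser actually receives. The value names are hypothetical.

```scala
// Each value is the SQL fragment as the SQL parser would receive it.

// Scala literal '\\|'   -> SQL text contains one backslash:  '\|'
val twoBackslashes = "split(action, '\\|')"

// Scala literal '\\\\|' -> SQL text contains two backslashes: '\\|'
val fourBackslashes = "split(action, '\\\\|')"

// Triple-quoted strings are not escape-processed by Scala,
// so this is the same SQL text as fourBackslashes.
val tripleQuoted = """split(action, '\\|')"""

// A character class needs no backslashes at any layer.
val bracketForm = "split(action, '[|]')"

Seq(twoBackslashes, fourBackslashes, tripleQuoted, bracketForm).foreach(println)
```

If the SQL layer strips one more level of backslashes, only the forms that still leave an escaped pipe (or the bracket form) reach the regex engine as a literal-pipe pattern, which would be consistent with what is reported above.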
