Extracting Specific Field from String in Scala

Question

My DataFrame returns the results below as strings.

  QueryResult{status='success', finalSuccess=true, parseSuccess=true, allRows=[{"cbcnt":0}], signature={"cbcnt":"number"}, info=N1qlMetrics{resultCount=1, errorCount=0, warningCount=0, mutationCount=0, sortCount=0, resultSize=11, elapsedTime='5.080179ms', executionTime='4.931124ms'}, profileInfo={}, errors=[], requestId='754d19f6-7ec1-4609-bf2a-54214d06c57c', clientContextId='542bc4c8-1a56-4afb-8c2f-63d81e681cb4'}   |

  QueryResult{status='success', finalSuccess=true, parseSuccess=true, allRows=[{"cbcnt":"2021-07-30T00:00:00-04:00"}], signature={"cbcnt":"String"}, info=N1qlMetrics{resultCount=1, errorCount=0, warningCount=0, mutationCount=0, sortCount=0, resultSize=11, elapsedTime='5.080179ms', executionTime='4.931124ms'}, profileInfo={}, errors=[], requestId='754d19f6-7ec1-4609-bf2a-54214d06c57c', clientContextId='542bc4c8-1a56-4afb-8c2f-63d81e681cb4'}

I just want the value part of this:

"cbcnt":0  <-- the numeric part

Expected Output

col
----
0
2021-07-30

Tried:

.withColumn("CbRes",regexp_extract($"Col", """"cbcnt":(\S*\d+)""", 1))

Output

col
----
0
"2021-07-30 00:00:00   <-- an additional leading " appears
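
The stray quote most likely comes from `\S*`, which matches any non-whitespace character, the opening `"` included. A minimal sketch of that behavior follows, using Python's `re` module as a stand-in for Spark's `regexp_extract`. The `sample` string is a shortened fragment of the second row, on the assumption that only the text around `cbcnt` matters here.

```python
import re

# Shortened fragment of the second row (assumption: the rest of the row is irrelevant).
sample = 'allRows=[{"cbcnt":"2021-07-30T00:00:00-04:00"}], signature={"cbcnt":"String"}'

# The pattern from the attempt above: \S* is greedy and also accepts the quote character,
# so the capture group starts at the opening quote and backtracks only to the last digit.
tried = re.search(r'"cbcnt":(\S*\d+)', sample)
captured = tried.group(1)
print(captured)
```

The captured text begins with `"` because nothing in the pattern consumes the quote before the group opens.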

There is nothing built into Spark to help you with this. You will have to write a transformation yourself, splitting the strings with regex and the like in plain Scala.
– Filip, Commented Sep 10, 2021 at 13:02

werner · Accepted Answer · 2021-09-10 15:03:20Z

1

Using the PySpark function regexp_extract:

from pyspark.sql import functions as F

df = ...  # a DataFrame with a column "text" that contains the input data
# Raw string so that \d reaches the regex engine unchanged; group 1 holds the digits
df.withColumn("col", F.regexp_extract("text", r'"cbcnt":(\d+)', 1)).show()


7 Comments

VnS Over a year ago

Works well. Thanks.

VnS Over a year ago

What do I need to do when the value is "cbcnt": "2021-07-30T00:00:00-04:00" instead of a digit? "\d+" matches only digits. I want the date part, i.e. 2021-07-30.

werner Over a year ago

@VnS you can try df.withColumn("col", F.regexp_extract("text", """"cbcnt":"(\d{4}-\d{2}-\d{2}).*".""", 1)).show()

VnS Over a year ago

This doesn't give the correct result. The column now becomes null. My column holds both numeric and date content as strings. I want something that picks up whatever comes after cbcnt, whether it is a number or a date.

werner Over a year ago

@VnS I don't quite understand if you only want to get the date part or anything after cbcnt. Maybe you could create a new question with example input data and the expected output?
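
One way to cover both cases described in these comments is a single pattern with an optional opening quote and an alternation between a date and a plain number. The sketch below uses Python's `re` as a stand-in for `F.regexp_extract`. The `extract_cbcnt` helper and the shortened rows are hypothetical, and the pattern itself is an assumption rather than something proposed in the thread.

```python
import re

# Optional opening quote, then either a YYYY-MM-DD date or a run of digits.
# The date alternative comes first so that "2021" alone is not taken as the number.
PATTERN = re.compile(r'"cbcnt":"?(\d{4}-\d{2}-\d{2}|\d+)')

def extract_cbcnt(text: str) -> str:
    """Return the captured value, or "" when nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else ""

# Shortened versions of the two rows from the question.
rows = [
    'allRows=[{"cbcnt":0}], signature={"cbcnt":"number"}',
    'allRows=[{"cbcnt":"2021-07-30T00:00:00-04:00"}], signature={"cbcnt":"String"}',
]
results = [extract_cbcnt(row) for row in rows]
print(results)  # ['0', '2021-07-30']
```

The pattern uses only constructs that Java regular expressions share with Python (`\d`, `{n}`, `?`, `|`), so the same pattern string should be usable as the second argument of `regexp_extract` with group index 1, although that has not been run against Spark here.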


Amerousful · Accepted Answer · 2021-09-10 14:53:21Z

1

Extract via regex:

val value = "QueryResult{status='success', finalSuccess=true, parseSuccess=true, allRows=[{\"cbcnt\":0}], signature={\"cbcnt\":\"number\"}, info=N1qlMetrics{resultCount=1, errorCount=0, warningCount=0, mutationCount=0, sortCount=0, resultSize=11, elapsedTime='5.080179ms', executionTime='4.931124ms'}, profileInfo={}, errors=[], requestId='754d19f6-7ec1-4609-bf2a-54214d06c57c', clientContextId='542bc4c8-1a56-4afb-8c2f-63d81e681cb4'}   |"
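// unanchored lets the pattern match anywhere inside the string
val regex = """"cbcnt":(\d+)""".r.unanchored
// Pattern matching through the s interpolator requires Scala 2.13 or later
val s"${regex(result)}" = value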

println(result)

Output:

0


9 Comments

VnS Over a year ago

This errors out on the line val s"${regex(result)}" = value with: method s is not a case class, nor does it have an unapply/unapplySeq member

Amerousful Over a year ago

@vnsingh Then your Scala version is older than 2.13. This was added starting with Scala 2.13.

Amerousful Over a year ago

Nevertheless, I believe werner's answer is the more appropriate one, since it works in the context of Apache Spark.

VnS Over a year ago

What do I need to do when the value is "cbcnt": "2021-07-30T00:00:00-04:00" instead of a digit? "\d+" matches only digits. I want the date part, i.e. 2021-07-30.

Amerousful Over a year ago

allRows.*?cbcnt":(.*?)}
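
This pattern captures everything between `cbcnt":` and the next `}`, so a date value keeps its surrounding quotes and its full timestamp. A rough sketch of that behavior, plus one possible cleanup step, is shown here with Python's `re` standing in for Spark. The `clean` helper is hypothetical and assumes that quoted values always start with a `YYYY-MM-DD` date.

```python
import re

# The pattern from the comment above: lazy match up to the first "cbcnt": inside allRows,
# then lazily capture everything up to the next closing brace.
LAZY = re.compile(r'allRows.*?cbcnt":(.*?)}')

def clean(raw: str) -> str:
    """Strip quotes; for quoted values keep only the first 10 characters (the date)."""
    if raw.startswith('"'):
        return raw.strip('"')[:10]
    return raw

# Shortened versions of the two rows from the question.
rows = [
    'allRows=[{"cbcnt":0}], signature={"cbcnt":"number"}',
    'allRows=[{"cbcnt":"2021-07-30T00:00:00-04:00"}], signature={"cbcnt":"String"}',
]
raw_values = [LAZY.search(row).group(1) for row in rows]
cleaned = [clean(value) for value in raw_values]
print(raw_values)
print(cleaned)
```

Anchoring on `allRows` keeps the match away from the later `signature={"cbcnt":...}` entry, which would otherwise be a second candidate.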
