
I have some code from PySpark 1.5 that I unfortunately have to port backwards to Spark 1.3. I have a column whose elements are alphanumeric, but I only want the digits. An example of an element in 'old_col' of 'df' is:

 '125 Bytes'

In Spark 1.5 I was able to use

df.withColumn('new_col', F.regexp_replace('old_col', r'(\D+)', '').cast("long"))

However, I cannot seem to come up with a solution using old 1.3 methods like SUBSTR or RLIKE, because the number of digits in front of "Bytes" varies in length. What I really need is the 'replace' or 'strip' functionality that I can't find in Spark 1.3. Any suggestions?

1 Answer


As long as you use HiveContext, you can execute the corresponding Hive UDFs, either with selectExpr:

df.selectExpr("regexp_extract(old_col, '([0-9]+)', 1)")
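
To reproduce the full withColumn call from the question, including the cast to long, you can alias and cast inside the same expression. This is a sketch assuming the df and old_col names from the question; CAST(... AS BIGINT) is the HiveQL equivalent of .cast("long"):

# Keep the original columns and add the extracted value as 'new_col'
df.selectExpr("*", "CAST(regexp_extract(old_col, '([0-9]+)', 1) AS BIGINT) AS new_col")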

or with plain SQL:

df.registerTempTable("df")
sqlContext.sql("SELECT regexp_extract(old_col,'([0-9]+)', 1) FROM df")
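
For a quick end-to-end check, you could build a one-row DataFrame with the sample value from the question (a sketch; sqlContext is assumed to be an existing HiveContext):

from pyspark.sql import Row

# Hypothetical one-row DataFrame mirroring the sample value '125 Bytes'
sample = sqlContext.createDataFrame([Row(old_col='125 Bytes')])
sample.registerTempTable("sample")

# The extracted value should come back as 125; CAST(... AS BIGINT)
# mirrors the .cast("long") from the Spark 1.5 version.
sqlContext.sql("SELECT CAST(regexp_extract(old_col, '([0-9]+)', 1) AS BIGINT) AS new_col FROM sample").show()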
