I have a DataFrame with a String column named Body. The data in the Body column looks like this:

<p>I want to use a track-bar to change a form's opacity.</p>

<p>This is my code:</p>

 <pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>

<p>When I build the application, it gives the following error:</p>

<blockquote>
  <p>Cannot implicitly convert type 'decimal' to 'double'.</p>
</blockquote>

<p>I tried using <code>trans</code> and <code>double</code> but then the 
control doesn't work. This code worked fine in a past VB.NET project. </p>
,While applying opacity to a form should we use a decimal or double value?

From Body I want to derive two separate columns, code and text. Code is whatever sits inside the elements named code; text is everything else.

I have created a UDF that looks like this:

case class bodyresults(text: String, code: String)
val Body: String => bodyresults = (body: String) => {
  val xmlbody = scala.xml.XML.loadString(body)
  val code = (xmlbody \\ "code").toString
  val text = "I want every thing else as text. what should I do"
  (text, code)
}
val bodyudf = udf(Body)
val posts5 = posts4.withColumn("codetext", bodyudf(col("Body")))

This is not working. My questions are:

1. As you can see, there is no root node in the data. Can I still use Scala XML parsing?
2. How do I parse everything except the code into text?

If there is something wrong in my code, please let me know.

Expected output:

 (code,text)
 code = decimal trans = trackBar1.Value / 5000;this.Opacity = trans;trans double  
 text = everything else  
  • What's the error, if any? What is your expected output? Commented Sep 20, 2017 at 6:35
  • In spark-shell it's not showing any error message. There is something wrong in the UDF Body; spark-shell is not creating a function. Commented Sep 20, 2017 at 7:06
  • Okay. Code tags are at multiple places. Do you want all of them or just the one inside pre i.e decimal trans = ...? Commented Sep 20, 2017 at 7:09
  • All of them together. I added the expected output. Commented Sep 20, 2017 at 7:18
  • In that case, will trans and double be removed from the last paragraph? Commented Sep 20, 2017 at 7:21

1 Answer

The UDF below uses a plain string replace to separate the text from the code. Alternatively, you can use a RewriteRule from scala.xml.transform and override its transform method to empty out the <pre> element in your XML.

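import scala.xml.XML
import org.apache.spark.sql.functions.udf

case class bodyresults(text: String, code: String)

val bodyudf = udf { (body: String) =>
    // The data has no single root element, so wrap it in an explicit <body> tag before parsing
    val xmlElems = XML.loadString(s"<body>$body</body>")
    // Extract the code inside <pre><code>; use xmlElems \\ "code" instead to collect every <code> element
    val code = (xmlElems \\ "body" \\ "pre" \\ "code").text
    // Everything else is text. replace() matches literally, whereas replaceAll() would
    // treat the code as a regex and misbehave on characters such as . ( ) [ or $
    val text = (xmlElems \\ "body").text.replace(code, "")

    bodyresults(text, code)
}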

This UDF returns a StructType, like this:

org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function1>,StructType(StructField(text,StringType,true), StructField(code,StringType,true)),List(StringType))

You can now call it on your DataFrame to build posts5, like this:

val posts5 = df.withColumn("codetext", bodyudf($"xml") )
posts5: org.apache.spark.sql.DataFrame = [xml: string, codetext: struct<text:string,code:string>]

To extract a specific column:

posts5.select($"codetext.code" ).show
+--------------------+
|                code|
+--------------------+
|decimal trans = t...|
+--------------------+
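
The RewriteRule alternative mentioned at the top of this answer could be sketched roughly as follows. This is only an illustration under a few assumptions: the names BodySplitter, dropCode and split are made up for the example, it drops every <code> element (not just the one inside <pre>), and it needs the scala-xml module on the classpath. The body must still be well-formed XML once wrapped in a root element.

```scala
import scala.xml.{Elem, Node, NodeSeq, XML}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Hypothetical helper, not part of Spark or scala-xml.
object BodySplitter {
  // Rule that replaces every <code> element with nothing and keeps all other nodes.
  // NodeSeq is a Seq[Node], so this is a valid override of RewriteRule.transform.
  private val dropCode = new RewriteRule {
    override def transform(n: Node): NodeSeq = n match {
      case e: Elem if e.label == "code" => NodeSeq.Empty
      case other                        => other
    }
  }

  // Returns (text, code) for a fragment that has no single root element.
  def split(body: String): (String, String) = {
    // Wrap the fragment so the parser sees exactly one root element
    val root = XML.loadString(s"<body>$body</body>")
    // Concatenated text of all <code> elements
    val code = (root \\ "code").text
    // Text of the tree after the rule has removed the <code> elements
    val text = new RuleTransformer(dropCode).apply(root).text
    (text, code)
  }
}
```

Inside the UDF you would then build the result with something like `val (text, code) = BodySplitter.split(body)` followed by `bodyresults(text, code)`. Unlike the string replace, this removes the code by position in the tree, so a code snippet that also happens to appear verbatim in the prose is left alone.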

4 Comments

Thank you so much. I understand it now. I don't have enough reputation to upvote your answer.
When I try to implement this I get an error: SAXParseException: The entity "nbsp" was referenced, but not declared. I have prepended <?xml version="1.0" encoding="utf-8"?> to the string but it's not working. Do you happen to know anything about that?
Try adding <!ENTITY nbsp "&#160;"> after <?xml version..> and see if it works.
No, it causes another error: "The markup in the document preceding the root element must be well-formed."
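
On the nbsp error from the comments: XML predefines only five entities (lt, gt, amp, apos, quot), so HTML names such as nbsp are unknown to the parser, and a bare <!ENTITY ...> declaration is only legal inside a <!DOCTYPE ... [ ... ]> internal subset, which is why the second error appears. One workaround, sketched here as an assumption rather than a tested fix for this dataset, is to rewrite the named entities as numeric character references before parsing. HtmlEntities and toNumeric are hypothetical names, and the map is only an illustrative subset.

```scala
// Hypothetical helper; extend the map with whatever entities occur in the data.
object HtmlEntities {
  // HTML named entities that XML does not predefine, mapped to numeric references
  private val named: Map[String, String] = Map(
    "nbsp"  -> "&#160;",
    "copy"  -> "&#169;",
    "mdash" -> "&#8212;"
  )

  // Rewrites the known named entities; numeric references need no DTD,
  // and the five predefined XML entities are left untouched.
  def toNumeric(body: String): String =
    named.foldLeft(body) { case (acc, (name, ref)) => acc.replace(s"&$name;", ref) }
}
```

In the UDF, the parsing line would then become something like `XML.loadString(s"<body>${HtmlEntities.toNumeric(body)}</body>")`. Any entity missing from the map will still raise the same SAXParseException, so the list has to match the data.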
