3

I have a DataFrame witch contains a Binary column Type.

DataFrame :

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D 41 40 FF FF EF 00 44 47 1A C0 E4 6B A0 DD CE 1D 41 40 FC FF EF 00 D8 2B 1A C0 54 E4 71 67 6D 1C 41 40 FF FF EF 00 44 1A 1A C0 BF 1F 98 0B 3A 1D 41 40 02 00 F0 00 80 0B 1A C0 0D 80 00 13 2F 23 41 40 02 00 F0 00 B0 35 1A C0 CC F6 23 F8 BD 26 41 40 04 00 F0 00 0C 43 1A C0 73 1A 44 AF 16 26 41 40 02 00 F0 00 40 5A 1A C0 FF 54 9C 7C 2D 27 41 40 02 00 F0 00 50 68 1A C0 87 6E B9 42 44 28 41 40 02 00 F0 00 00 7C 1A C0 78 2B 85 BA F5 26 41 40 FC FF EF 00 18 91 1A C0 49 96 6F 58 C6 28 41 40 02 00 F0 00 B0 BC 1A C0 91 FA 4B 0E 7F 20 41 40 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I'm triying to read this column to extract the geometry format.

from my research i found that there is a function from the geospark library which is ST_GeomFromWKB which takes a long binary as argument.

SO I'm doing the following code :

df.withColumn("BinaryGeometry", hex(col("BinaryGeometry")))
                    .withColumn("BinaryGeometry",expr("ST_GeomFromWKB(BinaryGeometry)"))

I get the following output witch is not correct :

POINT(0 0)

How can I read this column to get the correct geometry value ?

EDIT

The structute of column on MySQL enter image description here

And when I track the data it seems like that : enter image description here

When I click on [GEOMETRY - 113 o] a text file downloads.

The text file contains this data :

             ے FPہ>Pے{ح@@€àOہZDچ4ح@@‏ہ¨KہإضآTح@@‏àمJہî=Oƒح@@û ڑLہCBsn«ح@@ے FPہ>Pے{ح@@

EDIT2

I have this table on MySQL database

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|BinaryGeometry
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|'POLYGON((-6.70005799736828 34.2118684058451,-6.66641236748546 34.1993751935347,-6.63139344658703 34.208461349766,-6.6135406633839 34.2192498881446,-6.5970611711964 34.2283339016717,-6.5695953508839 34.2328755410488,-6.54281617607921 34.2220887476075,-6.5256500383839 34.2283339016717,-6.51123048271984 34.2748740913813,-6.55242921318859 34.3026724029165,-6.56547547783703 34.2975672800575,-6.58813477959484 34.3060756457653,-6.60186768975109 34.314583149474,-6.62109376396984 34.3043740415805,-6.64169312920421 34.318553022848,-6.68426515068859 34.2538774367323,-6.70005799736828 34.2118684058451))',0
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

When I load this table in Spark I get :

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|BinaryGeometry                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[00 00 00 00 01 03 00 00 00 01 00 00 00 11 00 00 00 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40 FC FF EF 00 68 AA 1A C0 BF EE 57 20 85 19 41 40 04 00 F0 00 8C 86 1A C0 CC DC 8B DC AE 1A 41 40 FF FF EF 00 44 74 1A C0 CA 9D 5D 61 10 1C 41 40 FF FF EF 00 64 63 1A C0 BF 1F 98 0B 3A 1D 41 40 FF FF EF 00 44 47 1A C0 E4 6B A0 DD CE 1D 41 40 FC FF EF 00 D8 2B 1A C0 54 E4 71 67 6D 1C 41 40 FF FF EF 00 44 1A 1A C0 BF 1F 98 0B 3A 1D 41 40 02 00 F0 00 80 0B 1A C0 0D 80 00 13 2F 23 41 40 02 00 F0 00 B0 35 1A C0 CC F6 23 F8 BD 26 41 40 04 00 F0 00 0C 43 1A C0 73 1A 44 AF 16 26 41 40 02 00 F0 00 40 5A 1A C0 FF 54 9C 7C 2D 27 41 40 02 00 F0 00 50 68 1A C0 87 6E B9 42 44 28 41 40 02 00 F0 00 00 7C 1A C0 78 2B 85 BA F5 26 41 40 FC FF EF 00 18 91 1A C0 49 96 6F 58 C6 28 41 40 02 00 F0 00 B0 BC 1A C0 91 FA 4B 0E 7F 20 41 40 04 00 F0 00 DC CC 1A C0 87 14 01 81 1E 1B 41 40] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

then to get the original data 'Polygon .........' I do the following code :

df.withColumn("geom",expr("ST_GeomFromWKB(BinaryGeometry)"));

But I get the following error :

20/08/10 22:28:50 ERROR Executor: Exception in task 87.0 in stage 39.0 (TID 929) java.lang.ClassCastException: [B cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.geosparksql.expressions.ST_GeomFromWKB.eval(Constructors.scala:174)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_39$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)

EDIT3 enter image description here

9
  • 1
    Can you give me a sample file and the way to load it to get the same Binary as you do? I did an example where I store a geometry as WKB into a text file, when DF is loaded the type is String but the expressions select ST_GeomFromWKB(value) from df works Commented Aug 9, 2020 at 22:07
  • Tank you for your reply , I load it from a mySQL table . The data on MySQL is : 'POLYGON((-7.57839202924515 33.6053465978439,-7.57725477262284 33.6031662542634,-7.57388591810013 33.6041492028667,-7.57313489957596 33.60556105489,-7.57480859800125 33.6067941725528,-7.57839202924515 33.6053465978439))',0 , But when I load it on Spark , the data becames coded as you see in the post . Commented Aug 10, 2020 at 17:38
  • please check my EDIT Commented Aug 10, 2020 at 17:48
  • I'll have a look at this soon. We had a same problem but we solved it through the load phase. When selecting from DB we used ST_AsText(column) Commented Aug 10, 2020 at 20:38
  • I've been stuck in the same problem for a week and I'm still stuck , Thank you for your time :) Commented Aug 10, 2020 at 20:40

1 Answer 1

4

MySQL geometry types are not really stored as WKB and as such cannot be read using the geospark's method ST_GeomFromWKB.

https://dev.mysql.com/doc/refman/5.7/en/gis-data-formats.html

Internally, MySQL stores geometry values in a format that is not identical to either WKT or WKB format. (Internal format is like WKB but with an initial 4 bytes to indicate the SRID.)

Solution to that is to specify the query when loading dataframe that will parse the geometry types into WKT format using the builtin function ST_AsWKT:

df = spark
  .read
  .format("jdbc")
  .options(
    Map(
      "driver" -> "com.mysql.cj.jdbc.Driver",
      "url" -> "jdbc:mysql://host:3306/db",
      "user" -> "user",
      "password" -> "password",
      "dbtable" -> "(select ST_AsWKT(BinaryGeometry) as BinaryGeometry from geo_table) as t")
  )
  .load
  .withColumn("BinaryGeometry",expr("ST_GeomFromWKT(BinaryGeometry)"))

Alternatively, the column can be extracted as WKB using

select hex(ST_AsWKB(BinaryGeometry)) as BinaryGeometry from geo_table

and fed to ST_GeomFromWKB method.

scala> df.printShema()
root
 |-- BinaryGeometry: geometry (nullable = false)

scala> df.show()
+--------------------+
|      BinaryGeometry|
+--------------------+
|POLYGON ((-7.5783...|
+--------------------+
Sign up to request clarification or add additional context in comments.

1 Comment

THANK YOU SO MUCH :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.