I have been experimenting with QuantLib and Spark, trying to call a QuantLib function from a PySpark UDF; see the example below:
from QuantLib import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df = sc.parallelize([("2016-10-01",),
("2016-11-01",),
("2016-12-01",)]).toDF(['someDate'])
testudf = udf(lambda x: str(DateParser.parseFormatted(x,'%Y-%m-%d')), StringType())
df.withColumn('new', testudf('someDate')).show()
I haven't been successful so far and was wondering if anybody has had better luck.
Here is the error I get:
TypeError: in method 'DateParser_parseFormatted', argument 1 of type 'std::string const &'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
What is the type of x that gets passed to the lambda inside udf? Is it a Python string, or some Spark type?
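One way to check this might be to have the UDF report the argument's type instead of parsing it. Below is a minimal sketch; the helper names type_name and to_native_str are hypothetical (they are not part of Spark or QuantLib), and both are plain Python so they can be tried without a cluster:

```python
def type_name(x):
    """Return the Python type name of whatever value the UDF receives."""
    return type(x).__name__


def to_native_str(x):
    """Coerce a value to the interpreter's native str, passing None through.

    Assumption: the SWIG-generated wrapper only accepts a native str,
    so a Python 2 unicode value would be rejected until converted.
    """
    return None if x is None else str(x)


# Local check, no Spark needed: print the type names of two sample values.
print(type_name("2016-10-01"))
print(type_name(u"2016-10-01"))
```

Wrapped as udf(type_name, StringType()) and applied to someDate, this should show which type Spark hands to the lambda. If it turns out not to be a native str, passing to_native_str(x) to DateParser.parseFormatted instead of x may be worth trying.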