I'm experimenting with the Spark-CSV package (https://github.com/databricks/spark-csv) to read CSV files into Spark DataFrames.

Everything works but all columns are assumed to be of StringType.

As shown in the Spark SQL documentation (https://spark.apache.org/docs/latest/sql-programming-guide.html), for built-in sources such as JSON, the schema, including data types, can be inferred automatically.

Can the column types of a CSV file be inferred automatically as well?

  • 1. StringType is a field type in Spark SQL. 2. What you are asking is not very clear; can you be more specific about what you are trying to achieve? Commented Apr 19, 2015 at 8:28
  • I'm asking about automatic type inference, which is available in built-in data sources such as JSON. I.e., if one creates a DataFrame using sqlContext.jsonFile("...") from a JSON file having, say, one integer field and one string field, these types would be defined in the schema. Is this possible with the CSV data source format? Commented Apr 19, 2015 at 8:56

2 Answers

7

Starting with Spark 2, we can use the 'inferSchema' option like this:

getSparkSession().read().option("inferSchema", "true").csv("YOUR_CSV_PATH")

3

Unfortunately, this is not currently supported, but it would be a very useful feature. Currently, column types must be declared in DDL. From the documentation:

header: when set to true the first line of files will be used to name columns and will not be included in data. All types will be assumed string. Default value is false.

which is what you are seeing.

Note that the type can still be resolved at query time, because a string column is cast implicitly when a numeric function is applied to it, e.g.

select sum(mystringfield) from mytable

1 Comment

Thanks for confirming; that's what I thought. I agree, this would be extremely nice functionality. Exact type matching may be infeasible, but various heuristics may be possible: typing a column based on its first N records, on N random records, on an N/size fraction of the records, etc.
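
To make the "first N records" heuristic concrete, here is a minimal sketch in plain Python (standard library only). It is not how Spark or spark-csv infers types; the function name infer_column_types, the sample_size parameter, and the "int" / "double" / "string" labels are all hypothetical choices made for illustration. It assumes the first line is a header and treats empty cells as missing values.

```python
import csv
import io
from itertools import islice


def _all_parse(values, cast):
    """Return True if every value can be converted by `cast`."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def infer_column_types(csv_text, sample_size=100):
    """Guess a type label per column from the first `sample_size` data rows.

    Hypothetical helper: the labels are arbitrary strings, not Spark types.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = list(islice(reader, sample_size))
    types = {}
    for i, name in enumerate(header):
        # Skip empty cells so missing values do not force a string type.
        values = [row[i] for row in sample if i < len(row) and row[i] != ""]
        if values and _all_parse(values, int):
            types[name] = "int"
        elif values and _all_parse(values, float):
            types[name] = "double"
        else:
            types[name] = "string"
    return types


data = "id,price,name\n1,2.5,apple\n2,3,pear\n3,,fig\n"
print(infer_column_types(data))
# prints {'id': 'int', 'price': 'double', 'name': 'string'}
```

The weakness of sampling only the first N rows shows up as soon as a later row breaks the pattern: with sample_size=1, a column whose first value is "1" and second value is "x" would be labelled "int", which is why a random sample or a full pass may be the safer heuristic.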
