What is the maximum column count of a Spark DataFrame? I tried to find it in the DataFrame documentation but was unable to.
1 Answer
From an architectural perspective, DataFrames are scalable, so there is no fixed limit on the column count, but a very wide schema can cause uneven load across the nodes and may degrade the overall performance of your transformations.
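In practice the ceiling is operational rather than documented. As a rough sketch (the object name and the column count below are arbitrary illustrations, not values from the Spark docs), you can build an artificially wide DataFrame and watch how even a trivial action slows down as the width grows:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object WideDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-dataframe-sketch")
      .master("local[*]")
      .getOrCreate()

    // numCols is an arbitrary illustrative width, not a documented maximum.
    val numCols = 5000
    val cols = (1 to numCols).map(i => lit(i).as(s"c$i"))

    // Building the plan is cheap; analysis, optimization and whole-stage
    // codegen all do per-column work, so actions slow down as width grows.
    val wide = spark.range(1).select(cols: _*)

    val t0 = System.nanoTime()
    val row = wide.first() // forces planning, codegen and execution
    println(f"first() over ${row.length} columns took ${(System.nanoTime() - t0) / 1e9}%.2f s")

    spark.stop()
  }
}
```

Increasing numCols makes the per-column planning cost visible long before any hard limit is reached.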
3 Comments
zero323
This is not correct. You can easily find a hard limit (Int.MaxValue), but what is more important, Spark scales well only with long and relatively thin data. Fundamentally, you cannot split a single record between executors / partitions, and there are a number of practical limitations (GC, disk IO) which make very wide data impractical. Not to mention some known bugs.
KiranM
For that matter, most programming models (as far as I know) scale "well" for long and thin data, for one basic reason: a record gets broken up and written onto the next relevant "logical unit" of storage once it crosses a threshold. Most "big data" frameworks are designed to handle data of unbounded size if you can work around the technical limitations, albeit with a performance hit. So I think we would hit memory errors before reaching that hard limit. Your thoughts?
eliasah
This is an old entry, but I concur with @zero323 on this. Big-data frameworks have the limitation mentioned in the comment above: these kinds of frameworks don't work well with wide data. I experimented with this earlier, but unfortunately I can't share that benchmark due to an NDA.
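As a follow-up to the thread above, here is a minimal sketch (my own illustration, not anything from the Spark documentation) of the "long and relatively thin" reshaping that zero323 describes: instead of keeping thousands of value columns, the same data is unpivoted into (id, key, value) rows, a shape that Spark distributes across partitions much more evenly. The column count and names are assumptions for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, lit}

object WideToLongSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wide-to-long-sketch")
      .master("local[*]")
      .getOrCreate()

    // k is an illustrative width; a real schema would come from your data.
    val k = 100
    val valueCols = (1 to k).map(i => lit(i).as(s"c$i"))
    val wide = spark.range(10).select((col("id") +: valueCols): _*)

    // stack() unpivots the k value columns into k (key, value) rows per id,
    // producing a long, narrow DataFrame instead of a wide one.
    val stackArgs = (1 to k).map(i => s"'c$i', c$i").mkString(", ")
    val longDf = wide.select(col("id"), expr(s"stack($k, $stackArgs) as (key, value)"))

    longDf.show(5)
    spark.stop()
  }
}
```

The trade-off is more rows and a repeated key column, but each record stays small, which avoids the single-record, GC, and disk IO pressure mentioned in the comments.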