
What is the maximum column count of a Spark DataFrame? I tried to find it in the DataFrame documentation but was unable to.

  • Short answer: there is a limit; read this answer for a more thorough explanation. Commented Aug 7, 2018 at 14:23

1 Answer


From an architectural perspective, DataFrames are scalable, so there should not be any limit on the column count, but a very wide schema can give rise to uneven load on the nodes and may affect the overall performance of your transformations.
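
A quick way to see this in practice (a minimal sketch assuming the Scala API and a local SparkSession; the 1000-column width and the c1..c1000 names are arbitrary):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder()
  .appName("wide-df-sketch")
  .master("local[*]")
  .getOrCreate()

// Build a one-row DataFrame with 1000 literal columns in a single select.
val cols = (1 to 1000).map(i => lit(i).as(s"c$i"))
val wide = spark.range(1).select(cols: _*)

// No documented cap is hit here, but very wide schemas grow the logical plan
// and the serialized rows, which is where the performance cost shows up.
println(s"Column count: ${wide.columns.length}")  // 1000
```

In practice, analysis and code-generation time tend to grow with schema width, which is the practical limit discussed in the comments below.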


3 Comments

That is not correct. You can easily find a hard limit (Int.MaxValue; a short sketch of where that bound comes from follows these comments), but what is more important, Spark scales well only for long and relatively thin data. Fundamentally, you cannot split a single record between executors / partitions, and there are a number of practical limitations (GC, disk IO) which make very wide data impractical. Not to mention some known bugs.
For that matter, most programming models (as far as I know) scale "well" only for long and thin data, for one basic reason: a record is broken up and written onto the next relevant "logical unit" of storage after a threshold. Most "big data" frameworks are designed to handle data with no limits, provided you overcome the technical limitations, albeit with a performance hit. So I think we would hit memory errors before we reach the said limit. Your thoughts?
This is an old entry, but I concur with @zero323 on this. Big-data frameworks have the limitation mentioned in the comment above; these kinds of frameworks don't work well with wide data. I experimented with this earlier, but unfortunately I can't share that benchmark due to an NDA.
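
For reference, a minimal sketch of where that hard bound comes from (Scala API assumed; the field names below are illustrative): a DataFrame schema is a StructType backed by an Array[StructField], and JVM arrays are indexed by Int, so Int.MaxValue is the theoretical ceiling, far beyond what memory, GC, and planner overhead allow in practice.

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// A DataFrame schema is a StructType wrapping an Array[StructField];
// JVM arrays are indexed by Int, hence the Int.MaxValue ceiling.
val schema = StructType(
  (1 to 5).map(i => StructField(s"col$i", IntegerType, nullable = true))
)

println(schema.fields.length)  // 5
println(Int.MaxValue)          // 2147483647, the theoretical upper bound
```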
