My problem is that I can't skip a custom header block in a CSV file when reading it with PySpark's read.csv.
The CSV file looks like this:
°°°°°°°°°°°°°°°°°°°°°°°°
° My Header °
° Important Data °
° Data °
°°°°°°°°°°°°°°°°°°°°°°°°
MYROW;SECONDROW;THIRDROW
290;6848;66484
96849684;68463;63848
84646;6484;98718
I can't figure out how to skip those first lines, or more generally the first 'n' lines.
I tried something like:
df_read = spark.read.csv('MyCSV-File.csv', sep=';') \
    .rdd.zipWithIndex() \
    .filter(lambda x: x[1] > 6) \
    .map(lambda x: x[0]) \
    .toDF(['MYROW', 'SECONDROW', 'THIRDROW'])
Is there any way to skip these lines, and in particular, how fast will it be? The data could be several GB in size. Thanks.
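
For reference, here is a minimal sketch of the kind of thing I am after, assuming the file is read as plain text lines first so that the banner never reaches the CSV parser. The names read_skipping_lines, keep_after, split_fields and n_skip are made up for illustration, and every column comes out as a string, so a cast would still be needed afterwards.

```python
def keep_after(n_skip):
    """Predicate for (line, index) pairs as produced by RDD.zipWithIndex().

    Keeps only pairs whose zero-based index is at least n_skip.
    """
    return lambda pair: pair[1] >= n_skip


def split_fields(pair, sep=';'):
    """Drop the index and split the raw line into a list of string fields."""
    return pair[0].split(sep)


def read_skipping_lines(spark, path, n_skip, columns, sep=';'):
    """Hypothetical helper: read `path` as text, drop the first n_skip lines,
    split the rest on `sep`, and name the resulting columns.

    `spark` is an existing SparkSession. All columns are strings.
    """
    return (
        spark.sparkContext.textFile(path)   # one RDD element per line
        .zipWithIndex()                     # -> (line, global line index)
        .filter(keep_after(n_skip))         # drop banner and column-name line
        .map(lambda pair: split_fields(pair, sep))
        .toDF(columns)                      # list of column names
    )


# Assumed usage for the sample above: 5 banner lines + 1 column-name line.
# df_read = read_skipping_lines(spark, 'MyCSV-File.csv', 6,
#                               ['MYROW', 'SECONDROW', 'THIRDROW'])
```

As far as I understand, zipWithIndex has to launch an extra Spark job when the RDD has more than one partition, so I am unsure how well this would scale to files of several GB.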