I have a large DataFrame with more than 100 columns. Sets of columns share the same name pattern and differ only by a unique number, and I need to create multiple smaller DataFrames, one per number.
The column names all follow the same pattern, and the number of such groups is sometimes 64 and sometimes 128: net1, net2, net3 ... net64 ... net128.
So I need either 64 or 128 sub-DataFrames. I cannot use startsWith, because a prefix such as net1 would also match net10, net11 ... net100, net101 ...
I've created a solution in Spark + Scala. It works fine, but I feel there must be an easier way to achieve this dynamically.
df_net.printSchema()
|-- net1: string (nullable = true)
|-- net1_a: integer (nullable = true)
|-- net1_b: integer (nullable = true)
|-- net1_c: integer (nullable = true)
|-- net1_d: integer (nullable = true)
|-- net1_e: integer (nullable = true)
|-- net2: string (nullable = true)
|-- net2_a: integer (nullable = true)
|-- net2_b: integer (nullable = true)
|-- net2_c: integer (nullable = true)
|-- net2_d: integer (nullable = true)
|-- net2_e: integer (nullable = true)
|-- net3: string (nullable = true)
|-- net3_a: integer (nullable = true)
|-- net3_b: integer (nullable = true)
|-- net3_c: integer (nullable = true)
|-- net3_d: integer (nullable = true)
|-- net3_e: integer (nullable = true)
|-- net4: string (nullable = true)
|-- net4_a: integer (nullable = true)
|-- net4_b: integer (nullable = true)
|-- net4_c: integer (nullable = true)
|-- net4_d: integer (nullable = true)
|-- net4_e: integer (nullable = true)
|-- net5: string (nullable = true)
|-- net5_a: integer (nullable = true)
|-- net5_b: integer (nullable = true)
|-- net5_c: integer (nullable = true)
|-- net5_d: integer (nullable = true)
|-- net5_e: integer (nullable = true)
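The startsWith collision mentioned above (net1 also matching net10, net100 and so on) can probably be avoided by matching the whole column name against a pattern instead of a prefix. Below is a minimal plain-Scala sketch of that idea, no Spark required. `NetColumns` and `groupByNetId` are hypothetical names made up for illustration, and the pattern assumes every column looks like `net<number>` or `net<number>_<suffix>`, as in the schema above.

```scala
object NetColumns {
  // "net<digits>", optionally followed by "_<suffix>".
  // Pattern matching with a Regex requires the WHOLE name to match,
  // so "net1" can never capture "net10" or "net100".
  private val NetColumn = """net(\d+)(?:_.+)?""".r

  // Groups column names by their numeric id, e.g. 1 -> Seq("net1", "net1_a", ...).
  // Names that do not follow the assumed pattern are silently skipped.
  def groupByNetId(columns: Seq[String]): Map[Int, Seq[String]] =
    columns
      .collect { case c @ NetColumn(id) => id.toInt -> c }
      .groupBy(_._1)
      .map { case (id, pairs) => id -> pairs.map(_._2) }
}
```

With a real DataFrame, the input would be something like `df_net.columns.toSeq`, and the number of keys in the result (64 or 128) would not need to be known in advance.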
val df_net1 = df_net
  .filter(!($"net1".isNull))
  .select("net1", "net1_a", "net1_b", "net1_c", "net1_d", "net1_e")
val df_net2 = df_net
  .filter(!($"net2".isNull))
  .select("net2", "net2_a", "net2_b", "net2_c", "net2_d", "net2_e")
val df_net3 = df_net
  .filter(!($"net3".isNull))
  .select("net3", "net3_a", "net3_b", "net3_c", "net3_d", "net3_e")
The result should be smaller DataFrames, one per unique number, each filtered on its own netN column.
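One possible way to build these dynamically is to discover the group numbers from the column names and keep the results in a Map keyed by number, rather than declaring one val per group. The following is only a sketch under assumptions: `NetSplitter` and `splitByNet` are hypothetical names, every group is assumed to have a key column named exactly `net<number>`, and its companion columns are assumed to start with `net<number>_`. Including the underscore in the prefix is what keeps net1 from picking up net10 or net100.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object NetSplitter {
  // Matches only the key columns ("net1", "net2", ...), not "net1_a".
  private val NetKey = """net(\d+)""".r

  // Returns one DataFrame per group number found in the schema.
  def splitByNet(df: DataFrame): Map[Int, DataFrame] = {
    // Discover the ids instead of hard-coding 64 or 128.
    val ids = df.columns.collect { case NetKey(id) => id.toInt }.sorted

    ids.map { n =>
      val key = s"net$n"
      // Exact key name, or key followed by an underscore.
      val cols = df.columns.filter(c => c == key || c.startsWith(s"${key}_"))
      n -> df.filter(col(key).isNotNull).select(cols.map(col): _*)
    }.toMap
  }
}
```

Usage would then look like `val subDfs = NetSplitter.splitByNet(df_net)` followed by, for example, `subDfs(1).show()`. Since Spark evaluates lazily, building the Map should only define the transformations; nothing runs until an action is called on one of the sub-DataFrames.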