I have a sparse table in which some rows contain nested sub-tables, as shown below. How do I represent this structure with Scala collections?
| rowkey  | orderid   | name          | amount    | supplier           | account     |
| rowkey1 | id0: 1001 | id1: "apple"  | id1: 1000 | id3: "fruits, inc" |             |
|         |           | id2: "apple2" | id2: 1200 |                    |             |
| rowkey2 | id4: 1002 | id5: "orange" | id5: 5000 |                    |             |
| rowkey3 | id6: 1003 | id7: "pear"   | id7: 500  |                    | id10: 77777 |
|         |           | id8: "pear2"  | id8: 350  |                    |             |
|         |           | id9: "pear3"  | id9: 500  |                    |             |
Note: id1, id2, id3, ... are unique identifiers for each "group attribute", i.e. the group id of each sub-row. For example, in the first row, id2: "apple2" and id2: 1200 belong to the same group, id2 (a sub-row with two attributes, name and amount, under rowkey1).
Another way to look at these 3 rows:
rowkey1, (orderid, id0, 1001), (name, id1, "apple"), (amount, id1, 1000), (name, id2, "apple2"), (amount, id2, 1200), (supplier, id3, "fruit inc.")
rowkey2, (orderid, id4, 1002), (name, id5, "orange"), (amount, id5, 5000)
rowkey3, (orderid, id6, 1003), (name, id7, "pear"), (amount, id7, 500), (name, id8, "pear2"), (amount, id8, 350), (name, id9, "pear3"), (amount, id9, 250), (account, id10, 777777)
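The triple view above maps fairly directly onto plain collections. The following is only a minimal sketch, not a recommended design: the names `Cell`, `table` and `subRows` are made up for illustration, values are kept as `Any`, and only the first two rows are copied from the list above.

```scala
object TripleSketch {
  // One cell of the sparse table: (column name, group id, value).
  // `Cell` is a hypothetical name, not something from an existing library.
  final case class Cell(column: String, groupId: String, value: Any)

  // rowkey -> cells, copied from the first two rows of the triple list.
  val table: Map[String, List[Cell]] = Map(
    "rowkey1" -> List(
      Cell("orderid", "id0", 1001),
      Cell("name", "id1", "apple"), Cell("amount", "id1", 1000),
      Cell("name", "id2", "apple2"), Cell("amount", "id2", 1200),
      Cell("supplier", "id3", "fruit inc.")),
    "rowkey2" -> List(
      Cell("orderid", "id4", 1002),
      Cell("name", "id5", "orange"), Cell("amount", "id5", 5000))
  )

  // Rebuild the sub-rows of one row: group id -> (column name -> value).
  def subRows(cells: List[Cell]): Map[String, Map[String, Any]] =
    cells.groupBy(_.groupId).map { case (gid, cs) =>
      gid -> cs.map(c => c.column -> c.value).toMap
    }

  def main(args: Array[String]): Unit = {
    // Sub-row id2 of rowkey1 holds the name "apple2" and the amount 1200.
    println(subRows(table("rowkey1"))("id2"))
  }
}
```

Grouping by the group id recovers each sub-row without needing a class per column, which matters when there are thousands of columns.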
Edit: note that the table has 2000 columns. Is it possible to create a class (or add attributes to a class) dynamically in Scala, e.g. by loading field names and types from an external file? I know that case classes are limited to 22 fields.
Edit 2: also note that any attribute except rowkey can span multiple lines, i.e. orderid, name, amount, supplier, account and the 1995+ other columns, so creating an individual "singleLine" class for each of them is not feasible. I'm looking for the most general solution.
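One way to avoid generating classes at all is to treat the schema itself as data. The sketch below assumes a made-up schema format with one "colname:type" entry per line and supports only Int and String; `schemaLines` and `parseSchema` are hypothetical names, and the lines are inlined here rather than read from a real file.

```scala
object SchemaSketch {
  // Hypothetical schema format: one "colname:type" entry per line.
  // In practice the lines could come from scala.io.Source.fromFile(path).getLines().
  val schemaLines: List[String] = List("orderid:Int", "name:String", "amount:Int")

  // Column name -> parser that turns raw text into a value of the declared type.
  def parseSchema(lines: Seq[String]): Map[String, String => Any] =
    lines.map { line =>
      // A malformed line (not exactly "name:type") fails with a MatchError.
      val Array(name, tpe) = line.split(":").map(_.trim)
      val parser: String => Any = tpe match {
        case "Int"    => (s: String) => s.toInt
        case "String" => (s: String) => s
        case other    => throw new IllegalArgumentException(s"unsupported type: $other")
      }
      name -> parser
    }.toMap

  def main(args: Array[String]): Unit = {
    val schema = parseSchema(schemaLines)
    // Parse a raw cell according to the type declared for its column.
    println(schema("amount")("1200"))
  }
}
```

With this approach, adding or removing columns only changes the schema file, at the cost of losing compile-time checks on column names and types.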
Thanks for the answers. I guess to make it more general I can create these classes:
case class ColumnLine(
  id: Int,
  value: Option[Any]
)
case class Column(
  colname: String,
  coltype: String,
  lines: Option[List[ColumnLine]]
)
case class Row(
  rowkey: String,
  columns: Map[String, Column] // colname -> Column
)
case class Table(
  name: String,
  rows: Map[String, Row] // rowkey -> Row
)
Now I'm trying to figure out how to query this structure, e.g. return the rows where the column with colname == "amount" contains lines with value > 500.
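A query like that could be sketched as nested `exists` calls over the Map-based classes. This is only an illustration under assumptions: `hasIntAbove`, `rowsAbove` and `sample` are hypothetical names, the case classes are repeated so the block compiles on its own, and non-Int values are simply treated as non-matching.

```scala
object QuerySketch {
  // Same shape as the case classes above, repeated so the sketch compiles alone.
  case class ColumnLine(id: Int, value: Option[Any])
  case class Column(colname: String, coltype: String, lines: Option[List[ColumnLine]])
  case class Row(rowkey: String, columns: Map[String, Column])
  case class Table(name: String, rows: Map[String, Row])

  // True if the named column exists and has at least one Int line greater than `min`.
  def hasIntAbove(row: Row, colname: String, min: Int): Boolean =
    row.columns.get(colname).exists { col =>
      col.lines.getOrElse(Nil).exists { line =>
        line.value.exists {
          case i: Int => i > min
          case _      => false
        }
      }
    }

  // All rows of the table that satisfy the predicate for the given column.
  def rowsAbove(table: Table, colname: String, min: Int): Iterable[Row] =
    table.rows.values.filter(hasIntAbove(_, colname, min))

  // Small made-up sample: rowkey1 has amounts 1000 and 1200, rowkey3 has 500 and 350.
  val sample: Table = Table("orders", Map(
    "rowkey1" -> Row("rowkey1", Map(
      "amount" -> Column("amount", "Int",
        Some(List(ColumnLine(1, Some(1000)), ColumnLine(2, Some(1200))))))),
    "rowkey3" -> Row("rowkey3", Map(
      "amount" -> Column("amount", "Int",
        Some(List(ColumnLine(7, Some(500)), ColumnLine(8, Some(350)))))))
  ))

  def main(args: Array[String]): Unit =
    println(rowsAbove(sample, "amount", 500).map(_.rowkey).toList) // prints List(rowkey1)
}
```

Because `columns` is a Map, the column lookup is a single `get`, and a row that lacks the column is simply skipped rather than raising an error.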
Edit 3: OK, this is a "quick and dirty" way, but it seems to work; it scans 10M records in ~15 sec on my laptop:
object hello {
  def main(args: Array[String]): Unit = {
    val n = 10000000
    def uuid = java.util.UUID.randomUUID.toString
    val row: Row = Row(uuid, List(
      Column("orderid", "String", List(Single("id2", Some(uuid)))),
      Column("name", "String", List(Single("id2", Some("apple")), Single("id3", Some("apple2")))),
      Column("amount", "Int", List(Single("id2", Some(1000)), Single("id3", Some(1200)))),
      Column("supplier", "String", List(Single("id4", Some("fruits.inc")))),
      Column("account", "Int", List(Single("id10", Some(7777))))
    ))
    println(new java.util.Date)
    // The same row instance repeated n times.
    val table: List[Row] = List.fill(n)(row)
    // Note: on Scala 2.13+ `.par` needs the scala-parallel-collections module
    // and `import scala.collection.parallel.CollectionConverters._`.
    table.par.filter(row => gt(row, "amount", 500))
      .filter(row => eq(row, "supplier", "fruits.inc"))
      .filter(row => eq(row, "account", 7777))
      //.foreach(println)
    println(new java.util.Date)
  }

  // True if any line of the named column holds exactly colvalue.
  // A row without that column does not match (the table is sparse).
  def eq(row: Row, colname: String, colvalue: Any): Boolean =
    getCol(row, colname).exists(_.lines.exists(_.value.contains(colvalue)))

  // True if any line of the named column holds an Int greater than colvalue.
  // Empty or non-Int values do not match instead of failing on a cast.
  def gt(row: Row, colname: String, colvalue: Int): Boolean =
    getCol(row, colname).exists(_.lines.exists(_.value.exists {
      case i: Int => i > colvalue
      case _      => false
    }))

  // Linear search for a column by name; None if the row has no such column.
  def getCol(row: Row, colname: String): Option[Column] =
    row.columns.find(_.colname == colname)

  case class Single(id: String, value: Option[Any])
  case class Column(
    colname: String,
    coltype: String,
    lines: List[Single]
  )
  case class Row(
    rowkey: String,
    columns: List[Column]
  )
}