I find the existing Spark implementations for accessing a traditional database very restrictive and limited. In particular:
- Use of Bind Variables is not possible.
- Passing partitioning parameters into the generated SQL is very limited.
Most bothersome is that I cannot customize how partitioning of my query takes place: all the API lets me do is name a partitioning column and give upper/lower bounds, and only a numeric column with numeric values is allowed. I understand I can hand my query to the database the way you would a subquery and map my partitioning column onto a numeric value, but that produces very inefficient execution plans on my database, where partition pruning (true Oracle table partitions) and/or use of indexes is no longer effective.
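To make the restriction concrete, this is the only partitioning hook the DataFrameReader JDBC API gives me, as far as I can tell (the URL, credentials, table, column, and bounds below are just placeholders for my actual case):

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JdbcPartitionedRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("jdbc-read").getOrCreate();

        Properties props = new Properties();
        props.setProperty("user", "scott");          // placeholder credentials
        props.setProperty("password", "tiger");

        // The only partitioning knobs available: a single numeric column plus
        // lower/upper bounds and a partition count. Spark derives WHERE clauses
        // like "ID >= x AND ID < y" around the pushed-down subquery.
        Dataset<Row> df = spark.read().jdbc(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL",  // placeholder URL
                "(SELECT * FROM MY_TABLE) t",            // my query, wrapped as a subquery
                "ID",                                    // numeric partition column only
                1L,                                      // lowerBound
                1000000L,                                // upperBound
                8,                                       // numPartitions
                props);

        df.show();
    }
}
```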
Is there any way for me to get around those restrictions? Can I customize my query better, or build my own partitioning logic? Ideally I want to wrap my own custom JDBC code in an Iterator that can be executed lazily and does not cause the entire result set to be loaded into memory (the way JdbcRDD works), roughly along the lines of the sketch below.
Oh - I prefer to do all this using Java, not Scala.
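Something like this plain-JDBC iterator is what I have in mind, just as a rough sketch of the idea rather than working code (the row type, SQL, bind values, and fetch size are only placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Streams rows from a JDBC ResultSet one at a time instead of materializing them all. */
public class LazyResultSetIterator implements Iterator<String[]>, AutoCloseable {

    private final Connection conn;
    private final PreparedStatement stmt;
    private final ResultSet rs;
    private final int columnCount;
    private boolean nextFetched;
    private boolean hasNext;

    public LazyResultSetIterator(String url, String user, String password,
                                 String sql, Object... bindValues) throws SQLException {
        conn = DriverManager.getConnection(url, user, password);
        stmt = conn.prepareStatement(sql);
        // Bind variables instead of literals, so the database can reuse the plan.
        for (int i = 0; i < bindValues.length; i++) {
            stmt.setObject(i + 1, bindValues[i]);
        }
        stmt.setFetchSize(500);   // stream in batches rather than buffering everything
        rs = stmt.executeQuery();
        columnCount = rs.getMetaData().getColumnCount();
    }

    @Override
    public boolean hasNext() {
        if (!nextFetched) {
            try {
                hasNext = rs.next();
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
            nextFetched = true;
        }
        return hasNext;
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        nextFetched = false;
        try {
            String[] row = new String[columnCount];
            for (int i = 0; i < columnCount; i++) {
                row[i] = rs.getString(i + 1);
            }
            return row;
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() throws SQLException {
        rs.close();
        stmt.close();
        conn.close();
    }
}
```

The idea would be to open one such iterator per partition of my own devising (each with its own bind values), so each task streams only its slice of the data and nothing is held fully in memory.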