
Is there a nice (and ideally elegant) way of extracting the list of tables involved in a transformation when using spark.sql(...)?

I need to dynamically identify the list of tables in a Spark SQL query (ideally without writing custom parsing logic) and apply downstream transformations depending on the table(s) involved.

I was thinking of using the default SQL parser:

import spark.sessionState.sqlParser

but I am wondering if there is an easier way of doing this.
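
Roughly, I imagine something like the following (a sketch only, not tested, assuming Spark 2.x, where UnresolvedRelation exposes a tableIdentifier; in Spark 3.x that field became multipartIdentifier):

    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

    // Parse the query text into an unresolved logical plan (nothing is executed),
    // then collect every relation node to get the referenced table/view names.
    val plan = spark.sessionState.sqlParser.parsePlan(
      "SELECT a.id FROM table_a a JOIN table_b b ON a.id = b.id")
    val tables = plan.collect {
      case r: UnresolvedRelation => r.tableIdentifier.unquotedString
    }.distinct
    // tables: Seq[String] = List(table_a, table_b)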

  • Is the SQL query written by you, or does it need to be extracted from somewhere (e.g., from the explain function)? Could you give some examples of how the extraction of tables should work? Commented Mar 13, 2019 at 14:16
  • Essentially, behind the scenes I am registering a set of views using createOrReplaceTempView in the scope of the session, and I want to know which view(s) are involved in a specific SQL query. I've noticed that this information is reachable in the debugger from the resulting DataFrame via df.executionPlan.child.child.tableIndetifier (not 100% sure that's correct), but I haven't managed to turn that into code (see the sketch after these comments). Commented Mar 13, 2019 at 14:22
  • Do you want to extract only the list of table names, or would you also like other details such as the LogicalPlan, Catalyst expressions, etc.? Commented Mar 13, 2019 at 14:27
  • At this moment I just need the list of table names; other details are irrelevant. Commented Mar 13, 2019 at 14:28
  • How about using a regular expression on your query, as in Parse SQL Query Text to Extract Table Names Used? Commented Mar 13, 2019 at 14:57
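
Following up on the debugger observation in the comments: the same collect should work on a DataFrame returned by spark.sql(...), because df.queryExecution.logical is the parsed-but-not-yet-analyzed plan and still contains UnresolvedRelation nodes (again a Spark 2.x sketch; view_a and view_b are hypothetical temp view names):

    import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

    val df = spark.sql("SELECT * FROM view_a v JOIN view_b w ON v.id = w.id")
    // Collect the relation names before the analyzer resolves them away.
    val views = df.queryExecution.logical.collect {
      case r: UnresolvedRelation => r.tableIdentifier.unquotedString
    }.distinct
    // views: Seq[String] = List(view_a, view_b)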
