
Suppose I have something like

df1 = sqlContext.sql("select count(1) as ct1 from big_table_1")
df2 = sqlContext.sql("select count(1) as ct2 from big_table_2")
df1.show()
df2.show()

Within each table (whether a Hive table or a temporary view), the rows are counted in parallel across the worker nodes, assuming the underlying DataFrame has more than one partition.

Is there also a way I can get the two tables to count in parallel? Is this even possible in PySpark?
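To make the question concrete, here is a minimal sketch (not part of the original post) of what "counting the two tables in parallel" might look like using driver-side threads, assuming sqlContext and both tables exist. Spark can schedule jobs submitted from separate driver threads concurrently, subject to available executors and the scheduler mode:

from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: run each count as its own Spark job from a separate
# driver thread so the scheduler can overlap them, resources permitting.
def run_count(query):
    # collect() triggers the job; each query returns a single-row result
    return sqlContext.sql(query).collect()

queries = [
    "select count(1) as ct1 from big_table_1",
    "select count(1) as ct2 from big_table_2",
]

with ThreadPoolExecutor(max_workers=2) as pool:
    ct1_rows, ct2_rows = pool.map(run_count, queries)

print(ct1_rows, ct2_rows)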

  • Why don't you do a union all if it's just a count? (A sketch of that follows these comments.) Commented Jun 21, 2018 at 20:17
  • It might not be; I'm just giving that as a minimal example. Commented Jun 21, 2018 at 20:47
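As a rough sketch of the union all suggestion from the comments (assuming both tables exist and the real workload really is just a count), the two counts can be folded into a single query so Spark plans them together; whether the two scans actually overlap then depends on the physical plan and available executors:

# Hypothetical sketch of the union all approach from the comments
df_both = sqlContext.sql("""
    select 'big_table_1' as tbl, count(1) as ct from big_table_1
    union all
    select 'big_table_2' as tbl, count(1) as ct from big_table_2
""")
df_both.show()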
