
I'm new to Spark, working with PySpark 3.0.1; the Python version used by Spark is 3.6.x. I have SQL files containing merge statements stored in Google Cloud Storage. I am trying to pass those SQL files to spark.sql — can someone help me with how I can achieve this using SparkSession?

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample").getOrCreate()

df = spark.read.text("TERR.txt")

df.createTempView("example")

I want to run SQL like the line below by reading it from a .sql file located in a Google Cloud Storage bucket, where the .sql file will have multiple lines:

df2 = spark.sql("SELECT * FROM example")
  • @MykolaZotko IMO it's not a dupe, as it involves reading the SQL file from a storage object (GCS), and none of the answers in the linked question provides a solution for this. Commented Mar 29, 2021 at 9:28

1 Answer


In order to read a .sql file, you could try the following:

from pyspark.sql import SparkSession
from google.cloud import storage  # You will need: pip install --upgrade google-cloud-storage

spark = SparkSession.builder.appName("sample").getOrCreate()

# Download the example.sql file from the bucket to the local filesystem
storage_client = storage.Client("[Your project name here]")
bucket = storage_client.get_bucket("[Your bucket name here]")
blob = bucket.blob("folder_one/foldertwo/example.sql")
blob.download_to_filename("example.sql")

# Run each line of the downloaded example.sql file
with open("example.sql") as fr:
    queries = fr.readlines()
    for line in queries:
        if line.strip():  # skip blank lines
            df2 = spark.sql(line.strip())
