The GCP Python docs include a script with the following function:

def upload_pyspark_file(project_id, bucket_name, filename, file):
      """Uploads the PySpark file in this directory to the configured
      input bucket."""
      print('Uploading pyspark file to GCS')
      client = storage.Client(project=project_id)
      bucket = client.get_bucket(bucket_name)
      blob = bucket.blob(filename)
      blob.upload_from_file(file)

I've created an argument parsing function in my script that takes in multiple arguments (file names) to upload to a GCS bucket. I'm trying to adapt the above function to parse those multiple args and upload those files, but am unsure how to proceed. My confusion is with the 'filename' and 'file' variables above. How can I adapt the function for my specific purpose?

1 Answer

I don't suppose you're still looking for something like this?

from google.cloud import storage
import os

# Local directory holding the files to upload.
source_dir = 'data-files'
files = os.listdir(source_dir)
client = storage.Client.from_service_account_json('cred.json')
bucket = client.get_bucket('xxxxxx')


def upload_pyspark_file(filename, file):
    """Uploads the local file at path `file` to the bucket as `filename`."""
    print('Uploading from', file, 'to', filename)
    blob = bucket.blob(filename)
    # `file` is a path string here, so use upload_from_filename;
    # upload_from_file expects an already-open file object.
    blob.upload_from_filename(file)


for f in files:
    # os.path.join builds the path portably instead of hard-coding "\\".
    upload_pyspark_file(f, os.path.join(source_dir, f))
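The difference between file and filename is, as you may have guessed, that file is the source (the local file to upload) and filename is the destination (the name the object gets in the bucket). Note that in the original function file is an open file object, which is what upload_from_file expects, whereas in the loop above it is a local path.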
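Since the question mentions taking several file names as command-line arguments, here is a rough sketch of how that could be wired up with argparse. The names parse_args, blob_name_for, upload_files and main, and the --bucket flag, are made up for illustration. It also assumes each object should be named after the file's base name and that default credentials are configured, so adjust as needed.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse a bucket name and one or more local file paths."""
    parser = argparse.ArgumentParser(description="Upload files to a GCS bucket")
    parser.add_argument("--bucket", required=True, help="destination bucket name")
    parser.add_argument("files", nargs="+", help="local files to upload")
    return parser.parse_args(argv)


def blob_name_for(path):
    """Destination object name; using the base name is an assumption."""
    return os.path.basename(path)


def upload_files(bucket, paths):
    """Upload each local path to the given bucket."""
    for path in paths:
        blob = bucket.blob(blob_name_for(path))
        # upload_from_filename takes a local path and opens the file itself.
        blob.upload_from_filename(path)
        print("Uploaded", path, "to", blob.name)


def main(argv=None):
    # Imported here so the helpers above work without the client library.
    from google.cloud import storage

    args = parse_args(argv)
    client = storage.Client()  # assumes default credentials are configured
    upload_files(client.get_bucket(args.bucket), args.files)
```

Calling main() from the usual `if __name__ == "__main__":` guard would then let you run something like `python upload.py --bucket my-bucket a.py b.py` (script and bucket names are placeholders).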
