I'm trying to read a CSV file that was uploaded to GCS.

I want to read a CSV file that was uploaded to GCS from a Cloud Function in GCP, and work with the CSV data as a pandas DataFrame.

But I can't read the CSV file using pandas.

This is the Cloud Function code I use to read the CSV file from GCS.

from io import BytesIO

import pandas as pd
from google.cloud import storage as gcs  # alias assumed from the gcs.Client usage below


def read_csvfile(data, context):
    try:
        bucket_name = "my_bucket_name"
        file_name = "my_csvfile_name.csv"
        project_name = "my_project_name"

        # create the GCS client and look up the bucket
        client = gcs.Client(project_name)
        bucket = client.get_bucket(bucket_name)
        # create the blob and download its contents as bytes
        blob = gcs.Blob(file_name, bucket)
        content = blob.download_as_string()
        train = pd.read_csv(BytesIO(content))
        print(train.head())

    except Exception as e:
        print("error:{}".format(e))

When I ran this code, I got the following error:

No columns to parse from file

Some websites say this error means the CSV file being read is empty, but the file I uploaded is not empty. How can I solve this problem?

Please help. Thanks.

---- Added on 2020/08/08 ----

Thank you for your help! But in the end I could not read the CSV file using your code... I still get the error "No columns to parse from file".

So I tried a new way, reading the CSV file as bytes. The new Python code is below.

MAIN.PY

from google.cloud import storage
import pandas as pd
import io
import csv
from io import BytesIO 

def check_columns(data, context):
    try:
        object_name = data['name']
        bucket_name = data['bucket']

        storage_client = storage.Client()
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        data = blob.download_as_string()
        
        # read the uploaded CSV file as bytes
        f = io.StringIO(str(data))
        df = pd.read_csv(f, encoding="shift-jis")

        print("df:{}".format(df))
        print("df.columns:{}".format(df.columns))
        print("The number of columns:{}".format(len(df.columns)))
    except Exception as e:
        print("error:{}".format(e))

REQUIREMENTS.TXT

Click==7.0
Flask==1.0.2
itsdangerous==1.1.0
Jinja2==2.10
MarkupSafe==1.1.0
Pillow==5.4.1
qrcode==6.1
six==1.12.0
Werkzeug==0.14.1
google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0

The output I got is below.

df:Empty DataFrame
Columns: [b'Apple, Lemon, Orange, Grape]
Index: []
df.columns:Index(['b'Apple', 'Lemon', 'Orange', 'Grape'])
The number of columns:4

So only the first record of the CSV file was read, and it ended up as df.columns!? I could not get the other records in the CSV file... Also, that first row is not a header but an ordinary record.

So how can I get the records in the CSV file as a DataFrame using pandas?

Could you help me again? Thank you.

1 Answer

Pandas, since version 0.24.1, can directly read a Google Cloud Storage URI.

For example:

gs://awesomefakebucket/my.csv

The service account attached to your function must have read access to the CSV file.

Please feel free to test and modify this code.

I used Python 3.7.

function.py

from google.cloud import storage
import pandas as pd

def hello_world(request):
    # it is mandatory to initialize the storage client
    client = storage.Client()
    # please change the file's URI
    temp = pd.read_csv('gs://awesomefakebucket/my.csv', encoding='utf-8')
    print(temp.head())
    return 'check the results in the logs'
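
For a function triggered by a Cloud Storage upload, like check_columns(data, context) in the question, the same approach might look like the sketch below. It has not been run against a real bucket. It assumes the event payload carries 'bucket' and 'name' keys, as the question's code does, and that gcsfs is installed. The names gcs_uri and read_uploaded_csv are made up for illustration.

```python
import pandas as pd


def gcs_uri(event):
    """Build a gs:// URI from a storage event payload (hypothetical helper)."""
    return "gs://{}/{}".format(event["bucket"], event["name"])


def read_uploaded_csv(data, context):
    """Background function: load the uploaded CSV into a DataFrame."""
    uri = gcs_uri(data)
    # pandas passes gs:// paths to gcsfs, so gcsfs must be in requirements.txt
    df = pd.read_csv(uri, encoding="utf-8")
    print(df.head())
```

With the example bucket used above, gcs_uri({'bucket': 'awesomefakebucket', 'name': 'my.csv'}) gives the same URI that is hard-coded in function.py.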

requirements.txt

google-cloud-storage==1.30.0
gcsfs==0.6.2
pandas==1.1.0
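
As for the header-only DataFrame in the question's update: calling str() on a bytes object returns its printable representation, with the leading b' and the line endings written as literal backslash escapes. pandas therefore sees a single line and treats it as the header. Here is a minimal sketch of the difference, using made-up sample bytes in place of what blob.download_as_string() returns. The function names are hypothetical, and the shift-jis encoding is taken from the question's code.

```python
import io

import pandas as pd

# Made-up sample standing in for the bytes returned by blob.download_as_string()
SAMPLE = b"Apple,Lemon,Orange,Grape\r\n1,2,3,4\r\n5,6,7,8\r\n"


def read_via_str(raw):
    """Reproduce the problem: str(raw) is the bytes repr, all on one line."""
    return pd.read_csv(io.StringIO(str(raw)))


def read_via_bytes(raw, encoding="shift-jis"):
    """Pass the raw bytes to pandas and let it decode them."""
    return pd.read_csv(io.BytesIO(raw), encoding=encoding)


bad = read_via_str(SAMPLE)
good = read_via_bytes(SAMPLE)
print(bad.columns[0])  # the first column name keeps the b' prefix
print(len(bad))        # no data rows were parsed
print(good)            # header plus the data rows
```

If a text buffer is preferred, io.StringIO(raw.decode("shift-jis")) should work as well. The point is to decode the bytes, not to call str() on them.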

4 Comments

Thank you for your help! I used the code you supplied here, but I still get the same error, "No columns to parse from file"... Why? I tried a new way to read the CSV file. I wanted to show my new code here, but I could not because of the character limit on comments, so please check the update at the end of my question.
@alan How are you uploading your CSV file? How was it created? Does it work fine in your local environment? I tested with a CSV created in MS Excel and with this internet sample, and both work fine. I also uploaded both files using the web UI. Are you uploading your CSV as a binary file?
Finally I could read the CSV file that was uploaded to GCS. The problem was that I had added many kinds of libraries to REQUIREMENTS.TXT. When I cleared it all and added only the lines you gave me, I could finally read the CSV. Thank you so much!!
@alan If my answer was helpful, please mark it as accepted and upvote it.
