1

The Cloud Function is triggered once a file is uploaded to the storage bucket. My file name: PubSubMessage. Text inside the file: "Hi, this this the first message"

from google.cloud import storage
storage_client = storage.Client()

def hello_gcs(event, context):
    file = event

    bucket = storage_client.get_bucket(file['bucket'])

    blob = bucket.blob(file['name'])

    contents = blob.download_as_string()
    print('contents: {}'.format(contents))

    decodedstring = contents.decode(encoding="utf-8", errors="ignore")
    print('decodedstring: \n{}'.format(decodedstring))

Output of the last print:

------WebKitFormBoundaryAWAKqDaYZB3fJBhx
Content-Disposition: form-data; name="file"; filename="PubSubMessage.txt"
Content-Type: text/plain

Hi, this this the first line.
Hi ,this is the second line. 

hi this is the space after.
------WebKitFormBoundaryAWAKqDaYZB3fJBhx--

My requirements.txt file:

google-cloud-storage
requests==2.20.0
requests-toolbelt==0.9.1

How do I get the actual string inside the file ("Hi, I am the first message.....")?

What is the best way to get the text from a file? TIA

  • I see you've edited your post to include further things you want to do once you read the string inside the file, but I think it would be better to split them into one or more separate questions. Commented Jun 24, 2020 at 14:27
  • @RafaelAlmeida I haven't tried those 2 things yet, as I am stuck on getting the text part. Commented Jun 24, 2020 at 14:48
  • @RafaelAlmeida I tried that code but it's failing. I have updated my question with the code. Please help; I'm not sure why it's not working. Commented Jun 24, 2020 at 14:54
  • From a quick look, it appears to be an indent problem: you're missing three spaces before the print in the last two lines. If this does not solve the problem, please include the error message you're getting. Commented Jun 24, 2020 at 15:00
  • @RafaelAlmeida If it were an indent problem, Cloud Functions would throw a compilation error. The bigger problem is that no proper error appears in the logs; it just says crashed with no other info. But I am sure that it's not an indent problem. Commented Jun 24, 2020 at 15:15

2 Answers

3

The string you read from Google Storage is a string representation of a multipart form. It contains not only the uploaded file contents but also some metadata. The same kind of request may be used to represent more than one file and/or form fields along with a file.

To access the file contents you want, you can use a library which supports that, such as requests-toolbelt. Check out this SO answer for an example. You'll need the Content-Type header, which includes the boundary, or to manually parse the boundary just from the content, if you absolutely must.
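If the Content-Type header is not available, the boundary can usually be recovered from the content itself, because a multipart body begins with a delimiter line of the form `--<boundary>`. The following is a minimal sketch under that assumption; `boundary_from_body` and `content_type_for` are hypothetical helper names, and the `multipart/form-data` subtype is assumed rather than read from anywhere.

```python
def boundary_from_body(body: bytes) -> str:
    """Recover the multipart boundary from the first line of the body.

    Assumes the body starts with the opening delimiter "--<boundary>".
    """
    # Take the first line only; this works for both CRLF and LF line endings.
    first_line = body.lstrip().split(b"\n", 1)[0].rstrip()
    if not first_line.startswith(b"--") or len(first_line) <= 2:
        raise ValueError("body does not start with a multipart delimiter")
    # Drop the two leading dashes that prefix every delimiter line.
    return first_line[2:].decode("ascii")


def content_type_for(body: bytes) -> str:
    """Build a Content-Type value for the body (subtype is an assumption)."""
    return "multipart/form-data; boundary=" + boundary_from_body(body)
```

The returned value has the same shape as the header a browser sends, so it should be usable wherever a multipart decoder asks for the content type.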

EDIT: from your answer, it seems that the Content-Type header was available in the Storage Metadata in Google Storage, which is a common scenario. For future readers of this answer, the specifics of where to read this header from will depend on your particular case.

Since this library is present in PyPI (the Python Package Index), you can use it even in Cloud Functions by specifying it as a dependency in the requirements.txt file.


3 Comments

I want to achieve this with Cloud Functions, so I am not sure whether this toolbelt is supported or how I can get it to work there. Is there any other way to access the file data and process it within the same function?
You can use it in Cloud Functions, I added an edit to the answer with the link explaining the process.
I tried it, but it's not working. The from_response method requires a response object so it can access response.content, but ours is just a string. The error says the str object has no attribute content. Here is the link I am looking at, but it's for AWS Lambda. They hardcode part of it or get things from the AWS Lambda context object, which is not part of GCP functions: stackoverflow.com/questions/50925083/…
0

The code below prints the actual text inside the file.

from requests_toolbelt.multipart import decoder
from google.cloud import storage
storage_client = storage.Client()

def hello_gcs(event, context):
    file = event
    
    bucket = storage_client.bucket(file['bucket'])
    #print('Bucket Name :  {}'.format(file['bucket']))
    #print('Object Name :  {}'.format(file['name']))
    #print('Bucket Object :  {}'.format(bucket))
    
    blob = bucket.get_blob(file['name'])
    #print('Blob Object :  {}'.format(blob))
    
    contentType = blob.content_type
    print('Blob ContentType: {}'.format(contentType))

    # Download the file as a bytes object
    content = blob.download_as_string()
    print('content: {}'.format(content))

    # The stored Content-Type carries the multipart boundary the decoder needs
    for part in decoder.MultipartDecoder(content, contentType).parts:
        print(part.text)
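The function above assumes every uploaded object is a multipart form. As a hedged sketch, the helper below falls back to plain decoding when the stored content type is not multipart. Note that it swaps in the standard library `email` parser instead of requests-toolbelt, so it runs without extra dependencies; `extract_texts` is a hypothetical name, and UTF-8 text is assumed.

```python
from email import message_from_bytes


def extract_texts(content: bytes, content_type) -> list:
    """Return the text of each uploaded part, or the whole body if not multipart."""
    if not (content_type or "").lower().startswith("multipart/"):
        # Plain upload: the object body is the file text itself.
        return [content.decode("utf-8", errors="replace")]

    # The email parser expects headers first, so prepend the Content-Type header.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + content
    message = message_from_bytes(raw)

    texts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the leaf parts
        payload = part.get_payload(decode=True) or b""
        texts.append(payload.decode("utf-8", errors="replace"))
    return texts
```

Inside the Cloud Function this could be called as `extract_texts(content, contentType)` with the values already read from the blob, which also keeps the parsing logic testable locally without a bucket.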
