I want to extract the whole message body of an email using the Gmail API. Right now I am using the 'snippet' field, but I need the entire text. I searched and found that it has something to do with the payload, but I didn't understand how. Can someone show me an example? I am using the Gmail API via Python.
5 Answers
Same as noogui said, but I found that the snippet won't return the whole body.
When the body exceeds roughly 200 characters, the snippet is cut off; you can find the whole body under payload.body.data.
The catch is that the data is base64url-encoded, so you need to decode it :)
The following code will do the trick:
import base64

mail = service.users().messages().get(userId=user_id, id=id, format="full").execute()

def parse_msg(msg):
    if msg.get("payload").get("body").get("data"):
        return base64.urlsafe_b64decode(msg.get("payload").get("body").get("data").encode("ASCII")).decode("utf-8")
    return msg.get("snippet")
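To try the decoding step without calling the API, here is a minimal sketch. The `sample` dict is a hypothetical message shaped like a format="full" response, and `decode_body` is an illustrative helper name, not part of the Gmail client. Re-adding the '=' padding is a defensive assumption in case the data arrives without it.

```python
import base64

def decode_body(msg):
    """Return the decoded payload.body.data of a message dict, or its snippet."""
    data = msg.get("payload", {}).get("body", {}).get("data")
    if not data:
        # Multipart messages keep their text under payload.parts instead
        return msg.get("snippet", "")
    # Re-add any missing '=' padding before decoding (defensive assumption)
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")

# Hypothetical message, shaped like a format="full" response
sample = {
    "snippet": "Hello from",
    "payload": {
        "mimeType": "text/plain",
        "body": {"data": base64.urlsafe_b64encode(b"Hello from the full body").decode("ascii")},
    },
}
print(decode_body(sample))
```

If payload.body.data is empty, the message is probably multipart and the text sits in payload.parts, so falling back to the snippet only gives a truncated preview.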
Use Users.messages.get; the docs include a Python snippet:
# Python 2 syntax (print statement, `except X, e`)
import base64
import email
from apiclient import errors

def GetMessage(service, user_id, msg_id):
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id).execute()
        print 'Message snippet: %s' % message['snippet']
        return message
    except errors.HttpError, error:
        print 'An error occurred: %s' % error

def GetMimeMessage(service, user_id, msg_id):
    try:
        message = service.users().messages().get(userId=user_id, id=msg_id, format='raw').execute()
        print 'Message snippet: %s' % message['snippet']
        # The 'raw' field is base64url-encoded; decode it before parsing
        msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
        mime_msg = email.message_from_string(msg_str)
        return mime_msg
    except errors.HttpError, error:
        print 'An error occurred: %s' % error
What you're trying to access is payload.body or, if you want to go further, payload.body.data.
The Gmail API docs provide sample code that shows how to return the full message as a MIME message object. However, the code they provide doesn't work in Python 3. To do this in Python 3, change email.message_from_string() to email.message_from_bytes(). I'm not sure exactly which change between versions causes this, but the code below works fine for me on Python 3.7.4:
import base64
import email
message = gmail_conn.users().messages().get(userId=u_id, id=m_id, format='raw').execute()
msg_str = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))
mime_msg = email.message_from_bytes(msg_str)
print(mime_msg)
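In Python 3, base64.urlsafe_b64decode returns bytes, which is why message_from_bytes is the matching parser. As a hedged follow-up, here is a sketch of pulling only the text/plain body out of the parsed message with the standard library's policy.default and get_body(). The `sample` message and the helper name `body_from_raw` are assumptions made up for illustration; `raw` stands in for message['raw'] from a format='raw' response.

```python
import base64
import email
from email import policy
from email.message import EmailMessage

def body_from_raw(raw):
    """Decode a base64url 'raw' field and return the text/plain body, if any."""
    mime_bytes = base64.urlsafe_b64decode(raw.encode("ascii"))
    # policy.default yields an EmailMessage, which provides get_body()
    mime_msg = email.message_from_bytes(mime_bytes, policy=policy.default)
    part = mime_msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None

# Hypothetical stand-in for message['raw'] from a format='raw' response
sample = EmailMessage()
sample["Subject"] = "Test"
sample["From"] = "a@example.com"
sample["To"] = "b@example.com"
sample.set_content("Full body text")
sample.add_alternative("<p>Full body text</p>", subtype="html")
raw = base64.urlsafe_b64encode(sample.as_bytes()).decode("ascii")

print(body_from_raw(raw))
```

Letting the email package handle the MIME structure avoids decoding each part by hand, at the cost of downloading the whole raw message.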
Now, you can definitely do it like so:
import base64
from googleapiclient.discovery import build

def main():
    service = build('gmail', 'v1', credentials=creds)
    for message in service.users().messages().list(userId=...).execute()['messages']:
        print(parse_msg(
            service.users().messages().get(userId='me', id=message['id'], format='raw').execute()
        ))

def parse_msg(msg):
    # format='raw' returns the entire message, headers included, base64url-encoded
    return base64.urlsafe_b64decode(msg['raw'].encode('ASCII')).decode('utf-8')
But I felt that this returns a bit too much data, so I started doing it like this:
def main():
    service = build('gmail', 'v1', credentials=creds)
    for message in service.users().messages().list(userId='me').execute()['messages']:
        print(parse_msg(
            service.users().messages().get(userId=..., id=message['id'], format='full').execute()
        ))

def parse_msg(msg):
    payload = msg['payload']
    if data := payload['body'].get('data'):
        return parse_data(data)
    # Skip parts without inline data (e.g. attachments) instead of raising KeyError
    return ''.join(
        parse_data(part['body']['data'])
        for part in payload.get('parts', [])
        if part['body'].get('data')
    )

def parse_data(data):
    return base64.urlsafe_b64decode(data.encode('ASCII')).decode('utf-8')
Above is a modified version of the code posted here by Omer Shacham. I think this modification makes sure you get all the payload data consistently.
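Messages can nest multipart containers inside payload.parts, so a walk over only the top-level parts may miss text. A minimal sketch of a recursive version follows. The `payload` dict is a hypothetical example shaped like a format='full' payload, and `collect_text`, `decode_data`, and `encode` are illustrative helper names, not Gmail client functions.

```python
import base64

def decode_data(data):
    """Decode one base64url-encoded body.data string to text."""
    return base64.urlsafe_b64decode(data.encode("ascii")).decode("utf-8", errors="replace")

def collect_text(part, mime_type="text/plain"):
    """Recursively gather decoded bodies of all parts with the given MIME type."""
    chunks = []
    data = part.get("body", {}).get("data")
    if data and part.get("mimeType") == mime_type:
        chunks.append(decode_data(data))
    for child in part.get("parts", []):
        chunks.extend(collect_text(child, mime_type))
    return chunks

def encode(text):
    """Helper for building the sample payload only."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")

# Hypothetical payload: multipart/mixed wrapping multipart/alternative plus an attachment
payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [
        {"mimeType": "multipart/alternative", "body": {"size": 0}, "parts": [
            {"mimeType": "text/plain", "body": {"data": encode("plain text")}},
            {"mimeType": "text/html", "body": {"data": encode("<p>html</p>")}},
        ]},
        {"mimeType": "application/pdf", "filename": "a.pdf",
         "body": {"attachmentId": "hypothetical-id"}},
    ],
}
print("".join(collect_text(payload)))
```

Filtering by MIME type also avoids concatenating the plain-text and HTML versions of the same body, which joining every part would do.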
I wrote this script, which extracts the full email body as text/plain:
import base64
import copy
import email

class GmailAPI():
    def get_service(self):
        ## You can get the gmail service snippet from here
        # https://developers.google.com/gmail/api/quickstart/python
        pass

class GmailParser():
    def data_encoder(self, text):
        if text and len(text)>0:
            message = base64.urlsafe_b64decode(text.encode('UTF8'))
            message = str(message, 'utf-8')
            message = email.message_from_string(message)
            return message
        else:
            return None

    def read_message(self, content)->str:
        if content.get('payload').get('parts', None):
            parts = content.get('payload').get('parts', None)
            sub_part = copy.deepcopy(parts[0])
            # Descend into the first child until a text/plain part is found
            while sub_part.get("mimeType", None) != "text/plain":
                try:
                    sub_part = copy.deepcopy(sub_part.get('parts', None)[0])
                except Exception as e:
                    break
            return self.data_encoder(sub_part.get('body', None).get('data', None)).as_string()
        else:
            return content.get("snippet")

gmail_parser = GmailParser()
gmail_service = GmailAPI().get_service()
mail = gmail_service.users().messages().list(userId='me', labelIds=['INBOX']).execute()
messages = mail.get('messages')
# msg_ref avoids shadowing the imported email module
for msg_ref in messages:
    message = gmail_service.users().messages().get(userId='me', id=msg_ref['id'], format="full").execute()
    data = gmail_parser.read_message(content=message)