I have a row attribute, row.total_bytes_processed, that returns None for some rows. If it returns None, I have logic to default the value to 0:

for row in rows:
    if row.total_bytes_processed is not None:
        cost_dollars = (row.total_bytes_processed / 1024 ** 4) * 5
        print(f"JOB_ID : {row.job_id} | Creation_Time : {row.creation_time} | Query: {row.query} | Total_Bytes_processed : {row.total_bytes_processed} | Estimated_Cost : ${cost_dollars}")
    else:
        row.total_bytes_processed = 0  # <- Error occurs here
        cost_dollars = (int(row.total_bytes_processed) / 1024 ** 4) * 5
        print(f"JOB_ID : {row.job_id} | Creation_Time : {row.creation_time} | Query: {row.query} | Total_Bytes_processed : {row.total_bytes_processed} | Estimated_Cost : ${cost_dollars}")

Yet when I do, I receive this error:

row.total_bytes_processed = 0
AttributeError: 'Row' object has no attribute 'total_bytes_processed'

How do I fix this error? Can I not default a None (NoneType) value to 0?

I have verified that all the rows have total_bytes_processed.

Here is my source code:

from google.cloud import bigquery
from google.oauth2 import service_account



sql = """
SELECT
job_id,
creation_time,
user_email,
query,
total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE project_id ='nj-dev-blah'
AND creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 183 DAY)
AND CURRENT_TIMESTAMP()
ORDER BY creation_time DESC
LIMIT 100
"""


# `client` is a bigquery.Client created earlier (its setup is not shown in the post)
query_job = client.query(sql)  # Make an API request.
results = query_job.result()
rows = list(results)
print("The query data:")
# print(rows)
for row in rows:
    if row.total_bytes_processed is not None:
        cost_dollars = (row.total_bytes_processed / 1024 ** 4) * 5
        print(f"JOB_ID : {row.job_id} | Creation_Time : {row.creation_time} | Query: {row.query} | Total_Bytes_processed : {row.total_bytes_processed} | Estimated_Cost : ${cost_dollars}")
    else:
        row.total_bytes_processed = 0
        cost_dollars = (int(row.total_bytes_processed) / 1024 ** 4) * 5
        print(f"JOB_ID : {row.job_id} | Creation_Time : {row.creation_time} | Query: {row.query} | Total_Bytes_processed : {row.total_bytes_processed} | Estimated_Cost : ${cost_dollars}")
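The cost expression in the loop divides the byte count by 1024 ** 4 (one TiB) and multiplies by 5, i.e. $5 per TiB; that rate is taken from the code above, not checked against current BigQuery pricing. A minimal sketch of the same arithmetic as a helper that treats None as 0 bytes, so nothing has to be written back to the row (`estimate_cost`, `BYTES_PER_TIB` and `PRICE_PER_TIB` are hypothetical names):

```python
from typing import Optional

BYTES_PER_TIB = 1024 ** 4  # 1 TiB = 1,099,511,627,776 bytes
PRICE_PER_TIB = 5  # dollars per TiB, as hard-coded in the loop above


def estimate_cost(total_bytes_processed: Optional[int]) -> float:
    """Hypothetical helper: estimated cost in dollars, treating None as 0 bytes."""
    total_bytes = total_bytes_processed if total_bytes_processed is not None else 0
    return (total_bytes / BYTES_PER_TIB) * PRICE_PER_TIB


print(estimate_cost(None))  # 0.0
print(estimate_cost(BYTES_PER_TIB))  # 5.0
print(estimate_cost(10 * 1024 ** 3))  # 10 GiB -> 0.048828125
```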

Answer

Having an attribute that equals None is different from not having that attribute at all. If you use:

if row.total_bytes_processed is not None:

Python will try to access this object's attribute called "total_bytes_processed" and then compare it to None. You're getting this error because, in this case, the attribute "total_bytes_processed" does not exist for that object.
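A small illustration of that distinction, using two hypothetical classes: one whose `total_bytes_processed` attribute exists but holds None, and one that never defines it. Only the second makes the attribute lookup itself raise AttributeError, before any comparison with None happens.

```python
class JobWithNone:
    """Hypothetical object whose attribute exists but holds None."""

    def __init__(self):
        self.total_bytes_processed = None


class JobWithoutAttribute:
    """Hypothetical object that never defines total_bytes_processed."""


# Lookup succeeds; the comparison then evaluates to False
print(JobWithNone().total_bytes_processed is not None)

try:
    is_set = JobWithoutAttribute().total_bytes_processed is not None
except AttributeError as exc:
    # The lookup fails before the comparison with None is evaluated
    print("AttributeError:", exc)
```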

You could use the built-in function "hasattr": pass the object and the name of the attribute you're looking for as arguments, and it returns True if the attribute exists and False otherwise:

if hasattr(row, "total_bytes_processed"):
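A quick check of how the built-ins behave, using `SimpleNamespace` objects as hypothetical stand-ins for rows (not real BigQuery rows): `hasattr` only reports whether the attribute exists, not whether its value is None, and `getattr` with a third argument returns that default only when the attribute is missing.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for two rows
row_with_none = SimpleNamespace(total_bytes_processed=None)
row_missing = SimpleNamespace()

print(hasattr(row_with_none, "total_bytes_processed"))  # True, even though the value is None
print(hasattr(row_missing, "total_bytes_processed"))  # False

# getattr's default applies only to a missing attribute, not to a None value
print(getattr(row_with_none, "total_bytes_processed", 0))  # None
print(getattr(row_missing, "total_bytes_processed", 0))  # 0
```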

Keep in mind that "hasattr" still returns True if the attribute exists but equals None, so you could use it as an outer check and then, once you know the attribute exists, check whether it is None and act accordingly. It would be something like:

for row in rows:
    if hasattr(row, "total_bytes_processed"):
        # Default None to 0 in a local variable; BigQuery Row objects may not accept attribute assignment
        total_bytes = row.total_bytes_processed
        if total_bytes is None:
            total_bytes = 0
        cost_dollars = (total_bytes / 1024 ** 4) * 5
        print(f"JOB_ID : {row.job_id} | Creation_Time : {row.creation_time} | Query: {row.query} | Total_Bytes_processed : {total_bytes} | Estimated_Cost : ${cost_dollars}")
    else:
        # Code for when total_bytes_processed does not exist
        pass
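One more thing worth checking: the traceback points at the assignment `row.total_bytes_processed = 0`, not at the read in the `if`. My assumption, not confirmed in this thread, is that google-cloud-bigquery's `Row` resolves field names through `__getattr__` and declares `__slots__`, so it has nowhere to store a new attribute value; in that case reading the field works while writing to it raises this same AttributeError. The sketch below mimics that shape with a hypothetical `SlottedRow` class and shows that keeping the defaulted value in a local variable avoids the write altogether.

```python
class SlottedRow:
    """Hypothetical read-only row: fields are looked up by name, no __dict__."""

    __slots__ = ("_values", "_field_to_index")

    def __init__(self, values, field_to_index):
        self._values = values
        self._field_to_index = field_to_index

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for field names
        index = self._field_to_index.get(name)
        if index is None:
            raise AttributeError(f"no row field {name!r}")
        return self._values[index]


row = SlottedRow(("job_1", None), {"job_id": 0, "total_bytes_processed": 1})

print(hasattr(row, "total_bytes_processed"))  # True: the field exists
print(row.total_bytes_processed)  # None: reading works

try:
    row.total_bytes_processed = 0  # rejected: no slot or __dict__ to store it in
except AttributeError as exc:
    print("AttributeError:", exc)

# Keep the default in a local variable instead of mutating the row
total_bytes = row.total_bytes_processed if row.total_bytes_processed is not None else 0
cost_dollars = (total_bytes / 1024 ** 4) * 5
print(row.job_id, total_bytes, cost_dollars)
```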