I'm having trouble with the index of a Pandas data frame. I want to load data from a JSON file, create a Pandas data frame, select specific fields from that data frame, and send them to my database.

The following is a link to what's in the JSON file so you can see the fields actually exist: https://pastebin.com/Bzatkg4L

import pandas as pd
from pandas.io import sql
import MySQLdb
from sqlalchemy import create_engine

# Open and read the text file where all the Tweets are
with open('US_tweets.json') as f:
    tweets = f.readlines()

# Convert the list of Tweets into a structured dataframe
df = pd.DataFrame(tweets)
# Attributes needed should be here
df = df[['created_at', 'screen_name', 'id', 'country_code', 'full_name', 'lang', 'text']]

# To create connection and write table into MySQL
engine = create_engine("mysql+pymysql://{user}:{pw}@localhost/{db}"
                       .format(user="blah",
                               pw="blah",
                               db="blah"))

df.to_sql(con=engine, name='US_tweets_Table', if_exists='replace', flavor='mysql')

Thanks for your help!

  • Is your original dataframe being constructed correctly? Specifically, what columns are present in that dataframe? Commented Nov 16, 2017 at 18:35
  • @Evan I think you might be right, how would I create columns for the dataframe? Correct me if I'm wrong but it seems as if you're saying I should create columns in the dataframe that associate with the attributes in the JSON file. And once those columns are made, I can add the attributes into the columns? Commented Nov 16, 2017 at 19:34
  • The error occurs because the columns you are trying to reference are not in the index: that is, they are not present in the first df you create. They are present within objects in the JSON file, but pandas does not create a column for every object in the JSON, just for the highest level. Commented Nov 16, 2017 at 20:37

1 Answer

Pandas doesn't map every nested object in the JSON file to a column in the dataframe; it only creates columns for the top-level keys. Your example file contains 24 columns:

import pandas as pd

# lines=True tells pandas to parse one JSON object per line
with open('tweets.json') as f:
    df = pd.read_json(f, lines=True)
df.columns

Returns:

Index(['contributors', 'coordinates', 'created_at', 'entities',
   'favorite_count', 'favorited', 'geo', 'id', 'id_str',
   'in_reply_to_screen_name', 'in_reply_to_status_id',
   'in_reply_to_status_id_str', 'in_reply_to_user_id',
   'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'metadata',
   'place', 'retweet_count', 'retweeted', 'source', 'text', 'truncated',
   'user'],
  dtype='object')
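
To see why the selection in the question raises an error, here is a minimal sketch. The two sample lines are made up for illustration and are not taken from the real file. It compares a dataframe built from raw lines (as `f.readlines()` would give) with one built from parsed JSON objects:

```python
import json

import pandas as pd

# Hypothetical sample data: one JSON object per line, as in a tweets dump.
raw_lines = [
    '{"id": 1, "text": "hello", "user": {"screen_name": "alice"}}\n',
    '{"id": 2, "text": "world", "user": {"screen_name": "bob"}}\n',
]

# Built from raw strings, the frame has a single column holding the text.
raw_df = pd.DataFrame(raw_lines)
print(list(raw_df.columns))

# Built from parsed objects, only the top-level keys become columns.
parsed_df = pd.DataFrame([json.loads(line) for line in raw_lines])
print(list(parsed_df.columns))

# 'screen_name' is nested inside 'user', so it is not a column of its own.
print("screen_name" in parsed_df.columns)
```

In both cases `screen_name` is missing from the columns, which is why indexing with that name fails.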

To dig deeper into the JSON data, I found this solution, but I hope a more elegant approach exists: How do I access embedded json objects in a Pandas DataFrame?

For example, df['entities'].apply(pd.Series)['urls'].apply(pd.Series)[0].apply(pd.Series)['indices'][0][0] returns 117.

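To access full_name and copy it to the df, try this: df['full_name'] = df['place'].apply(pd.Series)['full_name']. Because full_name is nested inside the place object, it has to be pulled out of df['place']; for the first row (index 0) the new column holds Austin, TX.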
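
A possibly tidier alternative to chaining `.apply(pd.Series)` is `pd.json_normalize` (available in pandas 1.0 and later), which flattens nested objects into dotted column names. The sketch below is only an illustration: the sample records are hypothetical, and it assumes that `screen_name` sits under `user` and that `country_code` and `full_name` sit under `place`.

```python
import json

import pandas as pd

# Hypothetical sample standing in for the real file: one JSON object per line.
sample_lines = [
    '{"created_at": "Thu Nov 16 18:00:00 +0000 2017", "id": 1, "lang": "en", '
    '"text": "hello", "user": {"screen_name": "alice"}, '
    '"place": {"country_code": "US", "full_name": "Austin, TX"}}',
    '{"created_at": "Thu Nov 16 18:05:00 +0000 2017", "id": 2, "lang": "en", '
    '"text": "world", "user": {"screen_name": "bob"}, '
    '"place": {"country_code": "US", "full_name": "Dallas, TX"}}',
]

# Parse each non-empty line into a dict, then flatten nested dicts.
records = [json.loads(line) for line in sample_lines if line.strip()]
flat = pd.json_normalize(records)  # nested keys become e.g. 'user.screen_name'

# Map flattened column names to the names wanted in the database table.
wanted = {
    "created_at": "created_at",
    "user.screen_name": "screen_name",
    "id": "id",
    "place.country_code": "country_code",
    "place.full_name": "full_name",
    "lang": "lang",
    "text": "text",
}
tweets = flat[list(wanted)].rename(columns=wanted)
print(tweets)
```

Tweets without a `place` may come through as missing values, so it is worth checking that case against the real data.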

4 Comments

Hey Evan, you've given a really good solution however when I try to access other attributes such as 'text' and 'id' in the same manner, I get an error. And why did you apply df['place'] to the 'full_name'? I tried it without 'place' and it gave the same error I had with accessing the other attributes.
UPDATE: OK, the following attributes can easily be accessed via print(df['attribute_here']): text, created_at, id and lang. Only screen_name and country_code are empty.
UPDATE 2: OK, so I figured out how to print screen_name. I didn't understand why you added 'place' for 'full_name' to work until I looked into the JSON file. The attribute 'user' contains 'screen_name', and that's why it worked. Great, I'll do my best to import the data into the database now. Thanks Evan!
Glad you figured it out - I am not very familiar with JSON, so this was a nice learning opportunity for me, as well.
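
For the final step mentioned in the comments, writing the flattened frame to a database, here is a rough sketch. It uses an in-memory SQLite connection purely as a stand-in so that it runs anywhere; the MySQL engine from the question would take its place in practice. The sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical flattened frame: every column holds plain scalars, not dicts.
tweets = pd.DataFrame(
    {
        "id": [1, 2],
        "screen_name": ["alice", "bob"],
        "full_name": ["Austin, TX", "Dallas, TX"],
        "text": ["hello", "world"],
    }
)

# SQLite stands in for MySQL here; it is an assumption made for portability.
conn = sqlite3.connect(":memory:")
tweets.to_sql("US_tweets_Table", conn, if_exists="replace", index=False)

# Read the table back to confirm the rows were written.
round_trip = pd.read_sql("SELECT screen_name, full_name FROM US_tweets_Table", conn)
print(round_trip)
conn.close()
```

The `flavor` argument from the question is left out; as far as I know, recent pandas versions no longer accept it.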
