How to produce a correct bar plot using Pandas and Matplotlib.pyplot, from a list of dictionaries

Question

My problem is that I'm trying to create a bar plot, but it is not outputting correctly.

I have a list of dictionaries.

Each dictionary contains all of the data and attributes for a single tweet from Twitter, and the list holds thousands of them. The attributes are stored as key:value pairs, including the tweet content, the screen name of the person tweeting, the language of the tweet, the country of origin of the tweet, and more.

To create my bar plot for the language attribute, I use map() with a lambda to pull each tweet's language into a column of a Pandas dataframe, then plot the result as a bar plot with 5 frequency bars, one for each of the top 5 most used languages in my list of tweets.

Here is my code for the language bar plot (note that my list of dictionaries containing each tweet is called tweets_data):

import pandas as pd
import matplotlib.pyplot as plt

tweets_df = pd.DataFrame()

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

tweets_by_lang = tweets_df['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')

As I said, I should be getting 5 bars, one for each of the top five languages in my data. Instead, I am getting the graph shown below.

The problem is here: tweets_df['lang'] = map( ... ). What does tweets_data look like? What kind of object is it? If it's a dataframe, why are you mapping it instead of just using tweets_data['lang'].value_counts()?
– ASGM, Commented Oct 19, 2017 at 15:03
tweets_data is a list, and each item in the list is a dictionary. Each dictionary contains all of the data for a single tweet. And when I try your suggestion of tweets_data['lang'].value_counts() -- I get the error "TypeError: list indices must be integers or slices, not str."
– TJE, Commented Oct 19, 2017 at 15:17
What does the output of print tweets_df['lang'] look like?
– ASGM, Commented Oct 19, 2017 at 15:19
0 en 1 en 2 en 3 en 4 pt 5 sp 6 en 7 und 8 en 9 en 10 en ... 530 en 531 sp 532 en 533 en 534 it 535 en 536 pt 537 en
– TJE, Commented Oct 19, 2017 at 15:22
Hmm, it's not immediately clear to me why this isn't working. I've made some sample data and tried it myself and it seems fine. What happens if you replace tweets_data with this test list: tweets_data = [{'lang': 'en'}, {'lang': 'pl'}, {'lang': 'en'}].
– ASGM, Commented Oct 19, 2017 at 15:30

ASGM · Accepted Answer · 2017-10-19 15:44:18Z

Your problem is here:

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

The issue, as your comment suggests, comes down to a change between Python 2 and Python 3. In Python 2, map() returns a list, but in Python 3 it returns an iterator. The hint is that tweets_df['lang'].value_counts() contains only one value, and that value is the <map ... > iterator object itself.
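To see the difference without involving Pandas at all, here is a minimal sketch in plain Python 3. The three-item tweets_data list is hypothetical sample data, not the real tweet list:

```python
# Hypothetical sample data standing in for the real tweets_data.
tweets_data = [{"lang": "en"}, {"lang": "pl"}, {"lang": "en"}]

langs = map(lambda tweet: tweet["lang"], tweets_data)

# In Python 3 this is a lazy map object, not a list.
is_list = isinstance(langs, list)

# Wrapping it in list() pulls out the actual values.
first_pass = list(langs)

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(langs)

print(is_list, first_pass, second_pass)
```

Because the map object is a single lazy iterator rather than a sequence of values, anything that expects one value per row needs it converted to a list first.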

In either Python 2 or 3, you can use a list comprehension instead:

tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts()

Or in Python 3, you can follow @Triptych's advice from the answer linked above and wrap map() in a list():

tweets_df['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
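Putting both fixes next to each other, the following is a rough end-to-end sketch rather than a definitive version. The six-item tweets_data list is made-up sample data, the names counts_listcomp and counts_map are illustrative only, and the Agg backend line is an assumption so the script runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for the real tweets_data.
tweets_data = [
    {"lang": "en"}, {"lang": "en"}, {"lang": "en"},
    {"lang": "pt"}, {"lang": "pt"},
    {"lang": "es"},
]

# Fix 1: list comprehension (works in Python 2 and 3).
counts_listcomp = pd.Series([tweet["lang"] for tweet in tweets_data]).value_counts()

# Fix 2: materialize the map object with list() (Python 3).
tweets_df = pd.DataFrame()
tweets_df["lang"] = list(map(lambda tweet: tweet["lang"], tweets_data))
counts_map = tweets_df["lang"].value_counts()

# Plot at most five bars, one per language, most frequent first.
fig, ax = plt.subplots()
counts_map.head(5).plot(ax=ax, kind="bar", color="red")

# Labels are set after plotting so that any axis label pandas applies
# from the index name does not replace them.
ax.set_xlabel("Languages", fontsize=15)
ax.set_ylabel("Number of tweets", fontsize=15)
ax.set_title("Top 5 languages", fontsize=15, fontweight="bold")
```

Both approaches give the same counts. To view the figure, call plt.show() with an interactive backend, or write it to disk with fig.savefig(...).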

edited Oct 19, 2017 at 15:44

answered Oct 19, 2017 at 15:22

ASGM
