My problem is that I'm trying to create a bar plot, but it is not outputting correctly.

I have a list of dictionaries.

Each dictionary contains all of the data and attributes associated with a single tweet from Twitter, and the list holds thousands of them. The attributes are stored as key:value pairs and include the tweet content, the screen name of the person tweeting, the language of the tweet, the country of origin of the tweet, and more.

To create my bar plot for the language attribute, I use map() with a lambda to pull the language of each tweet into a column of a Pandas dataframe, then plot the result as a bar plot with one frequency bar for each of the top 5 most used languages in my list of tweets.

Here is my code for the language bar plot (note that my list of dictionaries containing each tweet is called tweets_data):

tweets_df = pd.DataFrame()

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

tweets_by_lang = tweets_df['lang'].value_counts()

fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')

As I said, I should be getting 5 bars, one for each of the top five languages in my data. Instead, I am getting the graph shown below.

[Image: the resulting bar plot]

  • The problem is here: tweets_df['lang'] = map( ... ). What does tweets_data look like? What kind of object is it? If it's a dataframe, why are you mapping it instead of just using tweets_data['lang'].value_counts()? Commented Oct 19, 2017 at 15:03
  • tweets_data is a list, and each item in the list is a dictionary. Each dictionary contains all of the data for a single tweet. And when I try your suggestion of tweets_data['lang'].value_counts() -- I get the error "TypeError: list indices must be integers or slices, not str." Commented Oct 19, 2017 at 15:17
  • What does the output of print tweets_df['lang'] look like? Commented Oct 19, 2017 at 15:19
  • 0 en 1 en 2 en 3 en 4 pt 5 sp 6 en 7 und 8 en 9 en 10 en ... 530 en 531 sp 532 en 533 en 534 it 535 en 536 pt 537 en Commented Oct 19, 2017 at 15:22
  • Hmm, it's not immediately clear to me why this isn't working. I've made some sample data and tried it myself and it seems fine. What happens if you replace tweets_data with this test list: tweets_data = [{'lang': 'en'}, {'lang': 'pl'}, {'lang': 'en'}]? Commented Oct 19, 2017 at 15:30

1 Answer

Your problem is here:

tweets_df['lang'] = map(lambda tweet: tweet['lang'], tweets_data)

The issue, as your comment suggests, comes down to a change between Python 2 and Python 3. In Python 2, map() returns a list, but in Python 3 it returns an iterator. The hint is that tweets_df['lang'].value_counts() has only one value, and it's the <map ... > iterator object.
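To see the difference without pandas involved, here is a minimal sketch for Python 3. It reuses the three-item test list from the comments; the variable names langs, is_list, materialized and leftover are only illustrative.

```python
# Sample data taken from the test list suggested in the comments.
tweets_data = [{'lang': 'en'}, {'lang': 'pl'}, {'lang': 'en'}]

# In Python 3 this is a lazy iterator, not a list.
langs = map(lambda tweet: tweet['lang'], tweets_data)
is_list = isinstance(langs, list)

# Wrapping the iterator in list() pulls out the actual values.
materialized = list(langs)

# The iterator is now exhausted, so a second pass yields nothing.
leftover = list(langs)

print(type(langs).__name__, is_list)  # map False
print(materialized)                   # ['en', 'pl', 'en']
print(leftover)                       # []
```

The last line is worth noting: a map object can only be consumed once, which is another reason to turn it into a list before handing it to anything else.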

In either Python 2 or 3, you can use a list comprehension instead:

tweets_by_lang = pd.Series([tweet['lang'] for tweet in tweets_data]).value_counts()

Or in Python 3, you can follow @Triptych's advice from the answer linked above and wrap map() in a list():

tweets_df['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
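Putting it together, the whole thing might look something like the sketch below. The sample tweets_data and the helper names count_languages and plot_top_languages are made up for illustration, and it assumes pandas and matplotlib are installed; the plotting calls are the ones from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data; real tweets carry many more keys than 'lang'.
tweets_data = (
    [{'lang': 'en'}] * 4
    + [{'lang': 'sp'}] * 3
    + [{'lang': 'pt'}] * 2
    + [{'lang': 'it'}]
)


def count_languages(tweets):
    """Count tweets per language, most frequent first."""
    # A list comprehension yields a real list on both Python 2 and 3.
    langs = [tweet['lang'] for tweet in tweets]
    return pd.Series(langs).value_counts()


def plot_top_languages(counts, n=5):
    """Draw a bar plot of the n most frequent languages."""
    fig, ax = plt.subplots()
    ax.tick_params(axis='x', labelsize=15)
    ax.tick_params(axis='y', labelsize=10)
    ax.set_xlabel('Languages', fontsize=15)
    ax.set_ylabel('Number of tweets', fontsize=15)
    ax.set_title('Top %d languages' % n, fontsize=15, fontweight='bold')
    counts.head(n).plot(ax=ax, kind='bar', color='red')
    return ax


tweets_by_lang = count_languages(tweets_data)
print(tweets_by_lang)
```

Calling plot_top_languages(tweets_by_lang) followed by plt.show() should then draw one bar per language, up to five.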