
I am new to pandas DataFrames and was curious why my basic idea of adding new values row by row doesn't work here.

I also tried different approaches with .loc[] and .append(), but obviously used them incorrectly (still plenty to learn).

Instructions: Add a column to data named length, defined as the length of each word. Add another column named frequency, defined as follows for each word in data:

  • If count > 10, frequency is "frequent".
  • If 1 < count <= 10, frequency is "infrequent".
  • If count == 1, frequency is "unique".

My if statements fill the whole DataFrame with values from only the last item of a dictionary-like object (a collections.Counter). The word and count values are all produced inside the for loop, so I don't understand why the DataFrame doesn't get a new value on each iteration.

    data['length'] = ''
    data['frequency'] = ''
    
    for word, count in counted_text.items():
        if count > 10:
            data.length = len(word)
            data.frequency = 'frequent'
        if 1 < count <=10:
            data.length = len(word)
            data.frequency = 'infrequent'
        if count == 1:
            data.length = len(word)
            data.frequency = 'unique'
    print(word, len(word), '\n')
    
    """
    This is working code that I googled
    -----------------------------------
    data = pd.DataFrame({
        "word": list(counted_text.keys()),
        "count": list(counted_text.values())
    })
    
    data["length"] = data["word"].apply(len)
    
    data.loc[data["count"] > 10,  "frequency"] = "frequent"
    data.loc[data["count"] <= 10, "frequency"] = "infrequent"
    data.loc[data["count"] == 1,  "frequency"] = "unique"
    
    """
    
    print(data.head(), '\n')
    print(data.tail())
    

Output:

finis 5 

       word  count  length frequency
1       the    935       5    unique
2  tragedie      3       5    unique
3        of    576       5    unique
4    hamlet     97       5    unique
5            45513       5    unique 

              word count  length frequency
5109  shooteexeunt     1       5    unique
5110      marching     1       5    unique
5111         peale     1       5    unique
5112           ord     1       5    unique
5113         finis     1       5    unique
  • The working code that you Googled is the right way to update rows in pandas based on specific conditions. You should not update rows in a DataFrame one at a time with a loop like this. If you scrutinize the loop you've written, when you say data.length = len(word), which row are you talking about? Commented Apr 21, 2020 at 18:03
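If you really do want a loop, each assignment has to name the row it targets, which is what the comment is getting at. This is a minimal sketch, assuming data has the default integer index; the Counter contents are hypothetical stand-ins for the Hamlet word counts. It works, but on thousands of rows it is typically much slower than the vectorized .loc version from the question.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample input standing in for the question's counted_text.
counted_text = Counter(["the"] * 12 + ["of"] * 3 + ["hamlet"])

data = pd.DataFrame({
    "word": list(counted_text.keys()),
    "count": list(counted_text.values()),
})

# Pre-create the columns with suitable dtypes (as the question does).
data["length"] = 0
data["frequency"] = ""

# Each .loc assignment targets ONE row, identified by its index label idx.
for idx, word, count in data[["word", "count"]].itertuples():
    data.loc[idx, "length"] = len(word)
    if count > 10:
        data.loc[idx, "frequency"] = "frequent"
    elif count > 1:
        data.loc[idx, "frequency"] = "infrequent"
    else:
        data.loc[idx, "frequency"] = "unique"

print(data)
```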

2 Answers


Assuming the data DataFrame has only the word and count columns, and that count is never 0, you could try the following:

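    import numpy as np

    # Vectorized: .str.len() computes the length of every word at once
    data['length'] = data['word'].str.len()
    # Nested np.where: > 10 -> 'frequent', 2..10 -> 'infrequent', otherwise 'unique'
    data['frequency'] = np.where(data['count'] > 10, 'frequent',
                                 np.where((data['count'] > 1) & (data['count'] <= 10),
                                          'infrequent', 'unique'))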
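The nested np.where calls can also be written with np.select, which some find easier to read when there are more than two branches. This is a sketch under the same assumption as the answer (count is never 0); the sample Counter is hypothetical.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's Hamlet counts.
counted_text = Counter(["the"] * 12 + ["of"] * 3 + ["hamlet"])
data = pd.DataFrame(list(counted_text.items()), columns=["word", "count"])

data["length"] = data["word"].str.len()

# np.select checks conditions in order; the first True one wins per row.
conditions = [
    data["count"] > 10,
    data["count"] > 1,  # only reached when count <= 10
]
choices = ["frequent", "infrequent"]
# Rows matching no condition (count <= 1) get the default.
data["frequency"] = np.select(conditions, choices, default="unique")

print(data)
```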

2 Comments

Works. But I was also looking for an answer to why "if count > 10: data.length = len(word)" ended up with the last value of the for-loop.
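In each iteration of the for-loop, data.length = len(word) assigns a single value to the entire length column (pandas broadcasts the scalar to every row), and the data.frequency assignment does the same for frequency. Each iteration overwrites the previous one, so after the loop every row holds the values from the last word, finis: length 5 and frequency "unique".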
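A small reproduction of the behavior described in that comment, using a made-up three-row frame. Here data["length"] = len(word) behaves like the question's data.length = len(word), because attribute access can modify a column that already exists.

```python
import pandas as pd

# Tiny hypothetical frame with a placeholder column, as in the question.
data = pd.DataFrame({"word": ["the", "of", "finis"], "count": [935, 576, 1]})
data["length"] = ""

for word in ["the", "of", "finis"]:
    # A scalar assigned to a column is broadcast to EVERY row,
    # so each iteration overwrites the whole column.
    data["length"] = len(word)

print(data)
# every row now has length 5, the length of the last word "finis"
```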

After @Sajan provided working code, I concluded that a DataFrame doesn't need a for-loop here at all: vectorized column operations handle every row at once.
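One more loop-free variant, offered as an alternative rather than as either answer's method: pd.cut maps the count ranges directly to labels. This sketch assumes counts are positive integers. A count of 0 would fall outside the first bin and come out as NaN. The sample Counter is again hypothetical.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's counted_text.
counted_text = Counter(["the"] * 12 + ["of"] * 3 + ["hamlet"])
data = pd.DataFrame(list(counted_text.items()), columns=["word", "count"])

data["length"] = data["word"].str.len()

# Bins are right-closed by default: (0, 1], (1, 10], (10, inf]
data["frequency"] = pd.cut(
    data["count"],
    bins=[0, 1, 10, np.inf],
    labels=["unique", "infrequent", "frequent"],
)

print(data)
```

The result is a categorical column, which stores each label once. Use .astype(str) if plain strings are needed downstream.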

