I've got a sorted() call that stops sorting once I add the 'not in stop' check. Basically, the sorting I had before is lost as soon as I try to exclude the NLTK stopwords. Can anyone point out what I did wrong?

I have now included everything in the code for better reference.

EDITED:

import nltk
from nltk.corpus import stopwords
import string

stop = stopwords.words('english') + list(string.punctuation)
f = open('review_text_all.txt', encoding="utf-8")
raw = f.read().lower().replace("'", "").replace("\\", "").replace(",", 
"").replace("\ufeff", "")

tokens = nltk.word_tokenize(raw)

bgs = nltk.bigrams(tokens)

fdist = nltk.FreqDist(bgs)
for (k,v) in sorted(fdist.items(), key=lambda x: (x[1] not in stop), 
reverse=True):
    print(k,v)

Here is my result with 'not in stop':

('or', 'irish') 3
('put', 'one') 1
('was', 'repealed') 1
('please', '?') 6
('contact', 'your') 2
('wear', 'sweats') 1

And without 'not in stop':

('white', 'people') 4362
('.', 'i') 3734
('in', 'the') 2880
('of', 'the') 2634
('to', 'be') 2217
('all', 'white') 1778

As you can see, the sort works, but only once I remove the 'not in stop' check.

4 Comments

  • What is fdist, and what is your desired sorted output? Please include minimal examples. Commented Sep 26, 2017 at 14:48
  • Please post your input and desired output. Commented Sep 26, 2017 at 14:49
  • Do you want to sort or to filter the list? Sorting on a boolean criterion will almost certainly not produce what you expect. Commented Sep 26, 2017 at 14:51
  • Perhaps you need to apply the filter function first, and then sort. As already noted, your sort key function is incorrect. Commented Sep 26, 2017 at 14:57

1 Answer

The key parameter of the sorted() function is itself a function: it tells Python which value (an attribute or other value derived from each item of the list) to sort on.

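In your case, the key function only ever returns True or False, which are not really good values to sort on :) In fact, x[1] is the count (an integer), so it is never in stop: the key is True for every item and the order does not change.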
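
To see the effect in isolation, here is a minimal sketch that uses a made-up list of (bigram, count) pairs in place of fdist.items(). The pairs and the tiny stop list are assumptions for illustration only, not NLTK's data. It compares sorting on the boolean key with sorting on the count itself.

```python
# Hypothetical data standing in for fdist.items(): (bigram, count) pairs.
items = [(("or", "irish"), 3), (("put", "one"), 1), (("please", "?"), 6)]
stop = ["or", "one", "?"]  # tiny illustrative stop list, not NLTK's

# x[1] is the count (an int), so `x[1] not in stop` is True for every item.
keys = [x[1] not in stop for x in items]

# All keys are equal and sorted() is stable, so the order does not change.
by_bool = sorted(items, key=lambda x: x[1] not in stop, reverse=True)

# Sorting on the count itself puts the most frequent bigram first.
by_count = sorted(items, key=lambda x: x[1], reverse=True)

print(keys)
print(by_bool == items)
print(by_count[0])
```

Since every item gets the same key, the "sorted" result is simply the original order, which matches the unsorted output shown in the question.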

EDIT:

From what I understand of what you want to achieve, you need to add a filter step before (or after) the sort that removes from your list the items whose words are in your "stop words" list.

Something like this:

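# Each item is a (bigram, count) pair: filter on the words in x[0], sort on the count x[1].
# Assumption: a bigram is dropped if either of its words is in stop.
for (k, v) in sorted(filter(lambda x: all(w not in stop for w in x[0]), fdist.items()), key=lambda x: x[1], reverse=True):
    print(k, v)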
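
For a version that runs without NLTK or its corpus data, here is a self-contained sketch of the same filter-then-sort idea. It uses collections.Counter in place of nltk.FreqDist and a small hand-written stop set. The sample text, the stop set, and the rule "drop a bigram if either word is a stop word" are all assumptions, since the question does not spell out the desired output.

```python
from collections import Counter

# Hypothetical stand-ins for the tokenized review text and the stop list.
tokens = "the cat sat on the mat . the cat sat on the sofa . big cat sat".split()
stop = {"the", "on", "."}

# Count adjacent word pairs; Counter plays the role of nltk.FreqDist here.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Step 1: filter on the bigram itself (the key), not on the count.
kept = [(bg, n) for bg, n in bigram_counts.items()
        if all(word not in stop for word in bg)]

# Step 2: sort what is left by count, highest first.
ranked = sorted(kept, key=lambda item: item[1], reverse=True)

for bigram, count in ranked:
    print(bigram, count)
```

Keeping the two steps separate makes it easier to see which part of each (bigram, count) pair is being tested and which part is being sorted on.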

3 Comments

It worked, but not exactly the way I needed it to. It sorted by the keys, but I actually need the values sorted from highest to lowest.
@M4cJunk13 I updated my answer with what I think is the correct sort key (based on how often each bigram occurs).
Perfect, it worked!!! Thank you so much. I'm still trying to get a better understanding of how to use lambdas.
