I just started learning Python. I am trying to clean a sentence by breaking it into words, correcting each word, and joining the words back into a sentence. The document big.txt contains words such as youth, caretaker, etc. The problem is in the final procedure, looper: it prints its output one word per line.

correct is another procedure, defined before this code, that corrects each word.

Here is the code:

import re
import nltk

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))

def looper(a,count):
    words = nltk.word_tokenize(zebra)
    for i in range(len(words)):
        X = correct(words[i])
        print (X)

final = looper(zebra, count)

The output it produces:

youth
caretaker
walking
car
in
something

How can I take the individual outputs and join them into a single sentence?

Expected Result:

youth caretaker walking car in something

Please let me know if you need additional details.

Thanks in advance


3 Answers


Use a list comprehension:

print " ".join([ correct(words[i]) for i in range(len(words)) ])

It should look like this:

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))
words = nltk.word_tokenize(zebra)
def looper(a,count):
    print " ".join([ correct(words[i]) for i in range(len(words)) ])

The tokenization can live outside the function: words only needs to be computed once, not on every call.

You can also iterate over the words directly instead of indexing:

print " ".join([ correct(i) for i in words ])

Here is the correct way to do it:

zebra = 'Yout caretak taking care of something'
words = nltk.word_tokenize(zebra)
print " ".join([ correct(i) for i in words ])

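You don't need a function here: words is a list of words, so you can iterate over it and join the results.

If you prefer to keep the loop from your code, a trailing comma after print keeps the words on one line: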
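As a minimal sketch of the join approach in Python 3 syntax (the snippets in this answer use the Python 2 print statement): correct below is a hypothetical stand-in backed by a tiny lookup table, not the asker's real function, and str.split is assumed in place of nltk.word_tokenize so the snippet runs without NLTK installed.

```python
# Hypothetical stand-in for the asker's correct(); the real one is built from big.txt.
FIXES = {"yout": "youth", "caretak": "caretaker"}

def correct(word):
    """Return a known fix for word, or the lowercased word unchanged."""
    lowered = word.lower()
    return FIXES.get(lowered, lowered)

def correct_sentence(sentence):
    """Correct every whitespace-separated word and join the results with spaces."""
    # str.split is an assumption; nltk.word_tokenize(sentence) would slot in the same way.
    return " ".join(correct(word) for word in sentence.split())

zebra = "Yout caretak taking care of something"
print(correct_sentence(zebra))
```

Because correct_sentence returns the string instead of printing it, the result can be stored in a variable and reused.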

zebra = 'Yout caretak taking care of something'
words = nltk.word_tokenize(zebra)
for x in words:
    print correct(x),
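The trailing comma only works with the Python 2 print statement. In Python 3 the rough equivalent is the end argument of print(). A small sketch, where correct is a hypothetical stand-in that merely lowercases the word:

```python
def correct(word):
    # Hypothetical stand-in for the asker's correct(): only lowercases the word.
    return word.lower()

def print_on_one_line(words):
    """Print each corrected word followed by a space, then finish the line."""
    for word in words:
        print(correct(word), end=" ")  # end=" " replaces the default newline
    print()  # terminate the line once all words are printed

print_on_one_line(["Yout", "caretak", "taking"])
```

This leaves a trailing space before the newline; building the string with " ".join avoids that.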

Demo:

>>> zebra = 'Yout caretak taking care of something'
>>> words = nltk.word_tokenize(zebra)
>>> words
['Yout', 'caretak', 'taking', 'care', 'of', 'something']

As you can see, nltk.word_tokenize gives you a list of words, so you can iterate through them easily.

2 Comments

Please provide a reason for downvoting.
The desired output is "youth caretaker walking car in something", not only the tokenized words.

>>> import nltk
>>> zebra = 'Yout caretak taking care of something'
>>> for word in nltk.word_tokenize(zebra):
...     print word
... 
Yout
caretak
taking
care
of
something

Then install pyenchant with $ sudo pip install pyenchant (see https://pythonhosted.org/pyenchant/api/enchant.html) and ask the dictionary for suggestions:

>>> import nltk
>>> import enchant
>>> zebra = 'Yout caretak taking care of something'
>>> dictionary = enchant.Dict('en_US')
>>> for word in nltk.word_tokenize(zebra):
...     dictionary.suggest(word)
... 
['Out', 'Yost', 'Rout', 'Tout', 'Lout', 'Gout', 'Pout', 'Bout', 'Y out', 'Your', 'You', 'Youth', 'Yous', 'You t']
['caretaker', 'caret', 'Clareta', 'cabaret', 'curettage', 'critical']
['raking', 'takings', 'tasking', 'staking', 'tanking', 'talking', 'tacking', 'taring', 'toking', 'laking', 'caking', 'taming', 'making', 'taping', 'baking']
['CARE', 'acre', 'acer', 'race', 'Care', 'car', 'are', 'cares', 'scare', 'carer', 'caret', 'carte', 'cared', 'cadre', 'carve']
['if', 'pf', 'o', 'f', 'oaf', 'oft', 'off', 'sf', 'on', 'or', 'cf', 'om', 'op', 'oh', 'hf']
['somethings', 'some thing', 'some-thing', 'something', 'locksmithing', 'smoothness']

Then keep only the suggestions that contain the original word:

>>> for word in nltk.word_tokenize(zebra):
...     print [i for i in dictionary.suggest(word) if word in i]
... 
['Youth']
['caretaker']
['takings', 'staking']
['cares', 'scare', 'carer', 'caret', 'cared']
['oft', 'off']
['somethings', 'something']

So:

>>> " ".join([[word if dictionary.check(word) else i for i in dictionary.suggest(word) if word in i][0] for word in nltk.word_tokenize(zebra)])
'Youth caretaker taking care of something'
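One caveat with the one-liner: if no suggestion contains the word, the inner list is empty and [0] raises IndexError. A more defensive sketch follows, in Python 3 syntax. StubDict is a hypothetical stand-in that exposes the same check and suggest methods used above, with canned data, so the snippet runs without pyenchant; with pyenchant installed, enchant.Dict('en_US') could presumably be passed in its place.

```python
class StubDict:
    """Hypothetical stand-in for enchant.Dict, with canned suggestions."""

    def __init__(self, known, suggestions):
        self._known = set(known)
        self._suggestions = suggestions

    def check(self, word):
        return word in self._known

    def suggest(self, word):
        return self._suggestions.get(word, [])

def fix_word(word, dictionary):
    """Keep correctly spelled words; otherwise take the first suggestion containing the word."""
    if dictionary.check(word):
        return word
    for candidate in dictionary.suggest(word):
        if word in candidate:
            return candidate
    return word  # no usable suggestion: leave the word unchanged

def fix_sentence(words, dictionary):
    """Fix each word and join the results with single spaces."""
    return " ".join(fix_word(w, dictionary) for w in words)

stub = StubDict(
    known=["taking", "care", "of", "something"],
    suggestions={"Yout": ["Out", "Youth"], "caretak": ["caretaker", "caret"]},
)
words = ["Yout", "caretak", "taking", "care", "of", "something", "xyzzy"]
result = fix_sentence(words, stub)
print(result)
```

The unknown word xyzzy is passed through unchanged instead of raising an error.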

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))

def looper(a,count):
    words = nltk.word_tokenize(zebra)
    for i in range(len(words)):
        X = correct(words[i])
        print X,

final = looper(zebra, count)

Just add a trailing comma after X: print X,
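Note that a function which only prints returns None, so final = looper(...) stores None either way. If the corrected sentence is needed as a value, the function can collect the words and return the joined string. A sketch in Python 3 syntax, where correct is a hypothetical stand-in and plain split() is assumed in place of nltk.word_tokenize:

```python
def correct(word):
    # Hypothetical stand-in for the asker's correct().
    fixes = {"Yout": "youth", "caretak": "caretaker"}
    return fixes.get(word, word)

def printing_looper(sentence):
    """Print each corrected word; has no return statement."""
    for word in sentence.split():
        print(correct(word))

def returning_looper(sentence):
    """Collect the corrected words and return them as one string."""
    corrected = []
    for word in sentence.split():
        corrected.append(correct(word))
    return " ".join(corrected)

zebra = "Yout caretak taking care of something"
nothing = printing_looper(zebra)   # prints six lines; nothing is None
final = returning_looper(zebra)    # final holds the joined sentence
print(final)
```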

1 Comment

Thanks Alex, but this method did not work; it does not give me any output.
