I have this line of code:
    bitext = [[sentence.strip().split()
               for sentence in pair if len(sentence) < 100]
              for pair in zip(open(c_data), open(e_data))[:opts.num_sents]]
c_data is a file with Chinese sentences, and e_data is a file with English sentences.
bitext should be a list that contains pairs of English and Chinese sentences, which are translations of one another.
Since both data files are huge, I want to reduce the complexity of my code by only taking into consideration sentences that are under a certain length. The length is measured in characters.
As an example, I've specified the length here as 100. opts.num_sents is a variable that states how many sentences from the data files should be taken into consideration.
The problem/bug
If a Chinese sentence is, say, 95 characters long and the corresponding English sentence is 105 characters long, bitext is updated with the Chinese sentence only.
But I want the code to add a pair of sentences only if both of them are under the stated length.
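The behavior described above can be reproduced with a minimal sketch. The in-memory lists c_lines and e_lines are hypothetical stand-ins for the lines read from c_data and e_data, and the slice by opts.num_sents is left out to keep the example small:

```python
# Hypothetical stand-ins for the lines of c_data and e_data.
c_lines = ["短 句子\n", "c" * 95 + "\n"]
e_lines = ["short sentence\n", "e" * 105 + "\n"]

# Same comprehension as in the question: the length check is applied
# to each sentence separately, inside the inner comprehension.
bitext = [[sentence.strip().split()
           for sentence in pair if len(sentence) < 100]
          for pair in zip(c_lines, e_lines)]

# Number of sentences that survived in each pair.
pair_sizes = [len(pair) for pair in bitext]
print(pair_sizes)
```

The second entry keeps only one sentence, because the filter drops the long English sentence but leaves its Chinese partner in place.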
How do I do this?
As it stands, a sentence is still added when it satisfies len(sentence) < 100 but the other sentence in the pair doesn't.
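A minimal sketch of the intended behavior, under some assumptions: the length check is moved from the inner comprehension to the outer one, so a pair is kept or dropped as a whole. The names build_bitext, c_lines, e_lines, and max_len are hypothetical, the lists stand in for the opened files, and itertools.islice replaces the slice on zip(...) so the sketch also runs on Python 3, where zip returns an iterator.

```python
from itertools import islice


def build_bitext(c_lines, e_lines, num_sents, max_len=100):
    """Keep a pair only if both sentences are shorter than max_len characters.

    build_bitext and its parameters are hypothetical names for this sketch.
    """
    # Take the first num_sents pairs, as the slice in the question does.
    pairs = islice(zip(c_lines, e_lines), num_sents)
    return [[sentence.strip().split() for sentence in pair]
            for pair in pairs
            # The filter now applies to the whole pair, not to each sentence.
            if all(len(sentence) < max_len for sentence in pair)]


# Hypothetical stand-ins for the lines of c_data and e_data.
c_lines = ["短 句子\n", "c" * 95 + "\n"]
e_lines = ["short sentence\n", "e" * 105 + "\n"]

bitext = build_bitext(c_lines, e_lines, num_sents=2)
print(len(bitext))
```

Note that len(sentence) here, as in the original line, counts the trailing newline, and that the limit of num_sents is applied before filtering, so the result may contain fewer than num_sents pairs.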