Reinderien

Minor perf improvements

These are unlikely to impact your performance in a material way, but they are performance improvements nonetheless:

re.search(r'[,!?{}\[\]\"\"\'\']',word_tokens[j])

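goes through the re module's pattern cache on every call, and recompiles the pattern whenever it has been evicted from that cache. Call re.compile() once, outside of your loops, and reuse the compiled pattern so that this does not happen.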
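As a minimal sketch of what that could look like: the pattern below is copied unchanged from the question, while the names PUNCTUATION and has_punctuation and the sample tokens are hypothetical, chosen only for illustration.

```python
import re

# Compiled once at module level; PUNCTUATION is a hypothetical name.
# The pattern itself is the one from the question, unchanged.
PUNCTUATION = re.compile(r'[,!?{}\[\]\"\"\'\']')


def has_punctuation(token: str) -> bool:
    """Return True if the token contains any character from the pattern."""
    return PUNCTUATION.search(token) is not None


# Made-up sample data standing in for word_tokens from the question.
word_tokens = ["Hello", "world!", "[note]", "plain"]
flags = [has_punctuation(token) for token in word_tokens]
```

Inside the loop, PUNCTUATION.search(word_tokens[j]) then replaces the re.search(...) call.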

Repeated concatenation such as this:

wordtocompare = wordtocompare+" "+word_tokens[j].lower()

can be a problem; strings in Python are immutable, so every concatenation creates a new string instance and copies the old contents into it. To avoid this, consider writing to an io.StringIO, or passing a generator to str.join.
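Both alternatives might look roughly like the following. This is a sketch under the assumption that word_tokens is a plain list of strings; the sample data and the names joined, buffer and via_buffer are made up for illustration.

```python
from io import StringIO

# Made-up sample data standing in for word_tokens from the question.
word_tokens = ["The", "Quick", "Brown", "Fox"]

# Option 1: join a generator; the result string is built once.
joined = " ".join(token.lower() for token in word_tokens)

# Option 2: accumulate in a StringIO buffer, mirroring the original loop.
buffer = StringIO()
for token in word_tokens:
    buffer.write(" ")            # the original code prepends a space each time
    buffer.write(token.lower())
via_buffer = buffer.getvalue()
```

Note that the join version puts separators only between tokens, so it has no leading space to strip afterwards, whereas the StringIO version reproduces the leading space of the original loop.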

Other improvements

if not wordtocompare=="":

should be

if wordtocompare != "":

Also, the result of wordtocompare.strip() is not being assigned to anything. Since strip() returns a new string rather than modifying the existing one, the call currently has no effect.
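A short illustration of the difference, using a made-up value for wordtocompare (the names before and is_non_empty are hypothetical):

```python
wordtocompare = " the quick brown fox"   # made-up value with a leading space

wordtocompare.strip()                    # result discarded; nothing changes
before = wordtocompare

wordtocompare = wordtocompare.strip()    # reassign to keep the stripped string

is_non_empty = wordtocompare != ""       # the comparison in its suggested form
```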