I am working on an iOS Swift project that takes OCR data and then searches the text for key phrases. The OCR output looks like this:

INGREDIENTS WATER, BROWN SUGAR, RED RIPE

TOMATO CONCENTRATE, APPLE CIDERVINEGAR

W01CESTERSHlWSMJCE(WATERW4EGAR CORN

SYRUP, SALT, MOLASSE, SPICE, NATURAL FLAVOR

GARLIC POWDER, CARAMEL COLOR, ANCHOVIES

CFlSril,TAMARiN0), MOLASSES, LEMON JUICE,

ONION, HONEY, MODIFIED TAVIOCA STARCH,

When I search the string for "corn syrup", nothing is found. Searching for "corn" and "syrup" does produce positive results.

I have also tried

tesseract.recognizedText.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())

to no avail.

Any thoughts on how to format this text for searching that would allow "corn syrup" to be identified? The qualifier is that only the exact phrase is useful - after all there are corn, corn starch, maple syrup, etc. as potential ingredients.

Thanks.

OK, here is the solution that worked:

textView.text = tesseract.recognizedText.stringByReplacingOccurrencesOfString("\n", withString: " ", options: NSStringCompareOptions.LiteralSearch, range: nil)

I thought the initial code was accomplishing the same task.
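
For anyone comparing the two calls: trimming only strips characters from the start and end of the string, so a line feed in the middle survives it. A minimal sketch of the difference, written with the current Swift spellings (trimmingCharacters(in:) and replacingOccurrences(of:with:)) rather than the older names used above; the sample string is made up for illustration and assumes a single line feed between the two words:

```swift
import Foundation

// Made-up sample: one line feed between CORN and SYRUP, plus line feeds at both ends.
let recognized = "\nAPPLE CIDERVINEGAR CORN\nSYRUP, SALT\n"

// Trimming removes whitespace and newlines only at the two ends of the string.
let trimmed = recognized.trimmingCharacters(in: .whitespacesAndNewlines)

// Replacing touches every line feed, including the one between the words.
let replaced = recognized.replacingOccurrences(of: "\n", with: " ")

let trimmedHasPhrase = trimmed.contains("CORN SYRUP")   // false: "CORN\nSYRUP" is still split
let replacedHasPhrase = replaced.contains("CORN SYRUP") // true: the line feed became a space
```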

  • Why don't you replace the line feeds with spaces? Then "corn syrup" will just work. Commented Sep 26, 2015 at 3:10
  • What does your question's title have to do with the question? Commented Sep 26, 2015 at 3:26

2 Answers

If you want to search for "corn syrup", you most likely need to replace all newlines with spaces, and then ideally collapse any double spaces into single spaces.
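
A minimal sketch of that idea in current Swift syntax. The helper name normalizeWhitespace and the sample string are hypothetical, not part of the question's code. Splitting on whitespace and rejoining handles the line feeds and the double spaces in one step:

```swift
import Foundation

// Hypothetical helper: collapse every run of spaces, tabs and line feeds into one space.
func normalizeWhitespace(_ text: String) -> String {
    return text
        .components(separatedBy: .whitespacesAndNewlines) // split on any whitespace character
        .filter { !$0.isEmpty }                           // drop the empty pieces left by runs
        .joined(separator: " ")
}

// Made-up sample with a blank line between the two words.
let ocrText = "WATERW4EGAR CORN\n\nSYRUP, SALT, MOLASSE"
let normalized = normalizeWhitespace(ocrText)

// Case-insensitive search, since the OCR output is upper case.
let found = normalized.range(of: "corn syrup", options: .caseInsensitive) != nil
```

Replacing "\n" alone would leave two spaces in this sample, which is why the sketch collapses whole runs instead.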

The quality of the character recognition is not very good, and I think the text deserves more cleanup before being used for searching. You might, for example, split the text into an array of individual strings and trim whitespace from the beginning and end of each one. You could perhaps also use UITextChecker to help identify misspelled terms and fix them.
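
A rough sketch of the split-and-trim step, assuming the ingredients are separated by commas. The function name ingredientList and the sample string are hypothetical, and the UITextChecker part is not covered here:

```swift
import Foundation

// Hypothetical helper: turn raw OCR text into a list of trimmed, lowercased ingredients.
func ingredientList(from text: String) -> [String] {
    // First flatten line feeds and repeated spaces into single spaces.
    let flattened = text
        .components(separatedBy: .whitespacesAndNewlines)
        .filter { !$0.isEmpty }
        .joined(separator: " ")
    // Then split on commas (an assumption about the label format) and tidy each piece.
    return flattened
        .components(separatedBy: ",")
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() }
        .filter { !$0.isEmpty }
}

// Made-up sample, not the full OCR output from the question.
let ingredients = ingredientList(from: "CORN\n\nSYRUP, SALT, MOLASSE, SPICE")

let hasCornSyrup = ingredients.contains("corn syrup") // exact ingredient match
let hasCorn = ingredients.contains("corn")            // "corn" alone is not an ingredient here
```

Matching whole array elements keeps "corn syrup" distinct from "corn" or "maple syrup", but it only works where the OCR preserved the commas. In the output above, the garbled text in front of CORN would end up in the same element.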

1 Comment

Thanks. This is the solution

That's because "corn syrup", which is the string you're looking for, is not the same as "corn\nsyrup", which is what your wall of text is showing.

You could instead try searching for "corn\nsyrup" or "corn \nsyrup".

Notice in your picture how "corn\nsyrup" produces the same results that your wall of text is showing?

Also, your code to replace "\n" with " " might not work if the text is actually "corn\n syrup", because the result would then have two spaces between the words.
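
One way to sidestep guessing the exact separator is to search with a regular expression that allows any run of whitespace between the words. This is only a sketch in current Swift syntax; the helper name containsPhrase is hypothetical:

```swift
import Foundation

// Hypothetical helper: true if `phrase` occurs in `text` with any run of
// whitespace (spaces, tabs, line feeds) between its words, ignoring case.
func containsPhrase(_ phrase: String, in text: String) -> Bool {
    // Escape each word so characters like "(" are matched literally.
    let words = phrase
        .split(separator: " ")
        .map { NSRegularExpression.escapedPattern(for: String($0)) }
    // \s+ accepts "corn syrup", "corn\nsyrup", "corn\n syrup" and so on;
    // \b keeps the match on word boundaries.
    let pattern = "\\b" + words.joined(separator: "\\s+") + "\\b"
    return text.range(of: pattern, options: [.regularExpression, .caseInsensitive]) != nil
}

let splitAcrossLines = containsPhrase("corn syrup", in: "W4EGAR CORN\n \nSYRUP, SALT")
let otherIngredients = containsPhrase("corn syrup", in: "CORN STARCH, MAPLE SYRUP")
```

With this approach the original text does not need to be modified before searching, and the exact-phrase requirement still holds because the two words must be adjacent.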

Picture to Compare
