I have a lookup table of Scientific Names for plants. I want to use this lookup table to validate other tables where I have a data entry person entering the data. Sometimes they get the formatting of these scientific names wrong, so I am writing a script to try to flag the errors.
There's a very specific way to format each name. For example, 'Sonchus arvensis L.' needs both the S in Sonchus and the L at the end capitalized. I have about 1000 different plants, and each one is formatted differently. Here are a few more examples:
- Linaria dalmatica (L.) Mill.
- Knautia arvensis (L.) Coult.
- Alliaria petiolata (M. Bieb.) Cavara & Grande
- Berteroa incana (L.) DC.
- Aegilops cylindrica Host
As you can see, these strings are all formatted very differently (i.e., some letters are capitalized and some aren't, and there are sometimes brackets, ampersands, periods, etc.).
My question is: is there any way to dynamically read the formatting of each string in the lookup table, so that I can compare it to the value the data entry person entered and make sure it is formatted properly? In the script below, the first elif tests whether the value is in the lookup table at all, upper-casing both sides so that the match works regardless of formatting. The second elif then tests the formatting, in a sense, by comparing the value exactly against the lookup table. This flags records whose formatting doesn't match, but it doesn't tell you specifically why a record failed to match.
What I would like to do is read in the string values from the lookup table and somehow dynamically read the formatting of each string, so that I can identify the specific error (e.g., a letter that should have been capitalized but wasn't).
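To make that concrete, here is a rough sketch of the kind of report I have in mind. It assumes I already have the correctly formatted name from the lookup table in hand, and it uses the standard library's `difflib.SequenceMatcher` to line the two strings up. The function name `describe_format_errors` and the message wording are made up for illustration; they are not part of my script.

```python
import difflib


def describe_format_errors(entered, canonical):
    """List the differences between an entered name and its canonical form.

    Hypothetical helper: 'canonical' is assumed to be the correctly
    formatted value taken from the lookup table.
    """
    problems = []
    matcher = difflib.SequenceMatcher(None, entered, canonical)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        got, want = entered[i1:i2], canonical[j1:j2]
        pos = i1 + 1  # 1-based position in the entered value
        if tag == "replace" and got.lower() == want.lower():
            # Same letters, different case.
            problems.append(
                "position %d: '%s' should be '%s' (capitalization)"
                % (pos, got, want)
            )
        elif tag == "replace":
            problems.append(
                "position %d: expected '%s', got '%s'" % (pos, want, got)
            )
        elif tag == "delete":
            problems.append("position %d: unexpected '%s'" % (pos, got))
        else:  # tag == "insert": something is missing from the entry
            problems.append("position %d: missing '%s'" % (pos, want))
    return problems
```

For 'sonchus arvensis l.' checked against 'Sonchus arvensis L.', this would report positions 1 and 18 as capitalization problems, which is the level of detail I am after.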
So far my code snippet looks like this:
    # Determine if the field heading is in a list I built earlier
    if "SCIENTIFIC_NAME" in fieldnames:
        # First, test to see if the record is empty
        if not row.SCIENTIFIC_NAME:
            weedPLineErrors.append("SCIENTIFIC_NAME record is empty")
        # Second, test to see if the value is in the lookup table, regardless of formatting.
        elif row.SCIENTIFIC_NAME.upper() not in [x.upper() for x in weedScientificTableList]:
            weedPLineErrors.append("SCIENTIFIC_NAME (" + row.SCIENTIFIC_NAME + ") is not in the domain table")
        # Third, if the second test is satisfied, we know the value is in the lookup table.
        # Test the lookup table again without upper-casing, so that only an exact match
        # (including formatting) passes.
        elif row.SCIENTIFIC_NAME not in weedScientificTableList:
            weedPLineErrors.append("SCIENTIFIC_NAME (" + row.SCIENTIFIC_NAME + ") is not formatted properly")
        else:
            pass
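One variation I have been considering, as a sketch only: build a dictionary once that maps the upper-cased name to the correctly formatted name, so that the lookup also hands back the expected spelling instead of just True/False. This assumes the lookup table is a plain list of strings; `build_canonical_index` and `check_scientific_name` are hypothetical names, and the message text is only an example.

```python
def build_canonical_index(names):
    """Map each upper-cased name to its correctly formatted spelling."""
    return dict((name.upper(), name) for name in names)


def check_scientific_name(value, index):
    """Return an error message for value, or None when it is valid."""
    if not value:
        return "SCIENTIFIC_NAME record is empty"
    # Case-insensitive lookup that returns the canonical spelling.
    canonical = index.get(value.upper())
    if canonical is None:
        return "SCIENTIFIC_NAME (%s) is not in the domain table" % value
    if value != canonical:
        return (
            "SCIENTIFIC_NAME (%s) is not formatted properly; expected '%s'"
            % (value, canonical)
        )
    return None


# Example with a two-entry stand-in for the real lookup table.
index = build_canonical_index(["Sonchus arvensis L.", "Aegilops cylindrica Host"])
message = check_scientific_name("sonchus arvensis l.", index)
```

This at least tells the data entry person what the value should have been, and it avoids rebuilding the upper-cased list for every row, but it still doesn't pinpoint which characters are wrong.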
I hope my question is clear enough. I looked at string templates, but I don't think they do what I want, at least not dynamically. If anyone can point me in a better direction, I am all eyes, but maybe I am way out to lunch on this one.
Thanks, Mike