My goal is to write a function that iterates through a CSV file and achieves the following:

  • Takes a 'keyword' which is a list of strings and returns all associated codes that have the 'keyword' in the description.

Sample of the CSV file formatting:

"2399","1","theft-bicycle","Bicycle theft"

For example:

  1. find_that_code(['bicycle'])
  2. find_that_code(['bicycle' , 'scooter'])

Output:

  1. [2399]
  2. [ ]

I am having difficulty figuring out how to write the IF statement that matches the strings in the keyword list against the description in row[3].

Note that the second example outputs nothing: although the string 'bicycle' is present in the description, the string 'scooter' is not.

What I've tried:

def find_that_code(keywords):
    codelist = []
    keywords = str(keywords)

    with open('codes.csv') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader)  # skip the first row
        for row in reader:
            if row[3] == any([x in keywords for x in keywords]):
                code = row[0]
                return True
            else:
                return False

Currently I have the code returning True or False to figure out where my issue is. But once I get how to match the string from the keywords list to row[3] of the CSV then I should be able to finish off the rest of it.

Thank you for your time and I greatly appreciate the advice.

  • What are you hoping row[3] == any([x in keywords for x in keywords]) does? Commented Feb 1, 2019 at 1:04
  • row[3] == any([x in keywords for x in keywords]) My intentions for this code were that whenever the description (which appears in row[3]) matches a keyword, it would return true and go onto appending the associated code for that line. Commented Feb 2, 2019 at 20:05

1 Answer

If you're looking to find all rows for which any of the given keywords appears in the description column:

import csv

def find_that_code(keywords):
    codelist = []

    with open('codes.csv') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader)  # skip the first row
        for row in reader:
            # keep the row if any keyword is a substring of the lowercased description
            if any(k in row[3].lower() for k in keywords):
                codelist.append(row[0])

    return codelist

I tried to change as little as possible from your original post. Note, however, that this can be slow if you call it many times: each call reads the file from the very beginning and reprocesses it for every keyword set you wish to match on.
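If the file fits comfortably in memory, one way to avoid rereading it is to load the rows once and run every keyword lookup against the in-memory list. This is only a rough sketch: load_rows and find_codes are hypothetical names, and it assumes the same column layout as the sample row (code in column 0, description in column 3).

```python
import csv

def load_rows(path='codes.csv'):
    """Read the CSV once and return (code, lowercased description) pairs."""
    with open(path, newline='') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader, None)  # skip the first row, as in the function above
        return [(row[0], row[3].lower()) for row in reader]

def find_codes(rows, keywords):
    """Return the codes whose description contains any of the keywords."""
    keywords = [k.lower() for k in keywords]  # make the match case-insensitive
    return [code for code, desc in rows if any(k in desc for k in keywords)]
```

You would call load_rows once at startup and then pass the result to find_codes for each keyword list, so the file is parsed a single time.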

You might be better served dumping your data set into something like Solr or Lucene (or some kind of text-based search engine) if this is an operation you expect to need to do on this data set regularly.
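Separately, the second example in the question expects an empty list for ['bicycle', 'scooter'], which suggests that every keyword has to appear in the same row's description. If that is the intent, swapping any for all should do it. The sketch below makes a few assumptions of its own: find_that_code_all and the path parameter are hypothetical, and it converts the code to int only because the expected output in the question shows [2399].

```python
import csv

def find_that_code_all(keywords, path='codes.csv'):
    """Return codes of rows whose description contains every keyword."""
    keywords = [k.lower() for k in keywords]  # case-insensitive matching
    codelist = []

    with open(path, newline='') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        next(reader, None)  # skip the first row
        for row in reader:
            description = row[3].lower()
            # all() requires each keyword to be a substring of this row's description
            if all(k in description for k in keywords):
                codelist.append(int(row[0]))  # assumes the code column is numeric

    return codelist
```

Be aware that all() is true for an empty sequence, so calling this with an empty keyword list returns every code in the file.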

3 Comments

I think you meant any(k in row[3].lower() for k in keywords); using a guard in this case doesn't make much sense.
Ah yes definitely better. Updated.
Sorry, but I've only just now realized that my question may have been slightly misinterpreted. The function you've provided returns all the codes if the keyword is present anywhere in the description column. However, I'm not sure how to return the code only when the keyword is present in the same row.
