Finding a small list of strings in a large list of strings (Python)

Question

Hi, I'm new to Python, so this may come across as a simple problem, but I've searched Google many times and can't find a way to overcome it. Basically, I have a list of strings taken from a CSV file, and another list of strings in a text file. My job is to see whether the words from my text file are in the CSV file.

Let's say this is what the CSV file looks like (it's made up):

  name,author,genre,year
  Private Series,Kate Brian,Romance,2003
  Mockingbird,George Orwell,Romance,1956
  Goosebumps,Mary Door,Horror,1990
  Geisha,Mary Door,Romance,2003

And let's say the text file looks like this: Romance 2003

What I'm trying to do is create a function that returns the names of the books whose rows contain both "Romance" and "2003". So in this case, it should return "Private Series" and "Geisha" but not "Mockingbird". My problem is that it doesn't return them. However, when I change my input to "Romance", it returns all three books with Romance in them. I assume it's because "Romance" and "2003" aren't next to each other in the row, because if I change my input to "Mary Door", both "Goosebumps" and "Geisha" show up. So how can I overcome this?

Also, how do I make my function case-insensitive?

Any help would be much appreciated :)

shang · Accepted Answer · 2011-05-24 06:41:31Z


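import csv

def read_input(filename):
    # Returns a lazy iterable of dicts, one per CSV row, keyed by the header names.
    # The file stays open while the reader is being consumed.
    f = open(filename)
    return csv.DictReader(f, delimiter=',')

def search_filter(src, term):
    # Yield only the rows in which some field equals the term, ignoring case.
    term = term.lower()
    for s in src:
        if term in map(str.lower, s.values()):
            yield s

def query(src, terms):
    # Split the query into words and chain one filter per word,
    # so a row must match every word to get through.
    terms = terms.split()
    for t in terms:
        src = search_filter(src, t)
    return src

def print_query(q):
    for row in q:
        print(row)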

I tried to split the logic into small, reusable functions.

First, we have read_input which takes a filename and returns the lines of a CSV file as an iterable of dicts.

The search_filter function filters a stream of rows by the given term. Both the search term and the row values are lowercased for the comparison, which makes the matching case-insensitive.

The query function takes a query string, splits it into search terms and then makes a chain of filters based on the terms and returns the final, filtered iterable.

>>> src = read_input("input.csv")
>>> q = query(src, "Romance 2003")
>>> print_query(q)
{'genre': 'Romance', 'year': '2003', 'name': 'Private Series', 'author': 'Kate Brian'}
{'genre': 'Romance', 'year': '2003', 'name': 'Geisha', 'author': 'Mary Door'}
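To try the same filter chain without any files on disk, here is a minimal sketch, assuming Python 3 and using io.StringIO to stand in for input.csv. The names SAMPLE_CSV and matching_names are made up for this illustration and are not part of the answer's code.

```python
import csv
import io

# Sample data from the question, held in memory instead of input.csv (an assumption).
SAMPLE_CSV = """name,author,genre,year
Private Series,Kate Brian,Romance,2003
Mockingbird,George Orwell,Romance,1956
Goosebumps,Mary Door,Horror,1990
Geisha,Mary Door,Romance,2003
"""

def search_filter(src, term):
    # Keep rows in which some field equals the term, ignoring case.
    term = term.lower()
    for row in src:
        if term in (value.lower() for value in row.values()):
            yield row

def query(src, terms):
    # One filter per word; a row has to pass all of them.
    for term in terms.split():
        src = search_filter(src, term)
    return src

def matching_names(csv_text, terms):
    # Hypothetical helper: return just the "name" column of the matching rows.
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in query(rows, terms)]

print(matching_names(SAMPLE_CSV, "romance 2003"))  # ['Private Series', 'Geisha']
print(matching_names(SAMPLE_CSV, "Mary Door"))     # []
```

The second call comes back empty because each word has to equal a whole field, and no field is just "Mary" or just "Door". Matching on substrings instead of whole fields would let that query through.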

Note that the above solution only matches whole field values. If you also want the search query "Roman 2003" to return the rows above, you can use this alternative version of search_filter, which matches substrings:

def search_filter(src, term):
    term = term.lower()
    for s in src:
        if any(term in v.lower() for v in s.values()):
            yield s
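One caveat that applies to either version of search_filter: csv.DictReader fills in None for fields missing from a short row (its default restval), and None has no lower() method. A minimal sketch of a more tolerant variant, assuming Python 3; the name search_filter_safe and the sample rows are made up for illustration.

```python
import csv
import io

def search_filter_safe(src, term):
    # Substring match, ignoring case; skips values that are not strings
    # (e.g. the None that DictReader uses for missing fields).
    term = term.lower()
    for row in src:
        if any(term in v.lower() for v in row.values() if isinstance(v, str)):
            yield row

# The second data row is deliberately short: it has no year field.
data = io.StringIO("name,author,genre,year\n"
                   "Geisha,Mary Door,Romance,2003\n"
                   "Mockingbird,George Orwell,Romance\n")

rows = list(search_filter_safe(csv.DictReader(data), "roman"))
print([row["name"] for row in rows])  # ['Geisha', 'Mockingbird']
print(rows[1]["year"])                # None
```

Checking isinstance(v, str) instead of only testing for None also covers rows with too many fields, where DictReader stores the surplus values as a list.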

edited May 24, 2011 at 6:41

answered May 24, 2011 at 6:05

shang


2 Comments

Elp Eeded Over a year ago

Hi thanks so much for the help, but I have a few problems. One is that I need to be able to get the query "Romance 2003" from a text file stored away (input.txt) & also when I tried running your solution, this came up: line 10, in <genexpr> if any(term in v.lower() for v in s.values()): AttributeError: 'NoneType' object has no attribute 'lower'

Elp Eeded Over a year ago

Actually never mind, I've gotten my original code to work from looking at yours. So thank you so so much, couldn't have done it without your help, really appreciate it :)
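On the first point raised in the comments, reading the query from a file: a minimal sketch, assuming Python 3 and that input.txt holds the search words separated by whitespace. The helper name read_terms is made up, and a temporary file stands in for input.txt so the example runs anywhere.

```python
import os
import tempfile

def read_terms(filename):
    # Read the whole file and split on any whitespace, so the words may be
    # separated by spaces or spread over several lines.
    with open(filename) as f:
        return f.read().split()

# Create a stand-in for input.txt (an assumption for this demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Romance 2003\n")
    path = tmp.name

terms = read_terms(path)
os.remove(path)
print(terms)  # ['Romance', '2003']
```

Since the answer's query function takes a single string and splits it itself, the resulting list could be passed as " ".join(terms).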
