Comparing a list of strings to a list of strings (python)

Question

I'm trying to compare two Excel documents. Each has around 6000 rows and 4 columns: the first column is a domain name and the other three are comments. One of the documents has updated comments in some of the columns, and eventually I would like this script to work as a batch update that replaces the old, outdated comments with the new ones.

The code I have written so far opens the documents and adds them to two separate lists:

import csv

newlist = csv.reader(open('newcomments.csv','rU'), dialect='excel')
export = csv.reader(open('oldcomments.csv', 'rU'), dialect='excel')

for row in newlist:
    olddomain=[]
    domain = row[0:]
    olddomain.append(domain)
    for item in olddomain:
        print item

    for row in export:
        newdomain=[]
        domain= row[0:]
        newdomain.append(domain)
        for item in newdomain:
            print item

The output from the lists looks like this (the second column is normally blank):

['example.com', '', 'excomment', 'Parked Page']

When trying to compare the lists, I have tried something like:

if item in olddomain != item in newdomain:
    print "no match"
else:
    print "match"

But that doesn't appear to work. For example, the first row in the two files contains exactly the same data, but the code returns "no match"; the second row in both files also contains the same data, but the code returns "match".

Is the problem with the way I am saving the rows to the list, or is there something else I'm missing? I assume there is a better way of doing this, but I'm using it as an excuse to learn more Python!

Thanks for your time.

@joaquin I'm sorry. For example, the first row in the two files contains exactly the same data, but the code returns "no match"; the second row in both files also contains the same data, but the code returns "match".
– Christopher Long, Commented Jan 6, 2012 at 12:00
Thanks for the clarification. (The downvote is not mine. In fact, I dislike people downvoting without giving any explanation: the OP learns nothing and everybody loses reputation. What a waste!)
– joaquin, Commented Jan 6, 2012 at 12:07

Mike Pennington · Accepted Answer · 2012-01-06 14:30:41Z

8

It seems like you are trying to compare an old list of domain names to a new list of domain names. After those lists have been built, you want to see whether there is commonality between the lists.

In this case, I think a set() offers much richer functionality that makes your life easier. Example:

>>> olddomains = set(['www.cisco.com', 'www.juniper.com', 'www.hp.com'])
>>> newdomains = set(['www.microsoft.com', 'www.cisco.com', 'www.apple.com'])
>>> olddomains.intersection(newdomains)
set(['www.cisco.com'])
>>>
>>> 'www.google.com' in newdomains
False
>>>

Rewriting part of your code to use a set would look like this:

# retain newlist, since that's the output from csv...
olddomain = set()   # create the set once, before the loop, so it keeps every domain
for row in newlist:
    domain = row[0]
    olddomain.add(domain.lower())   # use lower() to ensure no CAPS mess things up
for item in olddomain:
    print item

And the code you asked about:

if olddomain.intersection(newdomain) == set([]):
    print "no match"
else:
    print "match"
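Putting the two fragments together, a minimal sketch might look like the following. It is written for Python 3 (so print is a function), and it reads made-up sample data from io.StringIO instead of the real newcomments.csv and oldcomments.csv. The sample rows and the helper name read_domains are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample data standing in for oldcomments.csv and newcomments.csv.
OLD_CSV = "example.com,,excomment,Parked Page\nExample.org,,old note,\n"
NEW_CSV = "example.com,,excomment,Parked Page\nexample.net,,new note,\n"


def read_domains(handle):
    """Collect the first column of every non-empty row into a set."""
    domains = set()  # created once, so it accumulates across rows
    for row in csv.reader(handle, dialect="excel"):
        if row:  # skip blank lines
            domains.add(row[0].lower())
    return domains


olddomains = read_domains(io.StringIO(OLD_CSV))
newdomains = read_domains(io.StringIO(NEW_CSV))

common = olddomains.intersection(newdomains)
if common:
    print("match:", sorted(common))
else:
    print("no match")
```

With real files, the StringIO objects would be replaced by open file handles. The key point is that each set is built completely before the two are compared.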

The general rule I use when determining whether I use a set() or a list():

If retaining the ordering of the elements matters (to include being able to access them with an index), use a list()
In any other case, use a set()
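As a small illustration of that rule, the sketch below (Python 3, with made-up domain values) shows what a list keeps and what a set gives up in exchange for cheap membership tests.

```python
domains_list = ["b.com", "a.com", "b.com"]  # made-up sample values
domains_set = set(domains_list)

# A list keeps insertion order and duplicates, and supports indexing.
first = domains_list[0]
count = len(domains_list)

# A set drops duplicates and has no positions, but membership tests are cheap.
unique_count = len(domains_set)
has_a = "a.com" in domains_set

# Indexing a set is not supported and raises TypeError.
try:
    domains_set[0]
    indexable = True
except TypeError:
    indexable = False
```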

EDIT

Since you're asking why the code I posted throws a TypeError, if you are assigning row the same way I am, then you need to use row[0] instead of row[0:]

>>> row = ['example.com', '', 'excomment', 'Parked Page']
>>> row[0:]
['example.com', '', 'excomment', 'Parked Page']
>>> row[0]
'example.com'
>>>

I changed my example to reflect this, since I suspect that is where the issue lies.
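The difference can be reproduced in isolation. The sketch below (Python 3) uses the sample row from the question. Converting the row to a tuple is my own suggestion for the case where whole rows, not only domains, need to go into a set; it is not something the original code does.

```python
row = ['example.com', '', 'excomment', 'Parked Page']

seen = set()
try:
    seen.add(row[0:])      # a slice is still a list; lists are unhashable
    list_added = True
except TypeError:
    list_added = False     # this branch runs

seen.add(row[0])           # a string is hashable, so this works
seen.add(tuple(row))       # a tuple of strings is hashable as well

domain_seen = 'example.com' in seen
row_seen = tuple(row) in seen
```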

edited Jan 6, 2012 at 14:30

answered Jan 6, 2012 at 12:09


3 Comments

Christopher Long Over a year ago

I'm trying this code but it's returning the error: Traceback (most recent call last): File "C:\chris\new project\export\export.py", line 9, in <module> olddomain.add(domain) TypeError: unhashable type: 'list'

Mike Pennington Over a year ago

@ChristopherLong, I get a TypeError, when I do this: olddomain.add(['www.google.com']). You need to make sure that the argument to set.add() is not a python list

Christopher Long Over a year ago

The code is "domain = row[0:] olddomain.add(domain)" "domain" is just a row in the csv.

phihag · Accepted Answer · 2012-01-06 11:57:43Z

3

You are most likely just missing parentheses. Note that in and != are both comparison operators with the same precedence, and Python chains comparisons, so the following two lines are equivalent:

if item in olddomain != item in newdomain:
if (item in olddomain) and (olddomain != item) and (item in newdomain):

You probably want:

if (item in olddomain) != (item in newdomain):
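A quick way to check how Python reads the unparenthesized condition is to evaluate it on small made-up lists. Comparison operators, including in and !=, chain, so a in b != c in d is evaluated as (a in b) and (b != c) and (c in d). The sketch below (Python 3, sample values are assumptions) compares the two forms.

```python
olddomain = ['example.com', 'example.org']   # made-up sample data
newdomain = ['example.com', 'example.net']
item = 'example.com'

# Python chains comparison operators, so the unparenthesized form means:
#   (item in olddomain) and (olddomain != item) and (item in newdomain)
chained = item in olddomain != item in newdomain
spelled_out = (item in olddomain) and (olddomain != item) and (item in newdomain)

# With parentheses, the two membership results are compared to each other.
# The result is True only when the item is in exactly one of the two lists.
parenthesized = (item in olddomain) != (item in newdomain)
```

Here the item is in both lists, so the chained form is True while the parenthesized form is False, which would explain seemingly inverted "match" and "no match" output.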

answered Jan 6, 2012 at 11:57


user3256363 · Accepted Answer · 2015-06-01 07:07:44Z

0

Try converting both lists to sets and using the and (&) operation.

Example:

In [1]: a = ['a' , 'b', 'c']

In [2]: b = ['b' , 'a', 'c']

In [3]: set(a) & set(b)

Out[3]: {'a', 'b', 'c'}

In [4]: set(b) == set(a) & set(b)

Out[4]: True
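If the lists do differ, the other set operators show where. The sketch below (Python 3) is a variation on the example above with one element changed; the variable names are made up for illustration.

```python
a = ['a', 'b', 'c']
b = ['b', 'a', 'd']   # made-up variation with one differing element

same_elements = set(a) == set(b)   # order is ignored, content is compared
only_in_a = set(a) - set(b)        # elements missing from b
only_in_b = set(b) - set(a)        # elements missing from a
in_exactly_one = set(a) ^ set(b)   # symmetric difference
```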

answered Jun 1, 2015 at 7:07

