CSV if row contains string append row

Question

I am trying to append a value to each row of sitemap_bp.csv, in the adjacent column, when that row contains a string from mobilesitemap-browse.csv. I can't iterate through all the lines in mobilesitemap-browse.csv; the loop gets stuck on the first line. How do I go about solving this?

import csv

with open('sitemap_bp.csv','r') as csvinput:
    with open('mobilesitemap-browse.csv','r') as csvinput2:
        with open('output.csv', 'w') as csvoutput:
            writer = csv.writer(csvoutput, lineterminator='\n')
            sitemap = csv.reader(csvinput)
            mobilesitemap = csv.reader(csvinput2)

            all = []
            row = next(sitemap)
            row.append('mobile')
            all.append(row)

            for mobilerow in mobilesitemap:
                for row in sitemap:
                    #print row[0]
                    if mobilerow[1] in row[0]:
                        #print row, mobilerow[1]
                        all.append((row[0], mobilerow[1]))
                    else:
                        all.append(row)

            writer.writerows(all)

This is an aside, but don't use the nested with expressions. You can chain them with commas, e.g. with open('file1.txt') as file1, open('file2.txt') as file2, ...
– Adam Smith, Commented Mar 11, 2015 at 23:54
Thank you for the data. Could you show us the output you're ACTUALLY getting? I think the expected output is clear enough.
– Adam Smith, Commented Mar 11, 2015 at 23:57
Updated snippet of sitemap_bp.csv. I am currently using \d{4,}_\d{4,}_\d{4,}_\d{4,}_\d{4,}|\d{4,}_\d{4,}_\d{4,}_\d{4,}|\d{4,}_\d{4,}_\d{4,}|\d{4,} to capture new types.
– E liquid Vape, Commented Mar 12, 2015 at 18:38

Adam Smith · Accepted Answer · 2015-03-12 18:58:41Z

Personally, I'd parse the data from sitemap_bp.csv into a dictionary first, then use that dictionary to populate the new file.

import csv
import re

with open('sitemap_bp.csv','r') as csvinput, \
        open('mobilesitemap-browse.csv','r') as csvinput2, \
        open('output.csv', 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    sitemap = csvinput # no reason to pipe this through csv.reader
    mobilesitemap = csv.reader(csvinput2)

    # assumed item-number format (digits_digits_digits); adjust to your data
    item_number = re.compile(r"\d{5}_\d{7}_\d{7}")

    item_number_mapping = {item_number.search(line).group(): line.strip()
                           for line in sitemap if item_number.search(line)}
    # makes a dictionary {item_number: full_url, ...} for each item in sitemap
    # alternate to the above, consider:
    # # item_number_mapping = {}
    # # for line in sitemap:
    # #     line = line.strip()
    # #     match = item_number.search(line)
    # #     if match:
    # #         item_number_mapping[match.group()] = match.string

    # note: raises KeyError if row[1] has no entry in item_number_mapping
    all = [row + [item_number_mapping[row[1]]] for row in mobilesitemap]

    writer.writerows(all)
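
For anyone who wants to try the dictionary approach without the real files, here is a minimal sketch that runs on in-memory data. The sample URLs, the column layout of mobilesitemap-browse.csv (item number in the second column), the item-number pattern, and the helper names build_mapping and join_rows are all assumptions made for illustration. It also uses dict.get so that a mobile row with no match gets an empty string rather than raising KeyError.

```python
import csv
import io
import re

# Hypothetical sample data; the real files' layout is an assumption.
sitemap_text = (
    "http://example.com/browse/12345_1234567_7654321\n"
    "http://example.com/browse/54321_7654321_1234567\n"
)
mobile_text = (
    "http://m.example.com/a,12345_1234567_7654321\n"
    "http://m.example.com/b,99999_0000000_0000000\n"
)

# Assumed item-number format; adjust the pattern to the real data.
item_number = re.compile(r"\d{5}_\d{7}_\d{7}")


def build_mapping(lines):
    """Map each item number found in a line to the stripped line."""
    mapping = {}
    for line in lines:
        match = item_number.search(line)
        if match:
            mapping[match.group()] = line.strip()
    return mapping


def join_rows(mobile_rows, mapping):
    """Append the matching sitemap URL to each mobile row, or '' if none."""
    return [row + [mapping.get(row[1], "")] for row in mobile_rows]


# The sitemap is read exactly once to build the lookup table.
mapping = build_mapping(io.StringIO(sitemap_text))
rows = join_rows(csv.reader(io.StringIO(mobile_text)), mapping)

output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(rows)
print(output.getvalue())
```

With real files, the io.StringIO objects would be replaced by the open file handles from the with statement.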

My guess is that after the first time through your outer for loop, it tries to iterate through sitemap again but can't since the file is already exhausted. The minimal change for that would be:

        for mobilerow in mobilesitemap:
            csvinput.seek(0) # seek to the start of the file object
            next(sitemap) # skip the header row
            for row in sitemap:
                #print row[0]
                if mobilerow[1] in row[0]:
                    #print row, mobilerow[1]
                    all.append((row[0], mobilerow[1]))
                else:
                    all.append(row)

But the obvious reason not to do this is that it iterates through your sitemap_bp.csv file once per row in mobilesitemap-browse.csv, rather than just once like my code.
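
The exhaustion described above can be reproduced in isolation. This small sketch uses an in-memory io.StringIO as a hypothetical stand-in for sitemap_bp.csv; the sample rows are made up.

```python
import csv
import io

# Hypothetical stand-in for sitemap_bp.csv: a header plus two rows.
data = io.StringIO("url\na\nb\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the underlying file is at EOF, so nothing is left

data.seek(0)                # rewind the underlying file object
third_pass = list(reader)   # rows are available again

print(first_pass, second_pass, third_pass)
```

This is why the inner loop in the question only does any work for the first row of the outer loop: on every later pass the reader has nothing left to yield.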

EDIT per question in comments

If you need to get a list of those URLs in sitemap_bp.csv that don't correspond with mobilesitemap-browse.csv, you're probably best-served by making a set for all the items you see as you see them, then using set operations to get the unseen items. This takes a little tinkering, but...

# instead of all = [row + [item number ...

seen = set()
all = []

for row in mobilesitemap:
    item_no = row[1]
    if item_no in item_number_mapping:
        all.append(row + [item_number_mapping[item_no]])
        seen.add(item_no)
# after this for loop, `all` is identical to the list comp version
unmatched_items = [item_number_mapping[item_num] for item_num in
                   set(item_number_mapping.keys()) - seen]
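
As a quick check of the set logic, here is a self-contained sketch on made-up data. The dictionary contents and mobile rows are hypothetical, and the result is sorted only to make the order predictable, since sets are unordered.

```python
# Hypothetical lookup table, as if built from sitemap_bp.csv.
item_number_mapping = {
    "111": "http://example.com/111",
    "222": "http://example.com/222",
    "333": "http://example.com/333",
}
# Hypothetical rows from mobilesitemap-browse.csv (item number in column 2).
mobile_rows = [
    ["http://m.example.com/a", "111"],
    ["http://m.example.com/x", "999"],
]

seen = set()
matched = []
for row in mobile_rows:
    item_no = row[1]
    if item_no in item_number_mapping:
        matched.append(row + [item_number_mapping[item_no]])
        seen.add(item_no)

# In Python 3, dict key views support set difference directly.
unmatched = sorted(
    item_number_mapping[key] for key in item_number_mapping.keys() - seen
)
print(matched)
print(unmatched)
```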

edited Mar 12, 2015 at 18:58

answered Mar 12, 2015 at 2:32

Adam Smith

3 Comments

E liquid Vape Over a year ago

TY, one thing I need in 'all = []' is the URLs (from sitemap) that do not have a corresponding match. What is the best way to do this? Do I need to iterate through sitemap?

Adam Smith Over a year ago

@EliquidVape you mean all the URLs in sitemap_bp.csv or the URLs in mobilesitemap-browse.csv?

E liquid Vape Over a year ago

All the URLs in sitemap_bp.csv
