In Python, search for strings using a regular expression and replace them with other strings

Question

I have a db.sql file that contains many URLs, as in the following excerpt.

....<td class=\"column-1\"><a href=\"http://geni.us/4Lk5\" rel=nofollow\"><img src=\"http://www.toprateten.com/wp-content/uploads/2016/08/25460A-Panini-Press-Gourmet-Sandwich-Maker.jpg \" alt=\"25460A Panini Press Gourmet Sandwich Maker\" height=\"100\" width=\"100\"></a></td><td class=\"column-2\"><a href=\"http://geni.us/4Lk5\" rel=\"nofollow\">25460A Panini Press Gourmet Sandwich Maker</a></td><td class....

As you can see, there is http://geni.us/4Lk5\ in the file.

I also have a product.csv file that contains an ID (like 4Lk5 above) and an Amazon product URL on each row, as follows.

4Lk5    8738    8/16/2016 0:20  https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8
Jx9Aj2  8738    8/22/2016 20:16 https://www.amazon.com/gp/product/B007EUSL5U/ref=as_li_qf_sp_asin_il_tl?ie=UTF8
9sl2    8738    8/22/2016 20:18 https://www.amazon.com/gp/product/B00C3GQGVG/ref=as_li_qf_sp_asin_il_tl?ie=UTF8

As you can see, the ID 4Lk5 is paired with an Amazon product URL.

I have already read the CSV file in Python and picked out only the ID and the Amazon product URL.

import csv

def openFile(filename, mode):
    result = []
    with open(filename, mode) as csvfile:
        # delimiter must match the file: use '\t' if the columns are tab-separated
        spamreader = csv.reader(csvfile, delimiter=',')
        for row in spamreader:
            # keep only the ID (column 0) and the Amazon URL (column 3)
            result.append({
                "genu_id": row[0],
                "amazon_url": row[3]
            })
    return result

Now I need to add code that finds each URL containing a genu_id in db.sql and replaces it with the matching amazon_url from the code above.

Please help me.

Why would you want to use a regex for this, rather than parsing the cell contents with lxml.html or similar?
– Charles Duffy, Commented Jun 6, 2017 at 16:15
I'm new to Python, so I'm not sure. I think I have to use a regex in order to select 'http://' + 'geni.us/4Lk5' in ...**-1\"><a href=\"geni.us/4Lk5\" rel=nofol...**
– Yuiry Kozlenko, Commented Jun 6, 2017 at 16:18

zwer · Accepted Answer · 2017-06-06 18:07:51Z

There is no need for regex if you have such a predefined structure - if all links are in the form of http://geni.us/<geni_id> you can do it with simple str.replace() by reading each row of your CSV and replacing the matches in your SQL file. Something like:

import csv

# Python 3: open the CSV in text mode with newline="" (Python 2 would use "rb")
with open("product.csv", "r", newline="") as source, open("db.sql", "r+") as target:
    sql_contents = target.read()  # read the SQL file contents
    reader = csv.reader(source, delimiter="\t")  # build a CSV reader, tab as a delimiter
    for row in reader:  # read the CSV line by line
        # replace any match of http://geni.us/<first_column> with the fourth column's value (row[3])
        sql_contents = sql_contents.replace("http://geni.us/{}".format(row[0]), row[3])
    target.seek(0)  # seek back to the start of your SQL file
    target.truncate()  # truncate the rest
    target.write(sql_contents)  # write back the changed content
    # ...
    # Profit? :D

Of course, if your original CSV file is comma-delimited, replace the delimiter in the csv.reader() call - the one you presented here seems tab-delimited.
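One caveat with repeated str.replace() calls: if one ID happens to be a prefix of another (say 4Lk5 and a hypothetical 4Lk5x), the shorter one can also match inside the longer link. If that is a concern, a single regex pass with a dictionary lookup is one way around it. The following is only a sketch: the helper names load_id_to_url and replace_geni_links are made up for illustration, and it assumes the IDs contain only ASCII letters and digits, as in the sample rows.

```python
import csv
import io
import re


def load_id_to_url(csv_text, delimiter="\t"):
    """Map each geni.us ID (column 0) to its Amazon URL (column 3)."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # Skip rows that are too short to carry a URL column.
    return {row[0]: row[3] for row in reader if len(row) > 3}


def replace_geni_links(sql_text, id_to_url):
    """Swap every http://geni.us/<id> for its mapped URL in one pass."""
    # Assumption: IDs are made of ASCII letters and digits only.
    pattern = re.compile(r"http://geni\.us/([A-Za-z0-9]+)")

    def swap(match):
        # Links whose ID is not in the mapping are left untouched.
        return id_to_url.get(match.group(1), match.group(0))

    return pattern.sub(swap, sql_text)


sample_csv = (
    "4Lk5\t8738\t8/16/2016 0:20\t"
    "https://www.amazon.com/gp/product/B00IWOJRSM/ref=as_li_qf_sp_asin_il_tl?ie=UTF8\n"
)
sample_sql = r'<a href=\"http://geni.us/4Lk5\" rel=\"nofollow\">'
print(replace_geni_links(sample_sql, load_id_to_url(sample_csv)))
```

Because the pattern consumes the whole ID before looking it up, each link is replaced at most once and only by the URL for its exact ID. To apply it to the real files, read product.csv and db.sql into strings first and write the result back as in the snippet above.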
