I have a list like the one below:

['data-data analysis','word or words-phrase','rank-ranking']

I also have a regular CSV file that may contain the text before the "-" in any cell of any column. I'd like to replace each occurrence with the text after the "-". A sample CSV file could look like this:

h1,h2,h3
data of database,a,v
gg,word or words/word,gdg
asd,r,rank

I really appreciate any help.

Desired output:

h1,h2,h3
data analysis of database,a,v
gg,phrase/word,gdg
asd,r,ranking
  • This question looks pretty close stackoverflow.com/questions/19748676/… Commented Jan 11, 2015 at 20:46
  • @BobHaffner Yeah I tried that but my output file looks strange. It didn't replace anything and moreover the entire table is repeated 29 times in the same file. Commented Jan 11, 2015 at 21:18
  • can you include the desired output? Commented Jan 11, 2015 at 21:43
  • @Jasper Please find the edited question. Commented Jan 11, 2015 at 22:29

1 Answer

This wraps each search key in regex word boundaries (\b), so replacing "data" does not also turn "database" into "data analysisbase":

input.csv

h1,h2,h3
data of database,a,v
gg,word or words/word,gdg
asd,r,rank

Python code

#!python2
import csv
import re

# This builds a dictionary of regex-pattern/replacement pairs.
# Each key is wrapped in word boundaries (\b) so that "database"
# is left alone when the key is "data".
L = ['data-data analysis','word or words-phrase','rank-ranking']
# Split on the first hyphen only, so the replacement text may contain hyphens.
pairs = [w.split('-', 1) for w in L]
replacements = {r'\b' + re.escape(k) + r'\b':v for k,v in pairs}

# In Python 2, files should be opened in binary mode for use with the csv module.
with open('input.csv','rb') as inp:
    with open('output.csv','wb') as outp:

        # wrap the file streams in csv reader and csv writer objects.
        r = csv.reader(inp)
        w = csv.writer(outp)

        for line in r:
            for i,item in enumerate(line):
                for k,v in replacements.items():
                    item = re.sub(k,v,item)
                line[i] = item
            w.writerow(line)

output.csv

h1,h2,h3
data analysis of database,a,v
gg,phrase/word,gdg
asd,r,ranking
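
For readers on Python 3, here is a minimal sketch of the same idea. It is not part of the original answer: the helper names build_pattern and replace_cells are made up for illustration, and the demo reads from an in-memory string rather than input.csv. It combines all keys into one alternation so each cell is scanned once, which also means text inserted by one replacement is never rescanned by another key.

```python
import csv
import io
import re

def build_pattern(entries):
    """Return (compiled regex, lookup dict) for 'old-new' entries."""
    # Split on the first hyphen only, so replacement text may contain hyphens.
    lookup = dict(entry.split('-', 1) for entry in entries)
    # Try longer keys first so a long key wins over a shorter one it starts with.
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    return pattern, lookup

def replace_cells(rows, entries):
    """Yield each row with every whole-word key replaced in every cell."""
    pattern, lookup = build_pattern(entries)
    for row in rows:
        yield [pattern.sub(lambda m: lookup[m.group(0)], cell) for cell in row]

# In-memory demo of the sample data from the question.
L = ['data-data analysis', 'word or words-phrase', 'rank-ranking']
source = io.StringIO(
    "h1,h2,h3\n"
    "data of database,a,v\n"
    "gg,word or words/word,gdg\n"
    "asd,r,rank\n"
)
result = list(replace_cells(csv.reader(source), L))
```

With real files in Python 3, the csv module expects text mode with newline='' (for example open('input.csv', newline='')) instead of the 'rb' and 'wb' modes used above, and the rows from replace_cells can be passed to csv.writer(...).writerows.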
7 Comments

Thank you. This is exactly what I want but it says "ValueError: need more than 1 value to unpack" pointing to replacements = {r'\b' + re.escape(k) + r'\b':v for k,v in pairs}
@amy, what version of Python are you using? Also, that could happen if you don't have a hyphen in one of your replacement strings. Are you using the exact code above?
Yes. I used the exact code. Also, I'm using Python 2.7.6. There is a hyphen in all replacement strings.
The above code is Python 3, so it gets an error for me after that line. Did you change the list L? I'll make a version compatible with 2.7.
@amy Updated for Python 2.
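
An unpacking error like the one in these comments typically appears when an entry in the list has no hyphen, so the split yields a single item. As a rough sketch (parse_pairs is a hypothetical helper, not part of the answer), splitting with str.partition and checking the separator makes the offending entry visible. It should run on both Python 2.7 and Python 3.

```python
def parse_pairs(entries):
    """Split 'old-new' strings into (old, new) tuples, naming any bad entry."""
    pairs = []
    for entry in entries:
        # partition splits on the first hyphen and returns an empty
        # separator when there is no hyphen at all.
        old, sep, new = entry.partition('-')
        if not sep or not old:
            raise ValueError('expected "old-new", got %r' % (entry,))
        pairs.append((old, new))
    return pairs
```

Calling parse_pairs(L) in place of the list comprehension that builds pairs would then report exactly which string is malformed.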