I am trying to replace each character in a file (for example, A) with a corresponding string such as N[C@@]([H])(C)C(=O)O. The code I wrote is:

# Read in the file
with open('C:/in.txt', 'rt') as file :
  filedata = file.read()

# Replace each character with its corresponding code

filedata = filedata.replace('A','N[C@@]([H])(C)C(=O)O')
filedata = filedata.replace('R','N[C@@]([H])(CCCNC(=N)N)C(=O)O')
filedata = filedata.replace('N','N[C@@]([H])(CC(=O)N)C(=O)O')
filedata = filedata.replace('D','N[C@@]([H])(CC(=O)O)C(=O)O')
filedata = filedata.replace('C','N[C@@]([H])(CS)C(=O)O')
filedata = filedata.replace('E','N[C@@]([H])(CCC(=O)O)C(=O)O')
filedata = filedata.replace('Q','N[C@@]([H])(CCC(=O)N)C(=O)O')
filedata = filedata.replace('G','NCC(=O)O')
filedata = filedata.replace('H','N[C@@H](Cc1c[nH]cn1)C(=O)O')
filedata = filedata.replace('I','N[C@@]([H])([C@]([H])(CC)C)C(=O)O')
filedata = filedata.replace('L','N[C@@]([H])(CC(C)C)C(=O)O')
filedata = filedata.replace('K','N[C@@]([H])(CCCCN)C(=O)O')
filedata = filedata.replace('M','N[C@@]([H])(CCSC)C(=O)O')
filedata = filedata.replace('F','N[C@@]([H])(Cc1ccccc1)C(=O)O')
filedata = filedata.replace('P','N1[C@@]([H])(CCC1)C(=O)O')
filedata = filedata.replace('S','N[C@@]([H])(CO)C(=O)O')
filedata = filedata.replace('T','N[C@@]([H])([C@]([H])(O)C)C(=O)O')
filedata = filedata.replace('W','N[C@@H](Cc1c[nH]c2c1cccc2)C(=O)O')
filedata = filedata.replace('Y','N[C@@]([H])(Cc1ccc(O)cc1)C(=O)O')
filedata = filedata.replace('V','N[C@@]([H])(C(C)C)C(=O)O')


# Write the file out
with open('C:/out.txt', 'wt') as file:
  file.write(filedata)

The in.txt file is:

AAA
RRR
NNN
DDD
CCC
EEE
QQQ

The problem is that it generates a larger and stranger output than I would expect. For AAA I would expect:

N[C@@]([H])(C)C(=O)ON[C@@]([H])(C)C(=O)ON[C@@]([H])(C)C(=O)O

But I am getting:

N[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)ON[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)N)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)O[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)ON[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)ON[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)N)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)O[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)ON[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)ON[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)N)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)O[N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O)N[C@@]([N[C@@H](Cc1c[nH]cn1)C(=O)O])(CN[C@@]([H])(CO)C(=O)O)C(=O)O(=O)O

What am I missing?

Many thanks for your help!

  • What is this syntax? What does it represent? Edit: ah it's chemistry formulae? Commented Apr 15, 2020 at 16:51
  • Indeed: SMILES code Commented Apr 15, 2020 at 17:46

1 Answer

When your code runs:

filedata = filedata.replace('A','N[C@@]([H])(C)C(=O)O')

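This inserts an N, an H, and three Cs into the string. Each later replace call scans the whole string, including the text inserted by earlier calls, so those letters are replaced in turn and the output grows with every step.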
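A minimal sketch of that effect, using only the alanine and cysteine strings from the question (the variable names are made up for illustration):

```python
ala = 'N[C@@]([H])(C)C(=O)O'      # replacement for 'A', from the question
cys = 'N[C@@]([H])(CS)C(=O)O'     # replacement for 'C', from the question

# First replace: 'A' becomes the alanine string, as intended.
step1 = 'A'.replace('A', ala)

# The inserted text contains letters that are keys of later replacements.
leaked = [ch for ch in 'NCH' if ch in step1]

# A later replace cannot tell original characters from inserted ones,
# so every 'C' inside the alanine string is expanded again.
step2 = step1.replace('C', cys)

print(step1)
print(leaked)
print(len(step1), len(step2))
```

After the second call the string is already longer than intended, and each further replace call compounds this.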


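A common way to structure this is to process the file one line at a time, with the replacement logic in a helper function. The helper has to substitute each character in a single pass; reusing the chained replace calls unchanged would reproduce the same problem.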

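def do_replace(line):
    # Put your replacement logic here and return the new line.
    # Placeholder so the function is valid Python: returns the line unchanged.
    return line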

with open(r"C:\in.txt", 'r') as inf, open(r"C:\out.txt", 'w') as outf:
    for line in inf:
        new_line = do_replace(line)
        outf.write(new_line)
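One possible do_replace, as a sketch rather than the only way to write it: look each character up once in a dictionary and join the results, so inserted text is never scanned again. CODES is a hypothetical two-entry subset of the full table, and io.StringIO objects stand in for the real input and output files so the example runs anywhere.

```python
import io

# Hypothetical two-entry subset of the full table, for illustration only.
CODES = {
    'A': 'N[C@@]([H])(C)C(=O)O',
    'G': 'NCC(=O)O',
}

def do_replace(line):
    # Look up each character exactly once; characters without an entry
    # (such as the trailing newline) pass through unchanged.
    return ''.join(CODES.get(ch, ch) for ch in line)

inf = io.StringIO("AG\nGA\n")   # stands in for the real input file
outf = io.StringIO()            # stands in for the real output file

for line in inf:
    outf.write(do_replace(line))

result = outf.getvalue()
print(result, end='')
```

Because every output piece comes from one lookup on an original character, the order of the entries in the table no longer matters.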

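Alternatively, you can use str.translate, which replaces every character in a single pass, so inserted text is never scanned again. Its table must be keyed by Unicode code points (integers); str.maketrans builds such a table from a dictionary with single-character keys.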

mapping = {
    'A': 'N[C@@]([H])(C)C(=O)O',
    'R': 'N[C@@]([H])(CCCNC(=N)N)C(=O)O',
    'N': 'N[C@@]([H])(CC(=O)N)C(=O)O',
    'D': 'N[C@@]([H])(CC(=O)O)C(=O)O',
    'C': 'N[C@@]([H])(CS)C(=O)O',
    'E': 'N[C@@]([H])(CCC(=O)O)C(=O)O',
    'Q': 'N[C@@]([H])(CCC(=O)N)C(=O)O',
    'G': 'NCC(=O)O',
    'H': 'N[C@@H](Cc1c[nH]cn1)C(=O)O',
    'I': 'N[C@@]([H])([C@]([H])(CC)C)C(=O)O',
    'L': 'N[C@@]([H])(CC(C)C)C(=O)O',
    'K': 'N[C@@]([H])(CCCCN)C(=O)O',
    'M': 'N[C@@]([H])(CCSC)C(=O)O',
    'F': 'N[C@@]([H])(Cc1ccccc1)C(=O)O',
    'P': 'N1[C@@]([H])(CCC1)C(=O)O',
    'S': 'N[C@@]([H])(CO)C(=O)O',
    'T': 'N[C@@]([H])([C@]([H])(O)C)C(=O)O',
    'W': 'N[C@@H](Cc1c[nH]c2c1cccc2)C(=O)O',
    'Y': 'N[C@@]([H])(Cc1ccc(O)cc1)C(=O)O',
    'V': 'N[C@@]([H])(C(C)C)C(=O)O',
}

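# Read the whole input file.
with open(r'C:\in.txt') as inf:
    text = inf.read()

# str.translate looks characters up by code point, so convert the
# string-keyed dict with str.maketrans; open the output in write mode.
table = str.maketrans(mapping)
with open(r'C:\out.txt', 'w') as outf:
    outf.write(text.translate(table))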
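
One detail of str.translate is worth checking in isolation: the table is looked up by code point, so a dictionary keyed by one-character strings has no effect unless it is first converted with str.maketrans. The sketch below uses a hypothetical one-entry mapping and the AAA line from the question.

```python
# Hypothetical one-entry mapping, taken from the table above.
small = {'A': 'N[C@@]([H])(C)C(=O)O'}

# Passing the str-keyed dict directly: the lookups use integer code
# points, find nothing, and the text comes back unchanged.
unchanged = 'AAA'.translate(small)

# Converting with str.maketrans gives a table keyed by code points.
table = str.maketrans(small)
converted = 'AAA'.translate(table)

print(unchanged)
print(converted)
```

The converted result is the alanine string repeated three times, which matches the output the question expected for AAA.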