
I have a CSV file structured as follows:

num  mut
36    L
45    P
  ...

where num is the position of a mutation and mut is the mutated letter. I have to replace the character at position num of a string with the letter mut. I wrote the following code in Python:

import pandas as pd
import os
df = pd.read_csv(r'file.csv')
df_tmp=df.astype(str)
df_tmp["folder"]=df_tmp["num"]+df_tmp["mut"] #add a third column
f = open("sequence.txt", 'r')
content = f.read()
for i in range(len(df)):
     num=df_tmp.num.loc[[i]]-13
     num=num.astype(int)
     prev=num-1
     prev=prev.astype(int)
     mut=df_tmp.mut.loc[[i]]
     mut=mut.astype(str)
     new="".join((content[:prev],mut,content[num:])) #this should modify the file

But it raises:

TypeError: slice indices must be integers or None or have an __index__ method

How can I solve this?

Edit: to make it clearer what I want to do: I have to insert only the first mutation into my sequence, save the result to a file, and copy that file into a folder named after the third column (the one I added in the code). Then I do the same with the second mutation, then the third, and so on. Only one mutation is inserted at a time.

  • Your approach is really inefficient: you're looping and recreating the full string on every iteration. In the worst case, assuming you change every character, that is O(n**2), while it can be done in O(n). Commented May 31, 2022 at 11:50
  • I edited the question; maybe now it is clearer why I use the loop. @mozway Commented May 31, 2022 at 11:56
  • I see, I added an alternative to my answer. Commented May 31, 2022 at 12:08

2 Answers

multiple mutations:

IIUC, you'd be better off without pandas here: convert your DataFrame to a dictionary, iterate over the string, and join:

# input DataFrame
df = pd.DataFrame({'num': [36, 45], 'mut': ['L', 'P']})

# input string
string = '-'*50
# '--------------------------------------------------'

# get the positions to modify
pos = df.set_index('num')['mut'].to_dict()
# {36: 'L', 45: 'P'}

# iterate over the string, replacing the characters whose index is in the dictionary
# NB. use start=1 if you want the first position to be 1
new_string = ''.join([pos.get(i, c) for i,c in enumerate(string, start=0)])
# '------------------------------------L--------P----'
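
The question counts positions from 1 and subtracts 13 from each num before indexing. A minimal sketch of the same dictionary approach under those assumptions follows; the function name apply_mutations and its offset parameter are hypothetical, not part of the original code:

```python
def apply_mutations(sequence, mutations, offset=0):
    """Return a copy of `sequence` with all mutations substituted in one pass.

    `mutations` maps 1-based positions to replacement letters.
    `offset` (hypothetical parameter) is subtracted from every position first,
    mirroring the `- 13` used in the question.
    """
    shifted = {num - offset: letter for num, letter in mutations.items()}
    # enumerate from 1 so dictionary keys are 1-based positions
    return "".join(shifted.get(i, c) for i, c in enumerate(sequence, start=1))


mutated = apply_mutations("-" * 50, {36: "L", 45: "P"}, offset=13)
print(mutated)
```

With offset=13, the letters land at 1-based positions 23 and 32, and the string length is unchanged.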

single mutations:

string = '-'*50
# '--------------------------------------------------'

for idx, r in df.iterrows():
    new_string = string[:r['num']-1]+r['mut']+string[r['num']:]
    # or
    # new_string = ''.join([string[:r['num']-1], r['mut'], string[r['num']:]])
    
    with open(f'file_{idx}.txt', 'w') as f:
        f.write(new_string)

output:

file_0.txt
-----------------------------------L--------------

file_1.txt
--------------------------------------------P-----
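
The question's edit also asks for each mutated file to end up in a folder named after num and mut (for example 36L). A rough sketch of that step, assuming one sequence.txt per folder; the function name write_single_mutants, the file name, and the offset parameter are illustrative choices, not something the question specifies:

```python
from pathlib import Path
import tempfile


def write_single_mutants(sequence, rows, out_dir, offset=0):
    """Write one mutated copy of `sequence` per (num, mut) pair in `rows`.

    Each copy is saved as out_dir/<num><mut>/sequence.txt (assumed layout).
    """
    written = []
    for num, mut in rows:
        pos = int(num) - offset  # 1-based position inside `sequence`
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        # substitute exactly one character; the length stays the same
        mutant = sequence[:pos - 1] + mut + sequence[pos:]
        folder = Path(out_dir) / f"{num}{mut}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "sequence.txt"
        target.write_text(mutant)
        written.append(target)
    return written


out_dir = tempfile.mkdtemp()  # throwaway directory for the demonstration
paths = write_single_mutants("-" * 50, [(36, "L"), (45, "P")], out_dir)
print([p.parent.name for p in paths])
```

With a DataFrame, the rows could be supplied as zip(df["num"], df["mut"]). Writing directly into the target folder avoids the separate copy step.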

1 Comment

In this case it inserts the mutation rather than substituting it. For example, your string of 50 dashes ends up with 51 characters. @mozway
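
Whether the string grows depends on where the second slice starts. A small check on the same 50-dash string, shown only as an illustration (the variable names substituted and inserted are mine):

```python
string = "-" * 50
num, mut = 36, "L"

# second slice starts at num: the old character at position num is dropped
substituted = string[:num - 1] + mut + string[num:]

# second slice starts at num - 1: the old character is kept, so the string grows
inserted = string[:num - 1] + mut + string[num - 1:]

print(len(substituted), len(inserted))
```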

I tried your code with a sample file.csv and an empty sequence.txt file.

In your code, the first line of the for loop

num=df_tmp.num.loc[[i]]-13
# raises an error because num is stored as str at that point; to correct that:

num=df_tmp.num.loc[[i]].astype(int)-13 
# astype converts it to int before the subtraction

After this, the next error is on the last line: the slice-index TypeError. It occurs because prev and num, which you use to slice content, are pandas Series rather than plain ints. To get the int value, take the first element of each with .iloc[0]:

content="".join((content[:prev.iloc[0]],mut,content[num.iloc[0]:]))

There shouldn't be an error now.

2 Comments

Now it gives me this: TypeError: sequence item 1: expected str instance, Series found @Cranchian
At which line does this error occur exactly?
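
The "Series found" message suggests that mut is still a one-element Series when it reaches str.join, because .loc[[i]] with a list returns a Series rather than a single value. One possible way around both errors is to pull scalars out of each row with .iloc[i]. The sketch below uses a made-up DataFrame and a dash string in place of file.csv and sequence.txt, and the list name mutants is mine:

```python
import pandas as pd

df = pd.DataFrame({"num": [36, 45], "mut": ["L", "P"]})
content = "-" * 50  # stand-in for the text read from sequence.txt

mutants = []
for i in range(len(df)):
    num = int(df["num"].iloc[i]) - 13  # plain int, usable as a slice index
    mut = str(df["mut"].iloc[i])       # plain str, accepted by str.join
    # build each mutant from the unmodified content: one mutation at a time
    new = "".join((content[:num - 1], mut, content[num:]))
    mutants.append(new)

print(mutants)
```

Using .iloc[i] is positional, so it does not depend on the index labels of the DataFrame.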
