
I am attempting to use this code to parse a csv file but cannot find my way around this error:

    File "(file location)", line 438, in parser_42
        position = tmp2[1]
    IndexError: list index out of range

My csv file is structured like so:

    mutant  coefficient  Score
    Q41V    -0.19        0.05
    Q41L    -0.08        0.26
    Q41T    -0.21        0.43
    I23V    -0.02        0.45
    I61V     0.01        1.12

I want to take each mutant and separate it into its parts, for example 'Q', '41' and 'V'. I then want to create lists of the positions and the wild-type residues (wt's) and put them in numerical order of position.
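A minimal sketch of that split, assuming every mutant code is one letter, a number, then one letter (as in the table above). `split_mutant` is a hypothetical helper name, not part of the original code. Putting the digits in a capturing group makes `re.split` keep them in the result:

```python
import re


def split_mutant(code):
    """Split a mutant code such as 'Q41V' into (wild type, position, substitution)."""
    # The capturing group keeps the digits in the output: 'Q41V' -> ['Q', '41', 'V']
    wt, position, substitution = re.split(r'(\d+)', code)
    return wt, int(position), substitution


print(split_mutant('Q41V'))  # ('Q', 41, 'V')
print(split_mutant('I23V'))  # ('I', 23, 'V')
```

Converting the position to `int` is what makes the later sort numeric, so that 100 sorts after 23 (as strings, '100' would sort before '23').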

The goal is to write the resulting string "seq" to a new csv file.

Obviously, I am a beginner in Python and data manipulation. I imagine that I am just overlooking something silly. Can anyone steer me in the right direction?

import csv
import re

def parser_42(csv_in, fasta_in, *args):

    with open(csv_in, 'r') as tsv_in:
        tsv_in = csv.reader(tsv_in, delimiter='\t')
        next(tsv_in) # data starts on line 7
        next(tsv_in)
        next(tsv_in)
        next(tsv_in)
        next(tsv_in)
        next(tsv_in)

        for row in tsv_in:
            tmp = row[0].split(',')
            tmp2 = re.split(r'(\d+)', tmp[0])
            wt = tmp2[0]
            position = tmp2[1]
            substitution = tmp2[2]

            seq = ""
            current_positions = []


            if position not in current_positions:
                current_positions += [position]
                print(current_positions)
                seq += wt
            else:
                continue

        print(seq)
  • It looks like your csv only has one value per row and you're trying to access a second value with tmp2[1]. Commented Feb 19, 2017 at 21:47
  • You probably have an empty line somewhere, possibly at the end of the file. Commented Feb 19, 2017 at 22:10
  • After you split, you can check the length of the result before you proceed to access indexes which do not exist. Commented Feb 19, 2017 at 23:50
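Following the suggestions in the comments, here is a minimal sketch of the loop with a length check. It assumes, as the question's code does, that the mutant code is in the first tab-separated column; `parse_rows` is a hypothetical helper name. `re.split` only produces extra pieces when the pattern matches, so a blank line (which `csv.reader` yields as an empty list) or a cell with no digits, such as the header `mutant`, leaves nothing at index 1, which is exactly what makes `tmp2[1]` fail.

```python
import csv
import io
import re


def parse_rows(rows):
    """Yield (wt, position, substitution) for each usable row, skipping the rest."""
    for row in rows:
        if not row:  # csv.reader yields [] for a blank line
            continue
        parts = re.split(r'(\d+)', row[0])
        if len(parts) != 3:  # no number found, e.g. a header cell such as 'mutant'
            continue
        wt, position, substitution = parts
        yield wt, int(position), substitution


# Small inline sample: a header row, two data rows, and a trailing blank line.
sample = "mutant\tcoefficient\tScore\nQ41V\t-0.19\t0.05\nI23V\t-0.02\t0.45\n\n"
reader = csv.reader(io.StringIO(sample), delimiter='\t')
print(list(parse_rows(reader)))  # [('Q', 41, 'V'), ('I', 23, 'V')]
```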

1 Answer

For anyone who may be interested, this is how I solved my problem. If anyone has suggestions on how to make this a little more concise, I would appreciate the advice. I know this probably seems like a roundabout way to fix a small issue, but I learned a fair amount in the process, so I am not overly concerned :). I basically replaced the .split() calls with a regular expression match, which seems a bit cleaner.

import csv
import os
import re

import pandas as pd


def parser_42(csv_in, fasta_in, *args):
    # get_column_names() is defined elsewhere in the original script (not shown); dataset is not used below
    dataset = pd.DataFrame(columns=get_column_names())
    save_path = '(directory path)'
    complete_fasta_filename = os.path.join(save_path, 'dataset_42_seq.fasta.txt')

    seq = ''
    current_positions = []

    with open(csv_in, newline='') as f:
        tsv_in = csv.reader(f, delimiter='\t')
        for _ in range(6):  # data starts on row 7
            next(tsv_in)

        for row in tsv_in:
            # regular expression to split the letters and the number in a single cell,
            # e.g. "Q41V" -> "Q", "41", "V" ("*" allows a stop-codon substitution)
            regexp_match = re.match(r'([A-Z])([0-9]+)([A-Z*])', row[0], re.I) if row else None
            if regexp_match is None:
                continue  # skip blank or malformed lines instead of crashing
            wt = regexp_match.group(1)
            position = int(regexp_match.group(2))
            substitution = regexp_match.group(3)

            # keep only the first wild-type residue seen at each position
            if position not in current_positions:
                current_positions.append(position)
                seq += wt

    # sort the indices by position, then reorder the residues in seq accordingly
    sorted_y_idx_list = sorted(range(len(current_positions)), key=lambda i: current_positions[i])
    xs = [seq[i] for i in sorted_y_idx_list]

    seq1 = '>dataset_42 fasta\n' + ''.join(xs)  # FASTA header line + joined sequence

    with open(complete_fasta_filename, 'w') as output_fasta_file:
        output_fasta_file.write(seq1)
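Since the answer asks for a more concise version, here is one possible sketch (not the author's code). It swaps the parallel list plus index-sorting for a dict keyed by position and uses `with` for both files. It assumes the same layout as above: six lines to skip, tab-separated, mutant code in the first column. `build_wt_sequence`, `write_fasta`, and their parameters are hypothetical names chosen for illustration.

```python
import csv
import re

# One letter, a number, then one letter or "*" (stop codon), case-insensitive.
MUTANT_RE = re.compile(r'([A-Z])(\d+)([A-Z*])', re.I)


def build_wt_sequence(rows):
    """Return the wild-type residues ordered by position, one residue per position."""
    wt_by_position = {}
    for row in rows:
        match = MUTANT_RE.fullmatch(row[0].strip()) if row else None
        if match is None:
            continue  # skip blank, header, or malformed rows
        wt, position = match.group(1), int(match.group(2))
        wt_by_position.setdefault(position, wt)  # keep the first residue seen per position
    return ''.join(wt_by_position[pos] for pos in sorted(wt_by_position))


def write_fasta(tsv_path, fasta_path, header='dataset_42 fasta', skip_lines=6):
    """Read the tab-separated file and write a one-record FASTA file."""
    with open(tsv_path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        for _ in range(skip_lines):  # data starts after the header block
            next(reader, None)
        seq = build_wt_sequence(reader)
    with open(fasta_path, 'w') as out:
        out.write(f'>{header}\n{seq}\n')
```

For example, the five rows in the question's table give `'IQI'` (I at 23, Q at 41, I at 61). `setdefault` keeps the first residue seen at each position, matching the `if position not in current_positions` check in the answer, and `sorted()` on the dict keys replaces the index-sorting step.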