
I'm working with a large set of CSV data, and I want to merge several columns from different positions into a single column, with the values separated by semicolons (;).

So what I have now is:

a   b   c   d
1   2   3   4
1   2   3   4
1   2   3   4

I want to change it to this, so that all my data is only in column d:

a   b   c   d
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;

I know how to delete the empty columns a, b and c, but I can't figure out how to merge the data from columns a, b and c into column d. Thanks in advance.

The code that I have so far is:

# Parsing the custom formatted data with csv module.
# reads the custom format input and spits out the output in VCF format.
import csv
# input and output
with open('1-0002', 'rb') as csvin, open('converted1','wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter = '\t')
    writer = csv.writer(csvout, delimiter = '\t')
    # add headings before the for loop to prevent the heading being affected by column manipulation.
    writer.writerow(["#CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO"])

    for row in reader:
        # deleting unnecessary columns; 'del' must go from the highest index to the lowest, or the remaining indices shift and it gives a range error
        # manually deleting columns since the input data is in custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0,'1')
        row.insert(2,'.')
        row.insert(5,'.')
        row.insert(7,'') # inserting empty column for INFO headings.

        # change 'YES' to 'PASS' , leaving HETERO as it is.
        if row[6] == 'YES':
                row[6] = 'PASS'

        writer.writerow(row)

So, in the code above, I want to put the data from several different columns into the INFO column.

  • Does the last column necessarily have to be called d, or can the new merged column be called something else, eg. d_merged? Commented Jun 25, 2015 at 19:17
  • How is the file '1-0002' formatted? Commented Jun 25, 2015 at 19:19
  • @vk, the column name doesn't matter Commented Jun 25, 2015 at 19:34
  • @Praneeth, the file is formatted just like the example shown: each column is a different parameter, and each row holds strings. Commented Jun 25, 2015 at 19:35
  • Can't you just use row = row[12:]? Commented Jun 25, 2015 at 20:02

3 Answers


Simple answer: don't bother deleting columns from the row; build a NEW row that picks only what you want.

It will look like this:

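# leave row alone, don't bother deleting columns in it.
# the indices are only illustrative; use the positions of your own columns.
# the format string has four placeholders, so it needs four values.
new_row = ["a=%s;b=%s;c=%s;d=%s;" % (row[12], row[13], row[14], row[15])]
# new_row has only one column, with a string constructed of what you need.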

writer.writerow(new_row)

And voila, that should do it for you. You can also copy any other columns you need to new_row, and append() whatever else you might desire.
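
The idea above can be sketched end to end as follows. This is only a minimal illustration, assuming Python 3 and a tab-delimited input without a header row; INFO_FIELDS, build_info and merge_rows are hypothetical names, and the column indices are placeholders for the real positions in your file.

```python
import csv
import io

# Hypothetical mapping of INFO keys to column indices in the input row;
# adjust the names and indices to match the real file layout.
INFO_FIELDS = [("a", 0), ("b", 1), ("c", 2), ("d", 3)]

def build_info(row, fields=INFO_FIELDS):
    """Return the chosen columns as 'key=value;' pairs in one string."""
    return "".join("%s=%s;" % (name, row[index]) for name, index in fields)

def merge_rows(csvin, csvout):
    """Write each input row as three empty columns plus the merged column."""
    reader = csv.reader(csvin, delimiter="\t")
    writer = csv.writer(csvout, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Build a new row instead of deleting columns from the old one.
        writer.writerow(["", "", "", build_info(row)])

# In-memory demonstration with the sample values from the question.
sample = io.StringIO("1\t2\t3\t4\n1\t2\t3\t4\n")
result = io.StringIO()
merge_rows(sample, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with open(path, newline='') and open(path, 'w', newline='').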


Just read the file with a reader and write the data you read into another file. This example assumes you still want to keep the empty columns a, b and c.

import csv

# Python 3: the csv module expects files opened in text mode with newline=''
with open('filename.csv', newline='') as infile, \
        open('outfile.csv', 'w', newline='') as outfile:
    r = csv.reader(infile)
    w = csv.writer(outfile)
    for row in r:
        # joins the current row into a string separated by spaces
        outdata = ' '.join(row)
        # leaves the first three columns empty, writes the old content into the fourth column
        data_to_write = ('', '', '', outdata)
        w.writerow(data_to_write)
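
If the file has a header row, the same read-then-write loop can produce the a=1;b=2;... form asked for in the question by taking the keys from the header instead of hard-coding them. The following is a rough sketch, assuming Python 3 and tab-delimited input; merge_with_header and its target parameter are hypothetical names introduced for illustration.

```python
import csv
import io

def merge_with_header(csvin, csvout, target="d"):
    """Merge every column into `target` as 'name=value;' pairs, keeping the header."""
    reader = csv.DictReader(csvin, delimiter="\t")
    names = reader.fieldnames  # column names taken from the first line
    writer = csv.DictWriter(csvout, fieldnames=names,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for record in reader:
        merged = "".join("%s=%s;" % (name, record[name]) for name in names)
        # Start from an all-empty row, then fill only the target column.
        out = dict.fromkeys(names, "")
        out[target] = merged
        writer.writerow(out)

# In-memory demonstration with the sample table from the question.
sample = io.StringIO("a\tb\tc\td\n1\t2\t3\t4\n1\t2\t3\t4\n")
result = io.StringIO()
merge_with_header(sample, result)
print(result.getvalue())
```

Taking the keys from the header keeps the loop unchanged when columns are added or renamed; the trade-off is that every column is merged, so a subset would need an explicit list of names.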


Try pandas

import pandas as pd
df = pd.read_csv('1-0002.csv')
df['d_merged'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)

This gives:

>>> df

   a  b  c  d          d_merged
0  1  2  3  4  a=1;b=2;c=3;d=4;
1  1  2  3  4  a=1;b=2;c=3;d=4;
2  1  2  3  4  a=1;b=2;c=3;d=4;

Now delete the columns you don't want:

df = df.drop(['a','b','c','d'], axis=1)

>>> df

           d_merged
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

Now rename d_merged if you wish:

df = df.rename(columns={'d_merged':'d'})

>>> df

                  d
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

(Alternatively, skip the rename by assigning the merged string directly to d, and then drop only a, b and c:

df['d'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)

)

Then write to CSV:

df.to_csv('csvout.csv', index=False)
