How to merge several csv columns into one column using python 2.7?

Question

I'm working with a large set of CSV data, and I want to put several columns from different places into one column, separated by semicolons (;).

What I have now is:

a   b   c   d
1   2   3   4
1   2   3   4
1   2   3   4

I want to change it to this, so that all my data is only in column d:

a   b   c   d
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;
            a=1;b=2;c=3;d=4;

I know how to delete the empty columns a, b and c, but I can't figure out a way to merge the data from columns a, b and c into column d. Thanks in advance.
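A minimal sketch of the merge for a single row, assuming the first row of the file holds the column names (a, b, c, d) and the values are plain strings, as the csv module returns them. The function name `merge_row` is hypothetical; the code runs on both Python 2.7 and 3.

```python
def merge_row(header, row):
    """Build 'name=value;' pairs for every column, e.g. 'a=1;b=2;c=3;d=4;'."""
    # zip pairs each column name with the value at the same position
    return ''.join('%s=%s;' % (name, value) for name, value in zip(header, row))


header = ['a', 'b', 'c', 'd']
row = ['1', '2', '3', '4']

# keep a, b and c as empty columns and put the merged string in d
merged = ['', '', '', merge_row(header, row)]
print(merged)  # ['', '', '', 'a=1;b=2;c=3;d=4;']
```

To drop a, b and c entirely, write `[merge_row(header, row)]` as the whole row instead.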

The code I have so far is:

# Parsing the custom formatted data with csv module.
# reads the custom format input and spits out the output in VCF format.
import csv
# input and output
with open('1-0002', 'rb') as csvin, open('converted1','wb') as csvout:
    # reading and writing are all tab delimited
    reader = csv.reader(csvin, delimiter = '\t')
    writer = csv.writer(csvout, delimiter = '\t')
    # add headings before the for loop to prevent the heading being affected by column manipulation.
    writer.writerow(["#CHROM","POS","ID","REF","ALT","QUAL","FILTER","INFO"])

    for row in reader:
        # deleting unnecessary columns; 'del' must go in descending index order, otherwise the remaining indices shift (and can go out of range)
        # manually deleting columns since the input data is in custom format.
        del row[11]
        del row[10]
        del row[9]
        del row[8]
        del row[7]
        del row[6]
        del row[5]
        del row[1]
        del row[0]
        # inserting 1 and . in specific columns
        row.insert(0,'1')
        row.insert(2,'.')
        row.insert(5,'.')
        row.insert(7,'') # inserting empty column for INFO headings.

        # change 'YES' to 'PASS', leaving HETERO as it is.
        if row[6] == 'YES':
            row[6] = 'PASS'

        writer.writerow(row)
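The `del` statements work because they run from the highest index down: deleting `row[0]` first would shift every later column one place to the left. A sketch of an alternative that removes the same positions (0, 1 and 5 to 11, taken from the code above) in a single pass, so the order no longer matters. The helper name `drop_columns` and the 14-column placeholder row are hypothetical; it runs on Python 2.7 and 3.

```python
def drop_columns(row, indices):
    """Return a copy of row without the given column positions."""
    drop = set(indices)  # fast membership test
    # positions refer to the ORIGINAL row, so the order of indices is irrelevant
    return [value for i, value in enumerate(row) if i not in drop]


# positions removed by the del statements above
UNWANTED = [0, 1, 5, 6, 7, 8, 9, 10, 11]

row = ['c%d' % i for i in range(14)]  # placeholder row with 14 columns
print(drop_columns(row, UNWANTED))  # ['c2', 'c3', 'c4', 'c12', 'c13']
```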

Starting from the code above, I want to put the data from several different columns into the INFO column.
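In VCF, INFO is a semicolon-separated list of `key=value` entries, which matches the format asked for above. A sketch of one way to fill it, assuming the INFO values can be read from the original `row` before any `del` runs (the deletions shift positions). The keys and column positions in `INFO_FIELDS`, and the name `build_info`, are hypothetical placeholders; it runs on Python 2.7 and 3.

```python
# Hypothetical (key, original column position) pairs; replace with the real ones.
INFO_FIELDS = [('a', 5), ('b', 6), ('c', 8)]


def build_info(original_row, fields=INFO_FIELDS):
    """Join the chosen columns into a 'key=value;key=value' INFO string."""
    return ';'.join('%s=%s' % (key, original_row[pos]) for key, pos in fields)


original = ['x%d' % i for i in range(12)]  # placeholder input row
info = build_info(original)
print(info)  # a=x5;b=x6;c=x8
```

Inside the loop, this would mean computing `info = build_info(row)` before the `del` statements and then calling `row.insert(7, info)` instead of inserting an empty string.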

Does the last column necessarily have to be called d, or can the new merged column be called something else, e.g. d_merged? – vk1011, Jun 25, 2015 at 19:17
@Praneeth, the file is formatted just like the example shown: each column is a different parameter, and the rows hold strings. – Danny, Jun 25, 2015 at 19:35

Danver Braganza · Accepted Answer · 2015-06-25 19:20:47Z

3

Simple answer: don't bother deleting columns from the row; instead, make a NEW row for writing that picks only what you want.

It will look like this:

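for row in reader:
    # leave row alone; don't bother deleting columns from it.
    # indices 12-15 are examples: use the positions of the columns you want.
    # the format string has four %s, so it needs four values.
    new_row = ["a=%s;b=%s;c=%s;d=%s;" % (row[12], row[13], row[14], row[15])]
    # new_row has only one column: a string built from what you need.
    writer.writerow(new_row)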

And voilà, that should do it. You can also copy any other columns you need into new_row, and append() whatever else you want.
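A sketch of the "copy and append" variant with placeholder values, writing tab-delimited output like the question's code. It uses an in-memory buffer (`io.StringIO`, Python 3) instead of a real file; on Python 2.7, write to a file opened in `'wb'` mode as the question does. Which column gets copied (`row[0]` here) is only an example.

```python
import csv
import io

row = ['1', '2', '3', '4']  # one input row (placeholder values)

# copy the columns you want to keep...
new_row = [row[0]]
# ...then append the merged string as one extra column
new_row.append('a=%s;b=%s;c=%s;d=%s;' % (row[0], row[1], row[2], row[3]))

out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerow(new_row)
print(repr(out.getvalue()))  # '1\ta=1;b=2;c=3;d=4;\n'
```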

answered Jun 25, 2015 at 19:20

Danver Braganza

Christian W. · Accepted Answer · 2015-06-25 19:16:00Z

0

Just read the file with a reader and write the data you read into another file. This example assumes you still want to keep the empty columns a, b and c:

import csv

# Python 2.7: the csv module wants files opened in binary mode
with open('filename.csv', 'rb') as infile, open('outfile.csv', 'wb') as outfile:
    r = csv.reader(infile)
    w = csv.writer(outfile)
    for row in r:
        # joins the current row into one string separated by semicolons
        outdata = ';'.join(row)
        # leaves the first three columns empty, writes the old content into the fourth column
        data_to_write = ('', '', '', outdata)
        w.writerow(data_to_write)
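A slightly fuller sketch of the same idea, assuming the first line is a header naming the columns; it keeps that header and builds the `a=1;b=2;c=3;d=4;` string the question asks for rather than a plain join. It runs on Python 3 and uses `io.StringIO` in place of real files.

```python
import csv
import io

# stand-in for the input file: a header row plus data rows
src = io.StringIO('a,b,c,d\n1,2,3,4\n1,2,3,4\n')
dst = io.StringIO()

reader = csv.reader(src)
writer = csv.writer(dst, lineterminator='\n')

header = next(reader)    # ['a', 'b', 'c', 'd']
writer.writerow(header)  # keep the original header
for row in reader:
    merged = ''.join('%s=%s;' % (k, v) for k, v in zip(header, row))
    # all but the last column stay empty; the merged text goes last
    writer.writerow([''] * (len(header) - 1) + [merged])

print(dst.getvalue())
```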

answered Jun 25, 2015 at 19:16

Christian W.

vk1011 · Accepted Answer · 2015-06-25 19:37:09Z

Try pandas:

import pandas as pd
df = pd.read_csv('1-0002.csv')
df['d_merged'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)

This gives:

>>> df

   a  b  c  d          d_merged
0  1  2  3  4  a=1;b=2;c=3;d=4;
1  1  2  3  4  a=1;b=2;c=3;d=4;
2  1  2  3  4  a=1;b=2;c=3;d=4;
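`apply` with `axis=1` calls the lambda once per row, which can be slow on a large file. A possible alternative is to concatenate whole columns as strings. A sketch, assuming every column in `cols` should appear as `name=value;` in that order:

```python
import io

import pandas as pd

# same sample data as in the question
df = pd.read_csv(io.StringIO('a,b,c,d\n1,2,3,4\n1,2,3,4\n1,2,3,4\n'))

cols = ['a', 'b', 'c', 'd']
merged = pd.Series('', index=df.index)
for col in cols:
    # column-wise string concatenation, no per-row Python function call
    merged = merged + col + '=' + df[col].astype(str) + ';'

df['d_merged'] = merged
print(df['d_merged'].tolist())
# ['a=1;b=2;c=3;d=4;', 'a=1;b=2;c=3;d=4;', 'a=1;b=2;c=3;d=4;']
```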

Now delete the columns you don't want:

df = df.drop(['a','b','c','d'], axis=1)

>>> df

           d_merged
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

Now rename d_merged if you wish:

df = df.rename(columns={'d_merged':'d'})

>>> df

                  d
0  a=1;b=2;c=3;d=4;
1  a=1;b=2;c=3;d=4;
2  a=1;b=2;c=3;d=4;

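Alternatively, skip d_merged and the rename by writing the merged string straight into d, then drop only a, b and c:

df['d'] = df.apply(lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'],row['b'],row['c'],row['d']), axis=1)
df = df.drop(['a','b','c'], axis=1)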

Then write to CSV:

df.to_csv('csvout.csv', index=False)
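The question's input and output are tab-delimited, so the same steps likely need `sep='\t'` on both `read_csv` and `to_csv`. A sketch using an in-memory string in place of the real input file; with no path argument, `to_csv` returns the text instead of writing a file.

```python
import io

import pandas as pd

# stand-in for the tab-delimited input file from the question
tsv = io.StringIO('a\tb\tc\td\n1\t2\t3\t4\n1\t2\t3\t4\n')
df = pd.read_csv(tsv, sep='\t')

df['d'] = df.apply(
    lambda row: 'a={0};b={1};c={2};d={3};'.format(row['a'], row['b'], row['c'], row['d']),
    axis=1)
df = df.drop(['a', 'b', 'c'], axis=1)

# tab-separated output, returned as a string
text = df.to_csv(sep='\t', index=False)
print(text.splitlines())  # ['d', 'a=1;b=2;c=3;d=4;', 'a=1;b=2;c=3;d=4;']
```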
