I have data in a text file and I would like to be able to modify the file by columns and write it out again. I normally write in C (basic ability) but chose Python for its obvious string-handling benefits. I have never used Python before, so I'm a tad stuck. I have been reading up on similar problems, but they only show how to change whole lines. To be honest, I have no clue what to do.

Say I have the file

1 2 3
4 5 6
7 8 9

and I want to be able to change column two with some function, say multiply it by 2, so I get

1 4 3
4 10 6
7 16 9

Ideally I would be able to easily change the program so that I can apply any function to any column.

For anyone who is interested, it is for modifying lab data for plotting, e.g. taking the log of the first column.

4 Answers

Python is an excellent general-purpose language; however, if you are on a Unix-based system, you might want to take a look at awk. The awk language is designed for this kind of text-based transformation. Its power is easy to see for your question, as the solution is only a few characters: awk '{$2=$2*2;print}'.

$ cat file
1 2 3
4 5 6
7 8 9

$ awk '{$2=$2*2;print}' file
1 4 3
4 10 6
7 16 9

# Multiply the third column by 10
$ awk '{$3=$3*10;print}' file
1 2 30
4 5 60
7 8 90

In awk each column is referenced by $i, where i is the field number. So we just set the second field to its own value multiplied by two and print the line. This can be written even more concisely as awk '{$2=$2*2}1' file, but it is best to be explicit at the beginning.

4 Comments

Holy c*&p, I have never heard of awk but that is amazing. Thank you very much for that! I will certainly be using it. However, I believe I need some form of Python script, because some of the functions I need to apply to the numbers won't be so trivial and I'm not sure awk can handle that. A basic example would be to take column 1 and replace it with the log of column 1. That wouldn't be possible in awk, would it? Plus, the lab computers run Windows, which is another reason for wanting Python.
Here is a great reference to get started. Awk really is great for these types of jobs, and being focused on one domain means the language is small and simple, so it is great for learning. Awk has a log function, so this isn't a problem. If you are on a Windows machine and need a Python solution, then just keep this answer as a side note.
I will definitely be reading up on awk, but I would still like to get my Python script working, as I have started and don't like failing. I really appreciate your help though.
If you already have made an attempt you should post it in the question.

Here is a very simple Python solution:

with open("myfile.txt") as f:
    for line in f:
        col = line.split()  # split on any run of whitespace
        print(col[0], int(col[1]) * 2, col[2])

There are plenty of improvements that could be made, but I'll leave that as an exercise for you.
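
One such improvement, sketched here under a few assumptions, is to wrap the loop in a function so that any function can be applied to any column. The name transform_column, its parameters, and the "{:g}" output format are illustrative and not part of the original answer; the fields are assumed to be whitespace-separated and the chosen column is assumed to parse as a float.

```python
import math


def transform_column(lines, col, func, fmt="{:g}"):
    """Yield each line with func applied to the zero-based column `col`.

    Hypothetical helper: assumes whitespace-separated fields and that the
    chosen column parses as a float. The default format keeps about six
    significant digits; pass another format string if you need more.
    """
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        fields[col] = fmt.format(func(float(fields[col])))
        yield " ".join(fields)


if __name__ == "__main__":
    sample = ["1 2 3", "4 5 6", "7 8 9"]

    # Double the second column (index 1)
    for out in transform_column(sample, 1, lambda x: x * 2):
        print(out)

    # Natural log of the first column (index 0)
    for out in transform_column(sample, 0, math.log):
        print(out)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list, and the yielded strings can be written to a second file.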

1 Comment

I tried this originally, but coming from C I tried (int)col[1] and obviously it didn't work. I feel like a fool for not looking up Python type casting now. Cheers for your input.

I would use pandas or just numpy. Read your file with:

import pandas as pd
data = pd.read_csv('file.txt', header=None, sep=r'\s+')

then work with the data in a spreadsheet-like style, e.g.:

data.iloc[:, 1] *= 2

finally, write it back to a file with:

data.to_csv('output.txt', sep=' ', header=False, index=False)
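
Since the answer also mentions plain numpy, here is a minimal sketch of that route. It assumes the file contains only whitespace-separated numbers; the in-memory StringIO objects are stand-ins for real files, chosen only so the example runs on its own.

```python
import io

import numpy as np

# Stand-in for a real file; np.loadtxt also accepts a filename such as 'file.txt'.
source = io.StringIO("1 2 3\n4 5 6\n7 8 9\n")

data = np.loadtxt(source)         # 2-D float array, split on whitespace by default
data[:, 1] *= 2                   # double the second column in place
data[:, 0] = np.log(data[:, 0])   # e.g. natural log of the first column

out = io.StringIO()               # stand-in for 'output.txt'
np.savetxt(out, data, fmt="%g")   # space-separated by default
print(out.getvalue())
```

The "%g" format is only one choice; it trims trailing zeros but keeps about six significant digits, so a wider format may suit measured data better.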

As @sudo_O said, there are much more efficient tools than Python for this task. However, here is a possible solution:

import csv
from itertools import repeat

fun = pow  # applied to each value as fun(value, argument)

# In Python 3 the csv module expects text-mode files opened with newline=''
with open('m.in', newline='') as input_file, \
        open('m.out', 'w', newline='') as out_file:

    inpt = csv.reader(input_file, delimiter=' ')
    out = csv.writer(out_file, delimiter=' ')

    for row in inpt:
        row = [int(e) for e in row]   # convert the strings to integers
        opt = repeat(2, len(row))     # exponent 2 for every value

        # write fun(value, argument) for each column
        out.writerow([str(elem) for elem in map(fun, row, opt)])

Here it multiplies every number by itself (that is, squares it), but you can configure it to square only the second column by changing opt: opt = [ 1 + (col == 1) for col in range(len(row)) ] (2 for col 1, 1 otherwise).
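
The exponent trick can be checked in isolation. This is a small sketch written for Python 3; the helper name exponents_for is made up for illustration. It relies on pow(x, 1) == x, so an exponent of 1 leaves a value unchanged, whereas an exponent of 0 would turn it into 1.

```python
def exponents_for(row, target_col, power=2):
    """Return one exponent per column: `power` for target_col, 1 elsewhere.

    Hypothetical helper, not part of the original answer.
    """
    return [power if col == target_col else 1 for col in range(len(row))]


row = [4, 5, 6]
opt = exponents_for(row, target_col=1)   # [1, 2, 1]
print(list(map(pow, row, opt)))          # [4, 25, 6]

# An exponent of 0 maps a value to 1, because pow(x, 0) == 1
print(list(map(pow, row, [0, 2, 0])))    # [1, 25, 1]
```

To apply an arbitrary function rather than a power, a conditional such as [f(x) if i == target_col else x for i, x in enumerate(row)] is usually more direct.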

6 Comments

This is exactly what I was looking for. However, when I put in your other "opt = ..." to apply the function to only one column the function works on that column but outputs 1's for every other component. I can't seem to work out why.
That's because my formula was wrong: when col != 1, opt[col] == 0, and pow(number, 0) equals 1. The correct one is [ 1 + (col==1) for col in range(len(row)) ]
Perfect, sorry for not picking up on that. Thank you for this. You are a life saver =D
The only problem with this is that I'm not sure how to change the function. I have been messing around for a while now but can't quite get my head around it.
All good, I finally figured it out. I was reading it all wrong. I can't thank you enough. If I could rep you, I would.