Issues calling awk from within Python using subprocess.call

Question

I'm having some issues calling awk from within Python. Normally, I'd run the command from the command line as follows:

Open up the command line, in admin mode or not.
Change directory to the folder containing awk.exe, namely cd R\GnuWin32\bin
Call awk -F "," "{ print > (\"split-\" $10 \".csv\") }" large.csv

My command splits the large.csv file on the 10th column into a number of files named split-[COL VAL HERE].csv. I have no issues running this command from the command line. I tried to run the same command from Python using subprocess.call(), but it doesn't produce the expected files. I run the following code:

def split_ByInputColumn():
     subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', '\",\"', 
              '\"{ print > (\\"split-\\" $10 \\".csv\\") }\"', 'large.csv'],
                  cwd = 'C:/R/GnuWin32/bin/')

and clearly, something is running when I execute the function (CPU usage, etc) but when I go to check C:/R/GnuWin32/bin/ there are no split files in the directory. Any idea on what's going wrong?

Any reason why you don't just do the equivalent with Python, so you don't have to run awk?
– wwii, Commented Nov 3, 2016 at 18:19

Jean-François Fabre · Accepted Answer · 2016-11-03 18:40:43Z

As I stated in my previous (downvoted) answer, you over-protect the arguments, which makes awk's argument parsing fail.

Since the downvote came without a comment, I assumed I had made a typo, but the code worked when I tested it. So I suppose the downvote was because I should have pushed harder for a full-fledged Python solution, which is the best thing to do here (as stated in my previous answer).

Writing the equivalent in Python is not trivial, as we have to emulate the way awk opens files and appends to them afterwards. But it is more integrated, more Pythonic, and handles quoting properly if quoting occurs in the input file.

I took the time to code & test it:

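import csv
import glob
import os

def split_ByInputColumn():
    # get rid of the old data from previous runs
    for f in glob.glob("split-*.csv"):
        os.remove(f)

    # maps output filename -> (file handle, csv writer)
    open_files = dict()

    # newline='' lets the csv module handle line endings itself (Python 3)
    with open('large.csv', newline='') as f:
        cr = csv.reader(f, delimiter=',')
        for r in cr:
            tenth_col = r[9]  # 10th column, zero-based index 9
            filename = "split-{}.csv".format(tenth_col)
            if filename not in open_files:
                # text mode for Python 3; Python 2 would use "wb" here
                handle = open(filename, "w", newline='')
                open_files[filename] = (handle, csv.writer(handle, delimiter=','))
            open_files[filename][1].writerow(r)

    # close every output file
    for f, _ in open_files.values():
        f.close()

split_ByInputColumn()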

In detail:

read the big file as CSV (advantage: quoting is handled properly)
compute the destination filename from the tenth column
if the filename is not in the dictionary, open the file and create a csv.writer object for it
write the row using the corresponding writer from the dictionary
at the end, close the file handles
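The steps above can also be sketched as a parameterised function. This is only an illustrative variant, not the code from the answer: the name split_by_column and its parameters (src_path, column_index, out_dir) are hypothetical, it assumes Python 3, and it swaps the manual close loop for contextlib.ExitStack so that output files are closed even if a row raises an exception.

```python
import csv
import os
from contextlib import ExitStack


def split_by_column(src_path, column_index, out_dir):
    """Write each row of src_path to out_dir/split-<value>.csv,
    where <value> is the row's entry at column_index (zero-based).
    Returns the sorted list of distinct values seen."""
    writers = {}
    # ExitStack closes the input and every output file on exit,
    # including when an exception interrupts the loop.
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, newline=""))
        for row in csv.reader(src):
            key = row[column_index]
            if key not in writers:
                path = os.path.join(out_dir, "split-{}.csv".format(key))
                handle = stack.enter_context(open(path, "w", newline=""))
                writers[key] = csv.writer(handle)
            writers[key].writerow(row)
    return sorted(writers)
```

Under these assumptions, split_by_column("large.csv", 9, ".") would correspond to splitting on the tenth column into the current directory.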

Aside: My old solution, using awk properly:

import subprocess

def split_ByInputColumn():
     subprocess.call(['awk.exe', '-F', ',',
              '{ print > ("split-" $10 ".csv") }', 'large.csv'],cwd = 'some_directory')
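To see why the extra quoting matters, here is a small sketch of what a child process actually receives when subprocess is given a list. It does not run awk: as a stand-in (an assumption made for this demo) it launches the current Python interpreter with a one-liner that echoes its arguments, and the names ECHO_ARGS and received_args are hypothetical helpers. With a list and no shell, each element is passed through as one argument, so any quote characters you add become part of the argument itself.

```python
import json
import subprocess
import sys

# Stand-in child (assumption for this demo): prints the arguments it
# received as JSON so they can be compared exactly.
ECHO_ARGS = "import json, sys; print(json.dumps(sys.argv[1:]))"


def received_args(args):
    """Run the stand-in child with `args`; return what it actually received."""
    out = subprocess.check_output([sys.executable, "-c", ECHO_ARGS] + list(args))
    return json.loads(out.decode())


# Over-protected, as in the question: the quotes are part of the arguments.
over_quoted = received_args(
    ["-F", '","', '"{ print > (\\"split-\\" $10 \\".csv\\") }"']
)

# Plain, as in the corrected call: each list element is already one argument.
plain = received_args(["-F", ",", '{ print > ("split-" $10 ".csv") }'])

for label, args in (("over-quoted", over_quoted), ("plain", plain)):
    print(label)
    for a in args:
        print("   ", a)
```

In the over-quoted case the field separator arrives as the three characters "," and the program text arrives wrapped in double quotes. awk would presumably treat that program as a string constant used as a pattern rather than as an action block, which would explain why the process runs but no split files appear.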

Thanks for re-answering and providing a great python solution as well!

genap · Accepted Answer · 2016-11-03 18:21:22Z


Someone else posted an answer (and then subsequently deleted it), but the issue was that I was over-protecting my arguments. The following code works:

import subprocess

def split_ByInputColumn():
    # each list element is passed to awk as-is, so no extra quoting is needed
    subprocess.call(['C:/R/GnuWin32/bin/awk.exe', '-F', ',',
                     '{ print > ("split-" $10 ".csv") }', 'large.csv'],
                    cwd='C:/R/GnuWin32/bin/')


2 Comments

Jean-François Fabre Over a year ago

my answer was downvoted, so I supposed that it wasn't working, just tested and it works...

genap Over a year ago

@Jean-FrançoisFabre Unsure why you got downvoted, but the new answer is even better - thanks for the help.
