
I have several text files and I want to extract the values from them into a CSV file. Each file has the following format:

main cost: 30
additional cost: 5

I managed to do that, but the problem is that I want the values from each file to go into a different column. I also want the number of text files to be a command-line argument.

This is what I'm doing now:

  import sys
  import csv
  import itertools

  numFiles = int(sys.argv[1])  # command-line arguments arrive as strings
  d = [[] for x in xrange(numFiles+1)]
  for i in range(numFiles):
      filename = 'mytext' + str(i) + '.text'
      with open(filename, 'r') as in_file:
          for line in in_file:
              items = line.split(' : ')
              num = items[1].split('\n')

              if i == 0:
                  d[i].append(items[0])

              d[i+1].append(num[0])

              grouped = itertools.izip(*d[i] * 1)
              if i == 0:
                  grouped1 = itertools.izip(*d[i+1] * 1)

              with open(outFilename, 'w') as out_file:
                  writer = csv.writer(out_file)
                  for j in range(numFiles):
                      for val in itertools.izip(d[j]):
                          writer.writerow(val)

This is what I'm getting now, with everything in one column:

main cost   
additional cost   
30   
5   
40   
10

And I want it to be

main cost        | 30  | 40
additional cost  | 5   | 10
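For reference, the layout above is the transpose of the per-file lists that the code collects in `d` (labels in `d[0]`, then one list of values per file). A minimal sketch, assuming `d` ends up in that shape, turns those columns into rows with `zip(*d)`; the sample data and the helper name `to_rows` are hypothetical.

```python
import csv
import sys

def to_rows(columns):
    """Turn a list of columns into a list of rows (a transpose)."""
    # zip(*columns) pairs up the i-th item of every column;
    # like zip itself, it stops at the shortest column.
    return [list(row) for row in zip(*columns)]

# Hypothetical data in the shape the question's code builds:
# d[0] holds the labels, d[1:] hold one list of values per file.
d = [
    ["main cost", "additional cost"],
    ["30", "5"],
    ["40", "10"],
]

writer = csv.writer(sys.stdout)
writer.writerows(to_rows(d))
```

To write to a file instead of standard output, open it with `open(outFilename, 'w', newline='')` on Python 3 (or in `'wb'` mode on Python 2), as the `csv` documentation recommends.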
  • Have you tried using tuples? Commented Jul 29, 2016 at 21:53
  • Where does the last column come from in the desired output? Are there only two lines in each input file? Commented Jul 29, 2016 at 22:57
  • I'm assuming the input file looks something like: main cost: 30 additional cost: 5 main cost: 40 additional cost: 10 Commented Jul 29, 2016 at 22:57
  • ahh ..., so each file would be a new column. Commented Jul 29, 2016 at 23:03
  • @wwii yes as Michael said Commented Jul 29, 2016 at 23:58

2 Answers

You could use a dictionary for this, where each key is the "header" you want to use and each value is a list.

So it would look like someDict = {'main cost': [30,40], 'additional cost': [5,10]}

Edit 2: I went ahead and cleaned up this answer so it makes a little more sense.

You can build the dictionary and iterate over it like this:

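from collections import OrderedDict

# Stand-in for the lines read from the input files
in_file = ['main cost : 30', 'additional cost : 5', 'main cost : 40', 'additional cost : 10']
someDict = OrderedDict()  # remembers the order in which keys were first added

for line in in_file:
    key, val = line.split(' : ')
    num = int(val)
    if key not in someDict:
        someDict[key] = []  # first time this header is seen

    someDict[key].append(num)

# Print each header followed by its values
for key in someDict:
    print(key)
    for value in someDict[key]:
        print(value)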

The code outputs:

main cost
30
40
additional cost
5
10

Should be pretty straightforward to modify the example to fit your desired output.
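For example, to get the `main cost,30,40` layout from the question, each key can become one CSV row: the key followed by its list of values. A minimal sketch, reusing the sample values from the example above and writing to standard output; the helper name `dict_to_rows` is hypothetical.

```python
import csv
import sys
from collections import OrderedDict

# Same shape as someDict built above: one list of values per key,
# with one value per input file.
someDict = OrderedDict([
    ("main cost", [30, 40]),
    ("additional cost", [5, 10]),
])

def dict_to_rows(d):
    """Return one row per key: [key, value_from_file_0, value_from_file_1, ...]."""
    return [[key] + values for key, values in d.items()]

writer = csv.writer(sys.stdout)
writer.writerows(dict_to_rows(someDict))
```

Each file's values then land in their own column, because the lists were appended to in file order.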

I adapted the example from the question "append multiple values for one key in Python dictionary", and thanks to @wwii for some suggestions.

I used an OrderedDict because, before Python 3.7, a plain dictionary doesn't guarantee that keys stay in insertion order.

You can run my example at https://ideone.com/myN2ge


5 Comments

For this solution, you can be sure that there are only two keys, so you could construct the dictionary beforehand with those two keys and an empty list for each value; then you can get rid of the if/else for the dictionary assignment. Alternatively, if you are not sure about the keys beforehand, you could use collections.defaultdict.
When you split text and plan on using the individual items later in your code, it is nice to give them names - it makes subsequent code easier to read. Take advantage of unpacking: in this case something like - key, value = line.split(':') ; value = value.strip()
Both great examples. For the first, I would probably keep it my way so in the future the file formats can change without having to modify the code. I agree with your second example.
Play around with collections.defaultdict, it solves the problem of trying to assign to a missing key without using if/thens or try/excepts.
That works as well unless you want to use an OrderedDict, which is probably what OP wants. Otherwise, it won't always output in the same order. I'll edit my example to include your first suggestion though. It's much easier to read that way.
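The two suggestions from the comments (naming the pieces of the split and using `collections.defaultdict`) could be combined roughly like this. It is only a sketch: it splits on the first `:` and strips whitespace, so it accepts both `main cost: 30` and `main cost : 30`, and the in-memory `lines` list is hypothetical sample data standing in for the file contents. On Python 3.7+ a `defaultdict`, like any `dict`, keeps keys in insertion order, so the ordering concern mostly applies to older versions.

```python
from collections import defaultdict

# Stand-in for the lines of two input files (hypothetical sample data).
lines = ["main cost: 30", "additional cost: 5", "main cost : 40", "additional cost : 10"]

someDict = defaultdict(list)  # a missing key starts out as an empty list
for line in lines:
    key, value = line.split(":", 1)  # split only on the first colon
    someDict[key.strip()].append(int(value.strip()))

for key, values in someDict.items():
    print(key, values)
```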

This is how I might do it. It assumes the fields are the same in all the files. Make a list of names, plus a dictionary that uses those field names as keys and the lists of values as the entries. Instead of hard-coding file1.text, file2.text, etc., run the script with file*.text as a command-line argument; a Unix-like shell expands the wildcard into the list of matching file names.

#! /usr/bin/env python

import sys

if len(sys.argv) < 2:
    print("Give file names to process, with wildcards")
else:
    FileList = sys.argv[1:]
    FileNum = 0
    outFilename = "myoutput.dat"
    NameList = []   # field names, in the order they appear in the first file
    ValueDict = {}  # field name -> list of values, one per file
    for InfileName in FileList:
        with open(InfileName, 'r') as Infile:
            for Line in Infile:
                Line = Line.strip()
                if not Line:
                    continue  # skip blank lines
                Name, Value = Line.split(":", 1)
                Name = Name.strip()  # strip once, so the keys match NameList below
                if FileNum == 0:
                    NameList.append(Name)
                ValueDict[Name] = ValueDict.get(Name, []) + [Value.strip()]
        FileNum += 1  # the last statement in the file loop
    # print(NameList)
    # print(ValueDict)

    with open(outFilename, 'w') as out_file:
        for N in NameList:
            OutString = "{},{}\n".format(N, ",".join(ValueDict.get(N, [])))
            out_file.write(OutString)

Output for my four fake files was:

main cost,10,10,40,10
additional cost,25.6,25.6,55.6,25.6

3 Comments

Thanks @beroe, but I want the output saved in a CSV file, with each | representing a different column.
This is what I get when I try the above code: TypeError: can only join an iterable
Insert a line that prints ValueDict and see what it says. Each value should be a list of strings (numbers) if the data match your example. If there are blank lines or header lines, you could insert a check in the loop before the ValueDict[Name]= part...
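Putting the pieces together, here is a sketch that takes the file names as command-line arguments, skips blank lines and lines without a colon (the kind of check suggested in the comment above), and writes real CSV with the `csv` module so each file becomes its own column. It assumes every file lists the same fields in the same format; the function names `parse_fields` and `to_rows` and the script name `script.py` are hypothetical.

```python
import csv
import sys
from collections import OrderedDict

def parse_fields(lines):
    """Map field name -> value for the 'name: value' lines of one file."""
    fields = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that isn't 'name: value'
        name, value = line.split(":", 1)
        fields[name.strip()] = value.strip()
    return fields

def to_rows(per_file):
    """One CSV row per field name: [name, value from file 1, value from file 2, ...]."""
    if not per_file:
        return []
    names = list(per_file[0])  # field order taken from the first file
    # A field missing from some file becomes an empty cell instead of a crash.
    return [[name] + [fields.get(name, "") for fields in per_file] for name in names]

def main(paths):
    if not paths:
        sys.stderr.write("usage: python script.py mytext0.text mytext1.text ...\n")
        return
    per_file = []
    for path in paths:
        with open(path) as f:
            per_file.append(parse_fields(f))
    csv.writer(sys.stdout).writerows(to_rows(per_file))

if __name__ == "__main__":
    main(sys.argv[1:])
```

On a Unix-like shell, `python script.py mytext*.text > output.csv` expands the wildcard and saves the result, so the number of files is simply however many names are passed.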
