I have a file containing three columns separated by varying numbers of spaces. How can I split each line into its three columns so that I can calculate the average of the middle column?

Example from the data file:

     0          41         216
    10          42         214
    20          43         215
    30          39         222
    40          34         222
    50          35         215
    60          42         218
    70          37         213
    80          41         216
    90          43         222
   100          33         220

My code

def main ():

    total = 0.0
    n = 0
    aveg = 0.0  

    try:
        inputfile = open("inputfile.txt", "r")
        for line in  inputfile:
            line = line.rstrip()
            if line[0] != '#' and line[0] != '@':
                line = line.strip()
                data = line.split(" ")
                print(data[1])
                bonds = data[1]
                float(bonds)
                total = total + bonds
                n = n + 1

        inputfile.close
    except OSError:
        print("OSError")
    aveg = total/n
        print("Average:", aveg)

main()
  • Can you use pandas? pandas.read_csv('inputfile.txt', sep='\s+') will take care of everything you want :) Commented Aug 1, 2019 at 8:47
  • Are you sure the separator is " " and not "\t"? Commented Aug 1, 2019 at 8:51
  • @alec_djinn the separator is not a tab. Commented Aug 1, 2019 at 8:56
  • @L0KiZ the data you posted doesn't look like it is separated by just one space. Please double-check it. Commented Aug 1, 2019 at 9:00
  • You just need to use line.split(); see my answer. Commented Aug 1, 2019 at 9:02

4 Answers

Some modules already do the job for you!

Have a look at numpy.loadtxt. It loads a text file and returns a NumPy array ready to use.

Here is an example:

# Import module
import numpy as np

# Load text
data = np.loadtxt("filename.txt")
print(data)
# [[  0.  41. 216.]
#  [ 10.  42. 214.]
#  [ 20.  43. 215.]
#  [ 30.  39. 222.]
#  [ 40.  34. 222.]
#  [ 50.  35. 215.]
#  [ 60.  42. 218.]
#  [ 70.  37. 213.]
#  [ 80.  41. 216.]
#  [ 90.  43. 222.]
#  [100.  33. 220.]]

Then you can easily compute the average of a column with np.mean():

print(np.mean(data[:, 1]))
# 39.09090909090909
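
The checks in the question's code suggest that the real file may also contain header lines starting with '#' or '@'. By default np.loadtxt only skips '#' lines, so a minimal sketch along these lines may help. It relies on the comments and usecols parameters of np.loadtxt; the in-memory sample text and the names sample, middle and average are made up for illustration.

```python
import io
import numpy as np

# Hypothetical sample: two header lines plus three data rows
sample = io.StringIO(
    "# comment line\n"
    "@ another header line\n"
    "     0          41         216\n"
    "    10          42         214\n"
    "    20          43         215\n"
)

# comments=("#", "@") skips both kinds of header lines;
# usecols=1 loads only the middle column
middle = np.loadtxt(sample, comments=("#", "@"), usecols=1)
average = middle.mean()
print(average)  # 42.0
```

Passing a real file name instead of the StringIO object should work the same way.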

Here is your code with some changes:

def main():
    total = 0.0
    n = 0
    aveg = 0.0

    try:
        inputfile = open("test", "r")
        for line in inputfile:
            line = line.strip()
            # Skip blank lines and comment lines starting with '#' or '@'
            if line and line[0] != '#' and line[0] != '@':
                # split() with no argument splits on any run of whitespace
                data = line.split()
                print(data)
                # Keep the converted value; float() does not modify its argument
                bonds = float(data[1])
                total = total + bonds
                n = n + 1

        inputfile.close()
    except OSError:
        print("OSError")
    aveg = total/n
    print("Average:", aveg)

main()

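Result: Average: 39.09090909090909 (for the sample data in the question). The main problem was that line.split(" ") returns a list full of empty strings, so data[1] is '' instead of the middle column. In addition, the result of float(bonds) was never assigned back to bonds, and inputfile.close was missing its parentheses. After split(" "), your code produced: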

['0', '', '', '', '', '', '', '', '', '', '41', '', '', '', '', '', '', '', '', '216']

['10', '', '', '', '', '', '', '', '', '', '42', '', '', '', '', '', '', '', '', '214']

['20', '', '', '', '', '', '', '', '', '', '43', '', '', '', '', '', '', '', '', '215']

['30', '', '', '', '', '', '', '', '', '', '39', '', '', '', '', '', '', '', '', '222']

['40', '', '', '', '', '', '', '', '', '', '34', '', '', '', '', '', '', '', '', '222']

['50', '', '', '', '', '', '', '', '', '', '35', '', '', '', '', '', '', '', '', '215']

['60', '', '', '', '', '', '', '', '', '', '42', '', '', '', '', '', '', '', '', '218']

['70', '', '', '', '', '', '', '', '', '', '37', '', '', '', '', '', '', '', '', '213']

['80', '', '', '', '', '', '', '', '', '', '41', '', '', '', '', '', '', '', '', '216']

['90', '', '', '', '', '', '', '', '', '', '43', '', '', '', '', '', '', '', '', '222']

['100', '', '', '', '', '', '', '', '', '', '33', '', '', '', '', '', '', '', '', '220']
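
The difference between the two calls can be checked on a single line. This is a small sketch; the variable names naive and fields are only for illustration.

```python
line = "    10          42         214"

# split(" ") cuts at every single space, so runs of spaces yield empty strings
naive = line.strip().split(" ")

# split() with no argument treats any run of whitespace as one separator
# and also ignores leading and trailing whitespace
fields = line.split()

print(repr(naive[1]))  # '' (an empty string, not the middle column)
print(fields)          # ['10', '42', '214']
```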


There are already some great answers using numpy and pandas, but if you want to process the data yourself, you could do it with a list comprehension.

Sample:

# line has a variable number of spaces as delimiters
line = '1   3     5'
# split line into a list by spaces
split_line = line.split(' ')
# filter out the empty strings left over from consecutive spaces:
# '' evaluates to False, so only elements with a value are kept
# also convert the numeric strings to integers
only_numbers = [int(num) for num in split_line if num]

Then you'll get

print(only_numbers)
[1, 3, 5]
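
To get from there to the average of the middle column, the same comprehension can be applied to every line. This is only a sketch: the lines list stands in for the contents of the file, and the names rows, middle and average are illustrative.

```python
# Hypothetical stand-in for the lines read from the data file
lines = [
    "     0          41         216",
    "    10          42         214",
    "    20          43         215",
]

# For each line, drop the empty strings produced by split(' ') and convert to int
rows = [[int(num) for num in line.split(' ') if num] for line in lines]

# Pick the middle value of every row and average them
middle = [row[1] for row in rows]
average = sum(middle) / len(middle)

print(rows)     # [[0, 41, 216], [10, 42, 214], [20, 43, 215]]
print(average)  # 42.0
```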


As suggested, you could use Pandas or NumPy to solve the task. However, if you really want to do it yourself in pure Python, without extra libraries, here is a readable and Pythonic version of your code. Also, simply use split() instead of split(" "), since more than one space separates the values in your file.

bonds = []
with open("inputfile.txt", "r") as inputfile:
    for line in inputfile:
        line = line.strip()
        if len(line) and line[0] not in '#@':
            data = line.split()
            bonds.append(float(data[1]))

avg = sum(bonds)/len(bonds)
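
If the file might contain no data lines at all, sum(bonds)/len(bonds) would raise ZeroDivisionError. One possible variation, sketched here with an in-memory stand-in for inputfile.txt, uses statistics.fmean from the standard library (Python 3.8+) and guards the empty case. The sample text and its '#' and '@' lines are invented for illustration.

```python
import io
from statistics import fmean

# Hypothetical stand-in for inputfile.txt
sample = io.StringIO(
    "# header\n"
    "\n"
    "    30          39         222\n"
    "    40          34         222\n"
    "@ legend\n"
    "    50          35         215\n"
)

bonds = []
for line in sample:
    line = line.strip()
    # Skip blank lines and lines starting with '#' or '@'
    if line and line[0] not in '#@':
        bonds.append(float(line.split()[1]))

# fmean raises StatisticsError on an empty list, so guard the no-data case
avg = fmean(bonds) if bonds else None
print(avg)  # 36.0
```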
