Problem with splitting a string from a file

Question

I have a file containing information in three columns that are separated by varying numbers of spaces. How can I split each line into its three columns, so that I can calculate the average of the middle column?

Example from the data file:

     0          41         216
    10          42         214
    20          43         215
    30          39         222
    40          34         222
    50          35         215
    60          42         218
    70          37         213
    80          41         216
    90          43         222
   100          33         220

My code

def main ():

    total = 0.0
    n = 0
    aveg = 0.0  

    try:
        inputfile = open("inputfile.txt", "r")
        for line in  inputfile:
            line = line.rstrip()
            if line[0] != '#' and line[0] != '@':
                line = line.strip()
                data = line.split(" ")
                print(data[1])
                bonds = data[1]
                float(bonds)
                total = total + bonds
                n = n + 1

        inputfile.close
    except OSError:
        print("OSError")
    aveg = total/n
        print("Average:", aveg)

main()

Can you use pandas? pandas.read_csv('inputfile.txt', sep='\s+') will take care of everything you want :)
– Chris, Commented Aug 1, 2019 at 8:47
@L0KiZ The data you posted doesn't look like it is separated by just one blank space. Please double-check it.
– alec_djinn, Commented Aug 1, 2019 at 9:00

Alexandre B. · Accepted Answer · 2019-08-01 08:50:21Z

1

Some modules already do the job for you!

Have a look at numpy.loadtxt. It loads a text file and returns a NumPy array ready to use.

Here is an example:

# Import module
import numpy as np

# Load text
data = np.loadtxt("filename.txt")
print(data)
# [[  0.  41. 216.]
#  [ 10.  42. 214.]
#  [ 20.  43. 215.]
#  [ 30.  39. 222.]
#  [ 40.  34. 222.]
#  [ 50.  35. 215.]
#  [ 60.  42. 218.]
#  [ 70.  37. 213.]
#  [ 80.  41. 216.]
#  [ 90.  43. 222.]
#  [100.  33. 220.]]

Then you can easily compute the average of a column with np.mean():

print(np.mean(data[:, 1]))
# 39.09090909090909
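The question's code skips lines that start with '#' or '@', which suggests the real file may contain such header lines. np.loadtxt can ignore them through its comments parameter, and usecols restricts loading to one column. A minimal sketch, assuming such header lines exist; the in-memory sample below is made up for illustration and stands in for the real file:

```python
import io
import numpy as np

# Hypothetical sample: two header lines followed by the first rows of the question's data.
sample = io.StringIO(
    "# comment line\n"
    "@ another header line\n"
    "     0          41         216\n"
    "    10          42         214\n"
    "    20          43         215\n"
)

# comments lists the prefixes that mark lines to ignore;
# usecols=1 loads only the middle column as a 1-D array.
middle = np.loadtxt(sample, comments=["#", "@"], usecols=1)
average = middle.mean()
print(average)
# 42.0
```

With a real file, the file name can be passed in place of the StringIO object.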

answered Aug 1, 2019 at 8:50

LinPy · Accepted Answer · 2019-08-01 09:11:28Z

Here is your code with some changes:

def main():
    total = 0.0
    n = 0

    try:
        inputfile = open("test", "r")
        for line in inputfile:
            line = line.strip()
            # Skip blank lines and lines starting with '#' or '@'
            if line and line[0] != '#' and line[0] != '@':
                # split() without an argument splits on any run of whitespace
                data = line.split()
                print(data)
                # float() returns a new value, so assign the result
                bonds = float(data[1])
                total = total + bonds
                n = n + 1

        inputfile.close()
    except OSError:
        print("OSError")
        return

    if n > 0:
        aveg = total / n
        print("Average:", aveg)

main()

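Result: Average: 38.18181818181818

The problem was that split(" ") cuts at every single space, so data[1] was an empty string rather than the second column. In addition, calling float(bonds) without assigning the result leaves bonds unchanged. Your code returned this after the split: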

['0', '', '', '', '', '', '', '', '', '', '41', '', '', '', '', '', '', '', '', '216']

['10', '', '', '', '', '', '', '', '', '', '42', '', '', '', '', '', '', '', '', '214']

['20', '', '', '', '', '', '', '', '', '', '43', '', '', '', '', '', '', '', '', '215']

['30', '', '', '', '', '', '', '', '', '', '39', '', '', '', '', '', '', '', '', '222']

['40', '', '', '', '', '', '', '', '', '', '34', '', '', '', '', '', '', '', '', '222']

['50', '', '', '', '', '', '', '', '', '', '35', '', '', '', '', '', '', '', '', '215']

['60', '', '', '', '', '', '', '', '', '', '42', '', '', '', '', '', '', '', '', '218']

['70', '', '', '', '', '', '', '', '', '', '37', '', '', '', '', '', '', '', '', '213']

['80', '', '', '', '', '', '', '', '', '', '41', '', '', '', '', '', '', '', '', '216']

['90', '', '', '', '', '', '', '', '', '', '43', '', '', '', '', '', '', '', '', '222']

['100', '', '', '', '', '', '', '', '', '', '33', '', '', '', '', '', '', '', '', '220']
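To see the difference on a single line, here is a minimal sketch that compares the two calls. The variable names are illustrative and not taken from the original code:

```python
line = "    10          42         214"

# split(" ") cuts at every single space, so runs of spaces yield empty strings.
by_space = line.strip().split(" ")
# split() with no argument treats any run of whitespace as one separator
# and ignores leading and trailing whitespace.
by_whitespace = line.split()

print(repr(by_space[1]))   # '' (an empty string, not the second column)
print(by_whitespace)       # ['10', '42', '214']
```

This is why float(data[1]) fails after split(" ") but works after split().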

Teemu · Accepted Answer · 2019-08-01 08:54:45Z

0

There are already some great answers using numpy and pandas, but if you want to process the data yourself, you could do it with a list comprehension.

Sample:

# line has a variable number of spaces as delimiters
line = '1   3     5'
# split line into a list by spaces
split_line = line.split(' ')
# filter out the empty strings produced by consecutive spaces:
# '' evaluates to False, so `if num` keeps only elements that have a value;
# also convert the number strings to integers
only_numbers = [int(num) for num in split_line if num]

Then you'll get

print(only_numbers)
[1, 3, 5]

answered Aug 1, 2019 at 8:54

alec_djinn · Accepted Answer · 2019-08-01 09:10:28Z

0

As suggested, you could use pandas or NumPy to solve the task. However, if you really want to do it yourself in pure Python, without extra libraries, here is a readable and Pythonic version of your code. Also, simply use split() instead of split(" "), since more than one space separates the values in your file.

bonds = []
with open("inputfile.txt", "r") as inputfile:
    for line in inputfile:
        line = line.strip()
        if len(line) and line[0] not in '#@':
            data = line.split()
            bonds.append(float(data[1]))

avg = sum(bonds)/len(bonds)
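The same loop can be wrapped in a small helper that also avoids a division by zero when the input has no data rows. This is only a sketch: the name average_column and its parameters are hypothetical, and the sample lines are made up to stand in for the file:

```python
from statistics import mean

def average_column(lines, column=1, comment_chars="#@"):
    """Return the mean of one whitespace-separated column, or None if there are no data rows."""
    values = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines starting with a comment character.
        if not line or line[0] in comment_chars:
            continue
        values.append(float(line.split()[column]))
    return mean(values) if values else None

sample_lines = [
    "# header",
    "     0          41         216",
    "    10          42         214",
    "",
    "    20          43         215",
]
print(average_column(sample_lines))  # 42.0
```

Because an open file yields its lines when iterated, a file object can be passed directly, for example average_column(inputfile) inside the with block.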

edited Aug 1, 2019 at 9:10

answered Aug 1, 2019 at 8:57
