1

I have the following code reading in a specific part of a text file. The problem is that these values are numbers, not strings, so I want to convert them to ints and read them into a list of some sort.

A sample of the data from the text file is shown below. It is not wholly representative, so I have uploaded the full set of data as a text file here: http://s000.tinyupload.com/?file_id=08754130146692169643

*NSET, NSET=Nodes_Pushed_Back_IB

99915527, 99915529, 99915530, 99915532, 99915533, 99915548, 99915549, 99915550, 99915551, 99915552, 99915553, 99915554, 99915555, 99915556, 99915557, 99915558, 99915562, 99915563, 99915564, 99915656, 99915657, 99915658, 99915659, 99915660, 99915661, 99915662, 99915663, 99915664, 99915665, 99915666, 99915667, 99915668, 99915669, 99915670, 99915885, 99915886, 99915887, 99915888, 99915889, 99915890, 99915891, 99915892, 99915893, 99915894, 99915895, 99915896, 99915897, 99915898, 99915899, 99915900, 99916042, 99916043, 99916044, 99916045, 99916046, 99916047, 99916048, 99916049, 99916050

*NSET, NSET=Nodes_Pushed_Back_OB

Any help would be much appreciated.

Hi, I am still stuck with this issue. Any more suggestions? The latest code and error message are below. Thanks!

 import tkinter as tk
 from tkinter import filedialog
 file_path = filedialog.askopenfilename()
 print(file_path)
 data =  []
 data2 = []
 data3 = []
 flag= False
 with open(file_path,'r') as f:
     for line in f:
         if line.strip().startswith('*NSET, NSET=Nodes_Pushed_Back_IB'):
             flag= True
         elif line.strip().endswith('*NSET, NSET=Nodes_Pushed_Back_OB'):
             flag= False    #loop stops when condition is false i.e if false do nothing
         elif flag:          # as long as flag is true append
             data.append([int(x) for x in line.strip().split(',')]) 

The result is the following error:

 ValueError: invalid literal for int() with base 10: ''

Instead of reading these as strings, I would like each to be a number in a list, i.e. [98932850 98932852 98932853 98932855 98932856 98932871 98932872 98932873]

2
  • From your code, does any line start with '*NSET, NSET=Nodes_Pushed_Back_IB'? We need to see an exact representation of the line. Then you need to split the valid lines. Commented Jul 27, 2019 at 16:54
  • Hi, thanks AAA. Yes, I have edited the sample data in my original question above to show where '*NSET, NSET=Nodes_Pushed_Back_IB' is. The data set is quite large, so I have not included all of it, but the above is representative. Should I then have: data.append(strip().split(","))? How do I then convert the values to int? Thanks. Commented Jul 28, 2019 at 6:05

3 Answers

1

In such cases I use regular expressions together with string methods. I would solve this problem like so:

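import re

# filepath is the path to the text file, e.g. the value returned by filedialog.askopenfilename()
with open(filepath) as f:
    txt = f.read()

# Capture everything after the IB header, stopping at the next *NSET header or the end of the file
g = re.search(r'NSET=Nodes_Pushed_Back_IB(.*?)(?=\*NSET|\Z)', txt, re.S)
snums = g.group(1).replace(',', ' ').split()
numbers = [int(num) for num in snums]

I read the entire text into txt. Next I use a regular expression with the last portion of your header as an anchor, and capture with capturing parentheses everything that follows, up to the next *NSET header or the end of the file (the re.S flag means that a dot also matches newlines). Without that stopping point, the capture would run on into the following header and int() would fail on '*NSET'. I access all the numbers as one unit of text via g.group(1).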

Next, I replace all the commas with spaces, because I then call split() on the resulting text. split() is an excellent function for items separated by whitespace: the amount of whitespace doesn't matter, it just splits the text as you would intend.

The rest is just converting the text to numbers using a list comprehension.
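Going one step further than the single-set search above, the same idea can collect every block in the file. This is only a sketch under assumptions: the helper name parse_node_sets is hypothetical, each header is assumed to look exactly like "*NSET, NSET=<name>" with nothing else on the line, and the sample text is abridged (the OB values are made up for illustration).

```python
import re

# Header, set name, then a lazy capture that stops at the next header or end of text.
NSET_BLOCK = re.compile(r'\*NSET,\s*NSET=([^,\s]+)(.*?)(?=\*NSET|\Z)', re.S)

def parse_node_sets(text):
    """Return a dict mapping each set name to its list of ints."""
    node_sets = {}
    for match in NSET_BLOCK.finditer(text):
        name, body = match.group(1), match.group(2)
        # Commas become spaces so that split() handles all the separators at once.
        node_sets[name] = [int(num) for num in body.replace(',', ' ').split()]
    return node_sets

# Abridged, partly invented sample in the same layout as the question's file.
sample = (
    "*NSET, NSET=Nodes_Pushed_Back_IB\n"
    "\n"
    "99915527, 99915529, 99915530,\n"
    "99915532, 99915533\n"
    "\n"
    "*NSET, NSET=Nodes_Pushed_Back_OB\n"
    "\n"
    "99916042, 99916043\n"
)

node_sets = parse_node_sets(sample)
print(node_sets["Nodes_Pushed_Back_IB"])
```

With a real file, the text would come from f.read() instead of the inline sample. If the headers carry extra parameters after the name, the pattern would need adjusting.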

2 Comments

Hi, thanks. I get the following error when trying this: ValueError: invalid literal for int() with base 10: '*NSET'
Make sure you copy the program exactly as it is. The regular expression gets rid of the header.
1

Your line contains more than one number, and some separating characters. You could parse that format by judicious application of split and perhaps strip, or you could minimize string handling by having re extract specifically the fields you care about:

ints = list(map(int, re.findall(r'-?\d+', line)))

This regular expression will find each group of digits, optionally prefixed by a minus sign, and then map will apply int to each such group found.
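As a small illustration of the one-liner above, here it is wrapped in a function and applied to a few representative lines. The function name ints_in_line and the sample lines are assumptions made for the example.

```python
import re

def ints_in_line(line):
    """Return every integer found in one line of text."""
    return list(map(int, re.findall(r'-?\d+', line)))

data_line = "99915527, 99915529, 99915530,\n"
header_line = "*NSET, NSET=Nodes_Pushed_Back_IB\n"

print(ints_in_line(data_line))    # trailing comma and newline are simply ignored
print(ints_in_line(header_line))  # no digits in this header, so the list is empty
print(ints_in_line("\n"))         # blank lines also give an empty list
```

Because findall only picks out digit groups, blank lines and trailing commas never reach int(). One caveat: a header whose set name contains digits would contribute stray numbers, so header lines are still best skipped explicitly.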

1

Using a sample of your string:

strings = '  98932850,  98932852,  98932853,  98932855,  98932856,  98932871,  98932872,  98932873,\n'

I'd just split the string, strip the commas, and return a list of numbers:

numbers = [ int(s.strip(',')) for s in strings.split() ]
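To show why splitting on commas alone runs into the ValueError from the question, here is a short sketch on a shortened copy of the sample line. The variable names are illustrative only.

```python
line = '  98932850,  98932852,  98932853,\n'

# Splitting on commas leaves a non-numeric last piece after the trailing comma:
# '\n' without strip(), '' with strip(). int() rejects both.
raw_pieces = line.split(',')
stripped_pieces = line.strip().split(',')

error_message = None
try:
    int(stripped_pieces[-1])
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Splitting on whitespace never yields empty pieces; strip(',') then drops the commas.
numbers = [int(s.strip(',')) for s in line.split()]
print(numbers)
```

The whitespace split relies on every comma being followed by a space or newline, as in the sample. For input such as '1,2' it would leave the two numbers glued together.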

Based on your comment, and considering the larger context of your code, I'd suggest the following:

from itertools import groupby

number_groups = []
with open('data.txt', 'r') as f:
    # groupby yields runs of consecutive header lines (k is True) and data lines (k is False)
    for k, g in groupby(f, key=lambda x: x.startswith('*NSET')):
        if not k:
            # keep only the non-blank data lines
            number_groups += [line for line in g if line.strip()]

data = []
for group in number_groups:
    for str_num in group.split(','):
        if str_num.strip():  # skip empty pieces left by trailing commas
            data.append(int(str_num))
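For comparison, a minimal sketch that keeps the flag-based loop from the question and only filters out the empty pieces that trigger the ValueError. The function name read_node_set and its parameters are hypothetical, and the sample lines are abridged, with made-up OB values.

```python
def read_node_set(lines, start_header, header_prefix='*NSET'):
    """Collect the integers listed between start_header and the next header line."""
    numbers = []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(header_prefix):
            # Any header either opens the wanted block or closes it.
            inside = stripped == start_header
        elif inside:
            # Skip empty pieces produced by blank lines and trailing commas.
            numbers.extend(int(p) for p in stripped.split(',') if p.strip())
    return numbers

# Abridged, partly invented sample in the same layout as the question's file.
sample_lines = [
    "*NSET, NSET=Nodes_Pushed_Back_IB\n",
    "\n",
    "99915527, 99915529, 99915530,\n",
    "99915532, 99915533\n",
    "\n",
    "*NSET, NSET=Nodes_Pushed_Back_OB\n",
    "\n",
    "99916042, 99916043\n",
]

ib_nodes = read_node_set(sample_lines, "*NSET, NSET=Nodes_Pushed_Back_IB")
print(ib_nodes)
```

An open file object can be passed in place of sample_lines, since the function only iterates over lines. Note that extend() produces one flat list, whereas the append() call in the question builds a list of per-line lists.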

7 Comments

Hi, I tried this at the end of my existing code but get the following error: AttributeError: 'list' object has no attribute 'split'
If you have a list of strings, then it is even simpler: just remove the split() function.
Thanks for the help. However, I am hoping to adapt the code I currently have in combination with your first suggestion. I have edited the code in my original question. I do not understand why this does not work.
@Jay My guess is that you have a list of strings, and you're trying to split the list instead of the string. I included two variables in my example, group and str_num, to highlight that issue.
Hi, thanks. However, I still cannot get this to work, even with your second suggestion. I have uploaded the full set of data as a text file in the original post, as I realized the formatting is slightly different. I tried to replace data.append(line.strip()) with data.append([int(x) for x in line.replace(" ", "").split(',')]). However, this still does not work. The split function seems to work, but it still raises ValueError: invalid literal for int() with base 10: '\n'. So I cannot seem to remove the whitespace, which I think comes in when I split.