
I have a big text file with data from a spectroscopy measurement.

The first few lines look like this:

397.451 -48.38

397.585 -48.38

397.719 -48.38

397.853 -18.38

397.987 -3.38

398.121 6.62

398.256 -0.38

398.39  -1.38

398.524 7.62

398.658 4.62

398.792 -4.38

398.926 12.62

399.06  5.62

399.194 -6.38

399.328 -6.38

399.463 0.62

399.597 -6.38

399.731 -12.38

399.865 1.62

399.999 2.62

What I would like to do is create two lists, one containing e.g. [397.451, 397.585, 397.719, ...]

and the other containing [-48.38, -48.38, -48.38, -18.38, -3.38, ...].

  • Use split(): for each line i, append i.split()[0] to a new list1 and i.split()[1] to a new list2 Commented Feb 25, 2021 at 18:12
  • OK, first you need to read the file line by line and append the values of each line to a list Commented Feb 25, 2021 at 18:14
  • Does this answer your question? Reading specific columns from a text file in python Commented Feb 25, 2021 at 18:37
  • I think pandas read_csv is the way to go for this. It'll give you a dataframe. Commented Feb 25, 2021 at 18:39

4 Answers


Sticking to the basics:

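list1 = []
list2 = []
with open("big_text_file.txt") as fil:  # closes the file automatically
    text = fil.readline()
    while text:
        try:
            nums = text.split()
            # parse both values first so the two lists never get out of step
            x, y = float(nums[0]), float(nums[1])
        except (IndexError, ValueError):
            pass  # empty or malformed line: skip it
        else:
            list1.append(x)
            list2.append(y)
        text = fil.readline()

print(list1)
print(list2)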

Explanation:

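  • Create two empty lists.
  • Since it is a big text file, read it one line at a time instead of loading it all at once.
  • Split each line on whitespace (with no argument, split() splits on any run of whitespace, not just a single space).
  • If splitting or converting to float fails, the line is empty or malformed, so skip it (that's what the try/except is for).
  • Otherwise, append the two values to the two lists.
  • Read the next line.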
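To see why the bare split() is the right call here, a quick check on sample lines copied from the question (a sketch; the strings are typed in by hand rather than read from the file):

```python
# str.split() with no argument splits on any run of whitespace and drops
# leading/trailing whitespace, so a double space does not create an empty field.
print("398.39  -1.38\n".split())   # ['398.39', '-1.38']
print("397.451 -48.38\n".split())  # ['397.451', '-48.38']

# split(" ") splits on every single space, which does create an empty field.
print("398.39  -1.38".split(" "))  # ['398.39', '', '-1.38']

# A blank line splits into an empty list, so indexing it raises IndexError,
# which is what the try/except in the loop above catches.
blank = "\n".split()
print(blank)  # []
try:
    blank[0]
except IndexError as exc:
    print("blank line skipped:", exc)  # blank line skipped: list index out of range
```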

Output:

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]

6 Comments

This did the job perfectly, thank you so much!
@Rishabh Kumar is this approach (wrapping nums = text.split() in try/except and passing on errors) also the fastest one for very big files?
It may not be the fastest way to do it, but it's memory efficient: only one line of raw text is held in memory at a time (the two result lists still grow with the file, of course). The OP said it's a very big text file; if it's several GB and the system has 4 GB of RAM (pretty common), reading it all at once could be a problem. If you have enough memory, there are other options, such as reading the whole file into memory with readlines(), which could be faster.
@Rishabh Kumar I am trying to measure the time taken by the different options using begin0 = (datetime.now(), time.process_time(), time.perf_counter()) before the script and then, after it, print('time 0 :', datetime.now() - begin0[0], ' process_time : ', time.process_time() - begin0[1], ' perf_counter : ', time.perf_counter() - begin0[2], '\n\n')
but with the example file I get inconsistent results (i.e. the fastest script isn't always the same one). How big should the input file be to get consistent results? Or am I measuring the speed of a script the wrong way? Sorry to bother you, but I'm under a question ban. (PS: I upvoted your answer.)
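On the timing question: a single before/after measurement on a tiny file is dominated by noise, so one common approach is to run each candidate several times with the standard timeit module on a larger input and compare the best run. The sketch below is only an illustration under stated assumptions: the file name bench_spectro.txt, the 100,000-row synthetic data, and the two parser functions are hypothetical stand-ins modeled on answers on this page, and the printed timings will differ from machine to machine.

```python
import os
import tempfile
import timeit

# Hypothetical benchmark input: 100,000 synthetic "x y" rows in a temp file.
N_ROWS = 100_000
path = os.path.join(tempfile.gettempdir(), "bench_spectro.txt")
with open(path, "w") as f:
    for k in range(N_ROWS):
        f.write(f"{397.451 + 0.134 * k:.3f} {k % 50 - 25.38:.2f}\n")


def parse_readline(p):
    """Line-by-line readline() loop, modeled on the first answer."""
    xs, ys = [], []
    with open(p) as fh:
        line = fh.readline()
        while line:
            parts = line.split()
            if len(parts) >= 2:
                xs.append(float(parts[0]))
                ys.append(float(parts[1]))
            line = fh.readline()
    return xs, ys


def parse_zip(p):
    """Parse every non-blank line, then transpose with zip(*rows)."""
    with open(p) as fh:
        rows = [[float(n) for n in line.split()] for line in fh if line.strip()]
    xs, ys = (list(col) for col in zip(*rows))
    return xs, ys


# Both parsers should agree before their speed is compared.
assert parse_readline(path) == parse_zip(path)

for func in (parse_readline, parse_zip):
    # 5 runs of 1 call each; the minimum is the least noisy estimate.
    times = timeit.repeat(lambda: func(path), repeat=5, number=1)
    print(f"{func.__name__}: best of 5 = {min(times):.3f} s")

os.remove(path)
```

Taking the minimum rather than the mean is a deliberate choice: slower runs usually reflect interference from other processes, not the code itself.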

Use the csv library: https://docs.python.org/3/library/csv.html

Solution:

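import csv

column_A = []
column_B = []
with open("spectroscopy.txt", newline="") as csvfile:
    # skipinitialspace=True makes a double space act like a single delimiter
    reader = csv.reader(csvfile, delimiter=" ", skipinitialspace=True)
    for row in reader:
        try:
            a, b = float(row[0]), float(row[1])
        except (IndexError, ValueError):
            continue  # blank row (IndexError) or non-numeric field (ValueError)
        column_A.append(a)
        column_B.append(b)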
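Because a couple of lines in the question have two spaces between the values (e.g. 398.39  -1.38), it matters how csv.reader treats repeated delimiters. A small sketch, using io.StringIO in place of the real file and two sample lines from the question:

```python
import csv
import io

sample = "398.256 -0.38\n398.39  -1.38\n"

# With delimiter=" " alone, every space is a delimiter, so the double space
# yields an empty middle field and float(row[1]) would fail on "".
plain = list(csv.reader(io.StringIO(sample), delimiter=" "))
print(plain)  # [['398.256', '-0.38'], ['398.39', '', '-1.38']]

# skipinitialspace=True ignores spaces right after a delimiter,
# so consecutive spaces behave like a single separator.
fixed = list(csv.reader(io.StringIO(sample), delimiter=" ", skipinitialspace=True))
print(fixed)  # [['398.256', '-0.38'], ['398.39', '-1.38']]
```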

Alternative with pandas:

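import pandas as pd

# sep=r"\s+" splits on any run of whitespace; blank lines are skipped by default
data = pd.read_csv("spectroscopy.txt", sep=r"\s+", header=None)
column_A = data[0].tolist()
column_B = data[1].tolist()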

1 Comment

Thank you for the help!
spect_list = []
spect_list_a = []
spect_list_b = []

with open('spect.txt') as f:
    for i in f.readlines():         # read the entire file as a list of lines
        i = i.rstrip('\n')          # remove the newline character
        if i:                       # discard blank lines
            spect_list.append(i)
            spect_list_a.append(i.split()[0])
            spect_list_b.append(i.split()[1])

print(spect_list)
print(spect_list_a)
print(spect_list_b)

This gives lists whose elements are strings (printed with quotes, e.g. '397.451'), which may not be what you want.

To get floats instead, use:

spect_list_a.append(float(i.split()[0]))
spect_list_b.append(float(i.split()[1]))
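If the string lists have already been built, they can also be converted afterwards with a list comprehension. A minimal sketch, with a few sample values from the question standing in for the full spect_list_a and spect_list_b:

```python
# Sample string lists, as produced by the readlines() loop without float().
spect_list_a = ['397.451', '397.585', '397.719']
spect_list_b = ['-48.38', '-48.38', '-48.38']

# Convert every element from str to float, one pass per list.
spect_list_a = [float(s) for s in spect_list_a]
spect_list_b = [float(s) for s in spect_list_b]

print(spect_list_a)  # [397.451, 397.585, 397.719]
print(spect_list_b)  # [-48.38, -48.38, -48.38]
```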



This uses a transposition trick (zip(*data)) plus the quoting=csv.QUOTE_NONNUMERIC parameter, which makes the reader convert every unquoted field to float. skipinitialspace=True handles the couple of lines that have two spaces between the values.

import csv

# QUOTE_NONNUMERIC makes the reader convert every unquoted field to float.
with open('input.csv', newline='') as f:
    r = csv.reader(f, delimiter=' ', quoting=csv.QUOTE_NONNUMERIC, skipinitialspace=True)
    data = [row for row in r if row]  # drop the empty rows produced by blank lines

# transpose row/col data and convert to lists (otherwise they would be tuples)
col1, col2 = [list(col) for col in zip(*data)]
print(col1)
print(col2)

Output:

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]
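The transposition trick works because zip(*data) passes each row as a separate argument to zip, which then groups the first items together, the second items together, and so on. A small sketch with three rows from the question; it also shows why empty rows (e.g. from blank lines) have to be filtered out first, since zip stops at its shortest input:

```python
rows = [[397.451, -48.38], [397.585, -48.38], [397.719, -48.38]]

# zip(*rows) is zip(rows[0], rows[1], rows[2]): it groups items by position.
col1, col2 = [list(col) for col in zip(*rows)]
print(col1)  # [397.451, 397.585, 397.719]
print(col2)  # [-48.38, -48.38, -48.38]

# zip stops at the shortest input, so a single empty row (a blank line read
# by csv.reader) makes the transposed result empty and the unpacking fails.
rows_with_blank = rows + [[]]
print(list(zip(*rows_with_blank)))  # []
try:
    col1, col2 = [list(col) for col in zip(*rows_with_blank)]
except ValueError as exc:
    print("unpacking failed:", exc)

# Filtering out empty rows first restores the expected result.
col1, col2 = [list(col) for col in zip(*(r for r in rows_with_blank if r))]
print(col1)  # [397.451, 397.585, 397.719]
```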

Using pandas:

import pandas as pd
data = pd.read_csv('input.csv',sep=' ',skipinitialspace=True,header=None)
col1 = list(data[0])
col2 = list(data[1])
print(col1)
print(col2)

Using no imports:

with open('input.csv') as f:
    # skip blank lines; split() with no argument splits on any run of whitespace
    data = [[float(n) for n in row.split()] for row in f if row.strip()]
col1,col2 = [list(n) for n in zip(*data)]
print(col1)
print(col2)

