How would I split these data into two separate lists to plot in Python?

Question

I have a big text file with data from a spectroscopy measurement.

The first few lines are like these:

397.451 -48.38

397.585 -48.38

397.719 -48.38

397.853 -18.38

397.987 -3.38

398.121 6.62

398.256 -0.38

398.39  -1.38

398.524 7.62

398.658 4.62

398.792 -4.38

398.926 12.62

399.06  5.62

399.194 -6.38

399.328 -6.38

399.463 0.6

399.597 -6.38

399.731 -12.38

399.865 1.62

399.999 2.62

What I would like to do is create two lists, where one contains e.g. [397.451, 397.585, 397.719, ...]

and the other contains [-48.38, -48.38, -48.38, -18.38, -3.38, ...]

Use split() on each line, then append split()[0] to one new list (list1) and split()[1] to another (list2).
– pippo1980, Commented Feb 25, 2021 at 18:12
OK, first you need to read the file line by line and append the values of each line to a list.
– pippo1980, Commented Feb 25, 2021 at 18:14
Does this answer your question? Reading specific columns from a text file in python
– Kraigolas, Commented Feb 25, 2021 at 18:37
I think pandas read_csv is the way to go for this. It'll give you a DataFrame.
– Kraigolas, Commented Feb 25, 2021 at 18:39

Rishabh Kumar · Accepted Answer · 2021-02-25 18:36:38Z

1

Sticking to the basics:

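list1 = []
list2 = []
# The with block closes the file even if an error occurs.
with open("big_text_file.txt") as fil:
    text = fil.readline()
    while text:
        try:
            nums = text.split()
            first, second = float(nums[0]), float(nums[1])
        except (IndexError, ValueError):
            pass  # blank or malformed line: skip it
        else:
            list1.append(first)
            list2.append(second)
        text = fil.readline()

print(list1)
print(list2)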

Explanation:

Create two lists.
As you said, it is a big text file, so read it line by line.
Split each line on whitespace (with no argument, split() splits on any run of whitespace, so one space or two both work).
If that fails, the line is empty or malformed (that's what try and except are for).
Update the two lists (if there was no error).
Read the next line.
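A variation on the steps above, as a minimal sketch rather than the answerer's code: the helper name `parse_two_columns` is hypothetical, and it accepts any iterable of lines (an open file works) so it can be tried without a file on disk. It iterates over the lines one at a time instead of calling readline() explicitly, and relies on `str.split()` with no argument, so the lines with two spaces between the values parse the same way as the others.

```python
import io


def parse_two_columns(lines):
    """Collect the first two whitespace-separated values of each line as floats.

    `parse_two_columns` is a hypothetical helper name, not part of any library.
    Lines that are blank or do not hold two numbers are skipped.
    """
    first_col, second_col = [], []
    for line in lines:  # iterating a file object yields one line at a time
        parts = line.split()  # no argument: split on any run of whitespace
        try:
            a, b = float(parts[0]), float(parts[1])
        except (IndexError, ValueError):
            continue  # blank or malformed line
        first_col.append(a)
        second_col.append(b)
    return first_col, second_col


# Sample lines from the question, including a blank line and a double space.
sample = io.StringIO("397.451 -48.38\n\n398.39  -1.38\n398.524 7.62\n")
list1, list2 = parse_two_columns(sample)
print(list1)
print(list2)
```

With a real file, the same function could be called as `parse_two_columns(open_file)` inside a `with open(...)` block; only one line is held in memory at a time, apart from the two result lists.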

Output:

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]

edited Feb 25, 2021 at 18:36

answered Feb 25, 2021 at 18:20

Rishabh Kumar


6 Comments

Johand Over a year ago

This did the job perfectly, thank you so much!

pippo1980 Over a year ago

@Rishabh Kumar Is this faster for very big files: try: nums = text.split() except: pass?

Rishabh Kumar Over a year ago

It may not be the fastest way to do it, but it's memory efficient. The OP said it's a very big text file. Let's assume it is several GB and the system has 4 GB of memory (pretty common); then loading everything at once could pose a problem. If you have enough memory, there are other options too, such as loading the entire text file into memory with readlines(), which could be faster.

pippo1980 Over a year ago

@Rishabh Kumar I am trying to measure the time needed by the different options, using begin0 = datetime.now() , time.process_time(), time.perf_counter() at the start and then, after the script, print('time 0 :' , datetime.now() - begin0[0], ' process_time : ', time.process_time() - begin0[1] , ' perf_counter : ', time.perf_counter() - begin0[2],'\n\n')

pippo1980 Over a year ago

But with the example file I am getting different results (i.e. the fastest isn't always the same script). How big should the initial file be to see consistent results? Am I missing the right way to evaluate the speed of a script? Sorry to bother you, but I am under a question ban. (PS: I voted for your answer.)
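On the timing question in these comments: a single start/stop measurement on a small file can be dominated by noise, which would be consistent with the fastest script changing between runs. One common approach is the standard library's `timeit` module, running each candidate several times and comparing the best result. The following is a minimal sketch, assuming the lines are already in memory so that disk caching does not affect the comparison; the function names and the synthetic data are made up for illustration.

```python
import timeit

# Synthetic stand-in for the file contents (hypothetical data, not the real file).
LINES = [f"{397 + i * 0.134:.3f} {i % 50 - 25:.2f}\n" for i in range(5_000)]


def with_loop(lines):
    """Append to two lists inside an explicit loop."""
    a, b = [], []
    for line in lines:
        parts = line.split()
        a.append(float(parts[0]))
        b.append(float(parts[1]))
    return a, b


def with_zip(lines):
    """Parse every row first, then transpose the rows with zip."""
    rows = [[float(n) for n in line.split()] for line in lines]
    a, b = (list(col) for col in zip(*rows))
    return a, b


for func in (with_loop, with_zip):
    # repeat() returns one total time per run of `number` calls;
    # the minimum is usually the least disturbed by other processes.
    times = timeit.repeat(lambda: func(LINES), number=5, repeat=5)
    print(f"{func.__name__}: best of 5 runs = {min(times) / 5:.4f} s per call")
```

The absolute numbers depend on the machine, so only the comparison between candidates on the same run is meaningful.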


0K9S · Accepted Answer · 2021-02-25 18:43:32Z

0

Use the csv library: https://docs.python.org/3/library/csv.html

Solution:

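import csv

column_A = []
column_B = []
with open("spectroscopy.txt", newline="") as csvfile:
    # skipinitialspace handles lines with two spaces between the values.
    reader = csv.reader(csvfile, delimiter=" ", skipinitialspace=True)
    for row in reader:
        try:
            a, b = float(row[0]), float(row[1])
        except (IndexError, ValueError):
            continue  # blank or malformed row
        column_A.append(a)
        column_B.append(b)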

Alternative with pandas:

import pandas as pd

# sep=r"\s+" accepts one or more whitespace characters between the values.
data = pd.read_csv("spectroscopy.txt", sep=r"\s+", header=None, index_col=0)

edited Feb 25, 2021 at 18:43

answered Feb 25, 2021 at 18:17

0K9S


1 Comment

Johand Over a year ago

Thank you for the help!

pippo1980 · Accepted Answer · 2021-02-25 18:50:29Z

0

spect_list = []
spect_list_a = []
spect_list_b = []

with open('spect.txt') as f:
    for i in f.readlines():          # read the entire file as a list of lines
        i = i.rstrip('\n')           # remove the newline character
        if i:                        # discard blank lines
            spect_list.append(i)
            spect_list_a.append(i.split()[0])
            spect_list_b.append(i.split()[1])

print(spect_list)
print(spect_list_a)
print(spect_list_b)

This gives Python lists whose elements are strings (printed with quotes), which is probably not what you want.

Got it: to get numbers instead, use

spect_list_a.append(float(i.split()[0]))
spect_list_b.append(float(i.split()[1]))

edited Feb 25, 2021 at 18:50

answered Feb 25, 2021 at 18:43

pippo1980


Mark Tolonen · Accepted Answer · 2021-02-25 19:47:28Z

Using a transposition trick and a parameter to auto-convert the columns to float. Also, skipinitialspace handles a couple of lines with two spaces between the values.

import csv

# The quoting value auto-converts numeric columns to float.
with open('input.csv',newline='') as f:
    r = csv.reader(f,delimiter=' ',quoting=csv.QUOTE_NONNUMERIC,skipinitialspace=True)
    data = list(r)

# transpose row/col data and convert to list (otherwise, it would be tuple)
col1,col2 = [list(col) for col in zip(*data)]
print(col1)
print(col2)

[397.451, 397.585, 397.719, 397.853, 397.987, 398.121, 398.256, 398.39, 398.524, 398.658, 398.792, 398.926, 399.06, 399.194, 399.328, 399.463, 399.597, 399.731, 399.865, 399.999]
[-48.38, -48.38, -48.38, -18.38, -3.38, 6.62, -0.38, -1.38, 7.62, 4.62, -4.38, 12.62, 5.62, -6.38, -6.38, 0.62, -6.38, -12.38, 1.62, 2.62]

Using pandas:

import pandas as pd
data = pd.read_csv('input.csv',sep=' ',skipinitialspace=True,header=None)
col1 = list(data[0])
col2 = list(data[1])
print(col1)
print(col2)

Using no imports:

with open('input.csv') as f:
    data = [[float(n) for n in row.split()] for row in f]
col1,col2 = [list(n) for n in zip(*data)]
print(col1)
print(col2)
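The effect of skipinitialspace and the quoting value described in this answer can be seen on a single line from the question. This is a small illustrative sketch, not part of the original answer; it feeds `csv.reader` a one-element list instead of a file.

```python
import csv

line = ["398.39  -1.38"]  # two spaces between the values, as in the question

# A plain single-space delimiter sees an empty field between the two spaces.
plain = next(csv.reader(line, delimiter=" "))
print(plain)  # ['398.39', '', '-1.38']

# skipinitialspace ignores spaces that directly follow a delimiter.
skipped = next(csv.reader(line, delimiter=" ", skipinitialspace=True))
print(skipped)  # ['398.39', '-1.38']

# Adding QUOTE_NONNUMERIC converts every unquoted field to float.
numeric = next(csv.reader(line, delimiter=" ", skipinitialspace=True,
                          quoting=csv.QUOTE_NONNUMERIC))
print(numeric)  # [398.39, -1.38]
```

The empty field in the first result is why a plain `delimiter=" "` reader needs extra handling for the lines with two spaces.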
