read text file into matrix - python

Question

I have a text file which contains m rows like the following:

0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1
0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0
0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0
0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0

I tried to read the file and copy it into a matrix like in the following code:

import string

file = open("datasets/train.txt",encoding='utf8')

for line in file.readlines():
    tmp = line.strip()
    tmp = tmp.split(",")
    idx = np.vstack(tmp)
    idy = np.hstack(tmp[12])

matrix = idx

I want to read the file as is into the matrix. For my sample data, the matrix shape should be (4,6) and idy should be (4,1) (the last column, the labels).

But it only kept the last line of the file, stacked vertically, like this:

0.4458490073,

0.1175445834,

0.2307692308,

0.6212121212,

0.0089169801,

0.0200000000,

0

any help?

The problem with your original code is that idx = np.vstack(tmp) doesn't concatenate tmp vertically onto an existing array idx; it just turns tmp into a vertical array, then replaces idx with that. You could fix your code by using idx = [] before the loop, then idx.append(tmp) inside the loop, then matrix = np.array(idx) after the loop completes. Then use @jp_data_analysis's technique to split the matrix into id and data parts.
– Matthias Fripp, Commented Feb 11, 2018 at 2:05
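The fix described in the comment above could be sketched roughly as follows. This is only an illustration: `io.StringIO` stands in for the real file so the snippet runs on its own, the name `rows` is an assumption (the comment reuses `idx` for the list), and the conversion to `float` is an addition the comment does not mention.

```python
import io
import numpy as np

# Stand-in for open("datasets/train.txt"); the rows are copied from the question.
sample = io.StringIO(
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n"
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n"
    "0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0\n"
    "0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0\n"
)

rows = []  # collect one list of floats per line instead of overwriting an array
for line in sample:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    rows.append([float(value) for value in line.split(",")])

matrix = np.array(rows)  # build the 2-D array once, after the loop
idx = matrix[:, :-1]     # every column except the last (the features)
idy = matrix[:, -1:]     # the last column (the labels), kept 2-D

print(idx.shape, idy.shape)
```

Appending to a Python list and converting once at the end avoids rebuilding the array on every iteration, which is what repeated `np.vstack` calls inside the loop would do.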

jpp · Accepted Answer · 2018-02-11 16:50:08Z

3

Since you are using numpy, this functionality is already available:

import numpy as np

arr = np.genfromtxt('file.csv', delimiter=',')

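You can then separate the data from the labels (the last column) as follows:

data = arr[:, :-1]    # all columns except the last
header = arr[:, -1:]  # last column (the labels), kept 2-D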
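As a quick check of the shapes this produces, here is a small self-contained sketch using the sample rows from the question. `io.StringIO` replaces the file on disk, and the names `sample`, `labels_2d` and `labels_1d` are assumptions made for this illustration only.

```python
import io
import numpy as np

# Stand-in for the file on disk; the rows are copied from the question.
sample = io.StringIO(
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n"
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n"
    "0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0\n"
    "0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0\n"
)

arr = np.genfromtxt(sample, delimiter=",")  # all 7 columns, parsed as floats

data = arr[:, :-1]       # features
labels_2d = arr[:, -1:]  # slicing with -1: keeps a column vector
labels_1d = arr[:, -1]   # indexing with -1 gives a flat vector instead

print(data.shape, labels_2d.shape, labels_1d.shape)
```

Whether the labels should be a column vector or a flat vector depends on what consumes them, so both forms are shown.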

edited Feb 11, 2018 at 16:50

answered Feb 11, 2018 at 1:49

jpp


1 Comment

Minions Over a year ago

Thanks, but how can I get the last column (the "labels") independently?

datawrestler · Accepted Answer · 2018-02-11 01:52:59Z

1

This should get you the right shape (4,6) for the idx variable and (4,1) for the labels.

import numpy as np

# read all lines; the with-block closes the file afterwards
with open('train.txt', 'r') as f:
    alllines = f.readlines()
# shape (4,6); dtype=float converts the split strings to numbers
idx = np.matrix([line.strip().split(',')[0:6] for line in alllines], dtype=float)
# reshape to (4,1) for labels
idy = np.matrix([line.strip().split(',')[6] for line in alllines], dtype=float).reshape(-1, 1)

edited Feb 11, 2018 at 1:52

answered Feb 11, 2018 at 1:46

datawrestler


2 Comments

Minions Over a year ago

Is there another way that determines the dimensions of the matrix dynamically?

datawrestler Over a year ago

@MIBMinion I just changed the solution for the idy variable so you don't have to know anything ahead of time. But if your data is wider and you don't know how wide it is, I would use the other solution (genfromtxt) and pull out the labels separately.
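One way to avoid hard-coding the width, sketched under the assumption that the label is always the last column: load everything, then slice relative to the end. The helper name `load_features_and_labels` is hypothetical, `np.loadtxt` is used here in place of `genfromtxt`, and the two wide rows are made-up illustration data, not taken from the question.

```python
import io
import numpy as np


def load_features_and_labels(source, delimiter=","):
    """Hypothetical helper: load a numeric delimited file and split off the last column."""
    # ndmin=2 keeps the result 2-D even if the file has a single row
    arr = np.loadtxt(source, delimiter=delimiter, ndmin=2)
    features = arr[:, :-1]  # however many columns there are, minus the last
    labels = arr[:, -1:]    # the last column, as an (m, 1) array
    return features, labels


# Made-up data with 8 feature columns plus a label, to show that the width is not assumed.
wide = io.StringIO("1,2,3,4,5,6,7,8,0\n9,8,7,6,5,4,3,2,1\n")
X, y = load_features_and_labels(wide)

print(X.shape, y.shape)
```

The `source` argument can be a path string or an open file object, since `np.loadtxt` accepts either.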
