1

I have a text file which contains m rows like the following:

0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1
0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0
0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0
0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0

I tried to read the file and copy it into a matrix like in the following code:

import string

file = open("datasets/train.txt",encoding='utf8')

for line in file.readlines():
    tmp = line.strip()
    tmp = tmp.split(",")
    idx = np.vstack(tmp)
    idy = np.hstack(tmp[12])

matrix = idx

I want to read the file as is into a matrix. For my sample data, the matrix shape should be (4,6) and idy should be (4,1), i.e. the last column, which holds the labels.

But it stacked only the last line of the file, vertically, like this:

0.4458490073,
0.1175445834,
0.2307692308,
0.6212121212,
0.0089169801,
0.0200000000,
0

Any help?

4
  • How is np defined? Commented Feb 11, 2018 at 1:29
  • Numpy library. Commented Feb 11, 2018 at 1:34
  • Does your actual file have new lines in it? Commented Feb 11, 2018 at 1:48
  • The problem with your original code is that idx = np.vstack(tmp) doesn't concatenate 'tmp' vertically into an existing array idx; it just turns tmp into a vertical array, then replaces idx with that. You could fix your code by using idx = [] before the loop, then idx.append(tmp) inside the loop, then matrix = np.array(idx) after the loop completes. Then use @jp_data_analysis's technique to split the matrix into id and data parts. Commented Feb 11, 2018 at 2:05
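The fix described in the last comment could be sketched roughly as follows. This assumes each row holds six comma-separated values followed by one label. The in-memory `sample_lines` list is a stand-in for `file.readlines()`, and the names `rows`, `features` and `labels` are chosen here for illustration only.

```python
import numpy as np

# Stand-in for file.readlines(); two rows taken from the question's sample data.
sample_lines = [
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n",
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n",
]

rows = []  # one list of string fields per line
for line in sample_lines:
    fields = line.strip().split(",")
    if fields != [""]:  # skip blank lines
        rows.append(fields)

# Build the array once, after the loop, converting the strings to floats.
matrix = np.array(rows, dtype=float)

features = matrix[:, :-1]  # every column except the last
labels = matrix[:, -1:]    # last column, kept two-dimensional
```

Appending to a plain list and converting once avoids overwriting the result on every iteration, which is what happened with `idx = np.vstack(tmp)` inside the loop.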

2 Answers

3

Since you are using numpy, this functionality is already available:

arr = np.genfromtxt('file.csv', delimiter=',')

You can then separate the last column (the labels, named header here) from the rest of the data as follows:

data = arr[:, :-1]
header = arr[:, -1:]
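
As a self-contained check of this approach, the sketch below feeds the question's four sample rows to `np.genfromtxt` through an in-memory `io.StringIO` buffer instead of a file on disk. The buffer and the name `labels` are assumptions for illustration. With a real file, pass its path as in the line above.

```python
import io
import numpy as np

# In-memory stand-in for the text file, using the sample rows from the question.
sample = io.StringIO(
    "0.4698537878,0.1361006627,0.2400000000,0.7209302326,0.0054816275,0.0116666667,1\n"
    "0.5146649986,0.0449680289,0.4696969697,0.5596330275,0.0017155500,0.0033333333,0\n"
    "0.4830107706,0.0684999306,0.3437500000,0.5600000000,0.0056351257,0.0116666667,0\n"
    "0.4458490073,0.1175445834,0.2307692308,0.6212121212,0.0089169801,0.0200000000,0\n"
)

arr = np.genfromtxt(sample, delimiter=",")  # float array, one row per line

data = arr[:, :-1]    # all columns except the last
labels = arr[:, -1:]  # last column only, kept two-dimensional

print(data.shape, labels.shape)  # (4, 6) (4, 1)
```

Because the slices use negative indices, nothing here depends on knowing the number of columns in advance. One caveat: if the file contains only a single row, `genfromtxt` returns a 1-D array, so it would need `np.atleast_2d` before slicing this way.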

1 Comment

Thanks, but how can I get the last column (the "labels") independently?
1

This should get you the right shape (4,6) for the idx variable and (4,1) for the labels:

import numpy as np

# read all lines without trailing newlines; the with-block closes the file
with open('train.txt', 'r') as f:
    alllines = f.read().splitlines()
# shape (4,6): the first six columns, parsed as floats
idx = np.matrix([line.split(',')[0:6] for line in alllines], dtype=float)
# reshape to (4,1) for labels (the seventh column)
idy = np.matrix([line.split(',')[6] for line in alllines], dtype=float).reshape(-1, 1)

2 Comments

Is there another way that determines the dimensions of the matrix dynamically?
@MIBMinion I just changed the solution for the idy variable so you don't have to know anything ahead of time. But if your data is wider, and you don't know how wide it is, I would use the other solution genfromtxt and pull out the labels separately.
