I've got a CSV file with 20 columns & about 60000 rows.

I'd like to read fields 2 to 20 only. I've tried the code below, but the browser (using IPython) freezes and it just goes on for ages.

import numpy as np
from numpy import genfromtxt

myFile = 'sampleData.csv'
# usecols is 0-based, so fields 2 to 20 are indices 1 to 19
myData = genfromtxt(myFile, delimiter=',', usecols=range(1, 20))
print(myData)

How could I tweak this to work better & actually produce output please?

2
  • 1
    I'd think the reading is fast, it's the printing that takes the time. Time just the read. Then print only what you need. Try myData[:10] etc. Do you have missing values, are you getting error messages? Commented Mar 2, 2016 at 4:53
  • 2
    genfromtxt() is notoriously slow. Try loadtxt(), which is marginally faster, or read the file as a pandas DataFrame with the read_csv() function, which is apparently much faster. Commented Mar 2, 2016 at 5:06
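
A minimal sketch of the first comment's suggestion: time the read on its own, then print only a slice. The file here is a hypothetical stand-in for sampleData.csv (1000 rows of random numbers, no header, no missing values), so the timing it reports says nothing about the real file.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical stand-in for sampleData.csv: 1000 rows x 20 numeric columns.
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
my_file = os.path.join(tmp_dir, "sampleData.csv")
np.savetxt(my_file, rng.random((1000, 20)), delimiter=",")

# Time only the read. usecols is 0-based, so range(1, 20) selects fields 2 to 20.
start = time.perf_counter()
my_data = np.genfromtxt(my_file, delimiter=",", usecols=range(1, 20))
elapsed = time.perf_counter() - start

print("read took %.3f s, shape %s" % (elapsed, my_data.shape))
print(my_data[:10])  # print a small slice rather than the whole array
```

If the read itself turns out to be quick, the freeze is more likely caused by printing the full array in the notebook than by genfromtxt().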

1 Answer

2
import pandas as pd

myFile = 'sampleData.csv'
df = pd.read_csv(myFile, skiprows=1, header=None)  # skip the header row

print(df)

This works like a charm.
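
The snippet above reads every column. To keep only fields 2 to 20, as the question asks, read_csv() also accepts a usecols argument. A minimal sketch, assuming the file has a header row; the column names c1 to c20 and the in-memory CSV text are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical stand-in for sampleData.csv: a header row plus 3 data rows, 20 columns.
header = ",".join("c%d" % i for i in range(1, 21))
rows = [",".join(str(r * 100 + c) for c in range(1, 21)) for r in range(3)]
csv_text = "\n".join([header] + rows) + "\n"

# usecols is 0-based, so indices 1 to 19 keep fields 2 to 20 and drop the first.
df = pd.read_csv(io.StringIO(csv_text), usecols=list(range(1, 20)))

print(df.shape)
print(df.head())  # show the first few rows instead of the whole frame
```

With a real file, pass the path (for example 'sampleData.csv') in place of the io.StringIO object.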
