
I'm a Python newbie but have some R experience. In R, if I want to subset a data.frame, I can store the column names or row indices in a variable and index with it, like this:

# Columns

# Assign column names to variable
colsToUse <- c('col1','col2','col3')

# Use variable to subset
df2 <- df1[,colsToUse]

# Rows

# Assign row indices to variable
rowsToUse <- sample(1:nrow(df1), 500)

# Use variable to subset
df3 <- df1[rowsToUse,]

How would I do this in Python?

  • What sort of data structure do you plan on using in Python? Lists? Arrays? And do you want to subset by index or value? Commented Feb 16, 2015 at 17:53
  • @CactusWoman: Pandas/Numpy - end target is scikit-learn. I'm trying to create test/train subsets. Commented Feb 16, 2015 at 17:55

1 Answer


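Based on your stated use of pandas:

import numpy as np

colsToUse = ['col1', 'col2', 'col3']
# replace=False samples without replacement, like R's sample()
rowsToUse = np.random.choice(len(df1), 500, replace=False)

df2 = df1.loc[:, colsToUse]   # select columns by label
df3 = df1.iloc[rowsToUse, :]  # select rows by integer position

df1.loc indexes by label and df1.iloc by integer position. The older df1.ix indexer, which mixed the two, was deprecated in pandas 0.20 and removed in 1.0. df1.xs is another helper for selecting a cross-section, which is mostly useful with a MultiIndex.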
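
To make the difference between label-based and position-based indexing concrete, here is a minimal sketch. The frame `df`, its values, and its index labels are made up for illustration; only `loc` and `iloc` come from the answer.

```python
import pandas as pd

# Hypothetical frame whose index labels (100..103) differ from row positions (0..3).
df = pd.DataFrame(
    {"col1": [1, 2, 3, 4], "col2": [10, 20, 30, 40]},
    index=[100, 101, 102, 103],
)

by_position = df.iloc[[0, 2]]      # rows at integer positions 0 and 2
by_label = df.loc[[100, 102]]      # rows whose index labels are 100 and 102
cols_only = df.loc[:, ["col2"]]    # all rows, one column selected by label
```

With this index, `by_position` and `by_label` pick the same two rows. On a frame with the default 0..n-1 index, labels and positions coincide, which is why the distinction is easy to miss.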

It's also helpful to look at the guide NumPy for MATLAB Users, which often answers questions for R users as well, at least when you are dealing with just a numpy.ndarray.
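
Since the goal stated in the comments is train/test subsets, one possible way to get both halves with pandas alone is sketched below. The data in `df1` is randomly generated here as a stand-in, and the 500/500 split is an assumption based on the sample size in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 1000 rows, three feature columns.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["col1", "col2", "col3"])

# Draw 500 rows without replacement; random_state makes the draw repeatable.
train = df1.sample(n=500, random_state=0)

# Everything not drawn becomes the test set (this relies on a unique index).
test = df1.drop(train.index)
```

scikit-learn also provides `train_test_split` in `sklearn.model_selection`, which may be more convenient once features and labels are kept in separate arrays.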
