R to Python subsetting via vector

Question

I'm a Python newbie but have some R experience. In R, if I'd like to subset a data.frame, I can use a variable to do something like this:

# Columns

# Assign column names to variable
colsToUse <- c('col1','col2','col3')

# Use variable to subset
df2 <- df1[,colsToUse]

# Rows

# Assign row indices to variable
rowsToUse <- sample(1:nrow(df1), 500)

# Use variable to subset
df3 <- df1[rowsToUse,]

How would I do this in Python?

What sort of data structure do you plan on using in Python? Lists? Arrays? And do you want to subset by index or value?
– C_Z_, Commented Feb 16, 2015 at 17:53
@CactusWoman: Pandas/Numpy - end target is scikit-learn. I'm trying to create test/train subsets.
– screechOwl, Commented Feb 16, 2015 at 17:55

Glorfindel · Accepted Answer · 2023-01-12 20:07:05Z


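Based on your stated use of pandas:

import numpy as np

colsToUse = ['col1', 'col2', 'col3']
# Draw 500 row positions without replacement, like R's sample()
rowsToUse = np.random.choice(len(df1), 500, replace=False)

df2 = df1.loc[:, colsToUse]    # select columns by label
df3 = df1.iloc[rowsToUse, :]   # select rows by integer position

Here df1.loc selects by label and df1.iloc by integer position; df1.xs is another DataFrame indexing helper, for cross-sections. The older df1.ix indexer mixed both modes and was removed in pandas 1.0.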
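The difference between label-based and position-based indexing only shows up when the index is not the default 0..n-1. Here is a minimal sketch, assuming a small made-up DataFrame with string row labels (the data, the snake_case variable names, and the seed are all hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, for illustration only
df1 = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4],
        "col2": [5, 6, 7, 8],
        "col3": [9, 10, 11, 12],
        "col4": [0, 0, 0, 0],
    },
    index=["a", "b", "c", "d"],
)

cols_to_use = ["col1", "col3"]

# .loc looks rows and columns up by label
by_label = df1.loc[["b", "d"], cols_to_use]

# .iloc looks rows and columns up by 0-based integer position
by_position = df1.iloc[[1, 3], [0, 2]]

# .xs returns a cross-section; here, the single row labelled "b" as a Series
row_b = df1.xs("b")

# Random row positions without replacement, then positional selection
rng = np.random.default_rng(0)
rows_to_use = rng.choice(len(df1), size=2, replace=False)
sampled = df1.iloc[rows_to_use]
```

Because the sampled values are positions rather than labels, .iloc is the safer choice for the row subset; .loc would only agree with it when the index happens to be 0..n-1.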

It's also helpful to look at the guide NumPy for MATLAB Users, which often answers questions for R users too (at least when dealing with just a numpy.ndarray).
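For the plain numpy.ndarray case, the same kind of subsetting can be sketched as follows. This assumes a small made-up array and a 4/2 split, both hypothetical; it shuffles the row positions once so that the train and test sets cannot overlap:

```python
import numpy as np

# Hypothetical 6x4 array standing in for the real data
X = np.arange(24).reshape(6, 4)

# ndarrays have no column names, so columns are chosen by 0-based position
cols_to_use = [0, 2]
X_cols = X[:, cols_to_use]

# Shuffle the row positions once, then slice into disjoint train/test sets
rng = np.random.default_rng(42)
perm = rng.permutation(X.shape[0])
train_rows, test_rows = perm[:4], perm[4:]

X_train = X[train_rows, :]
X_test = X[test_rows, :]
```

Since the end target is scikit-learn, sklearn.model_selection.train_test_split may be a more convenient way to get the same kind of split.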

edited Jan 12, 2023 at 20:07 by Glorfindel
answered Feb 16, 2015 at 17:59 by ely
