44

I have a dataset in a relational database format (linked by IDs across various .csv files).

I know that each data frame contains only one row for a given ID, and I'd like to know the simplest way to extract values from that row.

What I'm doing now:

# the group has only one element
purchase_group = purchase_groups.get_group(user_id)
price = list(purchase_group['Column_name'])[0]

The third line bothers me because it seems ugly, but I'm not sure what the workaround is. The grouping (I guess) assumes that there might be multiple values and returns a <class 'pandas.core.frame.DataFrame'> object, whereas I'd like just a single row returned.

5 Answers

84

If you want just the value and not a DataFrame or Series, call .values and index the first element with [0], so just:

price = purchase_group['Column_name'].values[0]

will work.
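A minimal, self-contained sketch of this approach follows. The purchases table, its user_id column, the prices, and the ID 2 are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical data: exactly one purchase row per user_id.
purchases = pd.DataFrame(
    {"user_id": [1, 2, 3], "Column_name": [9.99, 24.50, 3.75]}
)
purchase_groups = purchases.groupby("user_id")

# get_group returns a one-row DataFrame for this ID.
purchase_group = purchase_groups.get_group(2)

# .values exposes the underlying NumPy array; [0] takes its only element.
price = purchase_group["Column_name"].values[0]
print(price)
```

Note that [0] silently takes the first element even when the group has several rows, so this relies on the one-row-per-ID guarantee stated in the question.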

3 Comments

This is what I tend to use, but I must say that it still feels a little clunky.
Yeah, it does feel clunky. Part of me wants to see a new feature added for conciseness in these kinds of scenarios. Something like df.sel(col: str|int = 0, row: str|int = 0), where row can be supplied as a string indexer or row number, and col can be supplied as the column header or number, both defaulting to 0. col would be the first positional argument because, imo, that suits a DataFrame better in this context (as opposed to a Series, which is probably what someone is working with anyway if they'd benefit from something like df.sel(row: str|int = 0, col: str|int = 0)).
Edit: This already exists as at and iat, for example df.at[0, 'A']. Read more: stackoverflow.com/questions/16729574/…
15

Late to the party here, but purchase_group['Column_name'].item() is now available and is cleaner than some of the other solutions.
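As a small sketch of how item() behaves, assuming a hypothetical one-row frame standing in for purchase_group (the values and index are invented): it returns the single element as a Python scalar, and it raises ValueError when the Series does not hold exactly one element, which doubles as a check on the one-row assumption.

```python
import pandas as pd

# Hypothetical one-row frame standing in for purchase_group.
purchase_group = pd.DataFrame({"Column_name": [24.50]}, index=[7])

# item() returns the single element as a Python scalar.
price = purchase_group["Column_name"].item()
print(price)

# With more than one row, item() raises ValueError instead of picking one.
two_rows = pd.DataFrame({"Column_name": [1.0, 2.0]})
try:
    two_rows["Column_name"].item()
    raised = False
except ValueError:
    raised = True
print(raised)
```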

10

If purchase_group has a single row, then purchase_group = purchase_group.squeeze() turns it into a Series, so you can simply use purchase_group['Column_name'] to get the value.
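A short sketch of the squeeze approach, using an invented one-row frame (the user_id column and values are assumptions for illustration). Passing axis=0 is a choice made here, not part of the answer above: it squeezes only the row axis, so a frame with one row and one column still yields a Series rather than a bare scalar.

```python
import pandas as pd

# Hypothetical one-row frame standing in for purchase_group.
purchase_group = pd.DataFrame(
    {"user_id": [2], "Column_name": [24.50]}, index=[1]
)

# Squeeze the row axis: one-row DataFrame -> Series indexed by column name.
row = purchase_group.squeeze(axis=0)
price = row["Column_name"]
print(price)

# A frame with several rows is returned unchanged, still a DataFrame.
many = pd.DataFrame({"Column_name": [1.0, 2.0]})
unchanged = many.squeeze(axis=0)
print(type(unchanged).__name__)
```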

2

I've come back to this question many times over the years, and it's the kind of thing I've written inconsistently across different repositories. So I ran a small benchmark to see which method is fastest.

For the tests I used a DataFrame with a single column and a single row.

df['_col0'].values[0] -> 3.34 µs ± 4.65 ns
df['_col0'].item() -> 4.7 µs ± 10.6 ns
df.values[0][0] -> 6.36 µs ± 611 ns
np.array(df)[0][0] -> 10.4 µs ± 1.19 µs
np.array(df['_col0'])[0] -> 11.2 µs ± 162 ns
df.squeeze() -> 18.3 µs ± 177 ns
df.iloc[0, 0] -> 18.8 µs ± 5.66 µs
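For anyone who wants to rerun the comparison, here is a rough sketch using the standard timeit module. The cell value 1.0 and the repeat counts are assumptions, and it reports the best run per call rather than a mean ± standard deviation, so the numbers will not match the figures above and will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: a single column and a single row, as described above.
df = pd.DataFrame({"_col0": [1.0]})

candidates = {
    "df['_col0'].values[0]": lambda: df["_col0"].values[0],
    "df['_col0'].item()": lambda: df["_col0"].item(),
    "df.values[0][0]": lambda: df.values[0][0],
    "np.array(df)[0][0]": lambda: np.array(df)[0][0],
    "np.array(df['_col0'])[0]": lambda: np.array(df["_col0"])[0],
    "df.squeeze()": lambda: df.squeeze(),
    "df.iloc[0, 0]": lambda: df.iloc[0, 0],
}

calls = 2_000
results = {}
for label, func in candidates.items():
    # Best of 3 repeats, converted to microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=3))
    results[label] = best / calls * 1e6

# Print fastest first.
for label, micros in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{label:28s} {micros:8.2f} µs")
```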

0

This method is intuitive. For example, to get the first row of values from the DataFrame (the first list in a list of lists):

np.array(df)[0]
