How do I read a single value from dataframe in Python?

Question

I am trying to find a way to read just one value from a big dataframe in Python. I have 2 data tables in my project.

One looks like this:

Company ID  Company  201512  201511  ...  199402  199401
1234        abc      1.1     0.8     ...  2.1     -0.9
.
.
.
4321        cba      2.1     -0.4    ...  0.3     -0.1

There are about 260 months and 10,000 companies. I need to check each company's monthly returns one by one and see whether there are 36 valid data points behind a given data point, where "valid" means the value is neither 0 nor NaN. If there are 36 valid data points, I need to run a regression of those 36 data points against 7 factors, which are listed in another table.

The other table looks like this:

Month    Factor1     Factor2     ...     Factor6     Factor7  
201512   -0.4        1.1         ...     2.1         1.2
.
.
.
199401   0.1         0.2         ...     0.3         0.4

Now my problem is, I couldn't find a way to load just one value at a time from table 1 and create a loop for it. Can someone please advise?

Well, you could use value = df['some_field'].iloc[the_index], but you perhaps don't want that in a for loop if there's a way to use groupby().aggregate() in some way and take a specific value.
– roganjosh, Commented Sep 22, 2017 at 19:01
Because 0 is highly likely to be just a missing data point or typo.
– Jeremy.O, Commented Sep 22, 2017 at 19:25
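
For reference, the single-value lookup mentioned in the first comment can be sketched as follows. This is only a small illustration on made-up data shaped like the first table; setting "Company ID" as the index is an assumption made here so that a row can be addressed by label.

```python
import pandas as pd

# Toy data shaped like the first table; the values are made up for illustration.
df = pd.DataFrame(
    {
        "Company ID": [1234, 4321],
        "Company": ["abc", "cba"],
        "201512": [1.1, 2.1],
        "201511": [0.8, -0.4],
    }
).set_index("Company ID")  # assumption: Company ID is used as the row label

by_position = df["201512"].iloc[0]  # first row of the 201512 column
by_label = df.at[1234, "201511"]    # row label 1234, column "201511"
by_both_positions = df.iat[1, 2]    # second row, third column, both positional

print(by_position, by_label, by_both_positions)
```

.at and .iat are meant for reading one scalar at a time; .loc and .iloc also work but are more general.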

user8658280 · Accepted Answer · 2017-09-22 21:13:29Z

1

You can iterate over the rows with the following code:

for index, row in df.iterrows():
    print(index, row["Company"])

Here index is the index label of the row, and row is a Series, so you can access a column with, for example, row["Company"].
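
A self-contained sketch of that loop, on a made-up two-row frame (the data and the seen list are illustrative only):

```python
import pandas as pd

# Made-up data; the index holds the company IDs.
df = pd.DataFrame(
    {
        "Company": ["abc", "cba"],
        "201512": [1.1, 2.1],
        "201511": [0.8, -0.4],
    },
    index=[1234, 4321],
)

seen = []
for index, row in df.iterrows():
    # row is a Series keyed by column name, so single values are read by label
    seen.append((index, row["Company"], row["201512"]))

print(seen)
```

Note that iterrows builds a new Series for every row, so it can be slow on a table with around 10,000 rows and 260 columns.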

answered Sep 22, 2017 at 21:13

user8658280


acushner · Accepted Answer · 2017-09-22 23:16:34Z

0

You don't want a for loop for this.

Assuming 0 is a valid monthly return and that you only have 36 columns after Company, you can easily find all companies with valid monthly return data:

df = df[df.notnull().all(axis=1)]

If, for some reason, you want to get rid of 0s, you can do a replace first (this needs import numpy as np):

df = df[df.replace(0, np.nan).notnull().all(axis=1)]
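
To see what the two filters keep, here is a minimal sketch on made-up data. The frame name returns and its values are assumptions for illustration; it holds only return columns, with the company names as the index.

```python
import numpy as np
import pandas as pd

# Made-up returns: "cba" has a NaN and "xyz" has a 0.
returns = pd.DataFrame(
    {
        "201512": [1.1, 2.1, 0.5],
        "201511": [0.8, np.nan, 0.0],
    },
    index=["abc", "cba", "xyz"],
)

# Keep rows where every column is non-null.
no_nan = returns[returns.notnull().all(axis=1)]

# Turn 0 into NaN first, so rows containing a 0 are dropped as well.
no_nan_or_zero = returns[returns.replace(0, np.nan).notnull().all(axis=1)]

print(list(no_nan.index))
print(list(no_nan_or_zero.index))
```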

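Edit, in response to the comment:

You could do something like:

cols = df.columns
first_col = get_first_return_col(df)  # placeholder: position of the first return column
window = 36
# stop early enough that every slice holds a full 36 columns
for i in range(first_col, len(cols) - window + 1):
    # filter into a new name so df is not shrunk from one window to the next
    valid = df[df[cols[i : i + window]].notnull().all(axis=1)]
    run_regression(valid[cols[i : i + window]])  # placeholder for your regression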
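
As a fuller illustration of the windowed approach, here is a minimal sketch under several assumptions that do not come from the answer: the function name rolling_regressions is hypothetical, the regression is plain ordinary least squares via np.linalg.lstsq with an intercept, 0 is treated as missing (as the asker says in the comments), and the return columns are ordered so that the 36 months of a window are adjacent. The demo data is synthetic.

```python
import numpy as np
import pandas as pd


def rolling_regressions(returns, factors, window=36):
    """Regress each fully valid `window`-month block of returns on the factors.

    returns: DataFrame, one row per company, one column per month.
    factors: DataFrame indexed by the same month labels, one column per factor.
    Yields (company, first_month_of_window, coefficients), intercept first.
    """
    months = list(returns.columns)
    # True where a return is usable: not NaN and not 0 (0 is treated as missing).
    valid = (returns.notna() & (returns != 0)).to_numpy()
    values = returns.to_numpy(dtype=float)
    for row, company in enumerate(returns.index):
        for start in range(len(months) - window + 1):
            stop = start + window
            if not valid[row, start:stop].all():
                continue  # this block has fewer than `window` valid points
            y = values[row, start:stop]
            x = factors.loc[months[start:stop]].to_numpy(dtype=float)
            design = np.column_stack([np.ones(window), x])  # intercept column
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            yield company, months[start], coef


# Synthetic demo data (made up): 40 months, 2 factors, 2 companies.
rng = np.random.default_rng(0)
months = [f"m{k:02d}" for k in range(40)]
factors = pd.DataFrame(rng.normal(size=(40, 2)), index=months, columns=["F1", "F2"])
exact = 0.5 + factors.to_numpy() @ np.array([2.0, -1.0])
returns = pd.DataFrame([exact, exact], index=["abc", "cba"], columns=months)
returns.loc["cba", "m10"] = np.nan  # falls inside every 36-month window for cba

results = list(rolling_regressions(returns, factors))
print(len(results))
```

Computing the validity mask once and slicing it per window avoids re-filtering the whole frame on every iteration. With 7 factors plus an intercept there are 8 coefficients, so a 36-point window leaves enough observations for the fit.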

edited Sep 22, 2017 at 23:16

answered Sep 22, 2017 at 19:02

acushner


1 Comment

Jeremy.O Over a year ago

Thank you for the answer. This helps if I need just one regression per company, but I actually need to run multiple regressions for each company. It goes like this: I read the 201512 data for company abc and find 36 valid data points after that point, so I run a regression and note down the results. Then I check the 201511 data for the same company to see if there are still 36 valid monthly data points. If so, I run another regression on those 36 months, which differs from the previous window by just one month.
