DataFrame from variable and filtering data

Question

I have a DataFrame and want to extract 3 columns from it, but one of them is chosen by the user with input(). I made a list of the columns, but I need to iterate over it with a for loop. So far I got it working by making a dictionary from 2 of the columns, turning each into a list and zipping them... but I really need all 3 columns.

My code:

import pandas as pd

Data=pd.read_csv(----------)
selec=input("What month would you want to show?")
NewData=[Data['Country'], Data['City'], Data[selec].astype('int64')]

#here I try to iterate:
iteration=[i for i in NewData if NewData[i]<=25]
print(iteration)

TypeError: list indices must be integers or slices, not Series
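Why the error happens: iterating over a list yields its elements, and here each element is a whole column (a Series), not a row, so NewData[i] tries to index a list with a Series. A minimal sketch, assuming a small hypothetical frame in place of the CSV (the names data and new_data are illustrative), shows this and shows that zip() extends the two-column zip approach from the question to all three columns:

```python
import pandas as pd

# Hypothetical stand-in for the CSV; month values are already integers here
data = pd.DataFrame({
    "Country": ["Spain", "France"],
    "City": ["Madrid", "Paris"],
    "Feb": [16, 30],
})
selec = "Feb"  # would come from input() in the real script

new_data = [data["Country"], data["City"], data[selec]]

# Iterating the list yields whole columns (Series), not rows,
# which is why new_data[i] fails: i is a Series, not an integer.
for item in new_data:
    print(type(item).__name__)  # prints "Series" three times

# zip() over the three columns yields one (country, city, value) tuple per row
rows = [
    (country, city, value)
    for country, city, value in zip(data["Country"], data["City"], data[selec])
    if value <= 25
]
print(rows)
```

This gives plain tuples; if you want to keep a DataFrame, filtering the DataFrame itself is usually simpler.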

My CSV is the following:

I want to choose the month with the variable selec and filter the results for that month, so the output for selec="Feb" would be:

I also tried loc/iloc, but had no luck (unhashable type: 'list').
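The exact code that raised unhashable type: 'list' isn't shown, so this does not diagnose it. For comparison, .loc takes a row selector and a column selector inside one pair of brackets, separated by a comma, so the filter and the three-column selection can be a single call. A minimal sketch, assuming a hypothetical in-memory frame in place of pd.read_csv(...):

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(...); month values read as strings
data = pd.DataFrame({
    "Country": ["Spain", "Spain", "Algeria"],
    "City": ["Madrid", "Galicia", "Argel"],
    "Jan": ["15", "1", "20"],
    "Feb": ["16", "2", "28"],
})
selec = "Feb"  # would come from input() in the real script

data[selec] = data[selec].astype("int64")  # make the chosen month numeric first

# .loc[row_selector, column_selector]: a boolean mask for the rows,
# a list of labels for the columns
result = data.loc[data[selec] <= 25, ["Country", "City", selec]]
print(result)
```

.loc works with labels, which is why it fits a column name typed by the user; .iloc expects integer positions instead.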

Czaporka · Accepted Answer · 2021-03-06 09:58:36Z

See the example below for how you can:

- select specific columns from a DataFrame by providing a list of column names between the selection brackets (link to tutorial)
- select specific rows from a DataFrame by providing a condition between the selection brackets (link to tutorial)
- iterate over the rows of a DataFrame, although I don't think you need to: if you want to keep working with the DataFrame after filtering it, it's better to use the filtering method mentioned above (you won't have to put the rows back together, and it will likely be faster, because pandas is optimized for bulk operations)

import pandas as pd

# this is just for testing, instead of pd.read_csv(...)
df = pd.DataFrame([
    dict(Country="Spain", City="Madrid", Jan="15", Feb="16", Mar="17", Apr="18", May=""),
    dict(Country="Spain", City="Galicia", Jan="1", Feb="2", Mar="3", Apr="4", May=""),
    dict(Country="France", City="Paris", Jan="0", Feb="2", Mar="3", Apr="4", May=""),
    dict(Country="Algeria", City="Argel", Jan="20", Feb="28", Mar="29", Apr="30", May=""),
])

print("---- Original df:")
print(df)

selec = "Feb"  # let's pretend this comes from input()

print("\n---- Just the 3 columns:")
df = df[["Country", "City", selec]].copy()  # narrow down the df to just the 3 columns (as an independent copy)
df[selec] = df[selec].astype("int64")  # convert the selec column to a proper integer type
print(df)

print("\n---- Filtered dataframe:")
df1 = df[df[selec] <= 25]
print(df1)

print("\n---- Iterated & filtered rows:")
for row in df.itertuples():
    # we could also use row[3] instead of getattr(...)
    if getattr(row, selec) <= 25:
        print(row)

Output:

---- Original df:
   Country     City Jan Feb Mar Apr May
0    Spain   Madrid  15  16  17  18
1    Spain  Galicia   1   2   3   4
2   France    Paris   0   2   3   4
3  Algeria    Argel  20  28  29  30

---- Just the 3 columns:
   Country     City  Feb
0    Spain   Madrid   16
1    Spain  Galicia    2
2   France    Paris    2
3  Algeria    Argel   28

---- Filtered dataframe:
  Country     City  Feb
0   Spain   Madrid   16
1   Spain  Galicia    2
2  France    Paris    2

---- Iterated & filtered rows:
Pandas(Index=0, Country='Spain', City='Madrid', Feb=16)
Pandas(Index=1, Country='Spain', City='Galicia', Feb=2)
Pandas(Index=2, Country='France', City='Paris', Feb=2)
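Two things can still break this when selec really comes from input(): a mistyped month gives a KeyError, and an empty column such as May makes astype("int64") raise. A minimal sketch of one way to guard against both, assuming a hypothetical helper named filter_month (not part of pandas) and swapping astype for pd.to_numeric with errors="coerce":

```python
import pandas as pd

# Hypothetical stand-in for the CSV; "May" is empty, as in the test frame above
df = pd.DataFrame([
    dict(Country="Spain", City="Madrid", Feb="16", May=""),
    dict(Country="Algeria", City="Argel", Feb="28", May=""),
])


def filter_month(frame, month, limit=25):
    """Hypothetical helper: Country, City and `month` for rows where month <= limit."""
    if month not in frame.columns:
        # selec comes from input(), so give a clearer message for typos
        raise KeyError(f"unknown month column: {month!r}")
    subset = frame[["Country", "City", month]].copy()  # independent copy, safe to modify
    # errors="coerce" turns unparsable values such as "" into NaN instead of raising
    subset[month] = pd.to_numeric(subset[month], errors="coerce")
    return subset[subset[month] <= limit]


print(filter_month(df, "Feb"))
print(filter_month(df, "May").empty)  # NaN <= 25 is False, so no rows match
```

The trade-off of errors="coerce" is that genuinely malformed values are silently dropped by the filter rather than reported, so it only suits data where blanks mean "no value".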
