38

I have a pandas dataframe and a list as follows:

mylist = ['nnn', 'mmm', 'yyy']
mydata =
   xxx  yyy  zzz  nnn  ddd  mmm
0    0   10    5    5    5    5
1    1    9    2    3    4    4
2    2    8    8    7    9    0

Now, I want to get only the columns mentioned in mylist and save them as a CSV file.

i.e.

   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0

My current code is as follows.

mydata = pd.read_csv( input_file, header=0)

for item in mylist:
    mydata_new = mydata[item]

print(mydata_new)
mydata_new.to_csv(file_name)

It seems that my new dataframe contains the wrong results. Where am I going wrong? Please help me!

1
  • The answer by CS95 on Jan 18 is probably the best one. A slight improvement: since you already have the needed column names in mylist, you can use it directly: print(mydata[mylist]). As for what went wrong: (a) the for loop isn't necessary; (b) mydata_new is never initialised before the loop; (c) mydata_new is overwritten with mydata[item] on each pass, because nothing is appended or inserted, so you end up with the values for the last item in mylist only. Hope this makes sense. Commented May 29, 2024 at 19:44

5 Answers 5

80

Just pass a list of column names to index df:

df[['nnn', 'mmm', 'yyy']]

   nnn  mmm  yyy
0    5    5   10
1    3    4    9
2    7    0    8

If you need to handle non-existent column names in your list, try filtering with df.columns.isin:

df.loc[:, df.columns.isin(['nnn', 'mmm', 'yyy', 'zzzzzz'])]

   yyy  nnn  mmm
0   10    5    5
1    9    3    4
2    8    7    0
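
To make the difference between the two selections concrete, here is a minimal sketch that rebuilds the question's table by hand (the column values are copied from the question; the name wanted is only illustrative). It shows that list indexing follows the order of the list and raises KeyError on a missing label, while the isin mask skips the missing label and keeps the frame's own column order.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})

wanted = ["nnn", "mmm", "yyy"]  # illustrative name for the list of labels

# List indexing returns the columns in the order given by the list.
strict = df[wanted]

# A label that is not a column makes list indexing raise KeyError.
try:
    df[wanted + ["zzzzzz"]]
    raised = False
except KeyError:
    raised = True

# The boolean mask ignores the missing label and keeps the frame's column order.
tolerant = df.loc[:, df.columns.isin(wanted + ["zzzzzz"])]

print(list(strict.columns))
print(list(tolerant.columns))
```

If the column order of the list matters to you, note that the mask version will not preserve it.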

5 Comments

Hi, thank you very much. However, it is a typo. I corrected it. By the way, I want to loop over the list rather than name the column headings directly, as my real list is very long. Is there a special way of doing that?
@JCena this might surprise you, but it's faster to select them all at once.
Thank you for the information. The reason I asked is that some of the column names in mylist are not actually in my dataframe, so I get an error like this: KeyError: "['recipe' 'food' 'calories' ..., ] not in index". Is there a way to avoid this?
@JCena Indeed, there is. See my last edit. Happy coding!
If you need to handle non-existent column names, the df.filter function provides a cleaner and shorter syntax than the .loc[:, df.columns.isin()] syntax proposed here. See my answer below for more details.
6

You can just put mylist inside [] and pandas will select it for you.

mydata_new = mydata[mylist]

Not sure whether your yyy is a typo.

Your code goes wrong because mydata_new is rebound to a new Series on every pass through the loop.

for item in mylist:
    mydata_new = mydata[item]  # <- overwritten on every iteration

Thus, you end up with a single Series (the last column in mylist) rather than the whole DataFrame you want.
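
A short sketch of the difference, assuming the data from the question (the frame is rebuilt by hand here instead of being read with pd.read_csv, and last and csv_text are illustrative names). Calling to_csv without a path returns the CSV text as a string, which keeps the example self-contained; index=False is my assumption that you don't want the row index in the file.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})
mylist = ["nnn", "mmm", "yyy"]

# The loop rebinds the name each time, so only the last column survives.
for item in mylist:
    last = mydata[item]
print(type(last).__name__, last.name)

# Indexing with the whole list gives a DataFrame with all three columns.
mydata_new = mydata[mylist]
print(type(mydata_new).__name__, list(mydata_new.columns))

# With no path, to_csv returns the CSV text instead of writing a file.
csv_text = mydata_new.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path as the first argument, as in the question's mydata_new.to_csv(file_name).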


If some names in the list are not in your data frame, you can check for them with:

len(set(mylist) - set(mydata.columns)) > 0

and print them out:

print(set(mylist) - set(mydata.columns))

Then see if there are typos or other unintended behaviors.
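
If you would rather report the missing names and carry on with the ones that do exist, something like the following might work. It is only a sketch: 'recipe' is a made-up missing name added for illustration, and the list comprehensions are my substitute for the set difference above, chosen because they keep the order of mylist.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})

# 'recipe' is a hypothetical name that is not a column of mydata.
mylist = ["nnn", "recipe", "mmm", "yyy"]

# Split the requested names into missing and present, keeping list order.
missing = [col for col in mylist if col not in mydata.columns]
present = [col for col in mylist if col in mydata.columns]

if missing:
    print("Not in the data frame:", missing)

# Select only the names that exist, so no KeyError is raised.
mydata_new = mydata[present]
print(mydata_new)
```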

1 Comment

You're missing the fact that there's a (possible) "typo" in the column names.
2

If mylist contains some column names which are not in mydata.columns, you will get an error like

KeyError: "['fff'] not in index"

In this case, you can use the df.filter function:

mydata.filter(['nnn', 'mmm', 'yyy', 'fff'])
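
For illustration, here is a small runnable sketch with the question's data rebuilt by hand ('fff' is the missing name from above). It passes the list through the items keyword, which is the same as passing it positionally. Labels that are not columns are dropped without an error.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
mydata = pd.DataFrame({
    "xxx": [0, 1, 2],
    "yyy": [10, 9, 8],
    "zzz": [5, 2, 8],
    "nnn": [5, 3, 7],
    "ddd": [5, 4, 9],
    "mmm": [5, 4, 0],
})

# 'fff' is not a column; filter skips it instead of raising KeyError.
result = mydata.filter(items=["nnn", "mmm", "yyy", "fff"])
print(result)
```

Because nothing is raised, a misspelled column name is silently left out, so it may be worth comparing result.columns against your list if typos are a concern.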


0

I had a case similar to the question above, and solved it like this:

columns = ["a", "b", "c"]

df[[*columns, "d"]]

This unpacks the names in columns into a new list alongside "d" and selects just those columns as a new dataframe.

UPDATE:

I ran into a KeyError: "['c'] not in index" with the above method, so I switched to the code below:

df.filter([*columns, "d"])


-1

You want to select specific columns from a pandas dataframe, namely 'nnn', 'mmm' and 'yyy'.

The appropriate code is:

filtered_df = mydata[['nnn', 'mmm', 'yyy']]

