40

I'm using df.columns.values to make a list of column names which I then iterate over and make charts, etc... but when I set this up I overlooked the non-numeric columns in the df. Now, I'd much rather not simply drop those columns from the df (or a copy of it). Instead, I would like to find a slick way to eliminate them from the list of column names.

Now I have:

names = df.columns.values 

what I'd like to get to is something that behaves like:

names = df.columns.values(column_type=float64) 

Is there any slick way to do this? I suppose I could make a copy of the df, and drop those non-numeric columns before doing columns.values, but that strikes me as clunky.

Welcome any inputs/suggestions. Thanks.

1

5 Answers 5

25

Someone will give you a better answe than this possibly, but one thing I tend to do is if all my numeric data are int64 or float64 objects, then you can create a dict of the column data types and then use the values to create your list of columns.

So for example, in a dataframe where I have columns of type float64, int64 and object firstly you can look at the data types as so:

DF.dtypes

and if they conform to the standard whereby the non-numeric columns of data are all object types (as they are in my dataframes), then you can do the following to get a list of the numeric columns:

[key for key in dict(DF.dtypes) if dict(DF.dtypes)[key] in ['float64', 'int64']]

Its just a simple list comprehension. Nothing fancy. Again, though whether this works for you will depend upon how you set up you dataframe...

Sign up to request clarification or add additional context in comments.

1 Comment

I ended up using this because it works and because I'm running 0.14.0 and didn't want to upgrade to 0.14.1 in the middle of my project. Thanks.
24

dtypes is a Pandas Series. That means it contains index & values attributes. If you only need the column names:

headers = df.dtypes.index

it will return a list containing the column names of "df" dataframe.

Comments

19

There's a new feature in 0.14.1, select_dtypes to select columns by dtype, by providing a list of dtypes to include or exclude.

For example:

df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': range(1000),
                   'c': ['a'] * 1000,
                   'd': pd.date_range('2000-1-1', periods=1000)})


df.select_dtypes(['float64','int64'])

Out[129]: 
            a    b
0    0.153070    0
1    0.887256    1
2   -1.456037    2
3   -1.147014    3
...

1 Comment

select_dtypes now also allows selecting more general categories (df.select_dtypes('number'), df.select_dtypes('object') or df.select_dtypes('datetime'), for example).
8

To get the column names from pandas dataframe in python3- Here I am creating a data frame from a fileName.csv file

>>> import pandas as pd
>>> df = pd.read_csv('fileName.csv')
>>> columnNames = list(df.head(0)) 
>>> print(columnNames)

Comments

0

You can also try to get the column names from panda data frame that returns columnn name as well dtype. here i'll read csv file from https://mlearn.ics.uci.edu/databases/autos/imports-85.data but you have define header that contain columns names.

import pandas as pd

url="https://mlearn.ics.uci.edu/databases/autos/imports-85.data"

df=pd.read_csv(url,header = None)

headers=["symboling","normalized-losses","make","fuel-type","aspiration","num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base","length","width","height","curb-weight","engine-type",
         "num-of-cylinders","engine-size","fuel-system","bore","stroke","compression-ratio","horsepower","peak-rpm"
         ,"city-mpg","highway-mpg","price"]

df.columns=headers

print df.columns

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.