0

I'm trying to rename the columns of a DataFrame, to no avail.

Here is the dataframe:

>>> file = open("data.csv", "r")
>>> data = pd.DataFrame(file)
>>> print(data)
                                                   0
0   date,1. open,2. high,3. low,4. close,5. volume\n
1  2020-01-14,316.7,317.57,312.17,312.68,40653457...
2  2020-01-15,311.85,315.5,309.55,311.34,30480882...
3  2020-01-16,313.59,315.7,312.09,315.24,27207254...

Here is my attempt with data.rename to change the column names, as documented at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html

#Trying to change the columns with a dict
>>> data.rename(columns={"date": "date", "1. open": "open", "2. high": "high", "3. low": "low", "5. volume": "volume"})
                                                   0
0   date,1. open,2. high,3. low,4. close,5. volume\n
1  2020-01-14,316.7,317.57,312.17,312.68,40653457...
2  2020-01-15,311.85,315.5,309.55,311.34,30480882...
3  2020-01-16,313.59,315.7,312.09,315.24,27207254...

What am I doing wrong?


Update: Thanks for all the responses.

I explicitly defined the column names that I wanted to see and it all worked perfectly.

>>> df.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
>>> print (df)
         col1    col2    col3    col4    col5        col6
0  2020-01-14  316.70  317.57  312.17  312.68  40653457.0
1  2020-01-15  311.85  315.50  309.55  311.34  30480882.0
2  2020-01-16  313.59  315.70  312.09  315.24  27207254.0 
2
  • Read your dataframe with pandas.read_csv. Currently you are producing a single column with the name 0. Commented Jun 6, 2020 at 16:01
  • what does data = pd.read_csv() give you? Commented Jun 6, 2020 at 16:27
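The point made in these comments can be sketched as follows. This is a minimal illustration, not the asker's exact data: an in-memory string stands in for data.csv, and a list of lines mimics what iterating over an open file yields. Because the resulting frame has a single column labelled 0, none of the keys passed to rename match anything, and rename ignores unmatched keys by default.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of data.csv.
csv_text = "date,1. open,2. high\n2020-01-14,316.7,317.57\n"

# Iterating over an open file yields one string per line; a list of lines mimics that.
lines = csv_text.splitlines(keepends=True)
data = pd.DataFrame(lines)  # a single column labelled 0, one row per line

# No column is called "1. open", so rename silently changes nothing.
renamed = data.rename(columns={"1. open": "open"})

# read_csv splits each line on commas and takes the first row as the header.
parsed = pd.read_csv(io.StringIO(csv_text))
print(list(parsed.columns))
```

Once the file is parsed with read_csv, the same rename call has real column labels to match against.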

3 Answers

1

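As others have said, use pd.read_csv("data.csv", names=['col1', 'col2', ...], header=0) when reading a CSV into a DataFrame. read_csv takes the new column names through names (it has no columns parameter), and header=0 tells it to replace the header row already in the file.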

Also, here's an easy way to change DataFrame column names:

df.columns = ['col_name1', 'col_name2', ...]
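A minimal sketch of both approaches, assuming the six-column layout from the question. The in-memory csv_text is a stand-in for data.csv so the snippet runs on its own, and the new names are only illustrative. Note that read_csv accepts replacement names through its names parameter, combined with header=0 so the file's own header row is dropped.

```python
import io
import pandas as pd

# Stand-in for data.csv, using the same layout as in the question.
csv_text = (
    "date,1. open,2. high,3. low,4. close,5. volume\n"
    "2020-01-14,316.7,317.57,312.17,312.68,40653457\n"
    "2020-01-15,311.85,315.5,309.55,311.34,30480882\n"
)

new_names = ["date", "open", "high", "low", "close", "volume"]

# Option 1: rename while reading; header=0 discards the header row in the file.
df1 = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

# Option 2: read with the file's own header, then overwrite the column labels.
df2 = pd.read_csv(io.StringIO(csv_text))
df2.columns = new_names  # the list length must equal the number of columns

print(df1)
```

Assigning to df.columns replaces every label at once, so it only works when you supply one name per column, in order.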

3 Comments

Thank you very much, bug. New tool in my arsenal!
When I use this in the interpreter, it works perfectly. But when I use it inside my data.py package, explicitly defining the columns, print still shows the old column names. The code is: df = data; df.column = ['date', 'open', 'high', 'low', 'close', 'volume']; print(df). And I still get this: 1. open 2. high 3. low 4. close 5. volume date 2020-01-14 163.3900 163.6000 161.7200 162.13 23500783.0
Found my typo in df.columns. Thank you again, bug.
0

You don't need to open the file yourself when using pandas:

data = pandas.read_csv('path_to/your_file.csv')

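rename returns a new DataFrame and leaves the original untouched, so either assign the result back or pass inplace=True. I always add inplace=True to the rename function.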
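A short sketch of the two ways to make a rename stick, using a small made-up frame rather than the asker's file. By default rename returns a renamed copy; with inplace=True it modifies the frame it is called on.

```python
import pandas as pd

# Small illustrative frame; the values are taken from the question's first row.
data = pd.DataFrame({"1. open": [316.7], "2. high": [317.57]})
mapping = {"1. open": "open", "2. high": "high"}

# Without inplace, rename returns a new frame and leaves data as it was.
renamed = data.rename(columns=mapping)
unchanged = list(data.columns)  # still the original labels at this point

# With inplace=True, data itself is modified.
data.rename(columns=mapping, inplace=True)

print(unchanged, list(data.columns))
```

Writing data = data.rename(columns=mapping) has the same effect as the inplace call and is often easier to read in a chain of operations.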

But if you just want to remove the numbers from the column names, you could also do this:

data.columns = [col.split()[1] for col in data.columns]

1 Comment

Thank you, Sy. This one is a keeper. The only problem is that one of my columns is a single word, so its split has only index [0] and the comprehension fails on it. I could use a 'while True' loop to check whether a split applies to each column.
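One possible way around the single-word column, sketched on a plain list of the header names from the question: take the last whitespace-separated token instead of the second one. The helper name strip_numeric_prefix is hypothetical, and the approach assumes each wanted name is a single word; a multi-word name would be cut down to its last word.

```python
def strip_numeric_prefix(name):
    """Return the last whitespace-separated token of a column name.

    "1. open" becomes "open"; a single-word name such as "date" is returned as is.
    """
    return name.split()[-1]


# Column names as they appear in the question's CSV header.
columns = ["date", "1. open", "2. high", "3. low", "4. close", "5. volume"]
cleaned = [strip_numeric_prefix(col) for col in columns]
print(cleaned)
```

On a DataFrame this would be applied as data.columns = [strip_numeric_prefix(col) for col in data.columns].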
0

You can explicitly specify new column names (different from the ones already in the file) while reading the file, as follows:

df = pd.read_csv('data.csv', names=['new_col1', 'new_col2', ...], header=0)

Here header=0 tells pandas to replace the header row that is in the file. If you don't pass names, the DataFrame takes the column names from the CSV file's header row by default.
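One detail worth checking when supplying names at read time is what happens to the header row already in the file. The sketch below uses a two-column in-memory stand-in for data.csv: passing names alone makes pandas treat the file as headerless, so the old header shows up as the first data row, while adding header=0 consumes and replaces it.

```python
import io
import pandas as pd

# Hypothetical two-column stand-in for data.csv.
csv_text = "date,1. open\n2020-01-14,316.7\n2020-01-15,311.85\n"
new_names = ["date", "open"]

# names only: the file is treated as headerless, so the old header row becomes data.
with_old_header = pd.read_csv(io.StringIO(csv_text), names=new_names)

# names plus header=0: the first row is consumed as the header and replaced.
replaced = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)

print(len(with_old_header), len(replaced))
```

A leftover header row also forces the numeric columns to be read as strings, which is usually the first visible symptom.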

1 Comment

This is perfect! I didn't realize that I could explicitly define how I want the columns. This solves the problem beautifully! Thank you very much, Vaibhav.
