8

I am importing a text file into pandas, and would like to concatenate 3 of the columns from the file to make the index.

I am open to doing this in 1 or more steps. I can either do the conversion at the same time I create the DataFrame, or I can create the DataFrame and restructure it with the newly created column. Knowing how to do this both ways would be the most helpful for me.

I would eventually like the index to be the result of concatenating the values in the first 3 columns.

3
  • What do you mean by 'concatenating the values'? Are they strings that you want to concatenate? Or do you want a multi-index? Commented Jul 23, 2013 at 20:24
  • A multi-index won't work. I am just trying to concatenate 3 strings. Each one is in a separate DataFrame field. Commented Jul 23, 2013 at 20:50
  • It would help if you posted the data (or at least part of it), and your code so far. Commented Jul 23, 2013 at 21:11

2 Answers

14

If your columns consist of strings, you can just use the + operator (in Python, adding strings concatenates them, and pandas follows this element-wise):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'year':['2012', '2012'], 'month':['01', '02']})

In [3]: df
Out[3]:
  month  year
0    01  2012
1    02  2012

In [4]: df['concatenated'] = df['year'] + df['month']

In [5]: df
Out[5]:
  month  year concatenated
0    01  2012       201201
1    02  2012       201202

Then, once this column exists, you can use set_index to make it the index:

In [6]: df = df.set_index('concatenated')

In [7]: df
Out[7]:
             month  year
concatenated
201201          01  2012
201202          02  2012
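
The question asks for three columns, while the example above joins two. The same pattern extends directly; the following is a minimal sketch, assuming all three columns hold strings. The 'day' and 'visits' columns and their values are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical data: 'day' and 'visits' are added to the year/month example.
df = pd.DataFrame({
    "year": ["2012", "2012"],
    "month": ["01", "02"],
    "day": ["15", "03"],
    "visits": [10, 20],
})

# Chain + for a plain join of the three string columns ...
key_plain = df["year"] + df["month"] + df["day"]

# ... or use Series.str.cat when a separator between the parts is wanted.
key_dashed = df["year"].str.cat([df["month"], df["day"]], sep="-")

# Store the key as a column, then move it into the index.
df["concatenated"] = key_plain
df = df.set_index("concatenated")
```

Using a separator (as in key_dashed) can make the key easier to split again later, at the cost of a longer index value.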

Note that pd.concat does not concatenate strings. It concatenates Series/DataFrames, that is, it combines the rows or columns of different DataFrames or Series into one object (rather than merging the values of several columns into one column). See http://pandas.pydata.org/pandas-docs/dev/merging.html for an extensive explanation of this.
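
To make the difference concrete, here is a small sketch that reuses the year/month frame from this answer. The variable names (stacked, side_by_side, joined) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"year": ["2012", "2012"], "month": ["01", "02"]})

# pd.concat stacks the two Series end to end: 4 rows, and the
# original row labels 0, 1 appear twice in the result's index.
stacked = pd.concat([df["year"], df["month"]])

# With axis=1 it places them next to each other as columns instead.
side_by_side = pd.concat([df["year"], df["month"]], axis=1)

# The + operator is what joins the strings row by row.
joined = df["year"] + df["month"]
```

Only joined has one value per original row, which is what an index built from the columns needs.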


2 Comments

If the month and year data were integers, you could concatenate with: df['concatenated'] = df['year'].apply(str) + df['month'].apply(str)
How do I reverse the process?
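
Both comments can be illustrated with one sketch. It assumes integer year/month columns and zero-pads the month to two digits (an addition not in the comment above) so that the key has a fixed width and can be split again; without padding, a key such as "20121" would be ambiguous. The column names year_from_key and month_from_key are hypothetical.

```python
import pandas as pd

# Integer columns, as in the first comment.
df = pd.DataFrame({"year": [2012, 2012], "month": [1, 2]})

# Convert to strings; pad the month so every key is exactly 6 characters.
df["concatenated"] = (
    df["year"].astype(str) + df["month"].astype(str).str.zfill(2)
)
df = df.set_index("concatenated")

# Reversing, step 1: reset_index turns the index back into a column.
restored = df.reset_index()

# Reversing, step 2: a fixed-width key can be sliced back into its parts.
restored["year_from_key"] = restored["concatenated"].str[:4]
restored["month_from_key"] = restored["concatenated"].str[4:]
```

If the original columns are kept in the frame, as here, step 1 alone is usually enough; slicing is only needed when the source columns were dropped.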
1

If you're using read_csv to import your text file, there is an index_col argument that you can pass a list of column names or numbers to. This will end up creating a MultiIndex - I'm not sure if that suits your application.
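
As a rough sketch of that route: the file contents below are made up (an in-memory string stands in for the text file, and the column names site, dept, code, visits are hypothetical). It shows index_col building a MultiIndex at import time, followed by one way to flatten it into a single concatenated string index if a MultiIndex is not wanted.

```python
import io

import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a file on disk.
raw = io.StringIO(
    "site,dept,code,visits\n"
    "NYC,sales,A1,10\n"
    "LON,ops,B2,20\n"
)

# Passing a list to index_col yields a three-level MultiIndex.
df = pd.read_csv(raw, index_col=["site", "dept", "code"])
was_multi = isinstance(df.index, pd.MultiIndex)

# Each MultiIndex entry is a tuple of strings; joining the tuple
# gives one flat string per row.
df.index = df.index.map("".join)
df.index.name = "concatenated"
```

This only works as written when every level holds strings; numeric-looking columns would need to be read as strings first.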

If you want to explicitly concatenate the columns into a single index (assuming that they are strings), it seems you can do so with the + operator. (Warning, untested code ahead)

df['concatenated'] = df['year'] + df['month']
df = df.set_index('concatenated')  # set_index returns a new DataFrame, so assign the result

7 Comments

Sounds logical, but when I try it I get a "Reindexing only valid with uniquely valued Index objects" error. Is there something I am missing? The DataFrame has the default auto-incrementing index, so I know it is unique.
@DJElbow: Seems like the set of concatenated fields has duplicates. An index has to be unique.
Just to clarify - I am getting this error before resetting the index. This is the test code I am using that is throwing the error: visits['concatenated'] = pd.concat([visits['year'],visits['month']])
@DJElbow: Maybe try pd.concat([visits['year'], visits['month']]).reindex_like(visits)?
This gives me the same error -> pd.concat([visits['year'], visits['month']]).reindex_like(visits)
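
A likely explanation for the error in this thread, sketched under the assumption that visits has a default integer index: pd.concat stacks the two columns, so every row label occurs twice in the result, and assigning that back as a column forces pandas to align on a non-unique index. The data below is made up, and the exact error message differs between pandas versions.

```python
import pandas as pd

# Made-up stand-in for the 'visits' frame from the comments.
visits = pd.DataFrame({"year": ["2012", "2012"], "month": ["01", "02"]})

# Stacking the columns repeats the row labels: 0, 1, 0, 1.
stacked = pd.concat([visits["year"], visits["month"]])
has_duplicate_labels = not stacked.index.is_unique

# Assigning the stacked Series as a column requires aligning it to the
# frame's index, which fails because of the duplicate labels.
try:
    visits["concatenated"] = stacked
    assignment_failed = False
except ValueError:
    assignment_failed = True

# The + operator joins the strings row by row and assigns cleanly.
visits["concatenated"] = visits["year"] + visits["month"]
```

So the duplicates come from the stacked Series itself, not from the frame's own index or from the concatenated values.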