I have a dataset with 5 rows that I want to merge into one row, so that I can use the result as unique column identifiers. For example:

Name          Unique No.  Summary  Nominal Voltage  Nominal Voltage  Upstream    Upstream
NaN           NaN         Class    Upstream         Downstream       Constraint  Oppurtunity
(non unique)  NaN         NaN      NaN              NaN              Physical    Nan

I would like the columns to be named:

Name (non unique)
Unique No.
Summary Class
Nominal Voltage Upstream
Nominal Voltage Downstream
Upstream Constraint Physical
Upstream Oppurtunity

So the rows (there are actually 5) would be merged, ignoring NaNs, into a single row that I could then use as unique column names.

Thanks in advance.

As far as I understand, groupby requires something in common between the things being grouped, so it can't be used here? The whole DataFrame is currently of string type because I thought that would make it easier to join the rows, but I couldn't figure out a way.

  • I may be misreading the documentation, but I didn't think that merge, join, or concat could do what is required here. They seem to join DataFrames, rather than taking the contents of multiple rows and returning them as one row. Commented Mar 28, 2017 at 13:49

1 Answer

I think you need apply with dropna, joining each column's name with the non-missing values below it:

df.columns = df.apply(lambda x: ' '.join([x.name] + x.dropna().tolist()))

print(df.columns.tolist())

['Name (non unique)',
 'Unique No.',
 'Summary Class',
 'Nominal Voltage Upstream',
 'Nominal Voltage Downstream',
 'Upstream Constraint Physical',
 'Upstream Oppurtunity Nan']
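
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The DataFrame is a hypothetical reconstruction of the question's table (the last cell is the string 'Nan', as in the output above), and `merge_header_rows` is an illustrative name, not something from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data: duplicate column
# labels, followed by two rows that carry the rest of each name.
df = pd.DataFrame(
    [
        [np.nan, np.nan, "Class", "Upstream", "Downstream", "Constraint", "Oppurtunity"],
        ["(non unique)", np.nan, np.nan, np.nan, np.nan, "Physical", "Nan"],
    ],
    columns=["Name", "Unique No.", "Summary", "Nominal Voltage",
             "Nominal Voltage", "Upstream", "Upstream"],
)


def merge_header_rows(frame):
    """Join each column label with the non-missing values in that column."""
    joined = frame.apply(lambda col: " ".join([col.name] + col.dropna().tolist()))
    return joined.tolist()


names = merge_header_rows(df)
print(names)
```

This assumes every non-missing cell is already a string, as stated in the question. If data rows followed the header rows, one would presumably pass only the header rows (for example `df.iloc[:n]` for some `n`) to the function.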

If some of the missing values are actually the string 'Nan', replace them with np.nan first:

import numpy as np

# Parentheses let the method chain span two lines
df.columns = (df.replace('Nan', np.nan)
                .apply(lambda x: ' '.join([x.name] + x.dropna().tolist())))
print(df.columns.tolist())

['Name (non unique)',
 'Unique No.',
 'Summary Class',
 'Nominal Voltage Upstream',
 'Nominal Voltage Downstream',
 'Upstream Constraint Physical',
 'Upstream Oppurtunity']
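
Since the question mentions that the whole frame was converted to strings, the placeholder may show up in several spellings ('Nan', 'NaN', 'nan'). A possible variant, sketched here under that assumption, filters them inside the joining function instead of calling replace. The sample data and the name `merged_name` are hypothetical.

```python
import numpy as np
import pandas as pd


def merged_name(col):
    """Join a column's label with its values, skipping NaN in any spelling."""
    # dropna() removes real missing values; the comparison removes string ones.
    parts = [str(v) for v in col.dropna() if str(v).strip().lower() != "nan"]
    return " ".join([str(col.name)] + parts)


# Hypothetical sample mixing real NaN with the strings 'NaN', 'Nan' and 'nan'.
df = pd.DataFrame(
    [
        ["NaN", "Constraint", "Oppurtunity"],
        ["Class", "Physical", "Nan"],
        [np.nan, "nan", np.nan],
    ],
    columns=["Summary", "Upstream", "Upstream"],
)

names = df.apply(merged_name).tolist()
print(names)
```

The trade-off is that a genuine value spelled "nan" would also be dropped, so this only suits data where that string never carries meaning.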

But if you only need unique column names, the simplest way is:

df.columns = range(len(df.columns))
print(df.columns.tolist())
[0, 1, 2, 3, 4, 5, 6]

Or assign your own unique column names:

df.columns = list('abcdefg')
print(df.columns.tolist())
['a', 'b', 'c', 'd', 'e', 'f', 'g']
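
Whichever naming scheme is used, it may be worth checking that the resulting labels really are unique before later code looks columns up by name. A minimal sketch using `Index.is_unique` and `Index.duplicated`; the helper name `assert_unique_columns` and the sample frame are hypothetical.

```python
import pandas as pd


def assert_unique_columns(frame):
    """Raise ValueError naming any column label that occurs more than once."""
    dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")
    return frame


# Hypothetical frame whose original labels repeat 'Upstream'.
raw = pd.DataFrame([[1, 2, 3]], columns=["Upstream", "Upstream", "Summary"])
print(raw.columns.is_unique)

renamed = raw.copy()
renamed.columns = ["Upstream Constraint Physical", "Upstream Oppurtunity", "Summary Class"]
assert_unique_columns(renamed)

# With unique labels, selecting by name returns a single column.
print(renamed["Summary Class"].tolist())
```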

1 Comment

Thank you, so apply is the way! (I appreciate that columns a-z etc. would be easier, but I need the titles so that later code can check and identify the columns, as they aren't always in the same order.)
