
I want to create a MultiIndex DataFrame in pandas. However, I have two values (a string and a float) that are unique per data set and, to my understanding, should be on the highest level. Each of my data sets has one string with its respective float value, plus 4 features whose values are spread over 16 columns.

What is the correct/pythonic way to create such a DataFrame?

1.) Having the value as a level?

2.) Inserting the value as its own column in the set, repeated 4 times (once per feature)?

3.) Something more elegant I am not aware of?

[Image: sample data]

If you think this question is inappropriate for whatever reason, please let me know why in a short comment instead of just downvoting. Thanks a lot!

  • Can you add some input sample data? Commented Oct 27, 2017 at 11:31
  • "I have 2 values (string and float)": don't forget to add a sample of them here. Commented Oct 27, 2017 at 11:31
  • @Bharath shetty, jezrael: I added the sample data as a picture; I didn't know how else to do it. Commented Oct 27, 2017 at 11:46

1 Answer


I think the best approach is to create a DataFrame with:

  • Name, Features, value: a MultiIndex with 3 levels
  • Pos1 to Pos16: the columns

But it all depends on what you need to do with the data later.
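The layout described in this answer can be sketched as follows. This is a minimal example under stated assumptions: the names `sample_A` and `sample_B`, their float values, the feature labels `F1` to `F4`, and the random numbers standing in for the measurements are all hypothetical, since the original sample data was only posted as an image.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata: one name (string) with one float value per data set,
# and four features per data set. None of these labels come from the question.
datasets = [("sample_A", 0.25), ("sample_B", 1.75)]
features = ["F1", "F2", "F3", "F4"]
positions = [f"Pos{i}" for i in range(1, 17)]  # Pos1 ... Pos16

# One row per (name, feature, value); the value is repeated for each feature,
# but it lives in the index rather than in the numeric data columns.
rows = [(name, feat, value) for name, value in datasets for feat in features]
index = pd.MultiIndex.from_tuples(rows, names=["Name", "Features", "value"])

# Random numbers stand in for the real measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((len(index), len(positions))),
                  index=index, columns=positions)
print(df.shape)  # (8, 16)
```

Because the float value is an index level, it can be pulled out per row with `df.index.get_level_values("value")`, for example to correlate it with the data or to use it as a plotting variable, while every data column stays numeric.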


5 Comments

So what you would suggest is option 1 in the question, right? I want to use the values frequently, correlate them with the values of the features, and maybe also use them as a variable when plotting my data.
Yes, but in my opinion it is best not to mix string columns with numeric ones. If you want a simple operation like df = df * 10 and there is a string column, it will not work as intended (multiplying repeats the strings, and df + 10 raises a TypeError). But if the strings are in the MultiIndex and all data columns are numeric, it works nicely. The best approach is to separate the data (in the columns) from the metadata (in the index).
Following your argument, would it be best to put the strings in the index? I thought the first level of a MultiIndex DataFrame is treated like a kind of index. Is that incorrect?
It is still up to you ;) Whether to use the index or the columns depends on how you use the data. But the simplest is selecting by columns, like df['col']; with MultiIndex columns you need something like df.xs('col', axis=1, level=0). But all of these work ;)
Thanks for that last comment, it really made things clear for me!
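A small sketch of the two points made in the comments above, using hypothetical names and numbers: keeping the string metadata in the index leaves only numeric data columns, so `df * 10` behaves as expected, and selecting from flat columns (`df['col']`) is simpler than from MultiIndex columns (`df.xs('col', axis=1, level=0)`), although both work. `reset_index` shows that an index level can be turned back into an ordinary column at any time.

```python
import pandas as pd

# Hypothetical mixed layout: the string name sits next to the numeric data.
mixed = pd.DataFrame({"Name": ["sample_A", "sample_B"],
                      "Pos1": [1.0, 2.0],
                      "Pos2": [3.0, 4.0]})

# Whole-frame arithmetic also reaches the string column: "+" raises a
# TypeError, while "*" would silently repeat each string.
try:
    mixed + 10
    mixed_add_failed = False
except TypeError:
    mixed_add_failed = True

# Metadata moved into the index: every remaining column is numeric,
# so whole-frame arithmetic scales only the data.
flat = mixed.set_index("Name")
scaled = flat * 10

# Flat columns: plain bracket selection.
pos1 = flat["Pos1"]

# Name as the top level of MultiIndex columns: xs selects one name
# and drops that level, leaving the columns Pos1 and Pos2.
cols = pd.MultiIndex.from_product([["sample_A", "sample_B"], ["Pos1", "Pos2"]],
                                  names=["Name", "Pos"])
wide = pd.DataFrame([[1.0, 3.0, 2.0, 4.0]], columns=cols)
sample_a = wide.xs("sample_A", axis=1, level=0)

# Index levels and columns are interchangeable: reset_index turns the
# "Name" index back into an ordinary column.
as_column = flat.reset_index()
print(scaled)
print(sample_a)
```

For the top column level, `wide["sample_A"]` returns the same frame as the `xs` call; `xs` becomes necessary when selecting on a lower level.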
