I have a pandas DataFrame for which df.info() prints the following:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6661 entries, 0 to 6660
Data columns (total 3 columns):
value      6661 non-null float64
country    6477 non-null object
outlier    6661 non-null int64
dtypes: float64(1), int64(1), object(1)
memory usage: 208.2+ KB
None 

df.columns.values prints the following:

[u'value' 'country' 'outlier'] 

df prints the following:

       value country  outlier
0     118.66   CHINA        0
1     120.83   CHINA        0
2      86.83     USA        0
3     112.15   CHINA        0
4     113.60   CHINA        0
5     114.32   CHINA        1
6     111.43   CHINA        0
7     117.22   CHINA        1
8     111.43   CHINA        0

- - - - - - - - - - - - - - -

- - - - - - - - - - - - - - -

6652  420.00     USA        0
6653  420.00     USA        0
6654  500.00     USA        0
6655  500.00     USA        0
6656  390.00     USA        1
6657  450.00     USA        0
6658  420.00     USA        0
6659  420.00     USA        1
6660  450.00     USA        0

A value of 1 in the outlier column marks the row as an outlier, and I would like to visualize the values for each country without the outliers. I should mention that the existing DataFrame index is not to be used: each country needs its own index. To clarify, the row at DataFrame index 2 is USA data (2 86.83 USA 0), and it becomes index 0 for the USA. Likewise, index 2 for China is the row (3 112.15 CHINA 0), and so on.

I tried the following code snippet, but it didn't work as expected.

import matplotlib.pyplot as plt
df.plot.bar()
df.plot()
plt.show(block=True)

How can I do that properly?

2 Comments
  • What type of plot are you looking for? There are many ways to "visualize the value for respective countries". You must be more specific. Commented Mar 10, 2017 at 4:32
  • Please have a look at the question. I would like a simple line graph with the values on the Y-axis and the indexes for the respective countries on the X-axis. Commented Mar 10, 2017 at 4:33

1 Answer

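I think you can first filter the rows by the outlier flag and then reshape the DataFrame with pivot. The example below keeps the rows where outlier is 1; to leave the outliers out instead, as the question asks, filter with df.outlier == 0: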

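# filter by the flag; .copy() avoids SettingWithCopyWarning on the assignment below
df = df[df.outlier == 1].copy()
# running counter 0, 1, 2, ... within each country
df['g'] = df.groupby('country').cumcount()

# one column per country, indexed by the per-country counter
df = df.pivot(index='g', columns='country', values='value')
print(df)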
country   CHINA    USA
g                     
0        114.32  390.0
1        117.22  420.0

df.plot()
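
Since the question asks to leave the outliers out, the same pivot approach can be run with the filter inverted. Here is a minimal sketch, assuming a small hand-built sample made from the first nine rows printed in the question; the names `clean` and `wide` are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample built from the first nine rows printed in the question (assumption:
# the real frame has 6661 rows; this is only a stand-in for illustration).
df = pd.DataFrame({
    "value": [118.66, 120.83, 86.83, 112.15, 113.60,
              114.32, 111.43, 117.22, 111.43],
    "country": ["CHINA", "CHINA", "USA", "CHINA", "CHINA",
                "CHINA", "CHINA", "CHINA", "CHINA"],
    "outlier": [0, 0, 0, 0, 0, 1, 0, 1, 0],
})

# Keep only the rows that are NOT flagged as outliers.
clean = df[df["outlier"] == 0].copy()

# Number the remaining rows 0, 1, 2, ... separately within each country.
clean["g"] = clean.groupby("country").cumcount()

# One column per country, indexed by the per-country counter.
wide = clean.pivot(index="g", columns="country", values="value")
print(wide)
```

With this sample, the row at DataFrame index 2 becomes index 0 in the USA column, and the row at DataFrame index 3 becomes index 2 in the CHINA column, which matches the re-indexing described in the question. The `df.info()` output in the question shows that some rows have no country; dropping them first with `dropna(subset=["country"])` is probably the safest choice before grouping.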

Another solution is groupby with unstack:

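import pandas as pd

# start again from the original df, not the pivoted one
df = df[df.outlier == 1]
# give each country a fresh 0..n-1 index, then move country into the columns
df = df.groupby('country')['value'].apply(lambda x: pd.Series(x.values)).unstack(0)

print(df)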
country   CHINA    USA
0        114.32  390.0
1        117.22  420.0

df.plot()
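
The asker's comment asks for a simple line graph with the values on the Y-axis and the per-country index on the X-axis. Here is a minimal sketch of that last step, assuming a small made-up frame in the wide shape that the reshaping above produces; the name `wide` and the axis label texts are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide frame: one column per country, rows are the
# per-country index. NaN pads the shorter column.
wide = pd.DataFrame({
    "CHINA": [118.66, 120.83, 112.15],
    "USA": [86.83, float("nan"), float("nan")],
})
wide.index.name = "g"

# DataFrame.plot() draws a line plot by default, one line per column.
ax = wide.plot()
ax.set_xlabel("index within country")
ax.set_ylabel("value")
```

With an interactive backend, `plt.show()` would then display the figure. Countries with fewer rows simply produce shorter lines, because the padded NaN values are not drawn.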

1 Comment

Can we talk a little in chat? I need to ask you something. I partially solved the issue, but it still needs some modification.
