Matplotlib logarithmic scale with zero value [duplicate]

Question

I have a very large, sparse dataset of spam Twitter accounts, and I need to scale the x axis to be able to visualize the distribution (histogram, KDE, etc.) and the CDF of the various variables (tweets_count, number of followers/following, etc.).

    > describe(spammers_class1$tweets_count)
      var       n   mean      sd median trimmed mad min    max  range  skew kurtosis   se
    1   1 1076817 443.47 3729.05     35   57.29  43   0 669873 669873 53.23  5974.73 3.59

In this dataset, the value 0 is very important (in fact, 0 should have the highest density). However, on a logarithmic scale these values are ignored. I thought of changing the value to 0.1, for example, but it makes no sense to say that there are spam accounts with 10^-1 followers.

So, what would be a workaround in Python and matplotlib?

It would be nice if you posted your axes/plot code so that it can be corrected.
– Stephane Rolland, commented May 5, 2013 at 9:29

unutbu · Accepted Answer · 2013-05-05 10:25:27Z

2

Add 1 to each x value, then take the log:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker

fig, ax = plt.subplots()
x = [0, 10, 100, 1000]
y = [100, 20, 10, 50]
# Shift x by 1 so that x=0 lands on log(1)=0 instead of log(0)=-inf
x = np.asarray(x) + 1
y = np.asarray(y)
ax.plot(x, y)
ax.set_xscale('log')
ax.set_xlim(x.min(), x.max())
# Label each tick with the original value (shifted position minus 1)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda val, pos: '{0:g}'.format(val-1)))
# Put the major ticks exactly at the shifted data positions
ax.xaxis.set_major_locator(ticker.FixedLocator(x))
plt.show()

Use

ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{0:g}'.format(x-1)))
ax.xaxis.set_major_locator(ticker.FixedLocator(x))

to relabel the tick marks according to the non-log values of x.
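
The relabeling can be checked without drawing anything. This is a minimal sketch, assuming the same shift of 1 as in the answer; `unshifted_label` is a hypothetical name for the function handed to `FuncFormatter`, and the data are the four x values used above.

```python
import numpy as np


def unshifted_label(value, pos=None):
    """Format a shifted tick position as the original (unshifted) value."""
    return '{0:g}'.format(float(value) - 1)


original = np.array([0, 10, 100, 1000])
shifted = original + 1  # positions actually drawn on the log axis

# Each tick sits at original + 1 but is labeled with the original value
labels = [unshifted_label(v) for v in shifted]
print(labels)
```

So the tick drawn at position 1 (that is, 10^0) carries the label 0, which is what lets the axis read "0 followers" rather than "1 follower".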

(My original suggestion was to use plt.xticks(x, x-1), but this would affect all axes. To isolate the changes to one particular axes, I changed all the calls on plt to calls on ax.)

matplotlib removes points which contain a NaN, inf or -inf value. Since log(0) is -inf, the point corresponding to x=0 would be removed from a log plot.

If you increase all the x-values by 1, then, since log(1) = 0, the point corresponding to x=0 will now be plotted at x=log(1)=0 on the log plot.

The remaining x-values will also be shifted by one, but it will not matter to the eye since log(x+1) is very close to log(x) for large values of x.
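
To put a number on "very close", here is a small sketch that measures how far each point moves, in decades, when x+1 is plotted instead of x. The sample values are made up for illustration.

```python
import numpy as np

x = np.array([1, 10, 100, 1000, 10000], dtype=float)

# Horizontal displacement on the log axis, in decades, caused by the +1 shift
shift = np.log10(x + 1) - np.log10(x)

for value, delta in zip(x, shift):
    print('x = {0:g}: moved by {1:.5f} decades'.format(value, delta))
```

The displacement is largest at x=1, where it equals log10(2), about 0.3 of a decade, and it shrinks steadily as x grows. The distortion is therefore concentrated in the small values, which is worth keeping in mind if the low end of the distribution is what you care about.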

edited May 5, 2013 at 10:25

answered May 5, 2013 at 9:35

unutbu

10 Comments

amaatouq Over a year ago

Yes, but then I will not be able to say in my paper that 50% of spammers have 0 followers, because it will be shown as 10^0, which would mean that they have one follower (which is different).

unutbu Over a year ago

You could relabel the tick marks with plt.xticks. I've edited the post to show how.

amaatouq Over a year ago

I would rather not shift all of the data. How can I efficiently add 0.1 to the 0 values only, so that they show up at 10^-1, and then relabel the ticks? I know this is another question, but it might be a better way of doing it without contaminating all of the data (shifting only the 0 values), and looping over large numpy arrays is very slow.

unutbu Over a year ago

If you have an array with many 0 values, you can change them to 0.1 with x[x<=0] = 0.1. Note that if the array is of dtype int, then you must first convert the array to dtype float: x = x.astype('float').
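
The suggestion in this comment can be sketched as follows. The array contents are made up for illustration, and the `np.where` line is an added alternative that is not part of the comment.

```python
import numpy as np

counts = np.array([0, 0, 3, 0, 250, 12])  # made-up integer data

# Convert first: assigning 0.1 into an int array would truncate it to 0
x = counts.astype(float)
x[x <= 0] = 0.1  # vectorized boolean-mask assignment, no Python loop

# Equivalent result without modifying an existing array in place
x_alt = np.where(counts <= 0, 0.1, counts)

print(x)
```

Both forms leave `counts` untouched, so the original data stay available for any statistics computed outside the plot.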

tacaswell Over a year ago

I protest in the strongest terms to modifying data before plotting it.
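
One way to keep zero on the axis without touching the data, which nobody in this thread proposed, is matplotlib's built-in `'symlog'` scale: it is linear in a small band around zero and logarithmic outside it. The sketch below assumes matplotlib 3.3 or later, where the keyword is `linthresh` (older releases used `linthreshx`); `plot_symlog` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def plot_symlog(x, y, linthresh=1.0):
    """Plot unmodified data on an x axis that is linear within
    [-linthresh, linthresh] and logarithmic beyond it."""
    fig, ax = plt.subplots()
    ax.plot(x, y, marker='o')
    # Assumes matplotlib >= 3.3 for the `linthresh` keyword
    ax.set_xscale('symlog', linthresh=linthresh)
    ax.set_xlim(0, max(x))  # zero is a valid limit on a symlog axis
    return fig, ax


if __name__ == '__main__':
    fig, ax = plot_symlog([0, 10, 100, 1000], [100, 20, 10, 50])
    plt.show()
```

With this scale the point at x=0 is drawn at a tick that really is labeled 0, so no relabeling is needed. The price is that the stretch between 0 and `linthresh` is linear, which should be stated in the figure caption.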

Stephane Rolland · Answer · 2013-05-05 09:25:02Z

0

ax1.set_xlim(0, 1e3)

Here is the example from matplotlib documentation.

And there it sets the limit values of the axes this way:

ax1.set_xlim(1e1, 1e3)
ax1.set_ylim(1e2, 1e3)

answered May 5, 2013 at 9:25

Stephane Rolland


2 Comments

amaatouq Over a year ago

This doesn't show how to deal with zero values on a logarithmic scale. Since log(0) is undefined, matplotlib will ignore these values. Setting the xlim to 1e1 will make the x axis start from 10 and would still ignore 0 (I believe). I'll try it out anyway.

poleguy Over a year ago

At least as of July 2015, matplotlib is not ignoring zeros: it draws a straight line on the log plot all the way to the edge of the plot, which looks terrible and doesn't match MATLAB. hayer's comment doesn't seem true to me.
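
If the line running to the edge of the plot is the problem, the non-positive points can be masked explicitly so that the line simply breaks there. This is only a sketch of that one option, assuming you really do want those points left out of the drawing; `plot_log_masking_zeros` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_log_masking_zeros(x, y):
    """Plot on a log x axis, leaving a gap where x <= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Masked entries are skipped when the line is drawn
    x_masked = np.ma.masked_less_equal(x, 0)
    fig, ax = plt.subplots()
    ax.plot(x_masked, y, marker='o')
    ax.set_xscale('log')
    return fig, ax


if __name__ == '__main__':
    fig, ax = plot_log_masking_zeros([0, 10, 100, 1000], [100, 20, 10, 50])
    plt.show()
```

This hides the zeros rather than showing them, so it does not answer the original question. It only makes the omission deliberate instead of depending on how a particular matplotlib version treats log(0).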
