Plot topics with bokeh or matplotlib

Question

I'm trying to plot a visualization of the topics from a model. I want to do something like the Bokeh covariance implementation.

My data is:

data 1: index, topics
data 2: index, topics, weights (used for color)

where each topic is just a set of words.

How do I give this data to Bokeh so it can be plotted? From the example, the data handling is not intuitive.

With matplotlib, it looks like this.
Obviously, this does not make it easy to see which topic corresponds to each circle. Here is my matplotlib code:

import numpy as np
import matplotlib.pyplot as plt

# `joined` is the asker's data: rows with 'index' and 'score' fields
x = []
y = []
area = []

for row in joined:
    x.append(row['index'])
    y.append(row['index'])
    # weight.append(row['score'])
    area.append(np.pi * (15 * row['score'])**2)

scale_values = 1000
plt.scatter(x, y, s=scale_values * np.array(area), alpha=0.5)
plt.show()

Any idea/suggestions?

bigreddot · Accepted Answer · 2019-10-20 00:49:54Z

15

UPDATE: The answer below is still correct in all major points, but the API has changed slightly to be more explicit as of Bokeh 0.7. In general, things like:

rect(...)

should be replaced with

p = figure(...)
p.rect(...)

Here are the relevant lines from the Les Mis examples, simplified to your case. Let's take a look:

# A "ColumnDataSource" is like a dict: it maps names to columns of data.
# These names are not special; we can call the columns whatever we like.
source = ColumnDataSource(
    data=dict(
        x = [row['name'] for row in joined],
        y = [row['name'] for row in joined],
        color = list_of_colors_one_for_each_row, 
    )
)

# We need a list of the categorical coordinates
names = list(set(row['name'] for row in joined))

# rect takes center coords (x,y) and width and height. We will draw 
# one rectangle for each row.
rect('x', 'y',        # use the 'x' and 'y' fields from the data source
     0.9, 0.9,        # use 0.9 for both width and height of each rectangle 
     color = 'color', # use the 'color' field to set the color
     source = source, # use the data source we created above
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)
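With the figure-based API described in the update at the top of this answer, the same idea might look roughly like the sketch below. The rows in `joined`, the `words` field, and the colors are hypothetical stand-ins for the asker's data, and the `tooltips` argument is included only to show how a hover tool can read extra columns from the data source.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical rows standing in for `joined`; the real field names may differ.
joined = [
    {"name": "topic 0", "words": "model, data, learning", "color": "red"},
    {"name": "topic 1", "words": "game, team, season", "color": "orange"},
    {"name": "topic 2", "words": "market, price, trade", "color": "blue"},
]

# Categorical coordinates, kept in row order (the names must be unique).
names = [row["name"] for row in joined]

source = ColumnDataSource(data=dict(
    x=names,
    y=names,
    words=[row["words"] for row in joined],
    color=[row["color"] for row in joined],
))

# The categorical ranges now go on the figure. `tooltips` adds a hover
# tool that reads columns from the data source via the "@column" syntax.
p = figure(x_range=names, y_range=names,
           tooltips=[("topic", "@x"), ("words", "@words")])

# One rectangle per row, centered on the diagonal.
p.rect(x="x", y="y", width=0.9, height=0.9, color="color", source=source)
```

Calling `show(p)` (imported from `bokeh.plotting`) should then open the plot in a browser, where hovering over a rectangle displays the words of that topic.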

A few notes:

For numeric data, x_range and y_range usually get supplied automatically. We have to give them explicitly here because we are using categorical coordinates.
You can order the list of names for x_range and y_range however you like; this is the order in which they are displayed on the plot axes.
I'm assuming you want to use categorical coordinates. :) This is what the Les Mis example does. See the bottom of this answer if you want numerical coordinates.
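Because `set` does not guarantee any particular order, the axis order produced by `list(set(...))` can vary. A minimal sketch of two ways to pin the order down, using hypothetical rows in place of `joined`:

```python
# Hypothetical rows; only the 'name' field matters here.
joined = [
    {"name": "sports"},
    {"name": "politics"},
    {"name": "sports"},
    {"name": "tech"},
]

# Keep first-seen order while dropping duplicates
# (dicts preserve insertion order in Python 3.7+).
names_in_order = list(dict.fromkeys(row["name"] for row in joined))

# Or sort alphabetically for a stable, predictable axis.
names_sorted = sorted(set(row["name"] for row in joined))

print(names_in_order)
print(names_sorted)
```

Either list can be passed as x_range and y_range in place of `names`.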

Also, the Les Mis example was a little more complicated (it had a hover tool) which is why we created a ColumnDataSource by hand. If you just need a simple plot you can probably skip creating a data source yourself, and just pass the data in to rect directly:

names = list(set(row['name'] for row in joined))

rect(names,    # x (categorical) coordinate for each rectangle
     names,    # y (categorical) coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
     x_range = names, # sequence of categorical coords for x-axis
     y_range = names, # sequence of categorical coords for y-axis
)
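The snippets above leave `list_of_colors_one_for_each_row` and `some_colors` undefined. Since the question mentions using the weights for color, one possible way to build such a list is to interpolate between two RGB colors. This is only a sketch: the `weight_to_hex` helper, the two endpoint colors, and the sample weights are all assumptions, not part of Bokeh or of the original data.

```python
def weight_to_hex(weight, lo=0.0, hi=1.0):
    """Map a weight in [lo, hi] to a hex color from light grey to dark blue."""
    # Normalize to [0, 1], guarding against a zero-width range.
    t = 0.0 if hi == lo else (weight - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)

    start = (230, 230, 230)  # light grey for the smallest weight
    end = (8, 48, 107)       # dark blue for the largest weight

    # Linear interpolation per channel.
    r, g, b = (round(s + t * (e - s)) for s, e in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


# Hypothetical weights, one per row.
weights = [0.1, 0.4, 0.9]
some_colors = [weight_to_hex(w, min(weights), max(weights)) for w in weights]
print(some_colors)
```

Bokeh accepts hex strings like these anywhere a color is expected, so the resulting list can be passed as the color argument or stored in a data source column.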

Another note: this only plots rectangles on the diagonal, where the x- and y-coordinates are the same. That seems to be what you want from your description. But just for completeness, it's possible to plot rectangles that have different x- and y-coordinates. The Les Mis example does this.

Finally, maybe you don't actually want categorical axes? If you just want to use the numeric index of the coordinates, it's even simpler:

inds = [row['index'] for row in joined]

rect(inds,    # x-coordinate for each rectangle
     inds,    # y-coordinate for each rectangle
     0.9, 0.9, # use 0.9 for both width and height of each rectangle
     color = some_colors, # color for each rect
)

Edit: Here is a complete runnable example that uses numeric coords:

from bokeh.plotting import * 

output_file("foo.html")

inds = [2, 5, 6, 8, 9]
colors = ["red", "orange", "blue", "green", "#4488aa"]

rect(inds, inds, 1.0, 1.0, color=colors)

show()

and here is one that uses the same values as categorical coords:

from bokeh.plotting import * 

output_file("foo.html")

inds = [str(x) for x in [2, 5, 6, 8, 9]]
colors = ["red", "orange", "blue", "green", "#4488aa"]

rect(inds, inds, 1.0, 1.0, color=colors, x_range=inds, y_range=inds)

show()
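On Bokeh 0.7 and later, following the update at the top of this answer, the categorical example might be written roughly as follows. This is a sketch rather than a tested port for every Bokeh version. The only changes are that the ranges move to `figure` and `rect` becomes a method on the figure.

```python
from bokeh.plotting import figure, output_file

output_file("foo.html")

# Same values as above, converted to strings so they act as categories.
inds = [str(x) for x in [2, 5, 6, 8, 9]]
colors = ["red", "orange", "blue", "green", "#4488aa"]

# Categorical ranges are set on the figure instead of on the glyph call.
p = figure(x_range=inds, y_range=inds)

# One square per category, drawn on the diagonal.
p.rect(x=inds, y=inds, width=1.0, height=1.0, color=colors)
```

Calling `show(p)` (also importable from `bokeh.plotting`) should then write foo.html and open it in a browser. For the numeric version, drop the `str` conversion and the two range arguments.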

edited Oct 20, 2019 at 0:49

answered Mar 29, 2014 at 9:28

bigreddot

3 Comments

sb32134 Over a year ago

I'm trying to get the color only when the coordinates are the same, and grey for the rest. But it's not working even though the color array is OK. pastebin.com/jKQGJDRr

karan.dodia Over a year ago

Hi @sb32134, I noted a couple of bugs in your code that prevented Bokeh from rendering exactly what you wanted. Your color list was generated properly, but there's a distinction between the inds indices you were building and the categories you wanted to appear on the plot. I've put up an IPython Notebook which will hopefully clarify the issue: wakari.io/sharing/bundle/kpsfire/Categorical Hope that helps!

sb32134 Over a year ago

Thanks to both @kpsfire and bigreddot. Your replies really motivated me over the last few days. I also realized that it is one thing to get a beautiful visualization, but a visualization that gives insight into the data is the real visualization. I now find Bokeh really useful, and I'll be exploring more with it.
