I am trying to recreate the Pólya urn model (https://en.wikipedia.org/wiki/Pólya_urn_model) in R, with ggplot. The model starts with 1 white and 1 black ball in an 'urn'; at each step one ball is chosen at random and put back together with another ball of the same color. I do this in R for, let's say, 10 iterations (so 10 times I take out one ball and put it back together with another ball of the same color), and I repeat the whole thing, say, 5 times. Thus, I get a data frame with 5 columns (one for each run) and 10 rows (one for each iteration).
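For readers who want to reproduce this kind of data, here is a minimal sketch of such a simulation. The function name `simulate_urn`, the object name `df_urn`, and the seed are assumptions made for illustration and are not taken from my actual script.

```r
# Simulate one run of a Pólya urn: start with 1 white and 1 black ball,
# draw a ball at random, and return it with one extra ball of the same color.
# Returns the fraction of white balls in the urn after each draw.
simulate_urn <- function(n_iter) {
  white <- 1
  black <- 1
  frac <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    # A white ball is drawn with probability white / (white + black)
    if (runif(1) < white / (white + black)) {
      white <- white + 1
    } else {
      black <- black + 1
    }
    frac[i] <- white / (white + black)
  }
  frac
}

set.seed(1)   # arbitrary seed, only for reproducibility
n_runs <- 5   # number of independent runs (columns)
n_iter <- 10  # number of draws per run (rows)

# replicate() returns an n_iter x n_runs matrix; the data frame
# then has one column per run and one row per iteration
df_urn <- as.data.frame(replicate(n_runs, simulate_urn(n_iter)))
dim(df_urn)
```

After the first draw the urn holds three balls, so the first row can only contain 1/3 or 2/3, which matches the first row of the data shown further down.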

What I want to produce is a plot like this, although that picture obviously has many more trials and iterations.

What I have so far is a data frame where each column holds the fraction of white balls in the urn for one trial/run, and I would like to illustrate how the proportion changes with each iteration. I want to show this separately for each run, with each run in a different color.

I have looked through countless similar questions but did not find an answer. I think it's because my data frame has 5 columns, but when I reshape it I get only a single column of proportions, with a code next to each value indicating which column it came from. In this case ggplot draws only one single line in 4 colors.

My data frame looks like this:
          V1         V2         V3        V4 id
1  0.3333333 0.33333333 0.33333333 0.3333333  1
2  0.5000000 0.25000000 0.25000000 0.2500000  2
3  0.4000000 0.20000000 0.20000000 0.4000000  3
4  0.3333333 0.16666667 0.16666667 0.3333333  4
5  0.2857143 0.14285714 0.14285714 0.2857143  5
6  0.2500000 0.12500000 0.12500000 0.3750000  6
7  0.2222222 0.11111111 0.11111111 0.3333333  7
8  0.2000000 0.10000000 0.10000000 0.3000000  8
9  0.1818182 0.09090909 0.09090909 0.2727273  9
10 0.2500000 0.08333333 0.08333333 0.2500000 10

To make it easier, here's some test code:

V1 <- rnorm(10, 0.5, 0.1)
V2 <- rnorm(10, 0.5, 0.1)
V3 <- rnorm(10, 0.5, 0.1)
V4 <- rnorm(10, 0.5, 0.1)
df <- data.frame(V1, V2, V3, V4)

My code for the ggplot is the following:

library(ggplot2)
library(reshape2)
df$id = row.names(df) # add id to each row 
df_long = melt(df, id.vars = "id")  # reshape the data into long format

This first version only depicts the points:

ggplot(df_long, aes(x = value, y = id, color = variable)) + 
geom_point() 

And this version somehow gets the lines 'messed up', and I cannot figure out why:

ggplot() + geom_line(data = df_long, aes(x = value, y = id, color = variable, group = variable)) + xlab("x axis") +  ylab("y axis")

Any help would be appreciated, I've been really struggling for days with this and couldn't make any significant breakthroughs so far.

EDIT: By 'messed up' I mean that instead of plotting one line per run (which is what I want), the data points seem to lose track of which trial/run they belong to. So instead of one line per run/trial, I get more lines, some of which connect only 2-3 points and often connect points from different runs. I hope my explanation is clear enough.
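One thing that may be contributing here, sketched below under the assumption that the data frame has default row names: `row.names()` returns character strings, so an `id` built from it is treated as a discrete variable and sorts alphabetically rather than numerically. In addition, `geom_line()` connects observations in order of the x variable, so mapping the proportions to x and the iteration to y will not trace each run in iteration order. The names `id_chr` and `id_num` and the seed are only for illustration.

```r
set.seed(42)  # arbitrary seed
df <- data.frame(V1 = rnorm(10, 0.5, 0.1), V2 = rnorm(10, 0.5, 0.1))

# row.names() gives character labels, not numbers
id_chr <- row.names(df)
class(id_chr)   # character
sort(id_chr)    # "10" sorts directly after "1", before "2"

# A numeric index keeps the iterations in their natural order
id_num <- seq_len(nrow(df))
class(id_num)   # integer
```

With a numeric index mapped to the x axis and the proportions to the y axis, each group should be drawn as one line running from iteration 1 to iteration 10.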

Can you define 'messed up'? I am seeing a graph with one line per value of variable, V1 to V4. Commented May 18, 2018 at 11:57

2 Answers

3

If I understood you correctly, this connects the points of each run as intended. Please check whether it is what you want.

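library(reshape2)
library(ggplot2)

df$id <- 1:nrow(df)  # numeric iteration index (row.names() would give characters)
final_data <- melt(df, id.vars = 'id')  # long format: one row per (id, run) pair
names(final_data) <- c('id', 'func', 'value')

# iteration on the x axis, proportion on the y axis, one line per run
ggplot() + geom_line(data = final_data, aes(x = id, y = value, color = func, group = func), size = 1)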

Output:

          V1        V2        V3        V4 id
1  0.4656275 0.4846357 0.4613710 0.5885883  1
2  0.4312952 0.4929042 0.5499502 0.5133333  2
3  0.5890201 0.4652452 0.5598206 0.4789956  3
4  0.7108441 0.4143140 0.5738660 0.4073124  4
5  0.6374072 0.6671785 0.5111608 0.4475132  5
6  0.4797948 0.6191391 0.5423101 0.4472512  6
7  0.5868793 0.5601147 0.4369428 0.5696494  7
8  0.5169970 0.4398982 0.5137524 0.3923140  8
9  0.3960616 0.3552303 0.4174657 0.4449402  9
10 0.5222120 0.5028562 0.5760920 0.4310323 10

2

Using your df you can do something like this:

library(tidyverse)

# I use 'gather' instead of 'melt'
df_long = df %>% 
  mutate(id = 1:nrow(.)) %>% 
  gather(id.vars, values, -id) 

# iteration (id) on the x axis, proportion (values) on the y axis
df_long %>% 
  ggplot(aes(x = id, y = values, group = id.vars, color = id.vars)) + 
  geom_line(size = 1) 
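As a side note, `gather()` is marked as superseded in recent tidyr releases in favour of `pivot_longer()`. A rough equivalent might look like the sketch below, assuming tidyr 1.0.0 or later is installed. The toy data and the names `df_wide`, `df_long2`, `run`, `fraction` and `p` are made up for illustration.

```r
library(tidyr)
library(ggplot2)

set.seed(1)  # arbitrary seed so the toy data is reproducible
df_wide <- data.frame(
  id = 1:10,                    # numeric iteration index
  V1 = rnorm(10, 0.5, 0.1),
  V2 = rnorm(10, 0.5, 0.1),
  V3 = rnorm(10, 0.5, 0.1),
  V4 = rnorm(10, 0.5, 0.1)
)

# Long format: one row per (iteration, run) pair
df_long2 <- pivot_longer(df_wide, cols = -id,
                         names_to = "run", values_to = "fraction")

# One line per run, iteration on the x axis
p <- ggplot(df_long2, aes(x = id, y = fraction, colour = run, group = run)) +
  geom_line()
p
```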

Note: if you call set.seed(...) before generating the data, we can reproduce your df object.
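To illustrate, here is a small sketch of how seeding makes the test data reproducible. The seed value 123 and the helper name `make_df` are arbitrary choices for this example.

```r
# Helper (hypothetical name) that builds the same kind of test data as in the question
make_df <- function() {
  data.frame(
    V1 = rnorm(10, 0.5, 0.1),
    V2 = rnorm(10, 0.5, 0.1),
    V3 = rnorm(10, 0.5, 0.1),
    V4 = rnorm(10, 0.5, 0.1)
  )
}

set.seed(123)  # arbitrary seed value
df_a <- make_df()

set.seed(123)  # resetting the same seed restarts the random number stream
df_b <- make_df()

identical(df_a, df_b)  # the two data frames contain the same values
```

Anyone who runs the code with the same seed should then get the same numbers, which makes it easier to compare plots.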

2 Comments

Thank you! This worked, although I had to change it slightly because for some reason the values for the two axes needed to be swapped (so: x = id, y = values), but in the end it works, so that's what matters.
That's great @Ron
