Misplaced points in ggplot

Question

I'm reading in a file like so:

genes<-read.table("goi.txt",header=TRUE, row.names=1)
control<-log2(1+(genes[,1]))
experiment<-log2(1+(genes[,2]))

And plotting them as a simple scatter in ggplot:

ggplot(genes, aes(control, experiment)) +
    xlim(0, 20) + 
    ylim(0, 20) +
    geom_text(aes(control, experiment, label=row.names(genes)),size=3)

However the points are incorrectly placed on my plot (see attached image)

This is my data:

          control     expt
gfi1     0.189634  3.16574
Ripply3 13.752000 34.40630
atonal   2.527670  4.97132
sox2    16.584300 42.73240
tbx15    0.878446  3.13560
hes8     0.830370  8.17272
Tlx1     1.349330  7.33417
pou4f1   3.763400  9.44845
pou3f2   0.444326  2.92796
neurog1 13.943800 24.83100
sox3    17.275700 26.49240
isl2     3.841100 10.08640

As you can see, 'Ripply3' is clearly in the wrong position on the graph!

Am I doing something really stupid?


Yep. ;) ggplot looks inside the data frame you provide for the columns you name in aes(). So it finds the original control. It finds no experiment, so it continues the search into the global environment. You probably meant to put the transformations in genes as new columns.
– joran, Commented Mar 19, 2015 at 22:33
@joran - So definitely a rookie mistake! I've got it working properly now, but would you be able to elaborate on where it finds the data for experiment?
– fugu, Commented Mar 20, 2015 at 9:13

joran · Accepted Answer · 2015-03-20 14:16:53Z

The aes() function used by ggplot looks first inside the data frame you provide via data = genes. This is why you can (and should) specify variables only by bare column names like control; ggplot will automatically know where to find the data.

But R's scoping system is such that if nothing by that name is found in the current environment, R will look in the parent environment, and so on up the chain (eventually reaching the global environment), until it finds something by that name.

So aes(control, experiment) looks for variables named control and experiment inside the data frame genes. It finds the original, untransformed control variable, but of course there is no experiment variable in genes. So it continues up the chain of environments until it hits the global environment, where you have defined the isolated variable experiment, and uses that.
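The same lookup order can be reproduced without ggplot. The sketch below uses base R's with(), which also evaluates names in a data frame first and then falls back to the calling environment; it is an analogy for what aes() does, not ggplot's actual internals. The three-row data frame is a cut-down copy of the data in the question, built by hand for illustration.

```r
# Toy copy of the first three rows of the question's data (assumption: built
# by hand here instead of reading goi.txt)
genes <- data.frame(
  control = c(0.189634, 13.752, 2.52767),
  expt    = c(3.16574, 34.4063, 4.97132),
  row.names = c("gfi1", "Ripply3", "atonal")
)

# Free-standing vectors in the calling environment, as in the question
control    <- log2(1 + genes[, 1])
experiment <- log2(1 + genes[, 2])

# with() looks in the data frame first, then in the calling environment
x <- with(genes, control)     # column exists in genes: untransformed values
y <- with(genes, experiment)  # no such column: falls back to the free vector

identical(x, genes$control)   # TRUE: the log2 version of control was ignored
identical(y, experiment)      # TRUE: the log2 version of expt was used
```

So x is on the raw scale and y is on the log2 scale, which is why the labels end up in the wrong place.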

You meant to do something more like this:

genes$controlLog <- log2(1+(genes[,1]))
genes$exptLog <- log2(1+(genes[,2]))

followed by:

ggplot(genes, aes(controlLog, exptLog)) +
     xlim(0, 20) + 
     ylim(0, 20) +
     geom_text(aes(controlLog, exptLog, label=row.names(genes)),size=3)
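One way to guard against this kind of silent fallback is to check that every name you pass to aes() really is a column of the data frame before plotting. The following is a minimal sketch under a few assumptions: the data frame is a hand-built subset of the question's data, the gene column and the aes_vars vector are names introduced here for illustration, and the plot is only built if ggplot2 is installed.

```r
# Hand-built subset of the question's data (assumption: stands in for goi.txt)
genes <- data.frame(
  control = c(0.189634, 13.752, 2.52767),
  expt    = c(3.16574, 34.4063, 4.97132),
  row.names = c("gfi1", "Ripply3", "atonal")
)

# Store the transformed values, and the labels, as columns of genes
genes$controlLog <- log2(1 + genes$control)
genes$exptLog    <- log2(1 + genes$expt)
genes$gene       <- row.names(genes)   # hypothetical label column

# Names we intend to use in aes(); all of them must be columns of genes
aes_vars     <- c("controlLog", "exptLog", "gene")
missing_cols <- setdiff(aes_vars, names(genes))
stopifnot(length(missing_cols) == 0)

# Build the plot only when ggplot2 is available; print(p) would draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(genes, aes(controlLog, exptLog, label = gene)) +
    xlim(0, 20) +
    ylim(0, 20) +
    geom_text(size = 3)
}
```

Putting the labels in a column keeps everything aes() needs inside genes, so nothing has to be looked up outside the data frame.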
