How to implement R's p.adjust in Python

Question

I have a list of p-values and I would like to calculate the adjusted p-values for multiple comparisons (FDR). In R, I can use:

pval <- read.csv("my_file.txt",header=F,sep="\t")
pval <- pval[,1]
FDR <- p.adjust(pval, method= "BH")
print(length(pval[FDR<0.1]))
write.table(cbind(pval, FDR),"pval_FDR.txt",row.names=F,sep="\t",quote=F )

How can I implement this code in Python? Here is my feeble attempt in Python, with the help of Google:

pvalue_list = [2.26717873145e-10, 1.36209234286e-11 , 0.684342083821...] # my pvalues
pvalue_lst = [v.r['p.value'] for v in pvalue_list]
p_adjust = R.r['p.adjust'](R.FloatVector(pvalue_lst),method='BH')
for v in p_adjust:
    print v

The above code throws an AttributeError: 'float' object has no attribute 'r' error. Can anyone help point out my problem? Thanks in advance for the help!

lgautier · Accepted Answer · 2011-09-17 07:46:37Z

17

If you wish to be sure of what you are getting from R, you can also indicate that you wish to use the function in the R package 'stats':

from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import FloatVector

stats = importr('stats')

p_adjust = stats.p_adjust(FloatVector(pvalue_list), method = 'BH')

answered Sep 17, 2011 at 7:46

lgautier


5 Comments

drbunsen Over a year ago

@lgautier Thanks for the help! When I run your code, Python throws an ImportError: No module named packages error. Any idea what the problem is? I'm running R 2.13.1.

lgautier Over a year ago

I'd say you are using an outdated version of rpy2. Try rpy2.__version__ if unsure. Current is 2.2.2.

drbunsen Over a year ago

Yep, this works for me with R 2.2x. Unfortunately, I'm stuck with using R 2.13.1 on a remote server. Any suggestions?

lgautier Over a year ago

hmmm... I am referring to the rpy2 version, not the R version. Ask your system administrators to upgrade rpy2, or upgrade it yourself (consider using the Python package 'virtualenv' to create your own customized Python environment).

drbunsen Over a year ago

Sorry for the confusion. I misread your comments. I updated my local rpy2 to 2.2x and your code worked. Thank you very much for the help!

jseabold · Accepted Answer · 2012-12-06 13:58:08Z

17

This question is a bit old, but there are multiple comparison corrections available in statsmodels for Python. We have

http://statsmodels.sourceforge.net/devel/generated/statsmodels.sandbox.stats.multicomp.multipletests.html#statsmodels.sandbox.stats.multicomp.multipletests
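
As a rough sketch of what that looks like in practice: assuming a statsmodels release recent enough to expose multipletests under statsmodels.stats.multitest (the sandbox path in the link above is the older location), the Benjamini-Hochberg adjustment is selected with method="fdr_bh". The p-values below are made up for illustration.

```python
from statsmodels.stats.multitest import multipletests

# Illustrative p-values only; substitute your own list or array.
pvals = [0.005, 0.01, 0.03, 0.04]

# multipletests returns a 4-tuple:
# (reject flags, adjusted p-values, Sidak-corrected alpha, Bonferroni-corrected alpha)
reject, pvals_bh, _, _ = multipletests(pvals, alpha=0.1, method="fdr_bh")

print(pvals_bh)      # BH-adjusted p-values, in the input order
print(reject.sum())  # number of hypotheses rejected at alpha=0.1
```

The second element of the tuple plays the role of R's p.adjust(pval, method="BH"); the reject array corresponds to the count the question obtains with FDR<0.1.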

answered Dec 6, 2012 at 13:58

jseabold


1 Comment

Dataman Over a year ago

@jseabold: Hi, a quick question about multipletests: how does this function handle NaN values in the list of p-values when used with BH? It seems to assume that all the p-values are finite, is that right?
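
On the NaN question, one possible approach (an assumption on my part, not documented statsmodels behaviour) is to adjust only the non-missing p-values and put NaN back in the remaining slots, so that the number of tests counts only the values actually present. A minimal NumPy sketch; the helper name p_adjust_bh_nan is hypothetical:

```python
import numpy as np

def p_adjust_bh_nan(p):
    """BH adjustment that skips NaNs (hypothetical helper, not part of statsmodels)."""
    p = np.asarray(p, dtype=float)
    q = np.full(p.shape, np.nan)      # NaN inputs stay NaN in the output
    ok = ~np.isnan(p)
    valid = p[ok]
    n = valid.size                    # only non-missing p-values count as tests
    if n == 0:
        return q
    order = np.argsort(valid)[::-1]   # indices from largest to smallest p-value
    ranks = np.arange(n, 0, -1)       # matching ranks n, n-1, ..., 1
    adj = np.minimum(1.0, np.minimum.accumulate(valid[order] * n / ranks))
    out = np.empty(n)
    out[order] = adj                  # undo the sort
    q[ok] = out
    return q

print(p_adjust_bh_nan([0.01, np.nan, 0.04, 0.03, 0.005]))
```

Whether dropping the missing values from the test count is appropriate depends on why they are missing, so treat this as a starting point.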

emre · Accepted Answer · 2014-02-12 21:05:53Z

Here is an in-house function I use:

def correct_pvalues_for_multiple_testing(pvalues, correction_type="Benjamini-Hochberg"):
    """
    Consistent with R's p.adjust, e.g.
    print(correct_pvalues_for_multiple_testing([0.0, 0.01, 0.029, 0.03, 0.031, 0.05, 0.069, 0.07, 0.071, 0.09, 0.1]))
    """
    from numpy import array, empty, minimum
    pvalues = array(pvalues, dtype=float)
    n = pvalues.shape[0]  # number of tests; must be an int to size the output array
    new_pvalues = empty(n)
    if correction_type == "Bonferroni":
        new_pvalues = minimum(1.0, n * pvalues)  # cap at 1, as R does
    elif correction_type == "Bonferroni-Holm":
        values = [(pvalue, i) for i, pvalue in enumerate(pvalues)]
        values.sort()  # ascending by p-value
        running_max = 0.0
        for rank, vals in enumerate(values):
            pvalue, i = vals
            # keep the adjusted values monotone and cap them at 1, as R's "holm" does
            running_max = max(running_max, (n - rank) * pvalue)
            new_pvalues[i] = min(1.0, running_max)
    elif correction_type == "Benjamini-Hochberg":
        values = [(pvalue, i) for i, pvalue in enumerate(pvalues)]
        values.sort()
        values.reverse()  # descending by p-value
        new_values = []
        for i, vals in enumerate(values):
            rank = n - i
            pvalue, index = vals
            new_values.append((float(n) / rank) * pvalue)
        # running minimum, starting from the largest p-value
        for i in range(0, n - 1):
            if new_values[i] < new_values[i + 1]:
                new_values[i + 1] = new_values[i]
        for i, vals in enumerate(values):
            pvalue, index = vals
            new_pvalues[index] = new_values[i]
    return new_pvalues

Excellent solution. I have ported it to Python 3 and placed it in a repository on GitHub. If you wish me to add your name to the copyright line, please provide me with it via PM.

Vladimir · Accepted Answer · 2016-01-24 12:17:26Z

12

Using Python's numpy library, without calling out to R at all, here's a reasonably efficient implementation of the BH method:

import numpy as np

def p_adjust_bh(p):
    """Benjamini-Hochberg p-value correction for multiple hypothesis testing."""
    p = np.asarray(p, dtype=float)  # np.asfarray was removed in NumPy 2.0
    by_descend = p.argsort()[::-1]
    by_orig = by_descend.argsort()
    steps = float(len(p)) / np.arange(len(p), 0, -1)
    q = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))
    return q[by_orig]

(Based on the R code BondedDust posted)

edited Jan 24, 2016 at 12:17

Vladimir


answered Nov 4, 2015 at 21:37

Eric Talevich


1 Comment

Vladimir Over a year ago

Should be float(len(p)), otherwise it will be integer division

IRTFM · Accepted Answer · 2011-09-16 22:53:44Z

2

(I know this is not the answer... just trying to be helpful.) The BH code in R's p.adjust is just:

BH = {
        i <- lp:1L   # lp is the number of p-values
        o <- order(p, decreasing = TRUE) # "o" will reverse sort the p-values
        ro <- order(o)
        pmin(1, cummin(n/i * p[o]))[ro]  # n is also the number of p-values
      }
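
To make those four lines concrete, here is a plain-Python walk-through of the same steps on a made-up vector of four p-values. The variable names mirror the R snippet, except that indices are 0-based here; treat it as an illustration rather than a drop-in replacement.

```python
p = [0.01, 0.04, 0.03, 0.005]   # illustrative p-values
n = len(p)

i = list(range(n, 0, -1))                               # ranks n, n-1, ..., 1
o = sorted(range(n), key=lambda k: p[k], reverse=True)  # indices of p, largest first
ro = sorted(range(n), key=lambda k: o[k])               # inverse permutation of o

scaled = [n / float(rank) * p[idx] for rank, idx in zip(i, o)]  # n/i * p[o]

# cummin: running minimum, starting from the largest p-value
running = []
for value in scaled:
    running.append(value if not running else min(running[-1], value))

bh = [min(1.0, running[k]) for k in ro]  # cap at 1 and restore the input order
print(bh)  # approximately [0.02, 0.04, 0.04, 0.02]
```

The running minimum is what keeps the adjusted values monotone: a smaller raw p-value never ends up with a larger adjusted value than a bigger one.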

answered Sep 16, 2011 at 22:53

IRTFM


Chrismit · Accepted Answer · 2014-01-27 14:55:39Z

1

Old question, but here's a translation of the R FDR code in python (which is probably fairly inefficient):

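def FDR(x):
    """
    Assumes a list or numpy array x which contains p-values for multiple tests
    Copied from p.adjust function from R; the harmonic factor q makes this
    the "BY" (Benjamini-Yekutieli) variant rather than "BH"
    """
    n = len(x)
    # o: indices that sort x in decreasing order; ro: inverse permutation of o
    o = [i[0] for i in sorted(enumerate(x), key=lambda v: v[1], reverse=True)]
    ro = [i[0] for i in sorted(enumerate(o), key=lambda v: v[1])]
    q = sum([1.0 / i for i in range(1, n + 1)])
    l = [q * n / i * x[j] for i, j in zip(reversed(range(1, n + 1)), o)]
    # cumulative minimum from the largest p-value down, as R's cummin does
    for k in range(1, n):
        l[k] = min(l[k], l[k - 1])
    l = [l[k] if l[k] < 1.0 else 1.0 for k in ro]
    return l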

answered Jan 27, 2014 at 14:55

Chrismit


Thomas K · Accepted Answer · 2011-09-16 22:47:49Z

0

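Well, to get your code working, I would guess something like this would work. Your list already holds plain floats, so the comprehension that looks up .r['p.value'] on each one is what raises the AttributeError; drop it and go through R.r instead:

import rpy2.robjects as R

pvalue_list = [2.26717873145e-10, 1.36209234286e-11 , 0.684342083821...] # my pvalues
p_adjust = R.r['p.adjust'](R.FloatVector(pvalue_list), method='BH')
for v in p_adjust:
    print(v)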

If p.adjust is simple enough, you could write it in Python so you avoid the need to call into R. And if you want to use it a lot, you can make a simple Python wrapper:

def adjust_pvalues(pvalues, method='BH'):
    return R.r['p.adjust'](R.FloatVector(pvalues), method=method)
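
Following up on the idea of writing it in Python directly, here is a sketch of the whole R script from the question without rpy2, using only the standard library. It assumes the input file holds one p-value per line in its first tab-separated column, with no header; the function names bh_adjust and adjust_file are hypothetical, and the file names in the usage comment are the ones from the question.

```python
import csv

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda k: pvalues[k], reverse=True)  # largest first
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps every result at 1
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / float(rank))
        adjusted[idx] = running_min
    return adjusted

def adjust_file(in_path, out_path, threshold=0.1):
    """Read p-values, write 'pval<TAB>FDR' rows, return how many have FDR < threshold."""
    with open(in_path, newline="") as handle:
        pvals = [float(row[0]) for row in csv.reader(handle, delimiter="\t") if row]
    fdr = bh_adjust(pvals)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["pval", "FDR"])
        writer.writerows(zip(pvals, fdr))
    return sum(1 for value in fdr if value < threshold)

# Usage (file names taken from the question):
# print(adjust_file("my_file.txt", "pval_FDR.txt"))
```

The returned count corresponds to length(pval[FDR<0.1]) in the R version, and the output file has the same two columns as the write.table call.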

answered Sep 16, 2011 at 22:47

Thomas K
