
I have a script that gradually adds a number of columns to an existing data frame (df1). It then takes a subset of these columns and outputs it as df2, renaming the columns at the same time.

I've previously used the select() function in dplyr to do this, and it has worked on similar datasets, so I'm a bit stumped as to why it has suddenly stopped working. I've seen a few other threads about select(), but none of them really helped with my question.

Here are the column names and the first row of the data I am using:

gene_id variant_id tss_distance ma_samples ma_count maf pval_nominal slope slope_se rsid chr pos ref_allele alt gene_id_new gene_name info
ENSG00000227232.4 1_13417_C_CGAGA_b37       -16136         50       50 0.07225430   0.00908288  0.3556660 0.1354910 rs777038595   1 13417          C CGAGA ENSG00000227232    WASH7P    1
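To make the problem reproducible, here is a sketch that rebuilds the printed row as a one-row data frame. The values are copied from the line above. The column types (numeric vs. character) are my assumption, since the printed output does not show them.

```r
# One-row reconstruction of df1 from the printed row.
# Types are assumed: numeric-looking values as numbers, the rest as character.
df1 <- data.frame(
  gene_id      = "ENSG00000227232.4",
  variant_id   = "1_13417_C_CGAGA_b37",
  tss_distance = -16136,
  ma_samples   = 50,
  ma_count     = 50,
  maf          = 0.07225430,
  pval_nominal = 0.00908288,
  slope        = 0.3556660,
  slope_se     = 0.1354910,
  rsid         = "rs777038595",
  chr          = 1,
  pos          = 13417,
  ref_allele   = "C",
  alt          = "CGAGA",
  gene_id_new  = "ENSG00000227232",
  gene_name    = "WASH7P",
  info         = 1,
  stringsAsFactors = FALSE
)

# Inspect column names and types.
str(df1)
```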

Here is the code for my selection:

parsed_columns = select(df1, chr = "chr",
                    pos = "pos",
                    ref = "ref_allele",
                    alt = "alt",
                    reffrq = "maf",
                    info = "info",
                    rs = "rsid",
                    pval = "pval_nominal",
                    effalt = "slope",
                    gene = "gene_name")

This gives an error saying that none of the names in quotation marks resolve to integer column positions.

I initially thought I might have the names on the wrong side of the equals sign (so, for example, it should be rsid = "rs"). But some columns have the same name on both sides (e.g. pos = "pos"), and those are reported as not present either. So I'm a bit stuck. Any help would be appreciated.
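Since the error claims that even pos is missing, one quick check (a diagnostic sketch, not a confirmed cause) is to compare the requested source names against names(df1) and confirm that df1 really is a data frame. The df1 below is a hypothetical partial stand-in that deliberately omits some columns so the check has something to report.

```r
# Hypothetical partial stand-in for df1 (some columns deliberately left out).
df1 <- data.frame(chr = 1, pos = 13417, ref_allele = "C", alt = "CGAGA",
                  maf = 0.0722543, rsid = "rs777038595",
                  stringsAsFactors = FALSE)

# Existing column names the select() call refers to (right-hand sides).
wanted <- c("chr", "pos", "ref_allele", "alt", "maf", "info",
            "rsid", "pval_nominal", "slope", "gene_name")

# Requested names that are not in df1; character(0) means all are present.
missing_cols <- setdiff(wanted, names(df1))
print(missing_cols)

# dput() shows the exact names, including any stray whitespace.
dput(names(df1))

# select() expects a data frame; check what df1 actually is.
print(class(df1))
```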

  • With the select function, the new column name should go in quotes on the left-hand side of the equals sign, like this: "rs" = rsid. Commented Jan 31, 2019 at 18:51
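A minimal sketch of the syntax suggested in the comment, assuming dplyr is installed. In an R function call a quoted argument name is equivalent to a bare one, so "rs" = rsid and rs = rsid should give the same result; the existing column on the right is written as a bare name.

```r
library(dplyr)

# Hypothetical stand-in for df1 with two of the question's columns.
df1 <- data.frame(rsid = "rs777038595", pos = 13417,
                  stringsAsFactors = FALSE)

# Quoted new name on the left, bare existing column on the right.
quoted <- select(df1, "rs" = rsid, pos)

# Unquoted new name on the left: R treats "rs" = and rs = the same way.
unquoted <- select(df1, rs = rsid, pos)

names(quoted)
identical(quoted, unquoted)
```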

1 Answer


With dplyr, you don't need to put your column names in quotation marks. Simply using the bare column name from the referenced data frame should suffice.

More generally:

df2 = select(df1,
             col1name = col1,
             col2name = col2
             # ... and so on for further columns
             )

This works provided that col1, col2, etc. are valid column names in df1.
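A small illustration of the general pattern, assuming dplyr is installed; col1, col2 and col3 are placeholder names. Columns come out in the order they are listed, under their new names, and anything not listed (col3 here) is dropped.

```r
library(dplyr)

# Hypothetical toy data frame with placeholder column names.
df1 <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"),
                  col3 = c(TRUE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# Keep col2 and col1 (in that order) under new names; col3 is not listed.
df2 <- select(df1, second = col2, first = col1)

names(df2)
```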

For your data, give this a try:

library(dplyr)

parsed_columns = select(df1, chr = chr,
                    pos = pos,
                    ref = ref_allele,
                    alt = alt,
                    reffrq = maf,
                    info = info,
                    rs = rsid,
                    pval = pval_nominal,
                    effalt = slope,
                    gene = gene_name)
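Here is a sketch that runs the call above on a one-row stand-in for df1 (values copied from the question's row, types assumed, most unused columns omitted). It also shows a string-based alternative, assuming dplyr 1.0 or later: a named character vector of the form c(new = "old") passed through all_of(), which is handy if the column names are kept in a variable.

```r
library(dplyr)

# Hypothetical one-row stand-in for df1 built from the question's row.
df1 <- data.frame(chr = 1, pos = 13417, ref_allele = "C", alt = "CGAGA",
                  maf = 0.0722543, info = 1, rsid = "rs777038595",
                  pval_nominal = 0.00908288, slope = 0.355666,
                  gene_name = "WASH7P", slope_se = 0.135491,
                  stringsAsFactors = FALSE)

# Bare-name version: new_name = existing_column.
parsed_columns <- select(df1, chr = chr, pos = pos, ref = ref_allele,
                         alt = alt, reffrq = maf, info = info, rs = rsid,
                         pval = pval_nominal, effalt = slope, gene = gene_name)

# String-based version: names are the new names, values the existing ones.
lookup <- c(chr = "chr", pos = "pos", ref = "ref_allele", alt = "alt",
            reffrq = "maf", info = "info", rs = "rsid",
            pval = "pval_nominal", effalt = "slope", gene = "gene_name")
parsed_from_strings <- select(df1, all_of(lookup))

names(parsed_columns)
all.equal(parsed_columns, parsed_from_strings)
```

With either version, columns that are not listed (slope_se here) do not appear in the result.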