
I have a very large file generated by other tools, but I only need a few of its columns. When I read it with Python pandas, I can specify the required columns, but I don't know how to do the same in Rust.

Thanks.

I would like to get the same behavior in Rust as this pandas call:

data = pd.read_csv(file, sep='\t', header=None, usecols=[0,1,5])
  • Are you looking for rust polars or python polars? Commented Apr 17 at 8:51
  • I am looking for rust polars. Commented Apr 17 at 9:02

1 Answer

I am assuming that you want to use rust-polars. This is how you could achieve the same result in Rust.

I have also added comments to explain what each step does.

use polars::prelude::*;

fn main() {
    let path = "<PATH_TO_THE_FILE>";
    // Without headers, polars automatically names the columns "column_1", "column_2", etc.
    let columns_to_select = ["column_1".into(), "column_2".into()];

    let df = CsvReadOptions::default()
        .with_has_header(false) // equivalent to `header=None` in pandas
        .map_parse_options(|parse_options| parse_options.with_separator(b'\t')) // use a custom separator; equivalent to `sep='\t'` in pandas
        .with_columns(Some(Arc::new(columns_to_select))) // select the columns by name; equivalent to `usecols=[0, 1]` in pandas
        .try_into_reader_with_file_path(Some(path.into())) // specify the file path
        .unwrap()
        .finish()
        .unwrap();
    println!("{:?}", df);
}

Alternatively, use the with_projection method to select columns by zero-based index. For example, .with_projection(Some(Arc::new(vec![0, 1]))) selects the first and second columns.
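Because the default names are one-based ("column_1" is the first column) while pandas' usecols and with_projection use zero-based indices, it is easy to be off by one when translating between the two. A minimal sketch of that mapping follows; default_column_names is a hypothetical helper written for illustration, not part of polars, and it assumes the "column_N" naming described in the code comment above.

```rust
/// Hypothetical helper (not part of polars): maps zero-based column indices,
/// as used by pandas `usecols` and by `with_projection`, to the default
/// one-based names ("column_1", "column_2", ...) given to headerless files.
fn default_column_names(indices: &[usize]) -> Vec<String> {
    indices.iter().map(|i| format!("column_{}", i + 1)).collect()
}

fn main() {
    // `usecols=[0, 1, 5]` from the question
    let names = default_column_names(&[0, 1, 5]);
    println!("{:?}", names); // prints ["column_1", "column_2", "column_6"]
}
```

So the usecols=[0, 1, 5] call from the question would correspond to selecting "column_1", "column_2", and "column_6" by name, or to passing vec![0, 1, 5] to with_projection.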


4 Comments

Thank you very much, this does what I need. I have just started using Polars and am not familiar with it yet. If I want to read data from a 30GB file, should I use LazyCsvReader instead? Most of its methods are the same as CsvReader's, but I found that it doesn't have with_columns. If it is possible, how should I select columns with it?
OK, if that's the case, you can use select.
```
let gaf_df = LazyCsvReader::new(file_path.clone())
    .with_has_header(false)
    .with_separator(b'\t')
    .finish()?;
let gaf_df_select = gaf_df.lazy()
    .select([
        col("column_1").alias("read_id"),
        col("column_2").alias("length"),
        col("column_6").alias("path"),
        col("column_12").alias("mapq")
    ])
    .collect()?;
println!("{:?}", gaf_df_select.head(Some(10)));
```
Could you help me check if my usage is correct? This can indeed run. Thanks.
That looks fine to me.
