
I'm using polars and I would like to define the type of the columns while loading a dataframe. In pandas, I can use dtype:

df=pd.read_csv("iris.csv", dtype={'petal_length':str})

I'm trying to do the same thing in polars, but so far without success. Here is what I have tried:

use polars::prelude::*;
use std::fs::File;
use std::collections::HashMap;


fn main() {
    let df = example();
    println!("{:?}", df.expect("Cannot find dataframe").head(Some(10)))
}

fn example() -> Result<DataFrame> {
    let file = File::open("iris.csv")
                    .expect("could not read file");
    let mut myschema = HashMap::new();
    myschema.insert("sepal_length", f64);
    myschema.insert("sepal_width", f64); 
    myschema.insert("petal_length",String); 
    myschema.insert("petal_width", f64); 
    myschema.insert("species", String); 

    CsvReader::new(file)
            .with_schema(myschema)
            .has_header(true)
            .finish()
}

My question is: what type does with_schema expect? I printed the schema of the DataFrame loaded using infer_schema(None). This prints an object that looks like a dictionary:

Schema { fields: [Field { name: "sepal_length", data_type: Float64 }, Field { name: "sepal_width", data_type: Float64 }, Field { name: "petal_length", data_type: Float64 }, Field { name: "petal_width", data_type: Float64 }, Field { name: "species", data_type: Utf8 }] }

But I cannot figure out what object I should use to build my schema.

Also, is there a way to specify the type of just one column instead of all of them?

3 Answers


The with_schema method expects an Arc<Schema>, not a HashMap.

The following code works:

use polars::prelude::*;
use std::sync::Arc;

fn example() -> Result<DataFrame> {
    let file = "iris.csv";

    let myschema = Schema::new(
        vec![
            Field::new("sepal_length", DataType::Float64),
            Field::new("sepal_width", DataType::Float64),
            Field::new("petal_length", DataType::Utf8),
            Field::new("petal_width", DataType::Float64),
            Field::new("species", DataType::Utf8),
        ]
    );

    CsvReader::from_path(file)?
        .with_schema(Arc::new(myschema))
        .has_header(true)
        .finish()
}

Also, is there a way to specify the type of one variable, instead of all of them?

Yes, you can use with_dtype_overwrite, which expects a partial schema.
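
To make "partial schema" concrete, here is a minimal std-only sketch of the idea: start from the inferred column types and replace only the columns named in the overrides. It does not call polars at all. `ColType` and `apply_overrides` are hypothetical names invented for this illustration, and how polars applies the overwrite internally is an assumption here, not something taken from its source.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a column data type; this is NOT polars' DataType.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColType {
    Float64,
    Text,
}

/// Apply a partial set of overrides to an inferred, ordered schema.
/// Columns that are not mentioned in `overrides` keep their inferred type.
fn apply_overrides(
    inferred: &[(&str, ColType)],
    overrides: &HashMap<&str, ColType>,
) -> Vec<(String, ColType)> {
    inferred
        .iter()
        .map(|(name, dtype)| {
            // Prefer the override when one exists, else keep the inferred type.
            let chosen = overrides.get(name).copied().unwrap_or(*dtype);
            (name.to_string(), chosen)
        })
        .collect()
}

fn main() {
    // Types as they might be inferred from iris.csv (see the schema in the question).
    let inferred = [
        ("sepal_length", ColType::Float64),
        ("sepal_width", ColType::Float64),
        ("petal_length", ColType::Float64),
        ("petal_width", ColType::Float64),
        ("species", ColType::Text),
    ];

    // The partial schema names only the column we want to change.
    let mut overrides = HashMap::new();
    overrides.insert("petal_length", ColType::Text);

    for (name, dtype) in apply_overrides(&inferred, &overrides) {
        println!("{name}: {dtype:?}");
    }
}
```

In the polars case, the partial schema would be a Schema holding only the petal_length field with a string data type, passed to with_dtype_overwrite instead of with_schema.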


2 Comments

Today, this code no longer compiles. When I try to use it, I get the message: argument of type Vec<polars::prelude::Field> unexpected. Any idea?
You might have to turn the vector into an iterator using .into_iter().

A slight update to ritche46's answer. As Robert stated, the vector needs to be converted to an iterator, and it looks like we should now use Schema::from instead of Schema::new. I have not executed the code below, but it compiles.

...
        let myschema = Schema::from(
            vec![
                Field::new("sepal_length", DataType::Float64),
                Field::new("sepal_width", DataType::Float64),
                Field::new("petal_length", DataType::Utf8),
                Field::new("petal_width", DataType::Float64),
                Field::new("species", DataType::Utf8),
            ]
            .into_iter(),
        );
...


The code above that uses Schema::new no longer compiles as of today. The solution is to use Schema::from_iter:

    let myschema = Schema::from_iter(
        vec![
            Field::new("sepal_length", DataType::Float64),
            Field::new("sepal_width", DataType::Float64),
            Field::new("petal_length", DataType::String),
            Field::new("petal_width", DataType::Float64),
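            // Use one string variant consistently: DataType::String in newer
            // polars releases, DataType::Utf8 in older ones.
            Field::new("species", DataType::String),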
        ]
    );
