I am trying to add a new column (labels_value) that combines two existing columns (Total and percentage) into a string of the form: (Total) percentage%.

Basically, I want to wrap the Total value in parentheses and append a % sign after the percentage value.

import polars as pl

pl.Config(tbl_rows=21) # increase repr defaults

so_df = pl.from_repr("""
┌──────────────┬─────────────────────┬─────┬───────┬────────────┬────────┐
│ Flag         ┆ Category            ┆ len ┆ Total ┆ percentage ┆ value  │
│ ---          ┆ ---                 ┆ --- ┆ ---   ┆ ---        ┆ ---    │
│ str          ┆ str                 ┆ i64 ┆ i64   ┆ f64        ┆ f64    │
╞══════════════╪═════════════════════╪═════╪═══════╪════════════╪════════╡
│ Outof Range  ┆ Thyroid             ┆ 7   ┆ 21    ┆ 33.33      ┆ 33.33  │
│ Outof Range  ┆ Inflammatory Marker ┆ 2   ┆ 8     ┆ 25.0       ┆ 25.0   │
│ Outof Range  ┆ Lipid               ┆ 12  ┆ 63    ┆ 19.05      ┆ 19.05  │
│ Outof Range  ┆ LFT                 ┆ 14  ┆ 87    ┆ 16.09      ┆ 16.09  │
│ Outof Range  ┆ DLC                 ┆ 11  ┆ 126   ┆ 8.73       ┆ 8.73   │
│ Outof Range  ┆ Vitamin             ┆ 1   ┆ 14    ┆ 7.14       ┆ 7.14   │
│ Outof Range  ┆ CBC                 ┆ 2   ┆ 45    ┆ 4.44       ┆ 4.44   │
│ Outof Range  ┆ KFT                 ┆ 2   ┆ 56    ┆ 3.57       ┆ 3.57   │
│ Outof Range  ┆ Urine Examination   ┆ 1   ┆ 28    ┆ 3.57       ┆ 3.57   │
│ Within Range ┆ Thyroid             ┆ 14  ┆ 21    ┆ 66.67      ┆ -66.67 │
│ Within Range ┆ Inflammatory Marker ┆ 6   ┆ 8     ┆ 75.0       ┆ -75.0  │
│ Within Range ┆ Lipid               ┆ 51  ┆ 63    ┆ 80.95      ┆ -80.95 │
│ Within Range ┆ LFT                 ┆ 73  ┆ 87    ┆ 83.91      ┆ -83.91 │
│ Within Range ┆ DLC                 ┆ 115 ┆ 126   ┆ 91.27      ┆ -91.27 │
│ Within Range ┆ Vitamin             ┆ 13  ┆ 14    ┆ 92.86      ┆ -92.86 │
│ Within Range ┆ CBC                 ┆ 43  ┆ 45    ┆ 95.56      ┆ -95.56 │
│ Within Range ┆ KFT                 ┆ 54  ┆ 56    ┆ 96.43      ┆ -96.43 │
│ Within Range ┆ Urine Examination   ┆ 27  ┆ 28    ┆ 96.43      ┆ -96.43 │
│ Within Range ┆ Anemia              ┆ 38  ┆ 38    ┆ 100.0      ┆ -100.0 │
│ Within Range ┆ Diabetes            ┆ 22  ┆ 22    ┆ 100.0      ┆ -100.0 │
│ Within Range ┆ Electrolyte         ┆ 46  ┆ 46    ┆ 100.0      ┆ -100.0 │
└──────────────┴─────────────────────┴─────┴───────┴────────────┴────────┘
""")

I have tried the three approaches below, and none of them worked:

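# Attempt 1: Python's str() on each column expression, joined with +
so_df.with_columns(labels_value="(" + str(pl.col("Total")) + ") " + str(pl.col("percentage")) + "%")
# Attempt 2: Python's str() on each column expression, joined with "".join()
so_df.with_columns(labels_value="".join(["(", str(pl.col("Total")), ") ", str(pl.col("percentage")), "%"]))
# Attempt 3: pl.concat_str on the two columns only
so_df.with_columns(labels_value=pl.concat_str([pl.col("Total"), pl.col("percentage")]))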

Desired result would be to add a new column like:

┌──────────────┐
│ labels_value │
│ ---          │
│ str          │
╞══════════════╡
│ (21) 33.33%  │
│ (8) 25%      │
│ (63) 19.05%  │
│ (87) 16.09%  │
│ (126) 8.73%  │
│ …            │
└──────────────┘

1 Answer

Here are a few options:

so_df.with_columns(
    labels_value_1=pl.format("({}) {}%", "Total", "percentage"),
    labels_value_2=pl.concat_str(
        pl.lit("("), "Total", pl.lit(") "), "percentage", pl.lit("%")
    ),
    labels_value_3=(
        "("
        + pl.col("Total").cast(pl.String)
        + ") "
        + pl.col("percentage").cast(pl.String)
        + "%"
    ),
)

# Only Total, percentage and outputs shown for brevity
# ┌───────┬────────────┬────────────────┬────────────────┬────────────────┐
# │ Total ┆ percentage ┆ labels_value_1 ┆ labels_value_2 ┆ labels_value_3 │
# │ ---   ┆ ---        ┆ ---            ┆ ---            ┆ ---            │
# │ i64   ┆ f64        ┆ str            ┆ str            ┆ str            │
# ╞═══════╪════════════╪════════════════╪════════════════╪════════════════╡
# │ 21    ┆ 33.33      ┆ (21) 33.33%    ┆ (21) 33.33%    ┆ (21) 33.33%    │
# │ 8     ┆ 25.0       ┆ (8) 25.0%      ┆ (8) 25.0%      ┆ (8) 25.0%      │
# │ 63    ┆ 19.05      ┆ (63) 19.05%    ┆ (63) 19.05%    ┆ (63) 19.05%    │
# │ 87    ┆ 16.09      ┆ (87) 16.09%    ┆ (87) 16.09%    ┆ (87) 16.09%    │
# │ 126   ┆ 8.73       ┆ (126) 8.73%    ┆ (126) 8.73%    ┆ (126) 8.73%    │
# │ …     ┆ …          ┆ …              ┆ …              ┆ …              │
# │ 56    ┆ 96.43      ┆ (56) 96.43%    ┆ (56) 96.43%    ┆ (56) 96.43%    │
# │ 28    ┆ 96.43      ┆ (28) 96.43%    ┆ (28) 96.43%    ┆ (28) 96.43%    │
# │ 38    ┆ 100.0      ┆ (38) 100.0%    ┆ (38) 100.0%    ┆ (38) 100.0%    │
# │ 22    ┆ 100.0      ┆ (22) 100.0%    ┆ (22) 100.0%    ┆ (22) 100.0%    │
# │ 46    ┆ 100.0      ┆ (46) 100.0%    ┆ (46) 100.0%    ┆ (46) 100.0%    │
# └───────┴────────────┴────────────────┴────────────────┴────────────────┘

I would say pl.format is the most idiomatic option. It uses {} as a placeholder and casts each argument to string for you (much like Python's f-strings). This is what I'd personally recommend of the options presented.

For concat_str, you can add the parentheses and "%" with pl.lit (strings are parsed as column names in this function). It will cast numeric columns to string for you.

Lastly, string concatenation with the + operator does work, but you must cast the column(s) with Polars' .cast method, rather than Python's builtin str.

One thing to note is that row 2 of the output differs slightly from your desired output: (8) 25% (expected) vs. (8) 25.0% (actual). This is because the string representation of a floating-point column in Polars always contains at least one decimal place. If that is an issue, drop a comment and I will try to come up with a solution.
