If I have a DataFrame, I can create a column with a single value like this:
import polars as pl
df = pl.DataFrame([[1, 2, 3]])
df.with_columns(pl.lit("ok").alias("metadata"))
shape: (3, 2)
┌──────────┬──────────┐
│ column_0 ┆ metadata │
│ ---      ┆ ---      │
│ i64      ┆ str      │
╞══════════╪══════════╡
│ 1        ┆ ok       │
│ 2        ┆ ok       │
│ 3        ┆ ok       │
└──────────┴──────────┘
But with a pl.Object column, the same approach does not work:
df = pl.DataFrame([[1, 2, 3]])
df.with_columns(pl.lit("ok", dtype=pl.Object).alias("metadata"))
# InvalidOperationError: casting from Utf8View to FixedSizeBinary(8) not supported
Using a one-element pl.Series does not work either:
df.with_columns(pl.Series(["ok"], dtype=pl.Object).alias("metadata"))
# InvalidOperationError: Series metadata, length 1 doesn't
# match the DataFrame height of 3
# If you want expression: Series[metadata] to be broadcasted,
# ensure it is a scalar (for instance by adding '.first()').
It seems that I need to either create a pl.Series of the correct length manually (like pl.Series(["ok"] * df.height, dtype=pl.Object)), or do a cross join like this:
df.join(pl.Series(["ok"], dtype=pl.Object).to_frame("metadata"), how="cross")
It works, but is not very elegant. Are there any better solutions?
NB. I used a string object just as an example. I really need a pl.Object column to store various heterogeneous data, not strings, and cannot use, say, pl.Struct instead.