How do I output only DataFrame columns to CSV?

Question

Use case: golfing on the CLI in a utility function that I can't afford to make complicated.

I need to peek at only the column names of a large binary file, not the column names plus, say, the first data row.

With my current implementation, I have to type this cumbersome pipeline, which fetches the first row of a large file and then keeps only the header line:

my-tool peek -n 1 huge-file.parquet | head -n 1 | tr ',' '\n' | less

What I would like to type instead is:

my-tool peek --cols huge-file.parquet | tr ',' '\n' | less

or

my-tool peek --cols -d '\n' huge-file.parquet | less

I want to do this without anything complicated in Python. Currently I generate the CSV like this:

from io import StringIO

out = StringIO()
df.to_csv(out)
print(out.getvalue())

Is there a DataFrame-ish way to write just the column names to out via to_csv(...), or a similarly simple technique?
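A minimal sketch of one to_csv-only option (not taken from the accepted answer): slicing the DataFrame to zero rows keeps its columns, so to_csv writes just the header line. The three-column df is a hypothetical stand-in for the real data. Passing index=False matters here: with the plain df.to_csv(out) from the question, the header would start with an empty field for the unnamed index (",a,b,c"). Note that this still needs the DataFrame in memory, so by itself it does not avoid reading the large file.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the large file.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

out = StringIO()
# head(0) keeps the columns but no rows, so only the header line is written;
# index=False drops the leading empty field for the unnamed index.
df.head(0).to_csv(out, index=False)
print(out.getvalue(), end="")
```

The printed line is the comma-separated header, which the question's | tr ',' '\n' step can then split into one name per line.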

Ian Thompson · Accepted Answer · 2022-12-22 16:31:14Z

Maybe something like this?

import pandas as pd
import numpy as np


if __name__ == "__main__":
    # some fake data for setup
    np.random.seed(1)
    df = pd.DataFrame(
        data=np.random.random(size=(5, 5)),
        columns=list("abcde")
    )

    out = df.columns.to_frame(name="columns")
    out.to_csv("file.csv", index=False)
    print(out)

  columns
a       a
b       b
c       c
d       d
e       e
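The file.csv written above begins with a "columns" header line before the names. If only the names are wanted, one per line (as in the question's -d '\n' variant), a minimal sketch, assuming the same fake data as the answer, converts the column Index to a Series and turns off both the header and the index:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Same fake data as the answer.
np.random.seed(1)
df = pd.DataFrame(data=np.random.random(size=(5, 5)), columns=list("abcde"))

out = StringIO()
# Index.to_series() gives a Series whose values are the column names;
# header=False and index=False leave only those values, one per line.
df.columns.to_series().to_csv(out, header=False, index=False)
print(out.getvalue(), end="")
```

For plain string column names, print("\n".join(df.columns)) produces the same lines without going through the CSV writer.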
