Use case: golfing on the CLI in a utility function that I can't afford to make complicated.

I need to peek at only the column names of a large file in a binary format, not the column names plus, say, the first data row.

With my current implementation, I have to type a cumbersome command just to see the header row of a large file:

my-tool peek -n 1 huge-file.parquet | head -n 1 | tr ',' '\n' | less

What I would like is to:

my-tool peek --cols huge-file.parquet | tr ',' '\n' | less

or

my-tool peek --cols -d '\n' huge-file.parquet | less

I would like to do this without making the Python side complicated. I currently use the following mechanism to generate the CSV:

from io import StringIO

out = StringIO()
df.to_csv(out)  # df is the DataFrame already loaded from the file
print(out.getvalue())

Is there a DataFrame-ish way to write just the column names to out via to_csv(...), or a similarly simple technique?

1 Answer

Maybe something like this?

import pandas as pd
import numpy as np


if __name__ == "__main__":
    # some fake data for setup
    np.random.seed(1)
    df = pd.DataFrame(
        data=np.random.random(size=(5, 5)),
        columns=list("abcde")
    )

    out = df.columns.to_frame(name="columns")
    out.to_csv("file.csv", index=False)
    print(out)

Output of print(out):

  columns
a       a
b       b
c       c
d       d
e       e
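
If the goal is to keep the existing StringIO / to_csv flow and emit only the header line, a zero-row slice of the frame might be simpler. This is a minimal sketch, not a tested part of the tool in the question: header_csv and column_names are hypothetical helper names, and index=False is an assumption so that the index column does not add a leading empty field.

```python
from io import StringIO

import pandas as pd


def header_csv(df: pd.DataFrame, sep: str = ",") -> str:
    """Hypothetical helper: return only the CSV header line of df."""
    out = StringIO()
    # head(0) keeps the columns but drops every row,
    # so to_csv writes the header and nothing else.
    df.head(0).to_csv(out, index=False, sep=sep)
    return out.getvalue().strip()


def column_names(df: pd.DataFrame, delimiter: str = "\n") -> str:
    """Hypothetical helper: join the column names without going through to_csv."""
    return delimiter.join(map(str, df.columns))


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
    print(header_csv(df))    # header row, comma separated
    print(column_names(df))  # one column name per line
```

The second helper covers the -d '\n' case from the question without relying on to_csv accepting a newline as separator. Note that both variants still assume the DataFrame has already been loaded; they only change what is printed.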
