1

I have a csv that I need to import into a pandas dataframe. I then need to export that pandas dataframe back to a csv that is exactly the same as the csv that I imported (the input file and output files must match exactly).

The problem I am having is that the csv is very strangely formatted, and I am not able to change this formatting (that is part of the challenge). Some lines of the csv have differing numbers of elements and some lines end with 2 semicolons, a semicolon or no semicolon.

input.csv

scene;1
date;08.04.2019;;
sky;AM;45;
sky;PM;45;
weight;upper;-5

So far i have completed the easiest bit - I have imported the csv into a pandas dataframe using the following code:

def csv_read(path):
    df = pd.read_csv(path, sep = ";",
                     header = None,
                     names = ['a', 'b', 'c', 'd'])
    print(df)
csv_read(input.csv)

The bit I am struggling with is how to export that dataframe to the same format as input.csv. I am trying to use to_csv with some well crafted arguments but i am not sure if this is the right approach? This is as much as i have written so far.

df.to_csv("output.csv", 
          sep = ";")

This outputs:

scene;1;;
date;08.04.2019;;
sky;AM;45;
sky;PM;45;
weight;upper;-5;

Therefore, lines 1,2 and 4 do not match input.csv . Any help/guidance on the best approach would be appreciated! I think the best approach may be to set up some rules. e.g. If line begins with scenario don't add any semicolons to the end. But I'm not sure if this is the best way or if it is the most elegant solution.

For ease of replication I have put input.csv into a single command:

df = pd.DataFrame(np.array([["scene", "1", "NaN", "NaN"],
["date", "08.04.2019", "NaN", "NaN"],
["sky", "AM", 45, "NaN"],
["sky", "PM", 45, "NaN"],
["weight", -5, "NaN", "NaN"],]))

2 Answers 2

2

The only thing I can think to do is read the csv output from pandas using csv module and then rewrite the file with the desired delimiter format.

import csv
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([["scenario", "scen1", "NaN", "NaN"],
["date", "08.04.2019", "NaN", "NaN"],
["turnaround time", "BAH", 45, "NaN"],
["turnaround time", "AMM", 45, "NaN"],
["weight", -5, "NaN", "NaN"],]))

df = df.replace("NaN", '', regex=True)
df.to_csv('out.csv',sep = ';',index=False, header=False)

with open('out.csv','r') as csvfile:
   data = csvfile.readlines()


with open('out.csv','w') as rewrite:
   for row in data:
      if row.find('weight')>-1:
         row = row[:-3]
      if row.find('scenario')>-1:
         row = row.replace(';;','')
      rewrite.write(row)      

returns a csv file out.csv with the lines:

scenario;scen1
date;08.04.2019;;
turnaround time;BAH;45;
turnaround time;AMM;45;
weight;-5
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks! This was exactly what i was looking for
0

If the issue is that the saved csv file does not match the original, you may want to use:

df.to_csv("output.csv", sep = ";", index = False)

The two dataframes were identical when I saved the csv using the above method and reloaded it using df1 = pd.read_csv("output.csv").

2 Comments

Hi, thanks for your response. The issue isn't whether the two dataframes match. The issue is whether the files input.csv and output.csv match. I can not change input.csv at all and output.csv must be created from the pandas dataframe.
@RossHeadington can you elaborate on why it is essential that the literal text of the .csv files is identical? I have a feeling there might be a better way to approach this problem.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.