Format pandas output to csv

Question

I am new to Python and pandas and have created a test web page with HTML to help me learn how to pull data and format it into CSV for use in Excel. The code below prints the data in a readable format, but I am stuck on how to turn it into a CSV file to import.

Code:

# Importing pandas 
import pandas as pd 

# The webpage URL whose table we want to extract 
url = "/home/dvm01/e007"

# Assign the table data to a Pandas dataframe 
table = pd.read_html(url, index_col=0)[0]
#table2 = pd.read_html(url)[0],pd.read_html(url)[1],pd.read_html(url)[6]

# Print the dataframe 
print(table)
#print(table2)

# Store the dataframe in Excel file 
#table.to_excel("data.xlsx")

Output:

            Account                                          Account.1
ID:                                         e007
Description:  ABST: 198, SUR: J DOUTHIT
Geo ID:                            014.0198.0000

What I am trying to figure out is how to remove the index for the rows and make the text before the first ':' a column header. Row 1 contains additional ':' characters, but everything after the first ':' should be the data for that column header.

I would like to take the current output above and have ID, Description, and Geo ID as the column headers, with the text that comes after the ':' as the data under each header.

I do not need 'Account' and 'Account.1'; I believe these are being recognized as column headers. Below is what I would like the output to look like in Excel, but I cannot figure out how to format it correctly to export to a CSV that can be imported. Maybe I do not even need to go through a CSV, since the 'table.to_excel' function does not seem to need that step.

+------+---------------------------+---------------+
| ID   | Description               | Geo ID        |
+------+---------------------------+---------------+
| e007 | ABST: 198, SUR: J Douthit | 014.0198.0000 |
+------+---------------------------+---------------+

I was able to remove the index numbers by using index_col=0 in the pd.read_html call above. I am not sure that is the best way, but it does what I was trying to accomplish for that part.

Since I am new to Python, I am having a hard time phrasing my question for Google or Stack Overflow to get the answers I am looking for. If someone could just point me in the right direction, that would work, but examples would be nice as well.

Thanks for any guidance

If you're trying to drop the index in your exports from pandas, you need to use the index=False keyword argument in your output function, such as to_csv or to_excel. A sample of how you want your output to look vs. how it looks now is always good practice.
– mayosten, Commented Oct 18, 2019 at 18:18
@mayosten don't worry about this, when I started to learn Python (and pandas at the same time) I didn't know how to remove the index for a long time, heck, I didn't even know it was called an index. You'll get there mate!
– Umar.H, Commented Oct 18, 2019 at 18:21
I am trying to format what I would like the output to look like but not having much success. Reading through stackoverflow.com/editing-help#code, I should be able to use <br> tags or <pre> tags to format text, but it is not working.
– Rey, Commented Oct 18, 2019 at 18:39
@Datanovice, thanks for the formatting, I was going crazy trying to figure out how to make that look correct.
– Rey, Commented Oct 18, 2019 at 19:38

BlackFox · Accepted Answer · 2019-10-18 21:07:06Z

To format your question, you can show us an example of what you want. Try something like this:

|id|name|data1|data2|data3|-url-|
|--|----|-----|-----|-----|-----|
|1 |xyz |datax|datay|dataz|x:url|
|2 |xyz |datax|datay|dataz|x:url|
|3 |xyz |datax|datay|dataz|x:url|
...

Then you can ask questions about how to create the right DataFrame output that fits your desired design :)

You can also use this online generator: https://www.tablesgenerator.com/text_tables

+----+------+-------+-------+-------+------+
| Id | Name | Data1 | Data2 | Data3 | Url  |
+----+------+-------+-------+-------+------+
| 1  | xyz  | datax | datay | dataz | xurl | 
+----+------+-------+-------+-------+------+
| 2  | xyz  | datax | datay | dataz | xurl |
+----+------+-------+-------+-------+------+
| 3  | xyz  | datax | datay | dataz | xurl |
+----+------+-------+-------+-------+------+

OK, now that you have your data table design, I would next suggest trying Jupyter Notebook. It lets you test your DataFrames line by line. Each test should be a new transformation of the dataset.

Here is how I see the workflow going to meet your needs:

1. Check what your current DataFrame columns are:

print(df.columns)

2. Use this command to rename your columns:

df.rename(columns={'old column 1': 'ID',
                   'old column 2': 'Description',
                   'old column 3': 'Geo ID'},
          inplace=True)
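
As a small illustration of the rename step, here is a sketch on a made-up frame. The column names col_a, col_b and col_c are placeholders I invented, not the names read_html actually produces for the page in the question; the values are copied from the question's output. It uses the non-inplace form, which returns a new frame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical frame; col_a/col_b/col_c are invented placeholder names
df = pd.DataFrame(
    [["e007", "ABST: 198, SUR: J DOUTHIT", "014.0198.0000"]],
    columns=["col_a", "col_b", "col_c"],
)

# Inspect the current column labels before renaming
print(df.columns.tolist())

# rename() without inplace=True returns a new DataFrame
renamed = df.rename(
    columns={"col_a": "ID", "col_b": "Description", "col_c": "Geo ID"}
)
print(renamed.columns.tolist())
```

Any column not listed in the mapping keeps its old name, so a typo in an old name fails silently; printing the columns before and after is a cheap check.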

Use this command to change index labels:

df.rename(index={0:'zero',1:'one'}, inplace=True)

Use this command to change a single value (note that .loc takes the row label first, then the column label):

df.loc['--insert_row_here--', '--insert_column_here--'] = new_value
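
For the specific layout in the question (labels ending in ':' in one column, values in the next), a possible end-to-end sketch is below. It does not read the real page: the raw frame is a hand-built stand-in that assumes read_html, called without index_col, returns two columns named Account and Account.1, as the printed output suggests. Treat it as one way to reshape the data, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_html(url)[0] (assumed two-column layout):
# first column holds labels ending in ':', second column holds the values.
raw = pd.DataFrame(
    {
        "Account": ["ID:", "Description:", "Geo ID:"],
        "Account.1": ["e007", "ABST: 198, SUR: J DOUTHIT", "014.0198.0000"],
    }
)

# Drop the trailing ':' from each label so it can serve as a header.
# Colons inside the values (e.g. "ABST: 198") are left alone.
labels = raw["Account"].str.rstrip(":").str.strip()

# Build a one-row frame: labels become the columns, values become the row.
wide = pd.DataFrame([raw["Account.1"].tolist()], columns=labels.tolist())

# index=False keeps the row index out of the export.
# With no path given, to_csv returns the CSV text as a string.
csv_text = wide.to_csv(index=False)
print(csv_text)

# To write a file instead (file name is just an example):
# wide.to_csv("data.csv", index=False)
# wide.to_excel("data.xlsx", index=False)
```

Because the description value contains a comma, to_csv wraps that field in quotes, which Excel reads back as a single cell.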

edited Oct 18, 2019 at 21:07

answered Oct 18, 2019 at 19:46


2 Comments

Rey Over a year ago

thank you, I was seriously beginning to lose it trying to get that table to format.

BlackFox Over a year ago

@Rey you're welcome. Please help me by upvoting and/or accepting my answer :)
