
I am trying to do the opposite of "How to generate a list from a pandas DataFrame with the column name and column values?"

To borrow that example, I want to go from the form:

data = [
    ['Name','Rank','Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1]
]

and get the following DataFrame, with Name as the index:

       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1

However, when I do something like:

pd.DataFrame(data)

I get a DataFrame in which the first list is an ordinary data row instead of the column labels, and the first element of each list is an ordinary column instead of the index.
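To make the problem concrete, here is a minimal check of what the default constructor does with this list (the variable name raw is just for illustration). Every inner list becomes a row, so pandas falls back to integer column labels and keeps the header row as data:

```python
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

# Each inner list becomes one row, including the header row
raw = pd.DataFrame(data)

print(raw.columns.tolist())  # → [0, 1, 2] (default integer labels)
print(raw.iloc[0].tolist())  # → ['Name', 'Rank', 'Complete'] (header stored as row 0)
print(raw.shape)             # → (6, 3)
```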

3 Answers


One way to do this is to treat the first list as the column names and pass only the remaining rows (data[1:]) to pd.DataFrame:

In [8]: data = [['Name','Rank','Complete'],
   ...:                ['one', 1, 1],
   ...:                ['two', 2, 1],
   ...:                ['three', 3, 1],
   ...:                ['four', 4, 1],
   ...:                ['five', 5, 1]]

In [10]: df = pd.DataFrame(data[1:],columns=data[0])

In [11]: df
Out[11]:
    Name  Rank  Complete
0    one     1         1
1    two     2         1
2  three     3         1
3   four     4         1
4   five     5         1

If you want to use the Name column as the index, chain the .set_index() method and pass it the name of the column to use. Example:

In [16]: df = pd.DataFrame(data[1:],columns=data[0]).set_index('Name')

In [17]: df
Out[17]:
       Rank  Complete
Name
one       1         1
two       2         1
three     3         1
four      4         1
five      5         1
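The same frame can also be built without set_index by splitting the list in plain Python and handing the index straight to the constructor. A minimal sketch (the names header and rows are just for illustration):

```python
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

# Separate the header row from the data rows
header, *rows = data

df = pd.DataFrame(
    [row[1:] for row in rows],                                 # values: all cells except the first
    index=pd.Index([row[0] for row in rows], name=header[0]),  # row labels from the first cell, named 'Name'
    columns=header[1:],                                        # remaining header cells as column labels
)

print(df)
print(df.loc['three', 'Rank'])  # → 3
```

Because the values passed to the constructor contain only integers, pandas infers integer columns without needing a dtype argument.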

Comment: What about the row names?

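To create the desired DataFrame directly in the constructor, convert the list into a NumPy object array and slice it: the first row (without its first cell) supplies the column labels, the first column (without its first cell) supplies the index, and the rest supplies the values.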

import numpy as np
import pandas as pd
arr = np.array(data, dtype=object)
df = pd.DataFrame(arr[1:, 1:], index=pd.Index(arr[1:, 0], name=arr[0, 0]), columns=arr[0, 1:], dtype=int)
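The dtype=int argument matters here: the array is object-typed, so without it every column stays object. If hard-coding one dtype for all columns is not appropriate (for example, when the columns hold different types), one option is to let pandas re-infer a type per column with DataFrame.infer_objects(). A sketch, assuming the same data list:

```python
import numpy as np
import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

arr = np.array(data, dtype=object)

# Same slicing as above, but without forcing dtype=int
df = pd.DataFrame(
    arr[1:, 1:],
    index=pd.Index(arr[1:, 0], name=arr[0, 0]),
    columns=arr[0, 1:],
)
print(df.dtypes)  # both columns are object, inherited from the object array

# Re-infer a dtype for each column from its actual values
df = df.infer_objects()
print(df.dtypes)  # both columns are now integer
```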

Another method: since the data looks like a CSV file read into a Python list, it can be joined back into an in-memory text buffer and passed to pd.read_csv. A nice thing about read_csv is that it can set the index (including a MultiIndex), set MultiIndex columns, and infer dtypes.

from io import StringIO
df = pd.read_csv(StringIO('\n'.join(['|'.join(map(str, row)) for row in data])), sep='|', index_col=[0])
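Joining with '|' assumes that no value contains a '|' or a newline. A sketch that drops that assumption by letting the standard-library csv module quote fields while writing the buffer (list_to_buffer is a hypothetical helper name, not part of pandas):

```python
import csv
from io import StringIO

import pandas as pd

data = [
    ['Name', 'Rank', 'Complete'],
    ['one', 1, 1],
    ['two', 2, 1],
    ['three', 3, 1],
    ['four', 4, 1],
    ['five', 5, 1],
]

def list_to_buffer(rows):
    """Hypothetical helper: serialize a list of rows into an in-memory CSV buffer."""
    buf = StringIO()
    # csv.writer quotes any field that contains the delimiter, a quote or a newline
    csv.writer(buf, lineterminator='\n').writerows(rows)
    buf.seek(0)  # rewind so read_csv reads from the start
    return buf

df = pd.read_csv(list_to_buffer(data), index_col=0)
print(df)
```

A row such as ['six, seven', 6, 0] then round-trips intact, because the writer quotes the field that contains a comma.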


A convenience function for the latter method:

from io import StringIO
def read_list(data, index_col=None, header=0):
    sio = StringIO('\n'.join(['|'.join(map(str, row)) for row in data]))
    return pd.read_csv(sio, sep='|', index_col=index_col, header=header)

df = read_list(data, index_col=[0])


To convert a nested list (a list of pages, each page a list of rows) into a pandas DataFrame, flatten it into rows first:

import pandas as pd

# Sample data (replace with your `Final_data` if obtained from scraping)
data = [[['1', 'Walmart', 'https://www.walmart.com/'], ['2', 'Amazon', 'https://www.amazon.com/'], ['3', 'Exxon Mobil', 'https://corporate.exxonmobil.com/'], ['4', 'Apple', 'https://www.apple.com/'], ['5', 'UnitedHealth Group', 'https://www.unitedhealthgroup.com/'], ['6', 'CVS Health', 'https://www.cvshealth.com/'], ['7', 'Berkshire Hathaway', 'https://www.berkshirehathaway.com/'], ['8', 'Alphabet', 'https://abc.xyz/'], ['9', 'McKesson', 'https://www.mckesson.com/'], ['10', 'Chevron', 'https://www.chevron.com/']], [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/'], ['12', 'Costco Wholesale', 'https://www.costco.com/'], ['13', 'Microsoft', 'https://www.microsoft.com/'], ['14', 'Cardinal Health', 'https://www.cardinalhealth.com/'], ['15', 'Cigna', 'https://www.cigna.com/'], ['16', 'Marathon Petroleum', 'https://www.marathonpetroleum.com/'], ['17', 'Phillips 66', 'https://www.phillips66.com/'], ['18', 'Valero Energy', 'https://www.valero.com/'], ['19', 'Ford Motor', 'https://www.ford.com/'], ['20', 'Home Depot', 'https://www.homedepot.com/']]]

# Create a DataFrame from the list, flattening each sublist into rows
df = pd.DataFrame([item for sublist in data for item in sublist])

# Rename columns (assuming the first element in each sublist is the S.No)
df.columns = ['S. No', 'Name', 'URL']

print(df)
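A slightly more direct variant of the same idea, assuming the same pages-of-rows nesting: itertools.chain.from_iterable flattens one level, the column names go straight to the constructor, and the S. No strings are converted to integers (whether that conversion is wanted is an assumption). The sample data is shortened here for brevity:

```python
from itertools import chain

import pandas as pd

# Same shape as above: a list of pages, each page a list of [S. No, Name, URL] rows
data = [
    [['1', 'Walmart', 'https://www.walmart.com/'], ['2', 'Amazon', 'https://www.amazon.com/']],
    [['11', 'AmerisourceBergen', 'https://www.amerisourcebergen.com/'], ['12', 'Costco Wholesale', 'https://www.costco.com/']],
]

# Flatten one level of nesting and name the columns in the constructor
df = pd.DataFrame(list(chain.from_iterable(data)), columns=['S. No', 'Name', 'URL'])

# The serial numbers arrive as strings; convert them to integers
df['S. No'] = df['S. No'].astype(int)

print(df)
```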

