np.ravel alone (as Anky proposed) is not enough, because it only flattens the data.
You then need to remove the duplicates.
And if you don't like the resulting non-contiguous index, you can reset it.
So the complete code can be:
df = pd.DataFrame(np.ravel(data),columns=['fruit'])\
.drop_duplicates().reset_index(drop=True)
np.unique (as in the other answer) has the downside that it
sorts the source array, and I suppose you want to keep the original order.
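The difference in ordering can be checked with a small sketch. The data below is a hypothetical stand-in for the list in the question (two equal-length lists of fruit names), and the names ordered and sorted_unique are used only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a list of equal-length lists of fruit names.
data = [['apple', 'orange', 'grape'],
        ['apple', 'pineapple', 'coconut']]

# Flatten, drop duplicates (the first occurrence is kept), renumber the index.
ordered = (pd.DataFrame(np.ravel(data), columns=['fruit'])
           .drop_duplicates()
           .reset_index(drop=True))

# np.unique also removes duplicates, but it returns the values sorted.
sorted_unique = np.unique(data)

print(ordered['fruit'].tolist())   # original order of first appearance
print(sorted_unique.tolist())      # alphabetical order
```

With this input the first list keeps the order in which the fruits first appear, while the second one comes back alphabetically sorted.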
Edit after your comment
It looks like you actually had a DataFrame, read using read_excel(),
looking like below:
fruits
0 [apple, orange, grape]
1 [apple, pineapple, coconut]
(not a list, as presented in your post).
To convert such a DataFrame to a single, flat list, you can run:
lst = df['fruits'].apply(pd.Series).stack().drop_duplicates().to_list()
The result is an ordinary Python list.
To create a second DataFrame with a single column, run:
df2 = pd.DataFrame(lst, columns=['fruits'])
Another option, without creating an intermediate list:
df['fruits'].apply(pd.Series).stack().rename('fruits')\
.drop_duplicates().reset_index(drop=True).to_frame()
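A runnable sketch of the variants above, assuming the column really holds Python lists (the sample DataFrame is made up for illustration). The last statement uses Series.explode (available from pandas 0.25), which is an alternative not taken from this answer; under the same assumption it should give the same rows:

```python
import pandas as pd

# Hypothetical DataFrame whose cells hold real Python lists.
df = pd.DataFrame({'fruits': [['apple', 'orange', 'grape'],
                              ['apple', 'pineapple', 'coconut']]})

# Expand each list into columns, stack them into one Series, drop repeats.
lst = df['fruits'].apply(pd.Series).stack().drop_duplicates().to_list()

# Second DataFrame with a single column, built from the flat list.
df2 = pd.DataFrame(lst, columns=['fruits'])

# Alternative (an assumption, not part of the answer above): explode the lists
# into one row per element, then drop duplicates and renumber the index.
alt = (df['fruits'].explode()
       .drop_duplicates()
       .reset_index(drop=True)
       .to_frame())

print(lst)
print(alt)
```

The explode variant avoids building the intermediate wide DataFrame, which may matter when the lists have very different lengths.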
Edit 2
I found a simpler solution, taking into account that read_excel
reads each such cell as a plain string (the text of a list), not as a Python list.
The key is the str.extractall method, applied to the fruits column.
To extract the text between apostrophes, the regex should be:
'(?P<fruits>[^']+)'
Details:
' - An apostrophe (represents itself), before the text to match.
(?P<fruits> - Start of a named capturing group (named fruits, like the column).
[^']+ - The content of this group - a non-empty sequence of chars
other than an apostrophe.
) - End of the capturing group.
' - Another apostrophe, after the text to match.
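To see what this pattern matches on its own, here is a minimal sketch using only the standard re module. The cell string is a hypothetical example of how such a cell might look after reading:

```python
import re

# A cell as it might come back from read_excel: the text of a list, not a list.
cell = "['apple', 'orange', 'grape']"

pattern = re.compile(r"'(?P<fruits>[^']+)'")

# Each match exposes the text between a pair of apostrophes via the named group.
found = [m.group('fruits') for m in pattern.finditer(cell)]
print(found)
```

The brackets, commas and spaces fall outside the apostrophes, so they are never part of a match.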
So if you run:
df.fruits.str.extractall(r"'(?P<fruits>[^']+)'")
you will get:
            fruits
  match
0 0          apple
  1         orange
  2          grape
1 0          apple
  1      pineapple
  2        coconut
This result contains:
- A MultiIndex:
- top level - the index of the source row (with no name),
- second level - match number (0, 1 and 2 for each row).
- A fruits column, named after the capturing group, with the individual
strings in consecutive rows.
All that remains is to drop duplicates and reset the index.
So the complete code, as a single instruction, is:
df.fruits.str.extractall(r"'(?P<fruits>[^']+)'")\
.drop_duplicates().reset_index(drop=True)
The result is:
fruits
0 apple
1 orange
2 grape
3 pineapple
4 coconut
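For a self-contained check, the sketch below rebuilds the input by hand instead of calling read_excel. The string cells are an assumption about what the file contained, and the name result is used only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from Excel:
# every cell is a string that merely looks like a list.
df = pd.DataFrame({'fruits': ["['apple', 'orange', 'grape']",
                              "['apple', 'pineapple', 'coconut']"]})

# Extract every quoted item, drop repeats, and renumber the rows.
result = (df['fruits'].str.extractall(r"'(?P<fruits>[^']+)'")
          .drop_duplicates()
          .reset_index(drop=True))

print(result)
```

If the real cells use double quotes or no quotes at all, the regex would have to be adjusted accordingly.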