
I have two CSV files, test.csv and train.csv. They share the same columns, such as PassengerId, Name, Sex, Age, etc.

I am trying to draw a box plot of the passengers' age distribution per title (Mr, Mrs, etc.), but I get an error. How can I fix the error so that the plot can be drawn?

import csv as csv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
csv_file_object = csv.reader(open('test.csv', 'r')) 

header = next(csv_file_object)
data=[] 

for row in csv_file_object:
    data.append(row)
data = np.array(data) 

csv_file_object1 = csv.reader(open('train.csv', 'r')) 
header1 = next(csv_file_object1) 
data1=[] 

for row in csv_file_object:
    data1.append(row)
data1 = np.array(data1)


Mergerd_file = header.merge(header1, on='PassengerId')

df = pd.DataFrame(Mergerd_file, index=['pAge', 'Tilte'])

df.T.boxplot(vert=False)
plt.subplots_adjust(left=0.25)
plt.show()

I get this error:

  ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-0d7fafc1fcf9> in <module>()
     21 
     22 
---> 23 Mergerd_file = header.merge(header1, on='PassengerId')
     24 
     25 df = pd.DataFrame(Mergerd_file, index=['pAge', 'Tilte'])

AttributeError: 'list' object has no attribute 'merge'
  • Just a note: Python 2 doesn't complain about that, but it does complain about "AttributeError: '_csv.reader' object has no attribute 'merge'" later on. Commented Dec 22, 2016 at 13:37
  • Well, this has nothing to do with boxplot in pandas. By the way, if you use pandas, then use pd.read_csv() directly to import your DataFrame, then pd.concat, and use seaborn to plot the boxplot. If your question is more about how to use the csv library, remove all the unnecessary parts, or ask a separate question and make this one clearer. Commented Dec 22, 2016 at 13:48
  • My aim is to make a box plot of the passengers' age distribution per title using pandas. jrjc Commented Dec 22, 2016 at 13:57
  • I am a bit confused: there is no Title column in the CSV. Do you mean the Sex column? Commented Dec 22, 2016 at 13:59
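To illustrate the difference between the merge the question attempts and the pd.concat suggested in the comments, here is a minimal sketch with two tiny hypothetical DataFrames standing in for test.csv and train.csv (the rows are invented for illustration). merge on PassengerId pairs up rows that share an ID and places their columns side by side, whereas concat stacks the rows of both tables. When the two files hold different passengers, an inner merge returns no rows at all.

```python
import pandas as pd

# Hypothetical stand-ins for test.csv and train.csv: same columns, different passengers
test_df = pd.DataFrame({'PassengerId': [3, 4],
                        'Name': ['Smith, Mrs. Anna', 'Jones, Dr. Paul'],
                        'Age': [35.0, 50.0]})
train_df = pd.DataFrame({'PassengerId': [1, 2],
                         'Name': ['Braund, Mr. Owen Harris', 'Doe, Miss. Jane'],
                         'Age': [22.0, 8.0]})

# merge is a DataFrame method (a plain list has none); it joins rows on matching keys
merged = test_df.merge(train_df, on='PassengerId')
print(len(merged))  # 0: no PassengerId appears in both tables

# concat stacks the rows, which is what a per-title age plot needs
combined = pd.concat([test_df, train_df], ignore_index=True)
print(len(combined))  # 4
```

The overlapping non-key columns in merged get the default _x/_y suffixes, which is another sign that merge answers a different question (matching records) than concat (pooling records).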

2 Answers


I think you need to read both files with read_csv first, then concat both DataFrames, and finally create the boxplot:

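import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.read_csv('el/test.csv')
print(df1.head())

df2 = pd.read_csv('el/train.csv')
print(df2.head())

# Stack the rows of both files; ignore_index avoids duplicate index labels
df = pd.concat([df1, df2], ignore_index=True)
# Title = text after ", " up to the first ".", e.g. "Braund, Mr. Owen Harris" -> "Mr"
df['Title'] = df.Name.str.extract(r', ([^.]*)\.', expand=False)
print(df.head())

# One horizontal box of Age per Title
df.boxplot(column='Age', by='Title', vert=False)
plt.subplots_adjust(left=0.25)
plt.show()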
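To check the title extraction and the grouping without the CSV files, here is a minimal sketch on a few hypothetical rows (only "Braund, Mr. Owen Harris" comes from the comments; the other names and ages are made up). It extracts the title the same way as the answer, taking the text between ", " and the next ".", and prints the per-title age summary that the boxplot visualises.

```python
import pandas as pd

# Hypothetical sample rows; real data would come from pd.read_csv
df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Smith, Mrs. Anna',
             'Jones, Mr. Paul', 'Doe, Miss. Jane', 'Brown, Mrs. Ella'],
    'Age': [22.0, 38.0, 40.0, 8.0, None],
})

# Title = text after ", " up to the first "." (e.g. "Mr")
df['Title'] = df['Name'].str.extract(r', ([^.]*)\.', expand=False)

# Ages grouped by title: these are the values each box summarises
ages_by_title = df.groupby('Title')['Age'].describe()
print(ages_by_title[['count', 'min', 'max']])
```

Calling df.boxplot(column='Age', by='Title') on the same frame would draw one box per title from these values.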

Comments

There is no separate column for the title. However, the Name column contains it, e.g. Braund, Mr. Owen Harris, so the title is Mr.
OK, you can try it yourself; you can also check this answer.
Thank you so much for your answer :)
Thank you for accepting. A small piece of advice if you post a question in the future: check How to make good reproducible pandas examples for better questions. Good luck!
I tried to exclude titles other than 'Dr', 'Mrs', 'Mr' and 'Sir' using df['Title']= df.Name.str.extract(', (.*)\.', expand=False).isin(['Dr','Mrs','Mr' ,'Sir']), however I only get True and False?
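.isin() returns a boolean Series (True where the value is in the list), which is why the Title column ends up holding only True and False when its result is assigned there. A minimal sketch of one way to use it instead: keep the extracted text in Title and use the boolean Series as a mask to select rows. The sample names and ages are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Braund, Mr. Owen Harris', 'Doe, Miss. Jane',
                            'Jones, Dr. Paul', 'Smith, Mrs. Anna'],
                   'Age': [22.0, 8.0, 50.0, 35.0]})

# Keep the extracted text itself in the Title column
df['Title'] = df['Name'].str.extract(r', ([^.]*)\.', expand=False)

# isin() gives a boolean mask; indexing with it keeps only the matching rows
wanted = ['Dr', 'Mrs', 'Mr', 'Sir']
subset = df[df['Title'].isin(wanted)]
print(subset[['Title', 'Age']])
```

subset.boxplot(column='Age', by='Title') would then plot only the selected titles.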

The code you're using is for Python 2, yet you're running Python 3. In Python 3 (and recommended in Python 2.6+), the proper way to advance an iterator is to use

header = next(csv_file_object1)

Furthermore, the file should be opened in text mode 'r', not 'rb'.
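If you want to stay with the csv module, here is a minimal sketch for Python 3: open each file in text mode (the csv module documentation recommends newline=''), take the header with next(), and read the remaining rows from that same reader. In the question, the second loop iterates csv_file_object instead of csv_file_object1, so data1 stays empty; a small helper (read_csv_rows is a hypothetical name) avoids that mix-up.

```python
import csv

def read_csv_rows(path):
    """Return (header, rows) for a CSV file using only the csv module."""
    # Text mode ('r'), not 'rb'; newline='' as the csv docs recommend
    with open(path, 'r', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)           # first row: the column names
        rows = [row for row in reader]  # remaining rows, from the same reader
    return header, rows
```

For example, header, data = read_csv_rows('test.csv') and header1, data1 = read_csv_rows('train.csv'). Both results are plain lists, which is why .merge fails on them; pd.DataFrame(data, columns=header) turns them into a DataFrame, which does have .merge.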
