get information from pandas' dataframe and python 3

Question

I have two csv files. Contacts and Users.

How I load data into dataframes and merge them

First, I load a dataframe with the name of the users:

import pandas as pd
import numpy as np
df_users= pd.read_csv('./Users_001.csv',sep=',',usecols=[0,2,3])

Then I load the information from contacts of each user

df_contacts = pd.read_csv('./Contacts_001.csv',sep=',',usecols=[0,1,5,48,55,56,57,83,58])

df_users columns name are: user_id, Name, Surname

df_contacts columns name are: Contact ID, id user owner, fullname, qualification, ...

I want to merge both dataframes using user_id and 'id user owner' since they represent the same information. To to this I first change the name of the columns on df_contacts and then I merge

dfcontactos.columns = ['ID de Contacto','user_id','fullname','qualification','accesibility' ... ]
df_us_cont = pd.merge(dfcontactos,df_usuarios,on='user_id')

Now df_us_cont has the information from users and contacts.

What I want to do

There are only 18 user_id but there are 500 contacts. For each user I want to know:

Number of contacts with qualification < 100

For the contacts that have qualification <100

How many contacts have accesibility >= 4

Accesibility is a discrete number (0-5))
Number of contacts with qualification > 100 and < 300
Number of contacts with qualification > 300

What I have tried and fail

df_qua_lower100 = df_us_cont[df_us_cont['qualification']<100]
df_qua_lower100['user_id'].value_counts()

So far with this I am able to get the information on how many contacts with qualification<100 has each user_id. But I am unable to look how many have 'accesibility>=4'

I have tried to explain the best I could

Abhishek Sharma · Accepted Answer · 2017-12-11 17:30:03Z

1

First thing you can merge without changing column names

df_us_cont = dfcontactos.merge(dfcontactos,left_on='id user owner',right_on='user_id')

You can add as many conditions as you want if you use loc

df_us_cont.loc[(df_us_cont['qualification']<100) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()

Number of contacts with qualification > 100 and < 300

df_us_cont.loc[(df_us_cont['qualification']>100) &(df_us_cont['qualification']<300) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()

Number of contacts with qualification > 300

df_us_cont.loc[(df_us_cont['qualification']>300) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()

answered Dec 11, 2017 at 17:30

Abhishek Sharma

2,0592 gold badges17 silver badges25 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

get information from pandas' dataframe and python 3

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related