
I want to apply this function, hideEmail, to a specific column of my CSV file (a large file) using Python.

Example function:

import re

def hideEmail(email):
    # mask every character except '@' and '.'
    text = re.sub(r'[^@.]', 'x', email)
    return text
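For reference, this is what the function does to a single value: every character except `@` and `.` becomes `x`, so the address keeps its shape. The sample address below is made up for illustration.

```python
import re

def hideEmail(email):
    # replace every character that is not '@' or '.' with 'x'
    text = re.sub(r'[^@.]', 'x', email)
    return text

masked = hideEmail('john.doe@example.com')  # made-up sample address
print(masked)  # xxxx.xxx@xxxxxxx.xxx
```

Note that the pattern also masks `;` and newline characters, so the function should be applied to the email field only, not to whole lines of the file.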

CSV file (large file, > 1 GB):

    id;Name;firstName;email;profession
    100;toto;tata;[email protected];developer
    101;titi;tete;[email protected];doctor
    ..
    ..


5 Answers

4

Load the CSV data into a DataFrame (the file is semicolon-separated, so pass sep=';'):

import pandas as pd
df = pd.read_csv(r'/path/to/csv', sep=';')

Then you can use pd.Series.str.replace directly on the email column, since it supports regex (pass regex=True explicitly; since pandas 2.0 the default is literal matching):

df['email'] = df['email'].astype(str).str.replace(r'[^@.]', 'x', regex=True)
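Because the file is over 1 GB, loading it all at once may exhaust memory. A minimal sketch, assuming the semicolon-separated layout from the question and an `email` column, reads the file in chunks with `read_csv(chunksize=...)` and appends each masked chunk to the output. The helper name `mask_email_column`, the file names and the chunk size are placeholders, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

def mask_email_column(src, dst, column='email', chunksize=100_000):
    """Copy a ';'-separated CSV from src to dst, masking one column chunk by chunk."""
    first = True
    # dtype=str keeps ids etc. exactly as written instead of re-parsing them
    with pd.read_csv(src, sep=';', chunksize=chunksize, dtype=str) as reader:
        for chunk in reader:
            # regex=True is explicit: since pandas 2.0, str.replace defaults to literal matching
            chunk[column] = chunk[column].str.replace(r'[^@.]', 'x', regex=True)
            # write the header with the first chunk only, then append the rest
            chunk.to_csv(dst, sep=';', index=False, header=first,
                         mode='w' if first else 'a')
            first = False

# Demo on a tiny made-up file (chunksize=1 forces several chunks)
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'input.csv')
dst = os.path.join(tmp, 'output.csv')
with open(src, 'w') as fh:
    fh.write('id;Name;firstName;email;profession\n'
             '100;toto;tata;toto@example.com;developer\n'
             '101;titi;tete;titi@example.org;doctor\n')
mask_email_column(src, dst, chunksize=1)
with open(dst) as fh:
    result = fh.read()
print(result)
```

Only one chunk is held in memory at a time; a larger chunksize trades memory for fewer write calls.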

That said, if all you want to do is change a large CSV file, pandas is probably overkill. You might have a look at sed. Here's one example:

sed -E 's/(\w+)@(\w+)/xxx@xxx/' /path/to/file.csv > /path/to/new_file.csv

2 Comments

Thanks @FelipeLanza, but I have other functions in Python to apply, and unfortunately there is no regex, so I can't use sed.
It most certainly does support regex. Might have a look here: gnu.org/software/sed/manual/sed.html#sed-regular-expressions.
3

It's a bit hard to know without the DataFrame, but you can try:

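import pandas as pd  # import pandas
df = pd.read_csv('enter_file_path_here', sep=';')  # read the data; the file uses ';' as separator

df['email'] = df['email'].apply(lambda x: hideEmail(x))
# if you want to make it back into a CSV (index=False avoids writing an extra index column):
df.to_csv('name.csv', sep=';', index=False)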

4 Comments

The question is how to apply it to a CSV file, not to a pandas DataFrame. I think you should include how to read and write the DataFrame as well.
Right you are, I will edit it accordingly :)
I think the question is about a pandas DataFrame.
You don't need the lambda here.
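As that comment says, `apply` accepts the function itself, so the lambda wrapper is redundant. A small check on a made-up Series:

```python
import re

import pandas as pd

def hideEmail(email):
    # replace every character that is not '@' or '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

emails = pd.Series(['toto@example.com', 'titi@example.org'])  # made-up sample data

with_lambda = emails.apply(lambda x: hideEmail(x))
direct = emails.apply(hideEmail)  # pass the function itself, no wrapper needed
print(direct.tolist())  # ['xxxx@xxxxxxx.xxx', 'xxxx@xxxxxxx.xxx']
```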
3

Using pandas

You can use pandas, as described in a previous question, to apply a function passed as a parameter.

To export the resulting DataFrame, use the to_csv function.

import re

import pandas as pd

def hideEmail(email):
    # mask every character except '@' and '.'
    text = re.sub(r'[^@.]', 'x', email)
    return text


column_name = "email"

df = pd.read_csv(r'Path of your CSV file\File Name.csv', sep=';')  # the file is ';'-separated
df[column_name] = df[column_name].map(hideEmail)
df.to_csv(r'Path where you want to store the exported CSV file\File Name.csv', sep=';', index=False)
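Two details matter for this particular file: it is semicolon-separated, so `read_csv` needs `sep=';'` (otherwise each row lands in a single column and `df['email']` raises a KeyError), and `to_csv` writes the index as an extra column unless `index=False`. A small in-memory round trip with `io.StringIO` and a made-up row illustrates both:

```python
import io
import re

import pandas as pd

def hideEmail(email):
    # replace every character that is not '@' or '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

sample = ('id;Name;firstName;email;profession\n'
          '100;toto;tata;toto@example.com;developer\n')  # made-up row

wrong = pd.read_csv(io.StringIO(sample))           # default sep=','
right = pd.read_csv(io.StringIO(sample), sep=';')
print(list(wrong.columns))  # ['id;Name;firstName;email;profession']

right['email'] = right['email'].map(hideEmail)
out = right.to_csv(sep=';', index=False)  # returns a string when no path is given
print(out)
```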

Comments

2

You can also do it in plain Python, splitting each line on ';' yourself:

import re

def hideEmail(email):
    # mask every character except '@' and '.'
    text = re.sub(r'[^@.]', 'x', email)
    return text


with open('path/to/csvfile', 'r') as file:
    # drop only the trailing newline, and skip blank lines
    lines = [l.rstrip('\n').split(';') for l in file if l.strip()]

modifiedlines = [';'.join(lines[0])]   # keep the header (index 0) unchanged

for i in lines[1:]:         # iterating from index 1 as index 0 is header
    i[3] = hideEmail(i[3])       # as email field is at index 3
    modifiedlines.append(';'.join(i))     # appending modified line

with open('path/to/csvfile', 'w') as file:
    file.write('\n'.join(modifiedlines) + '\n')   # writing the lines back, one per line
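For a file larger than 1 GB, reading every line into a list keeps the whole file in memory. A streaming sketch with the standard `csv` module, assuming the `;` delimiter and `email` header from the question, processes one row at a time, looks the column up by name instead of by index, and writes to a separate output file. The helper `mask_column` and the paths are hypothetical. Unlike `split(';')`, `csv` also handles quoted fields that contain `;`.

```python
import csv
import os
import re
import tempfile

def hideEmail(email):
    # replace every character that is not '@' or '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

def mask_column(src_path, dst_path, column='email', func=hideEmail):
    """Stream src_path to dst_path, applying func to one named column (hypothetical helper)."""
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.DictReader(src, delimiter=';')
        # reader.fieldnames reads the header row; reuse it so the column order is kept
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                delimiter=';', lineterminator='\n')
        writer.writeheader()
        for row in reader:              # only one row is held in memory at a time
            row[column] = func(row[column])
            writer.writerow(row)

# Demo on a tiny made-up file
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, 'input.csv')
dst_path = os.path.join(tmp, 'output.csv')
with open(src_path, 'w', newline='') as fh:
    fh.write('id;Name;firstName;email;profession\n'
             '101;titi;tete;titi@example.org;doctor\n')
mask_column(src_path, dst_path)
with open(dst_path, newline='') as fh:
    result = fh.read()
print(result)
```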

Comments

1

You can use the built-in map() function to apply the function to each line of the file:

import re

def hideEmail(email):
    #hide email
    text = re.sub(r'[^@.]', 'x', email)
    return text 

with open('file.csv', 'r') as r:
    r = map(hideEmail, r.readlines())

with open('file2.csv', 'w') as f:
    for line in r:
        f.write(line + '\n')

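EDIT (credit to juanpa.arrivillaga for pointing it out):

The r = map(hideEmail, r.readlines()) can be replaced with just r = map(hideEmail, r), as long as the result is consumed inside the same with block: map is lazy, so iterating it after the file has been closed raises ValueError: I/O operation on closed file.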
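Because `map` is lazy, it has to be consumed while the input file is still open. A sketch that does this and, addressing the column concern raised in the comments, applies the function only to the email field rather than to the whole line. The helpers `mask_line` and `mask_file` and the field index 3 (taken from the sample header) are assumptions for illustration.

```python
import os
import re
import tempfile

def hideEmail(email):
    # replace every character that is not '@' or '.' with 'x'
    return re.sub(r'[^@.]', 'x', email)

def mask_line(line, index=3):
    """Mask only the field at `index` of a ';'-separated line (hypothetical helper)."""
    fields = line.rstrip('\n').split(';')
    fields[index] = hideEmail(fields[index])
    return ';'.join(fields) + '\n'

def mask_file(src_path, dst_path):
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.write(next(src))                 # copy the header line unchanged
        # map is lazy: lines are read one at a time while src is still open
        dst.writelines(map(mask_line, src))

# Demo on a tiny made-up file
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, 'file.csv')
dst_path = os.path.join(tmp, 'file2.csv')
with open(src_path, 'w') as fh:
    fh.write('id;Name;firstName;email;profession\n'
             '100;toto;tata;toto@example.com;developer\n')
mask_file(src_path, dst_path)
with open(dst_path) as fh:
    result = fh.read()
print(result)
```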

4 Comments

No need for r.readlines(); just r = map(hideEmail, r) works.
@juanpa.arrivillaga Thank you for informing me.
@AnnZen, how can I specify a column name to apply my lambda function to?
This will replace everything in the line that isn't @ or ., this solution is definitely missing the columns/fields aspect of the delimited input.
