I have a column "City_trad_chinese" in a pandas DataFrame "df" that contains values in Traditional Chinese. I need to create another column, "City_English", containing the English translations of those values.

How can I do this with Python? I tried the following:

#importing required libraries
import pandas as pd 

from os import path

from googletrans import Translator

#setting path to data
path2data = 'C:/Users/data'

# data import
df = pd.read_excel(path.join(path2data, 'data.xlsx'), converters={'City_trad_chinese':str})


translator = Translator()

df['City_English'] = df['City_trad_chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)

but it is giving me an error:

raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value
  • The error appears to be due to a limit on the number of characters you can translate in one request to the Google Translate API. If you go over this limit (15k), Google just responds with an empty JSON body. This question claims that if it still doesn't work, reducing the input to 5k-character chunks resolves the issue. Commented Jun 11, 2018 at 13:01
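
The chunking idea in the comment above could be sketched roughly as follows. This is only an illustration: `chunk_text` and its `max_chars` parameter are hypothetical names, not part of googletrans, and the 5000 default is just the figure mentioned in the comment, not a documented limit.

```python
def chunk_text(text, max_chars=5000):
    """Split text into pieces of at most max_chars characters.

    Hypothetical helper: breaks after the last space inside each window
    when there is one, otherwise cuts at exactly max_chars.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last space inside the current window.
            split_at = text.rfind(" ", start, end)
            if split_at > start:
                end = split_at + 1  # keep the space with the left chunk
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk would then be passed to `translator.translate(...)` separately and the results joined. Chinese text usually has no spaces, so in practice this will mostly fall back to hard cuts at `max_chars`; splitting on sentence punctuation instead may give better translations.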

1 Answer

You can use the googletrans library:

import pandas as pd
from googletrans import Translator

d = {"City_trad_chinese":["香港特别行政区",
                          "澳门特别行政区",
                          "北京市",
                          "上海市"]}
df = pd.DataFrame(data=d)

translator = Translator()

df["City_English"] = df["City_trad_chinese"].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)

print(df["City_English"])

0    Hong Kong Special Administrative Region
1        Macao Special Administrative Region
2                               Beijing City
3                              Shanghai City

Note: The Google Translate API has a 15k character limit. You can work around this by translating each row individually:

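df["City_English"] = ""

for index, row in df.iterrows():
    # create a fresh Translator for each row
    translator = Translator()
    eng_text = translator.translate(row["City_trad_chinese"], src="zh-TW", dest="en").text
    # write through df.at: assigning to `row` would only change a copy
    df.at[index, "City_English"] = eng_text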
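
If the column has many repeated city names, one possible refinement is to translate each distinct value only once and retry when a request fails. The sketch below is an assumption-laden illustration, not part of googletrans: `translate_unique`, `translate_fn`, `retries` and `delay` are hypothetical names. It catches `ValueError` because `json.JSONDecodeError` is a subclass of it.

```python
import time


def translate_unique(values, translate_fn, retries=3, delay=1.0):
    """Translate each distinct value once and return {original: translation}.

    translate_fn is any callable taking a string and returning a string.
    Values that still fail after `retries` attempts are mapped to None.
    """
    mapping = {}
    # dict.fromkeys drops duplicates while keeping first-seen order.
    for value in dict.fromkeys(values):
        for attempt in range(retries):
            try:
                mapping[value] = translate_fn(value)
                break
            except ValueError:
                # JSONDecodeError ("Expecting value") lands here.
                if attempt == retries - 1:
                    mapping[value] = None
                else:
                    time.sleep(delay)
    return mapping


# Possible usage with the DataFrame above (needs network access and
# assumes the synchronous googletrans API used in this answer):
# translator = Translator()
# mapping = translate_unique(
#     df["City_trad_chinese"],
#     lambda s: translator.translate(s, src="zh-TW", dest="en").text,
# )
# df["City_English"] = df["City_trad_chinese"].map(mapping)
```

This keeps the number of requests down to the number of distinct cities, which may also make it less likely to hit whatever limit is causing the empty responses.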

8 Comments

Is there any way I can do this translation for the whole column with a single command?
@Architgupta - use df['eng'] = df['chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)
It is throwing the error: raise JSONDecodeError("Expecting value", s, err.value) from None
@Architgupta this page might help solve that issue: "The error arises because the "data" is of type bytes so you have to decode it into a string before using json.loads to turn it into a json object."
I imported the data from an Excel file and also converted that particular column to string while importing.