Remove u202a from Python string

Question

I'm trying to open a file in Python, but I get an error, and at the beginning of the string there is a \u202a character. Does anyone know how to remove it?

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

OSError: [Errno 22] Invalid argument: '\u202aH:\7 - Script\teste.csv'

U+202A is the Unicode control character LEFT-TO-RIGHT EMBEDDING. I hope this information aids you in your search.
– Ryan Schaefer, Commented Mar 14, 2018 at 0:43
Possible duplicate of Error when opening xml file in python 3 with sublime text 3
– Simon, Commented Mar 14, 2018 at 1:50

Robᵩ · Accepted Answer · 2018-03-14 16:39:13Z

When you initially created your .py file, your text editor introduced a non-printing character.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>>

As you can see, there is a character with codepoint U+202A immediately before the H.

As someone else pointed out, the character at codepoint U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>>

This further confirms that the first character in your string is not H, but the non-printing LEFT-TO-RIGHT EMBEDDING character.
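The same check can be generalized to list every invisible formatting character in a string, not just the first one. This is a minimal sketch: the helper name describe_hidden is made up for illustration, and it assumes that Unicode category "Cf" (format characters, which includes U+202A) is the category you care about.

```python
import unicodedata


def describe_hidden(text):
    """Return (index, codepoint, name) for each Unicode format character (category Cf)."""
    found = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            codepoint = f"U+{ord(char):04X}"
            found.append((index, codepoint, unicodedata.name(char, "UNKNOWN")))
    return found


# The escape makes the hidden character explicit in this example.
path = "\u202aH:\\7 - Script\\teste.csv"
for index, codepoint, name in describe_hidden(path):
    print(index, codepoint, name)  # 0 U+202A LEFT-TO-RIGHT EMBEDDING
```

An empty result means the string contains no format characters at all.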

I don't know what text editor you used to create your program. Even if I knew, I'm probably not an expert in that editor. Regardless, some text editor that you used inserted, unbeknownst to you, U+202A.

One solution is to use a text editor that won't insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

Using such an editor, simply delete the character between " and H.

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
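If your editor cannot show non-printing characters, a small script can locate them in the source text instead. The following is only a sketch under stated assumptions: find_hidden_chars is a hypothetical helper, and it flags every character in Unicode category "Cf", which may be broader than you want.

```python
import unicodedata


def find_hidden_chars(source_text):
    """Yield (line_number, column, codepoint) for each format character in the text."""
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if unicodedata.category(char) == "Cf":
                yield line_number, column, f"U+{ord(char):04X}"


# Sample source text with a hidden U+202A right after the opening quote on line 2.
sample = 'x = 1\ncarregar_uml("\u202aH:\\\\7 - Script\\\\teste.csv", variaveis)\n'
hits = list(find_hidden_chars(sample))
print(hits)  # [(2, 15, 'U+202A')]
```

To check a real script, read it with open(path, encoding="utf-8").read() and pass the resulting text to the function.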

This is the correct answer. The answer that OP accepted only worked because it got OP to re-type the string.

Akshay Karande · Accepted Answer · 2019-06-14 08:43:13Z

You can use this sample code to remove \u202a from a file path.

st="‪‪F:\\somepath\\filename.xlsx"    
data = pd.read_excel(st)

If I try this, it raises an OSError. In detail:

Traceback (most recent call last):
  File "F:\CodeRepo\PythonWorkSpace\demo\removepartofstring.py", line 14, in <module>
    data = pd.read_excel(st)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: '\u202aF:\\somepath\\filename.xlsx'

But if I do it like this:

    st="‪‪F:\\somepath\\filename.xlsx" 
    data = pd.read_excel(st.strip("\u202a"))  # replace st with your own string

it works for me.
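One thing to keep in mind with str.strip: its argument is treated as a set of characters, not as a prefix. A short sketch of the difference, using made-up example strings:

```python
path = "\u202aF:\\somepath\\filename.xlsx"

# Passing the escape "\u202a" strips only that one code point.
cleaned = path.lstrip("\u202a")
print(cleaned)  # F:\somepath\filename.xlsx

# The plain letters "u202a" are a character set, so unrelated text is trimmed too.
trimmed = "a2data.csv0".strip("u202a")
print(trimmed)  # data.csv
```

So writing the escape sequence, rather than the letters u, 2, 0 and a, avoids eating legitimate characters at either end of a path.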

jose_bacoy · Accepted Answer · 2018-03-14 01:38:16Z


The problem is that the directory path of the file is not read properly. Pass it as a raw string and it should work.

carregar_uml(r'H:\7 - Script\teste.csv', variaveis)


jose_bacoy Over a year ago

Did it work or not? Please let me know, since my answer was downvoted.

Shashank Singh Over a year ago

Can you also explain what the error is? r'H:\7 - Script\teste.csv' is equivalent to 'H:\\7 - Script\\teste.csv'.

Robᵩ Over a year ago

It appeared to work for OP presumably because he retyped the string, avoiding the non-printing character.

jose_bacoy Over a year ago

@singh, opening the file is the problem. Passing the path as a raw string lets open() treat it as a whole string, without parsing it as a directory path.

jsbueno Over a year ago

No, as Singh points out, both forms are equivalent. If this worked, it is just because the user retyped it, as @Robᵩ's comment says.
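The equivalence discussed in these comments is easy to check in a Python session. A small sketch, with variable names chosen only for illustration:

```python
raw = r'H:\7 - Script\teste.csv'
escaped = 'H:\\7 - Script\\teste.csv'

# Both literals produce exactly the same string.
print(raw == escaped)  # True

# A hidden U+202A in front makes the string differ, whichever literal style is used.
with_hidden = "\u202a" + escaped
print(with_hidden == escaped)  # False
print(len(with_hidden) - len(escaped))  # 1
```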


Loco · Accepted Answer · 2019-09-27 09:29:00Z


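Try strip(). Call it on the path string before passing it to the function:

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


caminho = "\u202aH:\\7 - Script\\teste.csv"
caminho = caminho.strip("\u202a")  # remove the invisible U+202A from both ends

carregar_uml(caminho, variaveis)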


Amadeus · Accepted Answer · 2019-11-16 19:08:06Z


Or you can slice out that character

file_path = r"‪C:\Test3\Accessing_mdb.txt"
file_path = file_path[1:]
with open(file_path, 'a') as f_obj:
    f_obj.write('some words')
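Slicing with [1:] drops the first character unconditionally, so it would damage a path that does not start with U+202A. A slightly safer sketch checks first; the helper name drop_leading_lre is hypothetical:

```python
def drop_leading_lre(path):
    """Remove one leading U+202A if present; otherwise return the path unchanged."""
    if path.startswith("\u202a"):
        return path[1:]
    return path


print(drop_leading_lre("\u202aC:\\Test3\\Accessing_mdb.txt"))  # C:\Test3\Accessing_mdb.txt
print(drop_leading_lre("C:\\Test3\\Accessing_mdb.txt"))        # C:\Test3\Accessing_mdb.txt
```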


Yechan · Accepted Answer · 2019-12-29 06:19:15Z


Use a lowercase letter when you write the drive name, not an uppercase one.

For example, H: gives the error and h: does not.


Surie · Accepted Answer · 2021-07-29 13:35:03Z


I tried all of the above solutions. The problem is that when we copy a path, or any string, by selecting it from right to left, an extra character is added that does not show in the IDE. This extra character is a bidirectional control character related to the right-to-left mark (RLM), https://en.wikipedia.org/wiki/Right-to-left_mark; it appears when you select the text from right to left at the time of copying.

I also tried copying left to right, and then this extra character is not added. So either type your path manually or copy it left to right to avoid this type of issue.


PythonPro · Accepted Answer · 2022-01-11 17:02:12Z


The following is a simple function to remove the "\u202a" and "\u202c" characters.

You can add any other characters you want removed to the list.

def cleanup(inp):
    new_char = ""
    for char in inp:
        if char not in ["\u202a", "\u202c"]:
            new_char += char
    return new_char

example = '\u202a7551\u202c'
print(cleanup(example)) # prints 7551
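A variation on the same idea uses str.translate with a deletion table instead of a character-by-character loop. This is a sketch, not a replacement for the function above: the name strip_bidi and the choice of which control characters to list (the common bidirectional marks, embeddings, overrides and isolates) are assumptions made for illustration.

```python
# Bidirectional control characters: marks, embeddings/overrides, and isolates.
BIDI_CONTROLS = (
    "\u200e\u200f"                   # LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)

# str.maketrans with three arguments maps every character in the third to None.
_DELETE_BIDI = str.maketrans("", "", BIDI_CONTROLS)


def strip_bidi(text):
    """Return text with all listed bidirectional control characters removed."""
    return text.translate(_DELETE_BIDI)


example = "\u202a7551\u202c"
print(strip_bidi(example))  # prints 7551
```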
