3

I'm trying to open a file in Python, but I get an error, and a \u202a character shows up at the beginning of the path string. Does anyone know how to remove it?

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

OSError: [Errno 22] Invalid argument: '\u202aH:\7 - Script\teste.csv'

2

8 Answers

13

When you initially created your .py file, your text editor introduced a non-printing character.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>> 

As you can see, there is a character with code point U+202A immediately before the H.

As someone else pointed out, the character at code point U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>> 

This further confirms that the first character in your string is not H, but the non-printing LEFT-TO-RIGHT EMBEDDING character.

I don't know which text editor you used to create your program, and even if I did, I'm probably not an expert in it. Regardless, some editor you used inserted U+202A without your knowledge.

One solution is to use a text editor that won't insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

Using such an editor, simply delete the character between " and H.

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
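If your editor cannot display the character, a short script can point it out instead. This is only a sketch: find_format_chars is a hypothetical helper name, and it assumes that checking for the Unicode "Cf" (format) category is enough to catch invisible characters such as U+202A.

```python
import unicodedata

def find_format_chars(text):
    """Return (index, code point, name) for each Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

# The path from the question, with the hidden character written as an escape.
path = "\u202aH:\\7 - Script\\teste.csv"
print(find_format_chars(path))
# prints [(0, 'U+202A', 'LEFT-TO-RIGHT EMBEDDING')]
```

An empty list means no format characters were found in the string.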

1 Comment

This is the correct answer. The answer that OP accepted only worked because it got OP to re-type the string.
3

You can use this sample code to remove \u202a from a file path.

st="‪‪F:\\somepath\\filename.xlsx"    
data = pd.read_excel(st)

If I try this, it raises an OSError. In detail:

Traceback (most recent call last):
  File "F:\CodeRepo\PythonWorkSpace\demo\removepartofstring.py", line 14, in <module>
    data = pd.read_excel(st)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: '\u202aF:\\somepath\\filename.xlsx'

But if I strip the character first, like this:

    st="‪‪F:\\somepath\\filename.xlsx" 
    data = pd.read_excel(st.strip("\u202a"))  # strip only U+202A from both ends; replace st with your own string

it works for me.
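One caveat worth keeping in mind: str.strip treats its argument as a set of characters to remove from both ends, not as a literal prefix. The sketch below uses a made-up path (an assumption, not from the question) to show how passing extra letters alongside the U+202A character can remove more than intended.

```python
# Hypothetical path whose last component ends in characters from "u202a".
path = "\u202aF:\\data\\a2020"

# Passing only the escape removes just the invisible character.
clean = path.strip("\u202a")
print(clean)       # prints F:\data\a2020

# Passing the escape plus the literal text "u202a" strips any of the
# characters U+202A, 'u', '2', '0', 'a' from both ends.
too_much = path.strip("\u202au202a")
print(too_much)    # prints F:\data\
```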

1

The problem is that the file's directory path is not read properly. Pass it as a raw string and it should work.

carregar_uml(r'H:\7 - Script\teste.csv', variaveis)

6 Comments

Did it work or not? Please let me know, since my answer was downvoted.
Can you also explain what the error is, because r'H:\7 - Script\teste.csv' is equivalent to 'H:\\7 - Script\\teste.csv'?
It appeared to work for OP presumably because he retyped the string, avoiding the non-printing character.
@singh, opening the file is the problem. Passing the path as a raw string lets open treat it as a whole string without parsing it as a directory path.
No, as Singh points out, both forms are equivalent. If this worked, it is just because the user retyped it, as per @Robᵩ's comment.
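The equivalence discussed in these comments is easy to check. A minimal sketch, using the path from the question:

```python
raw = r'H:\7 - Script\teste.csv'
escaped = 'H:\\7 - Script\\teste.csv'

# Both literals produce exactly the same string.
print(raw == escaped)          # prints True

# A hidden U+202A in front makes a different, one-character-longer string,
# whichever literal style is used.
with_mark = '\u202a' + raw
print(with_mark == raw)        # prints False
print(len(with_mark) - len(raw))  # prints 1
```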
0

Try strip() on the path before passing it in:

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


# The path with its leading U+202A, written as an escape so it is visible.
arquivo = "\u202aH:\\7 - Script\\teste.csv"
carregar_uml(arquivo.strip("\u202a"), variaveis)

0

Or you can slice out that character

file_path = r"‪C:\Test3\Accessing_mdb.txt"
file_path = file_path[1:]
with open(file_path, 'a') as f_obj:
    f_obj.write('some words')

0

Use a lowercase letter for the drive name, not an uppercase one.

Example: H: -> error; h: -> no error

0

I tried all of the above solutions. The problem is that when we copy a path or any other string by selecting it from right to left, an extra character is added. It does not show in the IDE. This extra character is a bidirectional control character, in the same family as the Right-to-Left Mark (RLM) https://en.wikipedia.org/wiki/Right-to-left_mark; in this question it is U+202A, LEFT-TO-RIGHT EMBEDDING. In other words, you selected the text from right to left when copying.

See the image linked to my answer. I also tried copying from left to right, and then the extra character is not added. So either type your path manually or copy it from left to right to avoid this issue.

0

The following is a simple function to remove the "\u202a" and "\u202c" characters.

You can add any other characters you want removed to the list.

def cleanup(inp):
    new_char = ""
    for char in inp:
        if char not in ["\u202a", "\u202c"]:
            new_char += char
    return new_char

example = '\u202a7551\u202c'
print(cleanup(example)) # prints 7551
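A more compact variant of the same idea, sketched with str.translate. The name strip_bidi and the exact set of characters in BIDI_CONTROLS are assumptions for illustration; trim or extend the set to suit your data.

```python
# Directional marks, embeddings, overrides and isolates (an assumed,
# illustrative set; adjust as needed).
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# Mapping a code point to None tells str.translate to delete it.
_REMOVE = dict.fromkeys(map(ord, BIDI_CONTROLS))

def strip_bidi(text):
    """Return text with every character in BIDI_CONTROLS removed."""
    return text.translate(_REMOVE)

example = '\u202a7551\u202c'
print(strip_bidi(example))  # prints 7551
```

Unlike strip, this removes the characters wherever they occur in the string, not only at the ends.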
