Here is what I ultimately want:

A dictionary that holds Unicode characters as keys and, as values, lists containing the Unicode code point notation and the HTML escape code.

Basic_Latin = {
        ...
        "@": ["U+0040", "&#64;"],
        ...
        }

How can this be achieved if only the key is given?

I think of something like this:

Basic_Latin = {
        ...
        "@": [to_unicode("@"), to_html("@")],
        ...
        }

I find a lot of methods for converting the other way round, but not for what I am looking for.

1 Answer

Both notations contain nothing more than the Unicode code point of the character: in hexadecimal for the U+ form, and in decimal for the HTML escape. That value is easy to obtain with the ord() function; you then format the resulting integer:

codepoint = ord('@')
unicode_codepoint = 'U+{:04X}'.format(codepoint)  # four-digit uppercase hex
html_escape = '&#{:d};'.format(codepoint)         # decimal number

or as a function:

def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

The function returns a tuple rather than a list; presumably this doesn't need to be mutable after all. You probably want to consider using a namedtuple class so you can also use attribute access.
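As a minimal sketch of the namedtuple idea, assuming the hypothetical names CharInfo, unicode_notation, html_escape and char_info (none of them come from the question), the same formatting could be wrapped like this:

```python
from collections import namedtuple

# Hypothetical type and field names, chosen only for illustration.
CharInfo = namedtuple('CharInfo', ['unicode_notation', 'html_escape'])

def char_info(c):
    """Return the U+XXXX notation and the decimal HTML escape for one character."""
    codepoint = ord(c)
    return CharInfo('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

info = char_info('@')
print(info.unicode_notation)  # U+0040
print(info.html_escape)       # &#64;
print(info[0])                # index access still works: U+0040
```

A namedtuple is still a tuple, so code that unpacks or indexes the result keeps working while the fields also get readable names.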

Demo:

>>> def codepoints(c):
...     codepoint = ord(c)
...     return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
...
>>> codepoints('@')
('U+0040', '&#64;')
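To build the whole dictionary from the keys alone, one option is a dict comprehension over the characters you care about. The sketch below assumes Python 3 and assumes that only the printable part of the Basic Latin block (U+0020 to U+007E) is wanted; it returns lists rather than tuples only because the question asked for list values:

```python
def codepoints(c):
    """Return [U+XXXX notation, decimal HTML escape] for one character."""
    codepoint = ord(c)
    return ['U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint)]

# Assumption: printable Basic Latin only, U+0020 (space) through U+007E (~).
# range() excludes its end value, hence 0x7F.
Basic_Latin = {chr(cp): codepoints(chr(cp)) for cp in range(0x20, 0x7F)}

print(Basic_Latin['@'])  # ['U+0040', '&#64;']
```

Adjust the range, or iterate over any string of characters instead, to cover other blocks.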