I have Unicode text like the following:

(S (NP (N \u0db6\u0dbd\u0dbd\u0dcf)) (VP (V \u0db6\u0dbb\u0dc0\u0dcf)))

How do I make this readable by converting the '\u0___' escape codes into the corresponding characters? I'm using Python 2.7.

I obtained that output with the following code segment in NLTK (3.0), where tree is an nltk.tree.Tree:

for tree in treelist1:
    print unicode(str(tree))

I need something like print(TreePrettyPrinter(tree).text()), which gives the readable Unicode output I want, but in a tree layout that I don't want. Is there a method in NLTK that produces the same readable output as bracketed text instead?


I have the same issue with the output from

for rule in grammar1.productions():
    print(rule.unicode_repr())

where grammar1 is an nltk.grammar.CFG.

The output is as follows:

VP -> V
VP -> NP V
N -> '\u0db6\u0dbd\u0dca\u0dbd\u0dcf'
N -> '\u0db8\u0dd2\u0db1\u0dd2\u0dc3\u0dcf'
N -> '\u0db8\u0dda\u0dc3\u0dba'

The final results are perfectly fine. I only have issues with how the output is displayed.

  • Did you try printing the value contained in the field itself? Commented Sep 28, 2015 at 20:05
  • The Windows console is notoriously bad at handling Unicode strings; you may be better off creating some sort of interface or file to output to, rather than doing lots of explicit encoding/decoding. Commented Sep 28, 2015 at 20:07
  • @IgnacioVazquez-Abrams That gives the same output for the field itself, e.g. print(tree) and print(grammar1). Commented Sep 28, 2015 at 20:19

1 Answer

The solutions are in this question; they also work for Python 2.7.

This has nothing to do with NLTK. The simple solution is to decode the output text with 'unicode_escape':

print(str(tree).decode('unicode_escape'))

and

print(rule.unicode_repr().decode('unicode_escape'))
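As a rough illustration of what the decode step does, here is a minimal sketch that does not need NLTK. It assumes the escaped output is available as a plain ASCII byte string (which is what str(tree) yields on Python 2.7); the helper name unescape and the hard-coded sample are only for illustration and are not part of NLTK.

```python
# -*- coding: utf-8 -*-
# Sketch: turn literal "\uXXXX" escape sequences back into characters.
# The byte string below stands in for the escaped output of str(tree);
# `unescape` is a hypothetical helper name, not an NLTK function.

def unescape(escaped_bytes):
    """Decode an ASCII byte string containing \\uXXXX escapes into text."""
    return escaped_bytes.decode('unicode_escape')

# Doubled backslashes keep the escapes literal, as in the printed output.
escaped = b"(S (NP (N \\u0db6\\u0dbd\\u0dbd\\u0dcf)) (VP (V \\u0db6\\u0dbb\\u0dc0\\u0dcf)))"

# `readable` is a unicode string holding the actual Sinhala characters.
readable = unescape(escaped)
```

Printing readable should then show the Sinhala text, provided the console encoding can represent it. Note that 'unicode_escape' interprets every backslash sequence in the input, so a literal backslash that happens to be in the data (for example \n) would be converted as well.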

For an NLTK-specific solution that prints a tree of type nltk.tree.Tree as bracketed text, use the following:

print(tree.pformat())

1 Comment

A simpler solution is to avoid invoking unicode_repr or str. 'unicode_escape' masks the bugs upstream.
