
The ascii command in Linux is fast and great: it lets you search for a character or a code point and returns all relevant results for that search. Is there something similar for extended ASCII (e.g. ISO-8859-1) and/or for Unicode characters?

1 Answer


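The unicode tool provides functionality similar to ascii. For example, this prints a table of code points 0 to 255 (-d makes the numeric arguments decimal). That range, U+0000..U+00FF, coincides with ISO-8859-1: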

$ unicode -d ..255
          .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F 
     000.                                                 
     001.                                                 
     002.     !  "  #  $  %  &  '  (  )  *  +  ,  -  .  / 
     003.  0  1  2  3  4  5  6  7  8  9  :  ;  <  =  >  ? 
     004.  @  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O 
     005.  P  Q  R  S  T  U  V  W  X  Y  Z  [  \  ]  ^  _ 
     006.  `  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o 
     007.  p  q  r  s  t  u  v  w  x  y  z  {  |  }  ~    
     008.                                                 
     009.                                                 
     00A.     ¡  ¢  £  ¤  ¥  ¦  §  ¨  ©  ª  «  ¬    ®  ¯ 
     00B.  °  ±  ²  ³  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½  ¾  ¿ 
     00C.  À  Á  Â  Ã  Ä  Å  Æ  Ç  È  É  Ê  Ë  Ì  Í  Î  Ï 
     00D.  Ð  Ñ  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û  Ü  Ý  Þ  ß 
     00E.  à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï 
     00F.  ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù  ú  û  ü  ý  þ  ÿ 
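If the unicode tool isn't installed, a similar table can be produced with Python's standard library. This is a minimal sketch, not part of the tool: the function name latin1_table is made up for illustration, and the layout only approximates the output above. It relies on ISO-8859-1 mapping byte N to code point U+00NN, so decoding bytes 0..255 as Latin-1 yields the same 256 characters.

```python
import unicodedata


def latin1_table():
    """Return the rows of a 16x16 table of code points 0-255 (U+0000..U+00FF)."""
    # ISO-8859-1 maps byte N to code point N, so decoding bytes 0..255 as
    # Latin-1 gives exactly the characters of the range U+0000..U+00FF.
    chars = bytes(range(256)).decode("latin-1")
    rows = ["    " + "".join(f" .{col:X}" for col in range(16))]
    for row in range(16):
        cells = []
        for col in range(16):
            ch = chars[row * 16 + col]
            # Leave control (Cc) and format (Cf) characters blank, as the table above does.
            cells.append(" " if unicodedata.category(ch) in ("Cc", "Cf") else ch)
        rows.append(f"{row:03X}." + "".join(f"  {c}" for c in cells))
    return rows


if __name__ == "__main__":
    print("\n".join(latin1_table()))
```

Each row label is the code point without its last hex digit, and the column header supplies that digit, just like the unicode output.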

It can map characters from other code pages too (see its --fromcp option, abbreviated here as --fcp). For example, byte 200 in code page 437:

$ unicode --fcp cp437 -d 200
U+255A BOX DRAWINGS DOUBLE UP AND RIGHT
UTF-8: e2 95 9a UTF-16BE: 255a Decimal: &#9562; Octal: \022532
╚
Category: So (Symbol, Other); East Asian width: A (ambiguous)
Unicode block: 2500..257F; Box Drawing
Bidi: ON (Other Neutrals)
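Most of the fields printed above are also available from Python's unicodedata module, and Python's built-in cp437 codec does the code-page mapping. The sketch below is an approximation, not a reimplementation of the tool: describe() and its dictionary keys are hypothetical names, the Unicode block is not reported (unicodedata has no block lookup), and the octal field simply imitates the tool's "\0" prefix.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect some of the properties the unicode tool prints for one character.

    Hypothetical helper: field names are our own, values come from the stdlib.
    """
    cp = ord(ch)
    return {
        "codepoint": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "utf8": ch.encode("utf-8").hex(" "),  # bytes.hex(sep) needs Python 3.8+
        "utf16be": ch.encode("utf-16-be").hex(),
        "decimal": f"&#{cp};",
        "octal": f"\\0{cp:o}",  # mimics the tool's "\0..." octal notation
        "category": unicodedata.category(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
        "bidi": unicodedata.bidirectional(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


if __name__ == "__main__":
    # Byte 200 (0xC8) interpreted in code page 437, as in `unicode --fcp cp437 -d 200`.
    ch = bytes([200]).decode("cp437")
    for key, value in describe(ch).items():
        print(f"{key}: {value}")
```

For byte 200 this reports U+255A, BOX DRAWINGS DOUBLE UP AND RIGHT, UTF-8 e2 95 9a, category So, east Asian width A and bidi class ON, matching the unicode output. The decomposition field is the same string the tool shows for characters such as U+00B4 and U+00C1.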

It can also be used to search for characters by name:

$ unicode acute
U+00B4 ACUTE ACCENT
UTF-8: c2 b4 UTF-16BE: 00b4 Decimal: &#180; Octal: \0264
´
Category: Sk (Symbol, Modifier); East Asian width: A (ambiguous)
Unicode block: 0080..00FF; Latin-1 Supplement
Bidi: ON (Other Neutrals)

Decomposition: <compat> 0020 0301

U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
UTF-8: c3 81 UTF-16BE: 00c1 Decimal: &#193; Octal: \0301
Á (á)
Lowercase: 00E1
Category: Lu (Letter, Uppercase); East Asian width: N (neutral)
Unicode block: 0080..00FF; Latin-1 Supplement
Bidi: L (Left-to-Right)

Decomposition: 0041 0301
…
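A rough equivalent of the name search can be written by scanning every code point and checking its name with unicodedata. This is a sketch under an assumption: it does a plain case-insensitive substring match, which may not be exactly how the unicode tool matches names. The function name search_by_name is hypothetical.

```python
import sys
import unicodedata


def search_by_name(word: str):
    """Yield (code point, name) for every character whose name contains `word`.

    Case-insensitive substring match over all code points; unnamed code points
    (controls, surrogates, unassigned) are skipped.
    """
    needle = word.upper()
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            yield cp, name


if __name__ == "__main__":
    for cp, name in search_by_name("acute"):
        print(f"U+{cp:04X} {name}")
```

Scanning all 0x110000 code points takes a moment but needs no extra packages. The first two matches are U+00B4 ACUTE ACCENT and U+00C1 LATIN CAPITAL LETTER A WITH ACUTE, in the same order as the output above.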
