Questions tagged [unicode]
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.
504 questions
0 votes, 1 answer, 121 views
Is there a list of every scancode that Linux uses?
I am making another remapper like xkb, sxhkd, xmodmap, etc., because I don't like the existing ones. In this one I want a simpler, terser syntax that I find nice to use, and to make an API that ...
0 votes, 1 answer, 85 views
Nonstandard subnational flag emoji: What part of the system is responsible?
So, I'm using Linux Mint 21.3 with MATE 1.26.0. I've noticed that my system supports a number of nonstandard flag emoji. I'm wondering what part of the system is responsible for this, if this is ...
1 vote, 1 answer, 174 views
Gibberish characters in EFI variables
Do gibberish characters found in EFI variables serve any purpose?
Out of curiosity, I am trying to read out EFI variables, specifically the ones related to the booting mechanism.
Under /sys/firmware/efi/...
0 votes, 2 answers, 262 views
To have or not to have a Byte Order Mark (BOM) in UTF-8 text files? (Linux)
Is it advisable to have a Byte Order Mark (BOM) in UTF-8 text files on Linux, or not?
Is it correct to say that byte order (even for multi-byte characters) is already strictly defined/fixed in the UTF-8 standard?
...
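For the BOM question, here is a minimal Python sketch. Python is only the illustration language. It shows that the UTF-8 "BOM" is just U+FEFF encoded as three fixed bytes. It also shows that UTF-8 assigns exactly one byte sequence to each code point, with no endianness variant for a BOM to signal:

```python
import codecs

# The UTF-8 "BOM" is simply U+FEFF ZERO WIDTH NO-BREAK SPACE encoded in UTF-8.
bom = "\ufeff".encode("utf-8")
print(bom.hex(" "))              # ef bb bf
print(bom == codecs.BOM_UTF8)    # True

# UTF-8 is byte-oriented: each code point has exactly one byte sequence,
# so there is no big-/little-endian variant to distinguish.
euro = "\u20ac".encode("utf-8")  # EURO SIGN
print(euro.hex(" "))             # e2 82 ac

# A plain UTF-8 decoder keeps a leading BOM as a character;
# Python's "utf-8-sig" codec strips it.
data = codecs.BOM_UTF8 + b"hello"
print(repr(data.decode("utf-8")))      # '\ufeffhello'
print(repr(data.decode("utf-8-sig")))  # 'hello'
```

The last two lines show the practical cost of a BOM. Any tool that decodes as plain UTF-8 sees an extra invisible character at the start of the text.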
6 votes, 3 answers, 573 views
How to make Perl half/full width-insensitive regular expressions?
In Perl, /a/i matches both A and a, so I don't have to write /A|a/.
What is the easy way to write /４|4/?
Yes, I'm talking about
$ unicode ４ 4|grep U+
U+FF14 FULLWIDTH DIGIT FOUR
U+0034 DIGIT FOUR
...
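The question is about Perl. As a language-neutral illustration written in Python (not Perl), one technique is to compare text under Unicode NFKC compatibility normalization. NFKC maps U+FF14 FULLWIDTH DIGIT FOUR to U+0034 DIGIT FOUR. The helper names are hypothetical. NFKC folds more than width (for example the ﬁ ligature becomes "fi"), so it is broader than a pure half/full-width match:

```python
import re
import unicodedata

def nfkc(s: str) -> str:
    """Compatibility-normalize `s` (folds fullwidth forms, ligatures, etc.)."""
    return unicodedata.normalize("NFKC", s)

def width_insensitive_search(pattern: str, text: str):
    """Hypothetical helper: regex search after NFKC-normalizing both sides.

    Match offsets refer to the normalized text, whose length can differ
    from the original string.
    """
    return re.search(nfkc(pattern), nfkc(text))

print(nfkc("\uff14"))                                      # 4
print(bool(width_insensitive_search("4", "room \uff14")))  # True
print(bool(width_insensitive_search("4", "room 4")))       # True
print(nfkc("\ufb01"))  # fi  (NFKC also expands the ligature)
```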
5 votes, 2 answers, 892 views
iconv fails to detect a valid UTF-8 character as UTF-8
My input data is as follows (as generated by hexdump):
000000f0 69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72 |ian...s eyes abr|
When I open this html () file in Firefox, it displays these ...
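Decoding the 16 bytes from that hexdump line in Python confirms that e2 80 99 is a well-formed three-byte UTF-8 sequence for U+2019 RIGHT SINGLE QUOTATION MARK, a typographic apostrophe. This sketch only checks the bytes. It does not explain why the asker's iconv check failed:

```python
import unicodedata

# The 16 bytes shown at offset 0x000000f0 in the hexdump above.
raw = bytes.fromhex("69 61 6e e2 80 99 73 20 65 79 65 73 20 61 62 72")

text = raw.decode("utf-8")   # raises UnicodeDecodeError if not valid UTF-8
print(text)                  # ian’s eyes abr

apostrophe = text[3]
print(f"U+{ord(apostrophe):04X}", unicodedata.name(apostrophe))
# U+2019 RIGHT SINGLE QUOTATION MARK
```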
0 votes, 2 answers, 216 views
How to insert text before the first line of a UTF-8 file with a BOM
This question is closely related to: How to insert text before the first line of a file?. I deliberately made the title similar to that question to highlight this.
Except the target file is UTF-8 with ...
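Here is a minimal Python sketch of a BOM-aware insert. It assumes the goal is to keep the BOM as the first three bytes of the file and to use LF line endings. `prepend_line` is a hypothetical helper, not something from the linked question. It reads the whole file into memory, which is fine for small files:

```python
import codecs

def prepend_line(path: str, line: str) -> None:
    """Insert `line` before the first line of a UTF-8 file.

    If the file starts with a UTF-8 BOM (EF BB BF), the BOM stays as the very
    first bytes and the new line is placed right after it.
    """
    with open(path, "rb") as f:
        data = f.read()
    bom = codecs.BOM_UTF8 if data.startswith(codecs.BOM_UTF8) else b""
    body = data[len(bom):]
    with open(path, "wb") as f:
        f.write(bom + line.encode("utf-8") + b"\n" + body)
```

Naive approaches (e.g. writing the new text and then the old file contents) push the BOM into the middle of the file, where it becomes a stray U+FEFF character.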
1 vote, 0 answers, 104 views
Inconsistent display of Unicode characters across software and Ubuntu versions
I have two computers installed slightly differently:
A: Kubuntu based on 22.04.3 LTS
B: Ubuntu 24.04.1 LTS + KDE somehow added after
I noticed that between the two, some (not all) Unicode characters were ...
0 votes, 0 answers, 76 views
Cross-platform method of checking if using terminal emulator or tty
I am looking for a cross-platform way to check whether I am using a terminal emulator (with support for Unicode characters) or a TTY session (with support only for ASCII characters). I initially tried to use if ...
2 votes, 1 answer, 88 views
Cannot insert the mapsto character ↦ in groff
I am trying to learn how to insert the mapsto (↦, U+21A6) character in groff.
I am trying to use this code to insert the character
\[u21A6]
But I get the following error message and nothing is ...
2 votes, 1 answer, 735 views
Which interpreter for "Unicode text, UTF-8 text executable"
I'm trying to set up a keybinding for an executable which is in my home directory. For this, I set the command:
sh -c '\"/path/to/the/executable\" --options'
But it does not work, and when I'm ...
11 votes, 3 answers, 2k views
UTF-8 characters in POSIX shell script *comments* - anything against it?
I would like to include a couple of non-ASCII characters in my POSIX shell script comments. Note this is in no way a duplicate of e.g. "Which character encodings are supported by posix?" as ...
1 vote, 0 answers, 60 views
Ignore Accent Differences in Zsh Autocomplete
Suppose I have a directory named cálculo in the current directory. How can I autocomplete its name after typing the starting characters without the accent?
$ cd calc<tab>
$ cd cálculo/
I failed ...
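This is not a zsh configuration. It is a Python sketch of the accent-insensitive comparison the question asks for. Each name is decomposed (NFD), so "á" becomes "a" plus U+0301 COMBINING ACUTE ACCENT. The nonspacing marks are then dropped and prefixes compared. The sample names and the `strip_accents`/`complete` helpers are illustrative assumptions:

```python
import unicodedata

def strip_accents(s: str) -> str:
    # NFD splits "á" into "a" + U+0301; nonspacing marks have category "Mn".
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.category(c) != "Mn")

def complete(prefix: str, names: list[str]) -> list[str]:
    """Return the names whose accent-stripped form starts with `prefix`."""
    key = strip_accents(prefix)
    return [n for n in names if strip_accents(n).startswith(key)]

print(complete("calc", ["cálculo", "calor", "Álgebra"]))  # ['cálculo']
```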
1 vote, 1 answer, 300 views
Fontawesome icons are not pasted correctly
I am using Fedora and installed fontawesome via sudo dnf install fontawesome fonts. Later, because it didn't work, I also installed the font manually by downloading the zip from the GitHub ...
2 votes, 0 answers, 176 views
How do I disable UTF-8 in an xterm (or X, really)?
I have a system running Debian unstable where I don't want to have UTF-8 in my xterms (or at all). But I recently discovered that somehow I now have UTF-8 in my xterms and other windows. It might have ...
0 votes, 0 answers, 103 views
ls: single-column vs. multi-column layout, non-Unicode characters in filenames
Create a directory ~/test with abcdefghijklmnopqrstuvwxyz and zyxwvutsrqponmlkjihgfedcba files in it.
ls ~/test will list them using multi-column layout:
abcdefghijklmnopqrstuvwxyz ...
2 votes, 2 answers, 365 views
Search and replace composed Unicode characters
I have a deep folder structure on a Debian machine where the directory names and the filenames contain some "special" characters (ä, ö, ü). However, these are not in "ISO-8859-1"
...
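"The same" ä can fail to match because one name is precomposed (NFC, U+00E4) and the other decomposed (NFD, "a" + U+0308 COMBINING DIAERESIS). The bytes differ, so byte-wise tools treat them as different strings. Below is a Python sketch that assumes NFC is the desired target form. `plan_renames` is a hypothetical helper that only lists renames. It walks bottom-up so that entries inside a directory come before the directory itself:

```python
import os
import unicodedata

nfc = "\u00e4"    # ä, precomposed
nfd = "a\u0308"   # a + COMBINING DIAERESIS
print(nfc == nfd)                                         # False
print(nfc.encode().hex(" "), "|", nfd.encode().hex(" "))  # c3 a4 | 61 cc 88
print(unicodedata.normalize("NFC", nfd) == nfc)           # True

def plan_renames(root: str, form: str = "NFC"):
    """Yield (old_path, new_path) for each entry whose name changes under `form`.

    os.walk(topdown=False) yields a directory's contents before the directory,
    so applying the renames in order never invalidates a later old_path.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = unicodedata.normalize(form, name)
            if new != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, new)
```

Printing the plan first (`for old, new in plan_renames(root): print(old, "->", new)`) works as a dry run before calling `os.rename`. Check that `new` does not already exist, since two different names can normalize to the same string.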
3 votes, 1 answer, 156 views
'ls name' and 'ls | grep name' behave differently with accented names
I am on XigmaNAS (a FreeBSD-based NAS). I'll explain the situation as simply as possible:
:; set | egrep 'LC_A|LANG'
GDM_LANG=fr_FR.UTF-8
LANG=fr_FR.UTF-8
LC_ALL=fr_FR.UTF-8
SLIM_LANG=fr_FR.UTF-8
:; ls -i ...
0 votes, 0 answers, 91 views
Terminal: Help understanding behavior with UTF-8 text
I am trying to understand the following behavior I am observing on my Ubuntu system. Consider the following two files:
$ hexdump -C 1.txt
00000000 d9 82 d8 a8 d8 a7 d9 86 d9 8a 5e d9 84 d9 86 d8 |.....
1 vote, 0 answers, 56 views
XQuartz xterm UTF-8 resource name
I was using UTF-8 resource names like these:
wengé*Background: #321
wengé*Foreground: #ffb
and this was working with XQuartz 2.8.1 through this direct call like from within the ...
1 vote, 1 answer, 83 views
Crossmark symbol (\u274c) doesn't work in Debian 12
I have moved from Ubuntu 22.04 to Debian 12. I have a bash function that outputs a cross mark if a command fails and a check mark if it succeeds. The check mark works, but the cross mark doesn't.
Here is ...
1 vote, 1 answer, 477 views
What puts the terminal in Unicode mode?
I have a Debian server which is not properly displaying Unicode characters when logged in locally, without starting X11. Unicode works after running unicode_start (until the terminal is closed). It ...
1 vote, 1 answer, 78 views
How to use Unix `mv` to rename files with Unicode spaces (not U+0020)?
$ ls cn*
cn blah blah.txt
$ ls cn\ *
ls: cannot access 'cn *': No such file or directory
$ ls cn*|hexdump -C
00000000 63 6e e2 80 85 62 6c 61 68 c2 a0 62 6c 61 68 2e |cn...blah..blah.|
00000010 74 ...
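The hexdump shows that the two "spaces" in the name are the byte sequences e2 80 85 and c2 a0. The Python sketch below decodes the first 16 bytes (the rest of the dump is truncated) and names those characters. It also builds a version of the name with ordinary spaces. `ascii_spaces` is a hypothetical helper:

```python
import unicodedata

# First 16 bytes from the hexdump above (its second line is truncated).
raw = bytes.fromhex("63 6e e2 80 85 62 6c 61 68 c2 a0 62 6c 61 68 2e")
name = raw.decode("utf-8")

for ch in name:
    if unicodedata.category(ch) == "Zs" and ch != " ":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2005 FOUR-PER-EM SPACE
# U+00A0 NO-BREAK SPACE

def ascii_spaces(s: str) -> str:
    """Replace every space separator (category Zs) with U+0020 SPACE."""
    return "".join(" " if unicodedata.category(c) == "Zs" else c for c in s)

print(ascii_spaces(name))   # cn blah blah.
```

Applied to the real, complete filename, `os.rename(old, ascii_spaces(old))` would perform the rename. This sidesteps typing the invisible characters in the shell.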
5 votes, 2 answers, 254 views
Why is ls sorting Chinese filenames by length?
I've run into a bit of a weird behaviour that I don't fully understand with ls and Chinese filenames. I'm running macOS 13.6.1 with SIP enabled (no core OS modifications), MacPorts installed, and US ...
2 votes, 1 answer, 724 views
Can awk be told to count the character string length rather than byte string length for '%10s' printf formats?
Try this for an output of |Ü| X|:
echo 'Ü X' | awk '{printf("|% 2s|% 2s|\n", $1, $2)}'
Obviously awk counts the byte length, not the character length of the Ü, so the count is 2 and no left ...
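The byte-versus-character distinction can be shown in Python, where `%`-formatting pads by code points rather than bytes. This sketch says nothing about which awk implementation the asker has. Code points are still not terminal columns: a decomposed Ü (U + U+0308) counts as two, and East Asian wide characters occupy two columns each.

```python
s = "\u00dc"                      # Ü, precomposed
print(len(s.encode("utf-8")))     # 2  (bytes)
print(len(s))                     # 1  (code points)

# Width 2 counted in code points: one space of padding before Ü.
print("|%2s|%2s|" % (s, "X"))     # | Ü| X|

# The decomposed form is already two code points, so it gets no padding.
d = "U\u0308"
print(len(d), repr("%2s" % d))
```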
0 votes, 2 answers, 157 views
groff -mandoc creating "ESC[1m" versus overstriking with backspace for bold text
I found that groff uses different ways to indicate bold text for the utf8 output format.
On FreeBSD 14, groff emits escape codes for a terminal (ESC, [1m):
$ printf ".Dd today\n.Sh NAME\n" | ...
0 votes, 1 answer, 175 views
Why is MB_CUR_MAX 6 instead of 4 for UTF-8? (Linux, glibc)
MB_CUR_MAX is defined by glibc as 'a positive integer expression that is the maximum number of bytes in a multibyte character in the current locale.'
If I print the value I get 1. I assume that this ...
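The Python sketch below encodes the current UTF-8 length rules from RFC 3629, which limits UTF-8 to U+10FFFF and therefore to at most four bytes per character. It cross-checks them against Python's encoder. The sketch does not explain glibc's value of 6. The original UTF-8 definition allowed sequences of up to six bytes, which may be the relevant history. Since MB_CUR_MAX depends on the current locale, a value of 1 is consistent with running in a single-byte locale such as "C".

```python
def utf8_len(cp: int) -> int:
    """Bytes needed to encode code point `cp` in UTF-8 (RFC 3629 ranges)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4

# Cross-check a few sample code points against Python's own UTF-8 encoder.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
    assert utf8_len(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```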
0 votes, 2 answers, 549 views
I need to create a pipe to convert strings from UTF-8 to UTF-7-IMAP
To automate the command line creation of hundreds of directories in IMAP maildirs, I would need to be able to convert UTF-8 strings to UTF-7-IMAP on the fly.
In PHP, I found a way to do it with a ...
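The PHP method is cut off in the excerpt, so here is a sketch of the encoding itself in Python. It implements the modified UTF-7 that RFC 3501, section 5.1.3, defines for IMAP mailbox names. Printable ASCII stays as-is and "&" becomes "&-". Each run of other characters becomes "&", then the base64 of its UTF-16BE bytes (with "," instead of "/" and no "=" padding), then "-". `imap_utf7_encode` is a hypothetical helper. The sample mailbox name is the example given in the RFC:

```python
import base64

def imap_utf7_encode(s: str) -> str:
    """Encode `s` as IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out: list[str] = []
    pending: list[str] = []   # current run of characters needing base64

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in s:
        if "\x20" <= ch <= "\x7e":      # printable US-ASCII
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)

print(imap_utf7_encode("~peter/mail/\u53f0\u5317/\u65e5\u672c\u8a9e"))
# ~peter/mail/&U,BTFw-/&ZeVnLIqe-
```

To use it as a pipe, apply it to each line read from standard input after stripping the trailing newline.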
0 votes, 2 answers, 371 views
Listing filenames with special characters
I have a zsh shell (with the oh-my-zsh default config). When I ls filenames with special characters, they are printed as:
''$'\316\262''=0.35-L=32-m=10.jld2'
This should be:
β=0.35-L=32-m=10.jld2
but the ...
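In that output, `$'\316\262'` is two bytes written as octal escapes. Decoding them in Python (a sketch) shows they are the valid UTF-8 encoding of β, U+03B2. The filename bytes are therefore intact, and the open question is how ls decides to display them:

```python
import unicodedata

# \316 and \262 from the ls output, read as octal byte values.
raw = bytes([0o316, 0o262])
print(raw.hex(" "))                  # ce b2
ch = raw.decode("utf-8")
print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
# β U+03B2 GREEK SMALL LETTER BETA
```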
1 vote, 0 answers, 50 views
Debian terminal not displaying correct Unicode half-block characters [duplicate]
I have a program that prints Unicode half-block characters (U+2580, U+2584), but on Debian 10 terminals (just the fullscreen terminal, no X), it's printing diamonds instead of half-blocks.
The two ...
1 vote, 0 answers, 119 views
Ctrl-Shift-U requires *extra* U in Ubuntu 23.04 Cinnamon?
I'm running a new install of Ubuntu 23.04 with Cinnamon desktop 5.6.7
Typing Ctrl-Shift-u in a terminal does nothing unless the next character is another u; then the underlined u appears and I can ...
1 vote, 2 answers, 1k views
Entering special characters the same way on Windows and Linux
Ctrl+Shift+U followed by the hex value of a Unicode character enters that character. For example, Ctrl+Shift+U 41 enters 'A', whose value is 0x41 in hex and 65 in decimal.
There's also the compose key, ...
1 vote, 1 answer, 107 views
Expand tabs in file with utf8 characters
I use expand to expand tabs to spaces. For UTF-8 files, expand doesn't work correctly: e.g. in ć\ta the tab is expanded to 6 spaces, while in a\ta it is expanded to 7.
How do I make it work for UTF-8 files?
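The 6-versus-7 difference is what byte counting produces. "ć" is two bytes in UTF-8, so a byte-counting tool thinks the tab starts in column 3 rather than column 2. Python's `str.expandtabs` counts code points instead. That is right for precomposed, single-width text, but not for combining marks or double-width characters. The `expand_stream` helper below is a hypothetical sketch:

```python
def expand_stream(src, dst, tabsize: int = 8) -> None:
    """Copy text lines from src to dst, expanding tabs by character position."""
    for line in src:
        dst.write(line.expandtabs(tabsize))

print(repr("\u0107\ta".expandtabs(8)))   # 'ć       a'  (7 spaces)
print(repr("a\ta".expandtabs(8)))        # 'a       a'  (7 spaces)
```

As a filter, `expand_stream(sys.stdin, sys.stdout)` reads text in the locale's encoding (UTF-8 on most current Linux systems).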
-1 votes, 2 answers, 127 views
Is ∞ allowed in UTF-8 Encoded files?
Are lemniscates, ∞, allowed in UTF-8 Encoded files?
I am hoping that students with less than six months of computer programming experience can use a search engine to type something like "is ...
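UTF-8 can encode every Unicode scalar value, including U+221E INFINITY, which takes three bytes. Here is a short Python sketch that shows the bytes and round-trips them:

```python
import unicodedata

ch = "\u221e"
data = ch.encode("utf-8")
print(unicodedata.name(ch))         # INFINITY
print(data.hex(" "))                # e2 88 9e
print(data.decode("utf-8") == ch)   # True
```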
2 votes, 2 answers, 234 views
Unicode Supplementary Multilingual Plane (Plane 1) glyphs in xterm
I'm trying to display Unicode Supplementary Multilingual Plane (Plane 1) glyphs in xterm. Those glyphs are in the U+010000..U+01FFFF range (https://unifoundry.com/pub/unifont/unifont-15.0.01/...
7 votes, 1 answer, 2k views
How should I interpret the fact that a Unicode code point is shown in two completely different ways in two different terminal emulators?
This is kind of a spin off from an older question I asked.
Here's the screenshot from that question:
In the bottom left is URxvt, and you can see a lightning bolt-like icon at the beginning of the ...
4 votes, 4 answers, 556 views
Collect chars from strings and print their unicode
Context (skip, if you don't care; read, if you suspect I'm totally on the wrong track)
For an embedded system with small memory, I want to generate fonts which contain only those glyphs actually ...
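Here is a Python sketch of the collection step. It gathers the distinct code points used across a list of strings and prints them sorted in U+XXXX form. The message list is a hypothetical stand-in for the real strings, and how the result feeds a font subsetting tool is outside the sketch:

```python
import unicodedata

def used_code_points(strings) -> list[int]:
    """Sorted, de-duplicated code points appearing in `strings`."""
    return sorted({ord(c) for s in strings for c in s})

messages = ["Hello", "Grüße", "°C"]   # hypothetical UI strings
for cp in used_code_points(messages):
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp), '?')}")
```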
1 vote, 0 answers, 164 views
Pasting non-ascii (utf8) into remote urxvt terminal
For pasting text in urxvt/rxvt-unicode, one can use the middle mouse button to paste the PRIMARY selection.
I can do such Mouse-Middle-Click paste in my local urxvt terminal and even a remote server, in Chinese/...
0 votes, 0 answers, 278 views
Script for awscli check not working with crontab schedule
I have written a small code snippet to check the AWS CLI version:
#!/usr/bin/env bash
if [ -e "/usr/local/bin/aws" ];
then
myAWS="/usr/local/bin/aws"
else
...
9 votes, 3 answers, 4k views
Box character doesn't display properly in Linux terminal
I was just writing a C++ program that uses box-drawing characters to display information.
I ran the program on macOS in the Terminal app, and it worked fine.
When I switched to Debian Linux using ...
2 votes, 2 answers, 1k views
How to combine settings from multiple locales in Linux?
When I installed Linux I set my locale to en_US.UTF-8. However I want to override some but not all of the settings in that locale. Specifically, I would like the Measurement to be Metric instead of ...
1 vote, 1 answer, 112 views
Command similar to ascii for ascii extended and/or for unicode?
ascii command in Linux is fast and great. It allows us to search for a character or for a code point and returns all relevant results for a given search. Is there something similar for ASCII extended (...
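Python's standard `unicodedata` module carries the Unicode character names, so a rough name search in the spirit of `ascii` fits in a few lines. In this sketch the `search` helper is hypothetical, and the results reflect whatever Unicode version the Python build ships. The `unicode` command used in the Perl question earlier in this list is another existing option.

```python
import unicodedata

def search(term: str):
    """Yield (code point, name) for every character whose name contains `term`."""
    term = term.upper()
    for cp in range(0x110000):
        name = unicodedata.name(chr(cp), "")   # "" for unnamed code points
        if term in name:
            yield cp, name

for cp, name in search("infinity"):
    print(f"U+{cp:04X} {chr(cp)} {name}")
```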
4 votes, 1 answer, 319 views
How do I create a zip that preserves unicode character composition on linux?
I'm on Debian. I have a file called Sóanr.jpg. According to https://emojidissector.com/, this is made of the following code points:
S 0053 LATIN CAPITAL LETTER S
o 006F LATIN SMALL LETTER O
...
1 vote, 3 answers, 114 views
Writing bash arguments with truncation
I want to print the first two arguments of a bash function, with the Unicode character \u2263 on each side, using a two-space separation. The catch is that the final Unicode character must display at column 70. ...
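The exact layout in the question is cut off, so this Python sketch assumes one. It prints ≣, two spaces, the two arguments separated by one space, padding, two spaces, and a closing ≣ in column 70. Text that does not fit is truncated. The sketch counts code points, so it assumes every character occupies one terminal column. `framed` is a hypothetical name:

```python
MARK = "\u2263"  # ≣ STRICTLY EQUIVALENT TO

def framed(a: str, b: str, col: int = 70) -> str:
    """Assumed layout: MARK, 2 spaces, 'a b' padded/truncated, 2 spaces, MARK.

    The closing MARK lands in 1-based column `col`, assuming one column per
    code point.
    """
    prefix = MARK + "  "
    suffix = "  " + MARK
    width = col - len(prefix) - len(suffix)   # columns left for the text
    text = f"{a} {b}"[:width]
    return prefix + text.ljust(width) + suffix

print(framed("first", "second"))
```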
3 votes, 1 answer, 320 views
Different encoding/Unicode interpretation using terminal vs using shell script
I was working on a keymap script (mapping keys from one language's keyboard layout to another). After a long struggle to get everything working, I found out that different characters are ...
3 votes, 0 answers, 253 views
Unnormalized UTF-8 directory names
I noticed something interesting in one of my directories:
$ ls -li
total 36
2625309 drwxrwxr-x 2 dotancohen dotancohen 4096 Jul 4 2022 Español
2625385 drwxrwxr-x 2 dotancohen dotancohen 4096 Jul ...
0 votes, 0 answers, 167 views
Is there a way to remove specific emoji from being rendered in any application while using Cinnamon desktop?
I am slightly annoyed with some emojis.
So I was wondering, how could I remove/prevent some emojis from being rendered at all?
Replacing them with some other emoji like cute cat face could work too.
...
0 votes, 1 answer, 120 views
testdisk utility reports nonexistent files from an exFAT drive used with Windows. Why?
I tried to recover lost files from an exFAT thumb drive with the testdisk package on Linux. It was very good at finding deleted files. However, as I went through the entries, I saw weird entries. The ...
1 vote, 0 answers, 46 views
Cannot use unicode shortcut on non-english layouts
I’m using US and RU layouts. Ctrl+Shift+u works when the US layout is selected, but when I try to use it with the RU layout selected, it just doesn’t work. I didn’t find anything related to it in ...
1 vote, 3 answers, 915 views
Looking up and inputting arbitrary Unicode characters in the console/terminal
I'm looking for a simple, generic way to input arbitrary Unicode characters in a text document on the terminal (e.g. in a terminal editor).
A basic method I can imagine is having a simple text (UTF-8) ...