
Questions tagged [unicode]

Unicode is intended to be a universal character set for describing all the characters required for written text incorporating all writing systems, technical symbols and punctuation.

9 votes
7 answers
3k views

I frequently encounter recommendations to specifically keep to ASCII characters in field and function names in documentation, even though non-ASCII (modern Unicode) generally works perfectly. An ...
Michael Macha's user avatar
6 votes
0 answers
790 views

I am wondering how to take these Hieroglyphs and make them into Unicode. I read through the Tesseract docs on how to create training data, but it seems largely tailored toward "traditional" ...
Lance Pollard's user avatar
2 votes
3 answers
154 views

For example, in Vietnamese, there are Unicode characters like "â", "ê", "ô", "ư", etc. To type them from the keyboard, I need to type aa, ee, oo, w, then a program ...
Ooker's user avatar
  • 335
10 votes
0 answers
268 views

A valid sequence of code points can begin with one or more combining marks, which form a grapheme cluster that has no base glyph. I'm unsure how that should be handled, if at all. For example, consider ...
Wes's user avatar
  • 872
1 vote
1 answer
87 views

I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...
lonious's user avatar
  • 121
9 votes
3 answers
737 views

People often get excited about JuliaLang supporting Unicode function names. But it's not new at all; it's just that the Julia community decided that it was sometimes appropriate, and built tooling to ...
Frames Catherine White's user avatar
5 votes
1 answer
428 views

When you encode a code point to code units in UTF-8, if the code point fits in 7 bits, the most significant bit is set to zero, which tells you it is a character stored on 1 ...
codepersonnel49's user avatar
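
As a rough illustration of the lead-byte rule this question describes, the sketch below prints the bit pattern of each UTF-8 code unit for a few characters. The helper name `utf8_bit_patterns` is made up for this example; only Python's built-in `str.encode` is used.

```python
def utf8_bit_patterns(ch):
    """Return each UTF-8 code unit of a string as an 8-bit binary string."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]


# A 7-bit code point is stored in one byte whose top bit is 0.
print(utf8_bit_patterns("A"))        # ['01000001']
# U+00E9 needs two bytes: a 110xxxxx lead byte and a 10xxxxxx continuation byte.
print(utf8_bit_patterns("\u00e9"))   # ['11000011', '10101001']
# U+20AC needs three bytes: a 1110xxxx lead byte and two continuation bytes.
print(utf8_bit_patterns("\u20ac"))   # ['11100010', '10000010', '10101100']
```

The number of leading 1 bits in the first byte gives the length of the sequence, and every continuation byte starts with `10`, so a decoder can tell where a character begins from any position in the stream.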
2 votes
1 answer
443 views

I am developing against a file spec that lists the data type for certain fields as CHAR(<length>). The spec is for a fixed width flat file. In most cases, possible values to populate the fields ...
mathewb's user avatar
  • 137
3 votes
4 answers
5k views

From what it sounds like, a 64 bit processor means aligning to 64 bits, which means if you have unicode utf-8 stored in there, each 8-bit chunk would take up 64 bits of space. That doesn't really make ...
Lance Pollard's user avatar
0 votes
2 answers
555 views

My main goal is described here. How can Microsoft Word or Wordpad or other word editing software render fonts when these fonts seem not to follow the same rules? How do they render characters ...
HKhoshdel's user avatar
0 votes
2 answers
2k views

I am wondering why unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the utf8 spec, but am not really following the different pieces of data. ...
Lance Pollard's user avatar
0 votes
1 answer
1k views

In general a character is represented in 1 byte, i.e. 8 bits. This is, I believe, true for all text editors and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters. My question is when ...
user3198603's user avatar
  • 1,896
50 votes
4 answers
46k views

Our line-of-business software allows the user to save certain data as CSV. Since there are a lot of different formats (all called "CSV") in use in the wild, we are trying to decide what the ...
Heinzi's user avatar
  • 9,868
8 votes
1 answer
4k views

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I have read the following (in this article): Let's look just at the ones that Notepad supports. ...
user9002947's user avatar
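
For readers unsure what a BOM looks like on disk, here is a small sketch using Python's standard codecs. It only shows which of Python's encoders write a BOM; it says nothing about what Notepad or any other editor does. The helper name `leading_bom` is an assumption of this example.

```python
import codecs

# Known BOM byte sequences from the standard library, longest first so that
# the UTF-32 LE mark is not mistaken for the UTF-16 LE mark it begins with.
BOMS = [
    ("UTF-32 LE", codecs.BOM_UTF32_LE),
    ("UTF-32 BE", codecs.BOM_UTF32_BE),
    ("UTF-8", codecs.BOM_UTF8),
    ("UTF-16 LE", codecs.BOM_UTF16_LE),
    ("UTF-16 BE", codecs.BOM_UTF16_BE),
]


def leading_bom(data):
    """Return the name of the BOM that starts `data`, or None if there is none."""
    for name, bom in BOMS:
        if data.startswith(bom):
            return name
    return None


print(leading_bom("hi".encode("utf-8")))      # None: plain UTF-8 has no BOM
print(leading_bom("hi".encode("utf-8-sig")))  # UTF-8: EF BB BF is prepended
print(leading_bom("hi".encode("utf-16-le")))  # None: byte order is stated, no BOM
print(leading_bom("hi".encode("utf-16")))     # UTF-16 LE or BE, depending on the machine
```

The explicit `-le`/`-be` codecs omit the mark because the byte order is already stated by the label, while the generic `utf-16` codec writes one so that a reader can detect the order.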
6 votes
3 answers
3k views

(Not entirely sure whether this should go in the information-security StackExchange instead; feel free to move it there if that's where it belongs.) Unicode has many, many instances of pairs or ...
Vikki's user avatar
  • 179
1 vote
1 answer
390 views

I am developing a mobile app in android in which I use Telugu (Indian language) texts. On my mobile Telugu language alphabets are available. Therefore, I am not facing any problem for testing my app. ...
Vempati Satya Suryanarayana's user avatar
8 votes
1 answer
615 views

I've been working on a UTF-8 iterator adapter. By which, I mean an adapter that turns an iterator to a char or unsigned char sequence into an iterator to a char32_t sequence. My work here was inspired ...
Nicol Bolas's user avatar
  • 12.1k
9 votes
2 answers
73k views

I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using unicode? I know that from a mathematical point of view the fractions 2/2 = 3/3 = 1 but I am typing a list where I want to indicate that you have ...
d-b's user avatar
  • 215
2 votes
1 answer
222 views

Many programs will supply one or more of the following as file encoding formats: UTF-8, UTF-16, UTF-32 and simply Unicode. How do I know what Unicode Transformation Format Unicode is referring to? I'm ...
Govind Rai's user avatar
9 votes
3 answers
5k views

I'm creating a library. I want to use it in multiple projects which may use multi-byte or unicode (std::string or std::wstring). I've adopted the old MS method of conditional compiling: namespace ...
001's user avatar
  • 283
1 vote
1 answer
604 views

I am working on a large command line tool, written for Python 2.6+ and supported for Windows, OS X and Linux. The target users are developers but it is also being auto-invoked by CI-systems etc. In ...
Betamos's user avatar
  • 111
10 votes
1 answer
2k views

Say your native language is Hebrew, and you're working in a programming language like Python 3, which lets you put Hebrew in source code. Good for you! You've got a dict: d = {'a': 1} and you want to ...
user2357112's user avatar
89 votes
5 answers
10k views

In the event an alien invasion occurred and we were forced to support their languages in all of our existing computer systems, is UTF-8 designed in a way to allow for their possibly vast amount of ...
Qix - MONICA WAS MISTREATED's user avatar
4 votes
2 answers
417 views

ISO 8859-1 contains a few letter-free diacritics: The diaeresis (¨), the acute accent (´), the cedilla (¸) and the macron (¯).¹ Why were they included? As far as I know (please correct me if I am ...
Heinzi's user avatar
  • 9,868
4 votes
1 answer
8k views

Today I was reading the section on variables and character sets in my favourite C++ programming book (C++ Primer Plus); however, I got really confused about Unicode and Wide ...
0 votes
0 answers
160 views

We have an important legal document that our app generates in WordML, with foreign characters represented via Unicode. These foreign characters vary widely, and include languages with special ...
Zibbobz's user avatar
  • 1,602
1 vote
2 answers
360 views

When writing for technical audiences, there are various ways to type Unicode representations, but they all seem to be Hexadecimal: \uFFFF - From C# / Java Strings \U0000FFFF - From C# / Java Strings ...
Ehryk's user avatar
  • 127
8 votes
4 answers
5k views

So I'm a terrible person and I want to name a variable in my mathy-python3 code s′ (that's U+2032 PRIME). I was under the impression Unicode literals work as identifiers in Python 3, which is why my ɣ,...
Alex Lenail's user avatar
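
One way to check which characters Python 3 accepts in identifiers is to ask the interpreter directly. The sketch below uses `str.isidentifier` and `unicodedata.category`; the helper name `describe` is invented for this example. Letters such as ɣ (U+0263) are allowed, while U+2032 PRIME is classified as punctuation and is rejected.

```python
import unicodedata


def describe(name):
    """Return (is a valid identifier?, general category of the last character)."""
    return name.isidentifier(), unicodedata.category(name[-1])


print(describe("s"))        # (True, 'Ll')   lowercase letter
print(describe("\u0263"))   # (True, 'Ll')   LATIN SMALL LETTER GAMMA
print(describe("s\u2032"))  # (False, 'Po')  PRIME is "punctuation, other"
```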
12 votes
5 answers
6k views

Something that has long confused me is that so much software uses the terms "charset" and "encoding" as synonyms. When people refer to a unicode "encoding", they always mean a ruleset for ...
Mark Amery's user avatar
  • 1,273
0 votes
1 answer
345 views

I want to create simple language learning applications to help friends in learning languages. A simple Java console application would do the trick, but the Windows console does not seem to handle ...
zxz's user avatar
  • 277
4 votes
4 answers
271 views

I have a legacy application in which the UI and business logic are already reasonably well-separated. There is a proposal to separate them even further, turning the core application into a "service" (...
omatai's user avatar
  • 195
9 votes
1 answer
4k views

I'm working on a cross-platform C++ project that doesn't consider Unicode, and it needs to change to support Unicode. There are the following two choices, and I need to decide which one to choose. Using UTF-8 ...
ZijingWu's user avatar
  • 1,077
5 votes
3 answers
3k views

Unicode seems to be becoming more and more ubiquitous these days, if it isn't already, but I have to wonder if there are any domains where Unicode isn't the best implementation choice. Are there any ...
Daniel Wolfe's user avatar
8 votes
5 answers
5k views

This is a function in the d3.v3.js file (the graph library D3.js): function d3_geo_areaRingStart() { var λ00, φ00, λ0, cosφ0, sinφ0; d3_geo_area.point = function(λ, φ) { d3_geo_area....
Nav's user avatar
  • 1,191
1 vote
1 answer
197 views

I know so little about this that I'm having trouble formulating the question. Apparently due to technical limitations, nastaleeq style of writing Urdu is very difficult, perhaps impossible, given ...
Shahbaz's user avatar
  • 181
5 votes
3 answers
603 views

While using IE, autocorrect transformed "naive" into "naïve"! My regional settings are AU English. From a Unicode search point of view, the two are nothing alike. I am not even sure whether there are ...
jimjim's user avatar
  • 863
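
A minimal sketch of one possible way to make "naive" and "naïve" compare equal: decompose with NFD and drop the combining marks. This is only an illustration, not what IE or any particular search engine does, and the function names are made up for the example. Stripping marks is lossy and is not appropriate for every language.

```python
import unicodedata


def strip_marks(text):
    """Decompose to NFD, then drop combining marks such as U+0308 DIAERESIS."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_ignoring_marks(a, b):
    """Compare two strings after removing combining marks from both."""
    return strip_marks(a) == strip_marks(b)


precomposed = "na\u00efve"    # ï as one code point, U+00EF
decomposed = "nai\u0308ve"    # i followed by COMBINING DIAERESIS

print(precomposed == decomposed)                  # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(same_ignoring_marks("naive", precomposed))  # True
```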
9 votes
2 answers
1k views

Are there any programming languages that support the use of unicode logic operators? For example, many programming languages use "!=" as the "does not equal" operator, but in mathematics the symbol ...
kyle k's user avatar
  • 225
14 votes
3 answers
2k views

I am designing a file format and I want to do it right. Since it is a binary format, the very first byte (or bytes) of the file should not form valid textual characters (just like in the PNG file ...
Daniel A.A. Pelsmaeker's user avatar
4 votes
2 answers
212 views

What things need to be considered for a website that contains international strings, for instance Simplified Chinese and English mixed? UTF-8 seems to me a natural choice, including a meta tag. Still,...
Philip's user avatar
  • 1,709
6 votes
3 answers
1k views

I remember reading that there are no existing data structures which allow for random-access into a variable length encoding, like UTF-8, without requiring additional lookup tables. The main question ...
DeadMG's user avatar
  • 36.9k
32 votes
2 answers
16k views

I would imagine the reason was fast, array like access to the character at index, but some characters won't fit into 16 bits, so it wouldn't work... So if you have to handle special cases anyways, ...
zduny's user avatar
  • 2,633
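
To make the "some characters won't fit into 16 bits" point concrete, the sketch below lists the UTF-16 code units of a character and computes a surrogate pair by hand. Both helper names are assumptions of this example; the arithmetic is the standard UTF-16 encoding rule.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its high and low surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print([hex(u) for u in utf16_code_units("A")])           # ['0x41']
print([hex(u) for u in utf16_code_units("\U0001F600")])  # ['0xd83d', '0xde00']
print([hex(u) for u in surrogate_pair(0x1F600)])         # ['0xd83d', '0xde00']
```

Because one character may occupy one or two code units, indexing a UTF-16 string by code unit is constant time but does not always land on a character boundary.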
36 votes
2 answers
2k views

The Unicode Terms of Use state that any software that uses their data files (or a modification of them) should carry the Unicode license references. It seems to me that most Unicode libraries have ...
Eric Grange's user avatar
3 votes
1 answer
4k views

I got this error in my program, which grabs data from different websites and writes it to a file: 'charmap' codec can't encode characters in position 151618-151624: character maps to <undefined> ...
lamwaiman1988's user avatar
3 votes
3 answers
22k views

Well, I am reading Programming Windows with MFC, and I came across Unicode and ASCII code characters. I understood the point of using Unicode over ASCII, but what I do not get is how and why is it ...
vin's user avatar
  • 177
41 votes
3 answers
172k views

I'm learning T-SQL. From the examples I've seen, to insert text in a varchar() cell, I can write just the string to insert, but for nvarchar() cells, every example prefixes the strings with the letter N....
qinking126's user avatar
15 votes
2 answers
6k views

I have been looking for an efficient String trie implementation. Mostly I have found code like this: Referential implementation in Java (per wikipedia) I dislike these implementations for mostly two ...
RokL's user avatar
  • 2,451
7 votes
1 answer
2k views

http://babelstone.blogspot.com/2005/11/how-many-unicode-characters-are-there.html says there are 1 million Unicode characters, around 240k of which are already assigned. 1 million > 240k > 65k ...
user4951's user avatar
  • 739
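
The "1 million" figure in this question can be reproduced with a few lines of arithmetic. The sketch below counts the size of the Unicode code space only; it does not count assigned characters, which changes with every Unicode version. The constant names are chosen for this example.

```python
import sys

PLANES = 17                      # plane 0 (the BMP) plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane

total_code_points = PLANES * CODE_POINTS_PER_PLANE
surrogates = 0xDFFF - 0xD800 + 1                  # reserved for UTF-16, never characters
scalar_values = total_code_points - surrogates

print(total_code_points)  # 1114112
print(surrogates)         # 2048
print(scalar_values)      # 1112064
print(sys.maxunicode == total_code_points - 1)    # True: highest code point is U+10FFFF
```

So the code space is far larger than the 65,536 values a single 16-bit unit can hold, which is why UTF-16 needs surrogate pairs.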
4 votes
2 answers
217 views

What could be the necessary prerequisites to be taken when developing an application with Unicode support in the context of: Web applications; Desktop applications; Embedded applications. Prerequisites to ...
Ubermensch's user avatar
  • 1,349
5 votes
3 answers
731 views

What limitations would we have if the Unicode standard had decided to assign one and only one code point to every user-perceived character? Currently, Unicode has code points that correspond to combining ...
Pacerier's user avatar
  • 5,063
16 votes
8 answers
3k views

I personally find reading code full of Unicode identifiers confusing. In my opinion, it also prevents the code from being easily maintained. Not to mention all the effort required for authors of ...
Egor Tensin's user avatar