Newest 'unicode' Questions

9 votes

7 answers

3k views

Are there historical problems with non-ASCII identifier characters in code?

I frequently encounter recommendations to specifically keep to ASCII characters in field and function names in documentation, even though non-ASCII (modern Unicode) generally works perfectly. An ...

Michael Macha

396

asked Jan 29, 2022 at 16:29

6 votes

0 answers

790 views

How to OCR and/or recreate lines of Egyptian Hieroglyphs in Unicode/HTML?

I am wondering how to take these Hieroglyphs and make them into Unicode. I read through the Tesseract docs on how to create training data, but it seems largely tailored toward "traditional" ...

Lance Pollard

2,787

asked Jul 22, 2020 at 16:01

2 votes

3 answers

154 views

What is the name of the type of program to produce Unicode characters from ASCII combinations?

For example, in Vietnamese, there are Unicode characters like "â", "ê", "ô", "ư", etc. To type them from the keyboard, I need to type aa, ee, oo, w, then a program ...

Ooker

335

asked Jul 18, 2020 at 12:39

10 votes

0 answers

268 views

Is there any guideline from Unicode on how to deal with graphemes that have no base character?

A valid sequence of code points can begin with one or more combining marks, which form a grapheme cluster that has no base glyph. I'm unsure how that should be handled, if at all. For example, consider ...

Wes

872

asked Jun 17, 2020 at 19:08

1 vote

1 answer

87 views

Layout Behavior of Characters (question about unicode standard)

I've been reading Unicode's core specification (see https://www.unicode.org/versions/latest/). I mostly understood what the text was explaining in section 2.1 Architectural Context until it started ...

lonious

121

asked Feb 15, 2020 at 23:45

9 votes

3 answers

737 views

What was the first language to allow Unicode in function names?

People often get excited about JuliaLang supporting Unicode function names. But it's not new at all; it's just that the Julia community decided that it was sometimes appropriate, and built tooling to ...

Frames Catherine White

942

asked Dec 25, 2019 at 0:00

5 votes

1 answer

428 views

UTF-8 questions

When you encode a code point to code units based on UTF-8, then if the code point fits on 7 bits, the most significant bit is set to zero so that it tells you it is a character which is stored on 1 ...

codepersonnel49

69

asked Nov 15, 2019 at 22:03
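
The leading-bit rule described in this excerpt can be checked directly. The following is a minimal sketch, not part of the original question; the helper name `utf8_bits` is made up for illustration, and it relies only on Python's built-in `str.encode`.

```python
def utf8_bits(ch):
    """Return the UTF-8 code units of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# A code point that fits in 7 bits is stored in one byte whose top bit is 0.
print(utf8_bits("A"))       # ['01000001']

# Larger code points use a lead byte (110..., 1110..., 11110...)
# followed by continuation bytes that all start with 10.
print(utf8_bits("\u00e9"))  # ['11000011', '10101001']
print(utf8_bits("\u20ac"))  # ['11100010', '10000010', '10101100']
```

The number of leading 1 bits in the first byte gives the total length of the sequence, which is how a decoder knows how many continuation bytes to read.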

2 votes

1 answer

443 views

Differentiating Between ASCII and Unicode in File Spec

I am developing against a file spec that lists the data type for certain fields as CHAR(<length>) The spec is for a fixed width flat file. In most cases, possible values to populate the fields ...

mathewb

137

asked Aug 22, 2018 at 17:11

3 votes

4 answers

5k views

How to align on both word size and cache lines in x86

From what it sounds like, a 64 bit processor means aligning to 64 bits, which means if you have unicode utf-8 stored in there, each 8-bit chunk would take up 64 bits of space. That doesn't really make ...

Lance Pollard

2,787

asked Aug 22, 2018 at 16:35

0 votes

2 answers

555 views

How does MS Word render different fonts?

My main goal is described here. How can Microsoft Word or WordPad or other word editing software render fonts when these fonts seem to not follow the same rules? How do they render characters ...

HKhoshdel

11

asked Aug 8, 2018 at 6:26

0 votes

2 answers

2k views

Why Unicode Encoding/Decoding is Necessary in JavaScript

I am wondering why unicode encoding is necessary in JavaScript. I am looking at utf8.js as an example. I am also looking at the utf8 spec, but am not really following the different pieces of data. ...

Lance Pollard

2,787

asked Jul 23, 2018 at 21:45

0 votes

1 answer

1k views

Java takes 2 bytes to represent character?

In general, a character is represented in 1 byte, i.e. 8 bits. This is, I believe, true for all text editors and even for databases like Oracle. 1 byte can represent 2^8 = 256 characters. My question is when ...

user3198603

1,896

asked Jul 6, 2018 at 14:31

50 votes

4 answers

46k views

Should UTF-8 CSV files contain a BOM (byte order mark)?

Our line-of-business software allows the user to save certain data as CSV. Since there are a lot of different formats (all called "CSV") in use in the wild, we are trying to decide what the ...

Heinzi

9,868

asked Jun 18, 2018 at 7:36
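
For readers unsure what a UTF-8 BOM looks like on disk, here is a small sketch using Python's standard `utf-8` and `utf-8-sig` codecs. The sample CSV text is invented for illustration and is not taken from the question.

```python
import codecs

text = "name,city\nZo\u00eb,K\u00f6ln\n"   # hypothetical CSV content

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # same bytes, prefixed with EF BB BF

print(with_bom.startswith(codecs.BOM_UTF8))  # True
print(len(with_bom) - len(plain))            # 3

# Decoding with "utf-8-sig" accepts input with or without the BOM.
print(with_bom.decode("utf-8-sig") == plain.decode("utf-8-sig"))  # True
```

Whether a consumer needs the BOM depends on the consuming program, so this only shows the byte-level difference, not a recommendation.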

8 votes

1 answer

4k views

Is the BOM optional for UTF-16 and UTF-32?

I used to think that the BOM is optional for UTF-8, but mandatory for UTF-16 and UTF-32. But then I have read the following (in this article): Let's look just at the ones that Notepad supports. ...

user9002947

249

asked Apr 28, 2018 at 5:11

6 votes

3 answers

3k views

Why does Unicode have separate codepoints for characters with identical glyphs?

(Not entirely sure whether this should go in the information-security StackExchange instead; feel free to move it there if that's where it belongs.) Unicode has many, many instances of pairs or ...

Vikki

179

asked Apr 4, 2018 at 22:32

1 vote

1 answer

390 views

Unicode Telugu language characters

I am developing a mobile app in android in which I use Telugu (Indian language) texts. On my mobile Telugu language alphabets are available. Therefore, I am not facing any problem for testing my app. ...

Vempati Satya Suryanarayana

123

asked Jan 17, 2018 at 11:40

8 votes

1 answer

615 views

Do C++'s iterator categories forbid writing a UTF-8 iterator adapter?

I've been working on a UTF-8 iterator adapter. By which, I mean an adapter that turns an iterator to a char or unsigned char sequence into an iterator to a char32_t sequence. My work here was inspired ...

Nicol Bolas

12.1k

asked Apr 1, 2017 at 18:43

9 votes

2 answers

73k views

I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using unicode? [closed]

I can type ⅓, ⅔ and ½ but can I type 3/3 and 2/2 using unicode? I know that from a mathematical point of view the fractions 2/2 = 3/3 = 1 but I am typing a list where I want to indicate that you have ...

d-b

215

asked Oct 9, 2016 at 19:04

2 votes

1 answer

222 views

What Unicode Transformation Format is being represented when just Unicode is written?

Many programs will supply one or more of the following as file encoding formats: UTF-8, UTF-16, UTF-32 and simply Unicode. How do I know what Unicode Transformation Format Unicode is referring to? I'm ...

Govind Rai

139

asked Aug 2, 2016 at 19:10

9 votes

3 answers

5k views

Compiling for string and wstring

I'm creating a library. I want to use it in multiple projects which may use multi-byte or unicode (std::string or std::wstring). I've adopted the old MS method of conditional compiling: namespace ...

001

283

asked Jul 22, 2016 at 17:39

1 vote

1 answer

604 views

Cross-platform unicode support for Python command line tool

I am working on a large command line tool, written for Python 2.6+ and supported for Windows, OS X and Linux. The target users are developers but it is also being auto-invoked by CI-systems etc. In ...

Betamos

111

asked Apr 14, 2016 at 2:54

10 votes

1 answer

2k views

How do you mix left-to-right and right-to-left scripts without your files looking crazy?

Say your native language is Hebrew, and you're working in a programming language like Python 3, which lets you put Hebrew in source code. Good for you! You've got a dict: d = {'a': 1} and you want to ...

user2357112

779

asked Apr 9, 2016 at 3:07

89 votes

5 answers

10k views

Would UTF-8 be able to support the inclusion of a vast alien language with millions of new characters?

In the event an alien invasion occurred and we were forced to support their languages in all of our existing computer systems, is UTF-8 designed in a way to allow for their possibly vast amount of ...

Qix - MONICA WAS MISTREATED

1,936

asked Nov 24, 2015 at 12:18

4 votes

2 answers

417 views

Why does ISO 8859-1 contain letter-free diacritics?

ISO 8859-1 contains a few letter-free diacritics: The diaeresis (¨), the acute accent (´), the cedilla (¸) and the macron (¯).¹ Why were they included? As far as I know (please correct me if I am ...

Heinzi

9,868

asked Sep 2, 2015 at 19:30

4 votes

1 answer

8k views

What is the difference between "Wide character" set and "Unicode character set"? [closed]

Today I was reading my favourite C++ Programming book (C++ Primer Plus) and the section which was about variables and character sets in C++, however I got really confused about Unicode and Wide ...

user192922

asked Aug 23, 2015 at 16:16

0 votes

0 answers

160 views

How can I resolve Unicode Hex Value Mismatches between WordML and XSL:FO?

We have an important legal document that our app generates in WordML, with foreign characters represented via Unicode. These foreign characters vary widely, and include languages with special ...

Zibbobz

1,602

asked Aug 5, 2015 at 15:11

1 vote

2 answers

360 views

Is there accepted decimal-based Unicode notation for technical audiences?

When writing for technical audiences, there are various ways to type Unicode representations, but they all seem to be Hexadecimal: \uFFFF - From C# / Java Strings \U0000FFFF - From C# / Java Strings ...

Ehryk

127

asked Mar 14, 2015 at 5:15

8 votes

4 answers

5k views

Prime symbol in Python variable name

So I'm a terrible person and I want to name a variable in my mathy-python3 code s′ (that's U+2032 PRIME). I was under the impression Unicode literals work as identifiers in Python 3, which is why my ɣ,...

Alex Lenail

183

asked Oct 7, 2014 at 15:48
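
The behaviour described in this excerpt can be reproduced without running the offending code, using `str.isidentifier()` and `unicodedata.category()` from the standard library. This is a sketch; the list of candidate names is chosen only for illustration.

```python
import unicodedata

# s + U+2032 PRIME, U+0263 LATIN SMALL LETTER GAMMA, and an ASCII fallback
candidates = ["s\u2032", "\u0263", "s_prime"]

for name in candidates:
    # Report whether Python accepts the name as an identifier,
    # plus the Unicode general category of its last character.
    print(repr(name), name.isidentifier(), unicodedata.category(name[-1]))
```

PRIME has general category `Po` (other punctuation), which is not permitted in Python identifiers, whereas the gamma is a lowercase letter (`Ll`) and is accepted.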

12 votes

5 answers

6k views

Why does "charset" really mean "encoding" in common usage?

Something that has long confused me is that so much software uses the terms "charset" and "encoding" as synonyms. When people refer to a unicode "encoding", they always mean a ruleset for ...

Mark Amery

1,273

asked Sep 7, 2014 at 13:13

0 votes

1 answer

345 views

Simple unicode application?

I want to create simple language learning applications to help friends in learning languages. A simple Java console application would do the trick, but the Windows console does not seem to handle ...

zxz

277

asked Sep 6, 2014 at 13:37

4 votes

4 answers

271 views

Technical reasons to prefer coding business logic to support Unicode (when not required)

I have a legacy application in which the UI and business logic are already reasonably well-separated. There is a proposal to separate them even further, turning the core application into a "service" (...

omatai

195

asked Jul 22, 2014 at 1:38

9 votes

1 answer

4k views

Does it make sense to choose UTF-32, based on concern that some basic rule will be broken for UTF-8?

I'm working on a cross-platform C++ project that doesn't consider Unicode and needs to be changed to support it. There are the following two choices, and I need to decide which one to choose. Using UTF-8 ...

ZijingWu

1,077

asked Apr 17, 2014 at 10:06

5 votes

3 answers

3k views

When should I not use Unicode? [duplicate]

Unicode seems to be becoming more and more ubiquitous these days, if it isn't already, but I have to wonder if there are any domains where Unicode isn't the best implementation choice. Are there any ...

Daniel Wolfe

151

asked Apr 10, 2014 at 17:50

8 votes

5 answers

5k views

Using π, φ, λ etc. as variable names while programming? [duplicate]

This is a function in the d3.v3.js file (the graph library D3.js): function d3_geo_areaRingStart() { var λ00, φ00, λ0, cosφ0, sinφ0; d3_geo_area.point = function(λ, φ) { d3_geo_area....

Nav

1,191

asked Mar 4, 2014 at 15:08

1 vote

1 answer

197 views

How can I learn about typography, fonts, glyphs, etc.? [closed]

I know so little about this that I'm having trouble formulating the question. Apparently due to technical limitations, nastaleeq style of writing Urdu is very difficult, perhaps impossible, given ...

Shahbaz

181

asked Dec 7, 2013 at 3:04

5 votes

3 answers

603 views

Consequences of "naïve" vs "naive"?

While using IE autocorrect "naive" got transformed to "naïve"! My regional settings are Au English, from a Unicode search point of view the two are nothing alike. I am not even sure whether there are ...

jimjim

863

asked Nov 25, 2013 at 12:50
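
To illustrate why "naïve" and "naive" differ for a search, and why even two spellings of "naïve" can differ, here is a minimal sketch with Python's `unicodedata` module. The helper `strip_marks` is a hypothetical name, and folding away diacritics is only one possible search policy, not something the question prescribes.

```python
import unicodedata

def strip_marks(s):
    """Decompose the string and drop combining marks (a crude accent fold)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

composed = "na\u00efve"     # i with diaeresis as a single code point
decomposed = "nai\u0308ve"  # plain i followed by U+0308 COMBINING DIAERESIS

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(strip_marks(composed) == "naive")                      # True
```

Normalizing both the indexed text and the query to the same form makes the two spellings of "naïve" compare equal; matching "naive" as well needs an extra folding step like the one above.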

9 votes

2 answers

1k views

Languages supporting unicode logic operators

Are there any programming languages that support the use of unicode logic operators? For example, many programming languages use "!=" as the "does not equal" operator, but in mathematics the symbol ...

kyle k

225

asked Nov 6, 2013 at 4:08

14 votes

3 answers

2k views

A Unicode sentinel value I can use?

I am designing a file format and I want to do it right. Since it is a binary format, the very first byte (or bytes) of the file should not form valid textual characters (just like in the PNG file ...

Daniel A.A. Pelsmaeker

2,755

asked Mar 13, 2013 at 15:15

4 votes

2 answers

212 views

Strategy for website with international strings

What things need to be considered for a Website that contains International strings, for instance Simplified Chinese and English mixed. UTF8 seems to me a natural choice, including a meta tag. Still,...

Philip

1,709

asked Feb 23, 2013 at 8:47

6 votes

3 answers

1k views

Is O(1) random access into variable length encoding strings useful?

I remember reading that there are no existing data structures which allow for random-access into a variable length encoding, like UTF-8, without requiring additional lookup tables. The main question ...

DeadMG

36.9k

asked Nov 11, 2012 at 19:18

32 votes

2 answers

16k views

Why does Java use UTF-16 for internal string representation?

I would imagine the reason was fast, array like access to the character at index, but some characters won't fit into 16 bits, so it wouldn't work... So if you have to handle special cases anyways, ...

zduny

2,633

asked Nov 7, 2012 at 13:40
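
The point about characters that do not fit into 16 bits can be made concrete by listing UTF-16 code units. The sketch below uses Python rather than Java, and the helper name `utf16_units` is made up for illustration.

```python
def utf16_units(s):
    """Return the UTF-16 code units of a string as integers."""
    data = s.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A BMP character needs one 16-bit unit.
print([hex(u) for u in utf16_units("A")])           # ['0x41']

# U+1F600 lies outside the BMP and needs a surrogate pair.
print([hex(u) for u in utf16_units("\U0001F600")])  # ['0xd83d', '0xde00']
```

Indexing by code unit is therefore constant time, but an index does not always correspond to a whole character.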

36 votes

2 answers

2k views

Unicode license

The Unicode Terms of Use state that any software that uses their data files (or a modification of them) should carry the Unicode license references. It seems to me that most Unicode libraries have ...

Eric Grange

413

asked Sep 28, 2012 at 7:02

3 votes

1 answer

4k views

understanding the encoding scheme in python 3

I got this error in my program, which grabs data from different websites and writes it to a file: 'charmap' codec can't encode characters in position 151618-151624: character maps to <undefined> ...

lamwaiman1988

1,483

asked Jul 26, 2012 at 9:56

3 votes

3 answers

22k views

How does it matter if a character is 8 bit or 16 bit or 32 bit

Well, I am reading Programming Windows with MFC, and I came across Unicode and ASCII code characters. I understood the point of using Unicode over ASCII, but what I do not get is how and why is it ...

vin

177

asked Jul 23, 2012 at 11:40

41 votes

3 answers

172k views

Why do we need to put N before strings in Microsoft SQL Server?

I'm learning T-SQL. From the examples I've seen, to insert text in a varchar() cell, I can write just the string to insert, but for nvarchar() cells, every example prefixes the strings with the letter N....

qinking126

551

asked Jul 6, 2012 at 14:47

15 votes

2 answers

6k views

Efficient Trie implementation for unicode strings

I have been looking for an efficient String trie implementation. Mostly I have found code like this: Referential implementation in Java (per wikipedia) I dislike these implementations for mostly two ...

RokL

2,451

asked Jul 5, 2012 at 11:25

7 votes

1 answer

2k views

How on earth does CHRW produce Unicode codes given that it only accepts 65k possible inputs?

http://babelstone.blogspot.com/2005/11/how-many-unicode-characters-are-there.html says there are 1 million unicode characters and around 240k of which are already assigned. 1 million > 240k > 65k ...

user4951

739

asked May 23, 2012 at 4:59

4 votes

2 answers

217 views

Prerequisites for developing an application with Unicode support [closed]

What could be the necessary prerequisites to be taken when developing an application with Unicode support in the context of: web applications, desktop applications, embedded applications? Prerequisites to ...

Ubermensch

1,349

asked Jan 17, 2012 at 14:33

5 votes

3 answers

731 views

What limitation will we face if each user-perceived character is assigned to one codepoint?

What limitations would we face if the Unicode standard had decided to assign one and only one code point to every user-perceived character? Currently, Unicode has code points that correspond to combining ...

Pacerier

5,063

asked Dec 8, 2011 at 23:10

16 votes

8 answers

3k views

What's the point of adding Unicode identifier support to various language implementations?

I personally find reading code full of Unicode identifiers confusing. In my opinion, it also prevents the code from being easily maintained. Not to mention all the effort required for authors of ...

Egor Tensin

509

asked Nov 13, 2011 at 17:02
