path: root/src/corelib/text/qunicodetools.cpp
Commit message | Author | Age | Files | Lines
* QUnicodeTools: port getSentenceBreaks() to QStringIterator | Marc Mutz | 2025-10-30 | 1 | -20/+8

    Like getWordBreaks(), this one is a bit more complicated than the first
    two, since there's a nested loop. Solve it by using a copy of the
    QStringIterator for the look-ahead loop.

    To see that the old and new versions are equivalent, observe that the
    qsizetype variables `i` and `lookahead` always pointed _onto_ the
    last-consumed code unit, while QStringIterator always points to just
    _after_ it.

    Pick-to: 6.10 6.8 6.5
    Change-Id: Id272b1a1597912eb611acb544b5ef0ac1d13a754
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: port getWordBreaks() to QStringIterator | Marc Mutz | 2025-10-30 | 1 | -20/+7

    This one is a bit more complicated than the previous two, since there's
    a nested loop. Solve it by using a copy of the QStringIterator for the
    look-ahead loop.

    To see that the old and new versions are equivalent, observe that the
    qsizetype variables `i` and `lookahead` always pointed _onto_ the
    last-consumed code unit, while QStringIterator always points to just
    _after_ it.

    Pick-to: 6.10 6.8 6.5
    Change-Id: I391fafcf2418dac39b4aea3b3a3a675114233dff
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
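The look-ahead pattern described above can be sketched without Qt. This is only an illustration: `Utf16Cursor` and `countFollowedBy()` are hypothetical names, not QStringIterator's real API, and the scanning rule (skip U+0020, then compare) is made up for the example. What it shows is that copying the cursor gives the inner loop its own position, so the outer loop is not disturbed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for QStringIterator: the cursor always sits just
// _after_ the last-consumed code unit.
struct Utf16Cursor {
    std::u16string_view text;
    std::size_t pos = 0;

    bool hasNext() const { return pos < text.size(); }

    // Decode one code point; a lone surrogate is returned as-is.
    char32_t next() {
        const char16_t hi = text[pos++];
        if (hi >= 0xD800 && hi <= 0xDBFF && pos < text.size()) {
            const char16_t lo = text[pos];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                ++pos;
                return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                               + (char32_t(lo) - 0xDC00);
            }
        }
        return char32_t(hi);
    }
};

// Count code points whose next non-space successor is 'target'. The inner
// loop scans ahead on a copy, so the outer cursor is unaffected.
int countFollowedBy(std::u16string_view s, char32_t target) {
    int hits = 0;
    Utf16Cursor it{s};
    while (it.hasNext()) {
        it.next();
        Utf16Cursor lookahead = it; // copy: starts just after the current code point
        while (lookahead.hasNext()) {
            const char32_t c = lookahead.next();
            if (c == U' ')
                continue;           // keep scanning past spaces
            if (c == target)
                ++hits;
            break;
        }
    }
    return hits;
}
```

Because the copy starts where the outer cursor already is, no index arithmetic is needed to translate between "points onto" and "points after".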
* QUnicodeTools: port initScripts() to QStringIteratorMarc Mutz2025-10-281-10/+4
| | | | | | | | | | | | After fixing the roundabout way of updating 'eor' in a previous commit, this is now trivial (no nested loop). Amends c20422af13fb30751eaa58e8755c7a9a7fd20a50. Pick-to: 6.10 6.8 6.5 Change-Id: Idbbfc503f4bf3878d1fc57729224c99973bddc32 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io> Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* QUnicodeTools: don't look up surrogate line-break propertiesMarc Mutz2025-10-281-6/+12
| | | | | | | | | | | | | | We know they're SG, so don't go through the properties trie, hard-code the result. As a defense against changes, add checks to the generator and tst_QUnicodeTools. This is in preparation of porting getLineBreaks() to QStringIterator. Pick-to: 6.10 6.8 6.5 Change-Id: Ib3567398ba56f7ad3ce6fbca81f6b0f40379ee7d Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
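A minimal sketch of the idea, assuming a two-value `LineBreakClass` enum and a dummy `lookUpInTrie()` that stand in for the real property tables (both are hypothetical): surrogate code units get their class from a range check, and only everything else goes through the lookup.

```cpp
#include <cassert>

// Hypothetical, heavily reduced set of line break classes.
enum class LineBreakClass { SG, AL };

// Hypothetical stand-in for the properties-trie lookup.
inline LineBreakClass lookUpInTrie(char32_t) { return LineBreakClass::AL; }

inline bool isSurrogate(char32_t c) { return c >= 0xD800 && c <= 0xDFFF; }

inline LineBreakClass lineBreakClassOf(char32_t c) {
    if (isSurrogate(c))
        return LineBreakClass::SG; // known in advance, so no table lookup
    return lookUpInTrie(c);
}
```

The checks the commit adds to the generator and the test guard the assumption that the tables really do say SG for this range.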
* QUnicodeTools: port some functions to QStringIteratorMarc Mutz2025-10-281-22/+9
| | | | | | | | | | | | | | | | (those which don't have nested loops due to the need for look-ahead). Use QStringIterator's new nextOrRawCodeUnit() to replace a Clang-21-Wcharacter-conversion-prone pattern of parsing a UTF-16 string. Amends cbfdec66033d14020d3e8a49bacc0d12d2b6798e (getGraphemeBreaks(); though that commit merely moved the code there from Harfbuzz) and 824180a12249e48c0e3280fec64940825ce0aa6e (getWhiteSpaces()). Pick-to: 6.10 6.8 6.5 Change-Id: I26b64fca6a26bb7ea4ab8ad14ba590213e949190 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: fix weird variable assignment in initScripts() loop | Marc Mutz | 2025-10-27 | 1 | -3/+2

    The old code updated the `eor` variable in the third field of the for
    loop, after the increment of the loop variable, `i`, to the then-value
    of `i`. The variable was initialized as zero.

    This is a very roundabout way of doing things, because, if you look at
    it from the right angle, `eor` will always have the value `i` has when
    entering the loop body. Proof:

    - First round: i = 0, eor = 0. So i == eor. Check.
    - Next round: i = 1 + whatever value `i` had at the end of the
      previous iteration. eor := i, so i == eor. Check.

    So rewrite the code to create `eor` at the beginning of the loop body,
    with the then-value of `i`. This allows marking it const, too, and
    scoping it correctly, drastically improving readability.

    The tighter scoping runs afoul of the assert(eor == string.size())
    after the loop, which, however, is pointless, because it's true by
    construction: the loop has no break statement, so the only way it can
    be exited is by failing the loop condition. At that point, eor := i
    and i == string.size(), so eor == string.size().

    Partially reverts 3df159ba174c1775a0e77d2305a639eeab1ea71d, but the
    loose scope of the variable was present even before that.

    Pick-to: 6.10 6.8 6.5
    Change-Id: I983aef94caa8a3bc09ab378b8bb9bb4a18dabeb4
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
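The equivalence argued above can be checked on a toy loop. This is a sketch, not the initScripts() code: `startsOld()`, `startsNew()` and `isPair()` are hypothetical, and the loop body merely skips the low half of a surrogate pair and records `eor`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

inline bool isPair(std::u16string_view s, std::size_t i) {
    return i + 1 < s.size() && s[i] >= 0xD800 && s[i] <= 0xDBFF
        && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
}

// Old shape: 'eor' lives outside the loop and is refreshed in the third
// field of the for statement, after 'i' has been incremented.
std::vector<std::size_t> startsOld(std::u16string_view s) {
    std::vector<std::size_t> out;
    std::size_t eor = 0;
    for (std::size_t i = 0; i < s.size(); ++i, eor = i) {
        if (isPair(s, i))
            ++i;
        out.push_back(eor);
    }
    return out;
}

// New shape: 'eor' is a const local that captures 'i' on entry to the body.
std::vector<std::size_t> startsNew(std::u16string_view s) {
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::size_t eor = i;
        if (isPair(s, i))
            ++i;
        out.push_back(eor);
    }
    return out;
}
```

Both versions record the same values because the body never writes to `eor`; only the scope and constness differ.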
* QUnicodeTools: improve variable allocation in getSentenceBreaks()Marc Mutz2025-10-271-4/+4
| | | | | | | | | | | | | | | Make 'pos' and 'prop' const, indicating that they're not modified by the lengthy loop body, and don't re-use 'ucs4' and 'prop' from the outer loop, make the inner loop have their own versions. This shadows the outer loop ones, but -Wshadow is not in effect in implementation files. I found it more important to avoid churn than to rename the variables to avoid the shadowing. Shadowing is well-defined in C++. These are in preparation of porting the function to QStringIterator. Pick-to: 6.10 6.8 6.5 Change-Id: Ia8b136c2cf4c8bc70d7444456adae93aecf6138b Reviewed-by: Ahmad Samir <a.samirh78@gmail.com>
* QUnicodeTools: improve variable allocation in getWordBreaks()Marc Mutz2025-10-271-4/+4
| | | | | | | | | | | | | | | Make 'pos' and 'prop' const, indicating that they're not modified by the lengthy loop body, and don't re-use 'ucs4' and 'prop' from the outer loop, make the inner loop have their own versions. This shadows the outer loop ones, but -Wshadow is not in effect in implementation files. I found it more important to avoid churn than to rename the variables to avoid the shadowing. Shadowing is well-defined in C++. These are in preparation of porting the function to QStringIterator. Pick-to: 6.10 6.8 6.5 Change-Id: I2b0c135276ccef403802dba8b780dcbf8c0ed519 Reviewed-by: Ahmad Samir <a.samirh78@gmail.com>
* QUnicodeTools: prefer lineBreakClass() convenience functionMarc Mutz2025-10-271-4/+2
| | | | | | | | | | | | | | | | ... where applicable Simplifies the code. In some cases, the code queries more than one property of the character, in which case we keep using qGetProp(). Amends 85899ff181984a1310cd1ad10cdb0824f1ca5118 and 1f73d4b87c153224b4eeee164269d0b313a11a8b. Pick-to: 6.10 6.8 6.5 Change-Id: I27cc0e5607b1e730f649c9d73f05f6b1227bdd17 Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* Fix off-by-one in QUnicodeTools::getWhiteSpaces()Marc Mutz2025-10-241-1/+2
| | | | | | | | | | | | | | | | | | There are no space characters in Unicode outside the BMP at the moment (QUnicodeTables::MaxSeparatorCodepoint == 0x3000 at this point), but if there were, the old code would flip the QCharAttributes::whiteSpace on the low-surrogate position, not the high one, as all other functions do. Fix by using the same pattern used by the other boundary-finding functions: save the index at the start of the loop, and use the saved value when indexing into attributes[]. Amends 824180a12249e48c0e3280fec64940825ce0aa6e. Pick-to: 6.10 6.8 6.5 Change-Id: I116a5e1da6c9df5e4237073481d71efbf956f27f Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
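The fix pattern (remember the index before decoding, then use the saved value) can be sketched like this. Everything here is hypothetical: `whiteSpaceFlags()` is not the Qt function, and `isSpaceForDemo()` pretends U+1F600 is a space only because no real non-BMP space exists to exercise the surrogate path.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical predicate for the demo only.
inline bool isSpaceForDemo(char32_t c) { return c == U' ' || c == 0x1F600; }

std::vector<bool> whiteSpaceFlags(std::u16string_view s) {
    std::vector<bool> flags(s.size(), false);
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::size_t pos = i;     // index of the first code unit
        char32_t c = char32_t(s[i]);
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size()
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (char32_t(s[i + 1]) - 0xDC00);
            ++i;                       // i now sits on the low surrogate
        }
        if (isSpaceForDemo(c))
            flags[pos] = true;         // flags[i] would mark the low surrogate
    }
    return flags;
}
```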
* Mark QUnicodeTools as security-critical | Marc Mutz | 2025-09-26 | 1 | -0/+1

    The code implements the Unicode boundary finding algorithm, so it's
    security-critical with the same rationale as QStringMatcher.

    The header file contains only declarations, so is significant-only.

    Task-number: QTBUG-135195
    Pick-to: 6.10 6.8
    Change-Id: I5d99fc25d50b639f2bef3a93e6fa83be13208bda
    Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
* QUnicodeTools: collapse adjacent identical case statements | Marc Mutz | 2025-03-11 | 1 | -25/+0

    They were left in for easier reviewing when the old function pointer
    table was changed to this switch. In this second step, we can now
    collapse adjacent duplicates into one each.

    Pick-to: 6.9 6.8 6.5
    Change-Id: I7b7fa8991817895a01c63251ab3b0ecc95b5756b
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QUnicodeTools: reduce unneeded relocations | Marc Mutz | 2025-03-10 | 1 | -70/+215

    Turn the charAttributeFunction array of function pointers into a
    switch. This has two benefits:

    - The compiler now warns when we introduce a new QChar::Script value
      and statically forces us to think whether a new attribute handling
      function is needed.
    - A table of function pointers requires relocations. A switch might
      not.

    GCC uses a jump table to implement this switch, jumping to distinct
    lea instructions fetching distinct function pointer values, and
    thereby removes relocations, while Clang actually forms a function
    pointer table and turns the switch into an indexing operation (with
    compiler-generated guards). I didn't check whether Clang's table
    actually requires relocations, relinfo.pl doesn't report any
    reductions, but it's become unreliable over the years, because it
    doesn't for GCC, either.

    Difference:

        qunicodetools.cpp.o:
        - 0000000000000000 l O .data.rel.ro.local 0000000000000108 QUnicodeTools::Tailored::charAttributeFunction
          0000000000000000 l d .data.rel.ro.local 0000000000000000 .data.rel.ro.local

    See https://stackoverflow.com/questions/19067010/finding-where-relocations-originate/19338343#19338343
    for the script to generate this output.

    See https://www.akkadia.org/drepper/dsohowto.pdf Section 1.6 for why
    we care.

    Instead of collapsing identical return statements from adjacent case
    statements into one, keep the per-case return statements for now, to
    aid review, and clean up in a follow-up commit.

    Amends dd7d8304bbe599320b163b94e9a4ad9a6f35b740.

    Pick-to: 6.9 6.8 6.5
    Task-number: QTBUG-100536
    Change-Id: Ic5b6bd29e3a3a88f0d194fa7d76272a4770b9840
    Reviewed-by: Marc Mutz <marc.mutz@qt.io>
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
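The table-to-switch change can be illustrated on a toy scale. The `Script` enum, the handler functions and `attributeFunctionFor()` below are hypothetical and much smaller than the real code; the point is only the shape: a switch without a default label that returns the handler, so that a newly added enumerator triggers `-Wswitch`.

```cpp
#include <cassert>

enum class Script { Latin, Thai, Khmer };

using AttributeFunction = int (*)(int);

// Hypothetical per-script handlers.
inline int thaiAttributes(int n) { return n + 1; }
inline int khmerAttributes(int n) { return n + 2; }

// No default label: GCC (with -Wall) and Clang warn when an enumerator of
// Script has no matching case here.
inline AttributeFunction attributeFunctionFor(Script script) {
    switch (script) {
    case Script::Latin:
        return nullptr;           // no tailored handling in this sketch
    case Script::Thai:
        return &thaiAttributes;
    case Script::Khmer:
        return &khmerAttributes;
    }
    return nullptr;               // not reached for valid enumerators
}
```

Whether the compiler emits relocations for such a switch depends on the code generator, as the commit message itself notes for GCC versus Clang.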
* Update UCD to Unicode 16.0.0 | Mårten Nordheim | 2025-02-10 | 1 | -112/+385

    They added some new scripts.

    There were a few changes to the line break algorithm; most notably,
    there are more rules that require more context than before. While not
    major, there was some shuffling and additions to our implementation to
    match the new rules.

    IDNA test data now disallows the trailing dot/empty root label,
    technically to be toggled off by an option that controls a few things,
    but we don't have options.

    For test-data they changed the format a little: "" is used to mean
    empty string, while a blank segment is null/no string. Update the
    parser to read this.

    [ChangeLog][Third-Party Code] Updated the Unicode Character Database
    to UCD revision 34/Unicode 16.

    Fixes: QTBUG-132902
    Task-number: QTBUG-132851
    Pick-to: 6.9 6.8 6.5
    Change-Id: I4569703659f6fd0f20943110a03301c1cf8cc1ed
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Fix -Wimplicit-fallthrough for clang | Tim Blechmann | 2024-03-01 | 1 | -0/+1

    Clang's `-Wimplicit-fallthrough` warnings are a little stricter than
    gcc's interpretation:

        switch (i) {
        case 0:
            foo();
        case 4:
            break;
        }

    While gcc accepts the implicit fallthrough if the following statement
    is a trivial `break`, clang will warn about it.

    Pick-to: 6.7
    Change-Id: I38e0817f1bc034fbb552aeac21de1516edcbcbb0
    Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
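One way to make such a switch acceptable to both compilers is to annotate the fallthrough. The log does not say whether the one-line fix was an annotation or an explicit `break`, so the sketch below is only an assumption about the general technique; `classify()` is a hypothetical function.

```cpp
#include <cassert>

inline int classify(int i) {
    int result = 0;
    switch (i) {
    case 0:
        result = 1;
        [[fallthrough]];  // C++17: dropping into 'case 4' is intended
    case 4:
        break;
    default:
        result = -1;
        break;
    }
    return result;
}
```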
* Unicode line breaking: Implement rules LB15a and LB15b | Ievgenii Meshcheriakov | 2024-02-08 | 1 | -34/+93

    The new rules were added in Unicode 15.1 (TR #14, revision 51). The
    rules read:

        LB15a: (sot | BK | CR | LF | NL | OP | QU | GL | SP | ZW) [\p{Pi}&QU] SP* ×
        LB15b: × [\p{Pf}&QU] (SP | GL | WJ | CL | QU | CP | EX | IS | SY | BK | CR | LF | NL | ZW | eot)

    Add two new line breaking classes LineBreak_QU_Pi and _QU_Pf to
    represent quotation characters with context that matches the left side
    of LB15a and the right side of LB15b respectively. This way it is still
    possible to use the line breaking classes table.

    Also add a comment about the original source of the line break table.

    Task-number: QTBUG-121529
    Change-Id: Ib35f400e39e76819cd1c3299691f7b040ea37178
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* QUnicodeTools: Use QVarLengthArray in Thai support codeIevgenii Meshcheriakov2023-01-171-41/+20
| | | | | | | | | | This replaces an ad-hoc solution. As a drive-by, remove a check that was always true. Pick-to: 6.5 Change-Id: I72166ee75a2c474dc91bc699c790f256b78b3b7a Reviewed-by: Marc Mutz <marc.mutz@qt.io>
* QUnicodeTools: Use a global static to manage libthai stateIevgenii Meshcheriakov2023-01-171-48/+85
| | | | | | | | | | | | | Move all libthai symbol resolution and state management into a single class. Create a single global static instance of this class. This allows freeing of the state on program exit. Task-number: QTBUG-105544 Pick-to: 6.5 Change-Id: I2610863f85f49f88e83f1fdaa200ea277c88c0ef Reviewed-by: Mikołaj Boc <Mikolaj.Boc@qt.io> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Marc Mutz <marc.mutz@qt.io>
* QUnicodeTools: Use thread-safe libthai APIIevgenii Meshcheriakov2023-01-131-5/+21
| | | | | | | | | | | | | | | Use th_brk_new()/th_brk_find_breaks() instead of non-thread-safe th_brk(). The new API is available in libthai since version 0.1.25 released on 2016-06-28. [ChangeLog][QtCore] Correct line wrapping of Thai text now requires libthai version 0.1.25 or above. Fixes: QTBUG-105544 Pick-to: 6.5 Change-Id: I723050bef9f4e6445c946125c74c99e50addadef Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Marc Mutz <marc.mutz@qt.io>
* QUnicodeTools: mark a test-only flag constexpr/constinitMarc Mutz2022-08-121-3/+6
| | | | | | | | | | | | | | | | | | | For QT_BUILD_INTERNAL, mark the flag constinit, because tests may want to set it (which they better do before Qt spins up threads, because otherwise this non-atomic flag runs into UB (data races)). For non-QT_BUILD_INTERNAL, mark the flag constexpr, so dead code elimination can do its job. Inconsistently, of the two readers of the flag, one was ifdef'ed on QT_BUILD_INTERNAL, while the other wasn't. Settle on exposing both, which increases the compiler coverage of the code. Pick-to: 6.4 Task-number: QTBUG-100486 Task-number: QTBUG-100485 Change-Id: I6e041359b8214b40d80eefa92c26422aada3eb59 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QUnicodeTools: fix truncation in debug statementsMarc Mutz2022-08-111-5/+5
| | | | | | | | | | | Instead of casting to int, cast to qlonglong, which is guaranteed to be able to hold all qsizetype values. Task-number: QTBUG-103531 Pick-to: 6.4 6.3 6.2 Change-Id: I3e89892defd091fa6ef305b8ed5c3819a2cc13da Reviewed-by: Sona Kurazyan <sona.kurazyan@qt.io> Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
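A small sketch of the cast in question, using `std::ptrdiff_t` as a stand-in for qsizetype and `snprintf` instead of Qt's debug stream; `describePosition()` is a hypothetical helper. Casting to `long long` and printing with `%lld` avoids cutting a 64-bit position down to `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

inline std::string describePosition(std::ptrdiff_t pos) {
    char buf[64];
    // long long is at least 64 bits wide, so the value is not truncated
    // the way a cast to int could truncate it.
    std::snprintf(buf, sizeof buf, "pos=%lld", static_cast<long long>(pos));
    return buf;
}
```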
* QUnicodeTools: fix data race in initialization of libthai symbols | Marc Mutz | 2022-08-11 | 1 | -5/+12

    The facilities of qunicodetools.cpp are not limited to the GUI thread,
    so initialization must be thread-safe. The old code wasn't, though,
    and contained several data races:

    - the non-atomic `initialized` flag was read while another thread may
      write it
    - th_brk and th_next_cell were read while another thread may write
      them

    Fix by using Double-Checked Locking. This also prepares the code for
    an eventual port to th_brk_find_breaks() (th_brk is deprecated).

    The function pointers don't need to be atomic, because all reads from
    them are guaranteed to happen-after the writes to them (as long as all
    users call init_libthai() and don't proceed if it returns false; this
    could be ensured by returning a struct with the function pointers from
    init_libthai() instead of maintaining them as statically-visible
    globals, but that's outside the scope of this patch).

    As a drive-by, remove a pointless static_cast<int>(~~int expression~~).

    Fixes: QTBUG-105543
    Pick-to: 6.4 6.3 6.2
    Change-Id: I492acd7e9a257e5c4b91f576e9bc448b6bb96ad1
    Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
    Reviewed-by: Lars Knoll <lars.knoll@gmail.com>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
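Double-Checked Locking, as named in the message, can be sketched with the standard library. This is not the Qt code: `initLib()`, `fakeBreak()` and `resolveCalls` are hypothetical, and symbol resolution is replaced by assigning a local function. It shows why the function pointer itself can stay non-atomic: it is written before the release store and read only after an acquire load (or under the same mutex).

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

namespace {

using BreakFunction = int (*)(int);
int fakeBreak(int n) { return n * 2; }  // hypothetical resolved symbol

std::atomic<bool> initialized{false};
std::mutex initMutex;
BreakFunction brk = nullptr;  // non-atomic: published via 'initialized'
int resolveCalls = 0;         // instrumentation for this sketch only

bool initLib() {
    // First check, lock-free: the acquire load pairs with the release
    // store below, so a reader that sees 'true' also sees 'brk'.
    if (initialized.load(std::memory_order_acquire))
        return brk != nullptr;

    const std::lock_guard<std::mutex> lock(initMutex);
    // Second check, under the lock: another thread may have finished
    // initialization while we were waiting.
    if (!initialized.load(std::memory_order_relaxed)) {
        ++resolveCalls;
        brk = &fakeBreak;     // stand-in for symbol resolution
        initialized.store(true, std::memory_order_release);
    }
    return brk != nullptr;
}

} // namespace
```

Callers must check the return value before touching `brk`, which mirrors the condition stated in the commit message.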
* QUnicodeTools: fix types used around th_brk()Marc Mutz2022-08-111-2/+2
| | | | | | | | | | | | | | | | Libthai's th_brk() takes the breakpoints array lengths as size_t, so use that. This still doesn't fix thaiAssignAttributes() for ≥ 2 Gi characters, because th_brk returns break positions in an array of int, thus limiting any results to the first INT_MAX characters. Created QTBUG-105541 to track this. Task-number: QTBUG-103531 Pick-to: 6.4 6.3 6.2 Change-Id: Iba468cc9389f4533401bc18dd326c4ca7e85a5da Reviewed-by: Lars Knoll <lars.knoll@gmail.com> Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* QUnicodeTools: fix types used around th_next_cell | Marc Mutz | 2022-08-11 | 1 | -3/+4

    Libthai's th_next_cell takes and returns lengths as size_t.

    - pass size_t, not qsizetype (the value can never be negative)
    - receive size_t, don't cast to uint

    As a drive-by, scope variables tighter.

    Task-number: QTBUG-103531
    Pick-to: 6.4 6.3 6.2
    Change-Id: Ib1eeb1f0e8974ee8b0f88d080d06136b307c324f
    Reviewed-by: Lars Knoll <lars.knoll@gmail.com>
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
    Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* QUnicodeTools: Fix line breaking before open parentheses | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -4/+18

    UAX #14, revision 45 (Unicode 13) has changed rule LB30 to only
    trigger if the open parenthesis is non-wide:

        (AL | HL | NU) × [OP-[\p{ea=F}\p{ea=W}\p{ea=H}]]

    This fixes the remaining 24 line break tests.

    Task-number: QTBUG-97537
    Pick-to: 6.3
    Change-Id: I9870588c04bf0f6ae0a98289739bef8490f67f69
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix line breaking for potential emojisIevgenii Meshcheriakov2022-05-241-0/+11
| | | | | | | | | | | | | | Implement part of LB30b introduced by UAX #14, revision 47 (Unicode 14.0.0): [\p{Extended_Pictographic}&\p{Cn}] × EM This fixes one line breaking test. Task-number: QTBUG-97537 Pick-to: 6.3 Change-Id: I3fd2372a057b7391d8846e9c146f69a54686ea61 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix interactions of WB3d and WB4 rulesIevgenii Meshcheriakov2022-05-241-2/+7
| | | | | | | | | | | Word breaking rule WB3d should not be affected by WB4. This fixes the remaining word break test. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I99aee831d7c54fafcd2a9d526a3e078b12c5bfad Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Handle WB3c word break ruleIevgenii Meshcheriakov2022-05-241-2/+12
| | | | | | | | | | | | | | | Adjust handling of WB3c rule to UAX #29, revision 33 (Unicode 11.0.0). The rule reads: ZWJ × \p{Extended_Pictographic} This fixes 9 word break tests. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I818d4048828e6663d5c090aa372d83f5099fdffe Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: Remove obsolete word break classesIevgenii Meshcheriakov2022-05-241-24/+20
| | | | | | | | | | Remove E_Base, Glue_After_Zwj, E_Base_GAZ, and E_Modifier obsoleted by UTS #29, version 33 (Unicode 11.0.0). Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: If5dc36ae17cd8746bbe81b73bbcc0863181e4a7a Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Adjust properties of WSegSpace word break classIevgenii Meshcheriakov2022-05-241-23/+23
| | | | | | | | | | | | | | | | | | Disable break between sequences of WSegSpace characters (rule WB3d, introduced in UAX #29, version 33, Unicode 11.0.0). Also disable breaks between WSegSpace and (Extend | Format | ZWJ) due to rule WB4. Adjust "words4" test to take the above changes into account (space character belongs to WSegSpace). Mention the full class name in a comment inside the word break table. This fixes 34 word break tests. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I7dfe8367e45c86913bb7d7fe2adb053711978487 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of LB22 line break rule | Ievgenii Meshcheriakov | 2022-05-24 | 1 | -11/+11

    This rule was simplified in UAX #14, revision 45 (Unicode 13.0.0), to
    read:

        × IN

    Re-enabled the 28 line break tests that this fixes.

    Task-number: QTBUG-97537
    Pick-to: 6.2 6.3
    Change-Id: I1c5565a8c1633428c22379917215d4e424ff0055
    Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Remove ZWJ data from the line break tableIevgenii Meshcheriakov2022-05-241-38/+37
| | | | | | | | | | | ZWJ is handled separately by rule LB8a. The code for rule LB10 was adjusted to handle ZWJ as AL as required by the specification. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I814cbb4a26f2994296767cca0443d8a1a1aaf739 Reviewed-by: Øystein Heskestad <oystein.heskestad@qt.io> Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* QUnicodeTools: Fix handling of ZWJ for line breaksIevgenii Meshcheriakov2022-05-241-3/+2
| | | | | | | | | | | | | | | Adjust implementation of rule LB8a of UAX #14. The rule was changed in version 41 (corresponding to Unicode 11.0.0): ZWJ × (ID | EB | EM) ⇒ ZWJ × Fixing this rule fixes 9 line break tests. Those are re-enabled. Task-number: QTBUG-97537 Pick-to: 6.2 6.3 Change-Id: I1570719590a46ae28c98ed7d5053e72b12915db7 Reviewed-by: Øystein Heskestad <oystein.heskestad@qt.io> Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Use SPDX license identifiersLucie Gérard2022-05-161-38/+2
| | | | | | | | | | | | | Replace the current license disclaimer in files by a SPDX-License-Identifier. Files that have to be modified by hand are modified. License files are organized under LICENSES directory. Task-number: QTBUG-67283 Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1 Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
* Fix unused variables in qtbase | Andrei Golubev | 2022-04-25 | 1 | -8/+0

    The clang compiler recently got smarter and detects "pseudo used
    variable" patterns where we declare a variable and only use it in
    self-increments, self-decrements and other similar expressions.

    Errors:

        qtbase/src/corelib/text/qlocale.cpp:3898:9: error: variable 'group_cnt' set but not used [-Werror,-Wunused-but-set-variable]
            int group_cnt = 0; // counts number of group chars
                ^
        qtbase/src/corelib/text/qunicodetools.cpp:1372:21: error: variable 'uc' set but not used [-Werror,-Wunused-but-set-variable]
            const char16_t *uc = text + from;
                            ^

    and more of the kind.

    Remove the ones that have no usage, mark others with
    [[maybe_unused]].

    Pick-to: 6.3 6.2
    Change-Id: Ib2d0722110e3da8c39e29ec78c0ec290d064c970
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Apply Q_CONSTINIT across the codebaseMarc Mutz2022-03-291-3/+3
| | | | | | | | | Still not complete. Just grepping for static and thread_local. Task-number: QTBUG-100486 Change-Id: I90ca14e8db3a95590ecde5f89924cf6fcc9755a3 Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* QtCore: replace QLatin1String/QLatin1Char with _L1/u'' where applicableSona Kurazyan2022-03-251-2/+4
| | | | | | | | | | | As a drive-by, did also minor refactorings/improvements. Task-number: QTBUG-98434 Change-Id: I81964176ae2f07ea63674c96f47f9c6aa046854f Reviewed-by: Edward Welbourne <edward.welbourne@qt.io> Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com> Reviewed-by: Anton Kudryavtsev <antkudr@mail.ru>
* Restore C++20-deprecated mixed-enum bitwise operatorsMarc Mutz2022-03-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | C++20 deprecated arithmetic on enum types. For enums used on QFlags<>, these operators have always been user-defined, but when the two enums are of different type, such as QFrame::Shape and QFrame::Shadow, the deprecation warning pops up. We have in the past fixed these in our headers by manual casts, but that doesn't help our users when our API requires them to OR together enums of different type. Until we can rework these APIs to use a variadic QFlags type, we need to fix it in an SC and BC way, which is what this patch sets out to do. The idea is simply to mark pairs of enums that are designed to be ORed together and replace the deprecated built-in bitwise operators with user-defined ones in C++20. To ensure SC and BC, we pass an explicit result type and use that to check, in C++17 builds, that it matches the decltype of the result of the built-in operator. This patch is the first in a series of similar patches. It introduces said markup macro and applies it to all enum pairs that create warnings on (my) Linux GCC 11.3 and Clang 10.0.0 builds. It is expected that more such markups are needed, for other modules, and for symmetry. Even with this patch, there is one mixed-enum warning left, in qxcbwindow.cpp. This appears to be a genuine bug (cf. QTBUG-101306), so this patch doesn't mark the enums involved in it as designed to be used together. This patch also unearthed that QT_TYPESAFE_FLAGS, possibly unsurprisingly so, breaks several mixed bitwise flags-enum operations (QTBUG-101344). Pick-to: 6.3 6.2 5.15 Task-number: QTBUG-99948 Change-Id: I86ec11c1e4d31dfa81e2c3aad031b2aa113503eb Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
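The approach described above (replace the deprecated built-in operator with a user-defined one that has an explicit result type) can be sketched as follows. The enums and their values are made up in the spirit of QFrame::Shape and QFrame::Shadow, and the operators are written out by hand rather than through the markup macro the patch introduces.

```cpp
#include <cassert>

// Hypothetical enums designed to be ORed together; values are invented.
enum Shape { NoFrame = 0x00, Box = 0x01 };
enum Shadow { Plain = 0x10, Raised = 0x20 };

// User-defined replacement for the built-in operator that C++20
// deprecates for operands of different enumeration types. The int result
// matches what the built-in operator yields for these unscoped enums.
constexpr int operator|(Shape lhs, Shadow rhs) noexcept {
    return static_cast<int>(lhs) | static_cast<int>(rhs);
}
constexpr int operator|(Shadow lhs, Shape rhs) noexcept { return rhs | lhs; }

static_assert((Box | Raised) == 0x21, "mixed-enum OR keeps its value");
```

Existing call sites such as `Box | Raised` compile unchanged and keep their result type, which is the source- and binary-compatibility property the message asks for.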
* Add additional grapheme, word, and sentence break class tests from tr29Øystein Heskestad2021-11-101-4/+0
| | | | | | | | | | | Stop turning THAI CHARACTER SARA AM into a grapheme boundary because it breaks a test and chromium does not consider it to be a separate grapheme. Fixes: QTBUG-88545 Change-Id: Ib1aea8dbb66ac42b2129cf9fe04c39f5f76eeb36 Reviewed-by: Edward Welbourne <edward.welbourne@qt.io> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* corelib: Fix typos in source code commentsJonas Kvinge2021-10-121-1/+1
| | | | | | Pick-to: 6.2 Change-Id: Ic78afb67143112468c6f84677ac88f27a74b53aa Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Unicode: fix the grapheme clustering algorithmGiuseppe D'Angelo2021-08-241-15/+27
| | | | | | | | | | | | | | | | | | | An oversight in the code kept the algorithm in the GB11 state, even if the codepoint that is being processed wouldn't allow for that (for instance a sequence of ExtPic, Ext and Any). Refactor the code of GB11/GB12/GB13 to deal with code points that break the sequences (falling back to "normal" handling). Add some manual tests; interestingly enough, the failing cases are not covered by Unicode's tests, as we now pass the entire test suite. Amends a794c5e287381bd056008b20ae55f9b1e0acf138. Fixes: QTBUG-94951 Pick-to: 6.1 5.15 Change-Id: If987d5ccf7c6b13de36d049b1b3d88a3c4b6dd00 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
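The kind of state handling this fix is about can be shown on a reduced model. The sketch below is not Qt's implementation: it knows only four hypothetical classes and models just GB9 (no break before Extend or ZWJ) and GB11 (ExtPic Extend* ZWJ × ExtPic), with every other pair breaking. The relevant detail is that any code point that cannot continue the GB11 sequence drops the state back to `Normal`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Gc { Any, Extend, ZWJ, ExtPic };
enum class State { Normal, SawExtPic, SawExtPicZwj };

// out[i] tells whether a cluster boundary precedes cps[i]; out[0] is
// always true (start of text).
std::vector<bool> breaksBefore(const std::vector<Gc> &cps) {
    std::vector<bool> out(cps.size(), true);
    State state = State::Normal;
    for (std::size_t i = 0; i < cps.size(); ++i) {
        const Gc c = cps[i];
        if (i > 0) {
            if (c == Gc::Extend || c == Gc::ZWJ)
                out[i] = false;   // GB9
            else if (c == Gc::ExtPic && state == State::SawExtPicZwj)
                out[i] = false;   // GB11
        }
        // Advance the GB11 state; anything that cannot continue the
        // sequence resets it instead of lingering.
        switch (c) {
        case Gc::ExtPic:
            state = State::SawExtPic;
            break;
        case Gc::Extend:
            if (state != State::SawExtPic)
                state = State::Normal;
            break;
        case Gc::ZWJ:
            state = (state == State::SawExtPic) ? State::SawExtPicZwj
                                                : State::Normal;
            break;
        case Gc::Any:
            state = State::Normal;
            break;
        }
    }
    return out;
}
```

With the reset in place, a sequence such as ExtPic, Extend, Any, ZWJ, ExtPic breaks before the final ExtPic, because the Any in the middle ended the GB11 context.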
* Fix build without features.libraryTasuku Suzuki2021-05-191-0/+4
| | | | | | Change-Id: I53eaaea149324d2495e794ba8bd58544e648e48e Reviewed-by: Janne Koskinen <janne.p.koskinen@qt.io> Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
* Unicodetools: compileGiuseppe D'Angelo2021-04-191-0/+2
| | | | | | | | | | | | Add an #include for a header that was only accidentally included transitively. Pick-to: 5.15 6.0 6.1 Task-number: QTBUG-92822 Change-Id: Ie29bb0e065f2db712e9cf9539b15124ff0ced349 Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com> Reviewed-by: Andreas Buhr <andreas.buhr@qt.io> Reviewed-by: Shawn Rutledge <shawn.rutledge@qt.io>
* Unicode: fix the extended grapheme cluster algorithm | Giuseppe D'Angelo | 2021-04-16 | 1 | -58/+121

    UAX #29 in Unicode 11 changed the EGC algorithm to its current form.
    Although Qt has upgraded the Unicode tables all the way up to Unicode
    13, the algorithm has never been adapted; in other words, it has been
    working by chance for years. Luckily, MOST of the cases were dealt
    with correctly, but emoji handling actually manages to break it.

    This commit:

    * Adds parsing of emoji-data.txt into the unicode table generator.
      That is necessary to extract the Extended_Pictographic property,
      which is used by the EGC algorithm.

    * Regenerates the tables.

    * Removes some obsoleted grapheme cluster break properties, and adds
      the ones added in the meanwhile.

    * Rewrites the EGC algorithm according to Unicode 13. This is done by
      greatly simplifying the lookup table. Some rules (GB11, GB12, GB13)
      can't be done by the table alone, so some hand-rolled code is
      necessary in that case.

    * Thanks to these fixes, the complete upstream GraphemeBreakTest now
      passes. Remove the "edited" version that ignored some rows (because
      they were failing).

    Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
    Pick-to: 6.1 6.0 5.15
    Fixes: QTBUG-92822
    Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
    Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
* Canonical pointer usageHou Lei2021-02-091-3/+3
| | | | | | | | Other affected rows have also been fixed. Change-Id: Ie0a32f724bd2e40e7bfacfaa43a78190b58e4a21 Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io> Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
* Specification of pointer usageHou Lei2021-01-291-7/+8
| | | | | | | Avoid C-style casts when possible. Change-Id: I8e86eb8c439b456da41d52a5666190330edeeda2 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Clean up QTextBoundaryFinder and qunicodetoolsLars Knoll2020-09-071-85/+85
| | | | | | | | | | | Make QTBF ready for Qt6 by using qsizetype in the API and use QStringView where it makes sense. Change the exported API of qunicodetools to use QStringView as well and use char16_t internally. Change-Id: I853537bcabf40546a8e60fdf2ee7d751bc371761 Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
* Another round of 0->nullptr cleanupAllan Sandfeld Jensen2020-07-311-3/+3
| | | | | Change-Id: Ic8db7dc252f8fea46eb5a4f334726d6c7f4645a6 Reviewed-by: Sona Kurazyan <sona.kurazyan@qt.io>
* Silence some warnings about fallthroughFriedemann Kleint2020-07-091-2/+2
| | | | | | | | src/corelib/text/qunicodetools.cpp:1243:13: warning: this statement may fall through [-Wimplicit-fallthrough=] src/corelib/text/qunicodetools.cpp:1247:55: warning: this statement may fall through [-Wimplicit-fallthrough=] Change-Id: I441000db46cb6d85a5dcd0534ea2168b39a3f3bd Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* QUnicodeTables: port to charNN_tMarc Mutz2020-04-271-5/+5
| | | | | | | | | | | This makes existing calls passing uint or ushort ambiguous, so fix all the callers. There do not appear to be callers outside QtBase. In fact, the ...BreakClass() functions appear to be utterly unused. Change-Id: I1c2251920beba48d4909650bc1d501375c6a3ecf Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>