This makes sure the DEL character (ASCII 127) is detected as a zero
width character even if Node.js is not built with ICU.
Signed-off-by: Ruben Bridgewater <ruben@bridgewater.de>
PR-URL: https://github.com/nodejs/node/pull/33650
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
The assumption here is that decomposed characters render like their
composed character equivalents, and that working with the former
comes with a risk of over-estimating string widths given that
we compute them on a per-code-point basis. The regression test
added here (한글 vs 한글) is an example of that happening.
PR-URL: https://github.com/nodejs/node/pull/33052
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Anto Aravinth <anto.aravinth.cse@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/32542
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Rich Trott <rtrott@gmail.com>
The array grouping function relies on the width of the characters.
It was not calculated correct so far, since it used the string
length instead.
This improves the unicode output by calculating the mono-spaced
font width (other fonts might differ).
PR-URL: https://github.com/nodejs/node/pull/31319
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Minwoo Jung <nodecorelab@gmail.com>
The option is now set to true by default. Most terminals do not have
full emoji support and visualize emojis with zero width joiners as
individual emojis.
Also verify that at least one argument is always passed through to the
function and remove support for passing through code points. Only
accept strings from now on to simplify the API.
PR-URL: https://github.com/nodejs/node/pull/31112
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
A lot of strings that are going to be passed to `getStringWidth()`
are ASCII strings, for which the calculation is rather easy and
calling into C++ can be skipped.
confidence improvement accuracy (*) (**) (***)
misc/getstringwidth.js n=100000 type='ascii' *** 328.99 % ±21.73% ±29.25% ±38.77%
misc/getstringwidth.js n=100000 type='emojiseq' 2.94 % ±7.66% ±10.19% ±13.26%
misc/getstringwidth.js n=100000 type='fullwidth' 4.70 % ±5.64% ±7.50% ±9.76%
PR-URL: https://github.com/nodejs/node/pull/29301
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Minwoo Jung <minwoo@nodesource.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
ESLint 4.x has stricter linting than previous versions. We are currently
using the legacy indentation rules in the test directory. This commit
changes the indentation of files to comply with the stricter 4.x linting
and enable stricter linting in the test directory.
PR-URL: https://github.com/nodejs/node/pull/14431
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
- Categorize all nonspacing marks (Mn) and enclosing marks (Me) as
0-width
- Categorize all spacing marks (Mc) as non-0-width.
- Treat soft hyphens (a format character Cf) as non-0-width.
- Do not treat all unassigned code points as 0-width; instead, let ICU
select the default for that character per UAX #11.
- Avoid getting the General_Category of a character multiple times as it
is an intensive operation.
Refs: http://unicode.org/reports/tr11/
PR-URL: https://github.com/nodejs/node/pull/13918
Reviewed-By: James M Snell <jasnell@gmail.com>
We should use `common.hasIntl` in tests for test cases which are
related to ICU.
This way we can easily find the test cases that are Intl dependent.
Plus, it will be able to make the tests a little faster if we check
hasIntl first.
Also, this tweaks the log messages to unify the message.
Refs: https://github.com/nodejs/node/pull/10707
PR-URL: https://github.com/nodejs/node/pull/10841
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Rather than the pseudo-wcwidth impl used currently, use the ICU
character properties database to calculate string width and
determine if a character is full width or not. This allows the
algorithm to correctly identify emoji's as full width, ensures
the algorithm will continue to fucntion properly as new unicode
codepoints are added, and it's faster.
This was originally part of a proposal to add a new unicode module,
but has been split out.
Refs: https://github.com/nodejs/node/pull/8075
PR-URL: https://github.com/nodejs/node/pull/9040
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>