node/doc/api/string_decoder.md
Alfredo González 51eb4c0cda
doc: add esm examples to node:string_decoder
PR-URL: https://github.com/nodejs/node/pull/55507
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Chemi Atlow <chemi@atlow.co.il>
2024-10-26 20:36:25 +00:00

123 lines
3.6 KiB
Markdown

# String decoder
<!--introduced_in=v0.10.0-->
> Stability: 2 - Stable
<!-- source_link=lib/string_decoder.js -->
The `node:string_decoder` module provides an API for decoding `Buffer` objects
into strings in a manner that preserves encoded multi-byte UTF-8 and UTF-16
characters. It can be accessed using:
```mjs
import { StringDecoder } from 'node:string_decoder';
```
```cjs
const { StringDecoder } = require('node:string_decoder');
```
The following example shows the basic use of the `StringDecoder` class.
```mjs
import { StringDecoder } from 'node:string_decoder';
import { Buffer } from 'node:buffer';
const decoder = new StringDecoder('utf8');
const cent = Buffer.from([0xC2, 0xA2]);
console.log(decoder.write(cent)); // Prints: ¢
const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // Prints: €
```
```cjs
const { StringDecoder } = require('node:string_decoder');
const decoder = new StringDecoder('utf8');
const cent = Buffer.from([0xC2, 0xA2]);
console.log(decoder.write(cent)); // Prints: ¢
const euro = Buffer.from([0xE2, 0x82, 0xAC]);
console.log(decoder.write(euro)); // Prints: €
```
When a `Buffer` instance is written to the `StringDecoder` instance, an
internal buffer is used to ensure that the decoded string does not contain
any incomplete multibyte characters. These are held in the buffer until the
next call to `stringDecoder.write()` or until `stringDecoder.end()` is called.
In the following example, the three UTF-8 encoded bytes of the European Euro
symbol (`€`) are written over three separate operations:
```mjs
import { StringDecoder } from 'node:string_decoder';
import { Buffer } from 'node:buffer';
const decoder = new StringDecoder('utf8');
decoder.write(Buffer.from([0xE2]));
decoder.write(Buffer.from([0x82]));
console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
```
```cjs
const { StringDecoder } = require('node:string_decoder');
const decoder = new StringDecoder('utf8');
decoder.write(Buffer.from([0xE2]));
decoder.write(Buffer.from([0x82]));
console.log(decoder.end(Buffer.from([0xAC]))); // Prints: €
```
## Class: `StringDecoder`
### `new StringDecoder([encoding])`
<!-- YAML
added: v0.1.99
-->
* `encoding` {string} The character [encoding][] the `StringDecoder` will use.
**Default:** `'utf8'`.
Creates a new `StringDecoder` instance.
### `stringDecoder.end([buffer])`
<!-- YAML
added: v0.9.3
-->
* `buffer` {string|Buffer|TypedArray|DataView} The bytes to decode.
* Returns: {string}
Returns any remaining input stored in the internal buffer as a string. Bytes
representing incomplete UTF-8 and UTF-16 characters will be replaced with
substitution characters appropriate for the character encoding.
If the `buffer` argument is provided, one final call to `stringDecoder.write()`
is performed before returning the remaining input.
After `end()` is called, the `stringDecoder` object can be reused for new input.
### `stringDecoder.write(buffer)`
<!-- YAML
added: v0.1.99
changes:
- version: v8.0.0
pr-url: https://github.com/nodejs/node/pull/9618
description: Each invalid character is now replaced by a single replacement
character instead of one for each individual byte.
-->
* `buffer` {string|Buffer|TypedArray|DataView} The bytes to decode.
* Returns: {string}
Returns a decoded string, ensuring that any incomplete multibyte characters at
the end of the `Buffer`, or `TypedArray`, or `DataView` are omitted from the
returned string and stored in an internal buffer for the next call to
`stringDecoder.write()` or `stringDecoder.end()`.
[encoding]: buffer.md#buffers-and-character-encodings