The character encoding on Unix is usually UTF-8.
However, on Windows it can also be UTF-16 or one of the Windows code pages. A few non-Unicode character encodings are also popular in some countries. This can result in characters not being printed properly, especially high Unicode code points and emoji.
The character encoding can be specified using an encoding option with most relevant Node.js core methods. UTF-8 is always the default value.
Exception: buffer (binary) is the default instead for:
- fs.readFile().
- readable streams (including fs.createReadStream()).
- most crypto methods.
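The difference between the two defaults can be sketched as follows. This is a minimal illustration, not a recommended pattern: it uses the synchronous fs.readFileSync() for brevity (it follows the same default as fs.readFile()), and the temporary file name is a hypothetical placeholder.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const crypto = require('node:crypto');

// Hypothetical temporary file, used only for this illustration.
const file = path.join(os.tmpdir(), `encoding-example-${process.pid}.txt`);
// 'caf\u00e9' is "café" written with an escape to avoid source-file ambiguity.
fs.writeFileSync(file, 'caf\u00e9', 'utf8');

// No encoding option: the result is a Buffer holding the raw bytes.
const raw = fs.readFileSync(file);
// With an encoding option: the bytes are decoded into a string.
const text = fs.readFileSync(file, 'utf8');
fs.unlinkSync(file);

// Most crypto methods behave the same way: Buffer unless an encoding is given.
const digestBuffer = crypto.createHash('sha256').update(text).digest();
const digestHex = crypto.createHash('sha256').update(text).digest('hex');

console.log(Buffer.isBuffer(raw), raw.length); // true 5 ("é" takes two bytes in UTF-8)
console.log(typeof text, text.length); // string 4
console.log(Buffer.isBuffer(digestBuffer), typeof digestHex); // true string
```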
To convert between character encodings, string_decoder (decoding only), buffer.transcode(), TextDecoder and TextEncoder can be used.
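As a rough sketch of two of these options: transcode() (exported by the node:buffer module) re-encodes bytes from one encoding to another, while StringDecoder decodes a byte stream chunk by chunk without breaking multi-byte characters. The sample string and the way it is split into chunks are assumptions chosen for illustration, and transcode() is assumed to be available (it relies on ICU, which standard Node.js builds include).

```javascript
const { Buffer, transcode } = require('node:buffer');
const { StringDecoder } = require('node:string_decoder');

// "café" as UTF-8 bytes: 63 61 66 c3 a9
const utf8Bytes = Buffer.from('caf\u00e9', 'utf8');

// Re-encode the same text as Latin-1: 63 61 66 e9
const latin1Bytes = transcode(utf8Bytes, 'utf8', 'latin1');

// Decode UTF-8 arriving in two chunks, split in the middle of "é".
const decoder = new StringDecoder('utf8');
const firstChunk = decoder.write(utf8Bytes.subarray(0, 4)); // incomplete byte is held back
const secondChunk = decoder.write(utf8Bytes.subarray(4)); // character is completed here
const remainder = decoder.end();

console.log(latin1Bytes.length); // 4
console.log(JSON.stringify(firstChunk), JSON.stringify(secondChunk)); // "caf" "é"
```

Calling toString('utf8') on each chunk separately would instead produce replacement characters at the split point, which is the problem StringDecoder avoids.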
Node.js supports UTF-8, UTF-16 little endian and Latin-1. TextDecoder supports UTF-8 and UTF-16 little/big endian by default, while TextEncoder only encodes to UTF-8. If Node.js is built with full internationalization support or provided with it at runtime, many more character encodings are supported by TextDecoder. If enabling full internationalization support is inconvenient, iconv-lite or iconv can be used instead.
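Since the set of encodings accepted by TextDecoder depends on how Node.js was built, one way to handle this is to probe for support at runtime. The sketch below assumes that constructing a TextDecoder with an unsupported label throws; isEncodingSupported() is a hypothetical helper name, not a Node.js API.

```javascript
// TextDecoder and TextEncoder are available as globals in current Node.js releases.

// Decode UTF-16 little endian bytes into a string ("héllo").
const utf16Bytes = Buffer.from('h\u00e9llo', 'utf16le');
const decoded = new TextDecoder('utf-16le').decode(utf16Bytes);

// TextEncoder always produces UTF-8.
const encoded = new TextEncoder().encode('h\u00e9llo');

// Hypothetical helper: reports whether this Node.js build can decode `label`.
function isEncodingSupported(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch {
    // An unknown or unavailable encoding label makes the constructor throw.
    return false;
  }
}

console.log(decoded === 'h\u00e9llo', encoded.length); // true 6
console.log(isEncodingSupported('utf-8')); // true
// The result for labels such as 'windows-1252' depends on the internationalization support.
console.log(isEncodingSupported('windows-1252'));
```

When the probe returns false, falling back to a library such as iconv-lite is one option.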
When reading from a file or terminal, one should either:
- validate and/or document that the input should be in UTF-8.
- detect the character encoding using node-chardet or jschardet and convert to UTF-8.
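The validation option can be sketched with a strict TextDecoder, which throws on malformed input instead of silently inserting replacement characters. The helper names isValidUtf8() and decodeUtf8Strict() are hypothetical, and the fatal option is assumed to be available (it requires a Node.js build with internationalization support). Detection with node-chardet or jschardet is not shown here; the Latin-1 fallback below simply assumes the real encoding is already known.

```javascript
// Hypothetical helper: decodes UTF-8 and throws if the bytes are malformed.
function decodeUtf8Strict(bytes) {
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

// Hypothetical helper: true when the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    decodeUtf8Strict(bytes);
    return true;
  } catch {
    return false;
  }
}

// "naïve" encoded as UTF-8, then the same word encoded as Latin-1.
const utf8Input = Buffer.from('na\u00efve', 'utf8');
const latin1Input = Buffer.from([0x6e, 0x61, 0xef, 0x76, 0x65]);

console.log(isValidUtf8(utf8Input)); // true
console.log(isValidUtf8(latin1Input)); // false: 0xef is not followed by continuation bytes

// Once the actual encoding is known, convert the input to a regular string.
const converted = latin1Input.toString('latin1');
console.log(converted === 'na\u00efve'); // true
```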
While ASCII characters display correctly on all terminals, this is not the case for all characters. When building a terminal application or tool, it is common to run into cross-platform issues where some characters are not displayed properly.
The main reasons are:
- The terminal font might not include this specific character.
- The terminal encoding may not support Unicode. For example, the default Windows terminal (Console Host) often uses specific encodings like CP437, CP850 or Windows-1252.
This can be solved by using characters known to display correctly on most terminals and environments:
- cross-platform-terminal-characters is a list of all of those characters.
- figures and log-symbols can be used to print common symbols consistently across platforms.
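The general idea behind such libraries can be sketched as picking a symbol set depending on the environment. The heuristic below is a deliberately simplified assumption (non-Windows platforms, plus Windows Terminal detected through its WT_SESSION environment variable) and is not the actual logic of figures or log-symbols; supportsUnicode() and getSymbols() are hypothetical names.

```javascript
// Simplified, hypothetical heuristic: assume Unicode output works everywhere
// except on Windows consoles other than Windows Terminal.
function supportsUnicode(platform = process.platform, env = process.env) {
  if (platform !== 'win32') {
    return true;
  }
  return Boolean(env.WT_SESSION);
}

const unicodeSymbols = { success: '\u2714', error: '\u2716' }; // ✔ and ✖
// ASCII-only fallbacks, since ASCII displays correctly on all terminals.
const asciiSymbols = { success: '[ok]', error: '[x]' };

// Hypothetical helper: returns the symbol set suited to the environment.
function getSymbols(platform, env) {
  return supportsUnicode(platform, env) ? unicodeSymbols : asciiSymbols;
}

const symbols = getSymbols();
console.log(`${symbols.success} build finished`);
console.log(`${symbols.error} tests failed`);
```

Passing the platform and environment as parameters keeps the heuristic easy to test; in a real tool, relying on the libraries above is likely more robust than maintaining such a check by hand.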
Keep the default encoding as UTF-8.
File/terminal input should either be validated or converted to it (node-chardet).
Avoid printing Unicode characters (including emoji) except through projects like figures and log-symbols.