📝 Character encoding

Platform-specific encodings

The character encoding on Unix is usually UTF-8.

However, on Windows it can also be UTF-16 or one of the Windows code pages. A few non-Unicode character encodings are also popular in some countries. This can result in characters not being printed properly, especially high Unicode code points and emoji.

Specifying an encoding

The character encoding can be specified using an encoding option with most relevant Node.js core methods.
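As a minimal sketch of the encoding option, the snippet below writes and reads a file with `fs.writeFileSync()` and `fs.readFileSync()`. The temporary directory and file name are hypothetical and only serve the illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical temporary file, used only for this illustration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'encoding-'));
const file = path.join(dir, 'example.txt');

// Write the text as UTF-16 little endian instead of the default UTF-8.
fs.writeFileSync(file, 'h\u00e9llo', { encoding: 'utf16le' });

// Reading with the matching encoding returns the original string.
const text = fs.readFileSync(file, { encoding: 'utf16le' });

// Without an encoding option, fs.readFileSync() returns a Buffer of raw bytes.
const raw = fs.readFileSync(file);

console.log(text);
console.log(Buffer.isBuffer(raw), raw.length);

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true });
```

Reading the same file with a different encoding than the one it was written with would not throw, but would likely produce garbled text.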

UTF-8 is always the default value. Exception: buffer (binary) is the default instead for:

Converting

To convert between character encodings, string_decoder (decoding only), Buffer.transcode(), TextDecoder and TextEncoder can be used.
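The following sketch illustrates two of these tools, assuming a Node.js build with ICU (which `transcode()` requires). It converts UTF-8 bytes to Latin-1, then decodes a multibyte character that arrives split across two chunks.

```javascript
const { Buffer, transcode } = require('node:buffer');
const { StringDecoder } = require('node:string_decoder');

// "é" takes 2 bytes in UTF-8 (0xC3 0xA9) but 1 byte in Latin-1 (0xE9).
const utf8Bytes = Buffer.from('\u00e9', 'utf8');
const latin1Bytes = transcode(utf8Bytes, 'utf8', 'latin1');

// StringDecoder keeps incomplete multibyte sequences until the rest arrives.
const decoder = new StringDecoder('utf8');
const first = decoder.write(Buffer.from([0xe2, 0x82])); // incomplete "€"
const second = decoder.write(Buffer.from([0xac])); // completes "€"

console.log(latin1Bytes, JSON.stringify(first), second);
```

`StringDecoder` is mostly useful with streams, where chunk boundaries can fall in the middle of a character.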

Node.js supports UTF-8, UTF-16 little endian and Latin-1.

TextDecoder supports UTF-8 and UTF-16 little/big endian by default, while TextEncoder only encodes to UTF-8. If Node.js is built with full internationalization support or provided with it at runtime, many more character encodings are supported by TextDecoder. If doing so is inconvenient, iconv-lite or iconv can be used instead.
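As a short illustration, the sketch below decodes UTF-16 little endian bytes with `TextDecoder` and encodes a string to UTF-8 bytes with `TextEncoder`. Both are assumed to be available as globals, which is the case in current Node.js versions.

```javascript
// "hi" as UTF-16 little endian: each character takes two bytes.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
const decoded = new TextDecoder('utf-16le').decode(utf16Bytes);

// TextEncoder produces UTF-8 bytes as a Uint8Array.
const encoded = new TextEncoder().encode('\u00e9');

console.log(decoded, encoded);
```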

Terminal

When reading from a file or terminal, one should either:

Characters

While ASCII characters display correctly on all terminals, this is not the case for all characters. When building a terminal application or tool, it is common to experience cross-platform issues like:

The main reasons are:

  • The terminal font might not include this specific character.
  • The terminal encoding may not support Unicode. For example, the default Windows terminal (Console Host) often uses specific encodings like CP437, CP850 or Windows-1252.
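A tool can work around the second reason by choosing between Unicode symbols and ASCII fallbacks at runtime. The sketch below is only a rough heuristic: `supportsUnicode()` and `pickSymbols()` are hypothetical helpers, and the environment variables checked (`TERM`, `WT_SESSION`, `TERM_PROGRAM`) are assumptions about common terminals rather than a reliable detection method.

```javascript
// Hypothetical heuristic: guess whether the terminal can display Unicode.
function supportsUnicode({ platform, env }) {
  if (platform !== 'win32') {
    // Assumption: the Linux kernel console (TERM=linux) has limited glyphs.
    return env.TERM !== 'linux';
  }
  // Assumption: Windows Terminal and the VS Code terminal handle Unicode.
  return Boolean(env.WT_SESSION) || env.TERM_PROGRAM === 'vscode';
}

// Hypothetical helper: pick Unicode symbols or plain ASCII fallbacks.
function pickSymbols(
  context = { platform: process.platform, env: process.env },
) {
  return supportsUnicode(context)
    ? { tick: '\u2714', cross: '\u2716' }
    : { tick: 'OK', cross: 'X' };
}

console.log(pickSymbols().tick);
```

Passing the platform and environment as arguments keeps the helper easy to test on any operating system.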

This can be solved by using characters known to display correctly on most terminals and environments:

Summary

Keep the default encoding as UTF-8. File/terminal input should either be validated or converted to it (node-chardet can help detect the source encoding).
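One possible way to validate input as UTF-8 is a `TextDecoder` created with the `fatal` option, which throws on invalid byte sequences. This is a minimal sketch: `decodeUtf8Strict()` is a hypothetical helper, and it assumes a Node.js build with ICU, which the `fatal` option requires.

```javascript
// Hypothetical helper: returns the decoded string,
// or undefined when the bytes are not valid UTF-8.
function decodeUtf8Strict(bytes) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch {
    return undefined;
  }
}

console.log(decodeUtf8Strict(Buffer.from('ok', 'utf8')));
console.log(decodeUtf8Strict(Buffer.from([0xff, 0xfe])));
```

When validation fails, the input can be rejected or passed to an encoding detection and conversion step.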

Avoid printing Unicode characters (including emoji) except through projects like figures and log-symbols.


Next (📝 Newlines)
Previous (📝 File encoding)
Top