PureTools

Unicode Escape Sequences: Handle Special Characters in Code

PureTools Team · 6 min read

What Are Unicode Escape Sequences?

A Unicode escape sequence represents a character by its Unicode code point rather than the character itself. This is essential when you need to include characters that can't be typed directly, aren't supported by your file encoding, or would break string parsing.

Escape Formats by Language

Language     Format                    Example (heart)
JavaScript   \uXXXX or \u{XXXXX}       \u2764 or \u{2764}
Python       \uXXXX or \UXXXXXXXX      \u2764
Java         \uXXXX                    \u2764
C#           \uXXXX or \UXXXXXXXX      \u2764
HTML         &#xXXXX; or &#DDDD;       &#x2764; or &#10084;
CSS          \XXXX                     \2764
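
To make the table concrete, here is a minimal sketch that prints one code point in several of these notations. The helper name escapeForms and its return shape are made up for illustration; only the notations themselves come from the table.

```javascript
// Hypothetical helper: render a single code point in several escape notations.
function escapeForms(codePoint) {
  const hex = codePoint.toString(16).toUpperCase();
  const isBmp = codePoint <= 0xFFFF;
  return {
    // JavaScript: \uXXXX inside the BMP, \u{...} above it
    javascript: isBmp ? '\\u' + hex.padStart(4, '0') : '\\u{' + hex + '}',
    // Python: \uXXXX inside the BMP, \UXXXXXXXX (8 digits) above it
    python: isBmp ? '\\u' + hex.padStart(4, '0') : '\\U' + hex.padStart(8, '0'),
    htmlHex: '&#x' + hex + ';',
    htmlDecimal: '&#' + codePoint + ';',
    css: '\\' + hex,
  };
}

console.log(escapeForms(0x2764));
console.log(escapeForms(0x1F680));
```

For 0x2764 this yields \u2764 for JavaScript and Python, &#x2764; and &#10084; for HTML, and \2764 for CSS.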

Common Use Cases

  • Emoji and symbols in code: const heart = '\u2764'; keeps the source file ASCII-only, so the character survives even if the file is saved or read with the wrong encoding
  • Non-Latin scripts: Including Arabic, Chinese, or Cyrillic in ASCII-only source files
  • Invisible characters: Zero-width joiner (\u200D), non-breaking space (\u00A0)
  • Special symbols: Copyright (\u00A9), trademark (\u2122), degree (\u00B0)
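
For the ASCII-only case, one possible approach is to rewrite every non-ASCII character as a \uXXXX escape before the text goes into a source file. The function name toAsciiEscapes is an assumption for this sketch, not part of any library.

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit with \uXXXX.
// The regex has no `u` flag, so it matches one code unit at a time and
// characters above U+FFFF come out as two escapes (a surrogate pair).
function toAsciiEscapes(text) {
  return text.replace(/[^\x00-\x7F]/g, (unit) =>
    '\\u' + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(toAsciiEscapes('Привет'));  // \u041F\u0440\u0438\u0432\u0435\u0442
console.log(toAsciiEscapes('25\u00B0C')); // 25\u00B0C
```

The output uses only the 4-digit form, so it can be pasted into JavaScript and Java string literals as well as JSON.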

JavaScript Examples

// Basic Multilingual Plane (BMP) - 4 hex digits
const copyright = '\u00A9';     // copyright
const euro = '\u20AC';          // euro sign

// Supplementary planes - use curly braces (ES6+)
const rocket = '\u{1F680}';     // rocket emoji
const flag = '\u{1F1E7}\u{1F1F7}'; // Brazil flag

// String.fromCodePoint for dynamic conversion
String.fromCodePoint(0x2764);   // heart
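
Dynamic conversion also works in the other direction with codePointAt. As a rough sketch, the two helpers below convert between a character and the common "U+XXXX" notation; the names fromUPlus and toUPlus are invented for this example.

```javascript
// Hypothetical helper: "U+1F680" -> the character it names.
function fromUPlus(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation);
  if (!match) {
    throw new RangeError('Not a U+XXXX code point: ' + notation);
  }
  // String.fromCodePoint itself throws a RangeError above 0x10FFFF.
  return String.fromCodePoint(parseInt(match[1], 16));
}

// Hypothetical helper: first character of a string -> "U+XXXX" notation.
function toUPlus(char) {
  return 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUPlus(fromUPlus('U+1F680'))); // U+1F680
```

codePointAt is used rather than charCodeAt because charCodeAt would return only the first half of a supplementary-plane character.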

Surrogate Pairs

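Characters outside the BMP (code points above U+FFFF) do not fit in a single \uXXXX escape. In languages whose strings are UTF-16 (JavaScript, Java), they are written as a surrogate pair: two \uXXXX escapes, a high surrogate (D800 to DBFF) followed by a low surrogate (DC00 to DFFF):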

// Rocket emoji U+1F680
// High surrogate: 0xD83D
// Low surrogate: 0xDE80
const rocket = '\uD83D\uDE80'; // same as '\u{1F680}'
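
The two surrogate values above follow from the standard UTF-16 arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A small sketch, with toSurrogatePair and fromSurrogatePair as illustrative names:

```javascript
// Hypothetical helper: code point above U+FFFF -> [high, low] surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Code point is not in a supplementary plane');
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xD800 + (offset >> 10);    // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: [high, low] surrogates -> original code point.
function fromSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const [high, low] = toSurrogatePair(0x1F680);
console.log(high.toString(16), low.toString(16)); // d83d de80

// One character on screen, two UTF-16 code units in the string:
console.log('\u{1F680}'.length);      // 2
console.log([...'\u{1F680}'].length); // 1
```

The length difference is why code that counts or slices by index can split an emoji in half; iterating with the spread operator or for...of walks by code point instead.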

Convert any text to Unicode escape sequences and back with the PureTools Unicode Escape Tool. Supports JavaScript, Python, HTML entity, and CSS formats.