Unicode Reference Table
| Char | Code point (U+) | JS escape (\u) | HTML entity | UTF-8 bytes (hex) | Description |
|---|---|---|---|---|---|
| A | U+0041 | \u0041 | &#x41; | 41 | Latin capital A |
| é | U+00E9 | \u00E9 | &#xE9; | C3 A9 | Latin e with acute |
| ă | U+0103 | \u0103 | &#x103; | C4 83 | Latin a with breve (Vietnamese) |
| 世 | U+4E16 | \u4E16 | &#x4E16; | E4 B8 96 | CJK character (world) |
| ❤ | U+2764 | \u2764 | &#x2764; | E2 9D A4 | Heavy black heart |
| 😀 | U+1F600 | \u{1F600} | &#x1F600; | F0 9F 98 80 | Grinning face emoji |
| (NBSP) | U+00A0 | \u00A0 | &#xA0; | C2 A0 | Non-breaking space |
| (ZWSP) | U+200B | \u200B | &#x200B; | E2 80 8B | Zero-width space |
Common Unicode Ranges
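- U+0000–U+007F: ASCII letters, digits, punctuation (1 byte in UTF-8)
- U+0080–U+07FF: Accented Latin letters (including some Vietnamese letters), Greek, Cyrillic (2 bytes)
- U+0800–U+FFFF: Chinese, Japanese, Korean, most symbols, and the remaining Vietnamese letters (3 bytes)
- U+10000–U+10FFFF: Emoji, historic scripts, rare symbols (4 bytes)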
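The byte counts for each range can be checked with a small function. This is a minimal sketch: `utf8ByteLength` is a hypothetical helper name, and the boundaries are the standard UTF-8 thresholds (U+007F, U+07FF, U+FFFF).

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one code point.
function utf8ByteLength(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // accented Latin, Greek, Cyrillic
  if (codePoint <= 0xffff) return 3; // rest of the BMP, including CJK
  return 4;                          // supplementary planes, including emoji
}

// Compare the computed length with what TextEncoder actually produces.
for (const ch of ["A", "é", "世", "😀"]) {
  const computed = utf8ByteLength(ch.codePointAt(0));
  const actual = new TextEncoder().encode(ch).length;
  console.log(ch, computed, actual);
}
```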
What Is Unicode?
Unicode is the universal character encoding standard that assigns a unique number (code point) to every character in every writing system. From basic Latin letters to emoji, CJK ideographs, Vietnamese diacritics, and mathematical symbols — Unicode covers them all. Version 15.1 includes over 149,000 characters from 161 scripts.
When working with international data, you frequently need to convert between plain text and encoded representations like \uXXXX (JavaScript), U+XXXX (standard notation), &#xHHHH; (HTML entities), or raw UTF-8 hex bytes.
Encoding Formats Explained
1. JavaScript Escape (\uXXXX)
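Used in JavaScript source code and JSON strings. Each character in the Basic Multilingual Plane (BMP, code points U+0000–U+FFFF) is written as \uXXXX with exactly 4 hex digits. For characters outside the BMP (like emoji), JavaScript also accepts the braced \u{X...} code point escape with 1 to 6 hex digits, introduced in ES6 (ES2015). JSON does not support the braced form, so in JSON such characters are written as a UTF-16 surrogate pair of two \uXXXX escapes, for example \uD83D\uDE00 for U+1F600.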
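A minimal sketch of producing these escapes follows. The function name `toJsEscape` and its `braces` option are assumptions made for illustration. With `braces: false` it emits one \uXXXX per UTF-16 code unit, which is the form JSON parsers accept, since JSON has no \u{...} syntax.

```javascript
// Hypothetical helper: escape every character of a string as \uXXXX or \u{...}.
function toJsEscape(text, { braces = true } = {}) {
  const hex4 = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  let out = "";
  for (const ch of text) {            // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp <= 0xffff) {
      out += "\\u" + hex4(cp);
    } else if (braces) {
      out += "\\u{" + cp.toString(16).toUpperCase() + "}";
    } else {
      // JSON-compatible form: one escape per UTF-16 code unit (surrogate pair)
      for (let i = 0; i < ch.length; i++) {
        out += "\\u" + hex4(ch.charCodeAt(i));
      }
    }
  }
  return out;
}

console.log(toJsEscape("é"));                     // \u00E9
console.log(toJsEscape("😀"));                    // \u{1F600}
console.log(toJsEscape("😀", { braces: false })); // \uD83D\uDE00
```

The surrogate-pair output can be pasted into a JSON string literal, while the braced output is only valid in JavaScript source.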
2. U+XXXX (Standard Notation)
This is the canonical way to reference a Unicode code point. It appears universally in technical documentation, character charts, and Unicode discussions. It is not a programming syntax but a notational convention used by the Unicode Consortium.
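As an illustration, the notation can be generated from a string by iterating over its code points. This is a sketch only, and `toUPlus` is a hypothetical name. The hex value is padded to at least 4 digits, which matches the convention used in the Unicode charts.

```javascript
// Hypothetical helper: list the code points of a string in U+XXXX notation.
function toUPlus(text) {
  // Array.from iterates by code point, so emoji are not split into surrogates.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(toUPlus("Aé世😀").join(" ")); // U+0041 U+00E9 U+4E16 U+1F600
```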
3. HTML Entities (&#xHHHH;)
Used in HTML and XML to embed special characters without changing the file encoding. These are formally called numeric character references, and both the hex form &#xHHHH; and the decimal form &#DDDD; are valid. Because the reference itself is plain ASCII, it is especially useful for multilingual web content and for making characters render correctly regardless of the document charset declaration.
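Both forms can be generated and decoded with a few lines of code. The sketch below uses hypothetical helper names (`toHtmlEntities`, `fromHtmlEntities`) and, for simplicity, encodes every character; a real encoder would usually leave plain ASCII untouched and would also handle named entities such as &amp;amp;, which this one does not.

```javascript
// Hypothetical helper: encode each code point as a numeric character reference.
function toHtmlEntities(text, { decimal = false } = {}) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return decimal
      ? "&#" + cp + ";"
      : "&#x" + cp.toString(16).toUpperCase() + ";";
  }).join("");
}

// Hypothetical helper: decode hex and decimal numeric references only.
function fromHtmlEntities(html) {
  return html.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (match, hex, dec) => {
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values untouched instead of throwing.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(toHtmlEntities("é😀"));                    // &#xE9;&#x1F600;
console.log(toHtmlEntities("é😀", { decimal: true })); // &#233;&#128512;
console.log(fromHtmlEntities("caf&#xE9; &#128512;"));  // café 😀
```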
4. UTF-8 Hex Bytes
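Shows how characters are stored in UTF-8 encoded files and network payloads (in-memory representations vary by language; JavaScript strings, for example, use UTF-16). UTF-8 uses 1–4 bytes per character: ASCII uses 1 byte, accented Latin and Vietnamese use 2–3 bytes, CJK uses 3 bytes, and emoji use 4 bytes. Understanding UTF-8 byte sequences is essential for debugging encoding issues like mojibake (garbled characters that appear when bytes are decoded with the wrong encoding).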
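The sketch below prints the UTF-8 bytes of a string and then reproduces a typical mojibake case by reading those bytes as if each were one Latin-1 character. The helper names `utf8Hex` and `misreadAsLatin1` are hypothetical; `TextEncoder` is the standard encoding API available in browsers and Node.js.

```javascript
// Hypothetical helper: UTF-8 bytes of a string as space-separated hex.
function utf8Hex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

// Hypothetical helper: simulate mojibake by treating each UTF-8 byte
// as a separate Latin-1 character (suitable for short strings only).
function misreadAsLatin1(text) {
  const bytes = new TextEncoder().encode(text);
  return String.fromCharCode(...bytes);
}

console.log(utf8Hex("é"));         // C3 A9
console.log(utf8Hex("世"));        // E4 B8 96
console.log(utf8Hex("😀"));        // F0 9F 98 80
console.log(misreadAsLatin1("é")); // Ã©
```

Seeing "Ã©" where "é" was expected is a strong hint that UTF-8 bytes were decoded with a single-byte encoding somewhere along the way.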
When Do You Need Unicode Conversion?
- Embedding special characters in JSON strings without breaking the parser.
- Decoding obfuscated escape sequences from API responses or log files.
- Creating HTML entities for safe rendering on web pages across all browsers.
- Debugging encoding problems by inspecting the raw UTF-8 bytes of each character.
- Looking up the code point for any character including emoji, CJK, and accented letters.
- Preparing internationalized text with proper escape sequences for multilingual applications.
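For the decoding case in the list above, a small regex-based decoder is often enough. This is a sketch under stated assumptions: `decodeUnicodeEscapes` is a hypothetical name, it handles only \uXXXX and \u{...} sequences, and it does not try to recognize escaped backslashes (so a literal `\\u0041` in the input would still be decoded).

```javascript
// Hypothetical helper: turn \uXXXX and \u{...} sequences back into characters.
function decodeUnicodeEscapes(text) {
  return text.replace(
    /\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})/g,
    (match, braced, fixed) => {
      if (braced !== undefined) {
        const cp = parseInt(braced, 16);
        // Leave out-of-range values untouched instead of throwing.
        return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
      }
      // Each \uXXXX is one UTF-16 code unit; adjacent surrogate halves
      // recombine into a single character in the resulting string.
      return String.fromCharCode(parseInt(fixed, 16));
    }
  );
}

const logLine = "caf\\u00E9 \\uD83D\\uDE00 \\u{1F600}";
console.log(decodeUnicodeEscapes(logLine)); // café 😀 😀
```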
Unicode Tips for Developers
- Use `[...str].length` instead of `str.length` in JavaScript to count code points rather than UTF-16 code units (a surrogate pair counts as 2 in `.length`).
- Use `String.fromCodePoint()` instead of `String.fromCharCode()` for characters outside the BMP.
- In Python 3, all strings are Unicode natively; use `.encode()` to convert to bytes when needed.
- When working with databases, ensure the character set and collation support full Unicode (e.g., `utf8mb4` in MySQL, not just `utf8`).
- The HTTP header `Content-Type: text/html; charset=utf-8` tells browsers how to decode the response correctly.
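The two JavaScript tips above can be seen in a short example. `countCodePoints` is a hypothetical helper name; everything else is standard JavaScript.

```javascript
// Hypothetical helper: count code points rather than UTF-16 code units.
function countCodePoints(str) {
  return [...str].length;
}

const face = String.fromCodePoint(0x1f600); // U+1F600, outside the BMP
console.log(face.length);           // 2 (one surrogate pair)
console.log(countCodePoints(face)); // 1

// fromCharCode keeps only the low 16 bits, so it yields U+F600 instead.
const truncated = String.fromCharCode(0x1f600);
console.log(truncated === "\uF600"); // true

// A ZWJ sequence (man, woman, girl joined by U+200D) is still several
// code points, even though it usually renders as a single glyph.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(family.length, countCodePoints(family)); // 8 5
```

As the last line shows, counting code points is not the same as counting what a user perceives as one character; that requires grapheme cluster segmentation, which is a separate concern.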
About Text Tools
Text tools handle the daily grind of working with strings, paragraphs, and documents: counting words, reversing characters, transforming case, generating slugs, splitting long text, previewing Markdown. These replace separate desktop apps and complex CLI commands with a single URL you can bookmark and use without setup.
Why it matters
Writers, editors, and content teams work with text constraints everywhere — Twitter's 280-char limit, LinkedIn's 1,300-char optimal post, academic abstracts of 250 words, SEO meta descriptions capped at 155. A word counter that shows characters (with and without spaces), words, sentences, paragraphs, and reading time lets you hit platform specs without switching between tools.
Privacy and safety
Text tools process input entirely in your browser. Your blog draft, legal contract, or confidential email never leaves your device. Even the word counter doesn't transmit your text — it runs a simple counting function locally, which is actually all that's needed. If a text tool claims to 'process' your text on their server, the scope for data leakage is enormous and almost never justified.
Best practices
- For SEO titles, aim for 50-60 characters including spaces (Google truncates longer titles)
- Meta descriptions work best at 150-155 characters; Google has been showing about 160 characters on desktop and about 120 on mobile
- When generating slugs, keep them short (3-5 words) and all lowercase, use hyphens rather than underscores, and avoid stop words
- Markdown preview is useful BEFORE publishing to verify headings, links, and lists render correctly on the target platform