Unicode Codec

Unicode Encoder / Decoder

Convert text to \uXXXX, U+XXXX, HTML entities, UTF-8 hex and back. Instant, free, private.


Unicode Reference Table

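Char        | U+      | \u        | HTML      | UTF-8       | Description
A           | U+0041  | \u0041    | &#x41;    | 41          | Latin capital A
é           | U+00E9  | \u00E9    | &#xE9;    | C3 A9       | Latin e with acute
ă           | U+0103  | \u0103    | &#x103;   | C4 83       | Latin a with breve (Vietnamese)
世          | U+4E16  | \u4E16    | &#x4E16;  | E4 B8 96    | CJK character (world)
❤           | U+2764  | \u2764    | &#x2764;  | E2 9D A4    | Heavy black heart
😀          | U+1F600 | \u{1F600} | &#x1F600; | F0 9F 98 80 | Grinning face emoji
(invisible) | U+00A0  | \u00A0    | &#xA0;    | C2 A0       | Non-breaking space
(invisible) | U+200B  | \u200B    | &#x200B;  | E2 80 8B    | Zero-width space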
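Every column in the reference table can be derived from the character itself. Below is a minimal Python sketch. `table_row` is a hypothetical helper name. The description comes from the standard `unicodedata.name()`, which returns the official uppercase Unicode name rather than the friendlier labels in the table.

```python
import unicodedata

def table_row(ch: str) -> dict:
    """Build the reference-table columns for one code point (hypothetical helper)."""
    cp = ord(ch)
    # \uXXXX inside the BMP, ES6-style \u{...} above U+FFFF (as in the table)
    js = f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"
    return {
        "U+": f"U+{cp:04X}",
        "\\u": js,
        "HTML": f"&#x{cp:X};",
        "UTF-8": ch.encode("utf-8").hex(" ").upper(),  # bytes.hex(sep) needs Python 3.8+
        "Name": unicodedata.name(ch, "<unnamed>"),
    }

# Escapes are used so the input cannot be altered by copy/paste normalization.
for ch in ["A", "\u00e9", "\u4e16", "\U0001F600", "\u200b"]:
    print(table_row(ch))
```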

Code Snippets

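JavaScript: Encode to \uXXXX
// Escape each code point; [...str] iterates by code point, not by UTF-16 unit.
// The \u{...} form for code points above U+FFFF is ES6 syntax and is not valid JSON.
const encode = (str) => [...str]
  .map(c => {
    const cp = c.codePointAt(0);
    return cp > 0xFFFF
      ? `\\u{${cp.toString(16)}}`
      : `\\u${cp.toString(16).padStart(4, '0')}`;
  })
  .join('');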
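Python: Encode / Decode
# Encode: only non-ASCII characters are escaped (Latin-1 ones such as é become \xe9)
"ă世".encode('unicode_escape')            # b'\\u0103\\u4e16'
"Hello".encode('unicode_escape')          # b'Hello' (ASCII is left unchanged)
# Decode
b'\\u0048ello'.decode('unicode_escape')   # 'Hello'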
HTML: HTML Entities
<!-- Named entity -->
&hearts; → ♥
<!-- Numeric hex entity -->
&#x2764; → ❤
<!-- Numeric decimal entity -->
&#10084; → ❤
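Python's `unicode_escape` codec leaves ASCII untouched and writes Latin-1 characters as `\xNN`. It therefore does not reproduce the JavaScript snippet's output. If you need every character as `\uXXXX`, with `\u{...}` above U+FFFF as in that snippet, a small sketch like this works.

`js_escape` and `js_unescape` are hypothetical names. The decoder only understands the two forms the encoder produces.

```python
import re

def js_escape(text: str) -> str:
    """Escape every code point: \\uXXXX in the BMP, \\u{X...} above U+FFFF (ES6 syntax)."""
    return "".join(
        f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\u{{{cp:x}}}"
        for cp in map(ord, text)
    )

# Matches \u{1 to 6 hex digits} first, then \uXXXX with exactly four hex digits.
_ESCAPE = re.compile(r"\\u\{([0-9a-fA-F]{1,6})\}|\\u([0-9a-fA-F]{4})")

def js_unescape(text: str) -> str:
    """Reverse js_escape. Surrogate-pair escapes (\\uD83D\\uDE00) are NOT recombined."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

print(js_escape("Hi 😀"))
print(js_unescape(r"\u0048i \u{1f600}"))
```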

Common Unicode Ranges

\u0000–\u007F
Basic Latin

ASCII letters, digits, punctuation (1 byte in UTF-8)

\u0080–\u07FF
Extended Latin / Greek / Cyrillic

Accented Latin letters (including some Vietnamese letters such as ă, ơ, đ), Greek, Cyrillic, Hebrew, Arabic (2 bytes)

\u0800–\uFFFF
CJK / BMP Symbols

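Chinese, Japanese, Korean, Indic and Thai scripts, most precomposed Vietnamese letters (e.g. ệ), most symbols (3 bytes; the surrogate range \uD800–\uDFFF is reserved and never encoded on its own)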

\u{10000}–\u{10FFFF}
Supplementary Planes

Most emoji, historic scripts, rare CJK ideographs and symbols (4 bytes)
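The byte counts in these ranges follow directly from the code point value. The short Python sketch below checks the rule against Python's own UTF-8 encoder. `utf8_length` is a hypothetical helper name.

```python
def utf8_length(cp: int) -> int:
    """Bytes needed to encode one code point in UTF-8, per the ranges above."""
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3  # surrogates U+D800..U+DFFF fall here but cannot be encoded alone
    if cp <= 0x10FFFF:
        return 4
    raise ValueError(f"U+{cp:X} is outside the Unicode code space")

for ch in "Aéă世❤😀":
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```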

What Is Unicode?

Unicode is the universal character encoding standard that assigns a unique number (code point) to every character in every writing system. From basic Latin letters to emoji, CJK ideographs, Vietnamese diacritics, and mathematical symbols — Unicode covers them all. Version 15.1 includes over 149,000 characters from 161 scripts.

When working with international data, you frequently need to convert between plain text and encoded representations like \uXXXX (JavaScript), U+XXXX (standard notation), &#xHHHH; (HTML entities), or raw UTF-8 hex bytes.

Encoding Formats Explained

1. JavaScript Escape (\uXXXX)

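Used in JavaScript source code and JSON strings. Each character in the Basic Multilingual Plane (BMP, U+0000–U+FFFF) is written as \uXXXX with exactly four hex digits. Characters outside the BMP (like emoji) can be written in ES6+ JavaScript with the code point escape \u{XXXXX}, which accepts one to six hex digits. JSON does not support that form. There, and in pre-ES6 JavaScript, such characters must be written as a UTF-16 surrogate pair; for example, 😀 (U+1F600) is \uD83D\uDE00.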
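The surrogate-pair arithmetic is fixed by the UTF-16 encoding form. This Python sketch computes the pair and cross-checks it against the standard `json` module, which emits surrogate pairs by default. `surrogate_pair` is a hypothetical helper name.

```python
import json

def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

high, low = surrogate_pair(0x1F600)   # 😀
escaped = f"\\u{high:04X}\\u{low:04X}"
print(escaped)                        # \uD83D\uDE00
print(json.dumps("😀"))               # "\ud83d\ude00" (json writes lowercase hex)
```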

2. U+XXXX (Standard Notation)

This is the standard way to refer to a Unicode code point: "U+" followed by at least four uppercase hex digits, with more only when needed (U+0041, U+1F600). It appears throughout technical documentation, character charts, and Unicode discussions. It is not a programming syntax but a notational convention defined by the Unicode Standard.
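Since U+ notation is just a formatting convention, converting to and from it takes only a few lines. Below is a Python sketch with hypothetical helper names. The parser assumes code points are written as "U+" plus 4 to 6 hex digits and ignores any other text.

```python
import re

# "U+" followed by 4 to 6 hex digits (U+0041, U+1F600, U+10FFFF).
_UPLUS = re.compile(r"U\+([0-9A-Fa-f]{4,6})")

def from_u_plus(notation: str) -> str:
    """Turn 'U+0048 U+0069' into 'Hi' (hypothetical helper)."""
    return "".join(chr(int(h, 16)) for h in _UPLUS.findall(notation))

def to_u_plus(text: str) -> str:
    """Turn 'Hi' into 'U+0048 U+0069': uppercase hex, at least four digits."""
    return " ".join(f"U+{ord(c):04X}" for c in text)

print(to_u_plus("A😀"))               # U+0041 U+1F600
print(from_u_plus("U+0048 U+0069"))   # Hi
```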

3. HTML Entities (&#xHHHH;)

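Used in HTML and XML to embed special characters using only ASCII in the source, without changing the file encoding. Both hex &#xHHHH; and decimal &#DDDD; forms are valid, with as many digits as the code point needs (&#x41;, &#x1F600;). Named entities such as &hearts; work in HTML, but XML predefines only five (&amp;, &lt;, &gt;, &quot;, &apos;). Numeric references are especially useful for multilingual web content, since they render correctly regardless of the document's charset declaration.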
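Python's standard `html` module can produce and resolve these references. The minimal sketch below escapes markup characters with `html.escape` and writes every non-ASCII code point as a hex reference. That split is an assumption; other tools may choose differently. `to_html_entities` is a hypothetical name.

```python
import html

def to_html_entities(text: str) -> str:
    """Escape markup characters, then write every non-ASCII code point as &#xHHHH;."""
    escaped = html.escape(text)  # &, <, >, " and ' become entities
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):X};" for c in escaped)

encoded = to_html_entities("Caf\u00e9 <b>\u2764</b>")  # "Café <b>❤</b>"
print(encoded)                     # Caf&#xE9; &lt;b&gt;&#x2764;&lt;/b&gt;
print(html.unescape(encoded))      # Café <b>❤</b>
print(html.unescape("&hearts; &#x2764; &#10084;"))  # ♥ ❤ ❤
```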

4. UTF-8 Hex Bytes

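Shows how characters are stored in UTF-8 files and sent over the network. UTF-8 uses 1–4 bytes per code point:

  • ASCII uses 1 byte.
  • Accented Latin letters use 2 or 3 bytes (é is 2 bytes; Vietnamese ệ is 3).
  • CJK characters use 3 bytes.
  • Most emoji use 4 bytes, though older symbols such as ❤ U+2764 use 3.

Many emoji are sequences of several code points, so one visible emoji can take more than 4 bytes. Understanding UTF-8 byte sequences is essential for debugging encoding issues like mojibake (garbled characters, e.g. é showing up as Ã©).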
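The 1-to-4-byte layout can be written out by hand. This Python sketch reproduces the UTF-8 column of the reference table and shows how mojibake arises. `utf8_bytes` is a hypothetical helper for illustration; in practice you would call `str.encode("utf-8")`.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one code point by hand using the UTF-8 bit layout (illustrative only)."""
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:X} cannot be encoded in UTF-8")
    if cp <= 0x7F:                                   # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                 # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                 # 11110xxx + three 10xxxxxx bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "Aé世😀":
    print(ch, utf8_bytes(ord(ch)).hex(" ").upper())

# Mojibake: UTF-8 bytes of "é" (C3 A9) wrongly decoded as Latin-1
print("\u00e9".encode("utf-8").decode("latin-1"))   # Ã©
```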

When Do You Need Unicode Conversion?

  • Embedding special characters in JSON strings without breaking the parser.
  • Decoding obfuscated escape sequences from API responses or log files.
  • Creating HTML entities for safe rendering on web pages across all browsers.
  • Debugging encoding problems by inspecting the raw UTF-8 bytes of each character.
  • Looking up the code point for any character including emoji, CJK, and accented letters.
  • Preparing internationalized text with proper escape sequences for multilingual applications.

Unicode Tips for Developers

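  • Use [...str].length instead of str.length in JavaScript to count code points: .length counts UTF-16 code units, so a character outside the BMP (a surrogate pair) counts as 2. Code points are still not user-perceived characters (an emoji ZWJ sequence or e + combining accent spans several), so use Intl.Segmenter when you need grapheme counts.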
  • Use String.fromCodePoint() instead of String.fromCharCode() for characters outside the BMP.
  • In Python 3, all strings are Unicode natively — use .encode() to convert to bytes when needed.
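  • When working with databases, make sure the character set supports full Unicode (e.g., utf8mb4 in MySQL; the legacy utf8 character set, an alias for utf8mb3, stores at most 3 bytes per character and cannot hold emoji).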
  • The HTTP header Content-Type: text/html; charset=utf-8 tells browsers how to decode the response correctly.
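Several of these tips come down to the fact that one string has several possible "lengths." In Python, `len()` counts code points, so the differences are easy to inspect. `length_report` is a hypothetical helper. The decomposed "é" shows why even code point counts can differ from what a reader perceives.

```python
import unicodedata

def length_report(text: str) -> dict:
    """Compare the different 'lengths' a string can have (hypothetical helper)."""
    return {
        "code_points": len(text),                           # like [...str].length in JS
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # what JS str.length reports
        "utf8_bytes": len(text.encode("utf-8")),            # size when stored as UTF-8
    }

print(length_report("😀"))         # {'code_points': 1, 'utf16_units': 2, 'utf8_bytes': 4}
decomposed = "e\u0301"             # 'e' + COMBINING ACUTE ACCENT: one visible character
print(length_report(decomposed))   # {'code_points': 2, 'utf16_units': 2, 'utf8_bytes': 3}
print(len(unicodedata.normalize("NFC", decomposed)))  # 1 (composed to U+00E9)
```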



About Text Tools

Text tools handle the daily grind of working with strings, paragraphs, and documents: counting words, reversing characters, transforming case, generating slugs, splitting long text, previewing Markdown. These replace separate desktop apps and complex CLI commands with a single URL you can bookmark and use without setup.

Why it matters

Writers, editors, and content teams work with text constraints everywhere: Twitter's 280-character limit, LinkedIn's 1,300-character optimal post, academic abstracts of 250 words, SEO meta descriptions that get truncated at around 155–160 characters. A word counter that shows characters (with and without spaces), words, sentences, paragraphs, and reading time lets you hit platform specs without switching between tools.
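The counting such a tool does locally is simple. Below is a Python sketch of the kind of function a word counter might run. The word and paragraph rules are assumptions (whitespace-delimited words, blank-line-separated paragraphs). Platforms may count differently, so treat the numbers as approximate. `text_stats` is a hypothetical name.

```python
import re

def text_stats(text: str) -> dict:
    """Rough counts of the kind a word counter shows (simple heuristics, not a standard)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "characters": len(text),                               # code points, spaces included
        "characters_no_spaces": len(re.sub(r"\s", "", text)),  # all whitespace removed
        "words": len(text.split()),                            # whitespace-delimited tokens
        "paragraphs": len(paragraphs),                         # blocks separated by blank lines
    }

sample = "Unicode is everywhere.\n\nCount every caf\u00e9 carefully."
print(text_stats(sample))
```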

Privacy and safety

Text tools process input entirely in your browser. Your blog draft, legal contract, or confidential email never leaves your device. Even the word counter does not transmit your text; it runs a simple counting function locally, which is all that is needed. If a text tool says it must "process" your text on its server, that creates a real risk of data leakage and is almost never justified.

Best practices

  • For SEO titles, aim for 50-60 characters including spaces (Google truncates longer titles)
  • Meta descriptions work best at 150-155 characters — Google has been showing ~160 on desktop, ~120 on mobile
  • When generating slugs, keep them short (3-5 words), all lowercase, with hyphens rather than underscores, and without stop words
  • Markdown preview is useful BEFORE publishing to verify headings, links, and lists render correctly on the target platform
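The slug guidelines above can be applied mechanically. Below is a Python sketch under stated assumptions:

  • The stop-word list and the five-word cap are illustrative choices, not a standard.
  • Accents are stripped with NFKD normalization, which silently drops letters that have no ASCII decomposition.

```python
import re
import unicodedata

# Hypothetical stop-word list; adjust it for your language and site.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def slugify(title: str, max_words: int = 5) -> str:
    """Lowercase, ASCII-only, hyphen-separated slug with stop words removed."""
    # NFKD splits 'é' into 'e' + combining accent; encoding to ASCII drops the accent.
    # Letters with no decomposition (e.g. Vietnamese 'đ') are dropped entirely.
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    kept = [w for w in words if w not in STOP_WORDS][:max_words]
    return "-".join(kept)

print(slugify("Cr\u00e8me Br\u00fbl\u00e9e: The Best Recipe for Beginners"))
```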