What formats are supported?

Four formats: \uXXXX (JavaScript), U+XXXX (standard notation), HTML &#xHHHH; entities, and UTF-8 hex bytes.

Does it handle emoji?

Yes — full Unicode range including emoji, CJK, and all supplementary planes.

Can I decode mixed text?

Yes — escape sequences within normal text are detected and decoded while plain text is preserved.

What is UTF-8 vs Unicode?

Unicode assigns numbers (code points) to characters. UTF-8 is how those numbers are stored as bytes.

Why is emoji length 2 in JavaScript?

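JavaScript strings are sequences of UTF-16 code units. Most emoji lie outside the BMP and need a surrogate pair (two code units), so .length reports 2. [...str].length counts code points instead; emoji built from several code points (such as ZWJ sequences) still count as more than one.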

Is my data private?

Yes. All processing runs in your browser, and nothing is sent to any server.

How to encode Vietnamese diacritics?

Paste Vietnamese text and each accented character is encoded to its code point, e.g. ă → \u0103.

When to use HTML entities vs \uXXXX?

Use HTML entities in markup (HTML/XML). Use \uXXXX in JavaScript/TypeScript and JSON strings.

Unicode Codec

Unicode Encoder / Decoder

Convert text to \uXXXX, U+XXXX, HTML entities, UTF-8 hex and back. Instant, free, private.

Reference

Unicode Reference Table

Char	U+	\u	HTML	UTF-8	Description
A	U+0041	\u0041	&#x41;	41	Latin capital A
é	U+00E9	\u00E9	&#xE9;	C3 A9	Latin e with acute
ă	U+0103	\u0103	&#x103;	C4 83	Latin a with breve (Vietnamese)
世	U+4E16	\u4E16	&#x4E16;	E4 B8 96	CJK character (world)
❤	U+2764	\u2764	&#x2764;	E2 9D A4	Heavy black heart
😀	U+1F600	\u{1F600}	&#x1F600;	F0 9F 98 80	Grinning face emoji
(NBSP)	U+00A0	\u00A0	&#xA0;	C2 A0	Non-breaking space
(ZWSP)	U+200B	\u200B	&#x200B;	E2 80 8B	Zero-width space

Code

Code Snippets

JavaScript: Encode to \uXXXX

const encode = (str) =>
  [...str]
    .map((c) => {
      const cp = c.codePointAt(0);
      // Code points above U+FFFF need the ES6 \u{...} form
      return cp > 0xFFFF
        ? `\\u{${cp.toString(16)}}`
        : `\\u${cp.toString(16).padStart(4, '0')}`;
    })
    .join('');

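Python: Encode / Decode

# Encode: unicode_escape leaves ASCII as-is and escapes the rest
"Hello ă 世".encode('unicode_escape')  # b'Hello \\u0103 \\u4e16'
# Encode every BMP character as \uXXXX
''.join(f'\\u{ord(c):04x}' for c in "Hello")  # '\\u0048\\u0065\\u006c\\u006c\\u006f'
# Decode
b'\\u0048ello'.decode('unicode_escape')  # 'Hello'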

HTML: HTML Entities

&hearts; → ♥  &#x2764; → ❤  &#10084; → ❤

Ranges

Common Unicode Ranges

\u0000–\u007F

Basic Latin

ASCII letters, digits, punctuation (1 byte in UTF-8)

\u0080–\u07FF

Extended Latin / Greek / Cyrillic

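Accented letters, base Vietnamese letters such as ă and đ, Greek, Cyrillic (2 bytes). Tone-marked Vietnamese vowels such as ế sit in U+1E00–U+1EFF and take 3 bytes.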

\u0800–\uFFFF

CJK / BMP Symbols

Chinese, Japanese, Korean, most symbols (3 bytes)

\u{10000}+

Supplementary Planes

Emoji, historic scripts, rare symbols (4 bytes)
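The range boundaries above translate directly into a byte-count rule. The following is a minimal sketch in Python; the function name utf8_length is hypothetical and chosen only for illustration.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for a code point, by range."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:
        return 1  # Basic Latin (ASCII)
    if code_point <= 0x7FF:
        return 2  # Extended Latin, Greek, Cyrillic
    if code_point <= 0xFFFF:
        return 3  # rest of the BMP, including CJK
    return 4      # supplementary planes, including most emoji


for ch in ["A", "ă", "ế", "世", "😀"]:
    print(f"U+{ord(ch):04X}", utf8_length(ord(ch)))
```

Note that the sketch does not special-case the surrogate range U+D800–U+DFFF, which cannot be encoded in UTF-8 on its own.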

Full Guide

What Is Unicode?

Unicode is the universal character encoding standard that assigns a unique number (code point) to every character in every writing system. From basic Latin letters to emoji, CJK ideographs, Vietnamese diacritics, and mathematical symbols — Unicode covers them all. Version 15.1 includes over 149,000 characters from 161 scripts.

When working with international data, you frequently need to convert between plain text and encoded representations like \uXXXX (JavaScript), U+XXXX (standard notation), &#xHHHH; (HTML entities), or raw UTF-8 hex bytes.

Encoding Formats Explained

1. JavaScript Escape (\uXXXX)

Used in JavaScript source code and JSON strings. Each character in the Basic Multilingual Plane (BMP, code points 0–FFFF) is represented as \uXXXX with exactly 4 hex digits. In JavaScript, characters outside the BMP (like emoji) can use the extended \u{XXXXX} syntax (1 to 6 hex digits), introduced in ES6. JSON does not support that form; there, such characters are written as a surrogate pair, e.g. \uD83D\uDE00 for U+1F600.
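As a rough illustration of the two escape forms, here is a Python sketch that mirrors the JavaScript-style output. The helper name to_js_escape is an assumption made for this example; json.dumps is the standard-library call and shows how JSON writes a character outside the BMP.

```python
import json


def to_js_escape(text: str) -> str:
    """Escape every character as \\uXXXX, or \\u{X...} above U+FFFF."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            parts.append(f"\\u{{{cp:X}}}")  # ES6 code point escape
        else:
            parts.append(f"\\u{cp:04X}")    # classic 4-digit escape
    return "".join(parts)


print(to_js_escape("ă😀"))  # \u0103\u{1F600}
# JSON has no \u{...} form; json.dumps emits a surrogate pair instead.
print(json.dumps("😀"))     # "\ud83d\ude00"
```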

2. U+XXXX (Standard Notation)

This is the canonical way to reference a Unicode code point. It appears universally in technical documentation, character charts, and Unicode discussions. It is not a programming syntax but a notational convention used by the Unicode Consortium.
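Converting to and from this notation is a matter of hex formatting. A small Python sketch, with the hypothetical helper names to_u_plus and from_u_plus and the assumption that tokens are separated by spaces:

```python
def to_u_plus(text: str) -> str:
    """Format each code point as U+XXXX (at least 4 uppercase hex digits)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def from_u_plus(notation: str) -> str:
    """Parse space-separated U+XXXX tokens back into text."""
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


print(to_u_plus("A世😀"))            # U+0041 U+4E16 U+1F600
print(from_u_plus("U+0103 U+2764"))  # ă❤
```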

3. HTML Entities (&#xHHHH;)

Used in HTML and XML to embed special characters without changing the file encoding. Both hex &#xHHHH; and decimal &#DDDD; forms are valid. Especially useful for multilingual web content and ensuring characters render correctly regardless of the document charset declaration.
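The hex and decimal forms can be produced with plain string formatting, and Python's standard html module decodes them. In this sketch the function to_html_entities is a hypothetical name used for illustration; html.unescape is the standard-library function.

```python
import html


def to_html_entities(text: str, decimal: bool = False) -> str:
    """Encode every character as a numeric character reference."""
    if decimal:
        return "".join(f"&#{ord(ch)};" for ch in text)
    return "".join(f"&#x{ord(ch):X};" for ch in text)


print(to_html_entities("❤"))                # &#x2764;
print(to_html_entities("❤", decimal=True))  # &#10084;
# html.unescape decodes hex, decimal, and named forms alike.
print(html.unescape("&#x2764; &#10084; &hearts;"))  # ❤ ❤ ♥
```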

4. UTF-8 Hex Bytes

Shows how characters are actually stored in files and sent over the network. UTF-8 uses 1–4 bytes per character: ASCII uses 1 byte, accented Latin and Vietnamese use 2–3 bytes, CJK uses 3 bytes, and most emoji use 4 bytes. Understanding UTF-8 byte sequences is essential for debugging encoding issues like mojibake (garbled characters).
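To inspect the bytes yourself, a short Python sketch may help. The names utf8_hex and from_utf8_hex are hypothetical; the last line reproduces a typical case of mojibake by decoding UTF-8 bytes as Latin-1.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


def from_utf8_hex(hex_bytes: str) -> str:
    """Decode space-separated hex bytes as UTF-8."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


for ch in ["A", "é", "世", "😀"]:
    print(ch, utf8_hex(ch))

print(from_utf8_hex("E4 B8 96"))                # 世
# Mojibake: the two UTF-8 bytes of é read as two Latin-1 characters.
print("é".encode("utf-8").decode("latin-1"))    # Ã©
```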

When Do You Need Unicode Conversion?

Embedding special characters in JSON strings without breaking the parser.
Decoding obfuscated escape sequences from API responses or log files.
Creating HTML entities for safe rendering on web pages across all browsers.
Debugging encoding problems by inspecting the raw UTF-8 bytes of each character.
Looking up the code point for any character including emoji, CJK, and accented letters.
Preparing internationalized text with proper escape sequences for multilingual applications.
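For the log-file and API-response case, decoding escapes embedded in ordinary text can be sketched with a regular expression. This is one possible approach in Python, not necessarily how this tool does it; decode_mixed and ESCAPE_RE are hypothetical names, and the sketch assumes every escape is valid (an unpaired surrogate or a value above 10FFFF raises an error).

```python
import re

# Matches \u{X...} (1-6 hex digits) or \uXXXX (exactly 4 hex digits).
ESCAPE_RE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})")


def decode_mixed(text: str) -> str:
    """Decode \\uXXXX and \\u{...} escapes, leaving other text untouched."""
    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    decoded = ESCAPE_RE.sub(replace, text)
    # A pair like \uD83D\uDE00 leaves two lone surrogates in the string;
    # a UTF-16 round trip merges them into one code point.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16")


print(decode_mixed(r"Xin ch\u00E0o \u{1F600}"))  # Xin chào 😀
```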

Unicode Tips for Developers

Use [...str].length instead of str.length in JavaScript to count code points (a surrogate pair counts as 2 in .length). For user-perceived characters such as multi-code-point emoji, use Intl.Segmenter.
Use String.fromCodePoint() instead of String.fromCharCode() for characters outside the BMP.
In Python 3, all strings are Unicode natively — use .encode() to convert to bytes when needed.
When working with databases, ensure the character set and collation support full Unicode (e.g., utf8mb4 in MySQL, not utf8, which stores at most 3 bytes per character).
The HTTP header Content-Type: text/html; charset=utf-8 tells browsers how to decode the response correctly.
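The different "lengths" in these tips can be compared side by side. A minimal Python sketch, with length_report as a hypothetical helper name: Python's len counts code points, while the UTF-16 figure corresponds to what JavaScript's .length reports.

```python
def length_report(text: str) -> dict:
    """Count a string in code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": len(text),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(length_report("😀"))  # {'code_points': 1, 'utf16_units': 2, 'utf8_bytes': 4}
print(length_report("ă"))   # {'code_points': 1, 'utf16_units': 1, 'utf8_bytes': 2}
```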
