Hidden Character Reference
| Name | Code | Risk | Purpose |
|---|---|---|---|
| Zero-Width Space | U+200B | high | Break long words without visible space |
| Zero-Width Non-Joiner | U+200C | high | Prevent ligature joining in Arabic/Indic scripts |
| Zero-Width Joiner | U+200D | high | Force ligature or join emoji sequences |
| Byte Order Mark | U+FEFF | medium | Signal encoding/byte order at the very start of a file (stored as EF BB BF in UTF-8) |
| Soft Hyphen | U+00AD | medium | Suggest hyphenation point for word breaks |
| RTL Override | U+202E | high | Force right-to-left text direction |
| Non-Breaking Space | U+00A0 | low | Prevent line break between words |
| Line Separator | U+2028 | medium | Unicode line separator (ends // comments in JavaScript; illegal in JS string literals before ES2019) |
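The reference table can double as a lookup map for a quick scanner. This is a minimal JavaScript sketch, not the tool's actual implementation; `HIDDEN_CHARS`, `toUPlus`, and `scanHidden` are hypothetical names, and the map covers only the eight characters listed in the table.

```javascript
// Lookup map mirroring the reference table (hypothetical; only these 8 characters).
const HIDDEN_CHARS = new Map([
  [0x200b, { name: "Zero-Width Space", risk: "high" }],
  [0x200c, { name: "Zero-Width Non-Joiner", risk: "high" }],
  [0x200d, { name: "Zero-Width Joiner", risk: "high" }],
  [0xfeff, { name: "Byte Order Mark", risk: "medium" }],
  [0x00ad, { name: "Soft Hyphen", risk: "medium" }],
  [0x202e, { name: "RTL Override", risk: "high" }],
  [0x00a0, { name: "Non-Breaking Space", risk: "low" }],
  [0x2028, { name: "Line Separator", risk: "medium" }],
]);

// Format a code point as U+XXXX (at least four uppercase hex digits).
function toUPlus(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Return one finding per hidden character, with its UTF-16 index in the string.
function scanHidden(text) {
  const findings = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    const info = HIDDEN_CHARS.get(cp);
    if (info) findings.push({ index, code: toUPlus(cp), ...info });
    index += ch.length; // advance by 1 or 2 UTF-16 code units
  }
  return findings;
}
```

For example, `scanHidden("pay\u200Bpal")` returns a single finding: index 3, code U+200B, Zero-Width Space, risk high.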
When to Check for Hidden Characters
Suspicious URLs or filenames that look normal but are not: an RTL override can reverse part of a displayed filename, and invisible characters can make a link or name differ from the one it imitates.
Text copied from PDFs or web pages often carries invisible formatting characters that break search and comparison.
Source code with zero-width spaces causes compilation errors that are impossible to spot visually.
JSON/CSV data with a BOM or stray invisible characters can fail to parse, or produce keys and headers that don't match, even when the content looks valid.
Usernames and passwords with hidden characters can bypass security filters or cause authentication failures.
What Are Hidden Unicode Characters?
Hidden characters are Unicode characters that have no visible glyph but still exist within text. They occupy bytes in storage, affect text layout and rendering, and can cause hard-to-diagnose bugs in programming, data processing, and security systems.
Unicode defines hundreds of control and formatting characters, but the most commonly encountered fall into three groups: zero-width characters (invisible spacers), special whitespace variants, and bidirectional (bidi) control characters used for right-to-left scripts.
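Instead of listing characters one by one, JavaScript regexes with the `u` flag can match Unicode general categories: Cc (control), Cf (format, which covers the zero-width and bidi characters), and Z (separators, which covers NBSP and U+2028). This sketch assumes tab, line feed, carriage return, and the ordinary space should be allowed; `listInvisible` is a hypothetical helper.

```javascript
// Control, format, and separator characters, except tab, LF, CR, and U+0020.
const INVISIBLE_RE = /(?![\t\n\r ])[\p{Cc}\p{Cf}\p{Z}]/gu;

// List each match with its UTF-16 index and U+XXXX code.
function listInvisible(text) {
  return [...text.matchAll(INVISIBLE_RE)].map((m) => ({
    index: m.index,
    code: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}
```

This also flags legitimate ZWJ emoji sequences and ZWNJ in Arabic or Indic text, so the output is better treated as a list to review than as an automatic verdict.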
Common Hidden Character Types
1. Zero-Width Characters
U+200B (Zero-Width Space), U+200C (ZWNJ), and U+200D (ZWJ) are among the most frequently abused invisible characters. They take up no visual space but change how text is compared, searched, and processed. ZWJ is used legitimately in emoji ZWJ sequences (family groups and some flags, such as the rainbow flag), but it is also exploited to create phishing links and usernames that look identical to legitimate ones.
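The comparison problem, and the cost of stripping too aggressively, show up in a few lines. This sketch assumes a policy choice between removing all three zero-width characters and removing only U+200B; both helper names are hypothetical.

```javascript
// "pay" + ZERO WIDTH SPACE + "pal" renders exactly like "paypal".
const spoofed = "pay\u200Bpal";

// A ZWJ emoji sequence: man, ZWJ, woman, ZWJ, girl (rendered as one family glyph).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

// Blanket removal of U+200B..U+200D defeats the spoof...
function stripZeroWidth(s) {
  return s.replace(/[\u200B-\u200D]/g, "");
}

// ...but removing only U+200B keeps the joiners, which carry meaning
// in emoji sequences and in Arabic/Indic text.
function stripZeroWidthSpaceOnly(s) {
  return s.replace(/\u200B/g, "");
}

console.log(spoofed === "paypal", spoofed.length);                      // false 7
console.log(stripZeroWidth(spoofed) === "paypal");                      // true
console.log(stripZeroWidth(family) === "\u{1F468}\u{1F469}\u{1F467}");  // true: family split apart
console.log(stripZeroWidthSpaceOnly(family) === family);                // true: sequence preserved
```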
2. Byte Order Mark (BOM)
U+FEFF was originally designed to signal byte order in UTF-16 files. Today, it most often appears accidentally at the start of UTF-8 files saved by certain Windows text editors. A BOM in a JSON or CSV file causes parser errors that are extremely confusing because the first character of the file is invisible rather than the expected { or header text.
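A minimal sketch of stripping the BOM before parsing, assuming the file has already been decoded into a JavaScript string (`stripBom` is a hypothetical helper). In decoded text the BOM is a single U+FEFF character, even though UTF-8 stores it as the three bytes EF BB BF.

```javascript
// Remove one leading U+FEFF if present; leave the rest of the text untouched.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// JSON text as it might look after decoding a file saved as "UTF-8 with BOM".
const raw = '\uFEFF{"ok": true}';

let parsedWithBom = true;
try {
  JSON.parse(raw);
} catch (err) {
  // SyntaxError: JSON allows only space, tab, LF, and CR as whitespace.
  parsedWithBom = false;
}

console.log(parsedWithBom);                // false
console.log(JSON.parse(stripBom(raw)).ok); // true
```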
3. Bidirectional (Bidi) Characters
RTL Override (U+202E) forces the text that follows it to render right-to-left. This is a well-known attack vector: a file named invoice_[U+202E]fdp.exe (with the override character placed right after the underscore) displays as invoice_exe.pdf on many operating systems, tricking users into running a malicious executable they believe is a harmless PDF.
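One defensive pattern is to check the logical string, which is what the operating system actually uses, rather than what is displayed. A sketch under the assumption that bidi embedding, override, and isolate controls (U+202A..U+202E, U+2066..U+2069) should never appear in filenames; LRM/RLM marks are not covered, and all helper names are hypothetical.

```javascript
// Bidi embedding/override (U+202A..U+202E) and isolate (U+2066..U+2069) controls.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/g;

// Logical name ends in ".exe"; the RLO makes "fdp.exe" display as "exe.pdf".
const fileName = "invoice_\u202Efdp.exe";

// Non-global regex so repeated .test() calls carry no lastIndex state.
function hasBidiControls(s) {
  return /[\u202A-\u202E\u2066-\u2069]/.test(s);
}

function stripBidiControls(s) {
  return s.replace(BIDI_CONTROLS, "");
}

// The real extension comes from the logical character order, not the display.
function realExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
}

console.log(hasBidiControls(fileName));   // true
console.log(realExtension(fileName));     // exe
console.log(stripBidiControls(fileName)); // invoice_fdp.exe
```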
4. Special Whitespace
Non-Breaking Space (U+00A0), En Space (U+2002), Em Space (U+2003), Thin Space (U+2009), and Hair Space (U+200A) look like regular spaces but differ in width or line-breaking behavior. They commonly appear in text copied from PDFs, Word documents, or richly formatted web pages. String comparison fails silently: "hello world" with an NBSP is not equal to "hello world" with a regular space, despite looking identical on screen.
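A sketch of making such comparisons reliable, assuming it is acceptable to turn every Unicode space separator (general category Zs) into an ordinary space; `normalizeSpaces` is a hypothetical helper. NFKC normalization is shown as an alternative with a wider effect.

```javascript
// Replace every space separator (Zs: NBSP, en/em/thin/hair spaces,
// narrow NBSP, ideographic space, ...) with an ordinary U+0020.
function normalizeSpaces(s) {
  return s.replace(/\p{Zs}/gu, " ");
}

const fromPdf = "hello\u00A0world"; // NBSP between the words

console.log(fromPdf === "hello world");                  // false
console.log(normalizeSpaces(fromPdf) === "hello world"); // true

// NFKC also maps NBSP and the en/em/thin/hair spaces to U+0020, but it rewrites
// other characters too, e.g. the "fi" ligature U+FB01 becomes "f" + "i".
console.log(fromPdf.normalize("NFKC") === "hello world"); // true
console.log("\uFB01le".normalize("NFKC"));                // file
```

Note that U+200B is a format character (Cf), not a space separator, so neither approach removes zero-width spaces.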
Hidden Characters in Security
Invisible characters are a favored weapon in phishing and social engineering attacks. Common techniques include:
- Homograph attacks: Combining hidden characters with Unicode look-alikes of Latin letters to forge domain names.
- RTL override: Reversing the displayed file extension to hide the true format (.exe masquerading as .pdf).
- Zero-width injection: Inserting ZWSP into usernames or passwords to bypass blacklists or security filters.
- Steganographic watermarking: Embedding a unique combination of invisible characters in documents to trace the source of leaks.
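For zero-width injection into usernames, rejecting the input is usually safer than silently stripping it. This sketch assumes a hypothetical `validateUsername` policy and a made-up reserved-name list: it normalizes with NFKC, refuses any control, format, or separator character, then checks the blocklist.

```javascript
const BLOCKLIST = new Set(["admin", "root"]); // hypothetical reserved names

function validateUsername(raw) {
  // NFKC folds fullwidth and other compatibility forms to their plain letters.
  const name = raw.normalize("NFKC");
  // Reject control (Cc), format (Cf, incl. zero-width/bidi), and separator (Z) chars.
  if (/[\p{Cc}\p{Cf}\p{Z}]/u.test(name)) {
    return { ok: false, reason: "invisible-or-space" };
  }
  if (BLOCKLIST.has(name.toLowerCase())) {
    return { ok: false, reason: "reserved" };
  }
  return { ok: true, name };
}

console.log(validateUsername("ad\u200Bmin").reason); // invisible-or-space
console.log(validateUsername("\uFF41dmin").reason);  // reserved (fullwidth "a")
console.log(validateUsername("\u0430dmin").ok);      // true: Cyrillic "a" slips through
```

The last line shows the limit of this approach: NFKC does not map Cyrillic look-alikes to Latin letters, so catching homographs needs a confusables check such as the one described in Unicode Technical Standard #39.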
Prevention in Code
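- Always normalize input with String.prototype.normalize() before processing. Normalization alone does not remove zero-width or bidi control characters, although NFKC does map NBSP and most special spaces to a regular space.
- Use a regex to strip invisible control and format characters, for example /[\u200B-\u200D\uFEFF]/g.
- Check for a BOM before parsing JSON/CSV, and strip a leading U+FEFF from the decoded text.
- Display code points (U+XXXX) alongside each character when debugging text processing.
- Use an editor with visible whitespace mode (VS Code: Toggle Render Whitespace).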
- Add a CI/CD pipeline step to check for hidden characters in source code and config files.
About Text Tools
Text tools handle the daily grind of working with strings, paragraphs, and documents: counting words, reversing characters, transforming case, generating slugs, splitting long text, previewing Markdown. These replace separate desktop apps and complex CLI commands with a single URL you can bookmark and use without setup.
Why it matters
Writers, editors, and content teams work within text limits everywhere: Twitter's 280-character limit, the roughly 1,300 characters often recommended for LinkedIn posts, 250-word academic abstracts, and SEO meta descriptions that Google typically truncates at around 155-160 characters. A word counter that shows characters (with and without spaces), words, sentences, paragraphs, and reading time lets you hit platform specs without switching between tools.
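The counts described above reduce to a few string operations. This is a rough sketch with assumed rules, not the tool's actual logic: words are runs of non-whitespace, sentences end in ".", "!", or "?", paragraphs are separated by blank lines, and the reading speed of 200 words per minute is an arbitrary default. `textStats` is a hypothetical name.

```javascript
// Rough text statistics under simplified, assumed rules.
function textStats(text, wordsPerMinute = 200) {
  const trimmed = text.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  return {
    characters: [...text].length, // code points, not UTF-16 units
    charactersNoSpaces: [...text.replace(/\s/g, "")].length,
    words,
    sentences: (trimmed.match(/[.!?]+(?=\s|$)/g) || []).length,
    paragraphs: trimmed === "" ? 0 : trimmed.split(/\n\s*\n/).length,
    readingMinutes: Math.ceil(words / wordsPerMinute),
  };
}

console.log(textStats("Hello world. How are you?\n\nFine!"));
```

Counting code points means an emoji ZWJ sequence counts as several characters, and platforms apply their own rules (Twitter, for example, weights some characters and URLs differently), so these numbers are approximations. Intl.Segmenter can count user-perceived characters (graphemes) instead.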
Privacy and safety
Text tools process input entirely in your browser. Your blog draft, legal contract, or confidential email never leaves your device. Even the word counter doesn't transmit your text: it runs a simple counting function locally, which is all the task requires. A text tool that sends your text to a server for "processing" creates a large data-leakage risk that is almost never justified.
Best practices
- For SEO titles, aim for 50-60 characters including spaces (Google truncates longer titles)
- Meta descriptions work best at 150-155 characters; Google has been showing about 160 characters on desktop and about 120 on mobile
- When generating slugs, keep them short (3-5 words), all lowercase, with hyphens rather than underscores, and without stop words
- Markdown preview is useful BEFORE publishing to verify headings, links, and lists render correctly on the target platform
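The slug guidelines above can be applied mechanically. This sketch assumes a small, illustrative stop-word list (not a standard one) and a five-word cap; `slugify` and `STOP_WORDS` are hypothetical names. Accented Latin letters are reduced to their base letters via NFKD; letters without a decomposition are simply dropped.

```javascript
// Illustrative stop-word sample; real lists are longer and language-specific.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "for", "on", "with"]);

function slugify(title, maxWords = 5) {
  const words = title
    .normalize("NFKD")     // split accented letters into base letter + mark
    .replace(/\p{M}/gu, "") // drop the combining marks (é becomes e)
    .toLowerCase()
    .split(/[^a-z0-9]+/)    // any other character acts as a separator
    .filter((w) => w && !STOP_WORDS.has(w));
  return words.slice(0, maxWords).join("-");
}

console.log(slugify("How to Detect Hidden Unicode Characters in Text"));
// how-detect-hidden-unicode-characters
```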