Character Diagnostic Lab

Hidden Character Detector

Paste text → Scan → See hidden chars highlighted with color labels. Free, instant, private.

Hidden Character Reference

Name                   Code    Risk    Purpose
Zero-Width Space       U+200B  high    Allow a line break inside long words without a visible space
Zero-Width Non-Joiner  U+200C  high    Prevent ligature joining in Arabic/Indic scripts
Zero-Width Joiner      U+200D  high    Force a ligature or join emoji sequences
Byte Order Mark        U+FEFF  medium  Signal encoding/byte order at the start of a file (UTF-8/UTF-16)
Soft Hyphen            U+00AD  medium  Suggest a hyphenation point for word breaks
RTL Override           U+202E  high    Force right-to-left text direction
Non-Breaking Space     U+00A0  low     Prevent a line break between words
Line Separator         U+2028  medium  Unicode line separator (invalid in JS string literals before ES2019)
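The reference table above maps directly onto a lookup-based scanner. The TypeScript sketch below is an illustration of how such a scanner might work, not this tool's actual code: it assumes only the eight characters in the table matter, and the names scanHidden, formatCodePoint, and HIDDEN are hypothetical.

```typescript
// Sketch only: a scanner limited to the eight characters in the reference
// table. Names and structure are illustrative, not this tool's real code.
type Risk = "high" | "medium" | "low";

interface HiddenChar {
  name: string;
  risk: Risk;
}

interface Finding extends HiddenChar {
  index: number; // UTF-16 offset of the character in the input
  code: string;  // code point formatted as U+XXXX
}

// Lookup table keyed by code point, mirroring the reference table.
const HIDDEN = new Map<number, HiddenChar>([
  [0x200b, { name: "Zero-Width Space", risk: "high" }],
  [0x200c, { name: "Zero-Width Non-Joiner", risk: "high" }],
  [0x200d, { name: "Zero-Width Joiner", risk: "high" }],
  [0xfeff, { name: "Byte Order Mark", risk: "medium" }],
  [0x00ad, { name: "Soft Hyphen", risk: "medium" }],
  [0x202e, { name: "RTL Override", risk: "high" }],
  [0x00a0, { name: "Non-Breaking Space", risk: "low" }],
  [0x2028, { name: "Line Separator", risk: "medium" }],
]);

// Formats a code point as U+XXXX, padding to at least four hex digits.
function formatCodePoint(cp: number): string {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Returns one Finding per hidden character, in order of appearance.
function scanHidden(text: string): Finding[] {
  const findings: Finding[] = [];
  let index = 0;
  // for...of walks the string by code point, so emoji are not split.
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    const info = HIDDEN.get(cp);
    if (info !== undefined) {
      findings.push({ ...info, index, code: formatCodePoint(cp) });
    }
    index += ch.length; // 1 for BMP characters, 2 for astral ones
  }
  return findings;
}
```

For example, scanHidden("pay\u200Bpal\u00A0login") returns two findings: the Zero-Width Space at index 3 and the Non-Breaking Space at index 7. Keying the table by code point keeps lookups constant-time and makes adding a new character a one-line change.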

When to Check for Hidden Characters

Suspicious URLs or filenames that look normal but lead somewhere else: RTL override attacks can reverse the displayed file extension.

Text copied from PDFs or web pages often carries invisible formatting characters that break search and comparison.

Source code containing zero-width spaces can fail to compile with errors that point at characters you cannot see.

JSON/CSV data containing a BOM or other invisible characters can fail to parse (or, in CSV, silently corrupt the first header name) even when the content looks valid.

Usernames and passwords with hidden characters can bypass security filters or cause authentication failures.

What Are Hidden Unicode Characters?

Hidden characters are Unicode characters that have no visible glyph (or look identical to an ordinary space) but are still present in the text. They take up bytes, influence text layout and rendering, and can cause hard-to-diagnose bugs in programming, data processing, and security systems.

Unicode defines hundreds of control and formatting characters, but the most commonly encountered fall into three groups: zero-width characters (invisible spacers), special whitespace variants, and bidirectional (bidi) control characters used for right-to-left scripts.
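One way to sort a character into these three groups is by its Unicode properties: bidi controls carry the Bidi_Control property, zero-width characters such as ZWSP, ZWJ, and the BOM are in general category Cf (Format), and the whitespace variants are space (Zs), line (Zl), or paragraph (Zp) separators. The TypeScript sketch below assumes an ES2018+ runtime for \p{...} regex escapes. The group labels and the classify name are our own, and Cf is only a heuristic for "invisible", because a few Cf characters (such as some Arabic number signs) do render.

```typescript
// Sketch: classify one character into the three groups described above.
// Group labels are our own, not Unicode terminology.
type Group = "bidi-control" | "zero-width/format" | "special-whitespace" | null;

const BIDI = /^\p{Bidi_Control}$/u;       // ALM, LRM, RLM, U+202A-U+202E, U+2066-U+2069
const FORMAT = /^\p{Cf}$/u;               // General_Category=Format: ZWSP, ZWNJ, ZWJ, BOM, soft hyphen, ...
const SPACE = /^[\p{Zs}\p{Zl}\p{Zp}]$/u;  // space separators plus line/paragraph separators

function classify(ch: string): Group {
  // Check bidi first: every bidi control is also in category Cf.
  if (BIDI.test(ch)) return "bidi-control";
  if (FORMAT.test(ch)) return "zero-width/format";
  // Ordinary U+0020 is a Zs separator too, but it is not hidden.
  if (ch !== " " && SPACE.test(ch)) return "special-whitespace";
  return null;
}
```

With this ordering, U+202E lands in the bidi group rather than the generic format group, and NBSP and U+2028 land in the whitespace group.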

Common Hidden Character Types

1. Zero-Width Characters

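U+200B (Zero-Width Space), U+200C (ZWNJ), and U+200D (ZWJ) are among the most commonly abused invisible characters. They take up no visual space but change how text is compared, searched, and processed. ZWNJ and ZWJ also have legitimate uses: ZWNJ controls joining in scripts such as Persian and the Indic scripts, and ZWJ builds compound emoji sequences (family groups, and some flags such as the rainbow flag). The same characters can be slipped into URLs, usernames, and other strings so that they look identical to legitimate ones while comparing as different.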
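The TypeScript snippet below illustrates both sides: a ZWSP makes two visually identical strings unequal, while a ZWJ sequence is one visible emoji built from several code points, so stripping ZWJ indiscriminately breaks it apart. The variable names are illustrative only.

```typescript
// Illustration (not a detector): zero-width characters change identity and
// length without changing what is shown on screen.
const plain: string = "admin";
const spoofed: string = "ad\u200Bmin"; // ZWSP inserted between "d" and "m"

const sameText = plain === spoofed;             // false
const lengths = [plain.length, spoofed.length]; // [5, 6]

// man + ZWJ + woman + ZWJ + girl: shown as one family emoji on platforms
// that support the sequence.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const codePoints = Array.from(family).length;   // 5 code points
const codeUnits = family.length;                // 8 UTF-16 code units

// Removing every ZWJ splits the family into three separate emoji.
const stripped = family.replace(/\u200D/g, "");
const strippedCodePoints = Array.from(stripped).length; // 3
```

This is why blanket removal of zero-width characters is safe for identifiers and usernames but can damage user-facing text that contains emoji.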

2. Byte Order Mark (BOM)

U+FEFF was originally designed to signal byte order in UTF-16 (and UTF-32) text. Today it most often appears accidentally at the start of UTF-8 files saved by certain Windows text editors, where it is stored as the three bytes EF BB BF. A BOM at the start of a JSON file typically makes the parser reject it, and in a CSV file it often ends up glued to the first header name. Both failures are confusing because the first character of the file is an invisible U+FEFF rather than the expected { or header text.
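A minimal TypeScript sketch of the fix, assuming the file has already been decoded to a string. Some decoders, including Node's fs.readFileSync with "utf8", leave the BOM in place as U+FEFF, and JSON.parse rejects it. The helper names stripBom, parseJsonWithBom, and firstCsvHeader are hypothetical, and firstCsvHeader is a naive split used only to show the corrupted header, not a real CSV parser.

```typescript
// Removes a single leading U+FEFF, if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// JSON.parse does not treat U+FEFF as whitespace, so strip it first.
function parseJsonWithBom(text: string): unknown {
  return JSON.parse(stripBom(text));
}

// Naive first-header extraction (no quoting support), for illustration only.
// With a BOM present, the first header silently becomes "\uFEFFname".
function firstCsvHeader(text: string): string {
  return text.split(/\r?\n/, 1)[0].split(",")[0];
}
```

Stripping only at index 0 is deliberate: a U+FEFF in the middle of the text is not a BOM and should be reported by a scanner rather than silently removed.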

3. Bidirectional (Bidi) Characters

RTL Override (U+202E) forces the text that follows it to render right-to-left. This is a well-known attack vector: a file stored as invoice_[U+202E]fdp.exe (where [U+202E] stands for the invisible override character) displays as invoice_exe.pdf on many operating systems, tricking users into executing a malicious binary they believe is a harmless PDF.
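The stored and displayed names differ only in rendering. The TypeScript sketch below shows that the real extension is still .exe, and how an upload or mail filter might flag bidi controls or make them visible in logs. The helper names are hypothetical, and flagging such names is an assumed policy; some systems strip or escape these characters instead.

```typescript
// The filename as stored: "invoice_" + RIGHT-TO-LEFT OVERRIDE + "fdp.exe".
const storedName: string = "invoice_\u202Efdp.exe";

// The OS decides how to open the file from the stored name, so the real
// extension is ".exe" no matter how the name is rendered.
const realExtension = storedName.slice(storedName.lastIndexOf("."));

// All Bidi_Control characters: ALM, LRM, RLM, LRE/RLE/PDF/LRO/RLO, LRI/RLI/FSI/PDI.
const BIDI_CONTROLS = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/;
const BIDI_CONTROLS_ALL = new RegExp(BIDI_CONTROLS.source, "g");

function hasBidiControl(name: string): boolean {
  return BIDI_CONTROLS.test(name); // non-global regex, so no lastIndex state
}

// Replaces each bidi control with a visible [U+XXXX] tag for logs and warnings.
function revealBidi(name: string): string {
  return name.replace(BIDI_CONTROLS_ALL, (c) =>
    `[U+${c.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")}]`);
}
```

Here revealBidi(storedName) produces invoice_[U+202E]fdp.exe, the same notation used in the paragraph above.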

4. Special Whitespace

Non-Breaking Space (U+00A0), En Space, Em Space, Thin Space, and Hair Space look like regular spaces but have different widths and break behaviors. They commonly appear when copying text from PDFs, Word documents, or richly formatted web pages. String comparison fails silently: "hello world" with NBSP is not equal to "hello world" with a regular space, despite looking identical on screen.
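The TypeScript sketch below reproduces the failed comparison and shows two ways to fold look-alike spaces. NFKC normalization maps NBSP, en, em, thin, and hair spaces to U+0020, but it also rewrites other compatibility characters (full-width letters, the U+FB01 "fi" ligature), so the narrower foldSpaces helper, a hypothetical name, replaces only space characters.

```typescript
const typed: string = "hello world";        // U+0020 regular space
const pasted: string = "hello\u00A0world";  // U+00A0, as copied from a PDF

const naiveEqual = typed === pasted;                                    // false
const nfkcEqual = typed.normalize("NFKC") === pasted.normalize("NFKC"); // true

// Narrower alternative: map only look-alike spaces (NBSP, U+2000-U+200A,
// narrow NBSP, medium mathematical space, ideographic space) to U+0020.
function foldSpaces(s: string): string {
  return s.replace(/[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g, " ");
}
```

Either approach should run on both sides of a comparison; normalizing only the stored value and not the user input reintroduces the mismatch.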

Hidden Characters in Security

Invisible characters are a favored tool in phishing and social engineering attacks, and they are also used to trace leaked documents. Common techniques include:

  • Homograph attacks: Combining hidden characters with Unicode look-alikes of Latin letters to forge domain names.
  • RTL override: Reversing the displayed file extension to hide the true format (.exe masquerading as .pdf).
  • Zero-width injection: Inserting ZWSP into usernames or passwords to bypass blocklists or security filters.
  • Steganographic watermarking: Embedding a unique combination of invisible characters in documents to trace the source of leaks.
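A minimal TypeScript sketch of a defense against zero-width injection: canonicalize a username before checking it against a blocklist. The blocklist contents, the character set, and the function names are hypothetical. This does not catch look-alike letters from other scripts (for example Cyrillic а), which needs a confusables check such as the one in Unicode Technical Standard #39, and production systems may prefer a complete profile such as PRECIS (RFC 8265).

```typescript
// Zero-width, bidi, soft hyphen, word joiner, invisible math operators, BOM.
const INVISIBLE = /[\u00AD\u061C\u180E\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u2069\uFEFF]/g;

function canonicalUsername(raw: string): string {
  return raw
    .normalize("NFKC")       // fold compatibility forms such as full-width letters
    .replace(INVISIBLE, "")  // drop invisible format characters
    .toLowerCase()
    .trim();
}

const BLOCKED = new Set(["admin", "root", "support"]); // hypothetical blocklist

function isBlocked(raw: string): boolean {
  return BLOCKED.has(canonicalUsername(raw));
}
```

With this in place, "ad\u200Bmin" and the full-width "\uFF41\uFF44\uFF4D\uFF49\uFF4E" both canonicalize to "admin" and are rejected. Storing the canonical form as well makes later duplicate checks consistent.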

Prevention in Code

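  • Normalize input with String.prototype.normalize("NFKC") before processing to fold look-alike spaces such as NBSP into ordinary spaces. Normalization does not remove zero-width or bidi control characters, so strip those separately.
  • Use a regex to strip invisible format characters, for example /[\u200B-\u200D\u00AD\u202A-\u202E\uFEFF]/g. Stripping ZWJ also breaks emoji sequences, so exempt it where emoji are expected.
  • Check for a BOM before parsing JSON/CSV and strip it: remove a leading U+FEFF from the decoded string (the bytes EF BB BF in a UTF-8 file).
  • Display codepoints (U+XXXX) alongside each character when debugging text processing.
  • Use an editor that reveals whitespace and invisible characters (in VS Code, View: Toggle Render Whitespace shows spaces and tabs, and the editor.unicodeHighlight.invisibleCharacters setting highlights zero-width characters).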
  • Add a CI/CD pipeline step to check for hidden characters in source code and config files.
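The CI step from the list above could be built around a function like the TypeScript sketch below, which reports every suspicious character with its line, column, and U+XXXX code point. The character set and the findHidden name are assumptions; reading files and failing the build (for example, exiting with a non-zero status when any hits are returned) are left to the surrounding script.

```typescript
// Assumed set: NBSP, soft hyphen, ALM, ZWSP..RLM, LS/PS, bidi embeddings and
// overrides, word joiner, bidi isolates, BOM. Extend to suit your codebase.
const SUSPICIOUS = /[\u00A0\u00AD\u061C\u200B-\u200F\u2028\u2029\u202A-\u202E\u2060\u2066-\u2069\uFEFF]/g;

interface Hit {
  line: number;   // 1-based
  column: number; // 1-based, counted in UTF-16 code units
  code: string;   // U+XXXX
}

function findHidden(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, i) => {
    // matchAll clones the global regex, so no lastIndex state leaks between lines.
    for (const m of text.matchAll(SUSPICIOUS)) {
      const cp = m[0].charCodeAt(0);
      hits.push({
        line: i + 1,
        column: (m.index ?? 0) + 1,
        code: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      });
    }
  });
  return hits;
}
```

Reporting line and column together with the code point gives reviewers enough to locate a character their editor does not render.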

About Text Tools

Text tools handle the daily grind of working with strings, paragraphs, and documents: counting words, reversing characters, transforming case, generating slugs, splitting long text, previewing Markdown. These replace separate desktop apps and complex CLI commands with a single URL you can bookmark and use without setup.

Why it matters

Writers, editors, and content teams work within text limits everywhere: Twitter's 280-character limit, the roughly 1,300 characters often cited as optimal for a LinkedIn post, 250-word academic abstracts, and SEO meta descriptions of around 155 characters. A word counter that shows characters (with and without spaces), words, sentences, paragraphs, and reading time lets you meet platform specs without switching between tools.

Privacy and safety

These text tools process input entirely in your browser, so your blog draft, legal contract, or confidential email never leaves your device. Even the word counter doesn't transmit your text; it runs a simple counting function locally, which is all the task needs. A text tool that sends your text to its server for "processing" creates real room for data leakage, and for tasks like these that is almost never justified.

Best practices

  • For SEO titles, aim for 50-60 characters including spaces (Google truncates longer titles)
  • Meta descriptions work best at 150-155 characters; Google has typically shown about 160 on desktop and about 120 on mobile
  • When generating slugs, keep them short (3-5 words), all lowercase, with hyphens rather than underscores, and without stop words
  • Preview Markdown before publishing to verify that headings, links, and lists render correctly on the target platform
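The slug guidelines above can be sketched in TypeScript as follows. The stop-word list, the five-word cap, and the name slugify are illustrative assumptions, not a standard. After removing accents, this version drops every character outside a-z and 0-9, so letters without a decomposition (such as Vietnamese đ) become separators, and text in non-Latin scripts produces an empty slug.

```typescript
// Illustrative stop words; real lists are longer and language-specific.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "for", "on", "with"]);

function slugify(title: string, maxWords = 5): string {
  const words = title
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop combining marks (e.g. é becomes e)
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, " ")    // anything else, including "_", separates words
    .split(/[\s-]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
  return words.slice(0, maxWords).join("-");
}
```

For example, slugify("The Complete Guide to Hidden Unicode Characters") yields complete-guide-hidden-unicode-characters. NFKD also turns NBSP and other look-alike spaces into ordinary spaces, so pasted titles slug cleanly.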