You write in AsciiDoc. Typography matters. Hidden characters also matter. Some help readability. Others break parsing, search, or exports.
You don’t have to fix these characters one by one. Regular expressions (“regex” for short) match text by pattern and let you replace every match in one pass.
How to read the regex in this article
We use literal characters when safe (e.g. —).
We show Unicode escapes: \uXXXX or \x{XXXX}. Use whichever your regex engine accepts.
If in doubt, paste the literal character into the Find box.

In your editor's Find box, turn on Replace and ensure Regular Expression is active.
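As a rough illustration of the escape notes above, here is how the two ways of writing a pattern behave in Python's `re` module. Python is an assumption here, since the article does not prescribe an engine. Python accepts `\uXXXX` in patterns but not the `\x{XXXX}` form, which is typical of PCRE-style engines.

```python
import re

text = "wait\u2014what"  # contains a literal em dash (U+2014)

# The escaped form and the literal character compile to equivalent patterns.
escaped = re.compile(r"\u2014")
literal = re.compile("\u2014")  # same as pasting the character itself

print(escaped.search(text).span())  # (4, 5)
print(literal.search(text).span())  # (4, 5)
```

If your engine rejects one escape form, try the other, or paste the character directly.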
Spaces and joins
These control wrapping and word boundaries. They often slip in via copy-paste.
| Character | Looks like | Code | Use it when | Avoid because | Find (regex) | Replace with |
|---|---|---|---|---|---|---|
| Non-breaking space | | U+00A0 | Keep numbers + units or names together | Invisible; breaks expected wrap/search | \u00A0 | " " (normal space) |
| Narrow no-break space | | U+202F | French punctuation; tight number–unit spacing | Easy to miss; inconsistent in exports | \u202F | " " or regular NBSP (U+00A0) |
| Thin space | | U+2009 | Fine typography (e.g., 5%) | May collapse or vanish | \u2009 | " " |
| Zero-width space | | U+200B | Soft break in long URLs | Breaks search, identifiers, markup | \u200B | "" (remove) |
| Zero-width non-joiner | | U+200C | Needed in some scripts | Same issues in Latin text | \u200C | "" |
| Zero-width joiner | | U+200D | Emoji ligatures; complex scripts | Same issues | \u200D | "" |
| Soft hyphen | | U+00AD | Optional hyphenation | Can show as stray ‘-’ and break copy | \u00AD | "" |
Batch cleanup (safe default for most docs)
Find: [\u200B\u200C\u200D\u00AD] → Replace: ""
Find: [\u00A0\u202F] → Replace: " " (or keep if you need non-breaking behavior)
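The two batch steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive tool: the function name `batch_cleanup` and the `keep_no_break` flag are hypothetical names chosen for illustration.

```python
import re

# Zero-width space, non-joiner, joiner, and soft hyphen.
INVISIBLE = re.compile(r"[\u200B\u200C\u200D\u00AD]")
# Non-breaking space and narrow no-break space.
NO_BREAK = re.compile(r"[\u00A0\u202F]")


def batch_cleanup(text: str, keep_no_break: bool = False) -> str:
    """Remove invisible characters; optionally leave no-break spaces alone."""
    text = INVISIBLE.sub("", text)
    if not keep_no_break:
        text = NO_BREAK.sub(" ", text)
    return text


sample = "10\u00A0km of soft\u00ADware with a zero\u200Bwidth space"
print(batch_cleanup(sample))  # 10 km of software with a zerowidth space
```

If the document contains emoji sequences or scripts that rely on joiners, consider dropping `\u200C` and `\u200D` from the first class.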

Hyphens and dashes
AsciiDoc treats -, --, and --- literally unless your converter does typographic substitutions. Be explicit.
| Character | Looks like | Code | Use it when | Find | Replace with |
|---|---|---|---|---|---|
| Hyphen-minus | - | U+002D | Compound words; flags; code | - (literal) | keep |
| Non-breaking hyphen | ‑ | U+2011 | Prevent break in compounds | \u2011 | - (or keep) |
| En dash | – | U+2013 | Ranges; relationships | \u2013 | -- (ASCII normalization) |
| Em dash | — | U+2014 | Breaks in thought | \u2014 | --- (ASCII normalization) |
Quotes and apostrophes
Curly quotes look good in prose. They are bad in code, attributes, IDs, and markup.
| Character | Looks like | Code | Use it when | Find | Replace with |
|---|---|---|---|---|---|
| Straight double quote | " | U+0022 | Code, attributes, JSON, CSV | " | keep |
| Straight single quote / apostrophe | ' | U+0027 | Contractions, possessives, code | ' | keep |
| Left double quote | “ | U+201C | Prose quotes | [“] or \u201C | " |
| Right double quote | ” | U+201D | Prose quotes | [”] or \u201D | " |
| Left single quote | ‘ | U+2018 | Nested quotes | [‘] or \u2018 | ' |
| Right single quote / curly apostrophe | ’ | U+2019 | Prose apostrophes | [’] or \u2019 | ' |
| Backtick | ` | U+0060 | AsciiDoc monospace | \u0060 | keep |
Bulk fixes:
Curly doubles → straight: [\u201C\u201D] → "
Curly singles → straight: [\u2018\u2019] → '
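Because each bulk fix maps one character to one character, it can also be done without a regex. The sketch below uses Python's `str.translate`; the name `straighten_quotes` is a hypothetical helper, and it assumes you want every curly quote in the text straightened, prose included.

```python
# Map each curly quote to its straight ASCII counterpart.
STRAIGHT_QUOTES = str.maketrans({
    "\u201C": '"',  # left double quote
    "\u201D": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / curly apostrophe
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(STRAIGHT_QUOTES)


print(straighten_quotes("\u201CDon\u2019t\u201D"))  # "Don't"
```

To keep curly quotes in prose, apply this only to the code blocks and attribute lines you extract from the source.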
Ellipsis and punctuation
| Character | Looks like | Code | Use it when | Find | Replace with |
|---|---|---|---|---|---|
| Ellipsis | … | U+2026 | Prose pause; UI truncation | \u2026 | ... |
| Middle dot | · | U+00B7 | Inline lists; math vectors | \u00B7 | • or - |
| Bullet | • | U+2022 | Rich text bullets | \u2022 | AsciiDoc list "* " |
| Figure/En/Em spaces | | U+2007 / U+2002 / U+2003 | Tabular alignment | [\u2007\u2002\u2003] | " " (space), or keep in tables |
| Non-ASCII punctuation | ¿ ¡ « » | U+00BF / U+00A1 / U+00AB / U+00BB | Language-specific typography | [¡¿«»] | ASCII equivalents if needed |
Math, units, and symbols
Prefer semantic ASCII in source and let the converter handle typography—unless you publish raw text.
| Character | Looks like | Code | Use it when | Find | Replace with |
|---|---|---|---|---|---|
| Multiplication sign | × | U+00D7 | Dimensions: 10×20 | \u00D7 | x or * |
| Minus sign | − | U+2212 | True math minus | \u2212 | - |
| Division sign | ÷ | U+00F7 | Simple math | \u00F7 | / |
| Degree | ° | U+00B0 | Temperatures, angles | \u00B0 | keep (ensure spacing) |
| Trademark / registered / copyright | ™ ® © | U+2122 / U+00AE / U+00A9 | Legal marks | [™®©] | keep or remove by style |
Practical recipes
QA checklist before publishing
- Remove zero-width and soft hyphens.
- Decide: ASCII-only vs typographic output. Normalize accordingly.
- Ensure non-breaking spaces where needed (numbers + units, names).
- Check quotes in code and attributes are straight.
- Search for BOMs and stray figure/em spaces.
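Before replacing anything, it can help to see where the suspect characters are. The following sketch reports each hit with its position and Unicode name. The `SUSPECTS` set and the `audit` function are illustrative assumptions, not part of any AsciiDoc tooling, so adjust the set to your own checklist.

```python
import unicodedata

# Characters the checklist asks about; this set is an illustrative assumption.
SUSPECTS = "\u200B\u200C\u200D\u00AD\uFEFF\u00A0\u202F\u2007\u2002\u2003"


def audit(text: str):
    """Yield (line number, column, code point, Unicode name) for each suspect."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECTS:
                name = unicodedata.name(ch, "UNKNOWN")
                yield line_no, col, f"U+{ord(ch):04X}", name


doc = "Title\n10\u00A0km\nzero\u200Bwidth"
for hit in audit(doc):
    print(hit)
```

An empty report means none of the listed characters are present. It does not check whether quotes in code are straight.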
A. Purge hidden troublemakers
Find: [\u200B\u200C\u200D\u00AD\uFEFF] → Replace: ""
B. Replace non-breaking spaces (regular and narrow) with normal spaces
Find: [\u00A0\u202F] → Replace: " "
(Skip this if you rely on non-breaking behavior for layout.)
C. Normalize quotes to ASCII
- Doubles: [\u201C\u201D] → "
- Singles: [\u2018\u2019] → '
D. Normalize dashes (if your style requires ASCII)
En dash: \u2013 → --
Em dash: \u2014 → ---
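Dash normalization can be sketched as a single regex pass with a lookup table. This assumes the common ASCII convention of `--` for an en dash and `---` for an em dash, plus a plain hyphen for the non-breaking hyphen. `DASH_MAP` and `ascii_dashes` are hypothetical names.

```python
import re

# Assumed ASCII convention; change the values to match your style guide.
DASH_MAP = {
    "\u2011": "-",    # non-breaking hyphen
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
}
DASH_RE = re.compile("[" + "".join(DASH_MAP) + "]")


def ascii_dashes(text: str) -> str:
    """Replace typographic dashes with their ASCII stand-ins."""
    return DASH_RE.sub(lambda m: DASH_MAP[m.group()], text)


print(ascii_dashes("pages 10\u201320 \u2014 see note"))  # pages 10--20 --- see note
```

Keeping the mapping in one dictionary makes the style decision easy to review and change.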