Skip to main content

Regex in AsciiDoc: Hidden Characters & How to Find Them

Use this guide to decide when a character makes sense and how to find and replace it fast in adoc Studio with regex and unicode escapes.

You write in AsciiDoc. Typography matters. Hidden characters also matter. Some help readability. Others break parsing, search, or exports.

So you don't have to adjust all these characters individually, you can use “regular expressions" (in short: "regex"). They filter your text according to certain criteria and replace the search results en masse.


How to read the regex in this article

  • We use literal characters when safe (e.g., ).
  • We show Unicode escapes: \uXXXX or \x{XXXX}. Use whichever your regex engine accepts.
  • If in doubt, paste the literal character into the Find box.
In adoc Studio, press ⌘F, click the arrow next to the magnifying glass, turn on Replace, and ensure Regular Expression is active.

Spaces and joins

These control wrapping and word boundaries. They often slip in via copy-paste.

Character Looks like Code Use it when Avoid because Find (regex) Replace with
Non-breaking space   U+00A0 Keep numbers + units or names together Invisible; breaks expected wrap/search \u00A0 normal space " "
Narrow no-break space U+202F French punctuation; tight number–unit spacing Easy to miss; inconsistent in exports \u202F " " or regular NBSP
Thin space U+2009 Fine typography (e.g., 5 %) May collapse or vanish \u2009 " "
Zero-width space U+200B Soft break in long URLs Breaks search, identifiers, markup \u200B "" (remove)
Zero-width non-joiner U+200C Needed in some scripts Same issues in Latin text \u200C ""
Zero-width joiner U+200D Emoji ligatures; complex scripts Same issues \u200D ""
Soft hyphen ­ U+00AD Optional hyphenation Can show as stray '-' and break copy \u00AD ""

Batch cleanup (safe default for most docs):
Find: [\u200B\u200C\u200D\u00AD] → Replace: ""
Find: [\u00A0\u202F\u2009] → Replace: " " (or keep if you need non-breaking behavior)

In adoc Studio, invisible characters get highlighted (e.g. • , | , ¿)

Hyphens and dashes

AsciiDoc treats -, --, and --- literally unless your converter does typographic substitutions. Be explicit.

Character Looks like Code Use it when Find Replace with
Hyphen-minus - U+002D Compound words; flags; code - (literal) keep
Non-breaking hyphen - U+2011 Prevent break in compounds \u2011 - (or keep)
En dash U+2013 Ranges; relationships \u2013 -- (ASCII normalization)
Em dash U+2014 Breaks in thought \u2014 --- (ASCII normalization)

Quotes and apostrophes

Curly quotes look good in prose. They are bad in code, attributes, IDs, and markup.

Character Looks like Code Use it when Find Replace with
Straight double quote " U+0022 Code, attributes, JSON, CSV " keep
Straight single quote / apostrophe ' U+0027 Contractions, possessives, code ' keep
Left double quote U+201C Prose quotes [“] or \u201C "
Right double quote U+201D Prose quotes [”] or \u201D "
Left single quote U+2018 Nested quotes [‘] or \u2018 '
Right single quote / curly apostrophe U+2019 Prose apostrophes [’] or \u2019 '
Backtick ` U+0060 AsciiDoc monospace ` keep

Bulk fixes:

  • Curly doubles → straight: [\u201C\u201D]"

  • Curly singles → straight: [\u2018\u2019]'


Ellipsis and punctuation

Character Looks like Code Use it when Find Replace with
Ellipsis U+2026 Prose pause; UI truncation \u2026 ...
Middle dot · U+00B7 Inline lists; math vectors \u00B7 • or -
Bullet U+2022 Rich text bullets \u2022 AsciiDoc list "* "
Figure/En/Em spaces   /   /   U+2007 / U+2002 / U+2003 Tabular alignment [\u2007\u2002\u2003] (space) or keep in tables
Non-ASCII punctuation ¿ ¡ « » U+00XX Language-specific typography [¡¿«»] ASCII equivalents if needed

Math, units, and symbols

Prefer semantic ASCII in source and let the converter handle typography—unless you publish raw text.

Character Looks like Code Use it when Find Replace with
Multiplication sign × U+00D7 Dimensions: 10×20 \u00D7 x or *
Minus sign U+2212 True math minus \u2212 -
Division sign ÷ U+00F7 Simple math \u00F7 /
Degree ° U+00B0 Temperatures, angles \u00B0 keep (ensure spacing)
Trademark ™ ® © U+2122 / U+00AE / U+00A9 Legal marks [™®©] keep or remove by style

Practical recipes

QA checklist before publishing

  1. Remove zero-width and soft hyphens.

  2. Decide: ASCII-only vs typographic output. Normalize accordingly.

  3. Ensure non-breaking spaces where needed (numbers + units, names).

  4. Check quotes in code and attributes are straight.

  5. Search for BOMs and stray figure/em spaces.

A. Purge hidden troublemakers

Find: [\u200B\u200C\u200D\u00AD\uFEFF] → Replace: ""

B. Replace non-breaking and thin spaces with normal spaces

Find: [\u00A0\u202F\u2009] → Replace: " "
(Skip this if you rely on non-breaking behavior for layout.)

C. Normalize quotes to ASCII

  • Doubles: [\u201C\u201D]"

  • Singles: [\u2018\u2019]'

D. Normalize dashes (if your style requires ASCII)

  • En dash: \u2013--

  • Em dash: \u2014---


© adoc Studio