Html – How are multiple spaces showing in HTML, without pre, nbsp or white-space: pre

Tags: css, html, text, tinymce, whitespace

I'm sure this shouldn't be possible. Somehow, multiple spaces within paragraphs in my HTML markup aren't collapsing. They are not &nbsp;, not in a <pre> tag, not set to white-space: pre-wrap; or white-space: pre; and the behaviour is not changed by forcing style="white-space: normal;" on the element.

My understanding is that these are the only three ways whitespace can be preserved so that two or more spaces show up in HTML.

So the question is: What else could cause sequential whitespace to show up as multiple spaces? There must be something else – but I can't look for it until I know what it is, and every source I find just talks about &nbsp;, <pre> and white-space: pre-wrap; or white-space: pre;

Key edit: Using Firebug I tried deleting some of the offending whitespace and typing it back in again. When deleted and re-entered from the keyboard, the spaces behave as normal – no unexpected whitespace in the browser. So it must be some character that shows up in view source, text editors, etc as a plain space, but actually behaves like &nbsp;. What can it be, and crucially, how can I identify it to remove it? The original source of the offending input is the wysiwyg editor TinyMCE so I'm adding that tag…


More detail: I have some HTML containing paragraph text that includes multiple spaces, like this (between the …s is copied direct from Firefox view source):

<p> blah blah.... nothing  more  than  a ... blah blah </p>

As you can see, these are regular spaces, not &nbsp;. The document has &nbsp; elsewhere and they show up as such in view source, so they're not &nbsp;s somehow masquerading as normal spaces in the source.

Also, the CSS is not set to white-space: pre; or anything like that:

  • Firebug's 'Computed' panel lists no whitespace related rules.
  • Forcing a white-space: normal; on the paragraph makes no difference to the spaces. The rule does show up in the Firebug 'Computed' panel, so it is applying.
  • Applying white-space: pre-wrap; to the paragraph results in other changes, but doesn't change these multiple sequential spaces. For example, it makes an extra space visible at the end of each line break within a paragraph of selected text. So it's definitely not somehow being stealthily set to white-space: pre-wrap;

There are no <pre> tags in the document, anywhere. Find on <pre in the source finds nothing.

So it should therefore show in the browser with one space between each word. It doesn't. It shows multiple spaces as if it was <pre> or &nbsp; or white-space: pre;. But it isn't any of these.

There must be some other way of getting a white-space: pre;-like effect that I don't know about. What other ways are there to preformat whitespace and stop multiple spaces collapsing? What could possibly be causing this?


A few background notes:

  • Seen in which browsers? Firefox 16.0.2, Google Chrome 23.0.1271.95 m (Windows)
  • Why is there double spaced whitespace in the markup if you don't want it visible? I'm working on the front end / design for a CMS where the users will be inputting text from Word and PDFs via TinyMCE wysiwyg (non-negotiable). Therefore, the markup will be messy. Fixing the CMS text formatter to scrub whitespace and clean the markup is outside the scope of this work.
  • Doctype? <!DOCTYPE html PUBLIC "-//W3C//DTD HTML+RDFa 1.1//EN">
  • You're sure there are multiple spaces? Definitely, you can select one, two, in some cases three spaces between words in the browser front-end.
  • Can I seez all teh codez? Sorry, but it's a pre-launch site, it's not publicly available. Feel free to ask about specific CSS rules, HTML tags, javascript stuff loaded etc etc.

Best Answer

I'm guessing that the offending space characters in your source code are not SPACE (U+0020), but are actually NO-BREAK SPACE (U+00A0). Visually, they appear identical, but if you view your source code in a hex editor (which shows you the individual bytes in the file), you'll see a different encoding for these characters.
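If a hex editor isn't to hand, a short script can do the same check. In UTF-8, SPACE is the single byte 20, while NO-BREAK SPACE is the two bytes C2 A0. The following is only a minimal sketch: findNonAsciiChars is a hypothetical helper name, and it assumes the markup is UTF-8 encoded.

```php
<?php
// Hypothetical helper (illustration only): lists every non-ASCII character
// in a UTF-8 string together with its byte offset and its bytes in hex.
function findNonAsciiChars(string $text): array
{
    // The /u flag makes PCRE treat the pattern and the subject as UTF-8.
    // preg_match_all() returns 0 for no matches and false on invalid UTF-8;
    // both cases are reported here as "nothing found".
    if (!preg_match_all('/[^\x00-\x7F]/u', $text, $matches, PREG_OFFSET_CAPTURE)) {
        return [];
    }

    $found = [];
    foreach ($matches[0] as [$char, $byteOffset]) {
        $found[] = ['offset' => $byteOffset, 'hex' => bin2hex($char)];
    }
    return $found;
}

// Sample input (an assumption): a regular space followed by a no-break space.
$sample = "nothing \xC2\xA0more";

foreach (findNonAsciiChars($sample) as $hit) {
    printf("offset %d: %s\n", $hit['offset'], $hit['hex']);
}
// prints: offset 8: c2a0
```

Any c2a0 in the output marks a NO-BREAK SPACE that view source and most text editors would display as an ordinary space.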

Edit 1

This PHP code should find and replace the offending characters with regular spaces:

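// Build the two characters as UTF-8 strings from their HTML numeric entities.
// Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2.
$strNoBreakSpace = mb_convert_encoding('&#x00A0;', 'UTF-8', 'HTML-ENTITIES'); // NO-BREAK SPACE (U+00A0)
$strNormalSpace  = mb_convert_encoding('&#x0020;', 'UTF-8', 'HTML-ENTITIES'); // SPACE (U+0020)

// Replace every no-break space in the input with a regular space.
$strInput = str_replace( $strNoBreakSpace, $strNormalSpace, $strInput );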

Edit 2

A simpler way of creating the two space characters:

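// json_decode() turns the \uXXXX escapes into UTF-8 encoded characters.
$strNoBreakSpace = json_decode('"\u00A0"'); // NO-BREAK SPACE (U+00A0)
$strNormalSpace  = json_decode('"\u0020"'); // SPACE (U+0020)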
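
On PHP 7.0 or later, the same replacement can probably be written without any decoding step by using the "\u{...}" string escape. This is a sketch under that assumption, not a tested drop-in for the CMS; replaceNoBreakSpaces is a hypothetical name and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper (illustration only): replaces each literal
// NO-BREAK SPACE (U+00A0) in a UTF-8 string with a regular SPACE (U+0020).
// Assumes PHP 7.0+ for the "\u{...}" escape in double-quoted strings.
function replaceNoBreakSpaces(string $html): string
{
    return str_replace("\u{00A0}", ' ', $html);
}

// Sample input (an assumption): each gap is a space plus a no-break space.
$strInput = "<p>nothing \u{00A0}more \u{00A0}than</p>";
$strClean = replaceNoBreakSpaces($strInput);

var_dump($strClean === "<p>nothing  more  than</p>"); // bool(true)
```

Once the characters are ordinary spaces, the browser collapses each run as usual. Only the literal U+00A0 character is replaced; any &nbsp; entities written out in the markup are plain ASCII text and are left untouched.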