Handling Special Markup Characters for HTML and XML, Entities
Submitted by charlie.collins on Thu, 02/01/2001 - 16:00
There are several key "special" characters in markup languages that can wreak havoc if not used properly. Some characters carry special meaning for a markup parser, distinct from ordinary character data: the ampersand and the greater than and less than characters (&, >, <).
I have seen several tutorials that present far more complicated ways of allowing these characters as input than are needed (even a developerWorks tutorial that suggested a CDATA section for each character that required it).
The solution to this issue lies in entities. An entity is a representation of a character that is made up of more than the character itself (a name or number between an ampersand and a semicolon), so the parser can read the output without mistaking the character for markup. This holds true for many markup languages, such as HTML and XML.
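As a rough illustration of the point, the sketch below feeds the same input to an XML parser twice, once raw and once with entities substituted. It uses Python's standard library; the sample string, the `<note>` element, and the helper name `parses` are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical user input containing all three special characters.
raw = "Tom & Jerry <3 > everything"


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Dropping the raw text straight into an element produces broken markup.
unescaped_doc = f"<note>{raw}</note>"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_doc = f"<note>{escape(raw)}</note>"

print(parses(unescaped_doc))
print(parses(escaped_doc))

# The parser turns the entities back into the original characters.
print(ET.fromstring(escaped_doc).text)
```

The escaped document parses, and the text read back from it is the original input, with no CDATA section involved.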
For XML and HTML alike, the main "problem" characters are again the ampersand and the greater than and less than characters (and, for HTML, the non breaking space as well).
Any character can of course be written as an entity or a numeric character reference, but generally only a few are required, because in any given markup language only a few characters cause problems.
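To make the "any character" claim concrete, here is a small sketch that builds the decimal numeric reference for a character from its code point and decodes it again. The helper name `numeric_reference` is an assumption of this example; `html.unescape` is from the Python standard library.

```python
import html


def numeric_reference(ch: str) -> str:
    """Build the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


# The usual suspects, plus the non breaking space (U+00A0) and an accented letter.
for ch in ["&", "<", ">", "\u00a0", "\u00e9"]:
    ref = numeric_reference(ch)
    # html.unescape() decodes the reference back to the original character.
    print(repr(ch), ref, html.unescape(ref) == ch)
```

Numeric references work for every character, while named entities exist only for those the markup language defines.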
Enough about what special characters are and why they can cause "problems". What do you do with entities, and how? For HTML and XML this is very simple: write the entity wherever the character itself would appear.
- ampersand = & and the entity is &amp; or &#38;
- greater than = > and the entity is &gt;
- less than = < and the entity is &lt;
- non breaking space = (U+00A0) and the entity is &nbsp;
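The substitutions in the list above can be sketched as a small function. This is a minimal illustration, not a complete escaper: the names `REPLACEMENTS`, `escape_markup` and `nbsp_entity` are hypothetical, and attribute quoting is left out. One detail is not optional, though: the ampersand has to be replaced first, otherwise the ampersands introduced by the other entities would themselves be escaped. Also note that XML predefines only a handful of named entities and &nbsp; is not among them, so for XML output the numeric form &#160; is the safer choice.

```python
import html

# Order matters: "&" must come first so that the "&" inside "&lt;" and
# "&gt;" is not escaped a second time.
REPLACEMENTS = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
)


def escape_markup(text: str, nbsp_entity: str = "&nbsp;") -> str:
    """Replace the special characters in text with their entities.

    nbsp_entity is what U+00A0 becomes; pass "&#160;" for XML output.
    """
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text.replace("\u00a0", nbsp_entity)


sample = "a < b && c > d"
escaped = escape_markup(sample)
print(escaped)

# The standard library does the same job for &, < and >.
print(escaped == html.escape(sample, quote=False))
```

In practice the library routines (`html.escape` here, or `xml.sax.saxutils.escape` for XML) are preferable to a hand-rolled table; the function above only shows how little is involved.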
