About Me

Working as a Technical Lead at CollabNET Software Private Limited.

Friday 17 April, 2009

Removing invalid xml characters

Recently I came across a good way to remove invalid XML characters from a string. The snippet is below.

/**
 * Returns the input stripped of invalid XML characters.
 *
 * See http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char for the valid XML
 * character list.
 */
public String removeInvalidXmlCharacters(String input)
{
    if (input == null) {
        return input;
    }
    StringBuilder sb = new StringBuilder(input.length());
    // Walk by code point rather than by char: a char never exceeds 0xFFFF,
    // so a char-based loop can never match the 0x10000-0x10FFFF range and
    // drops every supplementary character (stored as a surrogate pair).
    for (int i = 0; i < input.length(); )
    {
        int c = input.codePointAt(i);
        i += Character.charCount(c);
        // remove zero-width space (valid in XML, but unwanted here)
        if (c == 0x200B) {
            continue;
        }
        if ((c == 0x9) || (c == 0xA) || (c == 0xD)
                || ((c >= 0x20) && (c <= 0xD7FF))
                || ((c >= 0xE000) && (c <= 0xFFFD))
                || ((c >= 0x10000) && (c <= 0x10FFFF))
        ) {
            sb.appendCodePoint(c);
        }
    }
    return sb.toString();
}
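
To see how the filter behaves on tricky input, here is a minimal sketch of the same idea written with Java 8 streams. The class name XmlCharFilterDemo and the helper isValidXmlChar are hypothetical names used only for illustration; they are not part of the snippet above. This variant walks the string by code point, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) is kept, while an unpaired surrogate is dropped.

```java
final class XmlCharFilterDemo {

    // Hypothetical helper: true if cp matches the XML 1.0 Char production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Stream-based variant (an assumption, not the original author's code).
    static String removeInvalidXmlCharacters(String input) {
        if (input == null) {
            return null;
        }
        return input.codePoints()
                // also drop the zero-width space, as the original snippet does
                .filter(cp -> cp != 0x200B && isValidXmlChar(cp))
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // NUL, zero-width space and a lone surrogate are removed;
        // U+1F600 (written as the pair \uD83D\uDE00) survives.
        String dirty = "ok\u0000\u200B\uD83D\uDE00\uD800!";
        String clean = removeInvalidXmlCharacters(dirty);
        System.out.println(clean.equals("ok\uD83D\uDE00!")); // prints true
    }
}
```

Tab, line feed and carriage return pass through unchanged, since they are the only control characters XML 1.0 allows.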
