XmlDocument + StringWriter = EVIL

2008-11-11 @ 22:02#

ok, you can proly mark this one up for me just being lazy/dumb. but, after months of nagging problems w/ string encodings for XSL-transformed results, it finally dawned on me how stoopid i've been.

XmlDocument + StringWriter = EVIL

cuz it's all about the encoding, folks.

since i do mostly web apps, i do lots of XSLT work in C#. this usually goes just great, but occasionally i end up w/ goofy encoding problems. for example, sometimes MSIE will refuse to render results as HTML and will instead just belch the XML onto the client window. sometimes, even though i *know* i indicate UTF-8 in my XSL documents, the result displayed in the browser shows UTF-16. it really gets bad when i start putting together XML pipelines mixing plain XML w/ transformed docs. sometimes i just pull my hair out.

and it's all because i'm lazy/dumb. cuz StringWriter has no business being involved in an XML pipeline. we all know that right? and we all know why, right? do we?

i did. but i forgot.

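see, strings are stored internally as UTF-16 (Unicode) in C#. that's cool. makes sense. but not when you want to ship around the string results of an XML pipeline. that's when you usually want to stick w/ UTF-8. but StringWriter don't play dat: its Encoding property always reports UTF-16, so anything that writes XML through it stamps encoding="utf-16" on the output, no matter what the stylesheet asks for.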
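if you want to see it for yourself, here's a tiny sketch (not from my real code; the class and method names are made up for illustration). it writes a one-element doc into a StringWriter via XmlWriter and hands back the text:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; the names are illustrative only.
public static class StringWriterEncodingDemo
{
    // Reports the encoding name a fresh StringWriter advertises.
    public static string StringWriterEncodingName()
    {
        return new StringWriter().Encoding.WebName;
    }

    // Writes a tiny XML document into a StringWriter and returns the text.
    public static string WriteSampleXml()
    {
        StringWriter sw = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(sw))
        {
            writer.WriteStartDocument();               // emits the XML declaration
            writer.WriteElementString("root", "hello");
            writer.WriteEndDocument();
        }
        return sw.ToString();
    }
}
```

StringWriterEncodingName() gives you "utf-16", and the XML declaration in WriteSampleXml() carries encoding="utf-16" to match. nobody asked for that; the writer just takes it from the StringWriter.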

so i just stopped using StringWriter to hold output from XML/XSL work. instead i use MemoryStream and decode the bytes w/ the encoding i actually want. here are some examples:

first, the wrong/dumb/Old-Mike way:

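// needs: using System.IO; using System.Xml; using System.Xml.XPath; using System.Xml.Xsl;
private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
{
   XPathNavigator xdNav = xmldoc.CreateNavigator();
   XslTransform tr = new XslTransform();
   tr.Load(xsldoc);

   // StringWriter.Encoding is always UTF-16, so the output gets labeled utf-16
   StringWriter sw = new StringWriter();
   tr.Transform(xdNav, args, sw);
   return sw.ToString();
}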

the above code will always return output that labels itself UTF-16, even when the stylesheet says UTF-8. bummer.

now the proper/sane/New-Mike way:

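// needs: using System.Xml; using System.Xml.XPath; using System.Xml.Xsl;
private string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
{
   XPathNavigator xpn = xmldoc.CreateNavigator();
   XslTransform tr = new XslTransform();
   tr.Load(xsldoc);

   // a Stream has no encoding of its own, so the transform writes bytes
   // using the encoding from xsl:output (UTF-8 when none is given)
   System.IO.MemoryStream ms = new System.IO.MemoryStream();
   tr.Transform(xpn, args, ms);

   // decode the bytes; assumes the stylesheet didn't ask for something other than UTF-8
   System.Text.Encoding enc = System.Text.Encoding.UTF8;
   return enc.GetString(ms.ToArray());
}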

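this will return output labeled UTF-8 every time, as long as the stylesheet's xsl:output doesn't ask for some other encoding (UTF-8 is the default when it says nothing). much better.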
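one more variation, as a rough sketch rather than what i actually run. XslTransform has been marked obsolete since .NET 2.0 in favor of XslCompiledTransform, so this version swaps that in. it also makes two guesses about what you'd want: it decodes w/ whatever encoding the stylesheet declared (via OutputSettings) instead of hard-coding UTF-8, and it reads through a StreamReader so a leading byte order mark, if the transform wrote one, doesn't end up as a stray character at the front of the string. the class name XsltHelper is made up.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper class; not part of the original code.
public static class XsltHelper
{
    public static string Transform(XmlDocument xmldoc, XmlDocument xsldoc, XsltArgumentList args)
    {
        XslCompiledTransform tr = new XslCompiledTransform();
        tr.Load(xsldoc);

        using (MemoryStream ms = new MemoryStream())
        {
            // the transform writes bytes using the encoding from xsl:output
            tr.Transform(xmldoc, args, ms);
            ms.Position = 0;

            // decode w/ the same encoding the stylesheet declared (UTF-8 by default)
            Encoding enc = tr.OutputSettings.Encoding;

            // StreamReader skips a byte order mark if one is present
            using (StreamReader reader = new StreamReader(ms, enc, true))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

call it the same way as the versions above: XsltHelper.Transform(xmldoc, xsldoc, args). the trade-off is that the result follows the stylesheet, so if some stylesheet declares a different encoding, that's what the output will be labeled.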
