Sunday, March 28, 2010

Why does FxCop recommend String.ToUpperInvariant for normalizing?

Since the Visual Basic 5.0 era I have normalized strings to lower case before comparing them. When I ran Microsoft FxCop on a recent SharePoint project, rule CA1308 fired in many places, recommending that strings be normalized with the upper-case invariant conversion instead.
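To make the rule concrete, here is a minimal sketch of the pattern CA1308 flags and the form it asks for. The class name and the sample strings are made up for illustration; only ToLowerInvariant, ToUpperInvariant and string.Equals with StringComparison.OrdinalIgnoreCase are real framework members.

```csharp
using System;

static class CaseNormalizationDemo
{
    static void Main()
    {
        // Hypothetical sample values, not taken from the SharePoint project.
        string a = "SharePoint";
        string b = "SHAREPOINT";

        // The habit described above: lower-case both sides, then compare.
        // This is the pattern that CA1308 reports.
        bool viaLower = a.ToLowerInvariant() == b.ToLowerInvariant();

        // The form the rule recommends: normalize to upper case instead.
        bool viaUpper = a.ToUpperInvariant() == b.ToUpperInvariant();

        // If the only goal is a case-insensitive comparison, no normalized
        // copies are needed at all.
        bool viaOrdinal = string.Equals(a, b, StringComparison.OrdinalIgnoreCase);

        Console.WriteLine($"{viaLower} {viaUpper} {viaOrdinal}");
    }
}
```

For plain ASCII text like this all three comparisons agree; the difference only shows up for the unusual characters discussed further down.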

As usual, Microsoft's documentation doesn't explain this properly, so I went looking for the difference between ToLowerInvariant and ToUpperInvariant, at first in terms of performance. Michael Kaplan's blog reveals the difference.

All Microsoft NTFS file systems normalize to upper case, so the framework developers simply followed the NTFS convention. NTFS contains a metafile named $UpCase, a table of Unicode upper-case characters that is used to ensure case insensitivity in the Win32 and DOS namespaces.

Now the question is why NTFS follows an upper-case convention. Some scripts have characters with no lower-case form, and a small group of characters cannot make a round trip when converted to lower case: lowering them and then converting back does not return the original text. To avoid this, Microsoft seems to have standardized on upper case.
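One way to see which characters are affected on your own machine is to try both directions on every UTF-16 code unit. This is only a rough sketch: the class name and counters are hypothetical, it looks at single char values rather than full strings, and the counts can differ between framework versions, so no expected numbers are given here.

```csharp
using System;

static class RoundTripCheck
{
    static void Main()
    {
        int lostViaLower = 0; // upper-stable chars that do not come back after lower -> upper
        int lostViaUpper = 0; // lower-stable chars that do not come back after upper -> lower

        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            char upper = char.ToUpperInvariant(c);
            char lower = char.ToLowerInvariant(c);

            // c is already in upper-case form: does lowering and restoring return it?
            if (upper == c && char.ToUpperInvariant(lower) != c)
            {
                lostViaLower++;
            }

            // c is already in lower-case form: does uppering and restoring return it?
            if (lower == c && char.ToLowerInvariant(upper) != c)
            {
                lostViaUpper++;
            }
        }

        Console.WriteLine($"Lost after lower-casing: {lostViaLower}");
        Console.WriteLine($"Lost after upper-casing: {lostViaUpper}");
    }
}
```

Printing the offending characters instead of just counting them is an easy extension if you want to see exactly which ones are involved.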
