The other day I was alerted to a new feature in VS2005 whereby you can pre-generate an assembly with strongly-typed XmlSerializers for classes in a project. Let me step back for a minute to explain that the XmlSerializer can be evil if not used properly.
What the XmlSerializer does that is evil
The most important thing to be aware of is that when you use the XmlSerializer it dynamically creates an assembly containing special (strongly typed) readers and writers which know how to convert the types that you are serializing to and from XML. This process has a few gotchas, one of which is that, to even run an XmlSerializer, your code needs sufficient permissions on the machine (for example, to write to the temp directory) so that the compiler can compile the assembly. The process of creating these strongly typed serializers is very expensive and taxing on machine resources because there's CodeDom, compilers, reflection and a whole host of other machinery being used to achieve this.
The XmlSerializer has several overloaded constructors which exist to support things such as loading a type, loading multiple types, and supplying evidence. It's important to know that there is a huge difference in what happens under the covers depending on whether you call one of the simple constructors, ctor(Type) or ctor(Type, string), or any of the others.
When you call one of those two simple constructors, a special assembly is created and stored in a cache the first time that you call it. On subsequent calls to that constructor, the cache is checked and the assembly is pulled from there rather than re-created. This means that you only wear the cost the first time, which is good - although remember, it's a big cost.
When you use any of the other constructors to construct the XmlSerializer, the cache is not checked. This means that you will wear the full cost of creating the special assembly every time. Not only that, your application's memory usage will continue to grow because the assemblies are created and then loaded into your AppDomain, and they cannot be unloaded until the AppDomain is torn down.
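To make the difference concrete, here is a minimal sketch contrasting the two cases. The Order and RushOrder types and the SerializerDemo class are hypothetical names invented for this illustration. Holding a non-simple serializer in a static field is one common way to pay the generation cost only once. It is an assumption on my part that this suits your application, not something the framework does for you.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types, used only for this illustration.
public class Order
{
    public int Id;
}

public class RushOrder : Order
{
    public bool Overnight;
}

public static class SerializerDemo
{
    // Non-simple overload (Type, Type[]): the framework does not cache the
    // generated assembly for this one, so we build it once and keep the
    // instance around ourselves.
    private static readonly XmlSerializer WithExtraTypes =
        new XmlSerializer(typeof(Order), new Type[] { typeof(RushOrder) });

    // Simple overload (Type): the generated assembly is cached per type, so
    // constructing the serializer on every call only costs a cache lookup
    // after the first time.
    public static string SerializeSimple(Order order)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Order));
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, order);
            return sw.ToString();
        }
    }

    // Reuses the single static instance instead of constructing a new
    // serializer (and a new dynamic assembly) on each call.
    public static string SerializeWithExtraTypes(Order order)
    {
        using (StringWriter sw = new StringWriter())
        {
            WithExtraTypes.Serialize(sw, order);
            return sw.ToString();
        }
    }
}
```

Calling new XmlSerializer(typeof(Order), new Type[] { typeof(RushOrder) }) inside SerializeWithExtraTypes instead would compile a fresh assembly on every call, which is exactly the leak described above.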
Only use the simple constructors
One of the reasons for using the non-simple constructors is when you need to serialize multiple types. There are a couple of examples of this, but a pretty simple one is polymorphism. Let's say that I have a base class called "Animal" which can have sub-classes for "Human" and for "Dog". The classes look like this:
[XmlRoot("Animal", Namespace = "urn-MarkItUp-Demo")]
public class Animal {
    public Animal() { }
    protected string _name;
    [XmlAttribute("name")]
    public string FirstName {
        get { return _name; }
        set { _name = value; }
    }
}
[XmlRoot("Human", Namespace = "urn-MarkItUp-Demo")]
public class Human : Animal {
    public Human() { }
    public Human(string name) {
        this._name = name;
    }
}
[XmlRoot("Dog", Namespace = "urn-MarkItUp-Demo")]
public class Dog : Animal {
    public Dog() { }
    public Dog(string name) {
        this._name = name;
    }
}
In the application I will generally be passing Dog and Human instances to methods which take an "Animal" type to take advantage of the polymorphic behaviour. If, in one of those methods, I try to serialize the instance with the XmlSerializer, my code will throw an exception. For example, the following code throws an InvalidOperationException:
Human h = new Human("Darren Neimke");
SerializeAnimal(h);
private static void SerializeAnimal(Animal a) {
    XmlSerializer ser = new XmlSerializer(typeof(Animal));
    using (StringWriter sw = new StringWriter()) {
        ser.Serialize(sw, a);
    }
}
The reason for the exception is that, when we created the serializer passing the Animal type as an argument, the special assembly was created under the covers using Animal as the type to base the new strongly typed reader and writer on. Then, when we pass the Human to the Serialize method of the serializer, it cannot find a mapping for that type, so it falls over.
A way to get around this is to pass the derived types to an overload of the XmlSerializer constructor so that it can create special readers and writers for each of them right up front. So we change the construction of the XmlSerializer to look like this:
XmlSerializer ser = new XmlSerializer(typeof(Animal), new Type[] {typeof(Human), typeof(Dog)});
This works because we've told the XmlSerializer what to create up front, but we've now reverted to one of the evil constructor overloads, which is most certainly not what we want - just believe me on that!
The better way to get around this is to use the XmlInclude attribute on the Animal class. The XmlSerializer uses it to ensure that, when Animal is passed as the constructor argument, strongly typed serializers are also created for each of the included types. So we adorn the Animal class like so:
[XmlInclude(typeof(Human))]
[XmlInclude(typeof(Dog))]
[XmlRoot("Animal", Namespace = "urn-MarkItUp-Demo")]
public class Animal {
    ....
}
This means that we can revert to using the simple XmlSerializer constructor and get the benefits of type caching again.
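Putting the pieces together, the following is a self-contained sketch of the XmlInclude approach using the classes from this post. The AnimalDemo wrapper class, and having SerializeAnimal return the XML as a string, are my own additions for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// XmlInclude tells the serializer about the derived types up front, so the
// simple (cached) constructor can still handle Human and Dog instances.
[XmlInclude(typeof(Human))]
[XmlInclude(typeof(Dog))]
[XmlRoot("Animal", Namespace = "urn-MarkItUp-Demo")]
public class Animal
{
    public Animal() { }

    protected string _name;

    [XmlAttribute("name")]
    public string FirstName
    {
        get { return _name; }
        set { _name = value; }
    }
}

[XmlRoot("Human", Namespace = "urn-MarkItUp-Demo")]
public class Human : Animal
{
    public Human() { }
    public Human(string name) { this._name = name; }
}

[XmlRoot("Dog", Namespace = "urn-MarkItUp-Demo")]
public class Dog : Animal
{
    public Dog() { }
    public Dog(string name) { this._name = name; }
}

// Hypothetical wrapper class, added only to make the sample self-contained.
public static class AnimalDemo
{
    public static string SerializeAnimal(Animal a)
    {
        // Simple constructor: the generated assembly is cached after first use.
        XmlSerializer ser = new XmlSerializer(typeof(Animal));
        using (StringWriter sw = new StringWriter())
        {
            ser.Serialize(sw, a);
            return sw.ToString();
        }
    }
}
```

Calling AnimalDemo.SerializeAnimal(new Human("Darren Neimke")) should now succeed rather than throw. The root element is still Animal, and the serializer adds an xsi:type attribute to record the runtime type.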
Bypassing the simple constructors
In Visual Studio 2005 there is now a setting on the Project property pages under "Build" which allows you to specify that you want to create the strongly typed serializer classes at COMPILE TIME! This means that, when you compile your assembly, another assembly will be created and copied into your \bin folder which contains the strongly-typed reader and writer pairs for each class in your project.
Using this feature not only means that you can always use the simple constructor overload on XmlSerializer without even having to use the XmlInclude attributes but that you will also avoid the initial assembly creation which would normally occur.
How is this magic done?
The custom reader/writer classes are generated by a new tool called SGen.exe which, when you check the "automatically generate serialization assemblies" option on the project's Build pane, is configured to run as a post-build task.
By default, SGen will be kicked off to create reader/writer pairs for each type in your input assembly, so you should ensure that you only use the option to automatically generate serialization assemblies for your "entity" projects. In the case where you don't require all classes within your project to go through this process, you could consider not checking the "automatically generate serialization assemblies" option and instead manually invoking SGen from the post-build event, using the /t command-line option to specify which type to generate code for.
Update: Joseph pointed me at this article which shows how to do serializer pre-gen in 1.x apps: http://support.microsoft.com/kb/872800