December 18, 2002
Code Generation Patterns
On TheServerSide there's an article about Model Driven Architecture, "TheServerSide.com - MDA from a Developer's Perspective", which includes a list of 10 myths about code generation. I spent several years working for a company whose product relied on code generation, and we use quite a bit of it at work now, so I have some thoughts of my own on this issue.
In particular I don't agree with myth 7, "I'll just use XML/XSLT and be done with it." The explanation for why this is a myth assumes that you will be modifying the generated code and will have to merge those changes in with any later regeneration. In my opinion this is the result of using a flawed pattern for generated code.
I believe the correct pattern is to never modify the generated code. Clearly you'll never be able to generate code that provides 100% of the functionality you need. The question is where to add the extra functionality. Instead of adding it by editing the generated code, add it by subclassing the generated code, overriding and adding methods where needed. When you have control over the framework and it is designed with this approach in mind, the end results are much better than can be achieved with the "don't touch the code colored blue" techniques you see in NetBeans, VisualStudio.NET and similar tools.
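To make the subclassing idea concrete, here is a minimal Java sketch. The class names (CustomerBase, Customer) and their methods are invented for illustration and are not taken from any real generator; the assumption is simply that the generator emits an abstract base class and that the hand-written subclass lives in a separate file the generator never overwrites.

```java
// GENERATED (hypothetical output): rewritten on every regeneration, never edited by hand.
abstract class CustomerBase {
    private String name;
    private String email;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }

    // Default behaviour the generator can derive from the model.
    public boolean isValid() {
        return name != null && email != null;
    }
}

// HAND-WRITTEN: kept in its own file, so regeneration cannot clobber it.
class Customer extends CustomerBase {
    // Overriding a generated method to refine it.
    @Override
    public boolean isValid() {
        return super.isValid() && getEmail().contains("@");
    }

    // Adding a method the generator knows nothing about.
    public String displayName() {
        return getName() + " <" + getEmail() + ">";
    }
}
```

The rest of the application only ever refers to Customer, so regenerating CustomerBase after a model change needs no merge step; at worst the compiler flags an override that no longer matches.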
We use this scheme very successfully at work with our own XSLT based code generation system. Using XSLT really is an advantage over proprietary tools with their own little languages, each of which will have its own little bugs, their lack of full access to the template source, and their vendor lock-in. The biggest remaining problem with code generation falls on the programmers writing the templates: you're dealing with two control flows at once, the one of the templates used to generate the code, and the one of the code being generated.
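As a rough illustration of XSLT-driven generation and of the two control flows, here is a small Java sketch using the standard javax.xml.transform API. It is not the system described above: the model format, the stylesheet, and the BeanGenerator class are all made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BeanGenerator {
    // Hypothetical model: one class with two fields.
    static final String MODEL =
        "<class name='CustomerBase'>"
      + "<field name='name' type='String'/>"
      + "<field name='age' type='int'/>"
      + "</class>";

    // Hypothetical template. xsl:for-each is the TEMPLATE's control flow and
    // runs at generation time; the emitted "if (...) return false;" lines are
    // the GENERATED code's control flow and run much later.
    static final String TEMPLATE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/class'>"
      + "abstract class <xsl:value-of select='@name'/> {\n"
      + "<xsl:for-each select='field'>"
      + "    protected <xsl:value-of select='@type'/><xsl:text> </xsl:text>"
      + "<xsl:value-of select='@name'/>;\n"
      + "</xsl:for-each>"
      + "    public boolean hasAllFields() {\n"
      + "<xsl:for-each select='field[@type=\"String\"]'>"
      + "        if (<xsl:value-of select='@name'/> == null) return false;\n"
      + "</xsl:for-each>"
      + "        return true;\n"
      + "    }\n"
      + "}\n"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the model and returns the generated Java source.
    static String generate(String modelXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TEMPLATE)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(modelXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("code generation failed", e);
        }
    }
}
```

Calling BeanGenerator.generate(BeanGenerator.MODEL) yields the source of an abstract class with both fields declared and a null check emitted only for the String field. The template author has to keep in mind that the for-each has already finished by the time any of the generated if statements execute.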
Posted by Alex at December 18, 2002 08:37 PM
I have to agree with you. XSLT is far from perfect, but it fits the "steam shovel vs. hydraulic excavator" criterion of a disruptive technology. It's worse at traditional text manipulations, but it's incredibly good at many tree-based transformations, and these are exactly the sort of transformations that will be needed for converting relatively flat data structures into classes and methods.
The common language aspect is critical as well; XSLT and XPath are used all over for XML/HTML manipulations, so there's relatively little learning involved. Consider that a steam shovel might be great at digging holes, but a hydraulic-equipped front end loader can also move dirt, plow driveways, and in a pinch, pull a car out of a snow drift.
Plus, XSLT can be written in a streaming-friendly fashion, so very large files can be constructed. Steam shovels just sit in one place, and have to be moved and set up again when they reach the limits of the hole. To a great extent, you can bring a front-end loader to the work, so it's easy to dig large shallow holes as well as deep holes.
There's an escape to Java for anything that XSLT does poorly, so it really is a great language for code generation. You can even generate XSLT stylesheets using XSLT, so some kinds of code generation could actually be written as code generator generation -- I never meta a template I didn't like....
If we can't have macros in Java, XSLT is a good substitute, and one with a shallower learning curve as well.
The "two control flows" issue has always been the problem. At ETI, it was always, "do we do this in the code or in the templates?" The Right Thing To Do, of course, was to do it in the templates, because then the code becomes extremely simple; so simple, in fact, that people can actually understand and use it. Unfortunately, most of ETI's latest code has a ton of run-time decisions that could have been made during generation.
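A small Java sketch of the difference between deciding in the code and deciding in the templates. Both classes stand in for generator output and are entirely hypothetical (they are not ETI code); the "dialect" setting is an invented example of something the generator could already know.

```java
// Decision left to run time: every generated class carries all the branches,
// and the reader has to work out which one applies.
class RuntimeDecidingWriter {
    private final String dialect; // hypothetical setting, known when generating

    RuntimeDecidingWriter(String dialect) {
        this.dialect = dialect;
    }

    String quote(String identifier) {
        if (dialect.equals("mysql")) return "`" + identifier + "`";
        if (dialect.equals("mssql")) return "[" + identifier + "]";
        return "\"" + identifier + "\"";
    }
}

// Decision made in the template: the generator picked the branch for the
// target dialect and emitted only that one.
class MysqlWriter {
    String quote(String identifier) {
        return "`" + identifier + "`";
    }
}
```

Both produce the same result for the MySQL case, but the second is short enough to read at a glance, at the cost of moving the branching logic into the template.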
Ken: Obviously I agree with using XSLT/XPath. The fact that it's standard gives many benefits: there are books available, and there are multiple implementations, so there's no lock-in and a better chance of fewer bugs.
Glen: I think the biggest problem with the ETI generated code is in extending it. Some of that's down to the languages being generated not supporting overriding, but looking back, I think there's something that could have been done.