Why you should write comments before code.
I’ve blogged before that I think the way a developer names blocks of their code says a lot about their experience, not only in a particular platform, but also what they know about the Software Development Life Cycle. But I don’t want to rehash that. Today, I want to talk about when and why to comment. When I create a new class definition, I typically try to add a summary comment right then. I also expect this of my team members. It may seem pedantic, but there’s a very good reason for it. Expressing in English* the intent of the class tends to help me clarify the Single Responsibility that I have for that class. If I find that I have trouble specifying the expected behavior of a class in the summary, that generally means I don’t have the plan for why/how/when I need to use that class. If I find I need a paragraph to explain the purpose of the class, it might be doing too much.
Doing too much is bad.
So, when I find that this is the case, I look to see which parts of the summary should be broken out. Those breakouts typically fall along class definition lines, and before I know it, I can summarize most classes in one or two sentences, and they only do as much as they should.
Only doing as much as you should is “A Good Thing.”
The other benefit is that my colleagues can take a look at my code and understand large blocks without digging through and inspecting each member of the class. Truly, that was the point of OOP. Rather than trying to explain how each block of code works (inline comments), explain why each block of code exists (first-class documentation).
This rule of thumb works the same for members of a class. If you can type out a sentence explaining what you’re trying to do, there’s a much better chance that you will actually do that thing. There’s even a good chance that you’ll be able to identify things that are leaking in and shouldn’t be.
When your code has gotchas, these should be outlined in the block, or the blocks. This is your chance to explain yourself. Do you need that string to be formatted just so? Do you return something in a particular order? What do you expect from consumers?
While I’m on this subject, please don’t just fill out comments to make the compiler warnings go away. That’s just a waste of your time. I am aware of a particular tool that provides phantom summaries; summaries that are there, but have no substance. The intentions were good, but it entirely misses the point. Imagine you bought a dictionary and all of the definitions used the word in the definition.
Definition: A sentence that defines something.
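As a rough illustration of the difference, here is a sketch of a class documented the way I describe above. `InvoiceNumberFormatter` and its formatting rule are made up for this example. The summary says why the class exists, and the member comments call out the gotchas (the required format, what is expected from consumers) rather than narrating how each line works.

```csharp
using System;

/// <summary>
/// Formats invoice numbers for display on printed statements.
/// (Hypothetical class, invented purely for illustration.)
/// </summary>
public class InvoiceNumberFormatter
{
    /// <summary>
    /// Produces the display form of an invoice number, e.g. "INV-000042".
    /// </summary>
    /// <param name="sequence">
    /// The raw sequence number. Must not be negative; consumers are
    /// expected to check this before calling.
    /// </param>
    /// <returns>
    /// "INV-" followed by the sequence, zero-padded to at least six digits.
    /// </returns>
    public string Format(int sequence)
    {
        if (sequence < 0)
        {
            throw new ArgumentOutOfRangeException(
                "sequence", "Sequence numbers cannot be negative.");
        }

        // "D6" pads the decimal representation with leading zeros.
        return "INV-" + sequence.ToString("D6");
    }
}
```

A phantom summary for the same method would read something like "Formats the specified sequence," which tells the reader nothing that the signature doesn't already say.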
C# and Java both have great documentation support, and I believe that other languages/platforms have started to make these summary-type comment sections closer to “first-class citizens,” so your reasons for not commenting code are dwindling. I’ll leave you with this: Every line of code makes perfect sense while you’re writing it. Really good code will make sense to other people with a little bit of explanation. Great code is that which can be understood by your future self, and the people that follow you. Why not hedge your bets and provide some good documentation to make your good code great code?
* It doesn’t have to be English; the first language of your team is best for these comments.
The “Named Generic” Anti-pattern
Part One:
When it comes to code, I’m pretty particular about how things are named. To me, naming both classes and class members is one of the fine arts of software development. I actually think that this can demonstrate quite a bit about someone’s experience, what they understand about the software development life cycle, and ultimately, their conceptual understanding of Object-oriented design. That’s why this particular “code smell” bothers me so much.
I’ll call it the “Named Generic” pattern.
We’ve all seen it. A developer wanted to specify a typed parameter, but missed the boat.
For example:
public RecallList GetRecalls(AutoMakersList automakers)
{
    // do something useful with each automaker and yield
    // the recalls associated with them.
}
So far, this code isn’t bad. We have a decent method name, the parameter name is ok, but what are the two type definitions here:
public class AutoMakersList : List<String>{ /*no body*/ }
public class RecallList : List<String>{ /*no body*/ }
Uh oh.
With two classes I have simultaneously reduced flexibility, and increased complexity. From the perspective of the consumer of this method, I now have to marshal a set of strings, then add them to a new class called AutoMakersList, which I had to search out and attempt to understand.
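Seen from the calling side, that extra step looks something like the sketch below. It reuses the `AutoMakersList` wrapper from above; the `Caller` class and its `Marshal` helper are invented for illustration. Notice that nothing stops a string that is plainly not an automaker from going into the list.

```csharp
using System.Collections.Generic;

public class AutoMakersList : List<string> { /* no body */ }

// Hypothetical consumer code, not part of any real API.
public static class Caller
{
    // Builds the wrapper type a consumer is forced to construct
    // before calling GetRecalls(AutoMakersList).
    public static AutoMakersList Marshal(IEnumerable<string> names)
    {
        var automakers = new AutoMakersList();
        automakers.AddRange(names); // nothing here checks what the strings contain
        return automakers;
    }
}
```

The wrapper costs the caller an allocation and a copy, and buys no guarantee about what the strings are.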
From the API designer’s perspective, I’ve placed some requirement on what is legal to pass in. Except I haven’t. The list is still of string, and the last time I checked, there were no validation methods on string that check whether a value is an automaker’s name (C# 5.0, maybe?). So I’ve really just obfuscated what I wanted to happen, which was this:
"hand me an enumerable of validated automakers"
The same could be done with this method signature:
///<summary>This will produce the recalls associated with the specified automakers.</summary>
///<param name="automakers">This is a pre-validated set of automakers
/// for which recalls will be retrieved. Valid automakers
/// follow these rules: .....
///</param>
public RecallList GetRecalls(IEnumerable<String> automakers)
{
    // iterate over each automaker, and do something
    // interesting to yield out their recalls
}
In Visual Studio (and perhaps MonoDevelop?), I’ve now told the API consumer what I expect, and they will get IntelliSense when they’re constructing the call. Whereas if I just told you to hand me an AutoMakersList, there’s ambiguity in what is required. The other benefit of this approach is that I’ve reduced the calling requirements on this method.
The above example is actually not done yet, and here’s where it will seem like I’m contradicting my point, but I’m not. Really.
public class Automaker
{
    public String Name{get; set;}
    //placeholder rule for illustration: any non-empty name counts as valid.
    public bool IsValid(){return !String.IsNullOrEmpty(Name);}
}
public RecallList GetRecalls(IEnumerable<Automaker> automakers)
{
    //do something interesting.
}
Instead of passing in just a list of strings, why not pass in “Automaker”? On the surface, it seems very much like just passing a simple String. The difference is that I have attached the context explicitly to the Name property, instead of implicitly from the name of the collection in which the object was stored. Let that marinate in your brain for a minute. They’re actually radically different concepts: one of them works, and IMHO, one of them doesn’t.
Part two:
“RecallList”
Here, the API designer got it half right. The context for each recall is explicitly attached to the object that cares, “Recall”, but RecallList doesn’t actually add any value. It essentially says “this is a list of Recall”, which is the same thing “List<Recall>” says in a much more concise way. Although I think classes are cheap, cheap, cheap and people are too often reluctant to add them, in this case “RecallList” is just redundant.
Finally, passing a list in or out adds extra overhead that neither the API designer nor the caller needs. In both cases, List places extra requirements on either side of the call, when really everyone meant to say, “here’s a set of something that you can read through” (IEnumerable).
When we apply all of these suggestions together, this is the method we’re left with:
///<summary>This will produce the recalls associated
/// with the specified automakers.</summary>
///<param name="automakers">This is a set of automakers for
/// which recalls will be retrieved.</param>
public IEnumerable<Recall> GetRecalls(IEnumerable<Automaker> automakers)
{
    // iterate over each automaker, use the "IsValid()" method
    // to determine if it should be processed, and do something
    // interesting to yield out the recalls
}
Hopefully the above snippet makes some sense and shows why we should fight the urge to add classes that don’t bring their own “flavor” to the application.
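For anyone who wants to run it, here is one way the final signature could be filled in. This is only a sketch under stated assumptions: the `Recall` properties, the `RecallService` class with its in-memory list, and the "non-blank name" validity rule are all invented for the example, since the snippets above leave those details open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Automaker
{
    public string Name { get; set; }

    // Placeholder rule for this sketch: any non-blank name is valid.
    public bool IsValid() { return !string.IsNullOrWhiteSpace(Name); }
}

// Hypothetical shape; the post never shows what a Recall contains.
public class Recall
{
    public string AutomakerName { get; set; }
    public string Description { get; set; }
}

// Hypothetical host class holding recalls in memory.
public class RecallService
{
    private readonly List<Recall> _knownRecalls;

    public RecallService(IEnumerable<Recall> knownRecalls)
    {
        _knownRecalls = knownRecalls.ToList();
    }

    /// <summary>This will produce the recalls associated
    /// with the specified automakers.</summary>
    /// <param name="automakers">This is a set of automakers for
    /// which recalls will be retrieved. Invalid entries are skipped.</param>
    public IEnumerable<Recall> GetRecalls(IEnumerable<Automaker> automakers)
    {
        foreach (var automaker in automakers)
        {
            if (!automaker.IsValid())
            {
                continue; // skip anything that fails its own validation
            }

            foreach (var recall in _knownRecalls
                .Where(r => r.AutomakerName == automaker.Name))
            {
                yield return recall;
            }
        }
    }
}
```

Because both sides of the call are IEnumerable, the caller can hand in an array, a List, or a LINQ query without converting anything first, and the results stream out lazily as they are read.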
Behaving badly in Public.
One of the key design decisions when writing any code is what level of access you should give each component. In general, my rule of thumb is “As little access as possible.” What this means is that I don’t want to make any class public that doesn’t need to be, and I don’t want to make any members public that are implementation details. In NoRM, we’ve tried to be judicious about what makes it into the Public API. There are a number of reasons for this:
- KISS: presenting 20 classes to the user when there are only a handful of relevant ones makes for a confusing introduction. NoRM effectively has 4 major and immediately relevant classes: “Mongo”, “MongoDatabase”, “MongoCollection”, and “MongoConfiguration”. Telling the user about the other X classes that make everything happen is an over-share. I don’t like over-shares.
- Making things public that shouldn’t be increases the likelihood that people can break your code in ways that you can’t conceive of. If one controls how code is accessed, there’s a whole class of issues that just cannot happen.
- A responsible project will understand that the software they’re helping to form will be used by other people. By making everything public, you’re implicitly giving license to consumers to use it however they wish, likely in scenarios that you couldn’t plan for. This may sound good, but when the class they use is for mainly infrastructure purposes and the project maintainers want to implement the feature in a different way, they’ll likely break the consumer.
- The code is Free and Open. If a consumer really *must* use some class that the project maintainers marked as private/protected/internal, then it’s a simple matter of going in and marking it public. I believe this has the added benefit of causing people to pause and ask the question: “Why was this marked in such a way as to prevent access to me? Maybe the designers had a good reason?”
- A corollary to the last point is that if there is a true roadblock in how someone wants to use the existing classes and codebase, we want to hear from them in the Google Group. We want to know how people are using the library. We want to add the features they need. I should note that I don’t think we’re infallible on this stuff. There are a number of features in NoRM that I think are wonderful, but had it been me alone working on the project, those features wouldn’t have been implemented. The point is, try to contribute your ideas and features to NoRM rather than rolling your own and then getting broken the next time there’s a release.
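To sketch what “as little access as possible” can look like in C#, the example below exposes one public entry point and keeps the supporting class internal. The names (`DocumentStore`, `WireFormatter`) are invented for this illustration and are not NoRM types.

```csharp
using System;
using System.Collections.Generic;

// The only type consumers are meant to see.
public class DocumentStore
{
    private readonly List<string> _saved = new List<string>();

    /// <summary>Saves a document and returns how many are now stored.</summary>
    public int Save(string name, string body)
    {
        _saved.Add(WireFormatter.Format(name, body));
        return _saved.Count;
    }
}

// Infrastructure detail: internal, so the maintainers can replace it
// without breaking code outside this assembly.
internal static class WireFormatter
{
    internal static string Format(string name, string body)
    {
        return name + ":" + body;
    }
}
```

If the maintainers later swap `WireFormatter` for something else entirely, code outside the assembly keeps compiling, because it could never reference that class in the first place.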
NoRM: A fantastic friction-free interface to MongoDB
As promised in my previous post, I am going to introduce you to a project that I’ve been working on with a great team of people on GitHub. NoRM is a .net library to interact with the document-oriented database MongoDB. We set about doing this in a way that makes sense for the C# developer who doesn’t want to spend an inordinate amount of time configuring the database. As you’ll see in a moment, with NoRM, there’s very little you need to do to get started with MongoDB. NoRM stands for “No Object-Relational Mapping” - it seems that people are concerned about leaving relational databases because they’ll lose the low-friction environments they’ve come to expect (think ‘LINQ-to-SQL’). Another concern around moving to a NoSQL option is the notion that these datastores carry little or no structure. By creating a strongly-typed interface to MongoDB, I feel that we have addressed both of these concerns. So, just to whet your appetite, here’s an example of how you’d use NoRM to store some widgets.
//First, define your document (this can be very similar to your concept of "Model";
// notice there are no special attributes or configuration.)
public class Widget
{
    public ObjectId Id {get;set;}
    public String Color {get;set;}
    public double Price {get;set;}
    public DateTime ReleaseDate {get;set;}
    //NOTE: the element type was missing from this listing; String is assumed.
    public IEnumerable<String> Reviews {get;set;}
}
//Next, spool up a connection to your database
//(The DB doesn't have to exist yet, but MongoDB DOES need to be running)
using(var db = Mongo.Create("mongo://localhost/ProductDB"))
{
    //Get a reference to the collection in which we want to
    //store our Widgets (doesn't have to exist yet.)
    var widgets = db.GetCollection<Widget>();
    //create a widget instance.
    var topSellingWidget = new Widget{ Id = ObjectId.NewObjectId(),
        Color = "Red", Price = 39.95,
        ReleaseDate = DateTime.Now, Reviews = Enumerable.Empty<String>() };
    //now, save the instance
    widgets.Save(topSellingWidget);
    //lastly, retrieve it from the DB.
    var hydratedTopSellingWidgetFromDB = widgets.FindOne(new {Color = "Red",
        Price = 39.95});
}
That’s just a taste of the simplicity that is NoRM. There’s a huge amount of functionality that I’m not covering, including:
- Fluent configuration
- Solid LINQ support
- Advanced update capabilities (update single or multiple documents that match template documents)
- Map/Reduce functionality.
- Support for MongoDB operators via the “Q” (Qualifiers) and “M” (Modifiers) classes.
- If you need it, “Weakly-typed” interaction via the “Expando” class.
- Bulk-insert capability (a la SqlBulkCopy)
Aside from actual “features”, there are lots of elements that make software good. Here are a few things I think make NoRM awesome:
- Tests: We have more than 400 tests that verify the functionality found in NoRM, and although we have just reached the v0.9.0 milestone, the library is very stable, and I am aware of production deployments using NoRM, today.
- Participation: I started NoRM in the last few days of January 2010, and have seen incredible participation from the Open Source community. I’ve learned a lot about what people do, and don’t, need in a library, and some of the interesting pieces of helping to shape an Open Source project (hopefully I can share that in another blog post). We have a vibrant community at http://groups.google.com/group/norm-mongodb.
- Pragmatism and Experience: I am routinely impressed by the ideas and code that the project’s contributors bring forward.
Please download and use the library:
- Stable Milestone: http://github.com/atheken/NoRM/zipball/v0.9.0
- Project Page (the master branch will always be “Stable”): http://github.com/atheken/norm
Remember that we want NoRM to be the best C# driver for MongoDB possible, so please give us feedback, either in the Google Group or on twitter (@atheken). Cheers.
No SQL, No Problems (or: Mo’ SQL, Mo’ Problems)
Let’s talk about NoSQL: If you’ve been following the latest trends in the .net world via reddit, twitter, or your favorite blogs, you’ve been seeing a great deal of chatter about a term called NoSQL. Some might even call it a ‘movement’ but that scares me just a little bit - when things become ‘movements’, their meanings become a bit nebulous. Here are a few cases where a ‘NoSQL’ solution might be useful to you:
- You have several multi-million row tables that each have foreign key relationships to one another; joins against these tables pound the server, but must return rapidly.
- You have “jagged” datasets where each record has a different composition of fields and children, those children in turn have their own properties.
- You need to find a root record based on a complex set of relations to other tables.
- Dynamic creation and allocation of databases and document collections (including the capability of having nested collections)
- Ability to do deep-graph searches (i.e. locate documents based on child properties of the document)
- Ability to execute arbitrary JavaScript functions for aggregation and as criteria of searching
- Very few requirements on what can be inserted into a database
- Support for regex searching.
- Speed
- …more…
Let me break it to you: NoSQL doesn’t mean NoStructure. To that end, I’ve been (with lots of help) incubating a project on GitHub called NoRM. In brief, NoRM is a library to connect to MongoDB using .Net (and Mono) and to query and hydrate documents into strongly & statically-typed documents in a way that doesn’t make us C# developers queasy.
We still have a long way to go before a 1.0 release, but I think it’s important to point out some of the guys who have been most enthusiastic about this project (and contributed some great code and ideas): James Avery, Jason Alexander, Rob Conery, Nate Kohari, and Karl Seguin. These guys really know their stuff, and I am so happy to have them working on this with me. My next post will be about some of the design decisions we’ve made with NoRM and how you might use it (soon) in some of your own projects.