Node.js App Basics - Part 3

20 Sep 2013

Feel free to review the other posts in this series:


Alright, so now we’re getting into the fun stuff!

Asynchronicity

Node.js is a great example of how choosing to rigorously stick to a constraint can really pay off.

As you might know, JavaScript is, by design, single-threaded (for our purposes, accept this as true). What this means is that you can do the following:

//WARNING: You should NEVER do this
//in javascript! (or probably anywhere else)

//set up JS to fire an event after 1 millisecond...
setTimeout(function(){
	console.log('called!');
}, 1);

while(true){
	//do nothing, but watch your CPU peg! ;-)
}

And the function with the log call will never run, simply because the while loop occupies our single thread: the “event” that should fire after one millisecond can’t be handled until the while loop completes, and that loop never does.

This behavior makes working with javascript conceptually very easy, as you don’t need to deal with any pesky locking structures for things that might otherwise happen simultaneously. For example, within a function body, there’s no chance of shared state getting modified by another thread.

In Node.js, this means that all the logic we write runs on a single thread, and that constraint forces libraries to be developed in a non-blocking way. The above code snippet is “blocking”: the while loop runs indefinitely, and since we’ve only got one thread, all other work in this node process is halted (and this is basically true in the browser, too).
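For contrast, here’s a rough sketch of a non-blocking version of that first snippet. Instead of hogging the thread with a while loop, the “other work” is scheduled on a timer too, so the thread is freed between ticks and the one-millisecond callback actually gets a chance to run.

//a non-blocking sketch of the snippet above

//set up JS to fire an event after 1 millisecond...
setTimeout(function(){
	console.log('called!');
}, 1);

//instead of a busy while loop, do the "other work" in
//small chunks on a timer; the thread is released between
//ticks, so the callback above can actually run.
setInterval(function(){
	//do a small chunk of work here
}, 10);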

Let’s try to put this notion into perspective:

You go out to a nice, busy restaurant. It’s a pretty fancy place, and only has a few tables available. Unfortunately, you didn’t plan ahead, and all the tables are in use by other folks. The host is quite polite, and takes down your phone number. He’ll call you as soon as a table becomes available. This is great because you don’t have to stand by the door, waiting. Instead, you’re freed up to go and have a drink at the bar. This also allows other customers (who also didn’t plan ahead) to come in and ask to be called when a table is available. Eventually, a table frees up, and the host calls you back to let you know you can now be seated for dinner. This system works well for both you and the restaurant: they can serve customers very efficiently, and you can enjoy a drink at the bar while waiting (instead of just standing in the way, staring the host down!).

Whether or not you believe me yet, Node.js (and asynchronous programming models in general) attempts to model the same interaction described above. Instead of a customer requesting a table, think of your node.js application making a request to an external resource (like a database, the file system, or a web service). Most of the time this involves a lot of waiting around, and node.js has been optimized to make the best of that situation.

To simplify, this is the model that node is designed for:

  1. The node process makes a request to a “slow” resource, providing a “callback” function for when the request completes.
  2. The node process continues to do other work, including making additional requests to other resources.
  3. The “callback” function is called with either error information or, if no error occurred, the response data.

Here’s a simple example using “nano,” a very nice, simple package for working with CouchDB in node.

//assume "db" is a connection to a database.

//look up the record with "this_is_a_key" as the id.
db.get('this_is_a_key', function(error, data){
	if(error){
		console.log("An error occurred when looking "+
			"up the record for the key.", error);
	}
	else{
		console.log("Got some data, w00t!", data);
	}
});

(Don’t worry if you don’t understand all of this; the main point is to see how the callback works.)

The example is a bit contrived, but the point is that there are almost no “blocking” functions in javascript; instead, the preferred style for anything that would normally block is to provide a callback and then observe the data (or error) when it becomes available. Note that in many cases it is possible to omit the callback entirely; this is sometimes called “fire and forget,” and there are cases where this is beneficial.
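As a quick sketch of “fire and forget” (purely hypothetical, and assuming the library you’re using is happy to be called without a callback), it might look like this:

//a sketch of "fire and forget" (assumes the library
//tolerates being called without a callback)

//record an event; we don't care whether it succeeds,
//so no callback is provided and no error is observed.
db.insert({type: 'log', message: 'search performed'});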

The db.get example above works great when you have a “one and done” requirement. But another common case is needing to manipulate a large batch of records. Since reading data from an external source is far slower than reading it from memory (often by orders of magnitude), it would be nice to do the manipulation on records as they become available.

Because node uses an “event loop,” this ends up being very simple, and not much different from the example above.

//Assume "db" is a connection to a database, and
//that it has a method that returns an "EventEmitter".

//get an object that will fire an
//event each time a row is available.
var selection = db.getRows();

//listen for "row" events as the rows 
//become available and handle them.
selection.on('row', function(data){
	console.log('row became available', data);
});

//it is common with this sort of 
//model to listen for an "error" event separately.
selection.on('error', function(error){
	console.log('an error happened. darn!', error);
});

The key difference here is that rather than having a single callback, we define callbacks to handle the specific events that can happen once we kick off the data selection. (See the EventEmitter docs here.) The event-based model is common in many languages, so this might already be familiar to you.
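To make that a little more concrete, here’s a rough sketch of how a getRows-style function could be built on node’s own EventEmitter. It’s illustrative only: fetchAllRows is a made-up stand-in for whatever actually talks to the database, and a real driver would emit rows as they stream in rather than collecting them all first.

var EventEmitter = require('events').EventEmitter;

//a rough sketch of a getRows-style function
function getRows(){
	var emitter = new EventEmitter();

	//fetchAllRows is a made-up stand-in for the real database call.
	fetchAllRows(function(error, rows){
		if(error){
			//let any listeners know something went wrong.
			emitter.emit('error', error);
			return;
		}
		rows.forEach(function(row){
			//fire a "row" event for each row in the results.
			emitter.emit('row', row);
		});
	});

	//return the emitter right away, before any data has come back.
	return emitter;
}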

Understanding asynchronicity is really important in node.js, and it’s the main reason a single node process is able to scale so well (though V8 is also pretty fast, and serves as a good base from which to build). The “single non-blocking thread” constraint allows us to use our CPU very efficiently, but as you can see in the first code snippet above, it’s also really easy to do something that can get you into trouble.

Now that we’ve covered this fundamental part of node.js, we can get back to our Wikipedia search app in the next installment.