Hidden Variables

Domenic's blog about coding and stuff

Strict Mode = Static Scoping

It is indeed time to start using JavaScript's strict mode. There are many reasons, but one of the most compelling is that it brings sanity to JavaScript's scoping rules, by guaranteeing static scoping.

Simply put, code is statically scoped if you can statically analyze it and determine what all the identifiers refer to. In other words, you can statically determine where every variable was declared. As we'll see, JavaScript's sloppy mode does not have this property, giving you yet one more reason to shun it in favor of "use strict".

Sloppy Mode = Dynamic Scoping

Most of the time, JavaScript scoping is fairly simple. You look up the scope chain, as declared in the source code; if you can't find something, it must be on the global object. But in sloppy mode, there are several situations that can foil this algorithm.

Use of with

Using the with statement completely destroys the sanity of your scope chain:

var __filename = "/my/cool/file.js";
var anotherContext = { __filename: "/another/file.js", __dirname: "/another" };

var context = Math.random() > 0.5 ? anotherContext : {};
with(context) {
  console.log(__filename);
}

In this example, we can't statically determine if console.log(__filename) is referring to the free __filename variable, set to "/my/cool/file.js", or if it's referring to the property of anotherContext, set to "/another/file.js".

Strict mode fixes this by banning with entirely. Thus, the above code would be a syntax error if it were placed in a strict context.
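To see the ban in action, here is a minimal sketch that compiles the same snippet with and without the pragma. The `parses` helper is a made-up name for this illustration, and `new Function` is used only so the syntax error can be caught at runtime instead of stopping the whole script.

```javascript
// Hypothetical helper: reports whether a function body compiles.
function parses(body) {
  try {
    new Function(body); // compiles the body without running it
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

var sloppyBody = "with ({ __filename: '/another/file.js' }) { return __filename; }";
var strictBody = '"use strict"; ' + sloppyBody;

console.log(parses(sloppyBody)); // true: sloppy mode accepts with
console.log(parses(strictBody)); // false: strict mode rejects it at parse time
```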

Use of eval

Using eval in sloppy mode will introduce new variable bindings into your scope chain:

function require(moduleId) {
    // loading code elided.
}

function requireStuff() {
  if (Math.random() > 0.5) {
    eval("function require(things) { console.log('We require more ' + things); }");
  }

  require("minerals");
}

In this example, we have a similar problem as before: eval might have dynamically introduced a new variable binding. Thus, require can refer either to the new function introduced by eval into requireStuff's scope, or to the function declared in the outer scope. We just can't know, until runtime!

Strict mode fixes this by disallowing eval from introducing new bindings. Thus, if the above code were strict, require("minerals") would always refer to the outer module-loading require function.

Why Does This Matter?

In addition to the obvious implications of this for optimization and inlining in JITers, this matters because static analysis is becoming increasingly important in modern, complex JavaScript.

For example, let's say you were writing a tool for using Node.js-style modules in the browser. To do so, you'd probably need to detect require calls; in particular, if you see require, you'll want to know what scope that require comes from: global or local? Similarly, you might want to detect references to __filename, __dirname, Buffer, etc. and make them available if they're detected.

But in sloppy mode, it is literally unknowable what a given variable refers to. At any time, an enclosing with or a nearby eval could come along, and really ruin your day. In such a setting, static analysis is doomed; as we've seen in the above examples, the meaning of identifiers like require or __filename can't be determined until runtime.

So? Just Don't Use Those Things

A common refrain from people who can't handle typing “use strict” is that they'll simply not use these features. And this indeed suffices: if you subset the language, perhaps using tools like JSHint to enforce your subsetting rules, you can create a more sane programming environment.

Similar arguments are applied commonly in other languages, like prohibiting the use of exceptions or templates in C++. Even telling people to not pass expressions to their require calls in Node.js modules falls under this category (with the rationale that this breaks the popular browserify tool).

I don't buy these arguments. A language should give its users built-in tools to use it correctly. In the case of JavaScript, there is one very clear tool that has been given: a simple, backward-compatible "use strict" pragma at the top of your source files. If you think that's difficult, try being a C++ programmer and writing exception-safe code: the techniques you need to use are a lot more involved than a single pragma.

Use Strict Mode

In the words of Mark Miller, ECMAScript 5 strict mode has transitioned JavaScript into the “actually good” category of programming languages. Let's use it. Opt in to static scoping, and a saner language in general. Use strict.

This post was inspired by a long-ago es-discuss thread, which references a talk by Mark Miller. Further clarifications were had in another, recent es-discuss thread.

Peer Dependencies

npm is awesome as a package manager. In particular, it handles sub-dependencies very well: if my package depends on request version 2 and some-other-library, but some-other-library depends on request version 1, the resulting dependency graph looks like:

├── request@2.12.0
└─┬ some-other-library@1.2.3
  └── request@1.9.9

This is, generally, great: now some-other-library has its own copy of request v1 that it can use, while not interfering with my package's v2 copy. Everyone's code works!

The Problem: Plugins

There's one use case where this falls down, however: plugins. A plugin package is meant to be used with another “host” package, even though it does not always directly use the host package. There are many examples of this pattern in the Node.js package ecosystem already, among them Grunt plugins, Chai plugins, and Winston transports such as winston-mail.

Even if you're not familiar with any of those use cases, surely you recall “jQuery plugins” from back when you were a client-side developer: little <script>s you would drop into your page that would attach things to jQuery.prototype for your later convenience.

In essence, plugins are designed to be used with host packages. But more importantly, they're designed to be used with particular versions of host packages. For example, versions 1.x and 2.x of my chai-as-promised plugin work with chai version 0.5, whereas versions 3.x work with chai 1.x. Or, in the faster-paced and less-semver–friendly world of Grunt plugins, version 0.3.1 of grunt-contrib-stylus works with grunt 0.4.0rc4, but breaks when used with grunt 0.4.0rc5 due to removed APIs.

As a package manager, a large part of npm's job when installing your dependencies is managing their versions. But its usual model, with a "dependencies" hash in package.json, clearly falls down for plugins. Most plugins never actually depend on their host package, i.e. grunt plugins never do require("grunt"), so even if plugins did put down their host package as a dependency, the downloaded copy would never be used. So we'd be back to square one, with your application possibly plugging in the plugin to a host package that it's incompatible with.

Even for plugins that do have such direct dependencies, probably due to the host package supplying utility APIs, specifying the dependency in the plugin's package.json would result in a dependency tree with multiple copies of the host package—not what you want. For example, let's pretend that winston-mail 0.2.3 specified "winston": "0.5.x" in its "dependencies" hash, since that's the latest version it was tested against. As an app developer, you want the latest and greatest stuff, so you look up the latest versions of winston and of winston-mail, putting them in your package.json as

{
  "dependencies": {
    "winston": "0.6.2",
    "winston-mail": "0.2.3"
  }
}

But now, running npm install results in the unexpected dependency graph of

├── winston@0.6.2
└─┬ winston-mail@0.2.3
  └── winston@0.5.11

I'll leave the subtle failures that come from the plugin using a different Winston API than the main application to your imagination.

The Solution: Peer Dependencies

What we need is a way of expressing these “dependencies” between plugins and their host package. Some way of saying, “I only work when plugged in to version 1.2.x of my host package, so if you install me, be sure that it's alongside a compatible host.” We call this relationship a peer dependency.

The peer dependency idea has been kicked around for literally years. After volunteering to get this done “over the weekend” nine months ago, I finally found a free weekend, and now peer dependencies are in npm!

Specifically, they were introduced in a rudimentary form in npm 1.2.0, and refined over the next few releases into something I'm actually happy with. Today Isaac packaged up npm 1.2.10 into Node.js 0.8.19, so if you've installed the latest version of Node, you should be ready to use peer dependencies!

As proof, I present you the results of trying to install jitsu 0.11.6 with npm 1.2.10:

npm ERR! peerinvalid The package flatiron does not satisfy its siblings' peerDependencies requirements!
npm ERR! peerinvalid Peer flatiron-cli-config@0.1.3 wants flatiron@~0.1.9
npm ERR! peerinvalid Peer flatiron-cli-users@0.1.4 wants flatiron@~0.3.0

As you can see, jitsu depends on two Flatiron-related packages, which themselves peer-depend on conflicting versions of Flatiron. Good thing npm was around to help us figure out this conflict, so it could be fixed in version 0.11.7!

Using Peer Dependencies

Peer dependencies are pretty simple to use. When writing a plugin, figure out what version of the host package you peer-depend on, and add it to your package.json:

{
  "name": "chai-as-promised",
  "peerDependencies": {
    "chai": "1.x"
  }
}

Now, when installing chai-as-promised, the chai package will come along with it. And if later you try to install another Chai plugin that only works with 0.x versions of Chai, you'll get an error. Nice!

One piece of advice: peer dependency requirements, unlike those for regular dependencies, should be lenient. You should not lock your peer dependencies down to specific patch versions. It would be really annoying if one Chai plugin peer-depended on Chai 1.4.1, while another depended on Chai 1.5.0, simply because the authors were lazy and didn't spend the time figuring out the actual minimum version of Chai they are compatible with.

The best way to determine what your peer dependency requirements should be is to actually follow semver. Assume that only changes in the host package's major version will break your plugin. Thus, if you've worked with every 1.x version of the host package, use "~1" or "1.x" to express this. If you depend on features introduced in 1.5.2, use ">= 1.5.2 < 2".
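To make the meaning of a range like ">= 1.5.2 < 2" concrete, here is a simplified sketch of the check it implies. This is not npm's actual matcher: the function names are invented for illustration, and it only handles plain x.y.z versions with no prerelease tags.

```javascript
// Splits "1.5.2" into [1, 5, 2].
function parseVersion(version) {
  return version.split(".").map(Number);
}

// Returns -1, 0, or 1, comparing major, then minor, then patch.
function compareVersions(a, b) {
  var left = parseVersion(a);
  var right = parseVersion(b);
  for (var i = 0; i < 3; i++) {
    if (left[i] !== right[i]) {
      return left[i] < right[i] ? -1 : 1;
    }
  }
  return 0;
}

// True when minimum <= version < the next major version after minimum.
function satisfiesWithinMajor(version, minimum) {
  var nextMajor = (parseVersion(minimum)[0] + 1) + ".0.0";
  return compareVersions(version, minimum) >= 0 &&
    compareVersions(version, nextMajor) < 0;
}

console.log(satisfiesWithinMajor("1.9.0", "1.5.2")); // true
console.log(satisfiesWithinMajor("1.5.1", "1.5.2")); // false
console.log(satisfiesWithinMajor("2.0.0", "1.5.2")); // false
```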

Now go forth, and peer depend!

You’re Missing the Point of Promises

This post originally appeared as a gist. Since then, the development of Promises/A+ has made its emphasis on the Promises/A spec seem somewhat outdated.

Promises are a software abstraction that makes working with asynchronous operations much more pleasant. In the most basic definition, your code will move from continuation-passing style:

getTweetsFor("domenic", function (err, results) {
  // the rest of your code goes here.
});

to one where your functions return a value, called a promise, which represents the eventual results of that operation.

var promiseForTweets = getTweetsFor("domenic");

This is powerful since you can now treat these promises as first-class objects, passing them around, aggregating them, and so on, instead of inserting dummy callbacks that tie together other callbacks in order to do the same.
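As a rough sketch of that move, here is a callback-style function wrapped so that it returns a promise. It assumes a modern runtime with a built-in `Promise` (the libraries discussed below offer equivalents), and `getTweetsForWithCallback` is a hypothetical stand-in for a real Twitter client.

```javascript
// Hypothetical callback-style API: calls back asynchronously with fake data.
function getTweetsForWithCallback(user, callback) {
  setTimeout(function () {
    callback(null, ["tweet from " + user]);
  }, 0);
}

// Promise-returning wrapper around the callback-style function.
function getTweetsFor(user) {
  return new Promise(function (resolve, reject) {
    getTweetsForWithCallback(user, function (err, results) {
      if (err) {
        reject(err);
      } else {
        resolve(results);
      }
    });
  });
}

// The promise is a first-class value: store it, pass it around, derive from it.
var promiseForTweets = getTweetsFor("domenic");
var promiseForCount = promiseForTweets.then(function (tweets) {
  return tweets.length;
});

promiseForCount.then(function (count) {
  console.log(count); // 1
});
```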

I've talked about how cool I think promises are at length. This essay isn't about that. Instead, it's about a disturbing trend I am seeing in recent JavaScript libraries that have added promise support: they completely miss the point of promises.

Thenables and CommonJS Promises/A

When someone says “promise” in a JavaScript context, usually they mean—or at least think they mean—CommonJS Promises/A. This is one of the smallest “specs” I've seen. The meat of it is entirely about specifying the behavior of a single function, then:

A promise is defined as an object that has a function as the value for the property then:

then(fulfilledHandler, errorHandler, progressHandler)

Adds a fulfilledHandler, errorHandler, and progressHandler to be called for completion of a promise. The fulfilledHandler is called when the promise is fulfilled. The errorHandler is called when a promise fails. The progressHandler is called for progress events. All arguments are optional and non-function values are ignored. The progressHandler is not only an optional argument, but progress events are purely optional. Promise implementors are not required to ever call a progressHandler (the progressHandler may be ignored), this parameter exists so that implementors may call it if they have progress events to report.

This function should return a new promise that is fulfilled when the given fulfilledHandler or errorHandler callback is finished. This allows promise operations to be chained together. The value returned from the callback handler is the fulfillment value for the returned promise. If the callback throws an error, the returned promise will be moved to failed state.

People mostly understand the first paragraph. It boils down to callback aggregation. You use then to attach callbacks to a promise, whether for success or for errors (or even progress). When the promise transitions state—which is out of scope of this very small spec!—your callbacks will be called. This is pretty useful, I guess.

What people don't seem to notice is the second paragraph. Which is a shame, since it's the most important one.

What Is the Point of Promises?

The thing is, promises are not about callback aggregation. That's a simple utility. Promises are about something much deeper, namely providing a direct correspondence between synchronous functions and asynchronous functions.

What does this mean? Well, there are two very important aspects of synchronous functions:

  • They return values
  • They throw exceptions

Both of these are essentially about composition. That is, you can feed the return value of one function straight into another, and keep doing this indefinitely. More importantly, if at any point that process fails, one function in the composition chain can throw an exception, which then bypasses all further compositional layers until it comes into the hands of someone who can handle it with a catch.

Now, in an asynchronous world, you can no longer return values: they simply aren't ready in time. Similarly, you can't throw exceptions, because nobody's there to catch them. So we descend into the so-called “callback hell,” where composition of return values involves nested callbacks, and composition of errors involves passing them up the chain manually, and oh by the way you'd better never throw an exception or else you'll need to introduce something crazy like domains.

The point of promises is to give us back functional composition and error bubbling in the async world. They do this by saying that your functions should return a promise, which can do one of two things:

  • Become fulfilled by a value
  • Become rejected with an exception

And, if you have a correctly implemented then function that follows Promises/A, then fulfillment and rejection will compose just like their synchronous counterparts, with fulfillments flowing up a compositional chain, but being interrupted at any time by a rejection that is only handled by someone who declares they are ready to handle it.

In other words, the following asynchronous code:

getTweetsFor("domenic") // promise-returning function
  .then(function (tweets) {
    var shortUrls = parseTweetsForUrls(tweets);
    var mostRecentShortUrl = shortUrls[0];
    return expandUrlUsingTwitterApi(mostRecentShortUrl); // promise-returning function
  })
  .then(httpGet) // promise-returning function
  .then(
    function (responseBody) {
      console.log("Most recent link text:", responseBody);
    },
    function (error) {
      console.error("Error with the twitterverse:", error);
    }
  );

parallels the synchronous code:

try {
  var tweets = getTweetsFor("domenic"); // blocking
  var shortUrls = parseTweetsForUrls(tweets);
  var mostRecentShortUrl = shortUrls[0];
  var responseBody = httpGet(expandUrlUsingTwitterApi(mostRecentShortUrl)); // blocking x 2
  console.log("Most recent link text:", responseBody);
} catch (error) {
  console.error("Error with the twitterverse: ", error);
}

Note in particular how errors flowed from any step in the process to our catch handler, without explicit by-hand bubbling code. And with the upcoming ECMAScript 6 revision of JavaScript, plus some party tricks, the code becomes not only parallel but almost identical.

That Second Paragraph

All of this is essentially enabled by that second paragraph:

This function should return a new promise that is fulfilled when the given fulfilledHandler or errorHandler callback is finished. This allows promise operations to be chained together. The value returned from the callback handler is the fulfillment value for the returned promise. If the callback throws an error, the returned promise will be moved to failed state.

In other words, then is not a mechanism for attaching callbacks to an aggregate collection. It's a mechanism for applying a transformation to a promise, and yielding a new promise from that transformation.

This explains the crucial first phrase: “this function should return a new promise.” Libraries like jQuery (before 1.8) don't do this: they simply mutate the state of the existing promise. That means if you give a promise out to multiple consumers, they can interfere with its state. To realize how ridiculous that is, consider the synchronous parallel: if you gave out a function's return value to two people, and one of them could somehow change it into a thrown exception! Indeed, Promises/A points this out explicitly:
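A small sketch of what "return a new promise" buys you, using the built-in `Promise` of modern runtimes as an assumed stand-in for a compliant implementation: one consumer's failure rejects only the promise that then handed back to it, and the original stays fulfilled for everyone else.

```javascript
var original = Promise.resolve("tweets");

// One consumer's handler throws; this rejects only the derived promise.
var derived = original.then(function () {
  throw new Error("consumer failed");
});

// A second consumer of the original still sees the fulfillment value.
var secondConsumer = original.then(function (value) {
  return "still got " + value;
});

derived.then(null, function (error) {
  console.log(error.message); // consumer failed
});
secondConsumer.then(function (message) {
  console.log(message); // still got tweets
});
```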

Once a promise is fulfilled or failed, the promise's value MUST not be changed, just as a values in JavaScript, primitives and object identities, can not change (although objects themselves may always be mutable even if their identity isn't).

Now consider the last two sentences. They inform how this new promise is created. In short:

  • If either handler returns a value, the new promise is fulfilled with that value.
  • If either handler throws an exception, the new promise is rejected with that exception.

This breaks down into four scenarios, depending on the state of the promise. Here we give their synchronous parallels so you can see why it's crucially important to have semantics for all four:

  1. Fulfilled, fulfillment handler returns a value: simple functional transformation
  2. Fulfilled, fulfillment handler throws an exception: getting data, and throwing an exception in response to it
  3. Rejected, rejection handler returns a value: a catch clause got the error and handled it
  4. Rejected, rejection handler throws an exception: a catch clause got the error and re-threw it (or a new one)
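The four scenarios can be sketched in code. This again assumes a modern runtime whose built-in `Promise` follows these semantics; `Promise.allSettled` is used only to collect the outcomes for printing.

```javascript
var fulfilled = Promise.resolve(2);
var rejected = Promise.reject(new Error("boom"));

var scenarios = [
  // 1. Fulfilled, handler returns a value.
  fulfilled.then(function (value) { return value * 2; }),
  // 2. Fulfilled, handler throws.
  fulfilled.then(function () { throw new Error("bad data"); }),
  // 3. Rejected, handler returns a value (like a catch that recovers).
  rejected.then(null, function (error) { return "recovered from " + error.message; }),
  // 4. Rejected, handler throws (like a catch that re-throws).
  rejected.then(null, function (error) { throw new Error("rethrown " + error.message); })
];

var outcomes = Promise.allSettled(scenarios).then(function (results) {
  return results.map(function (result) {
    return result.status === "fulfilled"
      ? "fulfilled: " + result.value
      : "rejected: " + result.reason.message;
  });
});

outcomes.then(function (lines) {
  console.log(lines.join("\n"));
  // fulfilled: 4
  // rejected: bad data
  // fulfilled: recovered from boom
  // rejected: rethrown boom
});
```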

Without these transformations being applied, you lose all the power of the synchronous/asynchronous parallel, and your so-called “promises” become simple callback aggregators. This is the problem with jQuery's current “promises”: they only support scenario 1 above, omitting entirely support for scenarios 2–4. This was also the problem with Node.js 0.1's EventEmitter-based “promises” (which weren't even thenable).

Furthermore, note that by catching exceptions and transforming them into rejections, we take care of both intentional and unintentional exceptions, just like in sync code. That is, if you write aFunctionThatDoesNotExist() in either handler, your promise becomes rejected and that error will bubble up the chain to the nearest rejection handler just as if you had written throw new Error("bad data"). Look ma, no domains!

So What?

Maybe you're breathlessly taken by my inexorable logic and explanatory powers. More likely, you're asking yourself why this guy is raging so hard over some poorly-behaved libraries.

Here's the problem:

A promise is defined as an object that has a function as the value for the property then

As authors of Promises/A-consuming libraries, we would like to assume this statement to be true: that something that is “thenable” actually behaves as a Promises/A promise, with all the power that entails.

If you can make this assumption, you can write very extensive libraries that are entirely agnostic to the implementation of the promises they accept! Whether they be from Q, when.js, or even WinJS, you can use the simple composition rules of the Promises/A spec to build on promise behavior. For example, here's a generalized retry function that works with any Promises/A implementation.
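To give a flavor of what such implementation-agnostic code looks like, here is my own minimal sketch of a retry helper (not the exact function referred to above). It touches nothing but then, so it should work with any promise whose then follows the composition rules; the `flaky` operation and the use of the built-in `Promise` are assumptions for the demo.

```javascript
// Retries a promise-returning operation, relying only on then.
function retry(operation, attempts) {
  return operation().then(null, function (error) {
    if (attempts <= 1) {
      throw error; // out of attempts: propagate the rejection
    }
    return retry(operation, attempts - 1);
  });
}

// Hypothetical flaky operation: fails twice, then succeeds.
var calls = 0;
function flaky() {
  calls += 1;
  return calls < 3
    ? Promise.reject(new Error("attempt " + calls + " failed"))
    : Promise.resolve("succeeded on attempt " + calls);
}

var result = retry(flaky, 5);
result.then(function (value) {
  console.log(value); // succeeded on attempt 3
});
```

Note that this depends on scenarios 3 and 4 above: a rejection handler that returns recovers, and one that throws keeps the failure bubbling.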

Unfortunately, libraries like jQuery break this. This necessitates ugly hacks to detect the presence of objects masquerading as promises, and who call themselves in their API documentation promises, but aren't really Promises/A promises. If the consumers of your API start trying to pass you jQuery promises, you have two choices: fail in mysterious and hard-to-decipher ways when your compositional techniques fail, or fail up-front and block them from using your library entirely. This sucks.

The Way Forward

So this is why I want to avoid an unfortunate callback aggregator solution ending up in Ember. That's why I wrote this essay. And that's why, in the hours following writing the original version of this essay, I worked up a general Promises/A compliance suite that we can all use to get on the same page in the future.

Since the release of that test suite, great progress has been made in promise interoperability and understanding. One library, rsvp.js, was released with the explicit goal of providing these features of Promises/A. Others followed suit. But the most exciting result was the formation of the Promises/A+ organization, a loose coalition of implementors who have produced the Promises/A+ specification extending and clarifying the prose of the original Promises/A spec into something unambiguous and well-tested.

There's still work to be done, of course. Notably, at the time of writing, the latest jQuery version is 1.9.1, and its promises implementation is completely broken with regard to the error handling semantics. Hopefully, with the above explanation to set the stage and the Promises/A+ spec and test suite in place, this problem can be corrected in jQuery 2.0.

In the meantime, here are the libraries that conform to Promises/A+, and that I can thus unreservedly recommend:

  • Q by Kris Kowal and myself: a full-featured promise library with a large, powerful API surface, adapters for Node.js, progress support, and preliminary support for long stack traces.
  • RSVP.js by Yehuda Katz: a very small and lightweight, but still fully compliant, promise library.
  • when.js by Brian Cavalier: an intermediate library with utilities for managing collections of eventual tasks, as well as support for both progress and cancellation.

If you are stuck with a crippled “promise” from a source like jQuery, I recommend using one of the above libraries' assimilation utilities (usually under the name when) to convert to a real promise as soon as possible. For example:

var promise = Q.when($.get("https://github.com/kriskowal/q"));
// aaaah, much better

Portable Node.js Code

This post originally appeared as a gist, and then on the Node on Azure blog.

Node.js core does its best to treat every platform equally. Even if most Node developers use OS X day to day, some use Windows, and most everyone deploys to Linux or Solaris. So it's important to keep your code portable between platforms, whether you're writing a library or an application.

Predictably, most cross-platform issues come from Windows. Things just work differently there! But if you're careful, and follow some simple best practices, your code can run just as well on Windows systems.

Paths and URLs

On Windows, paths are constructed with backslashes instead of forward slashes. So if you do your directory manipulation by splitting on "/" and playing with the resulting array, your code will fail dramatically on Windows.

Instead, you should be using the path module. So instead of resolving paths with string concatenation, e.g. x + "/" + y, you should instead do path.resolve(x, y). Similarly, instead of relativizing paths with string replacement, e.g. x.replace(/^parent\/dirs\//, ""), you should do path.relative("/parent/dirs", x).
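Here is a short sketch of what the path module does for you. It uses `path.posix` and `path.win32`, which expose each platform's behavior explicitly, so the output is the same wherever you run it; plain `path.join` and friends pick whichever variant matches the current platform. The file names are made up.

```javascript
var path = require("path");

// The same call produces the right separator for each platform.
console.log(path.posix.join("parent", "dirs", "file.js")); // parent/dirs/file.js
console.log(path.win32.join("parent", "dirs", "file.js")); // parent\dirs\file.js

// Relativizing without string replacement.
console.log(path.posix.relative("/parent/dirs", "/parent/dirs/lib/file.js")); // lib/file.js
console.log(path.win32.relative("C:\\parent\\dirs", "C:\\parent\\dirs\\lib\\file.js")); // lib\file.js
```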

Another area of concern is that, when writing portable code, you cannot count on URLs and module IDs having the same separators as paths. If you use something like path.join on a URL, Windows users will get URLs containing backslashes! Similarly for path.normalize, or in general any path methods. All this applies if you're working with module IDs, too: they are forward-slash delimited, so you shouldn't use path functions with them either.
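A quick sketch of the URL pitfall, with made-up segments and host. `path.win32` simulates what a Windows user would get from plain `path.join`, and the global `URL` class from modern Node.js is assumed for the final step.

```javascript
var path = require("path");

var segments = ["api", "v1", "tweets"];

// What path.join would produce on Windows: not a usable URL path.
var viaPath = path.win32.join.apply(path.win32, segments);
console.log(viaPath); // api\v1\tweets

// URLs and module IDs always use forward slashes, so join them by hand.
var viaSlash = segments.join("/");
console.log(viaSlash); // api/v1/tweets

var url = new URL(viaSlash, "http://example.com/").href;
console.log(url); // http://example.com/api/v1/tweets
```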

Non-portable APIs

Windows is completely missing the process.(get|set)(gid|uid) methods, so calling them will instantly crash your program on Windows. Always guard such calls with a conditional.
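Such a guard can be as simple as a typeof check. A minimal sketch, with `describeUser` being a hypothetical name:

```javascript
// Only call process.getuid where the platform provides it.
function describeUser() {
  if (typeof process.getuid === "function") {
    return "uid " + process.getuid();
  }
  return "uid unavailable on " + process.platform;
}

console.log(describeUser());
```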

The fs.watchFile API is not sufficiently cross-platform, and is recommended against in the docs because of it. You should use fs.watch instead.

The child_process module requires care cross-platform. In particular, spawn and execFile do not execute in a shell, which means that on Windows only .exe files will run. This is rather problematic, as many cross-platform binaries are included on Windows as .cmd or .bat files, among them Git, CouchDB, and many others. So if you're using these APIs, things will likely work great on OS X, Linux, etc. But when you tell your users “just install the Git build for Windows, and make sure it's in your path!” that ends up not being sufficient. There is talk of fixing this behavior in libuv, but that's still tentative. In the meantime, if you don't need to stream your output, exec works well. Otherwise you'll need branching logic to take care of Windows.
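One possible shape for that branching logic is sketched below. The `spawnArguments` helper is hypothetical, and routing through `cmd.exe /c` is just one common workaround rather than anything Node prescribes; arguments containing spaces or special characters would need more careful quoting than this shows.

```javascript
// Hypothetical helper: decides what to hand to child_process.spawn.
function spawnArguments(command, args, platform) {
  if (platform === "win32") {
    // Let cmd.exe resolve .cmd and .bat wrappers found on the PATH.
    return { file: "cmd.exe", args: ["/c", command].concat(args) };
  }
  return { file: command, args: args };
}

var plan = spawnArguments("git", ["status"], process.platform);
console.log(plan.file, plan.args);

// Then, for example:
// require("child_process").spawn(plan.file, plan.args, { stdio: "inherit" });
```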

A final edge-case comes when using named sockets, e.g. with net.connect. On Unix, simple filenames suffice, but on Windows, they must conform to a bizarre syntax. There's not really a better solution for this than branching per-platform.
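The per-platform branch might look something like this sketch. The `socketPath` helper and the `/tmp` location are assumptions for illustration; the `\\.\pipe\` prefix is the named-pipe syntax Windows expects.

```javascript
var path = require("path");

// Hypothetical helper: builds a value to pass to net.connect or server.listen.
function socketPath(name, platform) {
  if (platform === "win32") {
    return "\\\\.\\pipe\\" + name; // i.e. \\.\pipe\<name>
  }
  return path.posix.join("/tmp", name + ".sock");
}

console.log(socketPath("my-app", "win32")); // \\.\pipe\my-app
console.log(socketPath("my-app", "linux")); // /tmp/my-app.sock
```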

Being Windows-Developer Friendly

One of the most egregious problems with many projects is their unnecessary use of Unix Makefiles. Windows does not have a make command, so the tasks stored in these files are entirely inaccessible to Windows users who might try to contribute to your project. This is especially annoying if you put your test command in there!

Fortunately, we have a solution: npm comes with a scripts feature where you can include commands to be run for testing (test), installation (install), building (prepublish), and starting your app (start), among many others. You can also create custom scripts, which are then run with npm run <script-name>; I often use this for lint steps. Also of note, you can reference any commands your app depends on by their short names here: for example, "mocha" instead of "./node_modules/.bin/mocha". So, please use these! If you must have a Makefile for whatever reason, just have it delegate to an npm script.

Another crucially important step is not using Unix shell scripts as part of your development process. Windows doesn't have bash, or ls, or mv, or any of those other commands you might use. Instead, write your shell scripts in JavaScript, using a tool like Grunt if you'd like.
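As a small example of what that looks like, here is a sketch of a build step that would otherwise be a few lines of shell. It assumes a reasonably recent Node.js (`fs.rmSync` is not available in old versions) and a flat source directory containing only files; the directory and file names are made up, and the demo runs in a throwaway temporary directory.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Rough equivalent of: rm -rf build && mkdir -p build && cp src/* build/
function rebuild(sourceDir, buildDir) {
  fs.rmSync(buildDir, { recursive: true, force: true });
  fs.mkdirSync(buildDir, { recursive: true });
  fs.readdirSync(sourceDir).forEach(function (name) {
    fs.copyFileSync(path.join(sourceDir, name), path.join(buildDir, name));
  });
  return fs.readdirSync(buildDir);
}

// Demo setup: a temporary project with a single source file.
var root = fs.mkdtempSync(path.join(os.tmpdir(), "portable-"));
var sourceDir = path.join(root, "src");
fs.mkdirSync(sourceDir);
fs.writeFileSync(path.join(sourceDir, "index.js"), "module.exports = 42;\n");

var copied = rebuild(sourceDir, path.join(root, "build"));
console.log(copied); // [ 'index.js' ]
```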

Bonus: Something that Breaks on Linux and Solaris!

Both Windows and, by default, OS X, use case-insensitive file systems. That means if you install a package named foo, any of require("foo") or require("FOO") or require("fOo") will work—on Windows and OS X. But then when you go to deploy your code, out of your development environment and into your Linux or Solaris production system, the latter two will not work! So it's a little thing, but make sure you always get your module and package name casing right.

Conclusion

As you can see, writing cross-platform code is sometimes painful. Usually, it's just a matter of best practices, like using the path module or remembering that URLs are different from filesystem paths. But sometimes there are APIs that just don't work cross-platform, or have annoying quirks that necessitate branching code.

Nevertheless, it's worth it. Node.js is the most exciting software development platform in recent memory, and one of its greatest strengths is its portable nature. Try your best to uphold that!