
Navigating files and directories with NodeJS


Listing files in directories has to be one of the most common actions that users perform. The behavior is virtually the same whether you run ls on a Unix system or navigate visually through a GUI. However, the implementation can vary wildly.

A few approaches

There are plenty of examples on Stack Overflow, as this topic seems to be rehashed every so often. For JavaScript development, the temptation to pull in a package from NPM is all too great, and there are several options to choose from. One of them, totalist, is quite popular, with 1M+ weekly downloads, although the implementation can be improved on, as we'll see below.

The answers on Stack Overflow are, as expected, less polished than the libraries on NPM. Many of them read directories synchronously, or call fs.stat to check whether an entry is a file or a directory instead of relying on the output of fs.readdir. Many are also more convoluted than necessary, or provide an iterative solution even though a recursive one is more concise and can display the full directory tree structure. Articles written on this topic tend to follow a similar pattern.

Reinventing the wheel

The question then is whether we can do a little bit better here (the answer is yes) by:

1. Listing the full tree structure of a folder
2. Doing it in an asynchronous, non-blocking way
3. Using only native JavaScript (NodeJS) functionality (no npm packages!)

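Take a look at the following implementation. It uses the Dirent objects that fs.readdir returns when withFileTypes is set (NodeJS v10.10.0+) and Array.prototype.flat (NodeJS v11.0.0+). As written, with the "fs/promises" import path and a top-level await, it has to run as an ES module on NodeJS v14.8.0+. It recurses whenever it encounters another directory, so the returned array contains the files in the starting directory and in every nested directory as well: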

import { readdir } from "fs/promises"; // "fs/promises" import path: NodeJS v14.0.0+
import { join } from "path";

async function ls(path = ".") {
  // withFileTypes (NodeJS v10.10.0+) makes readdir resolve to Dirent objects
  const directories = await readdir(path, { withFileTypes: true });
  const files = await Promise.all(
    directories.map(async (file) => {
      const filepath = join(path, file.name);
      if (file.isDirectory()) return ls(filepath);
      else if (file.isFile()) return filepath;
      return []; // Skip symlinks, sockets, etc.; flat() drops empty arrays
    })
  );
  return files.flat(); // Available NodeJS v11.0.0+
}

// Top-level await requires an ES module (NodeJS v14.8.0+)
console.log(await ls("./node_modules/typescript"));
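To try the function without depending on what happens to be in node_modules, here is a minimal sketch that builds a small throwaway tree in the system temp directory and lists it. The fixture layout and the `ls-demo-` prefix are assumptions made for illustration, and the copy of ls returns an empty array for entries that are neither files nor directories so that flat() drops them.

```javascript
import { mkdtemp, mkdir, writeFile, readdir, rm } from "fs/promises";
import { tmpdir } from "os";
import { join, relative } from "path";

// Recursive listing, repeated here so the sketch runs on its own.
async function ls(path = ".") {
  const entries = await readdir(path, { withFileTypes: true });
  const files = await Promise.all(
    entries.map(async (entry) => {
      const filepath = join(path, entry.name);
      if (entry.isDirectory()) return ls(filepath);
      if (entry.isFile()) return filepath;
      return []; // neither file nor directory: contributes nothing after flat()
    })
  );
  return files.flat();
}

// Hypothetical fixture: root/a.txt, root/sub/b.txt, root/sub/deep/c.txt
const root = await mkdtemp(join(tmpdir(), "ls-demo-"));
await mkdir(join(root, "sub", "deep"), { recursive: true });
await writeFile(join(root, "a.txt"), "a");
await writeFile(join(root, "sub", "b.txt"), "b");
await writeFile(join(root, "sub", "deep", "c.txt"), "c");

// Report paths relative to the fixture root, sorted for a stable order.
const listed = (await ls(root)).map((file) => relative(root, file)).sort();
console.log(listed);

// Clean up the temporary tree.
await rm(root, { recursive: true, force: true });
```

Each recursive call already returns a flat array, which is why a single-level flat() at every level is enough to flatten the whole tree.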

Essentially, passing { withFileTypes: true } changes the return type of readdir to Promise<Dirent[]>. Compare the two overloads below, the first for withFileTypes set to false (or omitted) and the second for true. Notice how different the results are, with the first resolving to strings or buffers and the second to Dirent objects:

/**
 * Asynchronous readdir(3) - read a directory.
 * @param path A path to a file. If a URL is provided, it must use the `file:` protocol.
 * @param options The encoding (or an object specifying the encoding), used as the encoding of the result. If not provided, `'utf8'` is used.
 */
function readdir(path: PathLike, options?: BaseEncodingOptions & { withFileTypes?: false } | BufferEncoding | null): Promise<string[] | Buffer[]>;

/**
 * Asynchronous readdir(3) - read a directory.
 * @param path A path to a file. If a URL is provided, it must use the `file:` protocol.
 * @param options If called with `withFileTypes: true` the result data will be an array of Dirent.
 */
function readdir(path: PathLike, options: BaseEncodingOptions & { withFileTypes: true }): Promise<Dirent[]>;

The Dirent object type allows us to avoid an additional function call, the fs.stat on every file, and instead rely on the methods that it already has, namely isFile and isDirectory:

export class Dirent {
  isFile(): boolean;
  isDirectory(): boolean;
  isBlockDevice(): boolean;
  isCharacterDevice(): boolean;
  isSymbolicLink(): boolean;
  isFIFO(): boolean;
  isSocket(): boolean;
  name: string;
}
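To make the saving concrete, the following sketch puts the stat-per-entry pattern next to the Dirent-based one and runs both on the same throwaway directory. The helper names kindsWithStat and kindsWithDirent, and the fixture itself, are hypothetical and only serve the comparison.

```javascript
import { mkdtemp, mkdir, writeFile, readdir, stat, rm } from "fs/promises";
import { tmpdir } from "os";
import { join } from "path";

// Pattern to avoid: readdir for the names, then one extra stat call per entry.
async function kindsWithStat(dir) {
  const names = await readdir(dir);
  const kinds = {};
  for (const name of names) {
    const info = await stat(join(dir, name)); // additional call for every entry
    kinds[name] = info.isDirectory() ? "dir" : "file";
  }
  return kinds;
}

// Dirent-based version: the type arrives with the directory listing itself.
async function kindsWithDirent(dir) {
  const entries = await readdir(dir, { withFileTypes: true });
  const kinds = {};
  for (const entry of entries) {
    kinds[entry.name] = entry.isDirectory() ? "dir" : "file";
  }
  return kinds;
}

// Hypothetical fixture: one file and one subdirectory.
const root = await mkdtemp(join(tmpdir(), "dirent-demo-"));
await writeFile(join(root, "a.txt"), "a");
await mkdir(join(root, "sub"));

const viaStat = await kindsWithStat(root);
const viaDirent = await kindsWithDirent(root);
console.log(viaStat, viaDirent);

// Clean up the temporary directory.
await rm(root, { recursive: true, force: true });
```

Both helpers classify this fixture identically, but the first makes one additional call per entry. One behavioural difference is worth knowing: fs.stat follows symbolic links, whereas a Dirent describes the link itself.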

Still, the example above is a bit involved: it awaits promises, maps over arrays, and flattens the results. We can simplify it further by using generators, and gain some really useful functionality along the way.

Working with generators

Generators are not a new JavaScript concept. NodeJS has supported them since version 4.0, and the async generators and for await loops used later in this article since version 10. Let's take a look at a simple example from the MDN documentation:

function* generator() {
  yield 1;
  yield 2;
  yield 3;
}

const gen = generator(); // nothing has run yet; the body starts on the first next()

console.log(gen.next().value); // 1 is returned, execution paused at "yield 1"
console.log(gen.next().value); // 2 is returned, execution paused at "yield 2"
console.log(gen.next().value); // 3 is returned, execution paused at "yield 3"
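As a complement to the MDN example, here is a small sketch of how a generator object behaves as both an iterable and an iterator, and what happens once its values run out. The counter function is a hypothetical stand-in with the same three yields.

```javascript
function* counter() {
  yield 1;
  yield 2;
  yield 3;
}

// A generator object is its own iterator...
const gen = counter();
const isOwnIterator = gen[Symbol.iterator]() === gen;

// ...so for...of and spread can drive it, calling next() on our behalf.
const viaLoop = [];
for (const n of counter()) viaLoop.push(n);
const viaSpread = [...counter()];

// Once every yield has been consumed, a further next() reports done: true.
const it = counter();
it.next();
it.next();
it.next();
const finished = it.next();

console.log(isOwnIterator, viaLoop, viaSpread, finished);
```

The fourth next() call is the one on which the function body actually runs to completion.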

The function* notation defines a generator function. Calling it returns a generator object that conforms to both the iterable and the iterator protocols. The yield keyword pauses the function and hands a value back to the caller; calling .next() resumes execution and returns the next yielded value. Note that a for...of loop calls .next() automatically. So, knowing this, let's rewrite our ls implementation above using generators:

import { readdir } from "fs/promises";
import { join } from "path";

async function* ls(path = ".") {
  const directories = await readdir(path, { withFileTypes: true });
  for (const file of directories) {
    if (file.isDirectory()) yield* ls(join(path, file.name));
    else if (file.isFile()) yield join(path, file.name);
  }
}

/** We need to iterate over the results */
async function run() {
  for await (const result of ls("./node_modules/typescript")) {
    console.log(result);
  }
}

await run();

The main difference now is that we no longer have the Promise.all call. We simply iterate through the entries returned by readdir. We either yield a path if we find a file (yield join(...)) or recursively delegate to ourselves (yield* ls(join(...))). Additionally, the for await notation allows us to iterate over our generator, since it is an async iterable. The code is much more concise. Notice that we need an extra function to loop over the results: we can no longer log them out without some additional processing. However, this tradeoff is well worth it, as we'll see shortly.
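When the full list is still wanted as an array, the "additional processing" can be a very small helper. The sketch below assumes a hypothetical collect function (it is not part of NodeJS) and uses two stand-in async generators, letters and more, so that it runs without touching the file system.

```javascript
// Hypothetical helper: drain any async iterable into an array.
async function collect(iterable) {
  const items = [];
  for await (const item of iterable) items.push(item);
  return items;
}

// Stand-in async generator that delegates with yield*,
// the same way ls delegates to itself for subdirectories.
async function* letters() {
  yield "a";
  yield* more();
  yield "d";
}

async function* more() {
  yield "b";
  yield "c";
}

collect(letters()).then((all) => console.log(all));
```

With the generator-based ls, the equivalent call would be collect(ls("./node_modules/typescript")). Draining the generator this way gives up its laziness, so it only makes sense when every result is needed.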

Short circuiting

Usually, the main driver behind listing files in folders or directories is searching. We rarely want to list files for the sake of it; we're really looking for a specific one. Generators enable us to short circuit, that is, to return early from our execution. Consider the following code block, where we replace the run function with search:

import { readdir } from "fs/promises";
// Import basename to get *just* the file name
import { join, basename } from "path";

// async function* ls(path = '.') { ... }

async function search(path, filename) {
  for await (const result of ls(path)) {
    if (basename(result) === filename) {
      // Return as soon as we have a match
      return result;
    }
  }
}

console.log(await search("./node_modules/typescript", "README.md"));

Remember that with generators, the function only runs up to the next yield statement. Each .next() call resumes it, in lock step with yield, and the for await loop makes those calls on our ls iterable for us. As soon as we have a match, we return and exit the loop early, and no further directories are read. Compare this with our original implementation using Promise.all. That implementation has no clean way to exit the control flow when a match is found. Therefore, we have to list out the full directory tree structure, and only then search for the file we want. An equivalent search function for the promise case would look like the following:

async function search(path, filename) {
  const files = await ls(path); // Get *all* files first
  for (const result of files) {
    if (basename(result) === filename) {
      return result;
    }
  }
}

Notice how on the second line we collect all files in all directories, the full tree, and only then loop over them to search. The differences between the two search functions are subtle: for await is the integral part that lets us compare each file as it is produced, one at a time. If you tried to use for await with the promise-based ls, you would get the following error message:

for await (const result of ls(path)) {
                           ^

TypeError: ls(...) is not a function or its return value is not async iterable

So, our original implementation does O(n) work in every case, where n is the number of entries in the tree. With generators, the worst case is still O(n), but the search stops at the first match: roughly half of the entries are visited if the file is in the middle of the tree, and only a handful if it is near the very beginning. This follows from the lazy nature of generators and async iterables.
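The difference in work can be counted directly. The sketch below is an illustration under stated assumptions rather than a benchmark: a hypothetical in-memory tree stands in for the file system, walk mirrors the shape of the generator-based ls, and a visited counter records how many entries each search strategy looks at.

```javascript
// Hypothetical in-memory tree: objects are folders, strings are files.
const tree = {
  "a.txt": "file",
  docs: { "README.md": "file", "guide.md": "file" },
  src: { "index.js": "file", lib: { "util.js": "file" } },
};

let visited = 0; // how many entries the current walk has looked at

// Same shape as the generator-based ls, but walking the object above.
async function* walk(node, prefix = "") {
  for (const [name, child] of Object.entries(node)) {
    visited += 1;
    const path = prefix + name;
    if (typeof child === "object") yield* walk(child, path + "/");
    else yield path;
  }
}

const matches = (path, filename) => path === filename || path.endsWith("/" + filename);

// Lazy search: stops pulling from the generator at the first match.
async function searchLazy(filename) {
  visited = 0;
  for await (const path of walk(tree)) {
    if (matches(path, filename)) return { path, visited };
  }
  return { path: undefined, visited };
}

// Eager search: drains the whole generator first, then searches the array.
async function searchEager(filename) {
  visited = 0;
  const all = [];
  for await (const path of walk(tree)) all.push(path);
  return { path: all.find((p) => matches(p, filename)), visited };
}

async function main() {
  const lazy = await searchLazy("README.md");
  const eager = await searchEager("README.md");
  console.log(lazy, eager);
  return { lazy, eager };
}

const done = main();
```

For this tree, both strategies find docs/README.md, but the lazy search visits 3 entries while the eager one visits all 8.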

Consider using this solution next time you have to list files in directories, as it will scale better, is easier to read, and it does not require installing any additional libraries.

