2022-01-30
I was recently working on porting my blog to Remix. As part of this, I needed to parse my markdown files that constitute all of the posts.
The folder structure I use is very flat: it’s just a directory with a long list of files in it. All of the other data is stored in the frontmatter of posts.
That’s actually not quite true. I have one subdirectory to store some assets temporarily.
Why is that important? Because if you don’t check to confirm a path is a file / directory, it’s easy to throw an exception and kill the process.
Let’s paint a picture. Imagine the directory looks a bit like this:
drwxr-xr-x 4 stephen.weiss staff 128 Jan 10 20:06 temp
-rw-r--r-- 1 stephen.weiss staff 275 Jan 30 08:07 blogpost1.md
-rw-r--r-- 1 stephen.weiss staff 91 Jan 10 19:59 blogpost2.md
...
One directory. Lots of files.
Now, let’s try to read them and filter out the directories.
import fs from "fs/promises"
import path from "path"

const dir = await fs.readdir(postsPath).then((paths) =>
  paths.filter(async (pathName: string) => {
    const fullPath = path.join(postsPath, pathName)
    const isFile = (await fs.lstat(fullPath)).isFile()
    if (!isFile) console.log(`initial filter`, { isFile, fullPath })
    return isFile
  }),
)
console.log(dir) // ['temp', 'blogpost1.md', 'blogpost2.md']
Hmm! That’s not what we wanted, why is that?
Well, we’re trying to use the filter prototype method, but that method requires a predicate that returns a boolean. We’re returning a promise that resolves to a boolean.
This is easier to see if we tease this apart into multiple pieces:
const filter = async (filePath): Promise<boolean> => {
  const fullPath = path.join(postsPath, filePath)
  return (await fs.stat(fullPath)).isFile()
}
const dir = await fs.readdir(postsPath)
const filtered = Promise.all(await dir.filter(filter))
return await filtered
A Promise is truthy, so the filter sees no reason to exclude the non-files.
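To see the truthiness problem in isolation, here is a minimal sketch with no file system involved. The names isEvenAsync and naiveFilter are made up for illustration and are not part of the blog's code.

```typescript
// Hypothetical async predicate: resolves to true only for even numbers.
const isEvenAsync = async (n: number): Promise<boolean> => n % 2 === 0

// Array.prototype.filter only checks the truthiness of whatever the callback
// returns. Here that is always a Promise object, never the boolean inside it.
function naiveFilter(numbers: number[]): number[] {
  return numbers.filter((n) => isEvenAsync(n))
}

const kept = naiveFilter([1, 2, 3, 4])
console.log(kept.length) // 4: nothing was removed
```

Every element survives, including the odd numbers, which is the same thing that happened to the temp directory.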
Okay, so if the prototype method cannot act on promises, what options do we have?
In my case, I want to actually remove the element from the list, so reduce feels like a good option:
const filter = async (filePath) => {
  const fullPath = path.join(postsPath, filePath)
  return (await fs.stat(fullPath)).isFile()
}
const dir = await fs.readdir(postsPath)
const filtered = await dir.reduce(
  async (acc, cur) => ((await filter(cur)) ? [...(await acc), cur] : acc),
  [],
)
return await filtered
The money line is:
const filtered = await dir.reduce(
  async (acc, cur) => ((await filter(cur)) ? [...(await acc), cur] : acc),
  [],
)
Note that we are first awaiting the result of our filter function, a promise that resolves to a boolean value. Then, if that resolves to true, we will hit the first branch of the ternary: [...(await acc), cur]. We need the await here because each iteration of the reduce returns a promise.
async/await provides syntactic sugar for the promise chain, but in the background, we have a promise chain being built up with each pass through.
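Spelled out with explicit .then calls, that chain looks roughly like the sketch below. It uses numbers and a made-up checkEven predicate instead of the file check, and it types the accumulator as Promise<number[]> up front. Both are assumptions for illustration, not what the snippet above does.

```typescript
// Hypothetical async predicate standing in for the file check.
const checkEven = async (n: number): Promise<boolean> => n % 2 === 0

// Each pass returns a new promise that waits on the previous accumulator,
// so by the end the reduce has built one long chain of promises.
function keepEvens(numbers: number[]): Promise<number[]> {
  return numbers.reduce<Promise<number[]>>(
    (accPromise, cur) =>
      accPromise.then((acc) =>
        checkEven(cur).then((keep) => (keep ? [...acc, cur] : acc)),
      ),
    Promise.resolve([]),
  )
}

keepEvens([1, 2, 3, 4]).then((evens) => console.log(evens)) // logs [2, 4]
```

Because each link waits for the one before it, this version checks the elements one at a time and in order.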
Unfortunately, when I tried this in an actual project, my TypeScript linter started screaming.
Two separate issues, one of which was the result being typed as string when I very much intended this to be a promise of a list of strings (the promise is ignored because of the use of await).
What to do?
Instead of trying to use the prototypal methods on the array, we can create our own map and filter functions that are designed to be asynchronous.
Let’s start with filterAsync, since that will be our entry point:
async function filterAsync<T>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<boolean>,
): Promise<T[]> {
  const filterMap = await mapAsync(array, callbackfn)
  return array.filter((value, index) => filterMap[index])
}
The method takes an array of type T and returns a promise of an array of type T. The second argument is a callback that is presumed to be a deferred predicate, meaning it returns a promise of a boolean.
The cool part about this is how that works with the mapAsync function:
function mapAsync<T, U>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<U>,
): Promise<U[]> {
  return Promise.all(array.map(callbackfn))
}
The map function transforms the array of type T into one of type U (this is a standard idea in map functions since the whole point is manipulating each element, we wouldn’t expect the types to match).
In our case, since we’re passing in a callback that results in a boolean, when we get to the Promise.all line, we have a list of promises that will all resolve to true or false.
Because this has been resolved before we run the .filter method back in filterAsync, we can refer to the index to get the value of the predicate.
Stepping through one piece at a time with pseudocode:
const arr = ['dir', 'post1', 'post2']
const mappedArr = [Promise<isFile('dir')>, Promise<isFile('post1')>, Promise<isFile('post2')>]
const resolvedMap = [false, true, true]
So, when we get to the filter, we’re asking questions like, “for the first element, at index 0, if we look at the resolvedMap, is it true or false?”
The point is that the predicate has already been calculated, and we’re now looking things up based on index.
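Here is the same walk-through as a runnable sketch. It repeats the two helpers so the block stands alone, and swaps the real fs.stat call for a made-up fakeIsFile lookup (an assumption, so no file system is needed).

```typescript
function mapAsync<T, U>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<U>,
): Promise<U[]> {
  return Promise.all(array.map(callbackfn))
}

async function filterAsync<T>(
  array: T[],
  callbackfn: (value: T, index: number, array: T[]) => Promise<boolean>,
): Promise<T[]> {
  // Resolve every predicate first, then look the answers up by index.
  const filterMap = await mapAsync(array, callbackfn)
  return array.filter((_value, index) => filterMap[index])
}

// Hypothetical stand-in for (await fs.stat(fullPath)).isFile().
const knownFiles = new Set(["post1", "post2"])
const fakeIsFile = async (name: string): Promise<boolean> => knownFiles.has(name)

async function demo(): Promise<string[]> {
  const arr = ["dir", "post1", "post2"]
  const resolvedMap = await mapAsync(arr, fakeIsFile)
  console.log(resolvedMap) // logs [false, true, true]
  return filterAsync(arr, fakeIsFile)
}

demo().then((files) => console.log(files)) // logs ['post1', 'post2']
```

The synchronous filter at the end only ever sees plain booleans, which is why it behaves the way we wanted in the first place.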
This approach does have the drawback that we’re doubling the amount of space needed, which might matter if the lists are really big, though not a concern in my case.
I put together a tiny example repo here, with this diff showing how I refactored a filter that wants to use an async function.
Additional resources are this article on advanced web covering different approaches and this Stack Overflow question about filtering arrays with async functions.
This is also similar in nature to what I was writing about in Javascript: Awaiting Asynchronous Operations on Lists (Arrays).
Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn. Want more? See about and get in touch!