Building a Rust Utility: Hugo Static Blog to Lunr Index
Rust has fascinated me since I was first introduced to it at the end of my first year as an undergraduate. Since then the language has evolved at a frantic pace and I feel like I’ve forgotten many of the subtleties of its syntax, so I’ve decided to work through a number of small Rust projects to brush up.
The first project I’ve tackled is replacing an outdated and abandoned npm package, hugo-lunr. This blog’s search is powered by Lunr, which requires an array of JS objects containing information about each post. hugo-lunr is designed to run during the site’s build step and produce that static index: it iterates over the Hugo markdown files, extracts key information from the front matter, and dumps the result to a JSON file which Lunr retrieves and consumes at runtime. Unfortunately, hugo-lunr is a fairly bare-bones implementation with a number of open pull requests and issues, making it an excellent candidate for replacement.
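To make the target format concrete, here is a minimal sketch of turning a list of posts into the JSON array that Lunr consumes. The struct and its field names (uri, title, content) are assumptions for illustration and may not match what hugo-lunr or my utility actually emit. The hand-written escaping is only there to keep the example dependency-free; a real tool would more likely use a JSON crate.

```rust
/// One searchable entry per post. The field names are illustrative
/// assumptions, not necessarily the ones the real index uses.
struct IndexEntry {
    uri: String,
    title: String,
    content: String,
}

/// Escape the characters that may not appear raw inside a JSON string.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialise the entries as a JSON array of objects.
fn to_json(entries: &[IndexEntry]) -> String {
    let objects: Vec<String> = entries
        .iter()
        .map(|e| {
            format!(
                "{{\"uri\":\"{}\",\"title\":\"{}\",\"content\":\"{}\"}}",
                escape_json(&e.uri),
                escape_json(&e.title),
                escape_json(&e.content)
            )
        })
        .collect();
    format!("[{}]", objects.join(","))
}

fn main() {
    let entries = vec![IndexEntry {
        uri: "/blog/hello".to_string(),
        title: "Hello".to_string(),
        content: "My first \"real\" post".to_string(),
    }];
    println!("{}", to_json(&entries));
}
```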
Version 0.1
The initial version of my efforts can be found here. Most of the complications fell into one of two camps: strings and file access. Rust has a whole section of its book dedicated to strings, and it’s easy to see why. Rust’s ownership and borrowing rules make typically simple operations, like concatenating strings, difficult. For concatenation I ended up using a pattern I found in a repository that benchmarks string concatenation, [FORWARD_SLASH, &directory, FORWARD_SLASH, slug].join(EMPTY_STRING), as a workaround, but I lost a lot of time figuring out how to manage the lifetimes of string slices borrowed from a file’s contents. In the end the only mechanism I could find was to turn them into heap-allocated Strings.
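The two string problems above can be sketched roughly as follows. The constants mirror the names in the snippet from my code, but the function names, the `title: ` front matter format and the file handling are assumptions made up for this example. The point of `read_title` is that the file's contents live in a local buffer, so any slice borrowed from it has to be copied into an owned String before the function returns.

```rust
use std::fs;
use std::io;
use std::path::Path;

const FORWARD_SLASH: &str = "/";
const EMPTY_STRING: &str = "";

/// Build "/<directory>/<slug>" by joining string slices with an empty separator.
fn build_uri(directory: &str, slug: &str) -> String {
    [FORWARD_SLASH, directory, FORWARD_SLASH, slug].join(EMPTY_STRING)
}

/// The same result using the `format!` macro.
fn build_uri_fmt(directory: &str, slug: &str) -> String {
    format!("/{}/{}", directory, slug)
}

/// Find a `title: ...` line (a hypothetical, simplified front matter format)
/// and copy the value into an owned String.
fn title_from(contents: &str) -> Option<String> {
    contents
        .lines()
        .find_map(|line| line.strip_prefix("title: "))
        .map(str::to_owned)
}

/// `contents` is dropped when this function returns, so a borrowed `&str`
/// pointing into it could not be returned; the owned String can.
fn read_title(path: &Path) -> io::Result<Option<String>> {
    let contents = fs::read_to_string(path)?;
    Ok(title_from(&contents))
}

fn main() -> io::Result<()> {
    println!("{}", build_uri("blog", "hello-world"));
    println!("{}", build_uri_fmt("blog", "hello-world"));

    // Round-trip through a temporary file to exercise `read_title`.
    let path = std::env::temp_dir().join("string_sketch_post.md");
    fs::write(&path, "date: 2019-01-01\ntitle: Hello World\n")?;
    println!("{:?}", read_title(&path)?);
    fs::remove_file(&path)
}
```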
File access was a little easier to reason about but still not straightforward. Initially I expected there to be a standard library function for recursive directory traversal, but instead I had to settle on an external dependency for the task. Rust’s Path struct also leaves a lot to be desired: getting the path of the directory containing a file requires a bizarre function invocation, Path::new(&config.index_path).with_file_name(EMPTY_STRING). The standard fs::write() call also fails to write if a directory in the path is missing. Whilst these are small annoyances, there were enough of these idiosyncrasies for me to have an exceedingly large number of Rust doc tabs open by the time I finished the first version.
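For reference, here is a standard-library-only sketch of the file handling described above. It is not the code my utility uses: the function names are invented for this example, it uses Path::parent() rather than the with_file_name call to find the containing directory, and it hand-rolls the recursive walk with fs::read_dir instead of pulling in a dependency.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Write `contents` to `path`, creating any missing parent directories first.
fn write_with_parents(path: &Path, contents: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        // For a bare file name the parent is the empty path; skip it.
        if !parent.as_os_str().is_empty() {
            fs::create_dir_all(parent)?;
        }
    }
    fs::write(path, contents)
}

/// Recursively collect every file with an `.md` extension under `dir`.
fn collect_markdown(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_markdown(&path, found)?;
        } else if path.extension().map_or(false, |ext| ext == "md") {
            found.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Work inside a scratch directory so the example cleans up after itself.
    let root = std::env::temp_dir().join("file_sketch_demo");
    write_with_parents(&root.join("content/post/first.md"), "title: First")?;
    write_with_parents(&root.join("static/search/index.json"), "[]")?;

    let mut posts = Vec::new();
    collect_markdown(&root.join("content"), &mut posts)?;
    println!("{:?}", posts);

    fs::remove_dir_all(&root)
}
```

This simple walk follows symlinked directories and does not guard against cycles, which is one reason a dedicated crate can still be the better choice.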
Whilst frustrating at times, the experience as a whole has been positive. The program runs fast and the code is easy to read. Additionally, I’m loving the Option type and the neat ways it allows you to deal with errors that arise.
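As a small illustration of what I mean about Option, here is a sketch of reading an optional flag from front matter. The `key: value` format and both function names are assumptions for the example rather than code from the utility. Each step that might fail returns an Option, and the chain collapses to a default at the end without any explicit branching on errors.

```rust
/// Look up `key` in simple `key: value` lines (a hypothetical, simplified
/// front matter format). Returns None when the key is absent.
fn front_matter_value<'a>(front_matter: &'a str, key: &str) -> Option<&'a str> {
    front_matter.lines().find_map(|line| {
        // Lines without a colon are skipped via `?`.
        let (k, v) = line.split_once(':')?;
        if k.trim() == key {
            Some(v.trim())
        } else {
            None
        }
    })
}

/// A post counts as a draft only if `draft` is present and parses as `true`.
fn is_draft(front_matter: &str) -> bool {
    front_matter_value(front_matter, "draft")
        .and_then(|value| value.parse::<bool>().ok())
        .unwrap_or(false)
}

fn main() {
    let front_matter = "title: Hello\ndraft: true";
    println!("{:?}", front_matter_value(front_matter, "title"));
    println!("{}", is_draft(front_matter));
}
```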
Areas for Improvement
The first port of call will be to improve the argument parsing logic to allow for optional arguments and help text. I also intend to support feature requests made against hugo-lunr, such as the ability to include drafts and exclude certain child directories, and to add support for JSON and YAML front matter. Eventually I intend to allow configuration to be loaded from a file rather than from command line arguments.
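As a rough idea of where the argument parsing could go, here is a hand-rolled sketch with optional flags and help text. Everything in it is hypothetical: the flag names, the defaults and the Config fields other than index_path are placeholders, and an argument parsing crate such as clap would probably be the more sensible route in practice.

```rust
/// Hypothetical configuration; only `index_path` exists in the current code.
#[derive(Debug, PartialEq)]
struct Config {
    content_dir: String,
    index_path: String,
    include_drafts: bool,
}

const HELP: &str = "Usage: hugo_lunr [-i|--input DIR] [-o|--output FILE] [--drafts] [-h|--help]";

/// Parse arguments (without the program name). Every flag is optional and
/// falls back to a default. For brevity, `--help` is reported through Err.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Result<Config, String> {
    // Placeholder defaults chosen for this sketch.
    let mut config = Config {
        content_dir: "content".to_string(),
        index_path: "static/search/index.json".to_string(),
        include_drafts: false,
    };
    while let Some(arg) = args.next() {
        match arg.as_str() {
            "-i" | "--input" => {
                config.content_dir = args.next().ok_or("missing value for --input")?
            }
            "-o" | "--output" => {
                config.index_path = args.next().ok_or("missing value for --output")?
            }
            "--drafts" => config.include_drafts = true,
            "-h" | "--help" => return Err(HELP.to_string()),
            other => return Err(format!("unknown argument: {}\n{}", other, HELP)),
        }
    }
    Ok(config)
}

fn main() {
    match parse_args(std::env::args().skip(1)) {
        Ok(config) => println!("{:?}", config),
        Err(message) => {
            eprintln!("{}", message);
            std::process::exit(1);
        }
    }
}
```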
Use in Build Step
This blog is built and deployed by Netlify, so I needed a way to run my new Rust utility there. Historically hugo-lunr would be fetched from npm by Yarn and run using Grunt. Initially I considered wrapping my utility in a Node container and uploading it to npm, but that seemed like far too much effort. Instead, inspired by GoDownloader, I fetch the latest version of the utility from GitHub releases using a simple bash script.
#!/usr/bin/env bash
set -e

# Get latest HugoLunr release
LATEST_RELEASE=$(curl -L -s -H 'Accept: application/json' https://github.com/arranf/HugoLunr/releases/latest)
# The releases are returned in the format {"id":3622206,"tag_name":"hello-1.0.0.11",...}, we have to extract the tag_name.
LATEST_VERSION=$(echo "$LATEST_RELEASE" | sed -e 's/.*"tag_name":"\([^"]*\)".*/\1/')
ARTIFACT_URL="https://github.com/arranf/HugoLunr/releases/download/$LATEST_VERSION/hugo_lunr"
INSTALL_DIRECTORY="."
INSTALL_NAME="hugo_lunr"
DOWNLOAD_FILE="$INSTALL_DIRECTORY/$INSTALL_NAME"

echo "Fetching $ARTIFACT_URL.."
# Download with whichever of curl or wget is available, capturing the HTTP status code.
if test -x "$(command -v curl)"; then
    code=$(curl -s -w '%{http_code}' -L "$ARTIFACT_URL" -o "$DOWNLOAD_FILE")
elif test -x "$(command -v wget)"; then
    # wget indents the response headers, so allow leading spaces before HTTP.
    code=$(wget -q -O "$DOWNLOAD_FILE" --server-response "$ARTIFACT_URL" 2>&1 | awk '/^ *HTTP/{print $2}' | tail -1)
else
    echo "Neither curl nor wget was available to perform http requests."
    exit 1
fi

if [ "$code" != 200 ]; then
    echo "Request failed with code $code"
    exit 1
fi

# Make the downloaded binary executable and run it.
chmod +x "$DOWNLOAD_FILE"
"$DOWNLOAD_FILE"