Improving My Rust CLI Application: Hugo to JSON

March 16, 2019

I’ve been working on improving the small command line Rust utility I made in January. The tool takes the raw source for this blog and parses the markdown files to produce a JSON index which powers the blog’s search feature. When I last posted about it I’d just finished the initial version. Whilst that first attempt was good enough to replace my previous solution [1], I’ve since spent time improving the areas I wasn’t happy with, and I’ve learned a lot in the process.

The changes I’ve made in the 53 commits since that first version have barely altered the functionality of the tool, but they have moved the single-file program from a hobby project to something that resembles ‘production grade’ code. I’m going to quickly outline some of the features I’ve added and then reflect a little on the patterns I’ve picked up, the resources I got a lot out of, and the areas I want to improve on.

New Features

The initial version only supported TOML front matter in my blog posts, so I added support for YAML front matter, which is the most widely used format. I also wrote and published a small Rust library to remove markdown syntax from the contents of the posts. Removing the markdown syntax reduces the size of the JSON index and makes it easier to stem the posts for search. The tool also lacked a friendly interface, so I used structopt to add command suggestions, coloured output, and help text.
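To make the TOML/YAML distinction concrete: Hugo marks TOML front matter with +++ delimiters and YAML front matter with --- delimiters. The sketch below is a minimal, standard-library-only illustration of splitting a post on those delimiters. It is not the code from the tool itself; the names FrontMatterFormat and split_front_matter are hypothetical, and a real implementation would hand the extracted block to a proper TOML or YAML parser.

```rust
/// The two front matter formats discussed above (hypothetical type name).
#[derive(Debug, PartialEq)]
enum FrontMatterFormat {
    Toml,
    Yaml,
}

/// Splits a post into (format, front matter, body) using the opening delimiter.
/// Returns None when the post does not start with a recognised delimiter
/// or the closing delimiter is missing.
fn split_front_matter(source: &str) -> Option<(FrontMatterFormat, &str, &str)> {
    let (format, delimiter) = if source.starts_with("+++") {
        (FrontMatterFormat::Toml, "+++")
    } else if source.starts_with("---") {
        (FrontMatterFormat::Yaml, "---")
    } else {
        return None;
    };

    // Skip the opening delimiter, then look for the closing one at the start of a line.
    let rest = &source[delimiter.len()..];
    let closing = format!("\n{}", delimiter);
    let end = rest.find(closing.as_str())?;

    let front_matter = rest[..end].trim();
    let body = rest[end + closing.len()..].trim_start();
    Some((format, front_matter, body))
}
```

Once the format is known, the front matter string can be passed to whichever parser matches it, and the body is what gets stripped of markdown for the index.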

Below you can see the difference in output.

Initial Version of Hugo to JSON


Hugo to JSON 0.3.1: Using structopt and clap


Improvements

A large part of the refactor was about improving error handling and separating logic into modules. Gone is the single main.rs file stuffed with code; instead there’s now a clean separation between the logic for generating JSON (which lives in a reusable library) and the command line interface. This separation made the code much easier to test. Whilst the program doesn’t have 100% coverage, the core logic of the application is now tested at a unit level and there are integration tests that exercise the logic end to end.

The biggest improvement to the program overall was the introduction of sensible patterns for error handling. Before, there were a lot of unwrap()s and a lot of errors silently swallowed through the use of Option<T> as opposed to Result<T, E>. For instance, if a front matter couldn’t be parsed I’d return an Option::None, and an Option::Some(Thing) when things went well. Whilst this let me handle the fact that something had gone wrong, it gave no way of knowing what had gone wrong. A lot of my refactoring has been ripping out Option<T>s and replacing them with Result<T, E> to give better context for failure cases.
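A small sketch of that difference, using only the standard library. This is not the tool’s real parsing code: the FrontMatterError type, both function names, and the naive key = value line format are assumptions made purely for illustration.

```rust
use std::fmt;

/// Hypothetical error type describing why a title could not be extracted.
#[derive(Debug, PartialEq)]
enum FrontMatterError {
    MissingTitle,
    InvalidLine(String),
}

impl fmt::Display for FrontMatterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FrontMatterError::MissingTitle => write!(f, "front matter has no title"),
            FrontMatterError::InvalidLine(line) => write!(f, "could not parse line: {}", line),
        }
    }
}

/// "Before": every failure collapses into None, so the caller cannot tell
/// a missing title from a malformed line.
fn parse_title_opt(front_matter: &str) -> Option<String> {
    front_matter
        .lines()
        .filter_map(|line| line.split_once('='))
        .find(|(key, _)| key.trim() == "title")
        .map(|(_, value)| value.trim().trim_matches('"').to_string())
}

/// "After": each failure case carries its own context in the Err variant.
fn parse_title(front_matter: &str) -> Result<String, FrontMatterError> {
    for line in front_matter.lines().filter(|l| !l.trim().is_empty()) {
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| FrontMatterError::InvalidLine(line.to_string()))?;
        if key.trim() == "title" {
            return Ok(value.trim().trim_matches('"').to_string());
        }
    }
    Err(FrontMatterError::MissingTitle)
}
```

With the Option version, both a malformed line and a missing title come back as None. With the Result version the caller can report which file failed and why.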

Error handling is definitely one of the more difficult areas in Rust, but I found a lot of help in this blog post on Andrew Gallant’s blog (he’s the author of ripgrep), which introduced me to the ? syntax and the pattern of composing error types in a wrapper enum. You can see examples of the ? syntax throughout my code and a good example of the wrapper enum here. Although these patterns provide a great mechanism for passing error information around the code, they don’t provide a particularly pleasant representation to the user. I’m interested in playing with the failure crate, which promises to make things more explicit to the user and to avoid the boilerplate of enums.
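For readers who haven’t met the pattern, here is a minimal sketch of a wrapper enum combined with the ? operator. It is not the enum from my code: IndexError, parse_count, and read_count are hypothetical names, and the two wrapped error types are simply convenient standard-library ones.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::num::ParseIntError;

/// Hypothetical wrapper enum: one variant per underlying error type.
#[derive(Debug)]
enum IndexError {
    Io(io::Error),
    Parse(ParseIntError),
}

// The From impls are what let `?` convert the underlying errors automatically.
impl From<io::Error> for IndexError {
    fn from(err: io::Error) -> Self {
        IndexError::Io(err)
    }
}

impl From<ParseIntError> for IndexError {
    fn from(err: ParseIntError) -> Self {
        IndexError::Parse(err)
    }
}

impl fmt::Display for IndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IndexError::Io(err) => write!(f, "I/O error: {}", err),
            IndexError::Parse(err) => write!(f, "parse error: {}", err),
        }
    }
}

impl std::error::Error for IndexError {}

/// Parses a number from text; a ParseIntError becomes IndexError::Parse via `?`.
fn parse_count(text: &str) -> Result<u32, IndexError> {
    let count = text.trim().parse::<u32>()?;
    Ok(count)
}

/// Reads a file and parses its contents; both error types flow through `?`.
fn read_count(path: &str) -> Result<u32, IndexError> {
    let contents = fs::read_to_string(path)?;
    parse_count(&contents)
}
```

The boilerplate is the two From impls and the Display impl, which is the part that crates like failure aim to reduce.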

I also owe a lot to the Command Line Apps guide which helped me piece together how integration testing could work for a CLI and pointed me towards clap and structopt.

Future Goals

I’d love for it to be possible to configure the CLI via either a configuration file (for CI), command line options, or environment variables and the config-rs library looks like a great start for that.
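The core of that idea is just a precedence order between sources. The sketch below uses plain HashMaps rather than config-rs (whose API I’m deliberately not reproducing here), and the order it applies (command line first, then environment, then configuration file, then a built-in default) is an assumption about how I’d want it to behave, not something the tool does today. The resolve function and the keys are hypothetical.

```rust
use std::collections::HashMap;

/// Looks up a setting across three sources in priority order:
/// command line option, then environment variable, then config file.
/// Falls back to the supplied default when no source defines the key.
fn resolve(
    key: &str,
    cli: &HashMap<String, String>,
    env: &HashMap<String, String>,
    file: &HashMap<String, String>,
    default: &str,
) -> String {
    cli.get(key)
        .or_else(|| env.get(key))
        .or_else(|| file.get(key))
        .cloned()
        .unwrap_or_else(|| default.to_string())
}
```

In a real CLI the three maps would be filled from the parsed arguments, std::env::vars(), and the parsed configuration file respectively.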

I’d also like to add an option to serialize drafts, and the option to serialize front matter fields other than the defaults. I also think it could be handy to serialize the page index into formats other than JSON.


  1. In reality, it didn’t completely work. There were several edge cases where the TOML parsing failed and I had no real way of knowing!


Last Updated: 2019-03-16 10:02