Build pipelines are a common pattern, where you have files and assets you want to process but want to do so efficiently and incrementally. Usually that means only re-processing files when they change, and otherwise re-using the already-processed assets as much as possible. This blog post will walk through how to use the Mill build tool to set up these build pipelines, using a real-world use case, and demonstrate the advantages a build pipeline gives you over a naive build script.
About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming
As a worked example for this blog post, we will be starting with the simple single-file static site generator discussed here:
The full 50 line build.sc script is as follows:
// build.sc
import $ivy.`com.lihaoyi::scalatags:0.7.0`
import $ivy.`com.atlassian.commonmark:commonmark:0.5.1`
import scalatags.Text.all._
interp.watch(os.pwd / 'post)
val postInfo = os
.list(os.pwd / 'post)
.filter(_.last.contains(" - "))
.map(p => p.last.split(" - ") match{ case Array(prefix, suffix) => (prefix, suffix, p)})
.sortBy(_._1.toInt)
def mdNameToHtml(name: String) = name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"
val bootstrapCss = link(
rel := "stylesheet",
href := "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
)
os.remove.all(os.pwd / "out" / "post")
os.makeDir.all(os.pwd / "out" / "post")
for((_, suffix, path) <- postInfo) {
val parser = org.commonmark.parser.Parser.builder().build()
val document = parser.parse(os.read(path))
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
val output = renderer.render(document)
os.write(
os.pwd / "out" / 'post / mdNameToHtml(suffix),
html(
head(bootstrapCss),
body(
h1(a("Haoyi's Blog", href := "../index.html")),
h1(suffix.stripSuffix(".md")),
raw(output)
)
).render
)
}
os.write(
os.pwd / "out" / "index.html",
html(
head(bootstrapCss),
body(
h1("Haoyi's Blog"),
for((_, suffix, _) <- postInfo)
yield h2(a(suffix, href := ("post/" + mdNameToHtml(suffix))))
)
).render
)
To run this script, first install Ammonite:
sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.6.9/2.13-1.6.9) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Second, create a post/ folder and put your .md files inside
Lastly, run the script using amm build.sc
This will generate an out/index.html file listing all the blog posts, and an out/post/ folder containing one HTML file for each input .md file. You can open these files in the browser and interact with them:
While the above build.sc script works fine in small cases, there is one big limitation: the entire script runs every time. Even if you only change one blog post's .md file, every file will need to be re-processed. This is wasteful, and can be slow as the number of blog posts grows. On this blog, re-processing every post can take upwards of 20-30 seconds: a long time to wait every time you tweak some wording!
While it is possible to manually keep track of which .md file was converted into which .html file, and thus avoid re-processing .md files unnecessarily, this kind of book-keeping is tedious and easy to get wrong. This is especially true if we want to add more steps to the build process. For example, here are some possible extensions we may want to add to this build script:
Download the bootstrap.min.css file at build time and bundle it with the static site, to avoid a dependency on the third party service
Extract the first paragraph of each blog post and include it on the home page
Use git log to find when each blog post was first written, and include it on both that blog's page as well as on the home page
Deploy the completed static site to the web and make it available to the public
Each of these additional steps is something you would have to execute, cache, and decide when to re-execute. We'll now see how we can use the Mill build tool to do this automatically.
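To make that bookkeeping concrete, here is a minimal sketch of what caching a single step by hand might look like. It uses only the Scala standard library; the ManualCache object, its fields, and the upper-casing "work" are all hypothetical stand-ins for illustration, and this is not how Mill is implemented.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import scala.collection.mutable

// Hypothetical hand-rolled cache: remembers the content hash each input had
// when it was last processed, and skips inputs whose hash is unchanged.
object ManualCache {
  private val lastSeen = mutable.Map.empty[String, String] // input name -> hash
  private val results = mutable.Map.empty[String, String]  // input name -> output
  var runs = 0 // counts how often the expensive step actually executed

  def sha1(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes(UTF_8))
      .map("%02x".format(_))
      .mkString

  // Stand-in for an expensive step such as rendering Markdown to HTML.
  def process(name: String, content: String): String = {
    val hash = sha1(content)
    if (!lastSeen.get(name).contains(hash)) {
      runs += 1
      results(name) = content.toUpperCase
      lastSeen(name) = hash
    }
    results(name)
  }
}
```

Calling ManualCache.process twice with the same content does the work once; changing the content triggers a re-run. Every additional step would need its own version of this logic, plus tracking of which steps feed into which, and that is the part a build tool takes off your hands.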
To begin with, let's install Mill:
curl -L https://github.com/lihaoyi/mill/releases/download/0.5.0/0.5.0 > mill && chmod +x mill
This makes the ./mill executable available in the current directory for you to use.
Now, we can convert the above build.sc file into a Mill build pipeline:
// build.sc
import $ivy.`com.lihaoyi::scalatags:0.7.0`
import $ivy.`com.atlassian.commonmark:commonmark:0.5.1`
import mill._, scalatags.Text.all._
interp.watch(os.pwd / 'post)
val postInfo = os
.list(os.pwd / 'post)
.filter(_.last.contains(" - "))
.map(p => p.last.split(" - ") match{ case Array(prefix, suffix) => (prefix, suffix, p)})
.sortBy(_._1.toInt)
def mdNameToHtml(name: String) = name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"
val bootstrapCss = link(
rel := "stylesheet",
href := "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
)
object post extends Cross[PostModule](postInfo.map(_._1):_*)
class PostModule(number: String) extends Module{
val Some((_, suffix, path)) = postInfo.find(_._1 == number)
def srcPath = T.sources(path)
def render = T{
val Seq(src) = srcPath()
val parser = org.commonmark.parser.Parser.builder().build()
val document = parser.parse(os.read(src.path))
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
val output = renderer.render(document)
os.write(
T.ctx().dest / mdNameToHtml(suffix),
html(
head(bootstrapCss),
body(
h1(a("Haoyi's Blog", href := "../index.html")),
h1(suffix.stripSuffix(".md")),
raw(output)
)
).render
)
PathRef(T.ctx().dest / mdNameToHtml(suffix))
}
}
def index = T{
os.write(
T.ctx().dest / "index.html",
html(
head(bootstrapCss),
body(
h1("Haoyi's Blog"),
for ((_, suffix, _) <- postInfo)
yield h2(a(suffix, href := ("post/" + mdNameToHtml(suffix))))
)
).render
)
PathRef(T.ctx().dest / "index.html")
}
val posts = mill.define.Task.sequence(postInfo.map(_._1).map(post(_).render))
def dist = T {
for (post <- posts()) {
os.copy(post.path, T.ctx().dest / 'post / post.path.last, createFolders = true)
}
os.copy(index().path, T.ctx().dest / "index.html")
PathRef(T.ctx().dest)
}
Here, we are defining a cross-build of PostModules, one for each post in the post folder, each with a render target that parses the Markdown into HTML, writes it to disk, and returns a PathRef to the generated file. We then combine those into a single posts task containing all the generated files, and use it in dist, which copies them into a single folder alongside the index.html produced by the index target. That index.html contains links to the individual blog posts.
Given the following posts:
$ tree .
├── post
│ ├── 1 - Automatic Binary Serialization in uPickle 0.7.md
│ ├── 2 - Benchmarking Scala Collections.md
│ ├── 3 - What's in a Build Tool?.md
│ └── ...
└── build.sc
We can build this blog using
$ ./mill dist
And see the folder that's generated using
$ ./mill show dist
"ref:b33a3c95:/Users/lihaoyi/Github/blog/out/dist/dest"
We can list the contents using tree:
$ tree /Users/lihaoyi/Github/blog/out/dist/dest
/Users/lihaoyi/Github/blog/out/dist/dest
├── post
│ ├── automatic-binary-serialization-in-upickle-0.7.html
│ ├── benchmarking-scala-collections.html
│ ├── what's-in-a-build-tool?.html
│ └── ...
└── index.html
And open the index.html in our browser to view the blog.
Every time you run ./mill dist, Mill will only re-process the blog posts that have changed since you last ran it. You can also use ./mill --watch dist or ./mill -w dist to have Mill watch the filesystem and automatically re-process the files every time they change.
Now that we've seen working code, let us walk through the example step by step to understand it.
import $ivy.`com.lihaoyi::scalatags:0.7.0`
import $ivy.`com.atlassian.commonmark:commonmark:0.5.1`
import mill._, scalatags.Text.all._
To begin with, we import the same third-party libraries as we did in our original Scala Script: Scalatags to render HTML, and CommonMark to parse Markdown. In addition to that we import mill._, which brings the Mill-related functions into scope to build our pipelines.
interp.watch(os.pwd / 'post)
val postInfo = os
.list(os.pwd / 'post)
.filter(_.last.contains(" - "))
.map(p => p.last.split(" - ") match{ case Array(prefix, suffix) => (prefix, suffix, p)})
.sortBy(_._1.toInt)
def mdNameToHtml(name: String) = name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"
val bootstrapCss = link(
rel := "stylesheet",
href := "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
)
This section is roughly the same as the Ammonite script we saw earlier: listing the markdown files and extracting their index, suffix and path to use later. Note that the interp.watch(os.pwd / 'post) is necessary because our build pipeline is dynamic: the number of post[n] modules depends on the list of files in the post/ folder, and so we need interp.watch to ensure Mill knows to re-compute the post[n] modules every time that folder changes.
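To see what the filename handling above does to concrete inputs, here is a small standalone sketch using only the standard library. The PostNames object and its parse helper are hypothetical names introduced for illustration; mdNameToHtml is copied from the build file, and parse mirrors the postInfo logic on plain strings rather than paths.

```scala
object PostNames {
  // Same helper as in build.sc
  def mdNameToHtml(name: String): String =
    name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"

  // Split "<number> - <title>.md" into (number, title); files without
  // the " - " separator are ignored, and results are sorted numerically.
  def parse(fileNames: Seq[String]): Seq[(String, String)] =
    fileNames
      .filter(_.contains(" - "))
      .map(_.split(" - ") match { case Array(prefix, suffix) => (prefix, suffix) })
      .sortBy(_._1.toInt)
}
```

Sorting by _._1.toInt rather than by the raw string matters once there are ten or more posts, since "10" sorts before "2" as a string. Note also that a title which itself contains " - " would split into three parts and fail the Array(prefix, suffix) pattern with a MatchError, so this scheme assumes the separator appears exactly once per filename.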
object post extends Cross[PostModule](postInfo.map(_._1):_*)
class PostModule(number: String) extends Module{
val Some((_, suffix, path)) = postInfo.find(_._1 == number)
def srcPath = T.sources(path)
def render = T{
val Seq(src) = srcPath()
val parser = org.commonmark.parser.Parser.builder().build()
val document = parser.parse(os.read(src.path))
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
val output = renderer.render(document)
os.write(
T.ctx().dest / mdNameToHtml(suffix),
html(
head(bootstrapCss),
body(
h1(a("Haoyi's Blog", href := "../index.html")),
h1(suffix.stripSuffix(".md")),
raw(output)
)
).render
)
PathRef(T.ctx().dest / mdNameToHtml(suffix))
}
}
This is the first piece of Mill-specific functionality: we define a class PostModule extends Module, each instance of which has a def render = T{...} target, and populate a post extends Cross[PostModule] object with the indices of all the markdown files from postInfo.
Each post has a srcPath defined using T.sources; this tells the build pipeline that the specified files are inputs. When those files change, Mill knows to invalidate the downstream build steps and re-evaluate them to keep the results up to date.
Each post's render target, rather than writing its output to a global output folder, writes it to its own T.ctx().dest folder. This ensures each post gets a unique working folder and avoids conflicts. We can render each post individually using ./mill post[n].render:
$ ./mill post[1].render
$ ./mill show post[1].render
"ref:c53cc5ae:out/post/1/render/dest/code-reviewing-my-earliest-surviving-program.html"
$ ./mill show post[2].render
"ref:99e5ad6d:out/post/2/render/dest/strategic-scala-style:-principle-of-least-power.html"
We then prepare the index.html file in an index target, using def index = T{...}:
def index = T{
os.write(
T.ctx().dest / "index.html",
html(
head(bootstrapCss),
body(
h1("Haoyi's Blog"),
for ((_, suffix, _) <- postInfo)
yield h2(a(suffix, href := ("post/" + mdNameToHtml(suffix))))
)
).render
)
PathRef(T.ctx().dest / "index.html")
}
This index file simply has a title ("Haoyi's Blog") and a list of links to each of the individual posts.
Lastly, we define the dist target, which assembles the blog posts and the index.html file into a single folder:
val posts = mill.define.Task.sequence(postInfo.map(_._1).map(post(_).render))
def dist = T {
for (post <- posts()) {
os.copy(post.path, T.ctx().dest / 'post / post.path.last, createFolders = true)
}
os.copy(index().path, T.ctx().dest / "index.html")
PathRef(T.ctx().dest)
}
Here we use Task.sequence to convert the Seq[T[PathRef]] into a T[Seq[PathRef]], and make use of that in the dist target. In dist, we simply copy the already-generated HTML files for each blog post into the T.ctx().dest folder, along with the index.html file from the index target, and we're done. Essentially, we have defined the following pipeline:
We can now run ./mill dist to build the dist target and assemble the output into a folder to use, either for browsing locally or for deployment.
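The Task.sequence step is the least obvious part of this pipeline, so here is a toy model of the idea in plain Scala. ToyTask is a hypothetical type invented for this sketch; it is not Mill's Task and has none of its caching or dependency tracking. It only shows the shape of the transformation: many tasks producing one value each become one task producing all the values.

```scala
// Toy stand-in for a build task: a deferred computation producing an A.
// This is NOT Mill's Task type, only a model of the same shape.
final case class ToyTask[A](run: () => A) {
  def map[B](f: A => B): ToyTask[B] = ToyTask(() => f(run()))
}

object ToyTask {
  // Turn many tasks into one task that yields all of their results, in order
  def sequence[A](tasks: Seq[ToyTask[A]]): ToyTask[Seq[A]] =
    ToyTask(() => tasks.map(_.run()))
}
```

With this model, ToyTask.sequence(Seq(a, b)) gives a single task whose result is the sequence of both results. That is what lets a downstream step such as dist depend on "all rendered posts" as one input, without knowing in advance how many posts there are.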
Now that we've defined a simple pipeline, let's consider two of the four extensions we mentioned earlier:
Download the bootstrap.min.css file at build time and bundle it with the static site, to avoid a dependency on the third party service
Extract the first paragraph of each blog post and include it on the home page
Bundling bootstrap is simple. We simply define a bootstrap target to download the file:
- val bootstrapCss = link(
- rel := "stylesheet",
- href := "https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css"
- )
+ def bootstrap = T{
+ os.write(
+ T.ctx().dest / "bootstrap.min.css",
+ requests.get("https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css").text()
+ )
+ PathRef(T.ctx().dest / "bootstrap.min.css")
+ }
Include it in our dist:
os.copy(bootstrap().path, T.ctx().dest / "bootstrap.min.css")
And then update our two bootstrapCss links to use a local URL:
- head(bootstrapCss)
+ head(link(rel := "stylesheet", href := "bootstrap.min.css"))
- head(bootstrapCss)
+ head(link(rel := "stylesheet", href := "../bootstrap.min.css"))
Now, when you run ./mill dist, you can see that the bootstrap.min.css file is downloaded and bundled with your dist folder:
$ tree out/dist/dest/
out/dist/dest/
├── post
│ ├── automatic-binary-serialization-in-upickle-0.7.html
│ ├── what's-functional-programming-all-about?.html
│ ├── what's-in-a-build-tool?.html
│ └── ...
├── bootstrap.min.css
└── index.html
And we can see in the browser that we are now using a locally-bundled version of Bootstrap:
Since it does not depend on any T.sources, the bootstrap = T{} target never invalidates, which is usually what you want when depending on a stable URL like bootstrap/3.3.6.
We now have the following build pipeline, with the additional bootstrap step:
The code now looks like this:
// build.sc
import $ivy.`com.lihaoyi::scalatags:0.7.0`
import $ivy.`com.atlassian.commonmark:commonmark:0.5.1`
import mill._, scalatags.Text.all._
interp.watch(os.pwd / 'post)
val postInfo = os
.list(os.pwd / 'post)
.filter(_.last.contains(" - "))
.map(p => p.last.split(" - ") match{ case Array(prefix, suffix) => (prefix, suffix, p)})
.sortBy(_._1.toInt)
def mdNameToHtml(name: String) = name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"
def bootstrap = T{
os.write(
T.ctx().dest / "bootstrap.min.css",
requests.get("https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css").text()
)
PathRef(T.ctx().dest / "bootstrap.min.css")
}
object post extends Cross[PostModule](postInfo.map(_._1):_*)
class PostModule(number: String) extends Module{
val Some((_, suffix, path)) = postInfo.find(_._1 == number)
def srcPath = T.sources(path)
def render = T{
val Seq(src) = srcPath()
val parser = org.commonmark.parser.Parser.builder().build()
val document = parser.parse(os.read(src.path))
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
val output = renderer.render(document)
os.write(
T.ctx().dest / mdNameToHtml(suffix),
html(
head(link(rel := "stylesheet", href := "../bootstrap.min.css")),
body(
h1(a("Haoyi's Blog", href := "../index.html")),
h1(suffix.stripSuffix(".md")),
raw(output)
)
).render
)
PathRef(T.ctx().dest / mdNameToHtml(suffix))
}
}
val posts = mill.define.Task.sequence(post.itemMap.values.map(_.render).toSeq)
def dist = T {
for (post <- posts()) {
os.copy(post.path, T.ctx().dest / 'post / post.path.last, createFolders = true)
}
os.copy(bootstrap().path, T.ctx().dest / "bootstrap.min.css")
os.write(
T.ctx().dest / "index.html",
html(
head(link(rel := "stylesheet", href := "bootstrap.min.css")),
body(
h1("Haoyi's Blog"),
for ((_, suffix, _) <- postInfo)
yield h2(a(suffix, href := ("post/" + mdNameToHtml(suffix))))
)
).render
)
PathRef(T.ctx().dest)
}
To render a paragraph preview of each blog post in the index.html page, the first step is to generate such a preview for each PostModule:
class PostModule(number: String) extends Module{
val Some((_, suffix, path)) = postInfo.find(_._1 == number)
def srcPath = T.sources(path)
def preview = T{
val Seq(src) = srcPath()
val parser = org.commonmark.parser.Parser.builder().build()
val firstPara = os.read.lines(src.path).takeWhile(_.nonEmpty)
val document = parser.parse(firstPara.mkString("\n"))
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
val output = renderer.render(document)
output
}
def render = T{
...
}
}
Here we are simply leaving the preview as an output: String rather than writing it to a file and using a PathRef.
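The first-paragraph extraction in preview is just "take lines until the first blank one". A standalone sketch of that rule, operating on a string instead of a file, might look like the following. The Preview object and firstParagraph name are hypothetical and only for illustration.

```scala
object Preview {
  // Everything up to (not including) the first empty line
  def firstParagraph(markdown: String): String =
    markdown.linesIterator.takeWhile(_.nonEmpty).mkString("\n")
}
```

Two consequences of this rule are worth knowing: a post that begins with a blank line yields an empty preview, and a line containing only spaces counts as non-empty, so it does not end the paragraph.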
Next, we need to aggregate the previews the same way we aggregated the renders earlier:
val previews = mill.define.Task.sequence(post.itemMap.values.map(_.preview).toSeq)
Lastly, in dist, zip the previews together with the postInfo in order to render them:
- for ((_, suffix, _) <- postInfo)
- yield h2(a(suffix, href := ("post/" + mdNameToHtml(suffix))))
+ for (((number, suffix, _), preview) <- postInfo.zip(previews()))
+ yield frag(
+ h2(a(suffix, href := ("post/" + mdNameToHtml(suffix)))),
+ raw(preview)
+ )
Now we get pretty previews in index.html!
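One detail of the zip used here: it pairs elements purely by position and silently stops at the shorter sequence, so it relies on the previews arriving in the same order as postInfo. The sketch below illustrates that behaviour, along with a keyed lookup as a possible alternative if the ordering is ever in doubt. PairPreviews and both of its methods are hypothetical helpers, not part of the build above.

```scala
object PairPreviews {
  // Positional pairing, as in the build above: element i goes with element i
  def byPosition(titles: Seq[String], previews: Seq[String]): Seq[(String, String)] =
    titles.zip(previews)

  // Hypothetical alternative: look each preview up by post number instead
  def byNumber(posts: Seq[(String, String)], previews: Map[String, String]): Seq[(String, String)] =
    posts.map { case (number, title) => (title, previews(number)) }
}
```

The keyed version throws if a preview is missing rather than quietly dropping a post, which may or may not be the behaviour you want.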
The build pipeline now looks like:
Note how we now have both post[n].preview and post[n].render targets, with the preview targets being used in index to generate the home page and the render targets only being used in the final dist.
And here's the complete code:
// build.sc
import $ivy.`com.lihaoyi::scalatags:0.7.0`
import $ivy.`com.atlassian.commonmark:commonmark:0.5.1`
import mill._, scalatags.Text.all._
interp.watch(os.pwd / 'post)
val postInfo = os
.list(os.pwd / 'post)
.filter(_.last.contains(" - "))
.map(p => p.last.split(" - ") match{ case Array(prefix, suffix) => (prefix, suffix, p)})
.sortBy(_._1.toInt)
def mdNameToHtml(name: String) = name.stripSuffix(".md").replace(" ", "-").toLowerCase + ".html"
def bootstrap = T{
os.write(
T.ctx().dest / "bootstrap.min.css",
requests.get("https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css").text()
)
PathRef(T.ctx().dest / "bootstrap.min.css")
}
object post extends Cross[PostModule](postInfo.map(_._1):_*)
class PostModule(number: String) extends Module{
val Some((_, suffix, path)) = postInfo.find(_._1 == number)
def srcPath = T.sources(path)
def renderMarkdown(s: String) = {
val parser = org.commonmark.parser.Parser.builder().build()
val document = parser.parse(s)
val renderer = org.commonmark.html.HtmlRenderer.builder().build()
renderer.render(document)
}
def preview = T{
val Seq(src) = srcPath()
val firstPara = os.read.lines(src.path).takeWhile(_.nonEmpty)
renderMarkdown(firstPara.mkString("\n"))
}
def render = T{
val Seq(src) = srcPath()
val output = renderMarkdown(os.read(src.path))
os.write(
T.ctx().dest / mdNameToHtml(suffix),
html(
head(link(rel := "stylesheet", href := "../bootstrap.min.css")),
body(
h1(a("Haoyi's Blog", href := "../index.html")),
h1(suffix.stripSuffix(".md")),
raw(output)
)
).render
)
PathRef(T.ctx().dest / mdNameToHtml(suffix))
}
}
val posts = mill.define.Task.sequence(post.itemMap.values.map(_.render).toSeq)
val previews = mill.define.Task.sequence(post.itemMap.values.map(_.preview).toSeq)
def index = T{
os.write(
T.ctx().dest / "index.html",
html(
head(link(rel := "stylesheet", href := "bootstrap.min.css")),
body(
h1("Haoyi's Blog"),
for (((number, suffix, _), preview) <- postInfo.zip(previews()))
yield frag(
h2(a(suffix, href := ("post/" + mdNameToHtml(suffix)))),
raw(preview)
)
)
).render
)
PathRef(T.ctx().dest / "index.html")
}
def dist = T {
for (post <- posts()) {
os.copy(post.path, T.ctx().dest / 'post / post.path.last, createFolders = true)
}
os.copy(bootstrap().path, T.ctx().dest / "bootstrap.min.css")
os.copy(index().path, T.ctx().dest / "index.html")
PathRef(T.ctx().dest)
}
In the interest of time, this blog post only walks through the first two of the extensions we mentioned earlier. The remaining two are:
Use git log to find when each blog post was first written, and include it on both that blog's page as well as on the home page
Deploy the completed static site to the web and make it available to the public
These would require two more concepts we haven't seen so far:
T.input tasks, letting your build react to arbitrary changes in the system
T.command tasks, which can perform arbitrary actions without caching.
Implementing those extensions using these Mill features is left as an exercise to the reader!
In this blog post, we have seen how to take a simple Scala script that generates a static website, and convert it into a Mill build pipeline. Unlike a naive script, this pipeline allows fast incremental updates whenever the underlying sources change. We have also seen how to extend the Mill build pipeline, adding build steps for things like bundling CSS files or showing post previews, all while preserving the ability to do fast incremental updates.
While the Mill build tool is often used to compile Java and Scala source code into executables, it can also be used to create general-purpose build pipelines for all sorts of data transformations. You simply specify what each build step needs as input and what computation it performs, and Mill will handle all the ordering, caching and invalidation for you, giving you blazing fast incremental builds without any manual effort.
This blog post is just a quick introduction to the ideas and concepts behind the Mill build tool. For a more thorough reference, take a look at the Mill documentation: