JSON is one of the most common data interchange formats: a human-readable way of exchanging structured data that is ubiquitous throughout industry. This tutorial will show you how to work effectively with JSON data in Scala, walking through a few common workflows on a piece of real-world JSON data.
About the Author: Haoyi is a software engineer, and the author of many open-source Scala tools such as the Ammonite REPL and the Mill Build Tool. If you enjoyed the contents on this blog, you may also enjoy Haoyi's book Hands-on Scala Programming
The easiest way to work with JSON is through the uPickle library. This is available on Maven Central for you to use with any version of Scala:
// SBT
"com.lihaoyi" %% "upickle" % "0.7.1"
// Mill
ivy"com.lihaoyi::upickle:0.7.1"
uJson and uPickle also come bundled with Ammonite, and can be used within the REPL and in *.sc script files. This tutorial will focus on walking through a concrete example; for deeper details on the library's syntax and functionality, you can refer to its reference documentation.
To begin with, I will install Ammonite:
$ sudo sh -c '(echo "#!/usr/bin/env sh" && curl -L https://github.com/lihaoyi/Ammonite/releases/download/1.6.8/2.13-1.6.8) > /usr/local/bin/amm && chmod +x /usr/local/bin/amm' && amm
Then open the Ammonite REPL, and use ujson.<tab> and upickle.<tab> to see the list of available operations:
$ amm
Loading...
Welcome to the Ammonite Repl 1.6.8
(Scala 2.13.0 Java 11.0.2)
@ ujson.
Arr IncompleteParseException StringRenderer
...
@ upickle.
Api JsReadWriters MsgReadWriters core implicits
...
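Also, download the sample JSON data and save it as ammonite-releases.json in your working directory, since that is where the code below reads it from.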
Once this is set up, we are ready to begin the tutorial.
Given a JSON string:
@ val jsonString = os.read(os.pwd / "ammonite-releases.json")
jsonString: String = """[
{
"url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
"assets_url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
"upload_url": "https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
...
You can read it into a ujson.Value using ujson.read:
@ val data = ujson.read(jsonString)
data: ujson.Value = Arr(
ArrayBuffer(
Obj(
LinkedHashMap(
"url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
"assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...
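The same function works on any JSON text, not just on files. Here is a minimal sketch of ujson.read applied to a string literal, together with what happens on malformed input. It assumes uPickle is on the classpath, as it is in the Ammonite REPL, and the sample document is made up. It also uses the .str and .bool accessors, which are described further below:

```scala
import scala.util.Try

// A small, made-up JSON document parsed from a string literal
val sample: ujson.Value = ujson.read("""{"name": "Ammonite", "tags": ["repl", "scala"], "stable": true}""")

// The result is a tree of ujson.Value nodes that we can index into
val name: String = sample("name").str
val firstTag: String = sample("tags")(0).str

// Malformed input (here, a missing closing brace) makes ujson.read throw,
// so we wrap the call in Try to observe the failure instead of crashing
val broken = Try(ujson.read("""{"name": "Ammonite" """))
```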
You can look up entries in the JSON data structure using data(...) syntax, e.g.:
@ data(0)
res3: ujson.Value = Obj(
LinkedHashMap(
"url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
"assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...
@ data(0)("url")
res4: ujson.Value = Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367")
@ data(0)("author")("id")
res5: ujson.Value = Num(2.0607116E7)
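Looking up a key that is not present throws an exception; it does not return null. The following minimal sketch shows the difference, assuming uPickle is on the classpath and using a small made-up release object. It relies on .obj, described below, which exposes the underlying map so that .get returns an Option:

```scala
import scala.util.Try

// Made-up, trimmed-down release object
val release: ujson.Value = ujson.read("""{"tag_name": "1.6.8", "author": {"login": "Ammonite-Bot"}}""")

// Indexing with a missing key throws, like calling apply on a Scala Map
val missing = Try(release("body"))

// .obj exposes the underlying map, so .get yields an Option instead of throwing
val body: Option[ujson.Value] = release.obj.get("body")
val login: Option[String] = release.obj.get("author").map(_("login").str)
```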
A ujson.Value can be one of several types:
// Simplified view of the ujson.Value hierarchy (the real classes also define accessor methods)
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

sealed trait Value
case class Str(value: String) extends Value
case class Obj(value: mutable.LinkedHashMap[String, Value]) extends Value
case class Arr(value: ArrayBuffer[Value]) extends Value
case class Num(value: Double) extends Value
sealed abstract class Bool extends Value
case object False extends Bool
case object True extends Bool
case object Null extends Value
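Because Value is a sealed hierarchy, you can pattern-match on its subtypes. The describe function below is a hypothetical helper, written as a sketch against the case classes and objects listed above and assuming uPickle is on the classpath:

```scala
// Hypothetical helper: summarize any ujson.Value by matching on its subtype
def describe(v: ujson.Value): String = v match {
  case ujson.Str(s)      => s"string of length ${s.length}"
  case ujson.Num(n)      => s"number $n"
  case ujson.True        => "true"
  case ujson.False       => "false"
  case ujson.Null        => "null"
  case ujson.Arr(items)  => s"array with ${items.length} elements"
  case ujson.Obj(fields) => s"object with keys ${fields.keys.mkString(",")}"
}
```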
You can conveniently cast a ujson.Value to a specific sub-type and get its internal data by using the .bool, .num, .arr, .obj, or .str methods, each of which throws an exception if the value is not of the expected type:
@ data.
apply bool num render transform value
arr isNull obj str update
For example, fetching the fields of a ujson.Obj:
@ data(0).obj
res6: collection.mutable.LinkedHashMap[String, ujson.Value] = LinkedHashMap(
"url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"),
"assets_url" -> Str("https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets"),
...
@ data(0).obj.keys
res7: Iterable[String] = Set(
"url",
"assets_url",
"upload_url",
...
@ data(0).obj.size
res8: Int = 18
Or the values in primitive types:
@ data(0)("url").str
res9: String = "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"
@ data(0)("author")("id").num
res10: Double = 2.0607116E7
ujson.Nums are stored as doubles. You can call .toInt to convert to an integer:
@ data(0)("author")("id").num.toInt
res11: Int = 20607116
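Two caveats apply here. First, the accessors throw if the node holds a different type. Second, Double's .toInt silently truncates any fractional part (4.5.toInt is 4), and a double only represents integers exactly up to 2^53. The asInt helper below is hypothetical: it sketches one defensive conversion that uses pattern matching and assumes uPickle is on the classpath:

```scala
import scala.util.Try

// The typed accessors throw when the node has a different type
val wrongType = Try(ujson.Str("1.6.8").num)

// Hypothetical helper: Some(int) only for a Num holding a whole number that fits
// in an Int; anything else (fractions, NaN, out-of-range values, non-numbers) gives None
def asInt(v: ujson.Value): Option[Int] = v match {
  case ujson.Num(n) if n == math.floor(n) && n >= Int.MinValue && n <= Int.MaxValue =>
    Some(n.toInt)
  case _ => None
}
```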
You can construct JSON data using the ujson.* constructors:
@ val output = ujson.Arr(
ujson.Obj("hello" -> ujson.Str("world"), "answer" -> ujson.Num(42)),
ujson.Bool(true)
)
output: ujson.Arr = Arr(
ArrayBuffer(Obj(LinkedHashMap("hello" -> Str("world"), "answer" -> Num(42.0))), true)
)
The constructors for primitive types like numbers, strings, and booleans are optional, since uJson implicitly converts those Scala values into ujson.Values:
@ val output = ujson.Arr(
ujson.Obj("hello" -> "world", "answer" -> 42),
true
)
output: ujson.Arr = Arr(
ArrayBuffer(Obj(LinkedHashMap("hello" -> Str("world"), "answer" -> Num(42.0))), true)
)
These can be serialized back to a string using the ujson.write function:
@ ujson.write(output)
res13: String = "[{\"hello\":\"world\",\"answer\":42},true]"
@ println(ujson.write(output))
[{"hello":"world","answer":42},true]
By default, the output JSON is compact. You can pass in an indent parameter if you want your output JSON formatted in a human-readable fashion:
@ println(ujson.write(output, indent = 4))
[
{
"hello": "world",
"answer": 42
},
true
]
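Values built this way can also come from ordinary Scala collections. The following minimal sketch maps a Seq of pairs into a ujson.Arr of objects, writes it out, and parses it back. It assumes uPickle is on the classpath, and the names and counts are made up:

```scala
// Made-up input: asset names with download counts
val downloads = Seq("2.12-1.6.8" -> 833, "2.13-1.6.8" -> 100)

// Turn each pair into a ujson.Obj, then splat the Seq into ujson.Arr's varargs constructor
val json = ujson.Arr(downloads.map { case (name, count) => ujson.Obj("name" -> name, "count" -> count) }: _*)

val compact = ujson.write(json)
val pretty = ujson.write(json, indent = 2)

// Parsing the written text back yields a structurally equal tree
val roundTripped = ujson.read(compact)
```

Splatting with `: _*` lets the varargs ujson.Arr constructor accept a collection built at runtime.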
ujson.Values are mutable, and can be modified in place, much like JSON structures in dynamic languages such as JavaScript or Python:
@ println(output)
[{"hello":"world","answer":42},true]
@ output(0)("hello") = "goodbye"
@ output(0)("tags") = ujson.Arr("awesome", "yay", "wonderful")
@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]
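Because ujson.Values are mutable, assigning one to a new name shares the same tree; it does not copy it. The small sketch below shows this aliasing and then makes an independent copy by round-tripping through ujson.write and ujson.read. That round trip is one simple option, not the only one. The sketch assumes uPickle is on the classpath:

```scala
val original = ujson.Obj("hello" -> "world")

// A plain assignment does not copy: both names refer to the same mutable tree
val alias = original
alias("hello") = "goodbye"

// Round-tripping through a string produces an independent copy
val independent = ujson.read(ujson.write(original))
independent("hello") = "copied"
```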
When treating ujson.Arrs as buffers or ujson.Objs as maps, you need to use .arr or .obj to cast the value beforehand:
@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]
@ output(0).obj.remove("hello")
@ println(output)
[{"answer":42,"tags":["awesome","yay","wonderful"]},true]
@ output.arr.append(123)
@ println(output)
[{"answer":42,"tags":["awesome","yay","wonderful"]},true,123]
@ output.arr.clear()
@ println(output)
[]
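The same casts help when a field may be missing. The sketch below uses made-up data in which the second entry has no "downloads" key. Because .obj.get returns an Option, the code can fall back to zero and then write the incremented count back with the update syntax shown above. It assumes uPickle is on the classpath:

```scala
// Made-up data: the second entry has no "downloads" field
val releases = ujson.read("""[{"name": "a", "downloads": 1}, {"name": "b"}]""")

for (release <- releases.arr) {
  // .obj.get returns an Option, so a missing key defaults to 0 instead of throwing
  val current = release.obj.get("downloads").map(_.num).getOrElse(0.0)
  // Assigning a new key appends it after the existing ones
  release("downloads") = current + 1
}
```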
Going back to our original data object:
@ ujson.write(data, indent = 4)
res40: String = """[
{
"url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
"assets_url": "https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
"upload_url": "https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
"html_url": "https://github.com/lihaoyi/Ammonite/releases/tag/1.6.8",
"id": 17991367,
...
To traverse the tree structure of a ujson.Value, we can use a recursive function. For example, here is one that recurses over data and collects the contents of all the ujson.Str nodes in the JSON structure:
@ def traverse(v: ujson.Value): Iterable[String] = v match{
case a: ujson.Arr => a.arr.flatMap(traverse)
case o: ujson.Obj => o.obj.values.flatMap(traverse)
case s: ujson.Str => Seq(s.str)
case _ => Nil
}
defined function traverse
@ traverse(data)
res45: Iterable[String] = ArrayBuffer(
"https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367",
"https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets",
"https://uploads.github.com/repos/lihaoyi/Ammonite/releases/17991367/assets{?name,label}",
"https://github.com/lihaoyi/Ammonite/releases/tag/1.6.8",
"MDc6UmVsZWFzZTE3OTkxMzY3",
"1.6.8",
"master",
We could also modify the ujson.Value during traversal. Here's a function that recurses over data and removes all key-value pairs where the value is a string starting with https://. It returns false for any node that its parent object should drop:
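@ def traverse(v: ujson.Value): Boolean = v match{
    // Returns false if this node should be removed from its parent object
    case a: ujson.Arr =>
      // Recurse into the elements; array elements themselves are never removed
      a.arr.foreach(traverse)
      true
    case o: ujson.Obj =>
      // Keep only the entries whose values should be kept
      o.obj.filterInPlace{case (_, child) => traverse(child)}
      true
    case s: ujson.Str => !s.str.startsWith("https://")
    case _ => true
  }
@ traverse(data)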
@ ujson.write(data, indent = 4)
res52: String = """[
{
"id": 17991367,
"node_id": "MDc6UmVsZWFzZTE3OTkxMzY3",
"tag_name": "1.6.8",
"target_commitish": "master",
"name": "1.6.8",
"draft": false,
"author": {
"login": "Ammonite-Bot",
"id": 20607116,
"node_id": "MDQ6VXNlcjIwNjA3MTE2",
...
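The function above only removes key-value pairs. A URL string sitting directly inside an array is left alone, because the ujson.Arr branch only recurses into its elements. A variant that also filters array elements could use ArrayBuffer's filterInPlace (Scala 2.13+). The keep function and the sample document below are hypothetical, and the sketch assumes uPickle is on the classpath:

```scala
// Hypothetical variant: returns false when `v` should be removed from its parent,
// whether that parent is an object or an array
def keep(v: ujson.Value): Boolean = v match {
  case a: ujson.Arr =>
    a.arr.filterInPlace(keep) // unlike traverse above, this also drops array elements
    true
  case o: ujson.Obj =>
    o.obj.filterInPlace { case (_, child) => keep(child) }
    true
  case s: ujson.Str => !s.str.startsWith("https://")
  case _ => true
}

// Made-up document with a URL both as an object value and inside an array
val doc = ujson.read("""{"name": "x", "url": "https://example.com", "mirrors": ["https://a.example", "ftp://b.example"]}""")
keep(doc)
```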
Often you do not just want dynamically-typed JSON trees: you expect the data to conform to a particular schema, have your code use that schema in a safe way, and fail early (and loudly!) if the incoming data doesn't conform. You can do that by defining a case class representing the fields and types you expect to be present in the JSON, and using upickle.default.macroRW and upickle.default.read to extract those fields from the JSON:
@ println(ujson.write(data(0)("author"), indent=4))
{
"login": "Ammonite-Bot",
"id": 20607116,
"node_id": "MDQ6VXNlcjIwNjA3MTE2",
"gravatar_id": "",
"type": "User",
"site_admin": false
}
@ case class Author(login: String, id: Int, site_admin: Boolean)
defined class Author
@ implicit val authorRW = upickle.default.macroRW[Author]
@ val author = upickle.default.read[Author](data(0)("author"))
author: Author = Author("Ammonite-Bot", 20607116, false)
@ author.login
res60: String = "Ammonite-Bot"
@ author.id
res61: Int = 20607116
@ author.site_admin
res62: Boolean = false
Here, the field names in the case class Author correspond to the fields in the JSON that you want to read, and the values are deserialized to the corresponding types (String, Int, Boolean). Extra fields present in the input JSON are ignored. Note that every case class you want to read or write needs its own implicit ReadWriter, which the upickle.default.macroRW statement generates.
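This strictness is what produces the early, loud failures mentioned above. The sketch below reads Author from strings: the extra "type" field is ignored, while a missing field or a wrongly-typed one makes read throw. It assumes uPickle is on the classpath, and it places the ReadWriter in the companion object with an explicit type annotation, a common convention:

```scala
import scala.util.Try
import upickle.default.{ReadWriter, macroRW, read}

case class Author(login: String, id: Int, site_admin: Boolean)
object Author {
  // Explicitly typed implicit ReadWriter, generated by the macro
  implicit val rw: ReadWriter[Author] = macroRW
}

// Extra fields such as "type" are ignored
val ok = read[Author]("""{"login": "Ammonite-Bot", "id": 20607116, "site_admin": false, "type": "User"}""")

// A missing field, or a field of the wrong type, makes read throw
val missingField = Try(read[Author]("""{"login": "Ammonite-Bot", "id": 20607116}"""))
val wrongType = Try(read[Author]("""{"login": "Ammonite-Bot", "id": "twenty", "site_admin": false}"""))
```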
If you want a JSON field to deserialize to a case class field of a different name, you can use the @upickle.implicits.key annotation:
@ case class Author(login: String,
id: Int,
@upickle.implicits.key("site_admin") siteAdmin: Boolean)
@ implicit val authorRW = upickle.default.macroRW[Author]
@ val author = upickle.default.read[Author](data(0)("author"))
author: Author = Author("Ammonite-Bot", 20607116, false)
Your Scala case classes can be converted back into JSON using upickle.default.write:
@ upickle.default.write(author)
res68: String = "{\"login\":\"Ammonite-Bot\",\"id\":20607116,\"site_admin\":false}"
@ println(upickle.default.write(author))
{"login":"Ammonite-Bot","id":20607116,"site_admin":false}
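Writing a value and then reading it back should produce an equal value. The short round-trip sketch below uses the renamed siteAdmin field from above, and assumes uPickle is on the classpath. The key annotation applies in both directions, so the JSON key stays site_admin:

```scala
import upickle.default.{ReadWriter, macroRW, read, write}

case class Author(login: String, id: Int, @upickle.implicits.key("site_admin") siteAdmin: Boolean)
object Author {
  implicit val rw: ReadWriter[Author] = macroRW
}

val author = Author("Ammonite-Bot", 20607116, siteAdmin = false)

// Serialize to a JSON string, then parse it back into an Author
val text = write(author)
val back = read[Author](text)
```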
You can also deserialize to Seqs and other built-in data structures. Here we read assets, which is a JSON array of objects, into a Scala Seq[Asset]:
@ ujson.write(data(0)("assets"), indent=4)
res75: String = """[
{
"id": 13194960,
"node_id": "MDEyOlJlbGVhc2VBc3NldDEzMTk0OTYw",
"name": "2.12-1.6.8",
"label": "",
"uploader": {
"login": "Ammonite-Bot",
"id": 20607116,
"node_id": "MDQ6VXNlcjIwNjA3MTE2",
"gravatar_id": "",
"type": "User",
"site_admin": false
},
"content_type": "application/octet-stream",
"state": "uploaded",
"size": 33951394,
"download_count": 833,
"created_at": "2019-06-14T07:54:16Z",
"updated_at": "2019-06-14T07:54:17Z"
},
{
"id": 13194961,
"node_id": "MDEyOlJlbGVhc2VBc3NldDEzMTk0OTYx",
...
@ case class Asset(id: Int, name: String)
@ implicit val assetRW = upickle.default.macroRW[Asset]
@ upickle.default.read[Seq[Asset]](data(0)("assets"))
res81: Seq[Asset] = List(
Asset(13194960, "2.12-1.6.8"),
Asset(13194961, "2.13-1.6.8"),
Asset(13199400, "2.12-1.6.8-1-c7a656e"),
Asset(13199401, "2.13-1.6.8-1-c7a656e"),
Asset(13220957, "2.12-1.6.8-2-0a2abd6"),
...
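Other collection types and tuples work the same way. The sketch below uses made-up inputs and relies on uPickle's built-in ReadWriters for Vector, List, and tuples. Tuples are read from fixed-length JSON arrays:

```scala
import upickle.default.read

// JSON arrays can be read into different Scala collection types
val ids: Vector[Int] = read[Vector[Int]]("[13194960, 13194961]")
val names: List[String] = read[List[String]]("""["2.12-1.6.8", "2.13-1.6.8"]""")

// Tuples correspond to fixed-length JSON arrays
val pair: (Int, String) = read[(Int, String)]("""[13194960, "2.12-1.6.8"]""")

// Collections nest as you would expect
val nested: Seq[Seq[Int]] = read[Seq[Seq[Int]]]("[[1, 2], [3]]")
```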
You can also deserialize nested case classes. Since type is a reserved word in Scala, the corresponding field has to be wrapped in backticks:
@ case class Uploader(id: Int, login: String, `type`: String)
@ case class Asset(id: Int, name: String, uploader: Uploader)
@ implicit val uploaderRW = upickle.default.macroRW[Uploader]
@ implicit val assetRW = upickle.default.macroRW[Asset]
@ val assets = upickle.default.read[Seq[Asset]](data(0)("assets"))
assets: Seq[Asset] = List(
Asset(13194960, "2.12-1.6.8", Uploader(20607116, "Ammonite-Bot", "User")),
Asset(13194961, "2.13-1.6.8", Uploader(20607116, "Ammonite-Bot", "User")),
Asset(13199400, "2.12-1.6.8-1-c7a656e", Uploader(20607116, "Ammonite-Bot", "User")),
Asset(13199401, "2.13-1.6.8-1-c7a656e", Uploader(20607116, "Ammonite-Bot", "User")),
Asset(13220957, "2.12-1.6.8-2-0a2abd6", Uploader(20607116, "Ammonite-Bot", "User")),
Asset(13220958, "2.13-1.6.8-2-0a2abd6", Uploader(20607116, "Ammonite-Bot", "User")),
...
If you wish to store a dynamically typed JSON field within your case class, simply give it the type ujson.Value:
@ case class Asset(id: Int, name: String, uploader: ujson.Value)
@ implicit val assetRW = upickle.default.macroRW[Asset]
@ val assets = upickle.default.read[Seq[Asset]](data(0)("assets"))
assets: Seq[Asset] = List(
Asset(
13194960,
"2.12-1.6.8",
Obj(
LinkedHashMap(
"login" -> Str("Ammonite-Bot"),
"id" -> Num(2.0607116E7),
"node_id" -> Str("MDQ6VXNlcjIwNjA3MTE2"),
"gravatar_id" -> Str(""),
"type" -> Str("User"),
"site_admin" -> false
)
)
),
...
@ println(assets(0).uploader)
{"login":"Ammonite-Bot","id":20607116,"node_id":"MDQ6VXNlcjIwNjA3MTE2","gravatar_id":"","type":"User","site_admin":false}
@ println(assets(0).uploader.obj.keys)
Set(login, id, node_id, gravatar_id, type, site_admin)
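A ujson.Value field can still be decoded into a typed case class later, on demand, because upickle.default.read accepts a ujson.Value as well as a String. The sketch below uses a trimmed-down asset as input and assumes uPickle is on the classpath:

```scala
import upickle.default.{ReadWriter, macroRW, read}

case class Uploader(id: Int, login: String, `type`: String)
object Uploader { implicit val rw: ReadWriter[Uploader] = macroRW }

// The uploader field is kept as an untyped JSON tree
case class Asset(id: Int, name: String, uploader: ujson.Value)
object Asset { implicit val rw: ReadWriter[Asset] = macroRW }

val asset = read[Asset]("""{"id": 13194960, "name": "2.12-1.6.8", "uploader": {"login": "Ammonite-Bot", "id": 20607116, "type": "User", "site_admin": false}}""")

// Decode the dynamically-typed field into a case class only when it is needed
val uploader = read[Uploader](asset.uploader)
```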
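Lastly, all our Scala data types can be converted back to JSON strings using upickle.default.write. The output below is for the assets read earlier with the nested Uploader case class; with the ujson.Value version, uploader would be written out with all of its original fields: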
@ upickle.default.write(assets, indent=4)
res90: String = """[
{
"id": 13194960,
"name": "2.12-1.6.8",
"uploader": {
"id": 20607116,
"login": "Ammonite-Bot",
"type": "User"
}
},
{
"id": 13194961,
"name": "2.13-1.6.8",
"uploader": {
"id": 20607116,
...
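If you want a JSON tree rather than a string, upickle.default.writeJs returns a ujson.Value. You can keep modifying that value with the operations shown earlier before rendering it. The sketch below assumes uPickle is on the classpath, and the "checked" key is a made-up example:

```scala
import upickle.default.{ReadWriter, macroRW, writeJs}

case class Asset(id: Int, name: String)
object Asset { implicit val rw: ReadWriter[Asset] = macroRW }

// writeJs produces a mutable ujson.Value tree instead of a String
val json: ujson.Value = writeJs(Seq(Asset(13194960, "2.12-1.6.8")))

// Adjust the tree with ordinary ujson updates, then render it
json(0)("checked") = true
val rendered = ujson.write(json)
```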
While the above examples all used upickle.default.read to read ujson.Values into typed Scala case classes, you can also use it to efficiently read raw Strings into case classes without the overhead of intermediate data structures.
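For example, the following minimal sketch reads the author object straight from a made-up, trimmed JSON string, assuming uPickle is on the classpath. In the tutorial's setting, the string could come from os.read instead:

```scala
import upickle.default.{ReadWriter, macroRW, read}

case class Author(login: String, id: Int, site_admin: Boolean)
object Author { implicit val rw: ReadWriter[Author] = macroRW }

// Made-up raw JSON text; extra fields like node_id are ignored
val raw = """{"login": "Ammonite-Bot", "id": 20607116, "node_id": "MDQ6VXNlcjIwNjA3MTE2", "site_admin": false}"""

// Passing the String straight to read parses it directly into an Author
val author = read[Author](raw)
```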
This tutorial introduces you to the basics of working with JSON in a Scala program, using the uPickle library. We have walked through common workflows: reading JSON data, extracting values of interest from it, generating and modifying our own JSON data, traversing the JSON tree structure, and finally serializing and deserializing typed Scala case classes to and from the untyped JSON data.
This post only covers the basics of the uPickle JSON library. You can refer to the reference documentation for more details.