1. Introduction
Oftentimes, your business logic relies on remote data that you need to fetch from different sources: databases, caches, web services, or third-party APIs, and there is little room for mistakes. Urania helps you keep your business logic clear of low-level details while still fetching data efficiently:
- batch multiple requests to the same data source
- request data from multiple data sources concurrently
- cache previous requests
Having all this gives you the ability to access remote data sources in a concise and consistent way, while the library handles batching and overlapping requests to multiple data sources behind the scenes.
2. Installation
The simplest way to use urania in a Clojure project is by including it as a dependency in your project.clj:
[funcool/urania "0.1.0"]
2.1. Limitations
- requires Java 8 when used from Clojure due to its use of java.util.concurrent.CompletableFuture
- works with the Promesa library only (if you use another async mechanism such as futures, you can easily adapt your code to promises)
- assumes your operations with data sources are "side-effect free", so the order of fetches does not matter
- you need enough memory to store all the data fetched during a single run! call (if that is impossible, you should probably look into other ways to solve your problem, e.g. data stream libraries)
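As a rough illustration of the second limitation above, an existing Clojure future can be wrapped in a promise before it is used from a data source. This is only a sketch: future->promise is a hypothetical helper, not part of urania or Promesa, and it assumes the factory-function form of prom/promise that this guide uses for its own examples (Clojure only).

```clojure
(require '[promesa.core :as prom])

;; Hypothetical helper (not part of urania or Promesa): adapts a Clojure
;; future to a promise.
(defn future->promise
  [fut]
  (prom/promise
    (fn [resolve reject]
      ;; Wait for the future on another thread so the caller is not blocked.
      (future
        (try
          (resolve (deref fut))
          (catch Throwable e
            (reject e)))))))

;; Usage: deref blocks until the wrapped future has delivered its value.
(deref (future->promise (future (+ 1 2))))
```

A promise built this way can be returned from a data source's fetch function like any other promise.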
3. User Guide
3.1. Rationale
A core problem of many systems is balancing expressiveness against performance.
Let’s imagine the problem of calculating the number of friends in common that two users have, where we fetch the user data from a remote data source.
(require '[clojure.set :refer [intersection]])
(defn friends-of
  [id]
  ;; ...
  )
(defn count-common
  [a b]
  (count (intersection a b)))
(defn count-common-friends
  [x y]
  (count-common (friends-of x) (friends-of y)))
(count-common-friends 1 2)
Here, (friends-of x) and (friends-of y) are independent, and you want them to be fetched concurrently or in a single request. Furthermore, if x and y refer to the same person, you don’t want to redundantly re-fetch their friend list. What would the code look like if we applied the mentioned optimizations? We’d have to mix different concerns like caching and batching together with the business logic we perform with the data.
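To make that concrete, here is a rough sketch of what a hand-optimized version might look like. Everything in it is an assumption made for illustration: fetch-friends-batch is a hypothetical stand-in for a remote batch endpoint, and friends-cache is a hand-rolled cache.

```clojure
(require '[clojure.set :refer [intersection]])

;; Hypothetical stand-in for a remote batch endpoint: takes a collection of
;; user ids and returns a map from each id to that user's set of friends.
(defn fetch-friends-batch
  [ids]
  (into {} (map (fn [id] [id (set (range id))])) ids))

;; Hand-rolled cache shared by every caller.
(def friends-cache (atom {}))

(defn count-common-friends-by-hand
  [x y]
  (let [ids     (distinct [x y])                             ; deduplicate
        missing (remove #(contains? @friends-cache %) ids)]  ; skip cached ids
    (when (seq missing)
      ;; One batched request for everything that is not cached yet.
      (swap! friends-cache merge (fetch-friends-batch missing)))
    ;; The only line of actual business logic.
    (count (intersection (get @friends-cache x)
                         (get @friends-cache y)))))

(count-common-friends-by-hand 3 5)
;; => 3
```

Deduplication, caching, and batching now take up more of the function than the one line of business logic they surround.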
Urania allows your data fetches to be implicitly concurrent with few changes to the original code. Here’s a spoiler:
(require '[urania.core :as u])
(defn count-common-friends [x y]
  (u/map count-common
         (friends-of x)
         (friends-of y)))
(u/run! (count-common-friends 1 2))
As you may have noticed, Urania does so by separating the declaration of a data fetch from its execution. When running your fetches, urania will:
- request data from multiple data sources concurrently
- batch multiple requests to the same data source
- cache repeated requests
3.2. Fetching data from remote sources
Reading data from remote data sources is usually asynchronous and may fail. That’s why urania uses the Promise type available in the Promesa library as its result abstraction.
We’ll start by writing a small function for emulating data sources with unpredictable latency. In ClojureScript:
(require '[promesa.core :as prom])
(defn remote-req [id result]
  (prom/promise
    (fn [resolve reject]
      (let [wait (rand 1000)]
        (println (str "-->[ " id " ] waiting " wait))
        (js/setTimeout #(do (println (str "<--[ " id " ] finished, result: " result))
                            (resolve result))
                       wait)))))
and in Clojure:
(require '[promesa.core :as prom])
(defn remote-req [id result]
  (prom/promise
    (fn [resolve reject]
      (let [wait (rand 1000)]
        (println (str "-->[ " id " ] waiting " wait))
        ;; Thread/sleep expects a long, so truncate the random double
        (Thread/sleep (long wait))
        (println (str "<--[ " id " ] finished, result: " result))
        (resolve result)))))
3.3. Remote data sources
Now, we define our data sources as types that implement Urania’s DataSource protocol. This protocol has two functions:
- -identity, which returns an identifier for the resource (used for caching and deduplication).
- -fetch, which fetches the result from the remote data source, returning a promise.
(require '[urania.core :as u])
(defrecord FriendsOf [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (remote-req id (set (range id)))))
(defn friends-of [id]
  (FriendsOf. id))
Now let’s try to fetch some data with Urania.
We’ll use urania.core/run! to run a fetch. It returns a promise.
(u/run! (friends-of 10))
;; -->[ 10 ] waiting 510.17118249719886
;; => #<Promise [~]>
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
We can block for the promise’s result with deref:
(deref
(u/run! (friends-of 10)))
;; -->[ 10 ] waiting 265.2789087406875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => #{0 7 1 4 6 3 2 9 5 8}
Or use Urania’s run!! function. Note that we can only block in Clojure, not in ClojureScript.
(u/run!! (friends-of 10))
;; -->[ 10 ] waiting 265.2789087406875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => #{0 7 1 4 6 3 2 9 5 8}
For convenience, the rest of the documentation will use run!!, although it is not available in ClojureScript.
3.3.1. Transforming fetched data
We can use the urania.core/map function to transform the results of a data source.
(u/run!!
(u/map count (friends-of 10)))
;; -->[ 10 ] waiting 463.370748219846
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => 10
And compose multiple transformations together:
(u/run!!
(u/map dec (u/map count (friends-of 10))))
;; -->[ 10 ] waiting 463.370748219846
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; => 9
3.3.2. Dependencies between results
Let’s imagine we have another piece of information we want to fetch: a user’s activity score. To fetch a user’s activity score we’ll need to fetch the user first, and urania provides a combinator for doing so: urania.core/mapcat.
First of all, let’s define our activity score data source:
(defrecord ActivityScore [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (remote-req id (inc id))))
(defn activity
  [id]
  (ActivityScore. id))
Now we want to fetch the activity score of the first friend of a certain user. We need to know the intermediate results of a fetch to continue, so we use urania.core/mapcat:
(defn first-friends-activity
  [id]
  (u/mapcat (fn [friends]
              (activity (first friends)))
            (friends-of id)))
We can now run this fetch:
(u/run!! (first-friends-activity 10))
;; -->[ 10 ] waiting 575.5289747556875
;; <--[ 10 ] finished, result: #{0 7 1 4 6 3 2 9 5 8}
;; -->[ 0 ] waiting 63.24540090623976
;; <--[ 0 ] finished, result: 1
;; => 1
But what if we wanted the activity score for every friend of a user? urania provides a combinator for transforming a list of data sources into a data source that returns a list of results: urania.core/collect.
Let’s use it to collect the activity score for every friend:
(defn friends-activity
  [id]
  (u/mapcat (fn [friends]
              (u/collect (map activity friends)))
            (friends-of id)))
If we run it:
(u/run!! (friends-activity 5))
;; -->[ 5 ] waiting 480.8846764476696
;; <--[ 5 ] finished, result: #{0 1 4 3 2}
;; -->[ 0 ] waiting 488.58045819535687
;; -->[ 1 ] waiting 87.96784013662884
;; -->[ 4 ] waiting 868.2747930486679
;; <--[ 1 ] finished, result: 2
;; -->[ 3 ] waiting 293.59429652774116
;; <--[ 3 ] finished, result: 4
;; -->[ 2 ] waiting 280.68098217346835
;; <--[ 0 ] finished, result: 1
;; <--[ 2 ] finished, result: 3
;; <--[ 4 ] finished, result: 5
;; => [1 2 5 4 3]
As you may have noticed, the data sources passed to urania.core/collect are fetched concurrently. Furthermore, urania detects and eliminates duplicate requests:
(u/run!! (u/collect [(friends-of 1) (friends-of 2) (friends-of 2)]))
;; -->[ 2 ] waiting 634.8383950264134
;; -->[ 1 ] waiting 924.8381446535985
;; <--[ 2 ] finished, result: #{0 1}
;; <--[ 1 ] finished, result: #{0}
;; => [#{0} #{0 1} #{0 1}]
See how the friends of the user with id 2 are only fetched once, even though the request is duplicated in the collection passed to urania.core/collect.
The above pattern of using collect and mapcat is captured by the urania.core/traverse combinator, so we can just write:
(defn friends-activity
  [id]
  (u/traverse activity (friends-of id)))
3.3.3. Batching requests
We’ve seen that urania organizes and deduplicates fetches for us, but there is still room for improvement. In our examples using urania.core/collect, we’ve seen how requests to the same data source are run concurrently.
In many cases, remote data sources will offer a batch API that we can use to reduce latency when fetching multiple results. If our data sources can be fetched in batches, urania can detect it and optimize our fetches further.
Let’s add batch fetching to ActivityScore. We just need to implement the BatchedSource protocol. It has only one method: -fetch-multi, which receives the data sources to fetch and must return a promise with a map from the data source identities to their results.
(extend-type ActivityScore
  u/BatchedSource
  (-fetch-multi [score scores _]
    (let [ids (cons (:id score) (map :id scores))]
      (remote-req ids (zipmap ids (map inc ids))))))
Let’s try to run our friends-activity again:
(u/run!! (friends-activity 5))
;; -->[ 5 ] waiting 123.11807342157954
;; <--[ 5 ] finished, result: #{0 1 4 3 2}
;; -->[ (0 1 4 3 2) ] waiting 97.95578032830765
;; <--[ (0 1 4 3 2) ] finished, result: {0 1, 1 2, 4 5, 3 4, 2 3}
;; => [1 2 5 4 3]
Our previous fetch of (friends-activity 5) made n + 1 requests to fetch remote data, where n is the number of results of the first query, and we’ve been able to reduce it to 2! Issuing n + 1 requests like this is known as the N + 1 selects problem, and urania solves it nicely.
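One way to check this for your own data sources is to count round trips. The sketch below is an illustration only: Score and round-trips are hypothetical names, and already-resolved promises stand in for real remote calls.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

;; Counts every round trip to the (simulated) remote source.
(def round-trips (atom 0))

(defrecord Score [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (swap! round-trips inc)
    (prom/resolved (inc id)))
  u/BatchedSource
  (-fetch-multi [_ others _]
    (swap! round-trips inc)
    (let [ids (cons id (map :id others))]
      ;; The keys of the returned map must be the identities of the sources.
      (prom/resolved (zipmap ids (map inc ids))))))

(u/run!! (u/collect (map ->Score [0 1 2 3 4])))
;; => [1 2 3 4 5]

;; With batching in place, this should report a single round trip
;; for the five sources above.
@round-trips
```

Without the BatchedSource implementation, the same counter would be expected to reach one round trip per source.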
3.3.4. Fetching data conditionally
Sometimes we want to fetch some data conditionally. For example, when only some of our data sources are related to another resource, we may want to fetch the related data only for the ones that have it.
Let’s imagine for a moment that some friends may have a pet associated. For simplicity, we will assume that users with an even id have a pet, and users with an odd id don’t. In real examples you may want to inspect the user’s payload and make the decision accordingly.
We want to fetch the pet for those users that have it, so how do we solve this? We can use another combinator that the library provides: urania.core/value. First of all, let’s define the data source for pets.
(defrecord Pet [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (remote-req id :dog)))
(defn pet
  [id]
  (Pet. id))
Now that we have it, we can define a function for fetching our friends and their pets:
(defn fetch-pet
  [usr]
  (if (even? usr)
    (pet usr)
    (u/value :no-pet)))
(defn friends-with-pets
  [id]
  (u/traverse fetch-pet (friends-of id)))
Note how requests for a pet are only sent when the user has a pet:
(u/run!! (friends-with-pets 3))
;; -->[ 3 ] waiting 510.9166733502036
;; <--[ 3 ] finished, result: #{0 1 2}
;; -->[ 0 ] waiting 936.1072648404947
;; -->[ 2 ] waiting 145.6521664212589
;; <--[ 2 ] finished, result: :dog
;; <--[ 0 ] finished, result: :dog
;; => [:dog :no-pet :dog]
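The even/odd rule is a simplification. A hedged sketch of the payload-driven variant mentioned earlier could look like the following. UserProfile, PetById, the :pet-id field, and pet-of are all hypothetical names introduced for illustration, and the data is faked with already-resolved promises.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

;; Hypothetical user payload: only users with an even id carry a :pet-id.
(defrecord UserProfile [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (prom/resolved (cond-> {:id id}
                     (even? id) (assoc :pet-id (* 10 id))))))

;; Hypothetical pet source, keyed by the pet's own id.
(defrecord PetById [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _]
    (prom/resolved {:pet-id id :kind :dog})))

(defn pet-of
  [user-id]
  ;; Inspect the fetched payload and only then decide whether to fetch a pet.
  (u/mapcat (fn [profile]
              (if-let [pet-id (:pet-id profile)]
                (PetById. pet-id)
                (u/value :no-pet)))
            (UserProfile. user-id)))

(u/run!! (pet-of 2)) ; expected to yield the pet map for pet 20
(u/run!! (pet-of 3)) ; expected to yield :no-pet without a pet request
```

The decision now depends on data that is only known after the first fetch, which is why urania.core/mapcat is used here.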
4. Advanced usage
While providing a convenient high-level API, urania allows you to customize how your fetches are run.
4.1. Caching
urania stores intermediate results in a cache, grouping data sources by their name and mapping their identity to the fetched value. You can run a fetch and get back both the results and the final cache using urania.core/execute! instead of urania.core/run!.
Let’s define a simple data source and fetch some results with urania.core/execute! to see the cached values:
(deftype Simple [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _] (prom/resolved result)))
(deref
  (u/execute! (Simple. 1 42)))
;; => [42 {"user.Simple" {1 42}}]
The resulting promise resolves to a two-element vector: the first element is the result and the second is the cache.
We can now run the same fetch without even having to call -fetch again, just by passing a prepopulated cache. We pass it under the :cache keyword in the options of urania.core/run!:
(u/run! (Simple. 1 42) {:cache {"user.Simple" {1 42}}})
Note that both urania.core/run!! and urania.core/execute! support receiving an options map with the cache.
If you want to programmatically populate a cache, you can do so easily:
(def simple1 (Simple. 1 42))
(def simple2 (Simple. 2 99))
{(u/resource-name simple1) {(u/cache-id simple1) 42
                            (u/cache-id simple2) 99}}
;; => {"user.Simple" {1 42, 2 99}}
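For more than a couple of entries, the same map can be built with a small helper. This is a sketch under assumptions: seed-cache is a hypothetical function that is not part of urania, and it relies only on u/resource-name and u/cache-id as used above.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

(deftype Simple [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _] (prom/resolved result)))

;; Hypothetical helper (not part of urania): builds a cache map from
;; [data-source value] pairs.
(defn seed-cache
  [source-value-pairs]
  (reduce (fn [cache [source value]]
            (assoc-in cache
                      [(u/resource-name source) (u/cache-id source)]
                      value))
          {}
          source-value-pairs))

(def cache
  (seed-cache [[(Simple. 1 42) 42]
               [(Simple. 2 99) 99]]))

;; If the cache is honored, this yields the seeded values [42 99]
;; rather than the [:ignored :ignored] that -fetch would produce.
(u/run!! (u/collect [(Simple. 1 :ignored) (Simple. 2 :ignored)])
         {:cache cache})
```

Because the helper derives the keys from the data sources themselves, it does not depend on the namespace the type was defined in.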
4.2. Executor
urania will run your fetch functions asynchronously by default. In Clojure it uses java.util.concurrent.ForkJoinPool/commonPool, whereas in ClojureScript it uses the global setTimeout function.
However, you can customize how the fetch functions are run by providing a custom executor as an option. In Clojure, you can pass any object that implements java.util.concurrent.Executor and it will just work. If you want more fine-grained control, you must pass a type implementing the IExecutor protocol.
Let’s implement a dummy synchronous executor as an example and use it:
(def sync-executor
  (reify u/IExecutor
    (-execute [_ task]
      (task))))
(u/run!! (Simple. 1 42) {:executor sync-executor})
;; => 42
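The plain java.util.concurrent.Executor route mentioned above might look like this in Clojure. It is a minimal sketch: the pool size of 2 is an arbitrary choice, and Simple is redefined here only to keep the example self-contained.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])
(import '(java.util.concurrent Executors ExecutorService))

(deftype Simple [id result]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ _] (prom/resolved result)))

;; A fixed-size thread pool; any java.util.concurrent.Executor should do.
(def ^ExecutorService pool (Executors/newFixedThreadPool 2))

(try
  (println (u/run!! (Simple. 1 42) {:executor pool}))
  (finally
    ;; Release the pool's threads once the fetch is done.
    (.shutdown pool)))
```

A dedicated pool like this can be useful when fetches should not compete with other work scheduled on the common pool.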
4.3. Environment
Data fetching is commonly performed using stateful objects such as a database connection or an HTTP client. You may have noticed that both -fetch and -fetch-multi take a last argument that we haven’t used so far: the environment.
The environment is a way of passing arguments to the -fetch and -fetch-multi functions. Let’s see it in action:
(defrecord Environment [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ env] (prom/resolved env)))
(u/run!! (Environment. 1) {:env {:connection :a-connection}})
;; => {:connection :a-connection}
As you can see, the :env value that we pass in the options is available for single fetches. Batched fetches are no exception:
(defrecord Environment [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ env] (prom/resolved env))
  u/BatchedSource
  (-fetch-multi [_ envs env]
    (let [ids (cons id (map :id envs))]
      (prom/resolved (zipmap ids (map vector ids (repeat env)))))))
(u/run!! (u/collect [(Environment. 1) (Environment. 2)])
         {:env {:connection :a-connection}})
;; => [[1 {:connection :a-connection}] [2 {:connection :a-connection}]]
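In practice the environment usually carries something the fetch actually reads from. The following sketch uses a plain map as a stand-in for a database; UserById and the :db key are hypothetical names chosen for illustration.

```clojure
(require '[urania.core :as u]
         '[promesa.core :as prom])

;; Hypothetical data source that looks its record up in the environment.
(defrecord UserById [id]
  u/DataSource
  (-identity [_] id)
  (-fetch [_ env]
    ;; env is whatever was passed under :env in the options map.
    (prom/resolved (get-in env [:db id]))))

;; A plain map plays the role of the database connection here.
(def fake-db {1 {:name "Ada"}
              2 {:name "Grace"}})

(u/run!! (u/map :name (UserById. 1))
         {:env {:db fake-db}})
;; => "Ada"
```

Passing the connection through the environment keeps data sources free of global state, which also makes them easy to exercise against a fake like the map above.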
5. More resources
5.1. Talks
- "Reinventing Haxl: Efficient, Concurrent and Concise Data Access" at EuroClojure 2015: [Video](https://goo.gl/masrsz), [Slides](https://goo.gl/h4Zuvr)
6. Development
6.1. Contributing
Just open an issue or pull request.
7. Acknowledgements
urania is based on the initial work on Muse by Alexey Kachayev. It is also heavily inspired by:
- Haxl (https://github.com/facebook/Haxl) - Haskell library, Facebook, open-sourced
- Stitch (https://www.youtube.com/watch?v=VVpmMfT8aYw) - Scala library, Twitter, not open-sourced
8. License
Copyright (c) 2015 Alexey Kachayev
Copyright (c) 2015 Alejandro Gómez <alejandro@dialelo.com>
Copyright (c) 2015 Andrey Antukh <niwi@niwi.nz>
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.