File: README.md

package info (click to toggle)
ruby-batch-loader 1.4.1+dfsg.1-3
  • links: PTS, VCS
  • area: main
  • in suites: bullseye, sid
  • size: 264 kB
  • sloc: ruby: 760; makefile: 5; sh: 4
file content (488 lines) | stat: -rw-r--r-- 20,001 bytes parent folder | download | duplicates (2)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
# BatchLoader

[![Build Status](https://travis-ci.org/exAspArk/batch-loader.svg?branch=master)](https://travis-ci.org/exAspArk/batch-loader)
[![Coverage Status](https://coveralls.io/repos/github/exAspArk/batch-loader/badge.svg)](https://coveralls.io/github/exAspArk/batch-loader)
[![Code Climate](https://img.shields.io/codeclimate/maintainability/exAspArk/batch-loader.svg)](https://codeclimate.com/github/exAspArk/batch-loader/maintainability)
[![Downloads](https://img.shields.io/gem/dt/batch-loader.svg)](https://rubygems.org/gems/batch-loader)
[![Latest Version](https://img.shields.io/gem/v/batch-loader.svg)](https://rubygems.org/gems/batch-loader)

This gem provides a generic lazy batching mechanism to avoid N+1 DB queries, HTTP queries, etc.

Developers from these companies use `BatchLoader`:

<a href="https://about.gitlab.com/"><img src="images/gitlab.png" height="35" width="114" alt="GitLab" style="max-width:100%;"></a>
<img src="images/space.png" height="35" width="10" alt="" style="max-width:100%;">
<a href="https://www.netflix.com/"><img src="images/netflix.png" height="35" width="110" alt="Netflix" style="max-width:100%;"></a>
<img src="images/space.png" height="35" width="10" alt="" style="max-width:100%;">
<a href="https://www.alibaba.com/"><img src="images/alibaba.png" height="35" width="86" alt="Alibaba" style="max-width:100%;"></a>
<img src="images/space.png" height="35" width="10" alt="" style="max-width:100%;">
<a href="https://www.universe.com/"><img src="images/universe.png" height="35" width="137" alt="Universe" style="max-width:100%;"></a>
<img src="images/space.png" height="35" width="10" alt="" style="max-width:100%;">
<a href="https://www.wealthsimple.com/"><img src="images/wealthsimple.png" height="35" width="150" alt="Wealthsimple" style="max-width:100%;"></a>
<img src="images/space.png" height="35" width="10" alt="" style="max-width:100%;">
<a href="https://decidim.org/"><img src="images/decidim.png" height="35" width="94" alt="Decidim" style="max-width:100%;"></a>

## Contents

* [Highlights](#highlights)
* [Usage](#usage)
  * [Why?](#why)
  * [Basic example](#basic-example)
  * [How it works](#how-it-works)
  * [RESTful API example](#restful-api-example)
  * [GraphQL example](#graphql-example)
  * [Loading multiple items](#loading-multiple-items)
  * [Batch key](#batch-key)
  * [Caching](#caching)
  * [Replacing methods](#replacing-methods)
* [Installation](#installation)
* [API](#api)
* [Implementation details](#implementation-details)
* [Development](#development)
* [Related gems](#related-gems)
* [Contributing](#contributing)
* [Alternatives](#alternatives)
* [License](#license)
* [Code of Conduct](#code-of-conduct)

## Highlights

* Generic utility to avoid N+1 DB queries, HTTP requests, etc.
* Adapted Ruby implementation of battle-tested tools like [Haskell Haxl](https://github.com/facebook/Haxl), [JS DataLoader](https://github.com/facebook/dataloader), etc.
* Batching is isolated and lazy, load data in batch where and when it's needed.
* Automatically caches previous queries (identity map).
* Thread-safe (`loader`).
* No need to share batching through variables or custom defined classes.
* No dependencies, no monkey-patches, no extra primitives such as Promises.

## Usage

### Why?

Let's have a look at the code with N+1 queries:

```ruby
def load_posts(ids)
  Post.where(id: ids)
end

posts = load_posts([1, 2, 3])  #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #      _ ↓ _
                               #    ↙   ↓   ↘
users = posts.map do |post|    #   U    ↓    ↓   SELECT * FROM users WHERE id = 1
  post.user                    #   ↓    U    ↓   SELECT * FROM users WHERE id = 2
end                            #   ↓    ↓    U   SELECT * FROM users WHERE id = 3
                               #    ↘   ↓   ↙
                               #      ¯ ↓ ¯
puts users                     #      Users
```

The naive approach would be to preload dependent objects on the top level:

```ruby
# With ORM in basic cases
def load_posts(ids)
  Post.where(id: ids).includes(:user)
end

# But without ORM or in more complicated cases you will have to do something like:
def load_posts(ids)
  # load posts
  posts = Post.where(id: ids)
  user_ids = posts.map(&:user_id)

  # load users
  users = User.where(id: user_ids)
  user_by_id = users.each_with_object({}) { |user, memo| memo[user.id] = user }

  # map user to post
  posts.each { |post| post.user = user_by_id[post.user_id] }
end

posts = load_posts([1, 2, 3])  #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #      _ ↓ _      SELECT * FROM users WHERE id IN (1, 2, 3)
                               #    ↙   ↓   ↘
users = posts.map do |post|    #   U    ↓    ↓
  post.user                    #   ↓    U    ↓
end                            #   ↓    ↓    U
                               #    ↘   ↓   ↙
                               #      ¯ ↓ ¯
puts users                     #      Users
```

But the problem here is that `load_posts` now depends on the child association and knows that it has to preload data for future use. And it'll do it every time, even if it's not necessary. Can we do better? Sure!

### Basic example

With `BatchLoader` we can rewrite the code above:

```ruby
def load_posts(ids)
  Post.where(id: ids)
end

def load_user(post)
  BatchLoader.for(post.user_id).batch do |user_ids, loader|
    User.where(id: user_ids).each { |user| loader.call(user.id, user) }
  end
end

posts = load_posts([1, 2, 3])  #      Posts      SELECT * FROM posts WHERE id IN (1, 2, 3)
                               #      _ ↓ _
                               #    ↙   ↓   ↘
users = posts.map do |post|    #   BL   ↓    ↓
  load_user(post)              #   ↓    BL   ↓
end                            #   ↓    ↓    BL
                               #    ↘   ↓   ↙
                               #      ¯ ↓ ¯
puts users                     #      Users      SELECT * FROM users WHERE id IN (1, 2, 3)
```

As we can see, batching is isolated and described right in a place where it's needed.

### How it works

In general, `BatchLoader` returns a lazy object. Each lazy object knows which data it needs to load and how to batch the query. As soon as you need to use the lazy objects, they will be automatically loaded once without N+1 queries.

So, when we call `BatchLoader.for` we pass an item (`user_id`) which should be collected and used for batching later. For the `batch` method, we pass a block which will use all the collected items (`user_ids`):

<pre>
BatchLoader.for(post.<b>user_id</b>).batch do |<b>user_ids</b>, loader|
  ...
end
</pre>

Inside the block we execute a batch query for our items (`User.where`). After that, all we have to do is to call `loader` by passing an item which was used in `BatchLoader.for` method (`user_id`) and the loaded object itself (`user`):

<pre>
BatchLoader.for(post.<b>user_id</b>).batch do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(<b>user.id</b>, <b>user</b>) }
end
</pre>

When we call any method on the lazy object, it'll be automatically loaded through batching for all instantiated `BatchLoader`s:

<pre>
puts users # => SELECT * FROM users WHERE id IN (1, 2, 3)
</pre>

For more information, see the [Implementation details](#implementation-details) section.

### RESTful API example

Now imagine we have a regular Rails app with N+1 HTTP requests:

```ruby
# app/models/post.rb
class Post < ApplicationRecord
  def rating
    HttpClient.request(:get, "https://example.com/ratings/#{id}")
  end
end

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating} } # N+1 HTTP requests for each post.rating

    render json: serialized_posts
  end
end
```

As we can see, the code above will make N+1 HTTP requests, one for each post. Let's batch the requests with a gem called [parallel](https://github.com/grosser/parallel):

```ruby
class Post < ApplicationRecord
  def rating_lazy
    BatchLoader.for(post).batch do |posts, loader|
      Parallel.each(posts, in_threads: 10) { |post| loader.call(post, post.rating) }
    end
  end

  # ...
end
```

`loader` is thread-safe. So, if `HttpClient` is also thread-safe, then with `parallel` gem we can execute all HTTP requests concurrently in threads (there are some benchmarks for [concurrent HTTP requests](https://github.com/exAspArk/concurrent_http_requests) in Ruby). Thanks to Matz, MRI releases GIL when thread hits blocking I/O – HTTP request in our case.

In the controller, all we have to do is to replace `post.rating` with the lazy `post.rating_lazy`:

```ruby
class PostsController < ApplicationController
  def index
    posts = Post.limit(10)
    serialized_posts = posts.map { |post| {id: post.id, rating: post.rating_lazy} }

    render json: serialized_posts
  end
end
```

`BatchLoader` caches the loaded values. To ensure that the cache is purged between requests in the app add the following middleware to your `config/application.rb`:

```ruby
config.middleware.use BatchLoader::Middleware
```

See the [Caching](#caching) section for more information.

### GraphQL example

Batching is particularly useful with GraphQL. Using such techniques as preloading data in advance to avoid N+1 queries can be very complicated, since a user can ask for any available fields in a query.

Let's take a look at the simple [graphql-ruby](https://github.com/rmosolgo/graphql-ruby) schema example:

```ruby
Schema = GraphQL::Schema.define do
  query QueryType
end

QueryType = GraphQL::ObjectType.define do
  name "Query"
  field :posts, !types[PostType], resolve: ->(obj, args, ctx) { Post.all }
end

PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) { post.user } # N+1 queries
end

UserType = GraphQL::ObjectType.define do
  name "User"
  field :name, !types.String
end
```

If we want to execute a simple query like the following, we will get N+1 queries for each `post.user`:

```ruby
query = "
{
  posts {
    user {
      name
    }
  }
}
"
Schema.execute(query)
```

To avoid this problem, all we have to do is to change the resolver to return `BatchLoader::GraphQL` ([#32](https://github.com/exAspArk/batch-loader/pull/32) explains why not just `BatchLoader`):

```ruby
PostType = GraphQL::ObjectType.define do
  name "Post"
  field :user, !UserType, resolve: ->(post, args, ctx) do
    BatchLoader::GraphQL.for(post.user_id).batch do |user_ids, loader|
      User.where(id: user_ids).each { |user| loader.call(user.id, user) }
    end
  end
end
```

And setup GraphQL to use the built-in `lazy_resolve` method:

```ruby
Schema = GraphQL::Schema.define do
  query QueryType
  use BatchLoader::GraphQL
end
```

That's it.

### Loading multiple items

For batches where there is no item in response to a call, we normally return `nil`. However, you can use `:default_value` to return something else instead:

```ruby
BatchLoader.for(post.user_id).batch(default_value: NullUser.new) do |user_ids, loader|
  User.where(id: user_ids).each { |user| loader.call(user.id, user) }
end
```

For batches where the value is some kind of collection, such as an Array or Hash, `loader` also supports being called with a block, which yields the _current_ value, and returns the _next_ value. This is extremely useful for 1:Many relationships:

```ruby
BatchLoader.for(user.id).batch(default_value: []) do |user_ids, loader|
  Comment.where(user_id: user_ids).each do |comment|
    loader.call(comment.user_id) { |memo| memo << comment }
  end
end
```

### Batch key

It's possible to reuse the same `BatchLoader#batch` block for loading different types of data by specifying a unique `key`.
For example, with polymorphic associations:

```ruby
def lazy_association(post)
  id = post.association_id
  key = post.association_type

  BatchLoader.for(id).batch(key: key) do |ids, loader, args|
    model = Object.const_get(args[:key])
    model.where(id: ids).each { |record| loader.call(record.id, record) }
  end
end
post1 = Post.save(association_id: 1, association_type: 'Tag')
post2 = Post.save(association_id: 1, association_type: 'Category')

lazy_association(post1) # SELECT * FROM tags WHERE id IN (1)
lazy_association(post2) # SELECT * FROM categories WHERE id IN (1)
```

It's also required to pass custom `key` when using `BatchLoader` with metaprogramming (e.g. `eval`).

### Caching

By default `BatchLoader` caches the loaded values. You can test it by running something like:

```ruby
def user_lazy(id)
  BatchLoader.for(id).batch do |ids, loader|
    User.where(id: ids).each { |user| loader.call(user.id, user) }
  end
end

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
# => <#User:...>

puts user_lazy(1) # no request
# => <#User:...>
```

Usually, it's just enough to clear the cache between HTTP requests in the app. To do so, simply add the middleware:

```ruby
use BatchLoader::Middleware
```

To drop the cache manually you can run:

```ruby
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
puts user_lazy(1) # no request

BatchLoader::Executor.clear_current

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
```

In some rare cases it's useful to disable caching for `BatchLoader`. For example, in tests or after data mutations:

```ruby
def user_lazy(id)
  BatchLoader.for(id).batch(cache: false) do |ids, loader|
    # ...
  end
end

puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
puts user_lazy(1) # SELECT * FROM users WHERE id IN (1)
```

If you set `cache: false`, it's likely you also want `replace_methods: false` (see below section).

### Replacing methods

By default, `BatchLoader` replaces methods on its instance by calling `#define_method` after batching to copy methods from the loaded value.
This consumes some time but allows to speed up any future method calls on the instance.
In some cases, when there are a lot of instances with a huge number of defined methods, this initial process of replacing the methods can be slow.
You may consider avoiding the "up front payment" and "pay as you go" with `#method_missing` by disabling the method replacement:

```ruby
BatchLoader.for(id).batch(replace_methods: false) do |ids, loader|
  # ...
end
```

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'batch-loader'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install batch-loader

## API

```ruby
BatchLoader.for(item).batch(
  default_value: default_value,
  cache: cache,
  replace_methods: replace_methods,
  key: key
) do |items, loader, args|
  # ...
end
```

| Argument Key      | Default                                                              | Description                                                                           |
| ---------------   | ---------------------------------------------                        | -------------------------------------------------------------                         |
| `item`            | -                                                                    | Item which will be collected and used for batching.                                   |
| `default_value`   | `nil`                                                                | Value returned by default after batching.                                             |
| `cache`           | `true`                                                               | Set `false` to disable caching between the same executions.                           |
| `replace_methods` | `true`                                                               | Set `false` to use `#method_missing` instead of replacing the methods after batching. |
| `key`             | `nil`                                                                | Pass custom key to uniquely identify the batch block.                                 |
| `items`           | -                                                                    | List of collected items for batching.                                                 |
| `loader`          | -                                                                    | Lambda which should be called to load values loaded in batch.                         |
| `args`            | `{default_value: nil, cache: true, replace_methods: true, key: nil}` | Arguments passed to the `batch` method.                                               |

## Implementation details

See the [slides](https://speakerdeck.com/exaspark/batching-a-powerful-way-to-solve-n-plus-1-queries) [37-42].

## Development

After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

## Related gems

These gems are built by using `BatchLoader`:

* [decidim-core](https://github.com/decidim/decidim/) – participatory democracy framework made with Ruby on Rails.
* [ams_lazy_relationships](https://github.com/Bajena/ams_lazy_relationships/) – ActiveModel Serializers add-on for eliminating N+1 queries.
* [batch-loader-active-record](https://github.com/mathieul/batch-loader-active-record/) – ActiveRecord lazy association generator to avoid N+1 DB queries.

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/exAspArk/batch-loader. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.

## Alternatives

There are some other Ruby implementations for batching such as:

* [shopify/graphql-batch](https://github.com/shopify/graphql-batch)
* [sheerun/dataloader](https://github.com/sheerun/dataloader)

However, `batch-loader` has some differences:

* It is implemented for general usage and can be used not only with GraphQL. In fact, we use it for RESTful APIs and GraphQL on production at the same time.
* It doesn't try to mimic implementations in other programming languages which have an asynchronous nature. So, it doesn't load extra dependencies to bring such primitives as Promises, which are not very popular in Ruby community.
Instead, it uses the idea of lazy objects, which are included in the [Ruby standard library](https://ruby-doc.org/core-2.4.1/Enumerable.html#method-i-lazy). These lazy objects allow one to return the necessary data at the end when it's necessary.
* It doesn't force you to share batching through variables or custom defined classes, just pass a block to the `batch` method.
* It doesn't require to return an array of the loaded objects in the same order as the passed items. I find it difficult to satisfy these constraints: to sort the loaded objects and add `nil` values for the missing ones. Instead, it provides the `loader` lambda which simply maps an item to the loaded object.
* It doesn't depend on any other external dependencies. For example, no need to load huge external libraries for thread-safety, the gem is thread-safe out of the box.

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).

## Code of Conduct

Everyone interacting in the Batch::Loader project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/exAspArk/batch-loader/blob/master/CODE_OF_CONDUCT.md).