# Beaneater
[![Build Status](https://secure.travis-ci.org/beanstalkd/beaneater.png)](http://travis-ci.org/beanstalkd/beaneater)
[![Coverage Status](https://coveralls.io/repos/beanstalkd/beaneater/badge.png?branch=master)](https://coveralls.io/r/beanstalkd/beaneater)

Beaneater is the best way to interact with beanstalkd from within Ruby.
[Beanstalkd](http://kr.github.com/beanstalkd/) is a simple, fast work queue. Its interface is generic, but was
originally designed for reducing the latency of page views in high-volume web applications by
running time-consuming tasks asynchronously. Read the [yardocs](http://rdoc.info/github/beanstalkd/beaneater) and/or the
[beanstalk protocol](https://github.com/kr/beanstalkd/blob/master/doc/protocol.md) for more details.

**Important Note**: This README is **for branch 1.0.x, which is under development**. Please switch to the latest `0.x` branch for the stable version.

## Why Beanstalk?

Ilya Grigorik has an excellent blog post,
[Scalable Work Queues with Beanstalk](http://www.igvita.com/2010/05/20/scalable-work-queues-with-beanstalk/) and
Adam Wiggins posted [an excellent comparison](http://adam.heroku.com/past/2010/4/24/beanstalk_a_simple_and_fast_queueing_backend/).

You will find that **beanstalkd** is an underrated but incredibly
powerful project that is extremely well-suited as a job or messaging queue,
and significantly better suited for this task than Redis or a traditional RDBMS. Beanstalk is a simple
and fast work queue service rolled into a single binary; it is the memcached of work queues.
Originally built to power the backend for the 'Causes' Facebook app,
it is a mature and production-ready open source project.
[PostRank](http://www.postrank.com) has used beanstalk to reliably process millions of jobs a day.

A single instance of Beanstalk is perfectly capable of handling thousands of jobs a second (or more, depending on your job size)
because it is an in-memory, event-driven system. Powered by libevent under the hood,
it requires zero setup (launch and forget, à la memcached) and offers optional log-based persistence,
an easily parsed ASCII protocol, and a rich set of tools for job management
that go well beyond a simple FIFO work queue.

Beanstalkd supports the following features out of the box:

| Feature | Description                     |
| ------- | ------------------------------- |
| **Easy Setup**       | Quick to install, no files to edit, no settings to tweak. |
| **Speed**            | Process thousands of jobs per second without breaking a sweat. |
| **Client Support**   | Client libraries exist for over 21 languages including Python, Ruby, and Go. |
| **Tubes**            | Supports multiple work queues created automatically on demand. |
| **Reliable**         | Beanstalk’s reserve, work, delete cycle ensures reliable processing. |
| **Scheduling**       | Delay enqueuing jobs by a specified interval to be processed later. |
| **Priorities**       | Important jobs go to the head of the queue and get processed sooner. |
| **Persistence**      | Jobs are stored in memory for speed, but logged to disk for safe keeping. |
| **Scalability**      | Client-side federation provides effortless horizontal scalability. |
| **Error Handling**   | Bury any job which causes an error for later debugging and inspection. |
| **Simple Debugging** | Talk directly to the beanstalkd server over telnet to get a handle on your app. |
| **Efficiency**       | Each beanstalkd process can handle tens of thousands of open connections. |
| **Memory Usage**     | Use the built-in `ulimit` OS feature to cap beanstalkd's memory consumption. |

Keep in mind that these features are supported out of the box by beanstalk and require no special Ruby-specific logic.
In the end, **beanstalk is the ideal job queue** and has the added benefit of being easy to set up and configure.

## Installation

Install beanstalkd:

Mac OS

```
brew install beanstalkd
beanstalkd -p 11300
```

Ubuntu

```
apt-get install beanstalkd
beanstalkd -p 11300
```

Install beaneater as a gem:

```
gem install beaneater
```

or add this to your Gemfile:

```ruby
# Gemfile
gem 'beaneater'
```

and run `bundle install` to install the dependency.

## Breaking Changes since 1.0!

Starting in 1.0, we removed the concept of the `Beaneater::Pool`, which introduced considerable complexity into this gem.

* Support for `Beaneater::Pool` has been dropped as of version 1.0.0.
The feature may be supported again in a later version as a separate module
or through a separate gem. If you want to use the pool feature, switch to
the stable 0.x branches instead.
* The `Jobs#find_all` method has been removed, since it is no longer necessary.

To manage a pool of beanstalkd instances, we'd prefer to leave the handling to the developer or other higher-level libraries.

## Quick Overview:

The concise summary of how to use beaneater:

```ruby
require 'beaneater'
require 'json'

# Connect to beanstalkd
@beanstalk = Beaneater.new('localhost:11300')
# Enqueue jobs to tube
@tube = @beanstalk.tubes["my-tube"]
@tube.put '{ "key" : "foo" }', :pri => 5
@tube.put '{ "key" : "bar" }', :delay => 3
# Process jobs from tube while any are ready
while @tube.peek(:ready)
  job = @tube.reserve
  puts "job value is #{JSON.parse(job.body)["key"]}!"
  job.delete
end
# Close the connection
@beanstalk.close
```

For a more detailed rundown, check out the __Usage__ section below.

## Usage

### Configuration

To set up advanced options for beaneater, pass configuration options using:

```ruby
Beaneater.configure do |config|
  # config.default_put_delay   = 0
  # config.default_put_pri     = 65536
  # config.default_put_ttr     = 120
  # config.job_parser          = lambda { |body| body }
  # config.job_serializer      = lambda { |body| body }
  # config.beanstalkd_url      = 'localhost:11300'
end
```

The above options are all defaults, so only include a configuration block if you need to make changes.

### Connection

To interact with a beanstalk queue, first establish a connection by providing an address:

```ruby
@beanstalk = Beaneater.new('10.0.1.5:11300')

# Or if ENV['BEANSTALKD_URL'] == '127.0.0.1:11300'
@beanstalk = Beaneater.new
@beanstalk.connection # => 127.0.0.1:11300
```

You can conversely close and dispose of a connection at any time with:

```ruby
@beanstalk.close
```

### Tubes

Beanstalkd has one or more tubes which can contain any number of jobs.
Jobs can be inserted (put) into the used tube and pulled out (reserved) from watched tubes.
Each tube consists of a _ready_, _delayed_, and _buried_ queue for jobs.

When a client connects, its watch list is initially just the tube named `default`.
Tube names are at most 200 bytes. If a tube does not exist when a client first uses or watches it, it is created automatically.

To interact with a tube, first `find` the tube:

```ruby
@tube = @beanstalk.tubes.find "some-tube-here"
# => <Tube name='some-tube-here'>
```

To reserve jobs from beanstalk, you will need to 'watch' certain tubes:

```ruby
# Watch only the tubes listed below (!)
@beanstalk.tubes.watch!('some-tube')
# Append tubes to existing set of watched tubes
@beanstalk.tubes.watch('another-tube')
# You can also ignore tubes that have been watched previously
@beanstalk.tubes.ignore('some-tube')
```

You can easily get a list of all, used or watched tubes:

```ruby
# The list-tubes command returns a list of all existing tubes
@beanstalk.tubes.all
# => [<Tube name='foo'>, <Tube name='bar'>]

# Returns the tube currently being used by the client (for insertion)
@beanstalk.tubes.used
# => <Tube name='bar'>

# Returns a list of tubes currently being watched by the client (for consumption)
@beanstalk.tubes.watched
# => [<Tube name='foo'>]
```

You can also temporarily 'pause' the execution of a tube by specifying the time:

```ruby
tube = @beanstalk.tubes["some-tube-here"]
tube.pause(3) # pauses tube for 3 seconds
```

or even clear the tube of all jobs:

```ruby
tube = @beanstalk.tubes["some-tube-here"]
tube.clear # tube will now be empty
```

In summary, each beanstalk client manages two separate concerns: which tube newly created jobs are put into,
and which tube(s) jobs are reserved from. Accordingly, there are two separate sets of functions for these concerns:

  * **use** and **using** affect where 'put' places jobs
  * **watch** and **watching** control where reserve takes jobs from

Note that these concerns are fully orthogonal: for example, when you 'use' a tube, it is not automatically 'watched'.
Neither does 'watching' a tube affect the tube you are 'using'.
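
The separation can be illustrated with a small in-memory model. This is only a sketch: `ClientTubeState` is a hypothetical class that never talks to beanstalkd, and the tube names are made up. It mirrors the rule above that changing the used tube and changing the watch list do not affect each other.

```ruby
require 'set'

# Hypothetical model of one client's tube state (not part of Beaneater).
class ClientTubeState
  attr_reader :used, :watched

  def initialize
    @used = 'default'               # tube that `put` writes to
    @watched = Set.new(['default']) # tubes that `reserve` reads from
  end

  # Changing the used tube leaves the watch list untouched.
  def use(tube)
    @used = tube
  end

  # Adding a watched tube leaves the used tube untouched.
  def watch(tube)
    @watched << tube
  end
end

state = ClientTubeState.new
state.use('emails')
state.watch('reports')

puts state.used                 # emails
puts state.watched.to_a.inspect # ["default", "reports"]
```

Here jobs would be put into `emails` while being reserved from `default` and `reports`; `emails` is never watched unless it is added explicitly.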

### Jobs

A job in beanstalk gets inserted by a client and includes the 'body' and job metadata.
Each job is enqueued into a tube and later reserved and processed. Here is a picture of the typical job lifecycle:

```
   put            reserve               delete
  -----> [READY] ---------> [RESERVED] --------> *poof*
```

A job at any given time is in one of four states: **ready**, **reserved**, **delayed**, or **buried**:

| State    | Description                     |
| -------- | ------------------------------- |
| ready    | waiting to be `reserved` and processed after being `put` onto a tube. |
| reserved | locked by a worker for processing until it is deleted, released, or buried, or until its `ttr` expires. |
| delayed  | waiting to become `ready` after the specified delay. |
| buried   | waiting to be kicked, usually after the job fails to process. |

In addition, there are several actions that can be performed on a given job:

 * **reserve** which locks a job from the ready queue for processing.
 * **touch** which extends the time before a job is autoreleased back to ready.
 * **release** which places a reserved job back onto the ready queue.
 * **delete** which removes a job from beanstalk.
 * **bury** which places a reserved job into the buried state.
 * **kick** which places a buried job from the buried queue back to ready.
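
The actions above can be read as a small state machine. The following is a simplified sketch, not Beaneater code: `TRANSITIONS` and `next_state` are hypothetical names, `release` is modelled without a delay, and `delete` is only modelled for reserved jobs even though beanstalkd also accepts it in other states.

```ruby
# Allowed transitions, keyed by [current_state, action].
TRANSITIONS = {
  [:ready,    :reserve] => :reserved,
  [:reserved, :touch]   => :reserved, # stays reserved, with more time to run
  [:reserved, :release] => :ready,
  [:reserved, :delete]  => :deleted,
  [:reserved, :bury]    => :buried,
  [:buried,   :kick]    => :ready
}.freeze

# Returns the next state, or raises if the action is not valid in this state.
def next_state(state, action)
  TRANSITIONS.fetch([state, action]) do
    raise ArgumentError, "cannot #{action} a #{state} job"
  end
end

state = :ready
%i[reserve bury kick reserve delete].each do |action|
  state = next_state(state, action)
  puts "#{action} -> #{state}"
end
```

The walk-through follows a job that fails once, is buried, gets kicked back to ready, and is then processed and deleted.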

You can insert a job onto a beanstalk tube using the `put` command:

```ruby
@tube.put "job-data-here"
```

Beanstalkd can only store strings as job bodies, but you can easily encode your data into a string:

```ruby
@tube.put({:foo => 'bar'}.to_json)
```

Moreover, you can provide a default job serializer by setting the corresponding configuration
option (`job_serializer`), which applies the encoding to each job body that
is sent using the `put` command. For example, to encode a Ruby object to JSON format:

```ruby
Beaneater.configure do |config|
  config.job_serializer = lambda { |body| JSON.dump(body) }
end
```
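
A serializer is usually paired with a matching `job_parser` (listed in the Configuration section) so that workers read back what producers wrote. The sketch below exercises such a pair as plain lambdas, without touching Beaneater or a server; the payload keys are made up for illustration.

```ruby
require 'json'

# Same lambda shape as the `job_serializer` and `job_parser` options:
# one argument in, one value out.
serializer = lambda { |body| JSON.dump(body) }
parser     = lambda { |body| JSON.parse(body) }

payload = { 'task' => 'resize', 'width' => 640 } # hypothetical job data

wire_body = serializer.call(payload) # the string that would be stored
restored  = parser.call(wire_body)   # what a worker would get back

puts wire_body
puts restored == payload
```

Note that `JSON.parse` returns string keys, so a payload written with symbol keys would come back with strings unless the parser is adjusted.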

Each job has various metadata associated such as `priority`, `delay`, and `ttr` which can be
specified as part of the `put` command:

```ruby
# defaults are priority 65536, delay of 0 and ttr of 120 seconds
@tube.put "job-data-here", :pri => 1000, :delay => 50, :ttr => 200
```

The `priority` argument is an integer < 2**32. Jobs with a smaller priority take precedence over jobs with larger priorities.
The `delay` argument is an integer number of seconds to wait before putting the job in the ready queue.
The `ttr` argument is the time to run: an integer number of seconds to allow a worker to run this job.
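
To make the priority rule concrete, here is a small sketch of how ready jobs would be ordered. `QueuedJob` and `valid_priority?` are hypothetical names used only for illustration, and the lower bound of 0 assumes priorities are unsigned integers.

```ruby
# Hypothetical stand-in for a queued job; only the fields discussed above.
QueuedJob = Struct.new(:body, :pri)

# Assumed range: 0 (most urgent) up to 2**32 - 1.
def valid_priority?(pri)
  pri.is_a?(Integer) && pri >= 0 && pri < 2**32
end

jobs = [
  QueuedJob.new('send newsletter', 65_536),
  QueuedJob.new('reset password', 0),
  QueuedJob.new('rebuild index', 1_000)
]

# Among ready jobs, the smallest priority value is served first.
order = jobs.select { |job| valid_priority?(job.pri) }
            .sort_by(&:pri)
            .map(&:body)

puts order.inspect
puts valid_priority?(2**32)
```

The job with priority 0 comes out first and the one with 65536 last; `2**32` itself falls outside the valid range.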

### Processing Jobs (Manually)

In order to process jobs, the client should first specify the intended tubes to be watched. If not specified,
this will default to watching just the `default` tube.

```ruby
@beanstalk = Beaneater.new('10.0.1.5:11300')
@beanstalk.tubes.watch!('tube-name', 'other-tube')
```

Next you can use the `reserve` command which will return the first available job within the watched tubes:

```ruby
job = @beanstalk.tubes.reserve
# => <Beaneater::Job id=5 body="foo">
puts job.body
# prints 'foo'
print job.stats.state # => 'reserved'
```

By default, reserve will wait indefinitely for the next job. If you want to specify a timeout,
simply pass that in seconds into the command:

```ruby
job = @beanstalk.tubes.reserve(5) # wait 5 secs for a job, then return
# => <Beaneater::Job id=5 body="foo">
```

You can 'release' a reserved job back onto the ready queue to retry later:

```ruby
job = @beanstalk.tubes.reserve
# ...job fails with a transient error...
job.release :delay => 5
print job.stats.state # => 'delayed'
```

You can also 'delete' jobs that are finished:

```ruby
job = @beanstalk.tubes.reserve
job.touch # extends ttr for job
# ...process job...
job.delete
```

Beanstalk jobs can also be buried if they fail, rather than being deleted:

```ruby
job = @beanstalk.tubes.reserve
# ...job fails...
job.bury
print job.stats.state # => 'buried'
```

Burying a job means that the job is pulled out of the queue into a special 'holding' area for later inspection or reuse.
To reanimate this job later, you can 'kick' buried jobs back into being ready:

```ruby
@beanstalk.tubes['some-tube'].kick(3)
```

This kicks 3 buried jobs for 'some-tube' back into the 'ready' state. Jobs can also be
inspected using the 'peek' commands. To find and peek at a particular job based on the id:

```ruby
@beanstalk.jobs.find(123)
# => <Beaneater::Job id=123 body="foo">
```

or you can peek at jobs within a tube:

```ruby
@tube = @beanstalk.tubes.find('foo')
@tube.peek(:ready)
# => <Beaneater::Job id=123 body="ready">
@tube.peek(:buried)
# => <Beaneater::Job id=456 body="buried">
@tube.peek(:delayed)
# => <Beaneater::Job id=789 body="delayed">
```

When dealing with jobs there are a few other useful commands available:

```ruby
job = @beanstalk.tubes.reserve
print job.tube      # => "some-tube-name"
print job.reserved? # => true
print job.exists?   # => true
job.delete
print job.exists?   # => false
```

### Processing Jobs (Automatically)

Instead of using `watch` and `reserve`, you can also use the higher level `register` and `process` methods to
process jobs. First you can 'register' how to handle jobs from various tubes:

```ruby
@beanstalk.jobs.register('some-tube', :retry_on => [SomeError]) do |job|
  do_something(job)
end

@beanstalk.jobs.register('other-tube') do |job|
  do_something_else(job)
end
```

Once you have registered the handlers for known tubes, calling `process!` will begin a
loop processing jobs as defined by the registered processor blocks:

```ruby
@beanstalk.jobs.process!
```

Processing runs the following steps:

 1. Watch all registered tubes
 1. Reserve the next job
 1. Once job is reserved, invoke the registered handler based on the tube name
 1. If no exceptions occur, delete the job (success)
 1. If 'retry_on' exceptions occur, call 'release' (retry)
 1. If other exception occurs, call 'bury' (error)
 1. Repeat steps 2-6
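
The decision made for each reserved job can be sketched as follows. This is not the gem's implementation: `FakeJob`, `RetryableError`, and `settle` are hypothetical stand-ins so that the logic runs without a beanstalkd server.

```ruby
# Hypothetical stand-ins (not part of Beaneater).
class RetryableError < StandardError; end

class FakeJob
  attr_reader :tube, :body, :outcome

  def initialize(tube, body)
    @tube = tube
    @body = body
  end

  def delete;  @outcome = :deleted;  end
  def release; @outcome = :released; end
  def bury;    @outcome = :buried;   end
end

# Run the handler, then delete, release, or bury the job.
def settle(job, handler, retry_on: [])
  handler.call(job)
  job.delete                                  # no exception: success
rescue StandardError => e
  if retry_on.any? { |klass| e.is_a?(klass) }
    job.release                               # listed in retry_on: retry later
  else
    job.bury                                  # anything else: keep for inspection
  end
end

handlers = {
  'some-tube'  => { retry_on: [RetryableError],
                    block: ->(job) { raise RetryableError if job.body == 'flaky' } },
  'other-tube' => { retry_on: [],
                    block: ->(job) { raise 'boom' if job.body == 'bad' } }
}

jobs = [
  FakeJob.new('some-tube', 'ok'),
  FakeJob.new('some-tube', 'flaky'),
  FakeJob.new('other-tube', 'bad')
]

jobs.each do |job|
  entry = handlers.fetch(job.tube)            # pick the handler by tube name
  settle(job, entry[:block], retry_on: entry[:retry_on])
  puts "#{job.tube} #{job.body}: #{job.outcome}"
end
```

The three sample jobs end up deleted, released, and buried respectively, matching the success, retry, and error branches in the list.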

The `process!` command is ideally suited for a beanstalk job processing daemon. 
Even though `process!` is intended to be a long-running process, you can stop the loop at any time
by raising `AbortProcessingError` while processing is running.

### Handling Errors

While using Beaneater, certain errors may be encountered. Errors are encountered when
a command is sent to beanstalk and something unexpected happens. The most common errors
are listed below:

| Errors                      | Description   |
| --------------------        | ------------- |
| Beaneater::NotConnected     | Client connection to beanstalk cannot be established. |
| Beaneater::InvalidTubeName  | Specified tube name for use or watch is not valid.    |
| Beaneater::NotFoundError    | Specified job or tube could not be found.             |
| Beaneater::TimedOutError    | Job could not be reserved within time specified.      |
| Beaneater::JobNotReserved   | Job has not been reserved and action cannot be taken. |

There are other exceptions that are less common such as `OutOfMemoryError`, `DrainingError`,
`DeadlineSoonError`, `InternalError`, `BadFormatError`, `UnknownCommandError`,
`ExpectedCRLFError`, `JobTooBigError`, `NotIgnoredError`. Be sure to check the
[beanstalk protocol](https://github.com/kr/beanstalkd/blob/master/doc/protocol.md) for more information.
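
One possible way to deal with transient errors is a small retry wrapper. The sketch below is an assumption about how an application might do this, not something Beaneater provides: `with_retries` is a hypothetical helper and `FlakyConnection` is a stand-in for an error class from the table above, such as `Beaneater::NotConnected`.

```ruby
# Hypothetical helper: re-run the block when one of `errors` is raised,
# giving up (and re-raising) after `attempts` tries.
def with_retries(errors, attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *errors
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end

# Stand-in error so the sketch runs without the gem or a server.
class FlakyConnection < StandardError; end

result = with_retries([FlakyConnection], attempts: 3) do |attempt|
  raise FlakyConnection, 'connection refused' if attempt < 3
  "connected on attempt #{attempt}"
end

puts result
```

Whether retrying is appropriate depends on the error: a timed-out reserve is often just an empty queue, while an invalid tube name will fail the same way on every attempt.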


### Stats

Beanstalk has plenty of commands for introspecting the state of the queues and jobs. To get stats for
beanstalk overall:

```ruby
# Get overall stats about the job processing that has occurred
print @beanstalk.stats
# => #<Beaneater::StatStruct current_jobs_urgent=0, current_jobs_ready=0, current_jobs_reserved=0, current_jobs_delayed=0, current_jobs_buried=0, ...

print @beanstalk.stats.current_tubes
# => 1
```

For stats on a particular tube:

```ruby
# Get statistical information about the specified tube if it exists
print @beanstalk.tubes['some_tube_name'].stats
# => { 'current_jobs_ready': 0, 'current_jobs_reserved': 0, ... }
```

For stats on an individual job:

```ruby
# Get statistical information about the specified job if it exists
print @beanstalk.jobs[some_job_id].stats
# => {'age': 0, 'id': 2, 'state': 'reserved', 'tube': 'default', ... }
```

Be sure to check the [beanstalk protocol](https://github.com/kr/beanstalkd/blob/master/doc/protocol.md) for
more details about the stats commands.

## Resources

There are other resources helpful when learning about beanstalk:

 * [Beaneater Yardocs](http://rdoc.info/github/beanstalkd/beaneater)
 * [Beaneater on Rubygems](https://rubygems.org/gems/beaneater)
 * [Beanstalkd homepage](http://kr.github.com/beanstalkd/)
 * [beanstalk on github](https://github.com/kr/beanstalkd)
 * [beanstalk protocol](https://github.com/kr/beanstalkd/blob/master/doc/protocol.md)
 * [Backburner](https://github.com/nesquena/backburner) - Ruby job queue for Rails/Sinatra
 * [BeanCounter](https://github.com/gemeraldbeanstalk/bean_counter) - TestUnit/MiniTest assertions and RSpec matchers for testing code that relies on Beaneater

## Contributors

 - [Nico Taing](https://github.com/Nico-Taing) - Creator and co-maintainer
 - [Nathan Esquenazi](https://github.com/nesquena) - Contributor and co-maintainer
 - [Keith Rarick](https://github.com/kr) - Much code inspired and adapted from beanstalk-client
 - [Vidar Hokstad](https://github.com/vidarh) - Replaced telnet with correct TCP socket handling
 - [Andreas Loupasakis](https://github.com/alup) - Improve test coverage, improve job configuration