# Hashdiff [![Build Status](https://github.com/liufengyun/hashdiff/workflows/ci/badge.svg)](https://github.com/liufengyun/hashdiff/actions?query=workflow%3Aci) [![Gem Version](https://badge.fury.io/rb/hashdiff.svg)](http://badge.fury.io/rb/hashdiff)

Hashdiff is a Ruby library to compute the smallest difference between two hashes.

It also supports comparing two arrays.

Hashdiff does not monkey-patch any existing class. All features are contained inside the `Hashdiff` module.

**Docs**: [Documentation](http://rubydoc.info/gems/hashdiff)


__WARNING__: Don't use the library for comparing large arrays, say ~10K (see #49).

## Why Hashdiff?

Given two Hashes A and B, sometimes you face the question: what's the smallest modification that can be made to change A into B?

An algorithm that answers this question has to do the following:

* Generate a list of additions, deletions and changes, so that `A + ChangeSet = B` and `B - ChangeSet = A`.
* Compute recursively -- Arrays and Hashes may be nested arbitrarily in A or B.
* Compute the smallest change -- it should recognize similar child Hashes or child Arrays between A and B.
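
The first requirement can be illustrated with a minimal sketch for a flat (non-nested) Hash. The helpers `apply_changes` and `revert_changes` are hypothetical names introduced here for illustration; they are not part of Hashdiff, whose own `patch!` and `unpatch!` also handle nested structures.

```ruby
# Sketch only: apply or revert a change set on a flat Hash whose keys are
# the path strings. Hypothetical helpers, not Hashdiff's implementation.
def apply_changes(hash, changes)
  result = hash.dup
  changes.each do |op, key, *values|
    case op
    when '-' then result.delete(key)
    when '+' then result[key] = values[0]
    when '~' then result[key] = values[1] # values are [old, new]: take the new one
    end
  end
  result
end

def revert_changes(hash, changes)
  result = hash.dup
  # Undo the operations in reverse order.
  changes.reverse_each do |op, key, *values|
    case op
    when '-' then result[key] = values[0] # restore the deleted value
    when '+' then result.delete(key)
    when '~' then result[key] = values[0] # restore the old value
    end
  end
  result
end

a = { 'x' => 3, 'z' => 45 }
b = { 'y' => 3, 'z' => 30 }
changes = [['-', 'x', 3], ['~', 'z', 45, 30], ['+', 'y', 3]]

puts apply_changes(a, changes) == b  # true: A + ChangeSet = B
puts revert_changes(b, changes) == a # true: B - ChangeSet = A
```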

Hashdiff answers the question above using an opinionated approach:

* A Hash can be represented as a list of (dot-syntax-path, value) pairs. For example, `{a:[{c:2}]}` can be represented as `[["a[0].c", 2]]`.
* The change set can be represented using the dot-syntax representation. For example, `[['-', 'b.x', 3], ['~', 'b.z', 45, 30], ['+', 'b.y', 3]]`.
* It compares Arrays using the [LCS (longest common subsequence)](http://en.wikipedia.org/wiki/Longest_common_subsequence_problem) algorithm.
* It recognizes similar Hashes in an Array using a similarity value (0 < similarity <= 1).
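
The dot-syntax representation from the first point can be sketched in a few lines of plain Ruby. `flatten_paths` is a hypothetical helper written for this illustration, not a Hashdiff method; it assumes `.` as the delimiter and `[index]` for array positions, as in the examples in this README.

```ruby
# Sketch only: flatten nested Hashes/Arrays into [path, value] pairs.
# `flatten_paths` is a hypothetical helper, not part of Hashdiff.
def flatten_paths(obj, prefix = '', delimiter = '.')
  case obj
  when Hash
    obj.flat_map do |key, value|
      path = prefix.empty? ? key.to_s : "#{prefix}#{delimiter}#{key}"
      flatten_paths(value, path, delimiter)
    end
  when Array
    obj.each_with_index.flat_map do |value, index|
      flatten_paths(value, "#{prefix}[#{index}]", delimiter)
    end
  else
    # Leaf value: emit a single [path, value] pair.
    [[prefix, obj]]
  end
end

p flatten_paths({ a: [{ c: 2 }] })
# => [["a[0].c", 2]]
p flatten_paths({ b: { x: 3, z: 45 } })
# => [["b.x", 3], ["b.z", 45]]
```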

## Usage

To use the gem, add the following to your Gemfile:

```ruby
gem 'hashdiff'
```

## Quick Start

### Diff

Two simple hashes:

```ruby
a = {a:3, b:2}
b = {}

diff = Hashdiff.diff(a, b)
diff.should == [['-', 'a', 3], ['-', 'b', 2]]
```

More complex hashes:

```ruby
a = {a:{x:2, y:3, z:4}, b:{x:3, z:45}}
b = {a:{y:3}, b:{y:3, z:30}}

diff = Hashdiff.diff(a, b)
diff.should == [['-', 'a.x', 2], ['-', 'a.z', 4], ['-', 'b.x', 3], ['~', 'b.z', 45, 30], ['+', 'b.y', 3]]
```

Arrays in hashes:

```ruby
a = {a:[{x:2, y:3, z:4}, {x:11, y:22, z:33}], b:{x:3, z:45}}
b = {a:[{y:3}, {x:11, z:33}], b:{y:22}}

diff = Hashdiff.best_diff(a, b)
diff.should == [['-', 'a[0].x', 2], ['-', 'a[0].z', 4], ['-', 'a[1].y', 22], ['-', 'b.x', 3], ['-', 'b.z', 45], ['+', 'b.y', 22]]
```

### Patch

Patch example:

```ruby
a = {'a' => 3}
b = {'a' => {'a1' => 1, 'a2' => 2}}

diff = Hashdiff.diff(a, b)
Hashdiff.patch!(a, diff).should == b
```

Unpatch example:

```ruby
a = [{'a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5}, {'x' => 5, 'y' => 6, 'z' => 3}, 1]
b = [1, {'a' => 1, 'b' => 2, 'c' => 3, 'e' => 5}]

diff = Hashdiff.diff(a, b) # diffing two arrays is OK
Hashdiff.unpatch!(b, diff).should == a
```

### Options

The following options are available: `:delimiter`, `:similarity`, `:strict`, `:ignore_keys`,
`:indifferent`, `:numeric_tolerance`, `:strip`, `:case_insensitive`, `:array_path`,
`:use_lcs`, and `:preserve_key_order`.

#### `:delimiter`

You can specify `:delimiter` to be something other than the default dot. For example:

```ruby
a = {a:{x:2, y:3, z:4}, b:{x:3, z:45}}
b = {a:{y:3}, b:{y:3, z:30}}

diff = Hashdiff.diff(a, b, delimiter: '\t')
diff.should == [['-', 'a\tx', 2], ['-', 'a\tz', 4], ['-', 'b\tx', 3], ['~', 'b\tz', 45, 30], ['+', 'b\ty', 3]]
```

#### `:similarity`

In cases where you have similar hash objects in arrays, you can pass a custom value for `:similarity` instead of the default `0.8`.  This is interpreted as a ratio of similarity (default is 80% similar, whereas `:similarity => 0.5` would look for at least a 50% similarity).

#### `:strict`

The `:strict` option, which defaults to `true`, specifies whether numeric types are compared on type as well as value.  By default, an Integer will never be equal to a Float (e.g. 4 != 4.0).  Setting `:strict` to false makes the comparison looser (e.g. 4 == 4.0).
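
The distinction rests on the fact that Ruby's `==` already treats `4` and `4.0` as equal, so a strict comparison also has to check the class. The following is a minimal sketch of that idea; `same_value?` is a hypothetical helper and is not how Hashdiff is implemented internally.

```ruby
# Sketch only: a hypothetical helper illustrating the idea behind :strict.
def same_value?(left, right, strict: true)
  if left.is_a?(Numeric) && right.is_a?(Numeric)
    # In strict mode the classes must match too, so Integer 4 != Float 4.0.
    return left.class == right.class && left == right if strict

    return left == right
  end
  left == right
end

puts same_value?(4, 4.0)                # false
puts same_value?(4, 4.0, strict: false) # true
puts same_value?(4, 4)                  # true
```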

#### `:ignore_keys`

The `:ignore_keys` option allows you to specify one or more keys to ignore, which defaults to `[]` (none). Ignored keys are ignored at all levels in both hashes. For example:

```ruby
a = { a: 4, g: 0, b: { a: 5, c: 6, e: 1 }       }
b = {             b: { a: 7, c: 3, f: 1 }, d: 8 }
diff = Hashdiff.diff(a, b, ignore_keys: %i[a f])
diff.should == [['-', 'g', 0], ['-', 'b.e', 1], ['~', 'b.c', 6, 3], ['+', 'd', 8]]
```

To ignore keys only at a particular level, use a [custom comparison method](https://github.com/liufengyun/hashdiff#specifying-a-custom-comparison-method) instead. For example, to ignore `a` and `f` only at the 2nd level of both hashes:

```ruby
a = { a: 4, g: 0, b: { a: 5, c: 6, e: 1 }       }
b = {             b: { a: 7, c: 3, f: 1 }, d: 8 }
diff = Hashdiff.diff(a, b) do |path, _e, _a|
  arr = path.split('.')
  true if %w[a f].include?(arr.last) && arr.size == 2 # note '.' is the default delimiter
end
diff.should == [['-', 'a', 4], ['-', 'g', 0], ['-', 'b.e', 1], ['~', 'b.c', 6, 3], ['+', 'd', 8]]
```

#### `:indifferent`

The `:indifferent` option, which defaults to `false`, specifies whether to treat hash keys indifferently. Setting `:indifferent` to true has the effect of ignoring differences between symbol and string keys (i.e. `{a: 1}` ~= `{'a' => 1}`).
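
One way to picture indifferent key handling is to normalize Symbol keys to Strings before comparing. The sketch below does that with a hypothetical `stringify_keys` helper; it only illustrates the concept and makes no claim about how Hashdiff handles the option internally.

```ruby
# Sketch only: `stringify_keys` is a hypothetical helper, not Hashdiff's code.
# It converts Symbol keys to Strings at every level so that keys compare equal.
def stringify_keys(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), result|
      result[key.is_a?(Symbol) ? key.to_s : key] = stringify_keys(value)
    end
  when Array
    obj.map { |element| stringify_keys(element) }
  else
    obj
  end
end

left  = { a: 1, b: [{ c: 2 }] }
right = { 'a' => 1, 'b' => [{ 'c' => 2 }] }

puts left == right                                 # false
puts stringify_keys(left) == stringify_keys(right) # true
```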

#### `:numeric_tolerance`

The `:numeric_tolerance` option allows for a small numeric tolerance.

```ruby
a = {x:5, y:3.75, z:7}
b = {x:6, y:3.76, z:7}

diff = Hashdiff.diff(a, b, numeric_tolerance: 0.1)
diff.should == [["~", "x", 5, 6]]
```

#### `:strip`

The `:strip` option strips all strings before comparing.

```ruby
a = {x:5, s:'foo '}
b = {x:6, s:'foo'}

diff = Hashdiff.diff(a, b, numeric_tolerance: 0.1, strip: true)
diff.should == [["~", "x", 5, 6]]
```

#### `:case_insensitive`

The `:case_insensitive` option makes string comparisons ignore case.

```ruby
a = {x:5, s:'FooBar'}
b = {x:6, s:'foobar'}

diff = Hashdiff.diff(a, b, numeric_tolerance: 0.1, case_insensitive: true)
diff.should == [["~", "x", 5, 6]]
```

#### `:array_path`

The `:array_path` option represents the path of the diff as an array rather than
a string. This can be used to show differences between hash key types and
is useful for `patch!` when used on hashes without string keys.

```ruby
a = {x:5}
b = {'x'=>6}

diff = Hashdiff.diff(a, b, array_path: true)
diff.should == [['-', [:x], 5], ['+', ['x'], 6]]
```

For cases where there are arrays in paths, their index will be added to the path.

```ruby
a = {x:[0,1]}
b = {x:[0,2]}

diff = Hashdiff.diff(a, b, array_path: true)
diff.should == [["-", [:x, 1], 1], ["+", [:x, 1], 2]]
```

This shouldn't cause problems if you are comparing an array with a hash:

```ruby
a = {x:{0=>1}}
b = {x:[1]}

diff = Hashdiff.diff(a, b, array_path: true)
diff.should == [["~", [:x], {0=>1}, [1]]]
```

#### `:use_lcs`

The `:use_lcs` option specifies whether a
[longest common subsequence](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem)
(LCS) algorithm is used to determine differences in arrays. It defaults to
`true` but can be set to `false` for significantly faster array comparisons
(O(n) complexity rather than O(n<sup>2</sup>) for LCS).

When `:use_lcs` is false, array comparisons tend to report changes at indexes,
where the LCS comparison would report additions and subtractions.

Note that the `:similarity` option currently has no effect when `:use_lcs` is false.

```ruby
a = {x: [0, 1, 2]}
b = {x: [0, 2, 2, 3]}

diff = Hashdiff.diff(a, b, use_lcs: false)
diff.should == [["~", "x[1]", 1, 2], ["+", "x[3]", 3]]
```
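
To make the trade-off concrete, here is a minimal plain-Ruby sketch of both strategies. `lcs` and `positional_diff` are hypothetical helpers written for this illustration; they are not Hashdiff's internals, and `positional_diff` is not guaranteed to match the library's output for every input.

```ruby
# Sketch only: hypothetical helpers contrasting the two strategies.

# Longest common subsequence via the classic dynamic-programming table,
# which costs O(n * m) time for arrays of sizes n and m.
def lcs(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  a.each_index do |i|
    b.each_index do |j|
      table[i + 1][j + 1] =
        if a[i] == b[j]
          table[i][j] + 1
        else
          [table[i][j + 1], table[i + 1][j]].max
        end
    end
  end

  # Walk back through the table to recover one LCS.
  result = []
  i = a.size
  j = b.size
  while i > 0 && j > 0
    if a[i - 1] == b[j - 1]
      result.unshift(a[i - 1])
      i -= 1
      j -= 1
    elsif table[i - 1][j] >= table[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  result
end

# Index-by-index comparison: a single pass over the longer array.
def positional_diff(a, b, path)
  changes = []
  [a.size, b.size].max.times do |i|
    if i >= a.size
      changes << ['+', "#{path}[#{i}]", b[i]]
    elsif i >= b.size
      changes << ['-', "#{path}[#{i}]", a[i]]
    elsif a[i] != b[i]
      changes << ['~', "#{path}[#{i}]", a[i], b[i]]
    end
  end
  changes
end

p lcs([0, 1, 2], [0, 2, 2, 3])
# => [0, 2]
p positional_diff([0, 1, 2], [0, 2, 2, 3], 'x')
# => [["~", "x[1]", 1, 2], ["+", "x[3]", 3]]
```

The LCS variant keeps the common elements `[0, 2]` in place and would describe everything else as additions or removals, while the positional variant only ever compares elements that share an index.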

#### `:preserve_key_order`

By default, the change set is ordered by operation type: deletions (`-`) first, then updates (`~`), and finally additions (`+`).
Within each operation group, keys are sorted alphabetically:

```ruby
a = {d: 1, c: 1,       a: 1}
b = {d: 2,       b: 2, a: 2}

diff = Hashdiff.diff(a, b)
diff.should == [["-", "c", 1], ["~", "a", 1, 2], ["~", "d", 1, 2], ["+", "b", 2]]
```

Setting `:preserve_key_order` to true processes keys in the order they appear in the first hash.
Keys that only exist in the second hash are appended in their original order:

```ruby
a = {d: 1, c: 1,       a: 1}
b = {d: 2,       b: 2, a: 2}

diff = Hashdiff.diff(a, b, preserve_key_order: true)
diff.should == [["~", "d", 1, 2], ["-", "c", 1], ["~", "a", 1, 2], ["+", "b", 2]]
```

#### Specifying a custom comparison method

It's possible to specify how the values of a key should be compared.

```ruby
a = {a:'car', b:'boat', c:'plane'}
b = {a:'bus', b:'truck', c:' plan'}

diff = Hashdiff.diff(a, b) do |path, obj1, obj2|
  case path
  when /a|b|c/
    obj1.length == obj2.length
  end
end

diff.should == [['~', 'b', 'boat', 'truck']]
```

The params yielded to the comparison block are `|path, obj1, obj2|`, where `path` is the key (or delimited compound key) of the value being compared. When comparing elements in an array, the path has the format `array[*]`. For example:

```ruby
a = {a:'car', b:['boat', 'plane'] }
b = {a:'bus', b:['truck', ' plan'] }

diff = Hashdiff.diff(a, b) do |path, obj1, obj2|
  case path
  when 'b[*]'
    obj1.length == obj2.length
  end
end

diff.should == [["~", "a", "car", "bus"], ["~", "b[1]", "plane", " plan"], ["-", "b[0]", "boat"], ["+", "b[0]", "truck"]]
```

When a comparison block is given, it takes priority over other specified options. If the block returns a value other than `true` or `false`, the two values are compared using the other specified options.

When used in conjunction with the `array_path` option, the path passed in as an argument will be an array. When determining the ordering of an array a key of `"*"` will be used in place of the `key[*]` field. It is possible, if you have hashes with integer or `"*"` keys, to have problems distinguishing between arrays and hashes - although this shouldn't be an issue unless your data is very difficult to predict and/or your custom rules are very specific.

#### Sorting arrays before comparison

An order difference alone between two arrays can create too many diffs to be useful. Consider sorting them prior to diffing.

```ruby
a = {a:'car', b:['boat', 'plane'] }
b = {a:'car', b:['plane', 'boat'] }

Hashdiff.diff(a, b).should == [["+", "b[0]", "plane"], ["-", "b[2]", "plane"]]

b[:b].sort!

Hashdiff.diff(a, b).should == []
```
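
When the arrays are nested at several levels, sorting each one by hand becomes tedious. A recursive helper along the following lines could normalize both inputs before diffing. `deep_sort` is a hypothetical name, not part of Hashdiff, and the sketch assumes that ordering elements by their `inspect` string is acceptable for your data (note that this orders numbers lexicographically, e.g. `10` before `2`).

```ruby
# Sketch only: `deep_sort` is a hypothetical helper, not part of Hashdiff.
# It returns a copy in which every nested Array is sorted, using `inspect`
# as the sort key so that mixed-type arrays do not raise on comparison.
def deep_sort(obj)
  case obj
  when Hash
    obj.transform_values { |value| deep_sort(value) }
  when Array
    obj.map { |element| deep_sort(element) }.sort_by(&:inspect)
  else
    obj
  end
end

a = { a: 'car', b: ['boat', 'plane'] }
b = { a: 'car', b: ['plane', 'boat'] }

puts a == b                       # false
puts deep_sort(a) == deep_sort(b) # true
```

Only use this when element order carries no meaning; otherwise the sort hides real differences.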

## Maintainers

- Krzysztof Rybka ([@krzysiek1507](https://github.com/krzysiek1507))
- Fengyun Liu ([@liufengyun](https://github.com/liufengyun))

## License

Hashdiff is distributed under the MIT-LICENSE.