DeepDiff 5 Is Here!

Published on Jun 22, 2020 by Sep Dehpour

DeepDiff 5 is finally here!

Delta Object 

The Delta object is introduced.

A DeepDiff Delta is a directed delta: when applied to t1, it yields t2, where the delta is the difference between t1 and t2. Delta objects are like git commits, but for structured data. You can convert diff results into Delta objects, store the deltas, and later apply them to other objects.

Example:


>>> from deepdiff import DeepDiff, Delta
>>> t1 = [1, 2, [3, 5, 6]]
>>> t2 = [2, 3, [3, 6, 8]]

>>> diff = DeepDiff(t1, t2, ignore_order=True, report_repetition=True)
>>> diff
{'values_changed': {'root[0]': {'new_value': 3, 'old_value': 1}, 'root[2][1]': {'new_value': 8, 'old_value': 5}}}
>>> delta = Delta(diff)
>>> delta
<Delta: {'values_changed': {'root[0]': {'new_value': 3}, 'root[2][1]': {'new_value': 8}}}>

>>> t1 + delta == t2
True

Note that a delta can be applied to objects other than the ones it was made from:

>>> t3 = ["a", 2, [3, "b", "c"]]
>>> t3 + delta
[3, 2, [3, 8, 'c']]
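Why does this work on t3? The delta above only records new values at paths such as root[0] and root[2][1], so applying it amounts to walking each path and assigning the new value. Below is a minimal sketch of that idea for the values_changed part only. It is not DeepDiff's implementation: the `parse_path` and `apply_values_changed` helpers are hypothetical, handle only integer indices and single-quoted string keys, and do not model how DeepDiff treats ignore_order.

```python
import ast
import copy
import re

# One path element: an integer index such as [2] or a single-quoted key such as ['key5'].
_PATH_ELEMENT = re.compile(r"\[(\d+|'[^']*')\]")


def parse_path(path):
    """Split a path like "root[2][1]" into its keys/indices, e.g. [2, 1].

    Hypothetical helper: only integer indices and single-quoted string keys work.
    """
    if not path.startswith("root"):
        raise ValueError(f"unsupported path: {path!r}")
    rest = path[len("root"):]
    elements = _PATH_ELEMENT.findall(rest)
    # Reject paths that contain anything the pattern did not match.
    if "".join(f"[{element}]" for element in elements) != rest:
        raise ValueError(f"unsupported path: {path!r}")
    return [ast.literal_eval(element) for element in elements]


def apply_values_changed(obj, delta_dict):
    """Return a deep copy of obj with every 'values_changed' entry applied."""
    result = copy.deepcopy(obj)
    for path, change in delta_dict.get("values_changed", {}).items():
        keys = parse_path(path)
        if not keys:  # the path is just "root": replace the whole object
            result = change["new_value"]
            continue
        target = result
        for key in keys[:-1]:  # walk down to the parent container
            target = target[key]
        target[keys[-1]] = change["new_value"]
    return result


delta_dict = {'values_changed': {'root[0]': {'new_value': 3}, 'root[2][1]': {'new_value': 8}}}
t3 = ["a", 2, [3, "b", "c"]]
print(apply_values_changed(t3, delta_dict))  # [3, 2, [3, 8, 'c']]
```

Because only the new values are stored, the old contents of t3 ("a" and "b") never need to match what t1 held at those paths.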

And it comes with Numpy support:

>>> from deepdiff import DeepDiff, Delta
>>> import numpy as np
>>> t1 = np.array([1, 2, 3, 5])
>>> t2 = np.array([2, 2, 7, 5])
>>> diff = DeepDiff(t1, t2)
>>> delta = Delta(diff)
>>> delta + t1
array([2, 2, 7, 5])
>>> delta + t1 == t2
array([ True,  True,  True,  True])

There is much more to Delta, from serializing deltas for storage to other details.

Read more about the Delta object here.

Deep Distance 

The concept of Deep Distance is introduced.

Deep Distance is a measure of the distance between two objects. It is a floating-point number between 0 and 1. Conceptually, Deep Distance is inspired by the Levenshtein edit distance.

At its core, Deep Distance is the number of operations needed to convert one object into the other, divided by the sum of the sizes of the two objects, capped at 1. Note that unlike the Levenshtein distance, Deep Distance is based on the number of operations and NOT the “minimum” number of operations needed to convert one object into the other. The number is highly dependent on the granularity of the diff results, and that granularity is controlled by the parameters passed to DeepDiff.
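The definition above can be summarized as a formula. The notation is ours rather than DeepDiff's, and the post does not spell out exactly how the size of an object is measured:

$$
\mathrm{deep\_distance}(t_1, t_2) = \min\left(1,\ \frac{\text{number of operations reported for } t_1 \to t_2}{\operatorname{size}(t_1) + \operatorname{size}(t_2)}\right)
$$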

>>> from deepdiff import DeepDiff
>>> DeepDiff(10.0, 10.1, get_deep_distance=True)
{'values_changed': {'root': {'new_value': 10.1, 'old_value': 10.0}}, 'deep_distance': 0.0014925373134328302}
>>> DeepDiff(10.0, 100.1, get_deep_distance=True)
{'values_changed': {'root': {'new_value': 100.1, 'old_value': 10.0}}, 'deep_distance': 0.24550408719346048}
>>> DeepDiff(10.0, 1000.1, get_deep_distance=True)
{'values_changed': {'root': {'new_value': 1000.1, 'old_value': 10.0}}, 'deep_distance': 0.29405999405999406}
>>> DeepDiff([1], [1], get_deep_distance=True)
{}
>>> DeepDiff([1], [1, 2], get_deep_distance=True)
{'iterable_item_added': {'root[1]': 2}, 'deep_distance': 0.2}
>>> DeepDiff([1], [1, 2, 3], get_deep_distance=True)
{'iterable_item_added': {'root[1]': 2, 'root[2]': 3}, 'deep_distance': 0.3333333333333333}
>>> DeepDiff([[2, 1]], [[1, 2, 3]], ignore_order=True, get_deep_distance=True)
{'iterable_item_added': {'root[0][2]': 3}, 'deep_distance': 0.1111111111111111}
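The list examples above can be reproduced with a back-of-the-envelope reading of the definition given earlier. The minimal sketch below is not DeepDiff's code: the size convention (each scalar counts as 1, each list as 1 plus the sizes of its elements) is an assumption inferred from these list examples, and the operation count is simply the number of entries in the diff report. DeepDiff evidently weighs numeric value changes by how far apart the numbers are, as the float examples show, so the sketch does not cover that case.

```python
def item_size(obj):
    """Assumed size measure: 1 per scalar; 1 per list/tuple/set plus its elements."""
    if isinstance(obj, (list, tuple, set, frozenset)):
        return 1 + sum(item_size(item) for item in obj)
    return 1


def count_operations(diff_report):
    """Count the individual changes in a DeepDiff-style report (a plain dict here)."""
    return sum(
        len(changes)
        for report_type, changes in diff_report.items()
        if report_type != "deep_distance"
    )


def rough_deep_distance(t1, t2, diff_report):
    """Operations divided by the combined size of both objects, capped at 1."""
    total_size = item_size(t1) + item_size(t2)
    return min(1.0, count_operations(diff_report) / total_size)


# Diff reports copied from the examples above (without the deep_distance key).
print(rough_deep_distance([1], [1, 2], {'iterable_item_added': {'root[1]': 2}}))  # 0.2
print(rough_deep_distance([1], [1, 2, 3], {'iterable_item_added': {'root[1]': 2, 'root[2]': 3}}))  # 0.3333333333333333
print(rough_deep_distance([[2, 1]], [[1, 2, 3]], {'iterable_item_added': {'root[0][2]': 3}}))  # 0.1111111111111111
```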

Read more about Deep Distance here.

Improved granularity of results when ignore_order=True 

>>> from pprint import pprint
>>> from deepdiff import DeepDiff
>>> t1 = [
...     {
...         'key3': [[[[[1, 2, 4, 5]]]]],
...         'key4': [7, 8],
...     },
...     {
...         'key5': 'val5',
...         'key6': 'val6',
...     },
... ]
>>> 
>>> t2 = [
...     {
...         'key5': 'CHANGE',
...         'key6': 'val6',
...     },
...     {
...         'key3': [[[[[1, 3, 5, 4]]]]],
...         'key4': [7, 8],
...     },
... ]

In DeepDiff 4:

>>> pprint(DeepDiff(t1, t2, ignore_order=True))
{'iterable_item_added': {'root[0]': {'key5': 'CHANGE', 'key6': 'val6'},
                         'root[1]': {'key3': [[[[[1, 3, 5, 4]]]]],
                                     'key4': [7, 8]}},
 'iterable_item_removed': {'root[0]': {'key3': [[[[[1, 2, 4, 5]]]]],
                                       'key4': [7, 8]},
                           'root[1]': {'key5': 'val5', 'key6': 'val6'}}}

In DeepDiff 5:

>>> pprint(DeepDiff(t1, t2, ignore_order=True, cache_size=5000, cutoff_intersection_for_pairs=1))
{'values_changed': {"root[0]['key3'][0][0][0][0][1]": {'new_value': 3,
                                                       'old_value': 2},
                    "root[1]['key5']": {'new_value': 'CHANGE',
                                        'old_value': 'val5'}}}
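The difference comes from pairing: rather than reporting every item that has no exact match in the other list as removed or added, DeepDiff 5 can match up items that are close to each other and report the changes inside each pair. The sketch below illustrates that idea for lists of flat dicts. It is a simplification, not DeepDiff's algorithm: the `dict_distance` measure (share of keys whose values differ) and the `cutoff` threshold are hypothetical stand-ins, and `cutoff` is not meant to reproduce `cutoff_intersection_for_pairs`.

```python
def dict_distance(a, b):
    """Hypothetical distance between flat dicts: the share of keys (from either
    dict) that are missing on one side or hold different values."""
    keys = a.keys() | b.keys()
    if not keys:
        return 0.0
    differing = sum(1 for k in keys if k not in a or k not in b or a[k] != b[k])
    return differing / len(keys)


def diff_ignoring_order(t1, t2, cutoff=0.5):
    """Pair unmatched dicts of t1 and t2 closest-first, then report value changes
    for the keys both items of a pair share. Leftovers are whole removals/additions."""
    removed = [i for i, item in enumerate(t1) if item not in t2]
    added = [j for j, item in enumerate(t2) if item not in t1]
    # Every (removed, added) combination, sorted so the closest pairs come first.
    candidates = sorted(
        (dict_distance(t1[i], t2[j]), i, j) for i in removed for j in added
    )
    report = {'values_changed': {}, 'iterable_item_removed': {}, 'iterable_item_added': {}}
    paired_i, paired_j = set(), set()
    for distance, i, j in candidates:
        if distance > cutoff or i in paired_i or j in paired_j:
            continue
        paired_i.add(i)
        paired_j.add(j)
        for key in sorted(t1[i].keys() & t2[j].keys()):
            if t1[i][key] != t2[j][key]:
                report['values_changed'][f"root[{i}][{key!r}]"] = {
                    'new_value': t2[j][key], 'old_value': t1[i][key]}
    for i in removed:
        if i not in paired_i:
            report['iterable_item_removed'][f"root[{i}]"] = t1[i]
    for j in added:
        if j not in paired_j:
            report['iterable_item_added'][f"root[{j}]"] = t2[j]
    # Keep only the report types that actually contain entries.
    return {name: entries for name, entries in report.items() if entries}


t1 = [{'key4': [7, 8]}, {'key5': 'val5', 'key6': 'val6'}]
t2 = [{'key5': 'CHANGE', 'key6': 'val6'}, {'key4': [7, 8]}]
print(diff_ignoring_order(t1, t2))
# {'values_changed': {"root[1]['key5']": {'new_value': 'CHANGE', 'old_value': 'val5'}}}
```

With a stricter threshold, for example `cutoff=0.4`, the two key5/key6 dicts are no longer paired and the sketch falls back to reporting one whole item removed and one added, much like the DeepDiff 4 output.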

Pretty print 

Use the pretty() method for human-readable output, regardless of which view you used to generate the results.

>>> from deepdiff import DeepDiff
>>> t1={1,2,4}
>>> t2={2,3}
>>> print(DeepDiff(t1, t2).pretty())
Item root[3] added to set.
Item root[4] removed from set.
Item root[1] removed from set.
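A rough picture of what pretty() does: each type of change in the report is mapped to a sentence template. The sketch below is hypothetical and is not DeepDiff's code; the two set messages copy the wording shown above, while the values_changed wording is our own and may not match DeepDiff's.

```python
# Hypothetical message templates; the set messages mirror the output above.
TEMPLATES = {
    'set_item_added': "Item {path} added to set.",
    'set_item_removed': "Item {path} removed from set.",
    'values_changed': "Value of {path} changed from {old_value} to {new_value}.",
}


def pretty(report):
    """Render a DeepDiff-style report (a plain dict here) as one sentence per change."""
    lines = []
    for report_type, entries in report.items():
        template = TEMPLATES[report_type]
        if isinstance(entries, dict):
            # Entries such as values_changed map each path to its old/new values.
            for path, change in entries.items():
                lines.append(template.format(path=path, **change))
        else:
            # Entries such as set_item_added are just a collection of paths.
            for path in entries:
                lines.append(template.format(path=path))
    return "\n".join(lines)


report = {'set_item_added': ['root[3]'], 'set_item_removed': ['root[4]', 'root[1]']}
print(pretty(report))
# Item root[3] added to set.
# Item root[4] removed from set.
# Item root[1] removed from set.
```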

New Optimizations 

Many new optimizations are introduced, especially for nested data structures, numeric lists, and Numpy arrays.

Read about optimizations here.

Caching 

Caching can dramatically improve performance for nested objects, especially when ignore_order=True.

For example, let's take a look at the performance of benchmark_deeply_nested_a in the DeepDiff-Benchmark repo.

Without any caching, the diff takes 10 seconds!

[Figure: without caching]

And with caching it takes under a second:

[Figure: with caching]
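A plausible intuition for why caching helps: when items are compared pairwise, the same nested sub-objects can be compared against each other many times, and remembering earlier results avoids repeating that work. The self-contained sketch below shows the effect with `functools.lru_cache`. It is only an analogy: `make_distance`, its leaf-counting distance, and the `cache_size` argument (named after the `cache_size` parameter in the ignore_order example earlier) are our own, and this is not necessarily how DeepDiff's cache works.

```python
from functools import lru_cache


def make_distance(cache_size=0):
    """Build a recursive distance function over nested tuples.

    The distance counts differing leaves of equally shaped tuples; a shape
    mismatch counts as one difference (a hypothetical measure). cache_size=0
    disables caching; otherwise results go into an LRU cache of that size.
    """
    stats = {'computed': 0}

    def distance(a, b):
        stats['computed'] += 1
        if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
            # Recurse through `dist` so nested comparisons hit the cache too.
            return sum(dist(x, y) for x, y in zip(a, b))
        return 0 if a == b else 1

    dist = lru_cache(maxsize=cache_size)(distance) if cache_size else distance
    return dist, stats


def all_pairs(t1, t2, cache_size=0):
    """Compare every item of t1 with every item of t2; return distances and work done."""
    dist, stats = make_distance(cache_size)
    matrix = [[dist(a, b) for b in t2] for a in t1]
    return matrix, stats['computed']


A = ((1, 2), (3, 4))
B = ((1, 2), (3, 5))
print(all_pairs([A, A, A], [B, B]))                   # ([[1, 1], [1, 1], [1, 1]], 42)
print(all_pairs([A, A, A], [B, B], cache_size=5000))  # ([[1, 1], [1, 1], [1, 1]], 7)
```

Both runs produce the same distances; the cached run computes 7 comparisons instead of 42 because every repeated (A, B) pair is answered from the cache.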

Improved Numpy Support 

Previously, DeepDiff barely supported Numpy. DeepDiff 5 comes with much more comprehensive support for Numpy.

For example, a sample diff with numbers took up to 30 seconds without the optimizations:

[Figure: without numpy optimizations]

And 5 seconds with Numpy optimizations:

[Figure: numpy optimizations]

Read more about optimizations here.

Conclusion 

DeepDiff 5 comes with many new features and improvements. Please star it on GitHub if you find it useful.

I would like to thank everybody who helped make this release possible, from creating PRs to beta testing and providing feedback.
