1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
|
Materialized Path trees
=======================
.. module:: treebeard.mp_tree
This is an efficient implementation of Materialized Path
trees for Django, as described by `Vadim Tropashko`_ in `SQL Design
Patterns`_. Materialized Path is probably the fastest way of working with
trees in SQL without the need of extra work in the database, like Oracle's
``CONNECT BY`` or sprocs and triggers for nested intervals.
In a materialized path approach, every node in the tree will have a
:attr:`~MP_Node.path` attribute, where the full path from the root
to the node will be stored. This has the advantage of needing very simple
and fast queries, at the risk of inconsistency because of the
denormalization of ``parent``/``child`` foreign keys. This can be prevented
with transactions.
``django-treebeard`` uses a particular approach: every step in the path has
a fixed width and has no separators. This makes queries predictable and
faster at the cost of using more characters to store a step. To address
this problem, every step number is encoded.
Also, two extra fields are stored in every node:
:attr:`~MP_Node.depth` and :attr:`~MP_Node.numchild`.
This makes the read operations faster, at the cost of a little more
maintenance on tree updates/inserts/deletes. Don't worry, even with these
extra steps, materialized path is more efficient than other approaches.
.. warning::
As with all tree implementations, please be aware of the
:doc:`caveats`.
.. note::
The materialized path approach makes heavy use of ``LIKE`` in your
database, with clauses like ``WHERE path LIKE '002003%'``. If you think
that ``LIKE`` is too slow, you're right, but in this case the
:attr:`~MP_Node.path` field is indexed in the database, and all
``LIKE`` clauses that don't **start** with a ``%`` character will use
the index. This is what makes the materialized path approach so fast.
.. inheritance-diagram:: MP_Node
.. autoclass:: MP_Node
:show-inheritance:
.. warning::
Do not change the values of :attr:`path`, :attr:`depth` or
:attr:`numchild` directly: use one of the included methods instead.
Consider these values *read-only*.
.. warning::
Do not change the values of the :attr:`steplen`, :attr:`alphabet` or
:attr:`node_order_by` after saving your first object. Doing so will
corrupt the tree.
.. warning::
If you need to define your own
:py:class:`~django.db.models.Manager` class,
you'll need to subclass
:py:class:`~MP_NodeManager`.
Also, if in your manager you need to change the default
queryset handler, you'll need to subclass
:py:class:`~MP_NodeQuerySet`.
Example:
.. code-block:: python
class SortedNode(MP_Node):
node_order_by = ['numval', 'strval']
numval = models.IntegerField()
strval = models.CharField(max_length=255)
Read the API reference of :class:`treebeard.models.Node` for info on methods
available in this class, or read the following section for methods with
particular arguments or exceptions.
.. attribute:: steplen
Attribute that defines the length of each step in the :attr:`path` of
a node. The default value of *4* allows a maximum of
*1679615* children per node. Increase this value if you plan to store
large trees (a ``steplen`` of *5* allows more than *60M* children per
node). Note that increasing this value, while increasing the number of
children per node, will decrease the max :attr:`depth` of the tree (by
default: *63*). To increase the max :attr:`depth`, increase the
max_length attribute of the :attr:`path` field in your model.
.. attribute:: alphabet
Attribute: the alphabet that will be used in base conversions
when encoding the path steps into strings. The default value,
``0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ`` is the most optimal possible
value that is portable between the supported databases (which means:
their default collation will order the :attr:`path` field correctly).
.. note::
In case you know what you are doing, there is a test that is
disabled by default that can tell you the optimal default alphabet
in your enviroment. To run the test you must enable the
:envvar:`TREEBEARD_TEST_ALPHABET` enviroment variable:
.. code-block:: console
$ TREEBEARD_TEST_ALPHABET=1 py.test -k test_alphabet
In OS X Mavericks, good readable values for the three supported
databases in their *default* configuration:
================ ================ ====
Database Optimal Alphabet Base
================ ================ ====
MySQL 5.6.17 0-9A-Z 36
PostgreSQL 9.3.4 0-9A-Za-z 62
Sqlite3 0-9A-Za-z 62
================ ================ ====
The default value is MySQL's since it will work in all DBs,
but when working with a better database, changing the
:attr:`alphabet` value is recommended in order to increase the
density of the paths.
For an even better approach, change the collation of the
:attr:`path` column in the database to handle raw ASCII, and
use the printable ASCII characters (0x20 to 0x7E) as the
:attr:`alphabet`.
.. attribute:: node_order_by
Attribute: a list of model fields that will be used for node
ordering. When enabled, all tree operations will assume this ordering.
Example:
.. code-block:: python
node_order_by = ['field1', 'field2', 'field3']
.. attribute:: path
``CharField``, stores the full materialized path for each node. The
default value of it's max_length, *255*, is the max efficient and
portable value for a ``varchar``. Increase it to allow deeper trees (max
depth by default: *63*)
.. note::
`django-treebeard` uses Django's abstract model inheritance, so
to change the ``max_length`` value of the path in your model, you
have to redeclare the path field in your model:
.. code-block:: python
class MyNodeModel(MP_Node):
path = models.CharField(max_length=1024, unique=True)
.. note::
For performance, and if your database allows it, you can safely
define the path column as ASCII (not utf-8/unicode/iso8859-1/etc) to
keep the index smaller (and faster). Also note that some databases
(mysql) have a small index size limit. InnoDB for instance has a
limit of 765 bytes per index, so that would be the limit if your path
is ASCII encoded. If your path column in InnoDB is using unicode,
the index limit will be 255 characters since in MySQL's indexes,
unicode means 3 bytes per character.
.. note::
``django-treebeard`` uses `numconv`_ for path encoding.
.. attribute:: depth
``PositiveIntegerField``, depth of a node in the tree. A root node
has a depth of *1*.
.. attribute:: numchild
``PositiveIntegerField``, the number of children of the node.
.. automethod:: add_root
See: :meth:`treebeard.models.Node.add_root`
.. automethod:: add_child
See: :meth:`treebeard.models.Node.add_child`
.. automethod:: add_sibling
See: :meth:`treebeard.models.Node.add_sibling`
.. automethod:: move
See: :meth:`treebeard.models.Node.move`
.. automethod:: get_tree
See: :meth:`treebeard.models.Node.get_tree`
.. note::
This method returns a queryset.
.. automethod:: find_problems
.. note::
A node won't appear in more than one list, even when it exhibits
more than one problem. This method stops checking a node when it
finds a problem and continues to the next node.
.. note::
Problems 1, 2 and 3 can't be solved automatically.
Example:
.. code-block:: python
MyNodeModel.find_problems()
.. automethod:: fix_tree
Example:
.. code-block:: python
MyNodeModel.fix_tree()
.. autoclass:: MP_NodeManager
:show-inheritance:
.. autoclass:: MP_NodeQuerySet
:show-inheritance:
.. _`Vadim Tropashko`: http://vadimtropashko.wordpress.com/
.. _`Sql Design Patterns`:
http://www.rampant-books.com/book_2006_1_sql_coding_styles.htm
.. _numconv: https://tabo.pe/projects/numconv/
|