Implementation decisions
========================
Dependencies
------------
When a needed feature is already implemented in other software, there are
several things to consider when deciding whether to use that software as a
dependency or to re-implement the feature:
Possible advantages of using other software:
- no maintenance needed on our side
- not reinventing the wheel
Possible disadvantages of using other software:
- it may be too big
- it may introduce security issues
- it may not be maintained
sbws version
````````````
Because some bwauths install sbws from the git repository, it is useful to
know which git revision they installed it from.
We'd prefer not to see the git revision when sbws is installed from a git
tag or a Debian package (which is usually built from a git tag or a git archive
release).
A first solution would be to obtain the git revision at runtime, but:
- sbws is not usually running from the same directory as the git repository,
  as the installation might install it in another directory.
- if the current path is inside some other git repository, sbws might obtain
  the git revision of that other repository.
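The second pitfall can be seen in a minimal sketch of the runtime approach.
The helper name ``git_revision`` is hypothetical and is not part of sbws; it
only assumes that the ``git`` command line tool is available.

.. code-block:: python

    import subprocess


    def git_revision(path="."):
        """Return the ``git describe`` output for the repository at ``path``.

        Returns None when git is not installed or ``path`` is not inside a
        git repository. (Hypothetical helper, for illustration only.)
        """
        try:
            completed = subprocess.run(
                ["git", "-C", path, "describe", "--tags", "--always", "--dirty"],
                check=True,
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
            )
        except (OSError, subprocess.CalledProcessError):
            return None
        return completed.stdout.strip()


    # With the default path, git describes whatever repository contains the
    # current working directory, which is not necessarily the sbws checkout.
    print(git_revision())

Passing the directory of the installed package instead of the current path
does not help either, since an installed copy is usually not inside the git
repository.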
So the next solution was to obtain the git revision at build/install time.
To achieve this, a script should be called from the installer or at runtime
whenever `__version__` needs to be read.
While we could implement it ourselves, there are two external tools that
already do this.
setuptools_scm
~~~~~~~~~~~~~~
https://github.com/pypa/setuptools_scm/
Advantages:
- it does what we want: for 19 commits after the 1.1.0 tag it would generate a
  string like `1.1.1.dev19+g76ef2fe0.d20200221`.
  We don't need the date, but it can probably be removed and it does not hurt.
- we don't need to maintain it.
Disadvantages:
- it adds the extra dependency setuptools_scm.
- it does not obtain the version from a git archive, though there's another
  tool that does that.
- the version reported comes only from build time, so if we make a commit
  without running `setup.py`, sbws will not report the new version.
versioneer
~~~~~~~~~~
https://github.com/warner/python-versioneer
Advantages:
- it does not add any extra dependency. Versioneer only needs to be installed
  the first time. When run, it generates `versioneer.py` and `_version.py`,
  which are created from versioneer itself. Then it can be uninstalled.
- it does what we want: for 19 commits after the 1.1.0 tag it would generate a
  string like `1.1.0+19.g76ef2fe0`. Note that it keeps `1.1.0`, unlike the
  `1.1.1` generated by `setuptools_scm`.
- we don't need to maintain it.
- it can also obtain the version from a git archive.
- the version reported at build time and at runtime is the same.
Disadvantages:
- it adds quite a lot of extra code to sbws.
- the generated code is independent of upstream and does not include its tests.
- it does not seem to be maintained.
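To make the difference between the two version strings concrete, the sketch
below rebuilds both formats from the same tag, commit distance and short hash.
It only imitates the string layouts quoted above and does not call either
tool; the function names are hypothetical, and the date suffix that
`setuptools_scm` adds is left out.

.. code-block:: python

    def scm_style(tag, distance, short_hash):
        """Imitate the `setuptools_scm` layout: bump the last tag component."""
        parts = [int(part) for part in tag.split(".")]
        parts[-1] += 1
        next_version = ".".join(str(part) for part in parts)
        return "{}.dev{}+g{}".format(next_version, distance, short_hash)


    def versioneer_style(tag, distance, short_hash):
        """Imitate the `versioneer` layout: keep the tag, append the distance."""
        return "{}+{}.g{}".format(tag, distance, short_hash)


    print(scm_style("1.1.0", 19, "76ef2fe0"))         # 1.1.1.dev19+g76ef2fe0
    print(versioneer_style("1.1.0", 19, "76ef2fe0"))  # 1.1.0+19.g76ef2fe0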
Conclusion
~~~~~~~~~~
Because `setuptools_scm` gives only the version at build time, we decided to
use `versioneer`.
We might need to change it in the future if it starts causing problems with
other git or Python versions, or if we find a way to make `setuptools_scm`
detect the same version at build time and at runtime.
See `<https://github.com/MartinThoma/MartinThoma.github.io/blob/1235fcdecda4d71b42fc07bfe7db327a27e7bcde/content/2018-11-13-python-package-versions.md>`_
for a comparison of other Python versioning packages.
Changing Bandwidth file monitoring KeyValues
--------------------------------------------
In version 1.1.0 we added KeyValues called ``recent_X_count`` and
``relay_X_count``, which required modifying several parts of the code.
We only stored numbers for simplicity, but the values of these numbers
accumulate over time and there is no way to know by how much to decrease them,
since some of the main objects are not recreated at runtime and do not have
attributes recording when they were created or updated.
The relations between the objects do not follow the usual one-to-many or
many-to-many relationships either, so the numbers cannot be deduced from the
related objects.
The only way we could think of to solve this is to store lists of timestamps,
instead of just numbers, as an attribute in the objects that need to keep
some count.
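The idea of keeping timestamps instead of a number can be sketched as follows.
This is a minimal illustration, not the sbws implementation: the class name
``TimestampCounter`` and its methods are hypothetical, and the 5 day window is
taken from the transition notes in this document.

.. code-block:: python

    from datetime import datetime, timedelta, timezone


    class TimestampCounter:
        """Hypothetical helper: one timestamp per event instead of a number."""

        def __init__(self, max_age=timedelta(days=5)):
            self.max_age = max_age
            self.timestamps = []

        def add(self, when=None):
            """Record one event, by default at the current UTC time."""
            self.timestamps.append(when or datetime.now(timezone.utc))

        def count(self, now=None):
            """Drop the events older than ``max_age`` and count the rest."""
            now = now or datetime.now(timezone.utc)
            oldest = now - self.max_age
            self.timestamps = [t for t in self.timestamps if t >= oldest]
            return len(self.timestamps)


    now = datetime(2020, 2, 21, tzinfo=timezone.utc)
    attempts = TimestampCounter()
    attempts.add(now - timedelta(days=6))  # too old, will be dropped
    attempts.add(now - timedelta(days=1))
    attempts.add(now)
    print(attempts.count(now=now))  # 2

Because each event carries its own time, the count can decrease by itself as
events age out, which a plain number cannot do.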
Where do the values of the keys come from?
``````````````````````````````````````````
In the file system, there are only two types of files where these values can
be stored:
- the results files in ``datadir``
- the ``state.dat`` file
Because of the structure of the content in the results files, they can store
KeyValues for the relays, but not for the headers, which need to be stored in
the ``state.dat`` file.
The classes that manage these KeyValues are:
``RelayList``:
- recent_consensus_count
- recent_measurement_attempt_count
``RelayPrioritizer``:
- recent_priority_list_count
- recent_priority_relay_count
``Relay`` and ``Result``:
- relay_in_recent_consensus_count
- relay_recent_measurement_attempt_count
- relay_recent_priority_list_count
Transition from numbers to datetimes
````````````````````````````````````
The KeyValues with ``_count`` names in the results and the state will be
ignored when sbws is restarted with this change, since they will be written
without ``_count`` names in these JSON files.
We could add code to count this in the transition to this version, but these
numbers are wrong anyway and we don't think it's worth the effort, since they
will be correct after 5 days and they have been wrong for a long time.
Additionally, ``recent_measurement_failure_count`` will be negative, since it's
calculated as ``recent_measurement_attempt_count`` minus all the results.
While the total number of results in the last 5 days is correct, the number of
attempts won't be until 5 days have passed.
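A small worked example of why the failure count goes negative right after the
transition. The function name and the figure of 100 results are made up for
illustration; only the subtraction itself comes from the description above.

.. code-block:: python

    def failure_count(attempt_timestamps, results):
        """Attempts minus results, as described above (hypothetical helper)."""
        return len(attempt_timestamps) - len(results)


    # Right after the upgrade the list of attempt timestamps starts empty,
    # while the results of the last 5 days are still on disk.
    attempts_after_upgrade = []
    results_last_5_days = ["result"] * 100  # made-up number of results

    print(failure_count(attempts_after_upgrade, results_last_5_days))  # -100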
Disadvantages
`````````````
``sbws generate``, with 27795 measurement attempts, takes 1 minute instead of a
few seconds.
The same happens with ``RelayPrioritizer.best_priority``, though so far
that seems OK since it's a Python generator in a thread and the measurements
start before it has calculated all the priorities.
The same happens with ``ResultDump``, which reads and writes the data in a
thread.
Conclusion
``````````
All these changes required a lot of effort and are not optimal. It was the way
we could correct and maintain the 1.1.0 version.
If a 2.0 version happens, we highly recommend redesigning the data structures
to use a database through a well maintained ORM library, which would avoid the
limitations of JSON files and errors in data type conversions, and which is
optimized for the type of counting and statistics we aim for.
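As a rough sketch of what database-backed counting could look like, the
example below uses the standard library ``sqlite3`` module directly rather
than an ORM. The table, column and function names are hypothetical and do not
describe any planned sbws schema.

.. code-block:: python

    import sqlite3
    from datetime import datetime, timedelta, timezone

    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per measurement attempt.
    conn.execute(
        "CREATE TABLE measurement_attempt ("
        "fingerprint TEXT NOT NULL, attempted_at REAL NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX idx_attempted_at ON measurement_attempt (attempted_at)"
    )


    def add_attempt(conn, fingerprint, when):
        """Store one attempt as a POSIX timestamp."""
        conn.execute(
            "INSERT INTO measurement_attempt VALUES (?, ?)",
            (fingerprint, when.timestamp()),
        )


    def recent_attempt_count(conn, now, days=5):
        """Let the database count the attempts made in the last ``days`` days."""
        oldest = (now - timedelta(days=days)).timestamp()
        row = conn.execute(
            "SELECT COUNT(*) FROM measurement_attempt WHERE attempted_at >= ?",
            (oldest,),
        ).fetchone()
        return row[0]


    now = datetime(2020, 2, 21, tzinfo=timezone.utc)
    add_attempt(conn, "A" * 40, now - timedelta(days=6))
    add_attempt(conn, "A" * 40, now - timedelta(days=2))
    add_attempt(conn, "B" * 40, now)
    print(recent_attempt_count(conn, now))  # 2

The count is computed by an indexed query instead of by walking lists of
timestamps loaded from JSON files.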
.. note:: Documentation about a possible version 2.0 and the steps to change
   the code from 1.X needs to be created.