File: homogenize.py

python-agate 1.9.1-1
from agate import utils
from agate.rows import Row


def homogenize(self, key, compare_values, default_row=None):
    """
    Fill in missing rows in a series.

    This can be used, for instance, to add rows for missing years in a time
    series.

    Missing rows are found by comparing the values in the :code:`key` columns
    with those provided as :code:`compare_values`.

    Values not found in the table will be used to generate new rows with
    the given :code:`default_row`.

    :code:`default_row` should be an array of values or an array-generating
    function. If not specified, the new rows will have :code:`None` in all
    columns not specified in :code:`key`.

    If :code:`default_row` is an array of values, its length should be row
    length minus the number of column names provided in the :code:`key`.

    If it is an array-generating function, it is called once per new row with
    the missing key values for that row and should return a full row that
    includes those values.

    :param key:
        Either a column name or a sequence of such names.
    :param compare_values:
        Either an array of column values if key is a single column name or a
        sequence of arrays of values if key is a sequence of names. It can
        also be a generator that yields either of the two. A row is created for
        each value or list of values not found in the rows of the table.
    :param default_row:
        An array of values or a function to generate new rows. The length of
        the input array should be equal to the row length minus the number of
        column names in :code:`key`. The length of the array generated by the
        function should be the row length.
    :returns:
        A new :class:`.Table`.
    """
    rows = list(self._rows)

    if not utils.issequence(key):
        key = [key]

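    # With a single key column, compare_values may hold bare values; wrap each
    # one in a list so every entry has the same shape as a multi-column key.
    if len(key) == 1:
        if any(not utils.issequence(compare_value) for compare_value in compare_values):
            compare_values = [[compare_value] for compare_value in compare_values]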

    column_values = [self._columns.get(name) for name in key]
    column_indexes = [self._column_names.index(name) for name in key]

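    # Cast the compare values to the key columns' data types so they can be
    # compared with the values already stored in the table.
    compare_values = [
        [column_values[i].data_type.cast(v) for i, v in enumerate(values)]
        for values in compare_values
    ]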

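    # Transpose the key columns into per-row key tuples, then keep only the
    # compare values that match no existing row. The set difference does not
    # preserve the order of compare_values.
    column_values = zip(*column_values)
    differences = list(set(map(tuple, compare_values)) - set(column_values))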

    for difference in differences:
        if callable(default_row):
            new_row = default_row(difference)
        else:
            if default_row is not None:
                new_row = list(default_row)
            else:
                new_row = [None] * (len(self._column_names) - len(key))

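            # Insert the key values in ascending column order so that earlier
            # insertions do not shift later ones out of position.
            for i, d in sorted(zip(column_indexes, difference), key=lambda pair: pair[0]):
                new_row.insert(i, d)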

        new_row = [self._columns[i].data_type.cast(v) for i, v in enumerate(new_row)]
        rows.append(Row(new_row, self._column_names))

    # Do not copy the row_names, since this function adds rows.
    return self._fork(rows, row_names=[])
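

# Illustrative sketch only, not part of agate: the "find the missing keys"
# step of homogenize(), reduced to plain Python so it can be read without a
# Table. The helper name `_missing_keys` and its arguments are hypothetical.
# Unlike the set difference used above, this variant keeps the order in which
# the missing keys appear in compare_values, which is an assumption about what
# a caller might prefer rather than a description of agate's behaviour.
def _missing_keys(existing_keys, compare_values):
    """Return the key tuples in compare_values that are absent from existing_keys."""
    existing = set(map(tuple, existing_keys))
    seen = set()
    missing = []

    for values in map(tuple, compare_values):
        # Skip keys the table already has and keys already reported once.
        if values in existing or values in seen:
            continue

        seen.add(values)
        missing.append(values)

    return missing


# Example: a yearly series with rows for 2001 and 2003 only.
# _missing_keys([(2001,), (2003,)], [(2001,), (2002,), (2003,), (2004,)])
# returns [(2002,), (2004,)]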