File: lesson_04_debugging_2.py

#!/usr/bin/python3

# Halide tutorial lesson 4

# This lesson demonstrates how to follow what Halide is doing at runtime.

# This lesson can be built by invoking the command:
#    make test_tutorial_lesson_04_debugging_2
# in a shell with the current directory at python_bindings/

import halide as hl


def main():
    gradient = hl.Func("gradient")
    x, y = hl.Var("x"), hl.Var("y")

    # We'll define our gradient function as before.
    gradient[x, y] = x + y

    # And tell Halide that we'd like to be notified of all
    # evaluations.
    gradient.trace_stores()

    # Realize the function over an 8x8 region.
    print("Evaluating gradient")
    gradient.realize([8, 8])

    # This will print out all the times gradient(x, y) gets
    # evaluated.
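
    # An added sketch, not part of the original lesson: realize() also
    # returns an hl.Buffer, so you can capture the result and spot-check a
    # value alongside the trace output. Note that calling realize() again
    # re-runs the pipeline and prints another batch of store traces.
    result = gradient.realize([8, 8])
    assert result[3, 4] == 3 + 4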

    # Now that we can snoop on what Halide is doing, let's try our
    # first scheduling primitive. We'll make a new version of
    # gradient that processes each scanline in parallel.
    parallel_gradient = hl.Func("parallel_gradient")
    parallel_gradient[x, y] = x + y

    # We'll also trace this function.
    parallel_gradient.trace_stores()

    # Things are the same so far. We've defined the algorithm, but
    # haven't said anything about how to schedule it. In general,
    # exploring different scheduling decisions doesn't change the code
    # that describes the algorithm.

    # Now we tell Halide to use a parallel for loop over the y
    # coordinate. On Linux we run this using a thread pool and a task
    # queue. On OS X we call into Grand Central Dispatch, which does
    # the same thing for us.
    parallel_gradient.parallel(y)
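
    # An added sketch, assuming Func.print_loop_nest() is available in the
    # Python bindings as it is in the later scheduling lessons: printing the
    # loop nest is a quick way to confirm that the loop over y is now a
    # parallel loop before running anything.
    parallel_gradient.print_loop_nest()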

    # This time the stores should be printed out of order, because each
    # scanline is potentially being processed in a different thread. The
    # number of threads should adapt to your system, but on Linux you can
    # control it manually using the environment variable HL_NUM_THREADS
    # (older Halide releases read HL_NUMTHREADS).
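
    # An added sketch, an assumption beyond the original lesson: the runtime
    # reads this environment variable when it first creates its thread pool,
    # so setting it from Python before the first parallel realize() should
    # also take effect. The cap of 4 threads is just an illustrative choice.
    import os  # local import, kept with the sketch it supports
    os.environ["HL_NUM_THREADS"] = "4"
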
    print("\nEvaluating parallel_gradient")
    parallel_gradient.realize([8, 8])
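
    # An added sketch, not part of the original lesson: the schedule changes
    # only the order in which the work happens, not the values computed, so
    # the parallel version still produces gradient values of x + y. Realizing
    # it again here prints one more batch of trace output.
    parallel_result = parallel_gradient.realize([8, 8])
    for j in range(8):
        for i in range(8):
            assert parallel_result[i, j] == i + j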

    print("Success!")
    return 0


if __name__ == "__main__":
    main()