Description: Fail on unsupported tril/triu dtypes instead of returning nonsense

The extract_tri kernel assumes 4 bytes per entry when deciding which
memory addresses to zero out. If the actual dtype has a different item
size, the call appears to succeed but returns incorrect data.

Upstream has proper fixes for this
(https://github.com/Theano/libgpuarray/commit/07cd4ad56054c279442ee28413b26939f4c03632
plus https://github.com/Theano/libgpuarray/pull/584), but I am not sure
whether those are appropriate as a Debian patch.

Author: Rebecca N. Palmer <rebecca_palmer@zoho.com>
Forwarded: not-needed

--- a/pygpu/basic.py
+++ b/pygpu/basic.py
@@ -32,6 +32,8 @@ def triu(A, inplace=True):
         raise ValueError("triu only works for 2d arrays")
     if A.flags.c_contiguous is A.flags.f_contiguous is False:
         raise ValueError("triu only works for contiguous arrays")
+    if A.dtype.itemsize != 4:
+        raise TypeError("triu only works on 4 byte dtypes (usually np.float32) - use upstream libgpuarray if you need it on other types")
 
     if not inplace:
         A = A.copy()
@@ -51,6 +53,8 @@ def tril(A, inplace=True):
         raise ValueError("tril only works for 2d arrays")
     if A.flags.c_contiguous is A.flags.f_contiguous is False:
         raise ValueError("tril only works for contiguous arrays")
+    if A.dtype.itemsize != 4:
+        raise TypeError("tril only works on 4 byte dtypes (usually np.float32) - use upstream libgpuarray if you need it on other types")
 
     if not inplace:
         A = A.copy()
