.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements.  See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership.  The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied.  See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. _bug-reports:

********************************
Bug reports and feature requests
********************************

Arrow relies upon user feedback to identify defects and improvement
opportunities. All users are encouraged to participate by creating bug reports
and feature requests or commenting on existing issues. Even if you cannot
contribute solutions to the issues yourself, your feedback helps us understand
problems and prioritize work to improve the libraries.

.. _github_issues:

GitHub issues
=============

The Arrow project uses `GitHub issues <https://github.com/apache/arrow/issues>`_
to track both bug reports and feature requests.

.. _creating-issues:

Creating issues
===============

Apache Arrow relies upon community contributions to address reported bugs and
feature requests. As with most software projects, contributor time and
resources are finite. The following guidelines aim to produce high-quality
bug reports and feature requests, enabling community contributors to respond
to more issues, faster:

.. _check-existing-issues:

Check existing issues
+++++++++++++++++++++

Before you create a new issue, we recommend you first
`search <https://github.com/apache/arrow/issues>`_
for unresolved existing issues identifying the same problem or feature request.
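
As a minimal sketch of such a search, the snippet below builds a search URL for
the issue tracker linked above. The helper name ``build_issue_search_url`` is
hypothetical and not part of Arrow; the ``is:issue`` and ``is:open`` qualifiers
are standard GitHub search syntax, and the keywords are only an illustration.

```python
from urllib.parse import urlencode

# Base URL of the Arrow issue tracker, as linked in the text above.
ISSUES_URL = "https://github.com/apache/arrow/issues"


def build_issue_search_url(keywords, open_only=True):
    """Return a GitHub issue-search URL for the given keywords.

    Hypothetical helper for illustration; it is not part of Arrow.
    """
    qualifiers = ["is:issue"]
    if open_only:
        # Restrict the search to unresolved issues.
        qualifiers.append("is:open")
    query = " ".join(qualifiers + [keywords])
    # urlencode percent-encodes ":" and turns spaces into "+".
    return f"{ISSUES_URL}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(build_issue_search_url("timestamp timezone repr"))
```

Passing ``open_only=False`` also returns closed issues, which can be useful to
check whether a problem was already reported and resolved in a newer release.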

.. _describe-issue:

Issue description
+++++++++++++++++

A clear description of the problem or requested feature is the most important
element of any issue.  An effective description helps developers understand
and efficiently engage on reported issues, and may include the following:

* **Clear, minimal steps to reproduce the issue, with as few non-Arrow
  dependencies as possible.** If the problem occurs when reading a file, try
  to provide as small an example file as possible, or code to create one.
  A bug report that says "it crashes trying to read my file, but I can't
  share it with you" is very hard for us to debug.
* Any relevant operating system, language, and library version information.
* If it isn't obvious, clearly state the expected behavior and what actually
  happened.
* Avoid overloading a single issue with multiple problems or feature requests.
  Each issue should deal with a single bug or feature.
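
For Python users, the version information mentioned above could be gathered
with a short script such as the following sketch. It uses only the standard
library; the function name ``collect_environment_info`` and the default package
list are assumptions made for illustration, so adjust them to the libraries
your reproduction actually uses.

```python
import platform
import sys
from importlib import metadata


def collect_environment_info(packages=("pyarrow", "numpy", "pandas")):
    """Return a dict of environment details worth pasting into a bug report.

    Hypothetical helper for illustration; it is not part of Arrow.
    """
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            # Look up the installed distribution version without importing it.
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

Pasting this output into the issue saves a round of follow-up questions about
the environment in which the problem occurred.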

If a developer cannot turn the report into a failing unit test, they cannot
confirm that the issue has been identified, nor verify when it has been fixed.
Try to anticipate the questions you might be asked by someone working to
understand the issue and provide those supporting details up front.

Examples of good bug reports are found below:

.. tab-set::

   .. tab-item:: Python

      Printing a timestamp scalar that has a timezone raises an error:

      .. code-block:: python

         import pyarrow as pa

         a = pa.array([0], pa.timestamp('s', tz='+02:00'))

         print(a) # representation not correct?
         # <pyarrow.lib.TimestampArray object at 0x7f834c7cb9a8>
         # [
         #  1970-01-01 00:00:00
         # ]

         print(a[0])
         # Traceback (most recent call last):
         #   File "<stdin>", line 1, in <module>
         #   File "pyarrow/scalar.pxi", line 80, in pyarrow.lib.Scalar.__repr__
         #   File "pyarrow/scalar.pxi", line 463, in pyarrow.lib.TimestampScalar.as_py
         #   File "pyarrow/scalar.pxi", line 393, in pyarrow.lib._datetime_from_int
         # ValueError: fromutc: dt.tzinfo is not self

   .. tab-item:: R

      Error when reading a CSV file with ``col_types`` option ``"T"`` or ``"t"`` when source data is in millisecond precision:

      .. code-block:: R

         library(arrow, warn.conflicts = FALSE)
         tf <- tempfile()
         write.csv(data.frame(x = '2018-10-07 19:04:05.005'), tf, row.names = FALSE)

         # successfully read in file
         read_csv_arrow(tf, as_data_frame = TRUE)
         #> # A tibble: 1 × 1
         #>   x
         #>   <dttm>
         #> 1 2018-10-07 20:04:05

         # the unit here is seconds - doesn't work
         read_csv_arrow(
           tf,
           col_names = "x",
           col_types = "T",
           skip = 1
         )
         #> Error in `handle_csv_read_error()`:
         #> ! Invalid: In CSV column #0: CSV conversion error to timestamp[s]: invalid value '2018-10-07 19:04:05.005'

         # the unit here is ms - doesn't work
         read_csv_arrow(
           tf,
           col_names = "x",
           col_types = "t",
           skip = 1
         )
         #> Error in `handle_csv_read_error()`:
         #> ! Invalid: In CSV column #0: CSV conversion error to time32[ms]: invalid value '2018-10-07 19:04:05.005'

         # the unit here is inferred as ns - does work!
         read_csv_arrow(
           tf,
           col_names = "x",
           col_types = "?",
           skip = 1,
           as_data_frame = FALSE
         )
         #> Table
         #> 1 rows x 1 columns
         #> $x <timestamp[ns]>

Other resources for producing useful bug reports:

* `Python: Craft Minimal Bug Reports by Matthew Rocklin <https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports>`_
* `R: Tidyverse: Make a reprex <https://www.tidyverse.org/help/#reprex>`_
* `R: Tidyverse's Reprex do's and don'ts <https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html>`_
* `Mozilla's bug-reporting guidelines <https://developer.mozilla.org/en-US/docs/Mozilla/QA/Bug_writing_guidelines>`_

.. _identify-component:

Identify Arrow component
++++++++++++++++++++++++

Arrow is an expansive project supporting many languages and organized into a
number of components. Identifying the affected component(s) helps new issues
get attention from appropriate contributors.

* **Component label**, which can be added by a committer of the Apache Arrow
  project, is used to indicate the area of the project that your issue pertains
  to (for example "Component: Python" or "Component: C++").
* Prefix the issue title with the component name in brackets, for example
  ``[Python] issue summary``; this helps when navigating lists of open issues,
  and it also makes our changelogs more readable. Most prefixes are exactly the
  same as the **Component** name, with the following exceptions:

  * **Component:** Continuous Integration — **Summary prefix:** [CI]
  * **Component:** Developer Tools — **Summary prefix:** [Dev]
  * **Component:** Documentation — **Summary prefix:** [Docs]
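
The prefix rule can be sketched in a few lines of Python. The mapping holds
only the three exceptions listed above; ``format_issue_title`` is a
hypothetical helper written for illustration and is not part of any Arrow
tooling.

```python
# Exceptions listed above; every other component uses its own name as prefix.
PREFIX_EXCEPTIONS = {
    "Continuous Integration": "CI",
    "Developer Tools": "Dev",
    "Documentation": "Docs",
}


def format_issue_title(component, summary):
    """Prefix an issue summary with its bracketed component name.

    Hypothetical helper for illustration; it is not part of Arrow.
    """
    prefix = PREFIX_EXCEPTIONS.get(component, component)
    return f"[{prefix}] {summary.strip()}"


if __name__ == "__main__":
    print(format_issue_title("Python", "issue summary"))
    print(format_issue_title("Documentation", "Fix broken link in bug report guide"))
```

An issue that spans several components may need more than one prefix; the
sketch handles only the single-component case.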

.. _issue-lifecycle:

Issue lifecycle
===============

Both bug reports and feature requests follow a defined lifecycle. If an issue
is currently being worked on, it should have a developer assigned. When an
issue has reached a terminal status, it is closed with one of two outcomes:

* **Closed as completed** - indicates the issue is complete; the PR that
  resolved the issue should have been automatically linked by GitHub
  (assuming the PR correctly mentioned the issue number).

  If you are merging a PR it is good practice to add a comment
  to the linked issue about which PR is resolving it. This way
  GitHub creates a notification for anybody that collaborated on
  the issue.

* **Closed as not planned** - indicates the issue is closed and should
  not receive any further updates, but *without* action being taken.

.. _issue-assignment:

Issue assignment
++++++++++++++++

Assignment signals commitment to work on an issue, and contributors should
self-assign issues when that work starts. Anyone can self-assign an issue
by commenting ``take`` on it.