Contribute to scikit-bio
========================

**Scikit-bio** is a community-driven open-source software project, and we warmly welcome your contributions!

We are interested in many types of contributions, including feature additions, bug fixes, continuous integration improvements, and documentation/website updates, additions, and fixes. Whether you are a researcher, educator, or developer, and whether your interest lies in biology, mathematics, statistics, or computer science, your input is invaluable. You can help make scikit-bio a better software package for our entire community.

This document covers the information you may need to get started with contributing to scikit-bio. In addition, for a broader perspective, we recommend the inspiring guide: `How to Contribute to Open Source <https://opensource.guide/how-to-contribute/>`_.

Visit our GitHub repository: :repo:`scikit-bio/scikit-bio <>` for the source code of scikit-bio. You will need a GitHub account to interact with the scikit-bio codebase and community.

Valuable contributions can be made with little or no coding. We detail various ways you can contribute below:

- `Ask a question`_ | `Report an error`_ | `Suggest a new feature`_ | `Fix a typo`_

Contributing code to scikit-bio is a rigorous and rewarding process. We have prepared the following step-by-step guidelines:

- `Before coding`_ | `Set up a workspace`_ | `Write code`_ | `Test code`_ | `Document code`_ | `Style code`_ | `Submit code`_ | `Review code`_

.. .. contents::
..    :depth: 1
..    :local:
..    :backlinks: none

In addition, there are separate documents covering advanced topics:

.. toctree::
   :maxdepth: 1

   devdoc/code_guide
   devdoc/doc_guide
   devdoc/new_module
   devdoc/review
   devdoc/release


Ask a question
--------------

Your inquiry matters! By asking questions in the scikit-bio :repo:`issue tracker <issues>` or :repo:`discussion board <discussions>`, you are not only giving us (and the community) the chance to help, but also letting us assess the needs of users like you. Before asking a question, take a moment to search existing threads to see if there are any relevant ones. We also keep an eye on broader community forums such as Stack Overflow and BioStars for questions related to our scope.


Report an error
---------------

The scikit-bio team is proud of our high-quality, well-tested codebase. That being said, no software is immune to errors, which may arise from bugs, overlooked edge cases, or confusing documentation. In any case, we would appreciate it if you reported the error you encountered to us.

You may :repo:`open an issue <issues/new/choose>` to report the error. Please provide a detailed description of the error such that the developers can reproduce it. Specifically, you may include the following information in the report:

1. The exact **command(s)** necessary to reproduce the error.

2. The input **file(s)** necessary for reproducing the error. You may either attach the file in the issue (by dragging & dropping) if it is small, or provide a link to it otherwise. The file should only be as large as necessary to reproduce the error.

.. note:: For example, if you have a FASTA file with 10,000 sequences but the error only arises due to one of the sequences, create a new FASTA file with only that sequence, run the command that was giving you problems, and verify that you still get an error. Then post that command and link to the trimmed FASTA file.

This is *extremely* useful to the developers, and it is likely that if you don't provide this information you'll get a response asking for it. Often this process helps you to better understand the error as well.

We take error reports very seriously. Once we confirm that an error should be fixed, we will update the code as soon as we can and ship the fix in the next scheduled release of scikit-bio. If the error could lead to incorrect results or make certain functionality inaccessible, we may release a bug-fix version of scikit-bio ahead of schedule.


Suggest a new feature
---------------------

We are always looking for new ideas to enhance scikit-bio's capabilities, especially from users with unique research interests. If you believe there is an analysis or feature that could extend scikit-bio's current offerings, we warmly invite you to share your suggestions with us.

Please describe why the functionality that you are suggesting is relevant. For it to be relevant, it should be demonstrably useful to scikit-bio users and it should also fit within the biology/bioinformatics domain. This typically means that a new analytic method is implemented (you should describe why it's useful, ideally including a link to a paper that uses this method), or an existing method is enhanced (e.g., improved performance).

If the scope of the suggested method overlaps with any pre-existing methods in scikit-bio, we may request benchmark results comparing your method to the pre-existing ones (which would also be required for publication of your method) so pointing to a paper or other document containing benchmark results, or including benchmark results in your issue, will help.

Before suggesting a new feature, it is also a good idea to check whether the functionality exists in other Python packages, or if the feature would fit better in another Python package. For example, low-level statistical methods/tests may fit better in a project that is focused on statistics (e.g., `SciPy <https://scipy.org/>`_ or `statsmodels <https://www.statsmodels.org/>`_).

If your proposal represents a significant research direction or requires a substantial suite of methods, we encourage you to consider establishing a formal academic or industrial collaboration with the scikit-bio team. For more details on this process, please refer to the :ref:`about:Collaboration` section.


Fix a typo
----------

If you spot small errors such as typos, redundant spaces, broken links, or missing citations in the scikit-bio code or documentation, and want to fix them quickly, you may follow the procedure detailed below. Everything takes place in the web browser and doesn't involve creating anything on your local computer.

.. warning:: This approach should only be used for small errors. For anything larger, please read `Before coding`_.

1. Locate the specific code file that needs to be fixed in the GitHub repository. If you are reading the documentation, you can click the `[source] <about:blank>`__ link next to the header to locate the corresponding code file.

2. In the top-right corner of the code viewer there is an Edit (:octicon:`pencil`) button, with a prompt "Fork this repository and edit the file". Click it. Then click the button :bdg-success:`Fork this repository`. This will open GitHub's `online file editor <https://docs.github.com/en/repositories/working-with-files/managing-files/editing-files>`_.

3. **Edit the code**.

4. When done, click :bdg-success:`Commit changes...`. Then enter a commit message to describe what you did, like "Fixed a typo in the documentation of skbio.module.function". Then click :bdg-success:`Propose changes`.

5. You will be able to review the changes you made and compare with the original code :octicon:`git-compare`. If everything looks good to you, click :bdg-success:`Create pull request`. Then enter a title and description that you think are informative to the scikit-bio maintainers. The **title** may or may not be the same as the commit message. In the **description**, you will need to answer a few questions by typing an ``x`` in the relevant checkboxes. You may also explain why the original code should be replaced by yours. Finally, click :bdg-success:`Create pull request`.

6. This will create a `pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request>`__ :octicon:`git-pull-request`, i.e., the changes you made to the scikit-bio repository. A scikit-bio maintainer will review your pull request, and run necessary tests to make sure it is sound. You may be asked to clarify or to make modifications to your code. Please work with the maintainer by replying in the pull request.

7. If the maintainer believes that your code is good to go, they will `merge <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request>`_ it into the scikit-bio codebase. Once merged, the pull request webpage will have a purple notice :octicon:`git-merge`, saying: "Pull request successfully merged and closed".

8. At this point, your contribution is completed 🎉. You may optionally click :bdg-light:`Delete branch` to clean up your workspace. Then, you can move on, and enjoy the improved scikit-bio!


Before coding
-------------

We sincerely value your willingness to contribute code to scikit-bio (beyond reporting issues or correcting typos). This process can be intensive, particularly for those new to software engineering. The following sections detail the steps for contributing code to scikit-bio. Please review them carefully.

Discuss your plan
^^^^^^^^^^^^^^^^^

When considering contributing code to scikit-bio, you should begin by posting an issue to the :repo:`scikit-bio issue tracker <issues>`. The information that you include in that post will differ based on the type of contribution. The two types of issues discussed in `Report an error`_ and `Suggest a new feature`_ can be a good starting point for the discussion.

The scikit-bio developers will respond to let you know if we agree with the addition or change. It's very important that you go through this step to avoid spending time working on a feature that we are not interested in including in scikit-bio.

Take existing tasks
^^^^^^^^^^^^^^^^^^^

Alternatively, if you're looking to contribute where help is needed, you can explore the following types of issues:

- **Quick fix**: Some of our issues are labeled as ``quick fix``. Working on :repo:`these issues <issues?q=is%3Aopen+is%3Aissue+label%3A%22quick+fix%22>` is a good way to get started with contributing to scikit-bio. These are usually small bugs or documentation errors that will only require one or a few lines of code to fix. Getting started by working on one of these issues will allow you to familiarize yourself with our development process before committing to a large amount of work (e.g., adding a new feature to scikit-bio). Please post a comment on the issue if you're interested in working on one of these "quick fixes".

- **On deck**: Once you are more comfortable with our development process, you can check out the ``on deck`` :repo:`label <labels/on%20deck>` on our issue tracker. These issues represent what our current focus is in the project. As such, they are probably the best place to start if you are looking to join the conversation and contribute code.


Set up a workspace
------------------

To start contributing code to scikit-bio, you'll need to prepare a local development environment. This section guides you through the process step-by-step.

1. `Fork <https://help.github.com/articles/fork-a-repo>`_ the scikit-bio repository on the GitHub website. This will create a copy of the repository under your account, and you can access it using the URL: ``https://github.com/urname/scikit-bio/`` (``urname`` is your GitHub account).

2. `Clone <https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository>`_ the forked repository to your local computer. Specifically, in your forked repository, you may click the :bdg-success:`<> Code` button, make sure that you are under the "Local" - "SSH" tab, and copy the URL to the clipboard. It should typically be: ``git@github.com:urname/scikit-bio.git``.

.. note:: If the "SSH" tab is not available, it could mean that you have not set up an SSH key for your GitHub account. You may follow the instructions on `Connecting to GitHub with SSH <https://docs.github.com/en/authentication/connecting-to-github-with-ssh>`_ to set it up.

Then, open "Terminal" :octicon:`terminal` (or anything similar) on your local computer, navigate to a directory where you want to place the workspace, and execute::

    git clone git@github.com:urname/scikit-bio.git

.. note::

   If this is the first time you use ``git``, you may follow the `Set up Git <https://docs.github.com/en/get-started/getting-started-with-git/set-up-git>`_ guidelines to install Git and set your user name and email address.

   This tutorial assumes that you will use the classical ``git`` to create a local development environment. If you prefer other methods such as `GitHub CLI <https://cli.github.com/>`_ or `Codespaces <https://github.com/features/codespaces>`_, please follow the corresponding instructions.

This will create a directory ``scikit-bio`` containing all files in the repository. Enter the directory::

    cd scikit-bio

Add the official scikit-bio repo as the **upstream** of your fork::

    git remote add upstream https://github.com/scikit-bio/scikit-bio.git

3. Create a development environment with necessary dependencies. This is typically done using `Conda <https://conda.io/>`_ (or `Mamba <https://mamba.readthedocs.io/en/latest/index.html>`_, in which case the command ``conda`` in the following code should be replaced with ``mamba``).

.. note::

   If you do not have Conda (or Mamba) on your computer, you may install one of the distributions such as `Miniconda <https://conda.io/miniconda.html>`_, `Miniforge <https://github.com/conda-forge/miniforge>`_ or `Anaconda <https://www.anaconda.com/download/>`_.

   We recommend Conda over other approaches such as ``pip``, ``pyenv``, and ``virtualenv``. However, you are not blocked from using them in necessary situations.

Execute the following command (``skbio-dev`` can be any name you like)::

    conda create -n skbio-dev -c conda-forge --file ci/conda_requirements.txt --file ci/requirements.test.txt --file ci/requirements.lint.txt --file ci/requirements.doc.txt

When done, activate the environment::

    conda activate skbio-dev

.. note:: This may be slightly different depending on the operating system. Refer to the `Conda documentation <https://docs.conda.io/>`_ to find instructions for your OS.

4. Install scikit-bio from source code::

    pip install --no-deps -e .

This will install scikit-bio to the current conda environment. After this, you can use scikit-bio like a normal user (e.g., you can do ``import skbio`` in Python code). When you edit the code in this directory, the changes will be immediately reflected as you use the software.

5. Test the installation::

    make test

This will run all unit tests implemented in the scikit-bio codebase to check if the corresponding functionality works correctly. The output should only indicate passes and warnings, but no failures.

6. Activate pre-commit hooks::

    pre-commit install

This will enable a set of tools that will automatically execute every time you commit changes to ensure code quality.


Write code
----------

Before you start writing code, you may discuss with the scikit-bio team to make sure that your intended contribution is relevant (see `Before coding`_ above). Next, you may work through the following steps to start coding.

1. Update your main branch such that it has the latest version of all files. This is especially important if you cloned a long time ago::

    git checkout main
    git pull upstream main

Optionally, you may do the following to keep your forked repository's main branch up-to-date as well::

    git push origin main

2. Create a new `branch <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-branches>`__ that you will make changes in::

    git checkout -b mywork

``mywork`` is the name of the branch. What you name the branch is up to you, though we recommend including the issue number, if there is a relevant one (see above). For example, if you were addressing issue #42, you might name your branch ``issue-42``.

.. warning:: It is not recommended that you directly code in your ``main`` branch.

3. Run ``make test`` to confirm that the tests pass before you make any changes.

4. **Edit the code** using any code editor of your choice. Now is the time to bring your creativity to scikit-bio!

Scikit-bio's :doc:`coding guidelines <devdoc/code_guide>` provide more details on how to write high-quality code. It is recommended that you read this document carefully and apply the guidelines in your code.


Test code
---------

Testing your code is an essential step to ensure its quality and functionality. For scikit-bio, we emphasize the importance of comprehensive testing to guarantee that your contribution works as expected. You might find testing tedious (it often takes more time than implementing the algorithm!), but it is a valuable step that will significantly enhance the robustness of your code and help foster a culture of reliability and trust within our community. This section guides you through the testing process, providing you with the tools and knowledge to perform effective tests for scikit-bio.

Functional test
^^^^^^^^^^^^^^^

You will want to test your code like a *user* would do: import the function, execute it on some input data, and examine whether the results are correct. You may install additional software such as `Jupyter <https://jupyter.org/>`_ in the same conda environment to make the testing process convenient and pleasant. There is no need to reinstall the modified scikit-bio package in a separate environment in order to test. As soon as you edit the code, the changes will immediately reflect when you use the code.

For example, you wrote a function ``gc_content``, which calculates the fraction of ``G`` and ``C`` in a nucleotide sequence::

    def gc_content(seq):
        return sum(1 for x in seq if x in "GC") / len(seq)

You added this function to the code file ``skbio/sequence/nucl.py``. It will be available for use in any Python code launched in the same conda environment::

    >>> from skbio.sequence.nucl import gc_content

Now test the function on some data. For example, you would expect that ``gc_content("ACGT")`` and ``gc_content("GGATCCGC")`` return 0.5 and 0.75, respectively. Is that the case?

We highly recommend that you use real-world biological data in addition to small, dummy data for testing. This will let you evaluate the robustness and scalability of your code in real-world applications. For example, DNA sequences retrieved from public databases may contain lowercase letters, and you will find that the ``gc_content`` function cannot handle them properly. For another example, a FASTQ file with ten million sequences (which is common) may take a function impractically long to process, in which case you should consider optimization.
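
One possible way to handle lowercase input is to normalize the letter case before counting. The snippet below is only a minimal sketch of that idea, not the implementation used in scikit-bio. The choice to uppercase the whole sequence is an assumption made for illustration.

```python
def gc_content(seq):
    """Return the fraction of G and C in ``seq``, ignoring letter case."""
    # Normalize to uppercase so that "g" and "c" are counted as well.
    # (Assumption for illustration: case carries no meaning in the input.)
    seq = seq.upper()
    return sum(1 for x in seq if x in "GC") / len(seq)


# Uppercase, lowercase and mixed-case input now give the same fraction.
upper_result = gc_content("ACGT")
lower_result = gc_content("acgt")
mixed_result = gc_content("GGatccGC")
```

Whether case should be ignored or lowercase letters should instead be rejected is a design decision that depends on what lowercase means in your data (for example, soft-masked regions).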

Unit test
^^^^^^^^^

`Unit testing <https://en.wikipedia.org/wiki/Unit_testing>`_ involves testing the smallest units of your code, such as classes, functions, and methods, to ensure they function correctly in isolation. It is a fundamental best practice in software engineering, but is often overlooked by beginners. Unit testing is made easier by writing test code alongside the algorithm code. Both types of code are integrated into the scikit-bio codebase. This test code is then regularly executed whenever changes are made to ensure that the intended behavior remains consistent over time.

For example, the test code for ``gc_content`` may live in ``skbio/sequence/tests/test_nucl.py``, under class ``TestNucl``, as a method ``test_gc_content``. It may read like::

    def test_gc_content(self):
        self.assertEqual(gc_content("ACGT"), 0.5)
        self.assertEqual(gc_content("GGATCCGC"), 0.75)
        ...

You can run this test with::

    python skbio/sequence/tests/test_nucl.py TestNucl.test_gc_content

The screen output will tell you whether the test passed, and if not, what went wrong. This information will help you debug your code.

Ideally, every line of the code should be covered by unit test(s). For example, if your function has an ``if`` statement, both ``True`` and ``False`` situations should be tested.

It is a good practice to test all types of cases you can think of, including normal cases and `edge cases <https://en.wikipedia.org/wiki/Edge_case>`_. For example, an empty sequence (``""``) will cause the ``gc_content`` function to crash with a ``ZeroDivisionError``, because its length, zero, is used as the denominator. Testing edge cases like this will help you identify limitations of your code and decide whether you should implement special handling to avoid problems.
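
As a rough sketch of what such tests might look like, the snippet below covers a normal case and the empty-sequence edge case. It redefines ``gc_content`` locally so that it runs on its own, and the class name ``TestGCContent`` is hypothetical. It also assumes the current behavior (raising ``ZeroDivisionError``) is what you want to pin down; you may prefer different handling.

```python
import unittest


def gc_content(seq):
    # Same one-liner as in the example above, repeated so that this
    # sketch is self-contained.
    return sum(1 for x in seq if x in "GC") / len(seq)


class TestGCContent(unittest.TestCase):  # hypothetical test class name
    def test_normal(self):
        self.assertEqual(gc_content("ACGT"), 0.5)
        self.assertEqual(gc_content("GGATCCGC"), 0.75)

    def test_empty(self):
        # An empty sequence has length zero, so the division fails.
        with self.assertRaises(ZeroDivisionError):
            gc_content("")


# Run the two tests without exiting the interpreter.
suite = unittest.TestLoader().loadTestsFromTestCase(TestGCContent)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If you later add special handling for empty input, ``test_empty`` is the place to update the expected behavior.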

You should also test whether the changed code fits into scikit-bio without causing problems in the other parts of the codebase. There is a convenient command to run all unit tests implemented in scikit-bio::

    make test

Alternatively, you may run all unit tests in a Python session (including Jupyter)::

    >>> from skbio.test import pytestrunner
    >>> pytestrunner()

Code coverage
^^^^^^^^^^^^^

`Code coverage <https://en.wikipedia.org/wiki/Code_coverage>`_ refers to the percentage of source code lines covered by unit tests. It is an assessment of the quality of a software project. In scikit-bio, code coverage can be calculated using the following command::

    coverage run -m skbio.test && coverage report

This will report the coverage of each code file and the entire codebase. If the coverage decreased for the file you edited, you may have missed some anticipated unit tests. You can create a detailed HTML report with::

    coverage html

Then open ``htmlcov/index.html`` in a web browser, navigate to the page for the relevant code file, and check which lines of your code are not covered by unit tests. Work on them to bring back coverage.

Please read the :ref:`devdoc/code_guide:How should I test my code?` section of the coding guidelines to learn more about unit testing.


Document code
-------------

`Documentation <https://en.wikipedia.org/wiki/Software_documentation>`_ is a vital part of software engineering, especially for projects like scikit-bio, which involve many contributors and are designed to endure over time. Documentation helps everyone -- users and developers -- get on the same page. It also helps you, as even seasoned developers can lose track of their own coding logic over time. Also remember that scikit-bio brings together people from various fields, and nobody is expected to have the same level of understanding across all disciplines. Therefore, documenting your code with the broader audience in mind is important. This section will cover the basics of documenting your code in a manner that benefits the scikit-bio community at large.

Scikit-bio's :doc:`documentation guidelines <devdoc/doc_guide>` provide more details on how to write effective documentation.

Comments
^^^^^^^^

`Comments <https://en.wikipedia.org/wiki/Comment_(computer_programming)>`_ in the source code explain the rationale to fellow developers. Please make comments frequently in your code, especially where the code itself is not that intuitive. For example::

    # Perform eigendecomposition on the covariance matrix of data.
    # w and v are eigenvalues and eigenvectors, respectively.
    # eigh is used in favor of eig to avoid returning complex numbers due to
    # matrix asymmetry caused by floating point errors.
    # Discussed in: https://stackoverflow.com/questions/12345678/
    w, v = np.linalg.eigh(np.cov(X))

Please read the :ref:`devdoc/code_guide:How should I write comments?` section of the coding guidelines to learn more about writing comments.

Docstrings
^^^^^^^^^^

`Docstrings <https://en.wikipedia.org/wiki/Docstring>`_ are structured text blocks associated with each unit of the code that detail the usage of the code. Docstrings will be rendered to the software documentation. That is, *users* (not just developers) will read them. Therefore, docstrings are critical if you want your code to be used, and in the correct way.

Below is a very simple example for ``gc_content``. The lines between the triple double quotes (``"""``) are the docstring::

    def gc_content(seq):
        """Calculate the GC content of a nucleotide sequence.

        Parameters
        ----------
        seq : str
            Input sequence.

        Returns
        -------
        float
            Fraction of G and C.

        """
        return sum(1 for x in seq if x in "GC") / len(seq)

As shown, the docstring explains the purpose, the parameter(s), and the return value(s) of the function. In more complicated cases, the docstring should also include example usage, potential errors, related functions, mathematics behind the algorithm, references to webpages or literature, etc. Every public-facing component should have a docstring.
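
To illustrate how potential errors can be documented, here is a sketch that extends the docstring with a ``Raises`` section. The check for an empty sequence and the ``ValueError`` it raises are hypothetical additions made for this illustration. They are not part of the original example.

```python
def gc_content(seq):
    """Calculate the GC content of a nucleotide sequence.

    Parameters
    ----------
    seq : str
        Input sequence.

    Returns
    -------
    float
        Fraction of G and C.

    Raises
    ------
    ValueError
        If the sequence is empty.

    """
    # Hypothetical special handling, added only to give the "Raises"
    # section something to describe.
    if not seq:
        raise ValueError("Sequence must not be empty.")
    return sum(1 for x in seq if x in "GC") / len(seq)
```

Documenting the error lets users know what to catch without reading the source code.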

Please read the :ref:`devdoc/doc_guide:Docstring style` section of the documentation guidelines to learn more about writing docstrings.

Doctests
^^^^^^^^

You may consider adding **example usages** of your code to its docstring. For example::

    def gc_content(seq):
        """Calculate the GC content of a nucleotide sequence.
        ...

        Examples
        --------
        >>> from skbio.sequence.nucl import gc_content
        >>> gc_content("ACGT")
        0.5

        """
        ...

The example code and its output must match. This is ensured by `doctest <https://docs.python.org/3/library/doctest.html>`_. When you run ``make test`` (see above), doctests are automatically executed as part of the test suite. You may fix any issues according to the screen output.
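
If you want to check the examples of a single function without running the whole test suite, the standard ``doctest`` module can be driven directly. The following is a minimal, self-contained sketch. It defines ``gc_content`` locally rather than importing it from scikit-bio, and it does not necessarily reflect how ``make test`` invokes doctests.

```python
import doctest


def gc_content(seq):
    """Calculate the GC content of a nucleotide sequence.

    Examples
    --------
    >>> gc_content("ACGT")
    0.5

    """
    return sum(1 for x in seq if x in "GC") / len(seq)


# Collect the examples in the docstring and run them. The function is
# passed in as a global so that the example code can call it.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(gc_content, "gc_content", globs={"gc_content": gc_content}):
    runner.run(test)

# results.failed counts mismatches; results.attempted counts examples run.
results = runner.summarize(verbose=False)
```

A non-zero ``results.failed`` means that the output shown in the docstring no longer matches what the code returns.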

HTML rendering
^^^^^^^^^^^^^^

After completing docstrings, you will want to check how they look when rendered to the documentation webpages. You may build the entire HTML documentation package locally with::

    make doc

The built documentation will be at ``doc/build/html/index.html``, and can be examined using your web browser. If errors arise during the building process, or the rendered webpages don't look as anticipated, you should address the issues accordingly.

Changelog
^^^^^^^^^

Please mention your changes in :repo:`CHANGELOG.md <blob/main/CHANGELOG.md>`. This file informs scikit-bio *users* of changes made in each release, so be sure to describe your changes with this audience in mind. It is especially important to note API additions and changes, particularly if they are backward-incompatible, as well as bug fixes. Be sure to make your updates under the section designated for the latest development version of scikit-bio (this will be at the top of the file). Describe your changes in detail under the most appropriate section heading(s). For example, if your pull request fixes a bug, describe the bug fix under the "Bug fixes" section of the changelog. Please also include a link to the issue(s) addressed by your changes.


Style code
----------

`Code style <https://en.wikipedia.org/wiki/Programming_style>`_ is a set of rules for formatting and structuring code in a particular software project. Although violating these rules won't cause errors in executing the code, adhering to them ensures that the codebase remains uniform and professional, and facilitates team collaboration.

Scikit-bio utilizes the `Ruff <https://docs.astral.sh/ruff/>`_ program for autoformatting and linting to ensure code consistency and quality. The rules are specified in :repo:`pyproject.toml <blob/main/pyproject.toml>`. Basically, we largely adopted the `Black <https://black.readthedocs.io/>`_ code style.

When you `set up the development environment <#set-up-a-workspace>`_, Ruff was already installed and integrated into a `pre-commit hook <https://github.com/astral-sh/ruff-pre-commit>`__. This means that Ruff will automatically format and lint your code every time you commit changes (see `Submit code`_ below). Therefore *you are not required to take any explicit action*. However, you can still manually run Ruff to check and fix issues in specific code files you have worked on::

    ruff check --fix mycode.py

If Ruff identifies any errors that cannot be automatically fixed, you will need to manually fix them based on Ruff's feedback. When done, let Ruff reformat your code::

    ruff format mycode.py

You will notice the improvement in your code's appearance before and after using Ruff. While it is always beneficial to strive for professional-looking code from the start, the necessity for perfection has lessened with the advent of tools like Ruff.


Submit code
-----------

Having completed, tested, and documented your code, you may now believe it deserves a place in scikit-bio to benefit the community. This section outlines the steps to submit your code to the official scikit-bio repository.

1. Add any new code file(s) you created to the git repository::

    git add path/to/mycode.py

Alternatively, if you have multiple new files, you can add them all at once::

    git add .

2. `Commit <https://github.com/git-guides/git-commit>`__ the changes (this is like "saving" your code in the current branch)::

    git commit -am "describe what I did"

Here, "describe what I did" is a placeholder for a *commit message*. You should write a meaningful commit message to describe what you did. We recommend following `NumPy's commit message guidelines <https://numpy.org/doc/stable/dev/development_workflow.html#writing-the-commit-message>`_, including the usage of commit tags (i.e., starting commit messages with acronyms such as ``ENH``, ``BUG``, etc.).

The ``commit`` command will trigger the `pre-commit hook <https://pre-commit.com/>`__, which automatically runs Ruff to check and fix any code style problems (see `Style code`_ above). If there are any errors flagged by Ruff, you will need to resolve them and commit again.

3. Merge the latest code from the official scikit-bio repository to your local branch::

    git fetch upstream
    git merge upstream/main

This step is important as it ensures that your code doesn't conflict with any recent updates in the official repository. Conflicts can arise because other developers are working on the project at the same time.

If there are conflicts, you will need to `resolve the conflicts <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-on-github>`__ by editing the affected files. When done, run ``git add`` on those files, then run ``git commit`` with a relevant commit message (such as "resolved merge conflicts").
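
To practice this before it happens in real work, the following sketch reproduces a conflict in a throwaway repository and then resolves it. Everything in it is an assumption made for illustration: the directory created by ``mktemp``, the file ``notes.txt``, the branch ``other``, the dummy identity, and the commit messages. Nothing touches your scikit-bio clone::

    # Create a throwaway repository with one commit
    cd "$(mktemp -d)"
    git init -q
    git config user.name "Example Contributor"    # dummy identity for the sandbox
    git config user.email "contributor@example.com"
    echo "original line" > notes.txt
    git add notes.txt
    git commit -q -m "DOC: add notes"
    start=$(git rev-parse --abbrev-ref HEAD)      # name of the starting branch

    # Change the same line differently on two branches
    git checkout -q -b other
    echo "line from other" > notes.txt
    git commit -q -a -m "DOC: edit notes on other"
    git checkout -q "$start"
    echo "line from start" > notes.txt
    git commit -q -a -m "DOC: edit notes on start"

    # The merge cannot complete on its own and reports a conflict
    git merge other || echo "merge stopped with conflicts"

    # List the files that still contain conflicts
    git diff --name-only --diff-filter=U

    # Edit the file so that no conflict markers remain
    # (done here by overwriting it), then stage and commit
    echo "merged line" > notes.txt
    git add notes.txt
    git commit -q -m "resolved merge conflicts"

In real work, the ``echo`` that overwrites the file would be a manual edit that removes the ``<<<<<<<``, ``=======`` and ``>>>>>>>`` markers and keeps the intended lines from both sides.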

4. Run ``make test`` one last time to ensure that your changes don't cause anything to break.

5. Once the tests pass, you should push your changes to your forked repository on GitHub using::

    git push origin mywork

6. Navigate to the GitHub website, and create a `pull request <https://help.github.com/articles/using-pull-requests>`__ from your ``mywork`` branch to the ``main`` branch of the official scikit-bio repository. Usually, GitHub will prompt you to do so, and you may click the :bdg-success:`Compare & pull request` button to initiate this process. If not, you can click :bdg-success:`New pull request` under the ":octicon:`git-pull-request` Pull requests" tab.

7. Enter a meaningful title and a description of your code in the pull request. You may mention the issue that your code addresses in the description, such as "Resolves #42". This will `link your pull request to the issue <https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue>`__. You will also need to answer a few questions by typing an ``x`` in the checkboxes that apply to your code and leaving the others empty. These questions can be found in :repo:`PULL_REQUEST_TEMPLATE.md <blob/main/.github/PULL_REQUEST_TEMPLATE.md>`. When done, click :bdg-success:`Create pull request`.


Review code
-----------

Your `pull request will be reviewed <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/about-pull-request-reviews>`__ by one or more maintainers of scikit-bio. These reviews are intended to confirm a few points:

- Your code provides relevant changes or additions to scikit-bio.
- Your code adheres to our coding guidelines.
- Your code is sufficiently well-tested.
- Your code is sufficiently well-documented.

This process is designed to ensure the quality of scikit-bio and can be a very useful experience for new developers.

Typically, the reviewer will launch some automatic checks on your code. These checks are defined in :repo:`ci.yml <blob/main/.github/workflows/ci.yml>`. They involve:

- The full unit test suite and doctests run without errors in all supported software and hardware environments.
- C code can be correctly compiled.
- Cython code is correctly generated.
- Documentation can be built.
- Code coverage is maintained or improved.
- Code passes linting.

The checks may take anywhere from several minutes to a few dozen minutes. If any check fails, you may click "Details" next to it to view the error messages, and fix the issues accordingly.

Meanwhile, the reviewer will comment on your code inline and/or below. They may request changes (which is very common). Please work with the reviewer to improve your code.

You should revise your code in your local branch. When completed, commit and push your code again (steps 1-5 of `Submit code`_). This will automatically update your pull request and restart the checks. *Don't issue a new pull request*.

.. note:: Particularly for big changes, you may want feedback on your code as you work rather than only at the end. To get it, request help in the issue that you created, and one of the scikit-bio maintainers will work with you to perform regular code reviews. This can greatly reduce development time. We highly recommend that new developers take advantage of this rather than submitting a pull request with a massive amount of code. Large pull requests are harder to review, and they can lead to frustration when the developer thinks they are done but the reviewer requests extensive changes.

Please read :doc:`devdoc/review` for more details on how pull requests should be reviewed in the scikit-bio project.

After your code has been improved and the reviewer has approved it, they will `merge your pull request <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/incorporating-changes-from-a-pull-request/merging-a-pull-request>`__ into the ``main`` branch of the official scikit-bio repository. This will be indicated by a note: :octicon:`git-merge` "Pull request successfully merged and closed".

Congratulations! Your code is now an integral part of scikit-bio, and will benefit the broader community. You have successfully completed your contribution, and we extend our appreciation to you! 🎉🎉🎉