File: coding_guidelines.rst

package info (click to toggle)
mistral 21.0.0-6
links: PTS, VCS
area: main
in suites: forky, sid
size: 16,140 kB
sloc: python: 54,052; sh: 701; makefile: 58
file content (543 lines) | stat: -rw-r--r-- 20,734 bytes
parent folder | download | duplicates (4)
Mistral Coding Guidelines
=========================

Why learn more coding guidelines?
---------------------------------

This document contains the description of the guidelines used for writing
code on the Mistral project. Some of the guidelines follow from the
nature of Python programming language (dynamic types, etc), some are the
result of the consensus achieved by the project contributors during many
contribution cycles. All contributors need to follow these guidelines
when contributing to Mistral. The purpose of having such well described
practices is improving communication between team members, reducing a number
of controversial situations related to how a certain code snippet should
be written, and letting contributors focus on unique engineering tasks
rather than low level decisions that any engineer makes many times a day,
like choosing a good name for a class or variable, or how to organise a loop,
whether they need to put a blank line before "if" or "return", in what cases
and so on. The document, when accepted and followed by team members, aims
to improve overall development speed and quality.

Note that the guidelines described below almost don't conflict with the
official PEP8 style guide (https://www.python.org/dev/peps/pep-0008/)
describing how any Python program should be formatted. Mistral guidelines
add more high level semantics on top of it. PEP8 still should be considered
a necessary base for those who write Python programs. Strictly speaking,
this guide is not exactly about style, it's about writing maintainable code.

Some of the concepts being discussed below may seem a bit too philosophical
but, in fact, they reflect our real experience of solving practical tasks.
According to it, some decisions work well and some don't given the Python
programming nature and the nature of the project.

The guidelines are based on the three main values:

- **Communication.** When writing code we always try to create it in a way
  that it's easy to read and understand. This is important because most of
  the time developers spend on reading existing code, not writing new.
- **Simplicity.** It makes sense to write code that uses minimal means that
  solves a task at hand.
- **Flexibility.** In most cases it's better to keep options open because
  in programming there's no such thing as "done". Pretty much all code gets
  modified during lifecycle of a program working in production. This is a
  very important difference form designing real buildings where we can't
  rebuild a basement after a building is completed and inhabited by people.

In the document, there is a number of code snippets, some are considered
incorrect and some are correct, with explanations why from the perspective
of these key values.

Further sections are devoted to certain aspects of writing readable code.
They will typically have a list of guidelines upfront so it's easy to find
them in the text and then more details on them, if needed.

Naming
------

Naming is always important in programing. Good naming should always serve
for better communication, i.e. a name of a code entity (module, variable,
class etc.) should convey a clear message to a code reader about the meaning
of the entity. For dynamic languages naming is even more important than for
languages with static types. Below it will be shown why.

Using Abbreviations
^^^^^^^^^^^^^^^^^^^

Guidelines
''''''''''

 - *Use well-known abbreviations to name variables, method arguments, constants*

Well-known abbreviations (shortcuts) used for names of constants, local
variables and method arguments help simplify code, since it becomes less
verbose, and simultaneously improve communication because all team members
understand them the same way. Below is a list of abbreviations used on the
Mistral project. The list is not final and is supposed to grow over time.

 - **app** - application
 - **arg** - argument
 - **cfg** - configuration
 - **cls** - class
 - **cmd** - command
 - **cnt** - count
 - **ctx** - context
 - **db** - database
 - **desc** - description
 - **dest** - destination
 - **def** - definition
 - **defs** - definitions
 - **env** - environment
 - **ex** - execution
 - **execs** - executions
 - **exc** - exception
 - **log** - logger
 - **ns** - namespace
 - **info** - information
 - **obj** - object
 - **prog** - program
 - **spec** - specification
 - **sync** - synchronous
 - **wf** - workflow
 - **wfs** - workflows

Note that when using a well-known abbreviation to name a constant we need to
use capital letters only (just as required by PEP8).

Local Variables and Method Arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Guidelines
''''''''''

 - *The name of a method argument should clearly communicate its semantics
   and purpose*
 - *The name of a method argument or a local variable should include a type
   name if it creates an ambiguity otherwise*
 - *The name of a local variable can be shortened from a full word (up to one
   letter) if the scope of the variable is very narrow (several lines) and if
   it doesn't create ambiguities*

Principles of choosing names in dynamic languages like Python are especially
important because oftentimes we don't clearly see of what type a certain
variable is. Whereas a type itself (no matter if it's primitive or represented
by a class) carries very important information about all objects of this type.
For example, if we see a variable in Java or C++ code it's not a problem to
find out of what type this variable is, we can easily find where the variable
is defined. Any modern IDE also allows to navigate to a type declaration (e.g.
if it's a class) just by clicking on it and see all required info about it.
In Python, it's often much more problematic to find a variable type. Injecting
a type name, at least a shortcut, into a variable name often helps mitigate
this fundamental issue and improve code readability, and hence communication.

Below are the code snippets that help illustrate this problem. Let's assume we
have the following Python classes:

 .. code-block:: python

    class TaskExecution(Execution):
        # Carries information about a running task.
        ...

    class TaskSpec(BaseSpec):
        # Defines a work logic of a particular task.
        ...

For what we want to illustrate, it's not important if they belong to the same
module or different.

Problematic Code
''''''''''''''''

 .. code-block:: python

    def calculate_next_tasks(self, task, context):
        result = set()

        for clause in task.get_on_clauses():
            task_names = self.evaluate_on_clause(clause, context)

            tasks = [self.get_task_by_name(t_n) for t_n in task_names]

            result.update(tasks)

        return result

Is this method easy to understand? Well, if this code is part of a small
program (e.g. a 200-300 lines script) then it may be ok. But when it's a
system with dozens of thousands lines of code then it has a number of issues.

The most important issue is that we don't know the type of the "task" method
argument. In order to find it we'll have to see where this method is called
and what is passed as an argument. Then, if an object is not created there
we'll have to go further and find callers of that method too, and so on until
we find where the object is actually instantiated. The longer the method call
chains are, the worse. So, jus by looking at this code we can't determine
whether the argument "task" is of type TaskSpec or TaskExecution. So for
example, we can't even say for sure if ``task.get_on_clauses()`` is a correct
instruction. If we have hundreds of places like this, it is very challenging
to read code and it is very easy to make a mistake when modifying it.

The other obvious issues are also related to naming:

#. It's not clear objects of what type are returned by the method
#. It's not clear objects of what type is returned by
   ``self.get_task_by_name(t_n)``

Better Code
'''''''''''

 .. code-block:: python

    def calculate_next_task_specs(self, task_spec, context):
        result = set()

        for clause in task_spec.get_on_clauses():
            task_names = self.evaluate_on_clause(clause, context)

            task_specs = [self.get_task_spec_by_name(t_n) for t_n in task_names]

            result.update(task_specs)

        return result

Now we won't be confused. Of course, we still have to remember we have those
two classes but at least we have a clear pointer to a type.

Functions and Methods
^^^^^^^^^^^^^^^^^^^^^

Guidelines
''''''''''

 - *The name of a method should clearly communicate its semantics and purpose*
 - *The name of a method should include a type name of a returned value when it
   creates an ambiguity otherwise*


For example, there are two classes like these again:

 .. code-block:: python

    class TaskExecution(Execution):
        # Carries information about a running task.
        ...

    class TaskSpec(BaseSpec):
        # Defines a work logic of a particular task.
        ...

And we need a method that implements some logic and returns an instance of
**TaskSpec**. How would we choose a good name for the method?

Problematic Code
''''''''''''''''

.. code-block:: python

    def calculate_parent_task(self, task_spec):
        ...

Looking at this method signature it's not clear what to expect as a returned
value since in Python method declarations don't contain a returned type value.
Although there's a temptation to use just "task" in the method name it leads
to the naming ambiguity: a returned value may be either of **TaskExecution**
or **TaskSpec** type. Strictly speaking, it may be any type since it's Python
but we can help a reader of this code a little bit and leave a hint about
what's going to be returned from the method when it's called.

At first glance, it may seem a bit weird why we pay attention to such things.
One may say that it's already totally clear that need to use full "task_spec"
in the method name instead of "task" just according to the name of the class.
Why anybody may want to use "task"? So, the reality is that during informal
communication within a team people tend to simplify/shorten words. That is,
when several team members are working on something related to task
specifications within a certain scope (module, class, etc.) they often move
to using a simplified terminology and instead of saying "task specification"
they start saying just "task". And this kind of habit often also sneaks into
code. And eventually it breaks communication between code author and code
reader.

Better Code
'''''''''''

.. code-block:: python

    def calculate_parent_task_spec(self, task_spec):
        ...

Although it's more verbose, it allows to mitigate the naming ambiguity for
a code reader.

Constants
^^^^^^^^^

Guidelines
''''''''''

 - *All hardcoded values (strings, numbers, etc.) must be defined as
   constants, i.e. global module variables*
 - *Constants must be defined in the beginning of a module*

Problematic Code
''''''''''''''''

.. code-block:: python

    def print_task_report_line(self, line, level=0):
        self.app.stdout.write(
            "task: %s%s\n" % (' ' * (level * 4), line)
        )

    ...

    def print_action_report_line(self, line, level=0):
        self.app.stdout.write(
            "action: %s%s\n" % (' ' * (level * 4), line)
        )

The problem of this code is that it uses hard-coded integer values at
different places. Why is it a problem?

 - It's hard to find hard-coded values while reading code
 - It's hard to understand whether same values mean semantically the same
   thing, or different
 - It's easy to make a mistake when changing a hard-coded value because we
   can change it at one place and miss other places

Better Code
'''''''''''

.. code-block:: python

    REPORT_ENTRY_INDENT = 4
    DEFAULT_ENTRY_LEVEL = 0

    def print_task_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
        self.app.stdout.write(
            "task: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
        )

    ...

    def print_action_report_line(self, line, level=DEFAULT_ENTRY_LEVEL):
        self.app.stdout.write(
            "action: %s%s\n" % (' ' * (level * REPORT_ENTRY_INDENT), line)
        )

Now the code clearly communicates to a reader that value 4 in these two
methods means exactly the same entity: an indent of any report entry. The
other constant similarly adds clarity about value 0. Previously, these two
integers were nameless. It's now also easy to change values, we just need
to set different values to the named constants.

Grouping and Blank Lines
------------------------

Guidelines
^^^^^^^^^^
 - *Use blank lines to split individual steps (units) of algorithms.*
 - *Always put blank lines before and after "if", "for", "try" and "with"
   blocks if they don't contain one another w/o anything else in between.*
 - *Always put a blank line before "return" unless it's the only instruction
   in en enclosing code block like a method or an "if" block.*
 - *Use blank lines to split logically not symmetric lines, e.g. less
   abstract and more abstract lines like variable assignment and a method
   call.*
 - *Put a blank line after any call to a superclass.*

Although for someone it may not seem important, blank lines consciously put
in code can improve readability. The general recommendation is to use blank
lines to separate different logical blocks. When writing code it is useful
to ask ourselves the question "what are the main steps of the algorithm I'm
implementing?". When answered, it gives understanding of how code can be
decomposed into sections. And in order to reflect that they are individual
steps of the algorithm, the corresponding code blocks can be split by blank
lines. Let's consider examples.

Problematic Code
^^^^^^^^^^^^^^^^

.. code-block:: python

    def update_task_state(self, state, state_info=None):
        old_task_state = self.task_ex.state
        if states.is_completed(self.task_ex.state):
            self.notify(old_task_state, self.task_ex.state)
            return
        if not states.is_valid_transition(self.task_ex.state, state):
            return
        child_states = [a_ex.state for a_ex in self.task_ex.executions]
        if state == states.RUNNING and states.PAUSED in child_states:
            return
        self.set_state(state, state_info)
        if states.is_completed(self.task_ex.state):
            self.register_workflow_completion_check()
        self.notify(old_task_state, self.task_ex.state)

Is this method easy to read? The method does a lot of things: it invokes
other methods, checks conditions, calculates values and sets them to variables.
Even more importantly, all of that are different computational steps of the
method. However, they appear in the code one by one without any gaps. So when
a human eye is reading this code there isn't any element in the code that would
tell us where the next step of the algorithm starts. And a blank line can
naturally play a role of such element.

Better Code
^^^^^^^^^^^

.. code-block:: python

    def update_task_state(self, state, state_info=None):
        old_task_state = self.task_ex.state

        if states.is_completed(self.task_ex.state):
            self.notify(old_task_state, self.task_ex.state)

            return

        if not states.is_valid_transition(self.task_ex.state, state):
            return

        child_states = [a_ex.state for a_ex in self.task_ex.executions]

        if state == states.RUNNING and states.PAUSED in child_states:
            return

        self.set_state(state, state_info)

        if states.is_completed(self.task_ex.state):
            self.register_workflow_completion_check()

        self.notify(old_task_state, self.task_ex.state)

Now when we read the method, we clearly see the individual steps:

 - Saving an old task state for further usage
 - Check if task is already completed and if it is, notify clients about it
   and return
 - Check if the state transition we're going to make is valid
 - Calculate state of the child entities
 - Do not proceed with updating the task state if there any running or paused
   child entities
 - Actually update the task state
 - If the task is now completed, schedule the corresponding workflow completion
   check
 - Notify clients about a task transition

Of course, when writing code like this it may be hard to format code this way
in the first place. But once we already have some version of code, we should
take care of people who will surely be reading it in future. After all, the
author may be reading it after some time. Again: programs are read much more
often than written. So we need to make sure our code tells a good story about
what it serves to.

As far as putting a blank line before "if", "try", "for" and "with", the
reasoning is pretty straightforward: all these code flow controls already
reflect separate computational steps because they do something that's
different from the previous command by nature. For example, "if" may route a
program in a different direction at runtime. So all these blocks should be
clearly visible to a reader. "return" is also an outstanding command since it
stops the execution of the current method and gives control back to the caller
of the method. So it also deserves to be well visible.

Using blank lines consciously can also make code more symmetric. That is,
if we don't mix up significantly different commands.

Problematic Code
^^^^^^^^^^^^^^^^

.. code-block:: python

    var1 = "some value"
    my_obj.method1()
    my_obj.method2()
    var2 = "another value"

    ... # The rest of the code snippet

What's wrong with this code? The thing is that we mixed up lines where we do
absolutely different things. Two of them just do set string values to the two
new variables whereas the other two send messages to a different object, i.e.
give it command to do something. In other words, the two lines here are more
abstract than the two others since they don't run any concrete calculation,
it is hidden by method calls against other object. So this code is not
symmetric, it doesn't group commands of similar nature together and it doesn't
separate them from each other.

Better Code
^^^^^^^^^^^

.. code-block:: python

    var1 = "some value"
    var2 = "another value"

    my_obj.method1()
    my_obj.method2()

    ... # The rest of the code snippet

This code fixes the mentioned issues and note that, again, a blank line
clearly communicates that a more abstract block starts and that this block
can and should be maintained separately.


Multi-line Method Calls
-----------------------

Guidelines
^^^^^^^^^^
 - *Long method calls that don't fit into 80 characters must be written down
   in a way that each argument is located on an individual line, as well as
   the closing round bracket.*

All code lines need to be not longer than 80 characters. Once in a while it's
required to break lines when we deal with long instructions. For example, when
we need to write a method call with lots of arguments or the names of the
arguments are long enough so the entire code instruction doesn't fit into
80 characters.

Problematic Code
^^^^^^^^^^^^^^^^

.. code-block:: python

    executor.run_action(self.action_ex.id, self.action_def.action_class,
        self.action_def.attributes or {}, self.action_ex.input,
        self.action_ex.runtime_context.get('safe_rerun', False),
        execution_context, target=target, timeout=timeout)

On many Python projects this way or breaking lines for long method calls
is considered right. However, when we need to read and understand this code
quickly we may experience the following issues:

 - Hard to see where one argument ends and where another one starts
 - Hard to check if the order of arguments is correct
 - If such method call declaration is followed by another code line (e.g.
   another method call) then it's hard to see where the method call
   declaration ends

Better Code
^^^^^^^^^^^

.. code-block:: python

    executor.run_action(
        self.action_ex.id,
        self.action_def.action_class,
        self.action_def.attributes or {},
        self.action_ex.input,
        self.action_ex.runtime_context.get('safe_rerun', False),
        execution_context,
        target=target,
        timeout=timeout
    )

Although the second version of the method call sacrifices conciseness to
some extent, it eliminates the issues mentioned above. Every method argument
is easily visible, it's easy to check the number of the arguments and their
order (e.g. to compare with the method signature) and it's easy to see where
the entire command ends because the ending round bracket on a separate line
communicates it clearly.