File: inference.tex

package info (click to toggle)
checker-framework-java 3.2.0%2Bds-3
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 23,104 kB
sloc: java: 145,916; xml: 839; sh: 518; makefile: 404; perl: 26
file content (344 lines) | stat: -rw-r--r-- 14,390 bytes
parent folder | download | duplicates (2)
\htmlhr
\chapterAndLabel{Type inference}{type-inference}

There are two different tasks that are commonly called ``type inference'':
\begin{enumerate}
\item
  Type inference during type-checking:
  The type-checker fills in an appropriate type where the programmer didn't
  write one, but does not change the source code.
  See Section~\ref{type-inference-refinement}.
\item
  Type inference to annotate a program:
  As a separate step before type-checking, a type inference tool
  inserts type qualifiers into the source code.
  See Section~\ref{type-inference-to-annotate}.
\end{enumerate}

Each variety has its own advantages, discussed below.
Advantages of \emph{all} varieties of type inference include:
\begin{itemize}
\item
  Less work for the programmer.
\item
  The tool chooses the most general type, whereas a programmer might
  accidentally write a more-specific, less-generally-useful annotation.
\end{itemize}


\sectionAndLabel{Local type inference during type-checking}{type-inference-refinement}

During type-checking, if certain variables have no programmer-written type qualifier, the
type-checker determines whether there is some type qualifier that would
permit the program to type-check.  If so, the type-checker uses that type
qualifier, but does not change the source code.  Each time the
type-checker runs, it re-infers the type qualifier for that variable.  If
no type qualifier exists that permits the program to type-check, the
type-checker issues a warning.

Local type inference is built into the Checker Framework.
Every checker automatically uses it.  As a result, a programmer typically
does not have to write any qualifiers inside the body of a method
(except occasionally on type arguments).
However, it primarily
works within a method, not across method boundaries.
The source code must already contain annotations for method
signatures (arguments and return values) and fields.

Advantages of this variety of type inference include:
\begin{itemize}
\item
  If the type qualifier is obvious to the programmer, then omitting it
  can reduce annotation clutter in the program.
\item
  If the code changes, then there is no old annotation that
  might need to be updated.
\item
  Within-method type inference occurs automatically.
  The programmer doesn't have to do anything to take advantage of it.
\end{itemize}

For more details about local type inference during type-checking, also
known as ``flow-sensitive type refinement'', see
Section~\ref{type-refinement}.


\sectionAndLabel{Type inference to annotate a program}{type-inference-to-annotate}

As a separate step before type-checking, a type inference tool takes the
program as input, and outputs a set of type qualifiers that would
make the program type-check.  (If no such set exists, for example because
the program is not type-correct, then the inference tool does its best but
makes no guarantees.)
These qualifiers are inserted into the source code or the
class file.  They can be viewed and adjusted by the programmer, and can
be used by tools such as the type-checker.

Advantages of this variety of type inference include:
\begin{itemize}
\item
  The inference may be more precise by taking account of the entire program
  rather than just reasoning one method at a time.
\item
  The program source code contains documentation in the form of type
  qualifiers, which can aid programmer understanding and may make
  type-checking warnings more comprehensible.
\end{itemize}


\subsectionAndLabel{Type inference tools}{type-inference-tools}

This section lists tools that take a program and output a set of
annotations for it.
It first lists tools that work only for a single type system (but may do a
more accurate job for that type system)
then lists general tools that work for any type system.


\subsubsectionAndLabel{Type inference for specific type systems}{type-inference-tools-specialized}

Section~\ref{nullness-inference} lists several tools that infer
annotations for the Nullness Checker.

If you run the Checker Framework with the \<-AsuggestPureMethods>
command-line option, it will suggest methods that can be marked as
\<@SideEffectFree>, \<@Deterministic>, or \<@Pure>; see
Section~\ref{type-refinement-purity}.


\subsubsectionAndLabel{Type inference for any type system}{type-inference-tools-general}

By supplying the \<-Ainfer> command-line option,
any type-checker can infer annotations. See Section~\ref{whole-program-inference}.

\href{https://github.com/reprogrammer/cascade/}{Cascade}~\cite{VakilianPEJ2014}
is an Eclipse plugin that implements interactive type qualifier inference.
Cascade is interactive rather than fully-automated:  it makes it easier for
a developer to insert annotations.
Cascade starts with an unannotated program and runs a type-checker.  For each
warning it suggests multiple fixes, the developer chooses a fix, and
Cascade applies it.  Cascade works with any checker built on the Checker
Framework.
You can find installation instructions and a video tutorial at \url{https://github.com/reprogrammer/cascade}.


\sectionAndLabel{Whole-program inference}{whole-program-inference}

Whole-program inference is an interprocedural inference that
infers types for fields, method parameters, and method return types that do not
have a user-written qualifier (for the given type system).
The inferred type qualifiers are inserted into
your program.
The inferred type is the most specific type that is compatible with all the
uses in the program.  For example, the inferred type for a field is the
least upper bound of the types of all the expressions that are assigned
into the field.

\begin{sloppypar}
To use whole-program inference, make sure that
\<insert-annotations-to-source>, from the Annotation File Utilities project,
is on your path (for example, its directory is in the \<\$PATH> environment variable).
Then, run the script \<checker-framework/checker/bin/infer-and-annotate.sh>.
Its command-line arguments are:
\end{sloppypar}

\begin{enumerate}
\item Optional: Command-line arguments to
  \href{https://checkerframework.org/annotation-file-utilities/#insert-annotations-to-source}{\<insert-annotations-to-source>}.
\item Processor's name.
\item Target program's classpath.  This argument is required; pass "" if it
  is empty.
\item Optional: Extra processor arguments which will be passed to the checker, if any.
  You may supply any number of such arguments, or none.  Each such argument
  must start with a hyphen.
\item Optional: Paths to \<.jaif> files used as input in the inference
    process.
\item Paths to \<.java> files in the program.
\end{enumerate}

% TODO: Change the example project that is being annotated, since plume-lib is now deprecated.
For example, to add annotations to the \<plume-lib> project:
\begin{Verbatim}
git clone https://github.com/mernst/plume-lib.git
cd plume-lib
make jar
$CHECKERFRAMEWORK/checker/bin/infer-and-annotate.sh \
    "LockChecker,NullnessChecker" java/plume.jar:java/lib/junit-4.12.jar:$JAVA_HOME/lib/tools.jar \
    `find java/src/plume/ -name "*.java"`
# View the results
git diff
\end{Verbatim}

You may need to wait a few minutes for the command to complete.
You can ignore warnings that the command outputs while it tries different
annotations in your code.

It is recommended that you run \<infer-and-annotate.sh> on a copy of your
code, so that you can see what changes it made and so that it does not
change your only copy.  One way to do this is to work in a clone of your
repository that has no uncommitted changes.

Whole-program inference differs from type refinement (Section~\ref{type-refinement})
in three ways.  First, type refinement only works within a method body.
Second, type refinement always
refines the current type, regardless of whether the value already has an
annotation in the source code.
Third, whole-program inference can infer a subtype
or a supertype of the default type, by contrast with type refinement which
always refines the current type to a subtype.


\subsectionAndLabel{Whole-program inference ignores some code}{whole-program-inference-ignores-some-code}

Whole-program inference ignores code within the scope of a
\<@SuppressWarnings> annotation with an appropriate key
(Section~\ref{suppresswarnings-annotation}).  In particular, uses within
the scope do not contribute to the inferred type, and declarations within
the scope are not changed.  You should remove \<@SuppressWarnings> annotations
from the class declaration of any class you wish to infer types for.

As noted below, whole-program inference requires invocations of your code, or
assignments to your methods, to generalize from.  If a field is set via
reflection (such as via injection), then whole-program inference would produce
an inaccurate result.  There are two ways to make whole-program inference
ignore such a field.
%
(1)
You probably have an annotation such as
\javaeejavadocanno{javax/inject/Inject.html}{Inject}
or
\href{https://types.cs.washington.edu/plume-lib/api/plume/Option.html}{\<@Option>}
that indicates such fields.  Meta-annotate the declaration of the \<Inject>
or \<Option> annotation with
\refqualclass{framework/qual}{IgnoreInWholeProgramInference}.
%
(2)
Annotate the field to be ignored with
\refqualclass{framework/qual}{IgnoreInWholeProgramInference}.

Whole-program inference, for a type-checker other than the Nullness Checker,
ignores (pseudo-)assignments where the right-hand-side is the \<null> literal.


\subsectionAndLabel{Manually checking whole-program inference results}{whole-program-inference-manual-checking}

As with any type inference tool, it is a good idea to manually examine the
results.

\begin{itemize}
\item
Whole-program inference can produce undesired results when your code has
non-representative or erroneous calls to a particular method or assignments to a
particular field, as explained below.
This is especially noticeable when the arguments or assignments are literals.

\item
If an annotation is inferred for a \emph{use} of type variables;
it might be more appropriate for you to move those annotations
to the corresponding upper bounds of the type variable \emph{declaration}.

\end{itemize}


\subsubsectionAndLabel{Poor whole-program inference results due to non-representative uses}{whole-program-inference-non-representative-uses}

Whole-program inference determines a method parameter's type
annotation based on what arguments are passed to the method, but not on how the
parameter is used within the method body.

\begin{itemize}
\item
If the program contains erroneous calls, the
inferred annotations may reflect those errors.

Suppose you intend method \<m2> to be called with
non-null arguments, but your program contains an error and one of the calls
to \<m2> passes \<null> as the argument.  Then the tool will infer that
\<m2>'s parameter has \<@Nullable> type.
You should correct the bug and re-run inference.

\item
If the program uses (say) a method in a limited way, then the inferred
annotations will be legal for the program as
currently written but may not be as general as possible and may not
accommodate future program changes.

Here are some examples:

\begin{itemize}
\item
Suppose that your program currently calls
method \<m1> only with non-null
arguments.  The tool will infer that \<m1>'s parameter has
\<@NonNull> type.  If you had intended the method to be able to
take \<null> as an argument and you later add such a call, the type-checker
will issue a warning because the automatically-inserted \<@NonNull>
annotation is inconsistent with the new call.

\item
If your program (or test suite) passes only \<null> as an argument, the
inferred type will be the bottom type, such as \<@GuardedByBottom>.

\item
It is common for whole-program inference to infer
\<@Interned> and \<@Regex> annotations on String variables for which the
analyzed code only uses a constant string.

\end{itemize}

In each case, you can correct the inferred results manually, or you can
add tests that pass additional values then re-run inference.

\end{itemize}


\subsectionAndLabel{How whole-program inference works}{how-whole-program-inference-works}

This section explains how the \<infer-and-annotate.sh> script works.  If you
merely want to run the script and you are not encountering trouble, you can
skip this section.

The \<infer-and-annotate.sh> script repeatedly runs the following steps:
\begin{enumerate}
\item Run the checker with the \<-Ainfer> command-line option to infer
  types for fields and method signatures.  The output of this step
  is a \<.jaif> file that records the inferred types.
\item Insert the inferred annotations in the program.  This step uses the
  Annotation File Utilities
  (\myurl{https://checkerframework.org/annotation-file-utilities/}).
\end{enumerate}

The process halts when there are no more changes to the inference results,
that is, the \<.jaif> file is unchanged between two runs.  On each
iteration through the process, there may be new annotations in the \<.jaif>
file, and some type-checking errors may be eliminated (though others might
be introduced).

When the type-checker is run on the program with the final annotations
inserted, there might still be errors.  This may be because the tool did
not infer enough annotations, or because your program cannot typecheck
(either because contains a defect, or because it contains subtle code that
is beyond the capabilities of the type system).
However, each of the inferred annotations is sound, and this reduces your
manual effort in annotating the program.

The iterative process is required because type-checking is modular:  it
processes each class and each method only once, independently.  Modularity
enables you to run type-checking on only part of your program, and it makes
type-checking fast.  However, it has some disadvantages:
\begin{itemize}
\item
  The first run of the type-checker cannot take advantage of whole-program
  inference results because whole-program inference is only complete at the
  end of type-checking, and modular type-checking does not revisit any
  already-processed classes.
\item
  Revisiting an
  already-processed class may result in a better estimate.
\end{itemize}



%%  LocalWords:  Ainfer java jaif plugin classpath m2 m1 multi javax
%%  LocalWords:  AsuggestPureMethods CHECKERFRAMEWORK GuardedByBottom
%%  LocalWords:  IgnoreInWholeProgramInference