Python Guide
============

Project Discovery
-----------------

The filter and items introduced here are the foundation for analyzing Python projects: they discover the source files
that later filters analyze. The following configuration reads a *ProjectLocation* item from the *input* channel,
discovers all source files, and writes the item for each source file to the file *report.yml*.


.. code-block:: mortar

    channel input

    filter builtin.python.project:PythonProjectFilter project
    ignore_special_files: false

    report builtin.writer.filesystem:YAMLWriterReport out
    path: "report.yml"

    input -> project -> out

Start the pipeline with the following command:

.. code-block:: bash

    bauklotz <config_file_above> <python_project_dir> input


This should create a file called *report.yml* with rather unspectacular data like the following (cropped) list:


.. code-block:: yaml

    facts: {}
    item: bauklotz/__init__.py
    labels: []
    ---
    facts: {}
    item: bauklotz/console.py
    labels: []
    ---
    facts: {}
    item: bauklotz/configuration/__init__.py
    labels: []
    ---
    facts: {}
    item: bauklotz/configuration/dsl/__init__.py
    labels: []
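
As a rough mental model of what discovery produces, the following Python sketch walks a directory and collects the
relative paths of all ``.py`` files. It is not the implementation of *PythonProjectFilter*; the function name
``discover_python_files`` is hypothetical, and the sketch does not model the *ignore_special_files* option.

.. code-block:: python

    from pathlib import Path


    def discover_python_files(project_dir: str) -> list[str]:
        """Return the paths of all ``.py`` files below ``project_dir``.

        Paths are relative to ``project_dir``, use forward slashes, and are sorted.
        """
        root = Path(project_dir)
        return sorted(
            path.relative_to(root).as_posix()
            for path in root.rglob("*.py")  # recursive search through all subdirectories
            if path.is_file()
        )

Calling ``discover_python_files(".")`` inside a project directory returns a list of paths comparable to the *item*
entries in the report above.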


Statement count
---------------

Lines of code and the number of statements are good starting metrics for a project. The following configuration adds
this information as facts to the *PythonSourceFile* items.

.. code-block:: mortar

    channel input

    filter builtin.python.project:PythonProjectFilter project
    ignore_special_files: false

    filter builtin.python.file:PythonStatementCountFilter statement

    report builtin.writer.filesystem:YAMLWriterReport out
    path: "report.yml"


    input -> project -> statement -> out


This yields a list similar to the one above, but with some facts added:

* lines_of_code: Number of lines in the file that contain a Python statement.
* statement_count: Number of expressions found in the Python file.
* expression_loc_ratio: *statement_count* divided by *lines_of_code*. A crude metric for code density.

The file should look like this:

.. code-block:: yaml

    facts:
      expression_loc_ratio: 0.0
      lines_of_code: 0
      statement_count: 0
    item: bauklotz/__init__.py
    labels: []
    ---
    facts:
      expression_loc_ratio: 3.3
      lines_of_code: 27
      statement_count: 89
    item: bauklotz/console.py
    labels: []
    ---
    facts:
      expression_loc_ratio: 0.0
      lines_of_code: 0
      statement_count: 0
    item: bauklotz/configuration/__init__.py
    labels: []
    ---
    facts:
      expression_loc_ratio: 0.0
      lines_of_code: 0
      statement_count: 0
    item: bauklotz/configuration/dsl/__init__.py
    labels: []
    ---
    facts:
      expression_loc_ratio: 3.79
      lines_of_code: 103
      statement_count: 390
    item: bauklotz/configuration/dsl/tokenizer.py
    labels: []
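
The ratios in the sample report can be reproduced by hand. The sketch below assumes that the ratio is rounded to two
decimal places and that files without code get a ratio of ``0.0``; both assumptions are inferred from the sample
output, not taken from the filter's source. The function name simply mirrors the fact name.

.. code-block:: python

    def expression_loc_ratio(statement_count: int, lines_of_code: int) -> float:
        """Return statement_count / lines_of_code, rounded to two decimals.

        Assumption: empty files (zero lines of code) yield 0.0, as in the sample report.
        """
        if lines_of_code == 0:
            return 0.0
        return round(statement_count / lines_of_code, 2)


    print(expression_loc_ratio(89, 27))    # 3.3  (bauklotz/console.py)
    print(expression_loc_ratio(390, 103))  # 3.79 (bauklotz/configuration/dsl/tokenizer.py)
    print(expression_loc_ratio(0, 0))      # 0.0  (empty __init__.py files)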


Classes and Methods
-------------------

Statement counts and lines of code per file are a good start, but a view of how code is distributed across classes and
methods is more useful. To get it, classes and methods must first be extracted with the *PythonClassFilter* and
*PythonMethodFilter*. The resulting items can then be routed to the statement filter.


.. code-block:: mortar

    channel input

    filter builtin.python.project:PythonProjectFilter project
    ignore_special_files: false

    filter builtin.python.file:PythonStatementCountFilter statement

    filter builtin.python.definition:PythonClassFilter classes

    filter builtin.python.definition:PythonMethodFilter methods

    report builtin.writer.filesystem:YAMLWriterReport out
    path: "report.yml"


    input -> project -> classes -> statement -> out
    project -> statement
    classes -> methods -> statement


The report should be filled with entries like:


.. code-block:: yaml


    facts:
      abstract: false
      expression_loc_ratio: 2.67
      interface: false
      lines_of_code: 3
      methods:
      - __bool__
      statement_count: 8
      type_parameters: {}
    item:
      body: "class BooleanToken(Token):\n    def __bool__(self) -> bool:\n        return\
        \ self.content in (\"true\", \"yes\")"
      module: bauklotz.configuration.dsl.tokenizer
      name: BooleanToken
    labels: []
    ---
    facts:
      expression_loc_ratio: 8.5
      lines_of_code: 2
      statement_count: 17
    item:
      args:
      - _type: argument
        name: self
        type: None
      - _type: argument
        name: filter_uri
        type: str
      - _type: argument
        name: name
        type: str
      - _type: argument
        name: config
        type: JSONType
      body: "def build_filter(self, filter_uri: str, name: str, config: JSONType) -> Filter[Item,\
        \ Item, FilterConfig]:\n    return self.get_location(filter_uri).create_filter(name,\
        \ config)"
      class: bauklotz.configuration.catalog.Catalog
      generics: []
      name: build_filter
      returns: null
    labels: []
    ---
    facts:
      classes:
      - MortarParser
      expression_loc_ratio: 4.89
      lines_of_code: 129
      statement_count: 631
    item: bauklotz/configuration/dsl/parser.py
    labels: []
    ---


Note that once a file has passed through the *PythonClassFilter*, the names of its classes are attached as facts to the
corresponding *PythonSourceFile* item.
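
To illustrate the kind of information these filters extract, the following sketch uses Python's standard ``ast``
module to list the top-level classes of a source string together with their methods. It is only an approximation of
the idea, not the code used by *PythonClassFilter* or *PythonMethodFilter*; the helper name ``classes_and_methods`` is
hypothetical. The sample source is the *BooleanToken* class from the report above.

.. code-block:: python

    import ast

    SOURCE = '''
    class BooleanToken(Token):
        def __bool__(self) -> bool:
            return self.content in ("true", "yes")
    '''.replace("\n    ", "\n")  # strip the indentation added for this listing


    def classes_and_methods(source: str) -> dict[str, list[str]]:
        """Map each top-level class name to the names of its directly defined methods."""
        result: dict[str, list[str]] = {}
        for node in ast.parse(source).body:
            if isinstance(node, ast.ClassDef):
                result[node.name] = [
                    child.name
                    for child in node.body
                    if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
                ]
        return result


    print(classes_and_methods(SOURCE))  # {'BooleanToken': ['__bool__']}

The result corresponds to the *methods* fact of the class item, and its keys correspond to the *classes* fact of the
file item.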


Imports
-------

Dependencies are critical when searching for architectural problems. The *PythonImportFilter* extracts the
dependencies of a project, while the *DependencyNetworkFilter* turns them into a graph representation.


.. code-block:: mortar

    channel input

    filter builtin.python.project:PythonProjectFilter project
    ignore_special_files: false

    filter builtin.python.file:PythonStatementCountFilter statement

    filter builtin.python.definition:PythonClassFilter classes

    filter builtin.python.definition:PythonMethodFilter methods

    filter builtin.python.file:PythonImportFilter imports

    filter builtin.python.network:DependencyNetworkFilter importNet

    report builtin.writer.filesystem:YAMLWriterReport out
    path: "report.yml"

    report builtin.writer.filesystem:YAMLWriterReport importOut
    path: "imports.yml"

    report builtin.writer.graph:GraphWriterReport importGraph
    path: "imports.gml"


    input -> project -> classes -> statement -> out
    project -> statement
    project -> imports -> importOut
    imports -> importNet -> importGraph
    classes -> methods -> statement


The *imports.yml* file should contain entries like:

.. code-block:: yaml

    facts: {}
    item:
      dependant: bauklotz.console
      dependency_source: argparse
      imported_artifacts:
        ArgumentParser: ArgumentParser
    labels: []
    ---
    facts: {}
    item:
      dependant: bauklotz.console
      dependency_source: functools
      imported_artifacts:
        partial: partial
    labels: []
    ---
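
The shape of these entries can be approximated with the standard ``ast`` module. The sketch below handles only
absolute ``from ... import ...`` statements and assumes that *imported_artifacts* maps each imported name to its local
alias; this reading is inferred from the sample output. The helper name ``import_records`` is hypothetical and the
code is not the implementation of *PythonImportFilter*.

.. code-block:: python

    import ast


    def import_records(module_name: str, source: str) -> list[dict]:
        """Return one record per absolute ``from X import ...`` statement in ``source``."""
        records = []
        for node in ast.walk(ast.parse(source)):
            # Relative imports (level > 0) are skipped because resolving them
            # requires knowledge of the package layout.
            if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                records.append({
                    "dependant": module_name,
                    "dependency_source": node.module,
                    # Assumption: imported name -> local name (alias if given).
                    "imported_artifacts": {
                        alias.name: alias.asname or alias.name for alias in node.names
                    },
                })
        return records


    code = "from argparse import ArgumentParser\nfrom functools import partial\n"
    for record in import_records("bauklotz.console", code):
        print(record["dependency_source"], record["imported_artifacts"])
    # argparse {'ArgumentParser': 'ArgumentParser'}
    # functools {'partial': 'partial'}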


The graph output should reside in *importGraph_0.gml*. The counter is incremented for each graph written, so the same
writer can be used for different dependency graphs. The file can be rendered by programs like *Gephi*, but it can also
be opened in a text editor.
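
Conceptually, the dependency graph has one node per module and one edge per import relation. The following sketch
builds such a structure as a plain adjacency mapping from records shaped like the entries in *imports.yml*. It is an
illustration only: the function name ``dependency_graph`` is hypothetical, and the actual node and edge attributes
written by the *DependencyNetworkFilter* may differ.

.. code-block:: python

    from collections import defaultdict


    def dependency_graph(records: list[dict]) -> dict[str, list[str]]:
        """Map each importing module to the sorted list of modules it depends on."""
        graph: defaultdict[str, set[str]] = defaultdict(set)
        for record in records:
            graph[record["dependant"]].add(record["dependency_source"])
        return {module: sorted(targets) for module, targets in graph.items()}


    records = [
        {"dependant": "bauklotz.console", "dependency_source": "argparse"},
        {"dependant": "bauklotz.console", "dependency_source": "functools"},
    ]
    print(dependency_graph(records))
    # {'bauklotz.console': ['argparse', 'functools']}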


Filtering with labels
---------------------

Sometimes only items with a certain quality should go into a report. In *Bauklotz* this is done with labels.
Label filtering currently works only on arrows leading to a report; on arrows leading to filters, labels are ignored.
In this example only items with the label *long* reach the report.


.. literalinclude:: only_long_statements.bauklotz
   :language: mortar


The config references a file called *long_statement.py*, which contains the logic for applying labels. This example
uses a rather low threshold for deciding whether a method is long, so that some methods show up. Unless a path is
absolute or configured otherwise, imported file paths are resolved relative to the configuration file. This behaviour
can be controlled with the *-config-relative-paths* (default) and *-no-config-relative-paths* flags when running
*Bauklotz*.


.. literalinclude:: long_statement.py
   :language: python

Labeling code is written in a subset of Python, mostly conditional statements, variable assignment and math. Three
variables are predefined in this context:

 - *facts* holds all facts of the item
 - *item* holds the serialized item
 - *labels* holds the labels of the item


The resulting *long.yml* report file will now contain only methods that exceed the threshold.