Python Guide

Project Discovery

The filter and items introduced here are the foundation for analyzing Python projects: they discover the source files that later filters analyze. The following configuration reads a ProjectLocation item from the input channel, discovers all source files, and writes the items associated with those source files to the file report.yml.

channel input

filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false

report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"

input -> project -> out

The pipeline is started with the following command.

bauklotz <config_file_above> <python_project_dir> input

This should create a file called report.yml containing rather unspectacular data, like the following (truncated) list:

facts: {}
item: bauklotz/__init__.py
labels: []
---
facts: {}
item: bauklotz/console.py
labels: []
---
facts: {}
item: bauklotz/configuration/__init__.py
labels: []
---
facts: {}
item: bauklotz/configuration/dsl/__init__.py
labels: []
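Conceptually, project discovery amounts to collecting every .py file below the project directory. The following minimal Python sketch illustrates that idea with pathlib. The function name discover_python_files is hypothetical, the sketch is not how PythonProjectFilter is implemented, and it does not attempt to reproduce the ignore_special_files option.

```python
from pathlib import Path


def discover_python_files(project_dir: str) -> list[str]:
    """Hypothetical helper: list all .py files below project_dir.

    Illustrates the idea of project discovery only; it is not the
    PythonProjectFilter implementation and ignores ignore_special_files.
    """
    root = Path(project_dir)
    # rglob("*.py") searches the whole directory tree recursively
    files = (path for path in root.rglob("*.py") if path.is_file())
    # Paths relative to the project directory, like the item names above
    return sorted(path.relative_to(root).as_posix() for path in files)
```

For the project shown above, such a function would return paths like bauklotz/__init__.py and bauklotz/console.py; the order used in the report itself may differ.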

Statement count

Lines of code and the number of statements are good starting metrics for a project. The following configuration adds this information as facts to the PythonSourceFile items.

channel input

filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false

filter builtin.python.file:PythonStatementCountFilter statement

report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"



input -> project -> statement -> out

This yields a list similar to the one above, but with the following facts added:
  • lines_of_code: Number of lines in the file that contain a Python statement

  • statement_count: Number of expressions found in the Python file.

  • expression_loc_ratio: statement_count divided by lines_of_code (0.0 for files without code). A crude metric for code density.

The file should look like this:

facts:
  expression_loc_ratio: 0.0
  lines_of_code: 0
  statement_count: 0
item: bauklotz/__init__.py
labels: []
---
facts:
  expression_loc_ratio: 3.3
  lines_of_code: 27
  statement_count: 89
item: bauklotz/console.py
labels: []
---
facts:
  expression_loc_ratio: 0.0
  lines_of_code: 0
  statement_count: 0
item: bauklotz/configuration/__init__.py
labels: []
---
facts:
  expression_loc_ratio: 0.0
  lines_of_code: 0
  statement_count: 0
item: bauklotz/configuration/dsl/__init__.py
labels: []
---
facts:
  expression_loc_ratio: 3.79
  lines_of_code: 103
  statement_count: 390
item: bauklotz/configuration/dsl/tokenizer.py
labels: []
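The three facts can be approximated with Python's standard ast module. The sketch below assumes that lines_of_code counts the distinct lines on which a statement starts, that statement_count counts the expression nodes of the syntax tree, and that the ratio is rounded to two decimals. These are assumptions rather than documented Bauklotz behaviour, and statement_facts is a hypothetical name, although under these assumptions the sketch reproduces the numbers reported for BooleanToken and build_filter in the next section.

```python
import ast


def statement_facts(source: str) -> dict:
    """Hypothetical approximation of the facts added by PythonStatementCountFilter."""
    nodes = list(ast.walk(ast.parse(source)))
    # Assumption: a "line of code" is a line on which some statement starts
    statement_lines = {node.lineno for node in nodes if isinstance(node, ast.stmt)}
    # Assumption: statement_count counts expression nodes (names, calls, constants, ...)
    expression_count = sum(1 for node in nodes if isinstance(node, ast.expr))
    lines_of_code = len(statement_lines)
    # Avoid dividing by zero for empty files such as bare __init__.py modules
    ratio = round(expression_count / lines_of_code, 2) if lines_of_code else 0.0
    return {
        "lines_of_code": lines_of_code,
        "statement_count": expression_count,
        "expression_loc_ratio": ratio,
    }
```

For an empty file, statement_facts("") reports 0 lines, 0 expressions and a ratio of 0.0, which matches the empty __init__.py entries above.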

Classes and Methods

Statement count and lines of code per file are a great start, but it would be nicer to get a better view of how the code is distributed between classes and methods. To do so, classes and methods must first be extracted with the PythonClassFilter and the PythonMethodFilter. The resulting items can then be routed to the statement filter.

channel input

filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false

filter builtin.python.file:PythonStatementCountFilter statement

filter builtin.python.definition:PythonClassFilter classes

filter builtin.python.definition:PythonMethodFilter methods

report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"



input -> project -> classes -> statement -> out
project -> statement
classes -> methods -> statement
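One way to read these connections is that each arrow chain stands for a sequence of pairwise connections, so the statement filter receives items from project, classes and methods. The following sketch, built around the hypothetical function expand_routes, illustrates that reading. It is not Bauklotz's configuration parser and does not handle labelled arrows such as -[ long ]->.

```python
def expand_routes(lines: list[str]) -> list[tuple[str, str]]:
    """Hypothetical: turn chains like "a -> b -> c" into pairs (a, b), (b, c)."""
    edges: list[tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Plain "->" arrows only; labelled arrows are not handled here
        nodes = [part.strip() for part in line.split("->")]
        edges.extend(zip(nodes, nodes[1:]))
    return edges


routes = [
    "input -> project -> classes -> statement -> out",
    "project -> statement",
    "classes -> methods -> statement",
]
# Collect every filter that sends items to the statement filter
senders = sorted({src for src, dst in expand_routes(routes) if dst == "statement"})
print(senders)  # ['classes', 'methods', 'project']
```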

The report should be filled with entries like:

facts:
  abstract: false
  expression_loc_ratio: 2.67
  interface: false
  lines_of_code: 3
  methods:
  - __bool__
  statement_count: 8
  type_parameters: {}
item:
  body: "class BooleanToken(Token):\n    def __bool__(self) -> bool:\n        return\
    \ self.content in (\"true\", \"yes\")"
  module: bauklotz.configuration.dsl.tokenizer
  name: BooleanToken
labels: []
---
facts:
  expression_loc_ratio: 8.5
  lines_of_code: 2
  statement_count: 17
item:
  args:
  - _type: argument
    name: self
    type: None
  - _type: argument
    name: filter_uri
    type: str
  - _type: argument
    name: name
    type: str
  - _type: argument
    name: config
    type: JSONType
  body: "def build_filter(self, filter_uri: str, name: str, config: JSONType) -> Filter[Item,\
    \ Item, FilterConfig]:\n    return self.get_location(filter_uri).create_filter(name,\
    \ config)"
  class: bauklotz.configuration.catalog.Catalog
  generics: []
  name: build_filter
  returns: null
labels: []
---
facts:
  classes:
  - MortarParser
  expression_loc_ratio: 4.89
  lines_of_code: 129
  statement_count: 631
item: bauklotz/configuration/dsl/parser.py
labels: []
---

Note that the PythonClassFilter also passes the names of the classes it finds as a classes fact to the PythonSourceFile items, as in the parser.py entry above.
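To illustrate what the class and method extraction collects, here is a minimal sketch based on the standard ast module. The function extract_classes is hypothetical, and it only considers top-level classes and the functions defined directly in their bodies. The real filters record more, such as the arguments, generics and abstract or interface flags shown above.

```python
import ast


def extract_classes(source: str) -> dict[str, list[str]]:
    """Hypothetical: map each top-level class name to its method names."""
    classes: dict[str, list[str]] = {}
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.ClassDef):
            classes[node.name] = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return classes


source = (
    "class BooleanToken(Token):\n"
    "    def __bool__(self) -> bool:\n"
    '        return self.content in ("true", "yes")\n'
)
methods_by_class = extract_classes(source)
# The file-level fact would then be the list of class names
file_facts = {"classes": list(methods_by_class)}
print(methods_by_class)  # {'BooleanToken': ['__bool__']}
```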

Imports

Dependencies are critical when searching for architectural problems. The PythonImportFilter extracts the dependencies of a project, while the DependencyNetworkFilter builds a graph representation of them.

channel input

filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false

filter builtin.python.file:PythonStatementCountFilter statement

filter builtin.python.definition:PythonClassFilter classes

filter builtin.python.definition:PythonMethodFilter methods

filter builtin.python.file:PythonImportFilter imports

filter builtin.python.network:DependencyNetworkFilter importNet

report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"

report builtin.writer.filesystem:YAMLWriterReport importOut
path: "imports.yml"

report builtin.writer.graph:GraphWriterReport importGraph
path: "imports.gml"


input -> project -> classes -> statement -> out
project -> statement
project -> imports -> importOut
imports -> importNet -> importGraph
classes -> methods -> statement

The imports.yml file should contain entries like the following:

facts: {}
item:
  dependant: bauklotz.console
  dependency_source: argparse
  imported_artifacts:
    ArgumentParser: ArgumentParser
labels: []
---
facts: {}
item:
  dependant: bauklotz.console
  dependency_source: functools
  imported_artifacts:
    partial: partial
labels: []
---
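Records like these can be approximated with the standard ast module. In the following sketch, extract_imports is a hypothetical name; it assumes that imported_artifacts maps each imported name to the name it is bound to locally, and that plain import statements are recorded without artifacts. Both are assumptions about PythonImportFilter's behaviour.

```python
import ast


def extract_imports(source: str, module: str) -> list[dict]:
    """Hypothetical: one record per imported module found in source."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # Relative imports keep their leading dots, e.g. "from . import x" -> "."
            dependency = "." * node.level + (node.module or "")
            # Assumption: imported name -> local name (the alias, if one is given)
            artifacts = {a.name: a.asname or a.name for a in node.names}
            records.append({"dependant": module,
                            "dependency_source": dependency,
                            "imported_artifacts": artifacts})
        elif isinstance(node, ast.Import):
            # Assumption: "import x" is recorded without imported artifacts
            for alias in node.names:
                records.append({"dependant": module,
                                "dependency_source": alias.name,
                                "imported_artifacts": {}})
    return records


code = "from argparse import ArgumentParser\nfrom functools import partial\n"
for record in extract_imports(code, "bauklotz.console"):
    print(record)
```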

The graph output should end up in importGraph_0.gml. The counter in the file name is incremented for each graph written, so the same writer can be used for several dependency graphs. The file can be rendered by programs like Gephi, but since it is plain text it can also be opened in a text editor.
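GML is a simple nested key-value text format, which is why the file is readable in an editor. The sketch below writes a directed graph from import records like the ones above, with one node per module and one edge from dependant to dependency_source. The function to_gml is hypothetical, and the node and edge attributes actually written by GraphWriterReport may differ.

```python
def to_gml(records: list[dict]) -> str:
    """Hypothetical: serialize import records as a directed GML graph."""
    ids: dict[str, int] = {}
    edges: set[tuple[int, int]] = set()
    for record in records:
        source, target = record["dependant"], record["dependency_source"]
        for name in (source, target):
            ids.setdefault(name, len(ids))  # ids in order of first appearance
        edges.add((ids[source], ids[target]))  # repeated imports collapse into one edge
    lines = ["graph [", "  directed 1"]
    for name, node_id in ids.items():
        lines += ["  node [", f"    id {node_id}", f'    label "{name}"', "  ]"]
    for source_id, target_id in sorted(edges):
        lines += ["  edge [", f"    source {source_id}", f"    target {target_id}", "  ]"]
    lines.append("]")
    return "\n".join(lines) + "\n"


records = [
    {"dependant": "bauklotz.console", "dependency_source": "argparse"},
    {"dependant": "bauklotz.console", "dependency_source": "functools"},
]
print(to_gml(records))
```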

Filtering with labels

Sometimes only entries with a certain property should go into a report. In Bauklotz this is done with labels. Label filtering currently only works on arrows leading to a report; on arrows leading to filters, labels are ignored for now. In this example, only items with the label long go to the report.

channel input

filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false

filter builtin.python.file:PythonStatementCountFilter statement

filter builtin.python.definition:PythonClassFilter classes

filter builtin.python.definition:PythonMethodFilter methods

filter builtin.generic.predicate:ComplexLabelFilter long
   code: @long_statement.py


report builtin.writer.filesystem:YAMLWriterReport out
path: "long.yml"


input -> project -> classes -> methods -> statement -> long -[ long ]-> out

The config references a file called long_statement.py, which contains the logic for applying labels. This example uses a rather low threshold for deciding whether a method is long, so that some methods show up in the report. Unless an absolute path is given or Bauklotz is configured otherwise, imported file paths are relative to the configuration file. This behaviour can be controlled with the -config-relative-paths (default) and -no-config-relative-paths flags when running Bauklotz.

if facts["lines_of_code"] > 10:
    labels += "long"

The language for labeling code is a subset of Python, consisting mostly of conditional statements, variable assignments and arithmetic. Three variables are predefined in this context:

  • facts holds all facts of the item

  • item holds the serialized item

  • labels holds the labels of the item

The resulting long.yml report now contains only methods whose lines_of_code exceed the threshold.
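To make the semantics of the rule more concrete, the following Python sketch evaluates a long_statement.py-style rule once per item with the three predefined variables, then keeps only items carrying the label long, as the -[ long ]-> arrow does. Everything in it is hypothetical: Labels, apply_rule and route are illustrative names, and the Labels class merely assumes that labels += "long" adds one whole label (with a plain Python list it would add the individual characters instead). It does not restrict rules to the subset Bauklotz accepts.

```python
class Labels:
    """Hypothetical label container where `labels += "x"` adds the label "x"."""

    def __init__(self) -> None:
        self._labels: set[str] = set()

    def __iadd__(self, label: str) -> "Labels":
        self._labels.add(label)
        return self

    def __contains__(self, label: object) -> bool:
        return label in self._labels


RULE = '''
if facts["lines_of_code"] > 10:
    labels += "long"
'''


def apply_rule(rule: str, facts: dict, item: object) -> Labels:
    """Run a labeling rule with the predefined variables facts, item and labels."""
    scope = {"facts": facts, "item": item, "labels": Labels()}
    # Empty builtins only approximate the restricted subset; this is not a sandbox
    exec(rule, {"__builtins__": {}}, scope)
    return scope["labels"]


def route(items: list[tuple[object, Labels]], required: str) -> list[object]:
    """Keep only items carrying the required label, like an arrow -[ long ]->."""
    return [item for item, labels in items if required in labels]


methods = {"short_method": {"lines_of_code": 2}, "long_method": {"lines_of_code": 42}}
labelled = [(name, apply_rule(RULE, facts, name)) for name, facts in methods.items()]
print(route(labelled, "long"))  # ['long_method']
```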