Python Guide
Project Discovery
The filter and items below are essential for analyzing Python projects: they discover the source files that later filters analyze. The following configuration reads a ProjectLocation item from the input channel, discovers all source files, and writes the corresponding items to the file report.yml.
channel input
filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false
report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"
input -> project -> out
The pipeline is started with the following command.
bauklotz <config_file_above> <python_project_dir> input
This should create a file called report.yml with rather unspectacular data like the following (cropped) list:
facts: {}
item: bauklotz/__init__.py
labels: []
---
facts: {}
item: bauklotz/console.py
labels: []
---
facts: {}
item: bauklotz/configuration/__init__.py
labels: []
---
facts: {}
item: bauklotz/configuration/dsl/__init__.py
labels: []
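To make the discovery step more tangible, here is a minimal sketch of the idea in plain Python. It is not the PythonProjectFilter implementation: the function name discover_python_files is hypothetical, it simply collects every *.py file below a directory, and it does not model the ignore_special_files option.

```python
import tempfile
from pathlib import Path


def discover_python_files(project_dir: str) -> list[str]:
    """Return all *.py files below project_dir as sorted, root-relative paths.

    Hypothetical helper for illustration only, not part of Bauklotz.
    """
    root = Path(project_dir)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.py")
        if path.is_file()
    )


# Build a tiny throwaway project so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pkg/__init__.py", "pkg/console.py", "pkg/README.md"):
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("")
    discovered = discover_python_files(tmp)

print(discovered)  # ['pkg/__init__.py', 'pkg/console.py']
```

Each discovered path corresponds to one item in the report; the facts and labels stay empty until later filters add them.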
Statement count
Lines of code and the number of statements are good starting metrics for a project. The following configuration adds this information as facts to the PythonSourceFile items.
channel input
filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false
filter builtin.python.file:PythonStatementCountFilter statement
report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"
input -> project -> statement -> out
This yields a list similar to the one above, but with the following facts added:
lines_of_code: Number of lines in the file that contain a Python statement
statement_count: Number of expressions found in the Python file.
expression_loc_ratio: Ratio between statement_count and lines_of_code. Crude metric for code density.
The file should look like this:
facts:
expression_loc_ratio: 0.0
lines_of_code: 0
statement_count: 0
item: bauklotz/__init__.py
labels: []
---
facts:
expression_loc_ratio: 3.3
lines_of_code: 27
statement_count: 89
item: bauklotz/console.py
labels: []
---
facts:
expression_loc_ratio: 0.0
lines_of_code: 0
statement_count: 0
item: bauklotz/configuration/__init__.py
labels: []
---
facts:
expression_loc_ratio: 0.0
lines_of_code: 0
statement_count: 0
item: bauklotz/configuration/dsl/__init__.py
labels: []
---
facts:
expression_loc_ratio: 3.79
lines_of_code: 103
statement_count: 390
item: bauklotz/configuration/dsl/tokenizer.py
labels: []
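The three facts can be approximated with Python's ast module. The sketch below is an assumption about how such numbers might be derived, not the PythonStatementCountFilter implementation: it treats a line of code as a line on which a statement starts and counts expression nodes for statement_count. The function name count_metrics is hypothetical. Applied to the BooleanToken class from the Bauklotz tokenizer, it gives the values 3, 8 and 2.67.

```python
import ast


def count_metrics(source: str) -> dict:
    """Compute rough size metrics for a piece of Python source.

    Hypothetical helper; the counting rules are assumptions, not Bauklotz code.
    """
    nodes = list(ast.walk(ast.parse(source)))
    # Assumption: a line of code is a line on which at least one statement starts.
    statement_lines = {node.lineno for node in nodes if isinstance(node, ast.stmt)}
    # Assumption: statement_count counts expression nodes in the syntax tree.
    expression_count = sum(isinstance(node, ast.expr) for node in nodes)
    loc = len(statement_lines)
    ratio = round(expression_count / loc, 2) if loc else 0.0
    return {
        "lines_of_code": loc,
        "statement_count": expression_count,
        "expression_loc_ratio": ratio,
    }


SAMPLE = (
    "class BooleanToken(Token):\n"
    "    def __bool__(self) -> bool:\n"
    '        return self.content in ("true", "yes")\n'
)

metrics = count_metrics(SAMPLE)
empty_metrics = count_metrics("")  # e.g. an empty __init__.py
print(metrics)
```

An empty file produces zeros for all three facts, which matches the entries for the empty __init__.py files in the report.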
Class and Methods
Statement count and lines of code per file are a great start, but it would be nicer to see how code is distributed between classes and methods. To do so, classes and methods must first be extracted with the PythonClassFilter and PythonMethodFilter. The resulting items can then be routed to the statement filter.
channel input
filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false
filter builtin.python.file:PythonStatementCountFilter statement
filter builtin.python.definition:PythonClassFilter classes
filter builtin.python.definition:PythonMethodFilter methods
report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"
input -> project -> classes -> statement -> out
project -> statement
classes -> methods -> statement
The report should be filled with entries like:
facts:
abstract: false
expression_loc_ratio: 2.67
interface: false
lines_of_code: 3
methods:
- __bool__
statement_count: 8
type_parameters: {}
item:
body: "class BooleanToken(Token):\n def __bool__(self) -> bool:\n return\
\ self.content in (\"true\", \"yes\")"
module: bauklotz.configuration.dsl.tokenizer
name: BooleanToken
labels: []
---
facts:
expression_loc_ratio: 8.5
lines_of_code: 2
statement_count: 17
item:
args:
- _type: argument
name: self
type: None
- _type: argument
name: filter_uri
type: str
- _type: argument
name: name
type: str
- _type: argument
name: config
type: JSONType
body: "def build_filter(self, filter_uri: str, name: str, config: JSONType) -> Filter[Item,\
\ Item, FilterConfig]:\n return self.get_location(filter_uri).create_filter(name,\
\ config)"
class: bauklotz.configuration.catalog.Catalog
generics: []
name: build_filter
returns: null
labels: []
---
facts:
classes:
- MortarParser
expression_loc_ratio: 4.89
lines_of_code: 129
statement_count: 631
item: bauklotz/configuration/dsl/parser.py
labels: []
---
Note that once a PythonSourceFile item has passed the PythonClassFilter, the names of the classes it defines are attached to it as facts.
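The extraction of classes and their methods can be illustrated with the standard ast module. This is only a sketch of the general technique under simplifying assumptions (direct methods only, no facts such as abstract or interface); the function name extract_classes and the sample source are hypothetical and not taken from Bauklotz.

```python
import ast


def extract_classes(source: str) -> dict[str, list[str]]:
    """Map each class name in source to the names of its directly defined methods.

    Hypothetical helper for illustration, not the PythonClassFilter itself.
    """
    classes: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            classes[node.name] = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return classes


# Made-up sample module with two small classes.
SAMPLE_MODULE = (
    "class Token:\n"
    "    def __init__(self, content):\n"
    "        self.content = content\n"
    "\n"
    "class BooleanToken(Token):\n"
    "    def __bool__(self) -> bool:\n"
    '        return self.content in ("true", "yes")\n'
)

found = extract_classes(SAMPLE_MODULE)
print(found)
```

The keys of the resulting mapping play the role of the classes fact on a source file, and each value resembles the methods fact on a class item.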
Imports
Dependencies are critical when searching for architectural problems. The PythonImportFilter extracts the dependencies of a project, while the DependencyNetworkFilter turns them into a graph representation.
channel input
filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false
filter builtin.python.file:PythonStatementCountFilter statement
filter builtin.python.definition:PythonClassFilter classes
filter builtin.python.definition:PythonMethodFilter methods
filter builtin.python.file:PythonImportFilter imports
filter builtin.python.network:DependencyNetworkFilter importNet
report builtin.writer.filesystem:YAMLWriterReport out
path: "report.yml"
report builtin.writer.filesystem:YAMLWriterReport importOut
path: "imports.yml"
report builtin.writer.graph:GraphWriterReport importGraph
path: "imports.gml"
input -> project -> classes -> statement -> out
project -> statement
project -> imports -> importOut
imports -> importNet -> importGraph
classes -> methods -> statement
The imports.yml file should contain entries like:
facts: {}
item:
dependant: bauklotz.console
dependency_source: argparse
imported_artifacts:
ArgumentParser: ArgumentParser
labels: []
---
facts: {}
item:
dependant: bauklotz.console
dependency_source: functools
imported_artifacts:
partial: partial
labels: []
---
The graph output should reside in importGraph_0.gml (for each graph written the counter is incremented so you can use the same writer for different dependency graphs). It can be rendered by programs like Gephi but opening it in a text editor is also possible.
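As a rough illustration of what import extraction and a dependency network involve, the following sketch collects from-imports with the ast module and folds them into an adjacency mapping. The record keys mirror the report entries above, but the functions extract_imports and build_graph are hypothetical, and skipping plain and relative imports is a simplifying assumption rather than Bauklotz behaviour.

```python
import ast


def extract_imports(module_name: str, source: str) -> list[dict]:
    """Collect one record per `from X import ...` statement.

    Hypothetical helper; plain `import X` and relative imports are skipped.
    """
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            records.append({
                "dependant": module_name,
                "dependency_source": node.module,
                # Assumption: map each imported name to its local alias.
                "imported_artifacts": {
                    alias.name: alias.asname or alias.name for alias in node.names
                },
            })
    return records


def build_graph(records: list[dict]) -> dict[str, set[str]]:
    """Fold import records into an adjacency mapping: dependant -> dependencies."""
    graph: dict[str, set[str]] = {}
    for record in records:
        graph.setdefault(record["dependant"], set()).add(record["dependency_source"])
    return graph


SOURCE = "from argparse import ArgumentParser\nfrom functools import partial\n"
records = extract_imports("bauklotz.console", SOURCE)
graph = build_graph(records)
print(graph)
```

A graph file format such as GML stores the same information as explicit node and edge lists, which is what tools like Gephi read.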
Filtering with labels
Sometimes only entries with a certain quality should go into a report. In Bauklotz this is done with labels. Label filtering only works on arrows leading to a report; on arrows leading to filters, labels are ignored for now. In this example, only items with the label long go to the report.
channel input
filter builtin.python.project:PythonProjectFilter project
ignore_special_files: false
filter builtin.python.file:PythonStatementCountFilter statement
filter builtin.python.definition:PythonClassFilter classes
filter builtin.python.definition:PythonMethodFilter methods
filter builtin.generic.predicate:ComplexLabelFilter long
code: @long_statement.py
report builtin.writer.filesystem:YAMLWriterReport out
path: "long.yml"
input -> project -> classes -> methods -> statement -> long -[ long ]-> out
The config references a file called long_statement.py, which contains the logic for applying labels. This example uses a rather low threshold for deciding whether a method is long, so that some methods show up in the report. Unless a path is absolute or Bauklotz is configured otherwise, imported file paths are resolved relative to the configuration file. This behaviour can be controlled with the -config-relative-paths (default) and -no-config-relative-paths flags when running Bauklotz.
if facts["lines_of_code"] > 10:
labels += "long"
The language for labeling code is a subset of Python, meaning mostly conditional statements, variable assignment and math. There are three variables predefined in this context:
facts holds all facts of the item
item holds the serialized item
labels holds the labels of the item
The resulting long.yml report file will now only contain methods that are larger than the threshold.
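The effect of the labeling rule combined with the labeled arrow can be emulated in ordinary Python. This is only a sketch: the function apply_long_label and the sample items are hypothetical, labels are modelled as a set, and it is assumed that the labeling language's `labels += "long"` adds a single label.

```python
def apply_long_label(facts: dict, labels: set, threshold: int = 10) -> set:
    """Return the labels of an item, adding "long" if it exceeds the threshold.

    Hypothetical emulation of the rule in long_statement.py.
    """
    if facts["lines_of_code"] > threshold:
        return labels | {"long"}
    return set(labels)


# Made-up items that loosely follow the shape of the report entries.
items = [
    {"item": "build_filter", "facts": {"lines_of_code": 2}, "labels": set()},
    {"item": "parse", "facts": {"lines_of_code": 42}, "labels": set()},
]

# Label every item, then keep only those carrying the label "long",
# which is what the arrow `-[ long ]->` does for the report.
labeled = [
    {**entry, "labels": apply_long_label(entry["facts"], entry["labels"])}
    for entry in items
]
routed = [entry["item"] for entry in labeled if "long" in entry["labels"]]
print(routed)  # ['parse']
```

Raising the threshold shrinks the report; lowering it makes more methods qualify as long.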