Python Guide
============

Project Discovery
-----------------

The following filter and the items it produces are essential for analyzing Python projects. The filter discovers the source files that later filters can analyze.
The configuration below reads a *ProjectLocation* item from the input channel, discovers all source files, and writes the items associated with these source files to the file *report.yml*.

.. code-block:: mortar

   channel input

   filter builtin.python.project:PythonProjectFilter project
       ignore_special_files: false

   report builtin.writer.filesystem:YAMLWriterReport out
       path: "report.yml"

   input -> project -> out

The pipeline is started with the following command:

.. code-block:: bash

   bauklotz input

This should create a file called *report.yml* with rather unspectacular data, like the following (cropped) list:

.. code-block:: yaml

   facts: {}
   item: bauklotz/__init__.py
   labels: []
   ---
   facts: {}
   item: bauklotz/console.py
   labels: []
   ---
   facts: {}
   item: bauklotz/configuration/__init__.py
   labels: []
   ---
   facts: {}
   item: bauklotz/configuration/dsl/__init__.py
   labels: []

Statement count
---------------

Lines of code and the number of statements are good starting metrics for a project.
The following configuration adds this information as facts to the *PythonSourceFile* items.

.. code-block:: mortar

   channel input

   filter builtin.python.project:PythonProjectFilter project
       ignore_special_files: false

   filter builtin.python.file:PythonStatementCountFilter statement

   report builtin.writer.filesystem:YAMLWriterReport out
       path: "report.yml"

   input -> project -> statement -> out

This yields a list similar to the one above, but with the following facts added:

* *lines_of_code*: Number of lines in the file that contain a Python statement.
* *statement_count*: Number of expressions found in the Python file.
* *expression_loc_ratio*: Ratio of *statement_count* to *lines_of_code*. This is a crude metric for code density (for *bauklotz/console.py* below, 89 / 27 gives about 3.3).

The file should look like this:

.. code-block:: yaml

   facts:
     expression_loc_ratio: 0.0
     lines_of_code: 0
     statement_count: 0
   item: bauklotz/__init__.py
   labels: []
   ---
   facts:
     expression_loc_ratio: 3.3
     lines_of_code: 27
     statement_count: 89
   item: bauklotz/console.py
   labels: []
   ---
   facts:
     expression_loc_ratio: 0.0
     lines_of_code: 0
     statement_count: 0
   item: bauklotz/configuration/__init__.py
   labels: []
   ---
   facts:
     expression_loc_ratio: 0.0
     lines_of_code: 0
     statement_count: 0
   item: bauklotz/configuration/dsl/__init__.py
   labels: []
   ---
   facts:
     expression_loc_ratio: 3.79
     lines_of_code: 103
     statement_count: 390
   item: bauklotz/configuration/dsl/tokenizer.py
   labels: []

Classes and Methods
-------------------

Statement counts and lines of code per file are a great start. However, it is more useful to see how the code is distributed between classes and methods.
To get this view, classes and methods must first be extracted with the *PythonClassFilter* and the *PythonMethodFilter*.
The resulting items can then be routed to the statement filter.

.. code-block:: mortar

   channel input

   filter builtin.python.project:PythonProjectFilter project
       ignore_special_files: false

   filter builtin.python.file:PythonStatementCountFilter statement

   filter builtin.python.definition:PythonClassFilter classes

   filter builtin.python.definition:PythonMethodFilter methods

   report builtin.writer.filesystem:YAMLWriterReport out
       path: "report.yml"

   input -> project -> classes -> statement -> out
   project -> statement
   classes -> methods -> statement

The report should be filled with entries like:

.. code-block:: yaml

   facts:
     abstract: false
     expression_loc_ratio: 2.67
     interface: false
     lines_of_code: 3
     methods:
     - __bool__
     statement_count: 8
     type_parameters: {}
   item:
     body: "class BooleanToken(Token):\n    def __bool__(self) -> bool:\n        return\
       \ self.content in (\"true\", \"yes\")"
     module: bauklotz.configuration.dsl.tokenizer
     name: BooleanToken
   labels: []
   ---
   facts:
     expression_loc_ratio: 8.5
     lines_of_code: 2
     statement_count: 17
   item:
     args:
     - _type: argument
       name: self
       type: None
     - _type: argument
       name: filter_uri
       type: str
     - _type: argument
       name: name
       type: str
     - _type: argument
       name: config
       type: JSONType
     body: "def build_filter(self, filter_uri: str, name: str, config: JSONType) -> Filter[Item,\
       \ Item, FilterConfig]:\n    return self.get_location(filter_uri).create_filter(name,\
       \ config)"
     class: bauklotz.configuration.catalog.Catalog
     generics: []
     name: build_filter
     returns: null
   labels: []
   ---
   facts:
     classes:
     - MortarParser
     expression_loc_ratio: 4.89
     lines_of_code: 129
     statement_count: 631
   item: bauklotz/configuration/dsl/parser.py
   labels: []
   ---

Note that after passing through the *PythonClassFilter*, the *PythonSourceFile* items carry the names of their classes in the *classes* fact.

Imports
-------

Dependencies are critical when searching for architectural problems.
The *PythonImportFilter* extracts the dependencies of a project. The *DependencyNetworkFilter* then builds a graph representation of these dependencies.

.. code-block:: mortar

   channel input

   filter builtin.python.project:PythonProjectFilter project
       ignore_special_files: false

   filter builtin.python.file:PythonStatementCountFilter statement

   filter builtin.python.definition:PythonClassFilter classes

   filter builtin.python.definition:PythonMethodFilter methods

   filter builtin.python.file:PythonImportFilter imports

   filter builtin.python.network:DependencyNetworkFilter importNet

   report builtin.writer.filesystem:YAMLWriterReport out
       path: "report.yml"

   report builtin.writer.filesystem:YAMLWriterReport importOut
       path: "imports.yml"

   report builtin.writer.graph:GraphWriterReport importGraph
       path: "imports.gml"

   input -> project -> classes -> statement -> out
   project -> statement
   project -> imports -> importOut
   imports -> importNet -> importGraph
   classes -> methods -> statement

The *imports.yml* file should contain entries like:

.. code-block:: yaml

   facts: {}
   item:
     dependant: bauklotz.console
     dependency_source: argparse
     imported_artifacts:
       ArgumentParser: ArgumentParser
   labels: []
   ---
   facts: {}
   item:
     dependant: bauklotz.console
     dependency_source: functools
     imported_artifacts:
       partial: partial
   labels: []
   ---

The graph output should reside in *importGraph_0.gml*. The counter in the file name is incremented for each graph written, so the same writer can be used for different dependency graphs.
The file can be rendered by programs like *Gephi*. GML is a plain-text format, so it can also be opened in a text editor.

Filtering with labels
---------------------

Sometimes only entries with a certain quality should go into a report.
In *Bauklotz* this can be done with labels.
For now, label filtering only works on arrows leading to a report. On arrows leading to filters, labels are ignored.
In this example, only items with the label *long* are written to the report.

.. literalinclude:: only_long_statements.bauklotz
   :language: mortar

The configuration references a file called *long_statement.py*, which contains the logic for applying labels.
This example uses a rather low threshold to decide whether a method is long, so that some methods show up in the report.
Imported file paths are relative to the configuration file unless they are absolute or configured otherwise.
This behaviour can be controlled with the *-config-relative-paths* (default) and *-no-config-relative-paths* flags when running *Bauklotz*.

.. literalinclude:: long_statement.py
   :language: python

The language for labeling code is a subset of Python. It consists mostly of conditional statements, variable assignments and arithmetic.
Three variables are predefined in this context:

- *facts* holds all facts of the item
- *item* holds the serialized item
- *labels* holds the labels of the item

The resulting *long.yml* report file will now only contain methods that are larger than the threshold.
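The following plain-Python sketch makes the labeling step concrete. It emulates what a threshold rule like the one in *long_statement.py* does to report entries, and how a labeled arrow then selects the entries for a report. It is a hypothetical illustration, not Bauklotz code, and several details are assumptions: the names ``apply_long_label`` and ``route_to_report``, the threshold of 10, and the modelling of *labels* as a Python set. The facts come from the sample report entries for *build_filter* and *BooleanToken* shown earlier. The sketch labels any entry, not only methods.

.. code-block:: python

   # Hypothetical emulation of a threshold label rule. This is NOT the content
   # of long_statement.py and NOT Bauklotz's API; all names are illustrative.

   THRESHOLD = 10  # assumed threshold: "long" means statement_count >= THRESHOLD


   def apply_long_label(facts: dict, item: dict, labels: set) -> set:
       """Return a new label set for one item, adding "long" for large items."""
       # item is unused here; it is passed only to mirror the three
       # predefined variables (facts, item, labels) of the labeling context.
       result = set(labels)
       if facts.get("statement_count", 0) >= THRESHOLD:
           result.add("long")
       return result


   def route_to_report(entries: list, wanted: str) -> list:
       """Keep only entries carrying the wanted label, like a labeled arrow to a report."""
       return [entry for entry in entries if wanted in entry["labels"]]


   # Facts copied from the sample report entries shown earlier in this guide.
   entries = [
       {"item": {"name": "build_filter"}, "facts": {"statement_count": 17}, "labels": []},
       {"item": {"name": "BooleanToken"}, "facts": {"statement_count": 8}, "labels": []},
   ]

   for entry in entries:
       new_labels = apply_long_label(entry["facts"], entry["item"], set(entry["labels"]))
       entry["labels"] = sorted(new_labels)  # reports store labels as a list

   long_entries = route_to_report(entries, "long")
   print([entry["item"]["name"] for entry in long_entries])  # ['build_filter']

In *Bauklotz* itself, the rule is written in the restricted Python subset described above, and the selection happens on the arrow leading to the report. The sketch only mirrors that data flow.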