Contributing

Python Development Setup#

  1. Fork and clone the repository on GitHub.

  2. Create a new Python environment and install the repository in editable mode with development dependencies (we recommend using uv):

    cd python
    uv sync --extra dev --all-groups
    source .venv/bin/activate
    
  3. The repository uses a number of automated checks and tests, all of which must pass before your PR can be merged; see .github/workflows/python.yml for the exact checks.

    To run the tests locally, use pytest:

    pytest tests
    

    Linting and formatting are done with ruff; see the tool.ruff section of pyproject.toml for the configuration.

    To automate linting and formatting, install pre-commit and then activate its hooks. pre-commit is a framework for managing and maintaining multi-language pre-commit hooks. Activate the git hooks with:

    pre-commit install
    

    Afterwards, pre-commit runs on every commit. Note that it is not guaranteed to fix all linting problems automatically.

    To run linting and formatting separately, you can configure your IDE, such as VS Code, to use the ruff extension, or run ruff from the command line:

    # linting
    ruff check --fix
    # formatting
    ruff format
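
The local checks above can also be chained in a small helper script. The following is only a minimal sketch and not part of the repository: the `run_checks` function and the `CHECKS` list are hypothetical names, and the script simply wraps the three commands shown above, assuming it is run from the `python` directory with the virtual environment activated.

```python
import subprocess

# The same commands as listed above, in the order they are usually run.
CHECKS = [
    ["ruff", "check", "--fix"],
    ["ruff", "format"],
    ["pytest", "tests"],
]


def run_checks(runner=subprocess.run) -> int:
    """Run every check and return the first non-zero exit code, or 0.

    Parameters:
        runner: A callable with the interface of `subprocess.run`, replaceable for testing.

    Returns:
        The exit code of the first failing command, or 0 if all commands succeed.
    """
    for command in CHECKS:
        print("$", " ".join(command))
        result = runner(command)
        if result.returncode != 0:
            # Stop at the first failure so its output stays at the bottom of the log.
            return result.returncode
    return 0
```

Calling `raise SystemExit(run_checks())` at the end of such a script would propagate the result to the shell.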
    

Python API Reference Documentation#

The hopsworks-api repository contains the code of our Python API, and its reference documentation is built from the docstrings in the code. The rest of the documentation is located in the logicalclocks.github.io repository. To build the full docs, hopsworks-api is cloned into logicalclocks.github.io as a subdirectory, and then the whole documentation is built as a single website. For details of the build process, see the README.md of logicalclocks.github.io. The whole build process is automated and is triggered by a release push to the release branches of hopsworks-api via the .github/workflows/mkdocs-release.yml GitHub Action.

For development purposes, you can build and serve the Python API reference documentation locally (assuming you have activated the virtual environment):

mkdocs serve

Note that the look and structure of the website differ when building only the hopsworks-api documentation, as the full website uses a custom theme and additional plugins. Use local serving to check that the docstrings are in the correct format and that the documentation of individual entities is correct, but not to verify how the full docs will look.

Docstring Guidelines#

Everything public should have a docstring that follows PEP 257 and is formatted using the Google style. We use mkdocs-material admonitions where appropriate, which we specify using the Google style as well. We do not specify the types or defaults of parameters in the docstrings, as the documentation generator extracts them automatically from the signatures; specify them in the signature instead.
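
As an illustration of this rule, the hypothetical function below (neither `scale` nor its parameters exist in the codebase) keeps the types and defaults in the signature only, while the docstring describes what the parameters mean:

```python
import inspect


def scale(values: list[float], factor: float = 2.0) -> list[float]:
    """Multiply every value by a constant factor.

    Parameters:
        values: The numbers to scale; the list itself is left unmodified.
        factor: The multiplier applied to each value.

    Returns:
        A new list with the scaled values.
    """
    return [value * factor for value in values]


# Types and defaults are available from the signature alone,
# so repeating them in the docstring would only duplicate information.
print(inspect.signature(scale))
```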

Writing Good Docstrings#

Avoid tautological docstrings. For example, instead of:

description: str
"""Description of the feature."""

write:

description: str
"""The description of the feature as it is shown in the UI."""

Note how the user now learns something about description that cannot be deduced from the name alone, namely that the description is shown in the UI.

Always try to provide an insight or show the intent of the method or class in the docstring, instead of just repeating its name or signature. Instead of a warning which does not really explain the reasons to be cautious, like:

Danger: Potentially dangerous operation
    This operation stops the execution.

write something like:

Danger: Potentially dangerous operation
    This operation kills the execution, without allowing it to shut down gracefully.
    This may result in corrupted data, which can be very hard to reverse.

Note how in the second example it becomes clear why exactly the operation is dangerous, and therefore the user can make an educated choice on whether to use it. Never write "do not do ..." without explaining the reasons behind these restrictions.

Overall, think about what information the user of the API would need to know in order to use it correctly and effectively, and provide that information in the docstrings in a concise and precise manner.

Technical Details#

Always place one sentence per line for clear git diffs.

Overall, a docstring should be structured as follows:

"""Does something.

The details about the exact way something is done.
It can span multiple lines or paragraphs.
Use proper sentences for everything in the docstring except admonition titles; that is, start with a capital letter and end with a period, question mark, or exclamation mark.

Note: Use admonitions
    Where appropriate, use admonitions to highlight important information.

After the full description, list the parameters, return values, raised exceptions, and examples.

Parameters:
    param1: A one-line description of param1.
    param2:
        A multi-line description of param2.
        In this case, the first line should be on the next line and indented.

Returns:
    A description of the return value.

Raises:
    SomeError: If something goes wrong.
"""

Use Yields instead of Returns for generator functions. Also, always use an Example admonition instead of an Examples section, as the latter will not be rendered correctly.
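
A generator documented with a Yields section could look like the sketch below. `iter_batches` is a hypothetical function made up for illustration and is not part of the Hopsworks API.

```python
from collections.abc import Iterator


def iter_batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Split the items into consecutive batches.

    The last batch is shorter if the number of items is not divisible by `size`.

    Parameters:
        items: The items to split.
        size: The maximum number of items in a batch.

    Yields:
        The next batch, containing at most `size` items.

    Raises:
        ValueError: If `size` is not positive; the error surfaces when iteration starts.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]
```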

For example, a method docstring could look like this:

def delete(self, path: str, missing_ok: bool = False) -> None:
    """Delete a file or directory at the given path.

    The path can be specified with or without the project prefix, that is, `/Projects/{project_name}`.

    Example: Ensuring a file or directory does not exist
        ```python
        import hopsworks

        project = hopsworks.login()
        project.get_dataset_api().delete("/my_tmp", missing_ok=True)
        ```

    Parameters:
        path: The path to the file or directory to delete.
        missing_ok: If `True`, do not raise an error when the file or directory does not exist.

    Raises:
        hopsworks.client.exceptions.RestAPIError: If the server returns an error.
        FileNotFoundError: If `missing_ok` is `False` and the file or directory does not exist.
    """
    ...

Or this:

def contains(self, other: str | list) -> filter.Filter:
    """Construct a filter similar to SQL's `IN` operator.

    Warning: Deprecated
        `contains` method is deprecated.
        Use [`Feature.isin`][hsfs.feature.Feature.isin] instead.

    Parameters:
        other: A single feature value or a list of feature values.

    Returns:
        A filter that leaves only the feature values also contained in `other`.
    """

Linking#

A good API reference is easy to explore, and links are essential for that.

If you mention other classes, methods, or functions in the docstring, link to them using the following syntax:

[`ClassName`][full.module.path.ClassName]
[`ClassName.method_name`][full.module.path.ClassName.method_name]

As a convention, for methods and properties, include the class name as well to reduce ambiguity.

Note that you can link entities defined in other libraries as well, like pandas or numpy.
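
For instance, a docstring could reference a pandas class as sketched below. `to_records` is a hypothetical function, and the target `pandas.DataFrame` is an assumption: the identifier has to match the name under which the object is published in the external library's documentation.

```python
def to_records(rows: list[dict]) -> list[tuple]:
    """Convert row dictionaries into plain tuples.

    The result can be passed to the [`pandas.DataFrame`][pandas.DataFrame] constructor.

    Parameters:
        rows: The rows to convert, each mapping column names to values.

    Returns:
        One tuple per row, with the values in the insertion order of the row's keys.
    """
    return [tuple(row.values()) for row in rows]
```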

To link to a documentation page rather than an API reference object, use a relative link:

[Model Training](../concepts/mlops/training.md)
[Hopsworks On-Premise Installation](../setup_installation/on_prem/contact_hopsworks.md)

Always start it with .. to escape from the API reference page to the root.

For external links, use the normal Markdown syntax:

[Hopsworks Website](https://www.hopsworks.ai)

Summary#
  • Always document public classes, methods, functions, and modules.
  • Show the intent and provide an insight with your docstrings; avoid tautologies.
  • Do not place a warning without a proper explanation of the reasons behind it.
  • Use proper sentences, starting with a capital letter and ending with a period, question mark, or exclamation mark.
  • Place a sentence per line.
  • Use Google style for docstrings.
  • Provide a link whenever you mention something linkable.
  • Use mkdocs-material admonitions where appropriate.
  • Do not duplicate information that can be extracted from the code signatures.
  • Keep the documentation in the code (docstrings) as complete as possible, and avoid writing custom Markdown text in the files of the docs directory.
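
The first rule in this list can be checked mechanically. The following is a minimal sketch using only the standard-library `ast` module. `find_undocumented` is a hypothetical helper and not part of the repository's tooling, and it makes the simplifying assumption that every name without a leading underscore is public, regardless of where it is nested.

```python
import ast


def find_undocumented(source: str) -> list[str]:
    """List the public classes and functions of a module that lack a docstring.

    The module itself is reported as `<module>` if it has no docstring.

    Parameters:
        source: The source code of a single Python module.

    Returns:
        The names of the undocumented entities, in no particular order.
    """
    tree = ast.parse(source)
    missing = []
    if ast.get_docstring(tree) is None:
        missing.append("<module>")
    for node in ast.walk(tree):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # Names starting with an underscore are treated as private here.
            if not node.name.startswith("_") and ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing
```

Such a check only detects missing docstrings; whether a docstring is insightful still has to be judged in review.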

Extending the API Reference#

To create a new API reference page, create a new Markdown file in docs and add it to the nav section of the mkdocs.yml file:

nav:
  - Login: login.md
  - Platform API:
    - ...
    - New Package: new_package.md

Inside the new_package.md file, you can use the ::: syntax to include the documentation of different Python entities by providing their full path:

# The New Package

::: hopsworks_common.new_package.NewClass

You can add more entities as needed using the same include syntax. Prefer to put all the information into the docstrings, and avoid writing Markdown text inside the Markdown files of the docs directory, except for the main title and the includes. We plan to move to fully automatic API reference generation in the future, which would not support custom Markdown text outside the docstrings.
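
Since such a page consists only of a title and includes, its content is easy to produce programmatically. The sketch below is purely illustrative: `render_reference_page` is a hypothetical helper, not an existing tool in the repository, and it does not touch mkdocs.yml.

```python
def render_reference_page(title: str, entities: list[str]) -> str:
    """Render the Markdown source of an API reference page.

    Parameters:
        title: The main title of the page.
        entities: The full paths of the Python entities to include, in display order.

    Returns:
        The page content, consisting of the title followed by one include per entity.
    """
    lines = [f"# {title}", ""]
    for entity in entities:
        # Each include is followed by an empty line to keep the blocks separate.
        lines += [f"::: {entity}", ""]
    return "\n".join(lines)


print(render_reference_page("The New Package", ["hopsworks_common.new_package.NewClass"]))
```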

Java Development Setup#

To build the Java code, you must add the Hopsworks Enterprise Edition repository to your ~/.m2/settings.xml file. You can get access to the repository using your Nexus credentials. Add the following to your settings.xml:

<settings>
  <servers>
    <server>
      <id>HopsEE</id>
      <username>YOUR_NEXUS_USERNAME</username>
      <password>YOUR_NEXUS_PASSWORD</password>
    </server>
  </servers>
</settings>

You can then build either hsfs or hsfs_utils:

cd java
mvn clean package -Pspark-3.5,with-hops-ee
# Or
cd ../utils/python
mvn clean package