Contributing to spindle-token
All interest in spindle-token, as a user or contributor, is greatly appreciated! This document will go into detail on how to contribute to the development of the spindle-token software package.
If you are looking to contribute to the research and design of the Open Privacy Preserving Record Linkage (OPPRL) specification, see this page.
Before Contributing
Before reading further, please read our Code of Conduct. The maintainers enforce it to keep the development of spindle-token focused and productive.
If you are new to contributing to open source, or GitHub, the following links may be helpful starting places:
We Use GitHub Flow
This means that all code and documentation changes happen through pull requests. We actively welcome your pull requests. We highly recommend the following workflow.
- Fork the repo and create your branch from main.
- If you've added code that should be tested, add tests.
- If you've changed APIs, update the documentation.
- Ensure the test suite passes.
- Create the pull request.
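The branching and publishing steps above can be scripted. The sketch below is a hypothetical pair of shell helpers, not something shipped with spindle-token. It assumes you have already forked the repository on GitHub, cloned your fork (so it is the remote origin), and added the main repository as a second remote named upstream; that remote name is a common convention, not a project requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the GitHub Flow steps above (not part of the repo).
# Assumes: your fork is the remote "origin" and the main spindle-token
# repository has been added as a second remote (default name "upstream").

# Start a topic branch from the latest main of the main repository.
start_topic_branch() {
  local branch remote
  branch="$1"
  remote="${2:-upstream}"
  # Refresh <remote>/main, then branch from it.
  git fetch "$remote" main &&
    git checkout -b "$branch" "$remote/main"
}

# Push the current branch to your fork so a pull request can be opened from it.
publish_topic_branch() {
  git push -u origin "$(git rev-parse --abbrev-ref HEAD)"
}
```

For example, run start_topic_branch fix-typo, commit your changes (including any new tests), then run publish_topic_branch and open the pull request on GitHub. Branching from the main repository's main rather than your fork's main avoids starting from a stale copy.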
Any contributions you make will be under the Apache License 2.0
In short, when you submit code changes, your submissions are understood to be under the same Apache License 2.0 that covers the project. Feel free to contact the maintainers if that's a concern.
How to contribute a ...
Bug Report
We use GitHub issues to track public bugs. Report a bug by opening a new issue.
Great bug reports tend to include at least the following:
- A quick summary and/or background.
- The steps to reproduce.
- When possible, minimal code that reproduces the bug.
- A description of what you expected versus what actually happens.
Feature Request
We like to hear all feature requests and discussion about the direction of the project. The best place to discuss future features is the Ideas category of the project's discussion page.
Bug Fix, New Feature, Documentation Improvement, or Other Change
We welcome contributions to the codebase via pull requests. In most cases, it is worth discussing your change with the community in a GitHub issue or discussion before starting work on a pull request. Once you decide to work on a pull request, please follow the workflow outlined above.
Once you open the pull request, it will be tested by CI and reviewed by other contributors (including at least one project maintainer). When all rounds of review are complete, one of the project maintainers will merge your pull request.
Running Tests
When working on a change or addition to spindle-token, you are expected to make sure all changes pass the existing tests and, in most cases, to add new tests that protect the change against future regressions.
Spindle-token uses Poetry to manage Python environments. From the root directory of the project, point Poetry at Python 3.12, create the virtualenv, and install dependencies:
poetry env use python3.12
poetry install
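To confirm that the commands above took effect, a small check can fail early when the Poetry virtualenv is not running Python 3.12. This is a hypothetical helper, not part of the project; the function name check_poetry_python is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with spindle-token: fail early if the
# interpreter inside the active Poetry virtualenv is not the expected version.
check_poetry_python() {
  local expected actual
  expected="${1:-3.12}"
  # Ask the Poetry-managed interpreter for its "major.minor" version.
  actual="$(poetry run python -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "expected Python $expected in the Poetry env, found $actual" >&2
    return 1
  fi
  echo "Poetry env uses Python $actual"
}
```

Run check_poetry_python from the project root after poetry install; it prints the detected version and returns a non-zero status on a mismatch, which makes it easy to chain in front of the test commands.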
Before opening a pull request, validate the Spark-backed test suite locally against each supported Spark line (3.5.x through 4.1.x):
- Spark 3.5.x
- Spark 4.0.x
- Spark 4.1.x
Run the suite once per environment, using a separate Poetry virtualenv for each Spark line so the local installs do not overwrite each other:
# Spark 3.5.x env
# Export the path so every poetry command below uses the same virtualenv
export POETRY_VIRTUALENVS_PATH=.venvs-spark35
poetry env use 3.12
poetry install
poetry run python -m pip install --upgrade --force-reinstall pyspark==3.5.2
VENV="$(poetry env info --path)"
PYSPARK_PYTHON="$VENV/bin/python" \
PYSPARK_DRIVER_PYTHON="$VENV/bin/python" \
PYARROW_IGNORE_TIMEZONE=1 \
poetry run pytest
# Spark 4.0.x env
export POETRY_VIRTUALENVS_PATH=.venvs-spark40
poetry env use 3.12
poetry install
poetry run python -m pip install --upgrade --force-reinstall pyspark==4.0.2
VENV="$(poetry env info --path)"
PYSPARK_PYTHON="$VENV/bin/python" \
PYSPARK_DRIVER_PYTHON="$VENV/bin/python" \
PYARROW_IGNORE_TIMEZONE=1 \
poetry run pytest
# Spark 4.1.x env
export POETRY_VIRTUALENVS_PATH=.venvs-spark41
poetry env use 3.12
poetry install
poetry run python -m pip install --upgrade --force-reinstall pyspark==4.1.0
VENV="$(poetry env info --path)"
PYSPARK_PYTHON="$VENV/bin/python" \
PYSPARK_DRIVER_PYTHON="$VENV/bin/python" \
PYARROW_IGNORE_TIMEZONE=1 \
poetry run pytest
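Running the three environments by hand is repetitive. The following is a minimal sketch of a wrapper, assuming a hypothetical file such as scripts/test-spark-matrix.sh (it is not part of the repository). It reuses the same commands and PySpark pins as above and gives each Spark line its own virtualenv directory; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with spindle-token. Source it from the
# project root, then call run_spark_matrix (optionally with Spark lines).

# Map a Spark line to the PySpark version pinned in this guide.
spark_pin() {
  case "$1" in
    3.5) echo "3.5.2" ;;
    4.0) echo "4.0.2" ;;
    4.1) echo "4.1.0" ;;
    *) echo "unsupported Spark line: $1" >&2; return 1 ;;
  esac
}

# Virtualenv directory for a Spark line, e.g. 4.1 -> .venvs-spark41
venv_dir_for() {
  echo ".venvs-spark$(printf '%s' "$1" | tr -d '.')"
}

# Run the suite for one Spark line. The function body is a subshell, so the
# exported POETRY_VIRTUALENVS_PATH does not leak into the caller's shell.
run_spark_line() (
  pin="$(spark_pin "$1")" || exit 1
  POETRY_VIRTUALENVS_PATH="$(venv_dir_for "$1")"
  export POETRY_VIRTUALENVS_PATH
  poetry env use 3.12 &&
    poetry install &&
    poetry run python -m pip install --upgrade --force-reinstall "pyspark==$pin" &&
    venv="$(poetry env info --path)" &&
    PYSPARK_PYTHON="$venv/bin/python" \
      PYSPARK_DRIVER_PYTHON="$venv/bin/python" \
      PYARROW_IGNORE_TIMEZONE=1 \
      poetry run pytest
)

# Run each requested Spark line (default: 3.5 4.0 4.1); stop at the first failure.
run_spark_matrix() {
  local line
  if [ "$#" -eq 0 ]; then
    set -- 3.5 4.0 4.1
  fi
  for line in "$@"; do
    echo "==> Spark $line.x"
    run_spark_line "$line" || { echo "Spark $line.x failed" >&2; return 1; }
  done
}
```

For example, source scripts/test-spark-matrix.sh && run_spark_matrix runs all three lines, while run_spark_matrix 4.1 runs only Spark 4.1.x. The pins live in one case statement, so they need to be kept in sync with the commands in this guide when a new Spark release is adopted.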