Contributing to Octopus
All contributions to Octopus are welcome! Bug fixes, new features, docs improvements, typo corrections - everything helps.
Quick Start
- Fork and clone the repository
- Set up your development environment (Python 3.12 only):
Note: We use uv for fast and reliable dependency management.
- Install pre-commit hooks:
- Create a feature branch:
git checkout -b <type>/<issue>_<description>
# Example: git checkout -b feat/123_add-ensemble-selection
Development Workflow
- Make your changes
- Run tests and quality checks (see below)
- Update CHANGELOG.md
- Commit with a semantic message
- Push and create a PR
Package Management
We use uv for dependency management. All dependencies are defined in pyproject.toml and locked in uv.lock.
Adding a New Package
Updating the Lock File
After manually editing pyproject.toml, update the lock file:
Syncing Your Environment
After pulling changes from the repository:
This installs all dependencies according to the lock file, ensuring everyone has the same versions.
Testing and Quality
- Run tests:
- Run specific test module:
- Run with coverage:
- Run all quality checks:
- Individual tools:
Branch Naming
Format: <type>/<issue>_<slug>
Valid types:
- feat - New features
- fix - Bug fixes
- docs - Documentation changes
- style - Formatting, missing semicolons, etc. (no code change)
- refactor - Code restructuring without changing behavior
- test - Adding or updating tests
- chore - Maintenance tasks, dependency updates
- perf - Performance improvements
- ci - CI/CD configuration changes
- build - Build system or external dependency changes
- revert - Reverting previous commits
Examples:
- ✓ feat/90_add-ensemble-selection
- ✓ fix/124_memory-leak
- ✓ docs/update-readme
- ✓ ci/138_add-pre-commit-hooks
- ✗ Add-New-Feature (wrong format)
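The naming format can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: the helper name `is_valid_branch_name` is hypothetical, the issue number is assumed to be optional (because docs/update-readme is listed as valid), and the slug is assumed to be lowercase words joined by hyphens.

```python
import re

# Types listed under "Valid types" above.
VALID_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "test",
    "chore", "perf", "ci", "build", "revert",
)

# <type>/<issue>_<slug>; the "<issue>_" part is optional here (assumption),
# and the slug is restricted to lowercase kebab-case (assumption).
BRANCH_PATTERN = re.compile(
    rf"(?:{'|'.join(VALID_TYPES)})/(?:\d+_)?[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` matches the branch naming format."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

With these assumptions, the ✓ examples above are accepted and Add-New-Feature is rejected.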
Commit Messages
Format: <type>: <description>
Valid types: See Branch Naming section for complete list of types and their descriptions.
Examples:
- ✓ feat: add ensemble selection method
- ✓ fix: resolve memory leak in data loader
- ✓ docs: update installation guide
- ✓ ci: add pre-commit validation hooks
Auto-close issues:
- fixes #123 → for bugs
- resolves #123 → for features
- closes #123 → for tasks/docs
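To illustrate how the header format and the auto-close keywords fit together, here is a small sketch that parses a commit message. The function name `parse_commit_message` is hypothetical and the parsing rules are an assumption based only on the format described above, not the project's actual validation hook.

```python
import re

# Same types as in the Branch Naming section.
VALID_TYPES = (
    "feat", "fix", "docs", "style", "refactor", "test",
    "chore", "perf", "ci", "build", "revert",
)

# First line: "<type>: <description>".
HEADER_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(VALID_TYPES)}): (?P<description>\S.*)$"
)

# Auto-close keywords anywhere in the message, e.g. "fixes #123".
CLOSING_PATTERN = re.compile(r"\b(fixes|resolves|closes) #(\d+)", re.IGNORECASE)


def parse_commit_message(message: str) -> tuple[str, str, list[int]]:
    """Return (type, description, closed issue numbers) or raise ValueError."""
    header = message.splitlines()[0] if message else ""
    match = HEADER_PATTERN.match(header)
    if match is None:
        raise ValueError(f"header does not match '<type>: <description>': {header!r}")
    issues = [int(number) for _, number in CLOSING_PATTERN.findall(message)]
    return match["type"], match["description"], issues
```

For instance, a message whose first line is `feat: add ensemble selection method` and whose body contains `resolves #123` parses to the type `feat`, that description, and the issue list `[123]`.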
Full example:
CHANGELOG.md
Required: Each PR must update CHANGELOG.md
- Add entry under ## [Unreleased]
- Use appropriate section:
  - Added - New features
  - Changed - Changes to existing functionality
  - Deprecated - Soon-to-be removed features
  - Removed - Removed features
  - Fixed - Bug fixes
  - Security - Vulnerability fixes
- Format:
## [Unreleased]
### Added
- New ensemble selection method for improved performance (#123)
### Fixed
- Memory leak in data loader (#124)
Tips:
- Write from the user's perspective
- Reference the issue or PR number
- Be concise but clear
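As a rough self-check before opening a PR, the Unreleased section can be read programmatically. This is only a sketch under the assumption that CHANGELOG.md uses exactly the heading layout shown above; the function name `unreleased_entries` is hypothetical and not part of the repository.

```python
ALLOWED_SECTIONS = ("Added", "Changed", "Deprecated", "Removed", "Fixed", "Security")


def unreleased_entries(changelog: str) -> dict[str, list[str]]:
    """Collect bullet entries under ``## [Unreleased]``, grouped by section."""
    entries: dict[str, list[str]] = {}
    in_unreleased = False
    section = None
    for raw_line in changelog.splitlines():
        line = raw_line.strip()
        if line.startswith("## "):
            # A release heading starts; only [Unreleased] is of interest.
            in_unreleased = line == "## [Unreleased]"
            section = None
        elif in_unreleased and line.startswith("### "):
            section = line[4:].strip()
            if section not in ALLOWED_SECTIONS:
                raise ValueError(f"unknown changelog section: {section!r}")
            entries.setdefault(section, [])
        elif in_unreleased and section is not None and line.startswith("- "):
            entries[section].append(line[2:].strip())
    return entries
```

An empty result for the current CHANGELOG.md text would suggest that the PR has not added its entry yet.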
Code Style
Docstrings
Follow Google Style Guide:
def example_function(arg1: str, arg2: int) -> bool:
    """Short one-line summary.

    Optional longer description if needed.

    Args:
        arg1: Description of arg1.
        arg2: Description of arg2.

    Returns:
        Description of return value.

    Raises:
        ValueError: When something goes wrong.
    """
    ...
Notes:
- Use type hints (don't repeat them in docstrings)
- One-line summary first
- Use double backticks for literals: ``MyString``
attrs Classes
from attrs import define, field


@define
class DataConfig:
    """Configuration for data processing."""

    # field() is needed so that n_samples exists as a name for the decorator below.
    n_samples: int = field()
    """Number of samples in the dataset."""

    n_features: int
    """Number of features in the dataset."""

    @n_samples.validator
    def _validate_n_samples(self, attribute, value):
        """Validate n_samples is positive."""
        if value <= 0:
            raise ValueError("n_samples must be positive")
Conventions:
- Attribute docstrings below the declaration
- Blank line between attributes
- Name validators: _validate_<attribute_name>
- Name defaults: _default_<attribute_name>
Package Structure
octopus/
├── data/ # Data handling and validation
├── models/ # Model definitions and wrappers
├── modules/ # Feature selection and optimization
├── metrics/ # Performance metrics
└── config/ # Configuration management
When adding new functionality:
- Follow existing package structure
- Import public APIs into high-level namespaces
- Consider small dataset optimization (<1k samples)
Pull Request Guidelines
Main branch stability:
- All commits on the main branch should be stable
- PRs are squash-merged or rebased to maintain clean history
Commit organization:
- Multiple commits in a PR are allowed
- Should be consolidated into a reasonable contribution
- Each commit should be logical and buildable
- Avoid WIP commits, fixups, or "oops" commits
Good PR structure:
✓ feat/123_add-feature
- Add core functionality
- Add tests
- Update documentation
✗ feat/123_add-feature
- WIP initial try
- fix typo
- oops forgot file
- actually fix it
- revert previous
Tips:
- Squash small fixups before submitting
- Use interactive rebase to clean up history: git rebase -i main
- Each commit should pass tests
- Maintainers may squash-merge if needed
Syncing Your PR
If the main branch has moved ahead:
# Fetch latest changes
git fetch upstream
# Rebase (recommended for clean history)
git rebase upstream/main
# Or merge (if rebase is too complex)
git merge upstream/main
Note: We prefer rebase for linear history, but may squash-merge your PR if needed.
Developer Tools
| Tool | Purpose |
|---|---|
| uv | Dependency management |
| ruff | Linting and formatting |
| pydoclint | Docstring checking |
| pyupgrade | Python syntax upgrading |
| typos | Spell checking |
| pytest | Testing |
| pytest-cov | Test coverage |
| pre-commit | Git hooks orchestration |
All tools run automatically via pre-commit hooks and CI/CD.
Questions?
- Open an issue: GitHub Issues
- Contact maintainers: CONTRIBUTORS.md
Thank you for contributing to Octopus!