
Releases: cachevector/hashprep

v0.1.0b3

14 Mar 04:22


v0.1.0b3 Pre-release

HashPrep v0.1.0b3

New features

  • Config file loading (#69): Load analysis settings from YAML, TOML, or JSON via --config. Supports runtime threshold overrides (e.g. outlier, missingness, correlation) so you can tune checks without code changes.
  • Mutual information and Shannon entropy (#68): New checks and summaries for feature-target and feature-feature mutual information, plus Shannon entropy for categorical columns. Helps spot low-information or redundant features.
  • Normality and variance homogeneity tests (#67): Built-in normality tests (e.g. Shapiro-Wilk) and variance homogeneity tests (e.g. Levene) for numeric columns. Surfaces non-normal or heteroscedastic variables that may need transforms.
  • First-class DateTime support (#66): Proper handling of datetime columns: inference, summaries, and checks (e.g. future dates, skew). Datetime columns are no longer treated as plain text.
  • Edge-case tests and CI (#64): Broader test coverage for correlation, leakage, and other edge cases, plus GitHub Actions CI so regressions are caught automatically.
  • Website UI and docs (#59): Updated hashprep.com with clearer UI and documentation (installation, CLI, Python API, checks).
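
To illustrate what the entropy and mutual information checks measure, here is a minimal standard-library sketch. It is not HashPrep's implementation; the function names and sample data are made up for illustration. It uses the identity I(X; Y) = H(X) + H(Y) - H(X, Y) on categorical values.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    counts = Counter(values)
    total = len(values)
    # Sum of p * log2(1 / p) over the observed categories.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    # I(X; Y) = H(X) + H(Y) - H(X, Y), with H(X, Y) taken over value pairs.
    joint = list(zip(xs, ys))
    return shannon_entropy(xs) + shannon_entropy(ys) - shannon_entropy(joint)


colour = ["red", "red", "blue", "blue"]
duplicate = list(colour)           # fully redundant with `colour`
constant = ["a", "a", "a", "a"]    # a column that carries no information

print(shannon_entropy(colour))                # 1.0
print(shannon_entropy(constant))              # 0.0
print(mutual_information(colour, duplicate))  # 1.0
```

A column with entropy near zero is a low-information candidate, and a pair of features whose mutual information approaches the entropy of either one is likely redundant.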

Fixes

  • PDF reports in limited environments: PDF generation is now optional. If WeasyPrint or its system libraries (e.g. libgobject) are missing, MD/JSON/HTML output still works, and the CLI reports a clear error for --format pdf instead of crashing.
  • Docs page light mode (#70): Fixed syntax highlighting on the docs site in light theme (contrast and colors) so code blocks are readable.
  • Mobile menu and routing (#60): Fixed mobile menu behavior, responsiveness, and routing issues on the website.
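
The optional-PDF behavior follows a common pattern: try to load the backend, and fail with a clear message only when PDF output is actually requested. The sketch below is an assumption about how such a guard can look, not HashPrep's code. `load_pdf_backend`, `render`, and the `importer` parameter are hypothetical names, and the non-PDF branch is a placeholder for the real renderers.

```python
import importlib


def load_pdf_backend(importer=importlib.import_module):
    """Return the weasyprint module, or None if it cannot be loaded."""
    try:
        return importer("weasyprint")
    except (ImportError, OSError):
        # ImportError: the package is not installed.
        # OSError: the package is installed but a system library failed to load.
        return None


def render(report_text, fmt, importer=importlib.import_module):
    """Render a report; only the 'pdf' format needs the optional backend."""
    if fmt in ("md", "json", "html"):
        return report_text  # placeholder for the real text renderers
    if fmt == "pdf":
        backend = load_pdf_backend(importer)
        if backend is None:
            raise RuntimeError(
                "PDF output is unavailable: WeasyPrint or its system "
                "libraries could not be loaded. Use md, json, or html instead."
            )
        # With no target argument, write_pdf() returns the PDF as bytes.
        return backend.HTML(string=report_text).write_pdf()
    raise ValueError(f"unknown format: {fmt!r}")
```

Passing the importer as a parameter keeps the failure path testable on machines where WeasyPrint is installed.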

Refactors and quality

  • Renderers and text perf (#65): Deduplicated report renderers and optimized text output.
  • Structured logging and error handling (#63): More consistent logging and validation so invalid inputs and errors are clearer.
  • Config and codebase (#61): Centralized config handling and removal of dead code.

v0.1.0b1 - Beta Release

09 Feb 15:22
69798e2


Pre-release

HashPrep v0.1.0b1 - Beta Release

This release marks HashPrep's graduation from alpha to beta status.

What's New

HashPrep is now feature-complete and ready for broader community testing. Core features are stable and the API is mature enough for real-world ML workflows.

Highlights

  • 82 passing tests with comprehensive coverage across all features
  • Stable APIs for both CLI and library usage
  • Complete documentation with installation and usage guides
  • Multiple report formats (HTML, PDF, Markdown, JSON)
  • Production-ready code generation (fix scripts and sklearn pipelines)

Installation

pip install hashprep

Key Features

  • Intelligent dataset profiling with ML-specific checks
  • Automated data quality issue detection
  • Context-aware preprocessing suggestions
  • Rich report generation with modern themes
  • Reproducible pipeline code generation

Documentation

See the README for complete usage instructions.

What Beta Means

  • Core features are stable and tested
  • APIs should remain stable (breaking changes will trigger a major version bump)
  • Ready for community testing and feedback
  • Minor bugs and edge cases may still exist

We encourage users to test HashPrep in their ML workflows and report any issues on GitHub.

v0.1.0a1

02 Oct 19:31


v0.1.0a1 Pre-release

Improved correlation checks and reduced false positives in missing patterns

Improvements

  • Refined correlation checks in calculate_correlations
    • Fixed type inference errors by iterating over analyzer.column_types instead of analyzer.df
    • Updated mixed-variable thresholds to {'warning': 0.5, 'critical': 0.8} for consistency with Cramer’s V
    • Ensured seamless integration with run_checks
  • Reduced over-flagging in missing patterns detection
    • Introduced effect size thresholds:
      • Categorical: Cramer’s V > 0.1
      • Numeric: Cohen’s d > 0.2
    • Tightened p-value threshold to 0.01
    • Increased minimum samples per group to 10
    • Replaced ANOVA (f_oneway) with Mann-Whitney U test for better handling of skewed distributions
    • Added pattern grouping to summarize correlations per missing column (top 3 shown for conciseness)
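
As a rough illustration of how the numeric thresholds above combine, the sketch below gates a missing-pattern warning on group size, p-value, and Cohen's d. It is an assumption-laden example rather than HashPrep's code: `cohens_d` and `should_flag` are hypothetical names, the comparison directions (p below 0.01, at least 10 samples per group) are inferred from the notes, and the p-value is expected to come from a test run elsewhere, such as Mann-Whitney U.

```python
import math
import statistics

# Thresholds taken from the release notes above.
P_VALUE_MAX = 0.01
MIN_GROUP_SIZE = 10
COHENS_D_MIN = 0.2


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled_var == 0:
        # Both groups are constant: no effect if equal, unbounded otherwise.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var)


def should_flag(when_missing, when_present, p_value):
    """Hypothetical gate: flag only significant, non-trivial differences."""
    if min(len(when_missing), len(when_present)) < MIN_GROUP_SIZE:
        return False  # too few samples in one group to trust the test
    if p_value >= P_VALUE_MAX:
        return False  # not significant at the tightened threshold
    return abs(cohens_d(when_missing, when_present)) > COHENS_D_MIN
```

Requiring both a small p-value and a minimum effect size is what filters out the weak associations that large datasets would otherwise flag as significant.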

Fixes

  • Corrected correlation dictionary iteration (analyzer.column_types)
  • Prevented spurious warnings by filtering weak associations

v0.1.0a0

27 Sep 19:24
91b2ae2


v0.1.0a0 Pre-release

First alpha release of HashPrep