14 Mar 04:22

maskedsyntax

c79509b

v0.1.0b3 Pre-release

HashPrep v0.1.0b3

New features

Config file loading (#69): Load analysis settings from YAML, TOML, or JSON via --config. Supports runtime threshold overrides (e.g. outlier, missingness, correlation) so you can tune checks without code changes.
Mutual information and Shannon entropy (#68): New checks and summaries for feature-target and feature-feature mutual information, plus Shannon entropy for categorical columns. Helps spot low-information or redundant features.
Normality and variance homogeneity tests (#67): Built-in normality tests (e.g. Shapiro-Wilk) and variance homogeneity tests (e.g. Levene) for numeric columns. Surfaces non-normal or heteroscedastic variables that may need transforms.
First-class DateTime support (#66): Proper handling of datetime columns: inference, summaries, and checks (e.g. future dates, skew). Datetime columns are no longer treated as plain text.
Edge-case tests and CI (#64): Broader test coverage for correlation, leakage, and other edge cases, plus GitHub Actions CI so regressions are caught automatically.
Website UI and docs (#59): Updated hashprep.com with clearer UI and documentation (installation, CLI, Python API, checks).
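
To make the mutual information and Shannon entropy item above more concrete, here is a minimal standard-library sketch of both quantities for categorical data. It is not HashPrep's implementation; the function names `shannon_entropy` and `mutual_information` are illustrative assumptions, and both results are in bits.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    # H(X) = -sum p(x) * log2 p(x); an empty or constant column gives 0.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length categorical sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint_counts = Counter(zip(xs, ys))
    # I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) ),
    # rewritten in terms of raw counts: p(x,y)/(p(x)p(y)) = c*n / (cx*cy).
    return sum(
        (c / n) * math.log2(c * n / (x_counts[x] * y_counts[y]))
        for (x, y), c in joint_counts.items()
    )
```

A column with near-zero entropy carries almost no information, and two features whose mutual information approaches the entropy of either one are largely redundant.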

Fixes

PDF reports in limited environments: PDF generation is now optional. If WeasyPrint or its system libraries (e.g. libgobject) are missing, Markdown, JSON, and HTML output still work, and the CLI reports a clear error for --format pdf instead of crashing.
Docs page light mode (#70): Fixed syntax highlighting contrast and colors on the docs site in the light theme so code blocks are readable.
Mobile menu and routing (#60): Fixed mobile menu behavior, responsiveness, and routing issues on the website.
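
The optional-PDF behavior described in the first fix above follows a common pattern: attempt the backend import lazily and fall back to a clear error. The sketch below illustrates that pattern only. `PdfUnavailableError`, `load_pdf_backend`, and `render_report` are hypothetical names, not HashPrep's actual API, and the `importer` parameter exists purely so the fallback path can be exercised without uninstalling anything.

```python
import importlib


class PdfUnavailableError(RuntimeError):
    """Raised when PDF output is requested but the backend cannot be loaded."""


def load_pdf_backend(importer=importlib.import_module):
    """Return the weasyprint module, or None if it cannot be imported.

    ImportError covers a missing Python package; OSError covers missing
    system libraries that the package tries to load at import time.
    """
    try:
        return importer("weasyprint")
    except (ImportError, OSError):
        return None


def render_report(html, fmt, importer=importlib.import_module):
    """Hypothetical renderer: HTML passes through, PDF needs the backend."""
    if fmt == "html":
        return html
    if fmt == "pdf":
        backend = load_pdf_backend(importer)
        if backend is None:
            raise PdfUnavailableError(
                "PDF output requires WeasyPrint and its system libraries; "
                "use --format html, md, or json instead."
            )
        # With no target given, write_pdf() returns the PDF as bytes.
        return backend.HTML(string=html).write_pdf()
    raise ValueError(f"unsupported format: {fmt}")
```

Because the import happens only when PDF is requested, the other formats never touch the optional dependency.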

Refactors and quality

Renderers and text perf (#65): Deduplicated report renderers and optimized text output.
Structured logging and error handling (#63): More consistent logging and validation so invalid inputs and errors are clearer.
Config and codebase (#61): Centralized config handling and removal of dead code.
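
As a rough illustration of how file-based configuration with threshold overrides (the --config feature in this release) might be centralized, the sketch below loads a JSON or TOML file and merges overrides onto defaults. Everything here is an assumption for illustration: the `thresholds` key, the default values, and the function names are hypothetical, and HashPrep's real schema may differ. YAML is omitted because it needs a third-party parser.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real values and key names are not stated in these notes.
DEFAULT_THRESHOLDS = {"outlier": 3.0, "missingness": 0.5, "correlation": 0.8}


def load_config(path):
    """Parse a config file into a dict, choosing the parser by file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if suffix == ".toml":
        import tomllib  # standard library from Python 3.11 onward

        return tomllib.loads(path.read_text(encoding="utf-8"))
    raise ValueError(f"unsupported config format: {suffix!r}")


def resolve_thresholds(config):
    """Merge overrides from the (assumed) 'thresholds' section onto the defaults."""
    overrides = config.get("thresholds", {})
    unknown = set(overrides) - set(DEFAULT_THRESHOLDS)
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise KeyError(f"unknown threshold(s): {sorted(unknown)}")
    return {**DEFAULT_THRESHOLDS, **overrides}
```

Keeping the merge in one function means every check reads thresholds from the same resolved dictionary rather than from scattered constants.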

09 Feb 15:22

maskedsyntax

v0.1.0b1

69798e2

v0.1.0b1 - Beta Release Pre-release

HashPrep v0.1.0b1 - Beta Release

This release marks HashPrep's graduation from alpha to beta status.

What's New

HashPrep is now feature-complete and ready for broader community testing. Core features are stable and the API is mature enough for real-world ML workflows.

Highlights

82 passing tests with comprehensive coverage across all features
Stable APIs for both CLI and library usage
Complete documentation with installation and usage guides
Multiple report formats (HTML, PDF, Markdown, JSON)
Production-ready code generation (fix scripts and sklearn pipelines)

Installation

pip install hashprep

Key Features

Intelligent dataset profiling with ML-specific checks
Automated data quality issue detection
Context-aware preprocessing suggestions
Rich report generation with modern themes
Reproducible pipeline code generation

Documentation

See the README for complete usage instructions.

What Beta Means

Core features are stable and tested
APIs should remain stable (breaking changes will trigger a major version bump)
Ready for community testing and feedback
Minor bugs and edge cases may still exist

We encourage users to test HashPrep in their ML workflows and report any issues on GitHub.

02 Oct 19:31

maskedsyntax

v0.1.0a1

9d698d6

v0.1.0a1 Pre-release

Improved correlation checks and reduced false positives in missing patterns

Improvements

Refined correlation checks in calculate_correlations
- Fixed type inference errors by iterating over analyzer.column_types instead of analyzer.df
- Updated mixed-variable thresholds to {'warning': 0.5, 'critical': 0.8} for consistency with Cramer’s V
- Ensured seamless integration with run_checks
Reduced over-flagging in missing patterns detection
- Introduced effect size thresholds:
  - Categorical: Cramer’s V > 0.1
  - Numeric: Cohen’s d > 0.2
- Tightened p-value threshold to 0.01
- Increased minimum samples per group to 10
- Replaced ANOVA (f_oneway) with Mann-Whitney U test for better handling of skewed distributions
- Added pattern grouping to summarize correlations per missing column (top 3 shown for conciseness)
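
The numeric half of the missing-pattern filter above can be sketched as follows: a pattern is reported only if both groups have at least 10 samples, the Mann-Whitney U p-value is below 0.01, and Cohen's d exceeds 0.2. This is a standard-library approximation, not the project's code. The function names are hypothetical, the p-value uses the tie-corrected normal approximation (a library routine such as SciPy's may give slightly different values), and treating "minimum samples per group" as an inclusive bound is an assumption.

```python
import math
from statistics import mean, variance

MIN_GROUP_SIZE = 10       # minimum samples per group
P_THRESHOLD = 0.01        # significance threshold
COHENS_D_THRESHOLD = 0.2  # effect-size threshold for numeric columns


def cohens_d(a, b):
    """Absolute Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = abs(mean(a) - mean(b))
    if pooled_var == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / math.sqrt(pooled_var)


def mann_whitney_p(a, b):
    """Two-sided p-value via the tie-corrected normal approximation."""
    na, nb = len(a), len(b)
    n = na + nb
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        ties = j - i
        tie_term += ties**3 - ties
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - na * (na + 1) / 2
    sigma2 = na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if sigma2 == 0:
        return 1.0  # all values identical: no evidence of a difference
    z = (u_a - na * nb / 2) / math.sqrt(sigma2)
    return math.erfc(abs(z) / math.sqrt(2))


def is_meaningful_numeric_pattern(when_present, when_missing):
    """Both arguments hold another column's values, split by target-column missingness."""
    if min(len(when_present), len(when_missing)) < MIN_GROUP_SIZE:
        return False
    return (
        mann_whitney_p(when_present, when_missing) < P_THRESHOLD
        and cohens_d(when_present, when_missing) > COHENS_D_THRESHOLD
    )
```

Requiring both a small p-value and a minimum effect size is what suppresses the weak but statistically significant associations that large datasets tend to produce.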

Fixes

Corrected correlation dictionary iteration (analyzer.column_types)
Prevented spurious warnings by filtering weak associations

27 Sep 19:24

maskedsyntax

0.1.0a0

91b2ae2

v0.1.0a0 Pre-release

First alpha release of HashPrep

Releases: cachevector/hashprep