clusteroversampling 0.6.0
cluster-over-sampling
Introduction
A general interface for clustering-based over-sampling algorithms.
Installation
For user installation, cluster-over-sampling is available on PyPI, and you can
install it via pip:
pip install cluster-over-sampling
Development installation requires cloning the repository and then using PDM to install the
project as well as the main and development dependencies:
git clone https://github.com/georgedouzas/cluster-over-sampling.git
cd cluster-over-sampling
pdm install
The SOM clusterer requires optional dependencies, which can be installed as an extra:
pip install cluster-over-sampling[som]
Usage
All the classes included in cluster-over-sampling follow the imbalanced-learn API, using the functionality of the base
oversampler. Following the scikit-learn convention, the data are represented as follows:
Input data X: 2D array-like or sparse matrix.
Targets y: 1D array-like.
The clustering-based oversamplers implement a fit method to learn from X and y:
clustering_based_oversampler.fit(X, y)
They also implement a fit_resample method to resample X and y:
X_resampled, y_resampled = clustering_based_oversampler.fit_resample(X, y)
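To make the fit / fit_resample contract above concrete, here is a minimal, dependency-free sketch. ToyOverSampler is a hypothetical stand-in written only for illustration: it balances the classes by duplicating randomly chosen minority rows, it does not cluster the data, and it is not part of cluster-over-sampling. The sampling_strategy_ attribute and the random_state parameter are likewise assumptions made for this example.

```python
import random
from collections import Counter


class ToyOverSampler:
    """Hypothetical stand-in that mimics the fit / fit_resample contract.

    It duplicates randomly chosen rows of the under-represented classes.
    It performs no clustering and is not cluster-over-sampling's algorithm.
    """

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit(self, X, y):
        # Learn how many extra samples each class needs to match the majority.
        if len(X) != len(y):
            raise ValueError("X and y must contain the same number of samples")
        counts = Counter(y)
        majority = max(counts.values())
        self.sampling_strategy_ = {
            label: majority - n for label, n in counts.items()
        }
        return self

    def fit_resample(self, X, y):
        # Fit first, then append duplicated rows until the classes are balanced.
        self.fit(X, y)
        rng = random.Random(self.random_state)
        X_resampled = [list(row) for row in X]
        y_resampled = list(y)
        for label, n_extra in self.sampling_strategy_.items():
            rows = [row for row, target in zip(X, y) if target == label]
            for _ in range(n_extra):
                X_resampled.append(list(rng.choice(rows)))
                y_resampled.append(label)
        return X_resampled, y_resampled


# Input data X: 2D array-like (six samples, two features).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [5.0, 5.1], [5.2, 4.9]]
# Targets y: 1D array-like (four samples of class 0, two of class 1).
y = [0, 0, 0, 0, 1, 1]

toy_oversampler = ToyOverSampler(random_state=0)
X_resampled, y_resampled = toy_oversampler.fit_resample(X, y)

# The original rows come first, followed by the duplicated minority rows.
print(Counter(y_resampled))
```

After resampling, both classes contain four samples. With an actual clustering-based oversampler from cluster-over-sampling, the call pattern would be the same, with X and y typically passed as NumPy arrays or sparse matrices rather than plain lists.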
References
If you use cluster-over-sampling in a scientific publication, we would appreciate citations to any of the following papers:
G. Douzas, F. Bacao, "Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning", Expert Systems with
Applications, vol. 82, pp. 40-52, 2017.
G. Douzas, F. Bacao, F. Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and
SMOTE", Information Sciences, vol. 465, pp. 1-20, 2018.
G. Douzas, F. Bacao, F. Last, "G-SOMO: An oversampling approach based on self-organized maps and geometric SMOTE", Expert
Systems with Applications, vol. 183, 115230, 2021.