Volume 20, No 8 (2022)

DOI: 10.14704/nq.2022.20.8.NQ44603

A NOVEL PERFORMANCE ENHANCED AND OUTLIER RESISTANT HYBRIDIZED GINI_HDBSCAN DEEP CLUSTERING ALGORITHM FOR BIG DATA ANALYSIS

N. Valarmathy, Dr. Krishnaveni Sakkarapani

Abstract

HDBSCAN is a prominent density-based clustering algorithm that builds a cluster hierarchy tree and extracts flat clusters from that tree using specific stability measures. Most hierarchical clustering algorithms in use today require a huge number of computations to obtain pairwise dissimilarity measures. Such limitations can be overcome with single linkage clustering, but single linkage is highly prone to outliers and can produce extremely skewed dendrograms. To overcome these limitations, the hierarchical clustering linkage criterion known as Genie is used: it links two clusters while keeping a chosen inequity measure (Gini index or Bonferroni index) of the cluster sizes from exceeding an assigned threshold value. A potential benefit of adding the Gini index and threshold value in this hybrid approach is the ability to cluster data with variable densities. The hybrid GINI_HDBSCAN algorithm suits applications that require low minimum cluster sizes and need to avoid the large number of small clusters that otherwise appear in high-density regions. To increase speed, the proposed algorithm runs in parallel using multiple threads. Its memory overhead is small, and the distance matrix need not be precomputed to obtain the desired clustering results. The proposed algorithm is tested experimentally on an educational dataset, and the results show that the approach is efficient for clustering huge datasets across all evaluated metrics.
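
As a rough illustration of the inequity check mentioned in the abstract, the sketch below computes a normalized Gini index over cluster sizes and compares it with a threshold. It is not the authors' implementation: the function names `gini_index` and `exceeds_threshold` and the threshold value 0.3 are assumptions chosen only for illustration.

```python
from itertools import combinations


def gini_index(sizes):
    """Normalized Gini index of a list of cluster sizes.

    Returns 0.0 when all clusters have the same size and grows
    toward 1.0 as a single cluster dominates the others.
    """
    n = len(sizes)
    if n < 2:
        return 0.0
    total = sum(sizes)
    # Sum of absolute size differences over all unordered pairs.
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((n - 1) * total)


def exceeds_threshold(sizes, threshold=0.3):
    """Hypothetical helper: True when cluster-size inequity is above
    the threshold (0.3 is an illustrative value, not from the paper)."""
    return gini_index(sizes) > threshold


balanced = [25, 25, 25, 25]
skewed = [1, 1, 1, 97]
print(gini_index(balanced), exceeds_threshold(balanced))
print(gini_index(skewed), exceeds_threshold(skewed))
```

In a Genie-style linkage, a check of this kind decides whether the next merge may join any two nearest clusters or must involve one of the smallest clusters, which is what limits the skewed dendrograms that plain single linkage tends to produce.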

