Document portal of Université Mohamed Khider-Biskra, Library of the Faculty of Exact Sciences

Title:	Improving Data Mining through Dimensionality Reduction and Feature Selection
Authors:	Amani Massa, Author; Karima Femmam, Thesis supervisor; Bilal Mokhtari, Thesis supervisor
Publisher:	Biskra [Algeria]: Faculté des Sciences Exactes et des Sciences de la Nature et de la Vie, Université Mohamed Khider, 2024
Format:	1 vol. (84 p.) / color illustrated cover
General note:
Languages:	French
Keywords:	Feature extraction and selection, Data mining, Correlation measures, Auto-encoders, Machine learning, Classification, High-dimensional data
Abstract:	Data mining in many fields is hampered by high-dimensional datasets that contain redundant and irrelevant information. This dissertation explores combining correlation measures with autoencoders to improve dimensionality reduction while preserving essential data features. Despite extensive research, how well these techniques retain crucial information remains unclear. This work investigates the impact of applying correlation measures together with autoencoders on maintaining data integrity for improved data mining. We propose a hybrid approach that retains essential information and improves classification performance: correlation measures are used to select features, and the selected features are then used to train an autoencoder that produces the final reduced data. We evaluated the reduced data using various machine learning algorithms and correlation measures. The results have implications for the future optimization of machine learning data preprocessing and highlight the potential of combining correlation measures with autoencoders for efficient dimensionality reduction.
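The two-stage idea summarized in the abstract (select features by correlation, then compress the selected features with an autoencoder) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses Pearson correlation against the class label as the correlation measure, a linear autoencoder trained with plain gradient descent in NumPy instead of a Keras model, and synthetic data. The function names `select_by_correlation` and `train_linear_autoencoder` are hypothetical.

```python
import numpy as np


def select_by_correlation(X, y, k):
    """Indices of the k features with the largest |Pearson r| against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get a correlation of 0
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.sort(np.argsort(-np.abs(r))[:k])


def train_linear_autoencoder(X, code_dim, epochs=500, lr=0.05, seed=0):
    """Fit encoder/decoder weights by minimizing mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, code_dim))
    W_dec = rng.normal(scale=0.1, size=(code_dim, d))
    for _ in range(epochs):
        Z = X @ W_enc                # encode
        err = Z @ W_dec - X          # reconstruction residual
        grad_dec = Z.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec


# Synthetic data (an assumption for illustration): 4 label-driven features, 6 noise features.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
informative = 2.0 * y[:, None] + 0.5 * rng.normal(size=(n, 4))
noise = rng.normal(size=(n, 6))
X = np.hstack([informative, noise])

# Stage 1: keep the features most correlated with the label.
selected = select_by_correlation(X, y, k=4)
X_sel = X[:, selected]
X_std = (X_sel - X_sel.mean(axis=0)) / X_sel.std(axis=0)

# Stage 2: compress the selected features with the autoencoder.
W_enc, W_dec = train_linear_autoencoder(X_std, code_dim=2)
codes = X_std @ W_enc
mse = np.mean((codes @ W_dec - X_std) ** 2)

print("selected feature indices:", selected)
print("reduced shape:", codes.shape)
print("reconstruction MSE:", round(float(mse), 4))
```

The reduced representation `codes` is what a downstream classifier would be trained on. A nonlinear autoencoder or a different correlation measure could be swapped in without changing the overall two-stage structure.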
Contents:
List of Figures
List of Tables
General introduction
1 Dimensionality Reduction
  1.1 Introduction
  1.2 Feature extraction
    1.2.1 Linear Dimensionality Reduction Techniques
      1.2.1.1 Principal Component Analysis (PCA)
      1.2.1.2 Singular Value Decomposition (SVD)
      1.2.1.3 Linear Discriminant Analysis (LDA)
    1.2.2 Non-Linear Dimensionality Reduction Techniques
      1.2.2.1 Kernel Principal Component Analysis (KPCA)
      1.2.2.2 Multidimensional Scaling (MDS)
      1.2.2.3 Isomap
      1.2.2.4 t-Distributed Stochastic Neighbor Embedding (t-SNE)
      1.2.2.5 Uniform Manifold Approximation and Projection (UMAP)
    1.2.3 Deep Learning based dimensionality reduction techniques
      1.2.3.1 Auto-encoders
      1.2.3.2 Variational autoencoder
      1.2.3.3 Self Organizing Map
      1.2.3.4 Transformers
    1.2.4 Hybrid techniques
      1.2.4.1 PCA and Neural Networks
      1.2.4.2 tSVD and Neural Networks
  1.3 Feature selection
    1.3.1 Filter Methods
      1.3.1.1 Correlation-based Feature Selection (CFS)
      1.3.1.2 Information Gain based feature selection
    1.3.2 Wrapper methods
      1.3.2.1 Backward elimination
      1.3.2.2 Forward Selection
      1.3.2.3 Recursive Feature Elimination (RFE)
      1.3.2.4 Stepwise Selection
      1.3.2.5 Support Vector Machine (SVM)
      1.3.2.6 Particle Swarm Optimization (PSO)
    1.3.3 Embedded methods
      1.3.3.1 LASSO
      1.3.3.2 Random Forest
      1.3.3.3 Extra Trees
  1.4 Conclusion
2 Dimensionality reduction using correlation and Auto-encoders
  2.1 Introduction
  2.2 Correlation metrics
    2.2.1 Types of correlation metrics
  2.3 Proposed Methodology
  2.4 Detailed architecture
    2.4.1 Dataset description
    2.4.2 Data preprocessing
    2.4.3 Splitting dataset
      2.4.3.1 Training set
      2.4.3.2 Testing set
    2.4.4 Dimensionality reduction using correlation and Auto-encoders
    2.4.5 Classification using machine learning algorithms
      2.4.5.1 Logistic Regression
      2.4.5.2 Random forest
      2.4.5.3 Support vector machine
      2.4.5.4 Artificial neural networks
  2.5 Conclusion
3 Implementation and results
  3.1 Introduction
  3.2 Implementation frameworks and tools
    3.2.1 Python
    3.2.2 Kaggle
    3.2.3 Keras
    3.2.4 NumPy
    3.2.5 SciPy
  3.3 Evaluation metrics
    3.3.1 Confusion Matrix
    3.3.2 Accuracy
    3.3.3 Precision
    3.3.4 Recall
    3.3.5 F1-score
  3.4 Implementation phases
    3.4.1 Loading and preprocessing the datasets
      3.4.1.1 Handling categorical variables
      3.4.1.2 Handling missing values
      3.4.1.3 Data normalisation
      3.4.1.4 Data splitting
    3.4.2 Building correlation and Auto-encoders model
      3.4.2.1 Correlation measures
      3.4.2.2 Feature selection using correlation
      3.4.2.3 Auto-encoders architecture
    3.4.3 Training and evaluation
    3.4.4 Experimental results
      3.4.4.1 Parameters initialization for the proposed implementation
      3.4.4.2 Dimensionality reduction
      3.4.4.3 Reconstruction error based on correlation
      3.4.4.4 Classification accuracy
      3.4.4.5 Confusion matrix comparisons
  3.5 Conclusion
General conclusion
Bibliography
Document type:	Master's thesis

Availability (1)

Call number	Medium	Location	Status
MINF/882	Master's thesis	Exact Sciences library	Available for consultation

Library contact details

Document portal of Université Mohamed Khider Biskra

7000 Biskra
Algeria
033 54 32 99
