Title: | Recognizing Visual Object Using Machine Learning Techniques |
Authors: | Aicha Korichi, Author; Sihem Slatnia, Thesis supervisor |
Document type: | Doctoral thesis |
Publisher: | Biskra [Algeria]: Faculté des Sciences Exactes et des Sciences de la Nature et de la Vie, Université Mohamed Khider, 2022 |
Format: | 1 vol. (91 p.) / colour-illustrated cover / 30 cm |
Language: | English |
Keywords: | Visual object recognition, Machine learning, Pattern recognition, Deep learning, Ear recognition, Arabic handwriting recognition |
Abstract: |
Nowadays, Visual Object Recognition (VOR) has received growing interest from researchers and has become a very active area of research due to its vital applications, including handwriting recognition, disease classification, and face identification. However, extracting relevant features that faithfully describe the image remains the main challenge for most existing VOR systems. This thesis is dedicated to the development of two VOR systems, presented as two distinct contributions. As a first contribution, we propose a novel Generic Feature-Independent Pyramid Multi-Level (GFIPML) model for extracting features from images. GFIPML addresses the shortcomings of two existing schemes, namely multi-level (ML) and pyramid multi-level (PML), while retaining their advantages. As its name indicates, the proposed model can be used with any of the large variety of existing feature extraction methods. We apply GFIPML to the task of Arabic literal amount recognition, a task that is challenging due to the specific characteristics of Arabic handwriting. While most works in the literature have relied on structural features, which are sensitive to word deformations, we opt for Local Phase Quantization (LPQ) and Binarized Statistical Image Features (BSIF), since Arabic handwriting can be treated as a texture. To further improve recognition rates, we consider a multimodal system based on the combination of LPQ with multiple BSIF descriptors, each with a different filter size. As a second contribution, we propose a novel, simple yet efficient and fast TR-ICANet model for extracting features from unconstrained ear images. To cope with unconstrained conditions (e.g., scale and pose variations), we first normalize all images using a CNN. The normalized images are then fed to the TR-ICANet model, which uses Independent Component Analysis (ICA) to learn filters. Binary hashing and block-wise histogramming are then used to compute local features. At the final stage of TR-ICANet, we apply an effective normalization method, namely Tied Rank (TR) normalization, to eliminate the disparity within block-wise feature vectors. Furthermore, to improve the identification performance of the proposed system, we perform a softmax-average fusion of CNN-based feature extraction approaches with our proposed TR-ICANet at the decision level, using an SVM classifier. |
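The following Python sketches illustrate the two feature-extraction ideas summarized in the abstract. They are reconstructions based only on this summary, not the thesis code: the per-level grid scheme, the plain intensity histogram standing in for LPQ/BSIF, the zero threshold used for binary hashing, and the exact form of the Tied Rank normalization are all assumptions.

```python
import numpy as np

def pml_features(image, descriptor, levels=3):
    """Pyramid multi-level descriptor: a minimal sketch assuming that
    level l partitions the image into an l x l grid of blocks and that
    the per-block descriptors are simply concatenated."""
    h, w = image.shape
    feats = []
    for level in range(1, levels + 1):
        bh, bw = h // level, w // level
        for i in range(level):
            for j in range(level):
                block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                feats.append(descriptor(block))
    return np.concatenate(feats)

# Stand-in descriptor: a 16-bin intensity histogram (the thesis uses LPQ and
# BSIF, which are not available in standard Python libraries).
def gray_histogram(block, bins=16):
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

# Example usage on a synthetic 64x64 grayscale image.
img = np.random.randint(0, 256, (64, 64)).astype(np.float64)
vec = pml_features(img, gray_histogram, levels=3)
```

A second sketch, with the same caveats, shows the TR-ICANet feature stage described above: binary hashing of the learned-filter responses, block-wise histogramming, and a tied-rank normalization approximated here with SciPy's average-rank function.

```python
import numpy as np
from scipy.stats import rankdata

def tr_icanet_features(response_maps, block_size=8):
    """Given the response maps of L learned filters for one image, compute
    binary-hashed, block-wise histogram features with tied-rank normalization."""
    L = len(response_maps)
    # Binary hashing: threshold each response map at zero (assumed threshold)
    # and pack the L bits into an integer code map with values in [0, 2**L).
    codes = sum((r > 0).astype(np.int64) << k for k, r in enumerate(response_maps))
    h, w = codes.shape
    feats = []
    for i in range(0, h - block_size + 1, block_size):
        for j in range(0, w - block_size + 1, block_size):
            block = codes[i:i + block_size, j:j + block_size]
            hist, _ = np.histogram(block, bins=2 ** L, range=(0, 2 ** L))
            # Tied Rank normalization (assumed form): replace histogram counts
            # by their average ranks so block-wise scales become comparable.
            feats.append(rankdata(hist, method='average'))
    return np.concatenate(feats)
```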
Contents: |
Abstract
List of Figures
List of Tables
Abbreviations
1 General Introduction
  1.1 Introduction
  1.2 Problematic
    1.2.1 Why Arabic Handwriting Recognition?
    1.2.2 Why Unconstrained Ear Recognition?
  1.3 Overview On The Related Work
  1.4 Motivation
  1.5 Contributions
  1.6 Thesis Structure
2 General Background: Machine Learning and Visual Object Recognition
  2.1 Introduction
  2.2 Learning Paradigms
    2.2.1 Supervised Learning
    2.2.2 Unsupervised Learning
    2.2.3 Semi-supervised Learning
  2.3 Recognition Systems
    2.3.1 Preprocessing
      2.3.1.1 Histogram Normalization
      2.3.1.2 Gaussian Smoothing
    2.3.2 Feature Extraction
      2.3.2.1 Texture-based Techniques For Feature Extraction
      2.3.2.2 Geometrical-based Techniques For Feature Extraction
      2.3.2.3 Deep Learning-based Techniques For Feature Extraction
    2.3.3 Feature Extraction Schemes
      2.3.3.1 Multi-Level (ML)
      2.3.3.2 Pyramid Multi-Level (PML)
    2.3.4 Classification
      2.3.4.1 K-Nearest Neighbors Classifier
      2.3.4.2 Naive Bayes Classifier
      2.3.4.3 Support Vector Machine (SVM)
      2.3.4.4 Linear Discriminant Analysis (LDA)
  2.4 Multi-modal Systems Based On Classifier Combination Schemes
  2.5 System Evaluation Metrics
    2.5.1 Recognition/Identification Rate (Accuracy)
    2.5.2 False Positive Rate (FPR)
    2.5.3 False Negative Rate (FNR)
    2.5.4 Sensitivity and Specificity
    2.5.5 Precision
    2.5.6 Statistical Significance Tests For Model Comparison
    2.5.7 Curve-based Methods For System Performance Evaluation
      2.5.7.1 Receiver Operating Characteristic (ROC)
  2.6 Conclusion
3 State Of The Art Methods
  3.1 Introduction
  3.2 Related Work For Arabic Handwriting Recognition
    3.2.1 Arabic Handwriting Literal Amount Recognition Based On Structural Features
    3.2.2 Arabic Handwriting Literal Amount Recognition Based On Statistical Features
    3.2.3 Arabic Handwriting Literal Amount Recognition Based On Hybrid Features
    3.2.4 Limitations Of Existing Works
  3.3 Related Work For Ear Recognition
    3.3.1 Texture-based Techniques For Ear Recognition
    3.3.2 Geometrical and Holistic-based Techniques For Ear Recognition
    3.3.3 Deep-Learning Techniques For Ear Recognition
    3.3.4 Hybrid-based Techniques For Ear Recognition
  3.4 Conclusion
4 GFIPML & TR-ICANet Models For Arabic Handwriting Recognition and Unconstrained Ear Recognition
  4.1 Introduction
  4.2 Contribution No. 1: A Generic Feature-Independent Pyramid Multi-Level (GFIPML) Model For Arabic Handwriting Recognition
    4.2.1 Limitations Of Multi-Level and Pyramid Multi-Level Representations
    4.2.2 Generic Feature-Independent Pyramid Multi-Level (GFIPML) Model
    4.2.3 The Proposed System For Arabic Handwriting Recognition
  4.3 Contribution No. 2: TR-ICANet: A Fast Unsupervised Deep-Learning-Based Scheme For Unconstrained Ear Recognition
    4.3.1 ICANet Network For Filter Learning and Feature Extraction
      4.3.1.1 Filter Bank Learning
      4.3.1.2 Binary Hashing and Block-Wise Histogramming
      4.3.1.3 Tied Rank (TR) Normalization
    4.3.2 The Proposed System For Unconstrained Ear Recognition
      4.3.2.1 CNN-based Image Preprocessing
      4.3.2.2 Feature Extraction and Classification
    4.3.3 Multimodal Scheme For Human Ear Identification
  4.4 Conclusion
5 Experimental Results and Discussion
  5.1 Introduction
  5.2 Experimental Results and Discussion
    5.2.1 Databases
      5.2.1.1 AHDB Database For Arabic Literal Amount Recognition
      5.2.1.2 AWE Database For Unconstrained Ear Recognition
    5.2.2 Experimental Results For The First Contribution: A Generic Feature-Independent Pyramid Multi-Level (GFIPML) Model For Arabic Handwriting Recognition
      5.2.2.1 Multi-Level (ML) Experiments
      5.2.2.2 Pyramid Multi-Level (PML) Experiments
      5.2.2.3 Generic Feature-Independent Pyramid Multi-Level (GFIPML) Model Experiments
      5.2.2.4 Multi-modal System Results
      5.2.2.5 Comparison With State Of The Art
    5.2.3 Experimental Results For The Second Contribution: TR-ICANet: A Fast Unsupervised Deep-Learning-Based Scheme For Unconstrained Ear Recognition
      5.2.3.1 Data Augmentation
      5.2.3.2 ICANet Parameter Tuning
      5.2.3.3 Comparison With PCANet and Deep-Learning Models
      5.2.3.4 Multimodal System Results
      5.2.3.5 Comparison With State Of The Art
  5.3 Conclusion
6 Conclusions, Perspectives, and Future Directions
A Personal Contributions
  A.1 Publications
  A.2 Book Chapters
  A.3 International Communications Indexed In The IEEE Xplore Database
  A.4 Non-indexed International Communications
Bibliography |
Availability (1)
Call number | Type | Location | Status |
---|---|---|---|
TINF/176 | Doctoral theses | Exact Sciences Library | Available for consultation |