Title: | Pattern recognition for healthcare analytics |
Authors: | Aboubakr Seddik Drid, Author; Salim Bitam, Thesis supervisor |
Document type: | Printed monograph |
Publisher: | Biskra [Algeria]: Faculté des Sciences Exactes et des Sciences de la Nature et de la Vie, Université Mohamed Khider, 2017 |
Format: | 1 vol. (84 p.) / 30 cm |
Languages: | English |
Abstract: |
The data explosion driven by the evolution of information technology and computing infrastructure, together with the need for greater computational capability, has given rise to a new domain: Big Data. This emerging domain makes it possible to manage and process new-generation data. However, the real challenge of Big Data is how to get the most out of the data already available. Big Data Analytics is a computational process for storing, organising and mining data using mathematics, statistics and artificial intelligence, in order to make existing data meaningful, provide accurate insight into past data, and predict what is going to happen in the future. Data Mining is one of the most widely used disciplines in the Big Data Analytics process for discovering interesting and useful patterns and relationships in large volumes of data. Healthcare is one of the sectors that can benefit most from applying Big Data Analytics tools to collected data, to improve care, save lives and lower costs. However, what does Big Data look like when it comes to healthcare, what are the opportunities and challenges of applying Big Data Analytics in this sensitive sector, and what are the potential benefits to patients? In this study, we chose heart disease as a case study because of its direct impact on human life and the high number of deaths it causes. We worked on the ECG signal because of its ability to reveal abnormalities and failures in heart activity. The data used come from two sources: the first is the e-Health platform mounted on a Raspberry Pi 3, which makes it possible to generate and record ECG signals; the second is the set of records provided by the PhysioNet website. In this thesis, we propose using features extracted and calculated from the ECG signal records to feed an adopted Random Forest model in the training and testing steps, in order to classify heartbeat types.
The results obtained are quite acceptable; however, some adjustments could be made to the way samples are selected and to the targeted features. Despite this, the trained model still shows a strong ability to classify heartbeat types. |
Contents: |
Acknowledgments
Dedication
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Abstract
Introduction
1. Big data analysis and e-health: basic concept
1.1. Introduction
1.2. Big Data and analytics
1.2.1. What is Big Data?
1.2.1.1. Multiple definition on the web!
1.2.1.2. What is not Big Data?
1.2.1.3. The V’s of Big Data
1.2.2. How Big Data relevant to multiple verticals
1.2.3. Big Data, Big Challenges!
1.2.4. Analytics
1.2.4.1. Descriptive analytics
1.2.4.2. Predictive analytics
1.2.4.3. Perspective analytics
1.2.5. Big Data (BDA) Analytics unique capabilities
1.3. Data Mining and Pattern Recognition
1.3.1. Data Mining
1.3.2. Pattern recognition
1.3.3. Knowledge Discovery from Data (KDD) process
1.3.4. Data Mining techniques and concepts
1.3.4.1. Predictive Vs descriptive modeling
1.3.4.2. Supervised Vs Unsupervised learning
1.3.4.3. Common data mining tasks
1.3.4.4. Artificial Neural Networks (ANN)
1.3.4.5. Support Vector Machine (SVM)
1.3.4.6. Decision Tree
1.3.4.7. Random forest
1.4. Healthcare and ECG
1.4.1. BDA Analytics in Healthcare
1.4.2. Healthcare sources and data types
1.4.3. Opportunities of BDA in Healthcare
1.4.4. Challenges of BDA in Healthcare
1.4.5. ECG
1.4.5.1. Definition
1.4.5.2. ECG pattern
1.4.5.3. Heart disease
1.4.5.4. Screening
1.5. Conclusion
2. State of the art
2.1. Introduction
2.2. General works on the wearable sensors of vital signs
2.3. ECG data mining in the literature
2.3.1. Preprocessing
2.3.2. R-peak and QRS complex detection
2.3.3. Feature extraction
2.3.4. Classification
2.4. Adopted method
2.5. Conclusion
3. Conception
3.1. Introduction
3.2. Global conception
3.3. Detailed conception
3.3.1. Data selection and acquisition
3.3.1.1. Data selection
3.3.1.2. Data acquisition
3.3.1.3. Data selection inputs and outputs
3.3.2. Preprocessing and features extraction
3.3.2.1. Preprocessing
3.3.2.2. Preprocessing inputs and outputs
3.3.2.3. Features extraction
3.3.2.4. Features extraction inputs and outputs
3.3.3. Features selection
3.3.3.1. Features selection inputs and outputs
3.3.4. Model training and validation
3.3.4.1. Model training
3.3.4.2. Model validation
3.3.4.3. Model precision
3.3.4.4. Model training and validation inputs/outputs
3.3.5. Similarity measurement
3.3.5.1. Similarity measurement inputs and outputs
3.4 Conclusion
4. Implementation and results discussion
4.1. Introduction
4.2. Implementation
4.2.1. Data Acquisition
4.2.1.1. e-Health (e-HP) platform
4.2.1.2. Physionet website
4.2.2. Environment of work for the rest of the project
4.2.2.1. The machine used
4.2.2.2. The development environment
4.2.3. Pre-processing and Features extraction
4.2.4. Selecting and calculating features
4.2.5. Batch processing
4.2.6. Model training
4.2.6.1. Data set used
4.2.6.2. Model used
4.2.7. Similarity measurement
4.2.7.1. Data set used
4.3. Results and discussion
4.3.1. Pre-Training
4.3.2. Tuning
4.3.3. Training and model accuracy
4.3.4. Other parameters
4.3.4.1. Number of nods for the trees
4.3.4.2. Variable importance
4.3.4.3. Predictor variables used
4.4. Conclusion
Conclusion
Bibliography |
Availability (1)
Call number | Medium | Location | Status |
---|---|---|---|
MINF/268 | Master's thesis | Exact Sciences library | Available for consultation |