About the book Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics
Book title : Machine Learning in Bioinformatics of Protein Sequences: Algorithms, Databases and Resources for Modern Protein Bioinformatics
Title in Persian : یادگیری ماشینی در بیوانفورماتیک توالی پروتئین: الگوریتم ها، پایگاه های داده و منابع برای بیوانفورماتیک پروتئین مدرن
Series :
Authors : Łukasz Kurgan
Publisher : World Scientific Publishing
Year of publication : 2022
Number of pages : 378
ISBN : 9811258570 , 9789811258572
Language : English
Format : PDF
File size : 44 MB
Table of contents :
Contents
Preface
Acknowledgments
About the Editor
Part I Machine Learning Algorithms
Chapter 1 Deep Learning Techniques for De novo Protein Structure Prediction
1. Introduction
2. Architectures of deep neural networks
2.1 Convolutional neural networks
2.2 Recurrent neural networks
2.3 Attention-based neural networks
3. Self-supervised protein sequence representation
3.1 Single-sequence-based protein sequence representation
3.2 MSA-based protein sequence representation
4. Secondary structure prediction
4.1 Neural networks used for local structure prediction
4.2 State-of-the-art SSP approaches benefit from larger data, deeper networks, and better evolutionary features
5. Contact map prediction
5.1 Neural networks used for contact map prediction
5.2 Novel strategies used in state-of-the-art CMP approaches
6. End-to-end tertiary structure prediction
7. Conclusions
References
Part II Inputs for Machine Learning Models
Chapter 2 Application of Sequence Embedding in Protein Sequence-Based Predictions
1. Introduction
2. A brief overview of language models and embeddings in Natural Language Processing
3. Protein databases facilitating language modeling
4. Adapting language models for protein sequences
4.1 ProtVec (word2vec)
4.2 UDSMProt (AWD-LSTM)
4.3 UniRep (mLSTM)
4.4 SeqVec (ELMo)
4.5 ESM-1b (Transformer)
4.6 ProtTrans (BERT)
5. Conclusions
6. Acknowledgement
References
Chapter 3 Applications of Natural Language Processing Techniques in Protein Structure and Function Prediction
1. Introduction
2. Methods for protein sequence analysis
3. Computational prediction of protein structures
3.1 Protein fold recognition
3.2 Intrinsically disordered regions/proteins identification
4. Computational prediction of protein functions
4.1 Prediction of functions of intrinsically disordered regions
4.2 Protein-nucleic acids binding prediction
4.2.1 Nucleic acid binding protein prediction
4.2.2 Nucleic acid binding residue prediction
5. Biological language models (BLM)
6. Summary and recommendations
7. Acknowledgments
References
Chapter 4 NLP-based Encoding Techniques for Prediction of Post-translational Modification Sites and Protein Functions
1. Introduction
2. NLP-based encoding techniques for protein sequence
2.1 Tokenization
2.2 Local/sparse representation of protein sequences
2.3 Distributed representation of protein sequences
2.3.1 Word embedding in proteins
2.3.2 Context-independent word embedding for protein sequence
2.3.3 Contextual word embedding for protein sequence
2.4 Variety of databases for generating pre-trained language models
3. Methods using NLP-based encoding for PTM prediction: local-level task
3.1 Local/sparse representation-based methods for PTM prediction
3.2 PTM site prediction approaches using distributed representation
3.2.1 Context-independent supervised word embedding-based PTM site-prediction approaches (aka supervised embedding layer)
3.2.2 Context-independent pre-trained word embedding-based approaches for PTM prediction
3.2.3 Methods using both context independent supervised word embedding + context independent pre-trained model
3.2.4 Methods using pre-trained contextual embedding language model (BERT-based)
4. Methods using NLP-based encoding for GO-based protein function prediction: Global-level task
4.1 Local/sparse representation-based methods for GO-based protein function prediction
4.2 Distributed representation-based methods for GO-based protein function prediction
4.2.1 Context-independent supervised word embedding-based GO prediction approaches (aka supervised embedding layer)
4.2.2 Context-independent pre-trained word embedding-based approaches for the GO prediction
4.2.3 Contextual word embedding-based protein GO prediction
4.2.4 Other approaches
5. Conclusion and discussion
References
Chapter 5 Feature-Engineering from Protein Sequences to Predict Interaction Sites Using Machine Learning
1. Introduction
2. Data labeling for the prediction of interaction sites
3. Featurization of protein sequences
4. Direct protein sequence features
4.1 One-hot/sparse encoding
4.2 Amino acid indices
4.3 Physicochemical property-based encoding
4.4 Global protein sequence features
4.5 Window size
5. Derived sequence features
6. Predicted features
7. Summary and conclusions
References
Part III Predictors of Protein Structure and Function
Chapter 6 Machine Learning Methods for Predicting Protein Contacts
1. Introduction
2. Residue contact definitions
2.1 Contact maps
3. Machine learning algorithms in contact prediction methods
3.1 Hidden Markov Models
3.2 Support Vector Machines
3.3 Random Forest Algorithms
3.4 Naïve Bayes Classifiers
3.5 Neural Networks
3.5.1 Deep neural networks
3.5.1.1 Residual convolutional neural networks
3.5.1.2 Recurrent neural network
3.5.1.3 End-to-end learning models
4. Conclusions
5. Acknowledgments
References
Chapter 7 Machine Learning for Protein Inter-Residue Interaction Prediction
1. Introduction
2. Computational methods for protein inter-residue interaction prediction
2.1 Definition of geometry terms for inter-residue interactions
2.2 Unsupervised methods for contact map prediction
2.3 Supervised methods for contact map prediction
3. Application of protein inter-residue interaction prediction
4. Discussion
5. Acknowledgment
References
Chapter 8 Machine Learning for Intrinsic Disorder Prediction
1. Introduction
2. Overview of disorder predictors
3. Disorder prediction using machine learning
4. Selected machine learning-based disorder predictors
4.1 PrDOS
4.2 MFDp
4.3 DISOPRED3
4.4 AUCpred
4.5 SPOT-Disorder2
4.6 flDPnn
5. Related resources
6. Summary
7. Funding
References
Chapter 9 Sequence-Based Predictions of Residues that Bind Proteins and Peptides
1. Introduction
2. Commonly used biological databases
2.1 Protein Data Bank (PDB)
2.2 BioLiP
3. Biological characteristics and sequence-based representation of proteins
3.1 Protein sequence-derived information
3.1.1 One-hot encoding
3.1.2 Pre-trained embedding
3.2 Evolutionary information
3.3 Predicted structural features of proteins
3.4 Amino acid physicochemical characteristics
3.5 Other features relevant to the prediction of PPI and PPepI sites
4. Performance evaluation
4.1 Experimental data preparation
4.2 Validation scheme
4.2.1 Hold-out validation
4.2.2 K-fold CV
4.2.3 Leave-one-out CV
4.3 Evaluation metrics
5. Computational methods for protein-protein interaction (PPI) site prediction
5.1 PSIVER
5.2 LORIS
5.3 DLPred
5.4 SCRIBER
5.5 DELPHI
6. Computational methods for protein-peptide interaction (PPepI) site prediction
6.1 SPRINT
6.2 PepBind
6.3 Visual
6.4 MTDsite
7. Summary
8. Acknowledgment
References
Chapter 10 Machine Learning Methods for Predicting Protein-Nucleic Acids Interactions
1. Introduction
2. Prediction of the protein-nucleic acid binding residues from sequence
2.1 Overview of the sequence-based predictors
2.2 Architectures of the sequence-based predictors
3. Summary
References
Chapter 11 Identification of Cancer Hotspot Residues and Driver Mutations Using Machine Learning
1. Introduction
2. Experimental and computational studies on cancer mutations
3. Databases of the cancer-causing mutations
4. Identification of hotspot residues
5. Methods for predicting disease-causing mutations
6. Machine learning techniques for predicting cancer-causing mutations
7. Large-scale annotation of cancer-causing mutations
8. Conclusions
9. Acknowledgements
References
Part IV Practical Resources
Chapter 12 Designing Effective Predictors of Protein Post-Translational Modifications Using iLearnPlus
1. Introduction
2. Brief review of computational PTM site prediction
3. Design of novel predictive methods using iLearnPlus
3.1 iLearnPlus
3.2 Data collection and preprocessing
3.3 Model construction and performance evaluation
3.4 Comparison with other ML algorithms
4. Summary
References
Chapter 13 Databases of Protein Structure and Function Predictions at the Amino Acid Level
1. Introduction
2. Databases of the AA-level predictions
2.1 MobiDB
2.2 D2P2
2.3 DescribePROT
2.4 Example results
3. Conclusions, impact and limitations
4. Funding
References
Index