ML and Data-Science
Raghav Gupta
Source: ML Minds
Index:
- Brief Introduction about Components of Big Data Analysis
- Introduction to Hadoop and Big Data Infrastructure
- Introduction to Data Mining
- Introduction to Machine Learning
- Introduction to Natural Language Processing
- Introduction to Information Retrieval
- Introduction to Web Mining
- Introduction to Social Network Analytics
- Introduction to IoT
- Introduction to Visualization
NumPy
- Introduction to NumPy and ndarrays
- Datatypes of ndarrays
- Arithmetic operations, Indexing, Slicing
- Boolean and fancy indexing
- Basic ndarray operations
- Array-oriented programming with arrays
- Conditional, Statistical and Boolean operations
- Sorting and set operations
- File IO with NumPy
- Linear Algebra for NumPy
- Reshaping, Concatenating and Splitting Arrays
- Broadcasting
Pandas
- Series Data Structures
- DataFrame
- Index objects
- Reindexing
- Dropping entries from an axis
- Indexing, Selection and Filtering
- Arithmetic and Data Alignment
- Operations between DataFrame and Series
- Function Application and Mapping
- Sorting and Ranking
- Axis indexes with duplicate labels
- Computing Descriptive Statistics
- pct_change(), Correlation and Covariance, Unique values, Value counts and membership
Visualization
- Introduction to Matplotlib
- Colours, Markers and line styles
- Customization of Matplotlib
- Plotting with Pandas
- Bar plots, Histograms, Density plots
- Introduction to Seaborn, Style Management
- Controlling figure aesthetics
- Colour Palettes
- Plotting univariate Distribution
- Plotting bivariate Distribution
- Visualizing pairwise relationships with pairplots
- Plotting with Categorical Data
- Visualizing Linear Relationships
- Plotting on Data-aware grids
- Other Python Visualization tools
SciPy
- Linear Algebra in SciPy
- Sparse Matrices in SciPy
- Constants, Cluster and FFT Packages
- Integration using SciPy
- Interpolation in SciPy
- SciPy I/O, SciPy ndimage
- Optimization and root finding
- SciPy.Stats
Scikit-learn
- Introduction to scikit-learn and Machine Learning
- Sample Datasets in scikit-learn
- Train/Test Split using scikit-learn
- Classification of Iris using Decision Trees
- Holdout Validation, K-fold Cross-Validation
- Cross-Validation using scikit-learn
- K-means Clustering in scikit-learn
Basic Text Mining using Python
- Introduction to the Natural Language Toolkit (NLTK)
- Tokenization, Lowercasing and removing stop words, Lemmatization, Stemming
- ngrams, Sentence tokenization, Part of speech tagging
- Chunking, Named Entity Recognition
- Introduction to WordNet, and word sense disambiguation
Projects
- Project on Word Ladders Game
- Project on Data Analysis and Prediction using the Loan Prediction Dataset
Probability
- Introduction to Probability
- Events, Sample space, Simple Probability, Joint Probability
- Mutually exclusive events, collectively exhaustive events, marginal probability
- Addition Rule
- Conditional Probability
- Multiplication Rule
- Bayes theorem
- Counting rules (caution: advanced material)
Probability Distributions
- What are probability distributions
- Poisson Probability Distribution
- Normal Probability Distribution
- Binomial Probability Distribution
CLT and Confidence Intervals
- Central Limit Theorem
- CLT Example
- CLT using R code
- Confidence Intervals of Mean
- Confidence Intervals of Mean Examples
- Confidence interval of mean in details
- Confidence interval for the mean with population standard deviation unknown
- Confidence interval using Python
- What do confidence intervals actually mean
- Confidence intervals for pop mean with unknown pop std dev using Python
Hypothesis Testing
- What is hypothesis testing? Null and alternative hypotheses
- Hypothesis testing for population mean; Type I and Type II errors
- 1-tailed hypothesis testing (known sigma)
- 2-tailed hypothesis testing (known sigma)
- Hypothesis testing (unknown sigma)
- 2-sample tests
- Independent 2-sample t-tests
- Paired 2-sample t-tests
- Chi-squared tests of independence
Measures of Central Tendency and Deviation
- Descriptive Vs Inferential statistics
- Central Tendency (mean, median, mode)
- Measures of dispersion (Range, IQR, std dev, variance)
- Five Number summary and skew
- Graphic displays of basic statistical descriptions
- Correlation Analysis
Machine Learning
- Introduction to machine learning
- Supervised, semi-supervised, unsupervised machine learning
- Types of data sets
- Data() in R
- Introduction to classification
Decision Trees
- Introduction to Decision tree
- Hunt's algorithm for learning a decision tree
- Details of tree induction
- GINI index computation
- ID3, Entropy and information gain
- ID3 Example
- C4.5
- Pruning
- Metrics for performance Evaluation
- Iris Decision Tree Example
K Nearest Neighbors (KNN)
- Introduction to KNN algorithm
- Decision boundary KNN Vs Decision tree
- What is the best K
- KNN Problems
- Feature selection using KNNs
- Wilson Editing
- KNN Imputation
- Speeding up KNN using KMeans
- Coding up KNN from scratch in Python
- KNN using sklearn
- Digits classification using KNN in Python
Naïve Bayes
- Examples of few text classification problems
- Classification for text using bag of words
- Naïve Bayes for text classification
- Multinomial Naïve Bayes
- Multinomial Naïve Bayes Example
- Naïve Bayes for Hand-written digit recognition
- Naïve Bayes for weather data
- Numeric stability issues with Naïve Bayes
- Gaussian Naïve Bayes from scratch in Python
- Naïve Bayes using sklearn
- Multinomial Naïve Bayes
SVMs
- Linear Classifiers
- Margin of SVMs
- SVM optimization
- SVM for Data which is not linearly separable
- Learning non-linear patterns
- Kernel Trick
- SVM Parameter Tuning
- Handling class imbalance in SVMs
- SVMs: pros and cons, and summary
- Linear SVM using Python
- SVM with RBF kernel with Python
- Learning SVM with noisy data in Python
Ensemble Learning
- Introduction to Ensemble learning
- Why Ensemble learning
- Independently constructed ensembles for classification: Majority voting
- Independently constructed ensembles for classification: Bagging
- Independently constructed ensembles for classification: Random forests
- Independently constructed ensembles for classification: Error correcting output codes
- Sequentially constructed ensembles for classification: Boosting
- Sequentially constructed ensembles for classification: Boosting example
- Sequentially constructed ensembles for classification: Stacking
- Introduction to gradient boosted machines (GBM)
- Relation between GBM and gradient descent
- GBM regression with squared loss
- Bagging in Python
- Random forests in Python
- Boosting in Python
- Feature importance using ensemble classifiers
- XGBoost in Python
- Parameter tuning for GBMs
- Voting classifier using sklearn
Artificial Neural Networks
- Motivation for Artificial Neural Network
- Mimicking a single neuron, integration function, Activation Function
- Perceptron Algorithm
- Perceptron Algorithm Example
- Decision Boundary for a single Neuron
- Learning Non-Linear Patterns
- Introduction to Deep Learning
- What can we achieve using a single hidden layer
- MLPs with Sigmoid activation Function
- Layers as transformations into a new space
- Playing at the Tensorflow playground
- Cost function, Loss function, Error Surface
- How to learn Weights
- Stochastic Gradient descent, Minibatch SGD, Momentum
- Choosing a learning Rate
- Updaters
- Back Propagation
- Softmax and Binary/Multi-class cross entropy loss
Linear Regression
Feature Selection
Sequence Labeling
Multi-task learning
Time Series Analysis
Architecting ML solutions
ML case studies
Frequent pattern mining and association rules
Data warehouse basic concepts
Clustering
Outlier Detection
Dimensionality Reduction using PCA and LDA
Data Mining
- Frequent pattern mining and association rules
- Data warehouse basic concepts
- Outlier Detection
Text Processing
- n-gram models
- Named entity recognition
- Natural Language Processing
- Sentiment Analysis
- Summarization
- Topic Modeling
- Word Representation learning
- NLTK practical
- Question Answering
Web Mining
- Text indexing
- Crawling
- Relevance ranking
- Pagerank
- Recommendation Systems
- Social Network Analysis
- Social Influence Analysis
- Event Detection from Twitter
- Location Prediction in Twitter
- Computational Advertising
- Crowdsourcing
- Mining Structured Information from the Web
- Entity Resolution in the Web of Data
Data Collection
- Basics of Data Collection
- Web Scraping
- Twitter Scraping example
- Graph data collection
- Sensor Data collection
- IoT
Deep Learning
- TensorFlow
- CNNs
- RNNs
- LSTMs
- Auto-encoders
Visualizations
- Complex Visualizations
- Visualizing Large Data