ML and Data-Science

Raghav Gupta

Source: ML Minds

Index:

  • Brief Introduction about Components of Big Data Analysis
  • Introduction to Hadoop and Big Data Infrastructure
  • Introduction to Data Mining
  • Introduction to Machine Learning
  • Introduction to Natural Language Processing
  • Introduction to Information Retrieval
  • Introduction to Web Mining
  • Introduction to Social Network Analytics
  • Introduction to IoT
  • Introduction to Visualization


NumPy

  • Introduction to NumPy and ndarrays
  • Datatypes of ndarrays
  • Arithmetic operations, Indexing, Slicing
  • Boolean and fancy indexing
  • Basic ndarray operations
  • Array-oriented programming with arrays
  • Conditional, Statistical and Boolean operations
  • Sorting and set operations
  • File IO with NumPy
  • Linear Algebra with NumPy
  • Reshaping, Concatenating and Splitting Arrays
  • Broadcasting

Pandas

  • Series Data Structures
  • DataFrame
  • Index objects
  • Reindexing
  • Dropping entries from an axis
  • Indexing, Selection and Filtering
  • Arithmetic and Data Alignment
  • Operations between DataFrame and Series
  • Function Application and Mapping
  • Sorting and Ranking
  • Axis indexes with duplicate labels
  • Computing Descriptive Statistics
  • pct_change(), Correlation and Covariance, Unique values, Value counts and membership

Visualization

  • Introduction to Matplotlib
  • Colours, Markers and line styles
  • Customization of Matplotlib
  • Plotting with Pandas
  • Bar plots, Histograms, Density plots
  • Introduction to Seaborn, Style Management
  • Controlling figure aesthetics
  • Colour Palettes
  • Plotting univariate Distributions
  • Plotting bivariate Distributions
  • Visualizing pairwise relationships with pairplots
  • Plotting with Categorical Data
  • Visualizing Linear Relationships
  • Plotting on Data-aware grids
  • Other Python Visualization tools

SciPy

  • Linear Algebra in SciPy
  • Sparse Matrices in SciPy
  • Constants, Cluster and FFT Packages
  • Integration using SciPy
  • Interpolation in SciPy
  • SciPy I/O, SciPy ndimage
  • Optimization and root finding
  • SciPy.Stats

Scikit-learn

  • Introduction to Scikit-learn and Machine Learning
  • Sample Datasets in Scikit-learn
  • Train/Test split using Scikit-learn
  • Classification of Iris using Decision Trees
  • Holdout Validation, K-fold Cross Validation
  • Cross Validation using Scikit-learn
  • K-means Clustering in Scikit-learn

Basic Text Mining using Python

  • Introduction to the Natural Language Toolkit (NLTK)
  • Tokenization, Lower casing and removing stop words, Lemmatization, Stemming
  • ngrams, Sentence tokenization, Part of speech tagging
  • Chunking, Named Entity Recognition
  • Introduction to WordNet, and word sense disambiguation

Projects

  • Project on Word Ladders Game
  • Project on Data Analysis and Prediction using the Loan Prediction Dataset



Probability

  • Introduction to Probability
  • Events, Sample space, Simple Probability, Joint Probability
  • Mutually exclusive events, collectively exhaustive events, marginal probability
  • Addition Rule
  • Conditional Probability
  • Multiplication Rule
  • Bayes theorem
  • Counting rules (caution: advanced material)

Probability Distributions

  • What are probability distributions
  • Poisson Probability Distribution
  • Normal Probability Distribution
  • Binomial Probability Distribution

CLT and Confidence Intervals

  • Central Limit Theorem
  • CLT Example
  • CLT Using R-code
  • Confidence Intervals of Mean
  • Confidence Intervals of Mean Examples
  • Confidence interval of mean in detail
  • Confidence interval for the mean with population standard deviation unknown
  • Confidence interval using Python
  • What do confidence intervals actually mean
  • Confidence intervals for population mean with unknown population standard deviation using Python

Hypothesis Testing

  • What is hypothesis testing? Null and alternative hypotheses
  • Hypothesis testing for population mean; Type I and Type II errors
  • 1-tailed hypothesis testing (known sigma)
  • 2-tailed hypothesis testing (known sigma)
  • Hypothesis testing (unknown sigma)
  • 2-sample tests
  • Independent 2-sample t-tests
  • Paired 2-sample t-tests
  • Chi-squared tests of independence

Measures of Central Tendency and Deviation

  • Descriptive Vs Inferential statistics
  • Central Tendency (mean, median, mode)
  • Measures of dispersion (Range, IQR, std dev, variance)
  • Five Number summary and skew
  • Graphic displays of basic statistical descriptions
  • Correlation Analysis



Machine Learning

  • Introduction to machine learning
  • Supervised, semisupervised, unsupervised machine learning
  • Types of data sets
  • Data() in R
  • Introduction to classification

Decision Trees

  • Introduction to Decision tree
  • Hunt's algorithm for learning a decision tree
  • Details of tree induction
  • GINI index computation
  • ID3, Entropy and information gain
  • ID3 Example
  • C4.5
  • Pruning
  • Metrics for performance Evaluation
  • Iris Decision Tree Example

K Nearest Neighbors (KNN)

  • Introduction to KNN algorithm
  • Decision boundary KNN Vs Decision tree
  • What is the best K
  • KNN Problems
  • Feature selection using KNNs
  • Wilson Editing
  • KNN Imputation
  • Speeding up KNN using KMeans
  • Coding up KNN from scratch in Python
  • KNN using sklearn
  • Digits classification using KNN in Python

Naïve Bayes

  • Examples of a few text classification problems
  • Classification for text using bag of words
  • Naïve Bayes for text classification
  • Multinomial Naïve Bayes
  • Multinomial Naïve Bayes Example
  • Naïve Bayes for Hand-written digit recognition
  • Naïve Bayes for weather data
  • Numerical stability issues with Naïve Bayes
  • Gaussian Naïve Bayes from scratch in Python
  • Naïve Bayes using sklearn
  • Multinomial Naïve Bayes

SVMs

  • Linear Classifiers
  • Margin of SVMs
  • SVM optimization
  • SVM for Data which is not linearly separable
  • Learning non-linear patterns
  • Kernel Trick
  • SVM Parameter Tuning
  • Handling class imbalance in SVMs
  • SVMs: pros and cons, and summary
  • Linear SVM using Python
  • SVM with RBF kernel using Python
  • Learning SVM with noisy data in Python

Ensemble Learning

  • Introduction to Ensemble learning
  • Why Ensemble learning
  • Independently constructed ensembles for classification: Majority voting
  • Independently constructed ensembles for classification: Bagging
  • Independently constructed ensembles for classification: Random forests
  • Independently constructed ensembles for classification: Error correcting output codes
  • Sequentially constructed ensembles for classification: Boosting
  • Sequentially constructed ensembles for classification: Boosting example
  • Sequentially constructed ensembles for classification: Stacking
  • Introduction to gradient boosted machines (GBM)
  • Relation between GBM and gradient descent
  • GBM regression with squared loss
  • Bagging in Python
  • Random forests in Python
  • Boosting in Python
  • Feature importance using ensemble classifiers
  • XGBoost in Python
  • Parameter tuning for GBMs
  • Voting classifier using sklearn

Artificial Neural Networks

  • Motivation for Artificial Neural Network
  • Mimicking a single neuron, integration function, Activation Function
  • Perceptron Algorithm
  • Perceptron Algorithm Example
  • Decision Boundary for a single Neuron
  • Learning Non-Linear Patterns
  • Introduction to Deep Learning
  • What can we achieve using a single hidden layer
  • MLPs with Sigmoid activation Function
  • Layers as transformations into a new space
  • Playing at the TensorFlow Playground
  • Cost function, Loss function, Error Surface
  • How to learn Weights
  • Stochastic Gradient descent, Minibatch SGD, Momentum
  • Choosing a learning Rate
  • Updaters
  • Back Propagation
  • Softmax and Binary/Multi-class cross entropy loss

Linear Regression

Feature Selection

Sequence Labeling

Multi-task learning

Time Series Analysis

Architecting ML solutions

ML case studies

Frequent pattern mining and association rules

Data warehouse basic concepts

Clustering

Outlier Detection

Dimensionality Reduction using PCA and LDA



Data Mining

  • Frequent pattern mining and association rules
  • Data warehouse basic concepts
  • Outlier Detection


Text Processing

  • n-gram models
  • Named entity recognition
  • Natural Language Processing
  • Sentiment Analysis
  • Summarization
  • Topic Modeling
  • Word Representation learning
  • NLTK practical
  • Question Answering


Web Mining

  • Text indexing
  • Crawling
  • Relevance ranking
  • Pagerank
  • Recommendation Systems
  • Social Network Analysis
  • Social Influence Analysis
  • Event Detection from Twitter
  • Location Prediction in Twitter
  • Computational Advertising
  • Crowdsourcing
  • Mining Structured Information from the Web
  • Entity Resolution in the Web of Data


Data Collection

  • Basics of Data Collection
  • Web Scraping
  • Twitter Scraping example
  • Graph data collection
  • Sensor Data collection
  • IoT


Deep Learning

  • TensorFlow
  • CNNs
  • RNNs
  • LSTMs
  • Auto-encoders


Visualizations

  • Complex Visualizations
  • Visualizing Large Data

