 Advanced Data Science and Artificial Intelligence Course

An industry-ready Artificial Intelligence and Data Science course by DataJango. Starts from 26th August, daily 9 to 10.30 am.

### Data science and Machine Learning Syllabus

Introduction to Data Science/Data Analytics
• What is Data Science?
• Why Data Science?
• Applications of Data Science
• How much statistics is needed?
• How much mathematics is needed?
• How much demand is there in IT and across all industries?

Descriptive Statistics
• Central Tendency (mean, median and mode)
• Interquartile Range
• Variance
• Standard Deviation
• Z-Score/T-Score
• Covariance
• Correlation
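
As a taste of this module, here is a minimal sketch of a few of the measures listed above, using only Python's standard `statistics` module. The sample values are made up for illustration, and the population (rather than sample) variance is an assumption made to keep the numbers simple.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Spread (population versions; use variance/stdev for sample estimates)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# Interquartile range: Q3 - Q1 (quartile conventions differ between tools)
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Z-score: how many standard deviations each value lies from the mean
z_scores = [(x - mean) / std_dev for x in scores]

print(mean, median, mode, variance, std_dev)
```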

Data Distributions
• Binomial Distribution
• Introduction to Probability
• Normal Distribution

Overview of Data Visualization
• Bar Chart
• Histogram
• Box-and-whisker plot
• Dot-plot
• Line plot
• Scatter Plot

Introduction to Python
• How to install Python (Anaconda)
• How to work with Jupyter Notebook
• How to work with Spyder IDE
• Compound data types
o Strings, Lists, Tuples, Sets, Dictionaries
• Control Flows
• Keywords (continue, break, pass)
• Functions (Formal/Positional/Keyword arguments)
• Predefined functions (range, len, enumerate, zip)

Introduction to NumPy
• One-dimensional Array
• Two-dimensional Array
• Predefined functions (arange, reshape, zeros, ones, empty, eye, linspace)
• Basic Matrix operations
o Slicing, indexing, Looping, Shape Manipulation, Stacking
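
The array-creation functions and basic operations listed above can be sketched roughly as follows. This is an illustrative example, not course material; the shapes and values are arbitrary, and `np.empty` is left out because its contents are uninitialized.

```python
import numpy as np

a = np.arange(6)                   # one-dimensional array: 0, 1, ..., 5
m = a.reshape(2, 3)                # two-dimensional array with shape (2, 3)

zeros = np.zeros((2, 3))           # array filled with 0.0
ones = np.ones((2, 3))             # array filled with 1.0
identity = np.eye(3)               # 3x3 identity matrix
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values from 0 to 1 inclusive

first_row = m[0]                   # indexing: first row
last_col = m[:, -1]                # slicing: last column
stacked_v = np.vstack([m, ones])   # stacking along rows
stacked_h = np.hstack([m, zeros])  # stacking along columns
product = m @ identity             # matrix multiplication

for row in m:                      # looping over rows
    print(row)
```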

Introduction to Pandas
• Series
• DataFrame
• df.groupby
• pd.crosstab
• df.apply
• df.map
• df.applymap
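
A minimal sketch of how these pandas tools fit together, assuming a small made-up sales table (the column names and values are hypothetical). Element-wise mapping is shown on a Series with `Series.map`, since the DataFrame-level equivalent has been renamed across pandas versions.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "gender": ["F", "M", "F", "M"],
    "sales": [10, 20, 30, 40],
})

sales = df["sales"]                                  # a Series is a single column
total_by_city = df.groupby("city")["sales"].sum()    # split-apply-combine
counts = pd.crosstab(df["city"], df["gender"])       # frequency table of two columns

# apply with axis=1 calls the function once per row
row_label = df.apply(lambda r: f"{r['city']}-{r['gender']}", axis=1)

# map transforms each element of a Series
gender_full = df["gender"].map({"F": "Female", "M": "Male"})

print(total_by_city)
print(counts)
```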

Inferential Statistics
• Central Limit Theorem
• Confidence Interval and z-distribution table
• Statistical Significance
• Hypothesis testing
• P-value
• One-tailed and Two-tailed Tests
• Chi-Square Goodness of Fit Test
• F-Statistic (ANOVA)
• Skewness, Kurtosis

Exploratory Data Analysis
• Train/Test split – Data snooping bias
• Statistical Data Analysis
• Fixing missing values
• Finding outliers
• Data quality check
• Feature transformation
• Data Visualization (Matplotlib, Seaborn)
o Categorical to Categorical
o Categorical to Quantitative
o Quantitative to Quantitative
• Bi-Variate data analysis (Hypothesis Testing)
o Categorical and Quantitative (ANOVA)
o Categorical to Categorical (Chi-Square)
o Quantitative to Categorical (Chi-Square)
o Quantitative to Quantitative (Correlation)

Intro to Regression (Supervised Learning)
• What is regression?
• Simple linear regression
• Linear Regression – a statistics perspective (statsmodels – OLS)
• Evaluation metrics (R-Squared, Adjusted R-Squared, MSE, RMSE)

Regression Analysis (ML – statsmodels)
• Mean centering and its use in multiple linear regression
• Multiple linear regression
• P-value based feature selection methods (Backward, Forward and Mixed)
• Linear regression assumptions (linear relations – fitted vs residuals plot, homoscedasticity, normal distribution of error term, serial correlation, multicollinearity)
• Q-Q plot and Shapiro-Wilk test: two ways to check the normality of data
• Data transformation techniques.

Encoding & Code Modularization
• Label Encoding
• One-Hot (dummy variable) encoding
• Dummy variable trap
• Scikit-Learn → Custom Transformers
• Scikit-Learn → Pipeline

Multiple Linear regression (scikit-learn)
• Normal Equation (Linear Algebraic way of solving linear equation)
• Gradient Descent (Calculus way of solving linear equation)
• Multiple Linear Regression (SGD Regressor)
• Feature Scaling (Min-Max vs Mean Normalization)
• Feature Transformation
• Polynomial Regression
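
To illustrate the first two topics in this module, the sketch below fits the same linear model with the normal equation and with batch gradient descent. The data, learning rate, and iteration count are assumptions chosen so that the example converges; they are not taken from the course.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept.
X = np.c_[np.ones(len(features)), features]

# Normal equation: solve (X^T X) theta = X^T y directly.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error.
theta_gd = np.zeros(X.shape[1])
learning_rate = 0.1  # assumed value; too large a rate would diverge
for _ in range(5000):
    gradient = (2.0 / len(y)) * X.T @ (X @ theta_gd - y)
    theta_gd -= learning_rate * gradient

print(theta_normal)
print(theta_gd)
```

Both approaches should recover coefficients close to the intercept 1 and the slopes 2 and 3 used to generate the data.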

• Major challenges in Data Science project (Data or Algorithm).
• Hold-out Data
• K-fold Cross-Validation
• Leave-one-Out
• Random Sub-sampling Cross-Validation
• Bootstrapping

Model Evaluation, Model Selection, Polynomial Regression, Regularization.
• Train/Validation/Test split
• K-Fold Cross Validation
• The Problem of Over-fitting (Bias-Variance trade-off)
• Learning Curve
• Regularization (Ridge, Lasso and Elastic-Net)
• Feature selection
• Hyper Parameter Tuning (GridSearchCV, RandomizedSearchCV)

Model Deployment
• Pickle (pkl file)
• Model load from pkl file and prediction
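
The save-and-load round trip can be sketched with Python's built-in `pickle` module. A plain dictionary of hypothetical coefficients stands in for a fitted model here, and `predict` is an illustrative helper, not a library function; in practice the pickled object would typically be a trained estimator.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: hypothetical linear-regression coefficients.
model = {"intercept": 1.0, "slope": 2.0}


def predict(fitted, x):
    """Apply the stored linear model to a single input value."""
    return fitted["intercept"] + fitted["slope"] * x


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")

    with open(path, "wb") as f:   # save the model to a pkl file
        pickle.dump(model, f)

    with open(path, "rb") as f:   # load it back
        loaded = pickle.load(f)

prediction = predict(loaded, 3.0)
print(prediction)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.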

Classification (Supervised Learning)
• Logistic Regression Algorithm (SGD Classifier)
• Accuracy measurements – handling imbalanced dataset
o Accuracy score
o Confusion matrix
o Precision
o Recall
o Precision-Recall trade-off curve
o ROC curve
o AUC score
• Multi-class Classification
o One-vs-One
o One-vs-All
o Softmax regression classifier
• Multi-label Classification
• Multi-output Classification
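
The accuracy measurements listed in this module matter most on imbalanced data. The following sketch uses made-up labels in which only 2 of 10 samples are positive, and computes the confusion-matrix counts, accuracy, precision, and recall by hand.

```python
# Hypothetical labels: 1 = positive class (rare), 0 = negative class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))

# Confusion-matrix counts
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(accuracy, precision, recall)
```

Here accuracy is 0.9 even though the classifier finds only half of the positive cases (recall 0.5), which is why accuracy alone can mislead on imbalanced datasets.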

Support Vector Machine
• SVM Classifier (Soft/Hard – Margin)
• Linear SVM
• Non-Linear SVM
• Kernel Trick (mathematics behind kernel trick)
• Kernel SVM
• SVM Regression

Clustering (Unsupervised Learning)
• K-means
• Hierarchical
• How to use unsupervised outcome as support to solve supervised problem.

Dimensionality Reduction (Unsupervised)
• PCA
• Math behind PCA – eigenvectors, eigenvalues, covariance matrix
• Choosing Right Number of Dimensions or Principal Components
• Incremental PCA
• Kernel PCA

Tree Based Algorithms
• Regression Trees vs Classification Trees
• Entropy
• Gini Index
• Information Gain
• Tree pruning

Ensemble models
• Voting Classifiers (Heterogeneous Ensemble Models)
• Homogeneous Ensemble Models
o Random Forest
o Bagging
o Pasting

Naive Bayes
• Bayes Theorem
• Naive Bayes Algorithm
• Introduction to Text Analytics
• Tokenization
• Text Normalization, stemming, lemmatization
• Bag-of-words model

Anomaly Detection
• Anomaly vs Classification
• Credit Card Fraud detection – Anomaly Detection Algorithm
• Assumptions of normality

• Overview of YARN architecture
• Map-Reduce example
• Overview of Spark Context (--master yarn)
• Resilient Distributed Datasets (RDDs)
• RDD Operations (Transformations, Actions)
• Spark DataFrames
• Spark ML model with Pipeline
• Classification model, MulticlassMetrics

Introduction to Neural Networks
• Perceptron, Sigmoid Neuron
• Neural Network model representation
• How it works
• Forward-Propagation
• Back-Propagation

### Deep Learning, TensorFlow, and DNN Course Syllabus

Section – I (Deep Neural Networks)

Introduction to Neural Networks
Linear Regression Gradient Descent (Batch, Stochastic and Mini-Batch)
Logistic/Sigmoid neuron
Forward propagation
Back Propagation
Neural Network Architecture
Layers of a Deep Neural Network
Back Propagation
Activation Functions (Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax)
Introduction to TensorFlow
Construction Phase
tf.Variable
tf.constant
tf.placeholder
Tensor reshape, slice, typecast
Variable collections – Global, Local, Trainable
Initializing Variables
Execution Phase
Linear Regression with TensorFlow
Use Case: Build a handwritten digit recognition model with TensorFlow

Regularizing Deep Neural Networks
l1, l2 regularization
Dropout regularization
Weight initializations (He/Xavier initialization)
Algorithm Optimizers
Momentum – Exponentially weighted moving average
Gradient Descent with RMSProp (Root Mean Squared Propagation)
Batch Normalization

### Convolutional Neural Networks – Computer Vision

Section – II (Convolutional Neural Networks)

Introduction to CNN (Convolutional Neural Networks), Computer Vision
Convolution and Edge detection
Convolutional Neural Network
Edge Detection
Stride
Pooling
ResNets (CNN built with Residual Blocks)
Inception Network (filter size, pooling, stride all combined layer)
Data Augmentation
Transfer Learning
Computer Vision
Object Localization
Intersection over Union
Anchor Boxes
Non-max Suppression
YOLO Algorithm
Object Detection
Face Verification

### Natural Language Processing, IR and Text Analytics

Section – III (Natural Language Processing & Information Retrieval with Gensim, SpaCy, and NLTK)

Text Processing with Python (Regular Expressions)
Introduction to NLP – NLTK, Stanford Core NLP
Text Normalization
Tokenization
Case folding
Synonyms, Homonyms
Spelling mistakes
Stop words
Stemming
Lemmatization
What is Text Corpus?
Understanding different corpora in NLTK
Basic Sentiment Analysis Model (IMDB dataset)
Basic feature extraction using CountVectorizer
Build Sentiment Classifier
Information Retrieval (IR)
Term-Document incidence matrix
Inverted Index
Handling Phrase Queries (IR)
Biword index
Positional index
Spelling Correction
Soundex algorithm
Isolated words
Edit Distance
Weighted edit distance
N-Gram overlap (Jaccard coefficient)
Context-sensitive search
Document search and Rank Retrieval model
Term Frequency, Weighted Term Frequency, Inverse Document Frequency
TF-IDF Scoring
Euclidean distance
Cosine similarity
Sentiment Classification with TF-IDF (IMDB dataset)
Sentiment Classification with Hashing Vectorizer (IMDB dataset)
Word Embeddings
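
Two of the spelling-correction topics above, edit distance and n-gram overlap with the Jaccard coefficient, can be sketched in plain Python. The function names are illustrative, and the edit distance shown is the unit-cost (unweighted) variant.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (unit cost per edit)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def char_ngrams(word: str, n: int = 2) -> set:
    """Set of character n-grams in a word (bigrams by default)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient: shared n-grams divided by all distinct n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(edit_distance("kitten", "sitting"))
print(jaccard("night", "nacht"))
```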

### Recurrent Neural Networks

Section – IV (Recurrent Neural Networks for Text Analytics)

Recurrent Neural Networks
Bidirectional Recurrent Neural Networks
Gated Recurrent Units (GRU)
Long short-term memory (LSTM)
Autoencoders
Time series (Stock price prediction), Language Generation (Sequence to Sequence model)