Advanced Data Science and AI Training in Hyderabad


Advanced Data Science and Artificial Intelligence Course


An industry-ready Artificial Intelligence and Data Science course by DataJango. Starts 26th August, daily from 9:00 to 10:30 am.


Data Science and Machine Learning Syllabus

Introduction to Data Science/Data Analytics
• What is Data Science?
• Why Data Science?
• Applications of Data Science
• How much statistics is needed?
• How much mathematics is needed?
• How much demand is there in IT and across all industries?

Descriptive Statistics
• Central Tendency (mean, median and mode)
• Interquartile Range
• Variance
• Standard Deviation
• Z-Score/T-Score
• Covariance
• Correlation
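
As a rough illustration of the measures listed in this module, the sketch below uses Python's standard statistics module. The scores list and the z_score helper are made up for illustration and are not course material.

```python
import statistics

# Hypothetical sample of exam scores, used only for illustration.
scores = [52, 60, 60, 71, 75, 82, 90]

mean = statistics.mean(scores)          # 70
median = statistics.median(scores)      # 71
mode = statistics.mode(scores)          # 60
variance = statistics.pvariance(scores)  # population variance
std_dev = statistics.pstdev(scores)      # population standard deviation


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma


z_scores = [z_score(x, mean, std_dev) for x in scores]
```

A z-score of 0 means the value equals the mean; positive values lie above it and negative values below it.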

Data Distributions
• Binomial Distribution
• Introduction to Probability
• Normal Distribution

Overview of Data Visualization
• Bar Chart
• Histogram
• Box-and-whisker plot
• Dot-plot
• Line plot
• Scatter Plot

Introduction to Python
• How to install Python (Anaconda)
• How to work with Jupyter Notebook
• How to work with Spyder IDE
• Compound data types
o Strings, Lists, Tuples, Sets, Dictionaries
• Control Flows
• Keywords (continue, break, pass)
• Functions (Formal/Positional/Keyword arguments)
• Predefined functions (range, len, enumerate, zip)

Introduction to NumPy
• One-dimensional Array
• Two-dimensional Array
• Predefined functions (arange, reshape, zeros, ones, empty, eye, linspace)
• Basic Matrix operations
o Slicing, indexing, Looping, Shape Manipulation, Stacking
o Scalar addition, subtraction, multiplication, division, broadcasting
o Matrix addition, subtraction, multiplication, division and transpose, broadcasting

Introduction to Pandas
• Series
• DataFrame
• df.groupby
• pd.crosstab
• df.apply
• df.applymap

Inferential Statistics
• Central Limit Theorem
• Confidence Interval and z-distribution table
• Statistical Significance
• Hypothesis testing
• P-value
• One-tailed and Two-tailed Tests
• Chi-Square Goodness of Fit Test
• F-Statistic (ANOVA)
• Skewness, Kurtosis

Exploratory Data Analysis
• Train/Test split – Data snooping bias
• Statistical Data Analysis
• Fixing missing values
• Finding outliers
• Data quality check
• Feature transformation
• Data Visualization (Matplotlib, Seaborn)
o Categorical to Categorical
o Categorical to Quantitative
o Quantitative to Quantitative
• Bi-Variate data analysis (Hypothesis Testing)
o Categorical and Quantitative (ANOVA)
o Categorical to Categorical (Chi-Square)
o Quantitative to Categorical (Chi-Square)
o Quantitative to Quantitative (Correlation)

Intro to Regression (Supervised Learning)
• What is regression?
• Simple linear regression
• Linear Regression – a statistics perspective (statsmodels – OLS)
• Evaluation metrics (R-Squared, Adjusted R-Squared, MSE, RMSE)

Regression Analysis (ML – statsmodels)
• Mean centering and its use in multiple linear regression
• Multiple linear regression
• P-value-based feature selection methods (Backward, Forward and Mixed)
• Linear regression assumptions (linear relations – fitted vs residuals plot, homoscedasticity, normal distribution of error term, serial correlation, multicollinearity)
• Q-Q Plot, Shapiro-Wilk test – different ways to check normality of data
• Data transformation techniques.

Encoding & Code Modularization
• Label Encoding
• One-Hot (dummy variable) encoding
• Dummy variable trap
• Scikit-Learn → Custom Transformers
• Scikit-Learn → Pipeline

Multiple Linear regression (scikit-learn)
• Normal Equation (Linear Algebraic way of solving linear equation)
• Gradient Descent (Calculus way of solving linear equation)
• Multiple Linear Regression (SGD Regressor)
• Feature Scaling (Min-Max vs Mean Normalization)
• Feature Transformation
• Polynomial Regression
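
To give a feel for the two solution routes named in this module, here is a minimal sketch for the single-feature case, written in plain Python. The function names, learning rate, epoch count and toy data are assumptions chosen for illustration. In the course itself these ideas are tied to scikit-learn.

```python
def fit_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def fit_closed_form(xs, ys):
    """Closed-form least-squares solution for one feature (normal equation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / spread
    b = mean_y - w * mean_x
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w_gd, b_gd = fit_gradient_descent(xs, ys)
w_cf, b_cf = fit_closed_form(xs, ys)
```

On this noise-free data both routes recover a slope close to 2 and an intercept close to 1. Gradient descent gets there iteratively, while the closed form solves for the coefficients in one step.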

Bias-Variance Trade-off, Re-sampling Techniques
• Bias-Variance trade-off
• Major challenges in Data Science project (Data or Algorithm).
• Hold-out Data
• K-fold Cross-Validation
• Leave-One-Out
• Random Sub-sampling Cross-Validation
• Bootstrapping
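
The splitting idea behind K-fold cross-validation can be sketched in a few lines of plain Python. The k_fold_indices helper below is hypothetical and does no shuffling. It shows only how the indices are partitioned and is not scikit-learn's implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

Every sample lands in exactly one test fold. Setting k equal to the number of samples gives Leave-One-Out.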

Model Evaluation, Model Selection, Polynomial Regression, Regularization.
• Train/Validation/Test split
• K-Fold Cross Validation
• The Problem of Over-fitting (Bias-Variance trade-off)
• Learning Curve
• Regularization (Ridge, Lasso and Elastic-Net)
• Feature selection
• Hyper Parameter Tuning (GridSearchCV, RandomizedSearchCV)

Model Deployment
• Pickle (pkl file)
• Model load from pkl file and prediction
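
A minimal sketch of the save-and-load round trip with Python's pickle module follows. The model here is only a stand-in (a dictionary of coefficients), and the predict helper and file name are assumptions for illustration. A fitted estimator object would be dumped and loaded the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: coefficients of y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 1.0}


def predict(params, x):
    """Apply the stored linear coefficients to a single input value."""
    return params["slope"] * x + params["intercept"]


with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Serialize the model to a .pkl file.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Load it back, as a separate prediction service would.
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

prediction = predict(loaded_model, 3.0)  # 2.0 * 3.0 + 1.0 = 7.0
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.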

Classification (Supervised Learning)
• Logistic Regression Algorithm (SGD Classifier)
• Accuracy measurements – handling imbalanced dataset
o Accuracy score
o Confusion matrix
o Precision
o Recall
o Precision – Recall trade-off curve
o ROC curve
o AUC score
• Multi-class Classification
o One-vs-One
o One-vs-All
o Softmax regression classifier
• Multi-label Classification
• Multi-output Classification
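
To show why accuracy alone can mislead on imbalanced data, the sketch below computes the confusion-matrix counts, accuracy, precision and recall by hand. The labels and the confusion_counts helper are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


# Hypothetical imbalanced labels: only 2 of 10 samples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)                  # 0.8
precision = tp / (tp + fp) if (tp + fp) else 0.0    # 0.5
recall = tp / (tp + fn) if (tp + fn) else 0.0       # 0.5
```

Accuracy looks healthy at 0.8, yet the classifier finds only half of the positive cases and half of its positive predictions are wrong.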

Support Vector Machine
• SVM Classifier (Soft/Hard – Margin)
• Linear SVM
• Non-Linear SVM
• Kernel Trick (mathematics behind kernel trick)
• Kernel SVM
• SVM Regression

Clustering (Unsupervised Learning)
• K-means
• Hierarchical
• How to use unsupervised outcome as support to solve supervised problem.

Dimensionality Reduction (Unsupervised)
• Math behind PCA – eigenvectors, eigenvalues, covariance matrix.
• Choosing Right Number of Dimensions or Principal Components
• Incremental PCA
• Kernel PCA

Tree Based Algorithms
• Regression Trees vs Classification Trees
• Entropy
• Gini Index
• Information Gain
• Tree pruning
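
The impurity measures in this module can be sketched directly from their definitions. The labels and the candidate split below are made up for illustration, and the function names are assumptions rather than any library's API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with a 50/50 class mix and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
```

A perfectly mixed two-class node has entropy 1 and Gini impurity 0.5. A tree prefers the split with the highest information gain, or equivalently the largest drop in impurity.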

Ensemble models
• Voting Classifiers (Heterogeneous Ensemble Models)
• Homogeneous Ensemble Models
o Random Forest
o Bagging
o Pasting
o Introduction to Boosting (AdaBoost, Gradient Boosting)

Naive Bayes
• Bayes Theorem
• Naive Bayes Algorithm
• Introduction to Text Analytics
• Tokenization
• Text Normalization, stemming, lemmatization
• Bag-of-words model

Anomaly Detection
• Anomaly vs Classification
• Credit Card Fraud detection – Anomaly Detection Algorithm
• Assumptions of normality

Introduction to Hadoop & PySpark
• Overview of Hadoop architecture
• Overview of YARN architecture
• Map-Reduce example
• Overview of SparkContext (--master yarn)
• Resilient Distributed Datasets (RDDs)
• RDD Operations (Transformations, Actions)
• Spark DataFrames
• Spark ML model with Pipeline
• Classification model, MulticlassMetrics

Introduction to Neural Networks
• Perceptron, Sigmoid Neuron
• Neural Network model representation
• How it works
• Forward-Propagation
• Back-Propagation

Deep Learning, TensorFlow, and DNN Course Syllabus
Section – I (Deep Neural Networks)

Introduction to Neural Networks
Linear Regression Gradient Descent (Batch, Stochastic and Mini-Batch)
Logistic/Sigmoid neuron
Forward propagation
Back Propagation
Neural Network Architecture
Layers of a Deep Neural Network
Back Propagation
Activation Functions (Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax)
Introduction to TensorFlow
Construction Phase
Tensor reshape, slice, typecast
Variable collections – Global, Local, Trainable
Initializing Variables
Execution Phase
Linear Regression with TensorFlow
Use Case: Build a handwritten digit recognition model with TensorFlow
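
The activation functions named in this section can be written down from their definitions. The sketch below is scalar, plain-Python code for illustration only. The alpha default for Leaky ReLU is an assumption, and the sigmoid here does not guard against overflow for very large negative inputs.

```python
import math


def sigmoid(z):
    """Squash z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squash z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keep a small slope alpha for negative inputs."""
    return z if z > 0 else alpha * z


def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
```

Softmax keeps the ordering of the scores, so the largest input always receives the largest probability.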

Regularizing Deep Neural Networks
L1, L2 regularization
Dropout regularization
Vanishing & Exploding Gradients
Weight initializations (He/Xavier initialization)
Algorithm Optimizers
Momentum – Exponentially weighted moving average
Gradient Descent with Momentum
Gradient Descent with RMSProp (Root Mean Squared Propagation)
Gradient Descent with Adam (Adaptive Moment Estimation)
Batch Normalization

Convolutional Neural Networks – Computer Vision
Section – II (Convolutional Neural Networks)

Introduction to CNN (Convolutional Neural Networks), Computer Vision
Convolution and Edge detection
Padding, Striding Convolutions
Convolutional Neural Network
Edge Detection
ResNets (CNNs built with Residual Blocks)
Inception Network (filter sizes, pooling and stride combined in one layer)
Data Augmentation
Transfer Learning
Computer Vision
Object Localization
Intersection over Union
Anchor Boxes
Non-max Suppression
YOLO Algorithm
Object Detection
Face Verification
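
Intersection over Union, listed above, measures how well two bounding boxes overlap. A minimal sketch follows, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples. The function name and box format are assumptions for illustration.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


iou = intersection_over_union((0, 0, 2, 2), (1, 1, 3, 3))  # 1 / 7
```

Non-max suppression uses this score to discard boxes that overlap too heavily with a higher-confidence detection.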

Natural Language Processing, IR and Text Analytics
Section – III (Natural Language Processing & Information Retrieval with Gensim, SpaCy, and NLTK)

Text Processing with Python (Regular Expressions)
Introduction to NLP – NLTK, Stanford CoreNLP
Text Normalization
Case folding
Synonyms, Homonyms
Spelling mistakes
Stop words
What is Text Corpus?
Understanding different corpora in NLTK
Basic Sentiment Analysis Model (IMDB dataset)
Basic feature extraction using CountVectorizer
Build Sentiment Classifier
Information Retrieval (IR)
Term-Document incidence matrix
Inverted Index
Handling Phrase Queries (IR)
Biword index
Positional index
Spelling Correction
Soundex algorithm
Isolated words
Edit Distance
Weighted edit distance
N-Gram overlap (Jaccard coefficient)
Context-sensitive search
Document search and Rank Retrieval model
Term Frequency, Weighted Term Frequency, Inverse Document Frequency
TF-IDF Scoring
Euclidean distance
Cosine similarity
Sentiment Classification with TF-IDF (IMDB dataset)
Sentiment Classification with Hashing Vectorizer (IMDB dataset)
Word Embeddings
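
Three of the similarity measures from this section (edit distance, N-gram overlap with the Jaccard coefficient, and cosine similarity) can be sketched in plain Python. The function names, whitespace tokenization and raw term-frequency weighting are assumptions for illustration. A TF-IDF weighting would replace the raw counts.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard_ngram_overlap(a, b, n=2):
    """Jaccard coefficient between the character n-gram sets of two words."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors."""
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


distance = edit_distance("kitten", "sitting")             # 3
overlap = jaccard_ngram_overlap("november", "december")   # 4 shared of 10 bigrams
similarity = cosine_similarity("the cat sat", "the cat ran")
```

Edit distance and N-gram overlap suit spelling correction of single words, while cosine similarity compares whole documents regardless of their length.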

Recurrent Neural Networks
Section – IV (Recurrent Neural Networks for Text Analytics)

Recurrent Neural Networks
Bidirectional Recurrent Neural Networks
Gated Recurrent Units (GRU)
Long Short-Term Memory (LSTM)
Time series (Stock price prediction), Language Generation (Sequence to Sequence model)