Syllabus for Data Science Course offered to 6th Semester Undergraduate students as an InterDepartmental Elective (IDE) by the Department of Computer Science and Engineering

This is the syllabus for the Data Science course offered to 6th Semester Undergraduate students as an InterDepartmental Elective (IDE) by the Department of Computer Science and Engineering at Dr. Ambedkar Institute of Technology (Dr. AIT), Bengaluru, Karnataka, India, starting from the year 2021.

Course Title: Data Science
Course Code: 18CSE024
Exam Duration: 3 hours
No. of Credits: 3 (L:T:P = 3:0:0)
No. of Lecture Hours/Week: 3
Total No. of Contact Hours: 42 Hours

Course Objectives:

Determine the appropriate natural language processing, machine learning and deep learning models to solve business-related challenges.
Demonstrate proficiency in the statistical analysis of data to derive insights from results and interpret the findings visually.
Demonstrate skills in data management by obtaining, cleaning and transforming data.
Discuss how social network analysis can appraise the ways in which social clustering shapes individuals and groups in contemporary society.

Unit No	Syllabus Content	No of Hours
1.	Visualizing Data, matplotlib, Bar Charts, Line Charts, Scatterplots, Linear Algebra, Vectors, Matrices, Statistics, Describing a Single Set of Data, Correlation, Simpson’s Paradox, Some Other Correlational Caveats, Correlation and Causation, Probability, Dependence and Independence, Conditional Probability, Bayes’s Theorem, Random Variables, Continuous Distributions, The Normal Distribution, The Central Limit Theorem.	08
2.	Hypothesis and Inference, Statistical Hypothesis Testing, Example: Flipping a Coin, p-Values, Confidence Intervals, p-Hacking, Example: Running an A/B Test, Bayesian Inference, Gradient Descent, The Idea Behind Gradient Descent, Estimating the Gradient, Using the Gradient, Choosing the Right Step Size, Using Gradient Descent to Fit Models, Minibatch and Stochastic Gradient Descent, Getting Data, stdin and stdout, Reading Files, Scraping the Web, Using APIs, Example: Using the Twitter APIs, Working with Data, Exploring Your Data, Using NamedTuples, Dataclasses, Cleaning and Munging, Manipulating Data, Rescaling, An Aside: tqdm, Dimensionality Reduction.	08
3.	Machine Learning, Modeling, What Is Machine Learning?, Overfitting and Underfitting, Correctness, The Bias-Variance Tradeoff, Feature Extraction and Selection, k-Nearest Neighbors, The Model, Example: The Iris Dataset, The Curse of Dimensionality, Naive Bayes, A Really Dumb Spam Filter, A More Sophisticated Spam Filter, Implementation, Testing Our Model, Using Our Model, Simple Linear Regression, The Model, Using Gradient Descent, Maximum Likelihood Estimation, Multiple Regression, The Model, Further Assumptions of the Least Squares Model, Fitting the Model, Interpreting the Model, Goodness of Fit, Digression: The Bootstrap, Standard Errors of Regression Coefficients, Regularization, Logistic Regression, The Problem, The Logistic Function, Applying the Model, Goodness of Fit, Support Vector Machines.	09
4.	Decision Trees, What Is a Decision Tree?, Entropy, The Entropy of a Partition, Creating a Decision Tree, Putting It All Together, Random Forests, Neural Networks, Perceptrons, Feed-Forward Neural Networks, Backpropagation, Example: Fizz Buzz, Deep Learning, The Tensor, The Layer Abstraction, The Linear Layer, Neural Networks as a Sequence of Layers, Loss and Optimization, Example: XOR Revisited, Other Activation Functions, Example: FizzBuzz Revisited, Softmaxes and Cross-Entropy, Dropout, Example: MNIST, Saving and Loading Models, Clustering, The Idea, The Model, Example: Meetups, Choosing k, Example: Clustering Colors, Bottom-Up Hierarchical Clustering.	09
5.	SELF-STUDY: Natural Language Processing, Word Clouds, n-Gram Language Models, Grammars, An Aside: Gibbs Sampling, Topic Modeling, Word Vectors, Recurrent Neural Networks, Example: Using a Character-Level RNN, Network Analysis, Betweenness Centrality, Eigenvector Centrality, Directed Graphs and PageRank, Recommender Systems, Manual Curation, Recommending What’s Popular, User-Based Collaborative Filtering, Item-Based Collaborative Filtering, Matrix Factorization.	08

Course Outcomes

Course Outcomes	Description	RBT Levels
CO1	Interpret the concepts and methods of mathematical disciplines relevant to data analytics and statistical modeling.	L3
CO2	Examine, visualize, curate, and prepare data and recognize how the quality of the data and the means of data collection may affect interpretation.	L3
CO3	Determine the machine learning, deep learning and natural language processing skills to design and implement efficient, data-driven solutions for real world problems.	L3
CO4	Illustrate how network analysis and recommender systems can contribute to increasing knowledge about diverse aspects of societal clustering.	L3

Course Articulation Matrix (CO-PO Mapping)

CO-PO Mapping	PO1	PO2	PO3	PO4	PO5	PO6	PO7	PO8	PO9	PO10	PO11	PO12
CO1	3	3	2	2	3	-	-	-	-	-	-	-
CO2	2	2	2	2	3	-	-	-	-	-	-	-
CO3	3	3	3	3	3	-	-	-	-	-	-	-
CO4	3	2	2	2	3	-	-	-	-	-	-	-
Correlation levels: Strong = 3, Medium = 2, Weak = 1; "-" indicates no mapping.

TEXT BOOKS:

Joel Grus, “Data Science from Scratch”, 2nd Edition, O’Reilly Publications/Shroff Publishers and Distributors Pvt. Ltd., 2019. ISBN-13: 978-9352138326.

REFERENCE BOOKS:

1. Emily Robinson and Jacqueline Nolis, “Build a Career in Data Science”, 1st Edition, Manning Publications, 2020. ISBN: 978-1617296246.
2. Aurélien Géron, “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems”, 2nd Edition, O’Reilly Publications/Shroff Publishers and Distributors Pvt. Ltd., 2019. ISBN-13: 978-1492032649.
3. François Chollet, “Deep Learning with Python”, 1st Edition, Manning Publications, 2017. ISBN-13: 978-1617294433.
4. Jeremy Howard and Sylvain Gugger, “Deep Learning for Coders with fastai and PyTorch”, 1st Edition, O’Reilly Publications/Shroff Publishers and Distributors Pvt. Ltd., 2020. ISBN-13: 978-1492045526.
5. Sebastian Raschka and Vahid Mirjalili, “Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2”, 3rd Edition, Packt Publishing Limited, 2019. ISBN-13: 978-1789955750.

SELF-STUDY REFERENCES/WEBLINKS:

1. Natural Language Processing
https://www.youtube.com/watch?v=xvqsFTUsOmc
2. Network Analysis
https://www.youtube.com/watch?v=K5xiFDClgjo
3. Recommender Systems
https://www.youtube.com/watch?v=39vJRxIPSxw

