Welcome to the Nanodegree program
02. Meet the Instructors
03. Term 2 Projects
03.2 Term 2 Projects
03.3 Term 2 Projects
03.4 Term 2 Projects
04. Program Structure & Syllabus
05. Learning Plan – First Two Weeks
06. How to Succeed
Words of Encouragement
The Skills That Set You Apart
02. Interview: Robert Chang [Airbnb]
03. Interview: Caroline [BMG]
04. Interview: Dan [Coinbase]
05. Interview: Richard [Starbucks]
06. Outro
The Data Science Process
02. Video: CRISP-DM
03. Video: The Data Science Process – Business & Data
04. Video: Business & Data Understanding – Example
05. Screencast: Using Workspaces
06. Quiz + Notebook: A Look at the Data
07. Screencast: A Look at the Data
08. What Should You Check?
09. Video: Business & Data Understanding
10. Video: Gathering & Wrangling
11. Screencast: How to Break Into the Field?
12. Notebook + Quiz: How to Break Into the Field
13. Screencast: How to Break Into the Field Solution
14. Screencast: Bootcamps
15.1 Quiz: Bootcamp Takeaways
15.2 Quiz: Bootcamp Takeaways
16. Notebook + Quiz: Job Satisfaction
17. Screencast: Job Satisfaction
18. Video: It Is Not Always About ML
19. Video: The Data Science Process – Modeling
20. Video: Predicting Salary
21. Screencast: Predicting Salary
22. Notebook + Quiz: What Happened?
23. Screencast: What Happened Solution
24. Video: Working With Missing Values
25. Video: Removing Data – Why Not?
26. Video: Removing Data – When Is It OK?
27. Video: Removing Data – Other Considerations
28. Quiz: Removing Data
29. Notebook + Quiz: Removing Values
30. Screencast: Removing Data Solution
31. Notebook + Quiz: Removing Data Part II
32. Screencast: Removing Data Part II Solution
33. Video: Imputing Missing Values
34. Notebook + Quiz: Imputation Methods & Resources
35. Screencast: Imputation Methods & Resources Solution
36. Notebook + Quiz: Imputing Values
37. Screencast: Imputing Values Solution
38. Video: Working With Categorical Variables Refresher
39. Notebook + Quiz: Categorical Variables
40. Screencast: Categorical Variables Solution
41. Video: How to Fix This?
42. Notebook + Quiz: Putting It All Together
43. Screencast + Notebook: Putting It All Together Solution
44.1 Text + Quiz: Results
44.2 Text + Quiz: Results
44.3 Text + Quiz: Results
45. Video: The Data Science Process – Evaluate & Deploy
46. Text: Recap
Communicating to Stakeholders
02. Video: First Things First
03. Text: README Showcase
04. Video: Posting to GitHub
05. Quiz: GitHub Check
06. Video: Up And Running On Medium
07. Text: Medium Getting Started Post and Links
08. Video: Know Your Audience
09. Who Is The Audience?
10. Video: Three Steps to Captivate Your Audience
11. Video: First Catch Their Eye
12. Picture First, Title Second
13. Video: More Advice
14. More Advice
15. Video: End With A Call To Action
16. End With A Call To Action
17. Video: Other Important Information
18. Text: Recap
19. Video: Conclusion
Project: Write A Data Science Blog Post
01. Project Overview
02. Project Motivation and Details
Project Description – Write A Data Science Blog Post
Project Rubric – Write A Data Science Blog Post
Introduction to Software Engineering
02. Course Overview (0:58)
Software Engineering Practices Pt I
02. Clean and Modular Code (4:19)
Quiz
03. Refactoring Code (2:01)
04. Writing Clean Code (5:11)
05. Quiz: Clean Code
06. Writing Modular Code (5:25)
07. Refactoring – Wine Quality
08. Solution: Refactoring – Wine Quality
09. Efficient Code (1:45)
10. Optimizing – Common Books (3:35)
Documentation (1:20)
In-line Comments (1:38)
Docstrings (1:14)
Project Documentation
19. Documentation
Version Control in Data Science (0:41)
Scenario #1 (2:39)
Scenario #2 (1:19)
Scenario #3 (1:18)
Model Versioning
Conclusion (0:36)
Software Engineering Practices Pt II
02. Testing (1:03)
03. Testing and Data Science (1:50)
04. Unit Tests (2:36)
05. Unit Testing Tools (1:18)
07. Test-Driven Development and Data Science (2:23)
08. Logging (0:50)
09. Log Messages
Quiz
11. Code Reviews (0:47)
12. Questions to Ask Yourself When Conducting a Code Review
13. Tips for Conducting a Code Review
14. Conclusion (0:27)
OOP
01. Introduction (1:21)
02. Procedural vs. Object-Oriented Programming (1:55)
02. Quiz
03. Class, Object, Method and Attribute (2:37)
03. Quiz
04. OOP Syntax (5:32)
05. Exercise: OOP Syntax Practice – Part 1
06. A Couple of Notes about OOP (4:39)
07. Exercise: OOP Syntax Practice – Part 2
08. Commenting Object-Oriented Code
09. Gaussian Class (1:33)
10. How the Gaussian Class Works (3:36)
11. Exercise: Code the Gaussian Class
12. Magic Methods (1:47)
12.1 Magic Methods in Code
13. Exercise: Code Magic Methods
14. Inheritance
14.1 Inheritance Example V1
15. Exercise: Inheritance with Clothing
16. Inheritance: Probability Distribution
17. Demo: Inheritance Probability Distributions
18. Advanced OOP Topics
19. Organizing into Modules (3:27)
20. Demo: Modularized Code
21. Making a Package (5:38)
22. Virtual Environments (2:24)
24. Binomial Class
24. Binomial Class 2
26. Scikit-learn Source Code
27. Putting Code on PyPI
29. Lesson Summary
Portfolio Exercise: Upload a Package to PyPI
01. Introduction
02. Troubleshooting Possible Errors
03. Workspace
Web Development
02. Lesson Overview (1:02)
03. Components of a Web App
03. Quiz
04. The Front-End
05. HTML
05. Quiz
06. Exercise: HTML
07. Div and Span
08. IDs and Classes
09. Exercise: HTML Div, Span, IDs, Classes
10. CSS
11. Exercise: CSS
12. JavaScript
13. Exercise: JavaScript
14. Bootstrap Library
15. Exercise: Bootstrap
16. Plotly
17. Exercise: Plotly
18. The Backend
19. The Web
20. Flask (4:59)
21. Exercise: Flask
22. Flask + Pandas
23. Example: Flask + Pandas
24. Flask+Plotly+Pandas Part 1
25. Flask+Plotly+Pandas Part 2
26. Flask+Plotly+Pandas Part 3
27. Flask+Plotly+Pandas Part 4
28. Example: Flask + Plotly + Pandas
29. Exercise: Flask + Plotly + Pandas
30. Deployment
31. Exercise: Deployment
32. Lesson Summary
Portfolio Exercise: Deploy a Data Dashboard
02. Workspace Portfolio Exercise
03. Troubleshooting Possible Errors
04. Congratulations (3:32)
05. APIs [advanced version]
06. World Bank API [advanced version]
07. Python and APIs [advanced version]
08. World Bank Data Dashboard [advanced version]
Introduction to Data Engineering
ETL Pipelines
02. Lesson Overview (1:09)
03. World Bank Datasets (4:01)
03. Quiz
05. Extract (0:41)
05. Overview of the Extract Part of the Lesson
05. Quiz
06. Exercise: CSV
07. Exercise: JSON and XML
08. Exercise: SQL Databases
09. Extracting Text Data
10. Exercise: APIs
11. Transform (3:11)
11. Overview of the Transform Part of the Lesson
12. Combining Data
11. Quiz
13. Exercise: Combining Data
14. Cleaning Data (1:31)
15. Exercise: Cleaning Data
16. Exercise: Data Types
17. Exercise: Parsing Dates
18. Matching Encodings
19. Exercise: Matching Encodings
20. Missing Data – Overview
21. Missing Data – Delete
22. Missing Data – Impute
23. Exercise: Imputation
24. SQL, optimization, and ETL – Robert Chang Airbnb (4:25)
25. Duplicate Data
26. Exercise: Duplicate Data
27. Dummy Variables
28. Exercise: Dummy Variables
29. Outliers – How to Find Them
30. Exercise: Outliers Part 1
31. Outliers – What to do
32. Exercise: Outliers – Part 2
33. AI and Data Engineering – Robert Chang Airbnb (2:09)
34. Scaling Data
35. Exercise: Scaling Data
36. Feature Engineering
37. Exercise: Feature Engineering
38. Bloopers
39. Load
39. Overview of the Load Part of the Lesson
40. Exercise: Load
41. Putting It All Together
41. Overview of the Final Exercise
42. Exercise: Putting It All Together
43. Lesson Summary
Introduction to NLP
How NLP Pipelines Work
Text Processing
04. Cleaning
05. Notebook: Cleaning
06. Normalization
06. Quiz
07. Notebook: Normalization
08. Tokenization
09. Notebook: Tokenization
10. Stop Word Removal
11. Notebook: Stop Words
12. Part-of-Speech Tagging
13. Named Entity Recognition
14. Notebook: POS and NER
15. Stemming and Lemmatization
16. Notebook: Stemming and Lemmatization
17. Text Processing Summary
18. Feature Extraction
19. Bag of Words
20. TF-IDF
21. Notebook: Bag of Words and TF-IDF
22. One-Hot Encoding
23. Word Embeddings
24. Modeling
25. [OPTIONAL] Word2Vec
26. [OPTIONAL] GloVe
27. [OPTIONAL] Embeddings for Deep Learning
28. [OPTIONAL] t-SNE
Machine Learning Pipelines
02. Corporate Messaging Case Study (4:11)
03. Case Study Clean and Tokenize
04. Solution Clean and Tokenize
05. Machine Learning Workflow
06. Case Study Machine Learning Workflow
07. Solution Machine Learning Workflow
08. Using Pipeline
09. Advantages of Using Pipeline
10. Case Study Build Pipeline
11. Solution Build Pipeline
12. Pipelines and Feature Unions
13. Using Feature Union
14. Case Study Add Feature Union
15. Solution Add Feature Union
16. Creating Custom Transformers
17. Case Study Create Custom Transformer
18. Solution Create Custom Transformer
19. Pipelines and Grid Search
20. Using Grid Search with Pipelines (2:14)
21. Case Study Grid Search Pipeline
22. Solution Grid Search Pipeline
23. Conclusion
Disaster Response Pipeline
02. Building a Sentiment Analysis Model (XGBoost) (4:17)
03. Building a Sentiment Analysis Model (Linear Learner)
04. Combining the Models
05. Mini-Project: Updating a Sentiment Analysis Model
06. Loading and Testing the New Data
07. Exploring the New Data
08. Building a New Model
09. SageMaker Retrospective
11. SageMaker Tips and Tricks
Project 1: Disaster Response Pipeline
01. Project Introduction
02. Project Overview (1:22)
03. Project Details
04. Project Workspace – ETL
05. Project Workspace – ML Pipeline
06. Project Workspace IDE
Project Description – Disaster Response Pipelines
Project Rubric – Disaster Response Pipelines
Concepts in Experiment Design
01. Deployment Project (1:41)
02. Setting up a Notebook Instance
03. SageMaker Instance Utilization Limits
Deploy a Sentiment Analysis Model
Project Rubric – Deploy a Sentiment Analysis Model
Statistical Considerations in Testing
Interview Segment
02. What Applications Are Enabled By Amazon
03. Why Should Students Gain Skills In SageMaker And Cloud Services
Course Outline, Case Studies
04. Unsupervised vs. Supervised Learning
Model Design
Population Segmentation
K-means, Overview
Creating a Notebook Instance
09. Create a SageMaker Notebook Instance
10. Pre-Notebook: Population Segmentation
11. Exercise: Data Loading & Processing
12. Solution: Data Pre-Processing
13. Exercise: Normalization
14. Solution: Normalization
15. PCA, Overview
PCA Estimator & Training
Exercise: PCA Model Attributes & Variance
Solution: Variance
Component Makeup
20. Exercise: PCA Deployment & Data Transformation
21. Solution: Creating Transformed Data
22. Exercise: K-means Estimator & Selecting K
23. Exercise: K-means Predictions (clusters)
24. Solution: K-means Predictor
25. Exercise: Get the Model Attributes
26. Solution: Model Attributes
27. Clean Up: All Resources
AWS Workflow & Summary
Statistical Considerations in Testing
01. Lesson Introduction
02. Practice: Statistical Significance
03. Statistical Significance – Solution
04. Practical Significance
05. Experiment Size
06. Experiment Size – Solution
07. Using Dummy Tests
08. Non-Parametric Tests Part I
09. Non-Parametric Tests Part I – Solution
10. Non-Parametric Tests Part II
11. Non-Parametric Tests Part II – Solution
12. Analyzing Multiple Metrics
12.2 Analyzing Multiple Metrics
13. Early Stopping
14. Early Stopping – Solution
15. Lesson Conclusion
AB Testing Case Study
Pre-Notebook: Payment Fraud Detection
Exercise: Payment Transaction Data
Solution: Data Distribution & Splitting
LinearLearner & Class Imbalance
Exercise: Define a LinearLearner
Solution: Default LinearLearner
Exercise: Format Data & Train the LinearLearner
Solution: Training Job
Precision & Recall, Overview
Exercise: Deploy Estimator
Solution: Deployment & Evaluation
Model Improvements
Improvement, Model Tuning
Exercise: Improvement, Class Imbalance
Solution: Accounting for Class Imbalance
Exercise: Define a Model w/ Specifications
One Solution: Tuned and Balanced LinearLearner
Summary and Improvements
A/B Testing Case Study
01. Lesson Introduction
02. Scenario Description
03. Building a Funnel
04. Building a Funnel – Discussion
05. Deciding on Metrics – Part I
06. Deciding on Metrics – Part II
07. Deciding on Metrics – Discussion
08. Experiment Sizing
09. Experiment Sizing – Discussion
10. Validity, Bias, and Ethics – Discussion
11. Analyze Data
12. Draw Conclusions
13. Draw Conclusions – Discussion
14. Lesson Conclusion
Portfolio Exercise Starbucks
Can You Explain The Idea Behind The GitHub Repository
Does SageMaker Work With Certain Products Or Use Cases
How Do You Label Data At Scale
What's Your Prediction Of What SageMaker Will Prioritize In The Next 1-2 Years
Do You Have Advice For Someone Who Wants To Learn More
Introduction to Recommendation Engines
Pre-Notebook: Custom Models & Moon Data
02. Moon Data & Custom Models (4:27)
03. Upload Data to S3
Exercise: Custom PyTorch Classifier
Solution: Simple Neural Network
Exercise: Training Script
Solution: Complete Training Script
Custom SKLearn Model
PyTorch Estimator
Exercise: Create a PyTorchModel & Endpoint
Solution: PyTorchModel & Evaluation
Clean Up: All Resources
Summary of Skills
Matrix Factorization for Recommendations
Forecasting Energy Consumption
03. Pre-Notebook: Time-Series Forecasting
Processing Energy Data
Exercise: Creating Time Series
06. Solution: Split Data
Exercise: Convert to JSON
Solution: Formatting JSON Lines & DeepAR Estimator
09. Exercise: DeepAR Estimator
Solution: Complete Estimator & Hyperparameters
Making Predictions
12. Exercise: Predicting the Future
Solution: Predicting Future Data
14. Clean Up: All Resources
Recommendation Engines
02. Containment
04. Longest Common Subsequence
05. Dynamic Programming
01. Project Overview
06. Project Files & Evaluation
07. Notebooks
Project Description – Plagiarism Detector
All Required Files and Tests
Upcoming Lesson
Time-Series Prediction
Training & Memory
Hidden State Dimensions
Character-wise RNNs
Sequence Batching
Pre-Notebook: Character-Level RNN
07. Notebook: Character-Level RNN
Implementing a Char-RNN
Batching Data, Solution
Defining the Model
Char-RNN, Solution
Making Predictions
Sentiment Prediction RNN
Pre-Notebook: Sentiment RNN
03. Notebook: Sentiment RNN
04. Data Pre-Processing
Encoding Words, Solution
Getting Rid of Zero-Length
Cleaning & Padding Data
Padded Features, Solution
TensorDataset & Batching Data
Defining the Model
Complete Sentiment RNN
Training the Model
Testing
Inference, Solution
Convolutional Neural Networks
Applications of CNNs
Lesson Outline
MNIST Dataset
How Computers Interpret Images
MLP Structure & Class Scores
Quiz
07. Do Your Research
Loss & Optimization
09. Defining a Network in PyTorch (4:28)
10. Training the Network
11. Pre-Notebook: MLP Classification, Exercise
12. Notebook: MLP Classification, MNIST
One Solution
14. Model Validation
15. Validation Loss
16. Image Classification Steps
17. MLPs vs CNNs
18. Local Connectivity
19. Filters and the Convolutional Layer
Filters & Edges
21. Frequency in Images
22. High-pass Filters
Quiz: Kernels
Notebook: Custom Filters
OpenCV & Creating Custom Filters
26. Convolutional Layer
27. Convolutional Layer
28. Stride and Padding
29. Pooling Layers
Notebook: Layer Visualization
Capsule Networks
Increasing Depth
33. CNNs for Image Classification
Quiz 33
34. Convolutional Layers in PyTorch
Quiz 34
35. Feature Vector
36. Pre-Notebook: CNN Classification
37. Notebook: CNNs for CIFAR Image Classification
38. CIFAR Classification Example
39. CNNs in PyTorch
40. Image Augmentation
41. Augmentation Using Transformations
42. Groundbreaking CNN Architectures
43. Visualizing CNNs (Part 1)
44. Visualizing CNNs (Part 2)
Summary of CNNs
Transfer Learning
Useful Layers
Fine-Tuning
VGG Model & Classifier
Pre-Notebook: Transfer Learning
06. Notebook: Transfer Learning, Flowers
07. Freezing Weights & Last Layer
Training a Classifier
Weight Initialization
Constant Weights
Random Uniform
General Rule
Normal Distribution
Pre-Notebook: Weight Initialization, Normal Distribution
07. Notebook: Normal & No Initialization
Solution and Default Initialization
Additional Material
Autoencoders
Pre-Notebook: Linear Autoencoder
A Linear Autoencoder
Notebook: Linear Autoencoder
Defining & Training an Autoencoder
A Simple Solution
Learnable Upsampling
Transpose Convolutions
Convolutional Autoencoder
Pre-Notebook: Convolutional Autoencoder
Notebook – Convolutional Autoencoder
Convolutional Solution
Upsampling & Denoising
De-noising
Pre-Notebook: De-noising Autoencoder
Notebook: De-noising Autoencoder
Job Search
Intro
Job Search Mindset
Target Your Application to An Employer
Open Yourself Up to Opportunity
Refine Your Entry-Level Resume
Convey Your Skills Concisely
Effective Resume Components
Resume Structure
Describe Your Work Experiences
Resume Reflection
Resume Review
Craft Your Cover Letter
Get an Interview with a Cover Letter!
Purpose of the Cover Letter
Cover Letter Components
Write the Introduction
Write the Body
Write the Conclusion
Format
Optimize Your GitHub Profile
Introduction
GitHub profile important items
Good GitHub repository
Interview Part 1
Identify fixes for example “bad” profile
Identify fixes for example “bad” profile 2
Quick Fixes #1
Quick Fixes #2
Writing READMEs
Interview Part 2
Commit messages best practices
Reflect on your commit messages
Participating in open source projects
Interview Part 3
Participating in open source projects 2
Starring interesting repositories
Develop Your Personal Brand
Why Network?
Why Use Elevator Pitches?
Personal Branding
Elevator Pitch
Pitching to a Recruiter
This course assumes you have experience manipulating data with the pandas library, which is covered in the Data Analyst Nanodegree. Some of the transformation exercises are challenging; the most challenging ones are marked (challenging). Solving an exercise marked as challenging is worthwhile, but it is not essential for understanding the lesson material or for completing the final project at the end of this data engineering course.
Throughout the exercises, you might have to read the pandas documentation or search outside the classroom to find out how to apply a particular processing technique. This is not just expected but encouraged. As a professional data scientist, you will often have to research how to do something on your own, much as software engineers do. See this Quora answer on how often people use Stack Overflow when working on data science projects.
Use Google and other search engines when you’re not sure how to do something!
What You Will do in the Next Section
In the next section of the lesson, you'll learn about the extract portion of an ETL pipeline and practice it with a series of relatively brief exercises. These exercises focus on extracting data, that is, reading it in from different sources. The goal is to familiarize yourself with different file types and see how the same data can be formatted in different ways.
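The core idea of the extract step, the same data arriving in different formats, can be sketched without any third-party libraries. The minimal example below uses hypothetical records (`Country A`, `Country B`, and made-up values), a hypothetical table name (`records`), and hypothetical function names; it stores the records as CSV text, JSON text, XML text, and an in-memory SQLite table, then extracts each into the same list of dictionaries. In the exercises themselves you would more likely reach for pandas readers such as `pd.read_csv`, `pd.read_json`, and `pd.read_sql`.

```python
import csv
import io
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample records (made-up names and values), serialized in
# four formats. Each extractor below returns them as a list of dicts.
CSV_TEXT = "country,year,value\nCountry A,2017,100\nCountry B,2017,250\n"

JSON_TEXT = (
    '[{"country": "Country A", "year": 2017, "value": 100},'
    ' {"country": "Country B", "year": 2017, "value": 250}]'
)

XML_TEXT = """<records>
  <record><country>Country A</country><year>2017</year><value>100</value></record>
  <record><country>Country B</country><year>2017</year><value>250</value></record>
</records>"""


def extract_csv(text):
    """Parse CSV text. csv.DictReader returns every field as a string,
    so the numeric columns are cast to int explicitly."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"country": row["country"], "year": int(row["year"]), "value": int(row["value"])}
        for row in reader
    ]


def extract_json(text):
    """Parse a JSON array of objects; JSON numbers already load as int."""
    return json.loads(text)


def extract_xml(text):
    """Parse XML with one <record> element per row and one child per field.
    Element text is always a string, so numbers are cast to int."""
    root = ET.fromstring(text)
    return [
        {
            "country": record.findtext("country"),
            "year": int(record.findtext("year")),
            "value": int(record.findtext("value")),
        }
        for record in root.iter("record")
    ]


def build_demo_db():
    """Create an in-memory SQLite table (hypothetical name: records)
    holding the same sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (country TEXT, year INTEGER, value INTEGER)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [("Country A", 2017, 100), ("Country B", 2017, 250)],
    )
    return conn


def extract_sql(conn):
    """Query the table and turn each row tuple into a dict keyed by column name."""
    cursor = conn.execute("SELECT country, year, value FROM records ORDER BY country")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    from_csv = extract_csv(CSV_TEXT)
    sources = {
        "json": extract_json(JSON_TEXT),
        "xml": extract_xml(XML_TEXT),
        "sql": extract_sql(build_demo_db()),
    }
    # Compare every other source against the CSV result.
    for name, records in sources.items():
        print(name, "matches csv:", records == from_csv)
```

CSV and XML carry no type information, so `year` and `value` arrive as strings and have to be cast to `int`; JSON and SQLite already return them as numbers. Differences like this are one reason the same data can need different handling depending on where it is extracted from.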
For a review of pandas, click on the “Extracurricular” section of the classroom. Open the Prerequisite: Python for Data Analysis course, and go to Lesson 7: Pandas.