Practical Data Science on the AWS Cloud (PDS) Specialization

What you will learn

Prepare data, detect statistical data biases, and perform feature engineering at scale to train models
Automatically train, evaluate, and tune models with automated machine learning (AutoML)
Store and manage machine learning features using a feature store
Debug, profile, tune, and evaluate models while tracking data lineage and model artifacts
Build, deploy, monitor, and operationalize end-to-end machine learning pipelines
Build data labeling and human-in-the-loop pipelines to improve model performance with human intelligence

Skills you will gain

Automated Machine Learning (AutoML)
Natural Language Processing with BERT
ML Pipelines and ML Operations (MLOps)
A/B Testing, Model Deployment, and Monitoring
Data Labeling at Scale
Data Ingestion
Exploratory Data Analysis
Statistical Data Bias Detection
Multi-class Classification with FastText and BlazingText
Feature Engineering and Feature Store
Model Training, Tuning, and Deployment with BERT
Model Debugging, Profiling, and Evaluation
Artifact and Lineage Tracking
Distributed Model Training and Hyperparameter Tuning
Cost Savings and Performance Improvements
Human-in-the-Loop Pipelines

Development environments might not have the same requirements as production environments. Moving data science and machine learning projects from idea to production requires state-of-the-art skills: you need to architect and implement your projects for scale and operational efficiency. Data science is an interdisciplinary field that combines domain knowledge with mathematics, statistics, data visualization, and programming skills.

The Practical Data Science on the AWS Cloud Specialization brings together these disciplines using purpose-built ML tools in the AWS cloud. It helps you develop the practical skills to effectively deploy your data science projects and overcome challenges at each step of the ML workflow using Amazon SageMaker.

This Specialization is designed for data-focused developers, scientists, and analysts familiar with the Python and SQL programming languages who want to learn how to build, train, and deploy scalable, end-to-end ML pipelines – both automated and human-in-the-loop – in the AWS cloud.

Each of the 10 weeks features a comprehensive lab, developed specifically for this Specialization, that provides hands-on experience with state-of-the-art algorithms for natural language processing (NLP) and natural language understanding (NLU), including BERT and FastText, using Amazon SageMaker.

By the end of this program, you will be ready to:

Ingest, register, and explore datasets
Detect statistical bias in a dataset
Automatically train and select models with AutoML
Create machine learning features from raw data
Save and manage features in a feature store
Train and evaluate models using built-in algorithms and custom BERT models
Debug, profile, and compare models to improve performance
Build and run a complete ML pipeline end-to-end
Optimize model performance using hyperparameter tuning
Deploy and monitor models
Perform data labeling at scale
Build a human-in-the-loop pipeline to improve model performance
Reduce cost and improve performance of data products

Syllabus

In the first course, you will learn foundational concepts for exploratory data analysis (EDA), automated machine learning (AutoML), and text classification algorithms. With Amazon SageMaker Clarify and Amazon SageMaker Data Wrangler, you will analyze a dataset for statistical bias, transform the dataset into machine-readable features, and select the most important features to train a multi-class text classifier. You will then use AutoML with Amazon SageMaker Autopilot to automatically train, tune, and deploy the best text-classification algorithm for the given dataset. Next, you will work with Amazon SageMaker BlazingText, a highly optimized and scalable implementation of the popular FastText algorithm, to train a text classifier with very little code.

Week 1: Explore the Use Case and Analyze the Dataset

Ingest, explore, and visualize a product review dataset for multi-class text classification.

Week 2: Data Bias and Feature Importance

Determine the most important features in a dataset and detect statistical biases.
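
As a rough illustration of what detecting statistical bias can involve, the sketch below computes two common pre-training bias metrics: class imbalance (CI) and difference in proportions of labels (DPL). It is a minimal plain-Python sketch under stated assumptions, not the lab code from this course; the `reviews` records, the `category` facet, and the `sentiment` label are hypothetical.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d); 0 means both facets are equally represented."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(records, facet_key, facet_a, facet_d,
                                        label_key, positive):
    """DPL = q_a - q_d, where q_x is the share of positive labels within facet x."""
    def positive_share(facet_value):
        group = [r for r in records if r[facet_key] == facet_value]
        return sum(r[label_key] == positive for r in group) / len(group)

    return positive_share(facet_a) - positive_share(facet_d)


# Hypothetical product reviews: facet = product category, label = sentiment.
reviews = [
    {"category": "Dresses", "sentiment": 1},
    {"category": "Dresses", "sentiment": 1},
    {"category": "Dresses", "sentiment": 1},
    {"category": "Dresses", "sentiment": -1},
    {"category": "Shoes", "sentiment": 1},
    {"category": "Shoes", "sentiment": -1},
]

n_dresses = sum(r["category"] == "Dresses" for r in reviews)
n_shoes = sum(r["category"] == "Shoes" for r in reviews)

ci = class_imbalance(n_dresses, n_shoes)
dpl = difference_in_proportions_of_labels(
    reviews, "category", "Dresses", "Shoes", "sentiment", 1
)
print(f"CI = {ci:.2f}, DPL = {dpl:.2f}")
```

Values near zero suggest the two facets are balanced; values far from zero flag a facet that is over-represented (CI) or that receives positive labels more often (DPL).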

Week 3: Automated Machine Learning

Inspect and compare models generated with automated machine learning (AutoML).

Week 4: Built-in Algorithms

Train a text classifier with BlazingText and deploy the classifier as a real-time inference endpoint to serve predictions.
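
BlazingText's supervised text-classification mode reads training data in the FastText format: one example per line, with the class label prefixed by `__label__` and followed by space-separated tokens. A minimal sketch of preparing such lines follows; the sample reviews, the label values, and the naive lowercase whitespace tokenization are assumptions for illustration, not the lab code from this course.

```python
def to_blazingtext_line(label, text):
    """Format one example as '__label__<label> <tokens>' (FastText supervised format)."""
    tokens = text.lower().split()  # naive tokenization, assumed for illustration
    return f"__label__{label} " + " ".join(tokens)


# Hypothetical product reviews with sentiment labels -1, 0, and 1.
samples = [
    (1, "Love this dress  fits perfectly"),
    (0, "It is okay"),
    (-1, "Poor quality fabric"),
]

lines = [to_blazingtext_line(label, text) for label, text in samples]
print("\n".join(lines))
```

The resulting lines would typically be written to a text file and uploaded to Amazon S3 as the training channel.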

Instructors

Antje Barth

Instructor

Principal Developer Advocate, Generative AI, Amazon Web Services (AWS)

Shelbee Eigenbrode

Instructor

Principal Solutions Architect, Generative AI, Amazon Web Services (AWS)

Sireesha Muppala

Instructor

Principal Solutions Architect, AI and Machine Learning, Amazon Web Services (AWS)

Chris Fregly

Instructor

Principal Solutions Architect, Generative AI, Amazon Web Services (AWS)

Course 1: Analyze Datasets and Train ML Models using AutoML

Week 1: Explore the Use Case and Analyze the Dataset

Week 2: Data Bias and Feature Importance

Week 3: Automated Machine Learning

Week 4: Built-in Algorithms

Course 2: Build, Train, and Deploy ML Pipelines using BERT

Week 1: Feature Engineering and Feature Store

Week 2: Train, Debug, and Profile a Machine Learning Model

Week 3: Deploy End-To-End Machine Learning Pipelines

Course 3: Optimize ML Models and Deploy Human-in-the-Loop Pipelines

Week 1: Advanced Model Training, Tuning, and Evaluation

Week 2: Advanced Model Deployment and Monitoring

Week 3: Data Labeling and Human-in-the-Loop Pipelines
