Posted On: Jan 28, 2022

Amazon SageMaker Autopilot automatically builds, trains, and tunes the best machine learning models based on your data, while allowing you to maintain full control and visibility. You can now use SageMaker Autopilot to build machine learning models for regression and classification problems on datasets larger than 10 GB, the previously supported limit. Starting today, SageMaker Autopilot supports datasets of up to 100 GB by default in all AWS Regions where SageMaker Autopilot is currently supported. SageMaker Autopilot will subsample your dataset automatically, while accounting for class imbalance and preserving rare class labels. You can raise the default 100 GB service limit to support larger datasets by filing a limit increase request in the AWS Support Center console.
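To build intuition for what class-aware subsampling means, here is a minimal Python sketch. It is an illustration only and not Autopilot's actual algorithm, which this announcement does not describe. The function name `downsample_keep_minority` and the `max_rows` budget are hypothetical.

```python
import random
from collections import Counter


def downsample_keep_minority(labels, max_rows, seed=0):
    """Return sorted row indices to keep (hypothetical helper, not an AWS API).

    Every row of the rarest class is kept when the budget allows; the
    remaining budget is filled by sampling the other rows uniformly.
    """
    if len(labels) <= max_rows:
        # Dataset already fits the budget: keep everything.
        return list(range(len(labels)))

    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    other_idx = [i for i, y in enumerate(labels) if y != minority]

    rng = random.Random(seed)
    if len(minority_idx) >= max_rows:
        # Even the rare class exceeds the budget, so sample from it.
        return sorted(rng.sample(minority_idx, max_rows))

    # Keep all rare-class rows, then fill the rest of the budget.
    budget = max_rows - len(minority_idx)
    return sorted(minority_idx + rng.sample(other_idx, budget))
```

For example, with 95 rows of class 0, 5 rows of class 1, and `max_rows=20`, this sketch keeps all 5 rows of class 1 and 15 randomly chosen rows of class 0.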

You can get started building machine learning models automatically with datasets up to 100 GB by simply pointing SageMaker Autopilot at the dataset in your S3 bucket as usual. Autopilot will automatically parse this data, detect class imbalance, and downsample observations, while keeping those from the minority class in binary classification problems. For more details, review Amazon SageMaker Autopilot quotas. For a deep dive, check out our blog post and sample notebook previewing this feature launch. To get started with SageMaker Autopilot, see the product page or access SageMaker Autopilot within SageMaker Studio.
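As a rough sketch of what pointing Autopilot at a dataset in S3 can look like from code, the Python below builds a request for the SageMaker `create_auto_ml_job` API and submits it with boto3. The helper names, job name, bucket paths, target column, and role ARN are placeholders assumed for illustration; substitute your own values, and consult the API reference for the full set of options.

```python
def build_autopilot_request(job_name, train_s3_uri, target_column,
                            output_s3_uri, role_arn):
    """Assemble keyword arguments for create_auto_ml_job (hypothetical helper)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                # Column Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }


def submit_autopilot_job(request):
    """Submit the job; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the request can be built offline

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)


# Placeholder values, assumed for illustration only.
request = build_autopilot_request(
    job_name="my-autopilot-job",
    train_s3_uri="s3://my-bucket/autopilot/train/",
    target_column="label",
    output_s3_uri="s3://my-bucket/autopilot/output/",
    role_arn="arn:aws:iam::123456789012:role/MySageMakerRole",
)
# response = submit_autopilot_job(request)  # uncomment to launch the job
```

The request omits the problem type and objective metric, leaving Autopilot to infer them from the data. The sketch contains no size-specific setting, in line with the announcement's statement that larger datasets are used by pointing Autopilot at S3 as usual.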