SM formats
-
Hi, here is another ambiguous question on data formats:
A Machine Learning Specialist has been using Amazon EC2 for quite some time to train classification and regression models. The Specialist wants to simplify the training job by leveraging Amazon SageMaker’s built-in algorithms. However, he is unsure if SageMaker can support the format of his training data.
—
Classification and regression models don't work with JPEG files. At the same time, as of late 2019, ‘Amazon SageMaker Batch Transform now supports TFRecord format as a supported SplitType, enabling datasets to be split by TFRecord boundaries. This adds to the list of supported formats including RecordIO, CSV, and Text.’
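For context, the SplitType in that announcement is set on the TransformInput of a Batch Transform (inference) job. It is not a setting on a training job's input channel. The following is a minimal sketch that assembles the TransformInput part of a CreateTransformJob request as a plain dict. It does not call AWS. The bucket URI and content type are hypothetical placeholders, and the right content type depends on the model container.

```python
# Allowed values for TransformInput.SplitType in the CreateTransformJob API.
ALLOWED_SPLIT_TYPES = {"None", "Line", "RecordIO", "TFRecord"}


def build_transform_input(s3_uri: str, content_type: str, split_type: str) -> dict:
    """Assemble the TransformInput section of a CreateTransformJob request.

    Only the request structure is built here; nothing is sent to AWS.
    """
    if split_type not in ALLOWED_SPLIT_TYPES:
        raise ValueError(f"unsupported SplitType: {split_type!r}")
    return {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",  # treat S3Uri as a key prefix
                "S3Uri": s3_uri,
            }
        },
        "ContentType": content_type,
        "SplitType": split_type,  # how Batch Transform splits input files into records
    }


# Hypothetical bucket and content type, for illustration only.
transform_input = build_transform_input(
    "s3://example-bucket/inference/", "application/x-tfrecord", "TFRecord"
)
print(transform_input["SplitType"])  # TFRecord
```

To submit the job, you would pass this dict as the `TransformInput` argument of boto3's `create_transform_job`, together with the job name, model name, output, and resource settings.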
So either the question needs to be tuned or the answers adjusted to remove the ambiguity. Thanks.
-
Hello Klimok,
Thanks for sharing your insights.
You can use the TFRecord data format to train models with custom algorithms in SageMaker. However, TFRecord is not included in the list of supported training data formats for built-in algorithms, which can be found here:
https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html#cdf-common-content-types
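CSV (`text/csv`) is one of the content types on that list. Built-in algorithms such as XGBoost and Linear Learner expect CSV training data with no header row and the target variable in the first column. The following is a minimal sketch of serializing records that way using only the standard library. The field names and toy records are hypothetical.

```python
import csv
import io


def to_sagemaker_csv(rows, label_key, feature_keys):
    """Serialize dict records as CSV for a SageMaker built-in algorithm.

    No header row is written, and the label goes in the first column,
    followed by the features in the order given by feature_keys.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_key]] + [row[k] for k in feature_keys])
    return buf.getvalue()


# Hypothetical toy dataset for illustration.
records = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": 51, "income": 73000, "churned": 1},
]
print(to_sagemaker_csv(records, "churned", ["age", "income"]), end="")
```

The resulting file would then be uploaded to S3 and passed as a training channel with content type `text/csv`. In the SageMaker Python SDK, one way to do this is `sagemaker.inputs.TrainingInput(s3_data=..., content_type="text/csv")`.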
Classification was meant in a general sense, which could include image classification, binary classification, and so on.
Let me know if this helps.
Regards,
Carlo @ Tutorials Dojo