Hi Toti,
Thank you for your question and for bringing this to our attention.
We understand your concern about filtering columns while the data is still in CSV format. Filtering can be done on CSV files because it involves selecting specific columns, such as reviews, ratings, and timestamps, based on the dataset’s structure. AWS Glue can read CSV files, identify their schema, and filter out unnecessary columns without requiring a columnar format. Doing this first reduces the data size, making the following steps more efficient.
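As a rough illustration of why no columnar format is needed for this step, here is a minimal sketch in plain Python using only the standard library. It is not the AWS Glue API; the function name `filter_columns`, the sample data, and the extra columns `user_agent` and `session_id` are assumptions made up for the example.

```python
import csv
import io


def filter_columns(csv_text, keep):
    """Return CSV text that contains only the columns named in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" drops every column that is not listed in `keep`.
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data: three useful columns plus two unneeded ones.
sample = (
    "review,rating,timestamp,user_agent,session_id\n"
    "Great product,5,2024-01-01T10:00:00,Mozilla,abc\n"
    "Too slow,2,2024-01-02T11:30:00,Chrome,def\n"
)

filtered = filter_columns(sample, ["review", "rating", "timestamp"])
print(filtered)
```

The header row is enough to identify each column, so the unneeded ones can be dropped while the data is still CSV, and the result is already smaller than the input.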
The correct order is to filter columns first, then convert the CSV to a columnar format like Parquet, and finally compress the data. Filtering first means less data has to be converted. The columnar format is designed for faster queries, and compression then lowers storage needs and improves processing speed. We’ll update the portal to clarify this. If you have any other questions, please don’t hesitate to reach out.
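The three steps can be sketched in order as follows. This is a toy, standard-library-only illustration and not how AWS Glue does it: the column-oriented layout here is a plain dictionary of lists serialized as JSON, used only as a stand-in for Parquet, and all function names and the sample data are assumptions for the example.

```python
import csv
import gzip
import io
import json


def filter_columns(rows, keep):
    """Step 1: keep only the columns named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


def to_columnar(rows):
    """Step 2: pivot rows into one list per column (toy stand-in for Parquet)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def compress(columns):
    """Step 3: serialize the column-oriented data and gzip it."""
    return gzip.compress(json.dumps(columns).encode("utf-8"))


def prepare(csv_text, keep):
    """Run the steps in order: filter, convert, compress."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return compress(to_columnar(filter_columns(rows, keep)))


# Hypothetical sample data with two columns that are not needed downstream.
sample = (
    "review,rating,timestamp,user_agent,session_id\n"
    "Great product,5,2024-01-01T10:00:00,Mozilla,abc\n"
    "Too slow,2,2024-01-02T11:30:00,Chrome,def\n"
)

blob = prepare(sample, ["review", "rating", "timestamp"])
restored = json.loads(gzip.decompress(blob).decode("utf-8"))
print(restored["rating"])
```

Because filtering runs first, the conversion and compression steps never touch the dropped columns. In a real pipeline a Parquet writer would take the place of the JSON pivot.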
Best,
Irene @ Tutorials Dojo