  • Irene-TutorialsDojo

    Administrator
    June 26, 2025 at 12:45 pm

    Hi Toti,

    Thank you for your question and for bringing this to our attention.

    We understand your concern about filtering columns while the data is still in CSV format. Column filtering works on CSV files because it only requires selecting specific columns, such as reviews, ratings, and timestamps, based on the dataset’s structure. AWS Glue can read CSV files, infer their schema, and drop unnecessary columns without first converting them to a columnar format. Doing this first reduces the data size, which makes the subsequent steps more efficient.
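    To show that column filtering needs nothing more than the CSV header, here is a minimal sketch using only Python's standard csv module. It is not AWS Glue code; the column names (review_id, review, rating, timestamp, session_id) and the filter_csv_columns helper are hypothetical examples chosen for illustration.

```python
import csv
import io

# Hypothetical columns to keep; a real dataset's schema may differ.
KEEP = ["review", "rating", "timestamp"]


def filter_csv_columns(csv_text: str, keep: list[str]) -> str:
    """Return CSV text containing only the columns listed in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header row is all we need to know which columns exist.
    missing = [c for c in keep if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not in `keep`.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data with two columns the job does not need.
sample = (
    "review_id,review,rating,timestamp,session_id\n"
    "1,Great product,5,2025-06-01T10:00:00Z,abc123\n"
    "2,Too small,2,2025-06-02T11:30:00Z,def456\n"
)
print(filter_csv_columns(sample, KEEP))
```

    The output is still plain CSV, just narrower, so the later conversion step has less data to process.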

    The correct order is to filter columns first, then convert the CSV to a columnar format such as Parquet, and finally compress the data. Filtering first reduces the amount of data that has to be converted into a format designed for faster queries. Compression then lowers storage requirements and can improve processing speed, since less data has to be read. We’ll update the portal to clarify this. If you have any other questions, please don’t hesitate to reach out.
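    The filter, convert, compress ordering can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: a column-oriented dict serialized as JSON stands in for Parquet, and gzip stands in for a Parquet compression codec. The helper names and sample columns are hypothetical and are not part of AWS Glue.

```python
import csv
import gzip
import io
import json


def filter_columns(csv_text, keep):
    """Step 1: keep only the needed columns (rows are still CSV-shaped)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{c: row[c] for c in keep} for row in reader]


def to_columnar(rows, keep):
    """Step 2: pivot rows into one list per column.

    Stand-in for a Parquet conversion; real Parquet also stores a typed
    schema, row groups, and per-column statistics.
    """
    return {c: [row[c] for row in rows] for c in keep}


def compress(columns):
    """Step 3: serialize and gzip (stand-in for a Parquet codec)."""
    payload = json.dumps(columns, separators=(",", ":")).encode("utf-8")
    return gzip.compress(payload)


def pipeline(csv_text, keep):
    rows = filter_columns(csv_text, keep)   # smallest data first
    columns = to_columnar(rows, keep)       # convert only what survived
    return columns, compress(columns)       # compress the final layout


# Hypothetical wide CSV with columns the job does not need.
header = "review_id,review,rating,timestamp,session_id,user_agent\n"
body = "".join(
    f"{i},Solid build quality,{i % 5 + 1},"
    f"2025-06-{i % 28 + 1:02d}T12:00:00Z,s{i},Mozilla/5.0\n"
    for i in range(1000)
)
raw = header + body
columns, blob = pipeline(raw, ["review", "rating", "timestamp"])
print("raw CSV bytes:", len(raw.encode("utf-8")))
print("compressed columnar bytes:", len(blob))
```

    Each stage works on the output of the previous one, so dropping columns first means the conversion and compression steps never touch data that would be discarded anyway.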

    Best,

    Irene @ Tutorials Dojo
