  • scanned files / search functionality

  • kung

    Member
    May 28, 2020 at 9:53 pm

    Scenario about ‘20 TB worth of scanned files’

    I’m assuming these scanned files are images? If so, any search solution would need to be preceded by an OCR process (e.g. Rekognition) to convert the images to searchable text.

    But there’s nothing mentioned about this in the answers, and as far as I know CloudSearch doesn’t do OCR from image files.

    Maybe good to mention something about this additional step?

    Cheers,
    Robert

  • TutorialsDojo-Support

    Member
    May 31, 2020 at 2:05 pm

    Hi Robert,

    Thanks for sharing your feedback.

    The question does not necessarily require searching the text inside the scanned files. When files are uploaded to S3, we can assume that some metadata is stored alongside each file, such as date, location, author, etc.

    This metadata is loaded into CloudSearch (as a JSON or XML document batch) and is used to index the files.

    The search feature is therefore used to query the metadata for each file rather than the text inside the scans. This way, users can search and pull up the correct file name based on the criteria they define.

    https://docs.aws.amazon.com/cloudsearch/latest/developerguide/preparing-data.html
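
    As a rough illustration of the metadata approach described above, here is a minimal sketch of how per-file metadata might be turned into a CloudSearch JSON document batch. The field names (`s3_key`, `author`, `scan_date`, `location`), the sample values, and the helper names are hypothetical; the fields would have to match the index fields configured on your own search domain.

```python
import hashlib
import json


def make_document_id(s3_key: str) -> str:
    """Derive a stable document ID from the S3 key.

    Hashing is an assumption made here to avoid characters that are
    not allowed in CloudSearch document IDs; any unique, valid ID works.
    """
    return hashlib.sha1(s3_key.encode("utf-8")).hexdigest()


def build_add_batch(files: list) -> str:
    """Build a JSON document batch of "add" operations.

    Each item in `files` is expected to look like:
    {"s3_key": "...", "metadata": {"author": "...", ...}}
    """
    batch = []
    for item in files:
        # Store the S3 key as a field so a search hit can be mapped
        # back to the scanned file it describes.
        fields = {"s3_key": item["s3_key"], **item["metadata"]}
        batch.append(
            {
                "type": "add",
                "id": make_document_id(item["s3_key"]),
                "fields": fields,
            }
        )
    return json.dumps(batch)


# Hypothetical metadata for one scanned file.
sample_files = [
    {
        "s3_key": "scans/2020/invoice-0001.tiff",
        "metadata": {
            "author": "Robert",
            "scan_date": "2020-05-28T00:00:00Z",
            "location": "Manila",
        },
    }
]

batch_json = build_add_batch(sample_files)
```

    The resulting `batch_json` string could then be submitted to the domain's document endpoint, for example with the `upload_documents` call of boto3's `cloudsearchdomain` client. Note that only the metadata is indexed here; the scanned images themselves stay in S3.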

    Regards,

    Kenneth Samonte @ Tutorials Dojo
