Member · May 28, 2020 at 9:53 pm
Scenario about ‘20 TB worth of scanned files’
I’m assuming these scanned files are images? If so, any search solution would first need an OCR step (e.g. Amazon Rekognition’s text detection, or Amazon Textract for full-page documents) to convert the images into searchable text.
But the answers don’t mention this, and as far as I know CloudSearch doesn’t perform OCR on image files.
It might be worth mentioning this additional step.
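A minimal sketch of that OCR step, in Python. It assumes the text is extracted with Amazon Textract (swapped in here because it targets full-page documents, whereas Rekognition’s DetectText caps the number of words it returns per image), called as `boto3.client("textract").detect_document_text(Document={"S3Object": {"Bucket": ..., "Name": ...}})`. That synchronous call handles single-page images; multi-page PDFs would need the asynchronous `start_document_text_detection` API. The code only post-processes a response dict, so the sample response is hand-written, not real output.

```python
from __future__ import annotations


def lines_from_textract(response: dict) -> list[str]:
    """Return the text of every LINE block in a DetectDocumentText response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def ocr_text(response: dict) -> str:
    """Join the detected lines into one string, e.g. for a searchable text field."""
    return "\n".join(lines_from_textract(response))


if __name__ == "__main__":
    # Hand-written sample in the shape of a Textract response; no AWS call is made here.
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "INVOICE 2020-117"},
            {"BlockType": "WORD", "Text": "INVOICE"},
            {"BlockType": "LINE", "Text": "Bill to: Jane Doe"},
        ]
    }
    print(ocr_text(sample))
```

The extracted string could then be indexed as an extra text field next to the file’s metadata, so the content itself becomes searchable.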
Member · May 31, 2020 at 2:05 pm
Thanks for sharing your feedback.
The question doesn’t necessarily require searching the text inside the scanned files. When the files are uploaded to S3, we can assume that metadata is stored alongside each scanned file, such as the date, location, and author.
This metadata is loaded into CloudSearch (as JSON or XML document batches) and used to index the files.
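For illustration, here is a rough Python sketch of turning per-file metadata into a CloudSearch JSON document batch: a JSON array of `add` operations, each with an `id` and a `fields` object. The field names `s3_key`, `author`, `scan_date`, and `location`, and the choice to derive the document ID by hashing the S3 key, are hypothetical; they would have to match the index fields configured on the domain.

```python
from __future__ import annotations

import hashlib
import json


def doc_id_for_key(s3_key: str) -> str:
    # Hypothetical ID scheme: SHA-1 hex digest of the S3 key. It is 40 lowercase hex
    # characters, which stays within CloudSearch's allowed ID characters and length.
    return hashlib.sha1(s3_key.encode("utf-8")).hexdigest()


def build_add_batch(files: list[dict]) -> str:
    """Build a CloudSearch JSON document batch with one 'add' operation per file.

    Each dict in `files` holds one file's metadata and must include "s3_key".
    Keys whose value is None are left out rather than sent as null.
    """
    batch = []
    for meta in files:
        fields = {name: value for name, value in meta.items() if value is not None}
        batch.append({"type": "add", "id": doc_id_for_key(meta["s3_key"]), "fields": fields})
    return json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical metadata for two scanned files.
    files = [
        {"s3_key": "scans/2020/0001.tif", "author": "Jane Doe",
         "scan_date": "2020-05-28T00:00:00Z", "location": "Manila"},
        {"s3_key": "scans/2020/0002.tif", "author": "John Roe",
         "scan_date": "2020-05-29T00:00:00Z", "location": None},
    ]
    print(build_add_batch(files))
```

The resulting string could then be sent with `boto3.client("cloudsearchdomain", endpoint_url=...).upload_documents(documents=batch.encode("utf-8"), contentType="application/json")`, using the domain’s document endpoint. Each batch is limited to 5 MB, so a 20 TB archive’s metadata would be uploaded in many batches.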
The search feature is therefore used on each file’s metadata rather than on the text inside the scanned files. This way, users can search by the criteria they define and pull up the matching file names.
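A hedged sketch of what such a metadata search could look like, building a query string in CloudSearch’s structured query syntax. It assumes hypothetical index fields `author` (literal) and `scan_date` (date, RFC 3339 timestamps); neither name comes from the question.

```python
from __future__ import annotations


def quote(value: str) -> str:
    """Wrap a value in single quotes for a structured query, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def metadata_query(author: str | None = None,
                   date_from: str | None = None,
                   date_to: str | None = None) -> str:
    """Build a structured query over hypothetical `author` and `scan_date` fields.

    Dates are RFC 3339 strings such as '2020-01-01T00:00:00Z'; either bound may be omitted.
    """
    clauses = []
    if author:
        clauses.append(f"author:{quote(author)}")
    if date_from or date_to:
        # '[' and ']' mark inclusive bounds; a bare '{' or '}' leaves that side open.
        low = "[" + quote(date_from) if date_from else "{"
        high = quote(date_to) + "]" if date_to else "}"
        clauses.append(f"scan_date:{low},{high}")
    if not clauses:
        return "matchall"  # no criteria: match every document
    if len(clauses) == 1:
        return clauses[0]
    return "(and " + " ".join(clauses) + ")"


if __name__ == "__main__":
    print(metadata_query(author="Jane Doe",
                         date_from="2020-01-01T00:00:00Z",
                         date_to="2020-12-31T23:59:59Z"))
    # (and author:'Jane Doe' scan_date:['2020-01-01T00:00:00Z','2020-12-31T23:59:59Z'])
```

The query string would be passed to the `cloudsearchdomain` client’s `search` method with `queryParser="structured"` and, for example, `returnFields="s3_key"` (assuming that field is return-enabled), so each hit carries the S3 key of the matching file.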
Kenneth Samonte @ Tutorials Dojo