[Image: flow chart of automated data collection with Amazon Textract]

Automated data collection: a machine learning use case for SMBs

Machine learning and AI projects are a reality. I’m not talking about big tech with its long learning curves. I mean real opportunities for small to midsize businesses (SMBs). Intelligent automated data collection rises to the top of the list thanks to cloud-based services like Amazon Textract.

Cloud services and intelligent document processing

Automated data collection uses machines to capture information from document images and convert it into transactional data. The core technology is optical character recognition (OCR), and it’s older than you think. The history is worth digging into if you’re curious. In short, OCR has evolved since the mid-20th century to recognize a variety of fonts, and even handwriting, in order to reduce manual entry. Companies have been using it in Accounts Payable departments for decades.

The catch with OCR alone is a matter of intelligence. While it can “read” images, OCR requires standardized documents and often process templates. Setting those up costs time and money when management wants you focused on process, efficiency, and financial returns. Cloud services like Amazon Textract offer a more intelligent solution. By combining OCR with machine learning (ML), the service can identify data anywhere in a document without customization or manual intervention. The underlying ML model tolerates a wide range of documents, even low-quality, low-contrast scans.

Technical ease and compatibility

There can be a steep learning curve with new tech, but Amazon Textract requires no machine learning expertise. After some initial configuration, you can start loading images. As a full-time JD Edwards (JDE) business strategist and part-time number cruncher, I was able to stand up a proof of concept within weeks.

Invoice processing may be the most popular and tangible application of automated data collection, so I started there. The Amazon Textract APIs made it relatively easy to capture data and interface with JDE EnterpriseOne. The service recognized tables, key-value pairs, and language, and it delivered the data records along with a confidence score for each.
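To make that concrete, here is a minimal Python sketch of reading key-value pairs and confidence scores from a Textract response. It is illustrative only and is not the integration described above. It assumes the boto3 SDK and its `analyze_document` call with the `FORMS` and `TABLES` feature types; the function names, the `needs_review` flag, and the 90 percent review threshold are hypothetical choices made for this example.

```python
from typing import Any


def analyze_invoice(image_bytes: bytes) -> dict[str, Any]:
    """Send one document image to Amazon Textract (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the parsing below works without AWS access

    client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["TABLES", "FORMS"],
    )


def _child_text(block: dict[str, Any], blocks_by_id: dict[str, Any]) -> str:
    """Join the WORD blocks that a key or value block points to."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(
    response: dict[str, Any], min_confidence: float = 90.0
) -> list[dict[str, Any]]:
    """Return key-value pairs, flagging low-confidence ones for manual review.

    min_confidence is a hypothetical threshold chosen for this sketch.
    """
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = []
    for block in response["Blocks"]:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        # A KEY block links to its VALUE block(s) through a VALUE relationship.
        value_blocks = [
            blocks_by_id[value_id]
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for value_id in rel["Ids"]
        ]
        # Use the weakest score among the key and its values.
        confidence = min(
            [block["Confidence"]] + [v["Confidence"] for v in value_blocks]
        )
        fields.append(
            {
                "key": _child_text(block, blocks_by_id),
                "value": " ".join(
                    _child_text(v, blocks_by_id) for v in value_blocks
                ),
                "confidence": confidence,
                "needs_review": confidence < min_confidence,
            }
        )
    return fields
```

In a setup like this, fields that clear the threshold could flow straight to the ERP interface, while flagged ones go to a person for a quick check. Where to set the threshold is a business decision and would need tuning against your own documents.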

Given the flexibility and feasibility, my mind quickly went to Sales Orders, Freight Bills, and Advanced Shipping Notices, as well as text-heavy documents such as Safety Data Sheets (SDS) and Contract Abstracts. Collaborating with ERP Suites developers, we designed a user-friendly portal to make it even easier for customers to take full advantage of the technology.



With cloud services, whether you put through 10 documents or 10,000, you’re only charged for what you use. You can collect documents across the organization and extract data that may be used in your ERP or elsewhere. For that matter, you can begin capturing new data whose value is yet to be understood. Compelling economics. Broad appeal. Ease of use. Automated data collection coupled with machine learning makes business sense. 

Need help getting started? ERP Suites provides comprehensive technology solutions along the digital journey. Learn more at