Training Material
This is a quick introduction and getting-started guide to Nanonets. Approximate reading time: 15-20 minutes.
Nanonets is a document data extraction platform. Customers typically use Nanonets in one of two ways: they either use our pre-trained models or build their own custom OCR model to automate manual data entry. We extract data in a structured key-value pair format that you can consume directly.
Customers use Nanonets to automate manual data entry for any document type.
Input = Upload Image/PDF file via our UI or API
Output = Structured response, consumed via our API or downloaded as .csv/.xlsx through our UI
Nanonets sample API response structure
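As a rough sketch of how the key-value output might be consumed, the snippet below flattens a response into one row per file and serializes it to CSV. The response layout (`result`, `input`, `prediction`, `label`, `ocr_text`, `score`), the sample values and the `min_score` threshold are assumptions made for illustration, not a confirmed schema; check the API documentation for the actual field names.

```python
import csv
import io

# Hypothetical response: the shape and values are assumed for illustration only.
SAMPLE_RESPONSE = {
    "message": "Success",
    "result": [
        {
            "input": "invoice_001.pdf",
            "prediction": [
                {"label": "invoice_number", "ocr_text": "INV-1042", "score": 0.98},
                {"label": "total_amount", "ocr_text": "1,250.00", "score": 0.95},
                {"label": "vendor_name", "ocr_text": "Acme Corp", "score": 0.41},
            ],
        }
    ],
}


def extract_fields(response, min_score=0.5):
    """Return one {label: text} dict per input file, skipping low-confidence fields."""
    rows = []
    for item in response.get("result", []):
        row = {"file": item.get("input", "")}
        for pred in item.get("prediction", []):
            if pred.get("score", 0.0) >= min_score:
                row[pred["label"]] = pred["ocr_text"]
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text, using the union of all keys as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_fields(SAMPLE_RESPONSE)))
```

Filtering on a confidence score is a design choice of this sketch: low-confidence fields are dropped here so they can be routed to manual review instead of being written to the spreadsheet.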
Our team’s roots are in deep learning applied to computer vision. Unlike Abbyy, Kofax and Docparser, we don’t learn document templates; our models learn the document itself. You can get started on Nanonets from day 1 without any template or rules setup. We are also developer-first, with easy-to-integrate REST APIs and accompanying documentation.
We support data extraction from some popular document types out of the box: invoices, receipts, driver’s licenses and passports, to name a few. We extract a predefined set of fields from each of these document types, and you can always add a custom field that you need to extract.
We support all major file and image formats like PDF, JPEG, PNG, TIFF, etc.
Nanonets allows you to train a model to extract specific labels from your document type without writing a single line of code.
Watch this 2-minute video.
We require a minimum of 50 images to train a custom model. We recommend starting with 50 and adding files depending on the accuracy you see. For a complicated document type you might need 1000 or more files.
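Before uploading, it can help to check that a folder of training files meets these guidelines. The following is a minimal sketch, not part of Nanonets itself: the extension list simply mirrors the formats named in this guide, and the helper names (`usable_files`, `training_readiness`) are hypothetical.

```python
from pathlib import Path

# Assumed values: the extensions mirror the formats named in this guide,
# and MIN_TRAINING_FILES mirrors the stated 50-file minimum.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}
MIN_TRAINING_FILES = 50


def usable_files(paths):
    """Keep only paths whose extension (case-insensitive) is in the supported set."""
    return sorted(p for p in map(Path, paths) if p.suffix.lower() in SUPPORTED_EXTENSIONS)


def training_readiness(paths):
    """Report how many usable files there are and how many more are needed."""
    files = usable_files(paths)
    missing = max(0, MIN_TRAINING_FILES - len(files))
    return {"usable": len(files), "missing": missing, "ready": missing == 0}


if __name__ == "__main__":
    # Check every file in the current directory.
    print(training_readiness(Path(".").iterdir()))
```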
Training usually takes between 20 minutes and 2 hours, depending on the number of files and the number of models queued for training. If training takes longer, you can upgrade your model to a paid plan, which moves it to the front of the queue and allocates more compute resources to it.
You can use our “Verification” feature. Watch this 3-minute video to learn more.
Yes, we support on-premises deployments. You can learn more about it here:
https://nanonets.com/help/security/do-you-have-an-on-premises-solution
Yes, absolutely. We support table data extraction using Nanonets. You can even specify the columns and rows of interest while training a model. For a more in-depth explanation, watch this 4-minute video.
You can add your own data to improve the accuracy of a pre-trained model. This process involves uploading data, labelling the documents correctly and training the model.
What are the best practices to train high accuracy custom OCR models?
How do I fine-tune the invoice/receipts model and add my own fields?
How do I run an OCR model on-premises via Docker?