AI Quotation Parser
2024 · software
Built an internal quotation parser using the Gemini API to extract structured data from client quotation documents. Industrial quotations arrive in inconsistent formats: PDFs, scanned documents, spreadsheets with idiosyncratic layouts. Extracting line items, quantities, units, and pricing into a consistent structure was a manual task that took time and introduced transcription errors. The parser automates that extraction.
The implementation uses Gemini's document understanding capabilities to read quotation documents regardless of format and return structured output against a defined schema. The schema covers the fields that matter for downstream processing: vendor, line item description, quantity, unit, unit price, and total. Documents that can't be parsed cleanly are flagged for manual review rather than passed downstream as silently wrong output.
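To make the schema and the review flag concrete, here is a minimal sketch of how the model's output could be checked before it moves downstream. It is not the parser's actual code. The names LineItem, ParseResult, and validate_quotation are hypothetical, the input is assumed to be the already-decoded JSON object, and the arithmetic check (quantity times unit price against total, within a small tolerance) is an assumed example of what "parsed cleanly" might mean.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LineItem:
    # The per-item fields named in the schema description above.
    description: str
    quantity: float
    unit: str
    unit_price: float
    total: float


@dataclass
class ParseResult:
    vendor: str
    items: list[LineItem] = field(default_factory=list)
    review_reasons: list[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        # Any recorded reason routes the document to a person.
        return bool(self.review_reasons)


REQUIRED_ITEM_FIELDS = ("description", "quantity", "unit", "unit_price", "total")


def validate_quotation(raw: dict[str, Any], tolerance: float = 0.01) -> ParseResult:
    """Check decoded model output and record why it needs review, if it does."""
    result = ParseResult(vendor=str(raw.get("vendor") or "").strip())
    if not result.vendor:
        result.review_reasons.append("missing vendor")

    raw_items = raw.get("line_items") or []
    if not raw_items:
        result.review_reasons.append("no line items found")

    for index, item in enumerate(raw_items, start=1):
        missing = [name for name in REQUIRED_ITEM_FIELDS if item.get(name) in (None, "")]
        if missing:
            result.review_reasons.append(f"item {index}: missing {', '.join(missing)}")
            continue
        try:
            parsed = LineItem(
                description=str(item["description"]),
                quantity=float(item["quantity"]),
                unit=str(item["unit"]),
                unit_price=float(item["unit_price"]),
                total=float(item["total"]),
            )
        except (TypeError, ValueError):
            result.review_reasons.append(f"item {index}: non-numeric quantity or price")
            continue
        # Assumed sanity check: the stated total should match the arithmetic.
        if abs(parsed.quantity * parsed.unit_price - parsed.total) > tolerance:
            result.review_reasons.append(
                f"item {index}: total does not match quantity x unit price"
            )
        result.items.append(parsed)
    return result
```

A caller would pass the decoded response to validate_quotation and send the document to a review queue whenever needs_review is true. Recording reasons rather than raising on the first problem means the reviewer sees everything that looked wrong in one pass.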
The business case was straightforward: quotation parsing is high-frequency, low-skill, error-prone work. Automating it with a language model that can handle format variation doesn't require fine-tuning or a complex pipeline. It requires a well-specified prompt and a schema. The parser handles the routine cases; the exception handling ensures that the remaining errors surface where a person can catch them.
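As an illustration of how little machinery "a prompt and a schema" implies, the following sketch pairs a JSON-Schema-style description of the fields with a prompt template. Everything here is assumed for illustration: QUOTATION_SCHEMA, PROMPT_TEMPLATE, build_prompt, and extract are hypothetical names, the prompt wording is not the one used in the project, and call_model stands in for whatever function sends text to the Gemini API and returns the reply as a string.

```python
import json
from typing import Callable

# Hypothetical schema covering vendor plus the per-line-item fields.
QUOTATION_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit": {"type": "string"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"},
                },
                "required": ["description", "quantity", "unit", "unit_price", "total"],
            },
        },
    },
    "required": ["vendor", "line_items"],
}

# Illustrative wording only; the real prompt is not given in the write-up.
PROMPT_TEMPLATE = (
    "Extract the quotation below into JSON matching this schema:\n{schema}\n\n"
    "Rules:\n"
    "- Copy values exactly as written in the document.\n"
    "- Do not invent values that are not in the document.\n"
    "- Return JSON only, with no commentary.\n\n"
    "Quotation:\n{document}"
)


def build_prompt(document_text: str) -> str:
    """Combine the schema and the document text into a single prompt."""
    return PROMPT_TEMPLATE.format(
        schema=json.dumps(QUOTATION_SCHEMA, indent=2),
        document=document_text,
    )


def extract(document_text: str, call_model: Callable[[str], str]) -> dict:
    """Run one extraction; anything unusable is marked for manual review."""
    reply = call_model(build_prompt(document_text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return {"status": "needs_review", "reason": "model reply was not valid JSON"}
    if not isinstance(data, dict):
        return {"status": "needs_review", "reason": "model reply was not a JSON object"}
    missing = [key for key in QUOTATION_SCHEMA["required"] if key not in data]
    if missing:
        return {"status": "needs_review", "reason": f"missing fields: {', '.join(missing)}"}
    return {"status": "ok", "data": data}
```

Passing the model call in as a function keeps the sketch independent of any particular SDK and makes it easy to exercise with a stub, for example extract(text, lambda prompt: '{"vendor": "Acme", "line_items": []}').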
What this proved: The right AI application is one where the alternative is a human doing something tedious and error-prone. Parsing is exactly that kind of problem.