Project overview

Extract structured data from unstructured documents using Answer.AI's Byaldi, OpenAI's gpt-4o, and Langchain's structured output.

Installation

    pyenv virtualenv 3.10.6 docai
    pyenv activate docai
    poetry install

Environment variables

Make sure OPENAI_API_KEY and HF_TOKEN are set in your environment:

    export OPENAI_API_KEY=
    export HF_TOKEN=

Usage example

Build an index from the pdfs/ folder:

    python scripts/build_index.py --folder "pdfs/" --index_name "application"

Sample output

What losses have occurred in the past 5 years?

    LossHistory(
        losses=[
            Loss(loss_date='2/20/21', loss_amount=7003.0, loss_description='Claimant was in his sleeper when his truck got hit by insured driver on the left', date_of_claim='4/19/21'),
            Loss(loss_date='2/4/21', loss_amount=92584.0, loss_description='The IV was attempting to merge on the highway when the IV lost control and struck ', date_of_claim='4/30/21'),
            Loss(loss_date='9/14/21', loss_amount=5583.0, loss
………………………………
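The sample output above implies a schema with a `LossHistory` container holding `Loss` records. A minimal sketch of that shape follows, using only the field names visible in the output. It uses standard-library dataclasses so that it runs without extra dependencies; the project itself presumably defines its schema as models passed to Langchain's structured-output support, but that is an assumption, as are the field types. `total_loss_amount` is a hypothetical helper added for illustration and does not come from the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loss:
    """One loss record; field names are taken from the sample output."""
    loss_date: str          # kept as a string, e.g. '2/20/21' (type is an assumption)
    loss_amount: float
    loss_description: str
    date_of_claim: str


@dataclass
class LossHistory:
    """Container for every loss extracted from a document."""
    losses: List[Loss] = field(default_factory=list)


def total_loss_amount(history: LossHistory) -> float:
    """Hypothetical helper: sum the loss amounts in a history."""
    return sum(loss.loss_amount for loss in history.losses)


# Rebuild the first record of the sample output by hand.
history = LossHistory(
    losses=[
        Loss(
            loss_date="2/20/21",
            loss_amount=7003.0,
            loss_description=(
                "Claimant was in his sleeper when his truck got hit "
                "by insured driver on the left"
            ),
            date_of_claim="4/19/21",
        ),
    ]
)

print(history)
print(total_loss_amount(history))
```

Keeping the dates as plain strings mirrors the sample output, where they appear in the form written in the source document; parsing them into date objects would be a separate step.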