About Tutorial

Tutorial 1

Title: Fine-Tuning Large Language Models for Private Document Retrieval

Tutorial Time: Tuesday 11th June 14:00 (Suryan)
Frank Sommers
  • Docusure, Inc. USA
  • Email: frank.sommers@docusure.ai
Alisa Kongthon
  • King Mongkut’s University of Technology Thonburi, Thailand
  • Email: alisa.kon@kmutt.ac.th
Sarawoot Kongyoung
  • National Electronics and Computer Technology Center, Thailand
Short Description:

Large Language Models (LLMs) are trained on vast amounts of publicly available data. Yet most of the world's data is not public: health-care records, financial documents, court records, and government civil-registry records contain sensitive, personal data that cannot be shared publicly as training input for LLMs. This tutorial reviews the latest research and practical experience in working with sensitive, private data in the context of LLM-assisted multimedia document retrieval. Key topics include fine-tuning LLMs on private data using Differential Privacy, and securely deploying privately fine-tuned LLMs for document classification, named entity recognition, and question answering over multimedia document repositories containing sensitive, private data.
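As a rough illustration of the differentially private fine-tuning the tutorial covers, the core mechanism of DP-SGD is per-example gradient clipping followed by calibrated Gaussian noise. The sketch below is a generic, framework-free illustration of one aggregation step (not the presenters' code; the function name and parameters are illustrative):

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng=random):
    """One DP-SGD aggregation step (illustrative sketch).

    Each example's gradient is clipped to L2 norm `clip_norm`, the
    clipped gradients are summed, Gaussian noise with standard
    deviation `noise_multiplier * clip_norm` is added per coordinate,
    and the result is averaged over the batch.
    """
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down gradients whose norm exceeds clip_norm; never scale up.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])

    batch = len(per_example_grads)
    dim = len(per_example_grads[0])
    noisy = []
    for i in range(dim):
        total = sum(g[i] for g in clipped)
        total += rng.gauss(0.0, noise_multiplier * clip_norm)
        noisy.append(total / batch)
    return noisy
```

In practice, libraries such as Opacus automate this step (and the accompanying privacy accounting) for PyTorch models; the snippet only shows the arithmetic that makes the resulting gradient update differentially private.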

Figure 1: Collaborative labeling of documents containing sensitive, private data, using Docugym. The tool enforces organizational permissions for private data access. The labeled data is stored as private document metadata in the document repository, and is then used for differentially private fine-tuning of LLMs and multi-modal document models.