At K2G, we develop machine learning and AI systems specifically designed for the insurance industry, a sector that operates under some of the strictest privacy, compliance, and data governance standards. Our work focuses on structured datasets like insurance policies and claims. These records frequently contain personally identifiable information, financial data, and business-sensitive patterns, which means uploading them to external services or cloud-hosted APIs is often not an option.
To address this challenge, we’ve embraced a solution that allows us to deliver powerful AI functionality while keeping everything (models, data, and processing) entirely local. That solution is the Lenovo ThinkStation PGX, powered by NVIDIA’s GB10 Grace Blackwell Superchip.
Why We Chose the PGX
The PGX is a compact desktop device, but its architecture is closer to that of a small AI server. It pairs a 20-core Arm-based Grace CPU with a Blackwell GPU, both connected to a shared 128 GB pool of unified memory. This unified architecture is what makes the difference: we can run large-scale LLMs like GPT-OSS 120B entirely on-device, with no separate VRAM pool and no model-sharding workarounds.
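In practice, on-device inference looks like any other LLM call, just pointed at the machine itself. Here is a minimal sketch, assuming the model is served locally through an OpenAI-compatible endpoint (for example, Ollama or llama.cpp running on the PGX); the port, model identifier, and prompt are illustrative placeholders, not our production configuration.

```python
# Minimal sketch, assuming an OpenAI-compatible server (e.g. Ollama or
# llama.cpp) runs on the PGX itself. Port and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint; nothing leaves the machine
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="gpt-oss:120b",  # identifier depends on the serving runtime
    messages=[{"role": "user", "content": "Summarize this claim in two sentences: ..."}],
)
print(response.choices[0].message.content)
```

Because the endpoint lives on localhost, the request never crosses the network boundary.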
With these capabilities, we’ve integrated LLMs into a wide range of insurance-specific workflows:
- Code generation via local autonomous agents
- Normalization of free-text vehicle descriptions into structured data (model, year, engine type, estimated value); see the sketch after this list
- Risk scoring and pricing factor estimation for auto insurance products
- Secure data enrichment without involving external services
And importantly, all of this runs within our infrastructure, with no data ever leaving the machine.
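To illustrate the normalization workflow, here is a hedged sketch of turning a free-text vehicle description into a structured record. The prompt, field names, endpoint, and model id are assumptions for illustration, not our production schema.

```python
# Illustrative sketch of vehicle-description normalization; prompt, fields,
# endpoint, and model id are assumptions, not our production setup.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

PROMPT = (
    "Extract these fields from the vehicle description and reply with JSON "
    "only: model, year, engine_type, estimated_value.\n\nDescription: {text}"
)

def normalize_vehicle(text: str) -> dict:
    """Turn a free-text vehicle description into a structured record."""
    response = client.chat.completions.create(
        model="gpt-oss:120b",  # placeholder model id
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output for structured extraction
    )
    # Production code would validate the schema and retry on malformed JSON.
    return json.loads(response.choices[0].message.content)

print(normalize_vehicle("2019 VW Golf 1.5 TSI petrol, around 15,000 EUR"))
```

In production, the output is validated against a schema before it touches downstream pricing logic; the sketch omits that for brevity.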
Extending the Setup for Larger Models
For even more advanced workloads, we can link two PGX units together via NVIDIA’s high-speed interconnect, effectively doubling memory and compute. This setup allows us to work with models of up to 400 billion parameters, well beyond what’s possible on standard GPU workstations. For us, that means we can fine-tune a specialized insurance model and pair it with a large general-purpose LLM in the same pipeline, all while staying within a secure perimeter.
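A pipeline along those lines might look like the sketch below, with each unit serving one model behind an OpenAI-compatible endpoint. The host names and model ids (k2g-insurance-ft, general-llm-400b) are hypothetical; the point is the hand-off between specialist and generalist inside the same secure network.

```python
# Hypothetical two-model pipeline across two linked units; host names and
# model ids are illustrative, not our actual deployment.
from openai import OpenAI

# One model per unit, each behind an OpenAI-compatible endpoint.
specialist = OpenAI(base_url="http://pgx-1:8000/v1", api_key="not-needed")
generalist = OpenAI(base_url="http://pgx-2:8000/v1", api_key="not-needed")

def assess_claim(claim_text: str) -> str:
    # Step 1: the fine-tuned insurance model extracts the domain facts.
    facts = specialist.chat.completions.create(
        model="k2g-insurance-ft",  # hypothetical fine-tuned model id
        messages=[{
            "role": "user",
            "content": f"List the key underwriting facts in this claim:\n{claim_text}",
        }],
    ).choices[0].message.content

    # Step 2: the general-purpose model reasons over those facts.
    return generalist.chat.completions.create(
        model="general-llm-400b",  # hypothetical general model id
        messages=[{
            "role": "user",
            "content": f"Given these facts, assess the claim and justify:\n{facts}",
        }],
    ).choices[0].message.content
```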
More Than Inference: A Complete Development Platform
While we use the PGX for production-grade inference and agent execution, it also serves as an excellent research and development workstation. Out of the box, it ships with a preconfigured Linux environment (DGX OS), complete with NVIDIA’s CUDA stack and AI tools. We run JupyterLab and Visual Studio Code directly on the device, which allows our team to develop, test, and deploy code in one place.
The PGX also supports NVIDIA’s NIM microservices, which make it easier to prototype LLM-based services and integrate them with other components. Combined with the DGX Dashboard for system monitoring and resource control, it forms a practical, user-friendly foundation for building and maintaining AI services in-house.
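Since NIM containers expose an OpenAI-compatible HTTP API, prototyping against one takes only a few lines. A rough sketch follows; the port, model name, and health route reflect common NIM defaults, but any given deployment may differ.

```python
# Rough sketch of prototyping against a local NIM endpoint; the port, model
# name, and health route are assumed defaults and may differ per deployment.
import requests

BASE = "http://localhost:8000"

# Readiness probe before wiring the service into a larger pipeline.
assert requests.get(f"{BASE}/v1/health/ready", timeout=5).ok

reply = requests.post(
    f"{BASE}/v1/chat/completions",  # OpenAI-compatible route exposed by NIM
    json={
        "model": "meta/llama-3.1-8b-instruct",  # whichever NIM is deployed
        "messages": [{"role": "user", "content": "Classify this claim: auto, home, or other? ..."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(reply.json()["choices"][0]["message"]["content"])
```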
This environment enables us to work efficiently across the full lifecycle: from prompt design and data preprocessing, to fine-tuning, to real-time deployment. Keeping everything local means shorter feedback loops and greater confidence in data handling.