At K2G, we develop machine learning and AI systems specifically designed for the insurance industry, a sector that operates under some of the strictest privacy, compliance, and data governance standards. Our work focuses on structured datasets like insurance policies and claims. These records frequently contain personally identifiable information, financial data, and business-sensitive patterns, which means uploading them to external services or cloud-hosted APIs is often not an option.
To address this challenge, we’ve embraced a solution that allows us to deliver powerful AI functionality while keeping everything (models, data, and processing) entirely local. That solution is the Lenovo ThinkStation PGX, powered by NVIDIA’s GB10 Grace Blackwell Superchip.
Why We Chose the PGX
The PGX is a compact desktop device, but its architecture is closer to that of a small AI server. It pairs a 20-core Arm-based Grace CPU with a Blackwell GPU, both connected to a shared 128 GB pool of unified memory. This unified architecture is what makes the difference: we can run large-scale LLMs like GPT-OSS 120B entirely on-device, with no separate VRAM pool and no model-sharding workarounds.
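In practice, on-device inference looks like any other LLM call, just pointed at the machine itself. Here is a minimal sketch, assuming the model is served locally through an OpenAI-compatible endpoint (for example, Ollama or llama.cpp running on the PGX); the port, model identifier, and prompt are illustrative placeholders, not our production configuration.

```python
# Minimal sketch, assuming an OpenAI-compatible server (e.g. Ollama or
# llama.cpp) runs on the PGX itself. Port and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint; nothing leaves the machine
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="gpt-oss:120b",  # identifier depends on the serving runtime
    messages=[{"role": "user", "content": "Summarize this claim in two sentences: ..."}],
)
print(response.choices[0].message.content)
```

Because the endpoint lives on localhost, the request never crosses the network boundary.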
With these capabilities, we’ve integrated LLMs into a wide range of insurance-specific workflows:
- Code generation via local autonomous agents
- Normalization of free-text vehicle descriptions into structured data (model, year, engine type, estimated value); see the sketch after this list
- Risk scoring and pricing factor estimation for auto insurance products
- Secure data enrichment without involving external services
And importantly, all of this runs within our infrastructure, with no data ever leaving the machine.
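To illustrate the normalization workflow, here is a hedged sketch of turning a free-text vehicle description into a structured record. The prompt, field names, endpoint, and model id are assumptions for illustration, not our production schema.

```python
# Illustrative sketch of vehicle-description normalization; prompt, fields,
# endpoint, and model id are assumptions, not our production setup.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

PROMPT = (
    "Extract these fields from the vehicle description and reply with JSON "
    "only: model, year, engine_type, estimated_value.\n\nDescription: {text}"
)

def normalize_vehicle(text: str) -> dict:
    """Turn a free-text vehicle description into a structured record."""
    response = client.chat.completions.create(
        model="gpt-oss:120b",  # placeholder model id
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # deterministic output for structured extraction
    )
    # Production code would validate the schema and retry on malformed JSON.
    return json.loads(response.choices[0].message.content)

print(normalize_vehicle("2019 VW Golf 1.5 TSI petrol, around 15,000 EUR"))
```

In production, the output is validated against a schema before it touches downstream pricing logic; the sketch omits that for brevity.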
Extending the Setup for Larger Models
For even more advanced workloads, we can link two PGX units together via NVIDIA’s high-speed interconnect, effectively doubling memory and compute. This setup allows us to work with models of up to 400 billion parameters, well beyond what’s possible on standard GPU workstations. For us, that means we can fine-tune a specialized insurance model and pair it with a large general-purpose LLM in the same pipeline, all while staying within a secure perimeter.
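A pipeline along those lines might look like the sketch below, with each unit serving one model behind an OpenAI-compatible endpoint. The host names and model ids (k2g-insurance-ft, general-llm-400b) are hypothetical; the point is the hand-off between specialist and generalist inside the same secure network.

```python
# Hypothetical two-model pipeline across two linked units; host names and
# model ids are illustrative, not our actual deployment.
from openai import OpenAI

# One model per unit, each behind an OpenAI-compatible endpoint.
specialist = OpenAI(base_url="http://pgx-1:8000/v1", api_key="not-needed")
generalist = OpenAI(base_url="http://pgx-2:8000/v1", api_key="not-needed")

def assess_claim(claim_text: str) -> str:
    # Step 1: the fine-tuned insurance model extracts the domain facts.
    facts = specialist.chat.completions.create(
        model="k2g-insurance-ft",  # hypothetical fine-tuned model id
        messages=[{
            "role": "user",
            "content": f"List the key underwriting facts in this claim:\n{claim_text}",
        }],
    ).choices[0].message.content

    # Step 2: the general-purpose model reasons over those facts.
    return generalist.chat.completions.create(
        model="general-llm-400b",  # hypothetical general model id
        messages=[{
            "role": "user",
            "content": f"Given these facts, assess the claim and justify:\n{facts}",
        }],
    ).choices[0].message.content
```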
More Than Inference: A Complete Development Platform
While we use the PGX for production-grade inference and agent execution, it also serves as an excellent research and development workstation. Out of the box, it ships with a preconfigured Linux environment (DGX OS), complete with NVIDIA’s CUDA stack and AI tools. We run JupyterLab and Visual Studio Code directly on the device, which allows our team to develop, test, and deploy code in one place.
The PGX also supports NVIDIA’s NIM microservices, which make it easier to prototype LLM-based services and integrate them with other components. Combined with the DGX Dashboard for system monitoring and resource control, it forms a practical, user-friendly foundation for building and maintaining AI services in-house.
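Since NIM containers expose an OpenAI-compatible HTTP API, prototyping against one takes only a few lines. A rough sketch follows; the port, model name, and health route reflect common NIM defaults, but any given deployment may differ.

```python
# Rough sketch of prototyping against a local NIM endpoint; the port, model
# name, and health route are assumed defaults and may differ per deployment.
import requests

BASE = "http://localhost:8000"

# Readiness probe before wiring the service into a larger pipeline.
assert requests.get(f"{BASE}/v1/health/ready", timeout=5).ok

reply = requests.post(
    f"{BASE}/v1/chat/completions",  # OpenAI-compatible route exposed by NIM
    json={
        "model": "meta/llama-3.1-8b-instruct",  # whichever NIM is deployed
        "messages": [{"role": "user", "content": "Classify this claim: auto, home, or other? ..."}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(reply.json()["choices"][0]["message"]["content"])
```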
This environment enables us to work efficiently across the full lifecycle: from prompt design and data preprocessing, to fine-tuning, to real-time deployment. Keeping everything local means shorter feedback loops and greater confidence in data handling.