Cleanlab Saves 90% on GPU Costs with Inferless Serverless Inference
Cleanlab helps enterprises clean data and labels by automatically detecting issues in an ML dataset. It is trusted by hundreds of top organizations, including Amazon, Google, and Databricks. They recently launched the Trustworthy Language Model (TLM), which adds a trustworthiness score to every LLM response. TLM is designed for high-quality outputs and enhanced reliability, crucial for enterprise applications where unchecked hallucinations are unacceptable.
TLM is a latency-sensitive application with unpredictable, spiky workloads. Cleanlab needed a deployment tool that could keep fixed costs low and scale efficiently during load spikes. That’s when they switched to Inferless from traditional GPU deployment methods.
By switching to Inferless, Cleanlab cut its GPU cloud bill by almost 90% and went live within a day. You can also watch the detailed case study interview here:
Let’s dive in.
The Challenge:
Before Inferless, Cleanlab faced several challenges with traditional cloud GPU providers:
1. High Latency During Peak Loads: Cleanlab experienced long cold starts during peak loads, and requests got stuck in queues, leading to a poor user experience.
2. Inefficient Cost Management: Maintaining a large GPU cluster for peak loads was expensive, especially for a new product.
3. Complex Environment Management: Managing separate production, non-production, and development environments was financially and technically challenging.
The Solution:
Cleanlab adopted Inferless and saw immediate benefits:
1. Dynamic Scaling: Inferless allowed Cleanlab to scale models dynamically, significantly reducing fixed costs. They could scale down to nearly zero when idle, thanks to Inferless’s efficient cold boot times.
2. Reduced Cold Start Times: Quick spin-up of replicas ensured Cleanlab could handle high loads without latency issues.
3. Quick Setup: Cleanlab integrated Inferless into their staging environment in about four hours and moved it to production after a day of testing.
4. Environment Separation: Inferless enabled Cleanlab to maintain separate environments for production, non-production, and development at no additional cost, simplifying operational management.
The Impact:
Switching to Inferless had a profound impact on Cleanlab’s operations:
1. Cost Savings: Cleanlab’s GPU spend decreased by approximately 90%, allowing them to allocate resources more effectively and invest in other development areas.
2. Reduced Engineering Effort: The ease of setup and the ability to manage multiple environments at no additional cost greatly reduced the engineering effort required to run TLM in production.