Serverless GPU Pricing - Pay per second, for exactly what you use

Get started with 10 hours of free credit, no credit card required.

Starter
$0.000555/sec

Designed for small teams and independent developers looking to deploy their models in minutes without worrying about the cost.

Enterprise
Discounted Price

Built for fast-growing startups and larger organizations looking to scale quickly at an affordable cost while hitting their desired latency targets.

Pay for exactly what you use

Kickstart your compute journey with $30 free credit
Shared instances

Nvidia T4: GPU RAM 8GB, 1.5 vCPUs, RAM 10GB ($0.000092/sec, $0.33/hr)
Nvidia A10: GPU RAM 12GB, 3 vCPUs, RAM 15GB ($0.000170/sec, $0.61/hr)
Nvidia A100: GPU RAM 40GB, 10 vCPUs, RAM 100GB ($0.000745/sec, $2.68/hr)

Dedicated instances

Nvidia T4: GPU RAM 16GB, 3 vCPUs, RAM 20GB ($0.000185/sec, $0.66/hr)
Nvidia A10: GPU RAM 24GB, 7 vCPUs, RAM 30GB ($0.000341/sec, $1.22/hr)
Nvidia A100: GPU RAM 80GB, 20 vCPUs, RAM 200GB ($0.001491/sec, $5.36/hr)

Hardware

GPU | Type | Pricing per second | Pricing per hour | GPUs | vCPUs | GPU RAM | RAM
Nvidia A100 | Shared | $0.000745/sec | $2.68/hr | 1 | 10 | 40GB | 100GB
Nvidia A100 | Dedicated | $0.001491/sec | $5.36/hr | 1 | 20 | 80GB | 200GB
Nvidia A10 | Shared | $0.000170/sec | $0.61/hr | 1 | 3 | 12GB | 15GB
Nvidia A10 | Dedicated | $0.000341/sec | $1.22/hr | 1 | 7 | 24GB | 30GB
Nvidia T4 | Shared | $0.000092/sec | $0.33/hr | 1 | 1.5 | 8GB | 10GB
Nvidia T4 | Dedicated | $0.000185/sec | $0.66/hr | 1 | 3 | 16GB | 20GB
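
It looks like each hourly price is simply the per-second rate times 3,600, truncated to the cent; here is a minimal Python sketch that reproduces the table's hourly column from its per-second column:

```python
# The hourly prices in the table appear to be the per-second rates
# times 3600, truncated to the cent (e.g. 0.001491 * 3600 = 5.3676 -> $5.36).
import math

RATES_PER_SEC = {
    "A100 Shared":    0.000745,
    "A100 Dedicated": 0.001491,
    "A10 Shared":     0.000170,
    "A10 Dedicated":  0.000341,
    "T4 Shared":      0.000092,
    "T4 Dedicated":   0.000185,
}

for name, rate in RATES_PER_SEC.items():
    hourly = math.floor(rate * 3600 * 100) / 100  # truncate to the cent
    print(f"{name}: ${rate}/sec -> ${hourly:.2f}/hr")
```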

Built for fast-growing startups

Min 10,000 Inference Requests per month
Unlimited deployed webhook endpoints
GPU concurrency of 5
15 days of log retention
Support via private Slack Connect, with responses within 48 working hours
Included credits: $30

Built for Enterprises

Min 100,000 Inference Requests per month
Unlimited deployed webhook endpoints
GPU concurrency of 50
365 days of log retention
Support via private Slack Connect and a support engineer
Included credits: Custom

FAQs

How does your billing work?

With Inferless, you only pay for the compute resources used to run your models. Our pricing is based on two factors:

Duration - You are charged for the total number of seconds your models are running in a healthy state, rounded up to the nearest second. The duration is calculated from when your model starts loading until it finishes processing a request.
Machine type - We offer different machine types like A100, A10 and T4 GPUs. The price per second varies based on the machine type you choose for your model. More powerful machines cost more per second.

For example, suppose you enable autoscaling with a maximum concurrency of 2 machines and choose a dedicated A100 80GB as the machine type.

The first machine runs for 14,400 sec (4 hrs), and a second machine runs for 10,800 sec (3 hrs) during peak traffic.
At the dedicated A100 rate of $0.001491/sec, total usage comes to 25,200 machine-seconds.
Your monthly bill is 25,200 × $0.001491 ≈ $37.57.
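
As a quick check, here is a minimal sketch of that calculation in Python; the rate is the dedicated A100 80GB price from the hardware table above:

```python
# A minimal sketch of the per-second billing math from the example above.
RATE_A100_DEDICATED = 0.001491  # $/sec, dedicated A100 80GB

def monthly_bill(machine_seconds: list[int], rate: float) -> float:
    """Sum the healthy-state seconds across machines, times the rate."""
    return sum(machine_seconds) * rate

# One machine for 4 hrs (14,400 s), a second for 3 hrs (10,800 s):
print(f"${monthly_bill([14_400, 10_800], RATE_A100_DEDICATED):.2f}")  # $37.57
```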

What kind of applications can I deploy using Inferless?

You can deploy any machine learning model that runs on GPU workloads: compute-intensive deep learning across applications such as computer vision, NLP, recommendation systems, and scientific computing. Some of the most popular models our users have deployed with us are Llama 2 13B, Stable Diffusion with ControlNet, and Vicuna 7B.

What is the difference between Shared and Dedicated Instance?

The distinction between shared and dedicated instances revolves around resource allocation and performance. Shared instances allocate GPU resources among several users, offering variable performance at a cost-effective rate, making them suitable for smaller or infrequent tasks. In contrast, dedicated instances grant users exclusive access to an entire GPU, delivering consistent high performance but at a higher cost. This setup is optimal for large-scale tasks or when data isolation is a priority. Your choice should hinge on workload demands, desired performance, and budget.
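
To make the cost side of that trade-off concrete, here is a quick comparison using the A10 rates from the hardware table above; the 2-second inference time is a made-up assumption:

```python
# Cost of 1,000 requests at ~2 s of compute each, at the A10 rates
# from the hardware table. The inference time is an assumption.
SHARED_A10 = 0.000170     # $/sec
DEDICATED_A10 = 0.000341  # $/sec

seconds = 1_000 * 2  # 1,000 requests x 2 s each
print(f"Shared A10:    ${seconds * SHARED_A10:.2f}")     # $0.34
print(f"Dedicated A10: ${seconds * DEDICATED_A10:.2f}")  # $0.68
```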

Do you offer discounts for startups?

We provide a $30 free credit to help you kickstart. Since we are currently in private beta, your use case and stage need to match our criteria. You can read more about it here.

How secure is Inferless?

Customer data and privacy are our top priority. Inferless execution environments are completely isolated from each other using Docker containerization. This prevents any interaction between individual customer environments.
All log streams are separated securely using AWS CloudWatch Logs access controls. Logs are retained only for 30 days and then deleted as per our strict data retention policies.
The storage used for model hosting is encrypted using AES-256 encryption. Models and data are not shared across customers.

If I don't run any inference, will I still be charged?

With Inferless, you only pay for what you use. When the minimum replica count is set to zero, no machines are spun up, so you are not charged when there are no inference requests.
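
As an illustration, here is a minimal sketch of how scale-to-zero affects a day's bill, using the shared T4 rate from the table above and a made-up traffic pattern:

```python
# Illustrative scale-to-zero billing: periods with zero replicas
# contribute nothing to the bill. The traffic pattern is made up.
RATE_T4_SHARED = 0.000092  # $/sec, from the hardware table

timeline = [      # (replicas, seconds)
    (0, 28_800),  # idle overnight: 8 hrs, no machines, no charge
    (1, 7_200),   # morning traffic: 2 hrs on one machine
    (0, 14_400),  # idle afternoon: 4 hrs, no charge
    (2, 3_600),   # evening peak: 1 hr on two machines
    (0, 32_400),  # idle night: 9 hrs, no charge
]

cost = sum(replicas * secs * RATE_T4_SHARED for replicas, secs in timeline)
print(f"${cost:.2f}")  # $1.32 for the whole day
```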

What GPUs are available?

We run workloads on Nvidia A100, A10, and T4 GPUs, so you get blazing-fast inference.

How does your pricing work?

Pay for what you use: per-second billing with no upfront costs. We typically estimate up to 80% cost savings.

What is the latency?

For first-time calls, you may see a cold start of 10-20 seconds; successive calls depend only on inference time.
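
If you want to measure this yourself, a rough sketch is to time a cold call against a warm one. The URL below is a placeholder, not Inferless's actual endpoint format:

```python
# Rough timing of a cold call vs. a warm call. The endpoint is a
# placeholder; substitute your own deployed model's URL and payload.
import time
import requests

URL = "https://example.com/your-model/infer"  # placeholder
payload = {"inputs": "hello"}

for label in ("cold (first call)", "warm (second call)"):
    start = time.perf_counter()
    requests.post(URL, json=payload, timeout=60)
    print(f"{label}: {time.perf_counter() - start:.1f}s")
```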

Does it support large custom models?

Yes, we support models up to 16GB in size. For larger models, feel free to speak with us and we will help you out.

Is my data secure?

Yes. Your models are deployed in completely isolated environments, and stored models are encrypted at rest.

Can I change/cancel my plan anytime?

Yes. You can set an upper limit for each project/workspace, and you can disable a model at any time to stop billing.