TorchServe is a flexible and easy-to-use tool for serving PyTorch models.
Serving Quick Start – Basic server usage tutorial
Model Archive Quick Start – Tutorial that shows you how to package a model archive file.
Installation – Installation procedures
Serving Models – Explains how to use TorchServe
REST API – Specification of the REST API endpoints for TorchServe
gRPC API – TorchServe supports gRPC APIs for both inference and management calls
Packaging Model Archive – Explains how to package a model archive file using torch-model-archiver
Inference API – How to check for the health of a deployed model and get inferences
Management API – How to manage and scale models
Logging – How to configure logging
Metrics – How to configure metrics
Prometheus and Grafana metrics – How to configure the Metrics API to emit Prometheus-formatted metrics and visualize them in a Grafana dashboard
Captum Explanations – Built-in support for Captum explanations for both text and images
Batch inference with TorchServe – How to create and serve a model with batch inference in TorchServe
Workflows – How to create workflows to compose PyTorch models and Python functions in sequential and parallel pipelines
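As a taste of the quick-start guides above, the typical workflow is: package a trained model into a `.mar` archive, start the server, and query the Inference API. A minimal sketch follows; the model name `densenet161` and the file names are illustrative, and the commands assume TorchServe and torch-model-archiver are installed and a local model store directory exists.

```shell
# Package a trained model into a model archive (.mar) file.
# model.py defines the model class; densenet161.pth holds the weights.
torch-model-archiver --model-name densenet161 \
    --version 1.0 \
    --model-file model.py \
    --serialized-file densenet161.pth \
    --handler image_classifier

mkdir -p model_store
mv densenet161.mar model_store/

# Start TorchServe and load the archive from the model store.
torchserve --start --model-store model_store --models densenet161=densenet161.mar

# Query the Inference API (default port 8080) with a sample image.
curl http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg

# Stop the server when done.
torchserve --stop
```

The Management API (default port 8081) can then be used to register, scale, or unregister models on the running server without restarting it.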