Here at Quisitive, we are getting increasingly excited about the additional features coming to v2 of Azure Machine Learning (currently in public preview). One of the most game-changing features lets us deploy ML models for real-time inference onto infrastructure that Azure ML manages itself, without having to maintain and manage an Azure Kubernetes Service (AKS) cluster. Any workspace contributor can build these endpoints using v2 of the Azure ML Command Line Interface (CLI).
Architecture of an Endpoint
There are two key entities that users should be aware of within a real-time endpoint:
- The Endpoint – there is precisely one endpoint, which is the first entity to define. It consists of the endpoint URI and the expected swagger schema.
- The Deployments – there can be many deployments under a single endpoint, each corresponding to a different version of the model. Traffic can be varied between the deployments over time to accomplish blue-green or canary deployment strategies.
Important note: the virtual machine sizes are defined at the deployment level. This means that you need at least one virtual compute node per deployment within the endpoint.
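To make the endpoint/deployment relationship concrete, here is a small, purely illustrative Python model of it. The classes are hypothetical and not part of any Azure SDK, and the second deployment, catgreen, is invented for the example. Each request is routed to a deployment in proportion to its traffic percentage, and the node count is summed per deployment because VM sizes and counts live at the deployment level. The rule that a split must total 100% is this sketch's own simplification.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Deployment:
    name: str
    instance_type: str
    instance_count: int  # every deployment runs on at least one node of its own


@dataclass
class Endpoint:
    name: str
    deployments: list = field(default_factory=list)
    traffic: dict = field(default_factory=dict)  # deployment name -> percent

    def set_traffic(self, split):
        """Replace the traffic split; this sketch requires it to total 100%."""
        unknown = set(split) - {d.name for d in self.deployments}
        if unknown:
            raise ValueError(f"unknown deployments: {sorted(unknown)}")
        if sum(split.values()) != 100:
            raise ValueError("traffic percentages must sum to 100")
        self.traffic = dict(split)

    def route(self, rng):
        """Pick the deployment that serves one request, weighted by traffic.

        Returns None while no traffic is assigned, as right after the
        first deployment is created."""
        names = [n for n, pct in self.traffic.items() if pct > 0]
        if not names:
            return None
        return rng.choices(names, weights=[self.traffic[n] for n in names])[0]

    def total_nodes(self):
        """Nodes in use: VM sizes and counts are set per deployment."""
        return sum(d.instance_count for d in self.deployments)


endpoint = Endpoint(
    "cattestendpoint",
    [Deployment("catblue", "Standard_F2s_v2", 2),
     Deployment("catgreen", "Standard_F2s_v2", 1)],  # hypothetical canary
)
endpoint.set_traffic({"catblue": 90, "catgreen": 10})
print(endpoint.route(random.Random(42)))
print(endpoint.total_nodes())  # 3
```

In this model a blue-green switch is a single set_traffic({"catblue": 0, "catgreen": 100}) call, while a canary release moves the percentages in small steps.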
Azure ML v2 is currently available only through the Command Line Interface (CLI), so the following steps consist of bash commands to be executed in a Linux shell. We recommend configuring a new virtual environment beforehand and installing the latest version of the Azure ML SDK before setting up your first endpoints; you should have at least v1.37.0 installed:
pip install --upgrade azureml-sdk
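If you want to confirm the installed version from Python, a small check along these lines should work. It reads the azureml-sdk package metadata through the standard library's importlib.metadata; the helper names are illustrative only, and the version parsing simply ignores non-numeric suffixes such as "post1".

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.37.0' (or '1.37.0.post1') into a tuple of ints, skipping non-numeric parts."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(installed, minimum="1.37.0"):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="azureml-sdk"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("azureml-sdk is not installed")
    else:
        print(current, "OK" if meets_minimum(current) else "is too old, please upgrade")
```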
Next, you need to install the v2 Azure ML CLI extension for the Azure CLI. This involves installing (or updating) the Azure CLI itself and then adding the new ml extension:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az extension add -n ml -y
Finally, log in to the CLI:
az login
Creating the Endpoint
The endpoint creation process is configuration-driven. This means the first step is to create an endpoint.yaml configuration file to describe the endpoint. An example configuration file is shown below:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: cattestendpoint
auth_mode: key
description: 'Test endpoint for managed endpoints public preview'
tags:
  purpose: publicpreviewtest
  creator: quisitive
Once you have created this file, you can deploy an endpoint into your workspace through the command line as shown below:
az ml online-endpoint create -f endpoint.yaml -g resource_group_name -w workspace_name
This creates the endpoint URI, which will be visible in the “Endpoints” tab of your Azure ML workspace. Because this example uses key-based authentication (auth_mode: key), a non-expiring API key is also created, and it must be passed in the Authorization header of every request.
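For clients calling the endpoint from code, a minimal sketch of a key-authenticated request using only the Python standard library is shown below. The scoring URI and key are placeholders (the real values can be looked up with az ml online-endpoint get-credentials and the endpoint details in the workspace), the {"data": ...} payload shape is an assumption that must match your score.py, and the optional azureml-model-deployment header, which pins a request to one deployment, should be checked against the current documentation.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute your endpoint's scoring URI and key.
SCORING_URI = "https://cattestendpoint.<region>.inference.ml.azure.com/score"
API_KEY = "<your-endpoint-key>"


def build_scoring_request(uri, key, payload, deployment=None):
    """Build (but do not send) a POST request for a key-authenticated endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Key-based auth: the endpoint key is sent as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    if deployment is not None:
        # Optional: target one deployment directly, bypassing the traffic split.
        headers["azureml-model-deployment"] = deployment
    return urllib.request.Request(uri, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_scoring_request(SCORING_URI, API_KEY, {"data": [[1.0, 2.0]]})
    # Sending it requires a real URI and key:
    # with urllib.request.urlopen(req, timeout=10) as resp:
    #     print(json.loads(resp.read()))
    print(req.get_method(), req.full_url)
```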
Creating a Deployment
Now that an endpoint has been created, you can deploy a model within it. You can deploy many versions of the same model behind an endpoint, each of which will be a separate deployment. You can then vary the traffic that passes to each deployment over time.
To create a deployment, you need to predefine the following:
- A registered model in Azure ML
- A registered environment in Azure ML, containing the required Python packages
- A score.py file in the usual Azure ML format for real-time endpoints, with an init() function that loads the model and a run() function that handles each request
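A minimal sketch of such a score.py follows. The init()/run() structure and the AZUREML_MODEL_DIR environment variable are the standard scoring-script conventions, but the model here is a hypothetical stand-in that just sums each input row, and the {"data": [[...]]} request shape is our assumption. It also reads test_variable, which the example deployment.yaml sets under environment_variables.

```python
import json
import os

model = None


def _placeholder_model(rows):
    """Stand-in for a real model: returns the sum of each input row."""
    return [sum(row) for row in rows]


def init():
    """Runs once when the deployment's container starts."""
    global model
    # A real script would load the registered model from the folder Azure ML
    # exposes through AZUREML_MODEL_DIR (e.g. with joblib or pickle).
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    print(f"init: model directory is {model_dir}")
    model = _placeholder_model
    # Entries under environment_variables in deployment.yaml appear here.
    print(f"init: test_variable={os.getenv('test_variable')}")


def run(raw_data):
    """Runs for every request; raw_data is the JSON request body as a string."""
    rows = json.loads(raw_data)["data"]  # assumed {"data": [[...], ...]} payload
    return model(rows)
```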
As with endpoints, deployments are configuration-driven, so you will need a deployment.yaml file for each deployment. For example:
name: catblue
endpoint_name: cattestendpoint
description: 'Blue deployment for online endpoint example'
app_insights_enabled: true
model: azureml:test_model:1
code_configuration:
  code:
    local_path: .
  scoring_script: score.py
environment: azureml:test_environment:1
instance_type: Standard_F2s_v2
instance_count: 2
request_settings:
  request_timeout_ms: 3000
  max_concurrent_requests_per_instance: 1
liveness_probe:
  period: 10
  initial_delay: 10
  timeout: 2
  success_threshold: 1
  failure_threshold: 30
readiness_probe:
  period: 10
  initial_delay: 10
  timeout: 2
  success_threshold: 1
  failure_threshold: 30
environment_variables:
  test_variable: 'test'
tags:
  purpose: publicpreviewblue
  creator: quisitive
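The request and probe settings interact with the instance count in ways that are easy to work out by hand, and the sketch below does that arithmetic for the example values. The probe calculation assumes Kubernetes-style probe behaviour (our assumption, since the managed service runs the probes for you): with these settings a container that never answers its liveness probe gets roughly five minutes before it is treated as failed, which matters if your model is slow to load in init().

```python
# Settings copied from the example deployment.yaml.
deployment = {
    "instance_count": 2,
    "request_settings": {
        "request_timeout_ms": 3000,
        "max_concurrent_requests_per_instance": 1,
    },
    "liveness_probe": {
        "period": 10,
        "initial_delay": 10,
        "timeout": 2,
        "success_threshold": 1,
        "failure_threshold": 30,
    },
}


def max_in_flight_requests(d):
    """Upper bound on requests being processed at once across all instances."""
    return d["instance_count"] * d["request_settings"]["max_concurrent_requests_per_instance"]


def seconds_until_probe_gives_up(probe):
    """Approximate worst case, assuming Kubernetes-style probe behaviour:
    the first check runs after initial_delay, later checks every period
    seconds, each may take up to timeout seconds, and the container is
    treated as failed after failure_threshold consecutive failures."""
    last_check_starts = probe["initial_delay"] + (probe["failure_threshold"] - 1) * probe["period"]
    return last_check_starts + probe["timeout"]


print(max_in_flight_requests(deployment))  # 2
print(seconds_until_probe_gives_up(deployment["liveness_probe"]))  # 302
```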
Note: when the deployment is first created, it runs on a fixed number of nodes (instance_count) of the VM size given in instance_type. We will cover how to enable autoscaling in a future post.
You can then create the deployment through the CLI:
az ml online-deployment create -f deployment.yaml -g resource_group_name -w workspace_name
When this initial deployment completes, no traffic is routed to it by default. At this point, you may want to test the deployment directly by sending it an example request stored in a .json file:
az ml online-endpoint invoke --name cattestendpoint --deployment-name catblue --request-file test.json -g resource_group_name -w workspace_name
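If you don't already have a request file, test.json can be produced with a few lines of Python. The payload below is hypothetical: its shape has to match whatever your score.py's run() function expects, and {"data": [[...]]} is only an assumption.

```python
import json

# Hypothetical sample rows; replace them with realistic input for your model.
sample = {"data": [[0.5, 1.2, 3.4, 0.0], [1.1, 0.3, 2.2, 4.5]]}

with open("test.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2)
```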
Because the deployment is not yet receiving any traffic, the final step is to make it live by routing 100% of the endpoint's traffic to it:
az ml online-endpoint update --name cattestendpoint --traffic "catblue=100" -g resource_group_name -w workspace_name
The same update command can be used to change the traffic split when an endpoint has multiple deployments.
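For a canary release with a second, hypothetical deployment called catgreen, the --traffic value would be stepped up over time. The illustrative helper below generates those values in the space-separated name=percent format; the check that each split adds up to 100 is this sketch's own safeguard, and the stages (10%, 50%, 100%) are arbitrary examples.

```python
def traffic_argument(split):
    """Format a {deployment: percent} mapping as the value passed to --traffic."""
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic must add up to 100, got {total}")
    return " ".join(f"{name}={pct}" for name, pct in split.items())


def canary_steps(current, candidate, percentages=(10, 50, 100)):
    """Yield traffic splits that move load from current to candidate in stages."""
    for pct in percentages:
        yield {current: 100 - pct, candidate: pct}


for split in canary_steps("catblue", "catgreen"):
    print(f'--traffic "{traffic_argument(split)}"')
```

In practice you would pause between stages and check the new deployment's metrics before moving on.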