Environment Management

Introduction

The purpose of the architecture is to achieve unlimited scalability for simulations.

The meta-model describes and governs the architecture through a backend application.

We can distinguish two types of components in the architecture:

Passive Components
Active Components

Passive components encapsulate data or are virtual entities that support the active components. Two examples are topic and modeled_building.

Active components are the ones that have behavior, so they need somewhere to run.

We call any technological artifact that can run such components a computational asset. Examples of computational assets include:

Virtual machines
Pods within a Kubernetes cluster (a Pod is the smallest execution unit in Kubernetes)
Serverless functions, such as an Azure Function

The concept of a computational asset is generic, following the design principle of staying loosely coupled from any specific cloud vendor or infrastructure. The meta-model can work with any type of computational asset: the dependencies on a specific technology are encapsulated in a plugin that can be added through configuration.

Following that design principle, we minimize the attributes of the Computational Asset entity in the meta-model using a properties bag pattern. The properties bag pattern allows you to deal with properties not known at design time, storing the properties in a (<key>,<value>) dictionary.
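As an illustration of the properties bag pattern, here is a minimal Python sketch. The class name, field names, and property keys are assumptions made for the example; they are not taken from the actual meta-model schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComputationalAsset:
    """Hypothetical entity: a few fixed attributes plus an open-ended properties bag."""
    asset_id: int
    name: str
    asset_type: str  # e.g. "AzureVirtualMachine" or "KubernetesPod"
    # Properties not known at design time live here as (key, value) pairs
    properties: dict = field(default_factory=dict)

    def set_property(self, key: str, value: str) -> None:
        self.properties[key] = value

    def get_property(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self.properties.get(key, default)


# Two asset types share one entity; only the bag contents differ
vm = ComputationalAsset(1, "sumo-exp-01", "AzureVirtualMachine")
vm.set_property("vmSize", "Standard_D4s_v3")

pod = ComputationalAsset(2, "proxy-01", "KubernetesPod")
pod.set_property("namespace", "simulation")
```

The trade-off is that the bag is not type-checked, so any validation of keys and values has to happen in the plugin that understands the specific asset type.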

Currently, we are working on Azure, and we are using mainly two different types of computational assets:

Azure Virtual Machines
Docker Containers deployed in Kubernetes (using the Azure Kubernetes Service). A container deployed in Kubernetes runs in a Pod.

During the development phase, using virtual machines gives more flexibility, since development tools and other instrumentation can be installed on the machine.

During the production phase, deploying the components in containers decreases the administrative burden and the hardware footprint.

The active components needed to simulate a digital city are:

IPC Server

The component that monitors all the other components and implements a command interface for them.

Brokers (Produced Broker and Published Broker)

The brokers implement the communications between components. The communications are organized through topics (which are themselves passive components).

Proxy

The component that mediates between experiments, and between experiments and clients (e.g., the Unity Application). It is the key to achieving horizontal scalability.

Experiment

The SUMO experiment that runs the simulation.

GAME or Unity Application

The simulation's primary client. There can be multiple concurrent instances of the GAME using data from multiple experiments.

Managing computational assets involves two main areas:

Provisioning
Starting/allocating & Stopping/deallocating

Provisioning

Provisioning creates the computational asset, such as an Azure virtual machine or a Kubernetes Pod.

In both cases, the starting point is an image that contains the binaries to execute.

In both cases, the image is only one part of the recipe. The other parts are the configuration items needed to create a specific instance of that image. Configuration items are elements that determine the behavior of generic code, such as environment variables and firewall rules.

The base images are stored in specialized stores; container images are stored in a container registry, while virtual machine images are stored in compute galleries, but they are the same in essence.

In the meta-model, the information needed to create an asset from an image is stored in two entities:

Template
TemplateProperties

The TemplateProperties entity stores the information needed to create all the configuration items during the provisioning process.

Another entity related to the provisioning process is the landing zone.

The landing zone is the minimum environment to create and run the computational asset.

This model is deliberately simple. It does not try to define a whole language for describing arbitrary infrastructure in an infrastructure-as-code manner; doing so would introduce unnecessary complexity.

A landing zone for a virtual machine is composed of:

ResourceGroupName
DeploymentRegion
VirtualNetwork / SubNetwork

A landing zone for a container is composed of:

ResourceGroupName
DeploymentRegion
VirtualNetwork / SubNetwork
AKS Cluster (Azure Kubernetes Service)
Node Pool
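The two landing zone shapes can be sketched as a single record in which the container-specific fields are optional. This is only an illustration: the Python class, its field names, and the sample values are assumptions, not the meta-model's actual entity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LandingZone:
    """Hypothetical record mirroring the landing zone elements listed above."""
    resource_group_name: str
    deployment_region: str
    virtual_network: str
    sub_network: str
    # Only needed when the landing zone hosts containers
    aks_cluster: Optional[str] = None
    node_pool: Optional[str] = None

    def supports_containers(self) -> bool:
        # A container landing zone needs both the AKS cluster and a node pool
        return self.aks_cluster is not None and self.node_pool is not None


vm_zone = LandingZone("rg-sim", "westeurope", "vnet-sim", "subnet-vm")
aks_zone = LandingZone(
    "rg-sim", "westeurope", "vnet-sim", "subnet-aks",
    aks_cluster="aks-sim", node_pool="simpool",
)
```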

Let's drill down into the provisioning process, using as an example the creation of a virtual machine that runs a SUMO experiment.

Provisioning Process

Create a Virtual Machine Image
- Create a virtual machine with your standard procedure, for example through the portal, and install all the required programs and tools.
- Create the internal firewall rules.
- Turn on AutoLogon in Windows (see "Configure Windows to automate logon", Windows Server, Microsoft Learn).
- Set the startup script to run automatically (see "Run and RunOnce Registry Keys", Win32 apps, Microsoft Learn).
Remove machine-specific information

Remove machine-specific information by deprovisioning or generalizing the VM before creating an image (see "Deprovision or generalize a VM before creating an image", Azure Virtual Machines, Microsoft Learn).

To do this on Windows, run the following command (the Linux equivalent is still to be defined):

%WINDIR%\system32\sysprep\sysprep.exe /generalize /shutdown /oobe /mode:vm

It is important to use the /mode:vm option, which accelerates the first boot by skipping the search for drivers.

Generalizing the image avoids conflicts during provisioning.
Create an image

Create an image of your VM in the portal (https://learn.microsoft.com/en-us/azure/virtual-machines/capture-image-portal).

In the replication section of this procedure, it is essential to select the locations that should hold replicas of the image. The image must be replicated to every location where you will provision a new virtual machine.
Register the VM Image

Register the VM Image in the model as a template.

By using the web interface, the user can create a template entry in the meta-model specifying:
- Name
- ImageId: obtained in the portal, in the following format:

  /subscriptions/{subscriptionId}/resourceGroups/{rgName}/providers/Microsoft.Compute/galleries/{galleryName}/images/{ImageName}
- Description
- Version
- ComputationalAssetType
- ImageName
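Because the ImageId is copied by hand from the portal, it can be useful to check its shape before registering the template. The sketch below is one possible check, not part of the described system: the function name and the sample identifier are hypothetical, and the pattern is intentionally strict about casing.

```python
import re

# Matches the gallery image id format shown above
IMAGE_ID_PATTERN = re.compile(
    r"^/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Compute"
    r"/galleries/(?P<gallery>[^/]+)"
    r"/images/(?P<image>[^/]+)$"
)


def parse_image_id(image_id: str) -> dict:
    """Split an image id into its named parts, or raise ValueError."""
    match = IMAGE_ID_PATTERN.match(image_id)
    if match is None:
        raise ValueError(f"Not a compute gallery image id: {image_id!r}")
    return match.groupdict()


# Made-up identifier, used only to exercise the parser
example = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-images/providers/Microsoft.Compute"
    "/galleries/simGallery/images/sumo-experiment"
)
parts = parse_image_id(example)
```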
Configuration items

The user can use the web interface to add the configuration items as template properties.

For example:
- PropertyName: "OpenPort", PropertyValue: "443"
- PropertyName: "OpenPort", PropertyValue: "1883"
- PropertyName: "EnvironmentVariable", PropertyValue: {"variableName": "Experiment_id", "variableValue": "getValue(Experiment,Id)"}

In this example, the value assigned to the environment variable is not a literal: it is calculated during provisioning from other parameters of the provisioning operation. This is explained further in the provisioning operation step.
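The following Python sketch shows one way such template properties could be turned into a port list and a set of environment variables. It assumes that the property value of an EnvironmentVariable is stored as a JSON string and that getValue(<ComponentType>,<Attribute>) is the only expression form; both are assumptions for illustration, as are the function names and the dictionary standing in for the selected component.

```python
import json
import re

# Hypothetical expression syntax: getValue(<ComponentType>,<Attribute>)
GET_VALUE = re.compile(r"^getValue\((\w+)\s*,\s*(\w+)\)$")


def evaluate_parameter_value(expression, component_type, component):
    """Resolve a getValue(...) expression; return plain literals unchanged."""
    match = GET_VALUE.match(expression.strip())
    if match is None:
        return expression
    expected_type, attribute = match.groups()
    if expected_type != component_type:
        raise ValueError(f"Expression targets {expected_type}, got {component_type}")
    return str(component[attribute])


def collect_configuration(properties, component_type, component):
    """Split (name, value) template properties into open ports and env vars."""
    open_ports, env_vars = [], {}
    for name, value in properties:
        if name == "OpenPort":
            open_ports.append(int(value))
        elif name == "EnvironmentVariable":
            item = json.loads(value)
            env_vars[item["variableName"]] = evaluate_parameter_value(
                item["variableValue"], component_type, component
            )
    return open_ports, env_vars


template_properties = [
    ("OpenPort", "443"),
    ("OpenPort", "1883"),
    ("EnvironmentVariable",
     '{"variableName": "Experiment_id", "variableValue": "getValue(Experiment,Id)"}'),
]
# The experiment with Id 42 is a made-up component
ports, env = collect_configuration(template_properties, "Experiment", {"Id": 42})
```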
Create a LandingZone

Using the web interface, the user creates a LandingZone specifying:
- Name
- SubscriptionId
- ResourceGroupName
- VirtualNetworkName
- SubNetName
Invoke operation DeployComputationalAssetLandingZone

The user invokes the operation DeployComputationalAssetLandingZone using the web interface.

The argument is the LandingZone previously created in the meta-model.

The operation is implemented in a microservice decoupled from the API to avoid time-outs in the web interface.

The operation calls the Azure Management REST API to create the Resource Group, the Virtual Network, and the SubNetwork.

The Azure Management REST API was selected because it is mature and stable.

Several programming frameworks are built on top of this API, but some of them have already been deprecated, while the management REST API itself continues to evolve.

Each API operation specifies the API version it targets, which keeps existing calls compatible as the API evolves.

The operations used are idempotent; in other words, invoking a method twice causes no error, and the result is the same as invoking it once.

The DeployComputationalAssetLandingZone operation is also idempotent.
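To make the versioning and idempotency points concrete, the sketch below builds (but does not send) the create-or-update request for a resource group. The api-version value and the sample names are assumptions to be checked against the current Azure documentation, and authentication is left out entirely.

```python
import json

MANAGEMENT_ENDPOINT = "https://management.azure.com"
API_VERSION = "2021-04-01"  # assumed version of the resource groups API


def build_resource_group_request(subscription_id, resource_group, location):
    """Return (method, url, body) for a create-or-update call.

    PUT is used because it means "create or update": sending the same
    request twice leaves the resource group in the same state.
    """
    url = (
        f"{MANAGEMENT_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}?api-version={API_VERSION}"
    )
    body = json.dumps({"location": location})
    return "PUT", url, body


first = build_resource_group_request("0000-sub", "rg-sim", "westeurope")
second = build_resource_group_request("0000-sub", "rg-sim", "westeurope")
```

Pinning the version in the query string is what lets an existing call keep its behavior when newer API versions are published.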

Invoke operation DeployVMFromTemplate

Through the web interface, the user invokes the operation DeployVMFromTemplate.

The operation deploys a component that already exists in the meta-model onto the landing zone, using the previously created image.

The parameters are:

LandingZone
AssetName: the name of the virtual machine.
ComponentType: one of IPCServer, ProducedBroker, PublishedBroker, Experiment.
ComponentId: the component is selected through the web interface, but the operation receives the component type and the Id.
Template: the template previously created and registered in the meta-model.

The following pseudocode describes the operation logic.

Pseudo code

using System;
using System.Collections.Generic;

class EnvVar
{
    public string Name { get; set; }
    public string Value { get; set; }

    public EnvVar(string name, string value)
    {
        Name = name;
        Value = value;
    }
}

DeployVMFromTemplate(LandingZone, AssetName, ComponentType, ComponentId, Template)
{
    // Collect the configuration items stored as template properties
    var openPorts = new List<string>();
    var environmentVariables = new List<EnvVar>();
    foreach (var p in Template.Properties)
    {
        if (p.Name == "OpenPort")
        {
            openPorts.Add(p.Value);
        }
        else if (p.Name == "EnvironmentVariable")
        {
            string name = p.Value.variableName;
            string expression = p.Value.variableValue;
            // Resolve expressions such as getValue(Experiment,Id) for the selected component
            string value = EvaluateParameterValue(expression, ComponentType, ComponentId);
            environmentVariables.Add(new EnvVar(name, value));
        }
    }

    // Invoke DeployComputationalAssetLandingZone first; it is idempotent, so repeating the call is safe
    DeployComputationalAssetLandingZone(LandingZone.Id);

    // Get the landing zone attributes needed below
    var location = LandingZone.Location;
    var subscriptionId = LandingZone.SubscriptionId;
    var resourceGroup = LandingZone.ResourceGroupName;
    var virtualNetwork = LandingZone.VirtualNetworkName;
    var subNet = LandingZone.SubNetName;

    // Get the image id from the template
    var imageId = Template.ImageId;

    // Create the public IP
    var publicIP = CreateNewPublicIP(subscriptionId, resourceGroup, location, AssetName);

    // Create the network security group
    var nsg = CreateNetworkSecurityGroup(subscriptionId, resourceGroup, location, AssetName);

    // Create the security rule (all open ports are included in one rule)
    var securityRule = CreateNetworkSecurityRule(subscriptionId, resourceGroup, location, AssetName, nsg, openPorts);

    // Create the network interface in the landing zone subnet, bound to the public IP and the security group
    var networkInterface =
        CreateNetworkInterface(subscriptionId, resourceGroup, location, AssetName, virtualNetwork, subNet, publicIP, nsg);

    // Create the virtual machine from the template image
    var virtualMachine =
        CreateVirtualMachine(subscriptionId, resourceGroup, location, AssetName, imageId, networkInterface, environmentVariables);

    // Create the entry of the computational asset in the meta-model
    var ca = ComputationalAssets.Add(AzureVirtualMachine, virtualMachine);

    // Populate the ComputationalAssetProperties entity
    PopulateAssetProperties(ca, virtualMachine);

    // Bind the component to the newly deployed computational asset
    var dc = DeployedComponents.Add(ComponentType, ComponentId, ca.Id);
}

The environment variables list is encoded and passed as custom data to the Azure method that creates the virtual machine.

Custom data is placed in %SYSTEMDRIVE%\AzureData\CustomData.bin as a binary file.

The startup script processes the binary file, creates the environment variables, and starts all the applications.

Info

For more information see: Custom data and cloud-init on Azure Virtual Machines
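The round trip through custom data can be sketched as follows. The document does not specify how the environment variables are encoded, so the JSON payload here is an assumption; the sketch also assumes that the file on the VM holds the decoded bytes, which is simulated locally instead of reading CustomData.bin.

```python
import base64
import json


def encode_custom_data(env_vars: dict) -> str:
    """Provisioning side: serialize the variables and base64-encode them,
    since the VM creation call expects custom data as a base64 string."""
    raw = json.dumps(env_vars, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_custom_data_file(content: bytes) -> dict:
    """Startup-script side: parse the bytes read from the custom data file."""
    return json.loads(content.decode("utf-8"))


custom_data = encode_custom_data({"Experiment_id": "42"})

# Assumption: the platform writes the decoded payload to the file.
# Here that step is simulated instead of reading the real file.
file_bytes = base64.b64decode(custom_data)
restored = decode_custom_data_file(file_bytes)
```

A startup script would then set each entry of the restored dictionary as an environment variable before launching the applications.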

Starting/allocating & Stopping/deallocating

A digital city is an entity that aggregates all the components of a simulation.

The meta-model tracks the relationship between a digital city and a computational asset through the ComputationalAssetUsedByCity entity.

A computational asset may need another computational asset at run-time. That relationship is tracked by the ComputationalAssetDependency entity.

This dependency relationship constrains the order in which the computational assets should start. The following algorithm is used to start the assets in the right order.

First, the transitive closure of the ComputationalAssetDependency relationship is calculated (see "Transitive closure", Wikipedia).

If a pair of computational assets a->b belongs to the relationship, asset a needs asset b to be running to start. We say that a is the DependantAsset and b is the RequiredAsset.

If a relation has the pairs { a->b, b->c, b->d }, its transitive closure is { a->b, b->c, b->d, a->c, a->d }.

That is, the closure adds all the indirect relationships.

We can add a distance to each pair: every computational asset is related to itself with distance 0, and every pair of the original relationship has distance 1. Extending the transitive closure of our example with the distance, we get TC = { a->a 0, b->b 0, c->c 0, d->d 0, a->b 1, b->c 1, b->d 1, a->c 2, a->d 2 }
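A minimal Python sketch of this computation follows. It assumes the dependency relationship has no cycles, and it makes one choice the text leaves open: when two assets are connected by several paths, it records the longest one, because a shorter distance could give a required asset the same priority as the asset that depends on it. The function name is hypothetical.

```python
def closure_with_distance(dependencies):
    """Return {(dependant, required): distance} for an acyclic relation.

    Distance is 0 from an asset to itself and otherwise the length of the
    longest dependency path between the two assets.
    """
    graph = {}
    for dependant, required in dependencies:
        graph.setdefault(dependant, set()).add(required)
        graph.setdefault(required, set())

    memo = {}

    def reach(node):
        # Longest distance from node to every asset it needs, cached per node
        if node not in memo:
            distances = {node: 0}
            for required in graph[node]:
                for target, d in reach(required).items():
                    distances[target] = max(distances.get(target, 0), d + 1)
            memo[node] = distances
        return memo[node]

    return {(a, b): d for a in graph for b, d in reach(a).items()}


# The example relation from the text
tc = closure_with_distance([("a", "b"), ("b", "c"), ("b", "d")])

# A relation with two paths from a to c: the longer one (through b) is kept
diamond = closure_with_distance([("a", "b"), ("b", "c"), ("a", "c")])
```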

We filter the transitive closure with distance, keeping only the assets used by the city c (the simulation) or required by another asset that is used in the simulation.

TC’ = { (dependantAsset, requiredAsset, distance) in TC such that dependantAsset belongs to ComputationalAssetUsedByCity(c) or requiredAsset belongs to ComputationalAssetUsedByCity(c) }

From TC’ we build a list of assets ordered by priority, and start them in that order:

L = select requiredAsset, Priority= max(distance) from TC’
    group by requiredAsset 
    ordered by max(distance) descending

foreach(asset a in L)
{
    startAsset(a)
}
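The filter and the ordering can be sketched in Python as below. Two points are assumptions of this sketch, not statements from the document: it filters only on the dependant side (every asset is related to itself with distance 0, so the assets used by the city are still included, and pairs whose dependant lies outside the city cannot raise an asset's priority above that of its own requirements), and it breaks ties by asset name to make the result deterministic.

```python
def start_order(closure, used_by_city):
    """Order assets so that every required asset starts before its dependants.

    closure: {(dependant, required): distance}, self pairs at distance 0.
    used_by_city: set of assets directly used by the city.
    """
    priority = {}
    for (dependant, required), distance in closure.items():
        if dependant in used_by_city:
            priority[required] = max(priority.get(required, 0), distance)
    # Highest priority first; the name is only a tie-breaker
    return sorted(priority, key=lambda asset: (-priority[asset], asset))


# Closure with distance for the example relation { a->b, b->c, b->d }
closure = {
    ("a", "a"): 0, ("b", "b"): 0, ("c", "c"): 0, ("d", "d"): 0,
    ("a", "b"): 1, ("b", "c"): 1, ("b", "d"): 1,
    ("a", "c"): 2, ("a", "d"): 2,
}

order_for_a = start_order(closure, {"a"})  # city uses only asset a
order_for_b = start_order(closure, {"b"})  # city uses only asset b
stop_order_for_a = list(reversed(order_for_a))
```

Reversing the start order gives an order in which no asset is stopped while another running asset still depends on it.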

To stop/deallocate all the assets a city uses, the algorithm is the same, except that the list is built in the opposite order (ascending) and each asset is stopped instead of started.

L = select requiredAsset, Priority= max(distance) from TC’
       group by requiredAsset
       ordered by max(distance) ascending

Since a computational asset's start or stop operations take some time, the backend API queues them to be executed by a microservice. That strategy avoids time-outs in the client application.
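The queueing strategy can be illustrated with an in-process queue and a single worker thread. This is only a stand-in: the real system would use a message queue between the backend API and the microservice, and the function names and the "accepted" return value are assumptions.

```python
import queue
import threading

operations = queue.Queue()
completed = []


def worker():
    """Stand-in for the microservice: processes operations one at a time."""
    while True:
        operation = operations.get()
        if operation is None:  # sentinel value: stop the worker
            operations.task_done()
            break
        action, asset = operation
        # A real worker would call the cloud provider here and wait for it
        completed.append((action, asset))
        operations.task_done()


def enqueue(action, asset):
    """Stand-in for the backend API: queue the request and return at once."""
    operations.put((action, asset))
    return "accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# Requests are queued in start order; a single worker preserves that order
for asset in ["c", "d", "b", "a"]:
    enqueue("start", asset)

operations.put(None)
operations.join()
thread.join()
```

With a single consumer the queue also preserves the dependency order; with several consumers, the worker would need to wait for required assets to be running before starting a dependant.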