CloudTAK DevOps
Operational Guide for a deployed CloudTAK Instance
Update: 2024-04-21 by @ingalls
This document is designed as a recipe book for common operational issues that an ops group may run into and how to triage and fix them.
Restarting CloudTAK
CloudTAK runs on AWS ECS, which provides compute, scaling, health checks, and runtime for the CloudTAK Server application.
Restarting CloudTAK can be done by following these steps:
- Sign in to the AWS Console via the AWS SSO sign-in page provided by your AWS Administrator
- Select the AWS Partition (Commercial or GovCloud) and Account for the stack you wish to restart
Tip
COTAK & WFTAK production services are hosted in US-GovCloud; be sure to log in to the GovCloud SSO App
- Ensure you are in the correct region by selecting the desired region in the upper right-hand corner next to your username
Tip
CloudTAK's default region is typically us-east-1 if in a Commercial account or us-gov-east-1 if in GovCloud
- In the Search Bar at the top, search Elastic Container Service and click the result labelled Run and Manage Docker Containers
- The default view will be the ECS Clusters list. If you do not see any clusters listed (i.e. a message reading No clusters), you are likely in the wrong region; return to step 3. Clusters represent a group of server(s) that power one or more ECS services that have been deployed onto them. A DFPC TAK deployment will have 1 cluster per environment, e.g. 1 for COTAK, 1 for WFTAK, etc. Click on the cluster relevant to the environment you wish to restart.
Tip
CloudTAK Production for the COTAK Environment would have a cluster named
coe-ecs-prod
- Clicking on the Cluster will bring you to the Service Page. Services are "applications" that can contain one or more "Tasks" (A single emulated "server")
For CloudTAK, click on the service named:
tak-cloudtak-<environment>-Service
Tip
CloudTAK Production for the COTAK Environment would have a service named
tak-cloudtak-prod-Service
- Clicking on the Service will bring you to the Service Health Page. This page lists stats about CPU & Memory usage as well as information about the last deployment to the service and the health of specific ports on the Server. CloudTAK as of this writing only uses a single Load Balancer Port on 443 for all API & UI operations.
- Tasks can be updated in one of two ways:
  - The recommended way to restart the server is to force a new deployment. This will tell ECS to attempt to bring a new CloudTAK instance online BEFORE terminating the problematic task. This will not result in downtime.
  - The second way is to terminate the CloudTAK Task.
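The console navigation above can also be expressed programmatically. The following is a minimal sketch, assuming an ECS client object (for example one created with boto3) for the correct region is passed in as `ecs`. The helper names `default_region`, `service_name`, and `cluster_is_active` are hypothetical and are not part of CloudTAK.

```python
# Typical CloudTAK regions per AWS partition, as stated in the tip above.
DEFAULT_REGIONS = {
    "commercial": "us-east-1",
    "govcloud": "us-gov-east-1",
}


def default_region(partition: str) -> str:
    """Return the typical CloudTAK region for an AWS partition."""
    return DEFAULT_REGIONS[partition.lower()]


def service_name(environment: str) -> str:
    """Build the ECS service name: tak-cloudtak-<environment>-Service."""
    return f"tak-cloudtak-{environment}-Service"


def cluster_is_active(ecs, cluster: str) -> bool:
    """Return True if `cluster` is visible and ACTIVE in the client's region.

    `ecs` is assumed to behave like a boto3 ECS client. An empty result is
    the API equivalent of the "No clusters" message in the console and
    usually means the wrong region was selected.
    """
    resp = ecs.describe_clusters(clusters=[cluster])
    return any(c.get("status") == "ACTIVE" for c in resp.get("clusters", []))
```

With boto3 installed, the client would come from `boto3.client("ecs", region_name=default_region("govcloud"))`, and `cluster_is_active(client, "coe-ecs-prod")` would confirm the region before continuing.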
8.1: Creating a new Deployment
Note
Creating a new deployment is the recommended way to force a CloudTAK Restart
- From the Health and Metrics tab, click the Update Service button in the upper right-hand corner. This will take you to the Deployment configuration page.
- Under Deployment configuration, set the following and leave all other settings unchanged:
  - Force new deployment = True/Checked
  - Desired Tasks = 1
- Click the Update Service button at the bottom of the page
- The button will take you to the Deployments tab of the ECS Service you updated. Continue to refresh this page while the service deploys. The deployment status can be found under Deployment Status. A deployment usually takes 5-10 minutes to roll over the task. Information about the task being terminated and about the new task being started can be seen at the bottom of the page under Events.
- If the Service Deployment does not succeed (multiple tasks being launched but failing to stabilize under the Events heading), the service is unlikely to stabilize on its own and additional engineering work will need to be done to determine the root cause. In that case you can temporarily shut the CloudTAK ECS Service down by following the 8.1 steps again, except setting Desired Tasks = 0. This will tell ECS that you don't want any running tasks.
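The same update can be made through the ECS API. This is a minimal sketch, assuming `ecs` behaves like a boto3 ECS client; the function name `force_new_deployment` is hypothetical. It maps the two console settings onto the `forceNewDeployment` and `desiredCount` parameters of `update_service`.

```python
def force_new_deployment(ecs, cluster: str, service: str, desired_tasks: int = 1) -> dict:
    """Ask ECS to roll the service onto fresh tasks.

    `ecs` is assumed to behave like a boto3 ECS client.
    desired_tasks=1 mirrors "Desired Tasks = 1" in the console steps;
    desired_tasks=0 mirrors the temporary shutdown described above.
    """
    return ecs.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_tasks,
        forceNewDeployment=True,
    )
```

For the COTAK production example used in this guide, the call would be `force_new_deployment(client, "coe-ecs-prod", "tak-cloudtak-prod-Service")`.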
Warning
Deployments should almost always succeed with the first task launch. If a deployment fails to launch a task successfully, contact Application Specific Engineering immediately.
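To decide whether a deployment needs escalation without refreshing the console, the state of the current deployment can be read from the API. This is a minimal sketch, assuming `ecs` behaves like a boto3 ECS client; the function name `primary_deployment_summary` is hypothetical. Note that, depending on the service configuration, a struggling rollout may remain IN_PROGRESS rather than report FAILED, so the task counts are returned alongside the rollout state.

```python
def primary_deployment_summary(ecs, cluster: str, service: str):
    """Summarize the PRIMARY (most recent) deployment of a service.

    `ecs` is assumed to behave like a boto3 ECS client.
    Returns None if the service or its primary deployment is not found.
    """
    resp = ecs.describe_services(cluster=cluster, services=[service])
    services = resp.get("services", [])
    if not services:
        return None
    for dep in services[0].get("deployments", []):
        if dep.get("status") == "PRIMARY":
            return {
                "rolloutState": dep.get("rolloutState"),
                "runningCount": dep.get("runningCount"),
                "desiredCount": dep.get("desiredCount"),
                "failedTasks": dep.get("failedTasks"),
            }
    return None
```

A summary whose `failedTasks` keeps climbing while `runningCount` stays below `desiredCount` corresponds to the "multiple tasks launched but failing to stabilize" situation described in this section.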
8.2: Terminate CloudTAK Task
Caution
Following #2 and terminating the CloudTAK task WILL RESULT IN DOWNTIME. The ECS Service will detect that the minimum task capacity of 1 task is not met and will create a new task, but this process can cause up to 5 minutes of downtime while the service launches the new instance.
- From the Health and Metrics tab, click the Tasks tab.
- Select the Stop dropdown in the upper right-hand corner
- Select Stop All.
- Tasks should shut down and will be replaced by the number of tasks set in the ECS Service Desired Tasks setting (usually 1).
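The Stop All action can be approximated through the ECS API by listing the service's running tasks and stopping each one. This is a minimal sketch, assuming `ecs` behaves like a boto3 ECS client; the function name `stop_all_tasks` and the default reason text are hypothetical. As with the console steps, this causes downtime until ECS launches a replacement task.

```python
def stop_all_tasks(ecs, cluster: str, service: str,
                   reason: str = "Manual CloudTAK restart") -> list:
    """Stop every running task of a service and return the stopped task ARNs.

    `ecs` is assumed to behave like a boto3 ECS client. ECS replaces the
    stopped tasks up to the service's Desired Tasks setting.
    """
    task_arns = ecs.list_tasks(cluster=cluster, serviceName=service)["taskArns"]
    for arn in task_arns:
        # The reason is recorded on the stopped task for later review.
        ecs.stop_task(cluster=cluster, task=arn, reason=reason)
    return task_arns
```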