Hello, Terraform

At work, my team owns and maintains a large lab environment for the development and testing of Rubrik Build projects. It was built in a hurry, causing some of our original design principles to be compromised. My team and I have decided to use this no-travel period as an opportunity to redesign and redeploy our lab environment. I will review our design in a later post. 

One of our design goals is to leverage infrastructure as code principles (where possible). The team’s primary tool of choice for provisioning is Terraform

Terraform allows us to define what resources we need in a declarative manner, where we simply define the end state needed for our infrastructure. Here’s a few reasons why we like using Terraform:

  • Multi-platform, similar operations across a number of providers
  • Easy provisioning and deprovisioning of resources
  • Idempotent, saves current state as a file
  • Detects diffs from current state when applying changes

This post will dive into Terraform syntax, architecture, and operations.

Terraform Syntax

The low-level syntax of Terraform is defined in HashiCorp Configuration Language (HCL). The following example shows a generic configuration code block for Terraform:

command_type "provider_resource_label" "resource_label" {
  argument_name = "argument_value"
  argument_name = "argument_value"
}

Let’s dig into the syntax:

  • Command — the command type resource tells Terraform you want to create a resource, such as an S3 bucket or an EC2 instance.
  • Provider Resource Label — this is the type of resource you want to create. The resource name is specified by the provider. For example, you may use aws_instance to provision an EC2 instance using the AWS provider.
  • Resource Label — this what you want to colloquially label the resource within your Terraform configuration. This label should be unique within this configuration file as it is used later when referencing the resource.
  • Arguments — allows you to specify configuration details for the resource being provisioned. These are defined as an argument name and an argument value. As an example, when provisioning an EC2 instance, you may want to specify which AMI is used. 

Note that comments using # or // or even /* or */ are supported. 

To put these concepts together, an example configuration code block may resemble:

resource "aws_instance" "my-first-instance" {
  ami = "ami-008c6427c8facbe08"
  instance_type = "t2.micro"
  availability_zone = "us-west-2c"
  
  tags = {
    Name = "my-first-instance"
    Environment = "test"
  }
}

This example will provision a single EC2 instance in the US-West-2C availability zone, using the AMI specified, along with assigning the two tags. 

Most of your Terraform configuration is written in these code blocks. Once you master this, then you’ll be able to quickly write and provision more resources.

Terraform Architecture

A typical Terraform module may have the following structure:

project-terraform-files
│
└─── terraform-module-example01
│   │   main.tf
│   │   variables.tf
│   │   terraform.tfvars
│   │   outputs.tf
│   
└─── terraform-module-example02
│   │   provider.tf
│   │   data-sources.tf
│   │   main.tf
│   │   variables.tf
│   │   terraform.tfvars
│   │   outputs.tf

The names of the files are not important. Terraform will load all configuration files within the directory.

Providers

A provider is the core construct that allows Terraform to interact with the APIs across various platforms (PaaS, IaaS, SaaS). Think of this as the translator between the platform API and the HCL syntax. Before you can begin provisioning resources, you must first defined which platform by specifying the provider:

provider "aws" {
  region = "us-west-2"
}

Place the provider block in your main.tf file or create a separate provider.tf file.

Resources

I previously covered how to structure resource code blocks in the Terraform Syntax section. 

This example defines the creation of an instance based off the defined AMI, sized as t2.micro, and properly tagged:

resource "aws_instance" "my-first-instance" {
  ami = "ami-008c6427c8facbe08"
  instance_type = "t2.micro"
  availability_zone = "us-west-2c"
  
  tags = {
    Name = "my-first-instance"
    Environment = "test"
  }
}

Define the desired outcome for your resources in the main.tf file.

Data Sources

Data sources enable you to reference resources that already exist outside of Terraform or defined by a separate Terraform configuration. This allows you to extract information that can then be fed into a new resource. First, defined the data source and then reference this as an argument value:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners = ["aws-marketplace"]

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "my-first-instance" {
  ami = "${data.aws_ami.ubuntu.id}"
  instance_type = "t2.micro"
  availability_zone = "us-west-2c"
  
  tags = {
    Name = "my-first-instance"
    Environment = "test"
  }
}

In this example, I am again creating a new EC2 instance. However, this time I am gathering AMI information using a data source to find and use the latest Ubuntu version instead of manually defining that AMI value. This allows my configuration to be more flexible because I no longer need to manually find and input the appropriate AMI value.

A data source is declared similarly to resources, except that the information provided is used by Terraform to discover existing resources rather than provision. Once defined, data sources can be referenced repeatedly to pass information to new resources. 

Place the data source blocks in your main.tf file or create a separate data-sources.tf file. 

Variables

To make your code more modular, you can choose to use variables instead of hard-coding values. Once defined, variables can be referenced:

provider "aws" {
  access_key = var.aws_access_key
  secret_key = var.aws_secret_key
  region = var.aws_region
}

I typically declare my essential variables in a separate variables.tf file. This may resemble:

variable "aws_access_key" {
  description = "AWS access key for authorization"
  type = "string"
}

variable "aws_secret_key" {
  description = "AWS secret key for authorization"
  type = "string"
}

variable "aws_region" {
  description = "AWS region in which resources will be provisioned"
  type = "string"
  default = "us-west-2"
}

In this example, I have declared a value for the AWS region to be reused when provisioning the infrastructure defined. The descriptions are optional, and for the developer’s benefit only, but I always recommend being kind to the next person using your code. The possible variable types are string (default type), list, and map. Variables can also be declared but left blank, setting their values through environment variables or a .tfvars file. 

Sometimes the variable definition may be specified as a default in the variables.tf file. Otherwise, this value should be defined by creating a file named terraform.tfvars, which allows variable values to persist across multiple executions. This is especially valuable for sensitive information such as secret keys. 

For example, the contents of the terraform.tfvars file may resemble the following variable definition:

aws_access_key = "ABC0101010101CBA"
aws_secret_key = "abc87654321zyxw"
aws_region = "us-west-2"

Terraform automatically loads all files in the current directory with the exact name of terraform.tfvars or any variation of *.auto.tfvars. If the file is named something else, you can use the -var-file flag to specify a file name.

However, keep in mind that these persistent variable definitions often contain sensitive information, such as passwords or API token, and should be treated with care. Consider adding this to your .gitignore file.

Outputs

Outputs can be used to display information needed or export information after Terraform completes a terraform apply command. An example output may resemble:

output "instance_id" {
  value = "${aws_instance.my-first-instance.id}"
  }

You can save the outputs files in a specific file called outputs.tf.

State

When you use Terraform to build resources, a state file gets created and contains configuration information for the resources provisioned. This is what allows Terraform to determine which parts of the configuration have changed, ultimately what provides idempotency because Terraform is able to determine the resource is present and does not create it again. 

After the terraform apply command is executed, the affiliated directory will contain two new files:

  • terraform.tfstate
  • terraform.tfstate.backup

Note: any manually changes made to Terraform provisioned infrastructure will be overwritten by terraform apply.

Modules

Terraform configuration files can be packaged as modules and used as building blocks to create new infrastructure resources without having to put forth much effort. Modules are available publicly in the Terraform registry, and can be directly added to configuration files for quickly provisioning resources.

If I were to use a pre-packaged module to provision an AWS S3 bucket, the code may resemble:

module "s3_bucket" { 
  source = "terraform-aws-modules/s3-bucket/aws" 
  bucket = "my-s3-bucket" 
  acl = "private" 
  versioning = { 
    enabled = true 
  } 
}

In this case, you are reusing the configurations specified by the module. All you need to input are the configuration values.

Terraform Operations

Terraform is managed through a simple CLI. Terraform is a single command-line application: terraform and you specify the action through a subcommand such as apply or plan

To view a list of the available commands at any time, just run terraform with no arguments.

In order to get started, you will need to run terraform init to initialize a number settings for Terraform that will create the required environment to proceed. It will also download the necessary plugins for the selected provider.

Before provisioning, you may want to generate an execution plan, or otherwise known as a dry-run of your changes. Generate by running terraform plan. Terraform outputs a delta, showing you which resources will be destroyed (marked with a -), which will be added (marked with a +), and which will be updated in-place (marked with a ~).

Once you have reviewed the execution plan and are ready to begin provisioning, run terraform apply to the changes to be executed. If at any point you need to remove the resources, simply use the command terraform destroy. If there are multiple resources in the module, you can specifically name which resource(s) to destroy. For example: 

terraform destroy - target=aws_instance.my-first-instance

In general, once you have defined the infrastructure in the .tf files, working with Terraform is pretty much just running terraform plan and terraform apply repeatedly (unless you use CI).

Summary

In this post, I described Terraform syntax, architecture, and common operations. Throughout the article I used the example of creating an AWS EC2 instance, however, these principles apply to all resources types across providers. I hope this helps you get started in your infrastructure as code journey. 

Happy Terraforming!

Introducing “Alta” – Rubrik CDM 4.0

Rubrik Cloud Data Management (CDM) 4.0 is Rubrik’s ninth and largest product release. The release, named Alta, expands the Rubrik platform to encompass all major hypervisors, adding Oracle support, furthering SQL support by introducing live mount functionality, and Cloud Instantiation. Additionally, Alta closes the gap with traditional backup architectures by introducing support for tape archival.

Screen Shot 2017-06-12 at 9.17.32 PM.png
Rubrik CDM 4.0 Features

A few release highlights…

Rubrik Atlas Infographic

  1. Manage and protect all major hypervisors. Support is added for Microsoft Hyper-V and Nutanix Acropolis hypervisor (AHV); this adds to the already supported VMware vSphere. Enterprise organizations are now able to orchestrate application data management and availability across multi-hypervisor and cloud infrastructures.
  2. Spin up applications in a public cloud using Cloud Instantiation. Any data that has been protected on-premises and sent to Amazon S3 can now be powered on as a fully functioning AMI. This functionality will be available for any VMware virtual machines that have been archived to Amazon S3. There is no requirement for a Rubrik Cloud Cluster to be running in the target Amazon region.
  3. Automate protection and recovery of Oracle databases. Database owners and administrators can leverage Rubrik’s high performing multi-stream backups to massively reduce any impact to production and existing workflows for database backups, replication, archival, and compliance.
  4. Live Mount capabilities for Microsoft SQL. Awesome innovation that delivers near-zero recovery times versus potentially hours or even days using other methods for restoring Microsoft SQL Server. With this feature, administrators can power on a SQL Server directly on Rubrik using any point in time. This delivers self-service access along with the powerful suite of APIs that can be used to automate workflows.
  5. Archive data to tape. This is the least sexy major feature but still an important one. Rubrik automates data archival to tape for enterprises who must meet governance specifications or other any type of compliance regulations.

What I’m excited about…

Nutanix AHV support – Nutanix Acropolis is a turn-key infrastructure platform, delivering enterprise-class storage compute, and native virtualization services capable of running nearly any application. In addition to supporting to supporting VMware vSphere and Microsoft Hyper-V, Acropolis includes its own built in hypervisor, AHV. With the 4.0 Alta release, Rubrik is excited to extend its capabilities to protect AHV virtual machines, making the company one of the first to do so. It’s been a fun journey the past few years watching Nutanix grow their platform and its ecosystem– AHV has matured a lot over the past few releases. I’m looking forward to seeing Rubrik used in large enterprises to protect multi-hypervisor and multi-cloud infrastructures. 

NTNX.png

Cloud Instantiation – There is a clear macro trend of IT workloads moving to the cloud. The use of cloud storage has been a part of the Rubrik story since the initial GA release in 2015. Customers can capture data sources on-premises and leverage cloud resources, such as Amazon S3, as a long-term archival target while still maintaining the ability to search, manage, and restore the data in any location. And now with this release, Rubrik extends the functionality of data in the cloud with cloud instantiation. There’s unlimited use cases for this type of functionality, especially as its feature set grows in future releases. This can assist with DR to cloud strategies or on-premises to cloud or even cross-cloud migrations. 

cloud.png

Mark your calendars — I’m co-presenting webinars on both topics this summer (AHV on 27 July and AWS Cloud Instantiation on 10 August). You can sign up here.