David is an Employee (his preferred title) at Transcend.io. For this post, he brought all the Terragrunt knowledge to the table, and I brought the extensive editorial input. Together, we authored this piece and were proud to see the creator of Terragrunt promote it:
Enhancing the Terraform Experience: Why we use Terragrunt. Great write up by the folks at @transcend_io. https://t.co/mxhC84WEdw
— Yevgeniy Brikman (@brikis98) May 10, 2020
At Transcend, we take our infrastructure and its security very seriously. Our product manages the personal data of other companies’ users; our infrastructure must be airtight. We automate as much as possible in order to de-risk our releases and to ensure our security standards are easily controlled and auditable. To achieve these goals, we use Terragrunt in tandem with Terraform.
Terraform (by HashiCorp) enables people to use code to provision and manage any cloud, infrastructure, or service. Terragrunt (by Gruntwork, an official HashiCorp partner) wraps the Terraform binary to provide powerful enhancements, ranging from more first-class expressions to module dependencies and built-in caching.
Best practice in Terragrunt encourages module reusability and extensibility by default: it forces us to make the kinds of good technical decisions that uphold our security and development principles.
Terragrunt has significant benefits over Terraform
The initial purpose of Terragrunt was to fill in a few gaps in Terraform’s functionality, and it has continuously expanded with new features. Although Terraform has evolved to support more advanced feature sets, it still has room for improvement. Only Terragrunt brings the following rich feature set to the table:
- Explicit dependencies: Share your state easily
- Automatic Atlantis config generation: Eliminates toil
- Environment variable support: Discourages hard-coded values
- generate blocks: Remove repeated Terraform config
- Automatic resource tagging: Applies metadata universally
- Arbitrary command output from variables: Streamlines library usage
- read_terragrunt_config imports: Eliminate repeated Terragrunt code
Explicit dependencies: Share your state easily
In order to make Terraform modules reusable, they need to be configurable. Terragrunt’s explicit dependencies are hands down the most effective way to achieve this.
While the Terraform remote state data source makes it possible to pass output values between modules, using it can lead to complicated and verbose code. Further, it makes your modules less reusable in cases where you have different state configurations, and gives little direction to the readers of your code as to where values are coming from.
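For contrast, here is a minimal sketch of what the plain-Terraform approach can look like. The S3 backend, bucket name, state key, and region are all assumptions made up for illustration; the point is only that every consuming module has to repeat where and how the upstream state is stored.

```hcl
# Hypothetical plain-Terraform module that needs an output from a VPC module.
# The backend type and all values in `config` are illustrative assumptions.
data "terraform_remote_state" "vpc" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state" # assumed bucket name
    key    = "vpc/terraform.tfstate"   # assumed state key
    region = "us-east-1"               # assumed region
  }
}

locals {
  # The module is now tied to this exact state location.
  db_subnet_group_name = data.terraform_remote_state.vpc.outputs.database_subnet_group
}
```

A module written this way only works where that state exists and is readable, which is the reusability cost described above.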
Terragrunt offers an alternative in explicit dependency blocks, which are much more powerful. Here’s an example of using dependencies with Terragrunt to configure a database on AWS inside of an existing virtual private cloud (VPC) with an existing security group (a virtual firewall):
# Define the Terraform source module to use: an RDS database module in this example
terraform {
  source = "git::git@github.com:terraform-aws-modules/terraform-aws-rds?ref=v2.13.0"
}

# Define dependencies
dependency "vpc" {
  config_path = "../path/to/vpc_module"
}

dependency "sg" {
  config_path = "../path/to/security_group_module"
}

dependency "some_other_dep" {
  config_path = "../other/variables/for/making/an/RDS_database"
}

# Terraform modules have 'variables' that need to be populated before that
# module can be run. In Terraform, these are provided by a static file.
# In Terragrunt, the `inputs` attribute can be used to dynamically supply those variables!
inputs = {
  identifier                = "backend"
  db_subnet_group_name      = dependency.vpc.outputs.database_subnet_group
  vpc_security_group_ids    = [dependency.sg.outputs.this_security_group_id]
  other_necessary_variables = dependency.some_other_dep.outputs.any_output
}
There are lots of cool things to note here:
Terragrunt dependencies only define their state configuration once.
In Terraform, I have to define the dependencies and then, in every single module that requires them, define how to ingest them. These boilerplate definitions can add up.
Terragrunt knows to go and examine the config for those modules to find out how to access their state, which is extremely useful for using dependencies across multiple workspaces/AWS Accounts.
Terragrunt applies dependencies in their implied order.
In Terraform, because state is only available after a module has run, the order in which modules are run matters (and Terraform does not know that order). The operator needs to document the order to apply things.
Terragrunt creates a dependency tree, and runs all commands in the proper order such that all necessary dependencies are available at execution time.
Terragrunt encourages good programming practices.
Looking at the verified modules on the Terraform module registry, a pattern emerges: almost none of them use remote_state data sources. In fact, the terraform-aws-modules GitHub organization, which hosts dozens of verified modules, only uses a single remote_state dependency.
This is because using remote_state makes modules less reusable. While variable inputs can come from anywhere, putting remote_state blocks in module code restricts that module’s usage to only work when the remote_state already exists, and that module has access to the state.
Bonus: Terragrunt makes testing easy.
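Further, Terragrunt dependencies have built-in dependency injection: because a module receives its values as plain inputs rather than reading state itself, a test can supply those inputs directly, which makes testing with a tool like Terratest a breeze.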
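One related Terragrunt feature is `mock_outputs` on a dependency block, which supplies placeholder values when the real dependency has not been applied yet. Whether this is the mechanism the sentence above has in mind is an assumption on our part; the sketch below only shows the feature itself, with a made-up placeholder value.

```hcl
dependency "vpc" {
  config_path = "../path/to/vpc_module"

  # Placeholder values used when the VPC module has no real outputs yet.
  # "mock-subnet-group" is an illustrative value, not a real resource.
  mock_outputs = {
    database_subnet_group = "mock-subnet-group"
  }

  # Only allow the mocks for commands that do not change infrastructure.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  db_subnet_group_name = dependency.vpc.outputs.database_subnet_group
}
```

Restricting the mocks to `validate` and `plan` means an `apply` still requires the real dependency outputs.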
Automatic Atlantis config generation: Eliminates toil
In order to keep our infrastructure up-to-date, we use the popular CI/CD system Atlantis. The dependency hierarchy created by Terragrunt allows us to automatically generate our Atlantis configuration, replacing the painful process of manually configuring it. At Transcend, we proudly made and maintain the open source tool terragrunt-atlantis-config: our Atlantis configuration generator.
We created this tool after struggling with the manual and error-prone process of updating our atlantis.yaml file. These files can be tens of thousands of lines long, the dependency tree for each and every module must be defined in the configuration, and improperly defined dependencies fail silently.
Save yourself some misery: generate your config.
Environment variable support: Discourages hard-coded values
Environment variables make it possible to, for example, programmatically populate AWS profile values. They can provide fallbacks and they discourage the hard coding of values. Life is better with environment variables.
# Set the `profile` variable to have a value of the `AWS_PROFILE` env var
# Default value is "dev-profile" if that env var is not set
profile = get_env("AWS_PROFILE", "dev-profile")
Terraform has no plans to support environment variables soon (as can be seen in the response from apparentlymart, a Terraform maintainer, on this pull request), and for good reason! Doing so could cause very hard-to-debug Terraform bugs when child modules depend on environment variables that were never explicitly set. The Terraform philosophy isn’t that environment variables are bad, but that they should be explicitly set and only available to top-level modules. Because Terragrunt is a wrapper that only deals with root modules, it can and does support environment variables.
generate blocks: Remove repeated Terraform config
If you use AWS, how many times have you declared a variable for tags? How about a variable for the AWS region to deploy in? How about the IAM Role to assume in the provider? How many places have you specified the same Terraform provider version constraints?
I thought so. Instead of porting that common code all over the place, with Terragrunt it’s possible to use generate blocks to dynamically add Terraform code to modules before they are planned or applied.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = file("${get_parent_terragrunt_dir()}/vault/provider.block")
}
Writing a provider with all the variables it needs allows you to add that provider with only a short generate block anywhere it is needed.
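The same idea also works with the provider written inline rather than read from a file. The following is a minimal sketch, assuming an AWS provider configured by a single `aws_region` variable; both the variable name and the region value are illustrative, not taken from our codebase.

```hcl
# Hypothetical generate block that writes provider.tf into the module
# directory before Terraform runs.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite"
  contents  = <<EOF
variable "aws_region" {
  type = string
}

provider "aws" {
  region = var.aws_region
}
EOF
}

# The generated variable is populated like any other module variable.
inputs = {
  aws_region = "us-east-1" # assumed region
}
```

Placing a block like this in a parent Terragrunt file that child modules include keeps the provider definition in one place.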
Automatic resource tagging: Applies metadata universally
Speaking of using tags variables everywhere, how many places do you define which tags to pass to modules? They add valuable metadata that can be used to track everything from which resources are the most expensive to the location of a resource’s source code. If you wanted to add a new tag to every single resource in your AWS account with the name of the Terraform file that created it, how much work would that be?
With Terragrunt, it’s simple. Use a generate block as described above to add a tags variable everywhere with the other common vars you want, then in your parent Terragrunt file (we have one per environment), add the input:
tags = {
  Terraform     = true
  env           = "some_env"
  TerraformPath = path_relative_to_include()
}
And just like that, every single resource tells us which module manages it. If I’m browsing the AWS console and see some ECS service is failing because it’s missing an environment variable, I can just check the TerraformPath tag on that service to see exactly which file in our codebase I need to edit.
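As a rough sketch of the generate step mentioned above, the parent Terragrunt file could declare the shared `tags` variable once for every module. The file name `common_vars.tf` and the variable's description are assumptions chosen for illustration.

```hcl
# Hypothetical generate block that adds a shared `tags` variable to every
# module that includes this parent configuration.
generate "common_vars" {
  path      = "common_vars.tf"
  if_exists = "overwrite"
  contents  = <<EOF
variable "tags" {
  description = "Tags applied to every resource in this module"
  type        = map(string)
  default     = {}
}
EOF
}
```

Resources inside the module then set `tags = var.tags`. A module that already declares its own `tags` variable would conflict with the generated one, so this pattern fits best when modules leave that declaration to Terragrunt.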
Arbitrary command output from variables: Streamlines library usage
Terragrunt’s run_cmd function executes arbitrary commands in the local shell. This is useful in certain exceptional cases where there’s no Terraform provider (library) for a specific task.
As an example, we use AWS IAM Authentication to connect to our HashiCorp Vault clusters. Inside the Vault provider, we can authenticate using IAM instance metadata… but we need AWS IAM request headers to complete the authentication, and getting AWS IAM request headers is a complex task. The path of least resistance is using external libraries like aws4, a popular Node.js library.
Fortunately, because I can set variables to arbitrary command output, supplying this complex variable to Terraform only requires writing a script that generates the headers and pointing the variable at that output, as I do with iam_request_headers in the sample Terraform code below:
auth_login {
  path = "auth/aws/login/"
  parameters = {
    role                    = var.vault_role
    iam_http_request_method = "POST"
    iam_request_url         = base64encode("https://sts.amazonaws.com/")
    iam_request_body        = base64encode("Action=GetCallerIdentity&Version=2011-06-15")
    iam_request_headers     = var.request_headers
  }
}
Then in your Terragrunt inputs, just add:
request_headers = run_cmd("--terragrunt-quiet", "node", "path/to/script.js")
Presto! Terragrunt unlocks the utility of infinite other scripts!
read_terragrunt_config imports: Eliminate repeated Terragrunt code
Terragrunt offers a brand new function, read_terragrunt_config, that allows the importing of Terragrunt code from external Terragrunt files.
At Transcend, we have a few ECS container definitions that share a common subset of environment variables. In a common file, we create a map of these shared env vars and then, in our container modules, we just read in that common config and merge its results with our inputs.
While collecting our environment variables has some complexity (it has quite a few dependencies and function calls), read_terragrunt_config allows us to isolate that complexity in one location, and to leverage its results everywhere else.
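A simplified sketch of this pattern follows. The file name `common_env.hcl`, the `shared_env` map, the variable names inside it, and the `environment` input are all hypothetical; our real common file is more involved. First, the shared file:

```hcl
# common_env.hcl (hypothetical): one place to define the shared env vars.
locals {
  shared_env = {
    LOG_LEVEL = "info"       # illustrative value
    NODE_ENV  = "production" # illustrative value
  }
}
```

Then each container module reads that file and merges the shared map with its own values:

```hcl
# terragrunt.hcl for one container module (hypothetical)
locals {
  # Parse the shared file; its locals are exposed under `.locals`.
  common = read_terragrunt_config(find_in_parent_folders("common_env.hcl"))
}

inputs = {
  # Module-specific values are merged on top of the shared map.
  environment = merge(local.common.locals.shared_env, {
    SERVICE_NAME = "backend"
  })
}
```

Because `merge` gives precedence to later arguments, a module can override a shared value simply by redefining the key in its own map.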
Terragrunt continues to evolve and improve
As with any piece of software, over time we noticed some missing features and have had to work around some rough edges. Let’s take a look at some of the historical issues we ran into and how we handled them:
- Managing latency
- Making variable file usage consistent
- Reducing the verbosity of output
Managing latency
As we showed earlier, Terragrunt dependency configuration blocks are incredibly powerful and useful. However, because modules frequently have dependencies on dozens of other modules, each of which can have their own dependencies, the dependency tree can balloon quickly. Once we reached around 300 top-level modules, some with 50+ dependencies whose Terraform states Terragrunt needed to look up, it could take over ten minutes to run terragrunt plan commands.
To deal with this lag, we submitted a pull request (PR) to the Terragrunt repository that updated the library to look up all state dependencies concurrently by using goroutines. We proposed the PR one Friday at 10pm, it was merged in under an hour, and by the next morning, it was already released on Homebrew.
With these changes in place, our slowest module now takes just two minutes to plan the first time, and under a minute to plan once the Terragrunt cache is established.
Making variable file usage consistent
In both Terraform and Terragrunt, it is possible to make changes with or without generating a plan file first. Either:
terraform plan -out planfile.out # Terraform variables supplied here
terraform apply planfile.out # Terraform will throw an error if you supply variables here
Or simply,
terraform apply # Without a planfile, supply Terraform variables here
Because Terragrunt automates so much, it becomes important to make sure application configuration protects against running into Terraform’s quirks: otherwise, it’s easy to inadvertently pass variables to an apply with a planfile and everything will explode. (Realistically, it’ll only error. It just feels like an explosion.)
The Terragrunt best practice of always running terragrunt plan before any terraform apply would help with this issue, but at Transcend we chose to take our practices one step further to eliminate the issue altogether.
Instead of using Terraform variable files, we load all our secret values directly from HashiCorp Vault clusters using the HashiCorp Vault Provider or via direct queries to our Vault cluster from our application code. This way, we never manually share secret files (secrets in Git aren’t secure), and we can have extremely high confidence in the security of our secret storage.
Terraform and Terragrunt store Terraform state files as plaintext with all secrets visible, which has been an ongoing topic of discussion for over six years. By moving as many of our secrets into Vault as possible, we ensure those secrets are never present in our state.
Not only does this make our infrastructure code lean (which made moving our deployments onto a CI system with Atlantis a breeze), our state files are now much more secure in the case of a leak: a huge security win!
Reducing the verbosity of output
Terragrunt is a rather verbose tool, so much so that its most upvoted issue of all time is a request for clean logging. We can relate.
Some of our modules produce thousands of logs to stderr with each plan. These logs can be useful when errors come up, but tend to be excessive when plans succeed.
To deal with this, we wrote a small Terragrunt wrapper of our own that, on successful runs, only displays the Terraform plan output. When issues come up with our Terraform code, our wrapper displays the entire set of Terragrunt logs so we can easily debug what went wrong.
Feel free to use and fork our solution in that gist!
With major upside and minimal downside, Terragrunt is still a winner
Every tool in a system’s stack adds complexity, increasing both the cognitive load of working in the system and the barrier to entry for new developers. In general, we try to avoid complexity for complexity’s sake, and to strive to build a lean codebase wherever possible.
At Transcend, we’ve found that Terragrunt not only shortens our codebase by a few thousand lines, it also greatly simplifies our infrastructure code. We no longer need to repeat our provider, state, or dependency configurations, so the logic of our code is now focused exclusively on what it should be focusing on: the Terraform resources our developers want.
In the rare instances when Terragrunt fell short of our expectations, the maintainers have been extremely open to feedback, feature requests, and pull requests, and have quickly addressed our pain points.
With all the wonderful features Terragrunt brings to Terraform, we are able to have confidence that each of our infrastructure deploys will produce secure, auditable systems that offer strong protections for our customers’ sensitive data.