Introduction

Kubernetes has grown increasingly popular in recent years. Many new projects choose it as their default platform, and its flexibility and advantages are such that many legacy applications are being refactored, containerized, and migrated to Kubernetes.

In practice, it is common to have services you develop and maintain deployed alongside third-party services and tools in the same cluster (or clusters).

Although not all services are exposed via an endpoint (e.g. batch jobs), many of them are, whether to a set of trusted clients or, more often, publicly. In both cases, communication goes through networks we do not own.

From a security standpoint, a lot can be done to protect these communications, but using TLS certificates is a must-have.

In this article, we will explore how to secure Kubernetes services using Let’s Encrypt certificates, and how to automatically generate these certificates and renew them before they expire.

Preparing the Kubernetes cluster

We will be using AWS as our cloud provider, and EKS to create our cluster. DNS will be managed by Route 53.

We will mainly use Terraform to provision our infrastructure.

Let’s begin by creating the Kubernetes cluster we will use throughout this exploration, starting with the basic building blocks.

In the following setup phase, we will prepare an EKS cluster and install the ExternalDNS controller and the NGINX Ingress Controller. If you want to skip this part, feel free to jump directly to Configure and Install cert-manager.

Installing controllers

At this stage, we start with a simple EKS cluster as defined here.

However, some functionality we need is still missing: no Ingress Controller is installed yet, and DNS records are not managed by any component.

We will install the NGINX Ingress Controller, and ExternalDNS to automatically update our Route 53 zone from Ingress resources.

These controllers’ Helm Charts will be installed using the Terraform Helm provider.

We won’t go deep into the details of these controllers as they are not our main topic.

For our Kubernetes workloads, we will make use of IRSA (IAM Roles for Service Accounts) through the iam-assumable-role-with-oidc Terraform module. This will give each Kubernetes service account a dedicated IAM role with only the required permissions.
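
Concretely, IRSA works by annotating a Kubernetes Service Account with the ARN of the IAM role it is allowed to assume. As a rough sketch (the annotation is set through the Helm values further down, and the account ID and role name here are purely illustrative), the ExternalDNS Service Account ends up looking like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: system
  annotations:
    # Illustrative ARN; in our setup it is injected by the Helm release below
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/external-dns-role-<CLUSTER_NAME>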

ExternalDNS

To manage our AWS Route 53 zone, we will install and configure external-dns. We will define a Terraform variable with our DNS zone:

public_dns_zone       = "dev.cloudiaries.com"
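
The declaration of this variable is not shown here; a minimal one could look like the following (type and description are an assumption):

variable "public_dns_zone" {
  type        = string
  description = "Name of the public Route 53 zone used for our services"
}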

For ExternalDNS-specific variables:

variable "external-dns" {
  default = {
    chart_version   = "6.14.1"
    namespace       = "system"
    service_account = "external-dns"
  }
}

ExternalDNS IAM

Let’s get the ID of our DNS zone:

data "aws_route53_zone" "selected" {
  name         = var.public_dns_zone
  private_zone = false
}

And create an IAM Role for ExternalDNS deployment:

module "iam_assumable_role_for_external_dns" {
  source                        = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                       = "5.14.3"
  create_role                   = true
  number_of_role_policy_arns    = 1
  role_name                     = "external-dns-role-${var.eks.cluster_name}"
  provider_url                  = replace(var.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.external_dns.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${var.external-dns.namespace}:${var.external-dns.service_account}"]
}

# ExternalDNS policy
data "aws_iam_policy_document" "external_dns" {
  statement {
    actions   = ["sts:AssumeRole"]
    resources = ["*"]
  }

  statement {
    actions = [
      "route53:ChangeResourceRecordSets"
    ]
    resources = [
      "arn:aws:route53:::hostedzone/${data.aws_route53_zone.selected.zone_id}",
    ]
  }

  statement {
    actions = [
      "route53:ListHostedZones",
      "route53:ListResourceRecordSets"
    ]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "external_dns" {
  name   = "external-dns-policy-${var.eks.cluster_name}"
  policy = data.aws_iam_policy_document.external_dns.json
}

We will use the following basic configuration for ExternalDNS, watching Ingress resources to manage DNS records.

We’ll also create the CRDs and force the deployment to be scheduled on the system node group. The following configuration will be part of our external-dns-values.yml:

tolerations:
- key: "workload_type"
  operator: "Equal"
  value: "system"
  effect: "NoSchedule"

nodeSelector:
  workload_type: system

sources:
  - ingress

crd:
  create: true

logFormat: json
policy: sync

And finally, the chart installation:

resource "helm_release" "external_dns" {
  name                  = "external-dns"
  repository            = "https://charts.bitnami.com/bitnami"
  chart                 = "external-dns"
  version               = var.external-dns.chart_version
  values                = [file("${path.module}/external-dns-values.yml")]
  render_subchart_notes = false
  namespace             = var.external-dns.namespace
  create_namespace      = true
  set {
    name  = "serviceAccount.name"
    value = var.external-dns.service_account
  }
  set {
    name  = "provider"
    value = "aws"
  }
  set {
    name  = "aws.region"
    value = var.region
  }
  set {
    name  = "aws.assumeRoleArn"
    value = module.iam_assumable_role_for_external_dns.iam_role_arn
  }
  set {
    name  = "aws.zoneType"
    value = "public"
  }
  set {
    name  = "domainFilters[0]"
    value = data.aws_route53_zone.selected.name
  }
  set {
    name  = "zoneIdFilters[0]"
    value = data.aws_route53_zone.selected.zone_id
  }
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.iam_assumable_role_for_external_dns.iam_role_arn
  }
}
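
Once the release is applied, we can check that the ExternalDNS pod is running in the system namespace (the label used below is the usual chart convention and may differ with your chart version):

kubectl -n system get pods -l app.kubernetes.io/name=external-dns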

Ingress Controller

We will use ingress-nginx as our Ingress Controller.

The Ingress Controller will expose our services through a Load Balancer.

For ingress-specific variables:

variable "ingress" {
  default = {
    namespace     = "system"
    chart_version = "4.5.2"
    timeout       = "600"
  }
}

A lot can be said about NGINX ingress configuration, but it is not our main topic. Let’s install the Ingress Controller this way:

resource "helm_release" "ingress_controller" {
  name                  = "nginx-ingress-controller"
  repository            = "https://kubernetes.github.io/ingress-nginx"
  chart                 = "ingress-nginx"
  version               = var.ingress.chart_version
  render_subchart_notes = false
  namespace             = var.ingress.namespace
  create_namespace      = true
  values                = [file("${path.module}/ingress-values.yml")]
  timeout               = var.ingress.timeout

  set {
    name  = "controller.service.annotations.external-dns\\.alpha\\.kubernetes\\.io/hostname"
    value = "*.${var.public_dns_zone}"
  }
  set {
    name  = "controller.ingressClass"
    value = "nginx"
  }
  set {
    name  = "controller.service.annotations.service\\.beta\\.kubernetes\\.io/aws-load-balancer-proxy-protocol"
    value = "*"
  }
  set {
    name  = "controller.config.use-forwarded-headers"
    value = true
  }
  set {
    name  = "controller.config.use-proxy-protocol"
    value = true
  }
  set {
    name  = "controller.config.compute-full-forwarded-for"
    value = true
  }
}
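
After applying this release, the controller’s Service of type LoadBalancer should get an AWS ELB hostname, which ExternalDNS will point *.dev.cloudiaries.com at. A quick check (exact Service names depend on the release name):

kubectl -n system get svc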

Configure and Install cert-manager

At this stage, we have an EKS cluster with an Ingress Controller and ExternalDNS to manage records in our AWS Route 53 zone.

Let’s now tackle our main topic: generating certificates using cert-manager.

cert-manager is a Kubernetes controller that is responsible for issuing certificates from different Issuers. It supports many public issuers as well as private issuers.

It will also make sure certificates stay up to date, renewing them before they expire.

Install cert-manager

To install cert-manager, as with the previous components, we will use a Helm Chart.
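
The IAM role and the Helm release below reference a cert-manager variable; following the pattern used for the other components, it could be defined like this (the chart version is only an example, while the namespace and service account match the rest of this setup):

variable "cert-manager" {
  default = {
    chart_version   = "v1.11.0" # example version
    namespace       = "system"
    service_account = "cert-manager"
  }
}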

We will be using the Let’s Encrypt Staging server to generate certificates for this demo. In order to get a certificate issued, we need to solve a challenge proving we own the domain name the certificate is for. cert-manager supports two ACME challenge types: HTTP01 and DNS01.

For the HTTP01 challenge, the client is asked to present a token at an HTTP URL that is publicly routable and accessible. Once Let’s Encrypt successfully fetches the URL and finds the expected token, the certificate is issued. The URL has the format: http://<YOUR_DOMAIN>/.well-known/acme-challenge/<TOKEN>

For the DNS01 challenge, we are asked to present a token in a TXT DNS record. Once Let’s Encrypt successfully fetches the DNS record and finds the expected token, the certificate is issued. The token is expected in a TXT record named _acme-challenge.<YOUR_DOMAIN> under your domain name.

Let’s start by setting up the IAM Role for the cert-manager Service Account, which will be needed for DNS01 challenges. We will create an IAM role with the required permissions, as described in the official documentation.

We can now create the IAM policy:

data "aws_iam_policy_document" "cert_manager" {
  statement {
    actions   = ["sts:AssumeRole"]
    resources = ["*"]
  }

  statement {
    actions   = ["route53:GetChange"]
    resources = ["arn:aws:route53:::change/*"]
  }

  statement {
    actions = [
      "route53:ChangeResourceRecordSets",
      "route53:ListResourceRecordSets"
    ]
    resources = [
      "arn:aws:route53:::hostedzone/${data.aws_route53_zone.selected.zone_id}"
    ]
  }

  statement {
    actions = [
      "route53:ListHostedZonesByName"
    ]
    resources = ["*"]
  }
}
resource "aws_iam_policy" "cert_manager" {
  name   = "cert-manager-policy-${var.eks.cluster_name}"
  policy = data.aws_iam_policy_document.cert_manager.json
}

And the IAM Role for the Kubernetes Service Account:

module "iam_assumable_role_for_cert_manager" {
  source                        = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                       = "5.14.3"
  create_role                   = true
  number_of_role_policy_arns    = 1
  role_name                     = "cert-manager-role-${var.eks.cluster_name}"
  provider_url                  = replace(var.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.cert_manager.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${var.cert-manager.namespace}:${var.cert-manager.service_account}"]
}

We can now install the cert-manager Helm Chart, enabling CRDs and setting the ARN of the IAM Role on the Service Account annotation:

resource "helm_release" "cert_manager" {
  name                  = "cert-manager"
  repository            = "https://charts.jetstack.io"
  chart                 = "cert-manager"
  version               = var.cert-manager.chart_version
  render_subchart_notes = false
  namespace             = var.cert-manager.namespace
  create_namespace      = true
  values                = [file("${path.module}/cert-manager-values.yml")]
  set {
    name  = "installCRDs"
    value = true
  }
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.iam_assumable_role_for_cert_manager.iam_role_arn
  }
}
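
Before moving on, we can check that the cert-manager pods (controller, webhook, and cainjector) are running (the namespace and label follow our release name and variables; adjust if yours differ):

kubectl -n system get pods -l app.kubernetes.io/instance=cert-manager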

Configure a Cluster Issuer

First of all, we will need to configure an Issuer or ClusterIssuer. The first is a namespaced resource, while the second is a cluster-wide resource.

Both of these resources will define the CA that can sign Certificates in response to Certificate Signing Requests.

You can have multiple Issuers/ClusterIssuers on your cluster. When you request a Certificate, you will specify the Issuer you want to use.

The HTTP01 challenges will use the NGINX Ingress Controller to expose a public endpoint. For that, we need to specify the ingress class used by our Ingress Controller, which is ’nginx’ in our example.

DNS01 Challenges need to access the Route 53 zone to add the TXT record. In the DNS01 solver configuration, we will provide the IAM role we created for cert-manager in the previous step, as well as the hosted zone ID.

The following Cluster Issuer definition contains three main sections to take note of:

  • The ACME server to use (Let’s Encrypt Staging) and your email, which is used for certificate expiry notifications.
  • The HTTP01 solver with the ingress class to use, and a selector on the DNS zone dev.cloudiaries.com. This means all certificates under this domain will use the HTTP01 solver.
  • The DNS01 solver with the Route 53 config and a selector on DNS names. This means all certificates for these names will use the DNS01 solver.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: youraccount@example.com
    preferredChain: ""
    privateKeySecretRef:
      name: issuer-account-key
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    solvers:
      - http01:
          ingress:
            class: nginx
        selector:
          dnsZones:
          - dev.cloudiaries.com
      - dns01:
          route53:
            hostedZoneID: <HOSTED_ZONE_ID>
            region: <AWS_REGION>
            role: <CERT_MANAGER_IAM_ROLE_ARN>
            secretAccessKeySecretRef:
              name: ""
        selector:
          dnsNames:
          - "*.dev.cloudiaries.com"

Let’s now install it:

kubectl apply -f staging-cluster-issuer.yaml

and check that the cluster issuer has been created successfully:

kubectl get clusterissuer
NAME                  READY   AGE
letsencrypt-staging   True    5m
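
If the issuer does not become Ready, describing it shows the ACME account registration status and any errors:

kubectl describe clusterissuer letsencrypt-staging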

Services configuration

We have now:

  • An EKS cluster with
    • An Ingress Controller
    • An ExternalDNS controller for managing our Route 53 zone
    • Cert-manager for creating/renewing our certificates, with a Cluster Issuer definition

In the next steps, we will deploy the podinfo microservice in our cluster, then secure its ingress using Let’s Encrypt certificates, with two different methods.

Again, we will use a Helm Chart here.

Service with wildcard certificate

For this first method, we will create a podinfo service that will use a wildcard certificate.

Let’s start by generating the dev wildcard certificate *.dev.cloudiaries.com.

To do so, we will need to create a Certificate object for the cert-manager controller.

The following is a definition of our Certificate object:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dev.cloudiaries.com
  namespace: dev
spec:
  # Secret name that will hold the issued certificate
  secretName: dev.cloudiaries.com

  duration: 2160h # 90d: The duration of the certificate validity
  renewBefore: 360h # 15d: Renew the certificate 15 days before expiry
  subject:
    organizations:
      - cloudiaries
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - "*.dev.cloudiaries.com"
  # Issuer references are always required.
  issuerRef:
    name: letsencrypt-staging
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: ClusterIssuer
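
Assuming the manifest above is saved as dev-wildcard-certificate.yaml (an illustrative file name), let’s create it:

kubectl apply -f dev-wildcard-certificate.yaml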

When we create this object, the DNS name matches the DNS01 solver’s selector, so that solver will be used for the challenge. cert-manager will create a CSR (Certificate Signing Request) and submit it to the ACME server defined in the ClusterIssuer.

The privateKey section in the Certificate object controls how the private key is generated. This generated key is then used to sign the CSR, and will later become the private key of the issued certificate.

Once cert-manager requests a certificate from Let’s Encrypt, it will be asked to solve a challenge by adding a TXT DNS record under the specified domain.

We can inspect the main resources’ status on the cluster. The challenge is pending, waiting for TXT records to be set as requested (and propagated):

kubectl -n dev get challenges
NAME                                             STATE     DOMAIN                AGE
dev.cloudiaries.com-c8t66-847405157-2663135921   pending   dev.cloudiaries.com   35s
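
While the challenge is pending, we can look for the TXT record ourselves. As described earlier, it lives under the _acme-challenge name (it may take a short while for cert-manager to create it and for DNS to propagate):

dig +short TXT _acme-challenge.dev.cloudiaries.com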

The certificate request is approved by cert-manager but not yet ready:

kubectl -n dev get certificaterequest
NAME                        APPROVED   DENIED   READY   ISSUER                REQUESTOR                                   AGE
dev.cloudiaries.com-c8t66   True                False   letsencrypt-staging   system:serviceaccount:system:cert-manager   5s

Similarly, the certificate is not yet ready:

kubectl -n dev get certificate
NAME                  READY   SECRET                AGE
dev.cloudiaries.com   False   dev.cloudiaries.com   6s

When Let’s Encrypt successfully fetches the TXT record, a certificate is issued and returned to cert-manager.

kubectl -n dev get certificate
NAME                  READY   SECRET                AGE
dev.cloudiaries.com   True    dev.cloudiaries.com   2m38s

Finally, the certificate is stored in the secret whose name we specified in the Certificate definition.

kubectl -n dev get secrets dev.cloudiaries.com
NAME                  TYPE                DATA   AGE
dev.cloudiaries.com   kubernetes.io/tls   2      2m7s

As you can see in the DATA field, the secret contains 2 keys:

  • tls.crt: The issued certificate
  • tls.key: The certificate private key (generated for the CSR)

We can check the certificate in the secret to make sure it contains the information we defined in the Certificate object:

kubectl -n dev get secrets dev.cloudiaries.com -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject
subject= /CN=*.dev.cloudiaries.com

And it was issued by the Let’s Encrypt Staging server:

kubectl -n dev get secrets dev.cloudiaries.com -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -issuer
issuer= /C=US/O=(STAGING) Let's Encrypt/CN=(STAGING) Ersatz Edamame E1

And the public key:

kubectl -n dev get secrets dev.cloudiaries.com -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -pubkey
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE75HnxlU0dVC8mliZFEJEWvAQ1LX0
oy/gJnVLFjDrSMObURIpSm9g48RVjzuRRprcmI6TZb7kqY52Oi/2BMhG4w==
-----END PUBLIC KEY-----

Notice that, as Secret data is base64-encoded, we need to decode the certificate before passing it to the openssl command.

Let’s now deploy our service and use the wildcard certificate. Let’s start by setting our service ingress config in a podinfo-dev-values.yaml file:

ingress:
  enabled: true
  className: "nginx"

  hosts:
    - host: podinfo-dev.dev.cloudiaries.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: dev.cloudiaries.com
      hosts:
        - podinfo-dev.dev.cloudiaries.com

Make sure you have added the podinfo Helm Chart repo:

  helm repo add podinfo https://stefanprodan.github.io/podinfo

And let’s install the chart:

  helm -n dev install podinfo-dev podinfo/podinfo -f podinfo-dev-values.yaml

You will notice that a new record, podinfo-dev.dev.cloudiaries.com, has been added to the Route 53 zone, thanks to ExternalDNS. Remember that ExternalDNS was configured to watch Ingress resources.

Give some time for DNS to propagate.

A few moments later, we can check that our service is reachable:

curl --insecure https://podinfo-dev.dev.cloudiaries.com/version
{
  "commit": "67e2c98a60dc92283531412a9e604dd4bae005a9",
  "version": "6.3.5"
}

Notice the curl --insecure flag. It is required because Let’s Encrypt Staging certificates are not trusted by clients.

We can use openssl to inspect the certificate served by the endpoint, to make sure it is the one we generated:

openssl s_client -connect podinfo-dev.dev.cloudiaries.com:443 -showcerts </dev/null | openssl x509 -noout -pubkey
# Removed output
-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE75HnxlU0dVC8mliZFEJEWvAQ1LX0
oy/gJnVLFjDrSMObURIpSm9g48RVjzuRRprcmI6TZb7kqY52Oi/2BMhG4w==
-----END PUBLIC KEY-----

We can see that the public key served by our service is the same as the one from the certificate we generated previously.

We can also inspect the certificate using a browser (screenshot: Wildcard Certificate).

We’re now done with this example. We can uninstall the Chart:

helm -n dev uninstall podinfo-dev

Service using dedicated certificate

In the previous example we generated a wildcard certificate *.dev.cloudiaries.com that can be used for any service under the subdomain dev.cloudiaries.com.

In the following example, we will create a dedicated certificate for one service under the dev.cloudiaries.com subdomain. Let’s simply call it podinfo.dev.cloudiaries.com.

This type of certificate will be generated using an HTTP01 challenge. We won’t need to create a Certificate object this time: a simpler method is to add an annotation to our Ingress resource, and cert-manager will take care of creating the certificate. Let’s create our ingress configuration:

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-staging

  hosts:
    - host: podinfo.dev.cloudiaries.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: podinfo.dev.cloudiaries.com
      hosts:
        - podinfo.dev.cloudiaries.com
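
Assuming these values are saved in a podinfo-values.yaml file (an illustrative name), let’s install the chart; the release name podinfo matches the Ingress we will see below:

helm -n dev install podinfo podinfo/podinfo -f podinfo-values.yaml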

We can see that the certificate request is approved but not yet ready:

kubectl -n dev get certificaterequest
NAME                                APPROVED   DENIED   READY   ISSUER                REQUESTOR                                   AGE
podinfo.dev.cloudiaries.com-xxxpg   True                False   letsencrypt-staging   system:serviceaccount:system:cert-manager   65s

Waiting for the challenge to complete:

kubectl -n dev get challenges
NAME                                                      STATE     DOMAIN                        AGE
podinfo.dev.cloudiaries.com-xxxpg-3709303441-4086217025   pending   podinfo.dev.cloudiaries.com   63s

And since this certificate request uses the HTTP01 solver, a new Ingress resource is created to complete the challenge:

kubectl -n dev get ingress
NAME                        CLASS    HOSTS                         ADDRESS                                                                   PORTS     AGE
cm-acme-http-solver-7nqgc   <none>   podinfo.dev.cloudiaries.com   a1a0e3219534c4c77b5e2fc5ef859be7-2059265022.eu-west-1.elb.amazonaws.com   80        65s
podinfo                     nginx    podinfo.dev.cloudiaries.com   a1a0e3219534c4c77b5e2fc5ef859be7-2059265022.eu-west-1.elb.amazonaws.com   80, 443   67s

Although the CLASS column shows <none> for the ACME solver Ingress, its ingress class annotation is in fact nginx:

kubectl -n dev get ing cm-acme-http-solver-7nqgc -o jsonpath='{.metadata.annotations.kubernetes\.io/ingress\.class}'
nginx

This type of challenge (HTTP01) is completed by exposing a specific URL. We can see it in the Ingress specification:

spec:
  rules:
  - host: podinfo.dev.cloudiaries.com
    http:
      paths:
      - backend:
          service:
            name: cm-acme-http-solver-hx5nw
            port:
              number: 8089
        path: /.well-known/acme-challenge/IzWGIem3ciEqz_drxbnRUd79nBxchuGCRD67WF0q6f4
        pathType: ImplementationSpecific

Once the challenge is completed and the certificate generated, the Ingress resource used for validation will be deleted.

We can see that this time we get a dedicated certificate for our service and not a wildcard one (screenshot: TLS Certificate).

Conclusion

Throughout this article, we’ve explored the installation and configuration of cert-manager. We’ve used it with Let’s Encrypt to secure our services’ endpoints, and we’ve gone through the two methods for solving challenges (HTTP01 and DNS01).

We’ve been using the Let’s Encrypt Staging server to generate certificates. Switching to the production server is done by simply replacing the server URL in the ClusterIssuer with https://acme-v02.api.letsencrypt.org/directory.
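
For instance, a production ClusterIssuer would look like the following sketch, reusing the same solvers as the staging one (the name and secret name here are illustrative):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: youraccount@example.com
    privateKeySecretRef:
      name: issuer-account-key-prod
    server: https://acme-v02.api.letsencrypt.org/directory
    # solvers: same HTTP01/DNS01 solvers as in the staging issuer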