How to Automate Trimming your GCP Artifact Registry Docker Images

Alright alright alright, so there is gcr-cleaner if you want a shortcut. I enjoy writing code and can't help it, so here's an alternative version that you can mould into a cloud function or a CLI:

Requirements

If you don't know what pip is, you are in the wrong place and need to leave immediately. Goodbye.

pip install google-cloud-artifact-registry

Environment

Export some environment variables like this:

export PROJECT_ID=your-project-id
export LOCATION=your-region # e.g. europe-west4
export REPOSITORY=your-repository-name # not the full path
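
Nothing below works without these three variables, so it can pay to fail fast when one is missing. A minimal sketch of that idea; require_env is a hypothetical helper and not part of the code in this post:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

You could then write PROJECT_ID = require_env("PROJECT_ID") in place of the plain os.getenv call, if you prefer a loud error over a None sneaking into a resource path.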

Constants

Let's create a new project including a constants.py file which contains the below:

import datetime
import os

from proto.datetime_helpers import DatetimeWithNanoseconds

PROJECT_ID = os.getenv(key="PROJECT_ID")
LOCATION = os.getenv(key="LOCATION")
REPOSITORY = os.getenv(key="REPOSITORY")
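# Take "now" in UTC; today() returns naive local time, which would be mislabelled as UTC below.
TODAY = datetime.datetime.now(tz=datetime.timezone.utc)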
CUTOFF_DATETIME = DatetimeWithNanoseconds(
    TODAY.year,
    TODAY.month,
    TODAY.day,
    TODAY.hour,
    TODAY.minute,
    0,
    0,
    datetime.timezone.utc,
)
constants.py
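
Note that the cutoff above is the current minute, so anything uploaded earlier and not tagged latest is fair game. If you'd rather keep a grace period, here is a sketch that moves the cutoff back by a number of days. RETENTION_DAYS and cutoff_for are made-up names for illustration, and the sketch returns a plain timezone-aware datetime, on the assumption that it compares fine against upload_time because DatetimeWithNanoseconds subclasses datetime:

```python
import datetime

RETENTION_DAYS = 30  # hypothetical grace period, tune to taste


def cutoff_for(
    now: datetime.datetime, retention_days: int = RETENTION_DAYS
) -> datetime.datetime:
    """Return the UTC datetime before which images count as old."""
    if now.tzinfo is None:
        # Refuse naive datetimes so local time is never mistaken for UTC.
        raise ValueError("now must be timezone-aware")
    utc_now = now.astimezone(datetime.timezone.utc)
    return utc_now - datetime.timedelta(days=retention_days)
```

Calling cutoff_for(datetime.datetime.now(tz=datetime.timezone.utc)) would give a cutoff 30 days in the past.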

Repository

Ok, I'm definitely overcooking this but it's a learning exercise and an option that may or may not be to your liking. If I were in a room full of engineers, I'd be a different beast but this is where I can be myself and I can let off some steam.

from google.api_core import operation
from google.cloud import artifactregistry_v1
from google.cloud.artifactregistry_v1.services.artifact_registry import pagers
from google.cloud.artifactregistry_v1.types import artifact


class DockerImageRepository:
    def __init__(
        self,
        client: artifactregistry_v1.ArtifactRegistryClient,
        location: str,
        project_id: str,
        repository: str,
    ):
        self.client = client
        self.location = location
        self.project_id = project_id
        self.repository = repository

    def delete_image_version(
        self, version_path: str, force: bool = True
    ) -> operation.Operation:
        request = artifactregistry_v1.DeleteVersionRequest(
            name=version_path, force=force
        )  # noqa
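        # print() has no msg keyword, so pass the text positionally
        print(f"Deleting {version_path=}")
        # delete_version returns a long-running operation, not the final result
        response = self.client.delete_version(request=request)
        print(f"{response=}")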
        return response

    def get_image_versions(self) -> pagers.ListDockerImagesPager:
        request = artifactregistry_v1.ListDockerImagesRequest(
            parent=self.repository_path
        )
        docker_images = self.client.list_docker_images(request=request)
        return docker_images

    @staticmethod
    def package(image_version: artifact.DockerImage) -> str:
        return image_version.uri.split("/")[3].partition("@")[0]

    @property
    def repository_path(self) -> str:
        return (
            f"projects/{self.project_id}/"
            f"locations/{self.location}/"
            f"repositories/{self.repository}"
        )

    @staticmethod
    def version(image_version: artifact.DockerImage) -> str:
        return image_version.name.partition("@")[-1]

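    def version_path(self, image_version: artifact.DockerImage) -> str:
        # repository_path already starts with "projects/", so don't prefix it again,
        # and package must be called with the image to get the package name
        return (
            f"{self.repository_path}/"
            f"packages/{self.package(image_version=image_version)}/"
            f"versions/{self.version(image_version=image_version)}"
        )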

Ok, that's a handful. Should we unpack it a little? Nope. Have a read through and the next bit will hopefully help you make sense of it.

Deletion

Create entrypoint.py and write the below. PS it helps if you write it rather than copy and paste it because it builds muscle memory. Some peeps ask me why I don't provide GitHub repositories for these examples. Well, muscle memory and rolling with the punches are what make you a better engineer, not being fed every piece of information. It also gives you some freedom to express yourself. Coding can be a very creative and rewarding process in that way.

import functions_framework
from cloudevents.http import event
from google.cloud import artifactregistry_v1

# isort: off
# Assumes constants.py and the DockerImageRepository class live in a package named "art"
from art import (
    CUTOFF_DATETIME,
    LOCATION,
    PROJECT_ID,
    REPOSITORY,
    DockerImageRepository,
)

# isort: on


def delete_old_images():

    client = artifactregistry_v1.ArtifactRegistryClient()

    docker_image_repository = DockerImageRepository(
        client=client,
        location=LOCATION,
        project_id=PROJECT_ID,
        repository=REPOSITORY,
    )

    for image_version in docker_image_repository.get_image_versions():
        print(f"Checking {image_version.name}")
        version_path = docker_image_repository.version_path(image_version=image_version)
        # Keep anything uploaded at or after the cutoff, or tagged latest
        if (
            image_version.upload_time >= CUTOFF_DATETIME
            or "latest" in image_version.tags
        ):
            print(f"{version_path} active")
            continue
        docker_image_repository.delete_image_version(version_path=version_path)
        print(f"{version_path} deleted")
entrypoint.py

Unpacking time yeah?

1. Declare the artifact registry client:

client = artifactregistry_v1.ArtifactRegistryClient()

2. Initialise the DockerImageRepository with the client and the constants:

docker_image_repository = DockerImageRepository(
    client=client,
    location=LOCATION,
    project_id=PROJECT_ID,
    repository=REPOSITORY,
)

3. Iterate over the images in your repository:

for image_version in docker_image_repository.get_image_versions():

4. Set the path of the version:

version_path = docker_image_repository.version_path(image_version=image_version)

This took a while to understand, as either the docs are not forthcoming with the info or my reading skills were not good enough at the time of writing this. The path is composed of the below:

  • project: your project ID
  • location: your region
  • repository: your repository name
  • package: your docker image name
  • version: your docker image digest (the sha256:... part after the @), not the tag
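
To make the path concrete, here is a small standalone sketch of the same string handling the repository class does, with no GCP client involved. build_version_path is a hypothetical helper, the example values are invented, and like the class above it assumes an image name without slashes in it:

```python
def build_version_path(repository_path: str, uri: str, name: str) -> str:
    """Build the version resource path from a Docker image's uri and name."""
    # uri looks like "<host>/<project>/<repository>/<image>@<digest>"
    package = uri.split("/")[3].partition("@")[0]
    # name ends in "<image>@<digest>"; the digest is the version
    version = name.partition("@")[-1]
    return f"{repository_path}/packages/{package}/versions/{version}"


repository_path = "projects/my-project/locations/europe-west4/repositories/my-repo"
uri = "europe-west4-docker.pkg.dev/my-project/my-repo/my-image@sha256:abc123"
name = f"{repository_path}/dockerImages/my-image@sha256:abc123"

print(build_version_path(repository_path, uri, name))
```

With those made-up values the result ends in packages/my-image/versions/sha256:abc123.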

5. Ignore images that are tagged as latest or were uploaded at or after your constant CUTOFF_DATETIME:

if (
    image_version.upload_time >= CUTOFF_DATETIME
    or "latest" in image_version.tags
):
    print(f"{version_path} active")
    continue
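
If you want to test that rule without touching a real repository, the check can be pulled out into a pure function. A sketch under assumptions: is_protected and the keep_tags parameter are hypothetical additions, not something the code above defines:

```python
import datetime
from typing import Iterable, Sequence


def is_protected(
    upload_time: datetime.datetime,
    tags: Iterable[str],
    cutoff: datetime.datetime,
    keep_tags: Sequence[str] = ("latest",),
) -> bool:
    """Return True when an image should be kept rather than deleted."""
    tag_set = set(tags)
    # Keep images uploaded at or after the cutoff, or carrying a protected tag.
    return upload_time >= cutoff or any(tag in tag_set for tag in keep_tags)
```

Inside the loop this would read: if is_protected(image_version.upload_time, image_version.tags, CUTOFF_DATETIME): continue.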

6. Delete the old images:

docker_image_repository.delete_image_version(version_path=version_path)
print(f"{version_path} deleted")
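
One caveat: delete_image_version hands back a long-running operation, so the "deleted" message is printed before the delete has necessarily finished. If you care about the outcome, you could collect the operations and wait on them. A sketch, assuming each object exposes result(timeout=...) the way google.api_core.operation.Operation does; wait_for_deletes is a hypothetical helper:

```python
from typing import Any, Dict


def wait_for_deletes(operations: Dict[str, Any], timeout: float = 60.0) -> Dict[str, Exception]:
    """Wait for each delete operation and return the ones that failed.

    operations maps a version path to its operation object.
    """
    failed: Dict[str, Exception] = {}
    for version_path, op in operations.items():
        try:
            # Blocks until the operation completes or the timeout expires.
            op.result(timeout=timeout)
        except Exception as exc:  # broad on purpose: report every failure
            failed[version_path] = exc
    return failed
```

In the loop you would store the return value of delete_image_version in a dict keyed by version_path, then call wait_for_deletes once at the end and print whatever comes back.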

Happy days. It's important to remember that cloud engineering has a cost associated with it, so the more effectively we manage our resources, the less $$$ trouble we'll get into.

Enjoy your day!