Data anonymization: BricoPrivé case study

Case study

The challenge

BricoPrivé is an online home improvement retailer operating in multiple countries. Like any online shop, it has to store names, email addresses, delivery addresses, contact details and other personal information in order to ship goods to its customers. European law is rightfully strict about the protection of user information, and it requires organizations to tread carefully around this data, especially when it can be exposed to different teams.

Technofy helps BricoPrivé's team with multiple aspects of AWS and DevOps. This case study focuses on the implementation of a data anonymization mechanism that ensures GDPR compliance when employees need to work with customer data.


Business pain & challenges

Tech Stack

AWS

The solution

Overview

Figure 1: High-level overview of the data anonymization process

BricoPrivé already had an anonymization script that scrambles the data in place. The main goal of our work was to fully automate the anonymization process and to provide the developers with a fresh scrambled copy of the production database every week.

The production and development environments live in two separate AWS accounts within the same organization. This practice has numerous advantages in terms of security, resource isolation and cost allocation. It also means that the production and development databases are two completely separate clusters.

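The solution consists of two parts: an anonymization workflow that runs in the production account, and a receiver workflow that runs in the development account.

The workflow of each side of the process is described in the sections below.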

Anonymizing the data

The script BricoPrivé has developed scrambles personal information "in place", which means that the original data is overwritten by the scrambled data. Thanks to the flexibility of the AWS ecosystem, we can spin up a database from the latest production snapshot, run the script against it, and create an anonymized snapshot in a few minutes.

Our approach does exactly that, with the whole process orchestrated by AWS Step Functions.
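To make the idea of scrambling data in place more concrete, here is a minimal sketch. It is not BricoPrivé's script: the `customers` table, its columns, the salt and the hashing scheme are all assumptions chosen for illustration, and an in-memory SQLite database stands in for the restored copy of production.

```python
import hashlib
import sqlite3


def scramble(value: str, salt: str = "demo-salt") -> str:
    """Return a stable 12-character token derived from a personal value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_in_place(conn: sqlite3.Connection) -> int:
    """Overwrite the personal columns of the hypothetical `customers` table."""
    rows = conn.execute("SELECT id, name, email FROM customers").fetchall()
    for row_id, name, email in rows:
        conn.execute(
            "UPDATE customers SET name = ?, email = ? WHERE id = ?",
            (scramble(name), scramble(email) + "@example.invalid", row_id),
        )
    conn.commit()
    return len(rows)


def demo() -> list:
    """Build a throwaway database, scramble it, and return the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute(
        "INSERT INTO customers (name, email) VALUES ('Jane Doe', 'jane@shop.test')"
    )
    anonymize_in_place(conn)
    return conn.execute("SELECT name, email FROM customers").fetchall()
```

Because the update overwrites the original values, such a script must only ever run against a disposable copy, which is why the workflow restores a snapshot first.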

Figure 2: Steps of the anonymization process on the production account

The workflow includes a few safety checks to make sure that the actions happen correctly and in the right order. Once the anonymized snapshot has been created and is reported as available, it is shared with the development account.
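The availability check and the sharing step could look roughly like the sketch below. It assumes cluster snapshots (the article mentions clusters, but not the engine) and expects `rds` to behave like `boto3.client("rds")`; the function name and its arguments are hypothetical.

```python
def share_when_available(rds, snapshot_id: str, dev_account_id: str) -> bool:
    """Share a cluster snapshot with another account once it is available.

    `rds` is expected to behave like boto3.client("rds"). The function
    returns False when the snapshot is not ready yet, so that the caller
    (for example a wait loop in the state machine) can retry later.
    """
    response = rds.describe_db_cluster_snapshots(
        DBClusterSnapshotIdentifier=snapshot_id
    )
    snapshots = response["DBClusterSnapshots"]
    if not snapshots or snapshots[0]["Status"] != "available":
        return False

    # Adding an account ID to the "restore" attribute lets that account
    # copy or restore the snapshot.
    rds.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=snapshot_id,
        AttributeName="restore",
        ValuesToAdd=[dev_account_id],
    )
    return True
```

Returning a boolean instead of sleeping inside the function keeps each Lambda invocation short and leaves the waiting to the orchestrator.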

Every step runs as a Lambda function except the anonymization step, which runs in a container on ECS because the script can exceed the Lambda execution time limit.
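As a rough illustration of how Lambda and ECS steps can be mixed in one state machine, the sketch below builds an Amazon States Language definition as a Python dictionary. The state names and their order are assumptions derived from the description above, not the actual workflow of Figure 2, and the safety checks, retries and ECS networking settings are left out.

```python
import json

# Hypothetical step names; the real workflow has additional safety checks.
STEP_ORDER = ["RestoreFromSnapshot", "RunAnonymizer", "CreateSnapshot", "ShareSnapshot"]


def build_definition(lambda_arns: dict, cluster_arn: str, task_definition_arn: str) -> dict:
    """Chain the steps, using ECS only for the long-running anonymizer."""
    states = {}
    for index, name in enumerate(STEP_ORDER):
        if name == "RunAnonymizer":
            state = {
                "Type": "Task",
                # The ".sync" suffix makes Step Functions wait for the task to finish.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                },
            }
        else:
            # A Lambda function ARN can be used directly as a task resource.
            state = {"Type": "Task", "Resource": lambda_arns[name]}
        if index + 1 < len(STEP_ORDER):
            state["Next"] = STEP_ORDER[index + 1]
        else:
            state["End"] = True
        states[name] = state
    return {"StartAt": STEP_ORDER[0], "States": states}


def render(definition: dict) -> str:
    """Serialize the definition to the JSON document Step Functions expects."""
    return json.dumps(definition, indent=2)
```

Running the script in a container keeps it outside the Lambda time limit while the state machine still waits for it to finish before taking the snapshot.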

This whole process happens in the production account to ensure that no personal information ever leaves the more restricted environment. Because development environments are generally more lax in terms of access, it was decided that the development account should only ever receive anonymized data.

Receiving anonymized data

The other part of the stack resides in the development account. We chose a scheduled CloudWatch Events rule to trigger the second state machine, rather than an event-driven approach, because we may want to restore the development database to the latest available snapshot several times during the week.
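A scheduled rule of this kind could be created along the lines of the sketch below. The rule name, the Monday 05:00 UTC schedule and the function itself are assumptions for illustration; `events` is expected to behave like `boto3.client("events")`, and `role_arn` must refer to a role that is allowed to start the state machine.

```python
def schedule_receiver(events, state_machine_arn: str, role_arn: str,
                      rule_name: str = "weekly-receiver") -> None:
    """Create a scheduled rule that starts the receiver state machine.

    `events` is expected to behave like boto3.client("events").
    """
    events.put_rule(
        Name=rule_name,
        # Fields: minutes, hours, day of month, month, day of week, year.
        ScheduleExpression="cron(0 5 ? * MON *)",
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "receiver", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
```

With a schedule, restoring more often only means changing the expression or starting the state machine manually; nothing has to happen on the production side.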

Figure 3: Steps of the receiver process on the development account

The first step of this workflow copies the shared snapshot into the development account, so that it no longer depends on the anonymizer's copy. Next, we check whether an anonymized database is already running, to know whether it must be deleted before deploying the new one from the last snapshot received during the day.
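The branching described above can be sketched as follows. The step names are hypothetical labels rather than the states shown in Figure 3, `rds` is expected to behave like `boto3.client("rds")`, and the sketch assumes the development database is a cluster.

```python
def cluster_exists(rds, cluster_id: str) -> bool:
    """Return True when a cluster with this identifier is present.

    Filtering (instead of looking the cluster up by identifier) returns
    an empty list rather than raising when nothing matches.
    """
    response = rds.describe_db_clusters(
        Filters=[{"Name": "db-cluster-id", "Values": [cluster_id]}]
    )
    return len(response["DBClusters"]) > 0


def plan_receiver_steps(rds, cluster_id: str) -> list:
    """List the receiver steps in order, deleting the old database only if needed."""
    steps = ["copy_shared_snapshot", "wait_for_copy"]
    if cluster_exists(rds, cluster_id):
        steps += ["delete_existing_cluster", "wait_for_deletion"]
    steps.append("restore_cluster_from_snapshot")
    return steps
```

Copying and deleting are asynchronous operations, which is why each one is followed by a wait step instead of being chained inside a single function call.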

All the steps run as Lambda functions.


Results & Highlights

This solution and approach brought us a lot of advantages:

Thank you for reading this article. We hope you enjoyed it!

Contact us for more information about our support and expertise!