At QCon London, Rachel Greaves, CEO of Castle Systems, presented the obligations and benefits of data minimization as a mechanism to reduce the impact of data breaches. AI-based auto-classification and automated decision-making tools can help manage ever-increasing volumes of data, provided that ethical principles are respected so that decisions can be challenged.
Greaves began her presentation by pointing out that cybersecurity mainly focuses on reducing the likelihood of a breach through training, firewalls and encryption. But risk combines probability and impact: “There could be a low probability of penetration but a critical impact.”
Data minimization is a mechanism to reduce the impact of data breaches. Creating an impenetrable system is impossible, she said: “There will always be the zero-day, the trusted insider or the misconfiguration.” Data minimization is a security and privacy principle that requires organizations to limit the amount of information they hold, on the assumption that it could be breached or released into the public domain at any time.
Aside from legal obligations, Greaves highlighted the benefits of implementing data minimization:
- Deterrence: Data minimization not only reduces the damage a data leak can cause but also discourages further intrusion attempts, because it leaves bad actors with less data they can monetize.
- Response and recovery: Fully understanding your data is a secondary benefit of data minimization. Knowing what you hold before an incident tells you “who is in the spill” (which customers were affected), so that in the event of a breach you can quickly alert the affected parties and minimize the impact.
- Insurability and risk transfer: Even though the assessment process is opaque, cyber insurers' assessments devote a large part to sensitive data. Organizations holding large amounts of sensitive information also tend to face higher insurance costs.
- Organizational effectiveness: You need to understand all your information, i.e. what presents a risk and what has value, and above all which rules apply to it (retention rules, confidentiality rules, regulatory obligations).
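The “who is in the spill” benefit above depends on having a data inventory before an incident. As a minimal sketch, assuming a hypothetical hand-maintained inventory (`INVENTORY`, the system names and the customer IDs are all illustrative, not from the talk), the lookup could map each breached system to the customers and data categories it exposes:

```python
from collections import defaultdict

# Hypothetical data inventory: which categories of data each system holds,
# and for which customers. In practice this would be built by cataloguing
# or scanning each system; here it is hard-coded for illustration.
INVENTORY = {
    "crm": {"categories": {"name", "email"}, "customers": {"c1", "c2", "c3"}},
    "billing": {"categories": {"name", "card_number"}, "customers": {"c2", "c3"}},
    "analytics": {"categories": {"pseudonymous_id"}, "customers": {"c1", "c4"}},
}


def who_is_in_the_spill(breached_systems, inventory=INVENTORY):
    """Map each affected customer to the data categories exposed about them."""
    exposed = defaultdict(set)
    for system in breached_systems:
        entry = inventory.get(system)
        if entry is None:
            # A system missing from the inventory is itself a finding.
            raise KeyError(f"system {system!r} is not in the inventory")
        for customer in entry["customers"]:
            exposed[customer] |= entry["categories"]
    return dict(exposed)


spill = who_is_in_the_spill(["crm", "billing"])
print(sorted(spill))  # ['c1', 'c2', 'c3']
```

Raising on an unknown system, rather than silently skipping it, reflects the point that you cannot notify affected parties about data you did not know you held.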
Although many data minimization actions can be enforced through governance, Greaves sees it as an organization-wide effort in which developers play a key role, particularly in data inventory (identifying sensitive or high-value data). She emphasized that, if done correctly:
Data minimization enables organizational maximization
Data minimization is not an additional phase of your project but an ongoing effort throughout the data lifecycle, from creation or capture to eventual disposal. Three key elements of data minimization stand out:
- Minimize collection: Collect only the personal data you actually need, do not collect the same data twice in different services, and do not keep duplicates, excessive backups or offline copies.
- Minimize access: Minimize the number of people with access, their privileges and the duration of their access (“Seeing who is doing what with data and being alerted to actions on sensitive and high-value data helps identify privilege creep.”).
- Data end-of-life management: Disposing of data is more than degaussing a hard drive. Much can be achieved through policies and governance around records management and retention.
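The access-duration and end-of-life elements above lend themselves to simple automated checks. The following is a minimal sketch under stated assumptions: the `RETENTION` schedule, the `Record` and `AccessGrant` types and the category names are hypothetical, and the retention periods are illustrative values, not legal guidance. Records with no known retention rule are routed to a human rather than deleted automatically.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule (illustrative values only):
# how long each category of record may be kept after creation.
RETENTION = {
    "marketing_lead": timedelta(days=365),
    "support_ticket": timedelta(days=2 * 365),
    "payroll": timedelta(days=7 * 365),
}


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str
    created: date


@dataclass(frozen=True)
class AccessGrant:
    user: str
    dataset: str
    expires: date


def retention_review(records, today):
    """Split records into those past retention and those with no known rule."""
    dispose, needs_review = [], []
    for r in records:
        period = RETENTION.get(r.category)
        if period is None:
            # Unclassified data is not deleted blindly; a person decides.
            needs_review.append(r.record_id)
        elif r.created + period <= today:
            dispose.append(r.record_id)
    return dispose, needs_review


def expired_grants(grants, today):
    """Return time-boxed access grants whose window has closed."""
    return [g for g in grants if g.expires <= today]


today = date(2024, 4, 10)
records = [
    Record("r1", "marketing_lead", date(2022, 1, 1)),
    Record("r2", "payroll", date(2020, 1, 1)),
    Record("r3", "unlabelled_export", date(2024, 1, 1)),
]
print(retention_review(records, today))  # (['r1'], ['r3'])

grants = [
    AccessGrant("alice", "payroll", date(2024, 3, 31)),
    AccessGrant("bob", "payroll", date(2024, 6, 30)),
]
print([g.user for g in expired_grants(grants, today)])  # ['alice']
```

Running such checks on a schedule turns retention policy and time-boxed access into routine operations rather than one-off cleanups.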
Greaves re-emphasized that while many outcomes come from processes and governance, it is of the utmost importance that developers adhere to and support the “data minimization philosophy” as data privacy and data governance shift left.
Given the complexity and sheer volume of data, technology can shed light on “what is valuable and what is sensitive” quickly and accurately (minimizing risks and maximizing results). AI is well suited to such tasks, including automatic classification of data and automated decision-making (ADM). Such tools must be able to locate and report on sensitive data across multiple systems without affecting the source systems, while still keeping humans as the ultimate decision-makers:
In an area as risky as data governance, it is important not to completely exclude people from the process.
To avoid AI bias, hallucinations or the malicious use of AI systems, software-assisted decisions (AI-enabled or not) must be explainable and transparent. That way they can be challenged, preventing harm to the most vulnerable communities.
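To illustrate the human-in-the-loop and explainability requirements above, here is a minimal sketch in which a simple rule-based detector stands in for an AI classifier (an assumption made to keep the example self-contained; the patterns and the `Proposal`, `classify`, `review` and `actionable` names are hypothetical). Each proposed label carries a human-readable reason that cites a position rather than echoing the sensitive value, and nothing becomes actionable until a person approves it.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative, explainable detection rules standing in for an AI classifier.
# These patterns are assumptions for the sketch and will produce false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


@dataclass
class Proposal:
    doc_id: str
    labels: List[str] = field(default_factory=list)
    reasons: List[str] = field(default_factory=list)  # why each label was proposed
    approved: Optional[bool] = None  # None until a human has reviewed it
    reviewer: Optional[str] = None


def classify(doc_id: str, text: str) -> Proposal:
    """Propose sensitivity labels; reasons cite positions, never the matched data."""
    proposal = Proposal(doc_id)
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            proposal.labels.append(label)
            proposal.reasons.append(
                f"{label}: pattern matched at characters {match.start()}-{match.end()}"
            )
    return proposal


def review(proposal: Proposal, approve: bool, reviewer: str) -> None:
    """Record the human decision; the person, not the tool, has the last word."""
    proposal.approved = approve
    proposal.reviewer = reviewer


def actionable(proposals: List[Proposal]) -> List[str]:
    """Only human-approved proposals may trigger downstream remediation."""
    return [p.doc_id for p in proposals if p.approved is True]


p1 = classify("doc-1", "Contact: jane@example.com")
p2 = classify("doc-2", "Card 4111 1111 1111 1111")
print(p1.labels, p1.reasons)  # ['email'] ['email: pattern matched at characters 9-25']
print(actionable([p1, p2]))  # [] (nothing has been reviewed yet)
review(p1, approve=True, reviewer="analyst-1")
print(actionable([p1, p2]))  # ['doc-1']
```

Keeping the reason next to each label is what makes a decision contestable: a reviewer, or an affected person, can see why the tool flagged a document and reject the proposal.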
Greaves concluded her example-rich presentation (covering the OPM data breach, a data breach at an Australian university and the Windrush scandal, among others) with checklists that bring together the good practices presented earlier. According to her, privacy laws are geared toward destroying data, records laws toward preserving it, and national security laws tip the balance toward eliminating sensitive data. So, however difficult it may seem, system design must balance the tension between data risk and data value.