Malware is a threat against both organizations and individuals in enterprise environments. These days, a large majority of network traffic is encrypted on the wire, which protects the confidentiality and integrity of the data – but also makes it harder to monitor traffic for potentially malicious activity on your network.
There are vendors, however, that provide network products that terminate SSL/TLS traffic, inspect it and potentially produce logs for security analysis purposes.
But then again, the General Data Protection Regulation (GDPR) guarantees data subjects (e.g. your employees or co-workers) certain rights pertaining to privacy and data protection – and by extension makes certain demands of data processors when it comes to what data they can collect about whom and under what conditions, how they can process it, how long they can keep it, what third parties they can share it with under what stipulations, as well as how the data must be protected.
So what does this mean for your SSL/TLS-decryption?
Well, one vendor of firewalls makes the claim that:
[The GDPR] states specifically that you are allowed to implement measures in order to secure the processing of personal data. Because of this, it’s not correct to say, “I cannot do SSL decryption because of GDPR.” In fact, it’s more accurate to say, “The GDPR requires me to do it.”
This claim is so categorical that it is, on its face, obviously wrong.
The argument (at least in my reading) is that you can process some personal data (the data in the encrypted traffic that you wish to decrypt) because you wish to secure the processing of some other personal data (whatever that may be). This is a pretty charitable reading, because it certainly wouldn't make sense to argue that you need to decrypt (i.e. process; make non-confidential and inspect) a piece of data in order to secure that same piece of data – this would be self-contradictory.
There's also the fact that even "just" metadata – e.g. URLs – could in and of itself constitute special category data (colloquially known as sensitive personal data), which triggers even stricter requirements in the GDPR. This is because there is a non-zero possibility of a URL "revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, [...] trade union membership [or] genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation".
A slight digression
URLs revealing personal data in your logs would, of course, be an issue even without SSL/TLS-decryption – meaning there's potential privacy impact and a compliance issue in any firewall or web server logging initiative.
IP addresses themselves are already categorized by the EU as personal data (this is something you should have some awareness of too), though in the case of web server logs you'd probably be able to use "legitimate interest" as the basis for processing, make sure the logs are not available to anyone who doesn't need access, and rotate your logs every so often.
Still, this is no news – an Internet Engineering Task Force (IETF) draft update to "RFC 6302 Logging Recommendations for Internet-Facing Servers" was published back in 2018, which suggests:
- Full IP addresses should only be stored for as long as needed to provide a service;
- Logs should otherwise only include the first two octets of IPv4 addresses, or first three octets of IPv6 addresses;
- Inbound IP address logs shouldn't last longer than three days;
- Unnecessary identifiers should not be logged – these include source port number, timestamps, transport protocol numbers, and destination port numbers; and
- Logs should be protected against unauthorised access.
More info in this article in The Register.
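To make the truncation suggestion in the list above concrete, here is a minimal sketch using Python's standard `ipaddress` module. The function name `truncate_ip` and the two constants are my own, purely for illustration; the prefix lengths simply mirror the "two octets for IPv4, three octets for IPv6" wording above, and you may well decide on different values.

```python
import ipaddress

# Assumption: prefix lengths taken from the summary above,
# i.e. two octets (16 bits) for IPv4 and three octets (24 bits) for IPv6.
IPV4_PREFIX_BITS = 16
IPV6_PREFIX_BITS = 24


def truncate_ip(address: str) -> str:
    """Return the address with every bit after the kept prefix set to zero."""
    ip = ipaddress.ip_address(address)
    prefix = IPV4_PREFIX_BITS if ip.version == 4 else IPV6_PREFIX_BITS
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

With this sketch, `truncate_ip("192.0.2.123")` gives `"192.0.0.0"`. Whether a truncated address still counts as personal data in your context is a legal question the code can't answer for you.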
Notice that we haven't even touched on the payload – the data that is communicated between two parties (e.g. a browser and a web server) – which could, naturally, reveal even more detailed information.
An attempted analysis
In short, the identified issue is: even URLs can potentially reveal a lot about a person – and must thus be considered (potentially) sensitive personal data. Payloads are, of course, an even bigger issue. What solution can we come up with to work around this?
One approach might be to not retain any logs of payloads and/or URLs from your SSL/TLS-decryption at all, except for concrete suspicions (i.e. perform allow-list-based logging).
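As a rough illustration of what allow-list-based logging could look like, the sketch below only produces a log record when the destination host is on a list of hosts under concrete suspicion, and drops everything else. The names `SUSPECT_HOSTS` and `record_for`, the record layout and the example hosts are all hypothetical; a real product will have its own configuration mechanism for this.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allow-list: hosts for which a concrete, documented suspicion exists.
SUSPECT_HOSTS = frozenset({"malware.example"})


def record_for(url: str, timestamp: str) -> Optional[dict]:
    """Return a log record for suspect hosts only; return None otherwise."""
    host = (urlsplit(url).hostname or "").lower()
    if host not in SUSPECT_HOSTS:
        # Nothing is retained for traffic outside the allow-list.
        return None
    return {"timestamp": timestamp, "host": host, "url": url}
```

The point of the design is that the default is to log nothing: retention has to be justified per host, rather than removal having to be justified per record.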
Another approach could be to exclude as much traffic as you can (e.g. categories such as "health", etc. – if your solution supports this), knowing that this will create blind-spots in the excluded traffic.
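A sketch of that exclusion decision might look like the following. Everything here is an assumption for illustration: the `should_decrypt` function, the `EXCLUDED_CATEGORIES` set, and the idea that you get a host-to-category mapping from somewhere (in practice this would be your vendor's categorization feed, if it has one).

```python
from typing import Mapping

# Hypothetical set of categories to leave encrypted; "health" is the example
# used above, and you would extend this based on your own assessment.
EXCLUDED_CATEGORIES = frozenset({"health"})


def should_decrypt(host: str, categories: Mapping[str, str]) -> bool:
    """Decide whether traffic to `host` should be decrypted and inspected."""
    category = categories.get(host.lower())
    if category is None:
        # Policy choice in this sketch: uncategorized hosts are decrypted.
        # Defaulting to False instead favors privacy over visibility.
        return True
    return category not in EXCLUDED_CATEGORIES
```

Note that the handling of uncategorized hosts is exactly the kind of trade-off that belongs in your documented assessment, not just in a config file.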
Either way, you should also restrict access to any resulting logs, analysis tools and related systems as much as possible. You should also retain data (i.e. logs) for as short a time as practically possible (probably somewhere around 30-60 days or so).
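The retention part can be sketched as a simple purge step. This assumes (my assumption, not something any particular product guarantees) that each log record is a dict carrying a timezone-aware `timestamp`, and it uses 30 days, the lower end of the range mentioned above, as the retention period.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional

# Assumption: 30 days, the lower end of the range suggested above.
RETENTION = timedelta(days=30)


def purge_expired(records: Iterable[dict], now: Optional[datetime] = None) -> List[dict]:
    """Return only the records that are still within the retention period."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - RETENTION
    # Each record is assumed to hold a timezone-aware datetime under "timestamp".
    return [record for record in records if record["timestamp"] >= cutoff]
```

In practice you would run something like this on a schedule, and make sure backups and downstream copies of the logs are covered by the same retention period.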
This means that you will have to make trade-offs, and weigh the interests of privacy and security against each other – and accept that you probably won't have perfect forensic capabilities.
No matter what, you should still be aware that you can't ever really guarantee that you won't end up holding personal (and maybe even sensitive) data as a consequence of analyzing your network traffic.
Hence: you need to perform a Data Protection Impact Assessment (DPIA).
Contrary to what some believe – when dealing with personal data, you can't just decide that one consideration is more important than another. If you don't base your processing on consent, but instead opt for the oft-abused "legitimate interest" as your lawful basis for processing, you still need to formally weigh the interests, and document this in a Legitimate Interests Assessment (LIA), to decide whether it's reasonable to do so. And you're obligated to inform the data subjects. You must also attempt to reduce the privacy impact as much as possible.
It's almost as if you can't just blindly trust vendors (no way...) and actually have to make detailed assessments for yourself, just like the GDPR requires.