Explosive revelations by former contractor Edward Snowden about massive surveillance programs conducted by government agencies triggered a new debate about the security and privacy of everyone connected to the Internet. Since Snowden's disclosures, many people believe that adopting encrypted communications, i.e. SSL-enabled websites, will keep them secure.
People do care about their privacy, and many have already changed some of their online habits, such as using HTTPS instead of HTTP while surfing the Internet. However, HTTPS may be secure enough to run an online store or eCommerce website, but it fails as a privacy tool.
US researchers have carried out a traffic analysis of ten widely used HTTPS-secured websites, “exposing personal details, including medical conditions, financial and legal affairs and sexual orientation.”
UC Berkeley researchers Brad Miller, A. D. Joseph and J. D. Tygar, together with Intel Labs researcher Ling Huang, showed in ‘I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis’ (PDF) that HTTPS, a protocol for transferring encrypted data over the Web, may also be vulnerable to traffic analysis.
Due to similarities with the Bag-of-Words approach to document classification, the researchers refer to their analysis as Bag-of-Gaussians (BoG).
“Our attack applies clustering techniques to identify patterns in traffic. We then use a Gaussian distribution to determine similarity to each cluster and map traffic samples into a fixed width representation compatible with a wide range of machine learning techniques,” say the researchers.
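To make the quoted description more concrete, here is a minimal Python sketch of the general idea: cluster traffic bursts, fit a Gaussian to each cluster, and then score a traffic sample against every cluster to get a vector of fixed width. It is not the researchers' implementation. The burst representation (outgoing KB, incoming KB), the use of plain k-means, the variance floor and all function names are assumptions chosen for illustration, and the numbers are made up.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in clustering step (assumption)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def fit_gaussians(points, centroids, min_var=1.0):
    """Fit one axis-aligned Gaussian (mean, variance) per cluster."""
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    gaussians = []
    for c, g in zip(centroids, groups):
        if not g:  # empty cluster: fall back to the centroid itself
            gaussians.append((c, (min_var,) * len(c)))
            continue
        mean = tuple(sum(d) / len(g) for d in zip(*g))
        var = tuple(
            max(min_var, sum((x - m) ** 2 for x in d) / len(g))
            for d, m in zip(zip(*g), mean)
        )
        gaussians.append((mean, var))
    return gaussians

def density(p, mean, var):
    """Gaussian density of point p with independent dimensions."""
    return math.prod(
        math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for x, m, v in zip(p, mean, var)
    )

def featurize(bursts, gaussians):
    """Map any number of bursts to one similarity score per cluster."""
    return [sum(density(b, m, v) for b in bursts) for m, v in gaussians]

# Made-up bursts as (outgoing KB, incoming KB) from previously visited pages.
training = [(1.0, 10.0), (1.2, 11.0), (0.9, 9.5), (1.1, 10.5),
            (5.0, 200.0), (5.2, 205.0), (4.8, 195.0), (5.1, 198.0)]
gaussians = fit_gaussians(training, kmeans(training, k=2))

short_sample = [(1.0, 10.2)]
long_sample = [(1.1, 10.0), (5.0, 201.0), (4.9, 199.0)]
print(featurize(short_sample, gaussians))
print(featurize(long_sample, gaussians))
```

Both samples come out as vectors with one entry per cluster, regardless of how many bursts they contain. That fixed width is what lets such features be fed to an ordinary classifier.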
They also noted that "all capable adversaries must have at least two abilities." First, the attacker must be able to visit the same web pages as the victim, allowing the attacker to identify patterns in encrypted traffic indicative of different web pages. "The adversary must also be able to observe victim traffic, allowing the adversary to match observed traffic with previously learned patterns," they said.
The analysis carried out in the study covers websites for health care, legal services, banking and finance, as well as Netflix and YouTube. The traffic analysis attack covered 6,000 individual pages on the ten websites and identified individual pages within the same website with 89% accuracy, associating users with the pages they viewed.
Snowden said previously, "Encryption works. Properly implemented strong crypto systems are one of the few things that you can rely on. Unfortunately, endpoint security is so terrifically weak that NSA can frequently find ways around it." So the technique could allow government agencies to target HTTPS traffic and mine metadata, for example through ISP snooping or employee monitoring, which they could then use for surveillance and censorship purposes.