Google have got into a bit of hot water when it emerged that while their cars drove around taking pictures for their Street View service, they collected and stored people’s private WiFi traffic. People have understandably got angry with Google for doing this, but I think some demystification is in order.
Did they collect my private data?
If your WiFi access point uses WPA with a good pass-key, don’t worry. Your network traffic is encrypted and is just noise without that pass-key.
If your WiFi is “open”, then anyone within range can collect and look at your network traffic. I would be more worried about that creepy guy in the van parked around the corner, maliciously snooping on you, spamming and browsing dodgy websites. Worrying about Google would be way down my list. Take this opportunity to switch on WPA on your access point. This article will still be here afterwards.
How come their camera cars collect WiFi data at all?
It can be used to supplement or replace GPS. Google are in the mapping and navigation business, and knowing where you are is essential to helping you get where you are going.
If you go out to some random spot in a built-up area and switch on your laptop’s WiFi gizmo, you’ll find several access points, both public and private, all with a variety of weird names. Make a list of those access points and their signal strength, compare it against a list of known access-points and their previously monitored location, do a few calculations and you’ll have your location.
No need for GPS electronics, just use the same WiFi electronics your laptop will have anyway.
That’s very nice, but even my WiFi address is private. They shouldn’t have collected even that.
Is it? By necessity, your WiFi access point has to broadcast it’s identity to the public in the clear. Your neighbours might be using WiFi too, possibly within range of your own laptop. When it hears something broadcast, it loads the packet and looks to see if its from an access point it knows about. It’ll be receiving lots of noise from your neighbours and silently throwing away anything it’s not interested in.
Now apply the principle that no-one else should even look at a packet’s identity. You’ll have no way of knowing which packets are yours and which are someone else’s unless you do look at the access points identity. It’s part of the protocol.
But they collected private traffic as well as just the access point’s identity. How could that be accidental?
Even when you are only interested in the identity of an access point, you need to collect a whole packet before it’s useful. The trouble is that radio on it’s own is subject to noise and interference. To fix this, the clever people that designed the WiFi protocols added a noise check. Before a packet is broadcast, some simple calculations are done on the content of the packet and the result is added on the end. The recipient takes the packet and performs the same calculation on the content. If the result the recipient ends up with is the same as the number on the end of the packet received, it can be reasonably sure the packet arrived without errors.
For this to work, the recipient needs the whole packet. If they only listened to first bit where the sender’s identity is stored, there is a risk of noise creeping in, masquerading as correct data.
But why did they store the whole packet after the error check has passed?
Very little of a software developer’s work is making things from scratch. Instead, we reuse and build upon work done in the past. We make reusable components that can be reused for different things.
I can only speculate here, but I imagine that when Google put this project together, they would have taken a generic WiFi receiver component which has been well tested and trusted rather than build an entirely new one. The packet is the natural unit of a WiFi receiver, so it would be expected that generic components designed to deal with WiFi traffic would store whole packets as a matter of routine.
Wouldn’t they have noticed a huge data file if they were only planning to store a fraction of what they did collect?
They would have been taking pictures and collecting many image files at the same time, so the space taken up by captured WiFi traffic would be a small proportion. Even if they were only collecting WiFi locations, the amount of storage that would be required in the field isn’t quite so predictable. Databases aren’t simple files where one item is stored one after the other, but are complex structures with indexing and redundant copies.
I imagine that if I were an engineer at Google and I wanted an estimate of how many hard disks to buy, I would send the car out on a short test journey and see how big the database is when it came back. Multiply that figure by however far the car will be going and that’s how much storage I’ll need. Hard disks are not that expensive these days, so spending engineer’s time working on reducing the amount of storage needed might not be a good economy.
Even so, collecting private network traffic is illegal. If I were caught eavesdropping, I probably wouldn’t get away with it.
(I’m not a lawyer, and this is not legal advice. If you take legal advice from a software engineer, you’re insane.)
If Google were taken to a criminal court over this, they could show that there was no intention to eavesdrop as I’ve outlined. If they take steps to securely destroy the additional collected data, no-one has been harmed here. Prosecuting this “crime” would be a petty reaction to a simple oversight.
But I don’t trust Google to not look at and abuse the collected private data.
If you’re not using WPA, your private data has been broadcast to all and sundry in range since you started using it, and you’re only worried now?
Picture credit: 'Shot of Daventry area while cycling' by... me!