(Updated)
Thank you, Google, for resisting the DOJ’s effort to obtain user search data. You put up a good fight to protect our privacy, and you won. Too bad it was all in vain.
AOL, in blatant violation of its users’ privacy, just released the log of 3 months’ worth of searches by 650,000 users. Not to the DOJ, but for open download by anyone. The claim:
“This collection is distributed for non-commercial research use only. Any application of this collection for commercial purposes is STRICTLY PROHIBITED”
Prohibited. Yeah, right. As if they could control it. The data is supposedly “anonymized”, which in AOL-speak means the screen-name is replaced by a unique user number. Anyone even a little familiar with data mining knows what this means, and obviously some commenters on the AOL blog have already put two and two together, “outing” certain users whose identity was easy to find based on the search patterns. I don’t even want to think what data mining pros will do with this.
AOL, you betrayed your users. If they are smart at all, they will boycott your services.
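To make the point concrete, here is a minimal Python sketch of why swapping a screen name for a stable number does not anonymize much. Everything in it is made up for illustration: the screen names, the queries, and the function names `pseudonymize` and `profiles` are hypothetical, and this is not AOL's actual process or file format. The only assumption taken from the release is that each user keeps the same number across all of their queries.

```python
from collections import defaultdict
from itertools import count


def pseudonymize(log):
    """Replace each screen name with a stable numeric ID (same user, same ID)."""
    ids = {}
    counter = count(1)
    released = []
    for screen_name, query in log:
        if screen_name not in ids:
            ids[screen_name] = next(counter)
        released.append((ids[screen_name], query))
    return released


def profiles(released):
    """Group the released queries by user number, rebuilding each user's history."""
    grouped = defaultdict(list)
    for user_id, query in released:
        grouped[user_id].append(query)
    return dict(grouped)


# Hypothetical log entries: (screen name, query). None of this is real data.
log = [
    ("alice_example", "jane q. example"),            # an ego search on her own name
    ("bob_example", "weather tomorrow"),
    ("alice_example", "plumbers in exampletown"),    # hints at where she lives
    ("alice_example", "embarrassing medical question"),
]

released = pseudonymize(log)
by_user = profiles(released)

for user_id, queries in by_user.items():
    print(user_id, queries)
```

The screen name is gone from the released rows, but all of one person's queries still hang together under one number, so a single ego search or address is enough to attach a name to everything else that person typed.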
Update #1 (8/6): I’m going out on a limb here with this prediction: as they realize the magnitude of what they did (or if they don’t, due to the PR nightmare) AOL will apologize, the finger-pointing will start, and heads will roll. They will remove the download link, though not before everyone who wanted the data has obtained it.
Update #2 (8/6): TechCrunch further elaborates on the “utter stupidity” of this move by AOL:
“The data includes personal names, addresses, social security numbers and everything else someone might type into a search box. The most serious problem is the fact that many people often search on their own name, or those of their friends and family, to see what information is available about them on the net. Combine these ego searches with porn queries and you have a serious embarrassment. Combine them with “buy ecstasy” and you have evidence of a crime. Combine it with an address, social security number, etc., and you have an identity theft waiting to happen. The possibilities are endless.”
Update #3 (8/6): The download link leads to a blank page. Perhaps AOL execs are waking up… I wish all my predictions (see the first update above) would materialize this fast. I wonder if there will be a black market for the “limited edition” downloaded dataset… eBay, anyone?
Update #4 (8/6): Dennis ponders the possible ramifications, partly based on our Skype IM:
- Zoli estimates maybe 1,500-2,000 downloads by the time AOL woke up to what they’d done. What’s the real number?
- How long was the file in the wild?
- Could illicit copies end up on eBay?
- Could market data derived from the file end up on eBay or as part of a market intelligence offering? Almost certainly the second if not the first.
- What will be the impact on AOL’s stock price?
- Might shorters speculate on the impact?
- What about a class action lawsuit? For once I think there are decent grounds for one of the ambulance chasers to send out its hit squad – they may even get what they need from the file
- Will AOL be able to track who got the file?
- What is the potential for wholesale identity theft among those 650,000 AOL users?
Update #5 (8/6): The last thing I expected was to find myself deleting comments; but this situation forced me to. A commenter provided a link to his site where he put up the file for anyone to download. I know the cat is out of the bag, and there will be several other sites, but at least I don’t want to actively promote making a bad situation even worse. Since I can’t edit comments, my only choice was to delete it.
Update #6 (8/7): ZDNet agrees: “People will be boycotting the company because of their blatent disregard for the privacy of users.”
The news is out on Infoworld, as well as mainstream news media all the way to Korea.
Update #7 (8/7): AOL responded by email to John Battelle, also quoted at SiliconBeat. “The summary: Man, did we screw up.”
Related posts:
- AOL Releases Search Logs from 500,000 Users
- AOL Research exposes data; we’ve got a little sick feeling
- AOL releases search data on 500k users… and then tries to take it back
- More AOHell
- Forget The Government, AOL Exposes Search Queries To Everyone
- AOL Gate: Search Query Data Scandal
- You never had privacy anyway
- What do people search AOL for? Now we know
- AOL Shared Private Search Queries
- AOL Search Data Launches World’s Biggest Experiment On Privacy Invasion
- No, no, no, you weren’t supposed to tear down THAT wall
“Blatant violence”? Did you mean “Blatant violation”? :)
Oh, thanks for catching it, corrected :)
Reality check:
1. anonymized search logs are perfectly legal to distribute, and in compliance with all rules
2. anonymized query logs are widely used in research, only that people have to specifically license them from Google / Yahoo etc., by working for them / with them.
3. Query logs are an invaluable commodity for the research community. Without them all web research is like trying to design something without having any requirements to cater to (tailoring clothes without measurements!).
4. Big players like Google and Yahoo DO NOT share this with the research community, using it internally to maintain their monopoly.
Regarding comparisons with Google:
1. technically, the “Google Trends” service is also an exposed view of their search logs
2. Google Research also released internal data on the same day, for the same reason as AOL Research. Maybe you should produce a blog post worshipping Google about this.
3. Google not only studies what you search for, it also studies every bit of your email at Gmail, your calendar, and your IM conversations. Anyone who knows how to de-anonymize these search logs can also use adsense to deanonymize your email (using impression statistics).
I myself am totally against AOL and its spammy nature of shoving their CDs around to everyone. However, I’m also against ignorant Google fanaticism, hence this comment. Imagine the advances in search technology that this data will bring. Why should a few kilobytes of data hurt?
Yes, I think about 5,000 AOL heads will roll over this.
5000 + 1 ?
AOL releases private search data
AOL just released information about 20 million web queries from 650,000 users. They just changed usernames into random strings, but they kept user-data association. Techcrunch makes privacy implications very clear.
Blogs are buzzing, AOL users are gett…
Aol Releases Googles most prized Keyword List… Google is gonna get mega spammed.
I’m shocked that AOL released this data,
AOL Releases User Data
TechCrunch is reporting that AOL has made available the search histories of 650,000 of their users. The user account name is replaced with an ID number, but as Michael Arrington correctly points out, there is often enough information in search
AOL screws the pooch – or at least about 650,000 of their own users
Stunning Privacy Breach by AOL
While most reports have commented on personally identifiable information in the queries, there’s a greater risk of identification due to ability to link “questionable” queries to requests to government web sites.
…
The real threat to privacy isn’t as much the personal information as the presence of timestamps. That allows potentially any query, and thus user, to be tracked back to IP address. Especially if government owned sites are involved.
See details at Stunning Privacy Breach by AOL.
Boycott AOL
Zoli Erdos: “AOL, in blatant violation of its users privacy just released the log of 3 month’s worth of searches by 650,000 users. Not to the DOJ, but for open download by anyone.”
Luckily, I’ve never used AOL for search. Almost…
AOL discloses 650,000 AOL users’ search data
Well this isn’t going to help AOL’s image. Over the weekend, AOL researchers posted a 400MB+ tarball of the raw search query data of some 650K AOL users over the period from March 1, 2006 to May 30, 2006. While…
Apart from any ethical issues, AOL has breached its contract with its users. This disclosure contradicts AOL’s own privacy policy, which names search data as being part of a user’s network information, says that a user’s network information will only be disclosed as described in the privacy policy, and makes no mention of just publishing the data for public research. (There is a mention of using the data for researching use of the AOL network, but that’s not the same as letting the whole world do that).
See:
http://about.aol.com/aolnetwork/aol_pp
Sean (http://www.prompt-communications.com)
1) There will not be a black market for this data on ebay. Predictably, it is already mirrored and torrented.
2) searching for drugs or words with drug connotations is not a crime. I doubt there is even probable cause for any police force to get a warrant. At this point, I could type “buy ecstasy” and this page might be the top hit.
3) Google has released 6 DVDs worth of 5-gram search terms. They will not give a hoot about this “large” dataset
4) AOL still has users?
Ver
time to short AOL stock!
What Google did was release information informing everyone of how often everything is searched for – not who searched for it. There is absolutely no user specific information available from Google’s (soon to be) published data and it should come to no surprise if “porn” tops the list of words searched for. On the other hand, maybe some names are searched for often and it shouldn’t be too much of a surprise to see some in the list, but that doesn’t mean that Steve Jobs or Bill Gates are typing their own names into Google.
Hey, even with the AOL data there’s still an amount of deniability, but it’s appalling that anyone should be put in the position of having to deny anything. And the idea of a “unique” user ID means that at some point, somewhere AOL probably have a file that says how that corresponds to the user that did the search. If that file ever gets out, then all anonymity has been completely lost.
It might well be that searching for drugs on AOL is not a crime, but after reading this: http://www.twopercentco.com/rants/archives/2006/08/drop_the_sudafe.html , I wouldn’t be so sure.
Don’t Blame Just AOL — The Bloggers are at Fault Too!
AOL released a large database of searches that includes 20 million web queries from 650,000 AOL users. Even though they changed the AOL username to a random ID number, they did not filter the results in any other manner. Unfortunately, people’…
AOL’s Appallingly Bone-Headed Move
Something happened over the weekend that I’m at a complete loss to explain. AOL released a list of over 20 million searches by 500,000 users. The online giant apparently did this for “research” purposes, although a key battle was won…
AOL Releases Searchs From 500,000 Users
Remember the big hubbub of the Government trying to get search data from Google and Microsoft last year? Well, apparently no one at AOL does, they just released search data from 500,000 users, they removed the AOL username, but just changed it to a ran…
It was BAIT, you morons…
AOL will instead scapegoat the people who mirrored the data. They will use their PR team, dupe a government agency into denouncing the “dangerous” linkers, file a lawsuit, and drive the media to villainizing anyone who dared mention *their* bungle as hackers.
It’s been done before: http://corphq.livejournal.com/60599.html
Kill the messenger, and all you get is quiet.
As long as the public falls for this sort of distraction tactic, they will deserve the world of corporate secrecy and cover ups they get.
The information in the database is _not_ anonymized. There are unique IDs associated with each query, making it very possible to relate the identity of any given person in this database to their search queries. The table that relates the anonymous IDs to the users isn’t out in public yet, but I guarantee that information exists somewhere. You better believe it can be subpoenaed if a law enforcement agency that gets its hands on this database decides to go on a fishing expedition. I hope no innocent people on this list were doing research on questionable subjects or else they can say hello to “probable cause”.
Zoli – yawn. I knew it was coming all along because I am a Precog :-)
read more at
http://dealarchitect.typepad.com/deal_architect/2006/08/the_intention_e.html
Reporting about it is good. Distributing it to the public is bad–as bad as what AOL did. In fact, it is what AOL did. Yes, the data’s already out there, but that doesn’t mean the users no longer have rights. Personally, I say boycott everyone who intentionally distributes this data. (e.g., Slashdot)
Agree, in fact that’s why for the very first time I had to delete two comments – they were pointing to mirror sites.
hell, AOL won’t be able to stop it. by now, everyone on the internet has seen it, pretty much.
http://www.aolsearchdatabase.com if you havent
AOL apologizes for privacy leak
America Online posted a file containing three months of anonymized search queries of 658,000 users…
I wonder if you can now search AOL Search and find the search logs online? How ironic would that be? The search engine actually showing you where to find something that they don’t want you to find…
These bastards should be shot for this; no one in their position should make a mistake like this.
counter-reality check:
1. not in my country they aren’t (the Netherlands), unless you told the user you would *and* had business needs in keeping the data in the first place
2. the govt. asked for them and AOL complied. all products of American govt. research are in the public domain, last time i heard. is this AOL’s not-so-subtle way of making sure the govt. complies with that?
3. this is not a few kilobytes but a basic violation of trust. AOL deserves to die over this.
AOL users are too dumb to boycott
Can’t reach AOL by phone? You always have to push 2 or 4. Call this number instead: 1-866-859-0176. It’s for the AOL collection department, and there is no number to push.
wow. That is too bad AOL would do that.
SCAM!
Your post about using the 1-866-859-0176 number to call AOL is a SCAM.
Go to this web page:
http://800notes.com/Area-Code.aspx/1-866
and find the number in the list, then check the postings by others who almost were scammed.
If it’s not a scam then, Anonymous poster, please reveal your identity and your location if you dare.
I suppose he won’t … but I still don’t understand. What’s the purpose of the scam?
That’s an absolutely disgusting move from AOL. I would never have believed that they would do such a thing. I guess they want to have the government by their side as they are trying to “take over” and own the internet.
[…] visit. If someone gets enough information about you, they can potentially identify you – the AOL search fiasco is a great example of […]
I like this theme you are using… what is it?
GenkiTheme, available here:
http://ericulous.com/2007/05/21/wp-theme-genkitheme/
The public should kick AOL to the curb with the rest of the trash. No need to put up with AOL’s extreme “Reich Wing” policies. Do they really think they are above the law?
Boycotting Time Warner might be a good plan as well.