December 18, 2005

The Probabilistic Age

[Image: normal distribution probability density function]

Q: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing?

A: Because these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.

Q: Huh?

A: Exactly. Our brains aren't wired to think in terms of statistics and probability. We want to know whether an encyclopedia entry is right or wrong. We want to know that there's a wise hand (ideally human) guiding Google's results. We want to trust what we read.

    When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy. But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. These probabilistic systems aren't perfect, but they are statistically optimized to excel over time and large numbers. They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.

    But how can that be right when it feels so wrong?

    There's the rub. This tradeoff is just hard for people to wrap their heads around. There's a reason why we're still debating Darwin. And why Jim Surowiecki's book on Adam Smith's invisible hand is still surprising (and still needed to be written) more than 200 years after the great Scotsman's death. Both market economics and evolution are probabilistic systems, which are simply counterintuitive to our mammalian brains. The fact that a few smart humans figured this out and used that insight to build the foundations of our modern economy, from the stock market to Google, is just evidence that our mental software has evolved faster than our hardware.

    Probability-based systems are, to use Kevin Kelly's term, "out of control". His seminal book by that name looks at example after example, from democracy to bird-flocking, where order arises from what appears to be chaos, seemingly reversing entropy's arrow. The book is more than a dozen years old and decades from now we'll still find the insight surprising. But it's right.

    Is Wikipedia "authoritative"? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it's not infallible either; indeed, it's a lot more flawed than we usually give it credit for.

    Britannica's biggest errors are of omission, not commission. It's shallow in some categories and out of date in many others. And then there are the millions of entries that it simply doesn't--and can't, given its editorial process--have. But Wikipedia can scale to include those and many more. Today Wikipedia offers 860,000 articles in English - compared with Britannica's 80,000 and Encarta's 4,500. Tomorrow the gap will be far larger.

    The good thing about probabilistic systems is that they benefit from the wisdom of the crowd and as a result can scale nicely both in breadth and depth. But because they do this by sacrificing absolute certainty on the microscale, you need to take any single result with a grain of salt. As Zephoria puts it in this smart post, Wikipedia "should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts."

    The same is true for blogs, no single one of which is authoritative. As I put it in this post, "blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail--it is, by definition, variable and diverse." But collectively they are proving more than an equal to mainstream media. You just need to read more than one of them before making up your own mind.

    Likewise for Google, which seems both omniscient and inscrutable. It makes connections that you or I might not, because they emerge naturally from math on a scale we can't comprehend. Google is arguably the first company to be born with the alien intelligence of the Web's large-N statistics hard-wired into its DNA. That's why it's so successful, and so seemingly unstoppable.

    Paul Graham puts it beautifully:

"The Web naturally has a certain grain, and Google is aligned with it.  That's why their success seems so effortless.  They're sailing with the wind, instead of sitting becalmed praying for a business model, like the print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels. Google doesn't try to force things to happen their way.  They try to figure out what's going to happen, and arrange to be standing there when it does."

The Web is the ultimate marketplace of ideas, governed by the laws of big numbers. That grain Graham sees is the weave of statistical mechanics, the only logic that such really large systems understand. Perhaps someday we will, too.

[Update: Nicholas Carr, who seems to have inherited the Clifford Stoll chair of reliable techno-skepticism, has a clever and well-written response here.]

TrackBack

Listed below are links to weblogs that reference The Probabilistic Age:

» Why tagging and Wikipedia work from Ton's Interdependent Thoughts
Chris Anderson writes a piece that I recommend you go read in full, to prevent me from quoting it here in full. With clients and others I often have a hard time explaining my information strategy when it comes to blogreading... [Read More]

» Excellent post on the Long Tail from On IT and beyond
Chris Anderson has another great piece on the Long Tail. Generally, I have nothing to add in this context. Interestingly enough, there is enough software that is expected to behave in a non-Gaussian way - that is: they have to work perfectly with no fl... [Read More]

» Have faith from Rough Type: Nicholas Carr's Blog
Wired editor Chris Anderson offers a spirited defense of internet "systems" like Wikipedia, Google, and the blogosphere. Criticism of these systems, he argues, stems largely from our incapacity to comprehend their "alien logic." Built on the mathematic... [Read More]

» Chris Anderson on Probabilistic Thinking from The Stalwart
Forgive the spate of link entries, The Stalwart is on partial vacation this week in Austin, TX. We like to rib the whole long-tail crowd for letting one idea so dominate their worldview, that almost everything can be seen through [Read More]

» Probability, Superstition and Ideology from alex wright
Nick Carr makes the humanist case against Chris Anderson's defense of probabilistic systems like Google and Wikipedia, taking issue with Anderson's argument that qualitative criticisms of these systems fail to recognize the virtues of sacrificing "perf... [Read More]

» Emergent Properties of the Long Tail from Emergent Chaos
Chris Anderson warms the cockles of our heart as he discusses the psychological acceptability of "The Probabilistic Age:" When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out fo... [Read More]

» Lots of links from JD on [TBD]
Lots of links: I've kept a lot of tabbed windows open in my browser the past week, but didn't have sufficient original commentary to justify sending each to the aggregator... here's a bunch of recent postings, some of which you may find of interest too... [Read More]

» The Politics of Statistics from Ryan Shaw
Chris Anderson has posted an absurd piece called The Probabilistic Age in which he suggests that the reason people aren't comfortable with Wikipedia and Google is that they are systems that operate according to the laws of probabilistic statisti... [Read More]

» "The Probabilistic Age" - Why Wikipedia Works from Influence
A topic of continuing interest here is why and how wikipedia works (which we think it does), and so this commentary by Chris Anderson, writer for Wired, is insightful: The Probabilistic Age: "Q: Why are people so uncomfortable with Wikipedia? And [Read More]

» Probability the Mammalian Brain from exoskeleton
Check out this essay on the Long Tail blog (which I think is written by one of the Wired editors) which answers the question: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing? Our brains aren&... [Read More]

» Probablistic systems from Johnnie Moore's Weblog
There's a thought-provoking post by Chris Anderson on probabilistic systems - and some good debate in the comments and trackbacks. One of those led me to this post by Wiggy: This is a battle. Wikipedia is under attack by those... [Read More]

» Google e Wikipedia: por que o desconforto? from De Gustibus Non Est Disputandum
The link came from Marginal Revolution (permanent link over on the side), but the interview is here and I recommend it. Claudio... [Read More]

» Micro vs. Macro in a Duel to the Death from Snarkmarket
Get ready: I am about to compare Wikipedia to Wal-Mart. Chris Anderson says the magic of Wikipedia (and other internet systems, e.g. Google) is that they work on hugely macro "probabilistic" scales. Think of it like this: To put it... [Read More]

» 蓋然的(確率的)時代 from The Croton
Unofficial Japanese translation of "The Probabilistic Age" by Chris Anderson. [Read More]

» Probabalistic Information Flow from Toomre Capital Markets LLC
At TCM, we spend a lot of time talking about the convergence of asset markets, the liability markets and the liquidity markets. The liability markets and to a lesser extent, the liquidity markets are focused significantly on probabilistic statistics. [Read More]

» Länkar från 2005-12-22 from k-mrkt
Jeffrey Zeldman: Style vs Design. Zeldman on web development: "Design is communication", "Most web pages are meant to be used", "That is why web pages must... [Read More]

» Probability, Superstition and Ideology revisited from alex wright
Gartner's Nick Gall sent along a few thoughts on my earlier post Probability, Superstition and Ideology (itself a commentary on earlier posts by Nick Carr and Chris Anderson). With Nick's permission, I've excerpted his comments here: "The image of a... [Read More]

» Amazon's Recommendations are Probabilistic from Kaedrin Weblog
Amazon.com is a fascinating website. It's one of the first eCommerce websites, but it started with a somewhat unique strategy.... [Read More]

» The Probabilistic Age from Musings From Alfheim
But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. Chris Anderson Chris is Patient Zero of the Long Tail meme. I finally got around to giving an in-depth read of his lat... [Read More]

» Challenges for Blog Analysts from Netcoms
"Blogs are a long tail" Chris Anderson recently observed, in a post otherwise dedicated to explaining... [Read More]

» Cheating Probabilistic Systems from Kaedrin Weblog
Further discussion of probabilistic systems like Amazon.com recommendations, Google, and Wikipedia, including specific references to "cheating" in those systems. Also noted is how these new systems are not meant to replace the old, but in the words of ... [Read More]

» How Many Worms In A Can? from theQview
The problem with James Surowiecki's book The Wisdom of Crowds is not in its logic but in its application. Instead of understanding and questioning the limits of group wisdom, it is currently vogue to simply cite the book, drink the Kool-Aid [Read More]

» The Anti-Authoritarian Age from Mike Linksvayer
In a compelling post Chris Anderson claims that people are uncomfortable with distributed systems [b]ecause these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at t... [Read More]

» Probabilistic accuracy v definitive authority from aTypical Joe: A gay New Yorker living in the rural south.
Chris Anderson has a wonderful post on why people are uncomfortable with Wikipedia, Google and blogs. It's because these systems "sacrifice perfection at the microscale for optimization at the macroscale." He says we're living in a probabilistic age: T... [Read More]

Comments

Wikipedia is not a probabilistic system.

I do not really "understand" Google because the math is beyond me, but I trust it. I understand Wikipedia just fine, which is why I don't trust it.

Information systems are only useful to the user at the point in time at which the system is accessed. At the time of a Google search you are presented with a mathematically determined 'average' value; the sum wisdom of the internet's hyperlinks. It is an average value, and even if 30% of the links on the web are "wrong" you still get the right answer.

Wikipedia does not work like that. When you access Wikipedia you do not get the average value of an article; you get the last author's value only. Instead of getting a probabilistic average you instead are getting a single data-point.
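The contrast drawn here, an aggregate over many sources versus a single data point, can be put in a small simulation. This is only a sketch of the commenter's claim under loud assumptions: every source is independently right 70% of the time, answers are simply right or wrong, and the "last author" is modelled as one randomly drawn source. The names `majority_answer` and `simulate` are made up for this illustration and describe neither Google's nor Wikipedia's actual mechanics.

```python
import random


def majority_answer(votes):
    """Return the value that appears most often among the votes."""
    return max(set(votes), key=votes.count)


def simulate(num_sources=1001, error_rate=0.30, trials=200, seed=0):
    """Compare a majority over all sources with trusting only the last one.

    Assumption: each source is independently wrong with probability
    error_rate (the 30% figure from the comment).
    """
    rng = random.Random(seed)
    aggregate_hits = 0
    last_edit_hits = 0
    for _ in range(trials):
        # True means the source is right, False means it is wrong.
        votes = [rng.random() >= error_rate for _ in range(num_sources)]
        aggregate_hits += majority_answer(votes) is True
        last_edit_hits += votes[-1] is True
    return aggregate_hits / trials, last_edit_hits / trials


aggregate_rate, last_edit_rate = simulate()
print("majority of sources right:", aggregate_rate)
print("last single source right:", last_edit_rate)
```

Under these assumptions the majority is right in essentially every trial, while the single source is right only about as often as any one source is. Whether a Wikipedia article really behaves like a single draw is exactly what is in dispute.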

Google is "wrong" only when the entire web is wrong. This happens on occasion, such as when an urban legend becomes more popular than the truth (when it's done purposefully it's called a Google Bomb). Wikipedia is wrong when a single person is wrong. It is also far easier to "bomb" Wikipedia. Anyone with a login can do it with 1 minute's work. With 860,000 articles an error in an obscure article can remain undetected for some time.

(I found an article where someone had inserted "Jake is the best!" or something like that in the middle of a sentence. As an experiment I left it there to see how long it took for someone to find it. It's still there 4 months later, and that's with an obvious error. An error in the data that only an authoritative source would know was wrong is likely to last even longer.)

To use an analogy most survivors of the Dot.Bomb would understand, a Google search is like predicting stock performance by averaging the price forecasts of every Wall St. analyst (occasionally wrong and sometimes very wrong, but usually close); while a Wikipedia search is like doing the same by trolling chat rooms for tips.

Brock,

In the popular entries with many eyes watching, Wikipedia becomes closer to the statistical average of the views of the participants, weighted by such factors as the authority of each as defined by the others (frequent contributors to any entry tend to win any vote-offs). Studies have shown that for such entries, the mean time to repair vandalism of the sort you describe is measured in minutes. As Wikipedia grows, that rapid self-repairing property will spread to more entries.

But the main point I was making about Wikipedia was not that any single entry is probabilistic, but that the *entire encyclopedia* is probabilistic. Your odds of getting a substantive, up-to-date and accurate entry for any given subject are excellent on Wikipedia, even if every individual entry isn't excellent.

To put it another way, the quality range in Britannica goes from, say, 5 to 9, with an average of 7. Wikipedia goes from 0 to 10, with an average of, say, 5. But given that Wikipedia has ten times as many entries as Britannica, your chances of finding a reasonable entry on the topic you're looking for are actually higher on Wikipedia.

That doesn't mean that any given entry will be better, only that the overall value of Wikipedia is higher than Britannica when you consider it from this statistical perspective.
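The arithmetic behind this can be sketched in a few lines. The quality ranges (5 to 9 and 0 to 10) and the ten-to-one size ratio come from the comment; everything else is an assumption made for illustration: quality is taken to be uniformly distributed over each range, the topic you want is assumed to be drawn from the set Wikipedia covers (so Britannica covers one in ten), and the "reasonable" cutoff of 6 is hypothetical.

```python
def chance_of_reasonable_entry(coverage, low, high, threshold):
    """P(an entry exists) * P(its quality >= threshold).

    Assumption: quality is uniform on [low, high].
    """
    if threshold <= low:
        p_good = 1.0
    elif threshold >= high:
        p_good = 0.0
    else:
        p_good = (high - threshold) / (high - low)
    return coverage * p_good


THRESHOLD = 6  # hypothetical cutoff for a "reasonable" entry

# Britannica: quality 5..9, assumed to cover 1 topic in 10.
britannica = chance_of_reasonable_entry(0.1, 5, 9, THRESHOLD)
# Wikipedia: quality 0..10, assumed to cover every topic asked about.
wikipedia = chance_of_reasonable_entry(1.0, 0, 10, THRESHOLD)

print("Britannica:", round(britannica, 3))
print("Wikipedia: ", round(wikipedia, 3))
```

With these numbers the Wikipedia odds come out several times higher, even though any single Britannica entry that exists is more likely to clear the bar. Change the coverage assumption (say, you only ever look up mainstream topics) and the comparison can flip.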

Either way it takes the academics in ivory towers out of the equation, which is both a very good and a very bad thing.

Chris,

I agree that Wikipedia as a whole has more total value than Britannica as a whole. It probably does produce more social utility than Britannica, just as the Web + Google produces more utility than a good library + a card catalog.

But no one needs the whole of Wikipedia. They need the article they need, and they need it to be (mostly) right.

My point was that individual Google searches are probabilistic, but that individual Wikipedia articles (the ones in the Long Tail at any rate) are not. Since individual searches and articles are what matter to individual people, I think that's the more important thing to focus on.

I think Wikipedia would be more probabilistic to the user if disputed issues, history of changes, and "voting" were displayed in the actual article without having to comb through the changes. Put the statistics of opinion right out in front where the intelligent reader can judge them for himself.

I just want to make clear that I think Wikipedia is great in a lot of ways, but it is engineered poorly. Wikipedia is a lot like Communism - a nice idea, but inappropriate for humans. Too many of us have motivations far from the pursuit of objective truth. It would be far better if each author could write his own, complete version (perhaps borrowing sections using a Creative Commons license). If you don't like it, write your own, but don't mess with his. Then all readers have to do is find both articles, read them, and judge for themselves.

Of course Step 1, "finding", brings us back to Google ... :-)

Brock;
You don't care about the entire Google database either, just one or two entries. PageRank isn't an average either, it's basically whoever has the most links today (with weighting).
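For readers who, like earlier commenters, find the math opaque, here is a minimal sketch of the textbook PageRank iteration, which shows what "most links, with weighting" means in practice. It is not Google's production ranking. The four-page `toy_web`, the function name `pagerank`, and the iteration count are invented for this example; 0.85 is the conventional damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Assumption: every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
top_page = max(ranks, key=ranks.get)
print(top_page, {page: round(score, 3) for page, score in ranks.items()})
```

Note the weighting at work: "a" has only one inbound link, but because it comes from the heavily linked "c", it ends up ranked far above "b" and "d".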

I think you make an erroneous argument, that the latest Wikipedia article is the result of only the last person's edit. This would be true if every edit involved a complete rewrite of the article. This is astronomically rare. Almost all changes are incremental, and as a matter of practical interest they're often reviewed by the most recent contributors. As such, the wiki article you view is more of an average, or better an aggregation, of all previous edits. The most recent edit might be less trusted than the previous ten, but it usually represents a small portion of the article.

Add to that, if you have even the slightest doubt about something, you can peruse the article history to find when such a crazy thing was added.

Then there's the human habit of yielding to people who seem to know what they're talking about. This means that uninformed people tend to avoid putting in the work to contest something they don't understand, and informed and motivated people tend to do most of the work. Wikipedia's NPOV policy, maintained by a crowd without the natural stimuli toward mob mentality, means that demagogues naturally lose. This is rather unlike the practice of mid-sized groups that produce traditional encyclopedias.

And finally, I have to say that anyone who regards any *single* source as authoritative gets what they deserve. Wikipedia is my first stop, and it's sometimes my last stop (for revisions) when I find out most authoritative sources say something a little different.

Just try to write a report on something like witchcraft based on the Encyclopedia Britannica I grew up with. It won't even get you started. Wikipedia will though, because contributors try to be comprehensive to all input, not authoritative about what something should be. That's precisely Wikipedia's strength: It's not meant to be authoritative, but it will take authoritative input (even when two authorities viciously disagree). It doesn't take academic authorities out of the equation--they're reduced from all powerful to merit-weighted influence.

Brock makes some good points about Wikipedia. Surowiecki explains in WoC that a good "aggregation function" is critical to extracting the wisdom from the crowd, such as a voting mechanism or calculating the average. Wikipedia doesn't really have one. Chris suggests that "frequent contributors" win vote-offs, but that is rare, and it puts the quality issue back in the hands of a few. (Google's aggregation function is the math that Brock and I don't understand, and is their core asset).

There is another concept relevant to the WoC that Surowiecki does not spend much time on, the Condorcet Jury Theorem. It says that if the members of the crowd each individually have a less than 50% chance of getting the answer right, then the chance that a majority of the crowd gets it right shrinks toward 0% as the crowd grows. (See http://www.lessig.org/blog/archives/003027.shtml). That is a real likelihood in Wikipedia, especially if the "frequent contributors" are few and in the < 50% category.
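The theorem is easy to check numerically. The sketch below computes the exact probability that a simple majority of n independent voters is right when each is right with probability p. The independence assumption, the odd crowd sizes, and the example values of 40% and 60% are choices made for this illustration, and `majority_correct` is a made-up helper name.

```python
from math import comb


def majority_correct(n, p):
    """Probability that more than half of n independent voters are right.

    Assumes n is odd (no ties) and each voter is right with probability p.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Crowd sizes and accuracies are illustrative choices.
table = {
    n: (majority_correct(n, 0.4), majority_correct(n, 0.6))
    for n in (1, 11, 101, 501)
}
for n, (below_half, above_half) in table.items():
    print(f"n={n:4d}  p=0.4: {below_half:.4f}  p=0.6: {above_half:.4f}")
```

The two columns move in opposite directions as the crowd grows: a crowd whose members are each slightly better than a coin flip becomes almost always right, and one whose members are slightly worse becomes almost always wrong. Size amplifies whatever individual accuracy is there.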

Chris has faith that as "wikipedia grows" it will become better. I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable.

jelons17 -

You say "I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable."

Not so, I think - there are technical fixes around that problem. An expert on, say, the First World War only really needs a stored RSS search that informs them if the pages on that subject change, or even better one that informs them if the pages on that subject change in particular ways. Any given person might need to keep a feed of the page about them (if there is one); their company (if they have one); and whatever other tiny number of things they happen to be sufficiently expert in that they would be expected to constantly edit those pages on Wikipedia in your Ponzi model.

Now, admittedly, Wikipedia doesn't have RSS searches that tell you when pages have changed. Yet. But lots of newspapers - the Baltimore Sun-Times is, I think, the longest-running example and NYT the most recent - have saved RSS search facilities; it's not especially hard to do.

I really wish I'd read this post before I wrote my post concerning what I think is going on:

http://www.well.com/~wiggy/2005/12/battle-in-new-war.html

The interesting thing is that people don't see this for what it is: an outright philosophical war. Some people think a small group who are qualified in some capacity can produce 'better' information than a much larger group that on average is less qualified, but INCLUDES the small 'highly qualified' group anyway.

It's OK to think about these things in terms of probabilistic systems, but it's much simpler than that: you either believe in democracy and freedom of speech or you don't. Twenty years from now, information will be a more valued resource than oil. In some industries, it already is. We have a choice: do we want to put the systems in place now to make sure we all own it, or do we actively fight against a system that seems counter-intuitive, thereby putting the ball back into the court of a very small group of people?

The Internet needs Wikipedia and sites like it. It needs information to be free and editable by anybody. To fail to work out the very small glitches and protect assets from the attacks predicted by game theory would be to plan to lose to the Murdochs, the Turners, the Rumsfelds of this world. Simple as that.

re.: "the *entire encyclopedia* is probabilistic."

Doesn't this ignore the way users access the content on Wikipedia? Sure, some scholars may browse subject areas and therefore the greater content is probabilistic - but most folk engage in hit and run activity. Quick in, quick out. This is the age of attention deficit - we want single entries now, not subject areas or entire encyclopedias. Wikipedia has broken our trust in the single entries of content (and is not doing much to rebuild it, to be honest) - and this could be Wikipedia's downfall.

"Twenty years from now, information will be a more valued resource than oil"

This ignores supply and demand. In twenty years we'll be saturated in information and thirsting for oil.

Actually, I couldn't disagree more with your observation. All biological systems are emergent, including ourselves. If you notice the folks having the problem with these types of systems, they are scientists, engineers or business folk. These people have been trained since their formative years to think in a quite unnatural way when dealing with the world. The world does not follow a simple set of linear equations that can be pulled from your typical college textbook; it is emergent. Yes, they may be using a mathematical technique to exploit emergent properties in an information space, but that doesn't make it any less emergent. For most of us, it actually feels right already. It's most of you who it feels wrong for, with "you" being the scientists, academics, professionals, etc.

I always think the best example of "Wisdom of Crowds" is the "Ask the audience" part of Who Wants to be a Millionaire. The crowd is almost never wrong. The people who don't know the answer make a random guess, but all the random guesses cancel each other out and you're left with the people who really DO know the answer.
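The cancelling-out effect is simple to simulate. In this sketch the audience size, the 20% share who actually know the answer, the four options, and the fixed random seed are all assumptions chosen for illustration; `ask_the_audience` is a made-up name. Everyone who does not know picks uniformly at random, which is the idealized case the comment describes.

```python
import random
from collections import Counter


def ask_the_audience(size=1000, knowers=0.2, options="ABCD",
                     correct="A", seed=1):
    """Tally votes when a fraction know the answer and the rest guess.

    Assumption: guessers choose uniformly among the options, with no
    shared bias toward a plausible-looking wrong answer.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(size):
        if rng.random() < knowers:
            votes[correct] += 1              # knows the answer
        else:
            votes[rng.choice(options)] += 1  # pure guess
    return votes


votes = ask_the_audience()
winner, winner_votes = votes.most_common(1)[0]
print(winner, dict(votes))
```

The guesses spread roughly evenly across all four options, so even a minority of people who know the answer is enough to put the right one clearly on top. The effect depends on the guesses being unbiased: if most guessers lean toward the same wrong option, the noise no longer cancels.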

Umm, all you seem to be saying is that these systems are built to be mostly right, most of the time, and we strange weird primitives don't "GET IT" when we are bothered that they're notably wrong many times.

That's a comprehensible view - but not necessarily an easily defensible view!

Piers: I can't speak for anyone, but I find that due to the interlinked structure of Wikipedia, it's rare that I view only a single entry. Typically, I surf broadly related entries for a half-hour or more, absorbing information on a variety of topics.

Obviously, this is pure, not applied, research. But is Wikipedia really the tool for applied research anyways?

The "Rumsfelds of the world"? Is he a media mogul now, too?

Re: everyone editing wikipedia all the time. It should not be necessary to edit a topic more than *once* or to monitor it constantly. The "technical fixes" should take care of all that. Once information is entered it should be preserved, not hidden away under "changes" where a casual reader may not see it. If each entry could be made probabilistic as well as the whole site, it would increase the value of each entry. The value of the entry is what most visitors will be interested in, especially in our consumer-based society.

For those of you who don't trust Wikipedia, I pose the question, "Do you trust Britannica?" If so, check out this link. Seems there are errors either way. The search for truth is an endless quest.

Wikipedia is probabilistically successful even if you hit and run. The question is "Given a query, what is the chance that you will get an answer, and that it will be correct?" With Britannica, the latter half of that question is a bit higher, but the first half is much lower. Getting no answer at all is a failure, too. Overall, your odds of getting useful information are higher on wikipedia.

I do think it could do with better aggregation. There are plenty of experiments out there, wikipedia's just one of them...we'll get there.

I trust that someone is accountable for mistakes in Britannica. That may be misguided too. But it's also why we have defamation law. Wikipedia is kind of defamation proof. Sure, if someone complains, the offending publication will be removed. But in at least some cases, the damage is already done, and the distributed nature of the WP makes it difficult or impossible to hold anyone accountable (particularly given the ease with which people can post anonymously). Conversely, if Britannica did the same thing, they'd face a defamation suit. This, I would imagine, is a pretty profound incentive to err on the side of not publishing untruths or information damaging to someone's reputation.

I like WP quite a bit myself. It's a very nice way to interface with info to the extent the info is accurate. I love being able to read one article and then drill down on a term by clicking on a link. That's a really great way to explore.

But I think the WP should do a better job making clear to the users the inherent limitations of the WP at the micro level (i.e., there's a pretty good chance that any given article could be wrong in a pretty major way).

I'm one of those overeducated academic/professional people someone was complaining about above. But from time to time, I teach college students. The limitations of the WP are not at all obvious to them. They just want the easiest path to getting an answer (or at least the feeling of getting the answer), regardless of whether the answer is accurate. Clearly, it's the job of teachers to help educate students about the limitations of things like the WP, but it sure would help if the WP folks were a bit more forthright with the user about the WP's limitations.

WP does have a disclaimer. But you must click an 8 point type link below the fold at the bottom of the page to get to it. How many people ever click on links like that? Not many.

Instead, I think each article should begin with some language like this followed by a link to the longer disclaimer:

"WIKIPEDIA IS A PLACE TO START RESEARCH, NOT A PLACE TO FINISH IT. THE WIKIPEDIA COMMUNITY DOES ITS BEST TO POLICE THE ACCURACY OF THE INFORMATION HERE. BUT BECAUSE WIKIPEDIA ALLOWS ANONYMOUS CONTRIBUTORS, NO INDIVIDUAL OR INSTITUTION IS LEGALLY ACCOUNTABLE FOR THE ACCURACY OF THIS INFORMATION. THEREFORE, THIS INFORMATION IS PRESENTED "AS IS," WITH NO WARRANTY TO ITS ACCURACY, AND THE BEST PRACTICE IS TO CHECK WIKIPEDIA ENTRIES AGAINST OTHER MORE EASILY VERIFIABLE SOURCES."

With respect, I don't buy the idea that the human mind can't handle the notion of a micro/macro tradeoff. In point of fact, the human brain is BUILT to discard information at the microscale and produce decent average results at the macroscale.

A simple case in point is the concept of temperature. There's no such thing at the microscale, in this case meaning atomic scale. Temperature is an aggregate property of the average motion of huge numbers of atoms, not the instantaneous, or even long-term-average, motion of a single atom.

Even at the macroscale, human perception of temperature involves more loss of low-level precision. Most people can't tell you how the temperature at their elbows compares to the temperature at their knees, let alone how much signal they're getting from a single, specific nerve. Nor can most people give you a precise statement of the absolute temperature around them at the moment.

It's hard to find a part of the human information-processing system that doesn't characterize information and throw away the detail before passing the message up to the next level of processing, in fact.

IMO, the real trouble is that people want to believe that every problem has a simple, easily-stated, one-size-fits-all solution that will always provide good answers. No such solution exists, or ever has, but in time, people get used to the inaccuracies of whatever system is in use at the time, and learn to ignore them.

We discount the fact that many specific news stories about, say, atrocities in the Superdome following Katrina, were completely inaccurate, because we believe that on average the mechanism of news production gives reasonably good results.

People have trouble with Google and such because they haven't had time to develop a blind spot that lets them ignore the erroneous results, and go back to their comfortable assumption that the system is Platonically perfect.

If Wikipedia is a "place to start" that "shouldn't be cited", beneficial mainly for the ability to "surf a bunch of interrelated topics through links to get a quick overview", and built by anonymous contributors who can't be checked for authority, how is it any different from the Web with Google?

Also, does the success/quality of Wikipedia require that there only be one Wikipedia? If there is more than one Wikipedia, doesn't that make it harder for each individual article to have the many eyes necessary to improve quality? If so, who gets to decide which Wikipedia is the one?

"Given a query, what is the chance that you will get an answer, and that it will be correct?"

Wiki does much better at this than one would initially assume because queries aren't randomly distributed through wiki-space -- people share common interests. Queries cluster. The more likely it is that you are interested in a particular topic, the more likely it is that other people were interested too. Interested enough to create, modify, and watchlist that topic.

Thus, Wikipedia could easily be >99% accurate (measured as percentage of accurate answers returned) even if half the articles in the database were complete nonsense, so long as the /right/ articles are in the accurate half. The important question is whether the articles being given the most attention are the ones people care most about the answer to. Which is where the probability comes in.
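
A toy illustration of that point, with made-up numbers: ten articles, half of them nonsense, but with queries clustered heavily on the accurate half. The query shares below are assumptions invented for the example, not real traffic data, and `query_weighted_accuracy` is a hypothetical helper name.

```python
from typing import Sequence


def query_weighted_accuracy(query_share: Sequence[float],
                            is_accurate: Sequence[bool]) -> float:
    """Fraction of queries that land on an accurate article."""
    total = sum(query_share)
    hits = sum(share for share, ok in zip(query_share, is_accurate) if ok)
    return hits / total


# Hypothetical share of all queries going to each of ten articles.
query_share = [0.40, 0.25, 0.15, 0.10, 0.095,   # popular, well-watched articles
               0.001, 0.001, 0.001, 0.001, 0.001]  # obscure articles
is_accurate = [True] * 5 + [False] * 5

per_article = sum(is_accurate) / len(is_accurate)   # accuracy counted by article
per_query = query_weighted_accuracy(query_share, is_accurate)  # accuracy counted by query

print(f"accurate articles: {per_article:.1%}")
print(f"accurate answers:  {per_query:.1%}")
```

Counted by article, only half the database is accurate. Counted by query, the figure is above 99%, because almost nobody asks about the nonsense half. The whole result depends on the assumed clustering; if queries were spread evenly, the two numbers would match.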

Ok, two last points.

What I've been trying to say is that WP does not provide enough of a filter. Information gets in too easily. "Correct" information almost always has a higher signal strength than incorrect information, so raising the bar should not damage WP.

Soundbite: With each additional user WP gets less reliable and Google gets more so. (And the evidence of WP's co-founder Wales editing his own bio should make my point quite clear.)

And on Daniel's point, he's right. Correct information should not have to be constantly guarded. Vigilance is a high-cost activity and I have better things to do with my time than constantly watch out for people editing the article about me, or mentioning me in other articles.

And as a last, third point, Paul Robinson (above) is full of crap. This is not a philosophical war. This is a straight-up social engineering question of how information is processed within a society, how information is filtered, and how decisions are made. Some systems are better than others at different kinds of tasks. The only "War" is the war to improve Wikipedia.

Re: . It's most of you who it feels wrong for, with "you" being the scientists, academics, professionals, etc.

Posted by: Tony Mendoza | December 19, 2005 at 07:15 AM
-----
There is a reason scientists would have a concern about this. There are "laws" of nature that are immutable as far as we are concerned. We can analyze a process, and if done the same way every time we will get the same results. This ONLY applies to the real, observable science fields (mathematics, physics, etc) - not to fields like archaeology, anthropology, etc where people see what they want to see. There is truth, and there are embellishments of it, retractions from it, etc. It only takes a person with an "agenda" to put their slant on the information to make it "tainted." Same applies for standard encyclopedias.

what if filters worked together on an aggregation platform focused on the specific content the filters cared about and completely eliminated the clutter found around it. what if filters could start aggregating the specific content they like, like content from other sites. what if there was a web search engine that did not contain web pages but instead only contained the specific stuff a user wanted from any given page. what if the scalability of filters was infinite and their actions over time created a social engine of purely filtered content. we are attempting all of this and more at clipmarks.com. click my name to see what i am filtering...
