As I look more at the "Long Tail of Time" (see my first post on that here) in preparation for this Long Now talk, I'm finding that one of the biggest forces driving demand into the archives is Google. We're used to the newspaper model of content: new is what matters and yesterday's news is fish-wrap. But Google and the other search engines are time-agnostic [UPDATE--see below]. And the result of that is a dramatic shift in demand towards older material.
What matters to modern search engines is relevance, measured mostly by the number of other sites that link to a page. A little-noticed implication of this is that older content tends to score higher because it's had longer to accumulate incoming links. In other words, search inverts the usual priority of content: older is often better.
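Here's a toy sketch of that mechanism, under a deliberately crude assumption: every post earns incoming links at the same constant rate (the `LINKS_PER_DAY` value is made up), and "relevance" is nothing more than the raw link count. Under those assumptions, ranking by links collapses into ranking by age, oldest first. Real search engines weigh far more than this.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    age_days: int  # days since publication


# Hypothetical assumption: every post earns links at the same steady rate.
LINKS_PER_DAY = 0.5


def link_count(post: Post) -> float:
    """Links accumulated so far under the constant-rate assumption."""
    return LINKS_PER_DAY * post.age_days


def rank_by_links(posts: list[Post]) -> list[Post]:
    """Order posts by accumulated links, highest first (a stand-in for relevance)."""
    return sorted(posts, key=link_count, reverse=True)


posts = [
    Post("yesterday's post", 1),
    Post("last month's post", 30),
    Post("last year's post", 365),
]

for p in rank_by_links(posts):
    print(f"{p.title}: {link_count(p):.1f} links")
```

In practice better posts earn links faster than weaker ones, so age is only one input; the equal-rate assumption is what makes age decide the order here.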
We don't think of Google as a time machine, but that's actually what it is. By subsuming time under more important criteria such as "authority", it frees us from the tyranny of the new. Quality lasts, and freshness is just one factor among many that determine value.
I looked at my own server logs here at thelongtail.com to quantify this. Search (mostly Google, but a bit of Yahoo and MSN) now accounts for 37% of my traffic, and most of that goes to older posts rather than to new posts or the blog's home page. The picture looks like this:
In other words, without search, only 12% of my traffic would be to my older posts. With search, it's nearly 40%.
I'd wager there's not a newspaper in the world that shows that sort of archive-heavy distribution. Yet that turns out to be the natural shape of demand in an organic search-blog ecosystem. Which is to say it's increasingly the shape of the web itself. Archives Rule!
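The 12% and "nearly 40%" figures can be checked with a few lines of arithmetic. The slice sizes below are assumptions taken from the chart as discussed in the comments: about 12% of all visits reach older posts without search, about 27% reach older posts via search, and the remaining 61% go to new posts or the home page. The last calculation shows a second way to read "without search", renormalizing over the traffic that would remain if the search-referred archive visits disappeared.

```python
# Assumed traffic slices, in percent of all visits (read off the chart and
# the follow-up discussion; not measured here).
OLDER_VIA_OTHER = 12.0   # older posts, reached without search
OLDER_VIA_SEARCH = 27.0  # older posts, reached via search engines
EVERYTHING_ELSE = 61.0   # new posts and the home page, from any source

total = OLDER_VIA_OTHER + OLDER_VIA_SEARCH + EVERYTHING_ELSE
assert total == 100.0  # sanity check: the slices cover all traffic

# Share of all traffic that lands on older posts, search included.
with_search = (OLDER_VIA_OTHER + OLDER_VIA_SEARCH) / total * 100

# Share of all traffic that lands on older posts via non-search routes only.
non_search_of_total = OLDER_VIA_OTHER / total * 100

# If search-referred archive visits vanished, total traffic shrinks too,
# so the archive share is recomputed over the smaller remaining total.
renormalized = OLDER_VIA_OTHER / (OLDER_VIA_OTHER + EVERYTHING_ELSE) * 100

print(f"with search:                  {with_search:.1f}%")
print(f"without search, of total:     {non_search_of_total:.1f}%")
print(f"without search, renormalized: {renormalized:.1f}%")
```

The 12% answers "what share of today's traffic reaches the archives without search's help", while the roughly 16.4% answers "what would the archive share be if that search traffic simply vanished". Both rest on the assumed slices above.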
UPDATE: In the comments, Jim points out that time is indeed an important factor in Google's results. A very readable summary of its original search patent is here. But it's clearly not the only criterion, and since the spider only gets to the average site once a week or so, the really new stuff (i.e., the last day or two) doesn't even show up in regular search. (Blog search and news search use different methodologies and do pick up newer content.) The net effect of this--really new stuff absent, older stuff accumulating more links, and Google decaying relevance over time by some unknown amount--still amounts to a strong advantage for the archives. Exactly how strong we can't say, but based on the effect on incoming traffic alone (see above), it's significant.
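A rough model of that net effect, with every parameter a hypothetical stand-in: posts younger than one crawl interval are treated as not yet indexed (following the once-a-week estimate above), links accrue at a constant made-up rate, and the unknown decay is modeled as an exponential with an assumed two-year half-life.

```python
# Hypothetical parameters, chosen for illustration only.
CRAWL_INTERVAL_DAYS = 7      # "the spider only gets to the average site once a week or so"
LINKS_PER_DAY = 0.5          # assumed constant link accrual
DECAY_HALF_LIFE_DAYS = 730   # assumed; Google's real decay is unknown


def search_score(age_days: float,
                 half_life: float = DECAY_HALF_LIFE_DAYS) -> float:
    """Toy relevance: zero until indexed, then accrued links discounted by age."""
    if age_days < CRAWL_INTERVAL_DAYS:
        return 0.0  # treated as not yet crawled, so invisible to regular search
    links = LINKS_PER_DAY * age_days
    # Each half_life days of age halves the score.
    return links * 0.5 ** (age_days / half_life)


for age in (1, 30, 365):
    print(f"{age:>4} days old: score {search_score(age):.2f}")
```

With these numbers the year-old post scores far above the month-old one. Shrink the half-life to 30 days and the order flips, which is one way to see why the size of the archive advantage can't be pinned down without knowing Google's actual decay.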




My articles get 80% of their lifetime readership after they have passed into the archives.
In my case, a smaller proportion of this traffic comes from search, and more comes from links from other sites or people who go directly to my site and look for a specific old article.
On average, each of my articles gets the following traffic over its lifetime:
Regarding your archives, Chris, you haven't had your site long enough yet to truly experience long-tail-of-time traffic :-) Let's talk in ten years; your archival numbers will surely be much bigger than what you report in your post.
Posted by: Jakob Nielsen | May 01, 2006 at 03:14 AM
How does the 37% that comes from search compare to the rest of your traffic in terms of time spent on the site?
My small archive gets a lot of traffic from search, but most people coming from a search engine leave very quickly.
Posted by: Rick Burnes | May 01, 2006 at 05:06 AM
I feel that Google's not as timeless as you suggest, Chris. I'm sure that the age of a post is a variable in their algorithm. Note how search results sometimes show the date of the content, so I'd guess they include this in their math.
Posted by: Piers Fawkes | May 01, 2006 at 07:53 AM
No offense, but isn't this to be expected?
From an interview I did over on New World Notes (long middle paragraph is mine):
“The future looks like a one to one between stuff and sales," I suggest. "Desires instantly served by product, and vice versa.”
“Well, the future could be that everything gets equal exposure (more or less) because everyone is empowered to advertise. And when manufacturing is obsolete, and everything is ‘printed’, distribution is as simple as printing the object on your desktop. So does it look like the top grey? Or like the top blue? I'm beginning to think both are possible. As people develop systems for finding the things they really want, then ‘Obscurity’ really is relegated to things people really just don't want... I'm thinking that Finding the items easily will happen after we're in a position to deliver them. So we'll start off with so much junk we can't find anything. Then we'll figure out ways to really wade through it."
“A Google for desires, basically.”
http://secondlife.blogs.com/nwn/2005/08/the_long_tail_o.html
Non-tangible media are already approaching this point and consequently search is increasingly important.
Posted by: csven | May 01, 2006 at 07:54 AM
Search is "Time-agnostic" -- NO WAY!
History is a vital part of page ranking. Google uses a set of history data in determining page ranking. Read their patent or try this brief description:
http://www.101-seo-resources.com/google-patent-application.htm
Older content doesn't accumulate more links just because it's older. It does so because it's more relevant. People often find these “older” pages by search, so it’s also a positive feedback loop. Further, more links to a page don’t automatically mean a higher page ranking. Ten “quality” links to a page are worth more than 100 bad ones.
I’m not seeing the Long Tail theory working well enough in this case. Maybe it’s there, but you should have mentioned the existing “long tail” of newspapers stored on microfiche.
Posted by: Jim | May 01, 2006 at 12:05 PM
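Jim's point about link quality can be illustrated with the textbook PageRank calculation, scaled down from his ten-versus-100 example. The sketch below is a simplified power-iteration PageRank (damping factor 0.85, the value used in Brin and Page's original paper, with rank from pages that have no outgoing links spread evenly over all pages). It is not Google's production ranking, and the toy web graph is hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Textbook power-iteration PageRank; dangling pages share their rank evenly."""
    # Every page that appears as a source or a target is a node.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes its damped rank in equal shares to the pages it links to.
        for src, targets in links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy web: five pages link to a hub, the hub links to "quality",
# while three pages that nobody links to all point at "bulk".
web = {f"fan{i}": ["hub"] for i in range(5)}
web["hub"] = ["quality"]
web.update({f"junk{i}": ["bulk"] for i in range(3)})

ranks = pagerank(web)
print(f"quality (1 inbound link):  {ranks['quality']:.4f}")
print(f"bulk    (3 inbound links): {ranks['bulk']:.4f}")
```

Here `quality` has one inbound link and `bulk` has three, yet `quality` ends up ranked higher because its single link comes from a page that is itself well linked.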
Jim,
Good point. I've updated the post accordingly.
Chris
Posted by: Chris Anderson | May 01, 2006 at 12:24 PM
Yeah, this is interesting but, no offence Chris, hardly new. In fact, when we designed www.onlineopinion.com.au (at my former employer), we wanted to take advantage of exactly this effect. We recognised that mainstream news media paid too little attention to the history of (recent) thought, especially when it comes to op-eds and other opinion. So we built a resource that was "parasitic" on such thoughts but also archival.
What we hadn't expected, but found, was the way "seeds" can be planted in articles and the incredible popularity they can achieve when they go from being a "sleeper" to being a "current topic". In a classic example, we had an article on an experiment in improving parenting (I think it was this one: http://www.onlineopinion.com.au/view.asp?article=1484) that ran its usual peak/decay course then, when the authors presented at an international conference, it went straight back to the top of the list for quite a while on the back of international cross-media coverage.
In another example, when the Anglican Archbishop of Brisbane, Peter Hollingworth, was appointed Governor General of Australia, the article he had written for us *three years earlier* went straight to the top of searches for his name ... of which there were plenty.
This can be a powerful device for getting a "leg up the tail" and a strong argument for preserving publishing archives ... Rupert!
Posted by: Hugh Brown | May 01, 2006 at 05:38 PM
You might want to take search behavior like mine into account when you analyze traffic to your site. I sometimes just search on the title of a website I want to visit rather than type the url or use a link.
For instance today I decided that I wanted to see if you had anything new and so I typed long tail into google. The remembered title is my bookmark to many things.
Posted by: Greg Banville | May 01, 2006 at 10:52 PM
I haven't reviewed the content of Google's original patent, so this may be off base. For me, the age of the content isn't so much of a problem as the age of the links. For example, an old post that has lots of links from a long time ago (in Internet time) is, to me, less relevant than an old post with fewer but newer links.
What needs to depreciate is the link age. Which Google may already do.
Posted by: Simon | May 02, 2006 at 05:27 AM
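Simon's suggestion, depreciating links by their age rather than depreciating pages, can be sketched with an exponential decay on each link's weight. The one-year half-life is a made-up parameter, and whether Google does anything like this is exactly the open question he raises.

```python
# Hypothetical half-life for link weight; whether and how Google
# depreciates old links is not known from the discussion.
LINK_HALF_LIFE_DAYS = 365


def weighted_links(link_ages_days: list[float],
                   half_life: float = LINK_HALF_LIFE_DAYS) -> float:
    """Sum of link weights, each halving for every `half_life` days of link age."""
    return sum(0.5 ** (age / half_life) for age in link_ages_days)


# Two old posts: one with many links that are all about five years old,
# one with fewer links that arrived in the last month.
stale_post = [5 * 365] * 100
fresh_links_post = [30] * 20

print(f"100 five-year-old links: {weighted_links(stale_post):.2f}")
print(f" 20 month-old links:     {weighted_links(fresh_links_post):.2f}")
```

Passing `half_life=float("inf")` switches depreciation off and recovers the raw counts, where the post with 100 old links wins 100 to 20. With the assumed one-year half-life, the 20 recent links outweigh it (about 18.9 versus 3.1).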
Is this Long Tail theory really working?
Posted by: Grace Smith | May 02, 2006 at 05:42 AM
I've noticed this as well. I've been running my blog for a little less than two months now, and one thing I've noticed in looking at the traffic is that search engines like Google and Yahoo account for an ever-growing percentage of it, despite my doing nothing different. I realized that it's because 1) I'm building up an ever larger number of posts, which increases the likelihood that some post will rank for any given keyword search, and 2) the PageRank of those older posts increases with time.
By contrast, Technorati traffic trends towards the "fresh" links, as they sort by date by default (I'd imagine the same is true for Google News, as it sorts by date rather than relevancy).
The other driver of traffic towards the archived web that I've noticed is community driven sites like Digg, Reddit, and Del.icio.us - where links are determined by what the community finds interesting rather than what's new. I've noticed that at any given time, the front page of Digg or Reddit might include a joke or "cool thing" which is dated to over a year ago. So while it's not new, it's new to enough people to still rank.
Posted by: Eric | May 02, 2006 at 09:12 AM
Very good analysis. You are not alone in noticing this effect. Our site doesn't get a lot of hits, but we do have some golden oldies that wax and wane in popularity. This effect destroys the media's attention model, but it increases overall information value.
A minor point on arithmetic. If you knock out the 27% of hits to older articles found by search engines, you would have closer to 16.4% of all remaining hits going to older articles. That's 12 / (12 + 61).
Posted by: kaleberg | May 08, 2006 at 09:01 PM