A former senior Microsoft manager (he didn't want his name used) emailed me with an interesting perspective on the Long Tail of search:
As you know, search engine query logs have a Zipfian distribution (Rank * Frequency = Constant). When I presented this concept to senior execs at Microsoft, including BillG, they never quite got it. I would draw the graph just like the logo on your site, but it just wouldn't sink in. I realized that the size of the X axis was the problem. People see a graph, but they don't comprehend the scale of the X axis.
Looking at your logo, for example, it looks as if the (yellow) long tail is what, 4x, 5x the size of the red portion? And it is, if you are comparing the integral of the yellow portion (i.e. query volume) with that of the red. But the X axis, on a linear scale, extends almost infinitely to the right, and no visual can communicate that. In essence, in order to represent the long tail in graphical/visual form, you implicitly have to represent the X axis logarithmically, as you know, and people (generally) don't comprehend logarithmic or exponential scales.
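To make the scale point concrete, here is a small sketch of an idealized Zipf curve. It assumes an exponent of exactly 1 (frequency = constant / rank, so Rank * Frequency = Constant) and made-up sizes of one million ranks; `zipf_curve` and `head_share` are hypothetical helpers, not anything from Microsoft's logs.

```python
import math


def zipf_curve(constant: float, num_ranks: int) -> list[float]:
    # Idealized Zipf law with exponent 1: frequency(rank) = constant / rank
    return [constant / rank for rank in range(1, num_ranks + 1)]


def head_share(freqs: list[float], head_size: int) -> float:
    # Fraction of total volume contributed by the top `head_size` ranks
    return sum(freqs[:head_size]) / sum(freqs)


# Assumed, illustrative sizes: one million ranks, top frequency of one million
freqs = zipf_curve(constant=1_000_000, num_ranks=1_000_000)

# Rank * Frequency comes out the same at every rank
assert all(math.isclose(r * f, 1_000_000) for r, f in enumerate(freqs, start=1))

# Volume share of the top 1,000 ranks (0.1% of the X axis)
print(f"{head_share(freqs, 1_000):.1%}")
```

With these assumed numbers, the top 1,000 ranks carry about 52% of the total volume, and the remaining 999,000 ranks share the rest. On a linear X axis those 999,000 ranks take up 99.9% of the width while sitting almost on the baseline, which is why the tail is so hard to draw to scale.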
He also found a handy rule of thumb to estimate the consequences of this distribution:
Loosely speaking, if you divide the number of queries by 4, you'll get the frequency of the most popular query, and if you divide by two you'll get the number of queries that occurred only once over whatever time period you are measuring.
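The rule of thumb is easy to check against a real log. This is a minimal sketch, assuming the log is available as a list of query strings; `zipf_rule_of_thumb` is a hypothetical helper. Because "number of queries" could mean either total query volume or distinct queries, the sketch reports both, and (as an assumption) computes the two estimates from the total.

```python
from collections import Counter
from typing import Iterable


def zipf_rule_of_thumb(queries: Iterable[str]) -> dict[str, float]:
    """Compare a query log with the divide-by-4 / divide-by-2 rule of thumb.

    Assumption: "number of queries" is read as total query volume.
    The distinct-query count is reported too, since the rule could mean either.
    """
    counts = Counter(queries)
    total = sum(counts.values())
    return {
        "total": total,
        "distinct": len(counts),
        # Frequency of the single most popular query (0 for an empty log)
        "top_frequency": counts.most_common(1)[0][1] if counts else 0,
        "top_frequency_estimate": total / 4,
        # Distinct queries that appear exactly once in the period
        "seen_once": sum(1 for c in counts.values() if c == 1),
        "seen_once_estimate": total / 2,
    }


log = ["weather", "weather", "news", "weather", "maps",
       "news", "lottery results", "cheap flights to oslo"]
print(zipf_rule_of_thumb(log))
```

A toy log like this one is far too small to be Zipfian, so the estimates will not match; the rule is only meant as a rough guide for logs with millions of entries.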
A second example is what the outside world knows as Windows Crash Analysis, but what inside Microsoft is known as Watson or Dr. Watson (after Sherlock Holmes's companion), because that's what it was first called. It's the dialog that appears when an application crashes on Windows XP, offering to send the crash information to Microsoft. Back at Microsoft, they compile that information in a SQL Server database, indexed by application name, module name, module version, and the internal address where the crash occurred. Back in 2001 or 2002, they started showing off a graph of all the application crashes, with Rank on the X axis and Frequency on the Y axis. Internally they called this the "Watson Curve," even in front of Gates.
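Building such a rank-frequency graph from crash reports amounts to a GROUP BY over the four indexed fields. The sketch below uses Python's built-in `sqlite3` as a stand-in for SQL Server; the `crashes` table, its column names, and the sample reports are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per crash report received
conn.execute(
    "CREATE TABLE crashes ("
    " application TEXT, module TEXT, module_version TEXT, address INTEGER)"
)
# Index on the four fields that identify a distinct crash
conn.execute(
    "CREATE INDEX crash_bucket ON crashes"
    " (application, module, module_version, address)"
)

# Made-up sample reports, purely for illustration
reports = [
    ("editor.exe", "render.dll", "1.2.0", 0x1A2B),
    ("editor.exe", "spell.dll", "3.0.1", 0x0042),
    ("editor.exe", "render.dll", "1.2.0", 0x1A2B),
    ("viewer.exe", "render.dll", "1.2.0", 0x1A2B),
    ("editor.exe", "render.dll", "1.2.0", 0x1A2B),
    ("editor.exe", "spell.dll", "3.0.1", 0x0042),
]
conn.executemany("INSERT INTO crashes VALUES (?, ?, ?, ?)", reports)

# Reports sharing all four fields count as the same crash; sort by frequency
rows = conn.execute(
    """
    SELECT application, module, module_version, address, COUNT(*) AS frequency
    FROM crashes
    GROUP BY application, module, module_version, address
    ORDER BY frequency DESC
    """
).fetchall()

# Attach a rank (1 = most frequent crash): (rank, frequency, crash fields)
watson_curve = [(rank, row[-1], row[:-1]) for rank, row in enumerate(rows, start=1)]
for rank, frequency, bucket in watson_curve:
    print(rank, frequency, bucket)
```

Plotting the first two elements of each `watson_curve` entry gives Rank on the X axis and Frequency on the Y axis. Note that SQL leaves the order of tied frequencies unspecified, so buckets with equal counts may come back in any order.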
When I saw the curve, I smiled, because it looked familiar. I asked one of the guys on that team to do me a favor and plot the results log-log. He got back to me a few days later and said "wow, it's a straight line!" I wasn't surprised. I don't think they call it the Watson Curve anymore, because it's just another example of a Zipfian distribution at work.
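The "straight line on a log-log plot" test can also be done numerically, with an ordinary least-squares fit of log(frequency) against log(rank). This is a minimal sketch, not the Watson team's analysis: `loglog_slope` is a hypothetical helper, and its input is assumed to be frequencies already sorted from most to least frequent, all greater than zero.

```python
import math


def loglog_slope(frequencies: list[float]) -> float:
    """Least-squares slope of log(frequency) versus log(rank), ranks starting at 1.

    Assumes `frequencies` is sorted in descending order and every value is > 0.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Idealized Zipf data: log(f) = log(1000) - log(rank), an exact straight line
frequencies = [1000 / r for r in range(1, 501)]
print(round(loglog_slope(frequencies), 6))  # -1.0
```

On ideal exponent-1 Zipf data the fitted slope is exactly -1. If real crash or query counts are Zipfian, the slope should land somewhere near -1, with visible scatter in the tail, where many items share a frequency of 1.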