“Is this the death knell for Technorati et al.?” – asks Charlene about Google’s entry into the blog search space.
I suspect I know the long-term answer to that, but for now let’s look at what Technorati’s own Niall Kennedy thinks:
“Google is specifically restricting its search to feeds, and not using the HTML
of the blog. Why? Googlebot is designed to swallow a page whole and not
break the page up into individual entries or items. Feeds come
prepackaged as individual items or entries allowing for easy digestion
by parsers and indexers. Google would need to overhaul its indexer or
design a new and separate indexer specific to blog posts if it would
like to include more post content than it is currently pulling down
from a page’s link alternate declared feed (this is based on a
conversation I had with Google engineers in February about the indexer,
I won’t blog the details, and things may have changed). Technorati
indexes a blog’s HTML assisted by the declared RSS and Atom feed, so I am admittedly a bit biased.”
I’m not sure I’d consider Google’s use of feeds a disadvantage or weakness:
the fact is, reading the entire HTML may very well be the cause of some
of Technorati’s problems, e.g. its parser getting “lost”, not finding post
boundaries, or associating posts with the titles and tags of neighboring
posts (the previous link provides more details as well as a
collection of other bloggers’ experiences with Technorati). If
it’s so difficult to index an entire blog correctly, we might actually be
better off with a feed-based search.
Which takes us to Jeff Clavier’s conclusion: “… bloggers publishing only a partial feed will be partially indexed (Aha, would that be the reason for full feeds to become the standard?)”
I could not agree more. Unless your blog is all about ad-revenue
generation, in which case you need to attract readers to your
site, there is no reason not to serve up the entire post
in your feed. It’s really simple: in this world of
infoglut, either you make reading your blog convenient or you should expect
to lose subscribers who are fed up with clicking and waiting.
Serving up ‘bait’ in your feed defeats the purpose of RSS readers.
This brings me to a problem I have with my blog platform: there is not
enough control over the smart use of excerpts. My preferences:
- Full post in the RSS feed
- Auto-created excerpt (say, first 100 words) on the Blog Main Page, with manual override option
- Hand-edited 2–3 line summary that other blogs can use in the trackback detail.
My platform (Blogware via Blogharbor) does not support such selective use
of excerpts, and I am not aware of any other platform that does
(?). Oh, well, there is always a next version.
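For what it’s worth, the excerpt policy I’d like is simple enough to sketch. The snippet below is a hypothetical illustration in Python, not how Blogware or any other platform actually works: the `Post` fields, the function names, and the choice to fall back from the trackback summary to the auto-created excerpt are all my own assumptions.

```python
from dataclasses import dataclass
from typing import Optional

EXCERPT_WORDS = 100  # "say, first 100 words" for the main page


@dataclass
class Post:
    body: str                                # the full post text
    excerpt_override: Optional[str] = None   # hypothetical manual main-page excerpt
    trackback_summary: Optional[str] = None  # hypothetical hand-edited 2-3 line summary


def feed_content(post: Post) -> str:
    """RSS feed: always the full post."""
    return post.body


def main_page_excerpt(post: Post, limit: int = EXCERPT_WORDS) -> str:
    """Main page: the manual override if present, else the first `limit` words."""
    if post.excerpt_override:
        return post.excerpt_override
    words = post.body.split()
    if len(words) <= limit:
        return " ".join(words)
    return " ".join(words[:limit]) + " ..."


def trackback_excerpt(post: Post) -> str:
    """Trackbacks: the hand-edited summary; assumed fallback is the main-page excerpt."""
    return post.trackback_summary or main_page_excerpt(post)


if __name__ == "__main__":
    post = Post(body="word " * 150, trackback_summary="A short, hand-written summary.")
    print(len(feed_content(post).split()))  # full post: 150 words
    print(main_page_excerpt(post)[-20:])    # truncated excerpt ending in "..."
    print(trackback_excerpt(post))          # the hand-edited summary
```

The point of keeping three separate functions is that each destination (feed, main page, trackback) gets its own rule, which is exactly the control my platform doesn’t give me.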
“Google does not spider the full blog – only what’s in the site’s RSS
feed. This presents a problem since many bloggers only publish a
summary feed. As a result, the Google Blog Search engine may be missing
a ton of important content.” True…
but again, why look at the symptom rather than the root cause? Providing full
content in the feed takes care of the “problem” and keeps readers
Update #3 (9/15): Planet OZH shares my views in Five Reasons Why Partial Content Feeds Suck (he’s got cute baby pics, too).
Update #4 (9/18): Business Blog Consulting agrees.