<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.ninebyblue.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">

<channel>
	<title>Nine By Blue</title>
	
	<link>http://www.ninebyblue.com</link>
	<description>by Vanessa Fox</description>
	<lastBuildDate>Mon, 30 Aug 2010 01:23:33 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.ninebyblue.com/ninebyblue" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="ninebyblue" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><feedburner:emailServiceId xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">ninebyblue</feedburner:emailServiceId><feedburner:feedburnerHostname xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://feedburner.google.com</feedburner:feedburnerHostname><item>
		<title>Bad SEO Advice</title>
		<link>http://www.ninebyblue.com/blog/bad-seo-advice/</link>
		<comments>http://www.ninebyblue.com/blog/bad-seo-advice/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 01:22:03 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1364</guid>
		<description><![CDATA[I come across bad SEO advice all the time. Much of it may seem obvious to those of us who have been involved in search for any length of time, but for people who haven&#8217;t, it can be difficult to know what&#8217;s concrete advice, what&#8217;s speculation, and what&#8217;s just plain terrible. For that matter, it [...]]]></description>
			<content:encoded><![CDATA[<p>I come across bad SEO advice all the time. Much of it may seem obvious to those of us who have been involved in search for any length of time, but for people who haven&#8217;t, it can be difficult to know what&#8217;s concrete advice, what&#8217;s speculation, and what&#8217;s just plain terrible. For that matter, it can be difficult for those outside of SEO to know what&#8217;s smart and what&#8217;s considered search engine manipulation.</p>
<p>I was in a meeting a few days ago and someone asked if it was true that for SEO purposes, a page should have as few outbound links as possible. I said outbound links were fine, great even! And then talked a bit about how it&#8217;s a bad idea to build pages for nuances in the search engine algorithms anyway, as hundreds of signals exist and they&#8217;re changing all the time. Oh, he said. We&#8217;ve been talking about implementing the <a href="http://searchengineland.com/canonical-tag-16537" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/canonical-tag-16537?referer=');">canonical tag</a>. We probably shouldn&#8217;t do that then. And I realized, how would a developer know that the canonical tag is awesome and the meta keywords tag isn&#8217;t? That you shouldn&#8217;t worry about keyword density but you should put important keywords in your title tag?</p>
<p>Recently, someone sent me an &#8220;SEO optimization report&#8221; for their site that came from automated software that guaranteed top ten rankings in 90 days. Some of the advice was good (use unique title tags), some was harmless (improve your Flesch readability ease score), and some was just crazy talk. Below is a bit of the crazy.</p>
<p><strong>&#8220;You should increase your keyword density. You can do this by removing some text.&#8221;</strong></p>
<p>This whole notion of keyword density has been around forever, but here&#8217;s what it really boils down to. How is your potential audience looking for this content? Put those words in your title tag, H1, and somewhere on the page. And use those words as anchor text in internal links to that page. If other sites link to the page using that anchor text, even better! It&#8217;s bad enough when people try to get the &#8220;right&#8221; keyword density by nonsensically repeating the same words over and over on a page, but removing other text? That&#8217;s just sad.</p>
<p><strong>&#8220;Keywords in the HTML comment tags help a good ranking in Google.&#8221;</strong></p>
<p>Um. Not really.</p>
<p><strong>&#8220;Some search engines penalize sites if the terms from the meta keywords tag don&#8217;t appear in the body of the page.&#8221;</strong></p>
<p>Well, first, search engines (in particular, Google) ignore the meta keywords tag. And also, this statement isn&#8217;t true.</p>
<p><strong>&#8220;Your page includes the meta Google-Site-Verification tag twice. Search engines could regard it as a spamming  attempt and might decide not to index your web site.&#8221;</strong></p>
<p>Wow. I assume this is simply a case of automation going awry and whoever wrote this software doesn&#8217;t actually think that having two verified Google Webmaster Tools accounts will cause Google to remove the site from the index. But even so, having duplicate meta tags of any kind doesn&#8217;t cause Google or Bing to flag the site for spam. I mentioned this was all about the crazy, right?</p>
<p><strong>&#8220;Some search engines don&#8217;t accept submissions with capitalized letters in titles or meta tags.&#8221;</strong></p>
<p>Maybe someone more familiar with old school directories can weigh in on where this comes from. But recommending that your title tags not contain capital letters? This may be automated software, but someone manually wrote that message.</p>
<p><strong>&#8220;Some search engines rank sites lower that are hosted at free hosting providers.&#8221;</strong></p>
<p><a href="http://www.mattcutts.com/blog/myth-busting-virtual-hosts-vs-dedicated-ip-addresses/" onclick="pageTracker._trackPageview('/outgoing/www.mattcutts.com/blog/myth-busting-virtual-hosts-vs-dedicated-ip-addresses/?referer=');">No</a>.</p>
<p>PS &#8211; Creative use of bold won&#8217;t actually help. And question marks in URLs are just fine.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/bad-seo-advice/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>August 26, 2010 Office Hours</title>
		<link>http://www.ninebyblue.com/office-hours/august-26-2010-office-hours/</link>
		<comments>http://www.ninebyblue.com/office-hours/august-26-2010-office-hours/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 20:35:13 +0000</pubDate>
		<dc:creator>Heather</dc:creator>
				<category><![CDATA[Office Hours]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1360</guid>
		<description><![CDATA[Vanessa talks about best practices for pagination URL structure and duplicate content issues with printer versions of said paginated articles.
She also dives into backlinks and PageRank before talking about Glee.
Links mentioned:
preventing duplicate content issues with print versions of content
Somebody added 120k backlinks to my site&#8230;and my PR went from 6 to 1 that month
6 [...]]]></description>
			<content:encoded><![CDATA[<p>Vanessa talks about best practices for pagination URL structure and duplicate content issues with printer versions of said paginated articles.<br />
She also dives into backlinks and PageRank before talking about Glee.</p>
<p>Links mentioned:<br />
<a href="http://www.google.com/support/forum/p/Webmasters/thread?tid=382b94ee7159f659&#038;hl=en" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/forum/p/Webmasters/thread?tid=382b94ee7159f659_038_hl=en&amp;referer=');">preventing duplicate content issues with print versions of content</a><br />
<a href="http://www.google.com/support/forum/p/Webmasters/thread?tid=5b36e19ba75ca3af&#038;hl=en" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/forum/p/Webmasters/thread?tid=5b36e19ba75ca3af_038_hl=en&amp;referer=');">Somebody added 120k backlinks to my site&#8230;and my PR went from 6 to 1 that month</a><br />
<a href="http://www.seomoz.org/blog/6-ways-to-replace-yahoos-link-linkdomain-search-commands" onclick="pageTracker._trackPageview('/outgoing/www.seomoz.org/blog/6-ways-to-replace-yahoos-link-linkdomain-search-commands?referer=');">6 ways to replace Yahoo&#8217;s Link Linkdomain Search</a><br />
<a href="http://searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827?referer=');">All New Bing Webmaster Tools</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/office-hours/august-26-2010-office-hours/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Effectively Using Images</title>
		<link>http://www.ninebyblue.com/blog/effectively-using-images-2/</link>
		<comments>http://www.ninebyblue.com/blog/effectively-using-images-2/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 03:22:34 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1357</guid>
		<description><![CDATA[Note: This post was originally posted on Jane and Robot in May 2008 and is being temporarily stored here.

A picture is worth a thousand words. Unfortunately, when it comes to major search engines (which are still primarily text-based), a picture is worth a lot of blank space. Does this mean you shouldn&#8217;t use images on your [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Note: </em></strong><em>This post was originally posted on Jane and Robot in May 2008 and is being temporarily stored here.</em></p>
<div>
<p>A picture is worth a thousand words. Unfortunately, when it comes to major search engines (which are still primarily text-based), a picture is worth a lot of blank space. Does this mean you shouldn&#8217;t use images on your site if you want to rank in search? Not at all. Just keep some simple things in mind when adding those images to your pages. As a bonus, these tips help not only with search engine robots, but with Jane as well! You want your site to be accessible in screen readers, to those who have images turned off in their browsers, and to those who have slow connections or are on mobile browsers and may have trouble loading images.</p>
<p>By providing search engine robots with textual information about the images on your site, your site can benefit not only from better placement in web search results, but also from better placement in image search results. Image search can provide substantial search traffic, so don&#8217;t overlook it as an acquisition channel.</p>
<p>Below are recommendations for using images effectively for both Jane and search engine robots.</p>
<h2>Don&#8217;t put text in images</h2>
<p>Put text in straight HTML whenever possible. Sometimes web designers like to put text in images because they can use a wider variety of fonts and can manipulate the design more freely. Much of this styling can be done with CSS and in cases where it can&#8217;t, the extra design a graphical version of the text provides may not really add visitor value. In fact, it may detract from usability because it may be difficult to read. It also may hurt viral efforts since it can&#8217;t be copied and pasted. If I want to send an email to all of my friends suggesting we all go to a hot new restaurant, I may want to copy and paste a few menu items from the restaurant&#8217;s web site to send to them. If the menu is in an image, I can&#8217;t do that.</p>
<h2>Use the ALT attribute</h2>
<p>The most well-known method for making images accessible is effective use of the ALT attribute in the IMG element. And yet it&#8217;s very common to find empty ALT attributes all over the web.</p>
<pre>&lt;img src="/images/lavender-plant.jpg" alt="Picture of a lavender plant"&gt;</pre>
<ul>
<li>Make the text in the ALT attribute descriptive. It should describe the image concisely. Think of someone browsing your site with a screen reader. How will they want the image presented?</li>
<li>Don&#8217;t stuff the ALT attribute with keywords. A long ALT attribute full of keywords you want to rank for looks spammy to both your visitors and search engines, and may make both devalue your content. How can you tell if your ALT text is spammy or simply descriptive? It&#8217;s a judgment call, but if you can&#8217;t tell, get some objective opinions. ALT=&#8221;buy cheap viagra now cheap viagra online get viagra here&#8221; is probably going to be pretty obviously spammy to anyone you ask.</li>
<li>Make the ALT text relevant to the image. Use the ALT text to describe the image, not as a place to add descriptive text about the page that isn&#8217;t directly relevant to the image. For instance, if the image is of a car, your ALT text should be something like &#8220;blue mini cooper&#8221; not &#8220;cheapcars.com has cheap cars available in every make and model including mini coopers, volkswagen, and Ferrari&#8217;s like Magnum PI used to drive&#8221;.</li>
</ul>
<p>What about the TITLE attribute? It likely <a href="http://googlewebmastercentral.blogspot.com/2007/12/using-alt-attributes-smartly.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2007/12/using-alt-attributes-smartly.html?referer=');">doesn&#8217;t provide direct search engine value</a>, although it may be useful for your visitors.</p>
<h2>Make image filenames descriptive</h2>
<p>If possible, describe the image in the name of the image file. For instance, lavender-plant.jpg is better than image123.jpg. If you are importing a lot of images, for instance, for a product database, it may be impractical to manually name each file. In this case, find programmatic ways to rename the images using text from how the images are tagged or categorized. If your filename includes multiple words, use hyphens to separate them. Search engines tend to see a hyphen as a separator and an underscore as a joiner, so lavender_plant would be seen as one word and lavender-plant would be seen as two.</p>
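<p>As a rough illustration of renaming images programmatically, here is a minimal JavaScript sketch. The function name descriptiveImageName and its inputs are hypothetical; it simply assumes you already have tag or category text for each image and joins the words with hyphens.</p>

```javascript
// Hypothetical helper: build a hyphen-separated image filename from tag or category text.
function descriptiveImageName(tags, extension) {
  const words = tags
    .join(" ")
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on spaces, underscores, and punctuation
    .filter((word) => word.length > 0);
  return words.join("-") + "." + extension;
}

console.log(descriptiveImageName(["Lavender", "plant"], "jpg")); // lavender-plant.jpg
```

<p>Underscores in the source text are treated as separators too, so the resulting filename always uses hyphens between words.</p>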
<h2>Use image captions</h2>
<p>Provide a caption below or above the image that describes what it&#8217;s about and gives context for how it relates to the rest of the page.</p>
<h2>Provide textual clues around the image</h2>
<p>Try to include text around the image that relates to what the image is about. Text on the page helps search engines know what the page itself is about, which helps the page rank for relevant queries, but text near images can help those images rank in image search results as well.</p>
<h2>Be cautious about using images for navigational links</h2>
<p>If you use images in menus and other navigation, make sure that you use ALT text that replicates how the image represents that menu option. But also test the implementation by turning off images in your browser and making sure the links still work. Some implementations incorrectly require images to be enabled, causing search engine robots to be unable to follow those links.</p>
<p>Another potential usability issue with images and navigation is that if you use a textual link combined with a background image, the text may disappear if the image doesn&#8217;t load. (This issue can happen with this type of design in places other than menus, but that scenario is where it can be commonly seen.)</p>
<table border="0">
<tbody>
<tr>
<td>
<div>Navigational Link With Images Enabled</div>
</td>
<td>Navigational Link with Images Disabled</td>
</tr>
<tr>
<td><img title="image-example-with-background1" src="http://janeandrobot.com/wp-content/uploads/2008/05/image-example-with-background1.png" alt="image-example-with-background1" /></td>
<td><img title="image-example-without-background" src="http://janeandrobot.com/wp-content/uploads/2008/05/image-example-without-background.png" alt="image-example-without-background" /></td>
</tr>
</tbody>
</table>
<h2>Be cautious about using images for headings and logos</h2>
<p>Many web sites use an image for the header of the page or for the company logo. This implementation works well, but be sure that you replicate the company name, heading text, or other words from that image in the ALT text.</p>
<ul>
<li>If you have an image as the page&#8217;s H1 tag, keep in mind that the H1 is one of the most important clues for a search engine to determine what the page is about, so consider using text instead of an image or at least using descriptive ALT text. In the example below, the code is using CSS to display an image of the company logo as the H1 tag. A better implementation would be to display the image in the header of the page, and use the H1 tag to provide visitors and search engines with descriptive information about the page.</li>
<div>
<pre>&lt;h1 class="home-logo"&gt;Company Name&lt;/h1&gt;</pre>
</div>
<p>The CSS for this implementation sets a text-indent of -99999em, which pushes the text far off screen. This is not recommended, both because a visitor who loads the page with images turned off can&#8217;t see the text (so the heading space is simply blank) and because search engines may find the practice deceptive (the text is hidden).</p>
<div>
<pre>.home-logo {</pre>
<pre>     background:transparent url(/images/logo1.gif)</pre>
<pre>     no-repeat scroll center top;</pre>
<pre>     height:63px;</pre>
<pre>     margin-top:35px;</pre>
<pre>     text-indent:-99999em;</pre>
<pre>     }</pre>
</div>
<li>If your header includes an image of your company logo, avoid commonly used ALT text such as &#8220;home&#8221; or &#8220;logo&#8221;. Instead, succinctly describe your company or home page (using either the company name or a brief description of the site). (Also, avoid naming your company logo something like logo.jpg.)</li>
<li>If your site includes a header that consists entirely of a large image, test the layout of the page with images turned off. In some cases, the result can be a large area of white space that pushes all content below the fold. In the example below, all company information and details about the site are lost without the header image.</li>
</ul>
<p><img title="header-images-off" src="http://janeandrobot.com/wp-content/uploads/2008/05/header-images-off.gif" alt="header-images-off" /></p>
<h2><a name="noncontent"></a>Block Non-Content Images</h2>
<p>If you use a lot of non-content images (for instance, arrows, bullets, and boxes), you likely don&#8217;t want those indexed. Since search engine robots spend limited time crawling each site, it may make sense to block them from crawling these types of images so they can spend all the available resources on the pages and images you do want indexed. As a bonus, if you want to provide an image search on your site (for instance, using the <a href="http://nathanbuggia.com/post/Custom-Site-Search-Engine-Using-the-Live-Search-API.aspx" onclick="pageTracker._trackPageview('/outgoing/nathanbuggia.com/post/Custom-Site-Search-Engine-Using-the-Live-Search-API.aspx?referer=');">Live Search API</a>), if only content images are indexed, then the image results will be more useful for your visitors.</p>
<p>A good way to block non-content images is to place them in a separate folder from your content images and then block that folder using robots.txt. For instance, if you place these images in a folder called no_index_images, your robots.txt file would contain:</p>
<div>
<pre>User-agent: *</pre>
<pre>Disallow: /no_index_images/</pre>
</div>
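<p>To sanity-check which image paths a rule like this would cover, a simplified JavaScript sketch follows. The function isBlocked is hypothetical and treats each Disallow value as a plain path prefix; real robots.txt handling also involves per-agent groups, Allow rules, and wildcards, which this ignores.</p>

```javascript
// Simplified sketch: treats each Disallow value as a plain path prefix.
// Real robots.txt parsers also handle Allow rules, wildcards, and per-agent groups.
function isBlocked(robotsTxt, path) {
  return robotsTxt
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .some((prefix) => prefix !== "" && path.startsWith(prefix));
}

const robotsTxt = "User-agent: *\nDisallow: /no_index_images/";
console.log(isBlocked(robotsTxt, "/no_index_images/arrow.gif")); // true
console.log(isBlocked(robotsTxt, "/images/lavender-plant.jpg")); // false
```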
<h2>Images can be search engine and user friendly</h2>
<p>With a little planning and good structure, you can effectively use images on your site in ways that benefit both Jane and robots. And by optimizing images in the ways described in this article, you may also be able to tap into an additional acquisition channel &#8211; image search.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/effectively-using-images-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>URL Referrer Tracking</title>
		<link>http://www.ninebyblue.com/blog/url-referrer-tracking/</link>
		<comments>http://www.ninebyblue.com/blog/url-referrer-tracking/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 03:18:28 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1355</guid>
		<description><![CDATA[Note: This post was originally posted on Jane and Robot in November 2008 (by Nathan Buggia) and is being temporarily stored here.

There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Note: </em></strong><em>This post was originally posted on Jane and Robot in November 2008 (by <a href="http://nathanbuggia.com" onclick="pageTracker._trackPageview('/outgoing/nathanbuggia.com?referer=');">Nathan Buggia</a>) and is being temporarily stored here.</em></p>
<div>
<p>There may be instances when you want to track the source of a request, and a common way of doing so is by using tracking parameters in URLs. Unfortunately, implementing referrer tracking in this way can result in significant issues with search engines. In particular, it can cause duplicate content issues (since the search engine bot finds multiple valid URLs that point to the same page) and ranking issues (since all the links to the page aren&#8217;t to the same URL).</p>
<p>Let&#8217;s say that Jane and Robot uploaded two different online training seminars to YouTube as part of a viral marketing effort to drive more traffic to our site. To gauge our return on investment from each of these seminars, we&#8217;ve added a tracking parameter to the link within each YouTube description that a customer can click on to learn more. Here are the two URLs: http://janeandrobot.com/?from=promo-seminar-1 and http://janeandrobot.com/?from=promo-seminar-2. Each would bring the customer to our home page (the same page served by http://janeandrobot.com) and we would track the conversions based on the from parameter in the URL.</p>
<p>While this solution may seem to work well initially, it can result in low quality tracking data and impact our search acquisition. Here&#8217;s a summary of the major problems:</p>
<ol>
<li>Duplicate content - search engines sometimes have difficulty determining if two URLs contain the exact same page (see <a href="/post/canonical-url-canonicalization-domain.aspx">canonicalization</a> for more information). In this case, we&#8217;re creating this problem because we&#8217;ve created multiple URLs for the same page. Search engines are likely to find all three URLs for the home page and store and rank them as separate content within their index. This could cause the search engine robots to crawl the page three times instead of just once (which may not be a big deal if we are only tracking two promotions, but could become a big problem if we used similar tracking parameters for many other campaigns and URLs). Not only are the robots using more bandwidth than is necessary, but since they don&#8217;t crawl a site infinitely, they could spend all the allotted time crawling duplicate pages and never get to some of the good unique pages on the site.</li>
<li>Ranking - search engines use the number of quality links pointing to a URL as a major signal in determining the authority and usefulness of that content. Because we now have three different URLs pointing to the same page, people have three choices when linking to it. The result is a lower rank for all of the variations of the URL. Search engines generally filter out duplicates, so for instance, if the original (canonical) home page has 100 incoming links and each URL with a tracking parameter has 25 links, then search engines might filter out the two URLs with fewer links and show only the canonical URL, ranking it at position eight for a particular query based on those 100 incoming links. If all incoming links were to the same URL, then search engines would count 150 links to the home page and might rank it at position three for that same query. Another danger is that if one of the YouTube promo videos becomes exceptionally popular, its promo URL might gain more links than the original home page URL. Using this same example, if one of the promo URLs gained 200 links, search engines might choose to display it in the search results over the original home page. This could cause a confusing experience for potential customers who are looking for your home page (http://janeandrobot.com/?from=promo-seminar-1 doesn&#8217;t look like a home page and searchers might be less likely to click on it, thinking it&#8217;s not the page they&#8217;re looking for). It&#8217;s also not ideal from a branding perspective.</li>
<li>Reporting quality - as social networking sites become more popular, we become more of a sharing culture online. Many people use bookmarks, online bookmarking sites such as Delicious, email, and other sharing sites such as Facebook, Twitter, and FriendFeed to save and share URLs. They&#8217;ll click on a URL, and if they like it, copy and paste it from the browser&#8217;s address bar. If the link they&#8217;re saving or sharing happens to be one of our promotional links, then they have preserved this link for all time, and everyone who clicks through the link will look identical to someone coming through the promo. This skews the reporting numbers of who went to the site after viewing the video &#8212; which was why we set up the tracking parameters in the first place!</li>
</ol>
<h2>Implementation Options</h2>
<p>Unfortunately there is no perfect solution for this scenario, and what works best for you depends on your infrastructure and situation. Here we&#8217;ve listed several common solutions that you can choose from to improve your own implementation. We generally recommend the first solution (Redirects), but there are pros and cons to each option that you should review carefully before making your decision.</p>
<h3>Redirects (and Cookies)</h3>
<p>The first option strives to solve the problem by trapping all of the promotional requests, recording the tracking information, then removing the tracking parameter from the URL. This can be time-consuming to implement, but it is the best all-round solution to the three major issues listed above.</p>
<p>If you want to get fancy and track a user&#8217;s entire session based on your referral parameter, you can use this method as well: simply set a cookie on the client machine at the same time you trap the request. This is recommended to understand the value of traffic from different sources. In either case, here are the steps you&#8217;ll need to undertake:</p>
<p>1. Trap the incoming request - find where your web site application&#8217;s logic processes the HTTP request for your page. Trap each request at that point and check if it has a tracking parameter. If it does, record this in your internal referral tracking system. You can record this either in your server logs, or in a custom referral tracking database you maintain on your own.</p>
<ul>
<li>If you also would like to track the entire user&#8217;s session, then you should also use this opportunity to set a cookie on the client.</li>
</ul>
<p>2. Implement the redirect - the next step is to implement a 301 redirect from the current URL to the same page without the tracking parameter (the canonical URL). Don&#8217;t forget to set the Cache-Control header in the HTTP response to ensure that all the requests come to your server and don&#8217;t get handled automatically by some network-based cache. Here&#8217;s what a sample redirect header might look like:</p>
<div>
<pre>HTTP/1.1 301 Moved Permanently</pre>
<pre>Location: http://janeandrobot.com/</pre>
<pre>Cache-Control: max-age=0</pre>
</div>
<p>Note that ASP.Net and IIS both use 302 redirects by default, so you may need to manually create the 301 response code.</p>
<p>The way this works is that when a search engine encounters a promotional URL (http://janeandrobot.com/?from=promo-seminar-1) it issues an HTTP GET request to the URL. The HTTP response tells the search engine that this page has been permanently moved (301 Redirect) and provides the new address (the same as the old address but without the tracking parameter). The search engine then discards the first URL (with the tracking code) and only stores the second URL (without the tracking code). And everything is right in the world.</p>
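<p>The two steps above can be sketched in JavaScript roughly as follows. This is only an illustration under assumptions: the parameter name from comes from the example URLs, while recordReferral and trapTrackingRequest are hypothetical names standing in for your own tracking store and request handler. Setting a session cookie is left out.</p>

```javascript
// Sketch only: "from" is the tracking parameter used in this article;
// recordReferral is a hypothetical stand-in for your own tracking store.
const referrals = [];
function recordReferral(source) {
  referrals.push(source);
}

// Returns the redirect response to send, or null if the URL has no tracking parameter.
function trapTrackingRequest(requestUrl) {
  const url = new URL(requestUrl);
  const source = url.searchParams.get("from");
  if (source === null) {
    return null; // nothing to trap; serve the page normally
  }
  recordReferral(source);
  url.searchParams.delete("from"); // other query parameters are kept
  return {
    status: 301,
    headers: { Location: url.toString(), "Cache-Control": "max-age=0" },
  };
}

console.log(trapTrackingRequest("http://janeandrobot.com/?from=promo-seminar-1"));
```

<p>A request without the tracking parameter returns null here, which avoids redirecting the canonical URL to itself.</p>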
<p>This implementation is one of the best options, but it does have some limitations:</p>
<ul>
<li>One downside of this method is that it requires you to manage your own referral tracking system. Because it traps the referral parameters and removes them from the URL before the page actually loads, 3rd party referral tracking applications like Google Analytics, Omniture, WebTrends or Microsoft adCenter Analytics will not be able to track these referrals.</li>
</ul>
<h3>Canonical URL &lt;Link /&gt; Tag</h3>
<p>Possibly the simplest option to solve this issue is to take advantage of a new standard recently adopted by Google, Yahoo and Microsoft Live Search. Their solution to this problem is a new rel="canonical" value for the &lt;link /&gt; tag that explicitly tells them what the canonical URL for the page is. Assuming the &lt;link /&gt; tag has been created correctly, the search engines will treat it like a 301 redirect to the canonical URL.</p>
<p>Here&#8217;s an example of using this tag:</p>
<div>
<pre>&lt;html&gt;</pre>
<pre>   &lt;head&gt;</pre>
<pre>      &lt;link rel="canonical" href="http://janeandrobot.com" /&gt;</pre>
<pre>   &lt;/head&gt;</pre>
<pre>&lt;/html&gt;</pre>
</div>
<p>Here are a few notes about implementing this tag:</p>
<ul>
<li>Search engines view this as a hint, not a command. Implementing this tag isn&#8217;t a guarantee, although Google said they will try their best to make it work. The reason they can&#8217;t give any guarantees is because they may detect that you are implementing it incorrectly, or it is being used for some type of spammy scenario.</li>
<li>Relative and absolute URLs are supported within the href attribute. However, I recommend that you use absolute URLs whenever possible. This helps the search engines further normalize the URLs because they see what protocol (http or https) you use, and whether or not you are prefixing your domain with &#8220;www.&#8221;.</li>
<li>Sub-domains are supported, separate domains are not. With this tag you can specify a different sub-domain. For example, within this URL (http://janeandrobot.com?from=promo-seminar-2) you could specify this canonical URL (http://videos.janeandrobot.com). However, the &lt;link /&gt; tag would not be valid if you specify a completely different domain, like http://janeandrobot-videos.com.</li>
<li>Common Pitfalls&#8230; You&#8217;ll want to ensure that you don&#8217;t do anything silly like (i) create an infinite loop with two canonical tags pointing to each other (ii) have the canonical tag point to a page that returns a 404 status code. You should also make sure that your canonical URL is generally a short and simple URL.</li>
</ul>
<p>While this implementation seems a little too good to be true, there are a few potential downsides. The first is that if you implement it incorrectly, the search engines will simply ignore it, and that could be complicated to debug. The other issue is that it fixes issues #1 (duplicate content) and #2 (ranking) but does nothing to fix the 3rd issue of reporting. Still, given all of that I would likely implement this option first and do the others when I had some spare dev cycles.</p>
<h3>URL Fragment</h3>
<p>A simple and elegant option is to place the tracking parameter behind a hash mark in the URL, creating a URL fragment. Traditionally, fragments are used to denote links within a page, and they are ignored completely by search engines, which simply truncate the fragment from the URL.</p>
<p>Old URL</p>
<ul>
<li>http://janeandrobot.com/?from=promo-seminar-1</li>
<li>http://janeandrobot.com/?from=promo-seminar-2</li>
</ul>
<p>New URL with URL Fragment</p>
<ul>
<li>http://janeandrobot.com/#from=promo-seminar-1</li>
<li>http://janeandrobot.com/#from=promo-seminar-2</li>
</ul>
<p>By default, Google Analytics will ignore the fragment as well; however, there is a simple workaround that was provided to us by <a href="http://www.kaushik.net/avinash/" onclick="pageTracker._trackPageview('/outgoing/www.kaushik.net/avinash/?referer=');">Avinash Kaushik</a>, Google&#8217;s web metrics evangelist. Use the following JavaScript:</p>
<div>
<pre>var pageTracker = _gat._getTracker("UA-12345-1");</pre>
<pre>// Solution for domain level only</pre>
<pre>pageTracker._trackPageview(document.location.pathname + "/" + document.location.hash);</pre>
<pre>// If you have a query string included in the URL as well</pre>
<pre>pageTracker._trackPageview(document.location.pathname + document.location.search +</pre>
<pre>                           "/" + document.location.hash);</pre>
</div>
<p>You can create a few additional variations of this if you also have additional queries in the URL you would like to track. Check with your web analytics provider to find out if you need to customize your implementation to account for using URL fragments for tracking.</p>
<p>Does this sound too simple and easy to be true? There are a couple of downsides to this approach:</p>
<ul>
<li>This option fixes issues 1 (duplicate content) &amp; 2 (ranking) listed above, but it will not address the 3rd issue of reporting. You could still encounter some reporting issues using this method if people are bookmarking or emailing around the URL.</li>
<li>Typically you&#8217;ll have to write some custom code to parse the URL fragment. Since it&#8217;s a non-standard implementation, standard methods may not support this.</li>
</ul>
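<p>The custom parsing mentioned above can be fairly small. Here is a minimal sketch, assuming the fragment uses the same key=value format as the examples in this section; the function name <code>parseFragmentParams</code> is hypothetical, and it relies on the standard <code>URLSearchParams</code> class to do the decoding.</p>

```javascript
// Hypothetical helper: read key=value pairs from a URL fragment such as
// "#from=promo-seminar-1". Returns a plain object of decoded values.
function parseFragmentParams(hash) {
  // Drop the leading "#" if present; URLSearchParams does not do this for us.
  const raw = hash.startsWith("#") ? hash.slice(1) : hash;
  const params = {};
  for (const [key, value] of new URLSearchParams(raw)) {
    params[key] = value;
  }
  return params;
}

// In a browser this would typically be called with document.location.hash.
const example = parseFragmentParams("#from=promo-seminar-1");
console.log(example.from); // promo-seminar-1
```

<p>If the same key appears more than once, this sketch keeps only the last value, which may or may not suit your reporting.</p>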
<h3>Robots Exclusion Protocol</h3>
<p>Another relatively simple solution is to use robots.txt to ensure that search engines are not indexing URLs that contain tracking parameters. This method enables you to ensure that the original (canonical) version of the URL is always the one indexed and avoids the duplicate content issues involving indexing and bandwidth.</p>
<p>Assuming that all of our tracking parameters will follow a similar pattern to this:</p>
<p>http://janeandrobot.com/?from=&lt;PromoID&gt;</p>
<p>we can easily create a pattern that matches it. Below is a robots.txt file that implements the pattern:</p>
<div>
<pre># Sample Robots.txt file, single query parameter</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /?from=</pre>
</div>
<p>The User-agent line means that this rule applies to all search engines (or robots crawling your site), and the Disallow line tells them that they can&#8217;t crawl any URLs that start with &#8216;janeandrobot.com/?from=&#8217; followed by a promotional code of any length. See complete information on using the <a href="/post/Managing-Robots-Access-To-Your-Website.aspx">Robots Exclusion Protocol</a>. Use this pattern if you will have multiple query parameters:</p>
<div>
<pre># Sample Robots.txt file, multiple query parameters</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*from=</pre>
</div>
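<p>To get a feel for how the two patterns above behave, here is a rough local approximation. It is only a sketch: it assumes the wildcard rules the major engines document (&#8220;*&#8221; matches any run of characters, a trailing &#8220;$&#8221; anchors the end, and everything else is a prefix match), it ignores Allow rules and per-robot groups, and the function names are hypothetical. It is not a substitute for testing against a real robots.txt parser.</p>

```javascript
// Hypothetical helper: convert a robots.txt path pattern into a RegExp.
function patternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    // Escape regex metacharacters in the literal pieces between wildcards.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Returns true when pathAndQuery (for example "/?from=promo-seminar-1")
// matches at least one non-empty Disallow pattern.
function isDisallowed(pathAndQuery, disallowPatterns) {
  return disallowPatterns
    .filter((p) => p !== "")
    .some((p) => patternToRegExp(p).test(pathAndQuery));
}

console.log(isDisallowed("/?from=promo-seminar-1", ["/?from="])); // true
console.log(isDisallowed("/", ["/?from="])); // false
console.log(isDisallowed("/videos/?from=promo-seminar-1", ["/*from="])); // true
```

<p>Note that the broader pattern also matches any URL that merely contains &#8220;from=&#8221;, such as a parameter named &#8220;refrom&#8221;, so it is worth testing it against URLs you want to keep indexed.</p>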
<p>Once you&#8217;ve implemented the pattern appropriate for your site, you can easily check to see if it is working correctly by using the <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35237" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35237&amp;referer=');">Google Webmaster Tools robots.txt analysis tool</a>. It enables you to test specific URLs against a test robots.txt file. Note that although this tool tests Googlebot specifically, all the major search engines <a href="http://searchengineland.com/yahoo-google-microsoft-clarify-robotstxt-support-14125.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/yahoo-google-microsoft-clarify-robotstxt-support-14125.php?referer=');">support the same pattern matching rules</a>. In <a href="https://www.google.com/webmasters/tools" onclick="pageTracker._trackPageview('/outgoing/www.google.com/webmasters/tools?referer=');">Google Webmaster Tools</a>:</p>
<ol>
<li>Add the site, then click Tools &gt; Analyze robots.txt. (Unlike most features in Google Webmaster Tools, you don&#8217;t need to verify ownership of the site to use the robots.txt analysis tool). The tool displays the current robots.txt file.</li>
<li>Modify this file with the Disallow line for the tracking parameter. (If the site doesn&#8217;t yet have a robots.txt file, you&#8217;ll need to copy in both the User-agent and Disallow lines.)</li>
<li>In the Test URLs box, add a couple of the URLs you want to block. Also add a few URLs you do want indexed (such as the original version of the URL that you&#8217;re adding tracking parameters to).</li>
<li>Click Check. The tool displays how Googlebot would interpret the robots.txt file and if each URL you are testing would be blocked or allowed.</li>
</ol>
<p>At this point you may be thinking, wow, I can do all this and not have to write any new code? Unfortunately, there are even more downsides to this approach than the others:</p>
<ul>
<li>This option will fix issue 1 (duplicate content), but not issues 2 (ranking) and 3 (reporting). This can be a good interim solution while you&#8217;re implementing the more complete redirects solution, but it often isn&#8217;t useful enough on its own.</li>
<li>Likely this will take a little bit of extra testing to ensure you get the patterns correct in your robots.txt file and don&#8217;t inadvertently block content you want indexed.</li>
</ul>
<h3>Yahoo Site Explorer</h3>
<p>Yahoo provides an online tool designed to solve this scenario. However, the solution only helps with Yahoo search traffic. To use the Yahoo fix, simply go to <a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');">http://siteexplorer.search.yahoo.com</a> and create an account for your web site in the Yahoo Site Explorer tool. Once you&#8217;ve verified ownership of your web site, you can use their <a href="http://www.ysearchblog.com/archives/000479.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000479.html?referer=');">Dynamic URL Rewriting</a> tool to indicate which parameters in your URLs Yahoo should ignore.</p>
<p><a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');"><img title="yahoo-url-rewriting-tool" src="http://janeandrobot.com/wp-content/uploads/2008/11/yahoo2.png" alt="Yahoo URL Rewriting Tool" /></a></p>
<p>Simply specify the name of the parameter you use for referral tracking (in our example it is &#8216;from&#8217;), and set the action &#8216;Remove from URLs&#8217;. Yahoo will then remove that parameter from all of your URLs while processing them and give you a handy little report about how many URLs were impacted.</p>
<p>This is another solution that seems too easy to be true, and again, there are some significant limitations with this approach:</p>
<ul>
<li>At the end of the day this is still a Yahoo-only solution. With approximately 20% market share, it is likely this will not meet all of your needs. However, if you do get some percentage of your traffic from Yahoo, there is no harm in doing this in the short term while you implement another method in the longer term.</li>
<li>The other problem with this solution is that it doesn&#8217;t solve issue #3 (reporting), so you are still susceptible to reporting errors due to folks bookmarking and emailing your URLs with tracking codes.</li>
</ul>
<h2>Common Pitfalls</h2>
<h3>Cloaking &amp; Conditional Redirects</h3>
<p>Some web sites and SEO consultants attempt to solve this by a technique called cloaking or conditional redirects. Essentially what these methods do is check if the HTTP GET request is coming from a search engine and then show them something different than normal users see. This something different could be a simple 301 redirect back to the page without the tracking parameter similar to our first solution above. The difference is that our solution implemented this redirect for all requesters, and cloaking/ conditional redirects implement it only for search engines.</p>
<p>The big problem with this implementation method is that cloaking and conditional redirects are explicitly prohibited in the webmaster guidelines for <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=66355" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=66355&amp;referer=');">Google</a>, <a href="http://help.yahoo.com/l/us/yahoo/search/basics/basics-18.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/basics/basics-18.html?referer=');">Yahoo</a> and <a href="http://help.live.com/help.aspx?mkt=en-us&amp;project=wl_webmasters" onclick="pageTracker._trackPageview('/outgoing/help.live.com/help.aspx?mkt=en-us_amp_project=wl_webmasters&amp;referer=');">Live Search</a>. If you use this method, you risk your pages being penalized or banned by the search engines. The primary reason they prohibit these behaviors is that they want to know exactly what content they are presenting to searchers using their service. When a web site shows something different to a search engine robot than to a general user, a search engine can never be sure what the user will see when they go to the web site. So, even if you&#8217;re thinking of implementing cloaking for what seems to be a <a href="http://www.ninebyblue.com/blog/whats-really-black-hat-anyway/">valid, and not deceptive, reason</a>, it&#8217;s still a technique search engines strongly discourage.</p>
<p>This leads to the second major problem with this implementation method &#8211; it adds significant complication, and it can be difficult to monitor whether or not it&#8217;s working &#8211; e.g. you have to test it while pretending to be each of the three search engines&#8217; robots. When things go wrong, it is likely that you&#8217;re not going to see it right away, and by the time you do, your search engine traffic may already be impacted. Check out this example of when Nike ran into an <a href="http://www.vabeachkevin.com/nikecom-pay-attention-googlebot-cloaking-broken/" onclick="pageTracker._trackPageview('/outgoing/www.vabeachkevin.com/nikecom-pay-attention-googlebot-cloaking-broken/?referer=');">issue with cloaking</a>.</p>
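<p>For contrast, a redirect that treats every requester the same needs no user-agent check at all. The sketch below is one way that decision could look; the function name <code>trackingRedirect</code> and the returned object shape are assumptions for illustration, and wiring it into a real web server is left out.</p>

```javascript
// Hypothetical helper: decide whether a request should be redirected.
// It deliberately takes no user-agent argument, so robots and people
// get the same answer. The parameter name "from" follows the examples above.
function trackingRedirect(requestUrl, trackingParam = "from") {
  const url = new URL(requestUrl);
  if (!url.searchParams.has(trackingParam)) {
    return null; // nothing to strip, serve the page normally
  }
  // A real handler would record the tracking value here before redirecting.
  url.searchParams.delete(trackingParam);
  return { status: 301, location: url.href };
}

const result = trackingRedirect("http://janeandrobot.com/?from=promo-seminar-1");
console.log(result.status, result.location); // 301 http://janeandrobot.com/
```

<p>Because the function never looks at who is asking, there is only one code path to test, which avoids the monitoring problem described above.</p>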
<h3>Crazy Tracking Codes</h3>
<p>Many studies on the web show that <a href="http://www.marketingsherpa.com/article.php?ident=30181" onclick="pageTracker._trackPageview('/outgoing/www.marketingsherpa.com/article.php?ident=30181&amp;referer=');">customers prefer short, understandable URLs</a> over long complicated ones, and are more likely to click on them in the search results. In addition, users prefer descriptive keywords in URLs. Therefore, it might be worth your time to spend a few extra minutes thinking about the tracking codes you use to see if you can make them friendlier.</p>
<p>Good examples</p>
<ul>
<li>?from=promo</li>
<li>?from=developer-video</li>
<li>?partner=a768sdf129</li>
</ul>
<p>Bad examples</p>
<ul>
<li>?i=A768SDF129,re23ADFA,style-23423,date-2008-02-01&amp;page=2</li>
<li>?IAmSpyingOnYou=a768sdf129&amp;YouAreASucker=re23adfd</li>
</ul>
<h2>Testing Your Implementation</h2>
<p>So you&#8217;ve implemented your new favorite method, it compiles on your dev box, and now it&#8217;s time to roll it into production, right? Maybe not! The initial goal of referrer URL-based tracking was to understand where your traffic was coming from so you can use that information to optimize your business. To ensure the data you&#8217;re collecting is actually useful, we highly recommend that you do some testing to ensure that all the common scenarios are working the way you expect, and that you know where the holes are in your measurement capabilities. As with all metrics on the web, there will be holes in your data, so you need to know what they are and account for them.</p>
<p>The first step in testing the implementation is to try it with a test parameter, walking the full scenario through start to finish.</p>
<ol>
<li>Create several phoney promotional links that reflect the actual types of links you expect. This could be on your home page, product pages or with many additional query parameters that you might encounter.</li>
<li>Place these fake promotional links in a location that won&#8217;t confuse your customers but is likely to get indexed by search engines. Using a social networking site or a blog might serve this well.</li>
<li>Click through those links as a customer and verify that you get to the correct page with a good user experience. Be sure to take these into account as well:
<ul>
<li>Redirects operating properly (if you&#8217;re using them) - use the <a href="https://addons.mozilla.org/en-US/firefox/addon/3829" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/3829?referer=');">Live HTTP Headers</a> tool in Firefox to ensure the application is providing the correct headers (301 redirect and caching).</li>
<li>Major browsers all work - if you&#8217;re using cookies, you should test all the major browsers to ensure that they support cookies and that your scenario works the way you expect. Don&#8217;t forget to try common mobile browsers if your customers access your site this way.</li>
</ul>
</li>
<li>Check out the search engine experience to ensure that you&#8217;re not running into the duplicate content or ranking issues.
<ul>
<li>Major Engines submit URL - if you place the test URLs in the right social network or place on your blog, they should get indexed within a week or so. If they don&#8217;t, you can also try the &#8220;submit a URL&#8221; tools from <a href="http://www.google.com/addurl/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/addurl/?referer=');">Google</a>, <a href="http://siteexplorer.search.yahoo.com/submit" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/submit?referer=');">Yahoo</a> and <a href="http://search.msn.com.sg/docs/submit.aspx" onclick="pageTracker._trackPageview('/outgoing/search.msn.com.sg/docs/submit.aspx?referer=');">Microsoft</a>, though they are not guaranteed to work. Essentially you want to make sure the search engines have had the opportunity to see these URLs.</li>
<li>Use &#8216;site:&#8217; command to ensure tracking URLs are not indexed - here&#8217;s an example query in <a href="/admin/Pages/site:janeandrobot.com%20inurl:from">Google</a>, <a href="http://siteexplorer.search.yahoo.com/search?p=http%3A%2F%2Fjaneandrobot.com&amp;fr=sfp" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/search?p=http_3A_2F_2Fjaneandrobot.com_amp_fr=sfp&amp;referer=');">Yahoo</a>, and <a href="http://search.live.com/results.aspx?q=site%3Ajaneandrobot.com&amp;first=1&amp;FORM=PERE" onclick="pageTracker._trackPageview('/outgoing/search.live.com/results.aspx?q=site_3Ajaneandrobot.com_amp_first=1_amp_FORM=PERE&amp;referer=');">Microsoft</a> showing that our Jane and Robot example promotional URLs are not indexed.</li>
</ul>
</li>
<li>Take a look at your metrics and ensure the numbers you&#8217;re recording correlate to the testing you are doing. Some additional things to consider:
<ul>
<li><span style="text-decoration: underline;">Internal referrals </span>- you might also want to add some logic to your application to filter out (or exclude) all referrals from the development team and your own employees. This is often done by checking requests against a list of known employee or company IP addresses and scrubbing those from your tracking data.</li>
<li><span style="text-decoration: underline;">Caching Issues </span>- you might also want to try out several scenarios with multiple subsequent requests. You&#8217;ll want to ensure that every request is going to your server and not getting cached somewhere along the way.</li>
</ul>
</li>
</ol>
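<p>The internal-referral filtering described in the last step can be sketched as follows. This is only an illustration: the IP addresses are documentation placeholders, and the hit shape (an object with <code>ip</code> and <code>url</code> fields) and both function names are assumptions rather than the format of any particular analytics package.</p>

```javascript
// Hypothetical list of office addresses; replace with your own.
const INTERNAL_IPS = new Set(["203.0.113.10", "203.0.113.11"]);

// Hypothetical hit shape: { ip: string, url: string }.
// Returns only hits that did not come from a known internal address.
function excludeInternalHits(hits, internalIps = INTERNAL_IPS) {
  return hits.filter((hit) => !internalIps.has(hit.ip));
}

// Tally the remaining hits by the value of the "from" tracking parameter.
function countByPromo(hits) {
  const counts = {};
  for (const hit of excludeInternalHits(hits)) {
    const promo = new URL(hit.url).searchParams.get("from");
    if (promo !== null) {
      counts[promo] = (counts[promo] || 0) + 1;
    }
  }
  return counts;
}

const sampleHits = [
  { ip: "198.51.100.7", url: "http://janeandrobot.com/?from=promo-seminar-1" },
  { ip: "203.0.113.10", url: "http://janeandrobot.com/?from=promo-seminar-1" },
  { ip: "198.51.100.8", url: "http://janeandrobot.com/?from=promo-seminar-2" },
];
console.log(countByPromo(sampleHits));
```

<p>Comparing these filtered counts with the number of test clicks you made from outside the office is a simple way to confirm that the numbers correlate.</p>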
<h2>Related Resources</h2>
<ul>
<li>Related Internet Standards:
<ul>
<li><a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9" onclick="pageTracker._trackPageview('/outgoing/www.w3.org/Protocols/rfc2616/rfc2616-sec14.html_sec14.9?referer=');">W3C Standard For Cache-Control Header</a></li>
<li><a href="http://www.apps.ietf.org/rfc/rfc2396.html" onclick="pageTracker._trackPageview('/outgoing/www.apps.ietf.org/rfc/rfc2396.html?referer=');">URI Specification</a> (how URLs work)</li>
</ul>
</li>
<li>Tools Used in Article:
<ul>
<li><a href="http://google.com/webmasters/tools" onclick="pageTracker._trackPageview('/outgoing/google.com/webmasters/tools?referer=');">Google Webmaster Tools</a> (Robots.txt Tester)</li>
<li><a href="http://siteexplorer.search.yahoo.com/" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/?referer=');">Yahoo Site Explorer</a> (Dynamic URL Rewriting)</li>
<li><a href="https://addons.mozilla.org/en-US/firefox/addon/3829" onclick="pageTracker._trackPageview('/outgoing/addons.mozilla.org/en-US/firefox/addon/3829?referer=');">Live HTTP Headers</a> (View HTTP Headers)</li>
<li>Suggest URL Tool - <a href="http://www.google.com/addurl/" onclick="pageTracker._trackPageview('/outgoing/www.google.com/addurl/?referer=');">Google</a>, <a href="http://siteexplorer.search.yahoo.com/submit" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com/submit?referer=');">Yahoo</a>, <a href="http://search.msn.com.sg/docs/submit.aspx" onclick="pageTracker._trackPageview('/outgoing/search.msn.com.sg/docs/submit.aspx?referer=');">Microsoft</a></li>
</ul>
</li>
<li>Canonical Link Tag Standard
<ul>
<li><a href="http://searchengineland.com/canonical-tag-16537" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/canonical-tag-16537?referer=');">Search Engine Land Article</a> (Best Practices)</li>
<li><a href="http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html?referer=');">Google Announcement Blog Post</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=139394" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=139394&amp;referer=');">Google Help Documentation</a></li>
<li><a href="http://ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/" onclick="pageTracker._trackPageview('/outgoing/ysearchblog.com/2009/02/12/fighting-duplication-adding-more-arrows-to-your-quiver/?referer=');">Yahoo Announcement Blog Post</a></li>
<li><a href="http://blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2009/02/12/partnering-to-help-solve-duplicate-content-issues.aspx?referer=');">Live Search Announcement Blog Post</a></li>
</ul>
</li>
<li>Related articles
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=66359" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=66359&amp;referer=');">Duplicate Content &#8211; Google Technical Support</a></li>
<li><a href="http://blogs.omniture.com/2008/10/01/campaign-tracking-inside-omniture-sitecatalyst/" onclick="pageTracker._trackPageview('/outgoing/blogs.omniture.com/2008/10/01/campaign-tracking-inside-omniture-sitecatalyst/?referer=');">URL Tracking in Omniture&#8217;s SiteCatalyst</a></li>
<li><a href="http://www.google.com/support/googleanalytics/bin/answer.py?answer=55515" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/googleanalytics/bin/answer.py?answer=55515&amp;referer=');">Goal Tracking in Google Analytics</a></li>
</ul>
</li>
<li><a href="http://www.kaushik.net/avinash" onclick="pageTracker._trackPageview('/outgoing/www.kaushik.net/avinash?referer=');">Occam’s Razor by Avinash Kaushik</a></li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/url-referrer-tracking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Managing Robot’s Access To Your Website</title>
		<link>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/</link>
		<comments>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 03:06:10 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[SEO]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1347</guid>
		<description><![CDATA[Note: This post was originally posted on Jane and Robot in June 2008 and is being temporarily stored here.

Controlling what content is blocked from being found in search engines is crucial for many websites. Fortunately, the major search engines and other well-behaved robots observe the Robots Exclusion Protocol (REP), which has evolved organically since the early [...]]]></description>
			<content:encoded><![CDATA[<p><strong><em>Note: </em></strong><em>This post was originally posted on Jane and Robot in June 2008 and is being temporarily stored here.</em></p>
<div>
<p>Controlling what content is blocked from being found in search engines is crucial for many websites. Fortunately, the major search engines and other well-behaved robots observe the <a href="http://www.robotstxt.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/?referer=');">Robots Exclusion Protocol</a> (REP), which has evolved organically since the early 1990s to provide a set of controls over what parts of a web site search engine robots can crawl and index.</p>
<p>Article Sections:</p>
<ul>
<li><a href="#Capabilities_of_the_REP">Capabilities of REP</a></li>
<li><a href="#Deciding_what_should_be_Public_vs._Private">Deciding What Should be Public vs. Private</a></li>
<li><a href="#Implementing_the_REP">Implementing the REP</a>
<ul>
<li><a href="#Site_Level_Implementation_(Robots.txt)">Site Level</a></li>
<li><a href="#Page_Level_Implementation_(META_Tags)">Page Level (Meta Tags)</a></li>
<li><a href="#HTTP_Header_Implementation_(X-ROBOTS-Tag)">Page Level (HTTP Header)</a></li>
<li><a href="#Content_Level_Implementation">Content Level</a></li>
</ul>
</li>
<li><a href="#Common_implementation_mistakes">Common Mistakes</a></li>
<li><a href="#Testing_your_implementation_">Testing Your Implementation</a></li>
<li><a href="#removal">Removing Content From Search Engine Indices</a></li>
<li><a href="#Additional_Resources:_">Additional Resources</a></li>
</ul>
<h2><a name="Capabilities_of_the_REP"></a>Capabilities of the REP</h2>
<p>The Robots Exclusion Protocol provides controls that can be applied at the site level (robots.txt), at the page level (META tag, or X-Robots-Tag), or at the HTML element level to control both the crawl of your site and the way it&#8217;s listed in the search engine results pages (SERPs). Below is a table listing the common scenarios, directives, and which search engines support them.</p>
<table border="1" cellspacing="0" cellpadding="2">
<tbody>
<tr>
<td valign="top">Use Case</td>
<td valign="top">Robots.txt</td>
<td valign="top">META/ X-Robots-Tag</td>
<td valign="top">Other</td>
<td valign="top">Supported By</td>
</tr>
<tr>
<td valign="top">Allow access to your content</td>
<td valign="top">Allow</td>
<td valign="top">FOLLOW<br />
INDEX</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow access to your content</td>
<td valign="top">Disallow</td>
<td valign="top">NOINDEX<br />
NOFOLLOW</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=35303" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=35303&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow access to index images on the page</td>
<td valign="top"></td>
<td valign="top">NOIMAGEINDEX</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=79892" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=79892&amp;referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Disallow the display of a cached version of your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOARCHIVE</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35306=" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35306=&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/deletion/basics-10.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/deletion/basics-10.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow the creation of a description for this content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOSNIPPET</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35304" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35304&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000587.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000587.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Disallow the translation of your content into other languages</td>
<td valign="top"></td>
<td valign="top">NOTRANSLATE</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/help/faq_translation.html#donttrans" onclick="pageTracker._trackPageview('/outgoing/www.google.com/help/faq_translation.html_donttrans?referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Do not follow or give weight to links within this content</td>
<td valign="top"></td>
<td valign="top">NOFOLLOW</td>
<td valign="top">a href attribute:<br />
rel=NOFOLLOW</td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=96569" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=96569&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000069.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000069.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2005/01/18/nofollow_tags.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2005/01/18/nofollow_tags.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Do not use the <a href="http://www.dmoz.org/" target="_blank" onclick="pageTracker._trackPageview('/outgoing/www.dmoz.org/?referer=');">Open Directory Project</a> (ODP) to create descriptions for your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOODP</td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35264" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35264&amp;referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/indexing/indexing-11.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Do not use the Yahoo Directory to create descriptions for your content in the SERP</td>
<td valign="top"></td>
<td valign="top">NOYDIR</td>
<td valign="top"></td>
<td valign="top"><a href="http://blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/06/03/robots-exclusion-protocol-joining-together-to-provide-better-documentation.aspx?referer=');">Yahoo</a></td>
</tr>
<tr>
<td valign="top">Do not index this specific element within an HTML page</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top">class=robots-nocontent</td>
<td valign="top"><a href="http://www.ysearchblog.com/archives/000444.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000444.html?referer=');">Yahoo</a></td>
</tr>
<tr>
<td valign="top">Stop indexing this content after a specific date</td>
<td valign="top"></td>
<td valign="top">UNAVAILABLE_AFTER</td>
<td valign="top"></td>
<td valign="top"><a href="http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html" onclick="pageTracker._trackPageview('/outgoing/googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html?referer=');">Google</a></td>
</tr>
<tr>
<td valign="top">Disallow the creation of enhanced captions</td>
<td valign="top"></td>
<td valign="top">NOPREVIEW</td>
<td valign="top"></td>
<td valign="top"><a href="http://bing.com/community" onclick="pageTracker._trackPageview('/outgoing/bing.com/community?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Specify a sitemap file or a sitemap index file</td>
<td valign="top">Sitemap</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top"><a href="http://www.google.com/support/webmasters/bin/answer.py?hl=en&amp;answer=64748" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?hl=en_amp_answer=64748&amp;referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000437.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000437.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2007/04/11/discovering-sitemaps.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2007/04/11/discovering-sitemaps.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Specify how frequently a crawler may access your website</td>
<td valign="top">Crawl-Delay</td>
<td valign="top"></td>
<td valign="top"><a href="http://google.com/webmaster" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Google WMT</a></td>
<td valign="top"><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/webmaster/archive/2008/04/18/ramping-up-msnbot.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/webmaster/archive/2008/04/18/ramping-up-msnbot.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Authenticate the identity of the crawler</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top">Reverse DNS Lookup</td>
<td valign="top"><a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html?referer=');">Google</a><br />
<a href="http://www.ysearchblog.com/archives/000460.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000460.html?referer=');">Yahoo</a><br />
<a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a></td>
</tr>
<tr>
<td valign="top">Request removal of your content from the engine&#8217;s index</td>
<td valign="top"></td>
<td valign="top"></td>
<td valign="top"><a href="http://google.com/webmaster" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Google WMT</a><br />
<a href="http://siteexplorer.search.yahoo.com" onclick="pageTracker._trackPageview('/outgoing/siteexplorer.search.yahoo.com?referer=');">Yahoo SE</a><br />
<a href="http://webmaster.live.com/" onclick="pageTracker._trackPageview('/outgoing/webmaster.live.com/?referer=');">Microsoft WMT</a></td>
<td valign="top"><a href="http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html?referer=');">Google</a><br />
<a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/?referer=');">Yahoo</a><br />
Microsoft</td>
</tr>
</tbody>
</table>
<h2><a name="Deciding_what_should_be_Public_vs._Private"></a>Deciding What Should be Public vs. Private</h2>
<p>One of the first steps in managing the robots is knowing what type of content should be public vs. private. Start with the assumption that by default, everything is public, then explicitly identify the items that are private.</p>
<p>If you want search engines to access all the content on your site, you don&#8217;t need a robots.txt file at all. When a search engine tries to access the robots.txt file on your site and the server can&#8217;t return one (ideally by returning a 404 HTTP status code), the search engine treats this the same as a robots.txt file that allows access to everything.</p>
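<p>As a rough illustration of that default, the Python sketch below feeds an empty rule set to the standard library&#8217;s <code>urllib.robotparser</code> module and checks two URLs. The URLs and robot names are placeholders, and this shows how one parser treats the absence of rules, not how any particular search engine is implemented.</p>

```python
from urllib.robotparser import RobotFileParser

# Simulate a site with no robots.txt rules by parsing zero lines.
parser = RobotFileParser()
parser.parse([])

# With no rules on file, every URL is treated as crawlable.
home_allowed = parser.can_fetch("Googlebot", "http://www.example.com/")
deep_allowed = parser.can_fetch("Slurp", "http://www.example.com/private/report.html")
print(home_allowed, deep_allowed)
```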
<p>Every website and every business has a different set of needs, so there&#8217;s no blanket rule for what to make private, but some common elements may apply.</p>
<ul>
<li><strong>Private data -</strong> You may have content on your site that you don&#8217;t want to be searchable in search engines. For instance, you may have private user information (such as addresses) that you don&#8217;t want surfaced. For this type of content, you may want to use a more secure approach that keeps all visitors from the pages (such as password protection). However, some types of content are fine for visitor access, but not search engine access. For instance, you may run a discussion forum that is open for public viewing, but you may not want individual posts to appear in search results for forum member names.</li>
<li><a name="noncontent"></a><strong>Non-content content </strong>- Some content, like <a href="/post/Effectively-Using-Images.aspx#noncontent">images used for navigation</a>, provides little value to searchers. It&#8217;s not harmful to include these items in search engine indices, but since search engines allocate limited bandwidth to crawl each site and limited space to store content from each site, it may make sense to block these items to help direct the bots to the content on your site that you do want indexed.</li>
<li><strong>Printer-friendly pages -</strong> If you have specific pages (URLs) that are formatted for printing, you may want to block them to avoid duplicate content issues. The drawback to allowing the printer-friendly page to be indexed is that it could potentially be listed in the search results instead of the default version of the page, which wouldn&#8217;t provide an ideal user experience for a visitor coming to the site through search.</li>
<li><strong>Affiliate links and advertising -</strong> If you include advertising on your site, you can keep search engine robots from following the links by redirecting them to a blocked page, then on to the destination page. (There are other methods for implementing advertising-based links as well.)</li>
<li><strong>Landing pages -</strong> Your site may include multiple variations of entry pages used for advertising purposes. For instance, you may run AdWords campaigns that link to a particular version of a page based on the ad, or you may print different URLs for different print ad campaigns (either for tracking purposes or to provide a custom experience related to the ad). Since these pages are meant to be an extension of the ad, and are generally near duplicates of the default version of the page, you may want to block these landing pages from being indexed.</li>
<li><strong>Experimental pages -</strong> As you try new ideas on your site (for instance, using A/B testing), you likely want to block all but the original page from being indexed during the experiment.</li>
</ul>
<h2><a name="Implementing_the_REP"></a>Implementing the REP</h2>
<p>REP is flexible and can be implemented a number of ways. This flexibility lets you easily specify some policies for your entire site (or subdomain) and then enhance them more granularly at the page or link level as needed.</p>
<h3><a name="Site_Level_Implementation_(Robots.txt)"></a>Site Level Implementation (Robots.txt)</h3>
<p>Site wide directives are stored in a robots.txt file, which must be located in the root directory of each domain or sub-domain (e.g. <a href="/robots.txt">http://janeandrobot.com/robots.txt</a>.) Note that robots.txt files only apply to the hostname where they are placed, and do not apply to subdomains. So a robots.txt file located on <a href="http://microsoft.com/robots.txt" onclick="pageTracker._trackPageview('/outgoing/microsoft.com/robots.txt?referer=');">http://microsoft.com/robots.txt</a> will not apply to the MSDN subdomain <a href="http://msdn.microsoft.com/" onclick="pageTracker._trackPageview('/outgoing/msdn.microsoft.com/?referer=');">http://msdn.microsoft.com</a>. However, the robots.txt file does apply to all subfolders and pages within the specified hostname.</p>
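<p>To make the hostname rule concrete, here is a minimal Python sketch that works out which robots.txt file governs a given page. The helper name <code>robots_txt_url</code> and the sample page URLs are made up for illustration.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme and hostname)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page on the bare domain and a page on a subdomain map to different files.
print(robots_txt_url("http://microsoft.com/products/index.html"))
print(robots_txt_url("http://msdn.microsoft.com/en-us/library/"))
```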
<p>A robots.txt file is a UTF-8 encoded file that contains entries that consist of a user-agent line (that tells the search engine robot if the entry is directed at it) and one or more directives that specify content that the search engine robot is blocked from crawling or indexing. A simple robots.txt file is shown below.</p>
<div>
<pre>User-agent: *</pre>
<pre>Disallow: /private</pre>
</div>
<p><code>user-agent:</code> &#8211; Specifies which robots the entry applies to.</p>
<ul>
<li>Set this to <code>*</code> to specify that this entry applies to all search engine robots.</li>
<li>Set this to a specific robot name to provide instructions for just that robot. You can find a complete list of robot names at <a href="http://www.robotstxt.org" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org?referer=');">robotstxt.org</a>.</li>
<li>If you direct an entry at a particular robot, then it obeys that entry instead of any entries defined for <code>user-agent: * </code>(rather than in addition to those entries).</li>
</ul>
<p>The major search engines have multiple robots that crawl the web for different types of content (such as images or mobile). They generally begin all robots with the same name so that if you block the major robot, all robots for that search engine are blocked as well. However, if you want to block only the more specific robot, you can block it directly and still allow web crawl access.</p>
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a> &#8211; The primary search engine robot is Googlebot.</li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html?referer=');">Yahoo!</a> &#8211; The primary search engine robot is Slurp.</li>
<li><a href="http://www.bing.com/community/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx" onclick="pageTracker._trackPageview('/outgoing/www.bing.com/community/blogs/webmaster/archive/2010/06/28/bing-crawler-bingbot-on-the-horizon.aspx?referer=');">Bing</a> &#8211; The primary search engine robot is Bingbot. (The previous name for this bot was MSNbot, and Bing continues to obey directives aimed at that bot as well.)</li>
</ul>
<p><code>Disallow: </code>- Specifies what content is blocked.</p>
<ul>
<li>Must begin with a slash (<code>/</code>).</li>
<li>Blocks access to any URLs that begin with the characters after the <code>/</code>. For instance, <code>Disallow: /images</code> blocks access to <code>/images/</code>, <code>/images/image1.jpg</code>, and <code>/images10</code>.</li>
</ul>
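<p>The prefix behavior of <code>Disallow</code> can be sketched in a few lines of Python. The function name <code>is_blocked</code> and the extra test paths are hypothetical. The sketch only models simple prefix matching and does not handle wildcards.</p>

```python
def is_blocked(path, disallow_value):
    """Return True when path starts with the Disallow value (case-sensitive)."""
    return path.startswith(disallow_value)


# "Disallow: /images" blocks anything that begins with those characters.
for path in ["/images/", "/images/image1.jpg", "/images10", "/Images/logo.gif", "/img/"]:
    print(path, is_blocked(path, "/images"))
```

<p>The first three paths are blocked. The last two are not, because they do not begin with exactly <code>/images</code>.</p>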
<p>You can specify other rules for search engine robots in addition to the standard instructions that block access to content as noted in <a href="#other">other robot instructions</a>.</p>
<p>Some things to note about robots.txt implementation:</p>
<ul>
<li>The major search engines support pattern matching using the asterisk character (*) for wildcard match and the dollar sign ($) for end of sequence matching as described below in <a href="#patterns">using pattern matching</a>.</li>
<li>The robots.txt file is case sensitive, so <code>Disallow: /images </code>would block <code>http://www.example.com/images</code> but not <code>http://www.example.com/Images</code>.</li>
<li>If conflicts exist in the file, the robot obeys the longest (and therefore generally more specific) line.</li>
</ul>
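<p>One way to picture the longest-line rule is the Python sketch below. It assumes rules are plain prefixes with no wildcards, and the function name <code>is_allowed</code> and the rule list are illustrative only. Individual robots may resolve conflicts differently.</p>

```python
def is_allowed(path, rules):
    """Decide access from (directive, value) pairs, e.g. ("Disallow", "/images/")."""
    matching = [(directive, value) for directive, value in rules if path.startswith(value)]
    if not matching:
        return True  # nothing matches, so the default is to allow
    # When several rules match, the one with the longest value wins.
    directive, _ = max(matching, key=lambda rule: len(rule[1]))
    return directive.lower() == "allow"


rules = [("Disallow", "/images/"), ("Allow", "/images/image1.jpg")]
print(is_allowed("/images/image1.jpg", rules))  # longer Allow rule wins
print(is_allowed("/images/image2.jpg", rules))  # only the Disallow rule matches
print(is_allowed("/Images/image2.jpg", rules))  # different case, so no rule matches
```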
<h4>Basic Samples</h4>
<p>Block all robots - Useful when your site is in pre-launch development and isn&#8217;t ready for search traffic.</p>
<div>
<pre># This keeps out all well-behaved robots.</pre>
<pre># Disallow: * is not valid.</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
</div>
<p>Keep out all bots by default - Blocks all pages except those specified. Not recommended, as it is difficult to maintain and diagnose.</p>
<div>
<pre># Stay out unless otherwise stated</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
<pre>Allow: /Public/</pre>
<pre>Allow: /articles/</pre>
<pre>Allow: /images/</pre>
</div>
<p>Block specific content - The most common usage of robots.txt.</p>
<div>
<pre># Block access to the images folder</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /images/</pre>
</div>
<p><a name="allow"></a>Allow specific content - Block a folder, but allow access to selected pages in that folder.</p>
<div>
<pre># Block everything in the images folder</pre>
<pre># Except allow images/image1.jpg</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /images/</pre>
<pre>Allow: /images/image1.jpg</pre>
</div>
<p><a href="/admin/Pages/patterns"></a>Allow specific robot - Block a class of robots (for instance, Googlebot), but allow a specific bot in that class (for instance, Googlebot-Mobile).</p>
<div>
<pre># Block Googlebot access</pre>
<pre># Allow Googlebot-Mobile access</pre>
<pre>User-agent: Googlebot</pre>
<pre>Disallow: /</pre>
<pre>User-agent: Googlebot-Mobile</pre>
<pre>Allow: /</pre>
</div>
<h4>Pattern Matching Examples</h4>
<p>The major engines support two types of pattern matching.</p>
<ul>
<li>* matches any sequence of characters</li>
<li>$ matches the end of the URL.</li>
</ul>
<p>Block access to URLs that contain a set of characters - Use the asterisk (*) to specify a wildcard.</p>
<div>
<pre># Block access to all URLs that include an ampersand</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*&amp;</pre>
</div>
<p>This directive would block search engines from crawling <code>http://www.example.com/page1.asp?id=5&amp;sessionid=xyz</code>.</p>
<p>Block access to URLs that end with a set of characters - Use the dollar sign ($) to anchor the match to the end of the URL.</p>
<div>
<div>
<pre># Block access to all URLs that end in .cgi</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*.cgi$</pre>
</div>
<p>This directive would block search engines from crawling <code>http://www.example.com/script1.cgi</code> but not from crawling <code>http://www.example.com/script1.cgi?value=1</code>.</p>
<p>Selectively allow access to a URL that matches a blocked pattern - Use the <code>Allow</code> directive in conjunction with pattern matching for more complex implementations.</p>
<div>
<pre># Block access to URLs that contain ?</pre>
<pre># Allow access to URLs that end in ?</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /*?</pre>
<pre>Allow: /*?$</pre>
</div>
<p>That directive blocks all URLs that contain <code>?</code> except those that end in <code>?</code>. In this example, the default version of the page will be indexable:</p>
<ul>
<li><code>http://www.example.com/productlisting.aspx?</code></li>
</ul>
<p>Variations of the page will be blocked:</p>
<ul>
<li><code>http://www.example.com/productlisting.aspx?nav=price</code></li>
<li><code>http://www.example.com/productlisting.aspx?sort=alpha</code></li>
</ul>
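<p>If you want to check patterns like these before deploying them, one approach is to translate each pattern into a regular expression. The Python sketch below is a simplified model of the two wildcard characters described above. The helper names <code>pattern_to_regex</code> and <code>matches</code> are made up, and real crawlers may differ in edge cases.</p>

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Each * becomes "any sequence of characters"; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True when the URL path (including any query string) fits the pattern."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.cgi$", "/script1.cgi"))
print(matches("/*.cgi$", "/script1.cgi?value=1"))
print(matches("/*?", "/productlisting.aspx?nav=price"))
print(matches("/*?$", "/productlisting.aspx?"))
```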
<h4><a name="other"></a>Other robot instructions</h4>
</div>
<p>Specify a Sitemap or Sitemap index file - If you&#8217;d like to provide search engines with a comprehensive list of your best URLs, you can provide one or more <a href="http://sitemaps.org" target="_blank" onclick="pageTracker._trackPageview('/outgoing/sitemaps.org?referer=');">Sitemap</a> autodiscovery directives. Note, user-agent does not apply to this directive so you cannot use this to specify a Sitemap to some but not all search engines.</p>
<div>
<pre># Please take my sitemap and index everything!</pre>
<pre>Sitemap: http://janeandrobot.com/sitemap.axd</pre>
</div>
<p>Reduce the crawling load - This only works with Microsoft and Yahoo. For Google you&#8217;ll need to specify a slower crawling speed through their <a href="http://google.com/webmaster" target="_blank" onclick="pageTracker._trackPageview('/outgoing/google.com/webmaster?referer=');">Webmaster Tools</a>. Be careful when implementing this because if you slow down the crawl too much, robots won&#8217;t be able to get to all of your site and you may lose pages from the index.</p>
<div>
<pre># Bingbot, please wait 5 seconds in between visits</pre>
<pre>User-agent: bingbot</pre>
<pre>Crawl-delay: 5</pre>
<pre># Yahoo's Slurp, please wait 12 seconds in between visits</pre>
<pre>User-agent: slurp</pre>
<pre>Crawl-delay: 12</pre>
</div>
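<p>To sanity-check these extra directives, you can read them back with a parser. The sketch below uses Python&#8217;s standard <code>urllib.robotparser</code> module and assumes Python 3.8 or later, since it relies on <code>site_maps()</code>. It combines the two samples above into one file for convenience, and it reflects how this one parser reads the file, which may not match every crawler.</p>

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
Sitemap: http://janeandrobot.com/sitemap.axd

User-agent: bingbot
Crawl-delay: 5

User-agent: slurp
Crawl-delay: 12
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Sitemap lines are global; crawl delays belong to the matching user-agent entry.
print(parser.site_maps())
print(parser.crawl_delay("bingbot"))
print(parser.crawl_delay("slurp"))
print(parser.crawl_delay("Googlebot"))  # no entry for this robot, so no delay is set
```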
<h3><a name="Page_Level_Implementation_(META_Tags)"></a>Page Level Implementation (META Tags)</h3>
<p>The REP page-level directives allow you to refine the site-wide policies on a page-by-page basis.</p>
<p>Placing a meta tag on the page - Place the meta tag inside the page&#8217;s <code>&lt;head&gt;</code> element. Separate multiple directives with commas inside the <code>content</code> attribute, e.g. <code>&lt;meta name="ROBOTS" content="Directive1, Directive2"&gt;</code>.</p>
<div>
<pre>&lt;html&gt;</pre>
<pre>&lt;head&gt;</pre>
<pre>&lt;title&gt;Your title here&lt;/title&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;/head&gt;</pre>
<pre>&lt;body&gt;Your page here&lt;/body&gt;</pre>
<pre>&lt;/html&gt;</pre>
</div>
<p>Targeting a specific search engine - Within the meta tag you can specify which search engine you would like to target, or you can target them all.</p>
<div>
<pre>&lt;!-- Applies to All Robots --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY GoogleBot --&gt;</pre>
<pre>&lt;meta name="Googlebot" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY Slurp (Yahoo) --&gt;</pre>
<pre>&lt;meta name="Slurp" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- ONLY BingBot (Microsoft) --&gt;</pre>
<pre>&lt;meta name="BingBot" content="NOINDEX"&gt;</pre>
</div>
<p>Control how your listings appear - There is a set of options you can use to determine how your site shows up on the search engine results page (SERP). You can exert some control over how the description is created, and you can remove the &#8220;Cached page&#8221; link.</p>
<p><img title="example-serp" src="http://janeandrobot.com/wp-content/uploads/2008/06/example-serp.gif" alt="example-serp" /></p>
<div>
<pre>&lt;!-- Do not show a description for this page --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOSNIPPET"&gt;</pre>
<pre>&lt;!-- Do not use http://dmoz.org to create a description --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOODP"&gt;</pre>
<pre>&lt;!-- Do not present a cached version of the document in a search result --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOARCHIVE"&gt;</pre>
</div>
<p>Using other directives - Other meta robots directives are shown below.</p>
<div>
<pre>&lt;!-- Do not trust links on this page, could be user generated content (UGC) --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOFOLLOW"&gt;</pre>
<pre>&lt;!-- Do not index this page --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOINDEX"&gt;</pre>
<pre>&lt;!-- Do not index any images on this page (will still index them if they are linked</pre>
<pre>     elsewhere) Better to use Robots.txt if you really want them safe.</pre>
<pre>     This is a Google Only tag. --&gt;</pre>
<pre>&lt;meta name="GOOGLEBOT" content="NOIMAGEINDEX"&gt;</pre>
<pre>&lt;!-- Do not translate this page into other languages--&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="NOTRANSLATE"&gt;</pre>
<pre>&lt;!-- NOT RECOMMENDED, there really isn't much point in using these --&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="FOLLOW"&gt;</pre>
<pre>&lt;meta name="ROBOTS" content="UNAVAILABLE_AFTER"&gt;</pre>
</div>
<h3><a name="HTTP_Header_Implementation_(X-ROBOTS-Tag)"></a>HTTP Header Implementation (X-ROBOTS-Tag)</h3>
<p>Allows developers to specify page-level REP directives for non text/html content types like PDF, DOC, PPT, or dynamically generated images.</p>
<p>Using the X-Robots-Tag - To use the X-Robots-Tag, add it to your HTTP response headers as shown below. To specify multiple directives, you can either separate them with commas or add them as separate header lines.</p>
<div>
<pre>HTTP/1.x 200 OK</pre>
<pre>Cache-Control: private</pre>
<pre>Content-Length: 2199552</pre>
<pre>Content-Type: application/octet-stream</pre>
<pre>Server: Microsoft-IIS/7.0</pre>
<pre>content-disposition: inline; filename=01 - The truth about SEO.ppt</pre>
<pre>X-Robots-Tag: noindex, nosnippet</pre>
<pre>X-Powered-By: ASP.NET</pre>
<pre>Date: Sun, 01 Jun 2008 19:25:47 GMT</pre>
</div>
<p>The X-Robots-Tag directive supports most of the same directives as the meta tag. The only limitation with this method over the meta tag implementation is that there is no way to target a specific robot &#8211; though that probably isn&#8217;t a big deal for most use cases.</p>
<ul>
<li>X-Robots-Tag: noindex</li>
<li>X-Robots-Tag: nosnippet</li>
<li>X-Robots-Tag: notranslate</li>
<li>X-Robots-Tag: noarchive</li>
<li>X-Robots-Tag: unavailable_after: 7 Jul 2007 16:30:00 GMT</li>
</ul>
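<p>If you are checking your own responses, a small helper can normalize the header into a set of directives. The Python sketch below treats a comma-delimited value and separate header lines as equivalent. The function name <code>robots_directives</code> is made up, and the sketch assumes no directive value contains a comma.</p>

```python
def robots_directives(header_values):
    """Collect directives from one or more X-Robots-Tag header values."""
    directives = set()
    for value in header_values:
        # Assumes directive values contain no commas of their own.
        for item in value.split(","):
            item = item.strip().lower()
            if item:
                directives.add(item)
    return directives


combined = robots_directives(["noindex, nosnippet"])
separate = robots_directives(["noindex", "nosnippet"])
print(sorted(combined), combined == separate)
```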
<h3><a name="Content_Level_Implementation"></a>Content Level Implementation</h3>
<p>You can further refine your site level and page level directives within several content tags.</p>
<p>Each anchor tag (link) can be modified to tell search engines that you do not vouch for the page the URL points to. This is typically used for links within user generated content (UGC) like wikis, blog comments, reviews and other community sites.</p>
<div>
<pre>&lt;a href="#" rel="NOFOLLOW"&gt;My Hyperlink&lt;/a&gt;</pre>
</div>
<p>Also, in Yahoo Search you can specify which &lt;div&gt; elements on a page you would not like indexed using the <code>class=robots-nocontent</code> attribute. However, we don&#8217;t recommend relying on this attribute, because no other engine supports it, which limits its usefulness.</p>
<div>
<pre>&lt;div class="robots-nocontent"&gt;</pre>
<pre>No content for you! (or at least Yahoo!)</pre>
<pre>&lt;/div&gt;</pre>
</div>
<h2><a name="Common_implementation_mistakes"></a>Common Mistakes</h2>
<p>While implementing the REP is generally straight-forward, there are a few common mistakes.</p>
<ul>
<li><strong>Googlebot follows the most specific section, ignoring all others -</strong> In the robots.txt file, if you specify a section for all user-agents (<code>user-agent: *</code>) and also declare a section for Googlebot (<code>user-agent: Googlebot</code>), Google will disregard all sections in the robots.txt file except the Googlebot section. This could potentially leave you exposing much more content to Google than you might have thought.</li>
</ul>
<div>
<pre># This keeps out all well-behaved robots</pre>
<pre>User-agent: *</pre>
<pre>Disallow: /</pre>
<pre># This looks like it is giving Google access to only this directory, but since it is a</pre>
<pre># GoogleBot specific section, Google will disregard the previous section</pre>
<pre># and access the whole site.</pre>
<pre>User-agent: Googlebot</pre>
<pre>Allow: /Content_For_Google/</pre>
</div>
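<p>You can reproduce this pitfall locally. The Python sketch below runs the file above through the standard library&#8217;s <code>urllib.robotparser</code>, which also lets a robot-specific section replace the <code>*</code> section. The URL and the name <code>SomeOtherBot</code> are placeholders, and the result shows this parser&#8217;s reading of the file, not a guarantee about any search engine.</p>

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /Content_For_Google/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

private_url = "http://www.example.com/private/page.html"
# Googlebot only reads its own section, which contains no Disallow line.
googlebot_private = parser.can_fetch("Googlebot", private_url)
# Any other robot falls back to the * section and is kept out.
other_private = parser.can_fetch("SomeOtherBot", private_url)
print(googlebot_private, other_private)
```

<p>Adding <code>Disallow: /</code> to the Googlebot section alongside the <code>Allow</code> line would be one way to express the intended policy.</p>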
<ul>
<li><strong>NOFOLLOW will most likely not prevent indexing -</strong> If you use <code>NOFOLLOW</code> at either the page or the link level, it is still possible for the links from the page to be indexed because the search engine may have found a reference to them from another source. Note also that <code>rel="NOFOLLOW"</code> on your anchor tags is still perceived as a recommendation by the search engines, not a command. To ensure that content is not indexed, either use the <code>Disallow</code> directive at the site level, or use <code>NOINDEX</code> at the page level.</li>
<li><strong>Directives that are not recommended -</strong> The directives in the REP are all about exceptions. By default, the robots assume they can crawl your whole site. Therefore, you do not need to explicitly use the <code>FOLLOW</code> and <code>INDEX</code> directives, as they will not be taken into account by the search engines. It sounds silly but I&#8217;ve seen a few sites that have implemented these on every page and every link. Another directive that is not recommended is the <code>NOCACHE</code> directive. This was created by Microsoft, and is synonymous with <code>NOARCHIVE</code>. While they will most likely always continue to support the directive, it is better to use <code>NOARCHIVE</code> so it will work on all the search engines.</li>
<li><strong>Be cognizant of case -</strong> When referencing files and URLs in the robots.txt file, use a defensive approach to URL case, as the major engines do not handle it the same way (e.g. /Files does not always equal /files).</li>
</ul>
<h2><a name="Testing_your_implementation_"></a>Testing Your Implementation</h2>
<p>As you&#8217;re implementing your REP design, you should test it both before and after you deploy it. The easiest way to test it is to use the robots.txt validator in Google&#8217;s Webmaster Tools. This tool is a good sanity check to ensure you&#8217;re not blocking URLs you want indexed. However, advanced developers (or paranoid ones with critical business requirements) will want to know definitively what the robots are doing, not simply rely on what the robots say they are doing. These folks will want to look at their server logs as well as the tools to see exactly what&#8217;s being crawled.</p>
<p>In addition to using validation tools, reviewing the search engines&#8217; reports on what they couldn&#8217;t access, and looking at log data to see what the search engine robots are crawling, you should check the search engine results to see if any pages you intend to block are being indexed. If they are, use the methods described in this section to ensure you are blocking them correctly and <a href="#removal">use the search engine tools to request that the pages be removed</a>.</p>
<p><a name="partial"></a><strong>When Blocked Content Appears to be Indexed - </strong>If search engines are blocked from crawling pages, they may still index the URL if the robot finds a link to that URL on a page that isn&#8217;t blocked. The listing may display the URL only, such as shown below.</p>
<p><img title="urlonly" src="http://janeandrobot.com/wp-content/uploads/2008/06/urlonly.gif" alt="urlonly" /></p>
<p>Or, it may include a title and in some instances, a description. This makes it appear as though the search engine robot is disregarding the directive that blocks access to the page, but the search engine is in fact obeying the directive not to crawl the page and is using anchor text from the link to that page and descriptive details from either the page that contains the link or a source such as the <a href="http://www.dmoz.org" onclick="pageTracker._trackPageview('/outgoing/www.dmoz.org?referer=');">Open Directory Project</a>.</p>
<p>For more details, see:</p>
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=35667" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=35667&amp;referer=');">Google: partially indexed page</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-01.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-01.html?referer=');">Yahoo!: thin documents</a></li>
</ul>
<h3><a name="The_Easy_Way_"></a>The Easy Way</h3>
<p><strong>Search Engine Tools For Validation -</strong> Both Google and Microsoft provide some tools as part of their Webmaster Centers to help you verify if you&#8217;ve configured your REP the way you expect. Let&#8217;s start with Google&#8217;s tools:</p>
<p>The first thing you should check is the list of URLs that Google has seen from your website and not indexed due to the REP. Note that you can also download the list and filter, sort, and have-your-way-with-it in Excel.</p>
<p><img title="webmaster-robotstxt-blocked1" src="http://janeandrobot.com/wp-content/uploads/2008/06/webmaster-robotstxt-blocked1.gif" alt="webmaster-robotstxt-blocked1" /></p>
<p>The next step is to use their interactive robots.txt tool to analyze your rules and test specific URLs for blockage. When you pull up the tool they already should have it pre-populated with the robots.txt file they have on file from the last time they crawled. You can input a list of URLs you&#8217;d like to check below, select the user-agent you&#8217;d like to check against and the tool will tell you if they are blocked or not. You can also use the tool to test changes to your robots.txt file to see how Google would interpret things.</p>
<p><img title="google-analyze-robotstxt" src="http://janeandrobot.com/wp-content/uploads/2008/06/google-analyze-robotstxt.jpg" alt="google-analyze-robotstxt" /></p>
<p>Microsoft also provides a list of URLs blocked by robots.txt that Bingbot has tried to crawl.</p>
<h3><a name="The_Hard_Way_(More_Accurate)"></a>The Hard Way</h3>
<p><strong>More Accurate Views of Robot Access Through Your Logs -</strong> If you have a specific business need to ensure that the robots are following your rules (or you&#8217;re just paranoid), then you should not simply rely on the tools they provide to test compliance. You&#8217;re going to need to go straight to the horse&#8217;s mouth and analyze your web server logs to see exactly what they are doing. No single tool makes this easy. You&#8217;ll likely have to use an existing tool such as <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07" onclick="pageTracker._trackPageview('/outgoing/www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&amp;referer=');">Microsoft HTTP Log Parser</a> or write your own. It isn&#8217;t difficult, but it will take some time to implement. Useful references for this are the list of robot <a href="http://www.robotstxt.org/db.html" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/db.html?referer=');">user agents</a> and the more complete lists of bots from <a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Google</a> and <a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a>.</p>
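<p>If you do write your own, a first pass can be quite small. The Python sketch below counts the paths requested by a given robot. It assumes your server writes the common &#8220;combined&#8221; log format, with the request and user-agent in double quotes. The function name <code>crawler_hits</code>, the sample log lines, and the IP addresses are all invented for illustration.</p>

```python
from collections import Counter


def crawler_hits(log_lines, agent_token):
    """Count requested paths for log lines whose user-agent contains agent_token."""
    hits = Counter()
    for line in log_lines:
        fields = line.split('"')
        if len(fields) != 7:
            continue  # not in the assumed combined log format, so skip it
        request, user_agent = fields[1], fields[5]
        request_parts = request.split()  # method, path, protocol
        if len(request_parts) == 3 and agent_token.lower() in user_agent.lower():
            hits[request_parts[1]] += 1
    return hits


sample_log = [
    '66.249.66.1 - - [01/Jun/2008:19:25:47 +0000] "GET /private/page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [01/Jun/2008:19:26:02 +0000] "GET /articles/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [01/Jun/2008:19:27:10 +0000] "GET /private/page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.0) Firefox/3.0"',
]
googlebot_hits = crawler_hits(sample_log, "Googlebot")
print(dict(googlebot_hits))
```

<p>Any path in the result that you meant to block is worth a closer look. Keep in mind that the user-agent string alone can be faked.</p>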
<p><a name="verify"></a><strong>Verifying Robot Identity -</strong> Another thing you&#8217;ll likely want to consider in this endeavor is to validate that the robots are who they actually say they are. Google, Yahoo and Microsoft all support <a href="http://en.wikipedia.org/wiki/Reverse_DNS_lookup" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Reverse_DNS_lookup?referer=');">Reverse DNS authentication</a>of their robots. The process is pretty simple and described here by <a href="http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html?referer=');">Google</a>, <a href="http://www.ysearchblog.com/archives/000460.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000460.html?referer=');">Yahoo </a>and <a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Microsoft</a>, essentially you simply find out what range their robot&#8217;s DNS is hosted in, and use that in your tool. This way, if the address changes (which it will), you don&#8217;t need to update your code.</p>
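<p>A rough Python sketch of that check follows. It does a reverse lookup on the visiting IP address, confirms the hostname ends in an expected domain, then does a forward lookup to make sure the hostname maps back to the same IP. The function name <code>is_verified_crawler</code>, the domain suffix, the hostname, and the IP addresses are assumptions for illustration. Confirm the right domains against each engine&#8217;s own documentation. The lookups are passed in as parameters so the example can run with stand-in resolvers.</p>

```python
import socket


def is_verified_crawler(ip, allowed_suffixes,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname's domain, then forward-resolve to confirm."""
    try:
        hostname = reverse_lookup(ip)[0]
        if not hostname.lower().endswith(tuple(allowed_suffixes)):
            return False
        # The forward lookup must list the original IP, or the reverse record was faked.
        return ip in forward_lookup(hostname)[2]
    except OSError:
        return False  # lookup failed, so treat the visitor as unverified


# Stand-in resolvers (hypothetical data) so the example runs without network access.
def fake_reverse(ip):
    return ("crawl-66-249-66-1.googlebot.com", [], [ip])


def fake_forward(hostname):
    return (hostname, [], ["66.249.66.1"])


genuine = is_verified_crawler("66.249.66.1", [".googlebot.com"], fake_reverse, fake_forward)
spoofed = is_verified_crawler("203.0.113.9", [".googlebot.com"], fake_reverse, fake_forward)
print(genuine, spoofed)
```

<p>In real use you would leave the two lookup parameters at their defaults and pass in the IP address from your server log.</p>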
<p>Should you find any issues, where one of the robots is not minding the REP or is misbehaving in some other way, you can always communicate directly with each engine through one of their forums:</p>
<ul>
<li><a href="http://groups.google.com/group/Google_Webmaster_Help-Indexing/topics" onclick="pageTracker._trackPageview('/outgoing/groups.google.com/group/Google_Webmaster_Help-Indexing/topics?referer=');">Google Crawling, Indexing and Ranking Forum</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/search_support.html" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/search_support.html?referer=');">Yahoo Crawler Feedback Form</a></li>
<li><a href="http://forums.microsoft.com/webmaster/ShowForum.aspx?ForumID=1984&amp;SiteID=79" onclick="pageTracker._trackPageview('/outgoing/forums.microsoft.com/webmaster/ShowForum.aspx?ForumID=1984_amp_SiteID=79&amp;referer=');">Microsoft Crawler Error and Feedback Forum</a></li>
</ul>
<h2><a name="removal"></a>Removing Content From Search Engine Indices</h2>
<p>If you find that you haven&#8217;t implemented the techniques described here correctly and private content from your site is indexed, each of the major search engines has methods available for requesting that it be removed. For more information, see:</p>
<ul>
<li><a href="http://googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2007/04/requesting-removal-of-content-from-our.html?referer=');">Google: Requesting removal of content from our index</a></li>
<li><a href="http://help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/" onclick="pageTracker._trackPageview('/outgoing/help.yahoo.com/l/us/yahoo/search/siteexplorer/delete/?referer=');">Yahoo!: Deleting URLs</a></li>
<li><a href="https://support.live.com/eform.aspx?productKey=wlsearch&amp;page=wlsupport_home_options_form_byemail&amp;ct=eformts" onclick="pageTracker._trackPageview('/outgoing/support.live.com/eform.aspx?productKey=wlsearch_amp_page=wlsupport_home_options_form_byemail_amp_ct=eformts&amp;referer=');">Live Search: Requesting content removal</a></li>
</ul>
<h2><a name="Additional_Resources:_"></a>Additional Resources:</h2>
<ul>
<li>Google
<ul>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40362" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40362&amp;referer=');">How to create a robots.txt file</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40364" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40364&amp;referer=');">Descriptions of each user-agent that Google uses</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40367" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40367&amp;referer=');">How to use pattern matching</a></li>
<li><a href="http://www.google.com/support/webmasters/bin/answer.py?answer=40368" onclick="pageTracker._trackPageview('/outgoing/www.google.com/support/webmasters/bin/answer.py?answer=40368&amp;referer=');">How often we recrawl your robots.txt file</a></li>
<li><a href="http://googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html" onclick="pageTracker._trackPageview('/outgoing/googlewebmastercentral.blogspot.com/2006/08/all-about-googlebot.html?referer=');">All about Googlebot</a></li>
</ul>
</li>
<li>Yahoo!
<ul>
<li><a href="http://www.ysearchblog.com/archives/000372.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000372.html?referer=');">Wild card support</a></li>
<li><a href="http://www.ysearchblog.com/archives/000508.html" onclick="pageTracker._trackPageview('/outgoing/www.ysearchblog.com/archives/000508.html?referer=');">X-Robots tag directive support</a></li>
</ul>
</li>
<li>Microsoft Bing
<ul>
<li><a href="http://blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx" onclick="pageTracker._trackPageview('/outgoing/blogs.msdn.com/livesearch/archive/2006/11/29/search-robots-in-disguise.aspx?referer=');">Search robots in disguise</a></li>
</ul>
</li>
<li>Other resources
<ul>
<li><a href="http://searchengineland.com/070305-204850.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/070305-204850.php?referer=');">Search Engine Land: Meta Robots Tag 101</a></li>
<li><a href="http://searchengineland.com/080603-121100.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/080603-121100.php?referer=');">Search Engine Land: Yahoo!, Microsoft, Google Clarify Robots.txt Support</a></li>
<li><a href="http://searchengineland.com/070417-213813.php" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/070417-213813.php?referer=');">Search Engine Land: URL Removal Options</a></li>
<li><a href="http://www.robotstxt.org/" onclick="pageTracker._trackPageview('/outgoing/www.robotstxt.org/?referer=');">robotstxt.org</a></li>
<li><a href="http://en.wikipedia.org/wiki/Robots.txt" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Robots.txt?referer=');">Wikipedia: Robots Exclusion Standard</a></li>
</ul>
</li>
</ul>
<h3>Revision History</h3>
<ul>
<li>02/12/2009 &#8211; Google, Yahoo and Microsoft make a joint announcement of the rel=&#8217;Canonical&#8217; tag to make it easier for publishers to specify the canonical URLs.</li>
<li>06/04/2009 &#8211; Added NOPREVIEW tag announced this week by Microsoft. Used to disable the &#8216;hover preview&#8217; feature on their SERP.</li>
<li>08/19/10 &#8211; Changed Live Search to Bing throughout and MSNbot to Bingbot. Also removed references to Bing Webmaster Center robots.txt validator, <a href="http://searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/all-new-microsoft-bing-webmaster-tools-46827?referer=');">as it no longer exists</a>.</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/managing-robots-access-to-your-website-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>August 19th 2010 Office Hours</title>
		<link>http://www.ninebyblue.com/office-hours/august-19th-2010-office-hours/</link>
		<comments>http://www.ninebyblue.com/office-hours/august-19th-2010-office-hours/#comments</comments>
		<pubDate>Thu, 19 Aug 2010 20:38:28 +0000</pubDate>
		<dc:creator>Heather</dc:creator>
				<category><![CDATA[Office Hours]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1345</guid>
		<description><![CDATA[Vanessa talks about the Google algo change and how it affects brand names in search.
Then, Vanessa takes lots of questions from the chat room on TLDs, blocking bad links and whether &#8220;&#62;&#62;&#62;&#62;&#8221; in your title tags hurts your rankings.
Articles Mentioned:
Google Treating Brand Names in Search Terms
Google Search Results Dominated by One Domain
Could this change to [...]]]></description>
			<content:encoded><![CDATA[<p>Vanessa talks about the Google algo change and how it affects brand names in search.</p>
<p>Then, Vanessa takes lots of questions from the chat room on TLDs, blocking bad links and whether &#8220;&gt;&gt;&gt;&gt;&#8221; in your title tags hurts your rankings.</p>
<p>Articles Mentioned:<br />
<a href="http://www.malcolmcoles.co.uk/blog/google-treating-brand-names-in-search-terms-as-site-searches/" onclick="pageTracker._trackPageview('/outgoing/www.malcolmcoles.co.uk/blog/google-treating-brand-names-in-search-terms-as-site-searches/?referer=');">Google Treating Brand Names in Search Terms</a><br />
<a href="http://searchengineland.com/google-search-results-dominated-by-one-domain-49025" onclick="pageTracker._trackPageview('/outgoing/searchengineland.com/google-search-results-dominated-by-one-domain-49025?referer=');">Google Search Results Dominated by One Domain</a><br />
<a href="http://www.coastdigital.co.uk/blog/2010/08/19/google-algorithm/" onclick="pageTracker._trackPageview('/outgoing/www.coastdigital.co.uk/blog/2010/08/19/google-algorithm/?referer=');">Could this change to the Google algorithm give big brands a leg-up, or does Google have another agenda?</a></p>
<p>Listen to the entire episode on Webmaster Radio.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/office-hours/august-19th-2010-office-hours/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Jane and Robot Is Down For Maintenance</title>
		<link>http://www.ninebyblue.com/blog/jane-and-robot-is-down-for-maintenance/</link>
		<comments>http://www.ninebyblue.com/blog/jane-and-robot-is-down-for-maintenance/#comments</comments>
		<pubDate>Thu, 19 Aug 2010 17:13:56 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1342</guid>
		<description><![CDATA[Some awesome and exciting things are happening around here, and as part of that, janeandrobot.com has  been down for maintenance. I thought that would be quick, but it&#8217;s looking like it might be another month. (A whole month!) I&#8217;ve had a lot of people asking how to access the articles since it&#8217;s been down, so [...]]]></description>
			<content:encoded><![CDATA[<p>Some awesome and exciting things are happening around here, and as part of that, janeandrobot.com has been down for maintenance. I thought that would be quick, but it&#8217;s looking like it might be another month. (A whole month!) I&#8217;ve had a lot of people asking how to access the articles since it&#8217;s been down, so I&#8217;m going to post them all here, and put in some temporary redirects from the old URLs to these ones. That last part might be a challenge, as I&#8217;m going to dive into whether you can even implement redirects when using a WordPress plugin that has the site down for maintenance, particularly on a Windows-based GoDaddy server. Stay tuned!</p>
<p>(Will be an interesting study in indexing in search too!)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/jane-and-robot-is-down-for-maintenance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Knowledge is a mashup</title>
		<link>http://www.ninebyblue.com/collected-writings/guest-blogging/knowledge-is-a-mashup/</link>
		<comments>http://www.ninebyblue.com/collected-writings/guest-blogging/knowledge-is-a-mashup/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 23:19:59 +0000</pubDate>
		<dc:creator>Heather</dc:creator>
				<category><![CDATA[Guest Blogging]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1338</guid>
		<description><![CDATA[These days, we hear a lot about open data, open government, and Gov 2.0. President Obama&#8217;s Open Government directive has given us access to huge data sets through avenues such as data.gov. But we have a lot more assets as a country than just digital 0s and 1s in CSV files. We also have [...]]]></description>
			<content:encoded><![CDATA[<p>These days, we hear a lot about open data, open government, and <a href="http://radar.oreilly.com/gov2/" onclick="pageTracker._trackPageview('/outgoing/radar.oreilly.com/gov2/?referer=');">Gov 2.0</a>. President Obama&#8217;s <a href="http://www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment/" onclick="pageTracker._trackPageview('/outgoing/www.whitehouse.gov/the_press_office/TransparencyandOpenGovernment/?referer=');">Open Government</a> directive has given us access to huge data sets through avenues such as <a href="http://www.data.gov/" onclick="pageTracker._trackPageview('/outgoing/www.data.gov/?referer=');">data.gov</a>. But we have a lot more assets as a country than just digital 0s and 1s in CSV files. We also have artifacts and science and history and experts. Can open government apply to those assets as well?</p>
<p><a href="http://radar.oreilly.com/2010/08/the-smithsonian-commons-projec.html" onclick="pageTracker._trackPageview('/outgoing/radar.oreilly.com/2010/08/the-smithsonian-commons-projec.html?referer=');">Read more at O&#8217;Reilly</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/collected-writings/guest-blogging/knowledge-is-a-mashup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>August 5, 2010 Office Hours</title>
		<link>http://www.ninebyblue.com/office-hours/august-5-2010-office-hours/</link>
		<comments>http://www.ninebyblue.com/office-hours/august-5-2010-office-hours/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 20:33:54 +0000</pubDate>
		<dc:creator>Heather</dc:creator>
				<category><![CDATA[Office Hours]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1335</guid>
		<description><![CDATA[Vanessa is super stoked because she got a tweet from the bridge here in Seattle. This launches her into a lecture about how important social media is and how quickly it can raise awareness. In this case, the government owns the Twitter account, and so there is no monetary value in tweeting, but instead they [...]]]></description>
			<content:encoded><![CDATA[<p>Vanessa is super stoked because she got a tweet from the bridge here in Seattle. This launches her into a lecture about how important social media is and how quickly it can raise awareness. In this case, the government owns the Twitter account, and so there is no monetary value in tweeting, but instead they are taking advantage of technology to better serve the communities.</p>
<p>Vanessa takes questions about language links, TLDs and geographic restrictions, crawl errors, and touches again on the rollout of Bing results in Yahoo searches.</p>
<p><a href="http://www2.webmasterradio.fm/office-hours/2010/bing-yahoo-organic-and-paid-search-transition/" onclick="pageTracker._trackPageview('/outgoing/www2.webmasterradio.fm/office-hours/2010/bing-yahoo-organic-and-paid-search-transition/?referer=');">Listen to the entire episode on WebmasterRadio.fm</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/office-hours/august-5-2010-office-hours/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Do We Have a Cognitive Surplus?</title>
		<link>http://www.ninebyblue.com/blog/social-media/do-we-have-a-cognitive-surplus/</link>
		<comments>http://www.ninebyblue.com/blog/social-media/do-we-have-a-cognitive-surplus/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 04:34:42 +0000</pubDate>
		<dc:creator>Vanessa</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Social Media]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.ninebyblue.com/?p=1309</guid>
		<description><![CDATA[I&#8217;ve been thinking a lot lately about how to manage my time and all the information that comes at me every day. I know a lot of you do too. Many of us run our own companies, are working on cool projects that absorb all of our attention, and are constantly trying to find balance.
In [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking a lot lately about how to manage my time and <a href="http://www.ninebyblue.com/blog/managing-information-overload/">all the information that comes at me every day</a>. I know a lot of you do too. Many of us run our own companies, are working on cool projects that absorb all of our attention, and are constantly trying to find balance.</p>
<p>In that light, then, the premise of Clay Shirky&#8217;s new book <a href="http://www.amazon.com/gp/product/1594202532?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=1594202532" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/1594202532?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=1594202532&amp;referer=');">Cognitive Surplus: Creativity and Generosity in a Connected Age</a><img style="border: none !important; margin: 0px !important;" src="http://www.assoc-amazon.com/e/ir?t=nibybl-20&amp;l=as2&amp;o=1&amp;a=1594202532" border="0" alt="" width="1" height="1" /> seems a bit out of left field. The idea is that we have so much free time we just don&#8217;t know what to do with ourselves, so in lieu of any better ideas, we watch a lot of TV. And if we watched even slightly less TV, we&#8217;d have time to do things that actually mattered. Like edit Wikipedia. Or create lolcats. Or at least, that&#8217;s the premise on the face of it, which for me made the book difficult to read. Because I don&#8217;t watch a lot of TV. Nor does anyone I know. And anyway, what&#8217;s the difference between relaxing and recharging by watching a bit of TV vs. reading a book? Or enjoying the sunset. Or taking a nap.</p>
<p>More on all of that in a bit, but first, here are some thoughts I did get from the book that weren&#8217;t necessarily related to the implied premise,  but that I found way more interesting.</p>
<p><strong>The rise of &#8220;citizen journalism&#8221;</strong></p>
<p>Shirky points to many examples where the ability of regular citizens to become reporters of the world around them has led to amazing things. And it&#8217;s true. <a href="http://mashable.com/2009/06/14/new-media-iran/" onclick="pageTracker._trackPageview('/outgoing/mashable.com/2009/06/14/new-media-iran/?referer=');">Iranians can tweet about the elections</a> to let the world know what&#8217;s happening there. The <a href="http://sudanvotemonitor.com/" onclick="pageTracker._trackPageview('/outgoing/sudanvotemonitor.com/?referer=');">Sudanese can text incident information</a> to help organizations map out needs. These uses of technology are awesome, but I don&#8217;t know that they&#8217;re the result of a cognitive surplus. They didn&#8217;t come about because the Iranians and the Sudanese were watching too much television and found new uses of their time by way of technology. They came about because people had a new mechanism to capture and broadcast what was happening in their lives. Anne Frank didn&#8217;t have Twitter, so she used pen and paper.</p>
<p>The surplus here isn&#8217;t the time we spend watching TV. It&#8217;s increased access to technology. Shirky notes that &#8220;the chance that anyone with a camera will come across an event of global significance is simply the number of witnesses of the event times the percentage of them that have cameras.&#8221;</p>
<p><strong>So much content: what to consume?</strong></p>
<p>This idea of citizen journalism isn&#8217;t universally embraced. I was at an event a few weeks ago and listened in on a conversation about how blog content isn&#8217;t vetted and can&#8217;t really be relied upon in the same way that traditional journalism can. Shirky does address this, quoting what the novelist Harvey Swados said in 1951 of the advent of paperbacks:</p>
<blockquote><p>&#8220;Whether this revolution in the reading habits of the American public means that we are being inundated by a flood of trash which will debase farther the popular taste, or that we shall now have available cheap editions of an ever-increasing list of classics, is a question of basic importance to our social and cultural development.&#8221;</p></blockquote>
<p>Shirky notes we didn&#8217;t have to choose. We could have both. So it stands today with what&#8217;s available to us on the internet, be it vetted material from professionals or ad-hoc creations from amateurs. In either case (and it&#8217;s really more of a broad spectrum than either/or), the same as with books or TV or any other type of information, it&#8217;s up to us to be careful consumers. Clay Johnson says we need to <a href="http://infovegan.com/2010/06/30/dealing-with-information-overload" onclick="pageTracker._trackPageview('/outgoing/infovegan.com/2010/06/30/dealing-with-information-overload?referer=');">consciously consume</a>. He asserts that our <a href="http://infovegan.com/2010/07/20/selectivity-vs-critical-thinking" onclick="pageTracker._trackPageview('/outgoing/infovegan.com/2010/07/20/selectivity-vs-critical-thinking?referer=');">abundance isn&#8217;t with time, but with information</a>. I know that&#8217;s certainly my situation. Time is the most precious possession I have, and I never seem to have enough of it. But information? I&#8217;ve got that in spades. It threatens to bury me alive.</p>
<p>In <a href="http://www.amazon.com/gp/product/030759243X?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=030759243X" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/030759243X?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=030759243X&amp;referer=');">Although Of Course You End Up Becoming Yourself: A Road Trip with David Foster Wallace</a><img style="border: none !important; margin: 0px !important;" src="http://www.assoc-amazon.com/e/ir?t=nibybl-20&amp;l=as2&amp;o=1&amp;a=030759243X" border="0" alt="" width="1" height="1" />, David Lipsky recounts David Foster Wallace describing this back in 1996, even before we had Twitter and YouTube competing for our attention:</p>
<blockquote><p>&#8220;I received five hundred thousand discrete bits of information today, of which maybe twenty-five are important. And how am I going to sort those out, you know? &#8230;I think a lot of people feel &#8212; not overwhelmed by the amount of stuff they have to do. But overwhelmed by the number of choices they have, and by the number of discrete, different things that come at them&#8230; the number of small insistent tugs on them, from a number of different systems and directions.&#8221;</p></blockquote>
<p>As we are provided with more ways to create, we have more to sort through to consume.</p>
<p><strong>Fail a lot in order to succeed</strong></p>
<p>I first started thinking about the idea of valuing failure when reading <a href="http://www.amazon.com/gp/product/044669889X?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=044669889X" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/044669889X?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=044669889X&amp;referer=');">The Geography of Bliss: One Grump&#8217;s Search for the Happiest Places in the World</a><img style="border: none !important; margin: 0px !important;" src="http://www.assoc-amazon.com/e/ir?t=nibybl-20&amp;l=as2&amp;o=1&amp;a=044669889X" border="0" alt="" width="1" height="1" />. In it, the author Eric Weiner describes how in Iceland, practically everyone is a painter or a poet, at least in part because the Icelandic culture doesn&#8217;t have the same view of success and failure as the American one does. You don&#8217;t have to be a good painter to be a painter. Just paint! When you aren&#8217;t constrained by success metrics, you feel freer to try more things. Weiner writes &#8220;If you are free to fail, you are free to try.&#8221;</p>
<p>Shirky is advocating this idea as well. The act of creation is what&#8217;s important, even if it&#8217;s bad <em>Charmed</em> fan fiction. And while I certainly think anyone who wants to write poetry should go for it, I also find the notion of failing a lot in order to succeed to be interesting. We tend to fear failure.  Shirky describes how failure helps us succeed using a book metaphor: &#8220;If there was an easy formula for writing something that would become prized for decades or centuries, we wouldn&#8217;t need experimentation, but there isn&#8217;t, so we do.&#8221;</p>
<p><strong>User-generated content: are we giving something up for free or getting something for free?</strong></p>
<p>Shirky writes about services like YouTube and Flickr, &#8220;it can seem unfair for amateurs to be contributing their work for free to people who are making money from aggregating and sharing that work.&#8221; He notes Nicholas Carr&#8217;s use of the term &#8220;<a href="http://www.roughtype.com/archives/2006/12/sharecropping_t.php" onclick="pageTracker._trackPageview('/outgoing/www.roughtype.com/archives/2006/12/sharecropping_t.php?referer=');">digital sharecropping</a>&#8221; to describe how content creators are being potentially ripped off. But are they?  Shirky concludes that (amateur) content creators don&#8217;t mind because they are creating for love and not for money.</p>
<p>I dunno. I think that at least in some cases content creators don&#8217;t mind because they don&#8217;t look at it as &#8220;digital sharecropping&#8221; &#8212; giving away their labor to others who profit. They look at it as a fair exchange of services. The content creators get a place to host their work, the tools to share it with others, and wide visibility &#8212; for free! This is something that was difficult, if not impossible, before the web, and something that we tended to pay fairly hefty prices for in the early days of the web. And this (mostly free) opportunity is what makes much of what Shirky celebrates in his book possible.</p>
<p><strong>Why we share</strong></p>
<p>Shirky references a 2006 NYU paper called &#8220;<a href="http://www.google.com/search?q=Commons-Based+Peer+Production+and+Virtue" onclick="pageTracker._trackPageview('/outgoing/www.google.com/search?q=Commons-Based+Peer+Production+and+Virtue&amp;referer=');">Commons-Based Peer Production and Virtue</a>&#8221; that describes what motivates us to voluntarily contribute to groups. In addition to personal motivations such as autonomy and competence, the paper describes social motivations around connectedness and sharing/generosity. <a href="http://developer.yahoo.com/ypatterns/social/people/reputation/" onclick="pageTracker._trackPageview('/outgoing/developer.yahoo.com/ypatterns/social/people/reputation/?referer=');">Yahoo&#8217;s recently released reputation model</a> addresses the personal motivations, but not the social ones. And the social ones can certainly be motivating. Shirky calls this, in part, &#8220;go[ing] public to find people who think like you.&#8221; He says to ask of users:</p>
<ul>
<li>Are their desires for autonomy or competence being rewarded?</li>
<li>Are their desires to feel connected or generous being rewarded?</li>
</ul>
<p>He asks these questions to answer the question of why people would share, create, and build communities, but I think they are also great questions to ask when building a new community and attempting to encourage user participation.</p>
<p><strong>We don&#8217;t want things for the sake of those things; we want what those things provide</strong></p>
<p>I think this is an important idea for anyone making any content available, building any product, appealing to any audience. Shirky brings this up to explain why older people would adopt email. It&#8217;s not that they wanted to try out the latest technology. They wanted what all of us want: to communicate with others. He writes &#8220;no one wants e-mail for itself, any more than anyone wants electricity for itself; rather, we want the things that electricity enables.&#8221;</p>
<p>But this notion goes well beyond his point. No one cares about your features or that you&#8217;ve worked really hard on your product or about all the data you&#8217;ve just made available as an XML file. They care about solving their problems, doing things that make them happy, making their lives better. Focus on how you can help your audience do those things and you&#8217;ve got their attention. (I <a href="http://www.chadblenkin.ca/the-influencer-project-recap" onclick="pageTracker._trackPageview('/outgoing/www.chadblenkin.ca/the-influencer-project-recap?referer=');">talked about this during my 60 seconds</a> as part of the <a href="http://influencerproject.com/" onclick="pageTracker._trackPageview('/outgoing/influencerproject.com/?referer=');">Influencer Project</a>.)</p>
<p><strong>The value of combinability</strong></p>
<p>Shirky writes &#8220;if you have a stick, and someone gives you another one, you have two sticks. If you have a piece of knowledge &#8212; that rubbing two sticks together in a certain way can make fire &#8212; you can do something of value you couldn&#8217;t do before.&#8221; And here too is another new surplus the culture of the web gives us. By sharing knowledge, tools, failures, successes, ideas, we can better combine them for sums much greater than the parts. He notes that the  community size has to be big enough, sharing has to be easy, there should be a common format or way of understanding the information, and then, there&#8217;s the last component, the one that technology can&#8217;t solve &#8212; people. Can we work well together? Do we understand each other, trust each other, want others to make what we do better?</p>
<p><strong>Build rules as you need them</strong></p>
<p>Don&#8217;t spend time creating a solution to a problem until you have a problem. I think this holds true of online communities, ways of iterating online products, and even building startups. When I started my company a couple of years ago, I didn&#8217;t set up any processes at all. I&#8217;m building them out now as I find I need them, based on experience of what&#8217;s been working and what hasn&#8217;t. If I had set everything up in advance, I&#8217;d still be spending just as much time now adjusting it.</p>
<p><strong>What about TV? </strong></p>
<p>I think that if Shirky had relied less on the idea of using TV time for more productive things, the book would have been stronger. I clearly found much of what he wrote about interesting, but I got distracted every time he&#8217;d bring the point back to how dang much we watch television.</p>
<p>Shirky and I really aren&#8217;t so far apart on how we think about human behavior. He writes that &#8220;human motivations change little over the years, but the opportunity can change a little or a lot, depending on the social environment.&#8221; But then we diverge: &#8220;the raw material of this change is the free time available to us.&#8221; In truth, the stats point at <a href="http://www.hollywoodreporter.com/hr/content_display/television/news/e3i37b1b301206de33f413193a1d8abbb41" onclick="pageTracker._trackPageview('/outgoing/www.hollywoodreporter.com/hr/content_display/television/news/e3i37b1b301206de33f413193a1d8abbb41?referer=');">television viewing at an all-time high</a> over the same period that Shirky notes the explosion of creation and sharing online. We aren&#8217;t watching less TV in order to upload cute videos of our cat to YouTube. We&#8217;re doing both.</p>
<p><strong>Do we really watch that much TV a day?</strong></p>
<p>This was the first point that distracted me. I started wondering what those stats really mean. Most people I know who do watch TV tend to do it while they are getting ready for work in the morning, and eating breakfast, and writing their college essays. How much of that time is really spent solely in front of the TV? Because you can&#8217;t really make a lolcat in lieu of watching TV while you&#8217;re ironing your clothes.</p>
<p>David Foster Wallace talked about our excessive TV watching way back in 1990 in his essay &#8220;<a href="http://www.goodreads.com/book/show/2577573._E_Unibus_Pluram" onclick="pageTracker._trackPageview('/outgoing/www.goodreads.com/book/show/2577573._E_Unibus_Pluram?referer=');">E Unibus Pluram: Television and US Fiction</a>&#8221;. In that essay, he describes a 1985 book called <a href="http://www.amazon.com/gp/product/0393311589?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0393311589" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0393311589?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=0393311589&amp;referer=');">Life After Television: The Coming Transformation of Media and American Life</a><img style="border: none !important; margin: 0px !important;" src="http://www.assoc-amazon.com/e/ir?t=nibybl-20&amp;l=as2&amp;o=1&amp;a=0393311589" border="0" alt="" width="1" height="1" />. This book paints a picture of a future world where TVs will not just passively feed us what the broadcaster wants, but will be an &#8220;interactive net&#8221; of everyone&#8217;s TVs, and we&#8217;ll go from &#8220;passive dependence&#8221; to everyone being &#8220;their own harried guy with earphones and clipboard&#8221;. The author, George Gilder, writes, &#8220;we will, in short, be able to engineer our own dreams.&#8221;</p>
<p>The book&#8217;s portrait of how we would do that is different from what&#8217;s come to be, but the general idea isn&#8217;t so far off.</p>
<p><strong>Are community engagement and creation really better than, and a reasonable alternative to, TV?</strong></p>
<p>Shirky asserts that creation &#8212; any creation &#8212; is better than mere consumption. But is that true? Is creating a lolcat and sharing it really better than relaxing to an episode of <em>30 Rock</em>? And what about the percentage of those hours we spend watching the news (or possibly <em>The Daily Show</em>) to learn about the world? I know that in my case, I watch TV when my brain is unable to do anything else. I&#8217;ve been working for 16 hours, I can&#8217;t even process words in books very well, and I need to distract my brain so that I can get some sleep. In those instances, I find TV useful in ways that editing Wikipedia couldn&#8217;t be.</p>
<p>Shirky notes, &#8220;the stupidest possible creative act is still a creative act,&#8221; implying that a creative act always trumps acts of other kinds, I suppose. Explaining why it&#8217;s better to play World of Warcraft (acknowledging that some may think of this as &#8220;grown men and women sitting in their basements pretending to be elves&#8221;) than watch TV, he says &#8220;at least they&#8217;re doing something&#8230; however pathetic it is to sit in your basement pretending to be an elf, I can tell you from personal experience: it&#8217;s worse to sit in your basement trying to decide whether Ginger or Mary Ann is cuter.&#8221;</p>
<p>Maybe for Shirky it is. Not that I&#8217;m a TV apologist, but one could say the same of reading: it&#8217;s a solitary activity (generally more so than TV), and you aren&#8217;t creating anything or doing anything as you read. Or as you sit on a bench and watch the water. As I wrote at the beginning, it&#8217;s the book&#8217;s insistence on always bringing everything back to the time we waste on TV that I find fault with. I&#8217;m not at all saying that creating and sharing and being social are bad things.</p>
<p>And certainly too much TV is probably not great. Going back to Wallace again: he rather famously had a love/hate relationship with TV, and he likened television to candy.</p>
<blockquote><p>&#8220;What if you ate it all the time? Real pleasurable, but it dudn&#8217;t have any calories in it. There&#8217;s something really vital about food that candy&#8217;s missing&#8230; There&#8217;s nothing sinister, the thing that&#8217;s sinister about it is the pleasure that it gives you to make up for what it&#8217;s missing is a kind of&#8230; addictive, self-consuming pleasure.&#8221;</p></blockquote>
<p>And at least in part, he agreed with what Shirky would later focus on in this book, and perhaps with me as well:</p>
<blockquote><p>&#8220;It gives you a certain kind of pleasure that I would argue is fairly passive. There&#8217;s not a whole lot of thought involved, the thought is often fantasy like, &#8216;I am this guy, I&#8217;m having this adventure.&#8217; And it&#8217;s a way to take a vacation from myself for a while. And that&#8217;s fine &#8212; I think sort of the same way candy is fine.&#8221;</p></blockquote>
<p>And perhaps Wallace also would agree with Clay Johnson&#8217;s assertion that our problems with information overload are around what and how we choose to consume. Wallace noted that his book <a href="http://www.amazon.com/gp/product/0316066524?ie=UTF8&amp;tag=nibybl-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0316066524" onclick="pageTracker._trackPageview('/outgoing/www.amazon.com/gp/product/0316066524?ie=UTF8_amp_tag=nibybl-20_amp_linkCode=as2_amp_camp=1789_amp_creative=9325_amp_creativeASIN=0316066524&amp;referer=');">Infinite Jest</a><img style="border: none !important; margin: 0px !important;" src="http://www.assoc-amazon.com/e/ir?t=nibybl-20&amp;l=as2&amp;o=1&amp;a=0316066524" border="0" alt="" width="1" height="1" /> wasn&#8217;t an indictment of entertainment, but was about our relationship to it.</p>
<blockquote><p>&#8220;Why am I getting 75 percent of my calories from candy? I mean that&#8217;s something that a little tiny child would do, and that would be all right. But we&#8217;re postpubescent, right? Somewhere along the line, we&#8217;re supposed to have grown up.&#8221;</p></blockquote>
<p>Shirky also maintains that we are shifting from strictly consumption around TV to &#8220;opportunities to comment on the material, share it with friends&#8230; and discuss it with other viewers&#8221;. I&#8217;d argue that we&#8217;ve always done that; we simply didn&#8217;t do it so publicly, and we did it with our friends and coworkers rather than strangers around the world. Sure, it&#8217;s easier to share fanfiction now than it was in the &#8217;70s when we had to mimeograph &#8217;zines and send them through the mail, but is Shirky really saying fanfiction is how we should spend our supposed &#8220;cognitive surplus&#8221;? (Particularly since writing fanfiction about TV shows (and commenting on them, labeling them, and so forth), at least, has a prerequisite of watching the shows in question on TV.)</p>
<p>Those who want to create and share and be communal are and always have been. Those who want to watch TV will. And many of us will do both.</p>
<p>Early in the book, Shirky writes, &#8220;this book is about the novel resource that has appeared as the world&#8217;s cumulative free time is addressed in aggregate.&#8221; But once you forget about the free time and TV aspects of the book and focus on the rest, it seems that what he&#8217;s really saying is that the human tendencies to create and share, which we&#8217;ve always felt regardless of the free time we have available, can now be acted on globally and at scale, and there&#8217;s real value to be harnessed from that.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ninebyblue.com/blog/social-media/do-we-have-a-cognitive-surplus/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
	</channel>
</rss>
