
Search Engine Spiders Lost Without Guidance - Post This Sign!



The robots.txt file implements the robots exclusion standard, which well-behaved web crawlers/robots read to learn what files and directories you want them to stay OUT of on your site. Not all crawlers/bots follow the exclusion standard, and some will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit http://www.robotstxt.org/

To see the proper format for a somewhat standard robots.txt file, look directly below. That file should be at the root of the domain because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.

A quick review of the listed information from the robots.txt file above follows. The "User-agent: msnbot" is from MSN, Slurp is from Yahoo and Teoma is from AskJeeves. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The * asterisk is a wild card that means "All" crawlers/spiders/bots should stay out of the group of files or directories listed under it.

The bots given the instruction "Disallow: /" should stay out entirely, and those given "Crawl-delay: 10" are the ones that crawled our site too quickly, causing it to bog down and overuse server resources. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the above robots.txt file. The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * applies to every crawler, bot and spider that has no group of its own in the file, including Googlebot. Under the exclusion standard, a crawler that is named in its own group follows that group instead of the * group, so repeat any Disallow lines there that you want it to obey.
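One way to check how a parser reads the file above is Python's standard-library urllib.robotparser module. The sketch below is only an illustration: it uses a shortened copy of the file, the page paths (/images/logo.gif, /articles/) are made up, and real crawlers may interpret the file differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the robots.txt file shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: psbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has no group of its own, so the * group applies to it.
googlebot_images = parser.can_fetch("Googlebot", "/images/logo.gif")  # hypothetical path
googlebot_articles = parser.can_fetch("Googlebot", "/articles/")      # hypothetical path
googlebot_delay = parser.crawl_delay("Googlebot")

# psbot is told "Disallow: /", so nothing on the site is allowed.
psbot_home = parser.can_fetch("psbot", "/")

# msnbot has its own group; this parser then reads only that group.
msnbot_delay = parser.crawl_delay("msnbot")
msnbot_images = parser.can_fetch("msnbot", "/images/logo.gif")

print(googlebot_images, googlebot_articles, googlebot_delay)
print(psbot_home, msnbot_delay, msnbot_images)
```

With this parser, the msnbot group replaces the * group rather than adding to it, so msnbot is slowed down but is not kept out of /images/. If the Disallow lines should apply to msnbot as well, they would need to be repeated in its group.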

Those we gave the "Crawl-delay: 10" instruction to were requesting as many as 7 pages every second, so we asked them to slow down. The number is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
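To put those two rates side by side, here is a small arithmetic sketch. It assumes, purely for illustration, that a crawler keeps up the same pace for a full day; the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def max_pages_per_day(seconds_between_requests: float) -> int:
    """Upper bound on pages one crawler can request in a day at a fixed delay."""
    return int(SECONDS_PER_DAY / seconds_between_requests)

# 7 pages every second, the fastest rate mentioned above.
unthrottled = 7 * SECONDS_PER_DAY

# One page every 10 seconds, as requested by "Crawl-delay: 10".
throttled = max_pages_per_day(10)

print(unthrottled)  # 604800
print(throttled)    # 8640
```

Under that assumption, the delay cuts the worst case from roughly 600,000 page requests a day to under 9,000 per crawler.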

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show pages requested by precise times to within a hundredth of a second. They are available from your web host, or ask your web or IT person. Your server logs can be found in the root directory if you have server access, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
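A rough sketch of that log check in Python follows. It assumes the logs use the Apache/NCSA "combined" format, which timestamps requests to the second (your host's format and precision may differ), and the sample lines are invented for illustration. The function names are made up for this example.

```python
import gzip
import re
from collections import Counter

# Assumed format: Apache/NCSA "combined" log lines, timestamped to the second.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def peak_requests_per_second(lines):
    """Return {user agent: most requests seen within any single second}."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            per_second[(match["agent"], match["time"])] += 1
    peaks = {}
    for (agent, _time), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks

def read_log(path):
    """Yield lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle

# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [17/Aug/2005:10:00:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.1 - - [17/Aug/2005:10:00:01 -0700] "GET /b.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.1 - - [17/Aug/2005:10:00:01 -0700] "GET /c.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.7 - - [17/Aug/2005:10:00:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.7 - - [17/Aug/2005:10:00:12 -0700] "GET /b.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
peaks = peak_requests_per_second(SAMPLE)
print(peaks)
```

For a real file, something like peak_requests_per_second(read_log("access.log.gz")) would do the same over a downloaded log, where the file name is a placeholder. Any agent whose peak is well above what your server can handle is a candidate for a Crawl-delay line.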

To see the contents of any robots.txt file, just type /robots.txt after any domain name. If they have that file up, you will see it displayed as a text file in your web browser. Click on the link below to see that file for Amazon.com

http://www.Amazon.com/robots.txt

You can see the contents of any website's robots.txt file that way.
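The same lookup can be scripted. The sketch below builds the robots.txt address from any page URL on a site and, optionally, downloads it. The function names and the example page path are made up for illustration, and the download step needs network access and may fail if the site refuses the request.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

def robots_url(site_url: str) -> str:
    """Return the robots.txt address for the domain of any page URL."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def fetch_robots_txt(site_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt as text (needs network access; may raise on errors)."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

# Hypothetical page path; only the domain matters.
amazon = robots_url("http://www.Amazon.com/some/page.html")
print(amazon)  # http://www.Amazon.com/robots.txt
```

Whatever page you start from, the file is always looked up at the root of the domain, which is why it has to live there.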

The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, just launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.

One thing we didn't expect to glean from the research into indexing delays (known as the Google Sandbox) was the importance of robots.txt files to quick and efficient crawling by the spiders from the major search engines. Another was the number of heavy crawls from bots that will do no earthly good to the site owner, yet crawl most sites extensively and heavily, straining servers to the breaking point with requests for pages coming as fast as 7 pages per second.

We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.

Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directory or files you have instructed them to stay out of and must now adjust their crawling schedule to eliminate those files from their list.)

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of your site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines.
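If you keep the list of off-limits directories in one place, the file can be generated rather than typed by hand. This is a minimal sketch, not a complete generator: the function name is made up, and "intranet" and "members" are hypothetical directory names standing in for your own private areas.

```python
def build_robots_txt(disallowed_dirs, user_agent="*"):
    """Return robots.txt text asking user_agent to stay out of each directory."""
    lines = [f"User-agent: {user_agent}"]
    for directory in disallowed_dirs:
        # Normalize to a root-relative path with a trailing slash.
        path = "/" + directory.strip("/") + "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"

# "intranet" and "members" are hypothetical private directory names.
text = build_robots_txt(["cgi-bin", "/images/", "intranet", "members"])
print(text)
```

Save the result as plain text named robots.txt in the root of the domain, the same as a hand-written file.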

The importance of robots.txt is rarely discussed by average webmasters, and I've even had some of my client businesses' webmasters ask me what it is and how to implement it when I tell them how important it is to both site security and efficient crawling by the search engines. This should be standard knowledge for webmasters at substantial companies, but it illustrates how little attention is paid to the use of robots.txt.

The search engine spiders really do want your guidance and this tiny text file is the best way to provide crawlers and bots a clear signpost to warn off trespassers and protect private property - and to warmly welcome invited guests, such as the big three search engines, while asking them nicely to stay out of private areas.

Copyright © August 17, 2005 by Mike Banks Valentine

Google Sandbox Case Study: http://publish101.com/Sandbox2

Mike Banks Valentine operates http://Publish101.com, Free Web Content Distribution for Article Marketers, and provides content aggregation, press release optimization and custom web content for Search Engine Positioning: http://www.seoptimism.com/SEO_Contact.htm






