
How to Create a robots.txt File for Your Website


Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do.

The robots.txt file tells search engines which directories to crawl and which not to. You can use it to block crawlers from looking at your image directory if you don't want your images showing up in Google search. Be careful not to use it to try to block people from directories you want to keep secret: anyone can view your robots.txt file. Make sure you password-protect directories that need to be secured.


The location of robots.txt is very important. It must be in the main directory, because otherwise user agents (search engines) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the main directory (i.e. http://mydomain.com/robots.txt), and if they don't find it there, they simply assume that the site has no robots.txt file and index everything they find along the way. So if you don't put robots.txt in the right place, don't be surprised when search engines index your whole site.
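Below is a minimal sketch of how a well-behaved crawler locates and uses this file, using Python's built-in urllib.robotparser; the domain is the article's placeholder, not a real site.

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://mydomain.com/robots.txt")  # always the site root, never a subdirectory
rp.read()  # fetch the file; a missing (404) robots.txt means "everything is allowed"

# Ask whether a given user agent may fetch a given path.
print(rp.can_fetch("*", "/some/page.html"))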


Creating the robots.txt file

Robots.txt should be put in the top-level directory of your web server.

The examples below show how the rules work:


1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
With the above declared, all robots (indicated by "*") are instructed not to index any of your pages (indicated by "/"). Most likely not what you want, but you get the idea.

2) Let's get a little more discriminating now. While every webmaster loves Google, you may not want Google's image bot crawling your site's images and making them searchable online, if only to save bandwidth. The declaration below will do the trick:
User-agent: Googlebot-Image
Disallow: /

3) The following disallows all search engines and robots from crawling select directories and pages:
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
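As a quick sanity check, you can feed such a file to Python's built-in urllib.robotparser and ask what a crawler may fetch (a minimal sketch; the bot and file names are made up for the demo, and real crawlers use their own parsers):

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("AnyBot", "/index.html"))            # True
print(rp.can_fetch("AnyBot", "/privatedir/notes.txt"))  # False: directory is disallowed
print(rp.can_fetch("AnyBot", "/tutorials/blank.htm"))   # False: single page is disallowed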

4) You can conditionally target multiple robots in "robots.txt." Take a look at the below:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This is interesting: here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance.
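You can verify the specificity rule with the same standard-library parser (a sketch; the bot names other than Googlebot are made up):

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/index.html"))     # True: Googlebot uses its own record
print(rp.can_fetch("Googlebot", "/cgi-bin/env"))    # False
print(rp.can_fetch("SomeOtherBot", "/index.html"))  # False: falls back to the * record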

5) There is a way to use Disallow: to essentially turn it into "Allow all", and that is by not entering a value after the colon (:):
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:
Here I'm saying all crawlers should be prohibited from crawling our site, except for Alexa's ia_archiver, which is allowed.
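Again, a quick check that the empty Disallow really means "allow everything" (a sketch; "SomeOtherBot" is made up):

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /

User-agent: ia_archiver
Disallow:
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("ia_archiver", "/any/page.html"))   # True: empty Disallow allows all
print(rp.can_fetch("SomeOtherBot", "/any/page.html"))  # False: blocked by the * record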

6) Finally, some crawlers now support an additional field called "Allow:", most notably, Google. As its name implies, "Allow:" lets you explicitly dictate what files/folders can be crawled. However, this field is currently not part of the "robots.txt" protocol, so my recommendation is to use it only if absolutely needed, as it might confuse some less intelligent crawlers.

Per Google's FAQs for webmasters, the below is the preferred way to disallow all crawlers from your site EXCEPT Google:
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
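urllib.robotparser happens to understand Allow: lines too, so the same kind of check works for Google's preferred pattern (a sketch, not a guarantee of how every crawler behaves):

import urllib.robotparser

rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/anything.html"))  # True: Allow: / opens the whole site
print(rp.can_fetch("OtherBot", "/anything.html"))   # False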

Robots.txt Examples and Common Mistakes


About Robots.txt and Search-Indexing Robots
Each sample entry below is followed by its meaning.

User-agent: *
Disallow:

Because nothing is disallowed, everything is allowed for every robot.

User-agent: mybot
Disallow: /

The mybot robot may not index anything, because the root path (/) is disallowed.

User-agent: *
Allow: /

For all user agents, everything is allowed.
User-agent: BadBot
Allow: /About/robot-policy.html
Disallow: /

The BadBot robot can see the robot policy document, but nothing else. All other user agents are by default allowed to see everything. This only protects a site if "BadBot" follows the directives in robots.txt.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private

In this example, all robots can visit the whole site, with the exception of the two directories mentioned and any path that starts with "private" at the host root directory, including items in /privatedir/mystuff and the file /privateer.html.
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /*/private/*

BadBot is barred from the whole site. The blank line indicates a new "record", i.e. a new user-agent block; all other robots can see everything except any subdirectory named "private" (using the wildcard characters).
User-agent: WeirdBot
Disallow: /links/listing.html
Disallow: /tmp/
Disallow: /private/

User-agent: *
Allow: /
Disallow: /temp*
Allow: *temperature*
Disallow: /private/

This keeps WeirdBot from visiting the listing page in the links directory, the tmp directory, and the private directory. All other robots can see everything except the "temp" directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that robots will use the longest matching string, so /temps and /temporary will match the Disallow, while /temperatures will match the Allow.
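Python's urllib.robotparser does not implement this longest-match rule, so here is a small hand-rolled sketch of it (the function name and rule list are illustrative; the tie-break between Allow and Disallow rules of equal length is ignored for brevity):

import re

def longest_match_allowed(path, rules):
    """rules: list of (pattern, allow) pairs; '*' is a wildcard."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for pattern, allow in rules:
        # Translate the robots.txt pattern into an anchored regular expression.
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.match(regex, path) and len(pattern) > best_len:
            best_len, allowed = len(pattern), allow
    return allowed

rules = [("/temp*", False), ("*temperature*", True), ("/private/", False)]
print(longest_match_allowed("/temps.html", rules))         # False: only /temp* matches
print(longest_match_allowed("/temperatures.html", rules))  # True: the longer Allow wins
print(longest_match_allowed("/private/x.html", rules))     # False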
Bad Examples - Common Wrong Entries
Use one of the robots.txt checkers to see if your file is malformed.
User-agent: googlebot
Disallow /
NO! This entry is missing the colon after Disallow.
User-agent: sidewiner
Disallow: /tmp/
NO! Robots will ignore misspelled user-agent names (it should be "sidewinder"). Check your server logs for the exact user-agent names, and consult published lists of user-agent names.
User-agent: MSNbot
Disallow: /PRIVATE
WARNING! Many robots and web servers are case-sensitive, so this path will not match root-level folders named private or Private.
User-agent: *
Disallow: /tmp/
User-agent: Weirdbot
Disallow: /links/listing.html
Disallow: /tmp/
Robots generally read from top to bottom and stop when they reach a record that applies to them, so Weirdbot would probably stop at the first record, *, and never see its own. Conversely, a robot that does honor its specific user-agent record won't check the * (all user agents) block, so any general directives should be repeated in the specific blocks.
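The safe mental model, sketched below, is that a robot picks exactly one record: the first one that names it, or the * record if none does (the function and record format here are illustrative, not a real API):

def select_record(user_agent, records):
    """records: list of (agent_token, rules) pairs in file order."""
    ua = user_agent.lower()
    for token, rules in records:
        if token != "*" and token.lower() in ua:
            return rules  # a specific record wins; * is never consulted
    for token, rules in records:
        if token == "*":
            return rules
    return []  # no record applies: everything is allowed

records = [("*", ["Disallow: /tmp/"]),
           ("Weirdbot", ["Disallow: /links/listing.html", "Disallow: /tmp/"])]
print(select_record("Weirdbot", records))  # only the Weirdbot rules
print(select_record("OtherBot", records))  # the * rules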