Business management software development, website creation and search engine optimization, networks and maintenance, design
Search engine | Submission URL | Help page |
---|---|---|
Google | http://www.google.com/webmasters/tools/ping?sitemap= | How do I resubmit my Sitemap once it has changed? |
Yahoo! | | Does Yahoo! support Sitemaps? |
Ask.com | http://submissions.ask.com/ping?sitemap= | Q: Does Ask.com support sitemaps? |
Live Search | http://webmaster.live.com/ping.aspx?siteMap= | Webmaster Tools (beta) |
Yandex | — | Sitemaps files |
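
The submission URLs in the table end in `sitemap=`, so the sitemap address is appended as a query value and should be URL-encoded first. The following is a minimal sketch of that step, assuming the endpoints are used exactly as listed above (they may no longer be live) and using a hypothetical sitemap address on example.com. It only builds the URLs and sends no request.

```python
from urllib.parse import quote

# Endpoints copied from the table above; whether they still respond is not verified here.
PING_ENDPOINTS = {
    "Google": "http://www.google.com/webmasters/tools/ping?sitemap=",
    "Ask.com": "http://submissions.ask.com/ping?sitemap=",
    "Live Search": "http://webmaster.live.com/ping.aspx?siteMap=",
}


def build_ping_url(endpoint, sitemap_url):
    """Append the percent-encoded sitemap address to a submission endpoint."""
    # safe="" also encodes ":" and "/" so the whole address is a single query value.
    return endpoint + quote(sitemap_url, safe="")


def build_all_ping_urls(sitemap_url):
    """Return one ready-to-request URL per search engine in PING_ENDPOINTS."""
    return {
        engine: build_ping_url(endpoint, sitemap_url)
        for engine, endpoint in PING_ENDPOINTS.items()
    }


if __name__ == "__main__":
    # Hypothetical sitemap location, used for illustration only.
    for engine, url in build_all_ping_urls("http://www.example.com/sitemap.xml").items():
        print(engine, url)
```

Fetching each resulting URL with any HTTP client would perform the actual submission.
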
Task | Entry |
---|---|
Indexer: ignore content; Robot: follow links | <META name="ROBOTS" content="NOINDEX"> |
Indexer: include content; Robot: do not follow links | <META name="ROBOTS" content="NOFOLLOW,INDEX"> |
Indexer: ignore content; Robot: do not follow links | <META name="ROBOTS" content="NOINDEX,NOFOLLOW"> |
Indexer: include content; Robot: follow links | <META name="ROBOTS" content="INDEX,FOLLOW"> |
Search results pages should not show "cache" link | <META name="ROBOTS" content="NOARCHIVE"> |
Search results pages should not display the Open Directory Project (ODP) title and description for the page. | <META name="ROBOTS" content="NOODP"> Danny Sullivan provides good examples of how outdated descriptions and even titles show up when the ODP content is used for search results. |
Search results pages should not display the Yahoo Directory title and description for the page | <META name="ROBOTS" content="NOYDIR"> (Yahoo Slurp robot only) |
Search results pages should not display any description or text context for this page. Title only, I guess. | <M |
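
To illustrate how the entries in the table above translate into indexer and robot behavior, here is a small sketch that reads the ROBOTS meta tag from a page and reports the resulting policy. The class and function names are hypothetical. It assumes the usual default that a directive which is absent means "allowed", and it only covers the index, follow and archive directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return which actions the page permits; anything not forbidden is allowed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    return {
        "index": "noindex" not in found,
        "follow": "nofollow" not in found,
        "archive": "noarchive" not in found,
    }


if __name__ == "__main__":
    print(robots_policy('<META name="ROBOTS" content="NOINDEX,NOFOLLOW">'))
```

Because tokens are trimmed and lowercased, stray spaces or mixed case in the `content` attribute do not change the result.
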
Entry | Meaning |
---|---|
User-agent: * Disallow: | Because nothing is disallowed, everything is allowed for every robot. |
User-agent: mybot Disallow: / | mybot robot may not index anything, because the root path (/) is disallowed. |
User-agent: * Allow: / | Every robot is allowed to visit the whole site. |
User-agent: BadBot Allow: /About/robot-policy.html Disallow: / | The BadBot robot can see the robot policy document, but nothing else. All other user-agents are by default allowed to see everything. This only protects a site if "BadBot" follows the directives in robots.txt. |
User-agent: * Disallow: /cgi-bin/ Disallow: /tmp/ Disallow: /private | In this example, all robots can visit the whole site, with the exception of the two directories mentioned and any path that starts with private at the host root directory, including items in privatedir/mystuff and the file privateer.html. |
User-agent: BadBot Disallow: / User-agent: * Disallow: /*/private/* | The blank line indicates a new "record" - a new user agent command. All other robots can see everything except any subdirectory named "private" (using the wildcard character) |
User-agent: WeirdBot Disallow: /links/listing.html Disallow: /tmp/ Disallow: /private/ User-agent: * Allow: / Disallow: /temp* Allow: *temperature* Disallow: /private/ | This keeps the WeirdBot from visiting the listing page in the links directory, the tmp directory and the private directory. All other robots can see everything except the temp directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that the robots will use the longest matching string, so temps and temporary will match the Disallow, while temperatures will match the Allow. |
Bad Examples - Common Wrong Entries | |
Use one of the robots.txt checkers to see if your file is malformed. | |
User-agent: googlebot Disallow / | NO! This entry is missing the colon after the disallow. |
User-agent: sidewiner Disallow: /tmp/ | NO! Robots will ignore misspelled User Agent names (it should be "sidewinder"). Check your server logs for User Agent name and the listings of User Agent names. |
User-agent: MSNbot Disallow: /PRIVATE | WARNING! Many robots and webservers are case-sensitive. So this path will not match any root-level folders named private or Private. |
User-agent: * Disallow: /tmp/ User-agent: Weirdbot Disallow: /links/listing.html Disallow: /tmp/ | Robots generally read from top to bottom and stop when they reach something that applies to them. So Weirdbot would probably stop at the first record, *. If there's a specific User Agent, robots don't check the * (all user agents) block, so any general directives should be repeated in the special blocks. |
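
The records in the table can be checked programmatically. The sketch below feeds the BadBot record and the `/cgi-bin/`, `/tmp/`, `/private` record to Python's standard `urllib.robotparser` and asks which paths each robot may fetch. The robot name "SomeBot" and the tested paths are illustrative assumptions. Note that this parser matches rules by path prefix in file order and, as far as I know, does not interpret `*` wildcards inside paths, so it will not reproduce the longest-match behavior described for the WeirdBot example.

```python
from urllib.robotparser import RobotFileParser

# Two records from the table above, written one directive per line,
# with the blank line that separates records.
ROBOTS_TXT = """\
User-agent: BadBot
Allow: /About/robot-policy.html
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private
"""


def load_rules(text):
    """Parse robots.txt content held in a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    checks = [
        ("BadBot", "/About/robot-policy.html"),
        ("BadBot", "/index.html"),
        ("SomeBot", "/index.html"),      # hypothetical robot, falls under the * record
        ("SomeBot", "/privateer.html"),  # blocked: the path starts with /private
    ]
    for agent, path in checks:
        print(agent, path, rules.can_fetch(agent, path))
```

Running a file through a parser like this before publishing it is one way to catch entries that do not behave as intended, although each real robot may interpret the same file slightly differently.
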