Les nouveautés et Tutoriels de Votre Codeur | SEO | Création de site web | Création de logiciel

salam every one, this is a topic from google web master centrale blog: Vanessa's been posting a lot lately, and I'm starting to feel left out. So here my tidbit of wisdom for you: I've noticed a couple of webmasters confused by "blocked by robots.txt" errors, and I wanted to share the steps I take when debugging robots.txt problems:

A handy checklist for debugging a blocked URL

Let's assume you are looking at crawl errors for your website and notice a URL restricted by robots.txt that you weren't intending to block:
http://www.example.com/amanda.html URL restricted by robots.txt Sep 3, 2006

Check the robots.txt analysis tool
The first thing you should do is go to the robots.txt analysis tool for that site. Make sure you are looking at the correct site for that URL, paying attention that you are looking at the right protocol and subdomain. (Subdomains and protocols may have their own robots.txt file, so https://www.example.com/robots.txt may be different from http://example.com/robots.txt and may be different from http://amanda.example.com/robots.txt.) Paste the blocked URL into the "Test URLs against this robots.txt file" box. If the tool reports that it is blocked, you've found your problem. If the tool reports that it's allowed, we need to investigate further.

At the top of the robots.txt analysis tool, take a look at the HTTP status code. If we are reporting anything other than a 200 (Success) or a 404 (Not found) then we may not be able to reach your robots.txt file, which stops our crawling process. (Note that you can see the last time we downloaded your robots.txt file at the top of this tool. If you make changes to your file, check this date and time to see if your changes were made after our last download.)

Check for changes in your robots.txt file
If these look fine, you may want to check and see if your robots.txt file has changed since the error occurred by checking the date to see when your robots.txt file was last modified. If it was modified after the date given for the error in the crawl errors, it might be that someone has changed the file so that the new version no longer blocks this URL.

Check for redirects of the URL
If you can be certain that this URL isn't blocked, check to see if the URL redirects to another page. When Googlebot fetches a URL, it checks the robots.txt file to make sure it is allowed to access the URL. If the robots.txt file allows access to the URL, but the URL returns a redirect, Googlebot checks the robots.txt file again to see if the destination URL is accessible. If at any point Googlebot is redirected to a blocked URL, it reports that it could not get the content of the original URL because it was blocked by robots.txt.

Sometimes this behavior is easy to spot because a particular URL always redirects to another one. But sometimes this can be tricky to figure out. For instance:
  • Your site may not have a robots.txt file at all (and therefore, allows access to all pages), but a URL on the site may redirect to a different site, which does have a robots.txt file. In this case, you may see URLs blocked by robots.txt for your site (even though you don't have a robots.txt file).
  • Your site may prompt for registration after a certain number of page views. You may have the registration page blocked by a robots.txt file. In this case, the URL itself may not redirect, but if Googlebot triggers the registration prompt when accessing the URL, it will be redirected to the blocked registration page, and the original URL will be listed in the crawl errors page as blocked by robots.txt.

Ask for help
Finally, if you still can't pinpoint the problem, you might want to post on our forum for help. Be sure to include the URL that is blocked in your message. Sometimes its easier for other people to notice oversights you may have missed.

Good luck debugging! And by the way -- unrelated to robots.txt -- make sure that you don't have "noindex" meta tags at the top of your web pages; those also result in Google not showing a web site in our index.this is a topic published in 2013... to get contents for your blog or your forum, just contact me at: devnasser@gmail.com
salam every one, this is a topic from google web master centrale blog: Webmaster level: All

Webmaster Tools now has a new download option for exporting your data directly to a Google Spreadsheet. The download option is available for most of our data heavy features, such as Crawl errors, Search queries, and Links to your site. If you enjoy digging into the data from Webmaster Tools but don’t want to use Python scripts or the API, we’ve added new functionality just for you. Now when you click a download button from a Webmaster Tools feature like Search queries, you'll be presented with the "Select Download Format" option where you can choose to download the data as "CSV" or "Google Docs."


Choosing "CSV" initiates a download of the data in CSV format which has long been available in Webmaster Tools and can be imported into other spreadsheet tools like Excel. If you select the new “Google Docs” option then your data will be saved into a Google Spreadsheet and the newly created spreadsheet will be opened in a new browser tab.

We hope the ability to easily download your data to a Google Spreadsheet helps you to get crunching on your site's Webmaster Tools data even faster than you could before. Using only a web browser you can instantly dive right into slicing and dicing your data to create customized charts for detecting significant changes and tracking longer term trends impacting your site. If you've got questions or feedback please share it in the Webmaster Help Forum.

this is a topic published in 2013... to get contents for your blog or your forum, just contact me at: devnasser@gmail.com
salam every one, this is a topic from google web master centrale blog: Webmaster level: all

Protecting users’ privacy is a priority for us and it’s helped drive recent changes. Helping users save time is also very important; it’s explicitly mentioned as a part of our philosophy. Today, we’re happy to announce that Google Web Search will soon be using a new proposal to reduce latency when a user of Google’s SSL-search clicks on a search result with a modern browser such as Chrome.

Starting in April, for browsers with the appropriate support, we will be using the "referrer" meta tag to automatically simplify the referring URL that is sent by the browser when visiting a page linked from an organic search result. This results in a faster time to result and more streamlined experience for the user.

What does this mean for sites that receive clicks from Google search results? You may start to see "origin" referrers—Google’s homepages (see the meta referrer specification for further detail)—as a source of organic SSL search traffic. This change will only affect the subset of SSL search referrers which already didn’t include the query terms. Non-HTTPS referrals will continue to behave as they do today. Again, the primary motivation for this change is to remove an unneeded redirect so that signed-in users reach their destination faster.

Website analytics programs can detect these organic search requests by detecting bare Google host names using SSL (like "https://www.google.co.uk/"). Webmasters will continue see the same data in Webmasters Tools—just as before, you’ll receive an aggregated list of the top search queries that drove traffic to their site.

We will continue to look into further improvements to how search query data is surfaced through Webmaster Tools. If you have questions, feedback or suggestions, please let us know through the Webmaster Tools Help Forum.

this is a topic published in 2013... to get contents for your blog or your forum, just contact me at: devnasser@gmail.com
Powered by Blogger.