News and Tutorials from Votre Codeur | SEO | Website creation | Software development

The following is a post from the Google Webmaster Central blog:

Web publishers often ask us how they can maximize their visibility on the web. Much of this has to do with search engine optimization -- making sure a publisher's content shows up on all the search engines.

However, there are some cases in which publishers need to communicate more information to search engines -- like the fact that they don't want certain content to appear in search results. And for that they use something called the Robots Exclusion Protocol (REP), which lets publishers control how search engines access their site: whether it's controlling the visibility of their content across their site (via robots.txt) or down to a much more granular level for individual pages (via META tags).

Since it was introduced in the early '90s, REP has become the de facto standard by which web publishers specify which parts of their site they want public and which parts they want to keep private. Today, millions of publishers use REP as an easy and efficient way to communicate with search engines. Its strength lies in its flexibility to evolve in parallel with the web, its universal implementation across major search engines and all major robots, and in the way it works for any publisher, no matter how large or small.

While REP is observed by virtually all search engines, we've never come together to detail how we each interpret different tags. Over the last couple of years, we have worked with Microsoft and Yahoo! to bring forward standards such as Sitemaps and to offer additional tools for webmasters. Since the original announcement, we have delivered, and will continue to deliver, further improvements based on what we are hearing from the community.

Today, in that same spirit of making the lives of webmasters simpler, we're releasing detailed documentation about how we implement REP. This will provide a common implementation for webmasters and make it easier for any publisher to know how their REP directives will be handled by three major search providers -- making REP more intuitive and friendly to even more publishers on the web.

So, without further ado...

Common REP Directives
The following are all the major REP features currently implemented by Google, Microsoft, and Yahoo!. For each feature, you'll see what it does and how to communicate it.

Each of these directives can be made applicable to all crawlers or only to specific crawlers by targeting them to specific user-agents, which is how each crawler identifies itself. In addition to identification by user-agent, each of our crawlers also supports reverse DNS-based authentication, which lets you verify the identity of the crawler.
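As an illustration, reverse DNS verification can be scripted: the Python sketch below does a reverse lookup on the requesting IP and then forward-confirms the result. The googlebot.com/google.com suffixes are the ones Google documents for Googlebot and are used here as an assumed example; other crawlers publish their own hostnames, and the sample IP in the comment is hypothetical.

    import socket

    def is_verified_crawler(ip, allowed_suffixes=(".googlebot.com", ".google.com")):
        # Reverse DNS: map the requesting IP back to a hostname.
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)
        except socket.herror:
            return False
        # The hostname must belong to the crawler's published domain(s).
        if not hostname.endswith(allowed_suffixes):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP,
        # otherwise the reverse record could be spoofed.
        try:
            return socket.gethostbyname(hostname) == ip
        except socket.gaierror:
            return False

    # Example usage (IP taken from a server log):
    # is_verified_crawler("66.249.66.1")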

1. Robots.txt Directives
Disallow
Impact: Tells a crawler not to crawl specific parts of your site. Your site's robots.txt file still needs to be fetched to find this directive, but the disallowed pages themselves will not be crawled.
Use cases: 'No Crawl' pages of a site. In the default syntax this directive prevents specific path(s) of a site from being crawled.

Allow
Impact: Tells a crawler the specific pages of your site you want crawled, so you can use it in combination with Disallow.
Use cases: Particularly useful in conjunction with Disallow clauses, where a large section of a site is disallowed except for a small section within it.

$ Wildcard Support
Impact: Tells a crawler to match everything from the given point to the end of a URL, covering many URLs without listing specific pages.
Use cases: 'No Crawl' files with specific patterns, for example files of a certain filetype that always have the same extension, say pdf.

* Wildcard Support
Impact: Tells a crawler to match any sequence of characters.
Use cases: 'No Crawl' URLs with certain patterns, for example disallow URLs with session IDs or other extraneous parameters.

Sitemaps Location
Impact: Tells a crawler where it can find your Sitemaps.
Use cases: Point to other locations where feeds exist to help crawlers find URLs on a site.
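To illustrate, a robots.txt file combining these directives might look like the following sketch; the paths and Sitemap URL are placeholders, not recommendations:

    User-agent: *
    # 'No Crawl' an entire directory...
    Disallow: /private/
    # ...except for one page inside it (Allow used in combination with Disallow)
    Allow: /private/overview.html
    # * wildcard: skip URLs carrying session IDs or other extraneous parameters
    Disallow: /*?sessionid=
    # $ wildcard: skip files with a certain extension, say pdf
    Disallow: /*.pdf$
    # Tell crawlers where the Sitemap lives
    Sitemap: http://www.example.com/sitemap.xml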

2. HTML META Directives
NOINDEX META Tag
Impact: Tells a crawler not to index a given page.
Use cases: Keep a page out of the index even though it is crawled.

NOFOLLOW META Tag
Impact: Tells a crawler not to follow the links on a given page.
Use cases: Prevent publicly writable areas from being abused by spammers looking for link credit. By using NOFOLLOW you let the robot know that you are discounting all outgoing links from the page.

NOSNIPPET META Tag
Impact: Tells a search engine not to display a snippet for a given page in the search results.
Use cases: Show no snippet for the page in search results.

NOARCHIVE META Tag
Impact: Tells a search engine not to show a "cached" link for a given page.
Use cases: Do not make a copy of the page available to users from the search engine cache.

NOODP META Tag
Impact: Tells a search engine not to use the title and snippet from the Open Directory Project (ODP) for a given page.
Use cases: Do not use the ODP title and snippet for this page.
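For example, these tags are placed in the <head> of a page. The combination below is purely illustrative; you would normally pick only the directives you need:

    <head>
      <!-- Keep this page out of the index and discount its outgoing links -->
      <meta name="robots" content="noindex, nofollow">
      <!-- Show no snippet, no cached copy and no ODP description in results -->
      <meta name="robots" content="nosnippet, noarchive, noodp">
    </head>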


These directives are applicable to all forms of content. They can be placed in the HTML of a page or, for non-HTML content such as PDFs or video, in the HTTP header using an X-Robots-Tag. You can read more about it in the X-Robots-Tag post or in our series of posts about using robots and META tags.
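As a sketch, on an Apache server with mod_headers enabled, the X-Robots-Tag header could be added to all PDF responses like this; the directive values are just an example:

    <FilesMatch "\.pdf$">
      # Send REP directives in the HTTP response header for non-HTML content
      Header set X-Robots-Tag "noindex, noarchive"
    </FilesMatch>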

Other REP Directives
The directives listed above are used by Microsoft, Google and Yahoo!, but may not be implemented by all other search engines. In addition, the following directives are supported by Google but, unlike those above, are not supported by all three:

UNAVAILABLE_AFTER Meta Tag - Tells a crawler when a page should "expire", i.e., after which date it should not show up in search results.

NOIMAGEINDEX Meta Tag - Tells a crawler not to index images for a given page in search results.

NOTRANSLATE Meta Tag - Tells a crawler not to translate the content on a page into different languages for search results.
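For illustration, these Google-specific directives go in a page's <head> as googlebot META tags; the date below is an arbitrary placeholder in the RFC 850 style format used in Google's examples for unavailable_after:

    <!-- Stop showing this page in search results after the given date -->
    <meta name="googlebot" content="unavailable_after: 25-Aug-2014 15:00:00 EST">
    <!-- Do not index images on this page -->
    <meta name="googlebot" content="noimageindex">
    <!-- Do not offer translated versions of this page in results -->
    <meta name="googlebot" content="notranslate">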


Going forward, we plan to continue to work together to ensure that as new uses of REP arise, we're able to make it as easy as possible for webmasters to use them. So stay tuned for more!

Learn more
You can find out more about robots.txt in our documentation and in Google's Webmaster Help Center, which contains lots of helpful information. We've also written several posts about robots.txt on our webmaster blog that you may find useful, and there is a useful list of the bots used by the major search engines.

To see what our colleagues have to say, you can also check out the blog posts published by Yahoo! and Microsoft.

Webmaster Level: All

Last year, as part of Google’s initiative to make the web faster, we introduced Page Speed, a tool that gives developers suggestions to speed up web pages. It’s usually pretty straightforward for developers and webmasters to implement these suggestions by updating their web server configuration, HTML, JavaScript, CSS and images. But we thought we could make it even easier -- ideally these optimizations should happen with minimal developer and webmaster effort.

So today, we’re introducing a module for the Apache HTTP Server called mod_pagespeed to perform many speed optimizations automatically. We’re starting with more than 15 on-the-fly optimizations that address various aspects of web performance, including optimizing caching, minimizing client-server round trips and minimizing payload size. We’ve seen mod_pagespeed reduce page load times by up to 50% (an average across a rough sample of sites we tried) -- in other words, essentially speeding up websites by about 2x, and sometimes even faster.

Comparison of the AdSense blog site with and without mod_pagespeed


Here are a few simple optimizations that are a pain to do manually, but that mod_pagespeed excels at:
  • Making changes to the pages built by a Content Management System (CMS) with no need to make changes to the CMS itself,
  • Recompressing an image when its HTML context changes to serve only the bytes required (typically tedious to optimize manually), and
  • Extending the cache lifetime of the logo and images of your website to a year, while still allowing you to update these at any time.
We’re working with Go Daddy to get mod_pagespeed running for many of its 8.5 million customers. Warren Adelman, President and COO of Go Daddy, says:
"Go Daddy is continually looking for ways to provide our customers the best user experience possible. That's the reason we partnered with Google on the 'Make the Web Faster' initiative. Go Daddy engineers are seeing a dramatic decrease in load times of customers' websites using mod_pagespeed and other technologies provided. We hope to provide the technology to our customers soon - not only for their benefit, but for their website visitors as well.”
We’re also working with Cotendo to integrate the core engine of mod_pagespeed as part of their Content Delivery Network (CDN) service.

mod_pagespeed integrates as a module for the Apache HTTP Server, and we’ve released it as open-source for Apache for many Linux distributions. Download mod_pagespeed for your platform and let us know what you think on the project’s mailing list. We hope to work with the hosting, developer and webmaster community to improve mod_pagespeed and make the web faster.
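As a rough sketch of what enabling the module looks like on Apache, the configuration below loads mod_pagespeed and turns on a couple of its rewriters. The module path and the chosen filter names are illustrative and depend on your package and version:

    # Load the module (the path varies by distribution and package)
    LoadModule pagespeed_module /usr/lib/apache2/modules/mod_pagespeed.so

    # Turn on the module's on-the-fly optimizations
    ModPagespeed on
    # Enable a couple of optional rewriters on top of the defaults (illustrative)
    ModPagespeedEnableFilters extend_cache,combine_css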

Webmaster Level: All

The Fetch as Googlebot feature in Webmaster Tools now provides a way to submit new and updated URLs to Google for indexing. After you fetch a URL as Googlebot, if the fetch is successful, you’ll now see the option to submit that URL to our index. When you submit a URL in this way Googlebot will crawl the URL, usually within a day. We’ll then consider it for inclusion in our index. Note that we don’t guarantee that every URL submitted in this way will be indexed; we’ll still use our regular processes—the same ones we use on URLs discovered in any other way—to evaluate whether a URL belongs in our index.

This new functionality may help you in several situations: if you’ve just launched a new site, or added some key new pages, you can ask Googlebot to find and crawl them immediately rather than waiting for us to discover them naturally. You can also submit URLs that are already indexed in order to refresh them, say if you’ve updated some key content for the event you’re hosting this weekend and want to make sure we see it in time. It could also help if you’ve accidentally published information that you didn’t mean to, and want to update our cached version after you’ve removed the information from your site.

How to submit a URL
First, use Diagnostics > Fetch As Googlebot to fetch the URL you want to submit to Google. If the URL is successfully fetched you’ll see a new “Submit to index” link appear next to the fetched URL.
Once you click “Submit to index” you’ll see a dialog box that allows you to choose whether you want to submit only the one URL, or that URL and all its linked pages.
When submitting individual URLs, we have a maximum limit of 50 submissions per week; when submitting URLs with all linked pages, the limit is 10 submissions per month. You can see how many submissions you have left on the Fetch as Googlebot page. Any URL submitted should point to content that would be suitable for Google Web Search, so if you're trying to submit images or videos you should use Sitemaps instead.

Submit URLs to Google without verifying
In conjunction with this update to Fetch as Googlebot, we've also updated the public "Add your URL to Google" form. It's now the Crawl URL form. It has the same quota limits for submitting pages to the index as the Fetch as Googlebot feature but doesn't require verifying ownership of the site in question, so you can submit any URLs that you want crawled and indexed.

Note that Googlebot is already pretty good about finding and crawling new content in a timely fashion, so don’t feel obligated to use this tool for every change or update on your site. But if you’ve got a URL whose crawling or indexing you want to speed up, consider submitting it using the Crawl URL form or the updated Fetch as Googlebot feature in Webmaster Tools. Feel free to comment here or visit our Webmaster Help Forum if you have more detailed questions.
