News and Tutorials from Votre Codeur | SEO | Website Creation | Software Development

from web contents: More webmaster questions - Answered! 2013

When it comes to answering your webmaster related questions, we just can't get enough. I wanted to follow-up and answer some additional questions that webmasters asked in our latest installment of Popular Picks. In case you missed it, you can find our answers to image search ranking, sitelinks, reconsideration requests, redirects, and our communication with webmasters in this blog post.



Video Transcript:

Hi everyone, I'm Reid from the Search Quality team. Today I'd like to answer some of the unanswered questions from our latest round of popular picks.

Searchmaster had a question about duplicate content. Understandably, this is a popular concern among webmasters. You should check out the Google Webmaster Central Blog, where my colleague Susan Moskwa recently posted "Demystifying the 'duplicate content penalty'," which answers many questions and concerns about duplicate content.

Jay is the Boss wanted to know if e-commerce websites suffer if they have two or more different themes. For example, you could have a site that sells auto parts, but also sightseeing guides. In general, I'd encourage webmasters to create a website that they feel is relevant for users. If it makes sense to sell auto parts and sightseeing guides, then go for it. Those are the sites that perform well, because users want to visit those sites, and they'll link to them as well.

emma2 wanted to know if Google will follow links on a page using the "noindex" attribute in the "robots" meta tag. To answer this question, Googlebot will follow links on a page which uses the meta "noindex" tag, but that page will not appear in our search results. As a reminder, if you would like to prevent Googlebot from crawling any links on a page, use the "nofollow" attribute in the "robots" meta tag.
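As an aside, the meta tags being discussed look like this in a page's HTML head (a generic illustration, not part of the original transcript):

<!-- Keep this page out of the search results, but still follow its links -->
<meta name="robots" content="noindex">

<!-- Index the page, but don't follow any of the links on it -->
<meta name="robots" content="nofollow">

<!-- Both at once -->
<meta name="robots" content="noindex, nofollow">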

Aaron Pratt wanted to know about some ways a webmaster can rank well for local searches. A quick recommendation is to add your business to the Local Business Center, where you can add contact information, store operating hours, and coupons. Another tip is to purchase a country-specific top-level domain, or to use the geotargeting feature in Webmaster Tools.

jdeb901 said it would be helpful if we could let webmasters know if we are having problems with Webmaster Tools. This is an excellent point, and we're always thinking about better ways to communicate with webmasters. If you're having problems with Webmaster Tools, chances are someone else is as well, and they've posted to the Google Webmaster Help Group about it. In the past, when we've experienced problems with Webmaster Tools, we've also created a "sticky" post to let users know that we're aware of the issues and working to find a solution.

Well, that about wraps it up with our Popular Picks. Thanks again for all of your questions, and I look forward to seeing you around the group.


from web contents: Dynamic URLs vs. static URLs 2013

Chatting with webmasters often reveals widespread beliefs that might have been accurate in the past, but are not necessarily up-to-date any more. This was the case when we recently talked to a couple of friends about the structure of a URL. One friend was concerned about using dynamic URLs, since (as she told us) "search engines can't cope with these." Another friend thought that dynamic URLs weren't a problem at all for search engines and that these issues were a thing of the past. One even admitted that he never understood the fuss about dynamic URLs in comparison to static URLs. For us, that was the moment we decided to read up on the topic of dynamic and static URLs. First, let's clarify what we're talking about:

What is a static URL?
A static URL is one that does not change, so it typically does not contain any URL parameters. It can look like this: http://www.example.com/archive/january.htm. You can search for static URLs on Google by typing filetype:htm in the search field. Updating these kinds of pages can be time consuming, especially if the amount of information grows quickly, since every single page has to be hard-coded. This is why webmasters who deal with large, frequently updated sites like online shops, forum communities, blogs or content management systems may use dynamic URLs.

What is a dynamic URL?
If the content of a site is stored in a database and pulled for display on pages on demand, dynamic URLs may be used. In that case the site basically serves as a template for the content. Usually, a dynamic URL would look something like this: http://code.google.com/p/google-checkout-php-sample-code/issues/detail?id=31. You can spot dynamic URLs by looking for characters like: ? = &. Dynamic URLs have the disadvantage that different URLs can have the same content. So different users might link to URLs with different parameters which have the same content. That's one reason why webmasters sometimes want to rewrite their URLs to static ones.

Should I try to make my dynamic URLs look static?
Following are some key points you should keep in mind while dealing with dynamic URLs:
  1. It's quite hard to correctly create and maintain rewrites that change dynamic URLs to static-looking URLs.
  2. It's much safer to serve us the original dynamic URL and let us handle the problem of detecting and avoiding problematic parameters.
  3. If you want to rewrite your URL, please remove unnecessary parameters while maintaining a dynamic-looking URL.
  4. If you want to serve a static URL instead of a dynamic URL you should create a static equivalent of your content.
Which can Googlebot read better, static or dynamic URLs?
We've come across many webmasters who, like our friend, believed that static or static-looking URLs were an advantage for indexing and ranking their sites. This is based on the presumption that search engines have issues with crawling and analyzing URLs that include session IDs or source trackers. However, as a matter of fact, we at Google have made some progress in both areas. While static URLs might have a slight advantage in terms of clickthrough rates because users can easily read the URLs, the decision to use database-driven websites does not imply a significant disadvantage in terms of indexing and ranking. Providing search engines with dynamic URLs should be favored over hiding parameters to make them look static.

Let's now look at some of the widespread beliefs concerning dynamic URLs and correct some of the assumptions which spook webmasters. :)

Myth: "Dynamic URLs cannot be crawled."
Fact: We can crawl dynamic URLs and interpret the different parameters. We might have problems crawling and ranking your dynamic URLs if you try to make your URLs look static and in the process hide parameters which offer Googlebot valuable information. One recommendation is to avoid reformatting a dynamic URL to make it look static. It's always advisable to use static content with static URLs as much as possible, but in cases where you decide to use dynamic content, you should give us the possibility to analyze your URL structure and not remove information by hiding parameters and making them look static.

Myth: "Dynamic URLs are okay if you use fewer than three parameters."
Fact: There is no limit on the number of parameters, but a good rule of thumb would be to keep your URLs short (this applies to all URLs, whether static or dynamic). You may be able to remove some parameters which aren't essential for Googlebot and offer your users a nice looking dynamic URL. If you are not able to figure out which parameters to remove, we'd advise you to serve us all the parameters in your dynamic URL and our system will figure out which ones do not matter. Hiding your parameters keeps us from analyzing your URLs properly and we won't be able to recognize the parameters as such, which could cause a loss of valuable information.

Following are some questions we thought you might have at this point.

Does that mean I should avoid rewriting dynamic URLs at all?
That's our recommendation, unless your rewrites are limited to removing unnecessary parameters, or you are very diligent in removing all parameters that could cause problems. If you transform your dynamic URL to make it look static you should be aware that we might not be able to interpret the information correctly in all cases. If you want to serve a static equivalent of your site, you might want to consider transforming the underlying content by serving a replacement which is truly static. One example would be to generate files for all the paths and make them accessible somewhere on your site. However, if you're using URL rewriting (rather than making a copy of the content) to produce static-looking URLs from a dynamic site, you could be doing harm rather than good. Feel free to serve us your standard dynamic URL and we will automatically find the parameters which are unnecessary.

Can you give me an example?
If you have a dynamic URL which is in the standard format like foo?key1=value&key2=value2 we recommend that you leave the URL unchanged, and Google will determine which parameters can be removed; or you could remove unnecessary parameters for your users. Be careful that you only remove parameters which do not matter. Here's an example of a URL with a couple of parameters:

www.example.com/article/bin/answer.foo?language=en&answer=3&sid=98971298178906&query=URL
  • language=en - indicates the language of the article
  • answer=3 - the article has the number 3
  • sid=98971298178906 - the session ID number is 98971298178906
  • query=URL - the query with which the article was found is [URL]
Not all of these parameters offer additional information. So rewriting the URL to www.example.com/article/bin/answer.foo?language=en&answer=3 probably would not cause any problems as all irrelevant parameters are removed.

The following are some examples of static-looking URLs which may cause more crawling problems than serving the dynamic URL without rewriting:
  • www.example.com/article/bin/answer.foo/en/3/98971298178906/URL
  • www.example.com/article/bin/answer.foo/language=en/answer=3/
    sid=98971298178906/query=URL
  • www.example.com/article/bin/answer.foo/language/en/answer/3/
    sid/98971298178906/query/URL
  • www.example.com/article/bin/answer.foo/en,3,98971298178906,URL
Rewriting your dynamic URL to one of these examples could cause us to crawl the same piece of content needlessly via many different URLs with varying values for session IDs (sid) and query. These forms make it difficult for us to understand that URL and 98971298178906 have nothing to do with the actual content which is returned via this URL. However, here's an example of a rewrite where all irrelevant parameters have been removed:
  • www.example.com/article/bin/answer.foo/en/3
Although we are able to process this URL correctly, we would still discourage you from using this rewrite as it is hard to maintain and needs to be updated as soon as a new parameter is added to the original dynamic URL. Failure to do this would again result in a static-looking URL which is hiding parameters. So the best solution is often to keep your dynamic URLs as they are. Or, if you remove irrelevant parameters, bear in mind to keep the URL dynamic, as the following rewritten example shows:
  • www.example.com/article/bin/answer.foo?language=en&answer=3
We hope this article helps shed some light on the various assumptions around dynamic URLs. Please feel free to join our discussion group if you have any further questions.


from web contents: Introducing a new Rich Snippets format: Events 2013

Webmaster Level: All

Last year we introduced Rich Snippets, a new feature that makes it possible to surface structured data from your pages on Google's search results. So far, user reaction to Rich Snippets has been enthusiastic -- after all, Rich Snippets help people make more informed clicks and find what they need even faster.

We originally introduced Rich Snippets with two formats: reviews and people. Later in the year we added support for marking up video information which is used to improve Video Search. Today, we're excited to kick off the new year by adding support for events.

Events markup is based on the hCalendar microformat. Here's an example of what the new events Rich Snippets will look like:


The new format shows links to specific events on the page along with dates and locations. It provides a fast and convenient way for users to determine if a page has events they may be interested in.

If you have event listings on your site, we encourage you to review the events documentation we've prepared to help you get started. Please note, however, that marking up your content is not a guarantee that Rich Snippets will show for your site. Just as we did for previous formats, we will take a gradual approach to incorporating the new event snippets to ensure a great user experience along the way.

Stay tuned for more developments in Rich Snippets throughout the year!


from web contents: Improved handling of URLs with parameters 2013

Webmaster level: Advanced

You may have noticed that the Parameter Handling feature disappeared from the Site configuration > Settings section of Webmaster Tools. Fear not; you can now find it under its new name, URL Parameters! Along with renaming it, we refreshed and improved the feature. We hope you’ll find it even more useful. Configuration of URL parameters made in the old version of the feature will be automatically visible in the new version. Before we reveal all the cool things you can do with URL parameters now, let us remind you (or introduce, if you are new to this feature) of the purpose of this feature and when it may come in handy.

When to use
URL Parameters helps you control which URLs on your site should be crawled by Googlebot, depending on the parameters that appear in these URLs. This functionality provides a simple way to prevent crawling duplicate content on your site. Now, your site can be crawled more effectively, reducing your bandwidth usage and likely allowing more unique content from your site to be indexed. If you suspect that Googlebot's crawl coverage of the content on your site could be improved, using this feature can be a good idea. But with great power comes great responsibility! You should only use this feature if you're sure about the behavior of URL parameters on your site. Otherwise you might mistakenly prevent some URLs from being crawled, making their content no longer accessible to Googlebot.

A lot more to do
Okay, let’s talk about what’s new and improved. To begin with, in addition to assigning a crawl action to an individual parameter, you can now also describe the behavior of the parameter. You start by telling us whether or not the parameter changes the content of the page. If the parameter doesn’t affect the page’s content then your work is done; Googlebot will choose URLs with a representative value of this parameter and will crawl the URLs with this value. Since the parameter doesn’t change the content, any value chosen is equally good. However, if the parameter does change the content of a page, you can now assign one of four possible ways for Google to crawl URLs with this parameter:
  • Let Googlebot decide
  • Every URL
  • Only crawl URLs with value=x
  • No URLs
We also added the ability to provide your own specific value to be used, with the “Only URLs with value=x” option; you’re no longer restricted to the list of values that we provide. Optionally, you can also tell us exactly what the parameter does: whether it sorts, paginates, determines content, and so on. One last improvement is that for every parameter, we’ll try to show you a sample of URLs from your site that Googlebot crawled which contain that particular parameter.

Of the four crawl options listed above, “No URLs” is new and deserves special attention. This option is the most restrictive and, for any given URL, takes precedence over settings of other parameters in that URL. This means that if the URL contains a parameter that is set to the “No URLs” option, this URL will never be crawled, even if other parameters in the URL are set to “Every URL.” You should be careful when using this option. The second most restrictive setting is “Only URLs with value=x.”

Feature in use
Now let’s do something fun and exercise our brains on an example.
- - -
Once upon a time there was an online store, fairyclothes.example.com. The store’s website used parameters in its URLs, and the same content could be reached through multiple URLs. One day the store owner noticed that too many redundant URLs could be preventing Googlebot from crawling the site thoroughly. So he sent his assistant CuriousQuestionAsker to the Great WebWizard to get advice on using the URL parameters feature to reduce the duplicate content crawled by Googlebot. The Great WebWizard was famous for his wisdom. He looked at the URL parameters and proposed the following configuration:

Parameter name | Effect on content? | What should Googlebot crawl?
trackingId     | None               | One representative URL
sortOrder      | Sorts              | Only URLs with value = ‘lowToHigh’
sortBy         | Sorts              | Only URLs with value = ‘price’
filterByColor  | Narrows            | No URLs
itemId         | Specifies          | Every URL
page           | Paginates          | Every URL

The CuriousQuestionAsker couldn’t avoid his nature and started asking questions:

CuriousQuestionAsker: You’ve instructed Googlebot to choose a representative URL for trackingId (value to be chosen by Googlebot). Why not select the Only URLs with value=x option and choose the value myself?
Great WebWizard: While crawling the web Googlebot encountered the following URLs that link to your site:
  1. fairyclothes.example.com/skirts/?trackingId=aaa123
  2. fairyclothes.example.com/skirts/?trackingId=aaa124
  3. fairyclothes.example.com/trousers/?trackingId=aaa125
Imagine that you were to tell Googlebot to only crawl URLs where “trackingId=aaa125”. In that case Googlebot would not crawl URLs 1 and 2, as neither of them has the value aaa125 for trackingId. Their content would neither be crawled nor indexed, and none of your inventory of fine skirts would show up in Google’s search results. No, for this case choosing a representative URL is the way to go. Why? Because that tells Googlebot that when it encounters two URLs on the web that differ only in this parameter (as URLs 1 and 2 above do), then it only needs to crawl one of them (either will do) and it will still get all the content. In the example above two URLs will be crawled: either 1 & 3, or 2 & 3. Not a single skirt or trouser will be lost.

CuriousQuestionAsker: What about the sortOrder parameter? I don’t care if the items are listed in ascending or descending order. Why not let Google select a representative value?
Great WebWizard: As Googlebot continues to crawl it may find the following URLs:
  1. fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’lowToHigh’
  2. fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’highToLow’
  3. fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’lowToHigh’
  4. fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’highToLow’
Notice how the first pair of URLs (1 & 2) differs only in the value of the sortOrder parameter, as do the URLs in the second pair (3 & 4). However, URLs 1 and 2 will produce different content: the first shows the least expensive of your skirts, while the second shows the priciest. That should be your first hint that using a single representative value is not a good choice for this situation. Moreover, if you let Googlebot choose a single representative from among a set of URLs that differ only in their sortOrder parameter, it might choose a different value each time. In the example above, from the first pair of URLs, URL 1 might be chosen (sortOrder=’lowToHigh’), whereas from the second pair URL 4 might be picked (sortOrder=’highToLow’). If that were to happen, Googlebot would crawl only the least expensive skirts (twice):
  • fairyclothes.example.com/skirts/?page=1&sortBy=price&sortOrder=’lowToHigh’
  • fairyclothes.example.com/skirts/?page=2&sortBy=price&sortOrder=’highToLow’
Your most expensive skirts would not be crawled at all! When dealing with sorting parameters, consistency is key. Always sort the same way.

CuriousQuestionAsker: How about the sortBy value?
Great WebWizard: This is very similar to the sortOrder parameter. You want the crawled URLs of your listing to be sorted consistently throughout all the pages; otherwise some of the items may not be visible to Googlebot. However, you should be careful which value you choose. If you sell books as well as shoes in your store, it would be better not to select the value ‘title’, since URLs pointing to shoes never contain ‘sortBy=title’, so they would not be crawled. Likewise, setting ‘sortBy=size’ works well for crawling shoes, but not for crawling books. Keep in mind that the parameter configuration applies to the whole site.

CuriousQuestionAsker: Why not crawl URLs with parameter filterByColor?
Great WebWizard: Imagine that you have a three-page list of skirts. Some of the skirts are blue, some of them are red and others are green.
  • fairyclothes.example.com/skirts/?page=1
  • fairyclothes.example.com/skirts/?page=2
  • fairyclothes.example.com/skirts/?page=3
This list is filterable. When a user selects a color, she gets two pages of blue skirts:
  • fairyclothes.example.com/skirts/?page=1&filterByColor=blue
  • fairyclothes.example.com/skirts/?page=2&filterByColor=blue
They seem like new pages (the set of items is different from all other pages), but there is actually no new content on them, since all the blue skirts were already included in the original three pages. There’s no need to crawl URLs that narrow the content by color, since the content served on those URLs was already crawled. There is one important thing to notice here: before you disallow some URLs from being crawled by selecting the “No URLs” option, make sure that Googlebot can access the content in another way. Considering our example, Googlebot needs to be able to find the first three links on your site, and there should be no settings that prevent crawling them.
- - -

If your site has URL parameters that are potentially creating duplicate content issues, then you should check out the new URL Parameters feature in Webmaster Tools. Let us know what you think, or if you have any questions, post them to the Webmaster Help Forum.


from web contents: Making Websites Mobile Friendly 2013


Webmaster level: Intermediate

We’ve noticed a rise in the number of questions from webmasters about how best to structure a website for mobile phones and how websites can best interact with Googlebot-Mobile. In this post we’ll explain the current situation and give you specific recommendations you can implement now.

Some Background

Let’s start with a simple question: what do we mean by “mobile phone” when talking about mobile-friendly websites?

A good way to answer this question is to think about the capabilities of the mobile phone’s web browser, especially in relation to the capabilities of modern desktop browsers. To simplify matters, we can break mobile phones into a few classifications:

  1. Traditional mobile phones: Phones with browsers that cannot render normal desktop webpages. This includes browsers for cHTML (iMode), WML, WAP, and the like.
  2. Smartphones: Phones with browsers that render normal desktop pages, at least to some extent. This category includes a diversity of devices, such as Windows Phone 7, Blackberry devices, iPhones, and Android phones, and also tablets and eBook readers.

    We can further break down this category by support for HTML5:

    • Devices with browsers that do not support HTML5
    • Devices with browsers that support HTML5

Once upon a time, mobile phones connected to the Internet using browsers with limited rendering capabilities; but this is clearly a changing situation with the fast rise of smartphones which have browsers that rival the full desktop experience. As such, it’s important to note that the distinction we are making here is based on the current situation as we see it and might change in the future.

Googlebot and Mobile Content

Google has two crawlers relevant to this topic: Googlebot and Googlebot-Mobile. Googlebot crawls desktop-browser webpages and content embedded in them, while Googlebot-Mobile crawls mobile content. The questions we’re seeing more of can be summed up as follows:

Given the diversity of capabilities of mobile web browsers, what kind of content should I serve to Googlebot-Mobile?

The answer lies in the User-agent that Googlebot-Mobile supplies when crawling. There are several User-agent strings in use by Googlebot-Mobile, all of which use this format:

[Phone name(s)] (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

To decide which content to serve, assess which content your website has that best serves the phone(s) in the User-agent string. A full list of Googlebot-Mobile User-agents can be found here.

Notice that we currently do not crawl with Googlebot-Mobile using a smartphone User-agent string. Thus at the current time, a correctly-configured content serving system will serve Googlebot-Mobile content only for the traditional phones described above, because that’s what the User-agent strings in use today dictate. This may change in the future, and if so, it may mean there would be a new Googlebot-Mobile User-agent string.

For now, we expect smartphones to handle desktop experience content so there is no real need for mobile-specific effort from webmasters. However, for many websites it may still make sense for the content to be formatted differently for smartphones, and the decision to do so should be based on how you can best serve your users.

URL Structure for Mobile Content

The next set of questions ask about the URLs mobile content should be served from. Let’s look in detail at some common use cases.

Websites with only Desktop Experience Content

Most websites currently have only one version of their content, namely in HTML that is designed for desktop web browsers. This means all browsers access the content from the same URL.

These websites may not be serving traditional mobile phone users. The quality experienced by their smartphone users depends on the mobile browser they are using and it could be as good as browsing from the desktop.

If you serve only desktop experience content for all User Agents, you should do so for Googlebot-Mobile too; that is, treat Googlebot-Mobile as you treat all other or unknown User Agents. In these cases, Google may modify your webpages for an improved mobile experience.

Websites with Dedicated Mobile Content

Many websites have content specifically optimized for mobile users. The content could be simply reformatted for the typically smaller mobile displays, or it could be in a different format (e.g., served using WAP, etc.).

A very common question we see is: Does it matter if the different types of content are served from the same URL or from different URLs? For example, some websites have www.example.com as the URL desktop browsers are meant to access and have m.example.com or wap.example.com for the different mobile devices. Other websites serve all types of content from just one URL structure like www.example.com.

For Googlebot and Googlebot-Mobile, it does not matter what the URL structure is, as long as the URLs return exactly what a user would see. For example, if you redirect mobile users from www.example.com to m.example.com, that will be recognized by Googlebot-Mobile, and both websites will be crawled and added to the correct index. In this case, use a 301 redirect for both users and Googlebot-Mobile.
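A rough .htaccess sketch of that redirect on www.example.com might look like the following, assuming Apache with mod_rewrite; the User-agent tokens here are only placeholders for whichever devices you decide to send to the mobile site:

RewriteEngine On
# Send the chosen mobile browsers, and Googlebot-Mobile, to the mobile host
RewriteCond %{HTTP_USER_AGENT} (Googlebot-Mobile|MIDP|WAP|DoCoMo) [NC]
RewriteRule ^(.*)$ http://m.example.com/$1 [R=301,L]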

If you serve all types of content from www.example.com, i.e. serving desktop-optimized content or mobile-optimized content from the same URL depending on the User-agent, this will also lead to correct crawling by Googlebot and Googlebot-Mobile. This is not considered cloaking by Google.

It is worth repeating that regardless of URL structure, you must correctly detect the User-agent as given by your users and Googlebot-Mobile, and serve both the same content. Don’t forget to keep the default content, the desktop-optimized content, for when an unknown User-agent requests it.
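As a purely illustrative Python sketch of that logic for the same-URL setup (the function name and the User-agent tokens are assumptions, not an official detection list):

def select_template(user_agent):
    """Pick which version of a page to serve for a given User-agent.

    Googlebot-Mobile names the phones it crawls for in its User-agent
    string, so matching on the same tokens you use for real phones means
    the crawler sees exactly what those users see.
    """
    ua = (user_agent or "").lower()
    # Illustrative tokens for traditional mobile browsers
    mobile_tokens = ("googlebot-mobile", "midp", "wap", "docomo")
    if any(token in ua for token in mobile_tokens):
        return "mobile_template.html"   # mobile-optimized content
    # Unknown or desktop browsers get the default desktop content
    return "desktop_template.html"

# Example: a Googlebot-Mobile request gets the mobile version
print(select_template("[Phone name] (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"))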

Mobile Sitemaps in Webmaster Tools

Finally, we receive many questions about what URLs to put in Mobile Sitemaps. As explained in our Mobile Sitemaps Help Center articles, you should include only mobile content URLs in Mobile Sitemaps, even if these URLs also return non-mobile content when accessed by a non-mobile User-agent.

More Questions?

A good place to start is our Mobile Sites Help Center articles and the relevant sections in our Search Engine Optimization Starter Guide. We also created a thread in our forums for you to ask questions about this post.


from web contents: Better understanding of your site 2013

SES Chicago was wonderful. Meeting so many of you made the trip absolutely perfect. It was as special as if (Chicago local) Oprah had joined us!

While hanging out at the Google booth, I was often asked about how to take advantage of our webmaster tools. For example, here's one tip on Common Words.

Common Words: Our prioritized listing of your site's content
The common words feature lists in order of priority (from highest to lowest) the prevalent words we've found in your site, and in links to your site. (This information isn't available for subdirectories or subdomains.) Here are the steps to leveraging common words:

1. Determine your website's key concepts. If it offers getaways to a cattle ranch in Wyoming, the key concepts may be "cattle ranch," "horseback riding," and "Wyoming."

2. Verify that Google detected the same phrases you believe are of high importance. Log in to webmaster tools, select your verified site, and choose Page analysis from the Statistics tab. Here, under "Common words in your site's content," we list the phrases detected from your site's content in order of prevalence. Do the common words lack any concepts you believe are important? Are they listing phrases that have little direct relevance to your site?

2a. If you're missing important phrases, you should first review your content. Do you have solid, textual information that explains and relates to the key concepts of your site? If, in the cattle-ranch example, "horseback riding" were absent from common words, you may want to review the "activities" page of the site. Does it include mostly images, or does it only list a schedule of riding lessons, rather than conceptually relevant information?

It may sound obvious, but if you want to rank for a certain set of keywords, but we don't even see those keyword phrases on your website, then ranking for those phrases will be difficult.

2b. When you see general, non-illustrative common words that don't relate helpfully to your site's content (e.g. a top listing of "driving directions" or "contact us"), then it may be beneficial to increase the ratio of relevant content on your site. (Although don't be too worried if you see a few of these common words, as long as you also see words that are relevant to your main topics.) In the cattle ranch example, you would give visitors "driving directions" and "contact us" information. However, if these general, non-illustrative terms surface as the highest-rated common words, or the entire list of common words is only these types of terms, then Google (and likely other search engines) could not find enough "meaty" content.

2c. If you find that many of the common words still don't relate to your site, check out our blog post on unexpected common words.

3. Here are a few of our favorite posts on improving your site's content:
Target visitors or search engines?

Improving your site's indexing and ranking

NEW! SES Chicago - Using Images

4. Should you decide to update your content, please keep in mind that we will need to recrawl your site in order to recognize changes, and that this may take time. Of course, you can notify us of modifications by submitting a Sitemap.

Happy holidays from all of us on the Webmaster Central team!

SES Chicago: Googlers Trevor Foucher, Adam Lasnik and Jonathan Simon

from web contents: Using schema.org markup for videos 2013

Webmaster level: All

Videos are one of the most common types of results on Google and we want to make sure that your videos get indexed. Today, we're also launching video support for schema.org. Schema.org is a joint effort between Google, Microsoft, Yahoo! and Yandex and is now the recommended way to describe videos on the web. The markup is very simple and can be easily added to most websites.

Adding schema.org video markup is just like adding any other schema.org data. Simply define an itemscope, an itemtype=”http://schema.org/VideoObject”, and make sure to set the name, description, and thumbnailUrl properties. You’ll also need either the embedURL — the location of the video player — or the contentURL — the location of the video file. A typical video player with markup might look like this:

<div itemscope itemtype="http://schema.org/VideoObject">
  <h2>Video: <span itemprop="name">Title</span></h2>
  <meta itemprop="duration" content="T1M33S" />
  <meta itemprop="thumbnailUrl" content="thumbnail.jpg" />
  <meta itemprop="embedURL"
    content="http://www.example.com/videoplayer.swf?video=123" />
  <object ...>
    <embed type="application/x-shockwave-flash" ...>
  </object>
  <span itemprop="description">Video description</span>
</div>


Using schema.org markup will not affect any Video Sitemaps or mRSS feeds you're already using. In fact, we still recommend that you also use a Video Sitemap because it alerts us of any new or updated videos faster and provides advanced functionality such as country and platform restrictions.

Since this means that there are now a number of ways to tell Google about your videos, choosing the right format can seem difficult. In order to make the video indexing process as easy as possible, we’ve put together a series of videos and articles about video indexing in our new Webmasters EDU microsite.

For more information, you can go through the Webmasters EDU video articles, read the full schema.org VideoObject specification, or ask questions in the Webmaster Help Forum. We look forward to seeing more of your video content in Google Search.


from web contents: To slash or not to slash 2013

Webmaster Level: Intermediate

That is the question we hear often. Onward to the answers! Historically, it’s common for URLs with a trailing slash to indicate a directory, and those without a trailing slash to denote a file:

http://example.com/foo/ (with trailing slash, conventionally a directory)
http://example.com/foo (without trailing slash, conventionally a file)

But they certainly don’t have to. Google treats each URL above separately (and equally) regardless of whether it’s a file or a directory, or it contains a trailing slash or it doesn’t contain a trailing slash.

Different content on / and no-/ URLs okay for Google, often less ideal for users

From a technical, search engine standpoint, it’s certainly permissible for these two URL versions to contain different content. Your users, however, may find this configuration horribly confusing -- just imagine if www.google.com/webmasters and www.google.com/webmasters/ produced two separate experiences.

For this reason, trailing slash and non-trailing slash URLs often serve the same content. The most common case is when a site is configured with a directory structure:
http://example.com/parent-directory/child-directory/

Your site’s configuration and your options

You can do a quick check on your site to see how these two URLs behave:
  1. http://<your-domain-here>/<some-directory-here>/
    (with trailing slash)
  2. http://<your-domain-here>/<some-directory-here>
    (no trailing slash)
Ideally, they don’t both return a 200 response code; instead, one version redirects to the other.
  • If only one version can be returned (i.e., the other redirects to it), that’s great! This behavior is beneficial because it reduces duplicate content. In the particular case of redirects to trailing slash URLs, our search results will likely show the version of the URL with the 200 response code (most often the trailing slash URL) -- regardless of whether the redirect was a 301 or 302.

  • If both slash and non-trailing-slash versions contain the same content and each returns 200, you can:
    • Consider changing this behavior (more info below) to reduce duplicate content and improve crawl efficiency.
    • Leave it as-is. Many sites have duplicate content. Our indexing process often handles this case for webmasters and users. While it’s not totally optimal behavior, it’s perfectly legitimate and a-okay. :)
    • Rest assured that for your root URL specifically, http://example.com is equivalent to http://example.com/ and can’t be redirected even if you’re Chuck Norris.
Steps for serving only one URL version

What if your site serves duplicate content on these two URLs:

http://<your-domain-here>/<some-directory-here>/
http://<your-domain-here>/<some-directory-here>

meaning that both URLs return 200 (neither has a redirect or contains rel=”canonical”), and you want to change the situation?
  1. Choose one URL as the preferred version. If your site has a directory structure, it’s more conventional to use a trailing slash with your directory URLs (e.g., example.com/directory/ rather than example.com/directory), but you’re free to choose whichever you like.

  2. Be consistent with the preferred version. Use it in your internal links. If you have a Sitemap, include the preferred version (and don’t include the duplicate URL).

  3. Use a 301 redirect from the duplicate to the preferred version (see the sketch after this list). If that’s not possible, rel=”canonical” is a strong option. rel=”canonical” works similarly to a 301 for Google’s indexing purposes, and for other major search engines as well.

  4. Test your 301 configuration through Fetch as Googlebot in Webmaster Tools. Make sure your URLs:
    http://example.com/foo/
    http://example.com/foo
    are behaving as expected. The preferred version should return 200. The duplicate URL should 301 to the preferred URL.

  5. Check for Crawl errors in Webmaster Tools, and, if possible, your webserver logs as a sanity check that the 301s are implemented.

  6. Profit! (just kidding) But you can bask in the sunshine of your efficient server configuration, warmed by the knowledge that your site is better optimized.
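For step 3, a minimal .htaccess sketch for an Apache server with mod_rewrite enabled might look like this; it assumes the trailing-slash URL is your preferred version and is a starting point, not a drop-in configuration:

RewriteEngine On
# If the request maps to a real directory but the URL has no trailing slash,
# 301-redirect it to the slashed version (e.g. /foo -> /foo/)
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]

# If a redirect isn't possible, a rel="canonical" link in the duplicate page works similarly:
# <link rel="canonical" href="http://example.com/<some-directory-here>/">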


from web contents: How to verify Googlebot 2013

Lately I've heard a couple of smart people ask that search engines provide a way to know that a bot is authentic. After all, any spammer could name their bot "Googlebot" and claim to be Google, so which bots do you trust and which do you block?

The common request we hear is to post a list of Googlebot IP addresses in some public place. The problem with that is that if/when the IP ranges of our crawlers change, not everyone will know to check. In fact, the crawl team migrated Googlebot IPs a couple years ago and it was a real hassle alerting webmasters who had hard-coded an IP range. So the crawl folks have provided another way to authenticate Googlebot. Here's an answer from one of the crawl people (quoted with their permission):


Telling webmasters to use DNS to verify on a case-by-case basis seems like the best way to go. I think the recommended technique would be to do a reverse DNS lookup, verify that the name is in the googlebot.com domain, and then do a corresponding forward DNS->IP lookup using that googlebot.com name; eg:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1

I don't think just doing a reverse DNS lookup is sufficient, because a spoofer could set up reverse DNS to point to crawl-a-b-c-d.googlebot.com.


This answer has also been provided to our help-desk, so I'd consider it an official way to authenticate Googlebot. In order to fetch from the "official" Googlebot IP range, the bot has to respect robots.txt and our internal hostload conventions so that Google doesn't crawl you too hard.
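A small Python sketch of that two-step check might look like this; it is only an illustration of the technique described above, using the standard socket module with minimal error handling:

import socket

def is_googlebot(ip_address):
    """Verify a claimed Googlebot IP with reverse, then forward, DNS."""
    try:
        # Reverse lookup: the name should be in the googlebot.com domain
        hostname = socket.gethostbyaddr(ip_address)[0]
    except socket.herror:
        return False
    if not hostname.endswith(".googlebot.com"):
        return False
    try:
        # Forward lookup: the name must resolve back to the same IP
        return socket.gethostbyname(hostname) == ip_address
    except socket.gaierror:
        return False

# is_googlebot("66.249.66.1") -> True if the lookups behave as in the example above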

(Thanks to N. and J. for help on this answer from the crawl side of things.)

from web contents: Help Google index your videos 2013

Webmaster Level: All

The single best way to make Google aware of all your videos on your website is to create and maintain a Video Sitemap. Video Sitemaps provide Google with essential information about your videos, including the URLs for the pages where the videos can be found, the titles of the videos, keywords, thumbnail images, durations, and other information. The Sitemap also allows you to define the period of time for which each video will be available. This is particularly useful for content that has explicit viewing windows, so that we can remove the content from our index when it expires.

Once your Sitemap is created, you can submit the URL of the Sitemap file in Google Webmaster Tools or through your robots.txt file.
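As an illustration, a minimal Video Sitemap with a single entry could look roughly like this; the URLs and values are placeholders, and the Video Sitemap documentation lists all supported tags:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/videos/video1.html</loc>
    <video:video>
      <video:thumbnail_loc>http://www.example.com/thumbs/video1.jpg</video:thumbnail_loc>
      <video:title>Example video title</video:title>
      <video:description>Example video description</video:description>
      <video:content_loc>http://www.example.com/video1.flv</video:content_loc>
      <video:duration>120</video:duration>
      <video:expiration_date>2014-01-01T00:00:00+00:00</video:expiration_date>
    </video:video>
  </url>
</urlset>

If you reference the file from robots.txt rather than Webmaster Tools, that is a single line such as:

Sitemap: http://www.example.com/video-sitemap.xml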

Once we have indexed a video, it may appear in our web search results in what we call a Video Onebox (a cluster of videos related to the queried topic) and in our video search property, Google Videos. A video result is immediately recognizable by its thumbnail, duration, and a description.

As an example, this is what a video result from CNN.com looks like on Google:


We encourage those of you with videos to submit Video Sitemaps and to keep them updated with your new content. Please also visit our recently updated Video Sitemap Help Center, and utilize our Sitemap Help Forum. If you've submitted a Video Sitemap file via Webmaster Tools and want to share your experiences or problems, you can do so here.


from web contents: Debugging blocked URLs 2013

Vanessa's been posting a lot lately, and I'm starting to feel left out. So here's my tidbit of wisdom for you: I've noticed a couple of webmasters confused by "blocked by robots.txt" errors, and I wanted to share the steps I take when debugging robots.txt problems:

A handy checklist for debugging a blocked URL

Let's assume you are looking at crawl errors for your website and notice a URL restricted by robots.txt that you weren't intending to block:
http://www.example.com/amanda.html URL restricted by robots.txt Sep 3, 2006

Check the robots.txt analysis tool
The first thing you should do is go to the robots.txt analysis tool for that site. Make sure you are looking at the correct site for that URL, paying attention that you are looking at the right protocol and subdomain. (Subdomains and protocols may have their own robots.txt file, so https://www.example.com/robots.txt may be different from http://example.com/robots.txt and may be different from http://amanda.example.com/robots.txt.) Paste the blocked URL into the "Test URLs against this robots.txt file" box. If the tool reports that it is blocked, you've found your problem. If the tool reports that it's allowed, we need to investigate further.

At the top of the robots.txt analysis tool, take a look at the HTTP status code. If we are reporting anything other than a 200 (Success) or a 404 (Not found) then we may not be able to reach your robots.txt file, which stops our crawling process. (Note that you can see the last time we downloaded your robots.txt file at the top of this tool. If you make changes to your file, check this date and time to see if your changes were made after our last download.)

Check for changes in your robots.txt file
If these look fine, you may want to check and see if your robots.txt file has changed since the error occurred by checking the date to see when your robots.txt file was last modified. If it was modified after the date given for the error in the crawl errors, it might be that someone has changed the file so that the new version no longer blocks this URL.

Check for redirects of the URL
If you can be certain that this URL isn't blocked, check to see if the URL redirects to another page. When Googlebot fetches a URL, it checks the robots.txt file to make sure it is allowed to access the URL. If the robots.txt file allows access to the URL, but the URL returns a redirect, Googlebot checks the robots.txt file again to see if the destination URL is accessible. If at any point Googlebot is redirected to a blocked URL, it reports that it could not get the content of the original URL because it was blocked by robots.txt.

Sometimes this behavior is easy to spot because a particular URL always redirects to another one. But sometimes this can be tricky to figure out. For instance:
  • Your site may not have a robots.txt file at all (and therefore, allows access to all pages), but a URL on the site may redirect to a different site, which does have a robots.txt file. In this case, you may see URLs blocked by robots.txt for your site (even though you don't have a robots.txt file).
  • Your site may prompt for registration after a certain number of page views. You may have the registration page blocked by a robots.txt file. In this case, the URL itself may not redirect, but if Googlebot triggers the registration prompt when accessing the URL, it will be redirected to the blocked registration page, and the original URL will be listed in the crawl errors page as blocked by robots.txt.

Ask for help
Finally, if you still can't pinpoint the problem, you might want to post on our forum for help. Be sure to include the URL that is blocked in your message. Sometimes it's easier for other people to notice oversights you may have missed.

Good luck debugging! And by the way -- unrelated to robots.txt -- make sure that you don't have "noindex" meta tags at the top of your web pages; those also result in Google not showing a web site in our index.

from web contents: Googlebot activity reports 2013

The webmaster tools team has a very exciting mission: we dig into our logs, find as much useful information as possible, and pass it on to you, the webmasters. Our reward is that you more easily understand what Google sees, and why some pages don't make it to the index.

The latest batch of information that we've put together for you is the amount of traffic between Google and a given site. We show you the number of requests, number of kilobytes (yes, yes, I know that tech-savvy webmasters can usually dig this out, but our new charts make it really easy to see at a glance), and the average document download time. You can see this information in chart form, as well as in hard numbers (the maximum, minimum, and average).

For instance, here's the number of pages Googlebot has crawled in the Webmaster Central blog over the last 90 days. The maximum number of pages Googlebot has crawled in one day is 24 and the minimum is 2. That makes sense, because the blog was launched less than 90 days ago, and the chart shows that the number of pages crawled per day has increased over time. The number of pages crawled is sometimes more than the total number of pages in the site -- especially if the same page can be accessed via several URLs. So http://www.matrixar.com/2006/10/learn-more-about-googlebots-crawl-of.html and http://www.matrixar.com/2006/10/learn-more-about-googlebots-crawl-of.html#links are different, but point to the same page (the second points to an anchor within the page).


And here's the average number of kilobytes downloaded from this blog each day. As you can see, as the site has grown over the last two and a half months, the number of average kilobytes downloaded has increased as well.


The first two reports can help you diagnose the impact that changes in your site may have on its coverage. If you overhaul your site and dramatically reduce the number of pages, you'll likely notice a drop in the number of pages that Googlebot accesses.

The average document download time can help pinpoint subtle networking problems. If the average time spikes, you might have network slowdowns or bottlenecks that you should investigate. Here's the report for this blog that shows that we did have a short spike in early September (the maximum time was 1057 ms), but it quickly went back to a normal level, so things now look OK.

In general, the load time of a page doesn't affect its ranking, but we wanted to give this info because it can help you spot problems. We hope you will find this data as useful as we do!

from web contents: All About Googlebot 2013

I've seen a lot of questions lately about robots.txt files and Googlebot's behavior. Last week at SES, I spoke on a new panel called the Bot Obedience course. And a few days ago, some other Googlers and I fielded questions on the WebmasterWorld forums. Here are some of the questions we got:

If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?
You should configure your server to return a status of 503 (Service Unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.
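One way to set that up on an Apache server is sketched below; it assumes mod_rewrite and mod_headers are enabled, and the maintenance page name is just an example:

ErrorDocument 503 /maintenance.html
RewriteEngine On
# Answer every request except the maintenance page itself with a 503
RewriteCond %{REQUEST_URI} !^/maintenance\.html$
RewriteRule .* - [R=503,L]
# Suggest (in seconds) when crawlers should come back
Header always set Retry-After "3600"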

What should I do if Googlebot is crawling my site too much?
You can contact us -- we'll work with you to make sure we don't overwhelm your server's bandwidth. We're experimenting with a feature in our webmaster tools for you to provide input on your crawl rate, and have gotten great feedback so far, so we hope to offer it to everyone soon.

Is it better to use the meta robots tag or a robots.txt file?
Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).

If my robots.txt file contains a directive for all bots as well as a specific directive for Googlebot, how does Googlebot interpret the line addressed to all bots?
If your robots.txt file contains a generic or weak directive plus a directive specifically for Googlebot, Googlebot obeys the lines specifically directed at it.

For instance, for this robots.txt file:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Googlebot will crawl everything in the site other than pages in the cgi-bin directory.

For this robots.txt file:
User-agent: *
Disallow: /
Googlebot won't crawl any pages of the site.

If you're not sure how Googlebot will interpret your robots.txt file, you can use our robots.txt analysis tool to test it. You can also test how Googlebot will interpret changes to the file.

For complete information on how Googlebot and Google's other user agents treat robots.txt files, see our webmaster help center.

from web contents: Working with multilingual websites 2013

Webmaster Level: Intermediate

A multilingual website is any website that offers content in more than one language. Examples of multilingual websites might include a Canadian business with an English and a French version of its site, or a blog on Latin American soccer available in both Spanish and Portuguese.

Usually, it makes sense to have a multilingual website when your target audience consists of speakers of different languages. If your blog on Latin American soccer aims to reach the Brazilian audience, you may choose to publish it only in Portuguese. But if you’d like to reach soccer fans from Argentina also, then providing content in Spanish could help you with that.

Google and language recognition


Google tries to determine the main languages of each one of your pages. You can help to make language recognition easier if you stick to only one language per page and avoid side-by-side translations. Although Google can recognize a page as being in more than one language, we recommend using the same language for all elements of a page: headers, sidebars, menus, etc.

Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage.

Someone who comes to Google and does a search in their language expects to find localized search results, and this is where you, as a webmaster, come in: if you’re going to localize, make it visible in the search results with some of our tips below.

The anatomy of a multilingual site: URL structure


There's no need to create special URLs when developing a multilingual website. Nonetheless, your users might like to identify what section of your website they’re on just by glancing at the URL. For example, the following URLs let users know that they’re on the English section of this site:

http://example.ca/en/mountain-bikes.html
http://en.example.ca/mountain-bikes.html

While these other URLs let users know that they’re viewing the same page in French:

http://example.ca/fr/mountain-bikes.html
http://fr.example.ca/mountain-bikes.html


Additionally, this URL structure will make it easier for you to analyze the indexing of your multilingual content.

If you want to create URLs with non-English characters, make sure to use UTF-8 encoding. UTF-8 encoded URLs should be properly escaped when linked from within your content. Should you need to escape your URLs manually, you can easily find an online URL encoder that will do this for you. For example, if I wanted to translate the following URL from English to French,

http://example.ca/en/mountain-bikes.html

It might look something like this:

http://example.ca/fr/vélo-de-montagne.html

Since this URL contains one non-English character (é), this is what it would look like properly escaped for use in a link on your pages:

http://example.ca/fr/v%C3%A9lo-de-montagne.html
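If you'd rather script the escaping than use an online encoder, Python's standard library handles it; this is just a small sketch using the example URL above:

from urllib.parse import quote

path = "/fr/vélo-de-montagne.html"
# quote() percent-escapes non-ASCII characters as UTF-8 and leaves "/" intact
print("http://example.ca" + quote(path))
# prints: http://example.ca/fr/v%C3%A9lo-de-montagne.html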

Crawling and indexing your multilingual website


We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.

If you’re going to localize, make it easy for Googlebot to crawl all language versions of your site. Consider cross-linking page by page. In other words, you can provide links between pages with the same content in different languages. This can also be very helpful to your users. Following our previous example, let’s suppose that a French speaker happens to land on http://example.ca/en/mountain-bikes.html; now, with one click he can get to http://example.ca/fr/vélo-de-montagne.html where he can view the same content in French.
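In the markup, such cross-links can be as simple as ordinary anchors on each language version (the link text here is just an illustration):

<!-- on http://example.ca/en/mountain-bikes.html -->
<a href="http://example.ca/fr/v%C3%A9lo-de-montagne.html">Lire cette page en français</a>

<!-- on http://example.ca/fr/vélo-de-montagne.html -->
<a href="http://example.ca/en/mountain-bikes.html">Read this page in English</a>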

To make all of your site's content more crawlable, avoid automatic redirections based on the user's perceived language. These redirections could prevent users (and search engines) from viewing all the versions of your site.

And last but not least, keep the content for each language on separate URLs - don't use cookies to show translated versions.

Working with character encodings


Google directly extracts character encodings from HTTP headers, HTML page headers, and content. There isn’t much you need to do about character encoding, other than watching out for conflicting information - for example, between content and headers. While Google can recognize different character encodings, we recommend that you use UTF-8 on your website whenever possible.
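For example, to declare UTF-8 consistently, the HTTP header and the page markup should agree (a typical setup, not the only valid one):

HTTP header:
Content-Type: text/html; charset=UTF-8

In the HTML <head>:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">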

If your tongue gets twisted...


Now that you know all of this, your tongue may get twisted when you speak many languages, but your website doesn’t have to!

For more information, read our post on multi-regional sites and stay tuned for our next post, where we'll delve into special situations that may arise when working with global websites. Until then, don't hesitate to drop by the Help Forum and join the discussion!


from web contents: Flash indexing with external resource loading 2013

salam everyone, this is a topic from the Google Webmaster Central blog:
Webmaster Level: All

We just added external resource loading to our Flash indexing capabilities. This means that when a SWF file loads content from some other file—whether it's text, HTML, XML, another SWF, etc.—we can index this external content too, and associate it with the parent SWF file and any documents that embed it.
This new capability improves search quality by allowing relevant content contained in external resources to appear in response to users' queries. For example, this result currently comes up in response to the query [2002 VW Transporter 888]:


Prior to this launch, this result did not appear, because all of the relevant content is contained in an XML file loaded by a SWF file.

To date, when Google encounters SWF files on the web, we can:
  • Index textual content displayed as a user interacts with the file. We click buttons and enter input, just like a user would.
  • Discover links within Flash files.
  • Load external resources and associate the content with the parent file.
  • Support common JavaScript techniques for embedding Flash, such as SWFObject and SWFObject2.
  • Index sites scripted with AS1 and AS2, even if the ActionScript is obfuscated. Update on June 19, 2009: We index sites with AS3 as well. The ActionScript version isn't particularly relevant to our indexing process, so we support older versions of ActionScript in addition to the latest.
If you don't want your SWF file or any of its external resources crawled by search engines, please use an appropriate robots.txt directive.
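For example, a robots.txt like the following (the paths are hypothetical) would keep both a SWF file and the folder holding its external data files out of our crawl:

User-agent: Googlebot
Disallow: /flash/intro.swf
Disallow: /flash/data/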


from web contents: Running desktop and mobile versions of your site 2013

salam everyone, this is a topic from the Google Webmaster Central blog: (This post was largely translated from our Japanese version of the Webmaster Central Blog.)

Recently I introduced several methods to ensure your mobile site is properly indexed by Google. Today I'd like to share information useful for webmasters who manage both desktop and mobile phone versions of a site.

One of the most common problems for webmasters who run both mobile and desktop versions of a site is that the mobile version appears for users on a desktop computer, or that the desktop version appears for someone searching from a mobile device. Here are two viable ways of dealing with this scenario:

Redirect mobile users to the correct version
When a mobile user or crawler (like Googlebot-Mobile) accesses the desktop version of a URL, you can redirect them to the corresponding mobile version of the same page. Google notices the relationship between the two versions of the URL and displays the standard version for searches from desktops and the mobile version for mobile searches.

If you redirect users, please make sure that the content on the corresponding mobile/desktop URL matches as closely as possible. For example, if you run a shopping site and there's an access from a mobile phone to a desktop-version URL, make sure that the user is redirected to the mobile version of the page for the same product, and not to the homepage of the mobile version of the site. We occasionally find sites using this kind of redirect in an attempt to boost their search rankings, but this practice only results in a negative user experience, and so should be avoided at all costs.

On the other hand, when a mobile-version URL is accessed from a desktop browser or by our web crawler, Googlebot, it's not necessary to redirect to the desktop version. For instance, Google doesn't automatically redirect desktop users from its mobile site to its desktop site; instead, it includes a link on the mobile-version page to the desktop version. These links are especially helpful when a mobile site doesn't provide the full functionality of the desktop version -- users can easily navigate to the desktop version if they prefer.
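As a rough sketch of the first option (assuming Apache with mod_rewrite in an .htaccess file, and a hypothetical mobile host m.example.com), a desktop product URL could be redirected for mobile browsers and Googlebot-Mobile like this:

RewriteEngine On
# Send mobile browsers (and Googlebot-Mobile) to the matching mobile URL
RewriteCond %{HTTP_USER_AGENT} (Googlebot-Mobile|iPhone|iPod|Android) [NC]
RewriteRule ^products/([0-9]+)$ http://m.example.com/products/$1 [R=302,L]

The exact list of user-agent patterns depends on the devices you target; the important part is that each desktop URL maps to the corresponding mobile URL, not to the mobile homepage.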

Switch content based on User-agent
Some sites have the same URL for both desktop and mobile content, but change their format according to User-agent. In other words, both mobile users and desktop users access the same URL (i.e. no redirects), but the content/format changes slightly according to the User-agent. In this case, the same URL will appear for both mobile search and desktop search, and desktop users can see a desktop version of the content while mobile users can see a mobile version of the content.

However, note that if you fail to configure your site correctly, your site could be considered to be cloaking, which can lead to your site disappearing from our search results. Cloaking refers to an attempt to boost search result rankings by serving different content to Googlebot than to regular users. This causes problems such as less relevant results (pages appear in search results even though their content is actually unrelated to what users see/want), so we take cloaking very seriously.

So what does "the page that the user sees" mean if you serve both versions from the same URL? As I mentioned in the previous post, Google uses "Googlebot" for web search and "Googlebot-Mobile" for mobile search. To remain within our guidelines, serve Googlebot the same content a typical desktop user would see, and serve Googlebot-Mobile the same content you would send to the browser on a typical mobile device. It's fine if the content served to Googlebot differs from the content served to Googlebot-Mobile.

One example of how you could unintentionally be flagged for cloaking is if your site returns a message like "Please access from mobile phones" to desktop browsers, but returns the full mobile version to both crawlers (so Googlebot receives the mobile version). In that case, the page that web search users see (the "Please access from mobile phones" message) is different from the page that Googlebot crawls (the full mobile site). Again, we detect cloaking because we want to serve users the same relevant content that Googlebot or Googlebot-Mobile crawled.
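As a very rough, hypothetical sketch of the second option (the function and template names are made up), the selection logic might look like this; the point is simply that Googlebot-Mobile is handled by the same rule as real mobile browsers, and Googlebot falls through to the desktop version:

# Hypothetical sketch: pick the template to render for a single URL.
MOBILE_SIGNATURES = ("googlebot-mobile", "iphone", "ipod", "android")

def choose_template(user_agent):
    ua = user_agent.lower()
    # Googlebot-Mobile matches the same rule as real mobile browsers;
    # desktop Googlebot falls through to the desktop template, so each
    # crawler sees exactly what the corresponding users see.
    if any(signature in ua for signature in MOBILE_SIGNATURES):
        return "product_mobile.html"
    return "product_desktop.html"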

Diagram of serving content from your mobile-enabled site


We're working on a daily basis to improve search results and solve problems, but because the relationship between PC and mobile versions of a web site can be nuanced, we appreciate the cooperation of webmasters. Your help will result in more mobile content being indexed by Google, improving the search results provided to users. Thank you for your cooperation in improving the mobile search user experience.


from web contents: Deftly dealing with duplicate content 2013

salam everyone, this is a topic from the Google Webmaster Central blog: At the recent Search Engine Strategies conference in freezing Chicago, many of us Googlers were asked questions about duplicate content. We recognize that there are many nuances and a bit of confusion on the topic, so we'd like to help set the record straight.

What is duplicate content?
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Most of the time when we see this, it's unintentional or at least not malicious in origin: forums that generate both regular and stripped-down mobile-targeted pages, store items shown (and -- worse yet -- linked) via multiple distinct URLs, and so on. In some cases, content is duplicated across domains in an attempt to manipulate search engine rankings or garner more traffic via popular or long-tail queries.

What isn't duplicate content?
Though we do offer a handy translation utility, our algorithms won't view the same article written in English and Spanish as duplicate content. Similarly, you shouldn't worry about occasional snippets (quotes and otherwise) being flagged as duplicate content.

Why does Google care about duplicate content?
Our users typically want to see a diverse cross-section of unique content when they do searches. In contrast, they're understandably annoyed when they see substantially the same content within a set of search results. Also, webmasters become sad when we show a complex URL (example.com/contentredir?value=shorty-george&lang=en) instead of the pretty URL they prefer (example.com/en/shorty-george.htm).

What does Google do about it?
During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in "regular" and "printer" versions and neither set is blocked in robots.txt or via a noindex meta tag, we'll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we'll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments ... so in the vast majority of cases, the worst thing that'll befall webmasters is to see the "less desired" version of a page shown in our index.

How can Webmasters proactively address duplicate content issues?
  • Block appropriately: Rather than letting our algorithms determine the "best" version of a document, you may wish to help guide us to your preferred version. For instance, if you don't want us to index the printer versions of your site's articles, disallow those directories or use wildcard patterns in your robots.txt file (see the sketches after this list).
  • Use 301s: If you have restructured your site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly redirect users, the Googlebot, and other spiders.
  • Be consistent: Endeavor to keep your internal linking consistent; don't link to /page/ and /page and /page/index.htm.
  • Use TLDs: To help us serve the most appropriate version of a document, use top level domains whenever possible to handle country-specific content. We're more likely to know that .de indicates Germany-focused content, for instance, than /de or de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we'll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer.
  • Use the preferred domain feature of webmaster tools: If other sites link to yours using both the www and non-www version of your URLs, you can let us know which way you prefer your site to be indexed.
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
  • Avoid publishing stubs: Users don't like seeing "empty" pages, so avoid placeholders where possible. This means not publishing (or at least blocking) pages with zero reviews, no real estate listings, etc., so users (and bots) aren't subjected to a zillion instances of "Below you'll find a superb list of all the great rental opportunities in [insert cityname]..." with no actual listings.
  • Understand your CMS: Make sure you're familiar with how content is displayed on your Web site, particularly if it includes a blog, a forum, or related system that often shows the same content in multiple formats.
  • Don't worry be happy: Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it's highly unlikely that such sites can negatively impact your site's presence in Google. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.
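To illustrate the first two points above (the directory and file names are hypothetical), a robots.txt rule can keep printer-friendly copies out of the index, and an .htaccess directive can permanently redirect a URL that moved during a restructure:

robots.txt:
User-agent: *
Disallow: /print/

.htaccess (Apache):
RedirectPermanent /old-articles/widgets.html http://www.example.com/articles/widgets.html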

In short, a general awareness of duplicate content issues and a few minutes of thoughtful preventative maintenance should help you to help us provide users with unique and relevant content.

from web contents: Requesting removal of content from our index 2013

salam everyone, this is a topic from the Google Webmaster Central blog:

Note: The user-interface of the described features has changed.

As a site owner, you control what content of your site is indexed in search engines. The easiest way to let search engines know what content you don't want indexed is to use a robots.txt file or robots meta tag. But sometimes, you want to remove content that's already been indexed. What's the best way to do that?

As always, the answer begins: it depends on the type of content that you want to remove. Our webmaster help center provides detailed information about each situation. Once you've removed or blocked the content and we recrawl the page, we'll remove it from our index automatically. But if you'd like to expedite the removal rather than wait for the next crawl, the way to do that has just gotten easier.

For sites that you've verified ownership for in your webmaster tools account, you'll now see a new option under the Diagnostic tab called URL Removals. To get started, simply click the URL Removals link, then New Removal Request. Choose the option that matches the type of removal you'd like.



Individual URLs
Choose this option if you'd like to remove a URL or image. In order for the URL to be eligible for removal, one of the following must be true:
  • the URL is blocked with a robots.txt file,
  • the URL is blocked with a robots meta tag, or
  • the URL returns a 404 or 410 status code.
Once the URL is ready for removal, enter the URL and indicate whether it appears in our web search results or image search results. Then click Add. You can add up to 100 URLs in a single request. Once you've added all the URLs you would like removed, click Submit Removal Request.

A directory
Choose this option if you'd like to remove all files and folders within a directory on your site. For instance, if you request removal of the following:

http://www.example.com/myfolder

this will remove all URLs that begin with that path, such as:

http://www.example.com/myfolder
http://www.example.com/myfolder/page1.html
http://www.example.com/myfolder/images/image.jpg

In order for a directory to be eligible for removal, you must block it using a robots.txt file. For instance, for the example above, http://www.example.com/robots.txt could include the following:

User-agent: Googlebot
Disallow: /myfolder

Your entire site
Choose this option only if you want to remove your entire site from the Google index. This option will remove all subdirectories and files. Do not use this option to remove the non-preferred version of your site's URLs from the index. For instance, if you want all of your URLs indexed using the www version, don't use this tool to request removal of the non-www version. Instead, specify the version you want indexed using the Preferred domain tool (and do a 301 redirect to the preferred version, if possible). To use this option, you must block the site using a robots.txt file.
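For reference, blocking an entire site is just a blanket robots.txt rule, and the non-www to www preference mentioned above is commonly handled with a server-side 301 (a hypothetical Apache sketch; adjust the host names to your own):

robots.txt (blocks the whole site):
User-agent: *
Disallow: /

.htaccess (redirect the non-www host to the www host):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]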

Cached copies

Choose this option to remove cached copies of pages in our index. You have two options for making pages eligible for cache removal.

Using a meta noarchive tag and requesting expedited removal
If you don't want the page cached at all, you can add a meta noarchive tag to the page and then request expedited cache removal using this tool. When you request removal using this tool, we remove the cached copy right away, and as long as the meta noarchive tag is in place, we won't show a cached version. (If you change your mind later, you can remove the meta noarchive tag.)
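The tag itself goes in the page's <head> section; for example:

<meta name="robots" content="noarchive">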

Changing the page content
If you want to remove the cached version of a page because it contained content that you've removed and don't want indexed, you can request the cache removal here. We'll check that the content on the live page is different from the cached version and, if so, we'll remove the cached version. We'll automatically make the latest cached version of the page available again after six months; by then, we likely will have recrawled the page, so the cached version will reflect the latest content. If you see that we've recrawled the page sooner than that, you can use this tool to request that we reinclude the cached version sooner.

Checking the status of removal requests
Removal requests show as pending until they have been processed, at which point, the status changes to either Denied or Removed. Generally, a request is denied if it doesn't meet the eligibility criteria for removal.


To reinclude content
If a request is successful, it appears in the Removed Content tab and you can reinclude it any time simply by removing the robots.txt or robots meta tag block and clicking Reinclude. Otherwise, we'll exclude the content for six months. After that six month period, if the content is still blocked or returns a 404 or 410 status message and we've recrawled the page, it won't be reincluded in our index. However, if the page is available to our crawlers after this six month period, we'll once again include it in our index.

Requesting removal of content you don't own
But what if you want to request removal of content that's located on a site that you don't own? It's just gotten easier to do that as well. Our new Webpage removal request tool steps through the process for each type of removal request.

Since Google indexes the web and doesn't control the content on web pages, we generally can't remove results from our index unless the webmaster has blocked or modified the content or removed the page. If you would like content removed, you can work with the site owner to do so, and then use this tool to expedite the removal from our search results.

If you have found search results that contain specific types of personal information, you can request removal even if you've been unable to work with the site owner. For this type of removal, provide your email address so we can work with you directly.



If you have found search results that shouldn't be returned with SafeSearch enabled, you can let us know using this tool as well.

You can check on the status of pending requests, and as with the version available in webmaster tools, the status will change to Removed or Denied once it's been processed. Generally, the request is denied if it doesn't meet the eligibility criteria. For requests that involve personal information, you won't see the status available here, but will instead receive an email with more information about next steps.

What about the existing URL removal tool?
If you've made previous requests with this tool, you can still log in to check on the status of those requests. However, make any new requests with this new and improved version of the tool.