News and Tutorials from Votre Codeur | SEO | Website Creation | Software Development

URL removal explained, Part I: URLs & directories (2013)

From the Google Webmaster Central Blog. Webmaster level: All

There's a lot of content on the Internet these days. At some point, something may turn up online that you would rather not have out there—anything from an inflammatory blog post you regret publishing, to confidential data that accidentally got exposed. In most cases, deleting or restricting access to this content will cause it to naturally drop out of search results after a while. However, if you urgently need to remove unwanted content that has gotten indexed by Google and you can't wait for it to naturally disappear, you can use our URL removal tool to expedite the removal of content from our search results as long as it meets certain criteria (which we'll discuss below).

We've got a series of blog posts lined up for you explaining how to successfully remove various types of content, and common mistakes to avoid. In this first post, I'm going to cover a few basic scenarios: removing a single URL, removing an entire directory or site, and reincluding removed content. I also strongly recommend our previous post on managing what information is available about you online.

Removing a single URL

In general, in order for your removal requests to be successful, the owner of the URL(s) in question—whether that's you, or someone else—must have indicated that it's okay to remove that content. For an individual URL, this can be indicated in any of three ways: disallowing the URL in the site's robots.txt file, adding a noindex robots meta tag (for example, <meta name="robots" content="noindex">) to the page, or having the page return a 404 or 410 HTTP status code.
Before submitting a removal request, you can check whether the URL is correctly blocked:
  • robots.txt: You can check whether the URL is correctly disallowed using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
  • noindex meta tag: You can use Fetch as Googlebot to make sure the meta tag appears somewhere between the <head> and </head> tags. If you want to check a page you can't verify in Webmaster Tools, you can open the URL in a browser, go to View > Page source, and make sure you see the meta tag between the <head> and </head> tags.
  • 404 / 410 status code: You can use Fetch as Googlebot, or tools like Live HTTP Headers or web-sniffer.net, to verify whether the URL is actually returning the correct code. Sometimes "deleted" pages may say "404" or "Not found" on the page but actually return a 200 status code in the HTTP response, so it's good to use a proper header-checking tool to double-check (a minimal check is sketched after this list).
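
For example, here is one minimal way to check the real status code using Python's standard library; the URL below is a placeholder, and tools like Live HTTP Headers or web-sniffer.net work just as well:

   # Check the actual HTTP status code of a "deleted" page. A page can display
   # "404" text to visitors yet still return 200 to crawlers.
   import urllib.error
   import urllib.request

   url = "http://www.example.com/embarrassing-stuff.html"  # placeholder URL
   request = urllib.request.Request(url, method="HEAD")
   try:
       with urllib.request.urlopen(request) as response:
           print(response.getcode())  # 200 means the page is NOT blocked
   except urllib.error.HTTPError as error:
       print(error.code)              # 404 or 410 means removal can proceed
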
If unwanted content has been removed from a page but the page hasn't been blocked in any of the above ways, you will not be able to completely remove that URL from our search results. This is most common when you don't own the site that's hosting that content; we cover what to do in this situation in Part II of our removals series.

If a URL meets one of the above criteria, you can remove it by going to http://www.google.com/webmasters/tools/removals, entering the URL that you want to remove, and selecting the "Webmaster has already blocked the page" option. Note that you should enter the URL where the content was hosted, not the URL of the Google search where it's appearing. For example, enter
   http://www.example.com/embarrassing-stuff.html
not
   http://www.google.com/search?q=embarrassing+stuff

This article has more details about making sure you're entering the proper URL. Remember that if you don't tell us the exact URL that's troubling you, we won't be able to remove the content you had in mind.

Removing an entire directory or site

In order for a directory or site-wide removal to be successful, the directory or site must be disallowed in the site's robots.txt file. For example, in order to remove the http://www.example.com/secret/ directory, your robots.txt file would need to include:
   User-agent: *
   Disallow: /secret/

It isn't enough for the root of the directory to return a 404 status code, because it's possible for a directory to return a 404 but still serve out files underneath it. Using robots.txt to block a directory (or an entire site) ensures that all the URLs under that directory (or site) are blocked as well. You can test whether a directory has been blocked correctly using either the Fetch as Googlebot or Test robots.txt features in Webmaster Tools.
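
If you also want a quick local sanity check before (not instead of) using those Webmaster Tools features, a short script like the sketch below, which uses Python's standard robots.txt parser and a placeholder domain, can confirm that URLs under the directory are disallowed for Googlebot:

   # Local check that a directory is disallowed for Googlebot.
   # Domain and paths are placeholders.
   import urllib.robotparser

   parser = urllib.robotparser.RobotFileParser("http://www.example.com/robots.txt")
   parser.read()

   # Any URL under the blocked directory should be disallowed.
   test_url = "http://www.example.com/secret/some-page.html"
   if parser.can_fetch("Googlebot", test_url):
       print("Still crawlable:", test_url)
   else:
       print("Blocked as expected:", test_url)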

Only verified owners of a site can request removal of an entire site or directory in Webmaster Tools. To request removal of a directory or site, click on the site in question, then go to Site configuration > Crawler access > Remove URL. If you enter the root of your site as the URL you want to remove, you'll be asked to confirm that you want to remove the entire site. If you enter a subdirectory, select the "Remove directory" option from the drop-down menu.

Reincluding content

You can cancel removal requests for any site you own at any time, including those submitted by other people. In order to do so, you must be a verified owner of the site in Webmaster Tools. Once you've verified ownership, you can go to Site configuration > Crawler access > Remove URL > Removed URLs (or > Made by others) and click "Cancel" next to any requests you wish to cancel.

Still have questions? Stay tuned for the rest of our series on removing content from Google's search results. If you can't wait, much has already been written about URL removals, and troubleshooting individual cases, in our Help Forum. If you still have questions after reading others' experiences, feel free to ask. Note that, in most cases, it's hard to give relevant advice about a particular removal without knowing the site or URL in question. We recommend sharing your URL by using a URL shortening service so that the URL you're concerned about doesn't get indexed as part of your post; some shortening services will even let you disable the shortcut later on, once your question has been resolved.

Edit: Read the rest of this series:
Part II: Removing & updating cached content
Part III: Removing content you don't own
Part IV: Tracking requests, what not to remove

Companion post: Managing what information is available about you online


Will the Real <Your Site Here> Please Stand Up? (2013)

From the Google Webmaster Central Blog. Webmaster level: Intermediate



In our recent post on the Google Online Security Blog, we described our system for identifying phishing pages. Of the millions of webpages that our scanners analyze for phishing, we successfully identify 9 out of 10 phishing pages. Our classification system only incorrectly flags a non-phishing site as a phishing site about 1 in 10,000 times, which is significantly better than similar systems. In our experience, these “false positive” sites are usually built to distribute spam or may be involved with other suspicious activity. If you find that your site has been added to our phishing page list (“Reported Web Forgery!”) by mistake, please report the error to us. On the other hand, if your site has been added to our malware list (“This site may harm your computer”), you should follow the instructions here. Our team tries to address all complaints within one day, and we usually respond within a few hours.

Unfortunately, sometimes when we try to follow up on your reports, we find that we are just as confused as our automated system. If you run a website, here are some simple guidelines that will allow us to quickly fix any mistakes and help keep your site off our phishing page list in the first place.

- Don’t ask for usernames and passwords that do not belong to your site. We consider this behavior phishing by definition, so don’t do it! If you want to provide an add-on service to another site, consider using a public API or OAuth instead.

- Avoid displaying logos that are not yours near login fields. Someone surfing the web might mistakenly believe that the logo represents your website, and they might be misled into entering personal information into your site that they intended for the other site. Furthermore, we can’t always be sure that you aren’t doing this intentionally, so we might block your site just to be safe. To prevent misunderstandings, we recommend exercising caution when displaying these logos.

- Minimize the number of domains used by your site, especially for logins. Asking for a username and password for Site X looks very suspicious on Site Y. Besides making it harder for us to evaluate your website, you may be inadvertently teaching your visitors to ignore suspicious URLs, making them more vulnerable to actual phishing attempts. If you must have your login page on a different domain from your main site, consider using a transparent proxy to enable users to access this page from your primary domain (a minimal sketch appears after this list). If all else fails...

- Make it easy to find links to your pages. It is difficult for us (and for your users) to determine who controls an off-domain page in your site if the links to that page from your main site are hard to find. All it takes to clear this problem up is to have each off-domain page link back to an on-domain page which links to it. If you have not done this, and one of your pages ends up on our list by mistake, please mention in your error report how we can find the link from your main site to the wrongly blocked page. However, if you do nothing else...

- Don’t send strange links via email or IM. It’s all but impossible for us to verify unusual links that only appeared in your emails or instant messages. Worse, using these kinds of links conditions your users/customers/friends to click on strange links they receive through email or IM, which can put them at risk for other Internet crimes besides phishing.
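
As one possible illustration of the transparent-proxy suggestion above, here is a deliberately minimal, GET-only sketch using Python's standard library: it serves a path on your primary domain and fetches the corresponding page from a hypothetical upstream login host (login.example.com). A real deployment would typically use your web server's or load balancer's built-in proxying and would also need to forward POST requests, cookies, and other headers.

   # GET-only reverse-proxy sketch: serve an off-domain login page from the
   # primary domain. "login.example.com" is a hypothetical upstream host.
   import http.server
   import urllib.request

   UPSTREAM = "https://login.example.com"

   class LoginProxy(http.server.BaseHTTPRequestHandler):
       def do_GET(self):
           # Fetch the same path from the upstream host and relay it back.
           with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
               body = upstream.read()
               self.send_response(upstream.getcode())
               self.send_header("Content-Type",
                                upstream.headers.get("Content-Type", "text/html"))
               self.send_header("Content-Length", str(len(body)))
               self.end_headers()
               self.wfile.write(body)

   if __name__ == "__main__":
       http.server.HTTPServer(("", 8080), LoginProxy).serve_forever()
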

While we hope you consider these recommendations to be common sense, we’ve seen major e-commerce and financial companies break these guidelines from time to time. Following them will not only improve your experience with our anti-phishing systems, but will also help provide your visitors with a better online experience.


Getting started with structured data (2013)

From the Google Webmaster Central Blog. Webmaster level: All

If Google understands your website’s content in a structured way, we can present that content more accurately and more attractively to Google users. For example, our algorithms can enhance your search results with “rich snippets” when we understand that your page is a structured product listing, event, recipe, review, or similar. We can also feature your data in Knowledge Graph panels or in Google Now cards, helping to spread the word about your content.

Today we’re excited to announce two features that make it simpler than ever before to participate in structured data features. The first is an expansion of Data Highlighter to seven new types of structured data. The second is a brand new tool, the Structured Data Markup Helper.

Support for Products, Businesses, Reviews and more in Data Highlighter

Data Highlighter launched in December 2012 as a point-and-click tool for teaching Google the pattern of structured data about events on your website — without even having to edit your site’s HTML. Now, you can also use Data Highlighter to teach us about many other kinds of structured data on your site: products, local businesses, articles, software applications, movies, restaurants, and TV episodes.

To get started, visit Webmaster Tools, select your site, click the "Optimization" link in the left sidebar, and click "Data Highlighter". You’ll be prompted to enter the URL of a typically structured page on your site (for example, a product or event’s detail page) and “tag” its key fields with your mouse.

[Screenshot: Data Highlighter in Webmaster Tools]

The tagging process takes about 5 minutes for a single page, or about 15 minutes for a pattern of consistently formatted pages. At the end of the process, you’ll have the chance to verify Google’s understanding of your structured data and, if it’s correct, “publish” it to Google. Then, as your site is recrawled over time, your site will become eligible for enhanced displays of information like prices, reviews, and ratings right in the Google search results.

New Structured Data Markup Helper tool

While Data Highlighter is a great way to quickly teach Google about your site’s structured data without having to edit your HTML, it’s ultimately preferable to embed structured data markup directly into your web pages, so your structured content is available to everyone. To assist web authors with that task, we’re happy to announce a new tool: the Structured Data Markup Helper.

Like in Data Highlighter, you start by submitting a web page (URL or HTML source) and using your mouse to “tag” the key properties of the relevant data type. When you’re done, the Structured Data Markup Helper generates sample HTML code with microdata markup included. This code can be downloaded and used as a guide as you implement structured data on your website.

[Screenshot: Structured Data Markup Helper]

The Structured Data Markup Helper supports a subset of data types, including all the types supported by Data Highlighter as well as several types used for embedding structured data in Gmail. Consult schema.org for complete schema documentation.
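
For reference, the sketch below shows the general shape of schema.org microdata; the HTML fragment and its values are invented for illustration (not the tool's actual output), and the small Python script around it simply prints each tagged property to show how such markup is machine-readable.

   # Invented example of schema.org microdata, plus a tiny extractor that
   # prints each itemprop value. Not the Markup Helper's actual output.
   from html.parser import HTMLParser

   SAMPLE_HTML = """
   <div itemscope itemtype="http://schema.org/Product">
     <span itemprop="name">Example Widget</span>
     <span itemprop="description">A made-up product used for illustration.</span>
   </div>
   """

   class MicrodataPrinter(HTMLParser):
       def __init__(self):
           super().__init__()
           self.current_prop = None

       def handle_starttag(self, tag, attrs):
           attrs = dict(attrs)
           if "itemprop" in attrs:
               self.current_prop = attrs["itemprop"]

       def handle_data(self, data):
           if self.current_prop and data.strip():
               print(self.current_prop, "=", data.strip())
               self.current_prop = None

   MicrodataPrinter().feed(SAMPLE_HTML)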

We hope these two tools make it easier for all websites to participate in Google’s growing suite of structured data features! As always, please post in our forums if you have any questions or feedback.


Using RSS/Atom feeds to discover new URLs (2013)

From the Google Webmaster Central Blog. Webmaster level: Intermediate

Google uses numerous sources to find new webpages, from links we find on the web to submitted URLs. We aim to discover new pages quickly so that users can find new content in Google search results soon after they go live. We recently launched a feature that uses RSS and Atom feeds for the discovery of new webpages.

RSS/Atom feeds have been very popular in recent years as a mechanism for content publication. They allow readers to check for new content from publishers. Using feeds for discovery allows us to get these new pages into our index more quickly than traditional crawling methods. We may use many potential sources to access updates from feeds, including Google Reader, notification services, or direct crawls of feeds. Going forward, we might also explore mechanisms such as PubSubHubbub to identify updated items.
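
As a rough sketch of what feed-based URL discovery involves (an illustration of the mechanism, not Google's implementation), the script below fetches a feed with Python's standard library and lists the entry/item URLs it advertises; the feed URL is a placeholder:

   # List the page URLs advertised by an RSS or Atom feed.
   # The feed URL below is a placeholder.
   import urllib.request
   import xml.etree.ElementTree as ET

   FEED_URL = "http://www.example.com/feed.xml"

   with urllib.request.urlopen(FEED_URL) as response:
       tree = ET.parse(response)

   atom = {"atom": "http://www.w3.org/2005/Atom"}
   urls = [link.get("href")                      # Atom: <entry><link href="..."/>
           for link in tree.findall(".//atom:entry/atom:link", atom)
           if link.get("href")]
   urls += [link.text                            # RSS: <item><link>...</link>
            for link in tree.findall(".//item/link")
            if link.text]

   print("\n".join(urls))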

In order for us to use your RSS/Atom feeds for discovery, it's important that crawling these files is not disallowed by your robots.txt. To find out if Googlebot can crawl your feeds and find your pages as fast as possible, test your feed URLs with the robots.txt tester in Google Webmaster Tools.
