Les nouveautés et Tutoriels de Votre Codeur | SEO | Création de site web | Création de logiciel

seo Google BigQuery Service: Big data analytics at Google speed 2013

Seo Master present to you:
By Ju-kay Kwek, Product Manager

(Cross-posted on the Google App Engine Blog and the Google Enterprise Blog.)

Rapidly crunching terabytes of big data can lead to better business decisions, but this has traditionally required tremendous IT investments. Imagine a large online retailer that wants to provide better product recommendations by analyzing website usage and purchase patterns from millions of website visits. Or consider a car manufacturer that wants to maximize its advertising impact by learning how its last global campaign performed across billions of multimedia impressions. Fortune 500 companies struggle to unlock the potential of data, so it’s no surprise that it’s been even harder for smaller businesses.

We developed Google BigQuery Service for large-scale internal data analytics. At Google I/O last year, we opened a preview of the service to a limited number of enterprises and developers. Today we're releasing some big improvements, and putting one of Google's most powerful data analysis systems into the hands of more companies of all sizes.
  • We’ve added a graphical user interface for analysts and developers to rapidly explore massive data through a web application.
  • We’ve made big improvements for customers accessing the service programmatically through the API. The new REST API lets you run multiple jobs in the background and manage tables and permissions with more granularity.
  • Whether you use the BigQuery web application or API, you can now write even more powerful queries with JOIN statements. This lets you run queries across multiple data tables, linked by data that tables have in common.
  • It’s also now easy to manage, secure, and share access to your data tables in BigQuery, and export query results to the desktop or to Google Cloud Storage.

Michael J. Franklin, Professor of Computer Science at UC Berkeley, remarked that BigQuery (internally known as Dremel) leverages “thousands of machines to process data at a scale that is simply jaw-dropping given the current state of the art.” We’re looking forward to helping businesses innovate faster by harnessing their own large data sets. BigQuery is available free of charge for now, and we’ll let customers know at least 30 days before the free period ends. We’re bringing on a new batch of pilot customers, so let us know if your business wants to test drive BigQuery Service.


Ju-kay Kwek is a Product Manager for Google BigQuery Service.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Streak brings CRM to the inbox with Google Cloud Platform 2013

Seo Master present to you: Author PhotoBy Aleem Mawani, Co-Founder of Streak

Cross-posted with the Google App Engine Blog

This guest post was written by Aleem Mawani, Co-Founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator. Streak is a CRM tool built into Gmail. In this post, Aleem shares his experience building and scaling their product using Google Cloud Platform.

Everyone relies on email to get work done – yet most people use separate applications from their email to help them with various business processes. Streak fixes this problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal flow, project management and almost any other business process right inside Gmail. In this post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scalably and with the ability to deeply analyze our data.



We use several Google technologies on the backend of Streak:

  • BigQuery to analyze our logs and power dashboards.

Our core learning is that you should use the best tool for the job. No one technology will be able to solve all your data storage and access needs. Instead, for each type of functionality, you should use a different service. In our case, we aggressively mirror our data in all the services mentioned above. For example, although the source of truth for our user data is in the App Engine Datastore, we mirror that data in the App Engine Search API so that we can provide full text search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can power internal dashboards.

System Architecture




App Engine - We use App Engine for Java primarily to serve our application to the browser and mobile clients in addition to serving our API. App Engine is the source of truth for all our data, so we aggressively cache using Memcache. We also use Objectify to simplify access to the Datastore, which I highly recommend.

Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud Storage, which acts as a conduit to other Google cloud services. It lets us archive the data as well as push it to BigQuery and the Prediction API.

BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can help generate useful business metrics and slice user data to better understand how our product is getting used. Not only can we run complex queries over our Datastore data but also over all of our log data. This is incredibly powerful for analyzing the request patterns to App Engine. We can answer questions like:

  • Which requests cost us the most money?
  • What is the average response time for every URL on our site over the last 3 days?

BigQuery helps us monitor error rates in our application. We process all of our log data with debug statements, as well as something called an “error type” for any request that fails. If it’s a known error, we'll log something sensible, and we log the exception type if we haven’t seen it before. This is beneficial because we built a dashboard that queries BigQuery for the most recent errors in the last hour grouped by error type. Whenever we do a release, we can monitor error rates in the application really easily.



A Streak dashboard powered by BigQuery showing current usage statistics
In order to move the data into Cloud Storage from the Datastore and LogService, we developed an open source library called Mache. It’s a drop-in library that can be configured to automatically push data into BigQuery via Cloud Storage. The data can come from the Datastore or from LogService and is very configurable - feel free to contribute and give us feedback on it!

Google Cloud Platform also makes our application better for our users. We take advantage of the App Engine Search API and again mirror our data there. Users can then query their Streak data using the familiar Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push our data to the Prediction API, we can help users throughout our app by making smart suggestions. In Streak, we train models based on which emails users have categorized into different projects. Then, when users get a new email, we can suggest the most likely box that the email belongs to.

One issue that arises is how to keep all these mirrored data sets in sync. It works differently for each service based on the architecture of the service. Here’s a simple breakdown:




Having these technologies easily available to us has been a huge help for Streak. It makes our products better and helps us understand our users. Streak’s user base grew 30% every week for 4 consecutive months after launch, and we couldn’t have scaled this easily without Google Cloud Platform. To read more details on why Cloud Platform makes sense for our business, check out our case study and our post on the Google Enterprise blog.


Aleem Mawani is the co-founder of Streak.com, a CRM tool built into Gmail. Previously, Aleem worked on Google Drive and various ads products at Google. He has a degree from the University of Waterloo in Software engineering and an MBA from Harvard University.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Got big JSON? BigQuery expands data import for large scale web apps 2013

Seo Master present to you: Author PhotoBy Ryan Boyd, Developer Advocate

JSON is the data format of the web. JSON is used to power most modern websites, is a native format for many NoSQL databases hosting top web applications, and provides the primary data format in many REST APIs. Google BigQuery, our cloud service for ad-hoc analytics on big data, has now added support for JSON and the nested/repeated structure inherent in the data format.

image
JSON opens the door to a more object-oriented view of your data compared to CSV, the original data format supported by BigQuery. It removes the need for duplication of data required when you flatten records into CSV. Here are some examples of data you might find a JSON format useful for:
  • Log files, with multiple headers and other name-value pairs.
  • User session activities, with information about each activity occurring nested beneath the session record.
  • Sensor data, with variable attributes collected in each measurement.
Nested/repeated data support is one of our most requested features. And while BigQuery's underlying infrastructure supports it, we'd only enabled it in a limited fashion through M-Lab's test data. Today, however, developers can use JSON to get any nested/repeated data into and out of BigQuery.

For more information on importing JSON and nested/repeated data into BigQuery, check out the new guide in our documentation. You should also see the Dealing with Data section for details on the new querying syntax available for this type of data.

Improvements to Data Loading Pipeline

We’ve made it much easier to ingest data into BigQuery – up to 1TB of data per load job, with each file up to 100GB uncompressed JSON or CSV. We’ve also eliminated the 2 imports per minute rate limit, enabling you to submit all your ingestion jobs and let us handle the queuing as necessary. In a recent project I’ve been working on, import jobs for 3TB of data that previously took me 12 hours to run now take me only 36 minutes – a 20x improvement!

We’ve published a new Ingestion Cookbook that explains how to take advantage of these new limits.

We’re initiating a small trusted tester program aimed at making it easier to move your data from the App Engine Datastore to BigQuery for analysis. If you store a lot of data in Datastore and are also using BigQuery, we’d like to hear from you. Please sign up now to be considered for the trusted tester program.

Learn more this week

Michael Manoochehri, Siddartha Naidu and I are in London this week talking about BigQuery and these new features at the Strata big data conference. Ju-kay Kwek will also be talking about BigQuery at the Interop NYC conference tomorrow. Please stop by, say hi, and let us know what you’re doing with big data.

We’ll also be producing a Google Developers Live session from Campus London on Friday at 16:00 BST (15:00 GMT).


Ryan Boyd is a Developer Advocate, focused on big data. He's been at Google for 6 years and previously helped build out the Google Apps ISV ecosystem. He published his first book, "Getting Started with OAuth 2.0", with O'Reilly.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo How redBus uses BigQuery to master big data 2013

Seo Master present to you: Author Photo
By Pradeep Kumar, redBus

This guest post was written by Pradeep Kumar. Pradeep is a technical architect at redBus, an online travel agency in India that provides a unified online bus ticketing service. We recently published a business case study for redBus and wanted to dive into some more technical detail for the readers of the Google Developers Blog.


Our company has been providing Internet bus ticketing for India since 2006. There are more than 10,000 bus routes available for booking, and we have dozens of machines processing booking requests. Each step in the booking process produces a lot of data – on search terms, route availability, server health and more. We needed tools to to be able to process this data quickly and easily to determine whether decreases in customer bookings are the result of server problems or simply less demand.

While we typically use relational databases to store and analyze data, we knew we needed something more powerful if we wanted to analyze 500GB or more, so we started to look at open source frameworks like Hadoop and analysis platforms like Hive and Pig. We found that these frameworks require considerable in-house expertise and infrastructure investments and wouldn’t give us answers to our questions as fast as we wanted. We decided to try out Google BigQuery as a trusted tester, with hopes that it would give us the ability to perform quick iterative analysis without much up-front investment. Our initial tests went very well, so we started building our analysis tools on top of BigQuery.

BigQuery allows us to run SQL-like queries to understand the bus routes in highest demand and what types of searches users are performing. We’ve also used it to build internal dashboards that give us a snapshot of system health.


For more information on how we structured our immutable tables, pipelined our data into BigQuery for analysis using RabitMQ, and to see example SQL queries we’ve used, check out my article on developers.google.com.


Pradeep Kumar is a technical architect at redBus.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Now in BigQuery: batch queries and a connector for Excel 2013

Seo Master present to you: Author Photo
By Ryan Boyd, Developer Advocate for Cloud Data Services

Businesses and developers are using BigQuery to solve a wide variety of use cases – from optimizing advertising campaigns, to spotting inventory shortfalls, to understanding customer behavior. Accommodating these varied use cases requires BigQuery to be flexible, both for the developers integrating applications with the API and for the analysts running ad-hoc queries. Today we’ve made it more flexible by adding batch queries and a connector for Microsoft Excel.

Batch priority queries

BigQuery was designed for ad-hoc, iterative analytics on millions-to-billions of rows of data. When you’re diving into your data to gain insights, you want your queries to run in seconds rather than waiting minutes or hours. Sometimes our customers don’t need these fast responses when they’re running nightly jobs to update dashboards or reports, but want to use the same BigQuery technology and underlying datasets for these queries. We’ve now added batch pricing to accommodate these developers, allowing them to run their queries at a significantly lower cost.

Here’s how to set the priority to ‘batch’ when submitting a new query via the Google APIs Client Library for Java:
    Job job = new Job();
   JobConfiguration config = new JobConfiguration();
   JobConfigurationQuery queryConfig = new JobConfigurationQuery();
   config.setQuery(queryConfig);


   job.setConfiguration(config);
   queryConfig.setQuery(querySql);
   
queryConfig.setPriority("BATCH");

   com.google.api.services.bigquery.Bigquery.Jobs.Insert insert =
     bigquery.jobs().insert(projectId, job);

Batch queries will execute between 30 minutes and 3 hours after they are submitted. See more information in our Developers Guide.

BigQuery Connector for Excel

Spreadsheets are a popular tool for analysts, executives and and developers to explore data. Last year we launched the ability for users of Google Spreadsheets to execute BigQuery queries using the Google Apps Script integration. Today, we’re launching the BigQuery Connector for Excel, which allows Microsoft Excel users to do the same with the ‘External Data’ functionality built into the product. Once the BigQuery results are in Excel, you can easily make pivot tables, create charts and integrate it with data from other sources. If you’re interested, you can try it right now!

Let us know what you think of these new features and what else you’d like to see in the roadmap by reaching out on Google+. We’ll also be holding office hours this Friday at 10 AM PDT on Google Developers Live to talk about these new features and answer any questions you have about BigQuery.

Microsoft and Excel are registered trademarks of Microsoft Corporation


Ryan Boyd is a Developer Advocate, focused on cloud data services. He's been at Google for 6 years and previously helped build out the Google Apps ISV ecosystem. He published his first book "Getting Started with OAuth 2.0" with O'Reilly.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Helping developers build more applications on Google BigQuery 2013

Seo Master present to you:
Amanda
Ju-kay
By Ju-kay Kwek and Amanda Bradford, Google BigQuery Team

In May we launched Google BigQuery, a fully managed cloud-based service that enables businesses to analyze enormous amounts of data in the cloud. While we're continually amazed by the range of business problems being solved, we recognize that writing one-off scripts to ingest data, or creating custom front-end integration, requires effort and takes away time from the fun stuff: getting results.

So today we're pleased to highlight a few new members of our Cloud Platform Partner Program that are here to help you be even more productive – by delivering tools integrated with BigQuery that make it much easier to automatically load data from a broad set of sources, as well as to analyze and visualize the data with spectacular dashboards.

BigQuery partner logos

Import data from multiple sources into BigQuery

We’ve partnered with Informatica, Pervasive Software, Talend and SQLstream to make it easier to bring data from a variety of sources into BigQuery. This means you can use their BigQuery connectors to move data very easily from on-premise or cloud IT systems to BigQuery. For example, TribusPoint, a consulting firm, leveraged the Informatica Cloud BigQuery connector to rapidly move large files from their on-premise data centers to BigQuery.

Build rich interactive dashboards on BigQuery

We’ve partnered with data visualization providers QlikTech, Jaspersoft, Bime Analytics and Metric Insights to help you build rich, interactive dashboards for a broad range of customers. You can use their tools to build dashboards and reports very easily. For example, Pixelfish leveraged Metric Insights BigQuery integration to create dashboards that delivered a 300% improvement in customer engagement.

Click the partner links above to see more specific customer examples of each. We have just scratched the surface on Google BigQuery. We can’t wait to see what other cool applications you can build on Google BigQuery using our APIs. Hack away!


Ju-kay Kwek is the Product Management Lead for Google's Cloud Big Data initiative. In this role, he focuses on creating services that enable businesses and developers to harness Google's unparalleled data processing infrastructure and algorithms to tackle Big Data needs.

Amanda Bradford works in Business Development at Google, driving strategic alliances and partnerships for Google and specializing in the Google Cloud Platform.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Google Compute Engine: Computing without limits 2013

Seo Master present to you: Author Photo
By Craig McLuckie, Product Manager, Google Compute Engine

Over the years, Google has built some of the most high performing, scalable and efficient data centers in the world by constantly refining our hardware and software. Since 2008, we've been working to open up our infrastructure to outside developers and businesses so they can take advantage of our cloud as they build applications and websites and store and analyze data. So far this includes products like Google App Engine, Google Cloud Storage, and Google BigQuery.

Today, in response to many requests from developers and businesses, we're going a step further. We're introducing Google Compute Engine, an Infrastructure-as-a-Service product that lets you run Linux Virtual Machines (VMs) on the same infrastructure that powers Google. This goes beyond just giving you greater flexibility and control; access to computing resources at this scale can fundamentally change the way you think about tackling a problem.

Google Compute Engine offers:
  • Scale. At Google we tackle huge computing tasks all the time, like indexing the web, or handling billions of search queries a day. Using Google's data centers, Google Compute Engine reduces the time to scale up for tasks that require large amounts of computing power. You can launch enormous compute clusters - tens of thousands of cores or more.
  • Performance. Many of you have learned to live with erratic performance in the cloud. We have built our systems to offer strong and consistent performance even at massive scale. For example, we have sophisticated network connections that ensure consistency. Even in a shared cloud you don’t see interruptions; you can tune your app and rely on it not degrading.
  • Value. Computing in the cloud is getting even more appealing from a cost perspective. The economy of scale and efficiency of our data centers allows Google Compute Engine to give you 50% more compute for your money than with other leading cloud providers. You can see pricing details here.
The capabilities of Google Compute Engine include:
  • Compute. Launch Linux VMs on-demand. 1, 2, 4 and 8 virtual core VMs are available with 3.75GB RAM per virtual core.
  • Storage. Store data on local disk, on our new persistent block device, or on our Internet-scale object store, Google Cloud Storage.
  • Network. Connect your VMs together using our high-performance network technology to form powerful compute clusters and manage connectivity to the Internet with configurable firewalls.
  • Tooling. Configure and control your VMs via a scriptable command line tool or web UI. Or you can create your own dynamic management system using our API.
At launch, we have worked with a number of partners - such as RightScale, Puppet Labs, OpsCode, Numerate, Cliqr and MapR - to integrate their products with Google Compute Engine. These partners offer management services that make it easy for you to move your applications to the cloud and between different cloud environments.

You can learn more about Google Compute Engine here. We’re going to pace ourselves and start with Google Compute Engine in limited preview (sign up here), but our goal is to give you all the pieces you need to build anything you want in the cloud. Whether you need a platform like Google App Engine, or virtual machines like Google Compute Engine, these days, you define your limits. We’re just at the start of what the cloud can do.


Craig McLuckie is the Product Management Lead for Google Compute Engine. He spends his days working with an amazing engineering team to open Google’s infrastructure to the world.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Use Google BigQuery for your visual interactive dashboards 2013

Seo Master present to you:
Author Photo
Ju-kay
Momchil

By Ju-kay Kwek and Momchil Filev,
BigQuery Team


(Cross-posted with the Google Enterprise Blog)

Last month we announced the public launch of Google BigQuery, which enables developers and businesses to gain real-time business insights from massive amounts of data without any hardware or software investments.

Since then, we’ve added new features to Google BigQuery every week. For example, our most recent release includes support for running up to 20 concurrent queries, depending on the volume of data. This enables developers to build visually interactive dashboards on Google BigQuery.

Today, we’re highlighting two data visualization providers, QlikView and Bime, who are using Google BigQuery’s latest features to build dashboards with snappier and richer experiences.

QlikView

QlikView, one of the leaders in the Business Intelligence market, has developed a dashboard that visualizes the birth-record data for all babies born to mothers of different ages and races. With the help of BigQuery, QlikView can crunch millions of rows of data in seconds to answer questions like, “What's the average age of a mother in New York vs. in Texas?"

Bime

Bime, a cloud-based Business Intelligence provider based in France, is another early adopter of Google BigQuery. They’ve built a slick UI on top of the Google BigQuery platform that allows users to slice and dice 432 million rows of business data. For example, you can adjust a few simple parameters to see the sales distribution across products or regions on a map.

This is just a snapshot of how developers can use Google BigQuery to build interactive visual dashboards using a browser and without the hassle of managing SQL. Sign up and share your BigQuery use cases via our developer feedback form or on the Google Enterprise Google+ page.


Ju-kay Kwek is the Product Management Lead for Google's Cloud Big Data initiative. In this role, he focuses on creating services that enable businesses and developers to harness Google's unparalleled data processing infrastructure and algorithms to tackle Big Data needs.

Momchil Filev works in Business Development at Google, driving strategic alliances and partnerships for Google and specializing in the Google Cloud Platform.


Posted by Ashleigh Rentz, Editor Emerita
2013, By: Seo Master

seo Google BigQuery new features: bigger, faster, smarter 2013

Seo Master present to you: Author PictureBy Felipe Hoffa, Cloud Platform team

Google BigQuery is designed to make it easy to analyze large amounts of data quickly. Today we announced several updates that give BigQuery the ability to handle arbitrarily large result sets, use window functions for advanced analytics, and cache query results. You are also getting new UI features, larger interactive quotas, and a new convenient tiered pricing scheme. In this post we'll dig further into the technical details of these new features.

Large results

BigQuery is able to process terabytes of data, but until today BigQuery could only output up to 128 MB of compressed data per query. Many of you asked for more and from now on BigQuery will be able to output results as large as the largest tables our customers have ever had.

To get this benefit, you should enable the new "--allow_large_results" flag when issuing a query job, and specify a destination table. All results will be saved to the new specified table (or appended, if the table exists). In the updated web UI these options can be found under the new "Enable Options" menu.

With this feature, you can run big transformations on your tables, plus get big subsets of data to further analyze from the new table.

Analytic functions

BigQuery's power is in the ability to interactively run aggregate queries over terabytes of data, but sometimes counts and averages are not enough. That's why BigQuery also lets you calculate quantiles, variance and standard deviation, as well as other advanced functions.

To make BigQuery even more powerful, today we are adding support for window functions (also known as "analytical functions") for ranking, percentiles, and relative row navigation. These new functions give you different ways to rank results, explore distributions and percentiles, and traverse results without the need for a self join.

To introduce these functions with an advanced example, let's use the dataset we collected from the Data Sensing Lab at Google I/O. With the percentile_cont() function it's easy to get the median temperature over each room:


SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'

In this example, each original data row shows the median temperature for each room. To visualize it better, it's a good idea to group all results by room with an outer query:


SELECT MAX(median) AS median, room FROM (
SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'
)
GROUP BY room

We can add an additional outer query, to rank the rooms according to which one had the coldest median temperature. We'll use one of the new ranking window functions, dense_rank():


SELECT DENSE_RANK() OVER (ORDER BY median) rank, median, room FROM (
SELECT MAX(median) AS median, room FROM (
SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'
)
GROUP BY room
)

We've updated the documentation with descriptions and examples for each of the new window functions. Note that they require the OVER() clause, with an optional PARTITION BY and sometimes required ORDER BY arguments. ORDER BY tells the window function what criteria to use to rank items, while PARTITION BY allows you to define multiple groups to be analyzed independently of each other.

The window functions don't work with the big GROUP EACH BY and JOIN EACH BY operators, but they do work with the traditional GROUP BY and JOIN BY. As a reminder, we announced GROUP EACH BY and JOIN EACH BY last March, to allow large join and group operations.

Query caching

BigQuery now remembers values that you've previously computed, saving you time and the cost of recalculating the query. To maintain privacy, queries are cached on a per-user basis. Cached results are only returned for tables that haven't changed since the last query, or for queries that are not dependent on non-deterministic parameters (such as the current time). Reading cached results is free, but each query still counts against the max number of queries per day quota. Query results are kept cached for 24 hours, on a best effort basis. You can disable query caching with the new flag --use_cache in bq, or "useQueryCache" in the API. This feature is also accessible with the new query options on the BigQuery Web UI.

BigQuery Web UI: Query validator, cost estimator, and abandonment

The BigQuery UI gets even better: You'll get instant information while writing a query if its syntax is valid. If the syntax is not valid, you'll know where the error is. If the syntax is valid, the UI will inform you how much the query would cost to run. This feature is also available with the bq tool and API, using the --dry_run flag.

An additional improvement: When running queries on the UI, previously you had to wait until its completion before starting another one. Now you have the option to abandon it, to start working on the next iteration of the query without waiting for the abandoned one.

Pricing updates

Starting in July, BigQuery pricing becomes more affordable for everyone: Data storage costs are going from $0.12/GB/month to $0.08/GB/month. And if you are a high-volume user, you'll soon be able to opt-in for tiered query pricing, for even better value.

Bigger quota

To support larger workloads we're doubling interactive query quotas for all users, from 200GB + 1 concurrent query, to 400 GB of concurrent queries + 2 additional queries of unlimited size.

These updates make BigQuery a faster, smarter, and even more affordable solution for ad hoc analysis of extremely large datasets. We expect they'll help to scale your projects, and we hope you'll share your use cases with us on Google+.


The BigQuery UI features a collection of public datasets for you to use when trying out these new features. To get started, visit our sign-up page and Quick Start guide. You should take a look at our API docs, and ask questions about BigQuery development on Stack Overflow. Finally, don't forget to give us feedback and join the discussion on our Cloud Platform Developers Google+ page.



Felipe Hoffa has recently joined the Cloud Platform team. He'd love to see the world's data accessible for everyone in BigQuery.

Posted by Ashleigh Rentz, Editor Emerita
2013, By: Seo Master

seo BigQuery and Prediction API: Get more from your data with Google 2013

Seo Master present to you:
To deliver our services, Google has had to develop sophisticated internal tools to process data more efficiently. We know that some of these tools could be useful to any developer, so we’ve been working to create external versions that we can share with the world.

We’re excited to introduce two new developer tools to get more from your data: BigQuery and Prediction API. These two tools can be used with your data stored on Google Storage for Developers.

BigQuery enables fast, interactive analysis over datasets containing trillions of records. Using SQL commands via a RESTful API, you can quickly explore and understand your massive historical data. BigQuery can help you analyze your network logs, identify seasonal sales trends, or find a needle in a haystack of big data.

Prediction API exposes Google’s advanced machine learning algorithms as a RESTful web service to make your apps more intelligent. The service helps you use historical data to make real-time decisions such as recommending products, assessing user sentiment from blogs and tweets, routing messages or assessing suspicious activities.

We are introducing BigQuery and Prediction API as a preview to a limited number of developers. There is no charge for using these services during the preview. To learn more and sign up for an invitation, please visit the BigQuery and Prediction API sites.

If you are in San Francisco for Google I/O, we look forward to meeting you. Please come to our session tomorrow to learn more.

Posted by Amit Agarwal and Jordan Breckenridge, BigQuery and Prediction API Teams
2013, By: Seo Master

seo Using Google BigQuery to learn from GitHub data 2013

Seo Master present to you: Author Photo
By Ilya Grigorik, Web Performance Engineer

Open-source developers all over the world contribute to millions of projects every day: writing and reviewing code, filing and discussing bug reports, updating documentation and project wikis, and so forth. The data generated from this activity can reveal interesting trends across many industries, including popularity of programming languages over time, defect rates, contribution metrics, and popularity of specific frameworks and libraries.

The challenge in extracting these trends is gathering the data. Each project has its own distributed workflow, code repositories, and conventions. Having hosted dozens of my own projects on GitHub, I've long wanted to analyze the developer activity from the 2.6M+ public projects hosted on GitHub. Hence, earlier this year GitHub Archive was born!

GitHub Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis. Each day it archives over 120,000 public activities, ranging from new commits and fork events to opening and closing tickets, each with detailed metadata.

Once I collected the data, I needed a tool to analyze it, and that is when I found Google BigQuery. Based on the research behind Dremel, a popular internal tool at Google for analyzing web-scale datasets, BigQuery allowed me to easily import the entire dataset and use a familiar SQL like syntax to comb through the gigabytes of data in seconds. Plus the tool will scale to terabyte datasets, so there is plenty of room to grow!

The best news is that thanks to collaboration from the GitHub and BigQuery teams, the GitHub dataset is now public and available for you to slice and dice in any way you like. No need to worry about data gathering or database schemas: BigQuery will do all the heavy lifting, and you can just compose your queries to be executed in realtime.

Here's a real-world example. What are the most popular programming languages on GitHub over the past month?


chart showing number of commits by language

If you are curious for more, sign up for BigQuery and follow the instructions on githubarchive.org to access the GitHub dataset. You can use the free 100GB query quota to run your analysis and perhaps even win some of the prizes from the GitHub Data Challenge!


Ilya Grigorik is a Web Performance Engineer and Advocate at Google, an open-source evangelist, and an analytics geek. You can find him on GitHub under igrigorik, and blogging about web performance at igvita.com.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Google BigQuery brings Big Data analytics to all businesses 2013

Seo Master present to you: Author Photo
By Ju-kay Kwek, Product Manager, BigQuery

BigQuery enables businesses and developers to gain real-time business insights from massive amounts of data without any upfront hardware or software investments. Imagine a big pharmaceutical company optimizing daily marketing spend using worldwide sales and advertisement data. Or think of a small online retailer that makes product recommendations based on user clicks. Today, we are making BigQuery publicly available, an important milestone in our effort to bring Big Data analytics to all businesses via the cloud.

Since announcing BigQuery in limited preview last November, many businesses and developers have started using it for real-time Big Data analytics in the cloud. Claritics, a social and mobile analytics company, built a web application for game developers to gain real-time insights into user behavior. Crystalloids, an Amsterdam-based analytics firm, built a cloud-based application to help a resort network analyze customer reservations, optimize marketing and maximize revenue. This just scratches the surface of use cases for BigQuery.

BigQuery is accessible via a simple UI or REST interface. It lets you take advantage of Google’s massive compute power, store as much data as needed and pay only for what you use. Your data is protected with multiple layers of security, replicated across multiple data centers and can be easily exported.

Developers and businesses can sign up for BigQuery online and query up to 100 GB of data per month for free. See our introductory pricing plan for storing and querying datasets of up to 2 TB. If you need more than that, contact a sales representative.

We hope you will be able to gain real-time business insights using BigQuery. Share your BigQuery use cases and feedback in our user forums or on our +Google Enterprise page.


Ju-kay Kwek is the Product Management Lead for Google's Cloud Big Data initiative. In this role, he focuses on creating services that enable businesses and developers to harness Google's unparalleled data processing infrastructure and algorithms to tackle Big Data needs.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo BigQuery gets big new features to make data analysis even easier 2013

Seo Master present to you: Author Photo
By Michael Manoochehri, Developer Programs Engineer, Cloud Platform

Google BigQuery is designed to make it easy to analyze large amounts of data quickly. Overwhelmingly, developers have asked us for features to help simplify their work even further. Today we are launching a collection of updates that gives BigQuery a greater range of query and data types, more flexibility with table structure, and better tools for collaborative analysis.

Big JOIN and Big Group Aggregations

Extracting insights from multiple datasets can be challenging and time-consuming. This is especially true when datasets become too large to query with a traditional database system. With traditional databases, SQL functions like joining and grouping are typically used to bring together data for analysis. What happens when your data is too large to fit into a conventional database? Working with multi-terabyte datasets often requires developing complicated MapReduce workflows, investing in expensive infrastructure, and great deal of time. Very often, it's a combination of all three.

In response to developer feedback, we're launching new features that enable analysts and developers to run fast SQL-like join and aggregate queries on datasets without the need for batch-based processing. Our new Big JOIN feature gives users the ability to produce a result set by merging data from two large tables by a common key. Big JOIN simplifies data analysis that would otherwise require a data transformation step, by allowing users to specify JOIN operations using SQL.

Popular web applications produce user activity logs that can grow by billions of rows each week. Dividing users into smaller groups is a key step for analysis. However, each group of users can number in the millions. To handle this for such large volumes, we've enabled Big Group Aggregations, which significantly increases the number of distinct values that can be grouped in a result set.

To use these new features, simply add the EACH modifier to JOIN or GROUP BY clauses.


/* JOIN EACH example
* Selects the top 10 most edited Wikipedia pages
* of words that appear in works of Shakespeare.
*/

SELECT
 TOP(wiki.title, 10), COUNT(*)
FROM
 [publicdata:samples.wikipedia] AS wiki
JOIN EACH
 [publicdata:samples.shakespeare] AS shakespeare
ON
 shakespeare.word = wiki.title;


For more information, including best practices, when using JOIN EACH and GROUP EACH BY, visit the BigQuery query reference.

Native support for TIMESTAMP data type

We are also adding a new TIMESTAMP data type, in response to one of our most frequent requests from developers. This new data type lets you import date and time values in formats familiar to users of databases such as MySQL, while preserving timezone offset information.

Along with the new data type come new functions for converting TIMESTAMP fields into other formats, calculating intervals, and extracting components such as the hour, day of week, and quarter.


/* TIMESTAMP example
* Which hours in the day are the most popular for GitHub actions?
* This query converts github_timeline "created_at" date time   
* strings to BigQuery TIMESTAMP, and extracts the hour from each.
*/

SELECT
 HOUR(TIMESTAMP(created_at)) AS event_create_hour,
 COUNT(*) AS event_count
FROM
 [publicdata:samples.github_timeline]
GROUP BY
 event_create_hour
ORDER BY
 event_count DESC;


Read more about the available TIMESTAMP functions in our query reference guide.

Add columns to existing BigQuery tables

When working with large amounts of fast moving data, it's not uncommon to find out that you need to add additional fields to your tables. In response to developer feedback, we have added the ability to add new columns to existing BigQuery tables.

To take advantage of this feature, simply provide a new schema with additional columns using either the "Tables: update" or "Tables: patch" BigQuery API methods.

For more information on this feature, visit the BigQuery API reference.

BigQuery Web UI: Dataset links and dataset sharing notifications

BigQuery has always provided project owners with very fine-grained control of how their datasets are shared. To make it easier for teams to work on collaborative data analysis, we've added direct links to individual datasets in the BigQuery Web UI. This provides a convenient way for authorized users to quickly access a dataset, and allows for bookmarking and sharing.

In addition, we've also added email notifications to inform users when they've been given dataset access privileges. When a dataset has been shared with another user via the sharing control panel, BigQuery sends a notification email containing a direct link to the dataset.


The BigQuery UI features a collection of public datasets for you to use when trying out these new features. To get started, visit our sign up page and Quick Start guide. You should take a look at our API docs, and ask questions about BigQuery development on Stack Overflow. Finally, don't forget to give us feedback and join the discussion on our Cloud Platform Developers Google+ page.


Michael Manoochehri is a Developer Programs Engineer supporting the Google Cloud Platform. His goal is to help make cloud computing and data analysis universally accessible and useful.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo Find sample code and more for Google Cloud Platform, now on GitHub 2013

Seo Master present to you: Author PhotoBy Julia Ferraioli, Developer Advocate, Google Compute Engine

Cross-posted from the Google Open Source Blog

Today, we’re announcing that you can now find Google Cloud Platform on GitHub! The GitHub organization for the Google Cloud Platform is your destination for samples and tools relating to App Engine, BigQuery, Compute Engine, Cloud SQL, and Cloud Storage. Most Google Cloud Platform existing open source tools will be migrated to the organization over time. You can quickly get your app running by forking any of our repositories and diving into the code.

Currently, the GitHub organization for the Google Cloud Platform has 36 public repositories, some of which are currently undergoing their initial code reviews, which you can follow on the repo. The Google Cloud Platform Developer Relations Team will be using GitHub to maintain our starter projects, which show how to get started with our APIs using different stacks. We will continue to add repositories that illustrate solutions, such as the classic guest book app on Google App Engine. For good measure, you will also see some tools that will make your life easier, such as an OAuth 2.0 helper.

From getting started with Python on Google Cloud Storage to monitoring your Google Compute Engine instances with App Engine, our GitHub organization is home to it all.

Trick of the trade: to find samples relating to a specific platform, try filtering on the name in the “Find a Repository” text field.

We set up this organization not only to give you an easy way to find and follow our samples, but also to give you a way to get involved and start hacking alongside us. We’ll be monitoring our repositories for any reported issues as well as for pull requests. If you’re interested in seeing what a code review looks like for Google’s open source code, you can follow along with the discussion happening right on the commits.

Let us know about your suggestions for samples. We look forward to seeing what you create!


Julia Ferraioli is a Developer Advocate for Google Compute Engine, based in Seattle. She helps developers harness the power of Google's infrastructure to tackle their computationally intensive processes and jobs. She comes from an industrial background in software engineering, and an academic background in machine learning and assistive technology.

Posted by Scott Knaster, Editor
2013, By: Seo Master

seo BigQuery, meet Google Spreadsheets 2013

Seo Master present to you:

Since announcing BigQuery at Google IO last May, we’ve been very excited by the response and feedback we’ve received from the developer community, enterprises and academia. The one consistent request we heard from everyone is the ability to interactively analyze large volumes of data without having to worry about provisioning, maintaining and scaling infrastructure.

Today, we would like to announce the integration of BigQuery with Google Apps Script and Google Spreadsheets, a feature we first demoed at Google IO. With this integration users now have the power to query multi-billion row tables, visualize the results and share them with others. Below you can see a simple script that queries a sample dataset and plots the results. A simple tutorial is available here with more to come soon.




We’ve seen a big uptake of the APIs (released in October) which let you create, populate and delete tables in BigQuery. Users have been loading more and more data in BigQuery. For instance the current M-Lab dataset in BigQuery stands at 240B rows!

The details of BigQuery and new features are available on the BigQuery website. We are gradually adding more developers during this free preview period. Please sign up for an invitation, and let us know about the creative and valuable ways you’re using BigQuery.

2013, By: Seo Master
Powered by Blogger.