Business management software development, website creation and search engine optimization, networks and maintenance, design
A Streak dashboard powered by BigQuery showing current usage statistics
Batch queries will execute between 30 minutes and 3 hours after they are submitted. See more information in our Developers Guide.
// Build a query job and mark it as a batch query.
Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationQuery queryConfig = new JobConfigurationQuery();
config.setQuery(queryConfig);
job.setConfiguration(config);
queryConfig.setQuery(querySql);
// "BATCH" queues the query instead of running it interactively.
queryConfig.setPriority("BATCH");
// Prepare the request that submits the job to the given project.
com.google.api.services.bigquery.Bigquery.Jobs.Insert insert =
    bigquery.jobs().insert(projectId, job);
Amanda
Ju-kay
Momchil
Last month we announced the public launch of Google BigQuery, which enables developers and businesses to gain real-time business insights from massive amounts of data without any hardware or software investments.
Since then, we’ve added new features to Google BigQuery every week. For example, our most recent release includes support for running up to 20 concurrent queries, depending on the volume of data. This enables developers to build visually interactive dashboards on Google BigQuery.
Today, we’re highlighting two data visualization providers, QlikView and Bime, who are using Google BigQuery’s latest features to build dashboards with snappier and richer experiences.
QlikView, one of the leaders in the Business Intelligence market, has developed a dashboard that visualizes the birth-record data for all babies born to mothers of different ages and races. With the help of BigQuery, QlikView can crunch millions of rows of data in seconds to answer questions like, “What's the average age of a mother in New York vs. in Texas?”
Bime, a cloud-based Business Intelligence provider based in France, is another early adopter of Google BigQuery. They’ve built a slick UI on top of the Google BigQuery platform that allows users to slice and dice 432 million rows of business data. For example, you can adjust a few simple parameters to see the sales distribution across products or regions on a map.
This is just a snapshot of how developers can use Google BigQuery to build interactive visual dashboards using a browser and without the hassle of managing SQL. Sign up and share your BigQuery use cases via our developer feedback form or on the Google Enterprise Google+ page.
Google BigQuery is designed to make it easy to analyze large amounts of data quickly. Today we announced several updates that give BigQuery the ability to handle arbitrarily large result sets, use window functions for advanced analytics, and cache query results. You are also getting new UI features, larger interactive quotas, and a new convenient tiered pricing scheme. In this post we'll dig further into the technical details of these new features.
BigQuery is able to process terabytes of data, but until today BigQuery could only output up to 128 MB of compressed data per query. Many of you asked for more and from now on BigQuery will be able to output results as large as the largest tables our customers have ever had.
To get this benefit, enable the new "--allow_large_results" flag when issuing a query job, and specify a destination table. All results will be saved to the specified table (or appended, if the table already exists). In the updated web UI these options can be found under the new "Enable Options" menu.
With this feature, you can run big transformations on your tables, plus get big subsets of data to further analyze from the new table.
BigQuery's power is in the ability to interactively run aggregate queries over terabytes of data, but sometimes counts and averages are not enough. That's why BigQuery also lets you calculate quantiles, variance and standard deviation, as well as other advanced functions.
To make BigQuery even more powerful, today we are adding support for window functions (also known as "analytical functions") for ranking, percentiles, and relative row navigation. These new functions give you different ways to rank results, explore distributions and percentiles, and traverse results without the need for a self join.
To introduce these functions with an advanced example, let's use the dataset we collected from the Data Sensing Lab at Google I/O. With the percentile_cont() function it's easy to get the median temperature for each room:
SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'
In this example, every original data row is returned together with the median temperature of its room. To visualize it better, it's a good idea to group all results by room with an outer query:
SELECT MAX(median) AS median, room FROM (
SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'
)
GROUP BY room
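To make the two steps above concrete, here is a minimal Java sketch of what the query computes: readings are partitioned by room, and each partition is reduced to an interpolated median. It assumes the conventional linear-interpolation definition of a continuous percentile; the class name, method names, and sample readings are hypothetical and are not part of BigQuery.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics PERCENTILE_CONT(0.5) OVER (PARTITION BY room ...)
// followed by GROUP BY room, using plain in-memory lists.
final class MedianByRoomSketch {

    /** Interpolated percentile of the values, for a fraction p in [0, 1]. */
    static double percentileCont(List<Double> values, double p) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("no values");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);                       // ORDER BY data
        double position = p * (sorted.size() - 1);      // zero-based fractional index
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        // Interpolate linearly between the two neighbouring values.
        return sorted.get(lower) + fraction * (sorted.get(upper) - sorted.get(lower));
    }

    /** Partitions readings by room and returns one median per room. */
    static Map<String, Double> medianByRoom(List<String> rooms, List<Double> readings) {
        Map<String, List<Double>> partitions = new TreeMap<>();
        for (int i = 0; i < rooms.size(); i++) {
            partitions.computeIfAbsent(rooms.get(i), k -> new ArrayList<>())
                      .add(readings.get(i));
        }
        Map<String, Double> medians = new TreeMap<>();
        for (Map.Entry<String, List<Double>> partition : partitions.entrySet()) {
            medians.put(partition.getKey(), percentileCont(partition.getValue(), 0.5));
        }
        return medians;
    }

    public static void main(String[] args) {
        // Hypothetical sample readings, not taken from the I/O dataset.
        List<String> rooms = List.of("A", "A", "A", "A", "B", "B", "B");
        List<Double> temps = List.of(20.0, 22.0, 21.0, 23.0, 18.0, 19.0, 25.0);
        System.out.println(medianByRoom(rooms, temps));
    }
}
```

With an even number of readings (room A here) the median falls halfway between the two middle values, which is why the function is called "continuous".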
We can add an additional outer query to rank the rooms according to which one had the coldest median temperature. We'll use one of the new ranking window functions, dense_rank():
SELECT DENSE_RANK() OVER (ORDER BY median) rank, median, room FROM (
SELECT MAX(median) AS median, room FROM (
SELECT percentile_cont(0.5) OVER (PARTITION BY room ORDER BY data) AS median, room
FROM [io_sensor_data.moscone_io13]
WHERE sensortype='temperature'
)
GROUP BY room
)
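For readers new to ranking functions, the following Java sketch shows the behaviour usually associated with a dense rank: rows are sorted by the ordering value, ties share a rank, and no rank numbers are skipped after a tie. The class name, method name, and room names are hypothetical; this illustrates the idea and is not how BigQuery evaluates the query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics DENSE_RANK() OVER (ORDER BY median) on an
// in-memory map of room -> median temperature.
final class DenseRankSketch {

    /** Returns room -> dense rank, coldest median first. */
    static Map<String, Integer> denseRank(Map<String, Double> medianByRoom) {
        List<Map.Entry<String, Double>> rows = new ArrayList<>(medianByRoom.entrySet());
        rows.sort(Map.Entry.comparingByValue());        // ORDER BY median (ascending)

        Map<String, Integer> ranks = new LinkedHashMap<>();
        int rank = 0;
        Double previous = null;
        for (Map.Entry<String, Double> row : rows) {
            // Advance the rank only when the value changes, so ties share a rank
            // and the next distinct value gets the very next number.
            if (previous == null || !row.getValue().equals(previous)) {
                rank++;
            }
            ranks.put(row.getKey(), rank);
            previous = row.getValue();
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Hypothetical medians; two rooms tie at 21.5.
        Map<String, Double> medians = new LinkedHashMap<>();
        medians.put("Hall", 21.5);
        medians.put("Lab", 19.0);
        medians.put("Stage", 21.5);
        medians.put("Lobby", 23.0);
        System.out.println(denseRank(medians));
    }
}
```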
We've updated the documentation with descriptions and examples for each of the new window functions. Note that they require the OVER() clause, with an optional PARTITION BY argument and a sometimes-required ORDER BY argument. ORDER BY tells the window function what criteria to use to rank items, while PARTITION BY allows you to define multiple groups to be analyzed independently of each other.
The window functions don't work with the big GROUP EACH BY and JOIN EACH operators, but they do work with the traditional GROUP BY and JOIN. As a reminder, we announced GROUP EACH BY and JOIN EACH last March, to allow large join and group operations.
BigQuery now remembers values that you've previously computed, saving you the time and cost of recalculating the query. To maintain privacy, queries are cached on a per-user basis. Cached results are only returned when the underlying tables haven't changed since the last query and the query does not depend on non-deterministic parameters (such as the current time). Reading cached results is free, but each query still counts against the maximum number of queries per day quota. Query results are kept cached for 24 hours, on a best-effort basis. You can disable query caching with the new flag --use_cache in bq, or "useQueryCache" in the API. This feature is also accessible with the new query options on the BigQuery Web UI.
The BigQuery UI gets even better: as you write a query, you'll get instant feedback on whether its syntax is valid. If the syntax is not valid, you'll know where the error is. If the syntax is valid, the UI will inform you how much the query would cost to run. This feature is also available with the bq tool and API, using the --dry_run flag.
An additional improvement: previously, when you ran a query in the UI, you had to wait for it to complete before starting another one. Now you have the option to abandon a running query and start working on the next iteration without waiting for the abandoned one.
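To recap the query caching rules described earlier in this post, here is a small Java model of when a cached result would be returned: same user, same query text, caching enabled, a deterministic query, no table change since the result was stored, and an age of at most 24 hours. This is only a sketch of the stated rules under those assumptions; the class, its methods, and the fixed 24-hour cutoff are hypothetical and say nothing about how BigQuery implements its cache.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Illustrative model of the caching rules; not BigQuery's implementation.
final class QueryCacheSketch {

    /** The post says results are kept for 24 hours, on a best-effort basis. */
    static final Duration TIME_TO_LIVE = Duration.ofHours(24);

    private static final class CachedResult {
        final String rows;
        final Instant storedAt;

        CachedResult(String rows, Instant storedAt) {
            this.rows = rows;
            this.storedAt = storedAt;
        }
    }

    // Outer key: user (the cache is per user). Inner key: query text.
    private final Map<String, Map<String, CachedResult>> cache = new HashMap<>();

    void store(String user, String sql, String rows, Instant now) {
        cache.computeIfAbsent(user, u -> new HashMap<>())
             .put(sql, new CachedResult(rows, now));
    }

    /** Returns the cached rows, or null when the query has to be run again. */
    String lookup(String user, String sql, boolean useQueryCache,
                  boolean deterministic, Instant tableLastModified, Instant now) {
        if (!useQueryCache || !deterministic) {
            return null;                                 // caching disabled or unsafe
        }
        CachedResult hit = cache.getOrDefault(user, Map.of()).get(sql);
        if (hit == null) {
            return null;                                 // nothing cached for this user
        }
        boolean expired = Duration.between(hit.storedAt, now).compareTo(TIME_TO_LIVE) > 0;
        boolean tableChanged = tableLastModified.isAfter(hit.storedAt);
        return (expired || tableChanged) ? null : hit.rows;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        Instant stored = Instant.parse("2013-06-01T00:00:00Z");
        Instant tableModified = stored.minusSeconds(3600);
        cache.store("alice", "SELECT 1", "rows", stored);
        // Same user, unchanged table, one minute later: served from the cache.
        System.out.println(cache.lookup("alice", "SELECT 1", true, true,
                tableModified, stored.plusSeconds(60)));
        // A different user never sees alice's cached result.
        System.out.println(cache.lookup("bob", "SELECT 1", true, true,
                tableModified, stored.plusSeconds(60)));
    }
}
```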
Starting in July, BigQuery pricing becomes more affordable for everyone: Data storage costs are going from $0.12/GB/month to $0.08/GB/month. And if you are a high-volume user, you'll soon be able to opt-in for tiered query pricing, for even better value.
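As a quick worked example of the storage price change, the sketch below computes a monthly bill at the old and new rates. The rates come from the paragraph above; the 1,000 GB volume, the class name, and the method name are hypothetical, and query costs are not included.

```java
// Illustrative arithmetic only: storage cost = stored GB x price per GB per month.
final class StoragePricingSketch {

    // Prices from the announcement, expressed in cents to avoid rounding issues.
    static final long OLD_CENTS_PER_GB_MONTH = 12;   // $0.12/GB/month
    static final long NEW_CENTS_PER_GB_MONTH = 8;    // $0.08/GB/month

    /** Monthly storage cost in cents for the given volume and rate. */
    static long monthlyStorageCents(long gigabytes, long centsPerGbMonth) {
        return Math.multiplyExact(gigabytes, centsPerGbMonth);
    }

    public static void main(String[] args) {
        long gigabytes = 1_000;                      // hypothetical stored volume
        long before = monthlyStorageCents(gigabytes, OLD_CENTS_PER_GB_MONTH);
        long after = monthlyStorageCents(gigabytes, NEW_CENTS_PER_GB_MONTH);
        System.out.printf("before: $%d.%02d, after: $%d.%02d, saved: $%d.%02d%n",
                before / 100, before % 100,
                after / 100, after % 100,
                (before - after) / 100, (before - after) % 100);
    }
}
```

For 1,000 GB this works out to $120.00 per month at the old rate and $80.00 at the new one, a reduction of one third.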
To support larger workloads we're doubling interactive query quotas for all users, from 200 GB + 1 concurrent query, to 400 GB of concurrent queries + 2 additional queries of unlimited size.
These updates make BigQuery a faster, smarter, and even more affordable solution for ad hoc analysis of extremely large datasets. We expect they'll help to scale your projects, and we hope you'll share your use cases with us on Google+.
The BigQuery UI features a collection of public datasets for you to use when trying out these new features. To get started, visit our sign-up page and Quick Start guide. You should take a look at our API docs, and ask questions about BigQuery development on Stack Overflow. Finally, don't forget to give us feedback and join the discussion on our Cloud Platform Developers Google+ page.
/* JOIN EACH example
 * Selects the top 10 most edited Wikipedia pages
 * of words that appear in works of Shakespeare. */
SELECT TOP(wiki.title, 10), COUNT(*)
FROM [publicdata:samples.wikipedia] AS wiki
JOIN EACH [publicdata:samples.shakespeare] AS shakespeare
ON shakespeare.word = wiki.title;
/* TIMESTAMP example
 * Which hours in the day are the most popular for GitHub actions?
 * This query converts github_timeline "created_at" date time
 * strings to BigQuery TIMESTAMP, and extracts the hour from each. */
SELECT HOUR(TIMESTAMP(created_at)) AS event_create_hour, COUNT(*) AS event_count
FROM [publicdata:samples.github_timeline]
GROUP BY event_create_hour
ORDER BY event_count DESC;
Since announcing BigQuery at Google IO last May, we’ve been very excited by the response and feedback we’ve received from the developer community, enterprises and academia. The one consistent request we heard from everyone is the ability to interactively analyze large volumes of data without having to worry about provisioning, maintaining and scaling infrastructure.
Today, we would like to announce the integration of BigQuery with Google Apps Script and Google Spreadsheets, a feature we first demoed at Google IO. With this integration users now have the power to query multi-billion row tables, visualize the results and share them with others. Below you can see a simple script that queries a sample dataset and plots the results. A simple tutorial is available here with more to come soon.
We’ve seen a big uptake of the APIs (released in October) which let you create, populate and delete tables in BigQuery. Users have been loading more and more data in BigQuery. For instance the current M-Lab dataset in BigQuery stands at 240B rows!
The details of BigQuery and new features are available on the BigQuery website. We are gradually adding more developers during this free preview period. Please sign up for an invitation, and let us know about the creative and valuable ways you’re using BigQuery.
By Amit Agarwal, BigQuery Team