
Google BigQuery Service: Big data analytics at Google speed

By Ju-kay Kwek, Product Manager

(Cross-posted on the Google App Engine Blog and the Google Enterprise Blog.)

Rapidly crunching terabytes of big data can lead to better business decisions, but this has traditionally required tremendous IT investments. Imagine a large online retailer that wants to provide better product recommendations by analyzing website usage and purchase patterns from millions of website visits. Or consider a car manufacturer that wants to maximize its advertising impact by learning how its last global campaign performed across billions of multimedia impressions. Fortune 500 companies struggle to unlock the potential of data, so it’s no surprise that it’s been even harder for smaller businesses.

We developed Google BigQuery Service for large-scale internal data analytics. At Google I/O last year, we opened a preview of the service to a limited number of enterprises and developers. Today we're releasing some big improvements, and putting one of Google's most powerful data analysis systems into the hands of more companies of all sizes.
  • We’ve added a graphical user interface for analysts and developers to rapidly explore massive data through a web application.
  • We’ve made big improvements for customers accessing the service programmatically through the API. The new REST API lets you run multiple jobs in the background and manage tables and permissions with more granularity.
  • Whether you use the BigQuery web application or the API, you can now write even more powerful queries with JOIN statements. This lets you run queries across multiple data tables, linked by fields the tables have in common (see the sketch after this list).
  • It’s also now easy to manage, secure, and share access to your data tables in BigQuery, and export query results to the desktop or to Google Cloud Storage.
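
As a rough illustration, a JOIN across two hypothetical tables (mydataset.orders and mydataset.products, linked by a shared product_id field) might look like the sketch below, shown here with the present-day google-cloud-bigquery Python client rather than the 2013-era API:

```python
# Hedged sketch: table names, columns and the client library are illustrative,
# not taken from the original post.
from google.cloud import bigquery

client = bigquery.Client()  # project and credentials come from the environment

sql = """
SELECT p.category, COUNT(*) AS purchases
FROM mydataset.orders AS o
JOIN mydataset.products AS p
  ON o.product_id = p.product_id
GROUP BY p.category
ORDER BY purchases DESC
"""

for row in client.query(sql):           # runs the query job and iterates result rows
    print(row["category"], row["purchases"])
```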

Michael J. Franklin, Professor of Computer Science at UC Berkeley, remarked that BigQuery (internally known as Dremel) leverages “thousands of machines to process data at a scale that is simply jaw-dropping given the current state of the art.” We’re looking forward to helping businesses innovate faster by harnessing their own large data sets. BigQuery is available free of charge for now, and we’ll let customers know at least 30 days before the free period ends. We’re bringing on a new batch of pilot customers, so let us know if your business wants to test drive BigQuery Service.


Ju-kay Kwek is a Product Manager for Google BigQuery Service.

Posted by Scott Knaster, Editor

Streak brings CRM to the inbox with Google Cloud Platform

By Aleem Mawani, Co-Founder of Streak

Cross-posted with the Google App Engine Blog

This guest post was written by Aleem Mawani, Co-Founder of Streak, a startup alum of Y Combinator, a Silicon Valley incubator. Streak is a CRM tool built into Gmail. In this post, Aleem shares his experience building and scaling their product using Google Cloud Platform.

Everyone relies on email to get work done – yet most people use separate applications from their email to help them with various business processes. Streak fixes this problem by letting you do sales, hiring, fundraising, bug tracking, product development, deal flow, project management and almost any other business process right inside Gmail. In this post, I want to illustrate how we have used Google Cloud Platform to build Streak quickly, scalably and with the ability to deeply analyze our data.



We use several Google technologies on the backend of Streak:

  • App Engine (for Java) to serve our web application, mobile clients and API, with the Datastore as the source of truth for our data.
  • Google Cloud Storage to mirror and archive that data and move it into other services.
  • BigQuery to analyze our logs and power dashboards.
  • The App Engine Search API to give users Gmail-style full text search.
  • The Prediction API to power smart suggestions in the product.

Our core learning is that you should use the best tool for the job. No one technology will be able to solve all your data storage and access needs. Instead, for each type of functionality, you should use a different service. In our case, we aggressively mirror our data in all the services mentioned above. For example, although the source of truth for our user data is in the App Engine Datastore, we mirror that data in the App Engine Search API so that we can provide full text search, Gmail style, to our users. We also mirror that same data in BigQuery so that we can power internal dashboards.
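
To make that pattern concrete, here is a minimal sketch of the mirroring idea. Streak's actual backend is Java with Objectify; the App Engine Python equivalents (ndb and the Search API) and the Contact model below are stand-ins for illustration only:

```python
# Hedged sketch of "write once, mirror everywhere" (Streak's real code is Java/Objectify).
from google.appengine.ext import ndb
from google.appengine.api import search

class Contact(ndb.Model):            # hypothetical entity; the Datastore copy is the source of truth
    name = ndb.StringProperty()
    email = ndb.StringProperty()

def save_contact(contact):
    key = contact.put()              # 1) write the canonical record to the Datastore

    # 2) mirror it into the Search API so users get full text search over the same data
    doc = search.Document(
        doc_id=key.urlsafe(),
        fields=[search.TextField(name='name', value=contact.name),
                search.TextField(name='email', value=contact.email)])
    search.Index(name='contacts').put(doc)

    # 3) a third copy would be appended to logs / Cloud Storage and later loaded
    #    into BigQuery for the internal dashboards (omitted here)
    return key
```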

System Architecture




App Engine - We use App Engine for Java primarily to serve our application to the browser and mobile clients in addition to serving our API. App Engine is the source of truth for all our data, so we aggressively cache using Memcache. We also use Objectify to simplify access to the Datastore, which I highly recommend.
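
A read-through cache along those lines might look like the following sketch (the Contact model and cache key format are hypothetical; on the Java side Objectify's @Cache annotation handles much of this transparently):

```python
# Hedged sketch of read-through caching with App Engine Memcache.
from google.appengine.api import memcache
from google.appengine.ext import ndb

def get_contact(contact_id):
    cache_key = 'contact:%s' % contact_id
    contact = memcache.get(cache_key)                    # try the cache first
    if contact is None:
        contact = ndb.Key('Contact', contact_id).get()   # fall back to the Datastore
        if contact is not None:
            memcache.set(cache_key, contact, time=300)   # cache for five minutes
    return contact
```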

Google Cloud Storage - We mirror all of our Datastore data as well as all our log data in Cloud Storage, which acts as a conduit to other Google cloud services. It lets us archive the data as well as push it to BigQuery and the Prediction API.

BigQuery - Pushing the data into BigQuery allows us to run non-realtime queries that can help generate useful business metrics and slice user data to better understand how our product is getting used. Not only can we run complex queries over our Datastore data but also over all of our log data. This is incredibly powerful for analyzing the request patterns to App Engine. We can answer questions like:

  • Which requests cost us the most money?
  • What is the average response time for every URL on our site over the last 3 days?
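
For instance, the second question might be answered with a query along these lines, assuming a hypothetical logs.requests table with path, latency_ms and timestamp columns (shown with the present-day Python client and standard SQL):

```python
# Hedged sketch; the table, columns and client library are illustrative.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT path,
       AVG(latency_ms) AS avg_latency_ms,
       COUNT(*)        AS requests
FROM logs.requests
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)
GROUP BY path
ORDER BY avg_latency_ms DESC
"""
for row in client.query(sql):
    print(row.path, row.avg_latency_ms, row.requests)
```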

BigQuery also helps us monitor error rates in our application. Our log data includes debug statements plus an “error type” for any request that fails: known errors get a sensible label, and anything we haven’t seen before is logged with its exception type. This pays off because we built a dashboard that queries BigQuery for the errors from the last hour, grouped by error type, so whenever we do a release we can watch error rates in the application really easily.
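
That dashboard's query might look roughly like this, using the same hypothetical table and client pattern as above:

```python
# Hedged sketch; logs.requests and the error_type column are hypothetical.
ERROR_RATE_SQL = """
SELECT error_type, COUNT(*) AS errors
FROM logs.requests
WHERE error_type IS NOT NULL
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY error_type
ORDER BY errors DESC
"""
```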



A Streak dashboard powered by BigQuery showing current usage statistics

In order to move the data into Cloud Storage from the Datastore and LogService, we developed an open source library called Mache. It’s a drop-in library that can be configured to automatically push data into BigQuery via Cloud Storage. The data can come from the Datastore or from LogService, and the library is very configurable – feel free to contribute and give us feedback on it!

Google Cloud Platform also makes our application better for our users. We take advantage of the App Engine Search API and again mirror our data there. Users can then query their Streak data using the familiar Gmail full text search syntax, for example, “before:yesterday name:Foo”. Since we also push our data to the Prediction API, we can help users throughout our app by making smart suggestions. In Streak, we train models based on which emails users have categorized into different projects. Then, when users get a new email, we can suggest the most likely box that the email belongs to.
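
Serving such a search from the mirrored index might look like the sketch below; Streak's actual translation of Gmail-style operators into the Search API query language is their own, so this example keeps to a plain field query:

```python
# Hedged sketch of querying the mirrored Search API index.
from google.appengine.api import search

def search_contacts(user_query):
    # e.g. "name:Foo" works as-is; date operators like "before:yesterday" would
    # first be rewritten into comparisons such as "created < 2013-09-01".
    index = search.Index(name='contacts')
    return [doc.doc_id for doc in index.search(user_query).results]
```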

One issue that arises is how to keep all these mirrored data sets in sync. It works differently for each service based on the architecture of the service. Here’s a simple breakdown:




Having these technologies easily available to us has been a huge help for Streak. It makes our products better and helps us understand our users. Streak’s user base grew 30% every week for 4 consecutive months after launch, and we couldn’t have scaled this easily without Google Cloud Platform. To read more details on why Cloud Platform makes sense for our business, check out our case study and our post on the Google Enterprise blog.


Aleem Mawani is the co-founder of Streak.com, a CRM tool built into Gmail. Previously, Aleem worked on Google Drive and various ads products at Google. He has a degree in software engineering from the University of Waterloo and an MBA from Harvard University.

Posted by Scott Knaster, Editor

Got big JSON? BigQuery expands data import for large scale web apps

By Ryan Boyd, Developer Advocate

JSON is the data format of the web: it powers most modern websites, is the native format for many NoSQL databases behind top web applications, and is the primary data format in many REST APIs. Google BigQuery, our cloud service for ad-hoc analytics on big data, has now added support for JSON and the nested/repeated structure inherent in the format.

JSON opens the door to a more object-oriented view of your data than CSV, the original data format supported by BigQuery, and it removes the duplication required when you flatten records into CSV. Here are some examples of data for which you might find the JSON format useful:
  • Log files, with multiple headers and other name-value pairs.
  • User session activities, with information about each activity occurring nested beneath the session record.
  • Sensor data, with variable attributes collected in each measurement.
Nested/repeated data support is one of our most requested features. And while BigQuery's underlying infrastructure supports it, we'd only enabled it in a limited fashion through M-Lab's test data. Today, however, developers can use JSON to get any nested/repeated data into and out of BigQuery.
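
As a hypothetical example, a single user session with its activities nested inside could be serialized as one line of newline-delimited JSON like this:

```python
# Illustrative record only; field names are made up for this sketch.
import json

session = {
    "session_id": "abc123",
    "user": {"id": 42, "country": "GB"},
    "activities": [                       # repeated, nested record
        {"type": "search", "query": "red shoes", "ts": "2013-10-01T09:12:00Z"},
        {"type": "purchase", "sku": "SKU-9", "ts": "2013-10-01T09:15:30Z"},
    ],
}
print(json.dumps(session))                # one line of the import file
```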

For more information on importing JSON and nested/repeated data into BigQuery, check out the new guide in our documentation. You should also see the Dealing with Data section for details on the new querying syntax available for this type of data.
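
A query over that repeated field might look like the sketch below. Note that the 2013 documentation describes legacy-SQL constructs such as FLATTEN and WITHIN; this example uses today's standard-SQL UNNEST purely for illustration, against a hypothetical mydataset.sessions table holding records like the one above:

```python
# Hedged sketch of querying a repeated field with present-day standard SQL.
NESTED_SQL = """
SELECT a.type, COUNT(*) AS events
FROM mydataset.sessions, UNNEST(activities) AS a
GROUP BY a.type
"""
```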

Improvements to Data Loading Pipeline

We’ve made it much easier to ingest data into BigQuery – up to 1TB of data per load job, with each file up to 100GB uncompressed JSON or CSV. We’ve also eliminated the 2 imports per minute rate limit, enabling you to submit all your ingestion jobs and let us handle the queuing as necessary. In a recent project I’ve been working on, import jobs for 3TB of data that previously took me 12 hours to run now take me only 36 minutes – a 20x improvement!
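
A single large load job from Cloud Storage might be submitted like this (a hedged sketch with the present-day Python client; the bucket, file pattern and table names are hypothetical):

```python
# Hedged sketch of a Cloud Storage -> BigQuery load job for newline-delimited JSON.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                       # or supply an explicit nested schema
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/sessions-*.json",      # shards can add up to a very large single job
    "mydataset.sessions",
    job_config=job_config,
)
load_job.result()                          # wait for the ingestion job to finish
```

Submitting many such jobs at once and letting the service queue them reflects the removal of the per-minute import limit described above.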

We’ve published a new Ingestion Cookbook that explains how to take advantage of these new limits.

We’re initiating a small trusted tester program aimed at making it easier to move your data from the App Engine Datastore to BigQuery for analysis. If you store a lot of data in Datastore and are also using BigQuery, we’d like to hear from you. Please sign up now to be considered for the trusted tester program.

Learn more this week

Michael Manoochehri, Siddartha Naidu and I are in London this week talking about BigQuery and these new features at the Strata big data conference. Ju-kay Kwek will also be talking about BigQuery at the Interop NYC conference tomorrow. Please stop by, say hi, and let us know what you’re doing with big data.

We’ll also be producing a Google Developers Live session from Campus London on Friday at 16:00 BST (15:00 GMT).


Ryan Boyd is a Developer Advocate, focused on big data. He's been at Google for 6 years and previously helped build out the Google Apps ISV ecosystem. He published his first book, "Getting Started with OAuth 2.0", with O'Reilly.

Posted by Scott Knaster, Editor

How redBus uses BigQuery to master big data

By Pradeep Kumar, redBus

This guest post was written by Pradeep Kumar. Pradeep is a technical architect at redBus, an online travel agency in India that provides a unified online bus ticketing service. We recently published a business case study for redBus and wanted to dive into some more technical detail for the readers of the Google Developers Blog.


Our company has been providing Internet bus ticketing for India since 2006. There are more than 10,000 bus routes available for booking, and we have dozens of machines processing booking requests. Each step in the booking process produces a lot of data – on search terms, route availability, server health and more. We needed tools to process this data quickly and easily, so we could determine whether decreases in customer bookings are the result of server problems or simply lower demand.

While we typically use relational databases to store and analyze data, we knew we needed something more powerful if we wanted to analyze 500GB or more, so we started to look at open source frameworks like Hadoop and analysis platforms like Hive and Pig. We found that these frameworks require considerable in-house expertise and infrastructure investments and wouldn’t give us answers to our questions as fast as we wanted. We decided to try out Google BigQuery as a trusted tester, with hopes that it would give us the ability to perform quick iterative analysis without much up-front investment. Our initial tests went very well, so we started building our analysis tools on top of BigQuery.

BigQuery allows us to run SQL-like queries to understand the bus routes in highest demand and what types of searches users are performing. We’ve also used it to build internal dashboards that give us a snapshot of system health.
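
A demand query of that kind might look like the following sketch, assuming a hypothetical redbus.searches table with origin, destination and travel_date columns (shown with the present-day Python client):

```python
# Hedged sketch; the table, columns and date range are illustrative only.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT origin, destination, COUNT(*) AS searches
FROM redbus.searches
WHERE travel_date BETWEEN '2013-09-01' AND '2013-09-30'
GROUP BY origin, destination
ORDER BY searches DESC
LIMIT 20
"""
for route in client.query(sql):
    print(route.origin, route.destination, route.searches)
```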


For more information on how we structured our immutable tables, how we pipelined our data into BigQuery for analysis using RabbitMQ, and the example SQL queries we’ve used, check out my article on developers.google.com.
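
The first hop of such a pipeline, publishing a booking event onto a RabbitMQ queue, might look roughly like this (the queue name, event shape and the downstream consumer that batches events into BigQuery are all hypothetical):

```python
# Hedged sketch of publishing an event to RabbitMQ with the pika client.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="booking_events", durable=True)

event = {"route": "BLR-HYD", "seats": 2, "status": "confirmed"}
channel.basic_publish(exchange="",
                      routing_key="booking_events",
                      body=json.dumps(event))
connection.close()
```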


Pradeep Kumar is a technical architect at redBus.

Posted by Scott Knaster, Editor