Over the past decade, several metaphors and labels have evolved to describe the software that manages stored data. Some systems were called warehouses; they generally offered stronger structure and compliance, but they often could not handle the larger volumes of information generated by modern web applications. Another term, the “data lake,” referred to less structured collections that were engineered to scale easily, in part because they enforced fewer rules. Google wants BigLake to combine the control of the best data warehouses with the seemingly endless availability of cloud storage.
“All of these organizations who try to innovate on top of the data lake found it to be, at the end of the day, just a data swamp,” said Kazmaier. “Our innovation at Google Cloud is that we take BigQuery and its unique architecture, its unique Serverless model, its unique storage architecture and a unique compute architecture and [integrate it] with open-source file formats and open-source processing engines.”
The open-source architecture is intended to let customers adopt Google’s tools gradually by integrating them with existing data infrastructure. The open formats also simplify sharing information, making the platform a more welcoming environment.