How can you optimise costs on Google Cloud?
It is quite an interesting phenomenon: everything is migrating to the cloud, yet many businesses still believe that doing so may result in a higher total cost of ownership (TCO). In this post we will not argue whether or not that is true; that is a topic for another time. Instead, we will discuss the small tweaks and best practices organisations can introduce into their infrastructure to further reduce their TCO.
As there are quite a lot of possibilities, in this post we will focus our attention on data warehousing and storage solutions.
#1 Optimise costs on Google Cloud through Data warehousing
Our first point on the agenda is reducing the cost of Google Cloud through cloud data warehousing. We cannot speak of a data warehouse without data, of course, so we must load data into it. For this purpose you are most probably using Google BigQuery, which supports both streaming and batch loading. If you are using streaming inserts, chances are high that you are being billed for them. Opting for batch loading in your workflow can reduce that cost to zero, as batch loading is completely free. The downside is that you cannot do real-time analysis on streaming data; depending on your use case this may or may not be a disadvantage.
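As a rough sketch, this is what swapping a streaming insert for a free batch load job could look like with the BigQuery Python client library. The project, dataset, table, and Cloud Storage path are hypothetical placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Batch load jobs run on the free shared slot pool, unlike streaming inserts,
# which are billed per ingested megabyte.
load_job = client.load_table_from_uri(
    "gs://example-data-lake/events/2024-01-01/*.json",  # hypothetical export path
    "my-project.analytics.events",                       # hypothetical table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```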
Once our data is in our data warehouse, we can start analysing it with queries. As Google BigQuery is a serverless, highly scalable, and cost-effective data warehouse, even suboptimally written queries can be executed in a matter of seconds. But although BigQuery finishes in seconds, it still processes enormous amounts of data, and you are billed for every byte processed. To reduce this cost to the bare minimum, we can help key users optimise their queries.
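Before optimising anything, it helps to know what a query would cost. A dry run, sketched below with the Python client and a hypothetical table name, reports the bytes a query would process without executing it or billing anything.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A dry run estimates the bytes a query would scan without running it,
# so the cost can be checked before anything is billed.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT * FROM `my-project.analytics.events`",  # hypothetical table
    job_config=job_config,
)
print(f"This query would process {job.total_bytes_processed / 1e9:.2f} GB")
```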
Sitting next to users while they write their queries would not be a scalable solution. Instead we can partition and cluster our data and enforce its use. Partitioning can be done on a timestamp or integer-range column in your tables, but it only reduces cost if users actually filter on the partitioning column in their WHERE clause. This can be enforced by enabling the Require partition filter option: queries without such a filter will simply not be executed. An additional benefit is that any partition that has not been modified for 90 consecutive days is automatically moved to long-term storage, cutting its storage cost by roughly 50%.
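As an illustration, the sketch below creates a day-partitioned, clustered table with the partition filter requirement enabled, using the Python client library; the table name, schema, and clustering column are made-up examples.

```python
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.events",  # hypothetical table
    schema=[
        bigquery.SchemaField("event_ts", "TIMESTAMP"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)
# Partition by day on the timestamp column and cluster on customer_id.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_ts",
)
table.clustering_fields = ["customer_id"]
# Reject any query that does not filter on the partitioning column.
table.require_partition_filter = True

client.create_table(table)
```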
While partitioning and clustering your data can already drastically improve the state of mind of your finance department, we can do even better by setting a maximum number of bytes billed per query. A WHERE clause on the partitioning column already limits the scan, but users may still query unnecessarily large amounts of data. When a query would exceed the limit, BigQuery stops the execution and warns the user, and as a bonus none of the bytes are billed. Enabling the maximum bytes billed setting can thus be an additional cost-saver for many organisations.
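A minimal sketch of such a cap, again with the Python client; the 1 GB limit, table name, and query are arbitrary examples.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Cap this query at 1 GB of processed data; BigQuery refuses to run it
# (and bills nothing) if it would scan more than that.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
job = client.query(
    "SELECT customer_id, COUNT(*) AS events "
    "FROM `my-project.analytics.events` "
    "WHERE event_ts >= TIMESTAMP('2024-01-01') "
    "GROUP BY customer_id",
    job_config=job_config,
)
rows = job.result()
```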
#2 Optimise costs on Google Cloud through Storage
So, how can you optimise your costs on Google Cloud through storage?
When loading data into your data warehouse it has to come from somewhere, and more often than not it is loaded from your data lake. If your organisation also uses Google Cloud Storage, keep reading to find out how to reduce those costs.
First of all, develop a data lifecycle policy. New data arrives so fast nowadays that storing all of it can become overwhelming and expensive, and after a certain point you simply no longer need backups from ages ago. We can set up a policy that removes older data once it has been processed and loaded into the data warehouse. This policy can be tuned per organisation and per bucket.
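As a sketch of such a policy, the snippet below adds a lifecycle rule that deletes objects older than 180 days from a hypothetical bucket; the age threshold is an assumption and should match how long your organisation actually needs the raw files.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-data-lake")  # hypothetical bucket name

# Delete objects older than 180 days; pick a threshold that matches how long
# raw files are needed after they have been loaded into the warehouse.
bucket.add_lifecycle_delete_rule(age=180)
bucket.patch()  # persist the updated lifecycle configuration
```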
Some organisations need to keep a backup of their data for a longer period of time, for example due to compliance requirements or fall-back scenarios. A lifecycle policy is then still possible, but instead of deleting the data after a certain period and/or action, the organisation can move it to a lower-cost long-term storage class. This reduces costs significantly while keeping the data close at hand.
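Instead of deleting, the same lifecycle mechanism can change the storage class. The sketch below, using an invented bucket name and illustrative thresholds, moves objects to Coldline after 90 days and to Archive after a year.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-backups")  # hypothetical bucket name

# Move objects to colder (cheaper) storage classes as they age.
# The class names are real GCS classes; the thresholds are just examples.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()
```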
Other use cases require fast access to the data stored in buckets. Let's say an organisation has a large user base across the globe that relies on data stored in a particular bucket. Serving users in Europe from a bucket in the US may result in noticeable lag and a reduced user experience, and can also incur additional egress charges. Organisations in such scenarios could leverage a multi-regional bucket.
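Creating such a bucket is straightforward; the sketch below places a hypothetical bucket in the EU multi-region using the Python client, so objects are stored redundantly across European regions and served closer to European users.

```python
from google.cloud import storage

client = storage.Client()

# Create a bucket in the EU multi-region; name and location are examples.
bucket = storage.Bucket(client, name="example-eu-assets")
bucket.storage_class = "STANDARD"
client.create_bucket(bucket, location="EU")
```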
Using multi-regional buckets can thus reduce potential lag and improve the experience for a user base spread across regions. If we want to reduce cost further, we can put Cloud CDN in front of the bucket: instead of serving every request from Cloud Storage, you serve repeat requests from a cache. Egress from a Cloud CDN cache is much cheaper than egress from a bucket, which reduces your overall bill.
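One possible way to set this up is a backend bucket with Cloud CDN enabled, sketched below with the Compute Engine Python client; the project, bucket, and resource names are placeholders, and the backend bucket still needs to be attached to an external HTTP(S) load balancer before it serves any traffic.

```python
from google.cloud import compute_v1

client = compute_v1.BackendBucketsClient()

# Wrap the (hypothetical) bucket in a backend bucket with Cloud CDN enabled,
# so repeat requests are answered from edge caches rather than the bucket.
backend_bucket = compute_v1.BackendBucket(
    name="example-assets-cdn",        # hypothetical resource name
    bucket_name="example-eu-assets",  # hypothetical bucket from the step above
    enable_cdn=True,
)
operation = client.insert(
    project="my-project",  # hypothetical project ID
    backend_bucket_resource=backend_bucket,
)
operation.result()  # wait for the insert operation to complete
```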
Check out part 2 of this post for more Google Cloud Platform tips & tricks.