Automating Google Cloud Storage Management with Python

How to make use of Python to automate the management of your Cloud Storage objects.

Elise Landman
Google Cloud - Community



Google Cloud is one of the biggest cloud providers in the market. Google Cloud Platform (GCP) provides various cloud solutions, including Compute services, Storage options, Networking and Security solutions, and Big Data and AI frameworks.

A while ago, I published an article that covered the basics of how AWS S3 file management can be automated with Python. Today, we will switch our focus to Google Cloud, and I will show you how you can equivalently automate your file management with Google’s Cloud Storage solution.

GCP provides users with a free tier; for Cloud Storage, this includes 5 GB of storage, 5,000 Class A operations, and 50,000 Class B operations per month. Additionally, new users receive $300 of free credits to spend on various cloud services.

In the sections below, I will go through the steps of setting up your GCP environment and show you how to use the Cloud Storage API with Python 3 to automate the management of your buckets and files.

1 | Preparing your GCP Environment

Before we can start making use of any APIs to interact with GCP, we need to perform some preparatory steps.

Creating a Service Account

Firstly, we need to create a Service Account (SA) which will be used to interact with Cloud Storage via the API. The actions we can perform with the API are defined by the scope of rights that we grant to the SA.

Service Accounts (SAs) provide you with a simple way of securing and segmenting access to your GCP environment.

We could, for example, create an SA which can only view files in our buckets, but not create new buckets. This helps us secure our GCP environment, especially when, for example, multiple teams with different roles and aims work on the same project. For our use case, we will create an SA with admin rights to Cloud Storage, meaning it will be able to perform any action.

Login to your Google Cloud Management Console, go to IAM & Admin and choose the Service Accounts tab. From there, click on Create Service Account.

Image from Google Cloud Management Console — IAM & Admin > Service Accounts

Choose a service account name, for example “cloud-storage-sa”, and optionally add a brief description. Finally, we need to grant access rights to our SA. In our case, we will select the role “Storage Admin” under “Cloud Storage”.

Gathering the API Key

Now that we’ve created the SA, we need to generate its respective key, which will allow us to authenticate when using the API.

Click on the email address of your newly created SA, displayed in your list of service accounts. The email address will look something like “cloud-storage-sa@project-name.iam.gserviceaccount.com”. Then go to the Keys tab, click on Add Key > Create New Key, and select JSON as the key type. This will download a JSON file to your PC, which includes the key needed to authenticate to your GCP project.

Make sure to store your JSON key file securely, as this is the key giving access to your GCP resources.

Now that we have gathered the API key of our Cloud Storage SA, we can start making API calls to Cloud Storage.

2 | Interacting with GCP via API

The google-cloud-storage package is the official Google Python package for interacting with Cloud Storage. You can easily install it via the pip installer, as shown below:

pip install google-cloud-storage

Before we can make any API calls, we need to create an environment variable in our Python script that holds the path to our JSON file containing our API key:

import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"/path/to/credentials/project-name-123456.json"

Great — we are all set, let’s get hands-on.

A. Creating a Cloud Storage Bucket

First, we can start by creating a bucket.

The Cloud Storage Python package allows us to define the storage class and the storage location when creating a bucket: the storage class is set on the bucket object, while the location is passed as a parameter to the ‘create_bucket’ function. In the example below, we select ‘STANDARD’ as storage class and ‘us-central1’ as location.
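A minimal sketch of such a function could look as follows (the helper name ‘create_cs_bucket’ is illustrative):

from google.cloud import storage

def create_cs_bucket(bucket_name):
    # The client authenticates via the GOOGLE_APPLICATION_CREDENTIALS
    # environment variable we set earlier
    client = storage.Client()
    # The storage class is set on the bucket object itself
    bucket = client.bucket(bucket_name)
    bucket.storage_class = "STANDARD"
    # The location is passed as a parameter to create_bucket
    new_bucket = client.create_bucket(bucket, location="us-central1")
    return new_bucket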

Other options for the storage class include ‘NEARLINE’, ‘COLDLINE’, or ‘ARCHIVE’.

Depending on how granular you define the location parameter, the bucket data will be stored either regionally or multi-regionally. As shown in the table below, setting ‘location’ to ‘us’ will automatically set your bucket’s location type to multi-region, and equivalently for the other locations.

Multi-region or regional Cloud Storage bucket, location type options. Image by author.

To set your bucket’s location type to dual-region, add the ‘data_locations’ parameter to the ‘create_bucket’ function and define which two regions your bucket should store its data in.
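A minimal sketch, again with an illustrative helper name (note that the ‘data_locations’ parameter requires a recent version of the google-cloud-storage package):

from google.cloud import storage

def create_dual_region_bucket(bucket_name):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # A 'US' location combined with two specific regions creates
    # a dual-region bucket storing data in both regions
    new_bucket = client.create_bucket(
        bucket, location="US", data_locations=["US-EAST1", "US-WEST1"]
    )
    return new_bucket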

B. Uploading a File to a Cloud Storage Bucket

Now that we have seen how we can create a Cloud Storage bucket, we can also define a function that lets us upload files to it.
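A minimal sketch, assuming the illustrative helper name ‘upload_cs_file’:

from google.cloud import storage

def upload_cs_file(bucket_name, source_file_name, destination_file_name):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # A blob is the object representation of a file in Cloud Storage
    blob = bucket.blob(destination_file_name)
    blob.upload_from_filename(source_file_name)
    return True

For example, upload_cs_file('my-bucket', '/home/user/data.csv', 'data.csv') would upload the local file to the bucket (both names are placeholders).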

C. Downloading a File from a Cloud Storage Bucket

Similarly, we can also download files from our Cloud Storage bucket to our local machine.
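A sketch along the same lines, again with an illustrative helper name:

from google.cloud import storage

def download_cs_file(bucket_name, file_name, destination_file_name):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(file_name)
    # Write the blob's contents to the given local path
    blob.download_to_filename(destination_file_name)
    return True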

D. Listing Files in a Cloud Storage Bucket

Apart from uploading and downloading files to and from our Cloud Storage bucket, we can define a function ‘list_cs_files’ which returns a list of all the filenames in the bucket.
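A minimal sketch of ‘list_cs_files’, built on the client’s ‘list_blobs’ method:

from google.cloud import storage

def list_cs_files(bucket_name):
    client = storage.Client()
    # list_blobs returns an iterator of blob objects;
    # we keep only their names
    file_list = client.list_blobs(bucket_name)
    return [file.name for file in file_list]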

E. Getting the Public URL of a File in a Cloud Storage Bucket

Lastly, we can also make use of the API to generate a publicly accessible link to one of the files in our bucket. For security reasons, we need to set a time after which the URL will expire; in the function shown below, this defaults to 24 hours.
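A sketch of such a function, assuming the illustrative name ‘get_cs_file_url’ and using the blob’s ‘generate_signed_url’ method:

from datetime import timedelta

from google.cloud import storage

def get_cs_file_url(bucket_name, file_name, expiration_hours=24):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(file_name)
    # v4 signed URLs require service account credentials,
    # which we provide via the JSON key file
    url = blob.generate_signed_url(
        version="v4",
        expiration=timedelta(hours=expiration_hours),
        method="GET",
    )
    return url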

This will return the URL as a Python string, which you can then share with anyone outside of your organization.

I hope this article has been informative and helps you save a few minutes when getting started with the Cloud Storage API and the google-cloud-storage Python package! 🎉 Below you will find a few reference links to the official Cloud Storage documentation.

Feel free to share any feedback you have with me in the comments! 💬

References:

[1] Google Cloud, Cloud Storage Client Libraries Documentation (2022)

[2] Google Cloud, GitHub “python-storage” code samples (2022)

[3] Google Cloud, Cloud Storage (2022)


Data Analytics MSc @UCD and Customer Engineer @Google Cloud — passionate about data science & programmability with 🐍