If you work with Backblaze B2, you’re probably already aware of resources such as the Backblaze B2 Python SDK and the Backblaze B2 Command Line Tool, but did you know that there is also a Terraform Provider for Backblaze B2, an SDK for Java, and a whole slew of open source samples showing how to integrate with Backblaze B2 from web browsers, serverless platforms, and more? Today, I’ll take you on a quick tour of our open source SDKs, tools, and sample code, pointing out some interesting sights along the way.
Why open source?
We’ve long been believers in open source here at Backblaze. We open sourced our implementation of Reed-Solomon erasure coding back in 2015 and, even before then, shared our Storage Pod designs. And, of course, there is Drive Stats: the statistics and insights based on our observations of the hard drives we operate in our data centers, including the raw metrics we collect from many thousands of hard drives every day.
While the Storage Pod designs and Drive Stats live here on the Backblaze website, we make our open source code available via two GitHub organizations:
- https://github.com/backblaze contains official, supported, Backblaze SDKs and other tools.
- https://github.com/backblaze-b2-samples contains unsupported samples and demos showing how to integrate Backblaze B2 with a wide variety of services, frameworks, and applications.
Let’s take a closer look.
Official Backblaze SDKs and tools
You can use any of AWS’ range of SDKs, plus the AWS Command Line Interface (CLI), to access Backblaze B2 via its S3 Compatible API; just remember to configure the endpoint URL as well as the access key ID and secret access key.
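As a minimal sketch of that configuration in Python, assuming the boto3 package is installed: the helper names, the region us-west-004, and the placeholder credentials are illustrative assumptions, and the endpoint pattern https://s3.<region>.backblazeb2.com should be checked against the endpoint shown for your own bucket.

```python
def b2_s3_client_kwargs(region, key_id, application_key):
    """Build keyword arguments for boto3.client() that point at Backblaze B2.

    The endpoint pattern is an assumption based on B2's usual S3 endpoint
    format; check the endpoint listed for your bucket.
    """
    return {
        "service_name": "s3",
        "endpoint_url": f"https://s3.{region}.backblazeb2.com",
        "aws_access_key_id": key_id,
        "aws_secret_access_key": application_key,
    }


def make_b2_s3_client(region, key_id, application_key):
    """Create an S3 client for B2 (requires the boto3 package)."""
    import boto3  # imported here so the helper above works without boto3

    return boto3.client(**b2_s3_client_kwargs(region, key_id, application_key))


if __name__ == "__main__":
    # Placeholder values: substitute your own region and application key.
    kwargs = b2_s3_client_kwargs("us-west-004", "<key-id>", "<application-key>")
    print(kwargs["endpoint_url"])
```

The only B2-specific part is the endpoint URL; once the client exists, the usual S3 calls apply.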
Not every Backblaze B2 operation is accessible via the S3 Compatible API—for example, application key management—so we also support a range of open source SDKs for accessing Backblaze B2’s Native API from a variety of programming languages:
- The Backblaze B2 Python SDK: This SDK provides access to the basic operations of the Native API, such as list_buckets() and download_file_by_id(), as well as a powerful Synchronizer class that implements high performance, multi-threaded file copying between Backblaze B2 and local file storage.
- The Backblaze B2 Java SDK: Although it doesn’t include anything quite as sophisticated as the Python Synchronizer, the Java SDK does implement high-level functionality such as uploadLargeFile(), which encapsulates all of the mechanics of a multi-threaded file upload in a single method call. We also use it internally at Backblaze in our production environment.
- blazer, an open source Backblaze B2 SDK for Go (aka golang): We adopted blazer from its original author, Toby Burress, when he was no longer able to maintain it. We’ve made a few improvements since taking it on, and we’re looking at doing more with it.
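For a sense of what the Python SDK looks like in use, here is a minimal sketch, assuming the b2sdk package is installed and using its v2 interface. The helper names connect() and bucket_names() are illustrative, not part of the SDK, and the environment variable names are an assumption borrowed from the B2 CLI's conventions; list_buckets() is the SDK operation mentioned above.

```python
import os


def connect(key_id, application_key):
    """Authorize against Backblaze B2 and return a B2Api object (needs b2sdk)."""
    from b2sdk.v2 import B2Api, InMemoryAccountInfo  # deferred: optional dependency

    api = B2Api(InMemoryAccountInfo())
    api.authorize_account("production", key_id, application_key)
    return api


def bucket_names(api):
    """Return the sorted names of all buckets visible to the application key."""
    return sorted(bucket.name for bucket in api.list_buckets())


if __name__ == "__main__":
    # Credentials are read from the environment rather than hard-coded.
    key_id = os.environ.get("B2_APPLICATION_KEY_ID")
    app_key = os.environ.get("B2_APPLICATION_KEY")
    if key_id and app_key:
        for name in bucket_names(connect(key_id, app_key)):
            print(name)
    else:
        print("Set B2_APPLICATION_KEY_ID and B2_APPLICATION_KEY to list buckets.")
```

Keeping bucket_names() separate from connect() means it can be exercised with a stand-in object, without touching the network.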
The Backblaze GitHub organization also contains a pair of tools built on the Python SDK:
- The Backblaze B2 Command Line Tool (also known as the B2 CLI): The B2 CLI gives easy access to all of the capabilities of Backblaze B2, from uploading and downloading files to managing buckets and application keys. Our blog post, “Command Like a Pro with New Backblaze B2 CLI Enhancements,” summarizes recent upgrades to the B2 CLI.
- The Terraform Provider for Backblaze B2: This tool allows you to manage Backblaze B2 resources, such as buckets, files, and application keys, from Terraform configurations. Although the provider is written in Go, it embeds the B2 Python SDK for accessing the Backblaze B2 API.
The remaining repositories contain utilities and other code that we have published over the years, including our open source Reed-Solomon erasure coding implementation and a utility we wrote to support migrating a live Cassandra cluster from one data center to another.
Backblaze sample and demo code
Our https://github.com/backblaze-b2-samples organization contains, at the time of writing, 34 repositories, demonstrating how to use Backblaze B2 in a wide variety of situations. We’ve covered a few of them in past blog posts:
- Build a conversational chatbot: Last month, we walked through the example code in retrieval-augmented generation (RAG) with Backblaze B2, showing how you can build a conversational chatbot that answers questions based on content downloaded from a private Backblaze B2 Bucket.
- Create an AI-powered media asset management (MAM) app: Earlier this year, we showed how you can integrate Backblaze B2 with the Twelve Labs video understanding platform, publishing the code for a simple AI-powered MAM.
- Set up a tiered media storage architecture: In 2022, we explained how iconik, LucidLink, and Backblaze B2 complement each other in a tiered media storage architecture. Several customers have deployed the open source Backblaze B2 Storage Plugin for iconik; one of those customers joined me on stage at NAB Show 2023 to tell their story.
As you explore the https://github.com/backblaze-b2-samples organization, you’ll also find repositories that have not yet been covered here on the Backblaze blog:
- B2listen allows you to forward Backblaze B2 Event Notifications to a service listening on a local URL. B2listen uses Cloudflare’s free Quick Tunnels feature to proxy traffic from an internet-accessible URL to a local endpoint.
- B2 Browser Upload shows you how to upload files directly to Backblaze B2 from JavaScript code running in the browser, with sample code for both the Backblaze B2 Native and S3-compatible APIs.
- The Backblaze B2 Zip Files Example implements a simple Python web app, using the Flask web application framework and the flask-executor task queue, that can compress a set of files located in Backblaze B2 into an archive, also stored in Backblaze B2, without using any local storage.
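The “without using any local storage” idea behind the Zip Files Example can be sketched with Python’s standard library alone. This is not the repository’s code: zip_in_memory() and its fetch parameter are hypothetical names, and a plain dictionary stands in for a Backblaze B2 Bucket so that the sketch runs on its own.

```python
import io
import zipfile


def zip_in_memory(names, fetch):
    """Build a ZIP archive in memory, never touching the local disk.

    names: iterable of object names to include.
    fetch: hypothetical callable mapping a name to its bytes, for example
           a wrapper around a download from Backblaze B2.
    Returns the bytes of the finished archive.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            archive.writestr(name, fetch(name))
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for remote storage: a dict instead of a B2 bucket.
    fake_bucket = {"a.txt": b"hello", "b.txt": b"world"}
    data = zip_in_memory(sorted(fake_bucket), fake_bucket.__getitem__)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

In a real app, the returned bytes would be uploaded back to Backblaze B2. Holding the whole archive in memory only suits modestly sized file sets; larger archives call for a streaming approach.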
We’ll write more about these and other, as yet unreleased, open source projects over the coming weeks and months. If you’d like us to prioritize any of the three repositories above, or any of our other projects, let us know in the comments!