Building an AI Agent with Backblaze B2, LangChain, and Drive Stats

A decorative image showing multiple computer windows folding into the cloud.

Last August, I explained how you can use a Jupyter Notebook to explore AI development; specifically, building a chatbot that answers questions based on custom context downloaded from a private bucket in Backblaze B2 Cloud Storage. 

In this post, I’ll look at another AI technology, agents, and show you how I built an AI agent that answers questions about hard drive reliability based on over 11 years of raw data from our Drive Stats franchise. 

The Drive Stats dataset is ideal for this kind of work. It’s a real-world dataset, but it only weighs in at around 500 million records, consuming about 20GB of storage in Parquet format (“only” being a relative term), so you can use it with big data and AI tools on a laptop in a reasonable amount of time, rather than spinning up an expensive virtual machine (VM) and/or spending days waiting for an operation to complete. As an example, converting the entire Drive Stats dataset from CSV to Parquet using a Python app on my MacBook Pro takes a couple of hours. On the same hardware, converting a terabyte-scale dataset would take about four days.
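
As a rough sketch of what such a conversion can look like (illustrative only: the file names are hypothetical, and the actual script, which handles the full set of daily CSV files, may do things differently), pyarrow can read a day’s CSV and write it back out as Parquet:

import pyarrow.csv as pv
import pyarrow.parquet as pq

# Illustrative only: convert a single (hypothetical) daily CSV file to Parquet
table = pv.read_csv("2024-09-01.csv")
pq.write_table(table, "2024-09-01.parquet")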

Speaking of Drive Stats

The Drive Stats 2024 report comes out February 11, and we’re hosting a LinkedIn Live event where Andy Klein, resident Drive Stats guru, will share highlights. Register today to save your spot.

Register Here ➔ 

You can use these same techniques with any large dataset, from healthcare to ecommerce to financial services. In this example, we’re working with a single table, but you could adapt the sample code to a data lake comprising any number of tables.

What is an AI agent?

In the spirit of the times, I posed this question to ChatGPT. Its answer:

An AI agent is a software system designed to autonomously perform tasks or make decisions based on its environment and goals. It leverages artificial intelligence techniques—such as machine learning, reasoning, and natural language processing—to process information, make decisions, and take actions to achieve specific objectives.

Key components of an AI agent include:

  1. Perception: The ability to sense and understand its environment. This could be through sensors, input data, or other means of gathering information.
  2. Reasoning/decision-making: The core processing mechanism that helps the agent interpret its environment, make decisions, and plan actions. It could use various algorithms, such as decision trees, reinforcement learning, or neural networks.
  3. Action: Once the agent has analyzed the environment and made a decision, it takes action to achieve its goal, whether it’s performing an operation, giving a recommendation, or interacting with another system.
  4. Learning: Some AI agents can adapt over time, improving their decision-making and actions based on experience (via reinforcement learning, supervised learning, etc.).

AI agents can range from simple systems, like chatbots or virtual assistants, to more complex systems like autonomous vehicles, robots, or financial trading algorithms.

In general, the term “agent” emphasizes the idea of autonomy—the agent operates independently, often with the ability to learn, adapt, and make decisions based on changing conditions without direct human intervention.

In this example, the agent’s environment is a database containing the Drive Stats data (more on that below), and I want it to perform the following tasks:

  • Based on a natural language question, such as “Which drive has the lowest annual failure rate?”, generate a SQL query that retrieves data that will help answer the question.
  • Execute that query against the Drive Stats dataset.
  • Based on the query results, either create a new query that better answers the question, or generate a natural language answer.

As in my previous post, I’m using the open source LangChain framework. This tutorial on building a question-answering system over SQL data was my starting point. I’ll explain the key points of the integration in this blog post; the full source code is available as a Jupyter notebook in the ai-agent-demo repository.

Querying the Drive Stats dataset

Now that I’ve established that my agent will be writing a SQL query, the next question is, “What will it be querying?” I’ve written about querying the Drive Stats dataset before; in that blog post, I explained how I wrote a Python script to convert the Drive Stats data from the CSV format in which we publish it to Apache Parquet, a column-oriented file format particularly well suited to storing tabular data for analytical queries, and upload it to a Backblaze B2 bucket using the Apache Hive table format. There’s a broad ecosystem of tools and platforms that can manipulate Parquet data in object storage (for example, Apache Spark and Snowflake). I chose Trino, the open source distributed SQL engine that forms the basis of Amazon Athena, to execute queries against the data.

I could have used the same technologies for this exercise, but I decided to add Apache Iceberg to the mix. While Parquet is a file format that specifies how tabular data is stored in files, Iceberg is a table format that governs how those files can be combined and interpreted as a database table. Iceberg provides a number of advantages over Hive as a table format, including better performance and much more flexible data partitioning.

What is partitioning?

Partitioning splits a dataset on one or more column values, easing data management and improving performance when a query includes a partition column.

Partitioning by year and month makes sense for the Drive Stats dataset—the resulting Parquet files are in the hundreds of megabytes, the sweet spot for Parquet data. To apply this partitioning to the Drive Stats data using the Hive table format, I had to create otherwise redundant month and year columns from the existing date column, complicating the schema.

Iceberg, by contrast, supports hidden partitioning, allowing you to apply a transformation to a column value to produce a partition value without adding any new columns. With the Drive Stats data, that meant I could simply define the partitioning as month(date) (the resulting value being the number of months since 1/1/1970, rather than an integer between 1 and 12), with no need to create any additional columns.
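
To give a concrete flavor of this, here is a sketch of what the table definition looks like with hidden partitioning in Trino’s Iceberg connector (the column list is abbreviated, and the actual DDL used for the dataset may differ in detail):

CREATE TABLE iceberg.drivestats.drivestats (
    date            DATE,
    serial_number   VARCHAR,
    model           VARCHAR,
    capacity_bytes  BIGINT,
    failure         INTEGER
    -- ...remaining columns, including the SMART attributes, omitted
)
WITH (partitioning = ARRAY['month(date)'])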

LangChain’s SQLDatabase class provides access to databases via the open source SQLAlchemy Python library. The demo code obtains a SQLDatabase instance by providing a URI containing the trino scheme, a username, and the location of the database node:

db = SQLDatabase.from_uri('trino://admin@localhost:8080/iceberg/drivestats')

Note: In this and other code excerpts in this blog post, I’ve omitted extraneous “boilerplate” code. As mentioned above, the full source code is available in the ai-agent-demo repository.

As you can infer from the localhost domain name, I’m running Trino on my laptop. I’m actually running it in Docker, using the Iceberg/Hive Docker Compose script from the trino-getting-started-b2 repository. I’ll dive into that example in a future blog post.

A simple query confirms that we have a successful database connection:

db.run("SELECT COUNT(*) FROM drivestats")

'[(537220724,)]'

As the result conveys, there are over 537 million rows in the Drive Stats dataset.

Each row contains the metrics collected from a single drive in the Backblaze fleet on a specific day. The schema has evolved over time, but, currently, the following columns are included:

  • date: The date of collection.
  • serial_number: The unique serial number of the drive.
  • model: The manufacturer’s model number of the drive.
  • capacity_bytes: The drive’s capacity in bytes.
  • failure: 1 if this was the last day that the drive was operational before failing, 0 if all is well.
  • pod_slot_num: The physical location of a drive within a storage server, as an integer from 0 to 59. The specific slot differs based on the storage server type and capacity: Backblaze (45 or 60 drives), Dell (26 drives), or Supermicro (60 drives).
  • pod_id: There are 20 storage servers in each Backblaze Vault. The pod_id is a numeric field with values from 0 to 19 assigned to each of the 20 storage servers.
  • vault_id: All data drives are members of a Backblaze Vault. Each Vault consists of either 900 or 1,200 hard drives divided evenly across 20 storage servers. The vault_id is a numeric value starting at 1,000.
  • cluster_id: The name of a given collection of storage servers logically grouped together to optimize system performance, formatted as a numeric field with up to two digits. Note: At this time the cluster_id is not always correct; we are working on fixing that.
  • datacenter: The Backblaze data center where the drive is installed, currently one of ams5 (Amsterdam, Netherlands), iad1 (Reston, Virginia), phx1 (Phoenix, Arizona), sac0 (Sacramento, California), sac2 (Stockton, California), or the newly live yyz1 (Toronto, Ontario).
  • is_legacy_format: Currently 0, but may change in the future as more fields are added.
  • A collection of SMART attributes. The number of attributes collected has risen over time; currently we store 93 SMART attributes in each record, each one in both raw and normalized form, with field names of the form smart_n_normalized and smart_n_raw, where n is between 1 and 255.
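
To illustrate how these columns fit together, a query along the following lines (a sketch, reusing the db connection created earlier) counts the drives of each model that were reporting on a particular day:

# Illustrative: number of drives of each model reporting on a single day
db.run("""
    SELECT model, COUNT(*) AS drive_count
    FROM drivestats
    WHERE date = DATE '2024-09-01'
    GROUP BY model
    ORDER BY drive_count DESC
    LIMIT 5
""")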

Using OpenAI to generate a SQL query

For this project, I decided to use the OpenAI API, rather than running a large language model (LLM) directly on my laptop. LangChain has a chat model integration for OpenAI, as well as many other providers, so you could use, for example, a local Llama model (via ChatOllama) or one of the Claude models (via ChatAnthropic) if you prefer.

To use the OpenAI API, you must sign up for an OpenAI account and create an OpenAI API key. This code loads the API key from a .env file and creates a chat model instance using OpenAI’s GPT-4o mini model:

# OPENAI_API_KEY must be defined in the .env file
load_dotenv()
llm = ChatOpenAI(model="gpt-4o-mini")
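
If you’d prefer to keep everything local, the swap is small; for example, something like this should work with a locally running Ollama server (assuming the langchain-ollama package is installed and the named model has been pulled):

# Alternative to the OpenAI API: a local model served by Ollama
# (assumes `ollama pull llama3.1` has been run and the Ollama server is listening)
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1")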

Now we need a system prompt template. We’ll combine this with the database schema and a natural language question to form the prompt that we send to OpenAI. As in the LangChain tutorial, I’m using a prompt from the LangChain Prompt Hub:

query_prompt_template = hub.pull("langchain-ai/sql-query-system-prompt")
query_prompt_template.messages[0].pretty_print()

This is the prompt template text, with the placeholders shown in {braces}:

================================ System Message ================================

Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}

Question: {input}

Notice how the template requires you to specify the correct SQL dialect, constrains the number of results returned, and encourages the model not to hallucinate column names that don’t exist in the schema.

A helper function populates the prompt template, sends it to the model, and returns the generated SQL query:

def write_query(state: State):
    prompt = query_prompt_template.invoke(
        {
            "dialect": db.dialect,
            "top_k": 10,
            "table_info": db.get_table_info(),
            "input": state["question"],
        }
    )
    structured_llm = llm.with_structured_output(QueryOutput)
    result = structured_llm.invoke(prompt)
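    # Strip any trailing semicolon; Trino rejects queries that end with one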
    return {"query": result["query"].rstrip(';')}

We can test the helper function by calling it directly with a Python dictionary containing a simple question:

question = {"question": "How many drives are there?"}
query = write_query(question)

The resulting query dictionary does indeed contain a valid SQL query, but it won’t give us the answer we are looking for.

{'query': 'SELECT COUNT(*) AS drive_count FROM drivestats'}

That query will tell us how many rows there are in the dataset, rather than how many drives. We supplied the database schema to the model, but we haven’t given it any information on the semantics of the columns in the drivestats table. We can provide a bit more detail to obtain the correct query:

question = {"question": "Each drive has its own serial number. How many drives are there?"}
query = write_query(question)

This time, the generated SQL query is correct:

{'query': 'SELECT COUNT(DISTINCT serial_number) AS total_drives FROM drivestats'}

As you can see, it’s important to check the output of AI models—they can and do generate unexpected results.

A second helper function executes the query against the database:

def execute_query(state: State):
    execute_query_tool = QuerySQLDatabaseTool(db=db)
    return {"result": execute_query_tool.invoke(state["query"])}

We can test it using the (correct) generated query:

result = execute_query(query)

{'result': '[(430464,)]'}

We need one more helper function, to pass the result set to the model and have it generate a natural language response. This time, we define our own prompt:

def generate_answer(state: State):
    prompt = (
        "Given the following user question, corresponding SQL query, "
        "and SQL result, answer the user question.\n\n"
        f'Question: {state["question"]}\n'
        f'SQL Query: {state["query"]}\n'
        f'SQL Result: {state["result"]}'
    )
    response = llm.invoke(prompt)
    return {"answer": response.content}

Again, we can test it in isolation. Notice that we have to provide the question and query, as well as the result, so that the model has the context it needs:

answer = generate_answer(question | query | result)
answer['answer']

'There are 430,464 drives.'

Success! At the present time, there are indeed 430,464 drives in the Drive Stats dataset.

LangChain’s LangGraph orchestration framework allows us to compile our three helper functions into a single graph object:

graph_builder = StateGraph(State).add_sequence(
    [write_query, execute_query, generate_answer]
)
graph_builder.add_edge(START, "write_query")
graph = graph_builder.compile()
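
With the graph compiled, a single call runs all three steps and returns the final state (a quick sketch; streaming, shown below, is more useful when you need to see what each step is doing):

# Run write_query, execute_query, and generate_answer in one shot
final_state = graph.invoke({"question": "Each drive has its own serial number. How many drives are there?"})
print(final_state["answer"])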

We can visualize the flow in the notebook:

display(Image(graph.get_graph().draw_mermaid_png()))
A diagram showing a query workflow. The workflow is defined as start, write_query, execute_query, generate_answer.

We’ve combined the write_query and execute_query steps into a graph object that can run agent-generated queries. I’ll quote the security note from the LangChain tutorial on the inherent risks in doing so:

Building Q&A systems of SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent’s needs. This will mitigate though not eliminate the risks of building a model-driven system. For more on general security best practices, see here.

In this example, we are querying a public dataset, and I followed best practice by configuring Trino’s Iceberg connector with a read-only application key scoped to the bucket containing the Drive Stats Iceberg tables.

Now let’s stream a new question through the flow. This mode of operation displays the output of each step as it is executed, essential for understanding the flow’s behavior, particularly when it is behaving unexpectedly. The model returns structured text in Markdown format. With a couple of lines of code to extract the message from the step variable, we can use the display_markdown function to render each step’s output:

for step in graph.stream(
    {"question": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024"}, stream_mode="updates"
):
    # unwrap the step value to get the markdown message
    state = one(step.values())
    message = one(state.values())
    display_markdown(message, raw=True)

This is the model’s output, and it gives us three different messages. I’ve separated them with a horizontal line for clarity:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10


Error: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message="line 3:12: Cannot apply operator: date = varchar(10)", query_id=20250113_221649_00214_bsut5) [SQL: SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10] (Background on this error at: https://sqlalche.me/e/20/f405)


The SQL query encountered an error because it attempted to compare a date column with a string in the format ‘YYYY-MM-DD’. Specifically, the error message indicates that the date column is of a different type (likely not a string), which is causing the type mismatch.

To answer the user question about how many drives each data center had on 9/1/2024, the SQL query needs to be corrected. Here’s the revised query:

SELECT datacenter, COUNT(DISTINCT serial_number) AS drive_count 
FROM drivestats 
WHERE date = DATE '2024-09-01' 
GROUP BY datacenter 
ORDER BY drive_count DESC 
LIMIT 10

This corrected query uses the DATE keyword to ensure that the date string is properly interpreted as a date type, which should resolve the type mismatch error. If executed successfully, this query will provide the count of distinct drives in each data center for the specified date. However, without executing the corrected query against the database, we cannot provide the exact counts.

As you can see in the output from the first step, the model generated an invalid query, comparing a date to a string, despite the database schema being included in the prompt. The output of the second step contains the resulting error message from the database, while the third step contains the model’s diagnosis of the error.

This exchange highlights a limitation of a flow that is simply a linear series of steps, such as write_query, execute_query, and generate_answer. We cannot rely on the model to generate a valid SQL query, although it is able to point the way towards resolving its error.

Creating a ReAct AI agent with LangGraph

The LangGraph framework gives you the capability to create AI agents based on arbitrarily complex logic. In this article, I’ve used its prebuilt ReAct (Reason+Act) agent, since it neatly demonstrates the agent concept, rewriting the SQL query repeatedly in response to database errors.

There are three steps to creating the agent. The first is to create an instance of LangChain’s SQLDatabaseToolkit, passing it the database and model, and obtain its list of tools:

toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()

The tools list contains tools that execute queries, retrieve the names, schemas and content of database tables, and check SQL query syntax.
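
A quick way to see what the agent will have to work with is to print the tool names (the exact names may vary slightly between LangChain versions):

# List the toolkit's tools, e.g. sql_db_query, sql_db_schema, sql_db_list_tables, sql_db_query_checker
for tool in tools:
    print(tool.name)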

The next step is to retrieve a suitable prompt template from the Prompt Hub and populate the template placeholders:

prompt_template = hub.pull("langchain-ai/sql-agent-system-prompt")
system_message = prompt_template.format(dialect=db.dialect, top_k=10)

Here is the prompt template’s text:

================================ System Message ================================

You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

To start you should ALWAYS look at the tables in the database to see what you can query.
Do NOT skip this step.
Then you should query the schema of the most relevant tables.

Now we can create an instance of the prebuilt agent:

agent_executor = create_react_agent(llm, tools, state_modifier=system_message)

Note how the agent must select the next step, and how the flow can cycle between the agent and tools steps:

display(Image(agent_executor.get_graph().draw_mermaid_png()))
A diagram showing the workflow between tools and agent. The workflow is as follows: start, agent, then a split option to access tools (a recursive step), or to end. The diagram shows that after agent, you can optionally select tools or end, indicating that you can end without choosing tools.

Again, we can stream the agent’s execution to show us each step of its operation.

for step in agent_executor.stream(
    {"messages": [{"role": "user", "content": "Each drive has its own serial number. How many drives did each data center have on 9/1/2024?"}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

The output from this flow is over 300 lines long; I posted it in its entirety as a Gist, but I’ll summarize the steps here:

  • Question: Each drive has its own serial number. How many drives did each data center have on 9/1/2024?
  • The model calls the “list tables” tool.
  • The list tables tool responds with a single table name, drivestats.
  • The model calls the “get schema” tool, passing it the table name.
  • The get schema tool responds with the schema and three sample rows from the drivestats table.
  • The model submits a query to the “query checker” tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10;
  • The query checker responds with the checked query, which is the same as its input. Note that the query checker only checks the SQL query’s syntax. The query contains the same data type mismatch as the query we generated earlier, as well as another error, as we’re about to discover.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a syntax error—Trino does not allow a trailing semi-colon on the query.
  • The model submits a modified query to the query checker tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
  • The query checker responds with the checked query, which is the same as its input.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a type mismatch error since the query tries to compare a string value with a date column.
  • The model submits a query with the necessary DATE type identifier to the query checker tool:
    SELECT datacenter, COUNT(serial_number) AS drive_count FROM drivestats WHERE date = DATE '2024-09-01' GROUP BY datacenter ORDER BY drive_count DESC LIMIT 10
  • The query checker responds with the checked query, which is the same as its input.
  • The model submits the query to the “query executor” tool.
  • The query executor responds with a result set:
    [
      ('phx1', 89477),
      ('sac0', 78444),
      ('sac2', 60775),
      ('', 24080),
      ('iad1', 22800),
      ('ams5', 16139)
    ]
  • The model returns a message containing the answer:

On September 1, 2024, the following datacenters had the specified number of drives:

1. phx1: 89,477 drives
2. sac0: 78,444 drives
3. sac2: 60,775 drives
4. (unknown datacenter): 24,080 drives
5. iad1: 22,800 drives
6. ams5: 16,139 drives

These results show the datacenters with their respective drive counts.

Now let’s see if the model can calculate the annualized failure rate of a drive model. We’ll use the Seagate ST4000DM000, just because that is the drive model with the most days of operation in the dataset.

for step in agent_executor.stream(
        {"messages": [{"role": "user", "content": "Each drive has its own serial number. What is the annualized failure rate of the ST4000DM000 drive model?"}]},
        stream_mode="values",
):
    step["messages"][-1].pretty_print()

The agent’s response mixes Markdown and LaTeX notation. I used QuickLaTeX to render the LaTeX to images:

The annualized failure rate (AFR) for the ST4000DM000 drive model can be calculated using the following information:

– Total failures: 5,791

– Total drives: 37,040

– Time period: from May 10, 2013, to September 30, 2024, which is approximately 11.35 years.

The formula for calculating the annualized failure rate is:

The calculation for the annualized failure rate. It's total failures divided by total drives, multiplied by one over the total years, multiplied by 100.

Plugging in the numbers:

The numbers for the annualized failure rate calculation. In this instance, the text reads 5,791 divided by 37,040, multiplied by one over 11.35, multiplied by 100, which equals approximately 13.77 percent.

Therefore, the annualized failure rate (AFR) of the ST4000DM000 drive model is approximately 13.77%.

It’s impressive that the agent shows its working so comprehensively, but, unfortunately, it arrives at the wrong answer. Those drives were not all running for the entire span of the Drive Stats dataset. The correct calculation involves counting the number of days of data recorded for those drives (the drive days) and dividing that total by 365 to get the actual number of years of operation.

It’s clear that the model is not able to answer questions on drive reliability given the data available to it so far. The solution lies in prompt engineering—providing more context on the semantics of the data in the system prompt.

We can extend the default AI agent system prompt template to include specific instructions on working with the Drive Stats dataset:

prompt_template.messages[0].prompt.template += """
Each row of the drivestats table records one day of a drive’s operation, and contains the serial number of a drive, its model name, capacity in bytes, whether it failed on that day, SMART attributes and identifiers for the slot, pod, vault, cluster and data center in which it is located.

Use this calculation for the annualized failure rate (AFR) for a drive model over a given time period:

1. **drive_days** is the number of rows for that model during the time period.
2. **failures** is the number of rows for that model during the time period where **failure** is equal to 1.
3. **annual failure rate** is 100 * (**failures** / (**drive_days** / 365)).

Use double precision arithmetic in the calculation to avoid truncation errors. To convert an integer **i** to a double, use CAST(**i** AS DOUBLE)

Note that the date column is a DATE type, not a string. Use the DATE type identifier when comparing the date column to a string.

Do not add a semi-colon suffix to SQL queries."""
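
One practical note: since system_message was formatted from the template before this change, editing the template alone isn’t enough; the system message has to be regenerated and the agent rebuilt. Here is a sketch of the equivalent step (see the notebook for the full code):

# Regenerate the system message from the extended template and rebuild the agent
system_message = prompt_template.format(dialect=db.dialect, top_k=10)
agent_executor = create_react_agent(llm, tools, state_modifier=system_message)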

Now, when we ask the same question on the annual failure rate of the ST4000DM000 drive model, the AI agent generates a correct SQL query and a more concise, and correct, final response (you can inspect the full output here).

SELECT 100 * (CAST(COUNT(CASE WHEN failure = 1 THEN 1 END) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST4000DM000'

The annual failure rate (AFR) for the ST4000DM000 drive model is approximately 2.63%.

Let’s ask the AI agent for a statistic that we can corroborate from the Backblaze Drive Stats for Q3 2024 blog post.

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "What was the annual failure rate of the ST8000NM000A drive model in Q3 2024?"}]}
)
response['messages'][-3].pretty_print()
display_markdown(response['messages'][-1].content, raw=True)

The query makes sense, and the response agrees with the table in the blog post:

SELECT 100 * (CAST(SUM(failure) AS DOUBLE) / (COUNT(*) / 365)) AS annual_failure_rate
FROM drivestats
WHERE model = 'ST8000NM000A' AND date >= DATE '2024-07-01' AND date < DATE '2024-10-01'

The annual failure rate (AFR) of the ST8000NM000A drive model in Q3 2024 is approximately 1.61%.

Interestingly, this time the SQL query used SUM(failure) to count the number of failures, rather than the equivalent, but rather long-winded, COUNT(CASE WHEN failure = 1 THEN 1 END) that it used in the previous query. Also, looking at the full response, we can see that, as directed by the custom prompt, the agent generated the correct syntax for comparing dates, so it didn’t need to correct and retry any queries.

Finally, let’s ask a more convoluted question, including the constraints given in the blog post:

response = agent_executor.invoke(
    {"messages": [{"role": "user", "content": "Considering only drive models which had at least 100 drives in service at the end of the quarter and which accumulated 10,000 or more drive days during the quarter, which drive had the most failures in Q3 2024, and what was its failure rate?"}]}
)
response['messages'][-3].pretty_print()
display_markdown(response['messages'][-1].content, raw=True)

Again, the AI agent is able to generate a valid SQL query, this time including a subquery, and its response matches the data from the blog post exactly:

WITH drive_stats AS (
    SELECT model,
           COUNT(DISTINCT serial_number) AS drive_count,
           COUNT(*) AS drive_days,
           COUNT(CASE WHEN failure = 1 THEN 1 END) AS failures
    FROM drivestats
    WHERE date >= DATE '2024-07-01' AND date < DATE '2024-10-01'
    GROUP BY model
    HAVING COUNT(DISTINCT serial_number) >= 100 AND COUNT(*) >= 10000
)
SELECT model,
       failures,
       100 * (CAST(failures AS DOUBLE) / (CAST(drive_days AS DOUBLE) / 365)) AS failure_rate
FROM drive_stats
ORDER BY failures DESC
LIMIT 10

The drive model with the most failures in Q3 2024 is the TOSHIBA MG08ACA16TA, which had 181 failures. Its failure rate during this period was approximately 1.84%.

Closing thoughts

My experience building an AI agent was one of astonishment at its ability to correctly generate quite complex SQL queries from natural language instructions, tempered with frustration at its limitations, particularly the way it would confidently generate an incorrect response rather than saying, “I’m sorry, but I don’t know how to do that.” Your AI agent development process should include generous testing time, as well as ongoing monitoring to ensure that it keeps coming up with the right answers.

About Pat Patterson

Pat Patterson is the chief technical evangelist at Backblaze. Over his three decades in the industry, Pat has built software and communities at Sun Microsystems, Salesforce, StreamSets, and Citrix. In his role at Backblaze, he creates and delivers content tailored to the needs of the hands-on technical professional, acts as the “voice of the developer” on the Product team, and actively participates in the wider technical community. Outside the office, Pat runs far, having completed ultramarathons up to the 50 mile distance. Catch up with Pat via Twitter or LinkedIn.