Claude tests a BareMetal VM

June 3, 2026 - Ian Seyler

With BareMetal’s potential to be used as an agent I thought it would be a good idea to see what a LLM could do when given an instance to use.

A new script has been added to the BareMetal-Firecracker repo to allow for a user (human or LLM) to interact with a BareMetal VM hosted via Firecracker.

The test subject was Claude Code and it was given a minimal prompt - Use only firecracker-agent.sh to interact with a BareMetal VM. Execute 'help' in the VM to understand operation. Start sqlite and run some commands in it to verify its functionality. Stop the VM by executing the 'shutdown' command or via the agent script.

The LLM interaction:

ian@dev01:~/Code/BareMetal-Firecracker$ claude
 ▐▛███▜▌   Claude Code v2.1.158
▝▜█████▛▘  Sonnet 4.6 · Claude Pro
  ▘▘ ▝▝    ~/Code/BareMetal-Firecracker


❯ Use only firecracker-agent.sh to interact with a BareMetal VM. Execute 'help' in the VM to understand operation. Start sqlite and run some commands in it to verify its functionality. Stop the VM by executing the 'shutdown' command or via the agent script.

● Let me start by reading the firecracker-agent.sh script to understand how to use it.

  Read 1 file (ctrl+o to expand)

● Now I'll start the VM and wait for it to boot up.

● Bash(./firecracker-agent.sh start)
  ⎿  No screen session found.
     VM started. Log: /tmp/fc-serial.log

● Bash(sleep 2 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     >

● VM is booted. Now let me run the help command to understand what's available.

● Bash(./firecracker-agent.sh send "help" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +10 lines (ctrl+o to expand)

● Good. Let me see what programs are on disk.

● Bash(./firecracker-agent.sh send "dir" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +20 lines (ctrl+o to expand)

● SQLite is program #6. Let me load it.

● Bash(./firecracker-agent.sh send "load" && sleep 0.5 && ./firecracker-agent.sh send "6" && sleep 2 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +22 lines (ctrl+o to expand)

● SQLite is loaded. Now let me execute it.

● Bash(./firecracker-agent.sh send "exec" && sleep 3 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +28 lines (ctrl+o to expand)

● SQLite is running. Now let me run some SQL commands to verify functionality.

● Bash(./firecracker-agent.sh send "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);" && sleep 1 && ./firecracker-agent.sh
      output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +29 lines (ctrl+o to expand)

● Table created. Now insert some rows.

● Bash(./firecracker-agent.sh send "INSERT INTO users VALUES (1, 'Alice', 30);" && sleep 0.5 && \
      ./firecracker-agent.sh send "INSERT INTO users VALUES (2, 'Bob', 25);…)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +32 lines (ctrl+o to expand)

● Rows inserted. Now let me query them.

● Bash(./firecracker-agent.sh send "SELECT * FROM users;" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +37 lines (ctrl+o to expand)

● SELECT works perfectly. Let me run a few more queries to verify additional functionality.

● Bash(./firecracker-agent.sh send "SELECT name, age FROM users WHERE age > 28 ORDER BY age DESC;" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +41 lines (ctrl+o to expand)

● WHERE and ORDER BY work. Let me test an aggregate function.

● Bash(./firecracker-agent.sh send "SELECT COUNT(*), AVG(age), MIN(age), MAX(age) FROM users;" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +44 lines (ctrl+o to expand)

● Aggregates work correctly. Now let me exit SQLite and shut down the VM.

● Bash(./firecracker-agent.sh send ".exit" && sleep 1 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +47 lines (ctrl+o to expand)

● Back at the BareMetal prompt. Now shutting down the VM.

● Bash(./firecracker-agent.sh send "shutdown" && sleep 2 && ./firecracker-agent.sh output)
  ⎿  [cpu: 1]  [mem: 2 MiB]  [net: 02:FC:AB:CD:EF:01]

     > help
     … +47 lines (ctrl+o to expand)

● Bash(./firecracker-agent.sh status)
  ⎿  VM is not running

● Everything worked. Here's a summary of the session:

  VM Boot: BareMetal started with 1 vCPU / 4 MiB RAM on Firecracker.

  Help output showed 8 built-in commands: cls, dir, load, exec, ver, peek, poke, dump.

  dir listed 6 programs on the BMFS disk, including sqlite.app (1 MB) at slot #6.

  SQLite 3.53.1 launched successfully after load 6 + exec. The following SQL was verified:

  ┌───────────────────────────────┬──────────────────────────────────┐
  │             Test              │              Result              │
  ├───────────────────────────────┼──────────────────────────────────┤
  │ CREATE TABLE                  │ OK                               │
  ├───────────────────────────────┼──────────────────────────────────┤
  │ INSERT (3 rows)               │ OK                               │
  ├───────────────────────────────┼──────────────────────────────────┤
  │ SELECT *                      │ All 3 rows returned correctly    │
  ├───────────────────────────────┼──────────────────────────────────┤
  │ WHERE + ORDER BY DESC         │ Filtered and sorted correctly    │
  ├───────────────────────────────┼──────────────────────────────────┤
  │ COUNT(*), AVG(), MIN(), MAX() │ 3 / 30.0 / 25 / 35 — all correct │
  └───────────────────────────────┴──────────────────────────────────┘

  Shutdown: .exit returned to BareMetal shell, shutdown command terminated the VM cleanly.

✻ Crunched for 1m 17s

The output logic of the script will need to be adjusted as any of the new input wasn’t being displayed in the Claude Code interface.

Summary

This opens up a straightforward path to automated integration testing for BareMetal and its applications. Claude Code (on any other LLM) can run a build, deploy the binary to the disk image, boot the VM, test the application, and report results.

Cookie Settings
This website uses cookies

Cookie Settings

We use cookies to improve user experience. Choose what cookie categories you allow us to use. You can read more about our Cookie Policy by clicking on Cookie Policy below.

These cookies enable strictly necessary cookies for security, language support and verification of identity. These cookies can’t be disabled.

These cookies collect data to remember choices users make to improve and give a better user experience. Disabling can cause some parts of the site to not work properly.

These cookies help us to understand how visitors interact with our website, help us measure and analyze traffic to improve our service.

These cookies help us to better deliver marketing content and customized ads.