The Crucial Distinction: Redundancy vs. Eviction
In the storage world, words have strict definitions. A Backup is a duplicate copy of active data created for disaster recovery. If your primary server catches fire, you restore the backup. Crucially, a backup means your data exists in two places simultaneously.
An Archive, on the other hand, is about economics. It is the process of moving cold, inactive data off your expensive primary storage and onto cheap, deep storage (like LTO tape or cloud object storage). Your data now exists in one place. The goal isn't redundancy; the goal is space reclamation.
Backup: Data Duplication
Archive: Data Eviction
If you treat an archive like a backup, you defeat its purpose. The whole reason HuskHoard exists is to prevent you from having to buy another $15,000 flash array just because your team finished a massive project and left the files sitting around.
The Illusion of Presence (File Stubbing)
The biggest hurdle to archiving is human psychology. Users hate archives because they hate "losing" their files. If you move a folder to a tape drive, the user can't see it anymore. They panic, call IT, and ask for it back.
HuskHoard solves this by providing the illusion that the file never left. We achieve this using a Linux kernel feature called Hole Punching, which Linux exposes through fallocate(2) with the FALLOC_FL_PUNCH_HOLE flag.
When the HuskHoard background worker (the Janitor) successfully writes a file to tape, it doesn't delete the original file. Instead, it asks the Linux kernel to "punch a hole" through the entire file. The kernel drops the physical data blocks from the SSD, but it strictly maintains the file's metadata.
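To make the mechanism concrete, here is a minimal sketch of punching a hole through a whole file from Python. It is not HuskHoard's code: the function name punch_whole_file is hypothetical, the call goes through ctypes to libc's fallocate, and it assumes 64-bit Linux on a file system that supports hole punching. Restoring the timestamps afterwards is also our assumption about what a stubbing tool would want, since punching can update the modification time.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>
FALLOC_FL_KEEP_SIZE = 0x01   # keep the reported file size unchanged
FALLOC_FL_PUNCH_HOLE = 0x02  # deallocate the blocks in the given range


def punch_whole_file(path: str) -> None:
    """Drop every data block of `path` while keeping its size and timestamps.

    Hypothetical helper for illustration. Raises OSError if the kernel or
    file system refuses the request.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    # int fallocate(int fd, int mode, off_t offset, off_t len); off_t is
    # assumed to be 64 bits wide (true on 64-bit Linux).
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR)
    try:
        st = os.fstat(fd)
        if st.st_size > 0:
            mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
            if libc.fallocate(fd, mode, 0, st.st_size) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err), path)
        # Put the original access/modification times back so listings
        # look unchanged (an assumption, see the lead-in).
        os.utime(fd, ns=(st.st_atime_ns, st.st_mtime_ns))
    finally:
        os.close(fd)
```

After a successful call, reading the file returns zeroes for the punched range, which is why a real stubbing tool has to intercept reads before the application sees them.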
If a user runs ls -l, the video file still says it's 100 GB. It still has the same modification date. It still lives in the same folder. But if you run du -h, it reports taking up 0 bytes* on the disk.
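The gap between those two numbers is visible from any program through stat. The sketch below (the helper name apparent_and_allocated is ours, not part of HuskHoard) returns the apparent size that ls -l prints next to the bytes actually allocated, which is roughly what du measures.

```python
import os


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent_size_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    # On Linux, st_blocks counts 512-byte units regardless of the
    # file system's own block size.
    return st.st_size, st.st_blocks * 512
```

For a quick experiment without any tape hardware, create a sparse file with os.truncate(path, 100 * 2**20) and call the helper on it: the first number is 100 MiB, while the second is usually at or near zero on file systems that support sparse files.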
To the user, nothing changed. To your IT budget, you just reclaimed 100 GB of NVMe space.
The Hot Tier High-Water Mark
So, how does HuskHoard decide when to evict a file? It runs an event-driven policy engine (the Janitor) that watches the "Hot Tier" (your expensive SSDs).
By default, the Janitor looks at file age (e.g., "archive anything untouched for 30 days"). But what happens if a user suddenly dumps 50 TB of data onto your hot tier overnight? You don't have 30 days to wait.
The default hot_tier_max_usage_percent is 80%. If disk usage exceeds this High-Water Mark, HuskHoard enters Emergency Spillover mode.
Because a stubbed file holds no data blocks, each eviction frees essentially the file's full size, so the space reclaimed is predictable and available as soon as the hole is punched.
HuskHoard sorts active tracks by last_touch ASC to ensure the oldest, coldest data is evicted first to save your array.
When the 80% High-Water Mark is breached, HuskHoard calculates exactly how many bytes it needs to free to return to safe levels (emergency_bytes_to_free). It ignores the 30-day age rule, finds the oldest files on the drive, safely streams them to tape, and immediately punches holes in them until the SSD is back in the green.
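The arithmetic behind that decision can be sketched in a few lines. This is an illustration under assumptions, not HuskHoard's implementation: the Track record and pick_victims are hypothetical names, and we assume the "safe level" to return to is the high-water mark itself. Only hot_tier_max_usage_percent, emergency_bytes_to_free, and last_touch are names taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical catalog record for one file on the hot tier."""
    path: str
    size_bytes: int
    last_touch: float  # timestamp of the last access; smaller means colder


def emergency_bytes_to_free(total_bytes: int, used_bytes: int,
                            hot_tier_max_usage_percent: int = 80) -> int:
    """Bytes that must leave the hot tier to get back under the mark."""
    limit = total_bytes * hot_tier_max_usage_percent // 100
    return max(0, used_bytes - limit)


def pick_victims(tracks: list[Track], bytes_to_free: int) -> list[Track]:
    """Choose the coldest files first (last_touch ASC) until enough is freed."""
    victims: list[Track] = []
    freed = 0
    for track in sorted(tracks, key=lambda t: t.last_touch):
        if freed >= bytes_to_free:
            break
        victims.append(track)
        freed += track.size_bytes
    return victims
```

In a real daemon the totals would come from something like shutil.disk_usage on the hot tier mount, and each victim would be streamed to tape and verified before its hole is punched.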
O(1) Streaming: The StreamGate Architecture
Evicting data is the easy part. The magic of an active archive is how it handles reads.
If a user double-clicks a stubbed file, their media player tries to read the punched-out hole. Normally, the OS would simply return zeroes. But HuskHoard is listening.
Using fanotify_init(FAN_CLASS_PRE_CONTENT), HuskHoard intercepts the read request at the kernel level before the application gets any data. We pause the application thread, look up the file in our SQLite catalog, and wake up the hardware.
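For readers who have not used fanotify, the following sketch shows the general shape of a permission-event listener. It is a rough illustration, not HuskHoard's code: the helper names are hypothetical, the event mask FAN_ACCESS_PERM and the whole-mount mark are our assumptions (the post does not say which events HuskHoard registers), and open_listener needs CAP_SYS_ADMIN to succeed. The constants and struct layouts follow <linux/fanotify.h>.

```python
import ctypes
import os
import struct

# Constants from <linux/fanotify.h> and <fcntl.h>
FAN_CLOEXEC = 0x01
FAN_CLASS_PRE_CONTENT = 0x08
FAN_ACCESS_PERM = 0x00020000  # assumed mask: ask permission before reads
FAN_MARK_ADD = 0x01
FAN_MARK_MOUNT = 0x10
FAN_ALLOW = 0x01
AT_FDCWD = -100

# struct fanotify_event_metadata:
#   u32 event_len, u8 vers, u8 reserved, u16 metadata_len,
#   u64 mask, s32 fd, s32 pid
EVENT_FORMAT = "@IBBHQii"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


def open_listener(mount_path: str) -> int:
    """Create a pre-content fanotify group watching one mount (needs root)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fanotify_init.argtypes = [ctypes.c_uint, ctypes.c_uint]
    libc.fanotify_mark.argtypes = [ctypes.c_int, ctypes.c_uint,
                                   ctypes.c_uint64, ctypes.c_int,
                                   ctypes.c_char_p]
    fan_fd = libc.fanotify_init(FAN_CLASS_PRE_CONTENT | FAN_CLOEXEC,
                                os.O_RDONLY)
    if fan_fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if libc.fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                          FAN_ACCESS_PERM, AT_FDCWD,
                          os.fsencode(mount_path)) != 0:
        err = ctypes.get_errno()
        os.close(fan_fd)
        raise OSError(err, os.strerror(err), mount_path)
    return fan_fd


def parse_event(buf: bytes) -> dict:
    """Decode the first event in a buffer read from the fanotify fd."""
    event_len, vers, _reserved, metadata_len, mask, fd, pid = \
        struct.unpack_from(EVENT_FORMAT, buf)
    return {"event_len": event_len, "vers": vers, "mask": mask,
            "fd": fd, "pid": pid}


def allow(fan_fd: int, event_fd: int) -> None:
    """Unblock the waiting application, then release the event's fd."""
    # struct fanotify_response: s32 fd, u32 response
    os.write(fan_fd, struct.pack("@iI", event_fd, FAN_ALLOW))
    os.close(event_fd)
```

The application that triggered the event stays blocked in the kernel until allow (or a deny response) is written, which is the window in which a recall from tape can happen.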
But we don't just restore the whole file. If a user scrubs to the middle of a 200 GB video file, restoring the whole file would take 15 minutes. Instead, we built StreamGate.
When HuskHoard archives a file, it compresses it using Zstandard (Zstd) in 16 MB frames. It records the uncompressed offset, compressed offset, and size of every single frame into a Jump Table in the database.
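The write side of this idea fits in a short sketch. To stay within the Python standard library it uses zlib as a stand-in for Zstd, so it shows the framing technique rather than the real codec. The names FrameEntry, archive_in_frames, and read_frame are hypothetical, and we assume that "16 MB" means 16 MiB of uncompressed input per frame and that the recorded size is the compressed length.

```python
import zlib
from dataclasses import dataclass

FRAME_SIZE = 16 * 1024 * 1024  # assumed: 16 MiB of uncompressed data per frame


@dataclass(frozen=True)
class FrameEntry:
    """One Jump Table row (field names are illustrative)."""
    uncompressed_offset: int  # where the frame starts in the original file
    compressed_offset: int    # where the frame starts in the archive stream
    compressed_size: int      # length of the compressed frame in bytes


def archive_in_frames(data: bytes, frame_size: int = FRAME_SIZE):
    """Compress `data` frame by frame; return (archive_bytes, jump_table)."""
    archive = bytearray()
    jump_table: list[FrameEntry] = []
    for start in range(0, len(data), frame_size):
        # Each frame is compressed independently so it can be decoded alone.
        frame = zlib.compress(data[start:start + frame_size])  # Zstd stand-in
        jump_table.append(FrameEntry(start, len(archive), len(frame)))
        archive += frame
    return bytes(archive), jump_table


def read_frame(archive: bytes, entry: FrameEntry) -> bytes:
    """Decompress exactly one frame without touching the rest of the archive."""
    end = entry.compressed_offset + entry.compressed_size
    return zlib.decompress(archive[entry.compressed_offset:end])
```

Independent frames cost a little compression ratio compared with one long stream, but they are what makes random access possible: any frame can be decoded knowing only its row in the table.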
When VLC asks for byte offset 50,000,000,000, StreamGate queries the Jump Table, finds exactly which 16 MB frame on the physical tape holds that byte, seeks the tape drive directly to that physical block, decompresses just that frame in RAM, and pipes it directly into the application's file descriptor.
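The lookup step can be sketched against SQLite directly. The schema below is hypothetical (the post does not show HuskHoard's real catalog layout), as are the function names and the made-up compressed frame size used to fill the demo table. The query finds the last frame that starts at or before the requested byte.

```python
import sqlite3

FRAME_SIZE = 16 * 1024 * 1024  # assumed uncompressed bytes per frame


def build_demo_catalog(frames: int, compressed_size: int) -> sqlite3.Connection:
    """Create an in-memory Jump Table with evenly sized demo frames."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jump_table ("
        " file_id INTEGER NOT NULL,"
        " uncompressed_offset INTEGER NOT NULL,"
        " compressed_offset INTEGER NOT NULL,"
        " compressed_size INTEGER NOT NULL,"
        " PRIMARY KEY (file_id, uncompressed_offset))"
    )
    db.executemany(
        "INSERT INTO jump_table VALUES (1, ?, ?, ?)",
        ((i * FRAME_SIZE, i * compressed_size, compressed_size)
         for i in range(frames)),
    )
    return db


def locate_frame(db: sqlite3.Connection, file_id: int, byte_offset: int):
    """Return (compressed_offset, compressed_size, offset_inside_frame)."""
    row = db.execute(
        "SELECT uncompressed_offset, compressed_offset, compressed_size"
        " FROM jump_table"
        " WHERE file_id = ? AND uncompressed_offset <= ?"
        " ORDER BY uncompressed_offset DESC LIMIT 1",
        (file_id, byte_offset),
    ).fetchone()
    if row is None:
        raise LookupError(f"no frame covers offset {byte_offset}")
    uncompressed_offset, compressed_offset, compressed_size = row
    return compressed_offset, compressed_size, byte_offset - uncompressed_offset
```

The primary key lets SQLite answer this with a single index seek. When every frame covers the same amount of uncompressed data, the frame number is also just byte_offset // FRAME_SIZE, so the cost of finding a frame does not grow with the size of the file.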
Stop Wasting Expensive NVMe
If an application opens a file just to overwrite it (O_TRUNC), HuskHoard detects the flag and instantly bypasses the tape entirely, tearing down the stub and letting the user write fresh data at SSD speeds.
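The decision itself is a one-line flag test. The sketch below uses a hypothetical helper name, needs_recall, and deliberately leaves out how the daemon learns the open flags, since the post does not describe that part.

```python
import os


def needs_recall(open_flags: int) -> bool:
    """Return False when the opener is about to discard the old contents.

    Hypothetical helper: O_TRUNC means the file is cut to zero length on
    open, so fetching the archived bytes from tape would be wasted work.
    """
    return not (open_flags & os.O_TRUNC)
```

For example, needs_recall(os.O_WRONLY | os.O_TRUNC) is False, while a plain os.O_RDONLY open still triggers a recall.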
This is the difference between a backup and an active archive. A backup is a passive insurance policy. HuskHoard is an active, kernel-level storage tier that blends the extreme speed of NVMe with the infinite, penny-per-gigabyte depth of LTO tape. It's time to stop hoarding cold data on hot disks.
* 0 bytes of payload! A true 0 bytes would mean there is no information about the file at all. There are about 256 bytes that make up the sparse data in the stub, and the smallest size the file system can report is 4 KB. We will talk more about this in another post.