Hash+Store Priority Pattern & Database Connection Fixes
Summary of Changes
1. Database Connection Leak Fixes ✅
Problem: FolderDB connections were not being properly closed, causing database locks and resource leaks.
Files Fixed:
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (`Folder.__init__`) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)
Pattern:
```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Could be skipped if an exception occurs early

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting the block
```
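The guarantee the new pattern relies on is Python's context-manager protocol: `__exit__` runs even when the body raises. A toy illustration of that protocol (this is not the real `FolderDB`; the class name and internals here are stand-ins to make the cleanup guarantee concrete):

```python
import sqlite3
from pathlib import Path

class ManagedDB:
    """Toy sketch of a context-manager-aware DB wrapper (illustrative only)."""

    def __init__(self, path: Path):
        self._conn = sqlite3.connect(str(path))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress exceptions

    def close(self):
        # Idempotent: safe to call more than once
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def do_something(self):
        return self._conn.execute("SELECT 1").fetchone()
```

Because `__exit__` always runs, the connection is closed even when `do_something()` raises, which is exactly the case the old try/finally pattern could miss.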
2. Hash+Store Priority Pattern ✅
Philosophy: The hash+store pair is the canonical identifier for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.
Why This Matters:
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content
Updated Resolution Priority in add_file.py:
```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """
    PRIORITY 1: hash+store from result dict (most reliable for @N selections)
        - Checks result.get("hash") and result.get("store")
        - Uses FileStorage[store].get_file(hash) to retrieve
        - Works for: Hydrus, Folder, Remote backends
    PRIORITY 2: Explicit -path argument
        - Direct path specified by the user
    PRIORITY 3: pipe_obj.file_path
        - Legacy path from the previous pipeline stage
    PRIORITY 4: Hydrus hash from pipe_obj.extra
        - Fallback for older Hydrus workflows
    PRIORITY 5: String/list result parsing
        - Last resort for simple string paths
    """
```
Example Flow:
```
# User searches and selects a result
$ search-store system:limit=5

# Result items include:
{
    "hash": "a1b2c3d4...",
    "store": "home",  # Specific Hydrus instance
    "title": "example.mp4"
}

# User selects @2 (index 1)
$ @2 | add-file -storage test
```

`add-file` now:
1. Extracts hash="a1b2c3d4..." and store="home" from the result dict
2. Calls FileStorage["home"].get_file("a1b2c3d4...")
3. Retrieves the actual file path from the "home" backend
4. Proceeds with the copy/upload to "test" storage
3. Benefits of This Approach
Consistency:
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection
Reliability:
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations
Multi-Instance Support:
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve file by hash
Debugging:
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails
How @N Selection Works Now
Selection Process:
1. Search creates a result list with hash+store:

   ```python
   results_list = [
       {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
       {"hash": "def456...", "store": "default", "title": "file2.jpg"},
       {"hash": "ghi789...", "store": "test", "title": "file3.png"},
   ]
   ```

2. User selects @2 (second item, index 1). The CLI extracts:

   ```python
   result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}
   ```

   and passes this dict to the next cmdlet.

3. The next cmdlet receives the dict with hash+store:

   ```python
   def run(self, result, args, config):
       # result is the dict from selection
       file_hash = result.get("hash")    # "def456..."
       store_name = result.get("store")  # "default"

       # Use hash+store to retrieve the file
       backend = FileStorage(config)[store_name]
       file_path = backend.get_file(file_hash)
   ```
Why This is Better Than Path-Based:
Path-Based (OLD):

```python
# Fragile: the path could be a temp file, a symlink, a moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if the file was moved? What if it's a temp path that expires?
```

Hash+Store (NEW):

```python
# Reliable: hash+store always works regardless of the file's current location
result = {"hash": "abc123...", "store": "home"}
# The backend retrieves the current location from its database/API
```
Testing the Fixes
1. Test Database Connections:
```
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5

# Should complete without "database is locked" errors
```
2. Test Hash+Store Selection:
```
# Search and select
search-store system:limit=5
@2 | get-metadata

# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```
3. Test WorkerManager Cleanup:
```python
# In a Python script:
from helper.worker_manager import WorkerManager
from pathlib import Path

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting the block
```
Cmdlets That Already Use Hash+Store Pattern
These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URL via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - NOW UPDATED to prioritize hash+store
Future Improvements
1. Make hash+store mandatory in result dicts:
   - All search cmdlets should emit hash+store
   - Validate that result dicts include these fields
2. Add hash+store validation:
   - Warn if hash is not a 64-char hex string
   - Warn if store is not a registered backend
3. Standardize error messages:
   - e.g. "File not found via hash+store: hash=abc123 store=home"
   - Makes debugging much clearer
4. Consider deprecating path-based workflows:
   - Migrate legacy cmdlets to the hash+store pattern
   - Remove path-based fallbacks once all cmdlets are updated
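The hash+store validation proposed above could be sketched as a small helper. This is a hypothetical function, not yet in the codebase; the name, signature, and warning wording are all assumptions:

```python
import re

# SHA-256 digests are 64 lowercase hex characters
_HEX64 = re.compile(r"^[0-9a-f]{64}$")

def validate_hash_store(result, registered_stores):
    """Hypothetical validator: returns a list of warning strings;
    an empty list means the hash+store pair looks usable."""
    warnings = []
    file_hash = result.get("hash")
    store = result.get("store")
    if not file_hash or not _HEX64.match(str(file_hash).lower()):
        warnings.append(f"hash is not a 64-char hex string: {file_hash!r}")
    if store not in registered_stores:
        warnings.append(f"store is not a registered backend: {store!r}")
    return warnings
```

Emitting warnings rather than raising keeps legacy path-based results flowing through the pipeline while the migration is in progress.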
Key Takeaway
The hash+store pair is now the primary way to identify and retrieve files across the entire system. This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.