# Hash+Store Priority Pattern & Database Connection Fixes

## Summary of Changes

### 1. Database Connection Leak Fixes ✅

**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.

**Files Fixed:**
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (`Folder.__init__`) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)

**Pattern:**

```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Could be skipped if exception occurs early

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting block
```

### 2. Hash+Store Priority Pattern ✅

**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.

**Why This Matters:**
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
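The `__enter__`/`__exit__` support added to `FolderDB` and `WorkerManager` boils down to the standard Python context-manager protocol. Here is a minimal sketch of that pattern; the real `FolderDB` internals are not shown in this document, so the stand-in below wraps `sqlite3` purely to make the guaranteed-cleanup behavior concrete:

```python
import sqlite3


class FolderDBSketch:
    """Illustrative stand-in for FolderDB's new context-manager support."""

    def __init__(self, path):
        self._conn = sqlite3.connect(str(path))

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit AND when the with-body raises,
        # so the connection can never leak.
        self.close()
        return False  # do not swallow exceptions

    def do_something(self):
        # Placeholder query so the sketch is runnable.
        return self._conn.execute("SELECT 1").fetchone()[0]


# The connection is closed even if do_something() raises:
with FolderDBSketch(":memory:") as db:
    db.do_something()
```

Returning `False` from `__exit__` lets exceptions propagate after cleanup, which is what you want here: the caller still sees the error, but the connection is already closed.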
- Hash+store never changes and uniquely identifies content

**Updated Resolution Priority in `add_file.py`:**

```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """
    PRIORITY 1: hash+store from result dict (most reliable for @N selections)
        - Checks result.get("hash") and result.get("store")
        - Uses FileStorage[store].get_file(hash) to retrieve
        - Works for: Hydrus, Folder, Remote backends
    PRIORITY 2: Explicit -path argument
        - Direct path specified by user
    PRIORITY 3: pipe_obj.file_path
        - Legacy path from previous pipeline stage
    PRIORITY 4: Hydrus hash from pipe_obj.extra
        - Fallback for older Hydrus workflows
    PRIORITY 5: String/list result parsing
        - Last resort for simple string paths
    """
```

**Example Flow:**

```bash
# User searches and selects result
$ search-store system:limit=5

# Result items include:
# {
#   "hash": "a1b2c3d4...",
#   "store": "home",        # Specific Hydrus instance
#   "title": "example.mp4"
# }

# User selects @2 (index 1)
$ @2 | add-file -storage test

# add-file now:
# 1. Extracts hash="a1b2c3d4..." store="home" from result dict
# 2. Calls FileStorage["home"].get_file("a1b2c3d4...")
# 3. Retrieves actual file path from "home" backend
# 4. Proceeds with copy/upload to "test" storage
```
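The priority chain in that docstring can be sketched as a series of straight-line fallbacks. Everything below is illustrative, not the real implementation: `storages` stands in for the `FileStorage` registry, and the `extra["hash"]` field name used for PRIORITY 4 is a guess rather than the actual attribute in `add_file.py`:

```python
def resolve_source(result, path_arg, pipe_obj, storages):
    """Illustrative version of add_file.py's documented priority order."""
    # PRIORITY 1: hash+store from the result dict (@N selections)
    if isinstance(result, dict) and result.get("hash") and result.get("store"):
        return storages[result["store"]].get_file(result["hash"])

    # PRIORITY 2: explicit -path argument
    if path_arg:
        return path_arg

    # PRIORITY 3: legacy path from the previous pipeline stage
    if pipe_obj is not None and getattr(pipe_obj, "file_path", None):
        return pipe_obj.file_path

    # PRIORITY 4: Hydrus hash carried in pipe_obj.extra
    # (the "hash" key and "hydrus" store name here are assumptions)
    extra = getattr(pipe_obj, "extra", None) or {}
    if extra.get("hash") and "hydrus" in storages:
        return storages["hydrus"].get_file(extra["hash"])

    # PRIORITY 5: plain string result as a last resort
    if isinstance(result, str):
        return result
    return None
```

Because PRIORITY 1 returns early, a result dict carrying hash+store always wins even when a `-path` argument or legacy pipeline path is also present.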
### 3. Benefits of This Approach

**Consistency:**
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection

**Reliability:**
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations

**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve a file by hash

**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails

## How @N Selection Works Now

### Selection Process:

1. **Search creates a result list with hash+store:**

   ```python
   results_list = [
       {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
       {"hash": "def456...", "store": "default", "title": "file2.jpg"},
       {"hash": "ghi789...", "store": "test", "title": "file3.png"},
   ]
   ```

2. **User selects @2 (second item, index 1):**
   - CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
   - Passes this dict to the next cmdlet

3. **Next cmdlet receives the dict with hash+store:**

   ```python
   def run(self, result, args, config):
       # result is the dict from selection
       file_hash = result.get("hash")    # "def456..."
       store_name = result.get("store")  # "default"

       # Use hash+store to retrieve the file
       backend = FileStorage(config)[store_name]
       file_path = backend.get_file(file_hash)
   ```

### Why This Is Better Than Path-Based:

**Path-Based (OLD):**

```python
# Fragile: path could be a temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if the file was moved? What if it's a temp path that expires?
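The selection process above can be condensed into a few lines. The stub backend and registry here are placeholders for the real `FileStorage` machinery, included only so the flow is runnable end to end:

```python
class StubBackend:
    """Placeholder for a real backend (Hydrus / Folder / Remote)."""

    def __init__(self, files):
        self._files = files  # hash -> current path, per this backend's DB

    def get_file(self, file_hash):
        return self._files[file_hash]


storages = {"default": StubBackend({"def456": "/library/file2.jpg"})}

results_list = [
    {"hash": "abc123", "store": "home", "title": "file1.mp4"},
    {"hash": "def456", "store": "default", "title": "file2.jpg"},
]

# @2 in the CLI is 1-based, so it maps to list index 1:
selected = results_list[2 - 1]

# The downstream cmdlet needs only hash+store, never a path:
path = storages[selected["store"]].get_file(selected["hash"])
```

Note that the display order of `results_list` is irrelevant to retrieval: once the dict is selected, its hash+store alone determines which backend is queried and which file comes back.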
```

**Hash+Store (NEW):**

```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# Backend retrieves the current location from its database/API
```

## Testing the Fixes

### 1. Test Database Connections:

```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5
# Should complete without "database is locked" errors
```

### 2. Test Hash+Store Selection:

```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata
# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```

### 3. Test WorkerManager Cleanup:

```python
# In a Python script:
from helper.worker_manager import WorkerManager
from pathlib import Path

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting block
```

## Cmdlets That Already Use Hash+Store Pattern

These cmdlets already correctly extract hash+store:

- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URL via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store

## Future Improvements

1. **Make hash+store mandatory in result dicts:**
   - All search cmdlets should emit hash+store
   - Validate that result dicts include these fields

2. **Add hash+store validation:**
   - Warn if hash is not a 64-char hex string
   - Warn if store is not a registered backend

3. **Standardize error messages:**
   - "File not found via hash+store: hash=abc123 store=home"
   - Makes debugging much clearer
4. **Consider deprecating path-based workflows:**
   - Migrate legacy cmdlets to the hash+store pattern
   - Remove path-based fallbacks once all cmdlets are updated

## Key Takeaway

**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.
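The hash+store validation proposed under Future Improvements is small enough to sketch now. This is only one possible shape for it; `registered_stores` is a hypothetical set of configured backend names, not an existing object in the codebase:

```python
import re

# SHA256 digests are exactly 64 hex characters.
_HASH_RE = re.compile(r"^[0-9a-fA-F]{64}$")


def validate_hash_store(file_hash, store, registered_stores):
    """Return warning strings for a hash+store pair (proposed check)."""
    warnings = []
    if not (isinstance(file_hash, str) and _HASH_RE.match(file_hash)):
        warnings.append(f"hash is not a 64-char hex string: {file_hash!r}")
    if store not in registered_stores:
        warnings.append(f"store is not a registered backend: {store!r}")
    return warnings


# A well-formed pair produces no warnings:
ok = validate_hash_store("a" * 64, "home", {"home", "work"})
```

Running this check at the point where search cmdlets emit result dicts would catch malformed hash+store pairs before they reach `add-file`, which keeps the error close to its source.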