# Hash+Store Priority Pattern & Database Connection Fixes

## Summary of Changes

### 1. Database Connection Leak Fixes ✅

**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.

**Files Fixed:**
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (`Folder.__init__`) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)

**Pattern:**
```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Could be skipped if exception occurs early

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting block
```
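The context manager support added to `helper/worker_manager.py` could look roughly like the sketch below. This is a hedged illustration of the pattern, not the real class: the `closed` flag and the `db` stand-in attribute are assumptions for the example.

```python
class WorkerManager:
    """Sketch of context-manager support (illustrative, not the real class)."""

    def __init__(self, library_path):
        self.library_path = library_path
        self.db = object()  # stand-in for the real FolderDB connection
        self.closed = False

    def close(self):
        # Idempotent: safe to call more than once.
        if not self.closed:
            self.db = None
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the with-block raises, so cleanup is guaranteed.
        self.close()
        return False  # do not swallow exceptions
```

The key point is that `__exit__` calls `close()` unconditionally, which is what makes `with WorkerManager(...)` leak-proof in the same way as `with FolderDB(...)`.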
### 2. Hash+Store Priority Pattern ✅

**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.

**Why This Matters:**
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content

**Updated Resolution Priority in `add_file.py`:**

```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """
    PRIORITY 1: hash+store from result dict (most reliable for @N selections)
    - Checks result.get("hash") and result.get("store")
    - Uses FileStorage[store].get_file(hash) to retrieve
    - Works for: Hydrus, Folder, Remote backends

    PRIORITY 2: Explicit -path argument
    - Direct path specified by user

    PRIORITY 3: pipe_obj.file_path
    - Legacy path from previous pipeline stage

    PRIORITY 4: Hydrus hash from pipe_obj.extra
    - Fallback for older Hydrus workflows

    PRIORITY 5: String/list result parsing
    - Last resort for simple string paths
    """
```
**Example Flow:**
```bash
# User searches and selects result
$ search-store system:limit=5

# Result items include:
# {
#     "hash": "a1b2c3d4...",
#     "store": "home",      # Specific Hydrus instance
#     "title": "example.mp4"
# }

# User selects @2 (index 1)
$ @2 | add-file -storage test

# add-file now:
# 1. Extracts hash="a1b2c3d4..." store="home" from result dict
# 2. Calls FileStorage["home"].get_file("a1b2c3d4...")
# 3. Retrieves actual file path from "home" backend
# 4. Proceeds with copy/upload to "test" storage
```
### 3. Benefits of This Approach

**Consistency:**
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection

**Reliability:**
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations

**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve a file by hash

**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails
## How @N Selection Works Now

### Selection Process:

1. **Search creates result list with hash+store:**

   ```python
   results_list = [
       {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
       {"hash": "def456...", "store": "default", "title": "file2.jpg"},
       {"hash": "ghi789...", "store": "test", "title": "file3.png"},
   ]
   ```
2. **User selects @2 (second item, index 1):**
   - CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
   - Passes this dict to the next cmdlet
3. **Next cmdlet receives dict with hash+store:**

   ```python
   def run(self, result, args, config):
       # result is the dict from selection
       file_hash = result.get("hash")    # "def456..."
       store_name = result.get("store")  # "default"

       # Use hash+store to retrieve file
       backend = FileStorage(config)[store_name]
       file_path = backend.get_file(file_hash)
   ```
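The 1-based `@N` mapping in step 2 can be sketched as follows; `select_result` is an illustrative helper, not the actual CLI code:

```python
def select_result(results_list, n):
    """Map a 1-based @N selection onto the 0-indexed result list."""
    if not 1 <= n <= len(results_list):
        raise IndexError(f"@{n} is out of range for {len(results_list)} results")
    return results_list[n - 1]
```

Because the selected dict carries hash+store, whatever the display order was when the user typed `@2` no longer matters downstream.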
### Why This is Better Than Path-Based:

**Path-Based (OLD):**
```python
# Fragile: path could be temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if file was moved? What if it's a temp path that expires?
```

**Hash+Store (NEW):**
```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# Backend retrieves current location from its database/API
```
## Testing the Fixes

### 1. Test Database Connections:

```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5

# Should complete without "database is locked" errors
```
### 2. Test Hash+Store Selection:

```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata

# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```
### 3. Test WorkerManager Cleanup:

```python
# In a Python script:
from pathlib import Path

from helper.worker_manager import WorkerManager

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting block
```
## Cmdlets That Already Use Hash+Store Pattern

These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URLs via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store
## Future Improvements

1. **Make hash+store mandatory in result dicts:**
   - All search cmdlets should emit hash+store
   - Validate that result dicts include these fields

2. **Add hash+store validation:**
   - Warn if the hash is not a 64-char hex string
   - Warn if the store is not a registered backend

3. **Standardize error messages:**
   - "File not found via hash+store: hash=abc123 store=home"
   - Makes debugging much clearer

4. **Consider deprecating path-based workflows:**
   - Migrate legacy cmdlets to the hash+store pattern
   - Remove path-based fallbacks once all cmdlets are updated
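The validation proposed in item 2 could be sketched like this; `validate_result` and its warning strings are assumptions for illustration, not existing code:

```python
import re

# SHA256 digests are 64 lowercase hex characters.
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")

def validate_result(result, registered_stores):
    """Return a list of warning strings for a result dict (hypothetical helper)."""
    warnings = []
    h = result.get("hash", "")
    if not SHA256_RE.fullmatch(h):
        warnings.append(f"hash is not a 64-char hex string: {h!r}")
    store = result.get("store", "")
    if store not in registered_stores:
        warnings.append(f"store is not a registered backend: {store!r}")
    return warnings
```

Search cmdlets could run this on each emitted result and log the warnings, catching malformed dicts before they reach `@N` selection.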
## Key Takeaway

**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.