# Hash+Store Priority Pattern & Database Connection Fixes

## Summary of Changes

### 1. Database Connection Leak Fixes ✅

**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.

**Files Fixed:**
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (`Folder.__init__`) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)

**Pattern:**
```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Could be skipped if exception occurs early

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting block
```
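The context manager support added to `helper/worker_manager.py` could look roughly like the sketch below. This is a hedged illustration of the pattern, not the real class: the `closed` flag and the `db` stand-in attribute are assumptions for the example.

```python
class WorkerManager:
    """Sketch of context-manager support (illustrative, not the real class)."""

    def __init__(self, library_path):
        self.library_path = library_path
        self.db = object()  # stand-in for the real FolderDB connection
        self.closed = False

    def close(self):
        # Idempotent: safe to call more than once.
        if not self.closed:
            self.db = None
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs even if the with-block raises, so cleanup is guaranteed.
        self.close()
        return False  # do not swallow exceptions
```

The key point is that `__exit__` calls `close()` unconditionally, which is what makes `with WorkerManager(...)` leak-proof in the same way as `with FolderDB(...)`.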
### 2. Hash+Store Priority Pattern ✅

**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.

**Why This Matters:**
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content

**Updated Resolution Priority in `add_file.py`:**

```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """
    PRIORITY 1: hash+store from result dict (most reliable for @N selections)
    - Checks result.get("hash") and result.get("store")
    - Uses FileStorage[store].get_file(hash) to retrieve
    - Works for: Hydrus, Folder, Remote backends

    PRIORITY 2: Explicit -path argument
    - Direct path specified by user

    PRIORITY 3: pipe_obj.file_path
    - Legacy path from previous pipeline stage

    PRIORITY 4: Hydrus hash from pipe_obj.extra
    - Fallback for older Hydrus workflows

    PRIORITY 5: String/list result parsing
    - Last resort for simple string paths
    """
```
**Example Flow:**
```bash
# User searches and selects result
$ search-store system:limit=5

# Result items include:
# {
#     "hash": "a1b2c3d4...",
#     "store": "home",      # Specific Hydrus instance
#     "title": "example.mp4"
# }

# User selects @2 (index 1)
$ @2 | add-file -storage test

# add-file now:
# 1. Extracts hash="a1b2c3d4..." store="home" from result dict
# 2. Calls FileStorage["home"].get_file("a1b2c3d4...")
# 3. Retrieves actual file path from "home" backend
# 4. Proceeds with copy/upload to "test" storage
```
### 3. Benefits of This Approach

**Consistency:**
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection

**Reliability:**
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations

**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve a file by hash

**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails
## How @N Selection Works Now

### Selection Process:

1. **Search creates result list with hash+store:**

   ```python
   results_list = [
       {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
       {"hash": "def456...", "store": "default", "title": "file2.jpg"},
       {"hash": "ghi789...", "store": "test", "title": "file3.png"},
   ]
   ```
2. **User selects @2 (second item, index 1):**
   - CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
   - Passes this dict to the next cmdlet
3. **Next cmdlet receives dict with hash+store:**

   ```python
   def run(self, result, args, config):
       # result is the dict from selection
       file_hash = result.get("hash")    # "def456..."
       store_name = result.get("store")  # "default"

       # Use hash+store to retrieve file
       backend = FileStorage(config)[store_name]
       file_path = backend.get_file(file_hash)
   ```
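The 1-based `@N` mapping in step 2 can be sketched as follows; `select_result` is an illustrative helper, not the actual CLI code:

```python
def select_result(results_list, n):
    """Map a 1-based @N selection onto the 0-indexed result list."""
    if not 1 <= n <= len(results_list):
        raise IndexError(f"@{n} is out of range for {len(results_list)} results")
    return results_list[n - 1]
```

Because the selected dict carries hash+store, whatever the display order was when the user typed `@2` no longer matters downstream.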
### Why This is Better Than Path-Based:

**Path-Based (OLD):**
```python
# Fragile: path could be temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if file was moved? What if it's a temp path that expires?
```

**Hash+Store (NEW):**
```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# Backend retrieves current location from its database/API
```
## Testing the Fixes

### 1. Test Database Connections:

```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5

# Should complete without "database is locked" errors
```
### 2. Test Hash+Store Selection:

```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata

# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```
### 3. Test WorkerManager Cleanup:

```python
# In a Python script:
from pathlib import Path

from helper.worker_manager import WorkerManager

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting block
```
## Cmdlets That Already Use Hash+Store Pattern

These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URLs via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store
## Future Improvements

1. **Make hash+store mandatory in result dicts:**
   - All search cmdlets should emit hash+store
   - Validate that result dicts include these fields

2. **Add hash+store validation:**
   - Warn if the hash is not a 64-char hex string
   - Warn if the store is not a registered backend

3. **Standardize error messages:**
   - "File not found via hash+store: hash=abc123 store=home"
   - Makes debugging much clearer

4. **Consider deprecating path-based workflows:**
   - Migrate legacy cmdlets to the hash+store pattern
   - Remove path-based fallbacks once all cmdlets are updated
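The validation proposed in item 2 could be sketched like this; `validate_result` and its warning strings are assumptions for illustration, not existing code:

```python
import re

# SHA256 digests are 64 lowercase hex characters.
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")

def validate_result(result, registered_stores):
    """Return a list of warning strings for a result dict (hypothetical helper)."""
    warnings = []
    h = result.get("hash", "")
    if not SHA256_RE.fullmatch(h):
        warnings.append(f"hash is not a 64-char hex string: {h!r}")
    store = result.get("store", "")
    if store not in registered_stores:
        warnings.append(f"store is not a registered backend: {store!r}")
    return warnings
```

Search cmdlets could run this on each emitted result and log the warnings, catching malformed dicts before they reach `@N` selection.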
## Key Takeaway

**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.