# Hash+Store Priority Pattern & Database Connection Fixes

## Summary of Changes

### 1. Database Connection Leak Fixes ✅

**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.

**Files Fixed:**

- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (`Folder.__init__`) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)

**Pattern:**

```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Could be skipped if an exception occurs before the try block is entered

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting the block
```
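
Where a connection class predates `__enter__`/`__exit__` (as `WorkerManager` did before this change), the standard library's `contextlib.closing` gives the same guarantee without modifying the class. A minimal sketch with a stand-in class (`LegacyDB` is hypothetical, not part of the codebase):

```python
from contextlib import closing

class LegacyDB:
    """Stand-in for a connection type that has close() but no __enter__/__exit__."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

# closing() calls db.close() on exit, even if the body raises.
with closing(LegacyDB()) as db:
    pass  # use db here

assert db.closed
```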

### 2. Hash+Store Priority Pattern ✅

**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter, because resolution always goes through hash+store.

**Why This Matters:**

- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content
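
The canonical-identifier contract can be sketched as a small helper. `extract_hash_store` is a hypothetical name, shown only to make the rule concrete; the real cmdlets read the fields inline:

```python
from typing import Optional, Tuple

def extract_hash_store(result) -> Optional[Tuple[str, str]]:
    """Return the canonical (hash, store) pair from a result dict, or None.

    Both fields must be present, string-typed, and non-empty; anything else
    forces a fall back to the more fragile path-based resolution.
    """
    if not isinstance(result, dict):
        return None
    file_hash = result.get("hash")
    store = result.get("store")
    if isinstance(file_hash, str) and file_hash and isinstance(store, str) and store:
        return file_hash, store
    return None
```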

**Updated Resolution Priority in `add_file.py`:**

```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """
    PRIORITY 1: hash+store from result dict (most reliable for @N selections)
    - Checks result.get("hash") and result.get("store")
    - Uses FileStorage[store].get_file(hash) to retrieve
    - Works for: Hydrus, Folder, Remote backends

    PRIORITY 2: Explicit -path argument
    - Direct path specified by user

    PRIORITY 3: pipe_obj.file_path
    - Legacy path from previous pipeline stage

    PRIORITY 4: Hydrus hash from pipe_obj.extra
    - Fallback for older Hydrus workflows

    PRIORITY 5: String/list result parsing
    - Last resort for simple string paths
    """
```
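
The body behind that docstring might look roughly like the following sketch. This is not the real implementation: `storage` is modeled as a plain mapping of backends, and the `"hydrus"` key and attribute names are taken from the docstring, not the actual code:

```python
from pathlib import Path

def resolve_source(result, path_arg, pipe_obj, storage):
    """Sketch of the five-step priority chain described in the docstring."""
    # PRIORITY 1: hash+store from the result dict
    if isinstance(result, dict) and result.get("hash") and result.get("store"):
        return storage[result["store"]].get_file(result["hash"])
    # PRIORITY 2: explicit -path argument
    if path_arg:
        return Path(path_arg)
    # PRIORITY 3: legacy path from the previous pipeline stage
    if pipe_obj is not None and getattr(pipe_obj, "file_path", None):
        return Path(pipe_obj.file_path)
    # PRIORITY 4: Hydrus hash carried in pipe_obj.extra
    if pipe_obj is not None and getattr(pipe_obj, "extra", {}).get("hash"):
        return storage["hydrus"].get_file(pipe_obj.extra["hash"])
    # PRIORITY 5: last resort -- treat a plain string result as a path
    if isinstance(result, str):
        return Path(result)
    return None
```

Each rule returns as soon as it matches, so a result dict carrying hash+store always wins over any path that may also be present.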

**Example Flow:**

```bash
# User searches and selects a result
$ search-store system:limit=5

# Result items include:
# {
#     "hash": "a1b2c3d4...",
#     "store": "home",        # Specific Hydrus instance
#     "title": "example.mp4"
# }

# User selects @2 (index 1)
$ @2 | add-file -storage test

# add-file now:
# 1. Extracts hash="a1b2c3d4..." store="home" from the result dict
# 2. Calls FileStorage["home"].get_file("a1b2c3d4...")
# 3. Retrieves the actual file path from the "home" backend
# 4. Proceeds with copy/upload to "test" storage
```

### 3. Benefits of This Approach

**Consistency:**
- `@N` selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection

**Reliability:**
- Hash uniquely identifies content (a SHA-256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations

**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve a file by hash

**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails

## How @N Selection Works Now

### Selection Process:

1. **Search creates a result list with hash+store:**

   ```python
   results_list = [
       {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
       {"hash": "def456...", "store": "default", "title": "file2.jpg"},
       {"hash": "ghi789...", "store": "test", "title": "file3.png"},
   ]
   ```

2. **User selects @2 (second item, index 1):**
   - CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
   - Passes this dict to the next cmdlet

3. **Next cmdlet receives the dict with hash+store:**

   ```python
   def run(self, result, args, config):
       # result is the dict from selection
       file_hash = result.get("hash")    # "def456..."
       store_name = result.get("store")  # "default"

       # Use hash+store to retrieve the file
       backend = FileStorage(config)[store_name]
       file_path = backend.get_file(file_hash)
   ```

### Why This is Better Than Path-Based:

**Path-Based (OLD):**

```python
# Fragile: path could be a temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if the file was moved? What if it's a temp path that expires?
```

**Hash+Store (NEW):**

```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# The backend retrieves the current location from its database/API
```

## Testing the Fixes

### 1. Test Database Connections:

```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5

# Should complete without "database is locked" errors
```

### 2. Test Hash+Store Selection:

```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata

# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```

### 3. Test WorkerManager Cleanup:

```python
# In a Python script:
from helper.worker_manager import WorkerManager
from pathlib import Path

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting the block
```
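
The `close()`/`__enter__`/`__exit__` support added to `WorkerManager` follows Python's standard context-manager protocol. A minimal sketch with a stand-in class (the real class wraps its database handle rather than a flag):

```python
class ManagedWorker:
    """Stand-in illustrating the context-manager pattern added to WorkerManager."""
    def __init__(self, library_path):
        self.library_path = library_path
        self.closed = False  # stands in for an open database handle

    def close(self):
        self.closed = True  # idempotent: safe to call more than once

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised in the with-body

with ManagedWorker("C:/path/to/library") as wm:
    pass  # do work

assert wm.closed  # close() ran on exit
```

Because `__exit__` runs whether or not the body raises, cleanup no longer depends on every call site remembering a `try`/`finally`.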

## Cmdlets That Already Use Hash+Store Pattern

These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URL via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store

## Future Improvements

1. **Make hash+store mandatory in result dicts:**
   - All search cmdlets should emit hash+store
   - Validate that result dicts include these fields

2. **Add hash+store validation:**
   - Warn if the hash is not a 64-char hex string
   - Warn if the store is not a registered backend

3. **Standardize error messages:**
   - "File not found via hash+store: hash=abc123 store=home"
   - Makes debugging much clearer

4. **Consider deprecating path-based workflows:**
   - Migrate legacy cmdlets to the hash+store pattern
   - Remove path-based fallbacks once all cmdlets are updated
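
The validation in item 2 could be sketched as follows. `validate_hash_store` and its return shape are hypothetical, shown only to make the warnings concrete; lowercase hex is assumed, matching the example hashes above:

```python
import re

_HEX64 = re.compile(r"^[0-9a-f]{64}$")

def validate_hash_store(file_hash, store, registered_stores):
    """Return a list of warning strings; an empty list means the pair looks valid."""
    warnings = []
    if not _HEX64.match(file_hash or ""):
        warnings.append(f"hash is not a 64-char lowercase hex string: {file_hash!r}")
    if store not in registered_stores:
        warnings.append(f"store is not a registered backend: {store!r}")
    return warnings
```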

## Key Takeaway

**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.