Medios-Macina/HASH_STORE_PRIORITY_PATTERN.md
2025-12-11 12:47:30 -08:00


# Hash+Store Priority Pattern & Database Connection Fixes
## Summary of Changes
### 1. Database Connection Leak Fixes ✅
**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.
**Files Fixed:**
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (Folder.__init__) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`)
**Pattern:**
```python
# OLD (leaked connections):
db = FolderDB(path)
try:
    db.do_something()
finally:
    if db:
        db.close()  # Skipped if an exception occurs before the try block runs

# NEW (guaranteed cleanup):
with FolderDB(path) as db:
    db.do_something()
# Connection automatically closed when exiting the block
```
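The context-manager support added to `helper/worker_manager.py` can follow the same shape. A minimal sketch of what `close()`/`__enter__`/`__exit__` might look like (the real class wraps an actual database handle, which is stubbed out here):

```python
class WorkerManager:
    """Sketch of context-manager support; the real class holds a DB connection."""

    def __init__(self, library_path):
        self.library_path = library_path
        self._db = object()  # placeholder for the real database handle

    def close(self):
        """Release the underlying database connection."""
        self._db = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close, even when the block raised an exception.
        self.close()
        return False  # do not suppress exceptions
```

Because `__exit__` runs on both normal and exceptional exits, `with WorkerManager(...) as wm:` gives the same guarantee as the `FolderDB` pattern above.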
### 2. Hash+Store Priority Pattern ✅
**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.
**Why This Matters:**
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content
**Updated Resolution Priority in `add_file.py`:**
```python
def _resolve_source(result, path_arg, pipe_obj, config):
    """Resolve the source file for add-file.

    PRIORITY 1: hash+store from the result dict (most reliable for @N selections)
        - Checks result.get("hash") and result.get("store")
        - Uses FileStorage[store].get_file(hash) to retrieve the file
        - Works for the Hydrus, Folder, and Remote backends
    PRIORITY 2: Explicit -path argument
        - Direct path specified by the user
    PRIORITY 3: pipe_obj.file_path
        - Legacy path from the previous pipeline stage
    PRIORITY 4: Hydrus hash from pipe_obj.extra
        - Fallback for older Hydrus workflows
    PRIORITY 5: String/list result parsing
        - Last resort for simple string paths
    """
```
**Example Flow:**
```bash
# User searches and selects a result
$ search-store system:limit=5
# Result items include:
#   {
#     "hash": "a1b2c3d4...",
#     "store": "home",        # Specific Hydrus instance
#     "title": "example.mp4"
#   }

# User selects @2 (index 1)
$ @2 | add-file -storage test

# add-file now:
#   1. Extracts hash="a1b2c3d4..." store="home" from the result dict
#   2. Calls FileStorage["home"].get_file("a1b2c3d4...")
#   3. Retrieves the actual file path from the "home" backend
#   4. Proceeds with the copy/upload to the "test" storage
```
### 3. Benefits of This Approach
**Consistency:**
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection
**Reliability:**
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations
**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve file by hash
**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails
## How @N Selection Works Now
### Selection Process:
1. **Search creates result list with hash+store:**
```python
results_list = [
    {"hash": "abc123...", "store": "home", "title": "file1.mp4"},
    {"hash": "def456...", "store": "default", "title": "file2.jpg"},
    {"hash": "ghi789...", "store": "test", "title": "file3.png"},
]
```
2. **User selects @2 (second item, index 1):**
- CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
- Passes this dict to the next cmdlet
3. **Next cmdlet receives dict with hash+store:**
```python
def run(self, result, args, config):
    # result is the dict from the selection
    file_hash = result.get("hash")    # "def456..."
    store_name = result.get("store")  # "default"

    # Use hash+store to retrieve the file
    backend = FileStorage(config)[store_name]
    file_path = backend.get_file(file_hash)
```
### Why This is Better Than Path-Based:
**Path-Based (OLD):**
```python
# Fragile: path could be temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if file was moved? What if it's a temp path that expires?
```
**Hash+Store (NEW):**
```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# Backend retrieves current location from its database/API
```
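Taken together, the selection steps above reduce to a 1-based index lookup plus a hash+store handoff. A minimal runnable sketch (the `select` helper name is hypothetical, not the actual CLI internals):

```python
def select(results_list, n):
    """@N selection: a 1-based index into the current result list (sketch)."""
    if not 1 <= n <= len(results_list):
        raise IndexError(f"@{n} is out of range for {len(results_list)} results")
    return results_list[n - 1]  # @2 -> index 1

results = [
    {"hash": "abc123", "store": "home", "title": "file1.mp4"},
    {"hash": "def456", "store": "default", "title": "file2.jpg"},
]
selected = select(results, 2)
# Downstream cmdlets only need the canonical identifier pair:
file_hash, store_name = selected["hash"], selected["store"]
```

Note that the lookup never touches a file path: whatever the backend's current storage layout, `@2` always resolves to the same hash+store pair.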
## Testing the Fixes
### 1. Test Database Connections:
```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5
# Should complete without "database is locked" errors
```
### 2. Test Hash+Store Selection:
```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata
# Should show metadata for the selected file, resolved via hash+store
# For add-file pipelines, the debug log shows: [add-file] Using hash+store from result: hash=...
```
### 3. Test WorkerManager Cleanup:
```python
# In a Python script:
from pathlib import Path

from helper.worker_manager import WorkerManager

with WorkerManager(Path("C:/path/to/library")) as wm:
    # Do work
    pass
# Database automatically closed when exiting the block
```
## Cmdlets That Already Use Hash+Store Pattern
These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URL via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store
## Future Improvements
1. **Make hash+store mandatory in result dicts:**
- All search cmdlets should emit hash+store
- Validate that result dicts include these fields
2. **Add hash+store validation:**
- Warn if hash is not 64-char hex string
- Warn if store is not a registered backend
3. **Standardize error messages:**
- "File not found via hash+store: hash=abc123 store=home"
- Makes debugging much clearer
4. **Consider deprecating path-based workflows:**
- Migrate legacy cmdlets to hash+store pattern
- Remove path-based fallbacks once all cmdlets updated
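Item 2 above could be a small helper along these lines. This is a sketch only: the helper name, the warn-don't-fail policy, and the shape of the registered-store collection are all assumptions:

```python
import re
import warnings

# SHA256 hex digest: exactly 64 lowercase hex characters
_HASH_RE = re.compile(r"[0-9a-f]{64}")

def validate_hash_store(file_hash, store, registered_stores):
    """Warn (rather than fail) on a malformed hash+store pair; return validity."""
    ok = True
    if not (isinstance(file_hash, str) and _HASH_RE.fullmatch(file_hash.lower())):
        warnings.warn(f"hash is not a 64-char hex string: {file_hash!r}")
        ok = False
    if store not in registered_stores:
        warnings.warn(f"store is not a registered backend: {store!r}")
        ok = False
    return ok
```

Calling this at the top of each cmdlet's hash+store branch would surface malformed result dicts early, without breaking legacy path-based fallbacks.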
## Key Takeaway
**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.