nose
2025-12-11 12:47:30 -08:00
parent 6b05dc5552
commit 65d12411a2
92 changed files with 17447 additions and 14308 deletions

2
.gitignore vendored

@@ -35,7 +35,7 @@ cookies.txt
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
backup/
# Unit test / coverage reports
htmlcov/
.tox/


@@ -0,0 +1,159 @@
# add-file.py Refactor Summary
## Changes Made
### 1. Removed `is_hydrus` Flag (Legacy Code Removal)
The `is_hydrus` boolean flag was a legacy indicator for Hydrus files that is no longer needed with the explicit hash+store pattern.
**Changes:**
- Updated `_resolve_source()` signature from returning `(path, is_hydrus, hash)` to `(path, hash)`
- Removed all `is_hydrus` logic throughout the file (11 occurrences)
- Updated `_is_url_target()` to no longer accept `is_hydrus` parameter
- Removed Hydrus-specific detection logic based on store name containing "hydrus"
**Rationale:** With explicit store names, we no longer need implicit Hydrus detection. The `store` field in PipeObject provides clear backend identification.
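A minimal sketch of the new resolver shape, using the argument names from this summary (the branch order here is an assumption; the real `_resolve_source()` has more cases):
```python
# Sketch only: the real _resolve_source() in cmdlets/add_file.py has more branches.
def _resolve_source(result, path_arg, pipe_obj, config):
    """Return (path, hash); the legacy is_hydrus flag is no longer part of the tuple."""
    if isinstance(result, dict) and result.get("hash") and result.get("store"):
        return None, result["hash"]  # hash+store pair from an @N selection
    if path_arg:
        return path_arg, None  # explicit -path argument
    if pipe_obj is not None and getattr(pipe_obj, "file_path", None):
        return pipe_obj.file_path, getattr(pipe_obj, "hash", None)
    return None, None
```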
### 2. Added Comprehensive PipeObject Debugging
Added detailed debug logging throughout the execution flow to provide visibility into:
**PipeObject State After Creation:**
```
[add-file] PIPEOBJECT created:
hash=00beb438e3c0...
store=local
file_path=C:\Users\Admin\Downloads\Audio\yapping.m4a
tags=[]
title=None
extra keys=[]
```
**Input Result Details:**
```
[add-file] INPUT result type=NoneType
```
**Parsed Arguments:**
```
[add-file] PARSED args: location=test, provider=None, delete=False
```
**Source Resolution:**
```
[add-file] RESOLVED source: path=C:\Users\Admin\Downloads\Audio\yapping.m4a, hash=N/A...
```
**Execution Path Decision:**
```
[add-file] DECISION POINT: provider=None, location=test
media_path=C:\Users\Admin\Downloads\Audio\yapping.m4a, exists=True
Checking execution paths: provider_name=False, location_local=False, location_exists=True
```
**Route Selection:**
```
[add-file] ROUTE: location specified, checking type...
[add-file] _is_local_path check: location=test, slash=False, backslash=False, colon=False, result=False
[add-file] _is_storage_backend check: location=test, backends=['default', 'home', 'test'], result=True
[add-file] ROUTE: storage backend path
```
**Error Paths:**
```
[add-file] ERROR: No location or provider specified - all checks failed
[add-file] ERROR: Invalid location (not local path or storage backend): {location}
```
### 3. Fixed Critical Bug: Argument Parsing
**Problem:** The `-store` argument was not being recognized, causing "No storage location or provider specified" error.
**Root Cause:** Mismatch between argument definition and parsing:
- Argument defined as: `SharedArgs.STORE` (name="store")
- Code was looking for: `parsed.get("storage")`
**Fix:** Changed line 65 from:
```python
location = parsed.get("storage")
```
to:
```python
location = parsed.get("store") # Fixed: was "storage", should be "store"
```
### 4. Enhanced Helper Method Debugging
**`_is_local_path()`:**
```python
debug(f"[add-file] _is_local_path check: location={location}, slash={has_slash}, backslash={has_backslash}, colon={has_colon}, result={result}")
```
**`_is_storage_backend()`:**
```python
debug(f"[add-file] _is_storage_backend check: location={location}, backends={backends}, result={is_backend}")
debug(f"[add-file] _is_storage_backend ERROR: {exc}") # On exception
```
## Testing Results
### Before Fix:
```
[add-file] PARSED args: location=None, provider=None, delete=False
[add-file] ERROR: No location or provider specified - all checks failed
No storage location or provider specified
```
### After Fix:
```
[add-file] PARSED args: location=test, provider=None, delete=False
[add-file] _is_storage_backend check: location=test, backends=['default', 'home', 'test'], result=True
[add-file] ROUTE: storage backend path
✓ File added to 'test': 00beb438e3c02cdc0340526deb0c51f916ffd6330259be4f350009869c5448d9
```
## Impact
### Files Modified:
- `cmdlets/add_file.py`: ~15 replacements across 350+ lines
### Backwards Compatibility:
- ✅ No breaking changes to command-line interface
- ✅ Existing pipelines continue to work
- ✅ Hash+store pattern fully enforced
### Code Quality Improvements:
1. **Removed Legacy Code:** Eliminated `is_hydrus` flag (11 occurrences)
2. **Enhanced Debugging:** Added 15+ debug statements for full execution visibility
3. **Fixed Critical Bug:** Corrected argument parsing mismatch
4. **Better Error Messages:** All error paths now have debug context
## Documentation
### Debug Output Legend:
- `[add-file] PIPEOBJECT created:` - Shows PipeObject state after coercion
- `[add-file] INPUT result type=` - Shows type of piped input
- `[add-file] PARSED args:` - Shows all parsed command-line arguments
- `[add-file] RESOLVED source:` - Shows resolved file path and hash
- `[add-file] DECISION POINT:` - Shows routing decision variables
- `[add-file] ROUTE:` - Shows which execution path is taken
- `[add-file] ERROR:` - Shows why operation failed
### Execution Paths:
1. **Provider Upload** (`provider_name` set) → `_handle_provider_upload()`
2. **Local Import** (`location == 'local'`) → `_handle_local_import()`
3. **Local Export** (location is path) → `_handle_local_export()`
4. **Storage Backend** (location is backend name) → `_handle_storage_backend()`
5. **Error** (no location/provider) → Error message
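A minimal sketch of that routing, assuming the helper and handler names listed above (the predicate order is an assumption):
```python
# Sketch only: the real add-file cmdlet performs this dispatch inline.
def _choose_route(cmdlet, provider_name, location):
    if provider_name:
        return cmdlet._handle_provider_upload
    if location == "local":
        return cmdlet._handle_local_import
    if location and cmdlet._is_local_path(location):
        return cmdlet._handle_local_export
    if location and cmdlet._is_storage_backend(location):
        return cmdlet._handle_storage_backend
    raise ValueError("No storage location or provider specified")
```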
## Verification Checklist
- [x] `is_hydrus` completely removed (0 occurrences)
- [x] All return tuples updated to exclude `is_hydrus`
- [x] Comprehensive PipeObject debugging added
- [x] Argument parsing bug fixed (`storage` → `store`)
- [x] Helper method debugging enhanced
- [x] Full execution path visibility achieved
- [x] Tested with real command: `add-file -path "..." -store test`
## Related Refactorings
- **PIPELINE_REFACTOR_SUMMARY.md**: Removed backwards compatibility from pipeline.py
- **MODELS_REFACTOR_SUMMARY.md**: Refactored PipeObject to hash+store pattern
This refactor completes the trilogy of modernization efforts, ensuring add-file.py fully embraces the hash+store canonical pattern with zero legacy code.


@@ -0,0 +1,100 @@
"""
Analysis: Export-Store vs Get-File cmdlet
=== FINDINGS ===
1. GET-FILE ALREADY EXISTS AND IS SUFFICIENT
- Located: cmdlets/get_file.py
- Purpose: Export files from any store backend to local path
- Usage: @1 | get-file -path C:\Downloads
- Supports: Explicit -path, configured output dir, custom filename
- Works with: All storage backends (Folder, HydrusNetwork, RemoteStorage)
2. ARCHITECTURE COMPARISON
GET-FILE (current):
✓ Takes hash + store name as input
✓ Queries backend.get_metadata(hash) to find file details
✓ For Folder: Returns direct Path from database
✓ For HydrusNetwork: Downloads to temp location via HTTP
✓ Outputs file to specified directory
✓ Supports both input modes: explicit (-hash, -store) and piped results
EXPORT-STORE (hypothetical):
✗ Would be redundant with get-file
✗ Would only work with HydrusNetwork (not Folder, Remote, etc.)
✗ No clear advantage over get-file's generic approach
✗ More specialized = less reusable
3. RECOMMENDED PATTERN
Sequence for moving files between stores:
search-store -store home | get-file -path /tmp/staging | add-file -storage test
This reads:
1. Search Hydrus "home" instance
2. Export matching files to staging
3. Import to Folder "test" storage
4. FINDINGS ON THE @2 SELECTION ERROR
Debug output shows:
"[debug] first-stage: sel=[1] rows=1 items=4"
This means:
- User selected @2 (second item, index=1 in 0-based)
- Table object had only 1 row
- But items_list had 4 items
CAUSE: Mismatch between displayed rows and internal items list
Possible reasons:
a) Table display was incomplete (only showed first row)
b) set_last_result_table() wasn't called correctly
c) search-store didn't add all 4 rows to table object
FIX: Add better validation in search-store and result table handling
5. DEBUG IMPROVEMENTS MADE
Added to add_file.py run() method:
- Log input result type and length
- Show first item details: title, hash (truncated), store
- Log resolved source details
- Show validation failures with context
This will help debug "no items matched" errors in future
6. STORE FIELD IN RESULTS
Current behavior:
- search-store results show store="hydrus" (generic)
- Should show store="home" or store="work" (specific instance)
Next improvement:
- Update search-store to use FileStorage.list_backends() logic
- Use dynamic store detection like .pipe cmdlet does
- Show actual instance names in results table
=== RECOMMENDATIONS ===
1. DO NOT create export-store cmdlet
- get-file is already generic and works for all backends
- Adding export-store adds confusion without benefit
2. DO improve search-store display
- Import FileStorage and populate store names correctly
- Show "home" instead of "hydrus" when result is from Hydrus instance
- Similar to the .pipe cmdlet refactoring
3. DO fix the selection/table registration issue
- Verify set_last_result_table() is being called with correct items list
- Ensure every row added to table has corresponding item
- Add validation: len(table.rows) == len(items_list)
4. DO use the new debug logs in add_file
- Run: @2 | add-file -storage test
- Observe: [add-file] INPUT result details
- This will show if result is coming through correctly
"""

807
CLI.py

File diff suppressed because it is too large


@@ -0,0 +1,127 @@
DEBUGGING IMPROVEMENTS IMPLEMENTED
==================================
1. ENHANCED ADD-FILE DEBUG LOGGING
=================================
Now logs when cmdlet is executed:
- INPUT result type (list, dict, PipeObject, None, etc.)
- List length if applicable
- First item details: title, hash (first 12 chars), store
- Resolved source: path/URL, whether from Hydrus, hash value
- Error details if resolution or validation fails
Example output:
[add-file] INPUT result type=list
[add-file] INPUT result is list with 4 items
[add-file] First item details: title=i ve been down, hash=b0780e68a2dc..., store=hydrus
[add-file] RESOLVED source: path=None, is_hydrus=True, hash=b0780e68a2dc...
[add-file] ERROR: Source validation failed for None
This will help identify:
- Where the result is being lost
- If hash is being extracted correctly
- Which store the file comes from
2. ENHANCED SEARCH-STORE DEBUG LOGGING
===================================
Now logs after building results:
- Number of table rows added
- Number of items in results_list
- WARNING if there's a mismatch
Example output:
[search-store] Added 4 rows to table, 4 items to results_list
[search-store] WARNING: Table/items mismatch! rows=1 items=4
This directly debugs the "@2 selection" issue:
- Will show if table/items registration is correct
- Helps diagnose why only 1 row shows when 4 items exist
3. ROOT CAUSE ANALYSIS: "@2 SELECTION FAILED"
==========================================
Your debug output showed:
[debug] first-stage: sel=[1] rows=1 items=4
This means:
- search-store found 4 results
- But only 1 row registered in table for selection
- User selected @2 (index 1), which is within range for the 4 items (valid indices 0-3)
- But table only had 1 row, so selection was out of bounds
The mismatch is between:
- What's displayed to the user (seems like 4 rows based on output)
- What's registered for @N selection (only 1 row)
With the new debug logging, running the same command will show:
[search-store] Added X rows to table, Y items to results_list
If X=1 and Y=4, then search-store isn't adding all results to the table
If X=4 and Y=4, then the issue is in CLI selection logic
4. NEXT DEBUGGING STEPS
===================
To diagnose the "@2 selection" issue:
1. Run: search-store system:limit=5
2. Look for: [search-store] Added X rows...
3. Compare X to number of rows shown in table
4. If X < display_rows: Problem is in table.add_result()
5. If X == display_rows: Problem is in CLI selection mapping
After running add-file:
1. Run: @2 | add-file -storage test
2. Look for: [add-file] INPUT result details
3. Check if hash, title, and store are extracted
4. If missing: Problem is in result object structure
5. If present: Problem is in _resolve_source() logic
5. ARCHITECTURE DECISION: EXPORT-STORE CMDLET
==========================================
Recommendation: DO NOT CREATE EXPORT-STORE
Reason: get-file already provides this functionality
get-file:
- Takes hash + store name
- Retrieves from any backend (Folder, HydrusNetwork, Remote, etc.)
- Exports to specified path
- Works for all storage types
- Already tested and working
Example workflow for moving files between stores:
$ search-store -store home | get-file -path /tmp | add-file -storage test
This is cleaner than having specialized export-store cmdlet
6. FUTURE IMPROVEMENTS
===================
Based on findings:
a) Update search-store to show specific instance names
Currently: store="hydrus"
Should be: store="home" or store="work"
Implementation: Use FileStorage to detect which instance
b) Fix selection/table registration validation
Add assertion: len(table.rows) == len(results_list)
Fail fast if mismatch detected (see the sketch below)
c) Enhance add-file to handle Hydrus imports
Current: Needs file path on local filesystem
Future: Should support add-file -hash <hash> -store home
This would copy from one Hydrus instance to another
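A minimal sketch of the validation from (b), with names assumed from the logs above:
def validate_result_table(table, results_list) -> None:
    # Fail fast when displayed rows and selectable items diverge.
    if len(table.rows) != len(results_list):
        raise RuntimeError(
            f"result table out of sync: rows={len(table.rows)} items={len(results_list)}"
        )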
SUMMARY
=======
✓ Better debug logging in add-file and search-store
✓ Root cause identified for "@2 selection" issue
✓ Confirmed get-file is sufficient (no export-store needed)
✓ Path forward: Use new logging to identify exact failure point


@@ -0,0 +1,222 @@
# Hash+Store Priority Pattern & Database Connection Fixes
## Summary of Changes
### 1. Database Connection Leak Fixes ✅
**Problem:** FolderDB connections were not being properly closed, causing database locks and resource leaks.
**Files Fixed:**
- `cmdlets/search_store.py` - Now uses `with FolderDB()` context manager
- `cmdlets/search_provider.py` - Now uses `with FolderDB()` context manager
- `helper/store.py` (Folder.__init__) - Now uses `with FolderDB()` for temporary connections
- `helper/worker_manager.py` - Added `close()` method and context manager support (`__enter__`/`__exit__`; sketched below)
**Pattern:**
```python
# OLD (leaked connections):
db = FolderDB(path)
try:
db.do_something()
finally:
if db:
db.close() # Could be skipped if exception occurs early
# NEW (guaranteed cleanup):
with FolderDB(path) as db:
db.do_something()
# Connection automatically closed when exiting block
```
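The context-manager support added to `helper/worker_manager.py` presumably follows the same shape; a minimal sketch (everything beyond the `close`/`__enter__`/`__exit__` names is assumed):
```python
class WorkerManager:
    def __init__(self, library_root):
        self.library_root = library_root
        self._db = None  # opened lazily by the real implementation

    def close(self) -> None:
        """Release the underlying database handle, if any."""
        if self._db is not None:
            self._db.close()
            self._db = None

    def __enter__(self) -> "WorkerManager":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()  # always runs, even when the with-block raised
```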
### 2. Hash+Store Priority Pattern ✅
**Philosophy:** The hash+store pair is the **canonical identifier** for files across all storage backends. Sort order and table structure should not matter because we're always using hash+store.
**Why This Matters:**
- `@N` selections pass hash+store from search results
- Hash+store works consistently across all backends (Hydrus, Folder, Remote)
- Path-based resolution is fragile (files move, temp paths expire, etc.)
- Hash+store never changes and uniquely identifies content
**Updated Resolution Priority in `add_file.py`:**
```python
def _resolve_source(result, path_arg, pipe_obj, config):
"""
PRIORITY 1: hash+store from result dict (most reliable for @N selections)
- Checks result.get("hash") and result.get("store")
- Uses FileStorage[store].get_file(hash) to retrieve
- Works for: Hydrus, Folder, Remote backends
PRIORITY 2: Explicit -path argument
- Direct path specified by user
PRIORITY 3: pipe_obj.file_path
- Legacy path from previous pipeline stage
PRIORITY 4: Hydrus hash from pipe_obj.extra
- Fallback for older Hydrus workflows
PRIORITY 5: String/list result parsing
- Last resort for simple string paths
"""
```
**Example Flow:**
```bash
# User searches and selects result
$ search-store system:limit=5
# Result items include:
{
"hash": "a1b2c3d4...",
"store": "home", # Specific Hydrus instance
"title": "example.mp4"
}
# User selects @2 (index 1)
$ @2 | add-file -storage test
# add-file now:
1. Extracts hash="a1b2c3d4..." store="home" from result dict
2. Calls FileStorage["home"].get_file("a1b2c3d4...")
3. Retrieves actual file path from "home" backend
4. Proceeds with copy/upload to "test" storage
```
### 3. Benefits of This Approach
**Consistency:**
- @N selection always uses the same hash+store regardless of display order
- No confusion about which row index maps to which file
- Table synchronization issues (rows vs items) don't break selection
**Reliability:**
- Hash uniquely identifies content (SHA256 collision is effectively impossible)
- Store identifies the authoritative source backend
- No dependency on temporary paths or file locations
**Multi-Instance Support:**
- Works seamlessly with multiple Hydrus instances ("home", "work")
- Works with mixed backends (Hydrus + Folder + Remote)
- Each backend can independently retrieve file by hash
**Debugging:**
- Hash+store are visible in debug logs: `[add-file] Using hash+store: hash=a1b2c3d4..., store=home`
- Easy to trace which backend is being queried
- Clear error messages when hash+store lookup fails
## How @N Selection Works Now
### Selection Process:
1. **Search creates result list with hash+store:**
```python
results_list = [
{"hash": "abc123...", "store": "home", "title": "file1.mp4"},
{"hash": "def456...", "store": "default", "title": "file2.jpg"},
{"hash": "ghi789...", "store": "test", "title": "file3.png"},
]
```
2. **User selects @2 (second item, index 1):**
- CLI extracts: `result = {"hash": "def456...", "store": "default", "title": "file2.jpg"}`
- Passes this dict to the next cmdlet
3. **Next cmdlet receives dict with hash+store:**
```python
def run(self, result, args, config):
# result is the dict from selection
file_hash = result.get("hash") # "def456..."
store_name = result.get("store") # "default"
# Use hash+store to retrieve file
backend = FileStorage(config)[store_name]
file_path = backend.get_file(file_hash)
```
### Why This is Better Than Path-Based:
**Path-Based (OLD):**
```python
# Fragile: path could be temp file, symlink, moved file, etc.
result = {"file_path": "/tmp/hydrus-abc123.mp4"}
# What if file was moved? What if it's a temp path that expires?
```
**Hash+Store (NEW):**
```python
# Reliable: hash+store always works regardless of current location
result = {"hash": "abc123...", "store": "home"}
# Backend retrieves current location from its database/API
```
## Testing the Fixes
### 1. Test Database Connections:
```powershell
# Search multiple times and check for database locks
search-store system:limit=5
search-store system:limit=5
search-store system:limit=5
# Should complete without "database is locked" errors
```
### 2. Test Hash+Store Selection:
```powershell
# Search and select
search-store system:limit=5
@2 | get-metadata
# Should show metadata for the selected file using hash+store
# Debug log should show: [add-file] Using hash+store from result: hash=...
```
### 3. Test WorkerManager Cleanup:
```powershell
# In Python script:
from helper.worker_manager import WorkerManager
from pathlib import Path
with WorkerManager(Path("C:/path/to/library")) as wm:
# Do work
pass
# Database automatically closed when exiting block
```
## Cmdlets That Already Use Hash+Store Pattern
These cmdlets already correctly extract hash+store:
- ✅ `get-file` - Export file via hash+store
- ✅ `get-metadata` - Retrieve metadata via hash+store
- ✅ `get-url` - Get URL via hash+store
- ✅ `get-tag` - Get tags via hash+store
- ✅ `add-url` - Add URL via hash+store
- ✅ `delete-url` - Delete URL via hash+store
- ✅ `add-file` - **NOW UPDATED** to prioritize hash+store
## Future Improvements
1. **Make hash+store mandatory in result dicts:**
- All search cmdlets should emit hash+store
- Validate that result dicts include these fields
2. **Add hash+store validation:**
- Warn if hash is not 64-char hex string
- Warn if store is not a registered backend (see the sketch after this list)
3. **Standardize error messages:**
- "File not found via hash+store: hash=abc123 store=home"
- Makes debugging much clearer
4. **Consider deprecating path-based workflows:**
- Migrate legacy cmdlets to hash+store pattern
- Remove path-based fallbacks once all cmdlets updated
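A minimal sketch of the validation from item 2, assuming a 64-character lowercase SHA-256 hex hash and the backend list returned by `FileStorage.list_backends()`:
```python
import re

HASH_RE = re.compile(r"^[0-9a-f]{64}$")

def validate_hash_store(hash_value, store, backends):
    """Return human-readable warnings; an empty list means the pair looks valid."""
    warnings = []
    if not HASH_RE.match((hash_value or "").lower()):
        warnings.append(f"hash is not a 64-char hex string: {hash_value!r}")
    if store not in backends:
        warnings.append(f"store is not a registered backend: {store!r} (known: {backends})")
    return warnings
```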
## Key Takeaway
**The hash+store pair is now the primary way to identify and retrieve files across the entire system.** This makes the codebase more reliable, consistent, and easier to debug. Database connections are properly cleaned up to prevent locks and resource leaks.

127
MODELS_REFACTOR_SUMMARY.md Normal file

@@ -0,0 +1,127 @@
# Models.py Refactoring Summary
## Overview
Refactored `models.py` PipeObject class to align with the hash+store canonical pattern, removing all backwards compatibility and legacy code.
## PipeObject Changes
### Removed Legacy Fields
- `source` - Replaced with `store` (storage backend name)
- `identifier` - Replaced with `hash` (SHA-256 hash)
- `file_hash` - Replaced with `hash` (canonical field)
- `remote_metadata` - Removed (can go in metadata dict or extra)
- `mpv_metadata` - Removed (can go in metadata dict or extra)
- `king_hash` - Moved to relationships dict
- `alt_hashes` - Moved to relationships dict
- `related_hashes` - Moved to relationships dict
- `parent_id` - Renamed to `parent_hash` for consistency
### New Canonical Fields
```python
@dataclass(slots=True)
class PipeObject:
hash: str # SHA-256 hash (canonical identifier)
store: str # Storage backend name (e.g., 'default', 'hydrus', 'test')
tags: List[str]
title: Optional[str]
source_url: Optional[str]
duration: Optional[float]
metadata: Dict[str, Any]
warnings: List[str]
file_path: Optional[str]
relationships: Dict[str, Any] # Contains king/alt/related
is_temp: bool
action: Optional[str]
parent_hash: Optional[str] # Renamed from parent_id
extra: Dict[str, Any]
```
### Updated Methods
#### Removed
- `register_as_king(file_hash)` - Replaced with `add_relationship()`
- `add_alternate(alt_hash)` - Replaced with `add_relationship()`
- `add_related(related_hash)` - Replaced with `add_relationship()`
- `@property hash` - Now a direct field
- `as_dict()` - Removed backwards compatibility alias
- `to_serializable()` - Removed backwards compatibility alias
#### Added/Updated
- `add_relationship(rel_type, rel_hash)` - Generic relationship management
- `get_relationships()` - Returns copy of relationships dict
- `to_dict()` - Updated to serialize new fields
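A minimal sketch of how these helpers might look on `PipeObject` (behaviour inferred from the migration notes below; the `king` slot is assumed single-valued, other relationship types list-valued):
```python
# Sketch only: the real methods live on PipeObject in models.py.
def add_relationship(self, rel_type: str, rel_hash: str) -> None:
    if rel_type == "king":
        self.relationships["king"] = rel_hash
    else:
        self.relationships.setdefault(rel_type, []).append(rel_hash)

def get_relationships(self) -> dict:
    return dict(self.relationships)  # copy, so callers cannot mutate internal state
```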
## Updated Files
### cmdlets/_shared.py
- Updated `coerce_to_pipe_object()` to use hash+store pattern
- Now computes hash from file_path if not provided
- Extracts relationships dict instead of individual king/alt/related fields
- Removes all references to source/identifier/file_hash
### cmdlets/add_file.py
- Updated `_update_pipe_object_destination()` signature to use hash/store
- Updated `_resolve_source()` to use pipe_obj.hash
- Updated `_prepare_metadata()` to use pipe_obj.hash
- Updated `_resolve_file_hash()` to check pipe_obj.hash
- Updated all call sites to pass hash/store instead of source/identifier/file_hash
### cmdlets/add_tag.py & cmdlets/add_tags.py
- Updated to access `res.hash` instead of `res.file_hash`
- Updated dict access to use `get('hash')` instead of `get('file_hash')`
### cmdlets/trim_file.py
- Updated to access `item.hash` instead of `item.file_hash`
- Updated dict access to use `get('hash')` only
### metadata.py
- Updated IMDb, MusicBrainz, and OpenLibrary tag extraction to return dicts directly
- Removed PipeObject instantiation with old signature (source/identifier)
- Updated remote metadata function to return dict instead of using PipeObject
## Benefits
1. **Canonical Pattern**: All file operations now use hash+store as the single source of truth
2. **Simplified Model**: Removed 9 legacy fields, consolidated into 2 canonical fields + relationships dict
3. **Consistency**: All cmdlets now use the same hash+store pattern for identification
4. **Maintainability**: One code path, no backwards compatibility burden
5. **Type Safety**: Direct fields instead of computed properties
6. **Flexibility**: Relationships dict allows for extensible relationship types
## Migration Notes
### Old Code
```python
pipe_obj = PipeObject(
source="hydrus",
identifier=file_hash,
file_hash=file_hash,
king_hash=king,
alt_hashes=[alt1, alt2]
)
```
### New Code
```python
pipe_obj = PipeObject(
hash=file_hash,
store="hydrus",
relationships={
"king": king,
"alt": [alt1, alt2]
}
)
```
### Accessing Fields
| Old | New |
|-----|-----|
| `obj.file_hash` | `obj.hash` |
| `obj.source` | `obj.store` |
| `obj.identifier` | `obj.hash` |
| `obj.king_hash` | `obj.relationships.get("king")` |
| `obj.alt_hashes` | `obj.relationships.get("alt", [])` |
| `obj.parent_id` | `obj.parent_hash` |
## Zero Backwards Compatibility
As requested, **all backwards compatibility has been removed**. Old code using the previous PipeObject signature will need to be updated to use hash+store.

79
NEXT_DEBUG_SESSION.md Normal file

@@ -0,0 +1,79 @@
NEXT DEBUGGING SESSION
======================
Run these commands in sequence and watch the [add-file] and [search-store] debug logs:
Step 1: Search and observe table/items mismatch
------
$ search-store system:limit=5
Expected output:
- Should see your 4 items in the table
- Watch for: [search-store] Added X rows to table, Y items to results_list
- If X=1 and Y=4: Problem is in table.add_result() or _ensure_storage_columns()
- If X=4 and Y=4: Problem is in CLI selection mapping (elsewhere)
Step 2: Test selection with debugging
------
$ @2 | add-file -storage test
Expected output:
- [add-file] INPUT result details should show the item you selected
- [add-file] RESOLVED source should have hash and store
- If either is missing/wrong: result object structure is wrong
- If both are correct: problem is in source resolution logic
Step 3: If selection works
------
If you successfully select @2 and add-file processes it:
- Congratulations! The issue was a one-time glitch
- If it fails again, compare debug logs to this run
Step 4: If selection still fails
------
Collect these logs:
1. Output of: search-store system:limit=5
2. Output of: @2 | add-file -storage test
3. Run diagnostic command to verify table state:
$ search-store system:limit=5 | .pipe
(This will show what .pipe sees in the results)
Step 5: Understanding @N selection format
------
When you see: [debug] first-stage: sel=[1] rows=1 items=4
- sel=[1] means you selected @2 (0-based index: @2 = index 1)
- rows=1 means the table object has only 1 row registered
- items=4 means there are 4 items in the results_list
The fix depends on which is wrong:
- If rows should be 4: table.add_result() isn't adding rows
- If items should be 1: results are being duplicated somehow
QUICK REFERENCE: DEBUGGING COMMANDS
===================================
Show debug logs:
$ debug on
$ search-store system:limit=5
$ @2 | add-file -storage test
Check what @2 selection resolves to:
$ @2 | get-metadata
Alternative (bypass @N selection issue):
$ search-store system:limit=5 | get-metadata -store home | .pipe
This avoids the @N selection and directly pipes results through cmdlets.
EXPECTED BEHAVIOR
================
Correct sequence when selection works:
1. search-store finds 4 results
2. [search-store] Added 4 rows to table, 4 items to results_list
3. @2 selects item at index 1 (second item: "i ve been down")
4. [add-file] INPUT result is dict: title=i ve been down, hash=b0780e68a2dc..., store=hydrus
5. [add-file] RESOLVED source: path=/tmp/medios-hydrus/..., is_hydrus=True, hash=b0780e68a2dc...
6. File is successfully added to "test" storage
If you see different output, the logs will show exactly where it diverges.


@@ -0,0 +1,127 @@
# Pipeline Refactoring Summary
## Overview
Refactored `pipeline.py` to remove all backwards compatibility and legacy code, consolidating on a single modern context-based approach using `PipelineStageContext`.
## Changes Made
### 1. Removed Legacy Global Variables
- `_PIPE_EMITS` - Replaced with `PipelineStageContext.emits`
- `_PIPE_ACTIVE` - Replaced with checking `_CURRENT_CONTEXT is not None`
- `_PIPE_IS_LAST` - Replaced with `PipelineStageContext.is_last_stage`
- `_LAST_PIPELINE_CAPTURE` - Removed (unused ephemeral handoff)
### 2. Removed Legacy Functions
- `set_active(bool)` - No longer needed, context tracks this
- `set_last_stage(bool)` - No longer needed, context tracks this
- `set_last_capture(obj)` - Removed
- `get_last_capture()` - Removed
### 3. Updated Core Functions
#### `emit(obj)`
**Before:** Dual-path with fallback to legacy `_PIPE_EMITS`
```python
if _CURRENT_CONTEXT is not None:
_CURRENT_CONTEXT.emit(obj)
return
_PIPE_EMITS.append(obj) # Legacy fallback
```
**After:** Single context-based path
```python
if _CURRENT_CONTEXT is not None:
_CURRENT_CONTEXT.emit(obj)
```
#### `emit_list(objects)`
**Before:** Dual-path with legacy fallback
**After:** Single context-based path, removed duplicate definition
#### `print_if_visible()`
**Before:** Checked `_PIPE_ACTIVE` and `_PIPE_IS_LAST`
```python
should_print = (not _PIPE_ACTIVE) or _PIPE_IS_LAST
```
**After:** Uses context state
```python
should_print = (_CURRENT_CONTEXT is None) or (_CURRENT_CONTEXT.is_last_stage)
```
#### `get_emitted_items()`
**Before:** Returned `_PIPE_EMITS`
**After:** Returns `_CURRENT_CONTEXT.emits` if context exists
#### `clear_emits()`
**Before:** Cleared global `_PIPE_EMITS`
**After:** Clears `_CURRENT_CONTEXT.emits` if context exists
#### `reset()`
**Before:** Reset 10+ legacy variables
**After:** Only resets active state variables, sets `_CURRENT_CONTEXT = None`
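Taken together, the single-path helpers described above presumably reduce to something like this sketch (only the names come from this summary; the bodies are assumed):
```python
_CURRENT_CONTEXT = None  # the active PipelineStageContext, or None outside a pipeline

def get_emitted_items() -> list:
    return list(_CURRENT_CONTEXT.emits) if _CURRENT_CONTEXT is not None else []

def clear_emits() -> None:
    if _CURRENT_CONTEXT is not None:
        _CURRENT_CONTEXT.emits.clear()

def reset() -> None:
    global _CURRENT_CONTEXT
    _CURRENT_CONTEXT = None
```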
### 4. Updated Call Sites
#### TUI/pipeline_runner.py
**Before:**
```python
ctx.set_stage_context(pipeline_ctx)
ctx.set_active(True)
ctx.set_last_stage(index == total - 1)
# ...
ctx.set_stage_context(None)
ctx.set_active(False)
```
**After:**
```python
ctx.set_stage_context(pipeline_ctx)
# ...
ctx.set_stage_context(None)
```
#### CLI.py (2 locations)
**Before:**
```python
ctx.set_stage_context(pipeline_ctx)
ctx.set_active(True)
```
**After:**
```python
ctx.set_stage_context(pipeline_ctx)
```
## Result
### Code Reduction
- Removed ~15 lines of legacy global variable declarations
- Removed ~30 lines of legacy function definitions
- Removed ~10 lines of dual-path logic in core functions
- Removed ~8 lines of redundant function calls at call sites
### Benefits
1. **Single Source of Truth**: All pipeline state is now in `PipelineStageContext`
2. **Cleaner API**: No redundant `set_active()` / `set_last_stage()` calls needed
3. **Type Safety**: Context object provides better type hints and IDE support
4. **Maintainability**: One code path to maintain, no backwards compatibility burden
5. **Clarity**: Intent is clear - context manages all stage-related state
## Preserved Functionality
All user-facing functionality remains unchanged:
- ✅ @N selection syntax
- ✅ Result table history (@.. and @,,)
- ✅ Display overlays
- ✅ Pipeline value storage/retrieval
- ✅ Worker attribution
- ✅ UI refresh callbacks
- ✅ Pending pipeline tail preservation
## Type Checking Notes
Some type checker warnings remain about accessing attributes on Optional types (e.g., `_LAST_RESULT_TABLE.source_command`). These are safe because:
1. Code uses `_is_selectable_table()` runtime checks before access
2. Functions check `is not None` before attribute access
3. These warnings are false positives from static analysis
These do not represent actual runtime bugs.


@@ -38,8 +38,32 @@ Adding your first file
.pipe "https://www.youtube.com/watch?v=_23dFb50Z2Y" # Add URL to current playlist
```
Example pipelines:
1. **Simple download with metadata (tags and URL registration)**:
```
download-media "https://www.youtube.com/watch?v=dQw4w9WgXcQ" | add-file -storage local | add-url
```
2. **Download playlist item with tags**:
```
download-media "https://www.youtube.com/playlist?list=PLxxxxx" -item 2 | add-file -storage local | add-url
```
3. **Download with merge (e.g., Bandcamp albums)**:
```
download-data "https://altrusiangrace.bandcamp.com/album/ancient-egyptian-legends-full-audiobook" | merge-file | add-file -storage local | add-url
```
4. **Download direct file (PDF, document)**:
```
download-file "https://example.com/file.pdf" | add-file -storage local | add-url
```
Search examples:
1. search-file -provider youtube "something in the way"
2. @1
1. download-data "https://altrusiangrace.bandcamp.com/album/ancient-egyptian-legends-full-audiobook" | merge-file | add-file -storage local
3. download-media [URL] | add-file -storage local | add-url


@@ -1,4 +1,4 @@
"""Modal for displaying files/URLs to access in web mode."""
"""Modal for displaying files/url to access in web mode."""
from textual.screen import ModalScreen
from textual.containers import Container, Vertical, Horizontal
@@ -93,7 +93,7 @@ class AccessModal(ModalScreen):
yield Label("[bold cyan]File:[/bold cyan]", classes="access-label")
# Display as clickable link using HTML link element for web mode
# Rich link markup `[link=URL]` has parsing issues with URLs containing special chars
# Rich link markup `[link=URL]` has parsing issues with url containing special chars
# Instead, use the HTML link markup that Textual-serve renders as <a> tag
# Format: [link=URL "tooltip"]text[/link] - the quotes help with parsing
link_text = f'[link="{self.item_content}"]Open in Browser[/link]'


@@ -233,8 +233,8 @@ class DownloadModal(ModalScreen):
self.screenshot_checkbox.value = False
self.playlist_merge_checkbox.value = False
# Initialize PDF playlist URLs (set by _handle_pdf_playlist)
self.pdf_urls = []
# Initialize PDF playlist url (set by _handle_pdf_playlist)
self.pdf_url = []
self.is_pdf_playlist = False
# Hide playlist by default (show format select)
@@ -288,10 +288,10 @@ class DownloadModal(ModalScreen):
# Launch the background worker with PDF playlist info
self._submit_worker(url, tags, source, download_enabled, playlist_selection, merge_enabled,
is_pdf_playlist=self.is_pdf_playlist, pdf_urls=self.pdf_urls if self.is_pdf_playlist else [])
is_pdf_playlist=self.is_pdf_playlist, pdf_url=self.pdf_url if self.is_pdf_playlist else [])
@work(thread=True)
def _submit_worker(self, url: str, tags: list, source: str, download_enabled: bool, playlist_selection: str = "", merge_enabled: bool = False, is_pdf_playlist: bool = False, pdf_urls: Optional[list] = None) -> None:
def _submit_worker(self, url: str, tags: list, source: str, download_enabled: bool, playlist_selection: str = "", merge_enabled: bool = False, is_pdf_playlist: bool = False, pdf_url: Optional[list] = None) -> None:
"""Background worker to execute the cmdlet pipeline.
Args:
@@ -302,10 +302,10 @@ class DownloadModal(ModalScreen):
playlist_selection: Playlist track selection (e.g., "1-3", "all", "merge")
merge_enabled: Whether to merge playlist files after download
is_pdf_playlist: Whether this is a PDF pseudo-playlist
pdf_urls: List of PDF URLs if is_pdf_playlist is True
pdf_url: List of PDF url if is_pdf_playlist is True
"""
if pdf_urls is None:
pdf_urls = []
if pdf_url is None:
pdf_url = []
# Initialize worker to None so outer exception handler can check it
worker = None
@@ -340,9 +340,9 @@ class DownloadModal(ModalScreen):
worker.log_step("Download initiated")
# Handle PDF playlist specially
if is_pdf_playlist and pdf_urls:
logger.info(f"Processing PDF playlist with {len(pdf_urls)} PDFs")
self._handle_pdf_playlist_download(pdf_urls, tags, playlist_selection, merge_enabled)
if is_pdf_playlist and pdf_url:
logger.info(f"Processing PDF playlist with {len(pdf_url)} PDFs")
self._handle_pdf_playlist_download(pdf_url, tags, playlist_selection, merge_enabled)
self.app.call_from_thread(self._hide_progress)
self.app.call_from_thread(self.dismiss)
return
@@ -690,7 +690,7 @@ class DownloadModal(ModalScreen):
'media_kind': 'audio',
'hash_hex': None,
'hash': None,
'known_urls': [],
'url': [],
'title': filepath_obj.stem
})()
files_to_merge.append(file_result)
@@ -934,8 +934,8 @@ class DownloadModal(ModalScreen):
"""Scrape metadata from URL(s) in URL textarea - wipes tags and source.
This is triggered by Ctrl+T when URL textarea is focused.
Supports single URL or multiple URLs (newline/comma-separated).
For multiple PDF URLs, creates pseudo-playlist for merge workflow.
Supports single URL or multiple url (newline/comma-separated).
For multiple PDF url, creates pseudo-playlist for merge workflow.
"""
try:
text = self.paragraph_textarea.text.strip()
@@ -943,29 +943,29 @@ class DownloadModal(ModalScreen):
logger.warning("No URL to scrape metadata from")
return
# Parse multiple URLs (newline or comma-separated)
urls = []
# Parse multiple url (newline or comma-separated)
url = []
for line in text.split('\n'):
line = line.strip()
if line:
# Handle comma-separated URLs within a line
# Handle comma-separated url within a line
for url in line.split(','):
url = url.strip()
if url:
urls.append(url)
url.append(url)
# Check if multiple URLs provided
if len(urls) > 1:
logger.info(f"Detected {len(urls)} URLs - checking for PDF pseudo-playlist")
# Check if all URLs appear to be PDFs
all_pdfs = all(url.endswith('.pdf') or 'pdf' in url.lower() for url in urls)
# Check if multiple url provided
if len(url) > 1:
logger.info(f"Detected {len(url)} url - checking for PDF pseudo-playlist")
# Check if all url appear to be PDFs
all_pdfs = all(url.endswith('.pdf') or 'pdf' in url.lower() for url in url)
if all_pdfs:
logger.info(f"All URLs are PDFs - creating pseudo-playlist")
self._handle_pdf_playlist(urls)
logger.info(f"All url are PDFs - creating pseudo-playlist")
self._handle_pdf_playlist(url)
return
# Single URL - proceed with normal metadata scraping
url = urls[0] if urls else text.strip()
url = url[0] if url else text.strip()
logger.info(f"Scraping fresh metadata from: {url}")
# Check if tags are already provided in textarea
@@ -1044,21 +1044,21 @@ class DownloadModal(ModalScreen):
)
def _handle_pdf_playlist(self, pdf_urls: list) -> None:
"""Handle multiple PDF URLs as a pseudo-playlist.
def _handle_pdf_playlist(self, pdf_url: list) -> None:
"""Handle multiple PDF url as a pseudo-playlist.
Creates a playlist-like structure with PDF metadata for merge workflow.
Extracts title from URL or uses default naming.
Args:
pdf_urls: List of PDF URLs to process
pdf_url: List of PDF url to process
"""
try:
logger.info(f"Creating PDF pseudo-playlist with {len(pdf_urls)} items")
logger.info(f"Creating PDF pseudo-playlist with {len(pdf_url)} items")
# Create playlist items from PDF URLs
# Create playlist items from PDF url
playlist_items = []
for idx, url in enumerate(pdf_urls, 1):
for idx, url in enumerate(pdf_url, 1):
# Extract filename from URL for display
try:
# Get filename from URL path
@@ -1083,15 +1083,15 @@ class DownloadModal(ModalScreen):
# Build minimal metadata structure for UI population
metadata = {
'title': f'{len(pdf_urls)} PDF Documents',
'title': f'{len(pdf_url)} PDF Documents',
'tags': [],
'formats': [('pdf', 'pdf')], # Default format is PDF
'playlist_items': playlist_items,
'is_pdf_playlist': True # Mark as PDF pseudo-playlist
}
# Store URLs for later use during merge
self.pdf_urls = pdf_urls
# Store url for later use during merge
self.pdf_url = pdf_url
self.is_pdf_playlist = True
# Populate the modal with metadata
@@ -1099,7 +1099,7 @@ class DownloadModal(ModalScreen):
self._populate_from_metadata(metadata, wipe_tags_and_source=True)
self.app.notify(
f"Loaded {len(pdf_urls)} PDFs as playlist",
f"Loaded {len(pdf_url)} PDFs as playlist",
title="PDF Playlist",
severity="information",
timeout=3
@@ -1115,11 +1115,11 @@ class DownloadModal(ModalScreen):
)
def _handle_pdf_playlist_download(self, pdf_urls: list, tags: list, selection: str, merge_enabled: bool) -> None:
def _handle_pdf_playlist_download(self, pdf_url: list, tags: list, selection: str, merge_enabled: bool) -> None:
"""Download and merge PDF playlist.
Args:
pdf_urls: List of PDF URLs to download
pdf_url: List of PDF url to download
tags: Tags to apply to the merged PDF
selection: Selection string like "1-3" or "1,3,5"
merge_enabled: Whether to merge the PDFs
@@ -1141,7 +1141,7 @@ class DownloadModal(ModalScreen):
# Create temporary list of playlist items for selection parsing
# We need this because _parse_playlist_selection uses self.playlist_items
temp_items = []
for url in pdf_urls:
for url in pdf_url:
temp_items.append({'title': url})
self.playlist_items = temp_items
@@ -1149,20 +1149,20 @@ class DownloadModal(ModalScreen):
selected_indices = self._parse_playlist_selection(selection)
if not selected_indices:
# No valid selection, use all
selected_indices = list(range(len(pdf_urls)))
selected_indices = list(range(len(pdf_url)))
selected_urls = [pdf_urls[i] for i in selected_indices]
selected_url = [pdf_url[i] for i in selected_indices]
logger.info(f"Downloading {len(selected_urls)} selected PDFs for merge")
logger.info(f"Downloading {len(selected_url)} selected PDFs for merge")
# Download PDFs to temporary directory
temp_dir = Path.home() / ".downlow_temp_pdfs"
temp_dir.mkdir(exist_ok=True)
downloaded_files = []
for idx, url in enumerate(selected_urls, 1):
for idx, url in enumerate(selected_url, 1):
try:
logger.info(f"Downloading PDF {idx}/{len(selected_urls)}: {url}")
logger.info(f"Downloading PDF {idx}/{len(selected_url)}: {url}")
response = requests.get(url, timeout=30)
response.raise_for_status()
@@ -1619,7 +1619,7 @@ class DownloadModal(ModalScreen):
)
return
else:
success_msg = "download-data completed successfully"
success_msg = "download-data completed successfully"
logger.info(success_msg)
if worker:
worker.append_stdout(f"{success_msg}\n")
@@ -1670,7 +1670,7 @@ class DownloadModal(ModalScreen):
worker.append_stdout(f"{warning_msg}\n")
else:
if worker:
worker.append_stdout("Tags applied successfully\n")
worker.append_stdout("Tags applied successfully\n")
except Exception as e:
error_msg = f"❌ Tagging error: {e}"
logger.error(error_msg, exc_info=True)
@@ -1684,7 +1684,7 @@ class DownloadModal(ModalScreen):
worker.append_stdout(f"{warning_msg}\n")
else:
if worker:
worker.append_stdout("Download complete (no tags to apply)\n")
worker.append_stdout("Download complete (no tags to apply)\n")
def _show_format_select(self) -> None:
"""Show format select (always visible for single files)."""
@@ -1770,9 +1770,9 @@ class DownloadModal(ModalScreen):
# Namespaces to exclude (metadata-only, not user-facing)
excluded_namespaces = {
'hash', # Hash values (internal)
'known_url', # URLs (internal)
'url', # url (internal)
'relationship', # Internal relationships
'url', # URLs (internal)
'url', # url (internal)
}
# Add all other tags


@@ -350,9 +350,9 @@ class ExportModal(ModalScreen):
if tag:
export_tags.add(tag)
# For Hydrus export, filter out metadata-only tags (hash:, known_url:, relationship:)
# For Hydrus export, filter out metadata-only tags (hash:, url:, relationship:)
if export_to == "libraries" and library == "hydrus":
metadata_prefixes = {'hash:', 'known_url:', 'relationship:'}
metadata_prefixes = {'hash:', 'url:', 'relationship:'}
export_tags = {tag for tag in export_tags if not any(tag.lower().startswith(prefix) for prefix in metadata_prefixes)}
logger.info(f"Filtered tags for Hydrus - removed metadata tags, {len(export_tags)} tags remaining")
@@ -404,9 +404,9 @@ class ExportModal(ModalScreen):
metadata = self.result_data.get('metadata', {})
# Extract file source info from result_data (passed by hub-ui)
file_hash = self.result_data.get('file_hash')
file_url = self.result_data.get('file_url')
file_path = self.result_data.get('file_path') # For local files
file_hash = self.result_data.get('hash') or self.result_data.get('file_hash')
file_url = self.result_data.get('url') or self.result_data.get('file_url')
file_path = self.result_data.get('path') or self.result_data.get('file_path') # For local files
source = self.result_data.get('source', 'unknown')
# Prepare export data
@@ -419,8 +419,11 @@ class ExportModal(ModalScreen):
'format': file_format,
'metadata': metadata,
'original_data': self.result_data,
'hash': file_hash,
'file_hash': file_hash,
'url': file_url,
'file_url': file_url,
'path': file_path,
'file_path': file_path, # Pass file path for local files
'source': source,
}


@@ -16,7 +16,7 @@ import asyncio
sys.path.insert(0, str(Path(__file__).parent.parent))
from config import load_config
from result_table import ResultTable
from helper.search_provider import get_provider
from helper.provider import get_provider
logger = logging.getLogger(__name__)
@@ -183,7 +183,7 @@ class SearchModal(ModalScreen):
else:
# Fallback if no columns defined
row.add_column("Title", res.title)
row.add_column("Target", res.target)
row.add_column("Target", getattr(res, 'path', None) or getattr(res, 'url', None) or getattr(res, 'target', None) or '')
self.current_result_table = table


@@ -197,8 +197,6 @@ class PipelineExecutor:
pipeline_ctx = ctx.PipelineStageContext(stage_index=index, total_stages=total)
ctx.set_stage_context(pipeline_ctx)
ctx.set_active(True)
ctx.set_last_stage(index == total - 1)
try:
return_code = cmd_fn(piped_input, list(stage_args), self._config)
@@ -210,7 +208,6 @@ class PipelineExecutor:
return stage
finally:
ctx.set_stage_context(None)
ctx.set_active(False)
emitted = list(getattr(pipeline_ctx, "emits", []) or [])
stage.emitted = emitted


@@ -24,70 +24,12 @@ def register(names: Iterable[str]):
return _wrap
class AutoRegister:
"""Decorator that automatically registers a cmdlet function using CMDLET.aliases.
Usage:
CMDLET = Cmdlet(
name="delete-file",
aliases=["del", "del-file"],
...
)
@AutoRegister(CMDLET)
def _run(result, args, config) -> int:
...
Registers the cmdlet under:
- Its main name from CMDLET.name
- All aliases from CMDLET.aliases
This allows the help display to show: "cmd: delete-file | alias: del, del-file"
"""
def __init__(self, cmdlet):
self.cmdlet = cmdlet
def __call__(self, fn: Cmdlet) -> Cmdlet:
"""Register fn for the main name and all aliases in cmdlet."""
normalized_name = None
# Register for main name first
if hasattr(self.cmdlet, 'name') and self.cmdlet.name:
normalized_name = self.cmdlet.name.replace('_', '-').lower()
REGISTRY[normalized_name] = fn
# Register for all aliases
if hasattr(self.cmdlet, 'aliases') and self.cmdlet.aliases:
for alias in self.cmdlet.aliases:
normalized_alias = alias.replace('_', '-').lower()
# Always register (aliases are separate from main name)
REGISTRY[normalized_alias] = fn
return fn
def get(cmd_name: str) -> Cmdlet | None:
return REGISTRY.get(cmd_name.replace('_', '-').lower())
def format_cmd_help(cmdlet) -> str:
"""Format a cmdlet for help display showing cmd:name and aliases.
Example output: "delete-file | aliases: del, del-file"
"""
if not hasattr(cmdlet, 'name'):
return str(cmdlet)
cmd_str = f"cmd: {cmdlet.name}"
if hasattr(cmdlet, 'aliases') and cmdlet.aliases:
aliases_str = ", ".join(cmdlet.aliases)
cmd_str += f" | aliases: {aliases_str}"
return cmd_str
# Dynamically import all cmdlet modules in this directory (ignore files starting with _ and __init__.py)
# Cmdlets self-register when instantiated via their __init__ method
import os
cmdlet_dir = os.path.dirname(__file__)
for filename in os.listdir(cmdlet_dir):
@@ -106,27 +48,7 @@ for filename in os.listdir(cmdlet_dir):
continue
try:
module = _import_module(f".{mod_name}", __name__)
# Auto-register based on CMDLET object with exec function
# This allows cmdlets to be fully self-contained in the CMDLET object
if hasattr(module, 'CMDLET'):
cmdlet_obj = module.CMDLET
# Get the execution function from the CMDLET object
run_fn = getattr(cmdlet_obj, 'exec', None) if hasattr(cmdlet_obj, 'exec') else None
if callable(run_fn):
# Register main name
if hasattr(cmdlet_obj, 'name') and cmdlet_obj.name:
normalized_name = cmdlet_obj.name.replace('_', '-').lower()
REGISTRY[normalized_name] = run_fn
# Register all aliases
if hasattr(cmdlet_obj, 'aliases') and cmdlet_obj.aliases:
for alias in cmdlet_obj.aliases:
normalized_alias = alias.replace('_', '-').lower()
REGISTRY[normalized_alias] = run_fn
_import_module(f".{mod_name}", __name__)
except Exception as e:
import sys
print(f"Error importing cmdlet '{mod_name}': {e}", file=sys.stderr)
@@ -141,8 +63,6 @@ except Exception:
pass
# Import root-level modules that also register cmdlets
# Note: search_libgen, search_soulseek, and search_debrid are now consolidated into search_provider.py
# Use search-file -provider libgen, -provider soulseek, or -provider debrid instead
for _root_mod in ("select_cmdlet",):
try:
_import_module(_root_mod)


@@ -11,7 +11,7 @@ import sys
import inspect
from collections.abc import Iterable as IterableABC
from helper.logger import log
from helper.logger import log, debug
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Sequence, Set
from dataclasses import dataclass, field
@@ -37,22 +37,9 @@ class CmdletArg:
"""Optional handler function/callable for processing this argument's value"""
variadic: bool = False
"""Whether this argument accepts multiple values (consumes remaining positional args)"""
def to_dict(self) -> Dict[str, Any]:
"""Convert to dict for backward compatibility."""
d = {
"name": self.name,
"type": self.type,
"required": self.required,
"description": self.description,
"variadic": self.variadic,
}
if self.choices:
d["choices"] = self.choices
if self.alias:
d["alias"] = self.alias
return d
usage: str = ""
"""dsf"""
def resolve(self, value: Any) -> Any:
"""Resolve/process the argument value using the handler if available.
@@ -135,11 +122,68 @@ class SharedArgs:
# File/Hash arguments
HASH = CmdletArg(
"hash",
name="hash",
type="string",
description="Override the Hydrus file hash (SHA256) to target instead of the selected result."
description="File hash (SHA256, 64-char hex string)",
)
STORE = CmdletArg(
name="store",
type="enum",
choices=[], # Dynamically populated via get_store_choices()
description="Selects store",
)
PATH = CmdletArg(
name="path",
type="string",
description="Filesystem path to a file or directory.",
)
URL = CmdletArg(
name="url",
type="string",
description="http parser",
)
@staticmethod
def get_store_choices(config: Optional[Dict[str, Any]] = None) -> List[str]:
"""Get list of available storage backend names from FileStorage.
This method dynamically discovers all configured storage backends
instead of using a static list. Should be called when building
autocomplete choices or validating store names.
Args:
config: Optional config dict. If not provided, will try to load from config module.
Returns:
List of backend names (e.g., ['default', 'test', 'home', 'work'])
Example:
# In a cmdlet that needs dynamic choices
from helper.store import FileStorage
storage = FileStorage(config)
SharedArgs.STORE.choices = SharedArgs.get_store_choices(config)
"""
try:
from helper.store import FileStorage
# If no config provided, try to load it
if config is None:
try:
from config import load_config
config = load_config()
except Exception:
return []
file_storage = FileStorage(config)
return file_storage.list_backends()
except Exception:
# Fallback to empty list if FileStorage isn't available
return []
LOCATION = CmdletArg(
"location",
type="enum",
@@ -205,16 +249,7 @@ class SharedArgs:
type="string",
description="Output file path."
)
STORAGE = CmdletArg(
"storage",
type="enum",
choices=["hydrus", "local", "ftp", "matrix"],
required=False,
description="Storage location or destination for saving/uploading files.",
alias="s",
handler=lambda val: SharedArgs.resolve_storage(val) if val else None
)
# Generic arguments
QUERY = CmdletArg(
@@ -325,78 +360,61 @@ class Cmdlet:
log(cmd.name) # "add-file"
log(cmd.summary) # "Upload a media file"
log(cmd.args[0].name) # "location"
# Convert to dict for JSON serialization
log(json.dumps(cmd.to_dict()))
"""
name: str
"""Cmdlet name, e.g., 'add-file'"""
""""""
summary: str
"""One-line summary of the cmdlet"""
usage: str
"""Usage string, e.g., 'add-file <location> [-delete]'"""
aliases: List[str] = field(default_factory=list)
alias: List[str] = field(default_factory=list)
"""List of aliases for this cmdlet, e.g., ['add', 'add-f']"""
args: List[CmdletArg] = field(default_factory=list)
arg: List[CmdletArg] = field(default_factory=list)
"""List of arguments accepted by this cmdlet"""
details: List[str] = field(default_factory=list)
detail: List[str] = field(default_factory=list)
"""Detailed explanation lines (for help text)"""
exec: Optional[Any] = field(default=None)
"""The execution function: func(result, args, config) -> int"""
def __post_init__(self) -> None:
"""Auto-discover _run function if exec not explicitly provided.
If exec is None, looks for a _run function in the module where
this Cmdlet was instantiated and uses it automatically.
"""
if self.exec is None:
# Walk up the call stack to find _run in the calling module
frame = inspect.currentframe()
try:
# Walk up frames until we find one with _run in globals
while frame:
if '_run' in frame.f_globals:
self.exec = frame.f_globals['_run']
break
frame = frame.f_back
finally:
del frame # Avoid reference cycles
def to_dict(self) -> Dict[str, Any]:
"""Convert to dict for backward compatibility with existing code.
Returns a dict matching the old CMDLET format so existing code
that expects a dict will still work.
"""
# Format command for display: "cmd: name alias: alias1, alias2"
cmd_display = f"cmd: {self.name}"
if self.aliases:
aliases_str = ", ".join(self.aliases)
cmd_display += f" alias: {aliases_str}"
return {
"name": self.name,
"summary": self.summary,
"usage": self.usage,
"cmd": cmd_display, # Display-friendly command name with aliases on one line
"aliases": self.aliases,
"args": [arg.to_dict() for arg in self.args],
"details": self.details,
}
def __getitem__(self, key: str) -> Any:
"""Dict-like access for backward compatibility.
Allows code like: cmdlet["name"] or cmdlet["args"]
"""
d = self.to_dict()
return d.get(key)
def get(self, key: str, default: Any = None) -> Any:
"""Dict-like get() method for backward compatibility."""
d = self.to_dict()
return d.get(key, default)
def _collect_names(self) -> List[str]:
"""Collect primary name plus aliases, de-duplicated and normalized."""
names: List[str] = []
if self.name:
names.append(self.name)
for alias in (self.alias or []):
if alias:
names.append(alias)
for alias in (getattr(self, "aliases", None) or []):
if alias:
names.append(alias)
seen: Set[str] = set()
deduped: List[str] = []
for name in names:
key = name.replace("_", "-").lower()
if key in seen:
continue
seen.add(key)
deduped.append(name)
return deduped
def register(self) -> "Cmdlet":
"""Register this cmdlet's exec under its name and aliases."""
if not callable(self.exec):
return self
try:
from . import register as _register # Local import to avoid circular import cost
except Exception:
return self
names = self._collect_names()
if not names:
return self
_register(names)(self.exec)
return self
def get_flags(self, arg_name: str) -> set[str]:
"""Generate -name and --name flag variants for an argument.
@@ -432,7 +450,7 @@ class Cmdlet:
elif low in flags.get('tag', set()):
# handle tag
"""
return {arg.name: self.get_flags(arg.name) for arg in self.args}
return {arg.name: self.get_flags(arg.name) for arg in self.arg}
# Tag groups cache (loaded from JSON config file)
@@ -479,19 +497,19 @@ def parse_cmdlet_args(args: Sequence[str], cmdlet_spec: Dict[str, Any] | Cmdlet)
"""
result: Dict[str, Any] = {}
# Handle both dict and Cmdlet objects
if isinstance(cmdlet_spec, Cmdlet):
cmdlet_spec = cmdlet_spec.to_dict()
# Only accept Cmdlet objects
if not isinstance(cmdlet_spec, Cmdlet):
raise TypeError(f"Expected Cmdlet, got {type(cmdlet_spec).__name__}")
# Build arg specs tracking which are positional vs flagged
arg_specs: List[Dict[str, Any]] = cmdlet_spec.get("args", [])
positional_args: List[Dict[str, Any]] = [] # args without prefix in definition
flagged_args: List[Dict[str, Any]] = [] # args with prefix in definition
# Build arg specs from cmdlet
arg_specs: List[CmdletArg] = cmdlet_spec.arg
positional_args: List[CmdletArg] = [] # args without prefix in definition
flagged_args: List[CmdletArg] = [] # args with prefix in definition
arg_spec_map: Dict[str, str] = {} # prefix variant -> canonical name (without prefix)
for spec in arg_specs:
name = spec.get("name")
name = spec.name
if not name:
continue
@@ -520,10 +538,10 @@ def parse_cmdlet_args(args: Sequence[str], cmdlet_spec: Dict[str, Any] | Cmdlet)
# Check if this token is a known flagged argument
if token_lower in arg_spec_map:
canonical_name = arg_spec_map[token_lower]
spec = next((s for s in arg_specs if str(s.get("name", "")).lstrip("-").lower() == canonical_name.lower()), None)
spec = next((s for s in arg_specs if str(s.name).lstrip("-").lower() == canonical_name.lower()), None)
# Check if it's a flag type (which doesn't consume next value, just marks presence)
is_flag = spec and spec.get("type") == "flag"
is_flag = spec and spec.type == "flag"
if is_flag:
# For flags, just mark presence without consuming next token
@@ -535,7 +553,7 @@ def parse_cmdlet_args(args: Sequence[str], cmdlet_spec: Dict[str, Any] | Cmdlet)
value = args[i + 1]
# Check if variadic
is_variadic = spec and spec.get("variadic", False)
is_variadic = spec and spec.variadic
if is_variadic:
if canonical_name not in result:
result[canonical_name] = []
@@ -550,8 +568,8 @@ def parse_cmdlet_args(args: Sequence[str], cmdlet_spec: Dict[str, Any] | Cmdlet)
# Otherwise treat as positional if we have positional args remaining
elif positional_index < len(positional_args):
positional_spec = positional_args[positional_index]
canonical_name = str(positional_spec.get("name", "")).lstrip("-")
is_variadic = positional_spec.get("variadic", False)
canonical_name = str(positional_spec.name).lstrip("-")
is_variadic = positional_spec.variadic
if is_variadic:
# For variadic args, append to a list
@@ -591,6 +609,183 @@ def normalize_hash(hash_hex: Optional[str]) -> Optional[str]:
return text.lower() if text else None
def get_hash_for_operation(override_hash: Optional[str], result: Any, field_name: str = "hash_hex") -> Optional[str]:
"""Get normalized hash from override or result object, consolidating common pattern.
Eliminates repeated pattern: normalize_hash(override) if override else normalize_hash(get_field(result, ...))
Args:
override_hash: Hash passed as command argument (takes precedence)
result: Object containing hash field (fallback)
field_name: Name of hash field in result object (default: "hash_hex")
Returns:
Normalized hash string, or None if neither override nor result provides valid hash
"""
if override_hash:
return normalize_hash(override_hash)
# Try multiple field names for robustness
hash_value = get_field(result, field_name) or getattr(result, field_name, None) or getattr(result, "hash", None) or (result.get("file_hash") if isinstance(result, dict) else None)
return normalize_hash(hash_value)
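# Illustrative sketch (not part of the original module): shows the precedence rule
# documented above. The dict and hash values here are made up.
def _example_get_hash_for_operation() -> None:
    result = {"hash_hex": "AB" * 32}
    # No override: the hash is pulled from the result and lower-cased.
    assert get_hash_for_operation(None, result) == "ab" * 32
    # An override always wins, assuming normalize_hash only strips/lower-cases.
    assert get_hash_for_operation("CD" * 32, result) == "cd" * 32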
def fetch_hydrus_metadata(config: Any, hash_hex: str, **kwargs) -> tuple[Optional[Dict[str, Any]], Optional[int]]:
"""Fetch metadata from Hydrus for a given hash, consolidating common fetch pattern.
Eliminates repeated boilerplate: client initialization, error handling, metadata extraction.
Args:
config: Configuration object (passed to hydrus_wrapper.get_client)
hash_hex: File hash to fetch metadata for
**kwargs: Additional arguments to pass to client.fetch_file_metadata()
Common: include_service_keys_to_tags, include_notes, include_file_url, include_duration, etc.
Returns:
Tuple of (metadata_dict, error_code)
- metadata_dict: Dict from Hydrus (first item in metadata list) or None if unavailable
- error_code: 0 on success, 1 on any error (suitable for returning from cmdlet execute())
"""
from helper import hydrus
hydrus_wrapper = hydrus
try:
client = hydrus_wrapper.get_client(config)
except Exception as exc:
log(f"Hydrus client unavailable: {exc}")
return None, 1
if client is None:
log("Hydrus client unavailable")
return None, 1
try:
payload = client.fetch_file_metadata(hashes=[hash_hex], **kwargs)
except Exception as exc:
log(f"Hydrus metadata fetch failed: {exc}")
return None, 1
items = payload.get("metadata") if isinstance(payload, dict) else None
meta = items[0] if (isinstance(items, list) and items and isinstance(items[0], dict)) else None
return meta, 0
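# Illustrative sketch (not part of the original module): the call pattern this helper
# replaces inside a cmdlet's execute(). The hash is made up; include_notes is one of
# the kwargs the docstring above lists as common.
def _example_fetch_hydrus_metadata(config: Any) -> int:
    meta, code = fetch_hydrus_metadata(config, "0" * 64, include_notes=True)
    if code != 0 or meta is None:
        return code or 1
    log(f"notes: {meta.get('notes', {})}")
    return 0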
def get_origin(obj: Any, default: Optional[str] = None) -> Optional[str]:
"""Extract origin field with fallback to store/source field, consolidating common pattern.
Supports both dict and object access patterns.
Args:
obj: Object (dict or dataclass) with 'store', 'origin', or 'source' field
default: Default value if none of the fields are found
Returns:
Store/origin/source string, or default if none exist
"""
if isinstance(obj, dict):
return obj.get("store") or obj.get("origin") or obj.get("source") or default
else:
return getattr(obj, "store", None) or getattr(obj, "origin", None) or getattr(obj, "source", None) or default
def get_field(obj: Any, field: str, default: Optional[Any] = None) -> Any:
"""Extract a field from either a dict or object with fallback default.
Handles both dict.get(field) and getattr(obj, field) access patterns.
Also handles lists by accessing the first element.
For PipeObjects, checks the extra field as well.
Used throughout cmdlets to uniformly access fields from mixed types.
Args:
obj: Dict, object, or list to extract from
field: Field name to retrieve
default: Value to return if field not found (default: None)
Returns:
Field value if found, otherwise the default value
Examples:
get_field(result, "hash") # From dict or object
get_field(result, "origin", "unknown") # With default
"""
# Handle lists by accessing the first element
if isinstance(obj, list) and obj:
obj = obj[0]
if isinstance(obj, dict):
# Direct lookup first
val = obj.get(field, default)
if val is not None:
return val
# Fallback aliases for common fields
if field == "path":
for alt in ("file_path", "target", "filepath", "file"):
v = obj.get(alt)
if v:
return v
if field == "hash":
for alt in ("file_hash", "hash_hex"):
v = obj.get(alt)
if v:
return v
if field == "store":
for alt in ("storage", "storage_source", "origin"):
v = obj.get(alt)
if v:
return v
return default
else:
# Try direct attribute access first
value = getattr(obj, field, None)
if value is not None:
return value
# Attribute fallback aliases for common fields
if field == "path":
for alt in ("file_path", "target", "filepath", "file", "url"):
v = getattr(obj, alt, None)
if v:
return v
if field == "hash":
for alt in ("file_hash", "hash_hex"):
v = getattr(obj, alt, None)
if v:
return v
if field == "store":
for alt in ("storage", "storage_source", "origin"):
v = getattr(obj, alt, None)
if v:
return v
# For PipeObjects, also check the extra field
if hasattr(obj, 'extra') and isinstance(obj.extra, dict):
return obj.extra.get(field, default)
return default
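# Illustrative sketch (not part of the original module): the alias fallbacks above in
# action for a plain dict. Field values are made up.
def _example_get_field() -> None:
    row = {"file_path": "C:/media/clip.mp4", "file_hash": "ab" * 32}
    assert get_field(row, "path") == "C:/media/clip.mp4"   # falls back to file_path
    assert get_field(row, "hash") == "ab" * 32             # falls back to file_hash
    assert get_field(row, "store", "local") == "local"     # default when nothing matches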
def should_show_help(args: Sequence[str]) -> bool:
"""Check if help flag was passed in arguments.
Consolidates repeated pattern of checking for help flags across cmdlets.
Args:
args: Command arguments to check
Returns:
True if any help flag is present (-?, /?, --help, -h, help, --cmdlet)
Examples:
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
"""
try:
return any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args)
except Exception:
return False
def looks_like_hash(candidate: Optional[str]) -> bool:
"""Check if a string looks like a SHA256 hash (64 hex chars).
@@ -609,8 +804,8 @@ def looks_like_hash(candidate: Optional[str]) -> bool:
def pipeline_item_local_path(item: Any) -> Optional[str]:
"""Extract local file path from a pipeline item.
Supports both dataclass objects with .target attribute and dicts.
Returns None for HTTP/HTTPS URLs.
Supports both dataclass objects with .path attribute and dicts.
Returns None for HTTP/HTTPS URLs.
Args:
item: Pipeline item (PipelineItem dataclass, dict, or other)
@@ -618,15 +813,15 @@ def pipeline_item_local_path(item: Any) -> Optional[str]:
Returns:
Local file path string, or None if item is not a local file
"""
target: Optional[str] = None
if hasattr(item, "target"):
target = getattr(item, "target", None)
path_value: Optional[str] = None
if hasattr(item, "path"):
path_value = getattr(item, "path", None)
elif isinstance(item, dict):
raw = item.get("target") or item.get("path") or item.get("url")
target = str(raw) if raw is not None else None
if not isinstance(target, str):
raw = item.get("path") or item.get("url")
path_value = str(raw) if raw is not None else None
if not isinstance(path_value, str):
return None
text = target.strip()
text = path_value.strip()
if not text:
return None
if text.lower().startswith(("http://", "https://")):
@@ -686,22 +881,60 @@ def collect_relationship_labels(payload: Any, label_stack: List[str] | None = No
def parse_tag_arguments(arguments: Sequence[str]) -> List[str]:
"""Parse tag arguments from command line tokens.
Handles both space-separated and comma-separated tags.
Example: parse_tag_arguments(["tag1,tag2", "tag3"]) -> ["tag1", "tag2", "tag3"]
- Supports comma-separated tags.
- Supports pipe namespace shorthand: "artist:A|B|C" -> artist:A, artist:B, artist:C.
Args:
arguments: Sequence of argument strings
Returns:
List of normalized tag strings (empty strings filtered out)
"""
def _expand_pipe_namespace(text: str) -> List[str]:
parts = text.split('|')
expanded: List[str] = []
last_ns: Optional[str] = None
for part in parts:
segment = part.strip()
if not segment:
continue
if ':' in segment:
ns, val = segment.split(':', 1)
ns = ns.strip()
val = val.strip()
last_ns = ns or last_ns
if last_ns and val:
expanded.append(f"{last_ns}:{val}")
elif ns or val:
expanded.append(f"{ns}:{val}".strip(':'))
else:
if last_ns:
expanded.append(f"{last_ns}:{segment}")
else:
expanded.append(segment)
return expanded
tags: List[str] = []
for argument in arguments:
for token in argument.split(','):
text = token.strip()
if text:
tags.append(text)
if not text:
continue
# Expand namespace shorthand with pipes
pipe_expanded = _expand_pipe_namespace(text)
for entry in pipe_expanded:
candidate = entry.strip()
if not candidate:
continue
if ':' in candidate:
ns, val = candidate.split(':', 1)
ns = ns.strip()
val = val.strip()
candidate = f"{ns}:{val}" if ns or val else ""
if candidate:
tags.append(candidate)
return tags
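# Illustrative sketch (not part of the original module): the comma handling and the
# pipe namespace shorthand described in the docstring above.
def _example_parse_tag_arguments() -> None:
    parsed = parse_tag_arguments(["artist:A|B|C", "genre:rock, mood:calm"])
    assert parsed == ["artist:A", "artist:B", "artist:C", "genre:rock", "mood:calm"]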
@@ -944,7 +1177,7 @@ def create_pipe_object_result(
result = {
'source': source,
'id': identifier,
'file_path': file_path,
'path': file_path,
'action': f'cmdlet:{cmdlet_name}', # Format: cmdlet:cmdlet_name
}
@@ -952,6 +1185,7 @@ def create_pipe_object_result(
result['title'] = title
if file_hash:
result['file_hash'] = file_hash
result['hash'] = file_hash
if is_temp:
result['is_temp'] = True
if parent_hash:
@@ -959,6 +1193,13 @@ def create_pipe_object_result(
if tags:
result['tags'] = tags
# Canonical store field: use source for compatibility
try:
if source:
result['store'] = source
except Exception:
pass
# Add any extra fields
result.update(extra)
@@ -996,13 +1237,13 @@ def get_pipe_object_path(pipe_object: Any) -> Optional[str]:
"""Extract file path from PipeObject, dict, or pipeline-friendly object."""
if pipe_object is None:
return None
for attr in ('file_path', 'path', 'target'):
for attr in ('path', 'target'):
if hasattr(pipe_object, attr):
value = getattr(pipe_object, attr)
if value:
return value
if isinstance(pipe_object, dict):
for key in ('file_path', 'path', 'target'):
for key in ('path', 'target'):
value = pipe_object.get(key)
if value:
return value
@@ -1209,40 +1450,40 @@ def extract_title_from_result(result: Any) -> Optional[str]:
return None
def extract_known_urls_from_result(result: Any) -> list[str]:
urls: list[str] = []
def extract_url_from_result(result: Any) -> list[str]:
url: list[str] = []
def _extend(candidate: Any) -> None:
if not candidate:
return
if isinstance(candidate, list):
urls.extend(candidate)
url.extend(candidate)
elif isinstance(candidate, str):
urls.append(candidate)
url.append(candidate)
if isinstance(result, models.PipeObject):
_extend(result.extra.get('known_urls'))
_extend(result.extra.get('url'))
if isinstance(result.metadata, dict):
_extend(result.metadata.get('known_urls'))
_extend(result.metadata.get('urls'))
_extend(result.metadata.get('url'))
elif hasattr(result, 'known_urls') or hasattr(result, 'urls'):
# Handle objects with known_urls/urls attribute
_extend(getattr(result, 'known_urls', None))
_extend(getattr(result, 'urls', None))
_extend(result.metadata.get('url'))
elif hasattr(result, 'url'):
# Handle objects with a url attribute
_extend(getattr(result, 'url', None))
if isinstance(result, dict):
_extend(result.get('known_urls'))
_extend(result.get('urls'))
_extend(result.get('url'))
extra = result.get('extra')
if isinstance(extra, dict):
_extend(extra.get('known_urls'))
_extend(extra.get('urls'))
_extend(extra.get('url'))
return merge_sequences(urls, case_sensitive=True)
return merge_sequences(url, case_sensitive=True)
def extract_relationships(result: Any) -> Optional[Dict[str, Any]]:
@@ -1272,3 +1513,248 @@ def extract_duration(result: Any) -> Optional[float]:
return float(duration)
except (TypeError, ValueError):
return None
def coerce_to_pipe_object(value: Any, default_path: Optional[str] = None) -> models.PipeObject:
"""Normalize any incoming result to a PipeObject for single-source-of-truth state.
Uses hash+store canonical pattern.
"""
# Debug: Print ResultItem details if coming from search_file.py
try:
from helper.logger import is_debug_enabled, debug
if is_debug_enabled() and hasattr(value, '__class__') and value.__class__.__name__ == 'ResultItem':
debug("[ResultItem -> PipeObject conversion]")
debug(f" origin={getattr(value, 'origin', None)}")
debug(f" title={getattr(value, 'title', None)}")
debug(f" target={getattr(value, 'target', None)}")
debug(f" hash_hex={getattr(value, 'hash_hex', None)}")
debug(f" media_kind={getattr(value, 'media_kind', None)}")
debug(f" tags={getattr(value, 'tags', None)}")
debug(f" tag_summary={getattr(value, 'tag_summary', None)}")
debug(f" size_bytes={getattr(value, 'size_bytes', None)}")
debug(f" duration_seconds={getattr(value, 'duration_seconds', None)}")
debug(f" relationships={getattr(value, 'relationships', None)}")
debug(f" url={getattr(value, 'url', None)}")
debug(f" full_metadata keys={list(getattr(value, 'full_metadata', {}).keys()) if hasattr(value, 'full_metadata') and value.full_metadata else []}")
except Exception:
pass
if isinstance(value, models.PipeObject):
return value
known_keys = {
"hash", "store", "tags", "title", "url", "source_url", "duration", "metadata",
"warnings", "path", "relationships", "is_temp", "action", "parent_hash",
}
# Convert ResultItem to dict to preserve all attributes
if hasattr(value, 'to_dict'):
value = value.to_dict()
if isinstance(value, dict):
# Extract hash and store (canonical identifiers)
hash_val = value.get("hash") or value.get("file_hash")
# Recognize multiple possible store naming conventions (store, origin, storage, storage_source)
store_val = value.get("store") or value.get("origin") or value.get("storage") or value.get("storage_source") or "PATH"
# If the store value is embedded under extra, also detect it
if not store_val or store_val in ("local", "PATH"):
extra_store = None
try:
extra_store = value.get("extra", {}).get("store") or value.get("extra", {}).get("storage") or value.get("extra", {}).get("storage_source")
except Exception:
extra_store = None
if extra_store:
store_val = extra_store
# If no hash, try to compute from path or use placeholder
if not hash_val:
path_val = value.get("path")
if path_val:
try:
from helper.utils import sha256_file
from pathlib import Path
hash_val = sha256_file(Path(path_val))
except Exception:
hash_val = "unknown"
else:
hash_val = "unknown"
# Extract title from filename if not provided
title_val = value.get("title")
if not title_val:
path_val = value.get("path")
if path_val:
try:
from pathlib import Path
title_val = Path(path_val).stem
except Exception:
pass
extra = {k: v for k, v in value.items() if k not in known_keys}
# Extract URL: prefer direct url field, then url list
url_val = value.get("url")
if not url_val:
url = value.get("url") or value.get("url") or []
if url and isinstance(url, list) and len(url) > 0:
url_val = url[0]
# Preserve the full URL list in extra if multiple URLs exist
if url and len(url) > 1:
extra["url"] = url
# Extract relationships
rels = value.get("relationships") or {}
# Consolidate tags: prefer tags_set over tags, tag_summary
tags_val = []
if "tags_set" in value and value["tags_set"]:
tags_val = list(value["tags_set"])
elif "tags" in value and isinstance(value["tags"], (list, set)):
tags_val = list(value["tags"])
elif "tag" in value:
# Single tag string or list
if isinstance(value["tag"], list):
tags_val = value["tag"] # Already a list
else:
tags_val = [value["tag"]] # Wrap single string in list
# Consolidate path: prefer explicit path key, but NOT target if it's a URL
path_val = value.get("path")
# Only use target as path if it's not a URL (url should stay in url field)
if not path_val and "target" in value:
target = value["target"]
if target and not (isinstance(target, str) and (target.startswith("http://") or target.startswith("https://"))):
path_val = target
# If the path value is actually a URL, move it to url_val and clear path_val
try:
if isinstance(path_val, str) and (path_val.startswith("http://") or path_val.startswith("https://")):
# Prefer existing url_val if present, otherwise move path_val into url_val
if not url_val:
url_val = path_val
path_val = None
except Exception:
pass
# Extract media_kind if available
if "media_kind" in value:
extra["media_kind"] = value["media_kind"]
pipe_obj = models.PipeObject(
hash=hash_val,
store=store_val,
tags=tags_val,
title=title_val,
url=url_val,
source_url=value.get("source_url"),
duration=value.get("duration") or value.get("duration_seconds"),
metadata=value.get("metadata") or value.get("full_metadata") or {},
warnings=list(value.get("warnings") or []),
path=path_val,
relationships=rels,
is_temp=bool(value.get("is_temp", False)),
action=value.get("action"),
parent_hash=value.get("parent_hash") or value.get("parent_id"),
extra=extra,
)
# Debug: Print formatted table
pipe_obj.debug_table()
return pipe_obj
# Fallback: build from path argument or bare value
hash_val = "unknown"
path_val = default_path or getattr(value, "path", None)
title_val = None
if path_val and path_val != "unknown":
try:
from helper.utils import sha256_file
from pathlib import Path
path_obj = Path(path_val)
hash_val = sha256_file(path_obj)
# Extract title from filename (without extension)
title_val = path_obj.stem
except Exception:
pass
# When coming from path argument, store should be "PATH" (file path, not a backend)
store_val = "PATH"
pipe_obj = models.PipeObject(
hash=hash_val,
store=store_val,
path=str(path_val) if path_val and path_val != "unknown" else None,
title=title_val,
tags=[],
extra={},
)
# Debug: Print formatted table
pipe_obj.debug_table()
return pipe_obj
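# Illustrative sketch (not part of the original module): a loose dict result is
# normalized onto the hash+store pattern. The keys and values are made up;
# "origin" is one of the store aliases recognized above.
def _example_coerce_to_pipe_object() -> None:
    obj = coerce_to_pipe_object({
        "file_hash": "cd" * 32,
        "origin": "hydrus",
        "title": "yapping",
    })
    assert obj.hash == "cd" * 32
    assert obj.store == "hydrus"
    assert obj.title == "yapping"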
def register_url_with_local_library(pipe_obj: models.PipeObject, config: Dict[str, Any]) -> bool:
"""Register url with a file in the local library database.
This is called automatically by download cmdlets to ensure url are persisted
without requiring a separate add-url step in the pipeline.
Args:
pipe_obj: PipeObject with path and url
config: Config dict containing local library path
Returns:
True if url were registered, False otherwise
"""
try:
from config import get_local_storage_path
from helper.folder_store import FolderDB
file_path = get_field(pipe_obj, "path")
url_field = get_field(pipe_obj, "url", [])
urls: List[str] = []
if isinstance(url_field, str):
urls = [u.strip() for u in url_field.split(",") if u.strip()]
elif isinstance(url_field, (list, tuple)):
urls = [u for u in url_field if isinstance(u, str) and u.strip()]
if not file_path or not urls:
return False
path_obj = Path(file_path)
if not path_obj.exists():
return False
storage_path = get_local_storage_path(config)
if not storage_path:
return False
with FolderDB(storage_path) as db:
file_hash = db.get_file_hash(path_obj)
if not file_hash:
return False
metadata = db.get_metadata(file_hash) or {}
existing_url = metadata.get("url") or []
# Add any new URLs
changed = False
for u in urls:
if u not in existing_url:
existing_url.append(u)
changed = True
if changed:
metadata["url"] = existing_url
db.save_metadata(path_obj, metadata)
return True
return True  # URLs already existed
except Exception:
return False
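# Illustrative sketch (not part of the original module): how a download cmdlet might
# call the helper above after producing a local file. pipe_obj and config are assumed
# to come from the surrounding pipeline; nothing here is a new API.
def _example_register_url(pipe_obj: models.PipeObject, config: Dict[str, Any]) -> None:
    if register_url_with_local_library(pipe_obj, config):
        log("Registered URL(s) with the local library")
    else:
        log("URL registration skipped (missing path, URL, or library database)")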

File diff suppressed because it is too large

View File

@@ -7,19 +7,19 @@ from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, normalize_hash, should_show_help
from helper.logger import log
CMDLET = Cmdlet(
name="add-note",
summary="Add or set a note on a Hydrus file.",
usage="add-note [-hash <sha256>] <name> <text>",
args=[
arg=[
CmdletArg("hash", type="string", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("name", type="string", required=True, description="The note name/key to set (e.g. 'comment', 'source', etc.)."),
CmdletArg("text", type="string", required=True, description="The note text/content to store.", variadic=True),
],
details=[
detail=[
"- Notes are stored in the 'my notes' service by default.",
],
)
@@ -28,12 +28,9 @@ CMDLET = Cmdlet(
@register(["add-note", "set-note", "add_note"]) # aliases
def add(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
from ._shared import parse_cmdlet_args
parsed = parse_cmdlet_args(args, CMDLET)

View File

@@ -14,20 +14,20 @@ from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, parse_cmdlet_args, normalize_result_input
from helper.local_library import read_sidecar, find_sidecar
from ._shared import Cmdlet, CmdletArg, parse_cmdlet_args, normalize_result_input, should_show_help, get_field
from helper.folder_store import read_sidecar, find_sidecar
CMDLET = Cmdlet(
name="add-relationship",
summary="Associate file relationships (king/alt/related) in Hydrus based on relationship tags in sidecar.",
usage="@1-3 | add-relationship -king @4 OR add-relationship -path <file> OR @1,@2,@3 | add-relationship",
args=[
arg=[
CmdletArg("path", type="string", description="Specify the local file path (if not piping a result)."),
CmdletArg("-king", type="string", description="Explicitly set the king hash/file for relationships (e.g., -king @4 or -king hash)"),
CmdletArg("-type", type="string", description="Relationship type for piped items (default: 'alt', options: 'king', 'alt', 'related')"),
],
details=[
detail=[
"- Mode 1: Pipe multiple items, first becomes king, rest become alts (default)",
"- Mode 2: Use -king to explicitly set which item/hash is the king: @1-3 | add-relationship -king @4",
"- Mode 3: Read relationships from sidecar (format: 'relationship: hash(king)<HASH>,hash(alt)<HASH>...')",
@@ -108,13 +108,11 @@ def _resolve_king_reference(king_arg: str) -> Optional[str]:
item = items[index]
# Try to extract hash from the item (could be dict or object)
item_hash = None
if isinstance(item, dict):
# Dictionary: try common hash field names
item_hash = item.get('hash_hex') or item.get('hash') or item.get('file_hash')
else:
# Object: use getattr
item_hash = getattr(item, 'hash_hex', None) or getattr(item, 'hash', None)
item_hash = (
get_field(item, 'hash_hex')
or get_field(item, 'hash')
or get_field(item, 'file_hash')
)
if item_hash:
normalized = _normalise_hash_hex(item_hash)
@@ -122,13 +120,11 @@ def _resolve_king_reference(king_arg: str) -> Optional[str]:
return normalized
# If no hash, try to get file path (for local storage)
file_path = None
if isinstance(item, dict):
# Dictionary: try common path field names
file_path = item.get('file_path') or item.get('path') or item.get('target')
else:
# Object: use getattr
file_path = getattr(item, 'file_path', None) or getattr(item, 'path', None) or getattr(item, 'target', None)
file_path = (
get_field(item, 'file_path')
or get_field(item, 'path')
or get_field(item, 'target')
)
if file_path:
return str(file_path)
@@ -199,12 +195,9 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
Returns 0 on success, non-zero on failure.
"""
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in _args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(_args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Parse arguments using CMDLET spec
parsed = parse_cmdlet_args(_args, CMDLET)
@@ -235,7 +228,7 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
items_to_process = [{"file_path": arg_path}]
# Import local storage utilities
from helper.local_library import LocalLibrarySearchOptimizer
from helper.folder_store import LocalLibrarySearchOptimizer
from config import get_local_storage_path
local_storage_path = get_local_storage_path(config) if config else None

567
cmdlets/add_tag.py Normal file
View File

@@ -0,0 +1,567 @@
from __future__ import annotations
from typing import Any, Dict, List, Sequence, Optional
from pathlib import Path
import sys
from helper.logger import log
import models
import pipeline as ctx
from ._shared import normalize_result_input, filter_results_by_temp
from helper import hydrus as hydrus_wrapper
from helper.folder_store import write_sidecar, FolderDB
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, parse_tag_arguments, expand_tag_groups, parse_cmdlet_args, collapse_namespace_tags, should_show_help, get_field
from config import get_local_storage_path
class Add_Tag(Cmdlet):
"""Class-based add-tag cmdlet with Cmdlet metadata inheritance."""
def __init__(self) -> None:
super().__init__(
name="add-tag",
summary="Add a tag to a Hydrus file or write it to a local .tags sidecar.",
usage="add-tag [-hash <sha256>] [-store <backend>] [-duplicate <format>] [-list <list>[,<list>...]] [--all] <tag>[,<tag>...]",
arg=[
SharedArgs.HASH,
SharedArgs.STORE,
CmdletArg("-duplicate", type="string", description="Copy existing tag values to new namespaces. Formats: title:album,artist (explicit) or title,album,artist (inferred)"),
CmdletArg("-list", type="string", description="Load predefined tag lists from adjective.json. Comma-separated list names (e.g., -list philosophy,occult)."),
CmdletArg("--all", type="flag", description="Include temporary files in tagging (by default, only tags non-temporary files)."),
CmdletArg("tags", type="string", required=False, description="One or more tags to add. Comma- or space-separated. Can also use {list_name} syntax. If omitted, uses tags from pipeline payload.", variadic=True),
],
detail=[
"- By default, only tags non-temporary files (from pipelines). Use --all to tag everything.",
"- Without -hash and when the selection is a local file, tags are written to <file>.tags.",
"- With a Hydrus hash, tags are sent to the 'my tags' service.",
"- Multiple tags can be comma-separated or space-separated.",
"- Use -list to include predefined tag lists from adjective.json: -list philosophy,occult",
"- Tags can also reference lists with curly braces: add-tag {philosophy} \"other:tag\"",
"- Use -duplicate to copy EXISTING tag values to new namespaces:",
" Explicit format: -duplicate title:album,artist (copies title: to album: and artist:)",
" Inferred format: -duplicate title,album,artist (first is source, rest are targets)",
"- The source namespace must already exist in the file being tagged.",
"- Target namespaces that already have a value are skipped (not overwritten).",
"- You can also pass the target hash as a tag token: hash:<sha256>. This overrides -hash and is removed from the tag list.",
],
exec=self.run,
)
self.register()
@staticmethod
def _extract_title_tag(tags: List[str]) -> Optional[str]:
"""Return the value of the first title: tag if present."""
for tag in tags:
if isinstance(tag, str) and tag.lower().startswith("title:"):
value = tag.split(":", 1)[1].strip()
if value:
return value
return None
@staticmethod
def _apply_title_to_result(res: Any, title_value: Optional[str]) -> None:
"""Update result object/dict title fields and columns in-place."""
if not title_value:
return
if isinstance(res, models.PipeObject):
res.title = title_value
if hasattr(res, "columns") and isinstance(res.columns, list) and res.columns:
label, *_ = res.columns[0]
if str(label).lower() == "title":
res.columns[0] = (res.columns[0][0], title_value)
elif isinstance(res, dict):
res["title"] = title_value
cols = res.get("columns")
if isinstance(cols, list):
updated = []
changed = False
for col in cols:
if isinstance(col, tuple) and len(col) == 2:
label, val = col
if str(label).lower() == "title":
updated.append((label, title_value))
changed = True
else:
updated.append(col)
else:
updated.append(col)
if changed:
res["columns"] = updated
@staticmethod
def _matches_target(item: Any, hydrus_hash: Optional[str], file_hash: Optional[str], file_path: Optional[str]) -> bool:
"""Determine whether a result item refers to the given hash/path target."""
hydrus_hash_l = hydrus_hash.lower() if hydrus_hash else None
file_hash_l = file_hash.lower() if file_hash else None
file_path_l = file_path.lower() if file_path else None
def norm(val: Any) -> Optional[str]:
return str(val).lower() if val is not None else None
hash_fields = ["hydrus_hash", "hash", "hash_hex", "file_hash"]
path_fields = ["path", "file_path", "target"]
if isinstance(item, dict):
hashes = [norm(item.get(field)) for field in hash_fields]
paths = [norm(item.get(field)) for field in path_fields]
else:
hashes = [norm(get_field(item, field)) for field in hash_fields]
paths = [norm(get_field(item, field)) for field in path_fields]
if hydrus_hash_l and hydrus_hash_l in hashes:
return True
if file_hash_l and file_hash_l in hashes:
return True
if file_path_l and file_path_l in paths:
return True
return False
@staticmethod
def _update_item_title_fields(item: Any, new_title: str) -> None:
"""Mutate an item to reflect a new title in plain fields and columns."""
if isinstance(item, models.PipeObject):
item.title = new_title
if hasattr(item, "columns") and isinstance(item.columns, list) and item.columns:
label, *_ = item.columns[0]
if str(label).lower() == "title":
item.columns[0] = (label, new_title)
elif isinstance(item, dict):
item["title"] = new_title
cols = item.get("columns")
if isinstance(cols, list):
updated_cols = []
changed = False
for col in cols:
if isinstance(col, tuple) and len(col) == 2:
label, val = col
if str(label).lower() == "title":
updated_cols.append((label, new_title))
changed = True
else:
updated_cols.append(col)
else:
updated_cols.append(col)
if changed:
item["columns"] = updated_cols
def _refresh_result_table_title(self, new_title: str, hydrus_hash: Optional[str], file_hash: Optional[str], file_path: Optional[str]) -> None:
"""Refresh the cached result table with an updated title and redisplay it."""
try:
last_table = ctx.get_last_result_table()
items = ctx.get_last_result_items()
if not last_table or not items:
return
updated_items = []
match_found = False
for item in items:
try:
if self._matches_target(item, hydrus_hash, file_hash, file_path):
self._update_item_title_fields(item, new_title)
match_found = True
except Exception:
pass
updated_items.append(item)
if not match_found:
return
from result_table import ResultTable # Local import to avoid circular dependency
new_table = last_table.copy_with_title(getattr(last_table, "title", ""))
for item in updated_items:
new_table.add_result(item)
ctx.set_last_result_table_overlay(new_table, updated_items)
except Exception:
pass
def _refresh_tags_view(self, res: Any, hydrus_hash: Optional[str], file_hash: Optional[str], file_path: Optional[str], config: Dict[str, Any]) -> None:
"""Refresh tag display via get-tag. Prefer current subject; fall back to direct hash refresh."""
try:
from cmdlets import get_tag as get_tag_cmd # type: ignore
except Exception:
return
target_hash = hydrus_hash or file_hash
refresh_args: List[str] = []
if target_hash:
refresh_args = ["-hash", target_hash, "-store", target_hash]
try:
subject = ctx.get_last_result_subject()
if subject and self._matches_target(subject, hydrus_hash, file_hash, file_path):
get_tag_cmd._run(subject, refresh_args, config)
return
except Exception:
pass
if target_hash:
try:
get_tag_cmd._run(res, refresh_args, config)
except Exception:
pass
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Add a tag to a file with smart filtering for pipeline results."""
if should_show_help(args):
log(f"Cmdlet: {self.name}\nSummary: {self.summary}\nUsage: {self.usage}")
return 0
parsed = parse_cmdlet_args(args, self)
# Check for --all flag
include_temp = parsed.get("all", False)
# Get explicit -hash and -store overrides from CLI
hash_override = normalize_hash(parsed.get("hash"))
store_override = parsed.get("store") or parsed.get("storage")
# Normalize input to list
results = normalize_result_input(result)
# If no piped results but we have -hash flag, create a minimal synthetic result
if not results and hash_override:
results = [{"hash": hash_override, "is_temp": False}]
if store_override:
results[0]["store"] = store_override
# Filter by temp status (unless --all is set)
if not include_temp:
results = filter_results_by_temp(results, include_temp=False)
if not results:
log("No valid files to tag (all results were temporary; use --all to include temporary files)", file=sys.stderr)
return 1
# Get tags from arguments (or fallback to pipeline payload)
raw_tags = parsed.get("tags", [])
if isinstance(raw_tags, str):
raw_tags = [raw_tags]
# Fallback: if no tags provided explicitly, try to pull from first result payload
if not raw_tags and results:
first = results[0]
payload_tags = None
# Try multiple tag lookup strategies in order
tag_lookups = [
lambda x: x.extra.get("tags") if isinstance(x, models.PipeObject) and isinstance(x.extra, dict) else None,
lambda x: x.get("tags") if isinstance(x, dict) else None,
lambda x: x.get("extra", {}).get("tags") if isinstance(x, dict) and isinstance(x.get("extra"), dict) else None,
lambda x: getattr(x, "tags", None),
]
for lookup in tag_lookups:
try:
payload_tags = lookup(first)
if payload_tags:
break
except (AttributeError, TypeError, KeyError):
continue
if payload_tags:
if isinstance(payload_tags, str):
raw_tags = [payload_tags]
elif isinstance(payload_tags, list):
raw_tags = payload_tags
# Handle -list argument (convert to {list} syntax)
list_arg = parsed.get("list")
if list_arg:
for l in list_arg.split(','):
l = l.strip()
if l:
raw_tags.append(f"{{{l}}}")
# Parse and expand tags
tags_to_add = parse_tag_arguments(raw_tags)
tags_to_add = expand_tag_groups(tags_to_add)
# Allow hash override via namespaced token (e.g., "hash:abcdef...")
extracted_hash = None
filtered_tags: List[str] = []
for tag in tags_to_add:
if isinstance(tag, str) and tag.lower().startswith("hash:"):
_, _, hash_val = tag.partition(":")
if hash_val:
extracted_hash = normalize_hash(hash_val.strip())
continue
filtered_tags.append(tag)
tags_to_add = filtered_tags
# A hash: token overrides -hash (see cmdlet detail above)
if extracted_hash:
hash_override = extracted_hash
if not tags_to_add:
log("No tags provided to add", file=sys.stderr)
return 1
def _find_library_root(path_obj: Path) -> Optional[Path]:
candidates = []
cfg_root = get_local_storage_path(config) if config else None
if cfg_root:
try:
candidates.append(Path(cfg_root).expanduser())
except Exception:
pass
try:
for candidate in candidates:
if (candidate / "medios-macina.db").exists():
return candidate
for parent in [path_obj] + list(path_obj.parents):
if (parent / "medios-macina.db").exists():
return parent
except Exception:
pass
return None
# Get other flags
duplicate_arg = parsed.get("duplicate")
if not tags_to_add and not duplicate_arg:
# Write sidecar files with the tags that are already in the result dicts
sidecar_count = 0
for res in results:
# Handle both dict and PipeObject formats
file_path = None
tags = []
file_hash = ""
# Use canonical field access with get_field for both dict and objects
file_path = get_field(res, "path")
# Try tags from top-level 'tags' or from 'extra.tags'
tags = get_field(res, "tags") or (get_field(res, "extra") or {}).get("tags", [])
file_hash = get_field(res, "hash") or get_field(res, "file_hash") or get_field(res, "hash_hex") or ""
if not file_path:
log(f"[add_tag] Warning: Result has no path, skipping", file=sys.stderr)
ctx.emit(res)
continue
if tags:
# Write sidecar file for this file with its tags
try:
sidecar_path = write_sidecar(Path(file_path), tags, [], file_hash)
log(f"[add_tag] Wrote {len(tags)} tag(s) to sidecar: {sidecar_path}", file=sys.stderr)
sidecar_count += 1
except Exception as e:
log(f"[add_tag] Warning: Failed to write sidecar for {file_path}: {e}", file=sys.stderr)
ctx.emit(res)
if sidecar_count > 0:
log(f"[add_tag] Wrote {sidecar_count} sidecar file(s) with embedded tags", file=sys.stderr)
else:
log(f"[add_tag] No tags to write - passed {len(results)} result(s) through unchanged", file=sys.stderr)
return 0
# Main loop: process results with tags to add
total_new_tags = 0
total_modified = 0
for res in results:
# Extract file info from result
file_path = None
existing_tags = []
file_hash = ""
storage_source = None
# Use canonical getters for fields from both dicts and PipeObject
file_path = get_field(res, "path")
existing_tags = get_field(res, "tags") or []
if not existing_tags:
existing_tags = (get_field(res, "extra", {}) or {}).get("tags") or []
file_hash = get_field(res, "hash") or get_field(res, "file_hash") or get_field(res, "hash_hex") or ""
storage_source = get_field(res, "store") or get_field(res, "storage") or get_field(res, "storage_source") or get_field(res, "origin")
hydrus_hash = get_field(res, "hydrus_hash") or file_hash
# Infer storage source from result if not found
if not storage_source:
if file_path:
storage_source = 'local'
elif file_hash and file_hash != "unknown":
storage_source = 'hydrus'
original_tags_lower = {str(t).lower() for t in existing_tags if isinstance(t, str)}
original_title = self._extract_title_tag(list(existing_tags))
# Apply CLI overrides if provided
if hash_override and not file_hash:
file_hash = hash_override
if store_override and not storage_source:
storage_source = store_override
# Check if we have sufficient identifier (file_path OR file_hash)
if not file_path and not file_hash:
log(f"[add_tag] Warning: Result has neither path nor hash available, skipping", file=sys.stderr)
ctx.emit(res)
continue
# Handle -duplicate logic (copy existing tags to new namespaces)
if duplicate_arg:
# Parse duplicate format: source:target1,target2 or source,target1,target2
parts = duplicate_arg.split(':')
source_ns = ""
targets = []
if len(parts) > 1:
# Explicit format: source:target1,target2
source_ns = parts[0]
targets = parts[1].split(',')
else:
# Inferred format: source,target1,target2
parts = duplicate_arg.split(',')
if len(parts) > 1:
source_ns = parts[0]
targets = parts[1:]
if source_ns and targets:
# Find tags in source namespace
source_tags = [t for t in existing_tags if t.startswith(source_ns + ':')]
for t in source_tags:
value = t.split(':', 1)[1]
for target_ns in targets:
new_tag = f"{target_ns}:{value}"
if new_tag not in existing_tags and new_tag not in tags_to_add:
tags_to_add.append(new_tag)
# Initialize tag mutation tracking local variables
removed_tags = []
new_tags_added = []
final_tags = list(existing_tags) if existing_tags else []
# Determine where to add tags: Hydrus or Folder storage
if storage_source and storage_source.lower() == 'hydrus':
# Add tags to Hydrus using the API
target_hash = file_hash
if target_hash:
try:
hydrus_client = hydrus_wrapper.get_client(config)
service_name = hydrus_wrapper.get_tag_service_name(config)
# For namespaced tags, remove old tags in same namespace
removed_tags = []
for new_tag in tags_to_add:
if ':' in new_tag:
namespace = new_tag.split(':', 1)[0]
to_remove = [t for t in existing_tags if t.startswith(namespace + ':') and t.lower() != new_tag.lower()]
removed_tags.extend(to_remove)
# Add new tags
if tags_to_add:
log(f"[add_tag] Adding {len(tags_to_add)} tag(s) to Hydrus file: {target_hash}", file=sys.stderr)
hydrus_client.add_tags(target_hash, tags_to_add, service_name)
# Delete replaced namespace tags
if removed_tags:
unique_removed = sorted(set(removed_tags))
hydrus_client.delete_tags(target_hash, unique_removed, service_name)
if tags_to_add or removed_tags:
total_new_tags += len(tags_to_add)
total_modified += 1
log(f"[add_tag] ✓ Added {len(tags_to_add)} tag(s) to Hydrus", file=sys.stderr)
# Refresh final tag list from the backend for accurate display
try:
from helper.store import FileStorage
storage = FileStorage(config)
if storage and storage_source in storage.list_backends():
backend = storage[storage_source]
refreshed_tags, _ = backend.get_tag(target_hash)
if refreshed_tags is not None:
final_tags = refreshed_tags
new_tags_added = [t for t in refreshed_tags if t.lower() not in original_tags_lower]
# Update result tags for downstream cmdlets/UI
if isinstance(res, models.PipeObject):
res.tags = refreshed_tags
if isinstance(res.extra, dict):
res.extra['tags'] = refreshed_tags
elif isinstance(res, dict):
res['tags'] = refreshed_tags
except Exception:
# Ignore failures - this is best-effort for refreshing tag state
pass
except Exception as e:
log(f"[add_tag] Warning: Failed to add tags to Hydrus: {e}", file=sys.stderr)
else:
log(f"[add_tag] Warning: No hash available for Hydrus file, skipping", file=sys.stderr)
elif storage_source:
# For any Folder-based storage (local, test, default, etc.), delegate to backend
# If storage_source is not a registered backend, fallback to writing a sidecar
from helper.store import FileStorage
storage = FileStorage(config)
try:
if storage and storage_source in storage.list_backends():
backend = storage[storage_source]
if file_hash and backend.add_tag(file_hash, tags_to_add):
# Refresh tags from backend to get merged result
refreshed_tags, _ = backend.get_tag(file_hash)
if refreshed_tags:
# Update result tags
if isinstance(res, models.PipeObject):
res.tags = refreshed_tags
# Also keep as extra for compatibility
if isinstance(res.extra, dict):
res.extra['tags'] = refreshed_tags
elif isinstance(res, dict):
res['tags'] = refreshed_tags
# Update title if changed
title_value = self._extract_title_tag(refreshed_tags)
self._apply_title_to_result(res, title_value)
# Compute stats
new_tags_added = [t for t in refreshed_tags if t.lower() not in original_tags_lower]
total_new_tags += len(new_tags_added)
if new_tags_added:
total_modified += 1
log(f"[add_tag] Added {len(new_tags_added)} new tag(s); {len(refreshed_tags)} total tag(s) stored in {storage_source}", file=sys.stderr)
final_tags = refreshed_tags
else:
log(f"[add_tag] Warning: Failed to add tags to {storage_source}", file=sys.stderr)
else:
# Not a registered backend - fallback to sidecar if we have a path
if file_path:
try:
sidecar_path = write_sidecar(Path(file_path), tags_to_add, [], file_hash)
log(f"[add_tag] Wrote {len(tags_to_add)} tag(s) to sidecar: {sidecar_path}", file=sys.stderr)
total_new_tags += len(tags_to_add)
total_modified += 1
# Update res tags
if isinstance(res, models.PipeObject):
res.tags = (res.tags or []) + tags_to_add
if isinstance(res.extra, dict):
res.extra['tags'] = res.tags
elif isinstance(res, dict):
res['tags'] = list(set((res.get('tags') or []) + tags_to_add))
except Exception as exc:
log(f"[add_tag] Warning: Failed to write sidecar for {file_path}: {exc}", file=sys.stderr)
else:
log(f"[add_tag] Warning: Storage backend '{storage_source}' not found in config", file=sys.stderr)
except KeyError:
# storage[storage_source] raised KeyError - treat as absent backend
if file_path:
try:
sidecar_path = write_sidecar(Path(file_path), tags_to_add, [], file_hash)
log(f"[add_tag] Wrote {len(tags_to_add)} tag(s) to sidecar: {sidecar_path}", file=sys.stderr)
total_new_tags += len(tags_to_add)
total_modified += 1
# Update res tags for downstream
if isinstance(res, models.PipeObject):
res.tags = (res.tags or []) + tags_to_add
if isinstance(res.extra, dict):
res.extra['tags'] = res.tags
elif isinstance(res, dict):
res['tags'] = list(set((res.get('tags') or []) + tags_to_add))
except Exception as exc:
log(f"[add_tag] Warning: Failed to write sidecar for {file_path}: {exc}", file=sys.stderr)
else:
log(f"[add_tag] Warning: Storage backend '{storage_source}' not found in config", file=sys.stderr)
else:
# For other storage types or unknown sources, avoid writing sidecars to reduce clutter
# (local/hydrus are handled above).
ctx.emit(res)
continue
# If title changed, refresh the cached result table so the display reflects the new name
final_title = self._extract_title_tag(final_tags)
if final_title and (not original_title or final_title.lower() != original_title.lower()):
self._refresh_result_table_title(final_title, hydrus_hash or file_hash, file_hash, file_path)
# If tags changed, refresh tag view via get-tag (prefer current subject; fall back to hash refresh)
if new_tags_added or removed_tags:
self._refresh_tags_view(res, hydrus_hash, file_hash, file_path, config)
# Emit the modified result
ctx.emit(res)
log(f"[add_tag] Added {total_new_tags} new tag(s) across {len(results)} item(s); modified {total_modified} item(s)", file=sys.stderr)
return 0
CMDLET = Add_Tag()
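# Illustrative sketch (not part of the original file): the -duplicate expansion that the
# cmdlet detail above describes, mirroring the parsing in run(). Tags are made up.
# CLI equivalent (hypothetical selection): @1 | add-tag -duplicate title:album
def _example_duplicate_expansion() -> None:
    existing_tags = ["title:Blue in Green", "artist:Miles Davis"]
    duplicate_arg = "title:album"  # explicit form: source:target1,target2
    source_ns, targets = duplicate_arg.split(":", 1)[0], duplicate_arg.split(":", 1)[1].split(",")
    new_tags = [
        f"{target}:{tag.split(':', 1)[1]}"
        for tag in existing_tags if tag.startswith(source_ns + ":")
        for target in targets
    ]
    assert new_tags == ["album:Blue in Green"]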

View File

@@ -1,20 +1,18 @@
from __future__ import annotations
from typing import Any, Dict, List, Sequence, Optional
import json
from pathlib import Path
import sys
from helper.logger import log
from . import register
import models
import pipeline as ctx
from ._shared import normalize_result_input, filter_results_by_temp
from helper import hydrus as hydrus_wrapper
from helper.local_library import read_sidecar, write_sidecar, find_sidecar, has_sidecar, LocalLibraryDB
from helper.folder_store import read_sidecar, write_sidecar, find_sidecar, has_sidecar, FolderDB
from metadata import rename
from ._shared import Cmdlet, CmdletArg, normalize_hash, parse_tag_arguments, expand_tag_groups, parse_cmdlet_args, collapse_namespace_tags
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, parse_tag_arguments, expand_tag_groups, parse_cmdlet_args, collapse_namespace_tags, should_show_help, get_field
from config import get_local_storage_path
@@ -68,29 +66,16 @@ def _matches_target(item: Any, hydrus_hash: Optional[str], file_hash: Optional[s
def norm(val: Any) -> Optional[str]:
return str(val).lower() if val is not None else None
# Define field names to check for hashes and paths
hash_fields = ["hydrus_hash", "hash", "hash_hex", "file_hash"]
path_fields = ["path", "file_path", "target"]
if isinstance(item, dict):
hashes = [
norm(item.get("hydrus_hash")),
norm(item.get("hash")),
norm(item.get("hash_hex")),
norm(item.get("file_hash")),
]
paths = [
norm(item.get("path")),
norm(item.get("file_path")),
norm(item.get("target")),
]
hashes = [norm(item.get(field)) for field in hash_fields]
paths = [norm(item.get(field)) for field in path_fields]
else:
hashes = [
norm(getattr(item, "hydrus_hash", None)),
norm(getattr(item, "hash_hex", None)),
norm(getattr(item, "file_hash", None)),
]
paths = [
norm(getattr(item, "path", None)),
norm(getattr(item, "file_path", None)),
norm(getattr(item, "target", None)),
]
hashes = [norm(get_field(item, field)) for field in hash_fields]
paths = [norm(get_field(item, field)) for field in path_fields]
if hydrus_hash_l and hydrus_hash_l in hashes:
return True
@@ -147,20 +132,18 @@ def _refresh_result_table_title(new_title: str, hydrus_hash: Optional[str], file
except Exception:
pass
updated_items.append(item)
if not match_found:
return
from result_table import ResultTable # Local import to avoid circular dependency
new_table = ResultTable(getattr(last_table, "title", ""), title_width=getattr(last_table, "title_width", 80), max_columns=getattr(last_table, "max_columns", None))
if getattr(last_table, "source_command", None):
new_table.set_source_command(last_table.source_command, getattr(last_table, "source_args", []))
new_table = last_table.copy_with_title(getattr(last_table, "title", ""))
for item in updated_items:
new_table.add_result(item)
ctx.set_last_result_table_preserve_history(new_table, updated_items)
# Keep the underlying history intact; update only the overlay so @.. can
# clear the overlay then continue back to prior tables (e.g., the search list).
ctx.set_last_result_table_overlay(new_table, updated_items)
except Exception:
pass
@@ -194,347 +177,409 @@ def _refresh_tags_view(res: Any, hydrus_hash: Optional[str], file_hash: Optional
class Add_Tag(Cmdlet):
"""Class-based add-tags cmdlet with Cmdlet metadata inheritance."""
@register(["add-tag", "add-tags"])
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Add tags to a file with smart filtering for pipeline results."""
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
def __init__(self) -> None:
super().__init__(
name="add-tags",
summary="Add tags to a Hydrus file or write them to a local .tags sidecar.",
usage="add-tags [-hash <sha256>] [-duplicate <format>] [-list <list>[,<list>...]] [--all] <tag>[,<tag>...]",
arg=[
SharedArgs.HASH,
CmdletArg("-duplicate", type="string", description="Copy existing tag values to new namespaces. Formats: title:album,artist (explicit) or title,album,artist (inferred)"),
CmdletArg("-list", type="string", description="Load predefined tag lists from adjective.json. Comma-separated list names (e.g., -list philosophy,occult)."),
CmdletArg("--all", type="flag", description="Include temporary files in tagging (by default, only tags non-temporary files)."),
CmdletArg("tags", type="string", required=False, description="One or more tags to add. Comma- or space-separated. Can also use {list_name} syntax. If omitted, uses tags from pipeline payload.", variadic=True),
],
detail=[
"- By default, only tags non-temporary files (from pipelines). Use --all to tag everything.",
"- Without -hash and when the selection is a local file, tags are written to <file>.tags.",
"- With a Hydrus hash, tags are sent to the 'my tags' service.",
"- Multiple tags can be comma-separated or space-separated.",
"- Use -list to include predefined tag lists from adjective.json: -list philosophy,occult",
"- Tags can also reference lists with curly braces: add-tag {philosophy} \"other:tag\"",
"- Use -duplicate to copy EXISTING tag values to new namespaces:",
" Explicit format: -duplicate title:album,artist (copies title: to album: and artist:)",
" Inferred format: -duplicate title,album,artist (first is source, rest are targets)",
"- The source namespace must already exist in the file being tagged.",
"- Target namespaces that already have a value are skipped (not overwritten).",
"- You can also pass the target hash as a tag token: hash:<sha256>. This overrides -hash and is removed from the tag list.",
],
exec=self.run,
)
self.register()
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Add tags to a file with smart filtering for pipeline results."""
if should_show_help(args):
log(f"Cmdlet: {self.name}\nSummary: {self.summary}\nUsage: {self.usage}")
return 0
except Exception:
pass
# Parse arguments
parsed = parse_cmdlet_args(args, CMDLET)
# Check for --all flag
include_temp = parsed.get("all", False)
# Normalize input to list
results = normalize_result_input(result)
# Filter by temp status (unless --all is set)
if not include_temp:
results = filter_results_by_temp(results, include_temp=False)
if not results:
log("No valid files to tag (all results were temporary; use --all to include temporary files)", file=sys.stderr)
return 1
# Get tags from arguments (or fallback to pipeline payload)
raw_tags = parsed.get("tags", [])
if isinstance(raw_tags, str):
raw_tags = [raw_tags]
# Fallback: if no tags provided explicitly, try to pull from first result payload
if not raw_tags and results:
first = results[0]
payload_tags = None
if isinstance(first, models.PipeObject):
payload_tags = first.extra.get("tags") if isinstance(first.extra, dict) else None
elif isinstance(first, dict):
payload_tags = first.get("tags")
if not payload_tags:
payload_tags = first.get("extra", {}).get("tags") if isinstance(first.get("extra"), dict) else None
# If metadata payload stored tags under nested list, accept directly
if payload_tags is None:
payload_tags = getattr(first, "tags", None)
if payload_tags:
if isinstance(payload_tags, str):
raw_tags = [payload_tags]
elif isinstance(payload_tags, list):
raw_tags = payload_tags
# Handle -list argument (convert to {list} syntax)
list_arg = parsed.get("list")
if list_arg:
for l in list_arg.split(','):
l = l.strip()
if l:
raw_tags.append(f"{{{l}}}")
# Parse and expand tags
tags_to_add = parse_tag_arguments(raw_tags)
tags_to_add = expand_tag_groups(tags_to_add)
if not tags_to_add:
log("No tags provided to add", file=sys.stderr)
return 1
# Get other flags
hash_override = normalize_hash(parsed.get("hash"))
duplicate_arg = parsed.get("duplicate")
# If no tags provided (and no list), write sidecar files with embedded tags
# Note: Since 'tags' is required=True in CMDLET, this block might be unreachable via CLI
# unless called programmatically or if required check is bypassed.
if not tags_to_add and not duplicate_arg:
# Write sidecar files with the tags that are already in the result dicts
# Parse arguments
parsed = parse_cmdlet_args(args, self)
# Check for --all flag
include_temp = parsed.get("all", False)
# Normalize input to list
results = normalize_result_input(result)
# Filter by temp status (unless --all is set)
if not include_temp:
results = filter_results_by_temp(results, include_temp=False)
if not results:
log("No valid files to tag (all results were temporary; use --all to include temporary files)", file=sys.stderr)
return 1
# Get tags from arguments (or fallback to pipeline payload)
raw_tags = parsed.get("tags", [])
if isinstance(raw_tags, str):
raw_tags = [raw_tags]
# Fallback: if no tags provided explicitly, try to pull from first result payload
if not raw_tags and results:
first = results[0]
payload_tags = None
# Try multiple tag lookup strategies in order
tag_lookups = [
lambda x: x.extra.get("tags") if isinstance(x, models.PipeObject) and isinstance(x.extra, dict) else None,
lambda x: x.get("tags") if isinstance(x, dict) else None,
lambda x: x.get("extra", {}).get("tags") if isinstance(x, dict) and isinstance(x.get("extra"), dict) else None,
lambda x: getattr(x, "tags", None),
]
for lookup in tag_lookups:
try:
payload_tags = lookup(first)
if payload_tags:
break
except (AttributeError, TypeError, KeyError):
continue
if payload_tags:
if isinstance(payload_tags, str):
raw_tags = [payload_tags]
elif isinstance(payload_tags, list):
raw_tags = payload_tags
# Handle -list argument (convert to {list} syntax)
list_arg = parsed.get("list")
if list_arg:
for l in list_arg.split(','):
l = l.strip()
if l:
raw_tags.append(f"{{{l}}}")
# Parse and expand tags
tags_to_add = parse_tag_arguments(raw_tags)
tags_to_add = expand_tag_groups(tags_to_add)
# Allow hash override via namespaced token (e.g., "hash:abcdef...")
extracted_hash = None
filtered_tags: List[str] = []
for tag in tags_to_add:
if isinstance(tag, str) and tag.lower().startswith("hash:"):
_, _, hash_val = tag.partition(":")
if hash_val:
extracted_hash = normalize_hash(hash_val.strip())
continue
filtered_tags.append(tag)
tags_to_add = filtered_tags
if not tags_to_add:
log("No tags provided to add", file=sys.stderr)
return 1
# Get other flags (hash override can come from -hash or hash: token)
hash_override = normalize_hash(parsed.get("hash")) or extracted_hash
duplicate_arg = parsed.get("duplicate")
# If no tags provided (and no list), write sidecar files with embedded tags
# Note: Since 'tags' is required=False in the cmdlet arg, this block can be reached via CLI
# when no tag arguments are provided.
if not tags_to_add and not duplicate_arg:
# Write sidecar files with the tags that are already in the result dicts
sidecar_count = 0
for res in results:
# Handle both dict and PipeObject formats
file_path = None
tags = []
file_hash = ""
if isinstance(res, models.PipeObject):
file_path = res.file_path
tags = res.extra.get('tags', [])
file_hash = res.hash or ""
elif isinstance(res, dict):
file_path = res.get('file_path')
# Try multiple tag locations in order
tag_sources = [lambda: res.get('tags', []), lambda: res.get('extra', {}).get('tags', [])]
for source in tag_sources:
tags = source()
if tags:
break
file_hash = res.get('hash', "")
if not file_path:
log(f"[add_tags] Warning: Result has no file_path, skipping", file=sys.stderr)
ctx.emit(res)
continue
if tags:
# Write sidecar file for this file with its tags
try:
sidecar_path = write_sidecar(Path(file_path), tags, [], file_hash)
log(f"[add_tags] Wrote {len(tags)} tag(s) to sidecar: {sidecar_path}", file=sys.stderr)
sidecar_count += 1
except Exception as e:
log(f"[add_tags] Warning: Failed to write sidecar for {file_path}: {e}", file=sys.stderr)
ctx.emit(res)
if sidecar_count > 0:
log(f"[add_tags] Wrote {sidecar_count} sidecar file(s) with embedded tags", file=sys.stderr)
else:
log(f"[add_tags] No tags to write - passed {len(results)} result(s) through unchanged", file=sys.stderr)
return 0
# Tags ARE provided - append them to each result and write sidecar files or add to Hydrus
sidecar_count = 0
total_new_tags = 0
total_modified = 0
for res in results:
# Handle both dict and PipeObject formats
file_path = None
tags = []
existing_tags = []
file_hash = ""
storage_source = None
hydrus_hash = None
# Define field name aliases to check
path_field_names = ['file_path', 'path']
source_field_names = ['storage_source', 'source', 'origin']
hash_field_names = ['hydrus_hash', 'hash', 'hash_hex']
if isinstance(res, models.PipeObject):
file_path = res.file_path
tags = res.extra.get('tags', [])
existing_tags = res.extra.get('tags', [])
file_hash = res.file_hash or ""
for field in source_field_names:
storage_source = res.extra.get(field)
if storage_source:
break
hydrus_hash = res.extra.get('hydrus_hash')
elif isinstance(res, dict):
file_path = res.get('file_path')
tags = res.get('tags', []) # Check both tags and extra['tags']
if not tags and 'extra' in res:
tags = res['extra'].get('tags', [])
# Try path field names in order
for field in path_field_names:
file_path = res.get(field)
if file_path:
break
# Try tag locations in order
tag_sources = [lambda: res.get('tags', []), lambda: res.get('extra', {}).get('tags', [])]
for source in tag_sources:
existing_tags = source()
if existing_tags:
break
file_hash = res.get('file_hash', "")
if not file_path:
log(f"[add_tags] Warning: Result has no file_path, skipping", file=sys.stderr)
# Try source field names in order (top-level then extra)
for field in source_field_names:
storage_source = res.get(field)
if storage_source:
break
if not storage_source and 'extra' in res:
for field in source_field_names:
storage_source = res.get('extra', {}).get(field)
if storage_source:
break
# Try hash field names in order (top-level then extra)
for field in hash_field_names:
hydrus_hash = res.get(field)
if hydrus_hash:
break
if not hydrus_hash and 'extra' in res:
for field in hash_field_names:
hydrus_hash = res.get('extra', {}).get(field)
if hydrus_hash:
break
if not hydrus_hash and file_hash:
hydrus_hash = file_hash
if not storage_source and hydrus_hash and not file_path:
storage_source = 'hydrus'
# If we have a file path but no storage source, assume local to avoid sidecar spam
if not storage_source and file_path:
storage_source = 'local'
else:
ctx.emit(res)
continue
if tags:
# Write sidecar file for this file with its tags
try:
sidecar_path = write_sidecar(Path(file_path), tags, [], file_hash)
log(f"[add_tags] Wrote {len(tags)} tag(s) to sidecar: {sidecar_path}", file=sys.stderr)
sidecar_count += 1
except Exception as e:
log(f"[add_tags] Warning: Failed to write sidecar for {file_path}: {e}", file=sys.stderr)
ctx.emit(res)
if sidecar_count > 0:
log(f"[add_tags] Wrote {sidecar_count} sidecar file(s) with embedded tags", file=sys.stderr)
else:
log(f"[add_tags] No tags to write - passed {len(results)} result(s) through unchanged", file=sys.stderr)
return 0
# Tags ARE provided - append them to each result and write sidecar files or add to Hydrus
sidecar_count = 0
total_new_tags = 0
total_modified = 0
for res in results:
# Handle both dict and PipeObject formats
file_path = None
existing_tags = []
file_hash = ""
storage_source = None
hydrus_hash = None
if isinstance(res, models.PipeObject):
file_path = res.file_path
existing_tags = res.extra.get('tags', [])
file_hash = res.file_hash or ""
storage_source = res.extra.get('storage_source') or res.extra.get('source')
hydrus_hash = res.extra.get('hydrus_hash')
elif isinstance(res, dict):
file_path = res.get('file_path') or res.get('path')
existing_tags = res.get('tags', [])
if not existing_tags and 'extra' in res:
existing_tags = res['extra'].get('tags', [])
file_hash = res.get('file_hash', "")
storage_source = res.get('storage_source') or res.get('source') or res.get('origin')
if not storage_source and 'extra' in res:
storage_source = res['extra'].get('storage_source') or res['extra'].get('source')
# For Hydrus results from search-file, look for hash, hash_hex, or target (all contain the hash)
hydrus_hash = res.get('hydrus_hash') or res.get('hash') or res.get('hash_hex')
if not hydrus_hash and 'extra' in res:
hydrus_hash = res['extra'].get('hydrus_hash') or res['extra'].get('hash') or res['extra'].get('hash_hex')
if not hydrus_hash and file_hash:
hydrus_hash = file_hash
if not storage_source and hydrus_hash and not file_path:
storage_source = 'hydrus'
# If we have a file path but no storage source, assume local to avoid sidecar spam
if not storage_source and file_path:
storage_source = 'local'
else:
ctx.emit(res)
continue
original_tags_lower = {str(t).lower() for t in existing_tags if isinstance(t, str)}
original_tags_snapshot = list(existing_tags)
original_title = _extract_title_tag(original_tags_snapshot)
removed_tags: List[str] = []
# Apply hash override if provided
if hash_override:
hydrus_hash = hash_override
# If we have a hash override, we treat it as a Hydrus target
storage_source = "hydrus"
if not file_path and not hydrus_hash:
log(f"[add_tags] Warning: Result has neither file_path nor hash available, skipping", file=sys.stderr)
ctx.emit(res)
continue
# Handle -duplicate logic (copy existing tags to new namespaces)
if duplicate_arg:
# Parse duplicate format: source:target1,target2 or source,target1,target2
parts = duplicate_arg.split(':')
source_ns = ""
targets = []
if len(parts) > 1:
# Explicit format: source:target1,target2
source_ns = parts[0]
targets = parts[1].split(',')
else:
# Inferred format: source,target1,target2
parts = duplicate_arg.split(',')
original_tags_lower = {str(t).lower() for t in existing_tags if isinstance(t, str)}
original_tags_snapshot = list(existing_tags)
original_title = _extract_title_tag(original_tags_snapshot)
removed_tags: List[str] = []
# Apply hash override if provided
if hash_override:
hydrus_hash = hash_override
# If we have a hash override, we treat it as a Hydrus target
storage_source = "hydrus"
if not file_path and not hydrus_hash:
log(f"[add_tags] Warning: Result has neither file_path nor hash available, skipping", file=sys.stderr)
ctx.emit(res)
continue
# Handle -duplicate logic (copy existing tags to new namespaces)
if duplicate_arg:
# Parse duplicate format: source:target1,target2 or source,target1,target2
parts = duplicate_arg.split(':')
source_ns = ""
targets = []
if len(parts) > 1:
# Explicit format: source:target1,target2
source_ns = parts[0]
targets = parts[1].split(',')
if source_ns and targets:
# Find tags in source namespace
source_tags = [t for t in existing_tags if t.startswith(source_ns + ':')]
for t in source_tags:
value = t.split(':', 1)[1]
for target_ns in targets:
new_tag = f"{target_ns}:{value}"
if new_tag not in existing_tags and new_tag not in tags_to_add:
tags_to_add.append(new_tag)
# Merge new tags with existing tags, handling namespace overwrites
# When adding a tag like "namespace:value", remove any existing "namespace:*" tags
for new_tag in tags_to_add:
# Check if this is a namespaced tag (format: "namespace:value")
if ':' in new_tag:
namespace = new_tag.split(':', 1)[0]
# Track removals for Hydrus: delete old tags in same namespace (except identical)
to_remove = [t for t in existing_tags if t.startswith(namespace + ':') and t.lower() != new_tag.lower()]
removed_tags.extend(to_remove)
# Remove any existing tags with the same namespace
existing_tags = [t for t in existing_tags if not (t.startswith(namespace + ':'))]
# Add the new tag if not already present
if new_tag not in existing_tags:
existing_tags.append(new_tag)
# Ensure only one tag per namespace (e.g., single title:) with latest preferred
existing_tags = collapse_namespace_tags(existing_tags, "title", prefer="last")
# Compute new tags relative to original
new_tags_added = [t for t in existing_tags if isinstance(t, str) and t.lower() not in original_tags_lower]
total_new_tags += len(new_tags_added)
# Update the result's tags
if isinstance(res, models.PipeObject):
res.extra['tags'] = existing_tags
elif isinstance(res, dict):
res['tags'] = existing_tags
# If a title: tag was added, update the in-memory title and columns so downstream display reflects it immediately
title_value = _extract_title_tag(existing_tags)
_apply_title_to_result(res, title_value)
final_tags = existing_tags
# Determine where to add tags: Hydrus, local DB, or sidecar
if storage_source and storage_source.lower() == 'hydrus':
# Add tags to Hydrus using the API
target_hash = hydrus_hash or file_hash
if target_hash:
try:
tags_to_send = [t for t in existing_tags if isinstance(t, str) and t.lower() not in original_tags_lower]
hydrus_client = hydrus_wrapper.get_client(config)
service_name = hydrus_wrapper.get_tag_service_name(config)
if tags_to_send:
log(f"[add_tags] Adding {len(tags_to_send)} new tag(s) to Hydrus file: {target_hash}", file=sys.stderr)
hydrus_client.add_tags(target_hash, tags_to_send, service_name)
else:
log(f"[add_tags] No new tags to add for Hydrus file: {target_hash}", file=sys.stderr)
# Delete old namespace tags we replaced (e.g., previous title:)
if removed_tags:
unique_removed = sorted(set(removed_tags))
hydrus_client.delete_tags(target_hash, unique_removed, service_name)
if tags_to_send:
log(f"[add_tags] ✓ Tags added to Hydrus", file=sys.stderr)
elif removed_tags:
log(f"[add_tags] ✓ Removed {len(unique_removed)} tag(s) from Hydrus", file=sys.stderr)
sidecar_count += 1
if tags_to_send or removed_tags:
total_modified += 1
except Exception as e:
log(f"[add_tags] Warning: Failed to add tags to Hydrus: {e}", file=sys.stderr)
else:
log(f"[add_tags] Warning: No hash available for Hydrus file, skipping", file=sys.stderr)
elif storage_source and storage_source.lower() == 'local':
# For local storage, save directly to DB (no sidecar needed)
if file_path:
library_root = get_local_storage_path(config)
if library_root:
try:
path_obj = Path(file_path)
with LocalLibraryDB(library_root) as db:
db.save_tags(path_obj, existing_tags)
# Reload tags to reflect DB state (preserves auto-title logic)
refreshed_tags = db.get_tags(path_obj) or existing_tags
# Recompute title from refreshed tags for accurate display
refreshed_title = _extract_title_tag(refreshed_tags)
if refreshed_title:
_apply_title_to_result(res, refreshed_title)
res_tags = refreshed_tags or existing_tags
if isinstance(res, models.PipeObject):
res.extra['tags'] = res_tags
elif isinstance(res, dict):
res['tags'] = res_tags
log(f"[add_tags] Added {len(new_tags_added)} new tag(s); {len(res_tags)} total tag(s) stored locally", file=sys.stderr)
sidecar_count += 1
if new_tags_added or removed_tags:
total_modified += 1
final_tags = res_tags
except Exception as e:
log(f"[add_tags] Warning: Failed to save tags to local DB: {e}", file=sys.stderr)
targets = parts[1].split(',')
else:
log(f"[add_tags] Warning: No library root configured for local storage, skipping", file=sys.stderr)
# Inferred format: source,target1,target2
parts = duplicate_arg.split(',')
if len(parts) > 1:
source_ns = parts[0]
targets = parts[1:]
if source_ns and targets:
# Find tags in source namespace
source_tags = [t for t in existing_tags if t.startswith(source_ns + ':')]
for t in source_tags:
value = t.split(':', 1)[1]
for target_ns in targets:
new_tag = f"{target_ns}:{value}"
if new_tag not in existing_tags and new_tag not in tags_to_add:
tags_to_add.append(new_tag)
# Merge new tags with existing tags, handling namespace overwrites
# When adding a tag like "namespace:value", remove any existing "namespace:*" tags
for new_tag in tags_to_add:
# Check if this is a namespaced tag (format: "namespace:value")
if ':' in new_tag:
namespace = new_tag.split(':', 1)[0]
# Track removals for Hydrus: delete old tags in same namespace (except identical)
to_remove = [t for t in existing_tags if t.startswith(namespace + ':') and t.lower() != new_tag.lower()]
removed_tags.extend(to_remove)
# Remove any existing tags with the same namespace
existing_tags = [t for t in existing_tags if not (t.startswith(namespace + ':'))]
# Add the new tag if not already present
if new_tag not in existing_tags:
existing_tags.append(new_tag)
# Ensure only one tag per namespace (e.g., single title:) with latest preferred
existing_tags = collapse_namespace_tags(existing_tags, "title", prefer="last")
# Compute new tags relative to original
new_tags_added = [t for t in existing_tags if isinstance(t, str) and t.lower() not in original_tags_lower]
total_new_tags += len(new_tags_added)
# Update the result's tags
if isinstance(res, models.PipeObject):
res.extra['tags'] = existing_tags
elif isinstance(res, dict):
res['tags'] = existing_tags
# If a title: tag was added, update the in-memory title and columns so downstream display reflects it immediately
title_value = _extract_title_tag(existing_tags)
_apply_title_to_result(res, title_value)
final_tags = existing_tags
# Determine where to add tags: Hydrus, local DB, or sidecar
if storage_source and storage_source.lower() == 'hydrus':
# Add tags to Hydrus using the API
target_hash = hydrus_hash or file_hash
if target_hash:
try:
tags_to_send = [t for t in existing_tags if isinstance(t, str) and t.lower() not in original_tags_lower]
hydrus_client = hydrus_wrapper.get_client(config)
service_name = hydrus_wrapper.get_tag_service_name(config)
if tags_to_send:
log(f"[add_tags] Adding {len(tags_to_send)} new tag(s) to Hydrus file: {target_hash}", file=sys.stderr)
hydrus_client.add_tags(target_hash, tags_to_send, service_name)
else:
log(f"[add_tags] No new tags to add for Hydrus file: {target_hash}", file=sys.stderr)
# Delete old namespace tags we replaced (e.g., previous title:)
if removed_tags:
unique_removed = sorted(set(removed_tags))
hydrus_client.delete_tags(target_hash, unique_removed, service_name)
if tags_to_send:
log(f"[add_tags] ✓ Tags added to Hydrus", file=sys.stderr)
elif removed_tags:
log(f"[add_tags] ✓ Removed {len(unique_removed)} tag(s) from Hydrus", file=sys.stderr)
sidecar_count += 1
if tags_to_send or removed_tags:
total_modified += 1
except Exception as e:
log(f"[add_tags] Warning: Failed to add tags to Hydrus: {e}", file=sys.stderr)
else:
log(f"[add_tags] Warning: No hash available for Hydrus file, skipping", file=sys.stderr)
elif storage_source and storage_source.lower() == 'local':
# For local storage, save directly to DB (no sidecar needed)
if file_path:
library_root = get_local_storage_path(config)
if library_root:
try:
path_obj = Path(file_path)
with FolderDB(library_root) as db:
db.save_tags(path_obj, existing_tags)
# Reload tags to reflect DB state (preserves auto-title logic)
file_hash = db.get_file_hash(path_obj)
refreshed_tags = db.get_tags(file_hash) if file_hash else existing_tags
# Recompute title from refreshed tags for accurate display
refreshed_title = _extract_title_tag(refreshed_tags)
if refreshed_title:
_apply_title_to_result(res, refreshed_title)
res_tags = refreshed_tags or existing_tags
if isinstance(res, models.PipeObject):
res.extra['tags'] = res_tags
elif isinstance(res, dict):
res['tags'] = res_tags
log(f"[add_tags] Added {len(new_tags_added)} new tag(s); {len(res_tags)} total tag(s) stored locally", file=sys.stderr)
sidecar_count += 1
if new_tags_added or removed_tags:
total_modified += 1
final_tags = res_tags
except Exception as e:
log(f"[add_tags] Warning: Failed to save tags to local DB: {e}", file=sys.stderr)
else:
log(f"[add_tags] Warning: No library root configured for local storage, skipping", file=sys.stderr)
else:
log(f"[add_tags] Warning: No file path for local storage, skipping", file=sys.stderr)
else:
log(f"[add_tags] Warning: No file path for local storage, skipping", file=sys.stderr)
else:
# For other storage types or unknown sources, avoid writing sidecars to reduce clutter
# (local/hydrus are handled above).
ctx.emit(res)
continue
# If title changed, refresh the cached result table so the display reflects the new name
final_title = _extract_title_tag(final_tags)
if final_title and (not original_title or final_title.lower() != original_title.lower()):
_refresh_result_table_title(final_title, hydrus_hash or file_hash, file_hash, file_path)
# If tags changed, refresh tag view via get-tag (prefer current subject; fall back to hash refresh)
if new_tags_added or removed_tags:
_refresh_tags_view(res, hydrus_hash, file_hash, file_path, config)
# Emit the modified result
ctx.emit(res)
continue
# If title changed, refresh the cached result table so the display reflects the new name
final_title = _extract_title_tag(final_tags)
if final_title and (not original_title or final_title.lower() != original_title.lower()):
_refresh_result_table_title(final_title, hydrus_hash or file_hash, file_hash, file_path)
log(f"[add_tags] Added {total_new_tags} new tag(s) across {len(results)} item(s); modified {total_modified} item(s)", file=sys.stderr)
return 0
# If tags changed, refresh tag view via get-tag (prefer current subject; fall back to hash refresh)
if new_tags_added or removed_tags:
_refresh_tags_view(res, hydrus_hash, file_hash, file_path, config)
# Emit the modified result
ctx.emit(res)
log(f"[add_tags] Added {total_new_tags} new tag(s) across {len(results)} item(s); modified {total_modified} item(s)", file=sys.stderr)
return 0
CMDLET = Cmdlet(
name="add-tags",
summary="Add tags to a Hydrus file or write them to a local .tags sidecar.",
usage="add-tags [-hash <sha256>] [-duplicate <format>] [-list <list>[,<list>...]] [--all] <tag>[,<tag>...]",
args=[
CmdletArg("-hash", type="string", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("-duplicate", type="string", description="Copy existing tag values to new namespaces. Formats: title:album,artist (explicit) or title,album,artist (inferred)"),
CmdletArg("-list", type="string", description="Load predefined tag lists from adjective.json. Comma-separated list names (e.g., -list philosophy,occult)."),
CmdletArg("--all", type="flag", description="Include temporary files in tagging (by default, only tags non-temporary files)."),
CmdletArg("tags", type="string", required=False, description="One or more tags to add. Comma- or space-separated. Can also use {list_name} syntax. If omitted, uses tags from pipeline payload.", variadic=True),
],
details=[
"- By default, only tags non-temporary files (from pipelines). Use --all to tag everything.",
"- Without -hash and when the selection is a local file, tags are written to <file>.tags.",
"- With a Hydrus hash, tags are sent to the 'my tags' service.",
"- Multiple tags can be comma-separated or space-separated.",
"- Use -list to include predefined tag lists from adjective.json: -list philosophy,occult",
"- Tags can also reference lists with curly braces: add-tag {philosophy} \"other:tag\"",
"- Use -duplicate to copy EXISTING tag values to new namespaces:",
" Explicit format: -duplicate title:album,artist (copies title: to album: and artist:)",
" Inferred format: -duplicate title,album,artist (first is source, rest are targets)",
"- The source namespace must already exist in the file being tagged.",
"- Target namespaces that already have a value are skipped (not overwritten).",
],
)
CMDLET = Add_Tag()
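The namespace-overwrite merge that add-tags performs is easier to follow in isolation. Below is a minimal sketch of the same idea, assuming plain `namespace:value` strings; it stands in for (and does not use) the repo's `collapse_namespace_tags` helper:

```python
def merge_tags(existing, new_tags):
    """Add new_tags; a namespaced tag replaces any prior tag in its namespace."""
    merged = list(existing)
    removed = []
    for new_tag in new_tags:
        if ':' in new_tag:
            namespace = new_tag.split(':', 1)[0]
            stale = [t for t in merged
                     if t.startswith(namespace + ':') and t.lower() != new_tag.lower()]
            removed.extend(stale)  # what Hydrus would later be asked to delete
            merged = [t for t in merged if not t.startswith(namespace + ':')]
        if new_tag not in merged:
            merged.append(new_tag)
    return merged, removed

print(merge_tags(["title:old", "artist:x"], ["title:new"]))
# (['artist:x', 'title:new'], ['title:old'])
```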

View File

@@ -1,170 +1,85 @@
from __future__ import annotations
from typing import Any, Dict, Sequence
import json
import sys
from pathlib import Path
from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, get_field, normalize_hash
from helper.logger import log
from config import get_local_storage_path
from helper.local_library import LocalLibraryDB
from helper.logger import debug
CMDLET = Cmdlet(
name="add-url",
summary="Associate a URL with a file (Hydrus or Local).",
usage="add-url [-hash <sha256>] <url>",
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("url", required=True, description="The URL to associate with the file."),
],
details=[
"- Adds the URL to the file's known URL list.",
],
)
from helper.store import FileStorage
@register(["add-url", "ass-url", "associate-url", "add_url"]) # aliases
def add(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
class Add_Url(Cmdlet):
"""Add URL associations to files via hash+store."""
NAME = "add-url"
SUMMARY = "Associate a URL with a file"
USAGE = "@1 | add-url <url>"
ARGS = [
SharedArgs.HASH,
SharedArgs.STORE,
CmdletArg("url", required=True, description="URL to associate"),
]
DETAIL = [
"- Associates a URL with the file identified by hash+store",
"- Multiple URLs can be comma-separated",
]
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Add URL to file via hash+store backend."""
parsed = parse_cmdlet_args(args, self)
# Extract hash and store from result or args
file_hash = parsed.get("hash") or get_field(result, "hash")
store_name = parsed.get("store") or get_field(result, "store")
url_arg = parsed.get("url")
if not file_hash:
log("Error: No file hash provided")
return 1
if not store_name:
log("Error: No store name provided")
return 1
if not url_arg:
log("Error: No URL provided")
return 1
# Normalize hash
file_hash = normalize_hash(file_hash)
if not file_hash:
log("Error: Invalid hash format")
return 1
# Parse URLs (comma-separated)
urls = [u.strip() for u in str(url_arg).split(',') if u.strip()]
if not urls:
log("Error: No valid URL provided")
return 1
# Get backend and add url
try:
storage = FileStorage(config)
backend = storage[store_name]
for url in urls:
backend.add_url(file_hash, url)
ctx.emit(f"Added URL: {url}")
return 0
except Exception:
pass
from ._shared import parse_cmdlet_args
parsed = parse_cmdlet_args(args, CMDLET)
override_hash = parsed.get("hash")
url_arg = parsed.get("url")
if not url_arg:
log("Requires a URL argument")
return 1
url_arg = str(url_arg).strip()
if not url_arg:
log("Requires a non-empty URL")
return 1
# Split by comma to handle multiple URLs
urls_to_add = [u.strip() for u in url_arg.split(',') if u.strip()]
# Handle @N selection which creates a list - extract the first item
if isinstance(result, list) and len(result) > 0:
result = result[0]
# Helper to get field from both dict and object
def get_field(obj: Any, field: str, default: Any = None) -> Any:
if isinstance(obj, dict):
return obj.get(field, default)
else:
return getattr(obj, field, default)
success = False
# 1. Try Local Library
file_path = get_field(result, "file_path") or get_field(result, "path")
if file_path and not override_hash:
try:
path_obj = Path(file_path)
if path_obj.exists():
storage_path = get_local_storage_path(config)
if storage_path:
with LocalLibraryDB(storage_path) as db:
metadata = db.get_metadata(path_obj) or {}
known_urls = metadata.get("known_urls") or []
local_changed = False
for url in urls_to_add:
if url not in known_urls:
known_urls.append(url)
local_changed = True
ctx.emit(f"Associated URL with local file {path_obj.name}: {url}")
else:
ctx.emit(f"URL already exists for local file {path_obj.name}: {url}")
if local_changed:
metadata["known_urls"] = known_urls
# Ensure we have a hash if possible, but don't fail if not
if not metadata.get("hash"):
try:
from helper.utils import sha256_file
metadata["hash"] = sha256_file(path_obj)
except Exception:
pass
db.save_metadata(path_obj, metadata)
success = True
except Exception as e:
log(f"Error updating local library: {e}", file=sys.stderr)
# 2. Try Hydrus
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
if hash_hex:
try:
client = hydrus_wrapper.get_client(config)
if client:
for url in urls_to_add:
client.associate_url(hash_hex, url)
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
ctx.emit(f"Associated URL with Hydrus file {preview}: {url}")
success = True
except KeyError:
log(f"Error: Storage backend '{store_name}' not configured")
return 1
except Exception as exc:
# Only log error if we didn't succeed locally either
if not success:
log(f"Hydrus add-url failed: {exc}", file=sys.stderr)
return 1
log(f"Error adding URL: {exc}", file=sys.stderr)
return 1
if success:
# If we just mutated the currently displayed item, refresh URLs via get-url
try:
from cmdlets import get_url as get_url_cmd # type: ignore
except Exception:
get_url_cmd = None
if get_url_cmd:
try:
subject = ctx.get_last_result_subject()
if subject is not None:
def norm(val: Any) -> str:
return str(val).lower()
target_hash = norm(hash_hex) if hash_hex else None
target_path = norm(file_path) if 'file_path' in locals() else None
subj_hashes = []
subj_paths = []
if isinstance(subject, dict):
subj_hashes = [norm(v) for v in [subject.get("hydrus_hash"), subject.get("hash"), subject.get("hash_hex"), subject.get("file_hash")] if v]
subj_paths = [norm(v) for v in [subject.get("file_path"), subject.get("path"), subject.get("target")] if v]
else:
subj_hashes = [norm(getattr(subject, f, None)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if getattr(subject, f, None)]
subj_paths = [norm(getattr(subject, f, None)) for f in ("file_path", "path", "target") if getattr(subject, f, None)]
is_match = False
if target_hash and target_hash in subj_hashes:
is_match = True
if target_path and target_path in subj_paths:
is_match = True
if is_match:
refresh_args: list[str] = []
if hash_hex:
refresh_args.extend(["-hash", hash_hex])
get_url_cmd._run(subject, refresh_args, config)
except Exception:
debug("URL refresh skipped (error)")
return 0
if not hash_hex and not file_path:
log("Selected result does not include a file path or Hydrus hash", file=sys.stderr)
return 1
return 1
# Register cmdlet
register(["add-url", "add_url"])(Add_Url)
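For a feel of the hash+store flow the new Add_Url class follows, here is a self-contained sketch; FakeBackend, the store name, and the hash are illustrative stand-ins, not the repo's FileStorage API:

```python
class FakeBackend:
    """Stand-in for a storage backend exposing add_url(hash, url)."""
    def __init__(self):
        self.urls = {}

    def add_url(self, file_hash, url):
        self.urls.setdefault(file_hash, []).append(url)

stores = {"local": FakeBackend()}             # stand-in for FileStorage(config)
file_hash = "00beb438e3c0" + "0" * 52          # hypothetical 64-char SHA256
url_arg = "https://a.example/x, https://b.example/y"
for url in [u.strip() for u in url_arg.split(',') if u.strip()]:
    stores["local"].add_url(file_hash, url)

print(stores["local"].urls[file_hash])
# ['https://a.example/x', 'https://b.example/y']
```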

View File

@@ -8,19 +8,19 @@ from helper.logger import log
from . import register
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, should_show_help
CMDLET = Cmdlet(
name="check-file-status",
summary="Check if a file is active, deleted, or corrupted in Hydrus.",
usage="check-file-status [-hash <sha256>]",
args=[
CmdletArg("-hash", description="File hash (SHA256) to check. If not provided, uses selected result."),
arg=[
SharedArgs.HASH,
],
details=[
detail=[
"- Shows whether file is active in Hydrus or marked as deleted",
"- Detects corrupted data (e.g., comma-separated URLs)",
"- Detects corrupted data (e.g., comma-separated url)",
"- Displays file metadata and service locations",
"- Note: Hydrus keeps deleted files for recovery. Use cleanup-corrupted for full removal.",
],
@@ -30,12 +30,9 @@ CMDLET = Cmdlet(
@register(["check-file-status", "check-status", "file-status", "status"])
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Parse arguments
override_hash: str | None = None
@@ -109,11 +106,11 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
log(f" - {sname} ({stype}) - deleted at {time_deleted}", file=sys.stderr)
# URL check
urls = file_info.get("known_urls", [])
log(f"\n🔗 URLs ({len(urls)}):", file=sys.stderr)
url = file_info.get("url", [])
log(f"\n🔗 url ({len(url)}):", file=sys.stderr)
corrupted_count = 0
for i, url in enumerate(urls, 1):
for i, url in enumerate(url, 1):
if "," in url:
corrupted_count += 1
log(f" [{i}] ⚠️ CORRUPTED (comma-separated): {url[:50]}...", file=sys.stderr)
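The corruption check in this hunk simply flags any stored URL entry that contains a comma, i.e. two URLs fused into one value. A quick illustration with made-up data:

```python
known_urls = [
    "https://example.com/a",
    "https://example.com/a,https://example.com/b",  # two URLs fused into one entry
]
corrupted = [u for u in known_urls if "," in u]
print(f"{len(corrupted)} corrupted of {len(known_urls)} URL entries")
# 1 corrupted of 2 URL entries
```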

View File

@@ -9,11 +9,12 @@ from __future__ import annotations
from typing import Any, Dict, Sequence
from pathlib import Path
import sys
import json
from helper.logger import log
from . import register
from ._shared import Cmdlet, CmdletArg, get_pipe_object_path, normalize_result_input, filter_results_by_temp
from ._shared import Cmdlet, CmdletArg, get_pipe_object_path, normalize_result_input, filter_results_by_temp, should_show_help
import models
import pipeline as pipeline_context
@@ -36,13 +37,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
import json
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Normalize input to list
results = normalize_result_input(result)
@@ -97,8 +94,8 @@ CMDLET = Cmdlet(
name="cleanup",
summary="Remove temporary artifacts from pipeline (marked with is_temp=True).",
usage="cleanup",
args=[],
details=[
arg=[],
detail=[
"- Accepts pipeline results that may contain temporary files (screenshots, intermediate artifacts)",
"- Deletes files marked with is_temp=True from disk",
"- Also cleans up associated sidecar files (.tags, .metadata)",

View File

@@ -1,398 +1,249 @@
"""Delete-file cmdlet: Delete files from local storage and/or Hydrus."""
from __future__ import annotations
from typing import Any, Dict, Sequence
import json
import sys
from helper.logger import debug, log
import sqlite3
from pathlib import Path
import models
import pipeline as ctx
from helper.logger import debug, log
from helper.store import Folder
from ._shared import Cmdlet, CmdletArg, normalize_hash, looks_like_hash, get_origin, get_field, should_show_help
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash, looks_like_hash
from config import get_local_storage_path
from helper.local_library import LocalLibraryDB
import pipeline as ctx
def _refresh_last_search(config: Dict[str, Any]) -> None:
"""Re-run the last search-file to refresh the table after deletes."""
try:
source_cmd = ctx.get_last_result_table_source_command() if hasattr(ctx, "get_last_result_table_source_command") else None
if source_cmd not in {"search-file", "search_file", "search"}:
return
class Delete_File(Cmdlet):
"""Class-based delete-file cmdlet with self-registration."""
args = ctx.get_last_result_table_source_args() if hasattr(ctx, "get_last_result_table_source_args") else []
try:
from cmdlets import search_file as search_file_cmd # type: ignore
except Exception:
return
def __init__(self) -> None:
super().__init__(
name="delete-file",
summary="Delete a file locally and/or from Hydrus, including database entries.",
usage="delete-file [-hash <sha256>] [-conserve <local|hydrus>] [-lib-root <path>] [reason]",
alias=["del-file"],
arg=[
CmdletArg("hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("conserve", description="Choose which copy to keep: 'local' or 'hydrus'."),
CmdletArg("lib-root", description="Path to local library root for database cleanup."),
CmdletArg("reason", description="Optional reason for deletion (free text)."),
],
detail=[
"Default removes both the local file and Hydrus file.",
"Use -conserve local to keep the local file, or -conserve hydrus to keep it in Hydrus.",
"Database entries are automatically cleaned up for local files.",
"Any remaining arguments are treated as the Hydrus reason text.",
],
exec=self.run,
)
self.register()
# Re-run the prior search to refresh items/table without disturbing history
search_file_cmd._run(None, args, config)
def _process_single_item(self, item: Any, override_hash: str | None, conserve: str | None,
lib_root: str | None, reason: str, config: Dict[str, Any]) -> bool:
"""Process deletion for a single item."""
# Handle item as either dict or object
if isinstance(item, dict):
hash_hex_raw = item.get("hash_hex") or item.get("hash")
target = item.get("target") or item.get("file_path") or item.get("path")
else:
hash_hex_raw = get_field(item, "hash_hex") or get_field(item, "hash")
target = get_field(item, "target") or get_field(item, "file_path") or get_field(item, "path")
origin = get_origin(item)
# Also check the store field explicitly from PipeObject
store = None
if isinstance(item, dict):
store = item.get("store")
else:
store = get_field(item, "store")
# For Hydrus files, the target IS the hash
if origin and origin.lower() == "hydrus" and not hash_hex_raw:
hash_hex_raw = target
# Set an overlay so action-command pipeline output displays the refreshed table
try:
new_table = ctx.get_last_result_table()
new_items = ctx.get_last_result_items()
subject = ctx.get_last_result_subject() if hasattr(ctx, "get_last_result_subject") else None
if hasattr(ctx, "set_last_result_table_overlay") and new_table and new_items is not None:
ctx.set_last_result_table_overlay(new_table, new_items, subject)
except Exception:
pass
except Exception as exc:
debug(f"[delete_file] search refresh failed: {exc}", file=sys.stderr)
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(hash_hex_raw)
def _cleanup_relationships(db_path: Path, file_hash: str) -> int:
"""Remove references to file_hash from other files' relationships."""
try:
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
local_deleted = False
local_target = isinstance(target, str) and target.strip() and not str(target).lower().startswith(("http://", "https://"))
# Find all metadata entries that contain this hash in relationships
cursor.execute("SELECT file_id, relationships FROM metadata WHERE relationships LIKE ?", (f'%{file_hash}%',))
rows = cursor.fetchall()
rel_update_count = 0
for row_fid, rel_json in rows:
try:
rels = json.loads(rel_json)
changed = False
if isinstance(rels, dict):
for r_type, hashes in rels.items():
if isinstance(hashes, list) and file_hash in hashes:
hashes.remove(file_hash)
changed = True
if changed:
cursor.execute("UPDATE metadata SET relationships = ? WHERE file_id = ?", (json.dumps(rels), row_fid))
rel_update_count += 1
except Exception:
pass
conn.commit()
conn.close()
if rel_update_count > 0:
debug(f"Removed relationship references from {rel_update_count} other files", file=sys.stderr)
return rel_update_count
except Exception as e:
debug(f"Error cleaning up relationships: {e}", file=sys.stderr)
return 0
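The relationship cleanup above amounts to rewriting each stored JSON blob so it no longer references the deleted hash. A standalone sketch of that transformation, with the sqlite plumbing stripped out (the dict-of-lists shape is assumed from the code above):

```python
import json

def strip_hash(rel_json, file_hash):
    """Remove file_hash from every relationship list; return (new_json, changed)."""
    rels = json.loads(rel_json)
    changed = False
    if isinstance(rels, dict):
        for hashes in rels.values():
            if isinstance(hashes, list) and file_hash in hashes:
                hashes.remove(file_hash)
                changed = True
    return json.dumps(rels), changed

print(strip_hash('{"alt": ["aa", "bb"], "king": ["aa"]}', "aa"))
# ('{"alt": ["bb"], "king": []}', True)
```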
def _delete_database_entry(db_path: Path, file_path: str) -> bool:
"""Delete file and related entries from local library database.
Args:
db_path: Path to the library.db file
file_path: Exact file path string as stored in database
Returns:
True if successful, False otherwise
"""
try:
if not db_path.exists():
debug(f"Database not found at {db_path}", file=sys.stderr)
return False
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
debug(f"Searching database for file_path: {file_path}", file=sys.stderr)
# Find the file_id using the exact file_path
cursor.execute('SELECT id FROM files WHERE file_path = ?', (file_path,))
result = cursor.fetchone()
if not result:
debug(f"File path not found in database: {file_path}", file=sys.stderr)
conn.close()
return False
file_id = result[0]
# Get file hash before deletion to clean up relationships
cursor.execute('SELECT file_hash FROM files WHERE id = ?', (file_id,))
hash_result = cursor.fetchone()
file_hash = hash_result[0] if hash_result else None
debug(f"Found file_id={file_id}, deleting all related records", file=sys.stderr)
# Delete related records
cursor.execute('DELETE FROM metadata WHERE file_id = ?', (file_id,))
meta_count = cursor.rowcount
cursor.execute('DELETE FROM tags WHERE file_id = ?', (file_id,))
tags_count = cursor.rowcount
cursor.execute('DELETE FROM notes WHERE file_id = ?', (file_id,))
notes_count = cursor.rowcount
cursor.execute('DELETE FROM files WHERE id = ?', (file_id,))
files_count = cursor.rowcount
conn.commit()
conn.close()
# Clean up relationships in other files
if file_hash:
_cleanup_relationships(db_path, file_hash)
debug(f"Deleted: metadata={meta_count}, tags={tags_count}, notes={notes_count}, files={files_count}", file=sys.stderr)
return True
except Exception as exc:
log(f"Database cleanup failed: {exc}", file=sys.stderr)
import traceback
traceback.print_exc(file=sys.stderr)
return False
def _process_single_item(item: Any, override_hash: str | None, conserve: str | None,
lib_root: str | None, reason: str, config: Dict[str, Any]) -> bool:
"""Process deletion for a single item."""
# Handle item as either dict or object
if isinstance(item, dict):
hash_hex_raw = item.get("hash_hex") or item.get("hash")
target = item.get("target")
origin = item.get("origin")
else:
hash_hex_raw = getattr(item, "hash_hex", None) or getattr(item, "hash", None)
target = getattr(item, "target", None)
origin = getattr(item, "origin", None)
# For Hydrus files, the target IS the hash
if origin and origin.lower() == "hydrus" and not hash_hex_raw:
hash_hex_raw = target
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(hash_hex_raw)
local_deleted = False
local_target = isinstance(target, str) and target.strip() and not str(target).lower().startswith(("http://", "https://"))
# Try to resolve local path if target looks like a hash and we have a library root
if local_target and looks_like_hash(str(target)) and lib_root:
try:
db_path = Path(lib_root) / ".downlow_library.db"
if db_path.exists():
# We can't use LocalLibraryDB context manager easily here without importing it,
# but we can use a quick sqlite connection or just use the class if imported.
# We imported LocalLibraryDB, so let's use it.
with LocalLibraryDB(Path(lib_root)) as db:
resolved = db.search_by_hash(str(target))
if resolved:
target = str(resolved)
# Also ensure we have the hash set for Hydrus deletion if needed
if not hash_hex:
hash_hex = normalize_hash(str(target))
except Exception as e:
debug(f"Failed to resolve hash to local path: {e}", file=sys.stderr)
if conserve != "local" and local_target:
path = Path(str(target))
file_path_str = str(target) # Keep the original string for DB matching
try:
if path.exists() and path.is_file():
path.unlink()
local_deleted = True
if ctx._PIPE_ACTIVE:
ctx.emit(f"Removed local file: {path}")
log(f"Deleted: {path.name}", file=sys.stderr)
except Exception as exc:
log(f"Local delete failed: {exc}", file=sys.stderr)
# Remove common sidecars regardless of file removal success
for sidecar in (path.with_suffix(".tags"), path.with_suffix(".tags.txt"),
path.with_suffix(".metadata"), path.with_suffix(".notes")):
try:
if sidecar.exists() and sidecar.is_file():
sidecar.unlink()
except Exception:
pass
# Clean up database entry if library root provided - do this regardless of file deletion success
if lib_root:
lib_root_path = Path(lib_root)
db_path = lib_root_path / ".downlow_library.db"
if conserve != "local" and local_target:
path = Path(str(target))
# If file_path_str is a hash (because file was already deleted or target was hash),
# we need to find the path by hash in the DB first
if looks_like_hash(file_path_str):
# If lib_root is provided and this is from a folder store, use the Folder class
if lib_root:
try:
with LocalLibraryDB(lib_root_path) as db:
resolved = db.search_by_hash(file_path_str)
if resolved:
file_path_str = str(resolved)
folder = Folder(Path(lib_root), name=origin or "local")
if folder.delete_file(str(path)):
local_deleted = True
ctx.emit(f"Removed file: {path.name}")
log(f"Deleted: {path.name}", file=sys.stderr)
except Exception as exc:
debug(f"Folder.delete_file failed: {exc}", file=sys.stderr)
# Fallback to manual deletion
try:
if path.exists() and path.is_file():
path.unlink()
local_deleted = True
ctx.emit(f"Removed local file: {path}")
log(f"Deleted: {path.name}", file=sys.stderr)
except Exception as exc:
log(f"Local delete failed: {exc}", file=sys.stderr)
else:
# No lib_root, just delete the file
try:
if path.exists() and path.is_file():
path.unlink()
local_deleted = True
ctx.emit(f"Removed local file: {path}")
log(f"Deleted: {path.name}", file=sys.stderr)
except Exception as exc:
log(f"Local delete failed: {exc}", file=sys.stderr)
# Remove common sidecars regardless of file removal success
for sidecar in (path.with_suffix(".tags"), path.with_suffix(".tags.txt"),
path.with_suffix(".metadata"), path.with_suffix(".notes")):
try:
if sidecar.exists() and sidecar.is_file():
sidecar.unlink()
except Exception:
pass
db_success = _delete_database_entry(db_path, file_path_str)
if not db_success:
# If deletion failed (e.g. not found), but we have a hash, try to clean up relationships anyway
effective_hash = None
if looks_like_hash(file_path_str):
effective_hash = file_path_str
elif hash_hex:
effective_hash = hash_hex
if effective_hash:
debug(f"Entry not found, but attempting to clean up relationships for hash: {effective_hash}", file=sys.stderr)
if _cleanup_relationships(db_path, effective_hash) > 0:
db_success = True
if db_success:
if ctx._PIPE_ACTIVE:
ctx.emit(f"Removed database entry: {path.name}")
debug(f"Database entry cleaned up", file=sys.stderr)
local_deleted = True
else:
debug(f"Database entry not found or cleanup failed for {file_path_str}", file=sys.stderr)
else:
debug(f"No lib_root provided, skipping database cleanup", file=sys.stderr)
hydrus_deleted = False
# Only attempt Hydrus deletion if origin is explicitly Hydrus or if we failed to delete locally
# and we suspect it might be in Hydrus.
# If origin is local, we should default to NOT deleting from Hydrus unless requested?
# Or maybe we should check if it exists in Hydrus first?
# The user complaint is "its still trying to delete hydrus, this is a local file".
should_try_hydrus = True
if origin and origin.lower() == "local":
should_try_hydrus = False
# If conserve is set to hydrus, definitely don't delete
if conserve == "hydrus":
hydrus_deleted = False
# Only attempt Hydrus deletion if store is explicitly Hydrus-related
# Check both origin and store fields to determine if this is a Hydrus file
should_try_hydrus = False
if should_try_hydrus and hash_hex:
try:
client = hydrus_wrapper.get_client(config)
except Exception as exc:
if not local_deleted:
log(f"Hydrus client unavailable: {exc}", file=sys.stderr)
return False
else:
if client is None:
# Check if store indicates this is a Hydrus backend
if store and ("hydrus" in store.lower() or store.lower() == "home" or store.lower() == "work"):
should_try_hydrus = True
# Fallback to origin check if store not available
elif origin and origin.lower() == "hydrus":
should_try_hydrus = True
# If conserve is set to hydrus, definitely don't delete
if conserve == "hydrus":
should_try_hydrus = False
if should_try_hydrus and hash_hex:
try:
client = hydrus_wrapper.get_client(config)
except Exception as exc:
if not local_deleted:
# If we deleted locally, we don't care if Hydrus is unavailable
pass
else:
log("Hydrus client unavailable", file=sys.stderr)
log(f"Hydrus client unavailable: {exc}", file=sys.stderr)
return False
else:
payload: Dict[str, Any] = {"hashes": [hash_hex]}
if reason:
payload["reason"] = reason
try:
client._post("/add_files/delete_files", data=payload) # type: ignore[attr-defined]
hydrus_deleted = True
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
debug(f"Deleted from Hydrus: {preview}", file=sys.stderr)
except Exception as exc:
# If it's not in Hydrus (e.g. 404 or similar), that's fine
# log(f"Hydrus delete failed: {exc}", file=sys.stderr)
if client is None:
if not local_deleted:
log("Hydrus client unavailable", file=sys.stderr)
return False
else:
payload: Dict[str, Any] = {"hashes": [hash_hex]}
if reason:
payload["reason"] = reason
try:
client._post("/add_files/delete_files", data=payload) # type: ignore[attr-defined]
hydrus_deleted = True
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
debug(f"Deleted from Hydrus: {preview}", file=sys.stderr)
except Exception as exc:
# If it's not in Hydrus (e.g. 404 or similar), that's fine
if not local_deleted:
return False
if hydrus_deleted and hash_hex:
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
if ctx._PIPE_ACTIVE:
if hydrus_deleted and hash_hex:
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
if reason:
ctx.emit(f"Deleted {preview} (reason: {reason}).")
else:
ctx.emit(f"Deleted {preview}.")
if hydrus_deleted or local_deleted:
return True
log("Selected result has neither Hydrus hash nor local file target")
return False
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Execute delete-file command."""
if should_show_help(args):
log(f"Cmdlet: {self.name}\nSummary: {self.summary}\nUsage: {self.usage}")
return 0
except Exception:
pass
override_hash: str | None = None
conserve: str | None = None
lib_root: str | None = None
reason_tokens: list[str] = []
i = 0
while i < len(args):
token = args[i]
low = str(token).lower()
if low in {"-hash", "--hash", "hash"} and i + 1 < len(args):
override_hash = str(args[i + 1]).strip()
i += 2
continue
if low in {"-conserve", "--conserve"} and i + 1 < len(args):
value = str(args[i + 1]).strip().lower()
if value in {"local", "hydrus"}:
conserve = value
# Parse arguments
override_hash: str | None = None
conserve: str | None = None
lib_root: str | None = None
reason_tokens: list[str] = []
i = 0
while i < len(args):
token = args[i]
low = str(token).lower()
if low in {"-hash", "--hash", "hash"} and i + 1 < len(args):
override_hash = str(args[i + 1]).strip()
i += 2
continue
if low in {"-lib-root", "--lib-root", "lib-root"} and i + 1 < len(args):
lib_root = str(args[i + 1]).strip()
i += 2
continue
reason_tokens.append(token)
i += 1
if low in {"-conserve", "--conserve"} and i + 1 < len(args):
value = str(args[i + 1]).strip().lower()
if value in {"local", "hydrus"}:
conserve = value
i += 2
continue
if low in {"-lib-root", "--lib-root", "lib-root"} and i + 1 < len(args):
lib_root = str(args[i + 1]).strip()
i += 2
continue
reason_tokens.append(token)
i += 1
if not lib_root:
# Try to get from config
p = get_local_storage_path(config)
if p:
lib_root = str(p)
# If no lib_root provided, try to get the first folder store from config
if not lib_root:
try:
storage_config = config.get("storage", {})
folder_config = storage_config.get("folder", {})
if folder_config:
# Get first folder store path
for store_name, store_config in folder_config.items():
if isinstance(store_config, dict):
path = store_config.get("path")
if path:
lib_root = path
break
except Exception:
pass
reason = " ".join(token for token in reason_tokens if str(token).strip()).strip()
reason = " ".join(token for token in reason_tokens if str(token).strip()).strip()
items = []
if isinstance(result, list):
items = result
elif result:
items = [result]
if not items:
log("No items to delete", file=sys.stderr)
return 1
success_count = 0
for item in items:
if _process_single_item(item, override_hash, conserve, lib_root, reason, config):
success_count += 1
success_count = 0
for item in items:
if self._process_single_item(item, override_hash, conserve, lib_root, reason, config):
success_count += 1
if success_count > 0:
_refresh_last_search(config)
if success_count > 0:
# Clear cached tables/items so deleted entries are not redisplayed
try:
ctx.set_last_result_table_overlay(None, None, None)
ctx.set_last_result_table(None, [])
ctx.set_last_result_items_only([])
ctx.set_current_stage_table(None)
except Exception:
pass
return 0 if success_count > 0 else 1
# Instantiate and register the cmdlet
Delete_File()
CMDLET = Cmdlet(
name="delete-file",
summary="Delete a file locally and/or from Hydrus, including database entries.",
usage="delete-file [-hash <sha256>] [-conserve <local|hydrus>] [-lib-root <path>] [reason]",
aliases=["del-file"],
args=[
CmdletArg("hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("conserve", description="Choose which copy to keep: 'local' or 'hydrus'."),
CmdletArg("lib-root", description="Path to local library root for database cleanup."),
CmdletArg("reason", description="Optional reason for deletion (free text)."),
],
details=[
"Default removes both the local file and Hydrus file.",
"Use -conserve local to keep the local file, or -conserve hydrus to keep it in Hydrus.",
"Database entries are automatically cleaned up for local files.",
"Any remaining arguments are treated as the Hydrus reason text.",
],
)
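The key behavioral change in delete-file is the store-based routing that decides whether a Hydrus deletion is attempted at all. A condensed sketch of that decision, using the same store/origin/conserve fields; the 'home' and 'work' store names are copied from the code above and may be installation-specific:

```python
def should_try_hydrus(store, origin, conserve):
    """Attempt Hydrus deletion only for items that look Hydrus-backed."""
    if conserve == "hydrus":
        return False
    if store and ("hydrus" in store.lower() or store.lower() in {"home", "work"}):
        return True
    if origin and origin.lower() == "hydrus":
        return True
    return False

print(should_try_hydrus("home", None, None))     # True
print(should_try_hydrus("test", "local", None))  # False
```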

View File

@@ -5,18 +5,18 @@ import json
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, normalize_hash, get_hash_for_operation, fetch_hydrus_metadata, should_show_help, get_field
from helper.logger import log
CMDLET = Cmdlet(
name="delete-note",
summary="Delete a named note from a Hydrus file.",
usage="i | del-note [-hash <sha256>] <name>",
aliases=["del-note"],
args=[
alias=["del-note"],
arg=[
],
details=[
detail=[
"- Removes the note with the given name from the Hydrus file.",
],
)
@@ -24,12 +24,9 @@ CMDLET = Cmdlet(
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
if not args:
log("Requires the note name/key to delete")
return 1
@@ -57,7 +54,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if isinstance(result, list) and len(result) > 0:
result = result[0]
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(getattr(result, "hash_hex", None))
hash_hex = get_hash_for_operation(override_hash, result)
if not hash_hex:
log("Selected result does not include a Hydrus hash")
return 1
@@ -93,7 +90,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if isinstance(subject, dict):
subj_hashes = [norm(v) for v in [subject.get("hydrus_hash"), subject.get("hash"), subject.get("hash_hex"), subject.get("file_hash")] if v]
else:
subj_hashes = [norm(getattr(subject, f, None)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if getattr(subject, f, None)]
subj_hashes = [norm(get_field(subject, f)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if get_field(subject, f)]
if target_hash and target_hash in subj_hashes:
get_note_cmd.get_notes(subject, ["-hash", hash_hex], config)
return 0
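The refresh guard at the end of this hunk only checks whether the targeted hash matches the currently displayed subject. In isolation, with `get_field` reduced to plain dict access for the example:

```python
def norm(val):
    return str(val).lower()

subject = {"hash_hex": "ABCDEF0123", "file_path": "C:/media/clip.m4a"}
target_hash = norm("abcdef0123")
subj_hashes = [norm(subject.get(f))
               for f in ("hydrus_hash", "hash", "hash_hex", "file_hash")
               if subject.get(f)]
print(target_hash in subj_hashes)  # True -> the notes view would be refreshed
```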

View File

@@ -10,8 +10,8 @@ import sys
from helper.logger import log
import pipeline as ctx
from ._shared import Cmdlet, CmdletArg, parse_cmdlet_args, normalize_result_input
from helper.local_library import LocalLibrarySearchOptimizer
from ._shared import Cmdlet, CmdletArg, parse_cmdlet_args, normalize_result_input, get_field
from helper.folder_store import LocalLibrarySearchOptimizer
from config import get_local_storage_path
@@ -35,12 +35,14 @@ def _refresh_relationship_view_if_current(target_hash: Optional[str], target_pat
subj_hashes: list[str] = []
subj_paths: list[str] = []
if isinstance(subject, dict):
subj_hashes = [norm(v) for v in [subject.get("hydrus_hash"), subject.get("hash"), subject.get("hash_hex"), subject.get("file_hash")] if v]
subj_paths = [norm(v) for v in [subject.get("file_path"), subject.get("path"), subject.get("target")] if v]
else:
subj_hashes = [norm(getattr(subject, f, None)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if getattr(subject, f, None)]
subj_paths = [norm(getattr(subject, f, None)) for f in ("file_path", "path", "target") if getattr(subject, f, None)]
for field in ("hydrus_hash", "hash", "hash_hex", "file_hash"):
val = get_field(subject, field)
if val:
subj_hashes.append(norm(val))
for field in ("file_path", "path", "target"):
val = get_field(subject, field)
if val:
subj_paths.append(norm(val))
is_match = False
if target_hashes and any(h in subj_hashes for h in target_hashes):
@@ -93,21 +95,12 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
for single_result in results:
try:
# Get file path from result
file_path_from_result = None
if isinstance(single_result, dict):
file_path_from_result = (
single_result.get("file_path") or
single_result.get("path") or
single_result.get("target")
)
else:
file_path_from_result = (
getattr(single_result, "file_path", None) or
getattr(single_result, "path", None) or
getattr(single_result, "target", None) or
str(single_result)
)
file_path_from_result = (
get_field(single_result, "file_path")
or get_field(single_result, "path")
or get_field(single_result, "target")
or (str(single_result) if not isinstance(single_result, dict) else None)
)
if not file_path_from_result:
log("Could not extract file path from result", file=sys.stderr)
@@ -199,12 +192,12 @@ CMDLET = Cmdlet(
name="delete-relationship",
summary="Remove relationships from files.",
usage="@1 | delete-relationship --all OR delete-relationship -path <file> --all OR @1-3 | delete-relationship -type alt",
args=[
arg=[
CmdletArg("path", type="string", description="Specify the local file path (if not piping a result)."),
CmdletArg("all", type="flag", description="Delete all relationships for the file(s)."),
CmdletArg("type", type="string", description="Delete specific relationship type ('alt', 'king', 'related'). Default: delete all types."),
],
details=[
detail=[
"- Delete all relationships: pipe files | delete-relationship --all",
"- Delete specific type: pipe files | delete-relationship -type alt",
"- Delete all from file: delete-relationship -path <file> --all",

View File

@@ -9,7 +9,7 @@ from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash, parse_tag_arguments
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, parse_tag_arguments, fetch_hydrus_metadata, should_show_help, get_field
from helper.logger import debug, log
@@ -37,8 +37,8 @@ def _refresh_tag_view_if_current(hash_hex: str | None, file_path: str | None, co
subj_hashes = [norm(v) for v in [subject.get("hydrus_hash"), subject.get("hash"), subject.get("hash_hex"), subject.get("file_hash")] if v]
subj_paths = [norm(v) for v in [subject.get("file_path"), subject.get("path"), subject.get("target")] if v]
else:
subj_hashes = [norm(getattr(subject, f, None)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if getattr(subject, f, None)]
subj_paths = [norm(getattr(subject, f, None)) for f in ("file_path", "path", "target") if getattr(subject, f, None)]
subj_hashes = [norm(get_field(subject, f)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if get_field(subject, f)]
subj_paths = [norm(get_field(subject, f)) for f in ("file_path", "path", "target") if get_field(subject, f)]
is_match = False
if target_hash and target_hash in subj_hashes:
@@ -60,12 +60,12 @@ CMDLET = Cmdlet(
name="delete-tags",
summary="Remove tags from a Hydrus file.",
usage="del-tags [-hash <sha256>] <tag>[,<tag>...]",
aliases=["del-tag", "del-tags", "delete-tag"],
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
alias=["del-tag", "del-tags", "delete-tag"],
arg=[
SharedArgs.HASH,
CmdletArg("<tag>[,<tag>...]", required=True, description="One or more tags to remove. Comma- or space-separated."),
],
details=[
detail=[
"- Requires a Hydrus file (hash present) or explicit -hash override.",
"- Multiple tags can be comma-separated or space-separated.",
],
@@ -74,12 +74,9 @@ CMDLET = Cmdlet(
@register(["del-tag", "del-tags", "delete-tag", "delete-tags"]) # Still needed for backward compatibility
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Check if we have a piped TagItem with no args (i.e., from @1 | delete-tag)
has_piped_tag = (result and hasattr(result, '__class__') and
@@ -139,15 +136,15 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if idx - 1 < len(ctx._LAST_RESULT_ITEMS):
item = ctx._LAST_RESULT_ITEMS[idx - 1]
if hasattr(item, '__class__') and item.__class__.__name__ == 'TagItem':
tag_name = getattr(item, 'tag_name', None)
tag_name = get_field(item, 'tag_name')
if tag_name:
log(f"[delete_tag] Extracted tag from @{idx}: {tag_name}")
tags_from_at_syntax.append(tag_name)
# Also get hash from first item for consistency
if not hash_from_at_syntax:
hash_from_at_syntax = getattr(item, 'hash_hex', None)
hash_from_at_syntax = get_field(item, 'hash_hex')
if not file_path_from_at_syntax:
file_path_from_at_syntax = getattr(item, 'file_path', None)
file_path_from_at_syntax = get_field(item, 'file_path')
if not tags_from_at_syntax:
log(f"No tags found at indices: {indices}")
@@ -219,13 +216,13 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
for item in items_to_process:
tags_to_delete = []
item_hash = normalize_hash(override_hash) if override_hash else normalize_hash(getattr(item, "hash_hex", None))
item_path = getattr(item, "path", None) or getattr(item, "file_path", None) or getattr(item, "target", None)
# If result is a dict (e.g. from search-file), try getting path from keys
if not item_path and isinstance(item, dict):
item_path = item.get("path") or item.get("file_path") or item.get("target")
item_source = getattr(item, "source", None)
item_hash = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(item, "hash_hex"))
item_path = (
get_field(item, "path")
or get_field(item, "file_path")
or get_field(item, "target")
)
item_source = get_field(item, "source")
if hasattr(item, '__class__') and item.__class__.__name__ == 'TagItem':
# It's a TagItem
@@ -238,7 +235,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Let's assume if args are present, we use args. If not, we use the tag name.
tags_to_delete = tags_arg
else:
tag_name = getattr(item, 'tag_name', None)
tag_name = get_field(item, 'tag_name')
if tag_name:
tags_to_delete = [tag_name]
else:
@@ -270,34 +267,31 @@ def _process_deletion(tags: list[str], hash_hex: str | None, file_path: str | No
# Prefer local DB when we have a path and not explicitly hydrus
if file_path and (source == "local" or (source != "hydrus" and not hash_hex)):
try:
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
from config import get_local_storage_path
path_obj = Path(file_path)
local_root = get_local_storage_path(config) or path_obj.parent
with LocalLibraryDB(local_root) as db:
existing = db.get_tags(path_obj) or []
with FolderDB(local_root) as db:
file_hash = db.get_file_hash(path_obj)
existing = db.get_tags(file_hash) if file_hash else []
except Exception:
existing = []
elif hash_hex:
try:
client = hydrus_wrapper.get_client(config)
payload = client.fetch_file_metadata(
hashes=[hash_hex],
include_service_keys_to_tags=True,
include_file_urls=False,
)
items = payload.get("metadata") if isinstance(payload, dict) else None
meta = items[0] if isinstance(items, list) and items else None
if isinstance(meta, dict):
tags_payload = meta.get("tags")
if isinstance(tags_payload, dict):
seen: set[str] = set()
for svc_data in tags_payload.values():
if not isinstance(svc_data, dict):
continue
display = svc_data.get("display_tags")
if isinstance(display, list):
for t in display:
meta, _ = fetch_hydrus_metadata(
config, hash_hex,
include_service_keys_to_tags=True,
include_file_url=False,
)
if isinstance(meta, dict):
tags_payload = meta.get("tags")
if isinstance(tags_payload, dict):
seen: set[str] = set()
for svc_data in tags_payload.values():
if not isinstance(svc_data, dict):
continue
display = svc_data.get("display_tags")
if isinstance(display, list):
for t in display:
if isinstance(t, (str, bytes)):
val = str(t).strip()
if val and val not in seen:
@@ -313,8 +307,6 @@ def _process_deletion(tags: list[str], hash_hex: str | None, file_path: str | No
if val and val not in seen:
seen.add(val)
existing.append(val)
except Exception:
existing = []
return existing
# Safety: only block if this deletion would remove the final title tag
@@ -335,7 +327,7 @@ def _process_deletion(tags: list[str], hash_hex: str | None, file_path: str | No
# Handle local file tag deletion
if file_path and (source == "local" or (not hash_hex and source != "hydrus")):
try:
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
from pathlib import Path
path_obj = Path(file_path)
@@ -351,7 +343,7 @@ def _process_deletion(tags: list[str], hash_hex: str | None, file_path: str | No
# Fallback: assume file is in a library root or use its parent
local_root = path_obj.parent
with LocalLibraryDB(local_root) as db:
with FolderDB(local_root) as db:
db.remove_tags(path_obj, tags)
debug(f"Removed {len(tags)} tag(s) from {path_obj.name} (local)")
_refresh_tag_view_if_current(hash_hex, file_path, config)
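The safety check near the end of `_process_deletion` only has to decide whether the requested deletion would strip the file's last `title:` tag. A minimal standalone sketch of that predicate (hypothetical helper name, not code from this commit):

```python
from typing import Iterable


def would_remove_last_title(existing: Iterable[str], to_delete: Iterable[str]) -> bool:
    """True if deleting `to_delete` would leave the file with no 'title:' tag at all."""
    titles = {t for t in existing if t.lower().startswith("title:")}
    if not titles:
        return False                 # nothing to protect
    doomed = {t for t in to_delete if t.lower().startswith("title:")}
    return titles.issubset(doomed)   # every title tag would be gone


print(would_remove_last_title(["title:Example", "year:2025"], ["title:Example"]))  # True  -> block
print(would_remove_last_title(["title:A", "title:B"], ["title:A"]))                # False -> allow
```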

View File

@@ -1,194 +1,82 @@
from __future__ import annotations
from typing import Any, Dict, Sequence
import json
import sys
from pathlib import Path
from . import register
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from helper.logger import debug, log
from config import get_local_storage_path
from helper.local_library import LocalLibraryDB
import pipeline as ctx
CMDLET = Cmdlet(
name="delete-url",
summary="Remove a URL association from a file (Hydrus or Local).",
usage="delete-url [-hash <sha256>] <url>",
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
CmdletArg("url", required=True, description="The URL to remove from the file."),
],
details=[
"- Removes the URL from the file's known URL list.",
],
)
from ._shared import Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, get_field, normalize_hash
from helper.logger import log
from helper.store import FileStorage
def _parse_hash_and_rest(args: Sequence[str]) -> tuple[str | None, list[str]]:
override_hash: str | None = None
rest: list[str] = []
i = 0
while i < len(args):
a = args[i]
low = str(a).lower()
if low in {"-hash", "--hash", "hash"} and i + 1 < len(args):
override_hash = str(args[i + 1]).strip()
i += 2
continue
rest.append(a)
i += 1
return override_hash, rest
@register(["del-url", "delete-url", "delete_url"]) # aliases
def delete(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
class Delete_Url(Cmdlet):
"""Delete URL associations from files via hash+store."""
override_hash, rest = _parse_hash_and_rest(args)
NAME = "delete-url"
SUMMARY = "Remove a URL association from a file"
USAGE = "@1 | delete-url <url>"
ARGS = [
SharedArgs.HASH,
SharedArgs.STORE,
CmdletArg("url", required=True, description="URL to remove"),
]
DETAIL = [
"- Removes URL association from file identified by hash+store",
"- Multiple url can be comma-separated",
]
url_arg = None
if rest:
url_arg = str(rest[0] or '').strip()
# Normalize result to a list
items = result if isinstance(result, list) else [result]
if not items:
log("No input provided.")
return 1
success_count = 0
for item in items:
target_url = url_arg
target_file = item
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Delete URL from file via hash+store backend."""
parsed = parse_cmdlet_args(args, self)
# Check for rich URL object from get-url
if isinstance(item, dict) and "url" in item and "source_file" in item:
if not target_url:
target_url = item["url"]
target_file = item["source_file"]
# Extract hash and store from result or args
file_hash = parsed.get("hash") or get_field(result, "hash")
store_name = parsed.get("store") or get_field(result, "store")
url_arg = parsed.get("url")
if not target_url:
continue
if _delete_single(target_file, target_url, override_hash, config):
success_count += 1
if success_count == 0:
if not file_hash:
log("Error: No file hash provided")
return 1
if not store_name:
log("Error: No store name provided")
return 1
if not url_arg:
log("Requires a URL argument or valid selection.")
else:
log("Failed to delete URL(s).")
return 1
log("Error: No URL provided")
return 1
return 0
def _delete_single(result: Any, url: str, override_hash: str | None, config: Dict[str, Any]) -> bool:
# Helper to get field from both dict and object
def get_field(obj: Any, field: str, default: Any = None) -> Any:
if isinstance(obj, dict):
return obj.get(field, default)
else:
return getattr(obj, field, default)
success = False
# 1. Try Local Library
file_path = get_field(result, "file_path") or get_field(result, "path")
if file_path and not override_hash:
# Normalize hash
file_hash = normalize_hash(file_hash)
if not file_hash:
log("Error: Invalid hash format")
return 1
# Parse URLs (comma-separated)
urls = [u.strip() for u in str(url_arg).split(',') if u.strip()]
if not urls:
log("Error: No valid URL provided")
return 1
# Get backend and delete the URLs
try:
path_obj = Path(file_path)
if path_obj.exists():
storage_path = get_local_storage_path(config)
if storage_path:
with LocalLibraryDB(storage_path) as db:
metadata = db.get_metadata(path_obj) or {}
known_urls = metadata.get("known_urls") or []
# Handle comma-separated URLs if passed as arg
# But first check if the exact url string exists (e.g. if it contains commas itself)
urls_to_process = []
if url in known_urls:
urls_to_process = [url]
else:
urls_to_process = [u.strip() for u in url.split(',') if u.strip()]
local_changed = False
for u in urls_to_process:
if u in known_urls:
known_urls.remove(u)
local_changed = True
ctx.emit(f"Deleted URL from local file {path_obj.name}: {u}")
if local_changed:
metadata["known_urls"] = known_urls
db.save_metadata(path_obj, metadata)
success = True
except Exception as e:
log(f"Error updating local library: {e}", file=sys.stderr)
# 2. Try Hydrus
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
if hash_hex:
try:
client = hydrus_wrapper.get_client(config)
if client:
urls_to_delete = [u.strip() for u in url.split(',') if u.strip()]
for u in urls_to_delete:
client.delete_url(hash_hex, u)
preview = hash_hex[:12] + ('…' if len(hash_hex) > 12 else '')
ctx.emit(f"Deleted URL from Hydrus file {preview}: {u}")
success = True
storage = FileStorage(config)
backend = storage[store_name]
for u in urls:
backend.delete_url(file_hash, u)
ctx.emit(f"Deleted URL: {u}")
return 0
except KeyError:
log(f"Error: Storage backend '{store_name}' not configured")
return 1
except Exception as exc:
log(f"Hydrus del-url failed: {exc}", file=sys.stderr)
log(f"Error deleting URL: {exc}", file=sys.stderr)
return 1
if success:
try:
from cmdlets import get_url as get_url_cmd # type: ignore
except Exception:
get_url_cmd = None
if get_url_cmd:
try:
subject = ctx.get_last_result_subject()
if subject is not None:
def norm(val: Any) -> str:
return str(val).lower()
target_hash = norm(hash_hex) if hash_hex else None
target_path = norm(file_path) if file_path else None
subj_hashes = []
subj_paths = []
if isinstance(subject, dict):
subj_hashes = [norm(v) for v in [subject.get("hydrus_hash"), subject.get("hash"), subject.get("hash_hex"), subject.get("file_hash")] if v]
subj_paths = [norm(v) for v in [subject.get("file_path"), subject.get("path"), subject.get("target")] if v]
else:
subj_hashes = [norm(getattr(subject, f, None)) for f in ("hydrus_hash", "hash", "hash_hex", "file_hash") if getattr(subject, f, None)]
subj_paths = [norm(getattr(subject, f, None)) for f in ("file_path", "path", "target") if getattr(subject, f, None)]
is_match = False
if target_hash and target_hash in subj_hashes:
is_match = True
if target_path and target_path in subj_paths:
is_match = True
if is_match:
refresh_args: list[str] = []
if hash_hex:
refresh_args.extend(["-hash", hash_hex])
get_url_cmd._run(subject, refresh_args, config)
except Exception:
debug("URL refresh skipped (error)")
return success
# Register cmdlet
register(["delete-url", "del-url", "delete_url"])(Delete_Url)

File diff suppressed because it is too large

199
cmdlets/download_file.py Normal file
View File

@@ -0,0 +1,199 @@
"""Download files directly via HTTP (non-yt-dlp url).
Focused cmdlet for direct file downloads from:
- PDFs, images, documents
- url not supported by yt-dlp
- LibGen sources
- Direct file links
No streaming site logic - pure HTTP download with retries.
"""
from __future__ import annotations
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence
from helper.download import DownloadError, _download_direct_file
from helper.logger import log, debug
from models import DownloadOptions
import pipeline as pipeline_context
from ._shared import Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, register_url_with_local_library, coerce_to_pipe_object
class Download_File(Cmdlet):
"""Class-based download-file cmdlet - direct HTTP downloads."""
def __init__(self) -> None:
"""Initialize download-file cmdlet."""
super().__init__(
name="download-file",
summary="Download files directly via HTTP (PDFs, images, documents)",
usage="download-file <url> [options] or search-file | download-file [options]",
alias=["dl-file", "download-http"],
arg=[
CmdletArg(name="url", type="string", required=False, description="URL to download (direct file links)", variadic=True),
CmdletArg(name="-url", type="string", description="URL to download (alias for positional argument)", variadic=True),
CmdletArg(name="output", type="string", alias="o", description="Output filename (auto-detected if not specified)"),
SharedArgs.URL
],
detail=["Download files directly via HTTP without yt-dlp processing.", "For streaming sites, use download-media."],
exec=self.run,
)
self.register()
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Main execution method."""
stage_ctx = pipeline_context.get_stage_context()
in_pipeline = stage_ctx is not None and getattr(stage_ctx, "total_stages", 1) > 1
if in_pipeline and isinstance(config, dict):
config["_quiet_background_output"] = True
return self._run_impl(result, args, config)
def _run_impl(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Main download implementation for direct HTTP files."""
try:
debug("Starting download-file")
# Parse arguments
parsed = parse_cmdlet_args(args, self)
# Extract options
raw_url = parsed.get("url", [])
if isinstance(raw_url, str):
raw_url = [raw_url]
if not raw_url:
log("No url to download", file=sys.stderr)
return 1
# Get output directory
final_output_dir = self._resolve_output_dir(parsed, config)
if not final_output_dir:
return 1
debug(f"Output directory: {final_output_dir}")
# Download each URL
downloaded_count = 0
quiet_mode = bool(config.get("_quiet_background_output")) if isinstance(config, dict) else False
custom_output = parsed.get("output")
for url in raw_url:
try:
debug(f"Processing: {url}")
# Direct HTTP download
result_obj = _download_direct_file(url, final_output_dir, quiet=quiet_mode)
debug(f"Download completed, building pipe object...")
pipe_obj_dict = self._build_pipe_object(result_obj, url, final_output_dir)
debug(f"Emitting result to pipeline...")
pipeline_context.emit(pipe_obj_dict)
# Automatically register url with local library
if pipe_obj_dict.get("url"):
pipe_obj = coerce_to_pipe_object(pipe_obj_dict)
register_url_with_local_library(pipe_obj, config)
downloaded_count += 1
debug("✓ Downloaded and emitted")
except DownloadError as e:
log(f"Download failed for {url}: {e}", file=sys.stderr)
except Exception as e:
log(f"Error processing {url}: {e}", file=sys.stderr)
if downloaded_count > 0:
debug(f"✓ Successfully processed {downloaded_count} file(s)")
return 0
log("No downloads completed", file=sys.stderr)
return 1
except Exception as e:
log(f"Error in download-file: {e}", file=sys.stderr)
return 1
def _resolve_output_dir(self, parsed: Dict[str, Any], config: Dict[str, Any]) -> Optional[Path]:
"""Resolve the output directory from storage location or config."""
storage_location = parsed.get("storage")
# Priority 1: --storage flag
if storage_location:
try:
return SharedArgs.resolve_storage(storage_location)
except Exception as e:
log(f"Invalid storage location: {e}", file=sys.stderr)
return None
# Priority 2: Config outfile
if config and config.get("outfile"):
try:
return Path(config["outfile"]).expanduser()
except Exception:
pass
# Priority 3: Default (home/Downloads)
final_output_dir = Path.home() / "Downloads"
debug(f"Using default directory: {final_output_dir}")
# Ensure directory exists
try:
final_output_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
log(f"Cannot create output directory {final_output_dir}: {e}", file=sys.stderr)
return None
return final_output_dir
def _build_pipe_object(self, download_result: Any, url: str, output_dir: Path) -> Dict[str, Any]:
"""Create a PipeObject-compatible dict from a download result."""
# Try to get file path from result
file_path = None
if hasattr(download_result, 'path'):
file_path = download_result.path
elif isinstance(download_result, dict) and 'path' in download_result:
file_path = download_result['path']
if not file_path:
# Fallback: assume result is the path itself
file_path = str(download_result)
media_path = Path(file_path)
hash_value = self._compute_file_hash(media_path)
title = media_path.stem
# Build tags with title for searchability
tags = [f"title:{title}"]
# Prefer canonical fields while keeping legacy keys for compatibility
return {
"path": str(media_path),
"hash": hash_value,
"file_hash": hash_value,
"title": title,
"file_title": title,
"action": "cmdlet:download-file",
"download_mode": "file",
"url": url or (download_result.get('url') if isinstance(download_result, dict) else None),
"url": [url] if url else [],
"store": "local",
"storage_source": "downloads",
"media_kind": "file",
"tags": tags,
}
def _compute_file_hash(self, filepath: Path) -> str:
"""Compute SHA256 hash of a file."""
import hashlib
sha256_hash = hashlib.sha256()
with open(filepath, "rb") as f:
for byte_block in iter(lambda: f.read(4096), b""):
sha256_hash.update(byte_block)
return sha256_hash.hexdigest()
# Module-level singleton registration
CMDLET = Download_File()
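As a rough illustration of the shape `_build_pipe_object` emits (field names taken from the code above, values purely hypothetical), downstream hash+store cmdlets only need `hash`, `store`, and `path` to be populated:

```python
from pathlib import Path

# Hypothetical PipeObject-compatible dict, as emitted by download-file
pipe_obj = {
    "path": str(Path.home() / "Downloads" / "example.pdf"),
    "hash": "0123456789abcdef" * 4,   # placeholder 64-char SHA-256 hex digest
    "title": "example",
    "action": "cmdlet:download-file",
    "url": ["https://example.com/example.pdf"],
    "store": "local",
    "media_kind": "file",
    "tags": ["title:example"],
}

# Downstream stages in this refactor key everything off hash + store
required = ("hash", "store", "path")
missing = [k for k in required if not pipe_obj.get(k)]
assert not missing, f"pipe object missing: {missing}"
```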

1445
cmdlets/download_media.py Normal file

File diff suppressed because it is too large

127
cmdlets/download_torrent.py Normal file
View File

@@ -0,0 +1,127 @@
"""Download torrent/magnet links via AllDebrid in a dedicated cmdlet.
Features:
- Accepts magnet links and .torrent files/URLs
- Uses AllDebrid API for background downloads
- Progress tracking and worker management
- Self-registering class-based cmdlet
"""
from __future__ import annotations
import sys
import uuid
import threading
from pathlib import Path
from typing import Any, Dict, Optional, Sequence
from helper.logger import log
from ._shared import Cmdlet, CmdletArg, parse_cmdlet_args
class Download_Torrent(Cmdlet):
"""Class-based download-torrent cmdlet with self-registration."""
def __init__(self) -> None:
super().__init__(
name="download-torrent",
summary="Download torrent/magnet links via AllDebrid",
usage="download-torrent <magnet|.torrent> [options]",
alias=["torrent", "magnet"],
arg=[
CmdletArg(name="magnet", type="string", required=False, description="Magnet link or .torrent file/URL", variadic=True),
CmdletArg(name="output", type="string", description="Output directory for downloaded files"),
CmdletArg(name="wait", type="float", description="Wait time (seconds) for magnet processing timeout"),
CmdletArg(name="background", type="flag", alias="bg", description="Start download in background"),
],
detail=["Download torrents/magnets via AllDebrid API."],
exec=self.run,
)
self.register()
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
parsed = parse_cmdlet_args(args, self)
magnet_args = parsed.get("magnet", [])
output_dir = Path(parsed.get("output") or Path.home() / "Downloads")
wait_timeout = int(float(parsed.get("wait", 600)))
background_mode = parsed.get("background", False)
api_key = config.get("alldebrid_api_key")
if not api_key:
log("AllDebrid API key not configured", file=sys.stderr)
return 1
for magnet_url in magnet_args:
if background_mode:
self._start_background_worker(magnet_url, output_dir, config, api_key, wait_timeout)
log(f"⧗ Torrent download queued in background: {magnet_url}")
else:
self._download_torrent_worker(str(uuid.uuid4()), magnet_url, output_dir, config, api_key, wait_timeout)
return 0
@staticmethod
def _download_torrent_worker(
worker_id: str,
magnet_url: str,
output_dir: Path,
config: Dict[str, Any],
api_key: str,
wait_timeout: int = 600,
worker_manager: Optional[Any] = None,
) -> None:
try:
from helper.alldebrid import AllDebridClient
client = AllDebridClient(api_key)
log(f"[Worker {worker_id}] Submitting magnet to AllDebrid...")
magnet_info = client.magnet_add(magnet_url)
magnet_id = int(magnet_info.get('id', 0))
if magnet_id <= 0:
log(f"[Worker {worker_id}] Magnet add failed", file=sys.stderr)
return
log(f"[Worker {worker_id}] ✓ Magnet added (ID: {magnet_id})")
# Poll for ready status (simplified)
import time
elapsed = 0
while elapsed < wait_timeout:
status = client.magnet_status(magnet_id)
if status.get('ready'):
break
time.sleep(5)
elapsed += 5
if elapsed >= wait_timeout:
log(f"[Worker {worker_id}] Timeout waiting for magnet", file=sys.stderr)
return
files_result = client.magnet_links([magnet_id])
magnet_files = files_result.get(str(magnet_id), {})
files_array = magnet_files.get('files', [])
if not files_array:
log(f"[Worker {worker_id}] No files found", file=sys.stderr)
return
for file_info in files_array:
file_url = file_info.get('link')
file_name = file_info.get('name')
if file_url:
Download_Torrent._download_file(file_url, output_dir / file_name)
log(f"[Worker {worker_id}] ✓ Downloaded {file_name}")
except Exception as e:
log(f"[Worker {worker_id}] Torrent download failed: {e}", file=sys.stderr)
@staticmethod
def _download_file(url: str, dest: Path) -> None:
try:
import requests
resp = requests.get(url, stream=True)
with open(dest, 'wb') as f:
for chunk in resp.iter_content(chunk_size=8192):
if chunk:
f.write(chunk)
except Exception as e:
log(f"File download failed: {e}", file=sys.stderr)
def _start_background_worker(self, magnet_url, output_dir, config, api_key, wait_timeout):
worker_id = f"torrent_{uuid.uuid4().hex[:6]}"
thread = threading.Thread(
target=self._download_torrent_worker,
args=(worker_id, magnet_url, output_dir, config, api_key, wait_timeout),
daemon=False,
name=f"TorrentWorker_{worker_id}",
)
thread.start()
CMDLET = Download_Torrent()
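The worker's ready-check is a plain poll-with-timeout loop. A standalone sketch of that pattern (hypothetical `is_ready` callable in place of `client.magnet_status`):

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout: float = 600.0,
                     interval: float = 5.0) -> bool:
    """Poll is_ready() every `interval` seconds until it returns True or `timeout` elapses."""
    elapsed = 0.0
    while elapsed < timeout:
        if is_ready():
            return True
        time.sleep(interval)
        elapsed += interval
    return False


if __name__ == "__main__":
    start = time.monotonic()
    # Pretend the magnet becomes ready after ~2 seconds
    print(wait_until_ready(lambda: time.monotonic() - start > 2, timeout=10, interval=1))
```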

File diff suppressed because it is too large

1708
cmdlets/get_file.py.backup Normal file

File diff suppressed because it is too large

View File

@@ -6,337 +6,224 @@ import sys
from helper.logger import log
from pathlib import Path
import mimetypes
import os
from helper import hydrus as hydrus_wrapper
from helper.local_library import LocalLibraryDB
from ._shared import Cmdlet, CmdletArg, normalize_hash
from config import get_local_storage_path
from ._shared import Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, get_field
import pipeline as ctx
from result_table import ResultTable
def _extract_imported_ts(meta: Dict[str, Any]) -> Optional[int]:
"""Extract an imported timestamp from Hydrus metadata if available."""
if not isinstance(meta, dict):
class Get_Metadata(Cmdlet):
"""Class-based get-metadata cmdlet with self-registration."""
def __init__(self) -> None:
"""Initialize get-metadata cmdlet."""
super().__init__(
name="get-metadata",
summary="Print metadata for files by hash and storage backend.",
usage="get-metadata [-hash <sha256>] [-store <backend>]",
alias=["meta"],
arg=[
SharedArgs.HASH,
SharedArgs.STORE,
],
detail=[
"- Retrieves metadata from storage backend using file hash as identifier.",
"- Shows hash, MIME type, size, duration/pages, known url, and import timestamp.",
"- Hash and store are taken from piped result or can be overridden with -hash/-store flags.",
"- All metadata is retrieved from the storage backend's database (single source of truth).",
],
exec=self.run,
)
self.register()
@staticmethod
def _extract_imported_ts(meta: Dict[str, Any]) -> Optional[int]:
"""Extract an imported timestamp from metadata if available."""
if not isinstance(meta, dict):
return None
# Prefer explicit time_imported if present
explicit = meta.get("time_imported")
if isinstance(explicit, (int, float)):
return int(explicit)
# Try parsing string timestamps
if isinstance(explicit, str):
try:
import datetime as _dt
return int(_dt.datetime.fromisoformat(explicit).timestamp())
except Exception:
pass
return None
# Prefer explicit time_imported if present
explicit = meta.get("time_imported")
if isinstance(explicit, (int, float)):
return int(explicit)
file_services = meta.get("file_services")
if isinstance(file_services, dict):
current = file_services.get("current")
if isinstance(current, dict):
numeric = [int(v) for v in current.values() if isinstance(v, (int, float))]
if numeric:
return min(numeric)
return None
def _format_imported(ts: Optional[int]) -> str:
if not ts:
return ""
try:
import datetime as _dt
return _dt.datetime.utcfromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S")
except Exception:
return ""
def _build_table_row(title: str, origin: str, path: str, mime: str, size_bytes: Optional[int], dur_seconds: Optional[int], imported_ts: Optional[int], urls: list[str], hash_value: Optional[str], pages: Optional[int] = None) -> Dict[str, Any]:
size_mb = None
if isinstance(size_bytes, int):
@staticmethod
def _format_imported(ts: Optional[int]) -> str:
"""Format timestamp as readable string."""
if not ts:
return ""
try:
size_mb = int(size_bytes / (1024 * 1024))
import datetime as _dt
return _dt.datetime.utcfromtimestamp(ts).strftime("%Y-%m-%d %H:%M:%S")
except Exception:
size_mb = None
return ""
dur_int = int(dur_seconds) if isinstance(dur_seconds, (int, float)) else None
pages_int = int(pages) if isinstance(pages, (int, float)) else None
imported_label = _format_imported(imported_ts)
@staticmethod
def _build_table_row(title: str, origin: str, path: str, mime: str, size_bytes: Optional[int],
dur_seconds: Optional[int], imported_ts: Optional[int], url: list[str],
hash_value: Optional[str], pages: Optional[int] = None) -> Dict[str, Any]:
"""Build a table row dict with metadata fields."""
size_mb = None
if isinstance(size_bytes, int):
try:
size_mb = int(size_bytes / (1024 * 1024))
except Exception:
size_mb = None
duration_label = "Duration(s)"
duration_value = str(dur_int) if dur_int is not None else ""
if mime and mime.lower().startswith("application/pdf"):
duration_label = "Pages"
duration_value = str(pages_int) if pages_int is not None else ""
dur_int = int(dur_seconds) if isinstance(dur_seconds, (int, float)) else None
pages_int = int(pages) if isinstance(pages, (int, float)) else None
imported_label = Get_Metadata._format_imported(imported_ts)
columns = [
("Title", title or ""),
("Hash", hash_value or ""),
("MIME", mime or ""),
("Size(MB)", str(size_mb) if size_mb is not None else ""),
(duration_label, duration_value),
("Imported", imported_label),
("Store", origin or ""),
]
duration_label = "Duration(s)"
duration_value = str(dur_int) if dur_int is not None else ""
if mime and mime.lower().startswith("application/pdf"):
duration_label = "Pages"
duration_value = str(pages_int) if pages_int is not None else ""
return {
"title": title or path,
"path": path,
"origin": origin,
"mime": mime,
"size_bytes": size_bytes,
"duration_seconds": dur_int,
"pages": pages_int,
"imported_ts": imported_ts,
"imported": imported_label,
"hash": hash_value,
"known_urls": urls,
"columns": columns,
}
columns = [
("Title", title or ""),
("Hash", hash_value or ""),
("MIME", mime or ""),
("Size(MB)", str(size_mb) if size_mb is not None else ""),
(duration_label, duration_value),
("Imported", imported_label),
("Store", origin or ""),
]
return {
"title": title or path,
"path": path,
"origin": origin,
"mime": mime,
"size_bytes": size_bytes,
"duration_seconds": dur_int,
"pages": pages_int,
"imported_ts": imported_ts,
"imported": imported_label,
"hash": hash_value,
"url": url,
"columns": columns,
}
def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in _args):
log(json.dumps(CMDLET.to_dict(), ensure_ascii=False, indent=2))
return 0
except Exception:
pass
# Helper to get field from both dict and object
def get_field(obj: Any, field: str, default: Any = None) -> Any:
if isinstance(obj, dict):
return obj.get(field, default)
@staticmethod
def _add_table_body_row(table: ResultTable, row: Dict[str, Any]) -> None:
"""Add a single row to the ResultTable using the prepared columns."""
columns = row.get("columns") if isinstance(row, dict) else None
lookup: Dict[str, Any] = {}
if isinstance(columns, list):
for col in columns:
if isinstance(col, tuple) and len(col) == 2:
label, value = col
lookup[str(label)] = value
row_obj = table.add_row()
row_obj.add_column("Hash", lookup.get("Hash", ""))
row_obj.add_column("MIME", lookup.get("MIME", ""))
row_obj.add_column("Size(MB)", lookup.get("Size(MB)", ""))
if "Duration(s)" in lookup:
row_obj.add_column("Duration(s)", lookup.get("Duration(s)", ""))
elif "Pages" in lookup:
row_obj.add_column("Pages", lookup.get("Pages", ""))
else:
return getattr(obj, field, default)
# Parse -hash override
override_hash: str | None = None
args_list = list(_args)
i = 0
while i < len(args_list):
a = args_list[i]
low = str(a).lower()
if low in {"-hash", "--hash", "hash"} and i + 1 < len(args_list):
override_hash = str(args_list[i + 1]).strip()
break
i += 1
# Try to determine if this is a local file or Hydrus file
local_path = get_field(result, "target", None) or get_field(result, "path", None)
is_local = False
if local_path and isinstance(local_path, str) and not local_path.startswith(("http://", "https://")):
is_local = True
# LOCAL FILE PATH
if is_local and local_path:
row_obj.add_column("Duration(s)", "")
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Main execution entry point."""
# Parse arguments
parsed = parse_cmdlet_args(args, self)
# Get hash and store from parsed args or result
file_hash = parsed.get("hash") or get_field(result, "hash") or get_field(result, "file_hash") or get_field(result, "hash_hex")
storage_source = parsed.get("store") or get_field(result, "store") or get_field(result, "storage") or get_field(result, "origin")
if not file_hash:
log("No hash available - use -hash to specify", file=sys.stderr)
return 1
if not storage_source:
log("No storage backend specified - use -store to specify", file=sys.stderr)
return 1
# Use storage backend to get metadata
try:
file_path = Path(str(local_path))
if file_path.exists() and file_path.is_file():
# Get the hash from result or compute it
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
# If no hash, compute SHA256 of the file
if not hash_hex:
try:
import hashlib
with open(file_path, 'rb') as f:
hash_hex = hashlib.sha256(f.read()).hexdigest()
except Exception:
hash_hex = None
# Get MIME type
mime_type, _ = mimetypes.guess_type(str(file_path))
if not mime_type:
mime_type = "unknown"
# Pull metadata from local DB if available (for imported timestamp, duration, etc.)
db_metadata = None
library_root = get_local_storage_path(config)
if library_root:
try:
with LocalLibraryDB(library_root) as db:
db_metadata = db.get_metadata(file_path) or None
except Exception:
db_metadata = None
# Get file size (prefer DB size if present)
file_size = None
if isinstance(db_metadata, dict) and isinstance(db_metadata.get("size"), int):
file_size = db_metadata.get("size")
else:
try:
file_size = file_path.stat().st_size
except Exception:
file_size = None
# Duration/pages
duration_seconds = None
pages = None
if isinstance(db_metadata, dict):
if isinstance(db_metadata.get("duration"), (int, float)):
duration_seconds = float(db_metadata.get("duration"))
if isinstance(db_metadata.get("pages"), (int, float)):
pages = int(db_metadata.get("pages"))
if duration_seconds is None and mime_type and mime_type.startswith("video"):
try:
import subprocess
result_proc = subprocess.run(
["ffprobe", "-v", "error", "-select_streams", "v:0", "-show_entries", "format=duration", "-of", "default=noprint_wrappers=1:nokey=1", str(file_path)],
capture_output=True,
text=True,
timeout=5
)
if result_proc.returncode == 0 and result_proc.stdout.strip():
duration_seconds = float(result_proc.stdout.strip())
except Exception:
pass
# Known URLs from sidecar or result
urls = []
sidecar_path = Path(str(file_path) + '.tags')
if sidecar_path.exists():
try:
with open(sidecar_path, 'r', encoding='utf-8') as f:
for line in f:
line = line.strip()
if line.startswith('known_url:'):
url_value = line.replace('known_url:', '', 1).strip()
if url_value:
urls.append(url_value)
except Exception:
pass
if not urls:
urls_from_result = get_field(result, "known_urls", None) or get_field(result, "urls", None)
if isinstance(urls_from_result, list):
urls.extend([str(u).strip() for u in urls_from_result if u])
imported_ts = None
if isinstance(db_metadata, dict):
ts = db_metadata.get("time_imported") or db_metadata.get("time_added")
if isinstance(ts, (int, float)):
imported_ts = int(ts)
elif isinstance(ts, str):
try:
import datetime as _dt
imported_ts = int(_dt.datetime.fromisoformat(ts).timestamp())
except Exception:
imported_ts = None
row = _build_table_row(
title=file_path.name,
origin="local",
path=str(file_path),
mime=mime_type or "",
size_bytes=int(file_size) if isinstance(file_size, int) else None,
dur_seconds=duration_seconds,
imported_ts=imported_ts,
urls=urls,
hash_value=hash_hex,
pages=pages,
)
table_title = file_path.name
table = ResultTable(table_title)
table.set_source_command("get-metadata", list(_args))
table.add_result(row)
ctx.set_last_result_table_overlay(table, [row], row)
ctx.emit(row)
return 0
except Exception:
# Fall through to Hydrus if local file handling fails
pass
# HYDRUS PATH
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
if not hash_hex:
log("Selected result does not include a Hydrus hash or local path", file=sys.stderr)
return 1
try:
client = hydrus_wrapper.get_client(config)
except Exception as exc:
log(f"Hydrus client unavailable: {exc}", file=sys.stderr)
return 1
if client is None:
log("Hydrus client unavailable", file=sys.stderr)
return 1
try:
payload = client.fetch_file_metadata(
hashes=[hash_hex],
include_service_keys_to_tags=False,
include_file_urls=True,
include_duration=True,
include_size=True,
include_mime=True,
)
except Exception as exc:
log(f"Hydrus metadata fetch failed: {exc}", file=sys.stderr)
return 1
items = payload.get("metadata") if isinstance(payload, dict) else None
if not isinstance(items, list) or not items:
log("No metadata found.")
return 0
meta = items[0] if isinstance(items[0], dict) else None
if not isinstance(meta, dict):
log("No metadata found.")
return 0
mime = meta.get("mime")
size = meta.get("size") or meta.get("file_size")
duration_value = meta.get("duration")
inner = meta.get("metadata") if isinstance(meta.get("metadata"), dict) else None
if duration_value is None and isinstance(inner, dict):
duration_value = inner.get("duration")
imported_ts = _extract_imported_ts(meta)
try:
from .search_file import _hydrus_duration_seconds as _dur_secs
except Exception:
_dur_secs = lambda x: x
dur_seconds = _dur_secs(duration_value)
urls = meta.get("known_urls") or meta.get("urls")
urls = [str(u).strip() for u in urls] if isinstance(urls, list) else []
row = _build_table_row(
title=hash_hex,
origin="hydrus",
path=f"hydrus://file/{hash_hex}",
mime=mime or "",
size_bytes=int(size) if isinstance(size, int) else None,
dur_seconds=int(dur_seconds) if isinstance(dur_seconds, (int, float)) else None,
imported_ts=imported_ts,
urls=urls,
hash_value=hash_hex,
pages=None,
)
table = ResultTable(hash_hex or "Metadata")
table.set_source_command("get-metadata", list(_args))
table.add_result(row)
ctx.set_last_result_table_overlay(table, [row], row)
ctx.emit(row)
return 0
from helper.store import FileStorage
storage = FileStorage(config)
backend = storage[storage_source]
# Get metadata from backend
metadata = backend.get_metadata(file_hash)
if not metadata:
log(f"No metadata found for hash {file_hash[:8]}... in {storage_source}", file=sys.stderr)
return 1
# Extract title from tags if available
title = get_field(result, "title") or file_hash[:16]
if not get_field(result, "title"):
# Try to get title from tags
try:
tags, _ = backend.get_tag(file_hash)
for tag in tags:
if tag.lower().startswith("title:"):
title = tag.split(":", 1)[1]
break
except Exception:
pass
# Extract metadata fields
mime_type = metadata.get("mime") or metadata.get("ext", "")
file_size = metadata.get("size")
duration_seconds = metadata.get("duration")
pages = metadata.get("pages")
url = metadata.get("url") or []
imported_ts = self._extract_imported_ts(metadata)
# Normalize url
if isinstance(url, str):
try:
url = json.loads(url)
except (json.JSONDecodeError, TypeError):
url = []
if not isinstance(url, list):
url = []
# Build display row
row = self._build_table_row(
title=title,
origin=storage_source,
path=metadata.get("file_path", ""),
mime=mime_type,
size_bytes=file_size,
dur_seconds=duration_seconds,
imported_ts=imported_ts,
url=url,
hash_value=file_hash,
pages=pages,
)
table_title = title
table = ResultTable(table_title).init_command("get-metadata", list(args))
self._add_table_body_row(table, row)
ctx.set_last_result_table_overlay(table, [row], row)
ctx.emit(row)
return 0
except KeyError:
log(f"Storage backend '{storage_source}' not found", file=sys.stderr)
return 1
except Exception as exc:
log(f"Failed to get metadata: {exc}", file=sys.stderr)
return 1
CMDLET = Cmdlet(
name="get-metadata",
summary="Print metadata for local or Hydrus files (hash, mime, duration, size, URLs).",
usage="get-metadata [-hash <sha256>]",
aliases=["meta"],
args=[
CmdletArg("hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
],
details=[
"- For local files: Shows path, hash (computed if needed), MIME type, size, duration, and known URLs from sidecar.",
"- For Hydrus files: Shows path (hydrus://), hash, MIME, duration, size, and known URLs.",
"- Automatically detects local vs Hydrus files.",
"- Local file hashes are computed via SHA256 if not already available.",
],
)
CMDLET = Get_Metadata()
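`_extract_imported_ts` and `_format_imported` accept either an epoch number or an ISO-8601 string; a minimal standalone version of that normalization (function names hypothetical, logic mirrors the methods above):

```python
import datetime as dt
from typing import Any, Optional


def to_epoch(value: Any) -> Optional[int]:
    """Normalize an epoch number or ISO-8601 string to integer seconds, else None."""
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str):
        try:
            return int(dt.datetime.fromisoformat(value).timestamp())
        except ValueError:
            return None
    return None


def fmt_epoch(ts: Optional[int]) -> str:
    """Format epoch seconds as 'YYYY-MM-DD HH:MM:SS' in UTC, or '' when missing."""
    if not ts:
        return ""
    # Timezone-aware equivalent of the utcfromtimestamp call used above
    return dt.datetime.fromtimestamp(ts, dt.timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


print(fmt_epoch(to_epoch("2024-01-02T03:04:05")))
print(fmt_epoch(to_epoch(1700000000)))
```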

View File

@@ -7,17 +7,17 @@ from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, get_hash_for_operation, fetch_hydrus_metadata, get_field, should_show_help
from helper.logger import log
CMDLET = Cmdlet(
name="get-note",
summary="List notes on a Hydrus file.",
usage="get-note [-hash <sha256>]",
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
arg=[
SharedArgs.HASH,
],
details=[
detail=[
"- Prints notes by service and note name.",
],
)
@@ -25,45 +25,24 @@ CMDLET = Cmdlet(
@register(["get-note", "get-notes", "get_note"]) # aliases
def get_notes(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Helper to get field from both dict and object
def get_field(obj: Any, field: str, default: Any = None) -> Any:
if isinstance(obj, dict):
return obj.get(field, default)
else:
return getattr(obj, field, default)
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
from ._shared import parse_cmdlet_args
from ._shared import parse_cmdlet_args, get_hash_for_operation, fetch_hydrus_metadata
parsed = parse_cmdlet_args(args, CMDLET)
override_hash = parsed.get("hash")
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
hash_hex = get_hash_for_operation(override_hash, result)
if not hash_hex:
log("Selected result does not include a Hydrus hash")
return 1
try:
client = hydrus_wrapper.get_client(config)
except Exception as exc:
log(f"Hydrus client unavailable: {exc}")
return 1
if client is None:
log("Hydrus client unavailable")
return 1
try:
payload = client.fetch_file_metadata(hashes=[hash_hex], include_service_keys_to_tags=False, include_notes=True)
except Exception as exc:
log(f"Hydrus metadata fetch failed: {exc}")
return 1
items = payload.get("metadata") if isinstance(payload, dict) else None
meta = items[0] if (isinstance(items, list) and items and isinstance(items[0], dict)) else None
meta, error_code = fetch_hydrus_metadata(config, hash_hex, include_service_keys_to_tags=False, include_notes=True)
if error_code != 0:
return error_code
notes = {}
if isinstance(meta, dict):
# Hydrus returns service_keys_to_tags; for notes we expect 'service_names_to_notes' in modern API

View File

@@ -7,12 +7,11 @@ from pathlib import Path
from helper.logger import log
from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash, fmt_bytes
from helper.local_library import LocalLibraryDB
from ._shared import Cmdlet, CmdletArg, SharedArgs, normalize_hash, fmt_bytes, get_hash_for_operation, fetch_hydrus_metadata, should_show_help
from helper.folder_store import FolderDB
from config import get_local_storage_path
from result_table import ResultTable
@@ -20,23 +19,22 @@ CMDLET = Cmdlet(
name="get-relationship",
summary="Print relationships for the selected file (Hydrus or Local).",
usage="get-relationship [-hash <sha256>]",
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
alias=[
"get-rel",
],
details=[
arg=[
SharedArgs.HASH,
],
detail=[
"- Lists relationship data as returned by Hydrus or Local DB.",
],
)
@register(["get-rel", "get-relationship", "get-relationships", "get-file-relationships"]) # aliases
def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in _args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(_args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Parse -hash override
override_hash: str | None = None
@@ -91,8 +89,9 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
storage_path = get_local_storage_path(config)
print(f"[DEBUG] Storage path: {storage_path}", file=sys.stderr)
if storage_path:
with LocalLibraryDB(storage_path) as db:
metadata = db.get_metadata(path_obj)
with FolderDB(storage_path) as db:
file_hash = db.get_file_hash(path_obj)
metadata = db.get_metadata(file_hash) if file_hash else None
print(f"[DEBUG] Metadata found: {metadata is not None}", file=sys.stderr)
if metadata and metadata.get("relationships"):
local_db_checked = True
@@ -106,14 +105,14 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
# h is now a file hash (not a path)
print(f"[DEBUG] Processing relationship hash: h={h}", file=sys.stderr)
# Resolve hash to file path
resolved_path = db.search_by_hash(h)
resolved_path = db.search_hash(h)
title = h[:16] + "..."
path = None
if resolved_path and resolved_path.exists():
path = str(resolved_path)
# Try to get title from tags
try:
tags = db.get_tags(resolved_path)
tags = db.get_tags(h)
found_title = False
for t in tags:
if t.lower().startswith('title:'):
@@ -154,11 +153,13 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
if not existing_parent:
parent_title = parent_path_obj.stem
try:
parent_tags = db.get_tags(parent_path_obj)
for t in parent_tags:
if t.lower().startswith('title:'):
parent_title = t[6:].strip()
break
parent_hash = db.get_file_hash(parent_path_obj)
if parent_hash:
parent_tags = db.get_tags(parent_hash)
for t in parent_tags:
if t.lower().startswith('title:'):
parent_title = t[6:].strip()
break
except Exception:
pass
@@ -176,7 +177,8 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
existing_parent['type'] = "king"
# 1. Check forward relationships from parent (siblings)
parent_metadata = db.get_metadata(parent_path_obj)
parent_hash = db.get_file_hash(parent_path_obj)
parent_metadata = db.get_metadata(parent_hash) if parent_hash else None
print(f"[DEBUG] 📖 Parent metadata: {parent_metadata is not None}", file=sys.stderr)
if parent_metadata:
print(f"[DEBUG] Parent metadata keys: {parent_metadata.keys()}", file=sys.stderr)
@@ -189,7 +191,7 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
if child_hashes:
for child_h in child_hashes:
# child_h is now a HASH, not a path - resolve it
child_path_obj = db.search_by_hash(child_h)
child_path_obj = db.search_hash(child_h)
print(f"[DEBUG] Resolved hash {child_h[:16]}... to: {child_path_obj}", file=sys.stderr)
if not child_path_obj:
@@ -205,11 +207,13 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
# Now child_path_obj is a Path, so we can get tags
child_title = child_path_obj.stem
try:
child_tags = db.get_tags(child_path_obj)
for t in child_tags:
if t.lower().startswith('title:'):
child_title = t[6:].strip()
break
child_hash = db.get_file_hash(child_path_obj)
if child_hash:
child_tags = db.get_tags(child_hash)
for t in child_tags:
if t.lower().startswith('title:'):
child_title = t[6:].strip()
break
except Exception:
pass
@@ -241,11 +245,13 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
child_path_obj = Path(child_path)
child_title = child_path_obj.stem
try:
child_tags = db.get_tags(child_path_obj)
for t in child_tags:
if t.lower().startswith('title:'):
child_title = t[6:].strip()
break
child_hash = db.get_file_hash(child_path_obj)
if child_hash:
child_tags = db.get_tags(child_hash)
for t in child_tags:
if t.lower().startswith('title:'):
child_title = t[6:].strip()
break
except Exception:
pass
@@ -304,11 +310,7 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
# But if the file is also in Hydrus, we might want those too.
# Let's try Hydrus if we have a hash.
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(getattr(result, "hash_hex", None))
if not hash_hex:
# Try to get hash from dict
if isinstance(result, dict):
hash_hex = normalize_hash(result.get("hash") or result.get("file_hash"))
hash_hex = get_hash_for_operation(override_hash, result)
if hash_hex and not local_db_checked:
try:
@@ -362,7 +364,7 @@ def _run(result: Any, _args: Sequence[str], config: Dict[str, Any]) -> int:
return 0
# Display results
table = ResultTable(f"Relationships: {source_title}")
table = ResultTable(f"Relationships: {source_title}").init_command("get-relationship", [])
# Sort by type then title
# Custom sort order: King first, then Derivative, then others
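Several of these cmdlets recover a display title by scanning the tag list for a `title:` namespace. A small standalone helper showing that convention (hypothetical name, not part of the commit):

```python
from typing import Iterable


def title_from_tags(tags: Iterable[str], fallback: str = "") -> str:
    """Return the value of the first non-empty 'title:' tag, or the fallback when absent."""
    for tag in tags:
        if tag.lower().startswith("title:"):
            value = tag.split(":", 1)[1].strip()
            if value:
                return value
    return fallback


print(title_from_tags(["creator:someone", "title:Example Title", "year:2025"], fallback="unknown"))
```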

View File

@@ -20,8 +20,8 @@ from typing import Any, Dict, List, Optional, Sequence, Tuple
import pipeline as ctx
from helper import hydrus
from helper.local_library import read_sidecar, write_sidecar, find_sidecar, LocalLibraryDB
from ._shared import normalize_hash, looks_like_hash, Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args
from helper.folder_store import read_sidecar, write_sidecar, find_sidecar, FolderDB
from ._shared import normalize_hash, looks_like_hash, Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, get_field
from config import get_local_storage_path
@@ -71,33 +71,6 @@ class TagItem:
}
def _extract_my_tags_from_hydrus_meta(meta: Dict[str, Any], service_key: Optional[str], service_name: str) -> List[str]:
"""Extract current tags from Hydrus metadata dict.
Prefers display_tags (includes siblings/parents, excludes deleted).
Falls back to storage_tags status '0' (current).
"""
tags_payload = meta.get("tags")
if not isinstance(tags_payload, dict):
return []
svc_data = None
if service_key:
svc_data = tags_payload.get(service_key)
if not isinstance(svc_data, dict):
return []
# Prefer display_tags (Hydrus computes siblings/parents)
display = svc_data.get("display_tags")
if isinstance(display, list) and display:
return [str(t) for t in display if isinstance(t, (str, bytes)) and str(t).strip()]
# Fallback to storage_tags status '0' (current)
storage = svc_data.get("storage_tags")
if isinstance(storage, dict):
current_list = storage.get("0") or storage.get(0)
if isinstance(current_list, list):
return [str(t) for t in current_list if isinstance(t, (str, bytes)) and str(t).strip()]
return []
def _emit_tags_as_table(
tags_list: List[str],
hash_hex: Optional[str],
@@ -316,12 +289,12 @@ def _read_sidecar_fallback(p: Path) -> tuple[Optional[str], List[str], List[str]
Format:
- Lines with "hash:" prefix: file hash
- Lines with "known_url:" or "url:" prefix: URLs
- Lines with "url:" or "url:" prefix: url
- Lines with "relationship:" prefix: ignored (internal relationships)
- Lines with "key:", "namespace:value" format: treated as namespace tags
- Plain lines without colons: freeform tags
Excluded namespaces (treated as metadata, not tags): hash, known_url, url, relationship
Excluded namespaces (treated as metadata, not tags): hash, url, relationship
"""
try:
raw = p.read_text(encoding="utf-8", errors="ignore")
@@ -332,7 +305,7 @@ def _read_sidecar_fallback(p: Path) -> tuple[Optional[str], List[str], List[str]
h: Optional[str] = None
# Namespaces to exclude from tags
excluded_namespaces = {"hash", "known_url", "url", "relationship"}
excluded_namespaces = {"hash", "url", "url", "relationship"}
for line in raw.splitlines():
s = line.strip()
@@ -344,7 +317,7 @@ def _read_sidecar_fallback(p: Path) -> tuple[Optional[str], List[str], List[str]
if low.startswith("hash:"):
h = s.split(":", 1)[1].strip() if ":" in s else h
# Check if this is a URL line
elif low.startswith("known_url:") or low.startswith("url:"):
elif low.startswith("url:") or low.startswith("url:"):
val = s.split(":", 1)[1].strip() if ":" in s else ""
if val:
u.append(val)
@@ -361,12 +334,12 @@ def _read_sidecar_fallback(p: Path) -> tuple[Optional[str], List[str], List[str]
return h, t, u
def _write_sidecar(p: Path, media: Path, tag_list: List[str], known_urls: List[str], hash_in_sidecar: Optional[str]) -> Path:
def _write_sidecar(p: Path, media: Path, tag_list: List[str], url: List[str], hash_in_sidecar: Optional[str]) -> Path:
"""Write tags to sidecar file and handle title-based renaming.
Returns the new media path if renamed, otherwise returns the original media path.
"""
success = write_sidecar(media, tag_list, known_urls, hash_in_sidecar)
success = write_sidecar(media, tag_list, url, hash_in_sidecar)
if success:
_apply_result_updates_from_tags(None, tag_list)
# Check if we should rename the file based on title tag
@@ -381,8 +354,8 @@ def _write_sidecar(p: Path, media: Path, tag_list: List[str], known_urls: List[s
if hash_in_sidecar:
lines.append(f"hash:{hash_in_sidecar}")
lines.extend(ordered)
for u in known_urls:
lines.append(f"known_url:{u}")
for u in url:
lines.append(f"url:{u}")
try:
p.write_text("\n".join(lines) + "\n", encoding="utf-8")
# Check if we should rename the file based on title tag
@@ -414,16 +387,16 @@ def _emit_tag_payload(source: str, tags_list: List[str], *, hash_value: Optional
label = None
if store_label:
label = store_label
elif ctx._PIPE_ACTIVE:
elif ctx.get_stage_context() is not None:
label = "tags"
if label:
ctx.store_value(label, payload)
if ctx._PIPE_ACTIVE and label.lower() != "tags":
if ctx.get_stage_context() is not None and label.lower() != "tags":
ctx.store_value("tags", payload)
# Emit individual TagItem objects so they can be selected by bare index
# When in pipeline, emit individual TagItem objects
if ctx._PIPE_ACTIVE:
if ctx.get_stage_context() is not None:
for idx, tag_name in enumerate(tags_list, start=1):
tag_item = TagItem(
tag_name=tag_name,
@@ -1113,7 +1086,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Try local sidecar if no tags present on result
if not identifier_tags:
file_path = get_field(result, "target", None) or get_field(result, "path", None) or get_field(result, "file_path", None) or get_field(result, "filename", None)
file_path = get_field(result, "target", None) or get_field(result, "path", None) or get_field(result, "filename", None)
if isinstance(file_path, str) and file_path and not file_path.lower().startswith(("http://", "https://")):
try:
media_path = Path(str(file_path))
@@ -1226,103 +1199,35 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
emit_mode = emit_requested or bool(store_key)
store_label = (store_key.strip() if store_key and store_key.strip() else None)
# Check Hydrus availability
hydrus_available, _ = hydrus.is_available(config)
# Get hash and store from result
file_hash = hash_hex
storage_source = get_field(result, "store") or get_field(result, "storage") or get_field(result, "origin")
# Try to find path in result object
local_path = get_field(result, "target", None) or get_field(result, "path", None) or get_field(result, "file_path", None)
if not file_hash:
log("No hash available in result", file=sys.stderr)
return 1
# Determine if local file
is_local_file = False
media: Optional[Path] = None
if local_path and isinstance(local_path, str) and not local_path.startswith(("http://", "https://")):
is_local_file = True
try:
media = Path(str(local_path))
except Exception:
media = None
if not storage_source:
log("No storage backend specified in result", file=sys.stderr)
return 1
# Try Hydrus first (always prioritize if available and has hash)
use_hydrus = False
hydrus_meta = None # Cache the metadata from first fetch
client = None
if hash_hex and hydrus_available:
try:
client = hydrus.get_client(config)
payload = client.fetch_file_metadata(hashes=[str(hash_hex)], include_service_keys_to_tags=True, include_file_urls=False)
items = payload.get("metadata") if isinstance(payload, dict) else None
if isinstance(items, list) and items:
meta = items[0] if isinstance(items[0], dict) else None
# Only accept file if it has a valid file_id (not None)
if isinstance(meta, dict) and meta.get("file_id") is not None:
use_hydrus = True
hydrus_meta = meta # Cache for tag extraction
except Exception:
pass
# Get tags - try Hydrus first, fallback to sidecar
current = []
service_name = ""
service_key = None
source = "unknown"
if use_hydrus and hash_hex and hydrus_meta:
try:
# Use cached metadata from above, don't fetch again
service_name = hydrus.get_tag_service_name(config)
if client is None:
client = hydrus.get_client(config)
service_key = hydrus.get_tag_service_key(client, service_name)
current = _extract_my_tags_from_hydrus_meta(hydrus_meta, service_key, service_name)
source = "hydrus"
except Exception as exc:
log(f"Warning: Failed to extract tags from Hydrus: {exc}", file=sys.stderr)
# Fallback to local sidecar or local DB if no tags
if not current and is_local_file and media and media.exists():
try:
# First try local library DB
library_root = get_local_storage_path(config)
if library_root:
try:
with LocalLibraryDB(library_root) as db:
db_tags = db.get_tags(media)
if db_tags:
current = db_tags
source = "local_db"
except Exception as exc:
log(f"[get_tag] DB lookup failed, trying sidecar: {exc}", file=sys.stderr)
# Fall back to sidecar if DB didn't have tags
if not current:
sidecar_path = find_sidecar(media)
if sidecar_path and sidecar_path.exists():
try:
_, current, _ = read_sidecar(sidecar_path)
except Exception:
_, current, _ = _read_sidecar_fallback(sidecar_path)
if current:
source = "sidecar"
except Exception as exc:
log(f"Warning: Failed to load tags from local storage: {exc}", file=sys.stderr)
# Fallback to tags in the result object if Hydrus/local lookup returned nothing
if not current:
# Check if result has 'tags' attribute (PipeObject)
if hasattr(result, 'tags') and getattr(result, 'tags', None):
current = getattr(result, 'tags')
source = "pipeline_result"
# Check if result is a dict with 'tags' key
elif isinstance(result, dict) and 'tags' in result:
tags_val = result['tags']
if isinstance(tags_val, list):
current = tags_val
source = "pipeline_result"
source = "pipeline_result"
# Error if no tags found
if not current:
log("No tags found", file=sys.stderr)
# Get tags using storage backend
try:
from helper.store import FileStorage
storage = FileStorage(config)
backend = storage[storage_source]
current, source = backend.get_tag(file_hash, config=config)
if not current:
log("No tags found", file=sys.stderr)
return 1
service_name = ""
except KeyError:
log(f"Storage backend '{storage_source}' not found", file=sys.stderr)
return 1
except Exception as exc:
log(f"Failed to get tags: {exc}", file=sys.stderr)
return 1
# Always output to ResultTable (pipeline mode only)
@@ -1383,33 +1288,106 @@ except Exception:
_SCRAPE_CHOICES = ["itunes", "openlibrary", "googlebooks", "google", "musicbrainz"]
CMDLET = Cmdlet(
name="get-tag",
summary="Get tags from Hydrus or local sidecar metadata",
usage="get-tag [-hash <sha256>] [--store <key>] [--emit] [-scrape <url|provider>]",
aliases=["tags"],
args=[
SharedArgs.HASH,
CmdletArg(
name="-store",
type="string",
description="Store result to this key for pipeline",
alias="store"
),
CmdletArg(
name="-emit",
type="flag",
description="Emit result without interactive prompt (quiet mode)",
alias="emit-only"
),
CmdletArg(
name="-scrape",
type="string",
description="Scrape metadata from URL or provider name (returns tags as JSON or table)",
required=False,
choices=_SCRAPE_CHOICES,
)
]
)
class Get_Tag(Cmdlet):
"""Class-based get-tag cmdlet with self-registration."""
def __init__(self) -> None:
"""Initialize get-tag cmdlet."""
super().__init__(
name="get-tag",
summary="Get tags from Hydrus or local sidecar metadata",
usage="get-tag [-hash <sha256>] [--store <key>] [--emit] [-scrape <url|provider>]",
alias=["tags"],
arg=[
SharedArgs.HASH,
CmdletArg(
name="-store",
type="string",
description="Store result to this key for pipeline",
alias="store"
),
CmdletArg(
name="-emit",
type="flag",
description="Emit result without interactive prompt (quiet mode)",
alias="emit-only"
),
CmdletArg(
name="-scrape",
type="string",
description="Scrape metadata from URL or provider name (returns tags as JSON or table)",
required=False,
choices=_SCRAPE_CHOICES,
)
],
detail=[
"- Retrieves tags for a file from:",
" Hydrus: Using file hash if available",
" Local: From sidecar files or local library database",
"- Options:",
" -hash: Override hash to look up in Hydrus",
" -store: Store result to key for downstream pipeline",
" -emit: Quiet mode (no interactive selection)",
" -scrape: Scrape metadata from URL or metadata provider",
],
exec=self.run,
)
self.register()
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Execute get-tag cmdlet."""
# Parse arguments
parsed = parse_cmdlet_args(args, self)
# Get hash and store from parsed args or result
hash_override = parsed.get("hash")
file_hash = hash_override or get_field(result, "hash") or get_field(result, "file_hash") or get_field(result, "hash_hex")
storage_source = parsed.get("store") or get_field(result, "store") or get_field(result, "storage") or get_field(result, "origin")
if not file_hash:
log("No hash available in result", file=sys.stderr)
return 1
if not storage_source:
log("No storage backend specified in result", file=sys.stderr)
return 1
# Get tags using storage backend
try:
from helper.store import FileStorage
storage_obj = FileStorage(config)
backend = storage_obj[storage_source]
current, source = backend.get_tag(file_hash, config=config)
if not current:
log("No tags found", file=sys.stderr)
return 1
# Build table and emit
item_title = get_field(result, "title") or file_hash[:16]
_emit_tags_as_table(
tags_list=current,
hash_hex=file_hash,
source=source,
service_name="",
config=config,
item_title=item_title,
file_path=None,
subject=result,
)
return 0
except KeyError:
log(f"Storage backend '{storage_source}' not found", file=sys.stderr)
return 1
except Exception as exc:
log(f"Failed to get tags: {exc}", file=sys.stderr)
import traceback
traceback.print_exc(file=sys.stderr)
return 1
# Create and register the cmdlet
CMDLET = Get_Tag()
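# Illustrative sketch of the hash+store lookup that Get_Tag.run() performs above.
# The "local" backend name and the hash value are placeholders, not part of this commit.
def _example_get_tags(config: dict, file_hash: str) -> list:
    from helper.store import FileStorage
    storage = FileStorage(config)
    backend = storage["local"]  # raises KeyError if the backend is not configured
    tags, _source = backend.get_tag(file_hash, config=config)  # same call run() uses
    return tags or []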

cmdlets/get_tag.py.orig (Normal file, 1415 lines): file diff suppressed because it is too large


@@ -1,139 +1,80 @@
from __future__ import annotations
from typing import Any, Dict, Sequence
import json
import sys
from pathlib import Path
from . import register
import models
import pipeline as ctx
from helper import hydrus as hydrus_wrapper
from ._shared import Cmdlet, CmdletArg, normalize_hash
from ._shared import Cmdlet, CmdletArg, SharedArgs, parse_cmdlet_args, get_field, normalize_hash
from helper.logger import log
from config import get_local_storage_path
from helper.local_library import LocalLibraryDB
CMDLET = Cmdlet(
name="get-url",
summary="List URLs associated with a file (Hydrus or Local).",
usage="get-url [-hash <sha256>]",
args=[
CmdletArg("-hash", description="Override the Hydrus file hash (SHA256) to target instead of the selected result."),
],
details=[
"- Prints the known URLs for the selected file.",
],
)
from helper.store import FileStorage
def _parse_hash_and_rest(args: Sequence[str]) -> tuple[str | None, list[str]]:
override_hash: str | None = None
rest: list[str] = []
i = 0
while i < len(args):
a = args[i]
low = str(a).lower()
if low in {"-hash", "--hash", "hash"} and i + 1 < len(args):
override_hash = str(args[i + 1]).strip()
i += 2
continue
rest.append(a)
i += 1
return override_hash, rest
@register(["get-url", "get-urls", "get_url"]) # aliases
def get_urls(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Helper to get field from both dict and object
def get_field(obj: Any, field: str, default: Any = None) -> Any:
if isinstance(obj, dict):
return obj.get(field, default)
else:
return getattr(obj, field, default)
class Get_Url(Cmdlet):
"""Get url associated with files via hash+store."""
# Help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
NAME = "get-url"
SUMMARY = "List url associated with a file"
USAGE = "@1 | get-url"
ARGS = [
SharedArgs.HASH,
SharedArgs.STORE,
]
DETAIL = [
"- Lists all url associated with file identified by hash+store",
]
override_hash, _ = _parse_hash_and_rest(args)
# Handle @N selection which creates a list - extract the first item
if isinstance(result, list) and len(result) > 0:
result = result[0]
found_urls = []
# 1. Try Local Library
file_path = get_field(result, "file_path") or get_field(result, "path")
if file_path and not override_hash:
try:
path_obj = Path(file_path)
if path_obj.exists():
storage_path = get_local_storage_path(config)
if storage_path:
with LocalLibraryDB(storage_path) as db:
metadata = db.get_metadata(path_obj)
if metadata and metadata.get("known_urls"):
found_urls.extend(metadata["known_urls"])
except Exception as e:
log(f"Error checking local library: {e}", file=sys.stderr)
# 2. Try Hydrus
hash_hex = normalize_hash(override_hash) if override_hash else normalize_hash(get_field(result, "hash_hex", None))
# If we haven't found URLs yet, or if we want to merge them (maybe?), let's check Hydrus if we have a hash
# But usually if it's local, we might not want to check Hydrus unless requested.
# However, the user said "they can just work together".
if hash_hex:
try:
client = hydrus_wrapper.get_client(config)
if client:
payload = client.fetch_file_metadata(hashes=[hash_hex], include_file_urls=True)
items = payload.get("metadata") if isinstance(payload, dict) else None
meta = items[0] if (isinstance(items, list) and items and isinstance(items[0], dict)) else None
hydrus_urls = (meta.get("known_urls") if isinstance(meta, dict) else None) or []
for u in hydrus_urls:
if u not in found_urls:
found_urls.append(u)
except Exception as exc:
# Only log error if we didn't find local URLs either, or if it's a specific error
if not found_urls:
log(f"Hydrus lookup failed: {exc}", file=sys.stderr)
if found_urls:
for u in found_urls:
text = str(u).strip()
if text:
# Emit a rich object that looks like a string but carries context
# We use a dict with 'title' which ResultTable uses for display
# and 'url' which is the actual data
# We also include the source file info so downstream cmdlets can use it
# Create a result object that mimics the structure expected by delete-url
# delete-url expects a file object usually, but here we are emitting URLs.
# If we emit a dict with 'url' and 'source_file', delete-url can use it.
rich_result = {
"title": text, # Display as just the URL
"url": text,
"source_file": result, # Pass the original file context
"file_path": get_field(result, "file_path") or get_field(result, "path"),
"hash_hex": hash_hex
}
ctx.emit(rich_result)
return 0
if not hash_hex and not file_path:
log("Selected result does not include a file path or Hydrus hash", file=sys.stderr)
return 1
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Get url for file via hash+store backend."""
parsed = parse_cmdlet_args(args, self)
ctx.emit("No URLs found.")
return 0
# Extract hash and store from result or args
file_hash = parsed.get("hash") or get_field(result, "hash")
store_name = parsed.get("store") or get_field(result, "store")
if not file_hash:
log("Error: No file hash provided")
return 1
if not store_name:
log("Error: No store name provided")
return 1
# Normalize hash
file_hash = normalize_hash(file_hash)
if not file_hash:
log("Error: Invalid hash format")
return 1
# Get backend and retrieve url
try:
storage = FileStorage(config)
backend = storage[store_name]
urls = backend.get_url(file_hash)
if urls:
for url in urls:
# Emit rich object for pipeline compatibility
ctx.emit({
"url": url,
"hash": file_hash,
"store": store_name,
})
return 0
else:
ctx.emit("No url found")
return 0
except KeyError:
log(f"Error: Storage backend '{store_name}' not configured")
return 1
except Exception as exc:
log(f"Error retrieving url: {exc}", file=sys.stderr)
return 1
# Register cmdlet
register(["get-url", "get_url"])(Get_Url)


@@ -6,7 +6,7 @@ CMDLET = Cmdlet(
name=".config",
summary="Manage configuration settings",
usage=".config [key] [value]",
args=[
arg=[
CmdletArg(
name="key",
description="Configuration key to update (dot-separated)",


@@ -42,16 +42,14 @@ from ._shared import (
normalize_result_input,
get_pipe_object_path,
get_pipe_object_hash,
should_show_help,
get_field,
)
import models
import pipeline as ctx
def _get_item_value(item: Any, key: str, default: Any = None) -> Any:
"""Helper to read either dict keys or attributes."""
if isinstance(item, dict):
return item.get(key, default)
return getattr(item, key, default)
@@ -60,12 +58,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Merge multiple files into one."""
# Parse help
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# Parse arguments
parsed = parse_cmdlet_args(args, CMDLET)
@@ -102,7 +97,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
source_files: List[Path] = []
source_tags_files: List[Path] = []
source_hashes: List[str] = []
source_urls: List[str] = []
source_url: List[str] = []
source_tags: List[str] = [] # NEW: collect tags from source files
source_relationships: List[str] = [] # NEW: collect relationships from source files
@@ -146,7 +141,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if tags_file.exists():
source_tags_files.append(tags_file)
# Try to read hash, tags, urls, and relationships from .tags sidecar file
# Try to read hash, tags, url, and relationships from .tags sidecar file
try:
tags_content = tags_file.read_text(encoding='utf-8')
for line in tags_content.split('\n'):
@@ -157,18 +152,18 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
hash_value = line[5:].strip()
if hash_value:
source_hashes.append(hash_value)
elif line.startswith('known_url:') or line.startswith('url:'):
# Extract URLs from tags file
elif line.startswith('url:'):
# Extract url from tags file
url_value = line.split(':', 1)[1].strip() if ':' in line else ''
if url_value and url_value not in source_urls:
source_urls.append(url_value)
if url_value and url_value not in source_url:
source_url.append(url_value)
elif line.startswith('relationship:'):
# Extract relationships from tags file
rel_value = line.split(':', 1)[1].strip() if ':' in line else ''
if rel_value and rel_value not in source_relationships:
source_relationships.append(rel_value)
else:
# Collect actual tags (not metadata like hash: or known_url:)
# Collect actual tags (not metadata like hash: or url:)
source_tags.append(line)
except Exception:
pass
@@ -178,14 +173,14 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if hash_value and hash_value not in source_hashes:
source_hashes.append(str(hash_value))
# Extract known URLs if available
known_urls = _get_item_value(item, 'known_urls', [])
if isinstance(known_urls, str):
source_urls.append(known_urls)
elif isinstance(known_urls, list):
source_urls.extend(known_urls)
# Extract known url if available
url = get_field(item, 'url', [])
if isinstance(url, str):
source_url.append(url)
elif isinstance(url, list):
source_url.extend(url)
else:
title = _get_item_value(item, 'title', 'unknown') or _get_item_value(item, 'id', 'unknown')
title = get_field(item, 'title', 'unknown') or get_field(item, 'id', 'unknown')
log(f"Warning: Could not locate file for item: {title}", file=sys.stderr)
if len(source_files) < 2:
@@ -279,8 +274,8 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
if HAS_METADATA_API and write_tags_to_file:
# Use unified API for file writing
source_hashes_list = source_hashes if source_hashes else None
source_urls_list = source_urls if source_urls else None
write_tags_to_file(tags_path, merged_tags, source_hashes_list, source_urls_list)
source_url_list = source_url if source_url else None
write_tags_to_file(tags_path, merged_tags, source_hashes_list, source_url_list)
else:
# Fallback: manual file writing
tags_lines = []
@@ -292,10 +287,10 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Add regular tags
tags_lines.extend(merged_tags)
# Add known URLs
if source_urls:
for url in source_urls:
tags_lines.append(f"known_url:{url}")
# Add known url
if source_url:
for url in source_url:
tags_lines.append(f"url:{url}")
# Add relationships (if available)
if source_relationships:
@@ -309,7 +304,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Also create .metadata file using centralized function
try:
write_metadata(output_path, source_hashes[0] if source_hashes else None, source_urls, source_relationships)
write_metadata(output_path, source_hashes[0] if source_hashes else None, source_url, source_relationships)
log(f"Created metadata: {output_path.name}.metadata", file=sys.stderr)
except Exception as e:
log(f"Warning: Could not create metadata file: {e}", file=sys.stderr)
@@ -325,12 +320,12 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
except ImportError:
# Fallback: create a simple object with the required attributes
class SimpleItem:
def __init__(self, target, title, media_kind, tags=None, known_urls=None):
def __init__(self, target, title, media_kind, tags=None, url=None):
self.target = target
self.title = title
self.media_kind = media_kind
self.tags = tags or []
self.known_urls = known_urls or []
self.url = url or []
self.origin = "local" # Ensure origin is set for add-file
PipelineItem = SimpleItem
@@ -339,7 +334,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
title=output_path.stem,
media_kind=file_kind,
tags=merged_tags, # Include merged tags
known_urls=source_urls # Include known URLs
url=source_url # Include known url
)
# Clear previous results to ensure only the merged file is passed down
ctx.clear_last_result()
@@ -904,12 +899,12 @@ CMDLET = Cmdlet(
name="merge-file",
summary="Merge multiple files into a single output file. Supports audio, video, PDF, and text merging with optional cleanup.",
usage="merge-file [-delete] [-output <path>] [-format <auto|mp3|aac|opus|mp4|mkv|pdf|txt>]",
args=[
arg=[
CmdletArg("-delete", type="flag", description="Delete source files after successful merge."),
CmdletArg("-output", description="Override output file path."),
CmdletArg("-format", description="Output format (auto/mp3/aac/opus/mp4/mkv/pdf/txt). Default: auto-detect from first file."),
],
details=[
detail=[
"- Pipe multiple files: search-file query | [1,2,3] | merge-file",
"- Audio files merge with minimal quality loss using specified codec.",
"- Video files merge into MP4 or MKV containers.",


@@ -1,4 +1,4 @@
"""Screen-shot cmdlet for capturing screenshots of URLs in a pipeline.
"""Screen-shot cmdlet for capturing screenshots of url in a pipeline.
This cmdlet processes files through the pipeline and creates screenshots using
Playwright, marking them as temporary artifacts for cleanup.
@@ -23,7 +23,7 @@ from helper.http_client import HTTPClient
from helper.utils import ensure_directory, unique_path, unique_preserve_order
from . import register
from ._shared import Cmdlet, CmdletArg, SharedArgs, create_pipe_object_result, normalize_result_input
from ._shared import Cmdlet, CmdletArg, SharedArgs, create_pipe_object_result, normalize_result_input, should_show_help, get_field
import models
import pipeline as pipeline_context
@@ -113,8 +113,8 @@ class ScreenshotError(RuntimeError):
class ScreenshotOptions:
"""Options controlling screenshot capture and post-processing."""
url: str
output_dir: Path
url: Sequence[str] = ()
output_path: Optional[Path] = None
full_page: bool = True
headless: bool = True
@@ -124,7 +124,7 @@ class ScreenshotOptions:
tags: Sequence[str] = ()
archive: bool = False
archive_timeout: float = ARCHIVE_TIMEOUT
known_urls: Sequence[str] = ()
url: Sequence[str] = ()
output_format: Optional[str] = None
prefer_platform_target: bool = False
target_selectors: Optional[Sequence[str]] = None
@@ -136,10 +136,9 @@ class ScreenshotResult:
"""Details about the captured screenshot."""
path: Path
url: str
tags_applied: List[str]
archive_urls: List[str]
known_urls: List[str]
archive_url: List[str]
url: List[str]
warnings: List[str] = field(default_factory=list)
@@ -471,24 +470,24 @@ def _capture_screenshot(options: ScreenshotOptions) -> ScreenshotResult:
warnings: List[str] = []
_capture(options, destination, warnings)
known_urls = unique_preserve_order([options.url, *options.known_urls])
archive_urls: List[str] = []
# Build URL list from provided options.url (sequence) and deduplicate
url = unique_preserve_order(list(options.url))
archive_url: List[str] = []
if options.archive:
debug(f"[_capture_screenshot] Archiving enabled for {options.url}")
archives, archive_warnings = _archive_url(options.url, options.archive_timeout)
archive_urls.extend(archives)
archive_url.extend(archives)
warnings.extend(archive_warnings)
if archives:
known_urls = unique_preserve_order([*known_urls, *archives])
url = unique_preserve_order([*url, *archives])
applied_tags = unique_preserve_order(list(tag for tag in options.tags if tag.strip()))
return ScreenshotResult(
path=destination,
url=options.url,
tags_applied=applied_tags,
archive_urls=archive_urls,
known_urls=known_urls,
archive_url=archive_url,
url=url,
warnings=warnings,
)
@@ -498,10 +497,10 @@ def _capture_screenshot(options: ScreenshotOptions) -> ScreenshotResult:
# ============================================================================
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Take screenshots of URLs in the pipeline.
"""Take screenshots of url in the pipeline.
Accepts:
- Single result object (dict or PipeObject) with 'file_path' field
- Single result object (dict or PipeObject) with 'path' field
- List of result objects to screenshot each
- Direct URL as string
@@ -518,12 +517,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
debug(f"[_run] screen-shot invoked with args: {args}")
# Help check
try:
if any(str(a).lower() in {"-?", "/?", "--help", "-h", "help", "--cmdlet"} for a in args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
except Exception:
pass
if should_show_help(args):
log(json.dumps(CMDLET, ensure_ascii=False, indent=2))
return 0
# ========================================================================
# ARGUMENT PARSING
@@ -539,36 +535,36 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Positional URL argument (if provided)
url_arg = parsed.get("url")
positional_urls = [str(url_arg)] if url_arg else []
positional_url = [str(url_arg)] if url_arg else []
# ========================================================================
# INPUT PROCESSING - Extract URLs from pipeline or command arguments
# INPUT PROCESSING - Extract url from pipeline or command arguments
# ========================================================================
piped_results = normalize_result_input(result)
urls_to_process = []
url_to_process = []
# Extract URLs from piped results
# Extract url from piped results
if piped_results:
for item in piped_results:
url = None
if isinstance(item, dict):
url = item.get('file_path') or item.get('path') or item.get('url') or item.get('target')
else:
url = getattr(item, 'file_path', None) or getattr(item, 'path', None) or getattr(item, 'url', None) or getattr(item, 'target', None)
url = (
get_field(item, 'path')
or get_field(item, 'url')
or get_field(item, 'target')
)
if url:
urls_to_process.append(str(url))
url_to_process.append(str(url))
# Use positional arguments if no pipeline input
if not urls_to_process and positional_urls:
urls_to_process = positional_urls
if not url_to_process and positional_url:
url_to_process = positional_url
if not urls_to_process:
log(f"No URLs to process for screen-shot cmdlet", file=sys.stderr)
if not url_to_process:
log(f"No url to process for screen-shot cmdlet", file=sys.stderr)
return 1
debug(f"[_run] URLs to process: {urls_to_process}")
debug(f"[_run] url to process: {url_to_process}")
# ========================================================================
# OUTPUT DIRECTORY RESOLUTION - Priority chain
@@ -619,10 +615,10 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
all_emitted = []
exit_code = 0
# ========================================================================
# PROCESS URLs AND CAPTURE SCREENSHOTS
# PROCESS url AND CAPTURE SCREENSHOTS
# ========================================================================
for url in urls_to_process:
for url in url_to_process:
# Validate URL format
if not url.lower().startswith(("http://", "https://", "file://")):
log(f"[screen_shot] Skipping non-URL input: {url}", file=sys.stderr)
@@ -631,7 +627,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
try:
# Create screenshot with provided options
options = ScreenshotOptions(
url=url,
url=[url],
output_dir=screenshot_dir,
output_format=format_name,
archive=archive_enabled,
@@ -645,8 +641,8 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Log results and warnings
log(f"Screenshot captured to {screenshot_result.path}", flush=True)
if screenshot_result.archive_urls:
log(f"Archives: {', '.join(screenshot_result.archive_urls)}", flush=True)
if screenshot_result.archive_url:
log(f"Archives: {', '.join(screenshot_result.archive_url)}", flush=True)
for warning in screenshot_result.warnings:
log(f"Warning: {warning}", flush=True)
@@ -670,8 +666,8 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
parent_hash=hashlib.sha256(url.encode()).hexdigest(),
extra={
'source_url': url,
'archive_urls': screenshot_result.archive_urls,
'known_urls': screenshot_result.known_urls,
'archive_url': screenshot_result.archive_url,
'url': screenshot_result.url,
'target': str(screenshot_result.path), # Explicit target for add-file
}
)
@@ -701,16 +697,16 @@ CMDLET = Cmdlet(
name="screen-shot",
summary="Capture a screenshot of a URL or file and mark as temporary artifact",
usage="screen-shot <url> [options] or download-data <url> | screen-shot [options]",
aliases=["screenshot", "ss"],
args=[
alias=["screenshot", "ss"],
arg=[
CmdletArg(name="url", type="string", required=False, description="URL to screenshot (or from pipeline)"),
CmdletArg(name="format", type="string", description="Output format: png, jpeg, or pdf"),
CmdletArg(name="selector", type="string", description="CSS selector for element capture"),
SharedArgs.ARCHIVE, # Use shared archive argument
SharedArgs.STORAGE, # Use shared storage argument
SharedArgs.STORE, # Use shared storage argument
],
details=[
"Take screenshots of URLs with optional archiving and element targeting.",
detail=[
"Take screenshots of url with optional archiving and element targeting.",
"Screenshots are marked as temporary artifacts for cleanup by the cleanup cmdlet.",
"",
"Arguments:",


@@ -1,531 +0,0 @@
"""Search-file cmdlet: Search for files by query, tags, size, type, duration, etc."""
from __future__ import annotations
from typing import Any, Dict, Sequence, List, Optional, Tuple, Callable
from fnmatch import fnmatchcase
from pathlib import Path
from dataclasses import dataclass, field
from collections import OrderedDict
import re
import json
import os
import sys
from helper.logger import log, debug
import shutil
import subprocess
from helper.file_storage import FileStorage
from helper.search_provider import get_provider, list_providers, SearchResult
from metadata import import_pending_sidecars
from . import register
from ._shared import Cmdlet, CmdletArg
import models
import pipeline as ctx
# Optional dependencies
try:
import mutagen # type: ignore
except ImportError: # pragma: no cover
mutagen = None # type: ignore
try:
from config import get_hydrus_url, resolve_output_dir
except Exception: # pragma: no cover
get_hydrus_url = None # type: ignore
resolve_output_dir = None # type: ignore
try:
from helper.hydrus import HydrusClient, HydrusRequestError
except ImportError: # pragma: no cover
HydrusClient = None # type: ignore
HydrusRequestError = RuntimeError # type: ignore
try:
from helper.utils import sha256_file
except ImportError: # pragma: no cover
sha256_file = None # type: ignore
try:
from helper.utils_constant import mime_maps
except ImportError: # pragma: no cover
mime_maps = {} # type: ignore
# ============================================================================
# Data Classes (from helper/search.py)
# ============================================================================
@dataclass(slots=True)
class SearchRecord:
path: str
size_bytes: int | None = None
duration_seconds: str | None = None
tags: str | None = None
hash_hex: str | None = None
def as_dict(self) -> dict[str, str]:
payload: dict[str, str] = {"path": self.path}
if self.size_bytes is not None:
payload["size"] = str(self.size_bytes)
if self.duration_seconds:
payload["duration"] = self.duration_seconds
if self.tags:
payload["tags"] = self.tags
if self.hash_hex:
payload["hash"] = self.hash_hex
return payload
@dataclass
class ResultItem:
origin: str
title: str
detail: str
annotations: List[str]
target: str
media_kind: str = "other"
hash_hex: Optional[str] = None
columns: List[tuple[str, str]] = field(default_factory=list)
tag_summary: Optional[str] = None
duration_seconds: Optional[float] = None
size_bytes: Optional[int] = None
full_metadata: Optional[Dict[str, Any]] = None
tags: Optional[set[str]] = field(default_factory=set)
relationships: Optional[List[str]] = field(default_factory=list)
known_urls: Optional[List[str]] = field(default_factory=list)
def to_dict(self) -> Dict[str, Any]:
payload: Dict[str, Any] = {
"title": self.title,
}
# Always include these core fields for downstream cmdlets (get-file, download-data, etc)
payload["origin"] = self.origin
payload["target"] = self.target
payload["media_kind"] = self.media_kind
# Always include full_metadata if present (needed by download-data, etc)
# This is NOT for display, but for downstream processing
if self.full_metadata:
payload["full_metadata"] = self.full_metadata
# Include columns if defined (result renderer will use these for display)
if self.columns:
payload["columns"] = list(self.columns)
else:
# If no columns, include the detail for backwards compatibility
payload["detail"] = self.detail
payload["annotations"] = list(self.annotations)
# Include optional fields
if self.hash_hex:
payload["hash"] = self.hash_hex
if self.tag_summary:
payload["tags"] = self.tag_summary
if self.tags:
payload["tags_set"] = list(self.tags)
if self.relationships:
payload["relationships"] = self.relationships
if self.known_urls:
payload["known_urls"] = self.known_urls
return payload
STORAGE_ORIGINS = {"local", "hydrus", "debrid"}
def _normalize_extension(ext_value: Any) -> str:
"""Sanitize extension strings to alphanumerics and cap at 5 chars."""
ext = str(ext_value or "").strip().lstrip(".")
# Stop at common separators to avoid dragging status text into the extension
for sep in (" ", "|", "(", "[", "{", ",", ";"):
if sep in ext:
ext = ext.split(sep, 1)[0]
break
# If there are multiple dots, take the last token as the extension
if "." in ext:
ext = ext.split(".")[-1]
# Keep only alphanumeric characters and enforce max length
ext = "".join(ch for ch in ext if ch.isalnum())
return ext[:5]
def _ensure_storage_columns(payload: Dict[str, Any]) -> Dict[str, Any]:
"""Attach Title/Store columns for storage-origin results to keep CLI display compact."""
origin_value = str(payload.get("origin") or payload.get("source") or "").lower()
if origin_value not in STORAGE_ORIGINS:
return payload
title = payload.get("title") or payload.get("name") or payload.get("target") or payload.get("path") or "Result"
store_label = payload.get("origin") or payload.get("source") or origin_value
# Handle extension
extension = _normalize_extension(payload.get("ext", ""))
if not extension and title:
path_obj = Path(str(title))
if path_obj.suffix:
extension = _normalize_extension(path_obj.suffix.lstrip('.'))
title = path_obj.stem
# Handle size as integer MB (header will include units)
size_val = payload.get("size") or payload.get("size_bytes")
size_str = ""
if size_val is not None:
try:
size_bytes = int(size_val)
size_mb = int(size_bytes / (1024 * 1024))
size_str = str(size_mb)
except (ValueError, TypeError):
size_str = str(size_val)
normalized = dict(payload)
normalized["columns"] = [
("Title", str(title)),
("Ext", str(extension)),
("Store", str(store_label)),
("Size(Mb)", str(size_str)),
]
return normalized
CMDLET = Cmdlet(
name="search-file",
summary="Unified search cmdlet for storage (Hydrus, Local) and providers (Debrid, LibGen, OpenLibrary, Soulseek).",
usage="search-file [query] [-tag TAG] [-size >100MB|<50MB] [-type audio|video|image] [-duration >10:00] [-storage BACKEND] [-provider PROVIDER]",
args=[
CmdletArg("query", description="Search query string"),
CmdletArg("tag", description="Filter by tag (can be used multiple times)"),
CmdletArg("size", description="Filter by size: >100MB, <50MB, =10MB"),
CmdletArg("type", description="Filter by type: audio, video, image, document"),
CmdletArg("duration", description="Filter by duration: >10:00, <1:30:00"),
CmdletArg("limit", type="integer", description="Limit results (default: 45)"),
CmdletArg("storage", description="Search storage backend: hydrus, local (default: all searchable storages)"),
CmdletArg("provider", description="Search provider: libgen, openlibrary, soulseek, debrid, local (overrides -storage)"),
],
details=[
"Search across storage (Hydrus, Local) and providers (Debrid, LibGen, OpenLibrary, Soulseek)",
"Use -provider to search a specific source, or -storage to search file backends",
"Filter results by: tag, size, type, duration",
"Results can be piped to other commands",
"Examples:",
"search-file foo # Search all file backends",
"search-file -provider libgen 'python programming' # Search LibGen books",
"search-file -provider debrid 'movie' # Search AllDebrid magnets",
"search-file 'music' -provider soulseek # Search Soulseek P2P",
"search-file -provider openlibrary 'tolkien' # Search OpenLibrary",
"search-file song -storage hydrus -type audio # Search only Hydrus audio",
"search-file movie -tag action -provider debrid # Debrid with filters",
],
)
@register(["search-file", "search"])
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Search across multiple providers: Hydrus, Local, Debrid, LibGen, etc."""
args_list = [str(arg) for arg in (args or [])]
# Parse arguments
query = ""
tag_filters: List[str] = []
size_filter: Optional[Tuple[str, int]] = None
duration_filter: Optional[Tuple[str, float]] = None
type_filter: Optional[str] = None
storage_backend: Optional[str] = None
provider_name: Optional[str] = None
limit = 45
searched_backends: List[str] = []
# Simple argument parsing
i = 0
while i < len(args_list):
arg = args_list[i]
low = arg.lower()
if low in {"-provider", "--provider"} and i + 1 < len(args_list):
provider_name = args_list[i + 1].lower()
i += 2
elif low in {"-storage", "--storage"} and i + 1 < len(args_list):
storage_backend = args_list[i + 1].lower()
i += 2
elif low in {"-tag", "--tag"} and i + 1 < len(args_list):
tag_filters.append(args_list[i + 1])
i += 2
elif low in {"-limit", "--limit"} and i + 1 < len(args_list):
try:
limit = int(args_list[i + 1])
except ValueError:
limit = 100
i += 2
elif low in {"-type", "--type"} and i + 1 < len(args_list):
type_filter = args_list[i + 1].lower()
i += 2
elif not arg.startswith("-"):
if query:
query += " " + arg
else:
query = arg
i += 1
else:
i += 1
# Extract store: filter tokens (works with commas or whitespace) and clean query for backends
store_filter: Optional[str] = None
if query:
match = re.search(r"\bstore:([^\s,]+)", query, flags=re.IGNORECASE)
if match:
store_filter = match.group(1).strip().lower() or None
# Remove any store: tokens so downstream backends see only the actual query
query = re.sub(r"\s*[,]?\s*store:[^\s,]+", " ", query, flags=re.IGNORECASE)
query = re.sub(r"\s{2,}", " ", query)
query = query.strip().strip(',')
# Debrid is provider-only now
if storage_backend and storage_backend.lower() == "debrid":
log("Use -provider debrid instead of -storage debrid (debrid is provider-only)", file=sys.stderr)
return 1
# If store: was provided without explicit -storage/-provider, prefer that backend
if store_filter and not provider_name and not storage_backend:
if store_filter in {"hydrus", "local", "debrid"}:
storage_backend = store_filter
# Handle piped input (e.g. from @N selection) if query is empty
if not query and result:
# If result is a list, take the first item
actual_result = result[0] if isinstance(result, list) and result else result
# Helper to get field
def get_field(obj: Any, field: str) -> Any:
return getattr(obj, field, None) or (obj.get(field) if isinstance(obj, dict) else None)
origin = get_field(actual_result, 'origin')
target = get_field(actual_result, 'target')
# Special handling for Bandcamp artist/album drill-down
if origin == 'bandcamp' and target:
query = target
if not provider_name:
provider_name = 'bandcamp'
# Generic URL handling
elif target and str(target).startswith(('http://', 'https://')):
query = target
# Try to infer provider from URL if not set
if not provider_name:
if 'bandcamp.com' in target:
provider_name = 'bandcamp'
elif 'youtube.com' in target or 'youtu.be' in target:
provider_name = 'youtube'
if not query:
log("Provide a search query", file=sys.stderr)
return 1
# Initialize worker for this search command
from helper.local_library import LocalLibraryDB
from config import get_local_storage_path
import uuid
worker_id = str(uuid.uuid4())
library_root = get_local_storage_path(config or {})
if not library_root:
log("No library root configured", file=sys.stderr)
return 1
db = None
try:
db = LocalLibraryDB(library_root)
db.insert_worker(
worker_id,
"search",
title=f"Search: {query}",
description=f"Query: {query}",
pipe=ctx.get_current_command_text()
)
results_list = []
import result_table
import importlib
importlib.reload(result_table)
from result_table import ResultTable
# Create ResultTable for display
table_title = f"Search: {query}"
if provider_name:
table_title += f" [{provider_name}]"
elif storage_backend:
table_title += f" [{storage_backend}]"
table = ResultTable(table_title)
table.set_source_command("search-file", args_list)
# Try to search using provider (libgen, soulseek, debrid, openlibrary)
if provider_name:
debug(f"[search_file] Attempting provider search with: {provider_name}")
provider = get_provider(provider_name, config)
if not provider:
log(f"Provider '{provider_name}' not available", file=sys.stderr)
db.update_worker_status(worker_id, 'error')
return 1
debug(f"[search_file] Provider loaded, calling search with query: {query}")
search_result = provider.search(query, limit=limit)
debug(f"[search_file] Provider search returned {len(search_result)} results")
for item in search_result:
# Add to table
table.add_result(item)
# Emit to pipeline
item_dict = item.to_dict()
results_list.append(item_dict)
ctx.emit(item_dict)
# Set the result table in context for TUI/CLI display
ctx.set_last_result_table(table, results_list)
debug(f"[search_file] Emitted {len(results_list)} results")
# Write results to worker stdout
db.append_worker_stdout(worker_id, json.dumps(results_list, indent=2))
db.update_worker_status(worker_id, 'completed')
return 0
# Otherwise search using storage backends (Hydrus, Local)
from helper.file_storage import FileStorage
storage = FileStorage(config=config or {})
backend_to_search = storage_backend or None
if backend_to_search:
# Check if requested backend is available
if backend_to_search == "hydrus":
from helper.hydrus import is_hydrus_available
if not is_hydrus_available(config or {}):
log(f"Backend 'hydrus' is not available (Hydrus service not running)", file=sys.stderr)
db.update_worker_status(worker_id, 'error')
return 1
searched_backends.append(backend_to_search)
if not storage.supports_search(backend_to_search):
log(f"Backend '{backend_to_search}' does not support searching", file=sys.stderr)
db.update_worker_status(worker_id, 'error')
return 1
results = storage[backend_to_search].search(query, limit=limit)
else:
# Search all searchable backends, but skip hydrus if unavailable
from helper.hydrus import is_hydrus_available
hydrus_available = is_hydrus_available(config or {})
all_results = []
for backend_name in storage.list_searchable_backends():
# Skip hydrus if not available
if backend_name == "hydrus" and not hydrus_available:
continue
searched_backends.append(backend_name)
try:
backend_results = storage[backend_name].search(query, limit=limit - len(all_results))
if backend_results:
all_results.extend(backend_results)
if len(all_results) >= limit:
break
except Exception as exc:
log(f"Backend {backend_name} search failed: {exc}", file=sys.stderr)
results = all_results[:limit]
# Also query Debrid provider by default (provider-only, but keep legacy coverage when no explicit provider given)
if not provider_name and not storage_backend:
try:
debrid_provider = get_provider("debrid", config)
if debrid_provider and debrid_provider.validate():
remaining = max(0, limit - len(results)) if isinstance(results, list) else limit
if remaining > 0:
debrid_results = debrid_provider.search(query, limit=remaining)
if debrid_results:
if "debrid" not in searched_backends:
searched_backends.append("debrid")
if results is None:
results = []
results.extend(debrid_results)
except Exception as exc:
log(f"Debrid provider search failed: {exc}", file=sys.stderr)
def _format_storage_label(name: str) -> str:
clean = str(name or "").strip()
if not clean:
return "Unknown"
return clean.replace("_", " ").title()
storage_counts: OrderedDict[str, int] = OrderedDict((name, 0) for name in searched_backends)
for item in results or []:
origin = getattr(item, 'origin', None)
if origin is None and isinstance(item, dict):
origin = item.get('origin') or item.get('source')
if not origin:
continue
key = str(origin).lower()
if key not in storage_counts:
storage_counts[key] = 0
storage_counts[key] += 1
if storage_counts or query:
display_counts = OrderedDict((_format_storage_label(name), count) for name, count in storage_counts.items())
summary_line = table.set_storage_summary(display_counts, query, inline=True)
if summary_line:
table.title = summary_line
# Emit results and collect for workers table
if results:
for item in results:
def _as_dict(obj: Any) -> Dict[str, Any]:
if isinstance(obj, dict):
return dict(obj)
if hasattr(obj, "to_dict") and callable(getattr(obj, "to_dict")):
return obj.to_dict() # type: ignore[arg-type]
return {"title": str(obj)}
item_dict = _as_dict(item)
if store_filter:
origin_val = str(item_dict.get("origin") or item_dict.get("source") or "").lower()
if store_filter != origin_val:
continue
normalized = _ensure_storage_columns(item_dict)
# Add to table using normalized columns to avoid extra fields (e.g., Tags/Name)
table.add_result(normalized)
results_list.append(normalized)
ctx.emit(normalized)
# Set the result table in context for TUI/CLI display
ctx.set_last_result_table(table, results_list)
# Write results to worker stdout
db.append_worker_stdout(worker_id, json.dumps(results_list, indent=2))
else:
log("No results found", file=sys.stderr)
db.append_worker_stdout(worker_id, json.dumps([], indent=2))
db.update_worker_status(worker_id, 'completed')
return 0
except Exception as exc:
log(f"Search failed: {exc}", file=sys.stderr)
import traceback
traceback.print_exc(file=sys.stderr)
if db:
try:
db.update_worker_status(worker_id, 'error')
except Exception:
pass
return 1
finally:
# Always close the database connection
if db:
try:
db.close()
except Exception:
pass

cmdlets/search_provider.py (Normal file, 117 lines)

@@ -0,0 +1,117 @@
"""search-provider cmdlet: Search external providers (bandcamp, libgen, soulseek, youtube)."""
from __future__ import annotations
from typing import Any, Dict, List, Sequence
import sys
from helper.logger import log, debug
from helper.provider import get_search_provider, list_search_providers
from ._shared import Cmdlet, CmdletArg, should_show_help
import pipeline as ctx
class Search_Provider(Cmdlet):
"""Search external content providers."""
def __init__(self):
super().__init__(
name="search-provider",
summary="Search external providers (bandcamp, libgen, soulseek, youtube)",
usage="search-provider <provider> <query> [-limit N]",
arg=[
CmdletArg("provider", type="string", required=True, description="Provider name: bandcamp, libgen, soulseek, youtube"),
CmdletArg("query", type="string", required=True, description="Search query (supports provider-specific syntax)"),
CmdletArg("limit", type="int", description="Maximum results to return (default: 50)"),
],
detail=[
"Search external content providers:",
"- bandcamp: Search for music albums/tracks",
" Example: search-provider bandcamp \"artist:altrusian grace\"",
"- libgen: Search Library Genesis for books",
" Example: search-provider libgen \"python programming\"",
"- soulseek: Search P2P network for music",
" Example: search-provider soulseek \"pink floyd\"",
"- youtube: Search YouTube for videos",
" Example: search-provider youtube \"tutorial\"",
"",
"Query syntax:",
"- bandcamp: Use 'artist:Name' to search by artist",
"- libgen: Supports isbn:, author:, title: prefixes",
"- soulseek: Plain text search",
"- youtube: Plain text search",
"",
"Results can be piped to other cmdlets:",
" search-provider bandcamp \"artist:grace\" | @1 | download-data",
],
exec=self.run
)
self.register()
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Execute search-provider cmdlet."""
if should_show_help(args):
ctx.emit(self.__dict__)
return 0
# Parse arguments
if len(args) < 2:
log("Error: search-provider requires <provider> and <query> arguments", file=sys.stderr)
log(f"Usage: {self.usage}", file=sys.stderr)
log("Available providers:", file=sys.stderr)
providers = list_search_providers(config)
for name, available in sorted(providers.items()):
status = "" if available else ""
log(f" {status} {name}", file=sys.stderr)
return 1
provider_name = args[0]
query = args[1]
# Parse optional limit
limit = 50
if len(args) >= 4 and args[2] in ("-limit", "--limit"):
try:
limit = int(args[3])
except ValueError:
log(f"Warning: Invalid limit value '{args[3]}', using default 50", file=sys.stderr)
debug(f"[search-provider] provider={provider_name}, query={query}, limit={limit}")
# Get provider
provider = get_search_provider(provider_name, config)
if not provider:
log(f"Error: Provider '{provider_name}' is not available", file=sys.stderr)
log("Available providers:", file=sys.stderr)
providers = list_search_providers(config)
for name, available in sorted(providers.items()):
if available:
log(f" - {name}", file=sys.stderr)
return 1
# Execute search
try:
debug(f"[search-provider] Calling {provider_name}.search()")
results = provider.search(query, limit=limit)
debug(f"[search-provider] Got {len(results)} results")
if not results:
log(f"No results found for query: {query}", file=sys.stderr)
return 0
# Emit results for pipeline
for search_result in results:
ctx.emit(search_result.to_dict())
log(f"Found {len(results)} result(s) from {provider_name}", file=sys.stderr)
return 0
except Exception as e:
log(f"Error searching {provider_name}: {e}", file=sys.stderr)
import traceback
debug(traceback.format_exc())
return 1
# Register cmdlet instance
Search_Provider_Instance = Search_Provider()
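# Illustrative sketch of driving the provider layer that Search_Provider wraps.
# "bandcamp" and the query string are examples only; availability depends on local config.
def _example_provider_search(config: dict) -> list:
    from helper.provider import get_search_provider
    provider = get_search_provider("bandcamp", config)
    if provider is None:
        return []
    return [r.to_dict() for r in provider.search("artist:altrusian grace", limit=5)]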

cmdlets/search_store.py (Normal file, 341 lines)

@@ -0,0 +1,341 @@
"""Search-store cmdlet: Search for files in storage backends (Folder, Hydrus)."""
from __future__ import annotations
from typing import Any, Dict, Sequence, List, Optional, Tuple
from pathlib import Path
from dataclasses import dataclass, field
from collections import OrderedDict
import re
import json
import sys
from helper.logger import log, debug
from ._shared import Cmdlet, CmdletArg, get_origin, get_field, should_show_help
import pipeline as ctx
# Optional dependencies
try:
import mutagen # type: ignore
except ImportError: # pragma: no cover
mutagen = None # type: ignore
try:
from config import get_hydrus_url, resolve_output_dir
except Exception: # pragma: no cover
get_hydrus_url = None # type: ignore
resolve_output_dir = None # type: ignore
try:
from helper.hydrus import HydrusClient, HydrusRequestError
except ImportError: # pragma: no cover
HydrusClient = None # type: ignore
HydrusRequestError = RuntimeError # type: ignore
try:
from helper.utils import sha256_file
except ImportError: # pragma: no cover
sha256_file = None # type: ignore
try:
from helper.utils_constant import mime_maps
except ImportError: # pragma: no cover
mime_maps = {} # type: ignore
@dataclass(slots=True)
class SearchRecord:
path: str
size_bytes: int | None = None
duration_seconds: str | None = None
tags: str | None = None
hash_hex: str | None = None
def as_dict(self) -> dict[str, str]:
payload: dict[str, str] = {"path": self.path}
if self.size_bytes is not None:
payload["size"] = str(self.size_bytes)
if self.duration_seconds:
payload["duration"] = self.duration_seconds
if self.tags:
payload["tags"] = self.tags
if self.hash_hex:
payload["hash"] = self.hash_hex
return payload
STORAGE_ORIGINS = {"local", "hydrus", "folder"}
class Search_Store(Cmdlet):
"""Class-based search-store cmdlet for searching storage backends."""
def __init__(self) -> None:
super().__init__(
name="search-store",
summary="Search storage backends (Folder, Hydrus) for files.",
usage="search-store [query] [-tag TAG] [-size >100MB|<50MB] [-type audio|video|image] [-duration >10:00] [-store BACKEND]",
arg=[
CmdletArg("query", description="Search query string"),
CmdletArg("tag", description="Filter by tag (can be used multiple times)"),
CmdletArg("size", description="Filter by size: >100MB, <50MB, =10MB"),
CmdletArg("type", description="Filter by type: audio, video, image, document"),
CmdletArg("duration", description="Filter by duration: >10:00, <1:30:00"),
CmdletArg("limit", type="integer", description="Limit results (default: 100)"),
CmdletArg("store", description="Search specific storage backend (e.g., 'home', 'test', or 'default')"),
],
detail=[
"Search across storage backends: Folder stores and Hydrus instances",
"Use -store to search a specific backend by name",
"Filter results by: tag, size, type, duration",
"Results include hash for downstream commands (get-file, add-tag, etc.)",
"Examples:",
"search-store foo # Search all storage backends",
"search-store -store home '*' # Search 'home' Hydrus instance",
"search-store -store test 'video' # Search 'test' folder store",
"search-store song -type audio # Search for audio files",
"search-store movie -tag action # Search with tag filter",
],
exec=self.run,
)
self.register()
# --- Helper methods -------------------------------------------------
@staticmethod
def _normalize_extension(ext_value: Any) -> str:
"""Sanitize extension strings to alphanumerics and cap at 5 chars."""
ext = str(ext_value or "").strip().lstrip(".")
for sep in (" ", "|", "(", "[", "{", ",", ";"):
if sep in ext:
ext = ext.split(sep, 1)[0]
break
if "." in ext:
ext = ext.split(".")[-1]
ext = "".join(ch for ch in ext if ch.isalnum())
return ext[:5]
def _ensure_storage_columns(self, payload: Dict[str, Any]) -> Dict[str, Any]:
"""Ensure storage results have the necessary fields for result_table display."""
store_value = str(get_origin(payload, "") or "").lower()
if store_value not in STORAGE_ORIGINS:
return payload
# Ensure we have title field
if "title" not in payload:
payload["title"] = payload.get("name") or payload.get("target") or payload.get("path") or "Result"
# Ensure we have ext field
if "ext" not in payload:
title = str(payload.get("title", ""))
path_obj = Path(title)
if path_obj.suffix:
payload["ext"] = self._normalize_extension(path_obj.suffix.lstrip('.'))
else:
payload["ext"] = payload.get("ext", "")
# Ensure size_bytes is present for display (already set by search_file())
# result_table will handle formatting it
# Don't create manual columns - let result_table handle display
# This allows the table to respect max_columns and apply consistent formatting
return payload
# --- Execution ------------------------------------------------------
def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Search storage backends for files."""
if should_show_help(args):
log(f"Cmdlet: {self.name}\nSummary: {self.summary}\nUsage: {self.usage}")
return 0
args_list = [str(arg) for arg in (args or [])]
# Parse arguments
query = ""
tag_filters: List[str] = []
size_filter: Optional[Tuple[str, int]] = None
duration_filter: Optional[Tuple[str, float]] = None
type_filter: Optional[str] = None
storage_backend: Optional[str] = None
limit = 100
searched_backends: List[str] = []
i = 0
while i < len(args_list):
arg = args_list[i]
low = arg.lower()
if low in {"-store", "--store", "-storage", "--storage"} and i + 1 < len(args_list):
storage_backend = args_list[i + 1]
i += 2
elif low in {"-tag", "--tag"} and i + 1 < len(args_list):
tag_filters.append(args_list[i + 1])
i += 2
elif low in {"-limit", "--limit"} and i + 1 < len(args_list):
try:
limit = int(args_list[i + 1])
except ValueError:
limit = 100
i += 2
elif low in {"-type", "--type"} and i + 1 < len(args_list):
type_filter = args_list[i + 1].lower()
i += 2
elif not arg.startswith("-"):
query = f"{query} {arg}".strip() if query else arg
i += 1
else:
i += 1
store_filter: Optional[str] = None
if query:
match = re.search(r"\bstore:([^\s,]+)", query, flags=re.IGNORECASE)
if match:
store_filter = match.group(1).strip() or None
query = re.sub(r"\s*[,]?\s*store:[^\s,]+", " ", query, flags=re.IGNORECASE)
query = re.sub(r"\s{2,}", " ", query)
query = query.strip().strip(',')
if store_filter and not storage_backend:
storage_backend = store_filter
if not query:
log("Provide a search query", file=sys.stderr)
return 1
from helper.folder_store import FolderDB
from config import get_local_storage_path
import uuid
worker_id = str(uuid.uuid4())
library_root = get_local_storage_path(config or {})
if not library_root:
log("No library root configured", file=sys.stderr)
return 1
# Use context manager to ensure database is always closed
with FolderDB(library_root) as db:
try:
db.insert_worker(
worker_id,
"search-store",
title=f"Search: {query}",
description=f"Query: {query}",
pipe=ctx.get_current_command_text()
)
results_list = []
import result_table
import importlib
importlib.reload(result_table)
from result_table import ResultTable
table_title = f"Search: {query}"
if storage_backend:
table_title += f" [{storage_backend}]"
table = ResultTable(table_title)
from helper.store import FileStorage
storage = FileStorage(config=config or {})
backend_to_search = storage_backend or None
if backend_to_search:
searched_backends.append(backend_to_search)
target_backend = storage[backend_to_search]
if not callable(getattr(target_backend, 'search_file', None)):
log(f"Backend '{backend_to_search}' does not support searching", file=sys.stderr)
db.update_worker_status(worker_id, 'error')
return 1
results = target_backend.search_file(query, limit=limit)
else:
from helper.hydrus import is_hydrus_available
hydrus_available = is_hydrus_available(config or {})
all_results = []
for backend_name in storage.list_searchable_backends():
if backend_name.startswith("hydrus") and not hydrus_available:
continue
searched_backends.append(backend_name)
try:
backend_results = storage[backend_name].search_file(query, limit=limit - len(all_results))
if backend_results:
all_results.extend(backend_results)
if len(all_results) >= limit:
break
except Exception as exc:
log(f"Backend {backend_name} search failed: {exc}", file=sys.stderr)
results = all_results[:limit]
def _format_storage_label(name: str) -> str:
clean = str(name or "").strip()
if not clean:
return "Unknown"
return clean.replace("_", " ").title()
storage_counts: OrderedDict[str, int] = OrderedDict((name, 0) for name in searched_backends)
for item in results or []:
origin = get_origin(item)
if not origin:
continue
key = str(origin).lower()
if key not in storage_counts:
storage_counts[key] = 0
storage_counts[key] += 1
if storage_counts or query:
display_counts = OrderedDict((_format_storage_label(name), count) for name, count in storage_counts.items())
summary_line = table.set_storage_summary(display_counts, query, inline=True)
if summary_line:
table.title = summary_line
if results:
for item in results:
def _as_dict(obj: Any) -> Dict[str, Any]:
if isinstance(obj, dict):
return dict(obj)
if hasattr(obj, "to_dict") and callable(getattr(obj, "to_dict")):
return obj.to_dict() # type: ignore[arg-type]
return {"title": str(obj)}
item_dict = _as_dict(item)
if store_filter:
origin_val = str(get_origin(item_dict) or "").lower()
if store_filter != origin_val:
continue
normalized = self._ensure_storage_columns(item_dict)
# Make hash/store available for downstream cmdlets without rerunning search
hash_val = normalized.get("hash")
store_val = normalized.get("store") or get_origin(item_dict)
if hash_val and not normalized.get("hash"):
normalized["hash"] = hash_val
if store_val and not normalized.get("store"):
normalized["store"] = store_val
table.add_result(normalized)
results_list.append(normalized)
ctx.emit(normalized)
# Debug: Verify table rows match items list
debug(f"[search-store] Added {len(table.rows)} rows to table, {len(results_list)} items to results_list")
if len(table.rows) != len(results_list):
debug(f"[search-store] WARNING: Table/items mismatch! rows={len(table.rows)} items={len(results_list)}", file=sys.stderr)
ctx.set_last_result_table(table, results_list)
db.append_worker_stdout(worker_id, json.dumps(results_list, indent=2))
else:
log("No results found", file=sys.stderr)
db.append_worker_stdout(worker_id, json.dumps([], indent=2))
db.update_worker_status(worker_id, 'completed')
return 0
except Exception as exc:
log(f"Search failed: {exc}", file=sys.stderr)
import traceback
traceback.print_exc(file=sys.stderr)
try:
db.update_worker_status(worker_id, 'error')
except Exception:
pass
return 1
CMDLET = Search_Store()
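# Illustrative worked example of the store: token handling in Search_Store.run() above:
# a query such as "video store:home" is split into the bare query and a backend name
# before any backend is searched. The query string and "home" backend are examples only.
import re
from typing import Optional, Tuple

def _split_store_token(query: str) -> Tuple[str, Optional[str]]:
    match = re.search(r"\bstore:([^\s,]+)", query, flags=re.IGNORECASE)
    backend = (match.group(1).strip() or None) if match else None
    query = re.sub(r"\s*[,]?\s*store:[^\s,]+", " ", query, flags=re.IGNORECASE)
    query = re.sub(r"\s{2,}", " ", query).strip().strip(',')
    return query, backend  # -> ("video", "home") for "video store:home"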


@@ -26,12 +26,12 @@ CMDLET = Cmdlet(
name="trim-file",
summary="Trim a media file using ffmpeg.",
usage="trim-file [-path <path>] -range <start-end> [-delete]",
args=[
arg=[
CmdletArg("-path", description="Path to the file (optional if piped)."),
CmdletArg("-range", required=True, description="Time range to trim (e.g. '3:45-3:55' or '00:03:45-00:03:55')."),
CmdletArg("-delete", type="flag", description="Delete the original file after trimming."),
],
details=[
detail=[
"Creates a new file with 'clip_' prefix in the filename/title.",
"Inherits tags from the source file.",
"Adds a relationship to the source file (if hash is available).",
@@ -133,7 +133,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# If path arg provided, add it to inputs
if path_arg:
inputs.append({"file_path": path_arg})
inputs.append({"path": path_arg})
if not inputs:
log("No input files provided.", file=sys.stderr)
@@ -145,9 +145,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Resolve file path
file_path = None
if isinstance(item, dict):
file_path = item.get("file_path") or item.get("path") or item.get("target")
elif hasattr(item, "file_path"):
file_path = item.file_path
file_path = item.get("path") or item.get("target")
elif hasattr(item, "path"):
file_path = item.path
elif isinstance(item, str):
file_path = item
@@ -175,9 +175,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# 1. Get source hash for relationship
source_hash = None
if isinstance(item, dict):
source_hash = item.get("hash") or item.get("file_hash")
elif hasattr(item, "file_hash"):
source_hash = item.file_hash
source_hash = item.get("hash")
elif hasattr(item, "hash"):
source_hash = item.hash
if not source_hash:
try:
@@ -219,18 +219,18 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Update original file in local DB if possible
try:
from config import get_local_storage_path
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
storage_path = get_local_storage_path(config)
if storage_path:
with LocalLibraryDB(storage_path) as db:
with FolderDB(storage_path) as db:
# Get original file metadata
# We need to find the original file by hash or path
# Try path first
orig_meta = db.get_metadata(path_obj)
if not orig_meta and source_hash:
# Try by hash
orig_path_resolved = db.search_by_hash(source_hash)
orig_path_resolved = db.search_hash(source_hash)
if orig_path_resolved:
orig_meta = db.get_metadata(orig_path_resolved)
@@ -256,7 +256,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
orig_meta["hash"] = source_hash
# We need the path to save
save_path = Path(orig_meta.get("file_path") or path_obj)
save_path = Path(orig_meta.get("path") or path_obj)
db.save_metadata(save_path, orig_meta)
log(f"Updated relationship for original file: {save_path.name}", file=sys.stderr)
except Exception as e:
@@ -264,7 +264,6 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# 5. Construct result
result_dict = {
"file_path": str(output_path),
"path": str(output_path),
"title": new_title,
"tags": new_tags,

View File

@@ -135,10 +135,10 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
CMDLET = Cmdlet(
name=".adjective",
aliases=["adj"],
alias=["adj"],
summary="Manage adjective categories and tags",
usage=".adjective [category] [-add tag] [-delete tag]",
args=[
arg=[
CmdletArg(name="category", type="string", description="Category name", required=False),
CmdletArg(name="tag", type="string", description="Tag name", required=False),
CmdletArg(name="add", type="flag", description="Add tag"),

183
cmdnats/help.py Normal file
View File

@@ -0,0 +1,183 @@
from __future__ import annotations
from typing import Any, Dict, Sequence, List, Optional
import shlex
import sys
from cmdlets._shared import Cmdlet, CmdletArg, parse_cmdlet_args
from helper.logger import log
from result_table import ResultTable
import pipeline as ctx
def _normalize_choice_list(arg_names: Optional[List[str]]) -> List[str]:
return sorted(set(arg_names or []))
def _examples_for_cmd(name: str) -> List[str]:
"""Return example invocations for a given command (best-effort)."""
lookup = {
".adjective": [
'.adjective -add "example"',
'.adjective -delete "example"',
],
}
key = name.replace("_", "-").lower()
return lookup.get(key, [])
def _find_cmd_metadata(name: str, metadata: Dict[str, Dict[str, Any]]) -> Optional[Dict[str, Any]]:
target = name.replace("_", "-").lower()
for cmd_name, meta in metadata.items():
if target == cmd_name:
return meta
aliases = meta.get("aliases", []) or []
if target in aliases:
return meta
return None
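# Minimal usage sketch (illustrative; the metadata shape mirrors list_cmdlet_metadata()):
_example_metadata = {
    ".adjective": {"name": ".adjective", "aliases": ["adj"], "summary": "Manage adjective tags"},
}
assert _find_cmd_metadata(".adjective", _example_metadata) is _example_metadata[".adjective"]  # exact name
assert _find_cmd_metadata("adj", _example_metadata) is _example_metadata[".adjective"]          # alias match
assert _find_cmd_metadata("missing", _example_metadata) is None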
def _render_list(metadata: Dict[str, Dict[str, Any]], filter_text: Optional[str], args: Sequence[str]) -> None:
table = ResultTable("Help")
table.set_source_command(".help", list(args))
items: List[Dict[str, Any]] = []
needle = (filter_text or "").lower().strip()
for name in sorted(metadata.keys()):
meta = metadata[name]
summary = meta.get("summary", "") or ""
if needle and needle not in name.lower() and needle not in summary.lower():
continue
row = table.add_row()
row.add_column("Cmd", name)
aliases = ", ".join(meta.get("aliases", []) or [])
row.add_column("Aliases", aliases)
arg_names = [a.get("name") for a in meta.get("args", []) if a.get("name")]
row.add_column("Args", ", ".join(f"-{a}" for a in arg_names))
table.set_row_selection_args(len(table.rows) - 1, ["-cmd", name])
items.append(meta)
ctx.set_last_result_table(table, items)
ctx.set_current_stage_table(table)
print(table)
def _render_detail(meta: Dict[str, Any], args: Sequence[str]) -> None:
title = f"Help: {meta.get('name', '') or 'cmd'}"
table = ResultTable(title)
table.set_source_command(".help", list(args))
header_lines: List[str] = []
summary = meta.get("summary", "")
usage = meta.get("usage", "")
aliases = meta.get("aliases", []) or []
examples = _examples_for_cmd(meta.get("name", ""))
first_example_tokens: List[str] = []
first_example_cmd: Optional[str] = None
if examples:
try:
split_tokens = shlex.split(examples[0])
if split_tokens:
first_example_cmd = split_tokens[0]
first_example_tokens = split_tokens[1:]
except Exception:
pass
if summary:
header_lines.append(summary)
if usage:
header_lines.append(f"Usage: {usage}")
if aliases:
header_lines.append("Aliases: " + ", ".join(aliases))
if examples:
header_lines.append("Examples: " + " | ".join(examples))
if header_lines:
table.set_header_lines(header_lines)
args_meta = meta.get("args", []) or []
example_text = " | ".join(examples)
# If we have an example, use it as the source command so @N runs that example
if first_example_cmd:
table.set_source_command(first_example_cmd, [])
if not args_meta:
row = table.add_row()
row.add_column("Arg", "(none)")
row.add_column("Type", "")
row.add_column("Req", "")
row.add_column("Description", "")
row.add_column("Example", example_text)
if first_example_tokens:
table.set_row_selection_args(len(table.rows) - 1, first_example_tokens)
else:
for arg in args_meta:
row = table.add_row()
name = arg.get("name") or ""
row.add_column("Arg", f"-{name}" if name else "")
row.add_column("Type", arg.get("type", ""))
row.add_column("Req", "yes" if arg.get("required") else "")
desc = arg.get("description", "") or ""
choices = arg.get("choices", []) or []
if choices:
choice_text = f"choices: {', '.join(choices)}"
desc = f"{desc} ({choice_text})" if desc else choice_text
row.add_column("Description", desc)
row.add_column("Example", example_text)
if first_example_tokens:
table.set_row_selection_args(len(table.rows) - 1, first_example_tokens)
ctx.set_last_result_table_overlay(table, [meta])
ctx.set_current_stage_table(table)
print(table)
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
try:
from helper import cmdlet_catalog as _catalog
CMDLET.arg[0].choices = _normalize_choice_list(_catalog.list_cmdlet_names())
metadata = _catalog.list_cmdlet_metadata()
except Exception:
CMDLET.arg[0].choices = []
metadata = {}
parsed = parse_cmdlet_args(args, CMDLET)
filter_text = parsed.get("filter")
cmd_arg = parsed.get("cmd")
if cmd_arg:
target_meta = _find_cmd_metadata(str(cmd_arg), metadata)
if not target_meta:
log(f"Unknown command: {cmd_arg}", file=sys.stderr)
return 1
_render_detail(target_meta, args)
return 0
_render_list(metadata, filter_text, args)
return 0
CMDLET = Cmdlet(
name=".help",
alias=["help", "?"],
summary="Show cmdlets or detailed help",
usage=".help [cmd] [-filter text]",
arg=[
CmdletArg(
name="cmd",
type="string",
description="Cmdlet name to show detailed help",
required=False,
choices=[],
),
CmdletArg(
name="-filter",
type="string",
description="Filter cmdlets by substring",
required=False,
),
],
)

View File

@@ -3,95 +3,22 @@ import sys
from cmdlets._shared import Cmdlet, CmdletArg, parse_cmdlet_args
from helper.logger import log, debug
from result_table import ResultTable
from helper.file_storage import MatrixStorageBackend
# REFACTOR: Commenting out Matrix import until provider refactor is complete
# from helper.store import MatrixStorageBackend
from config import save_config, load_config
import pipeline as ctx
def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
parsed = parse_cmdlet_args(args, CMDLET)
# Initialize backend
backend = MatrixStorageBackend()
# Get current default room
matrix_conf = config.get('storage', {}).get('matrix', {})
current_room_id = matrix_conf.get('room_id')
# Fetch rooms
debug("Fetching joined rooms from Matrix...")
rooms = backend.list_rooms(config)
if not rooms:
debug("No joined rooms found or Matrix not configured.")
return 1
# Handle selection if provided
selection = parsed.get("selection")
if selection:
new_room_id = None
selected_room_name = None
# Try as index (1-based)
try:
idx = int(selection) - 1
if 0 <= idx < len(rooms):
selected_room = rooms[idx]
new_room_id = selected_room['id']
selected_room_name = selected_room['name']
except ValueError:
# Try as Room ID
for room in rooms:
if room['id'] == selection:
new_room_id = selection
selected_room_name = room['name']
break
if new_room_id:
# Update config
# Load fresh config from disk to avoid saving runtime objects (like WorkerManager)
disk_config = load_config()
if 'storage' not in disk_config: disk_config['storage'] = {}
if 'matrix' not in disk_config['storage']: disk_config['storage']['matrix'] = {}
disk_config['storage']['matrix']['room_id'] = new_room_id
save_config(disk_config)
debug(f"Default Matrix room set to: {selected_room_name} ({new_room_id})")
current_room_id = new_room_id
else:
debug(f"Invalid selection: {selection}")
return 1
# Display table
table = ResultTable("Matrix Rooms")
for i, room in enumerate(rooms):
is_default = (room['id'] == current_room_id)
row = table.add_row()
row.add_column("Default", "*" if is_default else "")
row.add_column("Name", room['name'])
row.add_column("ID", room['id'])
# Set selection args so user can type @N to select
# This will run .matrix N
table.set_row_selection_args(i, [str(i + 1)])
table.set_source_command(".matrix")
# Register results
ctx.set_last_result_table_overlay(table, rooms)
ctx.set_current_stage_table(table)
print(table)
return 0
# REFACTOR: Matrix cmdlet temporarily disabled during storage provider refactor
log("⚠️ Matrix cmdlet is temporarily disabled during refactor", file=sys.stderr)
return 1
CMDLET = Cmdlet(
name=".matrix",
aliases=["matrix", "rooms"],
alias=["matrix", "rooms"],
summary="List and select default Matrix room",
usage=".matrix [selection]",
args=[
arg=[
CmdletArg(
name="selection",
type="string",

View File

@@ -14,7 +14,7 @@ from helper.mpv_ipc import get_ipc_pipe_path, MPVIPCClient
import pipeline as ctx
from helper.download import is_url_supported_by_ytdlp
from helper.local_library import LocalLibrarySearchOptimizer
from helper.folder_store import LocalLibrarySearchOptimizer
from config import get_local_storage_path, get_hydrus_access_key, get_hydrus_url
from hydrus_health_check import get_cookies_file_path
@@ -35,6 +35,20 @@ def _send_ipc_command(command: Dict[str, Any], silent: bool = False) -> Optional
debug(f"IPC Error: {e}", file=sys.stderr)
return None
def _is_mpv_running() -> bool:
"""Check if MPV is currently running and accessible via IPC."""
try:
ipc_pipe = get_ipc_pipe_path()
client = MPVIPCClient(socket_path=ipc_pipe)
if client.connect():
client.disconnect()
return True
return False
except Exception:
return False
def _get_playlist(silent: bool = False) -> Optional[List[Dict[str, Any]]]:
"""Get the current playlist from MPV. Returns None if MPV is not running."""
cmd = {"command": ["get_property", "playlist"], "request_id": 100}
@@ -87,8 +101,75 @@ def _extract_target_from_memory_uri(text: str) -> Optional[str]:
return None
def _normalize_playlist_target(text: Optional[str]) -> Optional[str]:
"""Normalize playlist entry targets for dedupe comparisons."""
def _find_hydrus_instance_for_hash(hash_str: str, file_storage: Any) -> Optional[str]:
"""Find which Hydrus instance serves a specific file hash.
Args:
hash_str: SHA256 hash (64 hex chars)
file_storage: FileStorage instance with Hydrus backends
Returns:
Instance name (e.g., 'home') or None if not found
"""
# Query each Hydrus backend to see if it has this file
for backend_name in file_storage.list_backends():
backend = file_storage[backend_name]
# Check if this is a Hydrus backend by checking class name
backend_class = type(backend).__name__
if backend_class != "HydrusNetwork":
continue
try:
# Query metadata to see if this instance has the file
metadata = backend.get_metadata(hash_str)
if metadata:
return backend_name
except Exception:
# This instance doesn't have the file or had an error
continue
return None
def _find_hydrus_instance_by_url(url: str, file_storage: Any) -> Optional[str]:
"""Find which Hydrus instance matches a given URL.
Args:
url: Full URL (e.g., http://localhost:45869/get_files/file?hash=...)
file_storage: FileStorage instance with Hydrus backends
Returns:
Instance name (e.g., 'home') or None if not found
"""
from urllib.parse import urlparse
parsed_target = urlparse(url)
target_netloc = parsed_target.netloc.lower()
# Check each Hydrus backend's URL
for backend_name in file_storage.list_backends():
backend = file_storage[backend_name]
backend_class = type(backend).__name__
if backend_class != "HydrusNetwork":
continue
# Get the backend's base URL from its client
try:
backend_url = backend._client.base_url
parsed_backend = urlparse(backend_url)
backend_netloc = parsed_backend.netloc.lower()
# Match by netloc (host:port)
if target_netloc == backend_netloc:
return backend_name
except Exception:
continue
return None
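# Illustrative sketch of how the two helpers above are combined when labelling a playlist
# entry; _label_hydrus_entry is a hypothetical wrapper, not an existing function.
def _label_hydrus_entry(target: str, file_storage: Any) -> str:
    hash_match = re.search(r"[0-9a-f]{64}", target.lower())
    if hash_match:
        instance = _find_hydrus_instance_for_hash(hash_match.group(0), file_storage)
        if instance:
            return instance
    if target.startswith(("http://", "https://")):
        instance = _find_hydrus_instance_by_url(target, file_storage)
        if instance:
            return instance
    return "hydrus"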
def _normalize_playlist_path(text: Optional[str]) -> Optional[str]:
"""Normalize playlist entry paths for dedupe comparisons."""
if not text:
return None
real = _extract_target_from_memory_uri(text) or text
@@ -118,8 +199,16 @@ def _normalize_playlist_target(text: Optional[str]) -> Optional[str]:
return real.lower()
def _infer_store_from_playlist_item(item: Dict[str, Any]) -> str:
"""Infer a friendly store label from an MPV playlist entry."""
def _infer_store_from_playlist_item(item: Dict[str, Any], file_storage: Optional[Any] = None) -> str:
"""Infer a friendly store label from an MPV playlist entry.
Args:
item: MPV playlist item dict
file_storage: Optional FileStorage instance for querying specific backend instances
Returns:
Store label (e.g., 'home', 'work', 'local', 'youtube', etc.)
"""
name = item.get("filename") if isinstance(item, dict) else None
target = str(name or "")
@@ -130,19 +219,33 @@ def _infer_store_from_playlist_item(item: Dict[str, Any]) -> str:
# Hydrus hashes: bare 64-hex entries
if re.fullmatch(r"[0-9a-f]{64}", target.lower()):
# If we have file_storage, query each Hydrus instance to find which one has this hash
if file_storage:
hash_str = target.lower()
hydrus_instance = _find_hydrus_instance_for_hash(hash_str, file_storage)
if hydrus_instance:
return hydrus_instance
return "hydrus"
lower = target.lower()
if lower.startswith("magnet:"):
return "magnet"
if lower.startswith("hydrus://"):
# Extract hash from hydrus:// URL if possible
if file_storage:
hash_match = re.search(r"[0-9a-f]{64}", target.lower())
if hash_match:
hash_str = hash_match.group(0)
hydrus_instance = _find_hydrus_instance_for_hash(hash_str, file_storage)
if hydrus_instance:
return hydrus_instance
return "hydrus"
# Windows / UNC paths
if re.match(r"^[a-z]:[\\/]", target, flags=re.IGNORECASE) or target.startswith("\\\\"):
return "local"
# file:// URLs
# file:// urls
if lower.startswith("file://"):
return "local"
@@ -162,9 +265,33 @@ def _infer_store_from_playlist_item(item: Dict[str, Any]) -> str:
return "soundcloud"
if "bandcamp" in host_stripped:
return "bandcamp"
if "get_files" in path or host_stripped in {"127.0.0.1", "localhost"}:
if "get_files" in path or "file?hash=" in path or host_stripped in {"127.0.0.1", "localhost"}:
# Hydrus API URL - try to extract hash and find instance
if file_storage:
# Try to extract hash from URL parameters
hash_match = re.search(r"hash=([0-9a-f]{64})", target.lower())
if hash_match:
hash_str = hash_match.group(1)
hydrus_instance = _find_hydrus_instance_for_hash(hash_str, file_storage)
if hydrus_instance:
return hydrus_instance
# If no hash in URL, try matching the base URL to configured instances
hydrus_instance = _find_hydrus_instance_by_url(target, file_storage)
if hydrus_instance:
return hydrus_instance
return "hydrus"
if re.match(r"^\d+\.\d+\.\d+\.\d+$", host_stripped) and "get_files" in path:
# IP-based Hydrus URL
if file_storage:
hash_match = re.search(r"hash=([0-9a-f]{64})", target.lower())
if hash_match:
hash_str = hash_match.group(1)
hydrus_instance = _find_hydrus_instance_for_hash(hash_str, file_storage)
if hydrus_instance:
return hydrus_instance
hydrus_instance = _find_hydrus_instance_by_url(target, file_storage)
if hydrus_instance:
return hydrus_instance
return "hydrus"
parts = host_stripped.split('.')
@@ -231,15 +358,15 @@ def _build_ytdl_options(config: Optional[Dict[str, Any]], hydrus_header: Optiona
return ",".join(opts) if opts else None
def _is_hydrus_target(target: str, hydrus_url: Optional[str]) -> bool:
if not target:
def _is_hydrus_path(path: str, hydrus_url: Optional[str]) -> bool:
if not path:
return False
lower = target.lower()
lower = path.lower()
if "hydrus://" in lower:
return True
parsed = urlparse(target)
parsed = urlparse(path)
host = (parsed.netloc or "").lower()
path = parsed.path or ""
path_part = parsed.path or ""
if hydrus_url:
try:
hydrus_host = urlparse(hydrus_url).netloc.lower()
@@ -247,9 +374,9 @@ def _is_hydrus_target(target: str, hydrus_url: Optional[str]) -> bool:
return True
except Exception:
pass
if "get_files" in path or "file?hash=" in path:
if "get_files" in path_part or "file?hash=" in path_part:
return True
if re.match(r"^\d+\.\d+\.\d+\.\d+$", host) and "get_files" in path:
if re.match(r"^\d+\.\d+\.\d+\.\d+$", host) and "get_files" in path_part:
return True
return False
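# Illustrative check of _is_hydrus_path (the URLs below are made-up examples):
_fake_hash = "0" * 64
assert _is_hydrus_path(f"hydrus://{_fake_hash}", None)
assert _is_hydrus_path(f"http://localhost:45869/get_files/file?hash={_fake_hash}", None)
assert not _is_hydrus_path("https://youtube.com/watch?v=abc", "http://localhost:45869")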
@@ -313,6 +440,113 @@ def _monitor_mpv_logs(duration: float = 3.0) -> None:
client.disconnect()
except Exception:
pass
def _get_playable_path(item: Any, file_storage: Optional[Any], config: Optional[Dict[str, Any]]) -> Optional[tuple[str, Optional[str]]]:
"""Extract a playable path/URL from an item, handling different store types.
Args:
item: Item to extract path from (dict, PipeObject, or string)
file_storage: FileStorage instance for querying backends
config: Config dict for Hydrus URL
Returns:
Tuple of (path, title) or None if no valid path found
"""
path = None
title = None
store = None
file_hash = None
# Extract fields from item - prefer a disk path ('path'), but accept 'url' as fallback for providers
if isinstance(item, dict):
# Support both canonical 'path' and legacy 'file_path' keys, and provider 'url' keys
path = item.get("path") or item.get("file_path")
# Fallbacks for provider-style entries where URL is stored in 'url' or 'source_url' or 'target'
if not path:
path = item.get("url") or item.get("source_url") or item.get("target")
if not path:
known = item.get("url") or item.get("url") or []
if known and isinstance(known, list):
path = known[0]
title = item.get("title") or item.get("file_title")
store = item.get("store") or item.get("storage") or item.get("storage_source") or item.get("origin")
file_hash = item.get("hash") or item.get("file_hash") or item.get("hash_hex")
elif hasattr(item, "path") or hasattr(item, "url") or hasattr(item, "source_url") or hasattr(item, "store") or hasattr(item, "hash"):
# Handle PipeObject / dataclass objects - prefer path, but fall back to url/source_url attributes
path = getattr(item, "path", None) or getattr(item, "file_path", None)
if not path:
path = getattr(item, "url", None) or getattr(item, "source_url", None) or getattr(item, "target", None)
if not path:
known = getattr(item, "url", None) or (getattr(item, "extra", None) or {}).get("url")
if known and isinstance(known, list):
path = known[0]
title = getattr(item, "title", None) or getattr(item, "file_title", None)
store = getattr(item, "store", None) or getattr(item, "origin", None)
file_hash = getattr(item, "hash", None)
elif isinstance(item, str):
path = item
# Debug: show incoming values
try:
debug(f"_get_playable_path: store={store}, path={path}, hash={file_hash}")
except Exception:
pass
if not path:
return None
# If we have a store and hash, use store's .pipe() method if available
# Skip this for URL-based providers (YouTube, SoundCloud, etc.) which have hash="unknown"
# Also skip if path is already a URL (http/https)
if store and file_hash and file_hash != "unknown" and file_storage:
# Check if this is actually a URL - if so, just return it
if path.startswith(("http://", "https://")):
return (path, title)
try:
backend = file_storage[store]
# Check if backend has a .pipe() method
if hasattr(backend, 'pipe') and callable(backend.pipe):
pipe_path = backend.pipe(file_hash, config)
if pipe_path:
path = pipe_path
debug(f"Got pipe path from {store} backend: {path}")
except KeyError:
# Store not found in file_storage - it could be a search provider (youtube, bandcamp, etc.)
from helper.provider import get_search_provider
try:
provider = get_search_provider(store, config or {})
if provider and hasattr(provider, 'pipe') and callable(provider.pipe):
try:
debug(f"Calling provider.pipe for '{store}' with path: {path}")
provider_path = provider.pipe(path, config or {})
debug(f"provider.pipe returned: {provider_path}")
if provider_path:
path = provider_path
debug(f"Got pipe path from provider '{store}': {path}")
except Exception as e:
debug(f"Error in provider.pipe for '{store}': {e}", file=sys.stderr)
except Exception as e:
debug(f"Error calling provider.pipe for '{store}': {e}", file=sys.stderr)
except Exception as e:
debug(f"Error calling .pipe() on store '{store}': {e}", file=sys.stderr)
# As a fallback, if a provider exists for this store (e.g., youtube) and
# this store is not part of FileStorage backends, call provider.pipe()
if store and (not file_storage or store not in (file_storage.list_backends() if file_storage else [])):
try:
from helper.provider import get_search_provider
provider = get_search_provider(store, config or {})
if provider and hasattr(provider, 'pipe') and callable(provider.pipe):
provider_path = provider.pipe(path, config or {})
if provider_path:
path = provider_path
debug(f"Got pipe path from provider '{store}' (fallback): {path}")
except Exception as e:
debug(f"Error calling provider.pipe (fallback) for '{store}': {e}", file=sys.stderr)
return (path, title)
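# Minimal usage sketch (illustrative; the keys mirror what _get_playable_path accepts above).
# With no store/hash hints and no FileStorage, the path is passed straight through:
_example_item = {"path": r"C:\media\song.m4a", "title": "Song"}
assert _get_playable_path(_example_item, file_storage=None, config={}) == (r"C:\media\song.m4a", "Song")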
def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[Dict[str, Any]] = None) -> bool:
"""Queue items to MPV, starting it if necessary.
@@ -323,6 +557,12 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
Returns:
True if MPV was started, False if items were queued via IPC.
"""
# Debug: print incoming items
try:
debug(f"_queue_items: count={len(items)} types={[type(i).__name__ for i in items]}")
except Exception:
pass
# Just verify cookies are configured, don't try to set via IPC
_ensure_ytdl_cookies()
@@ -333,6 +573,14 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
hydrus_url = get_hydrus_url(config) if config is not None else None
except Exception:
hydrus_url = None
# Initialize FileStorage for path resolution
file_storage = None
try:
from helper.store import FileStorage
file_storage = FileStorage(config or {})
except Exception as e:
debug(f"Warning: Could not initialize FileStorage: {e}", file=sys.stderr)
# Dedupe existing playlist before adding more (unless we're replacing it)
existing_targets: set[str] = set()
@@ -342,7 +590,7 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
for idx, pl_item in enumerate(playlist):
fname = pl_item.get("filename") if isinstance(pl_item, dict) else str(pl_item)
alt = pl_item.get("playlist-path") if isinstance(pl_item, dict) else None
norm = _normalize_playlist_target(fname) or _normalize_playlist_target(alt)
norm = _normalize_playlist_path(fname) or _normalize_playlist_path(alt)
if not norm:
continue
if norm in existing_targets:
@@ -360,25 +608,25 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
new_targets: set[str] = set()
for i, item in enumerate(items):
# Extract URL/Path
target = None
title = None
# Debug: show the item being processed
try:
debug(f"_queue_items: processing idx={i} type={type(item)} repr={repr(item)[:200]}")
except Exception:
pass
# Extract URL/Path using store-aware logic
result = _get_playable_path(item, file_storage, config)
if not result:
debug(f"_queue_items: item idx={i} produced no playable path")
continue
if isinstance(item, dict):
target = item.get("target") or item.get("url") or item.get("path") or item.get("filename")
title = item.get("title") or item.get("name")
elif hasattr(item, "target"):
target = item.target
title = getattr(item, "title", None)
elif isinstance(item, str):
target = item
target, title = result
if target:
# If we just have a hydrus hash, build a direct file URL for MPV
if re.fullmatch(r"[0-9a-f]{64}", str(target).strip().lower()) and hydrus_url:
target = f"{hydrus_url.rstrip('/')}/get_files/file?hash={str(target).strip()}"
norm_key = _normalize_playlist_target(target) or str(target).strip().lower()
norm_key = _normalize_playlist_path(target) or str(target).strip().lower()
if norm_key in existing_targets or norm_key in new_targets:
debug(f"Skipping duplicate playlist entry: {title or target}")
continue
@@ -386,11 +634,16 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
# Check if it's a yt-dlp supported URL
is_ytdlp = False
if target.startswith("http") and is_url_supported_by_ytdlp(target):
is_ytdlp = True
# Treat any http(s) target as yt-dlp candidate. If the Python yt-dlp
# module is available we also check more deeply, but default to True
# so MPV can use its ytdl hooks for remote streaming sites.
try:
is_ytdlp = target.startswith("http") or is_url_supported_by_ytdlp(target)
except Exception:
is_ytdlp = target.startswith("http")
# Use memory:// M3U hack to pass title to MPV
# Skip for yt-dlp URLs to ensure proper handling
# Skip for yt-dlp urls to ensure proper handling
if title and not is_ytdlp:
# Sanitize title for M3U (remove newlines)
safe_title = title.replace('\n', ' ').replace('\r', '')
@@ -403,8 +656,8 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
if clear_first and i == 0:
mode = "replace"
# If this is a Hydrus target, set header property and yt-dlp headers before loading
if hydrus_header and _is_hydrus_target(target_to_send, hydrus_url):
# If this is a Hydrus path, set header property and yt-dlp headers before loading
if hydrus_header and _is_hydrus_path(target_to_send, hydrus_url):
header_cmd = {"command": ["set_property", "http-header-fields", hydrus_header], "request_id": 199}
_send_ipc_command(header_cmd, silent=True)
if ytdl_opts:
@@ -412,11 +665,18 @@ def _queue_items(items: List[Any], clear_first: bool = False, config: Optional[D
_send_ipc_command(ytdl_cmd, silent=True)
cmd = {"command": ["loadfile", target_to_send, mode], "request_id": 200}
resp = _send_ipc_command(cmd)
try:
debug(f"Sending MPV loadfile: {target_to_send} mode={mode}")
resp = _send_ipc_command(cmd)
debug(f"MPV loadfile response: {resp}")
except Exception as e:
debug(f"Exception sending loadfile to MPV: {e}", file=sys.stderr)
resp = None
if resp is None:
# MPV not running (or died)
# Start MPV with remaining items
debug(f"MPV not running/died while queuing, starting MPV with remaining items: {items[i:]}")
_start_mpv(items[i:], config=config)
return True
elif resp.get("error") == "success":
@@ -435,6 +695,14 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
parsed = parse_cmdlet_args(args, CMDLET)
# Initialize FileStorage for detecting Hydrus instance names
file_storage = None
try:
from helper.store import FileStorage
file_storage = FileStorage(config)
except Exception as e:
debug(f"Warning: Could not initialize FileStorage: {e}", file=sys.stderr)
# Initialize mpv_started flag
mpv_started = False
@@ -485,7 +753,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
# Emit the current item to pipeline
result_obj = {
'file_path': filename,
'path': filename,
'title': title,
'cmdlet_name': '.pipe',
'source': 'pipe',
@@ -683,10 +951,20 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
items_to_add = result
elif isinstance(result, dict):
items_to_add = [result]
if _queue_items(items_to_add, config=config):
else:
# Handle PipeObject or any other object type
items_to_add = [result]
# Debug: inspect incoming result and attributes
try:
debug(f"pipe._run: received result type={type(result)} repr={repr(result)[:200]}")
debug(f"pipe._run: attrs path={getattr(result, 'path', None)} url={getattr(result, 'url', None)} store={getattr(result, 'store', None)} hash={getattr(result, 'hash', None)}")
except Exception:
pass
if items_to_add and _queue_items(items_to_add, config=config):
mpv_started = True
if items_to_add:
# If we added items, we might want to play the first one if nothing is playing?
# For now, just list the playlist
@@ -760,7 +1038,7 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
return 1
else:
# Play item
if hydrus_header and _is_hydrus_target(filename, hydrus_url):
if hydrus_header and _is_hydrus_path(filename, hydrus_url):
header_cmd = {"command": ["set_property", "http-header-fields", hydrus_header], "request_id": 198}
_send_ipc_command(header_cmd, silent=True)
cmd = {"command": ["playlist-play-index", idx], "request_id": 102}
@@ -799,28 +1077,84 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
except NameError:
table_title = "MPV Playlist"
table = ResultTable(table_title)
table = ResultTable(table_title, preserve_order=True)
# Convert MPV items to PipeObjects with proper hash and store
pipe_objects = []
for i, item in enumerate(items):
is_current = item.get("current", False)
title = _extract_title_from_item(item)
store = _infer_store_from_playlist_item(item)
# Truncate if too long
if len(title) > 80:
title = title[:77] + "..."
filename = item.get("filename", "")
# Extract the real path/URL from memory:// wrapper if present
real_path = _extract_target_from_memory_uri(filename) or filename
# Try to extract hash from the path/URL
file_hash = None
store_name = None
# Check if it's a Hydrus URL
if "get_files/file" in real_path or "hash=" in real_path:
# Extract hash from Hydrus URL
hash_match = re.search(r"hash=([0-9a-f]{64})", real_path.lower())
if hash_match:
file_hash = hash_match.group(1)
# Try to find which Hydrus instance has this file
if file_storage:
store_name = _find_hydrus_instance_for_hash(file_hash, file_storage)
if not store_name:
store_name = "hydrus"
# Check if it's a hash-based local file
elif real_path:
# Try to extract hash from filename (e.g., C:\path\1e8c46...a1b2.mp4)
path_obj = Path(real_path)
stem = path_obj.stem # filename without extension
if len(stem) == 64 and all(c in '0123456789abcdef' for c in stem.lower()):
file_hash = stem.lower()
# Find which folder store has this file
if file_storage:
for backend_name in file_storage.list_backends():
backend = file_storage[backend_name]
if type(backend).__name__ == "Folder":
# Check if this backend has the file
try:
result_path = backend.get_file(file_hash)
if result_path and result_path.exists():
store_name = backend_name
break
except Exception:
pass
# Fallback to inferred store if we couldn't find it
if not store_name:
store_name = _infer_store_from_playlist_item(item, file_storage=file_storage)
# Build PipeObject with proper metadata
from models import PipeObject
pipe_obj = PipeObject(
hash=file_hash or "unknown",
store=store_name or "unknown",
title=title,
path=real_path
)
pipe_objects.append(pipe_obj)
# Truncate title for display
display_title = title
if len(display_title) > 80:
display_title = display_title[:77] + "..."
row = table.add_row()
row.add_column("Current", "*" if is_current else "")
row.add_column("Store", store)
row.add_column("Title", title)
row.add_column("Store", store_name or "unknown")
row.add_column("Title", display_title)
table.set_row_selection_args(i, [str(i + 1)])
table.set_source_command(".pipe")
# Register results with pipeline context so @N selection works
ctx.set_last_result_table_overlay(table, items)
# Register PipeObjects (not raw MPV items) with pipeline context
ctx.set_last_result_table_overlay(table, pipe_objects)
ctx.set_current_stage_table(table)
print(table)
@@ -889,16 +1223,30 @@ def _start_mpv(items: List[Any], config: Optional[Dict[str, Any]] = None) -> Non
if items:
_queue_items(items, config=config)
# Auto-play the first item
import time
time.sleep(0.3) # Give MPV a moment to process the queued items
# Play the first item (index 0) and unpause
play_cmd = {"command": ["playlist-play-index", 0], "request_id": 102}
play_resp = _send_ipc_command(play_cmd, silent=True)
if play_resp and play_resp.get("error") == "success":
# Ensure playback starts (unpause)
unpause_cmd = {"command": ["set_property", "pause", False], "request_id": 103}
_send_ipc_command(unpause_cmd, silent=True)
debug("Auto-playing first item")
except Exception as e:
debug(f"Error starting MPV: {e}", file=sys.stderr)
CMDLET = Cmdlet(
name=".pipe",
aliases=["pipe", "playlist", "queue", "ls-pipe"],
alias=["pipe", "playlist", "queue", "ls-pipe"],
summary="Manage and play items in the MPV playlist via IPC",
usage=".pipe [index|url] [-current] [-clear] [-list] [-url URL]",
args=[
arg=[
CmdletArg(
name="index",
type="string", # Changed to string to allow URL detection

View File

@@ -21,14 +21,14 @@ CMDLET = Cmdlet(
name=".worker",
summary="Display workers table in result table format.",
usage=".worker [status] [-limit N] [@N]",
args=[
arg=[
CmdletArg("status", description="Filter by status: running, completed, error (default: all)"),
CmdletArg("limit", type="integer", description="Limit results (default: 100)"),
CmdletArg("@N", description="Select worker by index (1-based) and display full logs"),
CmdletArg("-id", description="Show full logs for a specific worker"),
CmdletArg("-clear", type="flag", description="Remove completed workers from the database"),
],
details=[
detail=[
"- Shows all background worker tasks and their output",
"- Can filter by status: running, completed, error",
"- Search result stdout is captured from each worker",
@@ -74,9 +74,9 @@ def _run(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
return 1
try:
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
with LocalLibraryDB(library_root) as db:
with FolderDB(library_root) as db:
if options.clear:
count = db.clear_finished_workers()
log(f"Cleared {count} finished workers.")

View File

@@ -25,18 +25,28 @@ def _make_cache_key(config_dir: Optional[Path], filename: str, actual_path: Opti
def get_hydrus_instance(config: Dict[str, Any], instance_name: str = "home") -> Optional[Dict[str, Any]]:
"""Get a specific Hydrus instance config by name.
Supports both formats:
- New: config["storage"]["hydrus"][instance_name] = {"key": "...", "url": "..."}
- Old: config["HydrusNetwork"][instance_name] = {"key": "...", "url": "..."}
Supports multiple formats:
- Current: config["store"]["hydrusnetwork"][instance_name]
- Legacy: config["storage"]["hydrus"][instance_name]
- Old: config["HydrusNetwork"][instance_name]
Args:
config: Configuration dict
instance_name: Name of the Hydrus instance (default: "home")
Returns:
Dict with "key" and "url" keys, or None if not found
Dict with access key and URL, or None if not found
"""
# Try new format first
# Try current format first: config["store"]["hydrusnetwork"]["home"]
store = config.get("store", {})
if isinstance(store, dict):
hydrusnetwork = store.get("hydrusnetwork", {})
if isinstance(hydrusnetwork, dict):
instance = hydrusnetwork.get(instance_name)
if isinstance(instance, dict):
return instance
# Try legacy format: config["storage"]["hydrus"]
storage = config.get("storage", {})
if isinstance(storage, dict):
hydrus_config = storage.get("hydrus", {})
@@ -45,7 +55,7 @@ def get_hydrus_instance(config: Dict[str, Any], instance_name: str = "home") ->
if isinstance(instance, dict):
return instance
# Fall back to old format
# Fall back to old format: config["HydrusNetwork"]
hydrus_network = config.get("HydrusNetwork")
if not isinstance(hydrus_network, dict):
return None
@@ -60,9 +70,10 @@ def get_hydrus_instance(config: Dict[str, Any], instance_name: str = "home") ->
def get_hydrus_access_key(config: Dict[str, Any], instance_name: str = "home") -> Optional[str]:
"""Get Hydrus access key for an instance.
Supports both old flat format and new nested format:
Supports multiple formats:
- Current: config["store"]["hydrusnetwork"][name]["Hydrus-Client-API-Access-Key"]
- Legacy: config["storage"]["hydrus"][name]["key"]
- Old: config["HydrusNetwork_Access_Key"]
- New: config["HydrusNetwork"][instance_name]["key"]
Args:
config: Configuration dict
@@ -72,7 +83,18 @@ def get_hydrus_access_key(config: Dict[str, Any], instance_name: str = "home") -
Access key string, or None if not found
"""
instance = get_hydrus_instance(config, instance_name)
key = instance.get("key") if instance else config.get("HydrusNetwork_Access_Key")
if instance:
# Try current format key name
key = instance.get("Hydrus-Client-API-Access-Key")
if key:
return str(key).strip()
# Try legacy key name
key = instance.get("key")
if key:
return str(key).strip()
# Fall back to old flat format
key = config.get("HydrusNetwork_Access_Key")
return str(key).strip() if key else None
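# Illustrative config shapes accepted by the two lookups above (values are placeholders):
_example_current = {
    "store": {"hydrusnetwork": {"home": {"Hydrus-Client-API-Access-Key": "abc123", "url": "http://localhost:45869"}}}
}
_example_legacy = {
    "storage": {"hydrus": {"home": {"key": "abc123", "url": "http://localhost:45869"}}}
}
assert get_hydrus_access_key(_example_current) == "abc123"
assert get_hydrus_access_key(_example_legacy) == "abc123"
# The flat "HydrusNetwork_Access_Key" key remains the last-resort fallback.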
@@ -140,8 +162,9 @@ def resolve_output_dir(config: Dict[str, Any]) -> Path:
def get_local_storage_path(config: Dict[str, Any]) -> Optional[Path]:
"""Get local storage path from config.
Supports both formats:
- New: config["storage"]["local"]["path"]
Supports multiple formats:
- New: config["store"]["folder"]["default"]["path"]
- Old: config["storage"]["local"]["path"]
- Old: config["Local"]["path"]
Args:
@@ -150,7 +173,18 @@ def get_local_storage_path(config: Dict[str, Any]) -> Optional[Path]:
Returns:
Path object if found, None otherwise
"""
# Try new format first
# Try new format first: store.folder.default.path
store = config.get("store", {})
if isinstance(store, dict):
folder_config = store.get("folder", {})
if isinstance(folder_config, dict):
default_config = folder_config.get("default", {})
if isinstance(default_config, dict):
path_str = default_config.get("path")
if path_str:
return Path(str(path_str)).expanduser()
# Fall back to storage.local.path format
storage = config.get("storage", {})
if isinstance(storage, dict):
local_config = storage.get("local", {})
@@ -159,7 +193,7 @@ def get_local_storage_path(config: Dict[str, Any]) -> Optional[Path]:
if path_str:
return Path(str(path_str)).expanduser()
# Fall back to old format
# Fall back to old Local format
local_config = config.get("Local", {})
if isinstance(local_config, dict):
path_str = local_config.get("path")
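# Illustrative config shapes accepted by get_local_storage_path (paths are placeholders; the
# old "Local" branch is assumed to return the same way as the branches shown above):
from pathlib import Path as _Path
assert get_local_storage_path({"store": {"folder": {"default": {"path": "C:/media/library"}}}}) == _Path("C:/media/library")
assert get_local_storage_path({"storage": {"local": {"path": "/srv/media"}}}) == _Path("/srv/media")
assert get_local_storage_path({"Local": {"path": "/srv/media"}}) == _Path("/srv/media")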

View File

@@ -50,7 +50,6 @@ UrlPolicy = _utils.UrlPolicy
DownloadOptions = _download.DownloadOptions
DownloadError = _download.DownloadError
DownloadMediaResult = _download.DownloadMediaResult
download_media = _download.download_media
is_url_supported_by_ytdlp = _download.is_url_supported_by_ytdlp
probe_url = _download.probe_url
# Hydrus utilities

View File

@@ -35,7 +35,7 @@ class AllDebridClient:
"""Client for AllDebrid API."""
# Try both v4 and v3 APIs
BASE_URLS = [
BASE_url = [
"https://api.alldebrid.com/v4",
"https://api.alldebrid.com/v3",
]
@@ -49,7 +49,7 @@ class AllDebridClient:
self.api_key = api_key.strip()
if not self.api_key:
raise AllDebridError("AllDebrid API key is empty")
self.base_url = self.BASE_URLS[0] # Start with v4
self.base_url = self.BASE_url[0] # Start with v4
def _request(self, endpoint: str, params: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
"""Make a request to AllDebrid API.
@@ -738,7 +738,7 @@ def parse_magnet_or_hash(uri: str) -> Optional[str]:
def unlock_link_cmdlet(result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:
"""Unlock a restricted link using AllDebrid.
Converts free hosters and restricted links to direct download URLs.
Converts free hosters and restricted links to direct download urls.
Usage:
unlock-link <link>

View File

@@ -378,7 +378,7 @@ def download(
session: Authenticated requests.Session
n_threads: Number of download threads
directory: Directory to save images to
links: List of image URLs
links: List of image urls
scale: Image resolution (0=highest, 10=lowest)
book_id: Archive.org book ID (for re-borrowing)

View File

@@ -0,0 +1,195 @@
"""Lightweight console notifier for background WorkerManager tasks.
Registers a refresh callback on WorkerManager and prints concise updates when
workers start, progress, or finish. Intended for CLI background workflows.
Filters to show only workers related to the current pipeline session to avoid
cluttering the terminal with workers from previous sessions.
"""
from __future__ import annotations
from typing import Any, Callable, Dict, Optional, Set
from helper.logger import log, debug
class BackgroundNotifier:
"""Simple notifier that prints worker status changes for a session."""
def __init__(
self,
manager: Any,
output: Callable[[str], None] = log,
session_worker_ids: Optional[Set[str]] = None,
only_terminal_updates: bool = False,
overlay_mode: bool = False,
) -> None:
self.manager = manager
self.output = output
self.session_worker_ids = session_worker_ids if session_worker_ids is not None else set()
self.only_terminal_updates = only_terminal_updates
self.overlay_mode = overlay_mode
self._filter_enabled = session_worker_ids is not None
self._last_state: Dict[str, str] = {}
try:
self.manager.add_refresh_callback(self._on_refresh)
self.manager.start_auto_refresh()
except Exception as exc: # pragma: no cover - best effort
debug(f"[notifier] Could not attach refresh callback: {exc}")
def _render_line(self, worker: Dict[str, Any]) -> Optional[str]:
# Use worker_id (the actual worker ID we set) for filtering and display
worker_id = str(worker.get("worker_id") or "").strip()
if not worker_id:
# Fallback to database id if worker_id is not set
worker_id = str(worker.get("id") or "").strip()
if not worker_id:
return None
status = str(worker.get("status") or "running")
progress_val = worker.get("progress") or worker.get("progress_percent")
progress = ""
if isinstance(progress_val, (int, float)):
progress = f" {progress_val:.1f}%"
elif progress_val:
progress = f" {progress_val}"
step = str(worker.get("current_step") or worker.get("description") or "").strip()
parts = [f"[worker:{worker_id}] {status}{progress}"]
if step:
parts.append(step)
return " - ".join(parts)
def _on_refresh(self, workers: list[Dict[str, Any]]) -> None:
overlay_active_workers = 0
for worker in workers:
# Use worker_id (the actual worker ID we set) for filtering
worker_id = str(worker.get("worker_id") or "").strip()
if not worker_id:
# Fallback to database id if worker_id is not set
worker_id = str(worker.get("id") or "").strip()
if not worker_id:
continue
# If filtering is enabled, skip workers not in this session
if self._filter_enabled and worker_id not in self.session_worker_ids:
continue
status = str(worker.get("status") or "running")
# Overlay mode: only emit on completion; suppress start/progress spam
if self.overlay_mode:
if status in ("completed", "finished", "error"):
progress_val = worker.get("progress") or worker.get("progress_percent") or ""
step = str(worker.get("current_step") or worker.get("description") or "").strip()
signature = f"{status}|{progress_val}|{step}"
if self._last_state.get(worker_id) == signature:
continue
self._last_state[worker_id] = signature
line = self._render_line(worker)
if line:
try:
self.output(line)
except Exception:
pass
self._last_state.pop(worker_id, None)
self.session_worker_ids.discard(worker_id)
continue
# Worker still running: count it so the overlay text is not cleared below, but suppress progress spam
overlay_active_workers += 1
continue
# For terminal-only mode, emit once when the worker finishes and skip intermediate updates
if self.only_terminal_updates:
if status in ("completed", "finished", "error"):
if self._last_state.get(worker_id) == status:
continue
self._last_state[worker_id] = status
line = self._render_line(worker)
if line:
try:
self.output(line)
except Exception:
pass
# Stop tracking this worker after terminal notification
self.session_worker_ids.discard(worker_id)
continue
# Skip finished workers after showing them once (standard verbose mode)
if status in ("completed", "finished", "error"):
if worker_id in self._last_state:
# Already shown, remove from tracking
self._last_state.pop(worker_id, None)
self.session_worker_ids.discard(worker_id)
continue
progress_val = worker.get("progress") or worker.get("progress_percent") or ""
step = str(worker.get("current_step") or worker.get("description") or "").strip()
signature = f"{status}|{progress_val}|{step}"
if self._last_state.get(worker_id) == signature:
continue
self._last_state[worker_id] = signature
line = self._render_line(worker)
if line:
try:
self.output(line)
except Exception:
pass
if self.overlay_mode:
try:
# If nothing active for this session, clear the overlay text
if overlay_active_workers == 0:
self.output("")
except Exception:
pass
def ensure_background_notifier(
manager: Any,
output: Callable[[str], None] = log,
session_worker_ids: Optional[Set[str]] = None,
only_terminal_updates: bool = False,
overlay_mode: bool = False,
) -> Optional[BackgroundNotifier]:
"""Attach a BackgroundNotifier to a WorkerManager if not already present.
Args:
manager: WorkerManager instance
output: Function to call for printing updates
session_worker_ids: Set of worker IDs belonging to this pipeline session.
If None, show all workers. If a set (even empty), only show workers in that set.
"""
if manager is None:
return None
existing = getattr(manager, "_background_notifier", None)
if isinstance(existing, BackgroundNotifier):
# Update session IDs if provided
if session_worker_ids is not None:
existing._filter_enabled = True
existing.session_worker_ids.update(session_worker_ids)
# Respect the most restrictive setting for terminal-only updates
if only_terminal_updates:
existing.only_terminal_updates = True
# Enable overlay mode if requested later
if overlay_mode:
existing.overlay_mode = True
return existing
notifier = BackgroundNotifier(
manager,
output,
session_worker_ids=session_worker_ids,
only_terminal_updates=only_terminal_updates,
overlay_mode=overlay_mode,
)
try:
manager._background_notifier = notifier # type: ignore[attr-defined]
except Exception:
pass
return notifier
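# Minimal attachment sketch (illustrative; _StubManager stands in for the real WorkerManager,
# which only needs add_refresh_callback()/start_auto_refresh() for the notifier to hook in):
class _StubManager:
    def __init__(self) -> None:
        self._callbacks = []

    def add_refresh_callback(self, cb) -> None:
        self._callbacks.append(cb)

    def start_auto_refresh(self) -> None:
        pass

_stub_notifier = ensure_background_notifier(
    _StubManager(),
    output=log,
    session_worker_ids={"worker-123"},      # only report workers started by this session
    only_terminal_updates=True,             # one line per worker, emitted when it finishes
)
# Simulating a refresh: a finished session worker produces exactly one "[worker:...] completed" line.
_stub_notifier._on_refresh([{"worker_id": "worker-123", "status": "completed", "current_step": "done"}])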

223
helper/cmdlet_catalog.py Normal file
View File

@@ -0,0 +1,223 @@
from __future__ import annotations
from importlib import import_module
from typing import Any, Dict, List, Optional
try:
from cmdlets import REGISTRY
except Exception:
REGISTRY = {} # type: ignore
try:
from cmdnats import register_native_commands as _register_native_commands
except Exception:
_register_native_commands = None
def ensure_registry_loaded() -> None:
"""Ensure native commands are registered into REGISTRY (idempotent)."""
if _register_native_commands and REGISTRY is not None:
try:
_register_native_commands(REGISTRY)
except Exception:
pass
def _normalize_mod_name(mod_name: str) -> str:
"""Normalize a command/module name for import resolution."""
normalized = (mod_name or "").strip()
if normalized.startswith('.'):
normalized = normalized.lstrip('.')
normalized = normalized.replace('-', '_')
return normalized
def import_cmd_module(mod_name: str):
"""Import a cmdlet/native module from cmdnats or cmdlets packages."""
normalized = _normalize_mod_name(mod_name)
if not normalized:
return None
for package in ("cmdnats", "cmdlets", None):
try:
qualified = f"{package}.{normalized}" if package else normalized
return import_module(qualified)
except ModuleNotFoundError:
continue
except Exception:
continue
return None
def _normalize_arg(arg: Any) -> Dict[str, Any]:
"""Convert a CmdletArg/dict into a plain metadata dict."""
if isinstance(arg, dict):
name = arg.get("name", "")
return {
"name": str(name).lstrip("-"),
"type": arg.get("type", "string"),
"required": bool(arg.get("required", False)),
"description": arg.get("description", ""),
"choices": arg.get("choices", []) or [],
"alias": arg.get("alias", ""),
"variadic": arg.get("variadic", False),
}
name = getattr(arg, "name", "") or ""
return {
"name": str(name).lstrip("-"),
"type": getattr(arg, "type", "string"),
"required": bool(getattr(arg, "required", False)),
"description": getattr(arg, "description", ""),
"choices": getattr(arg, "choices", []) or [],
"alias": getattr(arg, "alias", ""),
"variadic": getattr(arg, "variadic", False),
}
def get_cmdlet_metadata(cmd_name: str) -> Optional[Dict[str, Any]]:
"""Return normalized metadata for a cmdlet, if available (aliases supported)."""
ensure_registry_loaded()
normalized = cmd_name.replace("-", "_")
mod = import_cmd_module(normalized)
data = getattr(mod, "CMDLET", None) if mod else None
# Fallback: resolve via registered function's module (covers aliases)
if data is None:
try:
reg_fn = (REGISTRY or {}).get(cmd_name.replace('_', '-').lower())
if reg_fn:
owner_mod = getattr(reg_fn, "__module__", "")
if owner_mod:
owner = import_module(owner_mod)
data = getattr(owner, "CMDLET", None)
except Exception:
data = None
if not data:
return None
if hasattr(data, "to_dict"):
base = data.to_dict()
elif isinstance(data, dict):
base = data
else:
base = {}
name = getattr(data, "name", base.get("name", cmd_name)) or cmd_name
aliases = getattr(data, "aliases", None) or getattr(data, "alias", None) or base.get("aliases", []) or []
usage = getattr(data, "usage", base.get("usage", ""))
summary = getattr(data, "summary", base.get("summary", ""))
details = getattr(data, "details", None) or getattr(data, "detail", None) or base.get("details", []) or []
args_list = getattr(data, "args", None) or getattr(data, "arg", None) or base.get("args", []) or []
args = [_normalize_arg(arg) for arg in args_list]
return {
"name": str(name).replace("_", "-").lower(),
"aliases": [str(a).replace("_", "-").lower() for a in aliases if a],
"usage": usage,
"summary": summary,
"details": details,
"args": args,
"raw": data,
}
def list_cmdlet_metadata() -> Dict[str, Dict[str, Any]]:
"""Collect metadata for all registered cmdlets keyed by canonical name."""
ensure_registry_loaded()
entries: Dict[str, Dict[str, Any]] = {}
for reg_name in (REGISTRY or {}).keys():
meta = get_cmdlet_metadata(reg_name)
canonical = str(reg_name).replace("_", "-").lower()
if meta:
canonical = meta.get("name", canonical)
aliases = meta.get("aliases", [])
base = entries.get(
canonical,
{
"name": canonical,
"aliases": [],
"usage": "",
"summary": "",
"details": [],
"args": [],
"raw": meta.get("raw"),
},
)
merged_aliases = set(base.get("aliases", [])) | set(aliases)
if canonical != reg_name:
merged_aliases.add(reg_name)
base["aliases"] = sorted(a for a in merged_aliases if a and a != canonical)
if not base.get("usage") and meta.get("usage"):
base["usage"] = meta["usage"]
if not base.get("summary") and meta.get("summary"):
base["summary"] = meta["summary"]
if not base.get("details") and meta.get("details"):
base["details"] = meta["details"]
if not base.get("args") and meta.get("args"):
base["args"] = meta["args"]
if not base.get("raw"):
base["raw"] = meta.get("raw")
entries[canonical] = base
else:
entries.setdefault(
canonical,
{"name": canonical, "aliases": [], "usage": "", "summary": "", "details": [], "args": [], "raw": None},
)
return entries
def list_cmdlet_names(include_aliases: bool = True) -> List[str]:
"""Return sorted cmdlet names (optionally including aliases)."""
ensure_registry_loaded()
entries = list_cmdlet_metadata()
names = set()
for meta in entries.values():
names.add(meta.get("name", ""))
if include_aliases:
for alias in meta.get("aliases", []):
names.add(alias)
return sorted(n for n in names if n)
def get_cmdlet_arg_flags(cmd_name: str) -> List[str]:
"""Return flag variants for cmdlet arguments (e.g., -name/--name)."""
meta = get_cmdlet_metadata(cmd_name)
if not meta:
return []
raw = meta.get("raw")
if raw and hasattr(raw, "build_flag_registry"):
try:
registry = raw.build_flag_registry()
flags: List[str] = []
for flag_set in registry.values():
flags.extend(flag_set)
return sorted(set(flags))
except Exception:
pass
flags: List[str] = []
for arg in meta.get("args", []):
name = arg.get("name")
if not name:
continue
flags.append(f"-{name}")
flags.append(f"--{name}")
alias = arg.get("alias")
if alias:
flags.append(f"-{alias}")
return flags
def get_cmdlet_arg_choices(cmd_name: str, arg_name: str) -> List[str]:
"""Return declared choices for a cmdlet argument."""
meta = get_cmdlet_metadata(cmd_name)
if not meta:
return []
target = arg_name.lstrip("-")
for arg in meta.get("args", []):
if arg.get("name") == target:
return list(arg.get("choices", []) or [])
return []
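# Minimal usage sketch (illustrative; the exact output depends on which cmdlets are registered):
if __name__ == "__main__":
    print(list_cmdlet_names())                        # e.g. [".help", ".pipe", "add-file", ...]
    print(get_cmdlet_metadata(".help"))               # normalized dict: name/aliases/usage/args
    print(get_cmdlet_arg_flags(".help"))              # e.g. ["--cmd", "--filter", "-cmd", "-filter"]
    print(get_cmdlet_arg_choices(".help", "cmd"))     # declared choices for the "cmd" argument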

View File

@@ -28,7 +28,6 @@ from helper.logger import log, debug
from .utils import ensure_directory, sha256_file
from .http_client import HTTPClient
from models import DownloadError, DownloadOptions, DownloadMediaResult, DebugLogger, ProgressBar
from hydrus_health_check import get_cookies_file_path
try:
import yt_dlp # type: ignore
@@ -145,7 +144,7 @@ def list_formats(url: str, no_playlist: bool = False, playlist_items: Optional[s
return None
def _download_with_sections_via_cli(url: str, ytdl_options: Dict[str, Any], sections: List[str]) -> tuple[Optional[str], Dict[str, Any]]:
def _download_with_sections_via_cli(url: str, ytdl_options: Dict[str, Any], sections: List[str], quiet: bool = False) -> tuple[Optional[str], Dict[str, Any]]:
"""Download each section separately so merge-file can combine them.
yt-dlp with multiple --download-sections args merges them into one file.
@@ -204,11 +203,14 @@ def _download_with_sections_via_cli(url: str, ytdl_options: Dict[str, Any], sect
info_dict = json.loads(meta_result.stdout.strip())
first_section_info = info_dict
title_from_first = info_dict.get('title')
debug(f"Extracted title from metadata: {title_from_first}")
if not quiet:
debug(f"Extracted title from metadata: {title_from_first}")
except json.JSONDecodeError:
debug("Could not parse JSON metadata")
if not quiet:
debug("Could not parse JSON metadata")
except Exception as e:
debug(f"Error extracting metadata: {e}")
if not quiet:
debug(f"Error extracting metadata: {e}")
# Build yt-dlp command for downloading this section
cmd = ["yt-dlp"]
@@ -240,8 +242,9 @@ def _download_with_sections_via_cli(url: str, ytdl_options: Dict[str, Any], sect
# Add the URL
cmd.append(url)
debug(f"Running yt-dlp for section {section_idx}/{len(sections_list)}: {section}")
debug(f"Command: {' '.join(cmd)}")
if not quiet:
debug(f"Running yt-dlp for section {section_idx}/{len(sections_list)}: {section}")
debug(f"Command: {' '.join(cmd)}")
# Run the subprocess - don't capture output so progress is shown
try:
@@ -273,13 +276,15 @@ def _build_ytdlp_options(opts: DownloadOptions) -> Dict[str, Any]:
"fragment_retries": 10,
"http_chunk_size": 10_485_760,
"restrictfilenames": True,
"progress_hooks": [_progress_callback],
"progress_hooks": [] if opts.quiet else [_progress_callback],
}
if opts.cookies_path and opts.cookies_path.is_file():
base_options["cookiefile"] = str(opts.cookies_path)
else:
# Check global cookies file
# Check global cookies file lazily to avoid import cycles
from hydrus_health_check import get_cookies_file_path # local import
global_cookies = get_cookies_file_path()
if global_cookies:
base_options["cookiefile"] = global_cookies
@@ -287,7 +292,7 @@ def _build_ytdlp_options(opts: DownloadOptions) -> Dict[str, Any]:
# Fallback to browser cookies
base_options["cookiesfrombrowser"] = ("chrome",)
# Add no-playlist option if specified (for single video from playlist URLs)
# Add no-playlist option if specified (for single video from playlist urls)
if opts.no_playlist:
base_options["noplaylist"] = True
@@ -336,7 +341,8 @@ def _build_ytdlp_options(opts: DownloadOptions) -> Dict[str, Any]:
if opts.playlist_items:
base_options["playlist_items"] = opts.playlist_items
debug(f"yt-dlp: mode={opts.mode}, format={base_options.get('format')}")
if not opts.quiet:
debug(f"yt-dlp: mode={opts.mode}, format={base_options.get('format')}")
return base_options
@@ -411,8 +417,8 @@ def _extract_sha256(info: Dict[str, Any]) -> Optional[str]:
def _get_libgen_download_url(libgen_url: str) -> Optional[str]:
"""Extract the actual download link from LibGen redirect URL.
LibGen URLs like https://libgen.gl/file.php?id=123456 redirect to
actual mirror URLs. This follows the redirect chain to get the real file.
LibGen urls like https://libgen.gl/file.php?id=123456 redirect to
actual mirror urls. This follows the redirect chain to get the real file.
Args:
libgen_url: LibGen file.php URL
@@ -491,6 +497,7 @@ def _download_direct_file(
url: str,
output_dir: Path,
debug_logger: Optional[DebugLogger] = None,
quiet: bool = False,
) -> DownloadMediaResult:
"""Download a direct file (PDF, image, document, etc.) without yt-dlp."""
ensure_directory(output_dir)
@@ -535,9 +542,11 @@ def _download_direct_file(
extracted_name = match.group(1) or match.group(2)
if extracted_name:
filename = unquote(extracted_name)
debug(f"Filename from Content-Disposition: {filename}")
if not quiet:
debug(f"Filename from Content-Disposition: {filename}")
except Exception as e:
log(f"Could not get filename from headers: {e}", file=sys.stderr)
if not quiet:
log(f"Could not get filename from headers: {e}", file=sys.stderr)
# Fallback if we still don't have a good filename
if not filename or "." not in filename:
@@ -546,7 +555,8 @@ def _download_direct_file(
file_path = output_dir / filename
progress_bar = ProgressBar()
debug(f"Direct download: {filename}")
if not quiet:
debug(f"Direct download: {filename}")
try:
start_time = time.time()
@@ -577,7 +587,8 @@ def _download_direct_file(
speed_str=speed_str,
eta_str=eta_str,
)
debug(progress_line)
if not quiet:
debug(progress_line)
last_progress_time[0] = now
with HTTPClient(timeout=30.0) as client:
@@ -585,7 +596,8 @@ def _download_direct_file(
elapsed = time.time() - start_time
avg_speed_str = progress_bar.format_bytes(downloaded_bytes[0] / elapsed if elapsed > 0 else 0) + "/s"
debug(f"✓ Downloaded in {elapsed:.1f}s at {avg_speed_str}")
if not quiet:
debug(f"✓ Downloaded in {elapsed:.1f}s at {avg_speed_str}")
# For direct file downloads, create minimal info dict without filename as title
# This prevents creating duplicate title: tags when filename gets auto-generated
@@ -658,375 +670,98 @@ def _download_direct_file(
raise DownloadError(f"Error downloading file: {exc}") from exc
def probe_url(url: str, no_playlist: bool = False) -> Optional[Dict[str, Any]]:
def probe_url(url: str, no_playlist: bool = False, timeout_seconds: int = 15) -> Optional[Dict[str, Any]]:
"""Probe URL to extract metadata WITHOUT downloading.
Args:
url: URL to probe
no_playlist: If True, ignore playlists and probe only the single video
timeout_seconds: Max seconds to wait for probe (default 15s)
Returns:
Dict with keys: extractor, title, entries (if playlist), duration, etc.
Returns None if not supported by yt-dlp.
Returns None if not supported by yt-dlp or on timeout.
"""
if not is_url_supported_by_ytdlp(url):
return None
_ensure_yt_dlp_ready()
# Wrap probe in timeout to prevent hanging on large playlists
import threading
from typing import cast
assert yt_dlp is not None
try:
# Extract info without downloading
# Use extract_flat='in_playlist' to get full metadata for playlist items
ydl_opts = {
"quiet": True, # Suppress all output
"no_warnings": True,
"socket_timeout": 10,
"retries": 3,
"skip_download": True, # Don't actually download
"extract_flat": "in_playlist", # Get playlist with metadata for each entry
"noprogress": True, # No progress bars
}
# Add cookies if available
global_cookies = get_cookies_file_path()
if global_cookies:
ydl_opts["cookiefile"] = global_cookies
# Add no_playlist option if specified
if no_playlist:
ydl_opts["noplaylist"] = True
with yt_dlp.YoutubeDL(ydl_opts) as ydl: # type: ignore[arg-type]
info = ydl.extract_info(url, download=False)
if not isinstance(info, dict):
return None
# Extract relevant fields
return {
"extractor": info.get("extractor", ""),
"title": info.get("title", ""),
"entries": info.get("entries", []), # Will be populated if playlist
"duration": info.get("duration"),
"uploader": info.get("uploader"),
"description": info.get("description"),
"url": url,
}
except Exception as exc:
log(f"Probe failed for {url}: {exc}")
return None
def download_media(
opts: DownloadOptions,
*,
debug_logger: Optional[DebugLogger] = None,
) -> DownloadMediaResult:
"""Download media from URL using yt-dlp or direct HTTP download.
result_container: List[Optional[Any]] = [None, None] # [result, error]
Args:
opts: DownloadOptions with url, mode, output_dir, etc.
debug_logger: Optional debug logger for troubleshooting
Returns:
DownloadMediaResult with path, info, tags, hash
Raises:
DownloadError: If download fails
"""
# Handle LibGen URLs specially
# file.php redirects to mirrors, get.php is direct from modern API
if 'libgen' in opts.url.lower():
if '/get.php' in opts.url.lower():
# Modern API get.php links are direct downloads from mirrors (not file redirects)
log(f"Detected LibGen get.php URL, downloading directly...")
if debug_logger is not None:
debug_logger.write_record("libgen-direct", {"url": opts.url})
return _download_direct_file(opts.url, opts.output_dir, debug_logger)
elif '/file.php' in opts.url.lower():
# Old-style file.php redirects to mirrors, we need to resolve
log(f"Detected LibGen file.php URL, resolving to actual mirror...")
actual_url = _get_libgen_download_url(opts.url)
if actual_url and actual_url != opts.url:
log(f"Resolved LibGen URL to mirror: {actual_url}")
opts.url = actual_url
# After resolution, this will typically be an onion link or direct file
# Skip yt-dlp for this (it won't support onion/mirrors), go direct
if debug_logger is not None:
debug_logger.write_record("libgen-resolved", {"original": opts.url, "resolved": actual_url})
return _download_direct_file(opts.url, opts.output_dir, debug_logger)
else:
log(f"Could not resolve LibGen URL, trying direct download anyway", file=sys.stderr)
if debug_logger is not None:
debug_logger.write_record("libgen-resolve-failed", {"url": opts.url})
return _download_direct_file(opts.url, opts.output_dir, debug_logger)
# Handle GoFile shares with a dedicated resolver before yt-dlp/direct fallbacks
try:
netloc = urlparse(opts.url).netloc.lower()
except Exception:
netloc = ""
if "gofile.io" in netloc:
msg = "GoFile links are currently unsupported"
debug(msg)
if debug_logger is not None:
debug_logger.write_record("gofile-unsupported", {"url": opts.url})
raise DownloadError(msg)
# Determine if yt-dlp should be used
ytdlp_supported = is_url_supported_by_ytdlp(opts.url)
if ytdlp_supported:
probe_result = probe_url(opts.url, no_playlist=opts.no_playlist)
if probe_result is None:
log(f"URL supported by yt-dlp but no media detected, falling back to direct download: {opts.url}")
if debug_logger is not None:
debug_logger.write_record("ytdlp-skip-no-media", {"url": opts.url})
return _download_direct_file(opts.url, opts.output_dir, debug_logger)
else:
log(f"URL not supported by yt-dlp, trying direct download: {opts.url}")
if debug_logger is not None:
debug_logger.write_record("direct-file-attempt", {"url": opts.url})
return _download_direct_file(opts.url, opts.output_dir, debug_logger)
_ensure_yt_dlp_ready()
ytdl_options = _build_ytdlp_options(opts)
debug(f"Starting yt-dlp download: {opts.url}")
if debug_logger is not None:
debug_logger.write_record("ytdlp-start", {"url": opts.url})
assert yt_dlp is not None
try:
# Debug: show what options we're using
if ytdl_options.get("download_sections"):
debug(f"[yt-dlp] download_sections: {ytdl_options['download_sections']}")
debug(f"[yt-dlp] force_keyframes_at_cuts: {ytdl_options.get('force_keyframes_at_cuts', False)}")
# Use subprocess when download_sections are present (Python API doesn't support them properly)
session_id = None
first_section_info = {}
if ytdl_options.get("download_sections"):
session_id, first_section_info = _download_with_sections_via_cli(opts.url, ytdl_options, ytdl_options.get("download_sections", []))
info = None
else:
with yt_dlp.YoutubeDL(ytdl_options) as ydl: # type: ignore[arg-type]
info = ydl.extract_info(opts.url, download=True)
except Exception as exc:
log(f"yt-dlp failed: {exc}", file=sys.stderr)
if debug_logger is not None:
debug_logger.write_record(
"exception",
{
"phase": "yt-dlp",
"error": str(exc),
"traceback": traceback.format_exc(),
},
)
raise DownloadError("yt-dlp download failed") from exc
# If we used subprocess, we need to find the file manually
if info is None:
# Find files created/modified during this download (after we started)
# Look for files matching the expected output template pattern
def _do_probe() -> None:
try:
import glob
import time
import re
_ensure_yt_dlp_ready()
# Get the expected filename pattern from outtmpl
# For sections: "C:\path\{session_id}.section_1_of_3.ext", etc.
# For non-sections: "C:\path\title.ext"
# Wait a moment to ensure files are fully written
time.sleep(0.5)
# List all files in output_dir, sorted by modification time
files = sorted(opts.output_dir.iterdir(), key=lambda p: p.stat().st_mtime, reverse=True)
if not files:
raise FileNotFoundError(f"No files found in {opts.output_dir}")
# If we downloaded sections, look for files with the session_id pattern
if opts.clip_sections and session_id:
# Pattern: "{session_id}_1.ext", "{session_id}_2.ext", etc.
section_pattern = re.compile(rf'^{re.escape(session_id)}_(\d+)\.')
matching_files = [f for f in files if section_pattern.search(f.name)]
if matching_files:
# Sort by section number to ensure correct order
def extract_section_num(path: Path) -> int:
match = section_pattern.search(path.name)
return int(match.group(1)) if match else 999
matching_files.sort(key=extract_section_num)
debug(f"Found {len(matching_files)} section file(s) matching pattern")
# Now rename section files to use hash-based names
# This ensures unique filenames for each section content
renamed_files = []
for idx, section_file in enumerate(matching_files, 1):
try:
# Calculate hash for the file
file_hash = sha256_file(section_file)
ext = section_file.suffix
new_name = f"{file_hash}{ext}"
new_path = opts.output_dir / new_name
if new_path.exists() and new_path != section_file:
# If file with same hash exists, use it and delete the temp one
debug(f"File with hash {file_hash} already exists, using existing file.")
try:
section_file.unlink()
except OSError:
pass
renamed_files.append(new_path)
else:
section_file.rename(new_path)
debug(f"Renamed section file: {section_file.name} → {new_name}")
renamed_files.append(new_path)
except Exception as e:
debug(f"Failed to process section file {section_file.name}: {e}")
renamed_files.append(section_file)
media_path = renamed_files[0]
media_paths = renamed_files
debug(f"✓ Downloaded {len(media_paths)} section file(s) (session: {session_id})")
else:
# Fallback to most recent file if pattern not found
media_path = files[0]
media_paths = None
debug(f"✓ Downloaded section file (pattern not found): {media_path.name}")
else:
# No sections, just take the most recent file
media_path = files[0]
media_paths = None
debug(f"✓ Downloaded: {media_path.name}")
if debug_logger is not None:
debug_logger.write_record("ytdlp-file-found", {"path": str(media_path)})
except Exception as exc:
log(f"Error finding downloaded file: {exc}", file=sys.stderr)
if debug_logger is not None:
debug_logger.write_record(
"exception",
{"phase": "find-file", "error": str(exc)},
)
raise DownloadError(str(exc)) from exc
# Create result with minimal data extracted from filename
file_hash = sha256_file(media_path)
# For section downloads, create tags with the title and build proper info dict
tags = []
title = ''
if first_section_info:
title = first_section_info.get('title', '')
if title:
tags.append(f'title:{title}')
debug(f"Added title tag for section download: {title}")
# Build info dict - always use extracted title if available, not hash
if first_section_info:
info_dict = first_section_info
else:
info_dict = {
"id": media_path.stem,
"title": title or media_path.stem,
"ext": media_path.suffix.lstrip(".")
assert yt_dlp is not None
# Extract info without downloading
# Use extract_flat='in_playlist' to get full metadata for playlist items
ydl_opts = {
"quiet": True, # Suppress all output
"no_warnings": True,
"socket_timeout": 10,
"retries": 2, # Reduce retries for faster timeout
"skip_download": True, # Don't actually download
"extract_flat": "in_playlist", # Get playlist with metadata for each entry
"noprogress": True, # No progress bars
}
return DownloadMediaResult(
path=media_path,
info=info_dict,
tags=tags,
source_url=opts.url,
hash_value=file_hash,
paths=media_paths, # Include all section files if present
)
# Add cookies if available (lazy import to avoid circular dependency)
from hydrus_health_check import get_cookies_file_path # local import
if not isinstance(info, dict):
log(f"Unexpected yt-dlp response: {type(info)}", file=sys.stderr)
raise DownloadError("Unexpected yt-dlp response type")
info_dict: Dict[str, Any] = info
if debug_logger is not None:
debug_logger.write_record(
"ytdlp-info",
{
"keys": sorted(info_dict.keys()),
"is_playlist": bool(info_dict.get("entries")),
},
)
try:
entry, media_path = _resolve_entry_and_path(info_dict, opts.output_dir)
except FileNotFoundError as exc:
log(f"Error: {exc}", file=sys.stderr)
if debug_logger is not None:
debug_logger.write_record(
"exception",
{"phase": "resolve-path", "error": str(exc)},
)
raise DownloadError(str(exc)) from exc
if debug_logger is not None:
debug_logger.write_record(
"resolved-media",
{"path": str(media_path), "entry_keys": sorted(entry.keys())},
)
# Extract hash from metadata or compute
hash_value = _extract_sha256(entry) or _extract_sha256(info_dict)
if not hash_value:
try:
hash_value = sha256_file(media_path)
except OSError as exc:
if debug_logger is not None:
debug_logger.write_record(
"hash-error",
{"path": str(media_path), "error": str(exc)},
)
# Extract tags using metadata.py
tags = []
if extract_ytdlp_tags:
try:
tags = extract_ytdlp_tags(entry)
except Exception as e:
log(f"Error extracting tags: {e}", file=sys.stderr)
source_url = (
entry.get("webpage_url")
or entry.get("original_url")
or entry.get("url")
)
debug(f"✓ Downloaded: {media_path.name} ({len(tags)} tags)")
if debug_logger is not None:
debug_logger.write_record(
"downloaded",
{
"path": str(media_path),
"tag_count": len(tags),
"source_url": source_url,
"sha256": hash_value,
},
)
return DownloadMediaResult(
path=media_path,
info=entry,
tags=tags,
source_url=source_url,
hash_value=hash_value,
)
global_cookies = get_cookies_file_path()
if global_cookies:
ydl_opts["cookiefile"] = global_cookies
# Add no_playlist option if specified
if no_playlist:
ydl_opts["noplaylist"] = True
with yt_dlp.YoutubeDL(ydl_opts) as ydl: # type: ignore[arg-type]
info = ydl.extract_info(url, download=False)
if not isinstance(info, dict):
result_container[0] = None
return
# Extract relevant fields
result_container[0] = {
"extractor": info.get("extractor", ""),
"title": info.get("title", ""),
"entries": info.get("entries", []), # Will be populated if playlist
"duration": info.get("duration"),
"uploader": info.get("uploader"),
"description": info.get("description"),
"url": url,
}
except Exception as exc:
log(f"Probe error for {url}: {exc}")
result_container[1] = exc
thread = threading.Thread(target=_do_probe, daemon=False)
thread.start()
thread.join(timeout=timeout_seconds)
if thread.is_alive():
# Probe timed out - return None to fall back to direct download
debug(f"Probe timeout for {url} (>={timeout_seconds}s), proceeding with download")
return None
if result_container[1] is not None:
# Probe error - return None to proceed anyway
return None
return cast(Optional[Dict[str, Any]], result_container[0])
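The new `probe_url` runs the blocking metadata extraction in a worker thread and gives up after `timeout_seconds`. A stripped-down sketch of that pattern, with the yt-dlp call replaced by a placeholder `slow_probe` so it stays self-contained; note the sketch marks the thread as daemon so an abandoned probe cannot block interpreter exit, whereas the hunk above uses a non-daemon thread:

```python
import threading
import time
from typing import Any, List, Optional


def run_with_timeout(fn, timeout_seconds: float) -> Optional[Any]:
    """Run fn() in a worker thread; return its result, or None on error or timeout."""
    container: List[Optional[Any]] = [None, None]  # [result, error]

    def _worker() -> None:
        try:
            container[0] = fn()
        except Exception as exc:  # mirror the probe: remember the error, caller gets None
            container[1] = exc

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    thread.join(timeout=timeout_seconds)
    if thread.is_alive() or container[1] is not None:
        return None  # timed out or failed: caller falls back to a direct download attempt
    return container[0]


def slow_probe() -> dict:
    """Placeholder for ydl.extract_info(url, download=False)."""
    time.sleep(2)
    return {"title": "example"}


if __name__ == "__main__":
    print(run_with_timeout(slow_probe, timeout_seconds=0.5))  # None (timed out)
    print(run_with_timeout(slow_probe, timeout_seconds=5))    # {'title': 'example'}
```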
__all__ = [
"download_media",
"is_url_supported_by_ytdlp",
"list_formats",
"probe_url",
"DownloadError",
"DownloadOptions",
"DownloadMediaResult",
]

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -73,7 +73,7 @@ class HydrusRequestSpec:
class HydrusClient:
"""Thin wrapper around the Hydrus Client API."""
base_url: str
url: str
access_key: str = ""
timeout: float = 60.0
@@ -84,10 +84,10 @@ class HydrusClient:
_session_key: str = field(init=False, default="", repr=False) # Cached session key
def __post_init__(self) -> None:
if not self.base_url:
if not self.url:
raise ValueError("Hydrus base URL is required")
self.base_url = self.base_url.rstrip("/")
parsed = urlsplit(self.base_url)
self.url = self.url.rstrip("/")
parsed = urlsplit(self.url)
if parsed.scheme not in {"http", "https"}:
raise ValueError("Hydrus base URL must use http or https")
self.scheme = parsed.scheme
@@ -374,24 +374,24 @@ class HydrusClient:
hashes = self._ensure_hashes(file_hashes)
if len(hashes) == 1:
body = {"hash": hashes[0], "url_to_add": url}
return self._post("/add_urls/associate_url", data=body)
return self._post("/add_url/associate_url", data=body)
results: dict[str, Any] = {}
for file_hash in hashes:
body = {"hash": file_hash, "url_to_add": url}
results[file_hash] = self._post("/add_urls/associate_url", data=body)
results[file_hash] = self._post("/add_url/associate_url", data=body)
return {"batched": results}
def delete_url(self, file_hashes: Union[str, Iterable[str]], url: str) -> dict[str, Any]:
hashes = self._ensure_hashes(file_hashes)
if len(hashes) == 1:
body = {"hash": hashes[0], "url_to_delete": url}
return self._post("/add_urls/associate_url", data=body)
return self._post("/add_url/associate_url", data=body)
results: dict[str, Any] = {}
for file_hash in hashes:
body = {"hash": file_hash, "url_to_delete": url}
results[file_hash] = self._post("/add_urls/associate_url", data=body)
results[file_hash] = self._post("/add_url/associate_url", data=body)
return {"batched": results}
def set_notes(self, file_hashes: Union[str, Iterable[str]], notes: dict[str, str], service_name: str) -> dict[str, Any]:
@@ -517,7 +517,7 @@ class HydrusClient:
file_ids: Sequence[int] | None = None,
hashes: Sequence[str] | None = None,
include_service_keys_to_tags: bool = True,
include_file_urls: bool = False,
include_file_url: bool = False,
include_duration: bool = True,
include_size: bool = True,
include_mime: bool = False,
@@ -535,7 +535,7 @@ class HydrusClient:
include_service_keys_to_tags,
lambda v: "true" if v else None,
),
("include_file_urls", include_file_urls, lambda v: "true" if v else None),
("include_file_url", include_file_url, lambda v: "true" if v else None),
("include_duration", include_duration, lambda v: "true" if v else None),
("include_size", include_size, lambda v: "true" if v else None),
("include_mime", include_mime, lambda v: "true" if v else None),
@@ -559,13 +559,13 @@ class HydrusClient:
def file_url(self, file_hash: str) -> str:
hash_param = quote(file_hash)
# Don't append access_key parameter for file downloads - use header instead
url = f"{self.base_url}/get_files/file?hash={hash_param}"
url = f"{self.url}/get_files/file?hash={hash_param}"
return url
def thumbnail_url(self, file_hash: str) -> str:
hash_param = quote(file_hash)
# Don't append access_key parameter for file downloads - use header instead
url = f"{self.base_url}/get_files/thumbnail?hash={hash_param}"
url = f"{self.url}/get_files/thumbnail?hash={hash_param}"
return url
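With `base_url` renamed to `url`, callers construct the client with `url=` and the access key travels in a request header rather than a query parameter. A brief usage sketch; the import path, host, port and access key are placeholders:

```python
# Import path assumed; this diff does not show the module's file name.
from helper.hydrus import HydrusClient  # hypothetical module path

client = HydrusClient(url="http://127.0.0.1:45869", access_key="0123abcd...")  # placeholders

file_hash = "a1b2c3d4" * 8  # placeholder 64-char hash
print(client.file_url(file_hash))       # .../get_files/file?hash=...
print(client.thumbnail_url(file_hash))  # .../get_files/thumbnail?hash=...
```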
@@ -612,7 +612,7 @@ def hydrus_request(args, parser) -> int:
parsed = urlsplit(options.url)
if parsed.scheme not in ('http', 'https'):
parser.error('Only http and https URLs are supported')
parser.error('Only http and https url are supported')
if not parsed.hostname:
parser.error('Invalid Hydrus URL')
@@ -1064,7 +1064,7 @@ def hydrus_export(args, _parser) -> int:
file_hash = getattr(args, 'file_hash', None) or _extract_hash(args.file_url)
if hydrus_url and file_hash:
try:
client = HydrusClient(base_url=hydrus_url, access_key=args.access_key, timeout=args.timeout)
client = HydrusClient(url=hydrus_url, access_key=args.access_key, timeout=args.timeout)
meta_response = client.fetch_file_metadata(hashes=[file_hash], include_mime=True)
entries = meta_response.get('metadata') if isinstance(meta_response, dict) else None
if isinstance(entries, list) and entries:
@@ -1301,8 +1301,7 @@ def is_available(config: dict[str, Any], use_cache: bool = True) -> tuple[bool,
Performs a lightweight probe to verify:
- Hydrus URL is configured
- Hydrus client library is available
- Can connect to Hydrus and retrieve services
- Can connect to Hydrus URL/port
Results are cached per session unless use_cache=False.
@@ -1330,50 +1329,43 @@ def is_available(config: dict[str, Any], use_cache: bool = True) -> tuple[bool,
return False, reason
access_key = get_hydrus_access_key(config, "home") or ""
if not access_key:
reason = "Hydrus access key not configured"
_HYDRUS_AVAILABLE = False
_HYDRUS_UNAVAILABLE_REASON = reason
return False, reason
timeout_raw = config.get("HydrusNetwork_Request_Timeout")
try:
timeout = float(timeout_raw) if timeout_raw is not None else 10.0
timeout = float(timeout_raw) if timeout_raw is not None else 5.0
except (TypeError, ValueError):
timeout = 10.0
timeout = 5.0
try:
# Use HTTPClient directly to avoid session key logic and reduce retries
# This prevents log spam when Hydrus is offline (avoiding 3 retries x 2 requests)
from helper.http_client import HTTPClient
# Simple TCP connection test to URL/port
import socket
from urllib.parse import urlparse
probe_url = f"{url.rstrip('/')}/get_services"
parsed = urlparse(url)
hostname = parsed.hostname or 'localhost'
port = parsed.port or (443 if parsed.scheme == 'https' else 80)
headers = {}
if access_key:
headers["Hydrus-Client-API-Access-Key"] = access_key
# Suppress HTTPClient logging during probe to avoid "Request failed" logs on startup
http_logger = logging.getLogger("helper.http_client")
original_level = http_logger.level
http_logger.setLevel(logging.CRITICAL)
# Try to connect to the host/port
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(timeout)
try:
# Use retries=1 (single attempt, no retry) to fail fast
with HTTPClient(timeout=timeout, retries=1, headers=headers, verify_ssl=False) as http:
try:
response = http.get(probe_url)
if response.status_code == 200:
_HYDRUS_AVAILABLE = True
_HYDRUS_UNAVAILABLE_REASON = None
return True, None
else:
# Even if we get a 4xx/5xx, the service is "reachable" but maybe auth failed
# But for "availability" we usually mean "usable".
# If auth fails (403), we can't use it, so return False.
reason = f"HTTP {response.status_code}: {response.reason_phrase}"
_HYDRUS_AVAILABLE = False
_HYDRUS_UNAVAILABLE_REASON = reason
return False, reason
except Exception as e:
# This catches connection errors from HTTPClient
raise e
result = sock.connect_ex((hostname, port))
if result == 0:
_HYDRUS_AVAILABLE = True
_HYDRUS_UNAVAILABLE_REASON = None
return True, None
else:
reason = f"Cannot connect to {hostname}:{port}"
_HYDRUS_AVAILABLE = False
_HYDRUS_UNAVAILABLE_REASON = reason
return False, reason
finally:
http_logger.setLevel(original_level)
sock.close()
except Exception as exc:
reason = str(exc)
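The availability probe above drops the `/get_services` HTTP request in favour of a plain TCP connect to the configured host and port, which avoids retry log spam when Hydrus is offline but only proves the port is open, not that the access key is valid. A self-contained sketch of that check:

```python
import socket
from typing import Optional, Tuple
from urllib.parse import urlparse


def tcp_reachable(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if a TCP connection to the URL's host/port succeeds."""
    parsed = urlparse(url)
    hostname = parsed.hostname or "localhost"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        if sock.connect_ex((hostname, port)) == 0:
            return True, None
        return False, f"Cannot connect to {hostname}:{port}"
    finally:
        sock.close()


if __name__ == "__main__":
    print(tcp_reachable("http://127.0.0.1:45869", timeout=1.0))  # port is a placeholder
```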


@@ -2,15 +2,29 @@
import sys
import inspect
import threading
from pathlib import Path
_DEBUG_ENABLED = False
_thread_local = threading.local()
def set_thread_stream(stream):
"""Set a custom output stream for the current thread."""
_thread_local.stream = stream
def get_thread_stream():
"""Get the custom output stream for the current thread, if any."""
return getattr(_thread_local, 'stream', None)
def set_debug(enabled: bool) -> None:
"""Enable or disable debug logging."""
global _DEBUG_ENABLED
_DEBUG_ENABLED = enabled
def is_debug_enabled() -> bool:
"""Check if debug logging is enabled."""
return _DEBUG_ENABLED
def debug(*args, **kwargs) -> None:
"""Print debug message if debug logging is enabled.
@@ -18,9 +32,22 @@ def debug(*args, **kwargs) -> None:
"""
if not _DEBUG_ENABLED:
return
# Check if stderr has been redirected to /dev/null (quiet mode)
# If so, skip output to avoid queuing in background worker's capture
try:
stderr_name = getattr(sys.stderr, 'name', '')
if 'nul' in str(stderr_name).lower() or '/dev/null' in str(stderr_name):
return
except Exception:
pass
# Check for thread-local stream first
stream = get_thread_stream()
if stream:
kwargs['file'] = stream
# Set default to stderr for debug messages
if 'file' not in kwargs:
elif 'file' not in kwargs:
kwargs['file'] = sys.stderr
# Prepend DEBUG label
@@ -59,8 +86,12 @@ def log(*args, **kwargs) -> None:
# Get function name
func_name = caller_frame.f_code.co_name
# Check for thread-local stream first
stream = get_thread_stream()
if stream:
kwargs['file'] = stream
# Set default to stdout if not specified
if 'file' not in kwargs:
elif 'file' not in kwargs:
kwargs['file'] = sys.stdout
if add_prefix:

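With the thread-local stream support added above, a background worker can capture its own `debug()`/`log()` output without affecting other threads. A small usage sketch; the module is imported elsewhere in this commit as `helper.logger`, which this file appears to be:

```python
import io
import threading

from helper.logger import set_debug, set_thread_stream, debug, log


def worker_task() -> None:
    buffer = io.StringIO()
    set_thread_stream(buffer)   # only this thread's output is redirected
    debug("probing item 1/3")   # would otherwise go to stderr
    log("finished batch")       # would otherwise go to stdout
    print("captured:", buffer.getvalue().rstrip())


if __name__ == "__main__":
    set_debug(True)
    thread = threading.Thread(target=worker_task)
    thread.start()
    thread.join()
```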

@@ -96,7 +96,7 @@ class MPVfile:
relationship_metadata: Dict[str, Any] = field(default_factory=dict)
tags: List[str] = field(default_factory=list)
original_tags: Dict[str, str] = field(default_factory=dict)
known_urls: List[str] = field(default_factory=list)
url: List[str] = field(default_factory=list)
title: Optional[str] = None
source_url: Optional[str] = None
clip_time: Optional[str] = None
@@ -128,7 +128,7 @@ class MPVfile:
"relationship_metadata": self.relationship_metadata,
"tags": self.tags,
"original_tags": self.original_tags,
"known_urls": self.known_urls,
"url": self.url,
"title": self.title,
"source_url": self.source_url,
"clip_time": self.clip_time,
@@ -293,10 +293,10 @@ class MPVFileBuilder:
if s.tags:
s.original_tags = {tag: tag for tag in s.tags}
# known URLs + last_url
s.known_urls = _normalise_string_list(p.get("known_urls"))
if self.last_url and self.last_url not in s.known_urls:
s.known_urls.append(self.last_url)
# known url + last_url
s.url = _normalise_string_list(p.get("url"))
if self.last_url and self.last_url not in s.url:
s.url.append(self.last_url)
# source URL (explicit or fallback to last_url)
explicit_source = p.get("source_url")
@@ -500,8 +500,8 @@ class MPVFileBuilder:
self._apply_hydrus_result(result)
self.state.type = "hydrus"
matched_url = result.get("matched_url") or result.get("url")
if matched_url and matched_url not in self.state.known_urls:
self.state.known_urls.append(matched_url)
if matched_url and matched_url not in self.state.url:
self.state.url.append(matched_url)
# Enrich relationships once we know the hash
if self.include_relationships and self.state.hash and self.hydrus_settings.base_url:
self._enrich_relationships_from_api(self.state.hash)
@@ -527,7 +527,7 @@ class MPVFileBuilder:
metadata_payload["type"] = "other"
self.state.metadata = metadata_payload
# Do NOT overwrite MPVfile.type with metadata.type
self._merge_known_urls(metadata_payload.get("known_urls") or metadata_payload.get("known_urls_set"))
self._merge_url(metadata_payload.get("url") or metadata_payload.get("url_set"))
source_url = metadata_payload.get("original_url") or metadata_payload.get("source_url")
if source_url and not self.state.source_url:
self.state.source_url = self._normalise_url(source_url)
@@ -722,7 +722,7 @@ class MPVFileBuilder:
include_service_keys_to_tags=True,
include_duration=True,
include_size=True,
include_file_urls=False,
include_file_url=False,
include_mime=False,
)
except HydrusRequestError as hre: # pragma: no cover
@@ -801,11 +801,11 @@ class MPVFileBuilder:
if tag not in self.state.original_tags:
self.state.original_tags[tag] = tag
def _merge_known_urls(self, urls: Optional[Iterable[Any]]) -> None:
if not urls:
def _merge_url(self, url: Optional[Iterable[Any]]) -> None:
if not url:
return
combined = list(self.state.known_urls or []) + _normalise_string_list(urls)
self.state.known_urls = unique_preserve_order(combined)
combined = list(self.state.url or []) + _normalise_string_list(url)
self.state.url = unique_preserve_order(combined)
def _load_sidecar_tags(self, local_path: str) -> None:
try:
@@ -821,7 +821,7 @@ class MPVFileBuilder:
if hash_value and not self.state.hash and _looks_like_hash(hash_value):
self.state.hash = hash_value.lower()
self._merge_tags(tags)
self._merge_known_urls(known)
self._merge_url(known)
break
def _read_sidecar(self, sidecar_path: Path) -> tuple[Optional[str], List[str], List[str]]:
@@ -831,7 +831,7 @@ class MPVFileBuilder:
return None, [], []
hash_value: Optional[str] = None
tags: List[str] = []
known_urls: List[str] = []
url: List[str] = []
for line in raw.splitlines():
trimmed = line.strip()
if not trimmed:
@@ -841,13 +841,13 @@ class MPVFileBuilder:
candidate = trimmed.split(":", 1)[1].strip() if ":" in trimmed else ""
if candidate:
hash_value = candidate
elif lowered.startswith("known_url:") or lowered.startswith("url:"):
elif lowered.startswith("url:"):
candidate = trimmed.split(":", 1)[1].strip() if ":" in trimmed else ""
if candidate:
known_urls.append(candidate)
url.append(candidate)
else:
tags.append(trimmed)
return hash_value, tags, known_urls
return hash_value, tags, url
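The `_read_sidecar` parser above treats a sidecar as one item per line: a `hash:` line, any number of `url:` lines, and everything else as a tag. A standalone illustration of that format with hypothetical contents (the real method also checks the hash with `_looks_like_hash`):

```python
# Hypothetical sidecar contents sitting next to a media file.
sidecar_text = """\
hash: a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4a1b2c3d4
url: https://example.com/source-page
title:Example Title
creator:someone
"""

hash_value, tags, urls = None, [], []
for line in sidecar_text.splitlines():
    trimmed = line.strip()
    if not trimmed:
        continue
    lowered = trimmed.lower()
    if lowered.startswith("hash:"):
        hash_value = trimmed.split(":", 1)[1].strip()
    elif lowered.startswith("url:"):
        urls.append(trimmed.split(":", 1)[1].strip())
    else:
        tags.append(trimmed)

print(hash_value)  # a1b2c3d4... (64 hex chars)
print(tags)        # ['title:Example Title', 'creator:someone']
print(urls)        # ['https://example.com/source-page']
```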
def _compute_local_hash(self, local_path: str) -> None:
try:
@@ -864,8 +864,8 @@ class MPVFileBuilder:
def _finalise(self) -> None:
if self.state.tags:
self.state.tags = unique_preserve_order(self.state.tags)
if self.state.known_urls:
self.state.known_urls = unique_preserve_order(self.state.known_urls)
if self.state.url:
self.state.url = unique_preserve_order(self.state.url)
# Ensure metadata.type is always present for Lua, but do NOT overwrite MPVfile.type
if not self.state.title:
if self.state.metadata.get("title"):


@@ -85,7 +85,7 @@ def _normalize_target(text: Optional[str]) -> Optional[str]:
except Exception:
pass
# Normalize paths/urls for comparison
# Normalize paths/url for comparison
return lower.replace('\\', '/')

818
helper/provider.py Normal file

@@ -0,0 +1,818 @@
"""Provider interfaces for search and file upload functionality.
This module defines two distinct provider types:
1. SearchProvider: For searching content (books, music, videos, games)
2. FileProvider: For uploading files to hosting services
No legacy code or backwards compatibility - clean, single source of truth.
"""
from __future__ import annotations
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple
from dataclasses import dataclass, field
from pathlib import Path
import sys
import os
import json
import re
import time
import asyncio
import subprocess
import shutil
import mimetypes
import traceback
import requests
from helper.logger import log, debug
# Optional dependencies
try:
from playwright.sync_api import sync_playwright
PLAYWRIGHT_AVAILABLE = True
except ImportError:
PLAYWRIGHT_AVAILABLE = False
# ============================================================================
# SEARCH PROVIDERS
# ============================================================================
@dataclass
class SearchResult:
"""Unified search result format across all search providers."""
origin: str # Provider name: "libgen", "soulseek", "debrid", "bandcamp", etc.
title: str # Display title/filename
path: str # Download target (URL, path, magnet, identifier)
detail: str = "" # Additional description
annotations: List[str] = field(default_factory=list) # Tags: ["120MB", "flac", "ready"]
media_kind: str = "other" # Type: "book", "audio", "video", "game", "magnet"
size_bytes: Optional[int] = None
tags: set[str] = field(default_factory=set) # Searchable tags
columns: List[Tuple[str, str]] = field(default_factory=list) # Display columns
full_metadata: Dict[str, Any] = field(default_factory=dict) # Extra metadata
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for pipeline processing."""
return {
"origin": self.origin,
"title": self.title,
"path": self.path,
"detail": self.detail,
"annotations": self.annotations,
"media_kind": self.media_kind,
"size_bytes": self.size_bytes,
"tags": list(self.tags),
"columns": list(self.columns),
"full_metadata": self.full_metadata,
}
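A `SearchResult` can be constructed directly and serialized for the pipeline; a quick sketch with made-up values:

```python
from helper.provider import SearchResult

result = SearchResult(
    origin="libgen",
    title="Example Book",
    path="https://libgen.example/file.php?id=123",  # placeholder URL
    detail="By: Example Author (2001)",
    annotations=["2 MB", "ISBN: 0000000000"],
    media_kind="book",
    columns=[("Title", "Example Book"), ("Author", "Example Author")],
)

print(result.to_dict()["media_kind"])  # "book"
```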
class SearchProvider(ABC):
"""Base class for search providers."""
def __init__(self, config: Optional[Dict[str, Any]] = None):
self.config = config or {}
self.name = self.__class__.__name__.lower()
@abstractmethod
def search(
self,
query: str,
limit: int = 50,
filters: Optional[Dict[str, Any]] = None,
**kwargs
) -> List[SearchResult]:
"""Search for items matching the query.
Args:
query: Search query string
limit: Maximum results to return
filters: Optional filtering criteria
**kwargs: Provider-specific arguments
Returns:
List of SearchResult objects
"""
pass
def validate(self) -> bool:
"""Check if provider is available and properly configured."""
return True
class Libgen(SearchProvider):
"""Search provider for Library Genesis books."""
def search(
self,
query: str,
limit: int = 50,
filters: Optional[Dict[str, Any]] = None,
**kwargs
) -> List[SearchResult]:
filters = filters or {}
try:
from helper.unified_book_downloader import UnifiedBookDownloader
from helper.query_parser import parse_query, get_field, get_free_text
parsed = parse_query(query)
isbn = get_field(parsed, 'isbn')
author = get_field(parsed, 'author')
title = get_field(parsed, 'title')
free_text = get_free_text(parsed)
search_query = isbn or title or author or free_text or query
downloader = UnifiedBookDownloader(config=self.config)
books = downloader.search_libgen(search_query, limit=limit)
results = []
for idx, book in enumerate(books, 1):
title = book.get("title", "Unknown")
author = book.get("author", "Unknown")
year = book.get("year", "Unknown")
pages = book.get("pages") or book.get("pages_str") or ""
extension = book.get("extension", "") or book.get("ext", "")
filesize = book.get("filesize_str", "Unknown")
isbn = book.get("isbn", "")
mirror_url = book.get("mirror_url", "")
columns = [
("Title", title),
("Author", author),
("Pages", str(pages)),
("Ext", str(extension)),
]
detail = f"By: {author}"
if year and year != "Unknown":
detail += f" ({year})"
annotations = [f"{filesize}"]
if isbn:
annotations.append(f"ISBN: {isbn}")
results.append(SearchResult(
origin="libgen",
title=title,
path=mirror_url or f"libgen:{book.get('id', '')}",
detail=detail,
annotations=annotations,
media_kind="book",
columns=columns,
full_metadata={
"number": idx,
"author": author,
"year": year,
"isbn": isbn,
"filesize": filesize,
"pages": pages,
"extension": extension,
"book_id": book.get("book_id", ""),
"md5": book.get("md5", ""),
},
))
return results
except Exception as e:
log(f"[libgen] Search error: {e}", file=sys.stderr)
return []
def validate(self) -> bool:
try:
from helper.unified_book_downloader import UnifiedBookDownloader
return True
except Exception:
return False
class Soulseek(SearchProvider):
"""Search provider for Soulseek P2P network."""
MUSIC_EXTENSIONS = {
'.flac', '.mp3', '.m4a', '.aac', '.ogg', '.opus',
'.wav', '.alac', '.wma', '.ape', '.aiff', '.dsf',
'.dff', '.wv', '.tta', '.tak', '.ac3', '.dts'
}
USERNAME = "asjhkjljhkjfdsd334"
PASSWORD = "khhhg"
DOWNLOAD_DIR = "./downloads"
MAX_WAIT_TRANSFER = 1200
async def perform_search(
self,
query: str,
timeout: float = 9.0,
limit: int = 50
) -> List[Dict[str, Any]]:
"""Perform async Soulseek search."""
import os
from aioslsk.client import SoulSeekClient
from aioslsk.settings import Settings, CredentialsSettings
os.makedirs(self.DOWNLOAD_DIR, exist_ok=True)
settings = Settings(credentials=CredentialsSettings(username=self.USERNAME, password=self.PASSWORD))
client = SoulSeekClient(settings)
try:
await client.start()
await client.login()
except Exception as e:
log(f"[soulseek] Login failed: {type(e).__name__}: {e}", file=sys.stderr)
return []
try:
search_request = await client.searches.search(query)
await self._collect_results(client, search_request, timeout=timeout)
return self._flatten_results(search_request)[:limit]
except Exception as e:
log(f"[soulseek] Search error: {type(e).__name__}: {e}", file=sys.stderr)
return []
finally:
try:
await client.stop()
except Exception:
pass
def _flatten_results(self, search_request) -> List[dict]:
flat = []
for result in search_request.results:
username = getattr(result, "username", "?")
for file_data in getattr(result, "shared_items", []):
flat.append({
"file": file_data,
"username": username,
"filename": getattr(file_data, "filename", "?"),
"size": getattr(file_data, "filesize", 0),
})
for file_data in getattr(result, "locked_results", []):
flat.append({
"file": file_data,
"username": username,
"filename": getattr(file_data, "filename", "?"),
"size": getattr(file_data, "filesize", 0),
})
return flat
async def _collect_results(self, client, search_request, timeout: float = 75.0) -> None:
end = time.time() + timeout
last_count = 0
while time.time() < end:
current_count = len(search_request.results)
if current_count > last_count:
debug(f"[soulseek] Got {current_count} result(s)...")
last_count = current_count
await asyncio.sleep(0.5)
def search(
self,
query: str,
limit: int = 50,
filters: Optional[Dict[str, Any]] = None,
**kwargs
) -> List[SearchResult]:
filters = filters or {}
try:
flat_results = asyncio.run(self.perform_search(query, timeout=9.0, limit=limit))
if not flat_results:
return []
# Filter to music files only
music_results = []
for item in flat_results:
filename = item['filename']
ext = '.' + filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
if ext in self.MUSIC_EXTENSIONS:
music_results.append(item)
if not music_results:
return []
# Extract metadata
enriched_results = []
for item in music_results:
filename = item['filename']
ext = '.' + filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
# Get display filename
display_name = filename.split('\\')[-1] if '\\' in filename else filename.split('/')[-1] if '/' in filename else filename
# Extract path hierarchy
path_parts = filename.replace('\\', '/').split('/')
artist = path_parts[-3] if len(path_parts) >= 3 else ''
album = path_parts[-2] if len(path_parts) >= 2 else ''
# Extract track number and title
base_name = display_name.rsplit('.', 1)[0] if '.' in display_name else display_name
track_num = ''
title = base_name
filename_artist = ''
match = re.match(r'^(\d{1,3})\s*[\.\-]?\s+(.+)$', base_name)
if match:
track_num = match.group(1)
rest = match.group(2)
if ' - ' in rest:
filename_artist, title = rest.split(' - ', 1)
else:
title = rest
if filename_artist:
artist = filename_artist
enriched_results.append({
**item,
'artist': artist,
'album': album,
'title': title,
'track_num': track_num,
'ext': ext
})
# Apply filters
if filters:
artist_filter = filters.get('artist', '').lower() if filters.get('artist') else ''
album_filter = filters.get('album', '').lower() if filters.get('album') else ''
track_filter = filters.get('track', '').lower() if filters.get('track') else ''
if artist_filter or album_filter or track_filter:
filtered = []
for item in enriched_results:
if artist_filter and artist_filter not in item['artist'].lower():
continue
if album_filter and album_filter not in item['album'].lower():
continue
if track_filter and track_filter not in item['title'].lower():
continue
filtered.append(item)
enriched_results = filtered
# Sort: .flac first, then by size
enriched_results.sort(key=lambda item: (item['ext'].lower() != '.flac', -item['size']))
# Convert to SearchResult
results = []
for idx, item in enumerate(enriched_results, 1):
artist_display = item['artist'] if item['artist'] else "(no artist)"
album_display = item['album'] if item['album'] else "(no album)"
size_mb = int(item['size'] / 1024 / 1024)
columns = [
("Track", item['track_num'] or "?"),
("Title", item['title'][:40]),
("Artist", artist_display[:32]),
("Album", album_display[:32]),
("Size", f"{size_mb} MB"),
]
results.append(SearchResult(
origin="soulseek",
title=item['title'],
path=item['filename'],
detail=f"{artist_display} - {album_display}",
annotations=[f"{size_mb} MB", item['ext'].lstrip('.').upper()],
media_kind="audio",
size_bytes=item['size'],
columns=columns,
full_metadata={
"username": item['username'],
"filename": item['filename'],
"artist": item['artist'],
"album": item['album'],
"track_num": item['track_num'],
"ext": item['ext'],
},
))
return results
except Exception as e:
log(f"[soulseek] Search error: {e}", file=sys.stderr)
return []
def validate(self) -> bool:
try:
from aioslsk.client import SoulSeekClient
return True
except ImportError:
return False
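The metadata extraction in `Soulseek.search` infers artist and album from the share's folder hierarchy and the track number/title from leading digits in the filename. A worked example of that heuristic on a hypothetical share path (slightly simplified from the code above):

```python
import re

# Hypothetical Soulseek share path (Windows-style separators).
filename = r"Music\Example Artist\Example Album\03. Example Artist - Example Track.flac"

display_name = filename.split("\\")[-1]
path_parts = filename.replace("\\", "/").split("/")
artist = path_parts[-3] if len(path_parts) >= 3 else ""
album = path_parts[-2] if len(path_parts) >= 2 else ""

base_name = display_name.rsplit(".", 1)[0]
track_num, title = "", base_name
match = re.match(r"^(\d{1,3})\s*[\.\-]?\s+(.+)$", base_name)
if match:
    track_num, rest = match.group(1), match.group(2)
    if " - " in rest:
        artist, title = rest.split(" - ", 1)  # filename artist wins over the folder name
    else:
        title = rest

print(track_num, "|", artist, "|", album, "|", title)
# 03 | Example Artist | Example Album | Example Track
```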
class Bandcamp(SearchProvider):
"""Search provider for Bandcamp."""
def search(
self,
query: str,
limit: int = 50,
filters: Optional[Dict[str, Any]] = None,
**kwargs
) -> List[SearchResult]:
if not PLAYWRIGHT_AVAILABLE:
log("[bandcamp] Playwright not available. Install with: pip install playwright", file=sys.stderr)
return []
results = []
try:
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
page = browser.new_page()
# Parse query for artist: prefix
if query.strip().lower().startswith("artist:"):
artist_name = query[7:].strip().strip('"')
search_url = f"https://bandcamp.com/search?q={artist_name}&item_type=b"
else:
search_url = f"https://bandcamp.com/search?q={query}&item_type=a"
results = self._scrape_url(page, search_url, limit)
browser.close()
except Exception as e:
log(f"[bandcamp] Search error: {e}", file=sys.stderr)
return []
return results
def _scrape_url(self, page, url: str, limit: int) -> List[SearchResult]:
debug(f"[bandcamp] Scraping: {url}")
page.goto(url)
page.wait_for_load_state("domcontentloaded")
results = []
# Check for search results
search_results = page.query_selector_all(".searchresult")
if search_results:
for item in search_results[:limit]:
try:
heading = item.query_selector(".heading")
if not heading:
continue
link = heading.query_selector("a")
if not link:
continue
title = link.inner_text().strip()
target_url = link.get_attribute("href")
subhead = item.query_selector(".subhead")
artist = subhead.inner_text().strip() if subhead else "Unknown"
itemtype = item.query_selector(".itemtype")
media_type = itemtype.inner_text().strip() if itemtype else "album"
results.append(SearchResult(
origin="bandcamp",
title=title,
path=target_url,
detail=f"By: {artist}",
annotations=[media_type],
media_kind="audio",
columns=[
("Name", title),
("Artist", artist),
("Type", media_type),
],
full_metadata={
"artist": artist,
"type": media_type,
},
))
except Exception as e:
debug(f"[bandcamp] Error parsing result: {e}")
continue
return results
def validate(self) -> bool:
return PLAYWRIGHT_AVAILABLE
class YouTube(SearchProvider):
"""Search provider for YouTube using yt-dlp."""
def search(
self,
query: str,
limit: int = 10,
filters: Optional[Dict[str, Any]] = None,
**kwargs
) -> List[SearchResult]:
ytdlp_path = shutil.which("yt-dlp")
if not ytdlp_path:
log("[youtube] yt-dlp not found in PATH", file=sys.stderr)
return []
search_query = f"ytsearch{limit}:{query}"
cmd = [
ytdlp_path,
"--dump-json",
"--flat-playlist",
"--no-warnings",
search_query
]
try:
process = subprocess.run(
cmd,
capture_output=True,
text=True,
encoding="utf-8",
errors="replace"
)
if process.returncode != 0:
log(f"[youtube] yt-dlp failed: {process.stderr}", file=sys.stderr)
return []
results = []
for line in process.stdout.splitlines():
if not line.strip():
continue
try:
video_data = json.loads(line)
title = video_data.get("title", "Unknown")
video_id = video_data.get("id", "")
url = video_data.get("url") or f"https://youtube.com/watch?v={video_id}"
uploader = video_data.get("uploader", "Unknown")
duration = video_data.get("duration", 0)
view_count = video_data.get("view_count", 0)
duration_str = f"{int(duration//60)}:{int(duration%60):02d}" if duration else ""
views_str = f"{view_count:,}" if view_count else ""
results.append(SearchResult(
origin="youtube",
title=title,
path=url,
detail=f"By: {uploader}",
annotations=[duration_str, f"{views_str} views"],
media_kind="video",
columns=[
("Title", title),
("Uploader", uploader),
("Duration", duration_str),
("Views", views_str),
],
full_metadata={
"video_id": video_id,
"uploader": uploader,
"duration": duration,
"view_count": view_count,
},
))
except json.JSONDecodeError:
continue
return results
except Exception as e:
log(f"[youtube] Error: {e}", file=sys.stderr)
return []
def validate(self) -> bool:
return shutil.which("yt-dlp") is not None
def pipe(self, path: str, config: Optional[Dict[str, Any]] = None) -> Optional[str]:
"""Return the playable URL for MPV (just the path for YouTube)."""
return path
# Search provider registry
_SEARCH_PROVIDERS = {
"libgen": Libgen,
"soulseek": Soulseek,
"bandcamp": Bandcamp,
"youtube": YouTube,
}
def get_search_provider(name: str, config: Optional[Dict[str, Any]] = None) -> Optional[SearchProvider]:
"""Get a search provider by name."""
provider_class = _SEARCH_PROVIDERS.get(name.lower())
if provider_class is None:
log(f"[provider] Unknown search provider: {name}", file=sys.stderr)
return None
try:
provider = provider_class(config)
if not provider.validate():
log(f"[provider] Provider '{name}' is not available", file=sys.stderr)
return None
return provider
except Exception as e:
log(f"[provider] Error initializing '{name}': {e}", file=sys.stderr)
return None
def list_search_providers(config: Optional[Dict[str, Any]] = None) -> Dict[str, bool]:
"""List all search providers and their availability."""
availability = {}
for name, provider_class in _SEARCH_PROVIDERS.items():
try:
provider = provider_class(config)
availability[name] = provider.validate()
except Exception:
availability[name] = False
return availability
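New search backends plug in by subclassing `SearchProvider` and registering the class in `_SEARCH_PROVIDERS`. A minimal, purely illustrative provider:

```python
from typing import Any, Dict, List, Optional

from helper.provider import (
    SearchProvider,
    SearchResult,
    _SEARCH_PROVIDERS,
    get_search_provider,
)


class EchoProvider(SearchProvider):
    """Toy provider that just echoes the query; for illustration only."""

    def search(self, query: str, limit: int = 50,
               filters: Optional[Dict[str, Any]] = None, **kwargs) -> List[SearchResult]:
        return [SearchResult(origin="echo", title=query, path=f"echo:{query}")]


_SEARCH_PROVIDERS["echo"] = EchoProvider            # register under a new name
provider = get_search_provider("echo")              # instantiates and validates
print([r.title for r in provider.search("hello")])  # ['hello']
```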
# ============================================================================
# FILE PROVIDERS
# ============================================================================
class FileProvider(ABC):
"""Base class for file upload providers."""
def __init__(self, config: Optional[Dict[str, Any]] = None):
self.config = config or {}
self.name = self.__class__.__name__.lower()
@abstractmethod
def upload(self, file_path: str, **kwargs: Any) -> str:
"""Upload a file and return the URL."""
pass
def validate(self) -> bool:
"""Check if provider is available/configured."""
return True
class ZeroXZero(FileProvider):
"""File provider for 0x0.st."""
def upload(self, file_path: str, **kwargs: Any) -> str:
from helper.http_client import HTTPClient
if not os.path.exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
try:
headers = {"User-Agent": "Medeia-Macina/1.0"}
with HTTPClient(headers=headers) as client:
with open(file_path, 'rb') as f:
response = client.post(
"https://0x0.st",
files={"file": f}
)
if response.status_code == 200:
return response.text.strip()
else:
raise Exception(f"Upload failed: {response.status_code} - {response.text}")
except Exception as e:
log(f"[0x0] Upload error: {e}", file=sys.stderr)
raise
def validate(self) -> bool:
return True
class Matrix(FileProvider):
"""File provider for Matrix (Element) chat rooms."""
def validate(self) -> bool:
if not self.config:
return False
matrix_conf = self.config.get('storage', {}).get('matrix', {})
return bool(
matrix_conf.get('homeserver') and
matrix_conf.get('room_id') and
(matrix_conf.get('access_token') or matrix_conf.get('password'))
)
def upload(self, file_path: str, **kwargs: Any) -> str:
from pathlib import Path
path = Path(file_path)
if not path.exists():
raise FileNotFoundError(f"File not found: {file_path}")
matrix_conf = self.config.get('storage', {}).get('matrix', {})
homeserver = matrix_conf.get('homeserver')
access_token = matrix_conf.get('access_token')
room_id = matrix_conf.get('room_id')
if not homeserver.startswith('http'):
homeserver = f"https://{homeserver}"
# Upload media
upload_url = f"{homeserver}/_matrix/media/v3/upload"
headers = {
"Authorization": f"Bearer {access_token}",
"Content-Type": "application/octet-stream"
}
mime_type, _ = mimetypes.guess_type(path)
if mime_type:
headers["Content-Type"] = mime_type
filename = path.name
with open(path, 'rb') as f:
resp = requests.post(upload_url, headers=headers, data=f, params={"filename": filename})
if resp.status_code != 200:
raise Exception(f"Matrix upload failed: {resp.text}")
content_uri = resp.json().get('content_uri')
if not content_uri:
raise Exception("No content_uri returned")
# Send message
send_url = f"{homeserver}/_matrix/client/v3/rooms/{room_id}/send/m.room.message"
# Determine message type
msgtype = "m.file"
ext = path.suffix.lower()
AUDIO_EXTS = {'.mp3', '.flac', '.wav', '.m4a', '.aac', '.ogg', '.opus', '.wma', '.mka', '.alac'}
VIDEO_EXTS = {'.mp4', '.mkv', '.webm', '.mov', '.avi', '.flv', '.mpg', '.mpeg', '.ts', '.m4v', '.wmv'}
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp', '.tiff'}
if ext in AUDIO_EXTS:
msgtype = "m.audio"
elif ext in VIDEO_EXTS:
msgtype = "m.video"
elif ext in IMAGE_EXTS:
msgtype = "m.image"
info = {
"mimetype": mime_type,
"size": path.stat().st_size
}
payload = {
"msgtype": msgtype,
"body": filename,
"url": content_uri,
"info": info
}
resp = requests.post(send_url, headers=headers, json=payload)
if resp.status_code != 200:
raise Exception(f"Matrix send message failed: {resp.text}")
event_id = resp.json().get('event_id')
return f"https://matrix.to/#/{room_id}/{event_id}"
# File provider registry
_FILE_PROVIDERS = {
"0x0": ZeroXZero,
"matrix": Matrix,
}
def get_file_provider(name: str, config: Optional[Dict[str, Any]] = None) -> Optional[FileProvider]:
"""Get a file provider by name."""
provider_class = _FILE_PROVIDERS.get(name.lower())
if provider_class is None:
log(f"[provider] Unknown file provider: {name}", file=sys.stderr)
return None
try:
provider = provider_class(config)
if not provider.validate():
log(f"[provider] File provider '{name}' is not available", file=sys.stderr)
return None
return provider
except Exception as e:
log(f"[provider] Error initializing file provider '{name}': {e}", file=sys.stderr)
return None
def list_file_providers(config: Optional[Dict[str, Any]] = None) -> Dict[str, bool]:
"""List all file providers and their availability."""
availability = {}
for name, provider_class in _FILE_PROVIDERS.items():
try:
provider = provider_class(config)
availability[name] = provider.validate()
except Exception:
availability[name] = False
return availability
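File uploads go through the same registry pattern. A usage sketch for the 0x0.st provider; note that `upload()` performs a real HTTP POST:

```python
import tempfile
from pathlib import Path

from helper.provider import get_file_provider, list_file_providers

print(list_file_providers())  # e.g. {'0x0': True, 'matrix': False}

provider = get_file_provider("0x0")
if provider is not None:
    sample = Path(tempfile.gettempdir()) / "upload_demo.txt"
    sample.write_text("hello from the provider sketch\n")
    url = provider.upload(str(sample))  # real HTTP POST to https://0x0.st
    print("uploaded to:", url)
```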


@@ -159,8 +159,8 @@ def create_app():
status["storage_path"] = str(STORAGE_PATH)
status["storage_exists"] = STORAGE_PATH.exists()
try:
from helper.local_library import LocalLibraryDB
with LocalLibraryDB(STORAGE_PATH) as db:
from helper.folder_store import FolderDB
with FolderDB(STORAGE_PATH) as db:
status["database_accessible"] = True
except Exception as e:
status["database_accessible"] = False
@@ -177,7 +177,7 @@ def create_app():
@require_storage()
def search_files():
"""Search for files by name or tag."""
from helper.local_library import LocalLibrarySearchOptimizer
from helper.folder_store import LocalLibrarySearchOptimizer
query = request.args.get('q', '')
limit = request.args.get('limit', 100, type=int)
@@ -205,11 +205,11 @@ def create_app():
@require_storage()
def get_file_metadata(file_hash: str):
"""Get metadata for a specific file by hash."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path or not file_path.exists():
return jsonify({"error": "File not found"}), 404
@@ -233,13 +233,13 @@ def create_app():
@require_storage()
def index_file():
"""Index a new file in the storage."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
from helper.utils import sha256_file
data = request.get_json() or {}
file_path_str = data.get('path')
tags = data.get('tags', [])
urls = data.get('urls', [])
url = data.get('url', [])
if not file_path_str:
return jsonify({"error": "File path required"}), 400
@@ -250,14 +250,14 @@ def create_app():
if not file_path.exists():
return jsonify({"error": "File does not exist"}), 404
with LocalLibraryDB(STORAGE_PATH) as db:
with FolderDB(STORAGE_PATH) as db:
db.get_or_create_file_entry(file_path)
if tags:
db.add_tags(file_path, tags)
if urls:
db.add_known_urls(file_path, urls)
if url:
db.add_url(file_path, url)
file_hash = sha256_file(file_path)
@@ -265,7 +265,7 @@ def create_app():
"hash": file_hash,
"path": str(file_path),
"tags_added": len(tags),
"urls_added": len(urls)
"url_added": len(url)
}), 201
except Exception as e:
logger.error(f"Index error: {e}", exc_info=True)
@@ -280,11 +280,11 @@ def create_app():
@require_storage()
def get_tags(file_hash: str):
"""Get tags for a file."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
@@ -299,7 +299,7 @@ def create_app():
@require_storage()
def add_tags(file_hash: str):
"""Add tags to a file."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
data = request.get_json() or {}
tags = data.get('tags', [])
@@ -309,8 +309,8 @@ def create_app():
return jsonify({"error": "Tags required"}), 400
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
@@ -328,13 +328,13 @@ def create_app():
@require_storage()
def remove_tags(file_hash: str):
"""Remove tags from a file."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
tags_str = request.args.get('tags', '')
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
@@ -358,11 +358,11 @@ def create_app():
@require_storage()
def get_relationships(file_hash: str):
"""Get relationships for a file."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
@@ -378,7 +378,7 @@ def create_app():
@require_storage()
def set_relationship():
"""Set a relationship between two files."""
from helper.local_library import LocalLibraryDB
from helper.folder_store import FolderDB
data = request.get_json() or {}
from_hash = data.get('from_hash')
@@ -389,9 +389,9 @@ def create_app():
return jsonify({"error": "from_hash and to_hash required"}), 400
try:
with LocalLibraryDB(STORAGE_PATH) as db:
from_path = db.search_by_hash(from_hash)
to_path = db.search_by_hash(to_hash)
with FolderDB(STORAGE_PATH) as db:
from_path = db.search_hash(from_hash)
to_path = db.search_hash(to_hash)
if not from_path or not to_path:
return jsonify({"error": "File not found"}), 404
@@ -406,49 +406,49 @@ def create_app():
# URL OPERATIONS
# ========================================================================
@app.route('/urls/<file_hash>', methods=['GET'])
@app.route('/url/<file_hash>', methods=['GET'])
@require_auth()
@require_storage()
def get_urls(file_hash: str):
"""Get known URLs for a file."""
from helper.local_library import LocalLibraryDB
def get_url(file_hash: str):
"""Get known url for a file."""
from helper.folder_store import FolderDB
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
metadata = db.get_metadata(file_path)
urls = metadata.get('known_urls', []) if metadata else []
return jsonify({"hash": file_hash, "urls": urls}), 200
url = metadata.get('url', []) if metadata else []
return jsonify({"hash": file_hash, "url": url}), 200
except Exception as e:
logger.error(f"Get URLs error: {e}", exc_info=True)
logger.error(f"Get url error: {e}", exc_info=True)
return jsonify({"error": f"Failed: {str(e)}"}), 500
@app.route('/urls/<file_hash>', methods=['POST'])
@app.route('/url/<file_hash>', methods=['POST'])
@require_auth()
@require_storage()
def add_urls(file_hash: str):
"""Add URLs to a file."""
from helper.local_library import LocalLibraryDB
def add_url(file_hash: str):
"""Add url to a file."""
from helper.folder_store import FolderDB
data = request.get_json() or {}
urls = data.get('urls', [])
url = data.get('url', [])
if not urls:
return jsonify({"error": "URLs required"}), 400
if not url:
return jsonify({"error": "url required"}), 400
try:
with LocalLibraryDB(STORAGE_PATH) as db:
file_path = db.search_by_hash(file_hash)
with FolderDB(STORAGE_PATH) as db:
file_path = db.search_hash(file_hash)
if not file_path:
return jsonify({"error": "File not found"}), 404
db.add_known_urls(file_path, urls)
return jsonify({"hash": file_hash, "urls_added": len(urls)}), 200
db.add_url(file_path, url)
return jsonify({"hash": file_hash, "url_added": len(url)}), 200
except Exception as e:
logger.error(f"Add URLs error: {e}", exc_info=True)
logger.error(f"Add url error: {e}", exc_info=True)
return jsonify({"error": f"Failed: {str(e)}"}), 500
return app
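With the routes renamed from `/urls/<hash>` to `/url/<hash>`, clients read and write associated URLs as below. Host, port and hash are placeholders, and whatever headers `require_auth()` expects are not shown in this hunk:

```python
import requests

BASE = "http://127.0.0.1:5000"  # placeholder host/port
FILE_HASH = "a1b2c3d4" * 8      # placeholder 64-char hash
HEADERS = {}                     # fill in whatever require_auth() expects

# GET /url/<hash> -> {"hash": ..., "url": [...]}
resp = requests.get(f"{BASE}/url/{FILE_HASH}", headers=HEADERS, timeout=10)
print(resp.status_code, resp.json())

# POST /url/<hash> with {"url": [...]} -> {"hash": ..., "url_added": N}
resp = requests.post(
    f"{BASE}/url/{FILE_HASH}",
    json={"url": ["https://example.com/source-page"]},
    headers=HEADERS,
    timeout=10,
)
print(resp.status_code, resp.json())
```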
@@ -509,8 +509,8 @@ def main():
print(f"\n{'='*70}\n")
try:
from helper.local_library import LocalLibraryDB
with LocalLibraryDB(STORAGE_PATH) as db:
from helper.folder_store import FolderDB
with FolderDB(STORAGE_PATH) as db:
logger.info("Database initialized successfully")
except Exception as e:
logger.error(f"Failed to initialize database: {e}")

File diff suppressed because it is too large Load Diff

2268
helper/store.py Normal file

File diff suppressed because it is too large


@@ -555,7 +555,7 @@ class UnifiedBookDownloader:
This follows the exact process from archive_client.py:
1. Login with credentials
2. Call loan() to create 14-day borrow
3. Get book info (extract page URLs)
3. Get book info (extract page url)
4. Download all pages as images
5. Merge images into searchable PDF
@@ -576,10 +576,10 @@ class UnifiedBookDownloader:
# If we get here, borrowing succeeded
logger.info(f"[UnifiedBookDownloader] Successfully borrowed book: {book_id}")
# Now get the book info (page URLs and metadata)
# Now get the book info (page url and metadata)
logger.info(f"[UnifiedBookDownloader] Extracting book page information...")
# Try both URL formats: with /borrow and without
book_urls = [
book_url = [
f"https://archive.org/borrow/{book_id}", # Try borrow page first (for borrowed books)
f"https://archive.org/details/{book_id}" # Fallback to details page
]
@@ -589,7 +589,7 @@ class UnifiedBookDownloader:
metadata = None
last_error = None
for book_url in book_urls:
for book_url in book_url:
try:
logger.debug(f"[UnifiedBookDownloader] Trying to get book info from: {book_url}")
response = session.get(book_url, timeout=10)
@@ -611,7 +611,7 @@ class UnifiedBookDownloader:
continue
if links is None:
logger.error(f"[UnifiedBookDownloader] Failed to get book info from all URLs: {last_error}")
logger.error(f"[UnifiedBookDownloader] Failed to get book info from all url: {last_error}")
# Borrow extraction failed - return False
return False, "Could not extract borrowed book pages"

View File

@@ -308,7 +308,7 @@ def format_metadata_value(key: str, value) -> str:
# ============================================================================
# Link Utilities - Consolidated from link_utils.py
# ============================================================================
"""Link utilities - Extract and process URLs from various sources."""
"""Link utilities - Extract and process url from various sources."""
def extract_link_from_args(args: Iterable[str]) -> Any | None:

View File

@@ -77,3 +77,26 @@ mime_maps = {
"csv": { "ext": ".csv", "mimes": ["text/csv"] }
}
}
def get_type_from_ext(ext: str) -> str:
"""Determine the type (e.g., 'image', 'video', 'audio') from file extension.
Args:
ext: File extension (with or without leading dot, e.g., 'jpg' or '.jpg')
Returns:
Type string (e.g., 'image', 'video', 'audio') or 'other' if unknown
"""
if not ext:
return 'other'
# Normalize: remove leading dot and convert to lowercase
ext_clean = ext.lstrip('.').lower()
# Search through mime_maps to find matching type
for type_name, extensions_dict in mime_maps.items():
if ext_clean in extensions_dict:
return type_name
return 'other'
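
A quick usage sketch for the new helper; the expected results assume the usual image and audio sections are present in `mime_maps`.

```python
# Usage of get_type_from_ext(); results assume 'jpg' and 'm4a' exist in mime_maps.
print(get_type_from_ext(".jpg"))   # "image"  (leading dot is stripped)
print(get_type_from_ext("M4A"))    # "audio"  (lookup is lowercased first)
print(get_type_from_ext("xyz"))    # "other"  (unknown extensions fall through)
print(get_type_from_ext(""))       # "other"  (empty input short-circuits)
```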

View File

@@ -11,7 +11,7 @@ from datetime import datetime
from threading import Thread, Lock
import time
from .local_library import LocalLibraryDB
from .folder_store import FolderDB
from helper.logger import log
logger = logging.getLogger(__name__)
@@ -140,7 +140,7 @@ class Worker:
class WorkerLoggingHandler(logging.StreamHandler):
"""Custom logging handler that captures logs for a worker."""
def __init__(self, worker_id: str, db: LocalLibraryDB,
def __init__(self, worker_id: str, db: FolderDB,
manager: Optional['WorkerManager'] = None,
buffer_size: int = 50):
"""Initialize the handler.
@@ -235,7 +235,7 @@ class WorkerManager:
auto_refresh_interval: Seconds between auto-refresh checks (0 = disabled)
"""
self.library_root = Path(library_root)
self.db = LocalLibraryDB(library_root)
self.db = FolderDB(library_root)
self.auto_refresh_interval = auto_refresh_interval
self.refresh_callbacks: List[Callable] = []
self.refresh_thread: Optional[Thread] = None
@@ -244,6 +244,22 @@ class WorkerManager:
self.worker_handlers: Dict[str, WorkerLoggingHandler] = {} # Track active handlers
self._worker_last_step: Dict[str, str] = {}
def close(self) -> None:
"""Close the database connection."""
if self.db:
try:
self.db.close()
except Exception:
pass
def __enter__(self):
"""Context manager entry."""
return self
def __exit__(self, exc_type, exc_val, exc_tb):
"""Context manager exit - close database."""
self.close()
def add_refresh_callback(self, callback: Callable[[List[Dict[str, Any]]], None]) -> None:
"""Register a callback to be called on worker updates.

View File

@@ -12,26 +12,14 @@ from typing import Tuple, Optional, Dict, Any
from pathlib import Path
logger = logging.getLogger(__name__)
# Global state for Hydrus availability
_HYDRUS_AVAILABLE: Optional[bool] = None
_HYDRUS_UNAVAILABLE_REASON: Optional[str] = None
_HYDRUS_CHECK_COMPLETE = False
# Global state for Debrid availability
_DEBRID_AVAILABLE: Optional[bool] = None
_DEBRID_UNAVAILABLE_REASON: Optional[str] = None
_DEBRID_CHECK_COMPLETE = False
# Global state for MPV availability
_MPV_AVAILABLE: Optional[bool] = None
_MPV_UNAVAILABLE_REASON: Optional[str] = None
_MPV_CHECK_COMPLETE = False
# Global state for Matrix availability
_MATRIX_AVAILABLE: Optional[bool] = None
_MATRIX_UNAVAILABLE_REASON: Optional[str] = None
_MATRIX_CHECK_COMPLETE = False
# Global state for all service availability checks - consolidated from 12 separate globals
_SERVICE_STATE = {
"hydrus": {"available": None, "reason": None, "complete": False},
"hydrusnetwork_stores": {}, # Track individual Hydrus instances
"debrid": {"available": None, "reason": None, "complete": False},
"mpv": {"available": None, "reason": None, "complete": False},
"matrix": {"available": None, "reason": None, "complete": False},
}
# Global state for Cookies availability
_COOKIES_FILE_PATH: Optional[str] = None
@@ -68,130 +56,73 @@ def check_hydrus_availability(config: Dict[str, Any]) -> Tuple[bool, Optional[st
return False, error_msg
def initialize_hydrus_health_check(config: Dict[str, Any]) -> None:
"""Initialize Hydrus health check at startup.
This should be called once at application startup to determine if Hydrus
features should be enabled or disabled.
Args:
config: Application configuration dictionary
"""
global _HYDRUS_AVAILABLE, _HYDRUS_UNAVAILABLE_REASON, _HYDRUS_CHECK_COMPLETE
def initialize_hydrus_health_check(config: Dict[str, Any], emit_debug: bool = True) -> Tuple[bool, Optional[str]]:
"""Initialize Hydrus health check at startup."""
global _SERVICE_STATE
logger.info("[Startup] Starting Hydrus health check...")
is_available, reason = check_hydrus_availability(config)
_SERVICE_STATE["hydrus"]["available"] = is_available
_SERVICE_STATE["hydrus"]["reason"] = reason
_SERVICE_STATE["hydrus"]["complete"] = True
# Track individual Hydrus instances
try:
is_available, reason = check_hydrus_availability(config)
_HYDRUS_AVAILABLE = is_available
_HYDRUS_UNAVAILABLE_REASON = reason
_HYDRUS_CHECK_COMPLETE = True
if is_available:
debug("Hydrus: ENABLED - All Hydrus features available", file=sys.stderr)
else:
debug(f"Hydrus: DISABLED - {reason or 'Connection failed'}", file=sys.stderr)
store_config = config.get("store", {})
hydrusnetwork = store_config.get("hydrusnetwork", {})
for instance_name, instance_config in hydrusnetwork.items():
if isinstance(instance_config, dict):
url = instance_config.get("url")
access_key = instance_config.get("Hydrus-Client-API-Access-Key")
if url and access_key:
_SERVICE_STATE["hydrusnetwork_stores"][instance_name] = {
"ok": is_available,
"url": url,
"detail": reason if not is_available else "Connected"
}
else:
_SERVICE_STATE["hydrusnetwork_stores"][instance_name] = {
"ok": False,
"url": url or "Not configured",
"detail": "Missing credentials"
}
except Exception as e:
logger.error(f"[Startup] Failed to initialize Hydrus health check: {e}", exc_info=True)
_HYDRUS_AVAILABLE = False
_HYDRUS_UNAVAILABLE_REASON = str(e)
_HYDRUS_CHECK_COMPLETE = True
debug(f"Hydrus: DISABLED - Error during health check: {e}", file=sys.stderr)
logger.debug(f"Could not enumerate Hydrus instances: {e}")
if emit_debug:
status = 'ENABLED' if is_available else f'DISABLED - {reason or "Connection failed"}'
debug(f"Hydrus: {status}", file=sys.stderr)
return is_available, reason
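
For orientation, a minimal sketch of the config shape this per-instance bookkeeping expects; the instance name, URL, and access key are placeholders, and the call assumes the function is imported from the module above.

```python
# Hypothetical config consumed by initialize_hydrus_health_check(); values are placeholders.
config = {
    "store": {
        "hydrusnetwork": {
            "home": {
                "url": "http://127.0.0.1:45869",
                "Hydrus-Client-API-Access-Key": "<access key>",
            }
        }
    }
}

ok, reason = initialize_hydrus_health_check(config, emit_debug=False)
# ok/reason mirror _SERVICE_STATE["hydrus"]; each configured instance also gets an
# entry in _SERVICE_STATE["hydrusnetwork_stores"] with ok/url/detail fields.
```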
def check_debrid_availability(config: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
"""Check if Debrid API is available.
Args:
config: Application configuration dictionary
Returns:
Tuple of (is_available: bool, reason: Optional[str])
- (True, None) if Debrid API is available
- (False, reason) if Debrid API is unavailable with reason
"""
"""Check if Debrid API is available."""
try:
from helper.http_client import HTTPClient
logger.info("[Debrid Health Check] Pinging Debrid API at https://api.alldebrid.com/v4/ping...")
try:
# Use the public ping endpoint to check API availability
# This endpoint doesn't require authentication
with HTTPClient(timeout=10.0, verify_ssl=True) as client:
response = client.get('https://api.alldebrid.com/v4/ping')
logger.debug(f"[Debrid Health Check] Response status: {response.status_code}")
# Read response text first (handles gzip decompression)
try:
response_text = response.text
logger.debug(f"[Debrid Health Check] Response text: {response_text}")
except Exception as e:
logger.error(f"[Debrid Health Check] ❌ Failed to read response text: {e}")
return False, f"Failed to read response: {e}"
# Parse JSON
try:
result = response.json()
logger.debug(f"[Debrid Health Check] Response JSON: {result}")
except Exception as e:
logger.error(f"[Debrid Health Check] ❌ Failed to parse JSON: {e}")
logger.error(f"[Debrid Health Check] Response was: {response_text}")
return False, f"Failed to parse response: {e}"
# Validate response format
if result.get('status') == 'success' and result.get('data', {}).get('ping') == 'pong':
logger.info("[Debrid Health Check] ✅ Debrid API is AVAILABLE")
return True, None
else:
logger.warning(f"[Debrid Health Check] ❌ Debrid API returned unexpected response: {result}")
return False, "Invalid API response"
except Exception as e:
error_msg = str(e)
logger.warning(f"[Debrid Health Check] ❌ Debrid API error: {error_msg}")
import traceback
logger.debug(f"[Debrid Health Check] Traceback: {traceback.format_exc()}")
return False, error_msg
logger.info("[Debrid Health Check] Pinging Debrid API...")
with HTTPClient(timeout=10.0, verify_ssl=True) as client:
response = client.get('https://api.alldebrid.com/v4/ping')
result = response.json()
if result.get('status') == 'success' and result.get('data', {}).get('ping') == 'pong':
logger.info("[Debrid Health Check] Debrid API is AVAILABLE")
return True, None
return False, "Invalid API response"
except Exception as e:
error_msg = str(e)
logger.error(f"[Debrid Health Check] ❌ Error checking Debrid availability: {error_msg}")
return False, error_msg
logger.warning(f"[Debrid Health Check] Debrid API error: {e}")
return False, str(e)
def initialize_debrid_health_check(config: Dict[str, Any]) -> None:
"""Initialize Debrid health check at startup.
This should be called once at application startup to determine if Debrid
features should be enabled or disabled.
Args:
config: Application configuration dictionary
"""
global _DEBRID_AVAILABLE, _DEBRID_UNAVAILABLE_REASON, _DEBRID_CHECK_COMPLETE
def initialize_debrid_health_check(config: Dict[str, Any], emit_debug: bool = True) -> Tuple[bool, Optional[str]]:
"""Initialize Debrid health check at startup."""
global _SERVICE_STATE
logger.info("[Startup] Starting Debrid health check...")
try:
is_available, reason = check_debrid_availability(config)
_DEBRID_AVAILABLE = is_available
_DEBRID_UNAVAILABLE_REASON = reason
_DEBRID_CHECK_COMPLETE = True
if is_available:
debug("✅ Debrid: ENABLED - All Debrid features available", file=sys.stderr)
logger.info("[Startup] Debrid health check PASSED")
else:
debug(f"⚠️ Debrid: DISABLED - {reason or 'Connection failed'}", file=sys.stderr)
logger.warning(f"[Startup] Debrid health check FAILED: {reason}")
except Exception as e:
logger.error(f"[Startup] Failed to initialize Debrid health check: {e}", exc_info=True)
_DEBRID_AVAILABLE = False
_DEBRID_UNAVAILABLE_REASON = str(e)
_DEBRID_CHECK_COMPLETE = True
debug(f"⚠️ Debrid: DISABLED - Error during health check: {e}", file=sys.stderr)
is_available, reason = check_debrid_availability(config)
_SERVICE_STATE["debrid"]["available"] = is_available
_SERVICE_STATE["debrid"]["reason"] = reason
_SERVICE_STATE["debrid"]["complete"] = True
if emit_debug:
status = 'ENABLED' if is_available else f'DISABLED - {reason or "Connection failed"}'
debug(f"Debrid: {status}", file=sys.stderr)
return is_available, reason
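
The availability test above reduces to a shape check on the ping response; a minimal sketch of what `check_debrid_availability()` treats as healthy:

```python
# The response shape check_debrid_availability() accepts as healthy.
result = {"status": "success", "data": {"ping": "pong"}}
is_ok = result.get("status") == "success" and result.get("data", {}).get("ping") == "pong"
assert is_ok
```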
def check_mpv_availability() -> Tuple[bool, Optional[str]]:
@@ -200,10 +131,10 @@ def check_mpv_availability() -> Tuple[bool, Optional[str]]:
Returns:
Tuple of (is_available: bool, reason: Optional[str])
"""
global _MPV_AVAILABLE, _MPV_UNAVAILABLE_REASON, _MPV_CHECK_COMPLETE
global _SERVICE_STATE
if _MPV_CHECK_COMPLETE and _MPV_AVAILABLE is not None:
return _MPV_AVAILABLE, _MPV_UNAVAILABLE_REASON
if _SERVICE_STATE["mpv"]["complete"] and _SERVICE_STATE["mpv"]["available"] is not None:
return _SERVICE_STATE["mpv"]["available"], _SERVICE_STATE["mpv"]["reason"]
import shutil
import subprocess
@@ -212,11 +143,8 @@ def check_mpv_availability() -> Tuple[bool, Optional[str]]:
mpv_path = shutil.which("mpv")
if not mpv_path:
_MPV_AVAILABLE = False
_MPV_UNAVAILABLE_REASON = "Executable 'mpv' not found in PATH"
_MPV_CHECK_COMPLETE = True
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: {_MPV_UNAVAILABLE_REASON}")
return False, _MPV_UNAVAILABLE_REASON
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: Executable 'mpv' not found in PATH")
return False, "Executable 'mpv' not found in PATH"
# Try to get version to confirm it works
try:
@@ -228,55 +156,35 @@ def check_mpv_availability() -> Tuple[bool, Optional[str]]:
)
if result.returncode == 0:
version_line = result.stdout.split('\n')[0]
_MPV_AVAILABLE = True
_MPV_UNAVAILABLE_REASON = None
_MPV_CHECK_COMPLETE = True
logger.info(f"[MPV Health Check] ✅ MPV is AVAILABLE ({version_line})")
logger.info(f"[MPV Health Check] MPV is AVAILABLE ({version_line})")
return True, None
else:
_MPV_AVAILABLE = False
_MPV_UNAVAILABLE_REASON = f"MPV returned non-zero exit code: {result.returncode}"
_MPV_CHECK_COMPLETE = True
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: {_MPV_UNAVAILABLE_REASON}")
return False, _MPV_UNAVAILABLE_REASON
reason = f"MPV returned non-zero exit code: {result.returncode}"
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: {reason}")
return False, reason
except Exception as e:
_MPV_AVAILABLE = False
_MPV_UNAVAILABLE_REASON = f"Error running MPV: {e}"
_MPV_CHECK_COMPLETE = True
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: {_MPV_UNAVAILABLE_REASON}")
return False, _MPV_UNAVAILABLE_REASON
reason = f"Error running MPV: {e}"
logger.warning(f"[MPV Health Check] ❌ MPV is UNAVAILABLE: {reason}")
return False, reason
def initialize_mpv_health_check() -> None:
"""Initialize MPV health check at startup.
This should be called once at application startup to determine if MPV
features should be enabled or disabled.
"""
global _MPV_AVAILABLE, _MPV_UNAVAILABLE_REASON, _MPV_CHECK_COMPLETE
def initialize_mpv_health_check(emit_debug: bool = True) -> Tuple[bool, Optional[str]]:
"""Initialize MPV health check at startup and return (is_available, reason)."""
global _SERVICE_STATE
logger.info("[Startup] Starting MPV health check...")
is_available, reason = check_mpv_availability()
_SERVICE_STATE["mpv"]["available"] = is_available
_SERVICE_STATE["mpv"]["reason"] = reason
_SERVICE_STATE["mpv"]["complete"] = True
try:
is_available, reason = check_mpv_availability()
_MPV_AVAILABLE = is_available
_MPV_UNAVAILABLE_REASON = reason
_MPV_CHECK_COMPLETE = True
if emit_debug:
if is_available:
debug("MPV: ENABLED - All MPV features available", file=sys.stderr)
logger.info("[Startup] MPV health check PASSED")
else:
debug(f"⚠️ MPV: DISABLED - {reason or 'Connection failed'}", file=sys.stderr)
debug("→ Hydrus features still available", file=sys.stderr)
logger.warning(f"[Startup] MPV health check FAILED: {reason}")
except Exception as e:
logger.error(f"[Startup] Failed to initialize MPV health check: {e}", exc_info=True)
_MPV_AVAILABLE = False
_MPV_UNAVAILABLE_REASON = str(e)
_MPV_CHECK_COMPLETE = True
debug(f"⚠️ MPV: DISABLED - Error during health check: {e}", file=sys.stderr)
debug("MPV: ENABLED - All MPV features available", file=sys.stderr)
elif reason != "Not configured":
debug(f"MPV: DISABLED - {reason or 'Connection failed'}", file=sys.stderr)
return is_available, reason
def check_matrix_availability(config: Dict[str, Any]) -> Tuple[bool, Optional[str]]:
@@ -324,264 +232,262 @@ def check_matrix_availability(config: Dict[str, Any]) -> Tuple[bool, Optional[st
return False, str(e)
def initialize_matrix_health_check(config: Dict[str, Any]) -> None:
"""Initialize Matrix health check at startup."""
global _MATRIX_AVAILABLE, _MATRIX_UNAVAILABLE_REASON, _MATRIX_CHECK_COMPLETE
def initialize_matrix_health_check(config: Dict[str, Any], emit_debug: bool = True) -> Tuple[bool, Optional[str]]:
"""Initialize Matrix health check at startup and return (is_available, reason)."""
global _SERVICE_STATE
logger.info("[Startup] Starting Matrix health check...")
is_available, reason = check_matrix_availability(config)
_SERVICE_STATE["matrix"]["available"] = is_available
_SERVICE_STATE["matrix"]["reason"] = reason
_SERVICE_STATE["matrix"]["complete"] = True
try:
is_available, reason = check_matrix_availability(config)
_MATRIX_AVAILABLE = is_available
_MATRIX_UNAVAILABLE_REASON = reason
_MATRIX_CHECK_COMPLETE = True
if emit_debug:
if is_available:
debug("Matrix: ENABLED - Homeserver reachable", file=sys.stderr)
else:
if reason != "Not configured":
debug(f"Matrix: DISABLED - {reason}", file=sys.stderr)
except Exception as e:
logger.error(f"[Startup] Failed to initialize Matrix health check: {e}", exc_info=True)
_MATRIX_AVAILABLE = False
_MATRIX_UNAVAILABLE_REASON = str(e)
_MATRIX_CHECK_COMPLETE = True
def is_hydrus_available() -> bool:
"""Check if Hydrus is available (from cached health check).
elif reason != "Not configured":
debug(f"Matrix: DISABLED - {reason}", file=sys.stderr)
Returns:
True if Hydrus API is available, False otherwise
"""
return _HYDRUS_AVAILABLE is True
return is_available, reason
# Unified getter functions for service availability - all use _SERVICE_STATE
def is_hydrus_available() -> bool:
"""Check if Hydrus is available (from cached health check)."""
return _SERVICE_STATE["hydrus"]["available"] is True
def get_hydrus_unavailable_reason() -> Optional[str]:
"""Get the reason why Hydrus is unavailable.
Returns:
String explaining why Hydrus is unavailable, or None if available
"""
return _HYDRUS_UNAVAILABLE_REASON if not is_hydrus_available() else None
"""Get the reason why Hydrus is unavailable."""
return _SERVICE_STATE["hydrus"]["reason"] if not is_hydrus_available() else None
def is_hydrus_check_complete() -> bool:
"""Check if the Hydrus health check has been completed.
Returns:
True if health check has run, False if still pending
"""
return _HYDRUS_CHECK_COMPLETE
"""Check if the Hydrus health check has been completed."""
return _SERVICE_STATE["hydrus"]["complete"]
def disable_hydrus_features() -> None:
"""Manually disable all Hydrus features (for testing/fallback).
This can be called if Hydrus connectivity is lost after startup.
"""
global _HYDRUS_AVAILABLE, _HYDRUS_UNAVAILABLE_REASON
_HYDRUS_AVAILABLE = False
_HYDRUS_UNAVAILABLE_REASON = "Manually disabled or lost connection"
"""Manually disable all Hydrus features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["hydrus"]["available"] = False
_SERVICE_STATE["hydrus"]["reason"] = "Manually disabled or lost connection"
logger.warning("[Hydrus] Features manually disabled")
def enable_hydrus_features() -> None:
"""Manually enable Hydrus features (for testing/fallback).
This can be called if Hydrus connectivity is restored after startup.
"""
global _HYDRUS_AVAILABLE, _HYDRUS_UNAVAILABLE_REASON
_HYDRUS_AVAILABLE = True
_HYDRUS_UNAVAILABLE_REASON = None
"""Manually enable Hydrus features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["hydrus"]["available"] = True
_SERVICE_STATE["hydrus"]["reason"] = None
logger.info("[Hydrus] Features manually enabled")
def is_debrid_available() -> bool:
"""Check if Debrid is available (from cached health check).
Returns:
True if Debrid API is available, False otherwise
"""
return _DEBRID_AVAILABLE is True
"""Check if Debrid is available (from cached health check)."""
return _SERVICE_STATE["debrid"]["available"] is True
def get_debrid_unavailable_reason() -> Optional[str]:
"""Get the reason why Debrid is unavailable.
Returns:
String explaining why Debrid is unavailable, or None if available
"""
return _DEBRID_UNAVAILABLE_REASON if not is_debrid_available() else None
"""Get the reason why Debrid is unavailable."""
return _SERVICE_STATE["debrid"]["reason"] if not is_debrid_available() else None
def is_debrid_check_complete() -> bool:
"""Check if the Debrid health check has been completed.
Returns:
True if health check has run, False if still pending
"""
return _DEBRID_CHECK_COMPLETE
"""Check if the Debrid health check has been completed."""
return _SERVICE_STATE["debrid"]["complete"]
def disable_debrid_features() -> None:
"""Manually disable all Debrid features (for testing/fallback).
This can be called if Debrid connectivity is lost after startup.
"""
global _DEBRID_AVAILABLE, _DEBRID_UNAVAILABLE_REASON
_DEBRID_AVAILABLE = False
_DEBRID_UNAVAILABLE_REASON = "Manually disabled or lost connection"
"""Manually disable all Debrid features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["debrid"]["available"] = False
_SERVICE_STATE["debrid"]["reason"] = "Manually disabled or lost connection"
logger.warning("[Debrid] Features manually disabled")
def enable_debrid_features() -> None:
"""Manually enable Debrid features (for testing/fallback).
This can be called if Debrid connectivity is restored after startup.
"""
global _DEBRID_AVAILABLE, _DEBRID_UNAVAILABLE_REASON
_DEBRID_AVAILABLE = True
_DEBRID_UNAVAILABLE_REASON = None
"""Manually enable Debrid features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["debrid"]["available"] = True
_SERVICE_STATE["debrid"]["reason"] = None
logger.info("[Debrid] Features manually enabled")
def is_mpv_available() -> bool:
"""Check if MPV is available (from cached health check).
Returns:
True if MPV is available, False otherwise
"""
return _MPV_AVAILABLE is True
"""Check if MPV is available (from cached health check)."""
return _SERVICE_STATE["mpv"]["available"] is True
def get_mpv_unavailable_reason() -> Optional[str]:
"""Get the reason why MPV is unavailable.
Returns:
String explaining why MPV is unavailable, or None if available
"""
return _MPV_UNAVAILABLE_REASON if not is_mpv_available() else None
"""Get the reason why MPV is unavailable."""
return _SERVICE_STATE["mpv"]["reason"] if not is_mpv_available() else None
def is_mpv_check_complete() -> bool:
"""Check if the MPV health check has been completed.
Returns:
True if health check has run, False if still pending
"""
return _MPV_CHECK_COMPLETE
"""Check if the MPV health check has been completed."""
return _SERVICE_STATE["mpv"]["complete"]
def disable_mpv_features() -> None:
"""Manually disable all MPV features (for testing/fallback).
This can be called if MPV connectivity is lost after startup.
"""
global _MPV_AVAILABLE, _MPV_UNAVAILABLE_REASON
_MPV_AVAILABLE = False
_MPV_UNAVAILABLE_REASON = "Manually disabled or lost connection"
"""Manually disable all MPV features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["mpv"]["available"] = False
_SERVICE_STATE["mpv"]["reason"] = "Manually disabled or lost connection"
logger.warning("[MPV] Features manually disabled")
def enable_mpv_features() -> None:
"""Manually enable MPV features (for testing/fallback).
This can be called if MPV connectivity is restored after startup.
"""
global _MPV_AVAILABLE, _MPV_UNAVAILABLE_REASON
_MPV_AVAILABLE = True
_MPV_UNAVAILABLE_REASON = None
"""Manually enable MPV features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["mpv"]["available"] = True
_SERVICE_STATE["mpv"]["reason"] = None
logger.info("[MPV] Features manually enabled")
def is_matrix_available() -> bool:
"""Check if Matrix is available (from cached health check).
Returns:
True if Matrix is available, False otherwise
"""
return _MATRIX_AVAILABLE is True
"""Check if Matrix is available (from cached health check)."""
return _SERVICE_STATE["matrix"]["available"] is True
def get_matrix_unavailable_reason() -> Optional[str]:
"""Get the reason why Matrix is unavailable.
Returns:
String explaining why Matrix is unavailable, or None if available
"""
return _MATRIX_UNAVAILABLE_REASON if not is_matrix_available() else None
"""Get the reason why Matrix is unavailable."""
return _SERVICE_STATE["matrix"]["reason"] if not is_matrix_available() else None
def is_matrix_check_complete() -> bool:
"""Check if the Matrix health check has been completed.
Returns:
True if health check has run, False if still pending
"""
return _MATRIX_CHECK_COMPLETE
"""Check if the Matrix health check has been completed."""
return _SERVICE_STATE["matrix"]["complete"]
def disable_matrix_features() -> None:
"""Manually disable all Matrix features (for testing/fallback).
This can be called if Matrix connectivity is lost after startup.
"""
global _MATRIX_AVAILABLE, _MATRIX_UNAVAILABLE_REASON
_MATRIX_AVAILABLE = False
_MATRIX_UNAVAILABLE_REASON = "Manually disabled or lost connection"
"""Manually disable all Matrix features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["matrix"]["available"] = False
_SERVICE_STATE["matrix"]["reason"] = "Manually disabled or lost connection"
logger.warning("[Matrix] Features manually disabled")
def enable_matrix_features() -> None:
"""Manually enable Matrix features (for testing/fallback).
This can be called if Matrix connectivity is restored after startup.
"""
global _MATRIX_AVAILABLE, _MATRIX_UNAVAILABLE_REASON
_MATRIX_AVAILABLE = True
_MATRIX_UNAVAILABLE_REASON = None
"""Manually enable Matrix features (for testing/fallback)."""
global _SERVICE_STATE
_SERVICE_STATE["matrix"]["available"] = True
_SERVICE_STATE["matrix"]["reason"] = None
logger.info("[Matrix] Features manually enabled")
def initialize_local_library_scan(config: Dict[str, Any]) -> None:
"""Initialize and scan local library at startup.
def initialize_local_library_scan(config: Dict[str, Any], emit_debug: bool = True) -> Tuple[bool, str]:
"""Initialize and scan all folder stores at startup.
Returns a tuple of (success, detail_message).
This ensures that any new files in the local library folder are indexed
Note: Individual store results are stored in _SERVICE_STATE["folder_stores"]
for the CLI to display as separate table rows.
This ensures that any new files in configured folder stores are indexed
and their sidecar files are imported and cleaned up.
"""
from config import get_local_storage_path
from helper.local_library import LocalLibraryInitializer
from helper.folder_store import LocalLibraryInitializer
from helper.store import Folder
logger.info("[Startup] Starting Local Library scan...")
logger.info("[Startup] Starting folder store scans...")
try:
storage_path = get_local_storage_path(config)
if not storage_path:
debug("⚠️ Local Library: SKIPPED - No storage path configured", file=sys.stderr)
return
# Get all configured folder stores from config
folder_sources = config.get("store", {}).get("folder", {})
if not isinstance(folder_sources, dict) or not folder_sources:
if emit_debug:
debug("⚠️ Folder stores: SKIPPED - No folder stores configured", file=sys.stderr)
return False, "No folder stores configured"
results = []
total_new_files = 0
total_sidecars = 0
failed_stores = []
store_results = {}
for store_name, store_config in folder_sources.items():
if not isinstance(store_config, dict):
continue
debug(f"Scanning local library at: {storage_path}", file=sys.stderr)
initializer = LocalLibraryInitializer(storage_path)
stats = initializer.scan_and_index()
store_path = store_config.get("path")
if not store_path:
continue
try:
from pathlib import Path
storage_path = Path(str(store_path)).expanduser()
if emit_debug:
debug(f"Scanning folder store '{store_name}' at: {storage_path}", file=sys.stderr)
# Migrate the folder store to hash-based naming (only runs once per location)
Folder.migrate_location(str(storage_path))
initializer = LocalLibraryInitializer(storage_path)
stats = initializer.scan_and_index()
# Accumulate stats
new_files = stats.get('files_new', 0)
sidecars = stats.get('sidecars_imported', 0)
total_new_files += new_files
total_sidecars += sidecars
# Record result for this store
if new_files > 0 or sidecars > 0:
result_detail = f"New: {new_files}, Sidecars: {sidecars}"
if emit_debug:
debug(f" {store_name}: {result_detail}", file=sys.stderr)
else:
result_detail = "Up to date"
if emit_debug:
debug(f" {store_name}: {result_detail}", file=sys.stderr)
results.append(f"{store_name}: {result_detail}")
store_results[store_name] = {
"path": str(storage_path),
"detail": result_detail,
"ok": True
}
except Exception as e:
logger.error(f"[Startup] Failed to scan folder store '{store_name}': {e}", exc_info=True)
if emit_debug:
debug(f" {store_name}: ERROR - {e}", file=sys.stderr)
failed_stores.append(store_name)
store_results[store_name] = {
"path": str(store_config.get("path", "?")),
"detail": f"ERROR - {e}",
"ok": False
}
# Log summary
new_files = stats.get('files_new', 0)
sidecars = stats.get('sidecars_imported', 0)
# Store individual results for CLI to display
_SERVICE_STATE["folder_stores"] = store_results
if new_files > 0 or sidecars > 0:
debug(f"✅ Local Library: Scanned - New files: {new_files}, Sidecars imported: {sidecars}", file=sys.stderr)
# Build detail message
if failed_stores:
detail = f"Scanned {len(results)} stores ({len(failed_stores)} failed); Total new: {total_new_files}, Sidecars: {total_sidecars}"
if emit_debug:
debug(f"Folder stores scan complete: {detail}", file=sys.stderr)
return len(failed_stores) < len(results), detail
else:
debug("✅ Local Library: Up to date", file=sys.stderr)
detail = f"Scanned {len(results)} stores; Total new: {total_new_files}, Sidecars: {total_sidecars}"
if emit_debug:
debug(f"Folder stores scan complete: {detail}", file=sys.stderr)
return True, detail
except Exception as e:
logger.error(f"[Startup] Failed to scan local library: {e}", exc_info=True)
debug(f"⚠️ Local Library: ERROR - Scan failed: {e}", file=sys.stderr)
logger.error(f"[Startup] Failed to scan folder stores: {e}", exc_info=True)
if emit_debug:
debug(f"⚠️ Folder stores: ERROR - Scan failed: {e}", file=sys.stderr)
return False, f"Scan failed: {e}"
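
The scan loop reads store paths from `config["store"]["folder"][<name>]["path"]`; a minimal sketch with placeholder store names and paths, assuming the function is imported from the module above.

```python
# Hypothetical folder-store config matching the loop above; names and paths are placeholders.
config = {
    "store": {
        "folder": {
            "default": {"path": "~/Videos/library"},
            "test": {"path": "C:/Users/Admin/test-store"},
        }
    }
}

ok, detail = initialize_local_library_scan(config, emit_debug=False)
# Per-store outcomes land in _SERVICE_STATE["folder_stores"] for the CLI table;
# detail summarizes totals, e.g. "Scanned 2 stores; Total new: 0, Sidecars: 0".
```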
def initialize_cookies_check() -> None:
"""Check for cookies.txt in the application root directory."""
def initialize_cookies_check(emit_debug: bool = True) -> Tuple[bool, str]:
"""Check for cookies.txt in the application root directory.
Returns a tuple of (found, detail_message).
"""
global _COOKIES_FILE_PATH
# Assume CLI.py is in the root
@@ -590,10 +496,12 @@ def initialize_cookies_check() -> None:
if cookies_path.exists():
_COOKIES_FILE_PATH = str(cookies_path)
debug(f"✅ Cookies: ENABLED - Found cookies.txt", file=sys.stderr)
if emit_debug:
debug(f"Cookies: ENABLED - Found cookies.txt", file=sys.stderr)
return True, str(cookies_path)
else:
_COOKIES_FILE_PATH = None
# debug(" Cookies: Using browser cookies (fallback)", file=sys.stderr)
return False, "Not found"
def get_cookies_file_path() -> Optional[str]:

File diff suppressed because it is too large.

models.py (225 lines changed)
View File

@@ -16,134 +16,183 @@ from typing import Any, Callable, Dict, List, Optional, Protocol, TextIO, Tuple
class PipeObject:
"""Unified pipeline object for tracking files, metadata, tags, and relationships through the pipeline.
This is the single source of truth for all result data in the pipeline. It can represent:
- Tag extraction results (IMDb, MusicBrainz, OpenLibrary lookups)
- Remote metadata fetches
- File operations with metadata/tags and relationship tracking
- Search results
- Files with version relationships (king/alt/related)
This is the single source of truth for all result data in the pipeline. Uses the hash+store
canonical pattern for file identification.
Attributes:
source: Source of the object (e.g., 'imdb', 'musicbrainz', 'libgen', 'debrid', 'file', etc.)
identifier: Unique identifier from the source (e.g., IMDb ID, MBID, magnet hash, file hash)
hash: SHA-256 hash of the file (canonical identifier)
store: Storage backend name (e.g., 'default', 'hydrus', 'test', 'home')
tags: List of extracted or assigned tags
title: Human-readable title if applicable
source_url: URL where the object came from
duration: Duration in seconds if applicable
metadata: Full metadata dictionary from source
remote_metadata: Additional remote metadata
warnings: Any warnings or issues encountered
mpv_metadata: MPV-specific metadata if applicable
file_path: Path to the file if this object represents a file
file_hash: SHA-256 hash of the file for integrity and relationship tracking
king_hash: Hash of the primary/master version of this file (for alternates)
alt_hashes: List of hashes for alternate versions of this file
related_hashes: List of hashes for related files (e.g., screenshots, editions)
path: Path to the file if this object represents a file
relationships: Relationship data (king/alt/related hashes)
is_temp: If True, this is a temporary/intermediate artifact that may be cleaned up
action: The cmdlet that created this object (format: 'cmdlet:cmdlet_name', e.g., 'cmdlet:get-file')
parent_id: Hash of the parent file in the pipeline chain (for tracking provenance/lineage)
action: The cmdlet that created this object (format: 'cmdlet:cmdlet_name')
parent_hash: Hash of the parent file in the pipeline chain (for tracking provenance/lineage)
extra: Additional fields not covered above
"""
source: str
identifier: str
hash: str
store: str
tags: List[str] = field(default_factory=list)
title: Optional[str] = None
url: Optional[str] = None
source_url: Optional[str] = None
duration: Optional[float] = None
metadata: Dict[str, Any] = field(default_factory=dict)
remote_metadata: Optional[Dict[str, Any]] = None
warnings: List[str] = field(default_factory=list)
mpv_metadata: Optional[Dict[str, Any]] = None
file_path: Optional[str] = None
file_hash: Optional[str] = None
king_hash: Optional[str] = None
alt_hashes: List[str] = field(default_factory=list)
related_hashes: List[str] = field(default_factory=list)
path: Optional[str] = None
relationships: Dict[str, Any] = field(default_factory=dict)
is_temp: bool = False
action: Optional[str] = None
parent_id: Optional[str] = None
parent_hash: Optional[str] = None
extra: Dict[str, Any] = field(default_factory=dict)
def register_as_king(self, file_hash: str) -> None:
"""Register this object as the king (primary) version of a file."""
self.king_hash = file_hash
def add_alternate(self, alt_hash: str) -> None:
"""Add an alternate version hash for this file."""
if alt_hash not in self.alt_hashes:
self.alt_hashes.append(alt_hash)
def add_related(self, related_hash: str) -> None:
"""Add a related file hash (e.g., screenshot, edition)."""
if related_hash not in self.related_hashes:
self.related_hashes.append(related_hash)
def add_relationship(self, rel_type: str, rel_hash: str) -> None:
"""Add a relationship hash.
Args:
rel_type: Relationship type ('king', 'alt', 'related')
rel_hash: Hash to add to the relationship
"""
if rel_type not in self.relationships:
self.relationships[rel_type] = []
if isinstance(self.relationships[rel_type], list):
if rel_hash not in self.relationships[rel_type]:
self.relationships[rel_type].append(rel_hash)
else:
# Single value (e.g., king), convert to that value
self.relationships[rel_type] = rel_hash
def get_relationships(self) -> Dict[str, Any]:
"""Get all relationships for this object."""
rels = {}
if self.king_hash:
rels["king"] = self.king_hash
if self.alt_hashes:
rels["alt"] = self.alt_hashes
if self.related_hashes:
rels["related"] = self.related_hashes
return rels
return self.relationships.copy() if self.relationships else {}
def debug_table(self) -> None:
"""Print a formatted debug table showing PipeObject state.
Only prints when debug logging is enabled. Useful for tracking
object state throughout the pipeline.
"""
try:
from helper.logger import is_debug_enabled, debug
if not is_debug_enabled():
return
except Exception:
return
# Prepare display values
hash_display = self.hash or "N/A"
store_display = self.store or "N/A"
title_display = self.title or "N/A"
tags_display = ", ".join(self.tags[:3]) if self.tags else "[]"
if len(self.tags) > 3:
tags_display += f" (+{len(self.tags) - 3} more)"
file_path_display = self.path or "N/A"
if file_path_display != "N/A" and len(file_path_display) > 50:
file_path_display = "..." + file_path_display[-47:]
url_display = self.url or "N/A"
if url_display != "N/A" and len(url_display) > 48:
url_display = url_display[:45] + "..."
relationships_display = "N/A"
if self.relationships:
rel_parts = []
for key, val in self.relationships.items():
if isinstance(val, list):
rel_parts.append(f"{key}({len(val)})")
else:
rel_parts.append(key)
relationships_display = ", ".join(rel_parts)
warnings_display = f"{len(self.warnings)} warning(s)" if self.warnings else "none"
# Print table
debug("┌─────────────────────────────────────────────────────────────┐")
debug("│ PipeObject Debug Info │")
debug("├─────────────────────────────────────────────────────────────┤")
debug(f"│ Hash : {hash_display:<48}")
debug(f"│ Store : {store_display:<48}")
debug(f"│ Title : {title_display:<48}")
debug(f"│ Tags : {tags_display:<48}")
debug(f"│ URL : {url_display:<48}")
debug(f"│ File Path : {file_path_display:<48}")
debug(f"│ Relationships: {relationships_display:<47}")
debug(f"│ Warnings : {warnings_display:<48}")
# Show extra keys as individual rows
if self.extra:
debug("├─────────────────────────────────────────────────────────────┤")
debug("│ Extra Fields: │")
for key, val in self.extra.items():
# Format value for display
if isinstance(val, (list, set)):
val_display = f"{type(val).__name__}({len(val)})"
elif isinstance(val, dict):
val_display = f"dict({len(val)})"
elif isinstance(val, (int, float)):
val_display = str(val)
else:
val_str = str(val)
val_display = val_str if len(val_str) <= 40 else val_str[:37] + "..."
# Truncate key if needed
key_display = key if len(key) <= 15 else key[:12] + "..."
debug(f"{key_display:<15}: {val_display:<42}")
if self.action:
debug("├─────────────────────────────────────────────────────────────┤")
action_display = self.action[:48]
debug(f"│ Action : {action_display:<48}")
if self.parent_hash:
if not self.action:
debug("├─────────────────────────────────────────────────────────────┤")
parent_display = self.parent_hash[:12] + "..." if len(self.parent_hash) > 12 else self.parent_hash
debug(f"│ Parent Hash : {parent_display:<48}")
debug("└─────────────────────────────────────────────────────────────┘")
def to_dict(self) -> Dict[str, Any]:
"""Serialize to dictionary, excluding None and empty values."""
data: Dict[str, Any] = {
"source": self.source,
"tags": self.tags,
"hash": self.hash,
"store": self.store,
}
if self.identifier:
data["id"] = self.identifier
if self.tags:
data["tags"] = self.tags
if self.title:
data["title"] = self.title
if self.url:
data["url"] = self.url
if self.source_url:
data["source_url"] = self.source_url
if self.duration is not None:
data["duration"] = self.duration
if self.metadata:
data["metadata"] = self.metadata
if self.remote_metadata is not None:
data["remote_metadata"] = self.remote_metadata
if self.mpv_metadata is not None:
data["mpv_metadata"] = self.mpv_metadata
if self.warnings:
data["warnings"] = self.warnings
if self.file_path:
data["file_path"] = self.file_path
if self.file_hash:
data["file_hash"] = self.file_hash
# Include pipeline chain tracking fields
if self.path:
data["path"] = self.path
if self.relationships:
data["relationships"] = self.relationships
if self.is_temp:
data["is_temp"] = self.is_temp
if self.action:
data["action"] = self.action
if self.parent_id:
data["parent_id"] = self.parent_id
# Include relationship data if present
rels = self.get_relationships()
if rels:
data["relationships"] = rels
if self.parent_hash:
data["parent_hash"] = self.parent_hash
# Add extra fields
data.update({k: v for k, v in self.extra.items() if v is not None})
return data
@property
def hash(self) -> str:
"""Compute SHA-256 hash from source and identifier."""
base = f"{self.source}:{self.identifier}"
return hashlib.sha256(base.encode('utf-8')).hexdigest()
# Backwards compatibility aliases
def as_dict(self) -> Dict[str, Any]:
"""Alias for to_dict() for backwards compatibility."""
return self.to_dict()
def to_serializable(self) -> Dict[str, Any]:
"""Alias for to_dict() for backwards compatibility."""
return self.to_dict()
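
A minimal sketch of the hash+store pattern the reworked PipeObject encodes; the hash, path, and tags are placeholders, and the class is assumed to be a dataclass (the `field()` defaults above imply it).

```python
# Placeholder values throughout; PipeObject is assumed to be a dataclass.
from models import PipeObject

obj = PipeObject(
    hash="00beb438e3c0" + "0" * 52,    # a full SHA-256 hex digest in practice
    store="local",
    title="yapping.m4a",
    path=r"C:\Users\Admin\Downloads\Audio\yapping.m4a",
    tags=["audio", "voice"],
)
obj.add_relationship("related", "a" * 64)   # appended under relationships["related"]

payload = obj.to_dict()   # hash/store always serialized; other fields only when set
obj.debug_table()         # prints the box table only when debug logging is enabled
```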
class FileRelationshipTracker:
"""Track relationships between files for sidecar creation.
@@ -235,6 +284,7 @@ class DownloadOptions:
clip_sections: Optional[str] = None
playlist_items: Optional[str] = None # yt-dlp --playlist-items format (e.g., "1-3,5,8")
no_playlist: bool = False # If True, pass --no-playlist to yt-dlp
quiet: bool = False # If True, suppress all console output (progress, debug logs)
class SendFunc(Protocol):
@@ -546,18 +596,25 @@ class ProgressBar:
class PipelineStageContext:
"""Context information for the current pipeline stage."""
def __init__(self, stage_index: int, total_stages: int):
def __init__(self, stage_index: int, total_stages: int, worker_id: Optional[str] = None):
self.stage_index = stage_index
self.total_stages = total_stages
self.is_last_stage = (stage_index == total_stages - 1)
self.worker_id = worker_id
self.emits: List[Any] = []
def emit(self, obj: Any) -> None:
"""Emit an object to the next pipeline stage."""
self.emits.append(obj)
def get_current_command_text(self) -> str:
"""Get the current command text (for backward compatibility)."""
# This is maintained for backward compatibility with old code
# In a real implementation, this would come from the stage context
return ""
def __repr__(self) -> str:
return f"PipelineStageContext(stage={self.stage_index}/{self.total_stages}, is_last={self.is_last_stage})"
return f"PipelineStageContext(stage={self.stage_index}/{self.total_stages}, is_last={self.is_last_stage}, worker_id={self.worker_id})"
# ============================================================================

View File

@@ -25,21 +25,18 @@ from models import PipelineStageContext
from helper.logger import log
def _is_selectable_table(table: Any) -> bool:
"""Return True when a table can be used for @ selection."""
return bool(table) and not getattr(table, "no_choice", False)
# ============================================================================
# PIPELINE GLOBALS (maintained for backward compatibility)
# PIPELINE STATE
# ============================================================================
# Current pipeline context (thread-local in real world, global here for simplicity)
# Current pipeline context
_CURRENT_CONTEXT: Optional[PipelineStageContext] = None
# Active execution state
_PIPE_EMITS: List[Any] = []
_PIPE_ACTIVE: bool = False
_PIPE_IS_LAST: bool = False
# Ephemeral handoff for direct pipelines (e.g., URL --screen-shot | ...)
_LAST_PIPELINE_CAPTURE: Optional[Any] = None
# Remember last search query to support refreshing results after pipeline actions
_LAST_SEARCH_QUERY: Optional[str] = None
@@ -52,25 +49,23 @@ _PIPELINE_LAST_ITEMS: List[Any] = []
# Store the last result table for @ selection syntax (e.g., @2, @2-5, @{1,3,5})
_LAST_RESULT_TABLE: Optional[Any] = None
_LAST_RESULT_ITEMS: List[Any] = []
# Subject for the current result table (e.g., the file whose tags/URLs are displayed)
# Subject for the current result table (e.g., the file whose tags/url are displayed)
_LAST_RESULT_SUBJECT: Optional[Any] = None
# History of result tables for @.. navigation (LIFO stack, max 20 tables)
_RESULT_TABLE_HISTORY: List[tuple[Optional[Any], List[Any], Optional[Any]]] = []
_MAX_RESULT_TABLE_HISTORY = 20
# Forward history for @,, navigation (LIFO stack for popped tables)
_RESULT_TABLE_FORWARD: List[tuple[Optional[Any], List[Any], Optional[Any]]] = []
# Current stage table for @N expansion (separate from history)
# Used to track the ResultTable with source_command + row_selection_args from current pipeline stage
# This is set by cmdlets that display tabular results (e.g., download-data showing formats)
# and used by CLI to expand @N into full commands like "download-data URL -item 2"
_CURRENT_STAGE_TABLE: Optional[Any] = None
# Items displayed by non-selectable commands (get-tag, delete-tag, etc.)
# These are available for @N selection but NOT saved to history
_DISPLAY_ITEMS: List[Any] = []
# Table for display-only commands (overlay)
# Used when a command wants to show a specific table formatting but not affect history
_DISPLAY_TABLE: Optional[Any] = None
# Subject for overlay/display-only tables (takes precedence over _LAST_RESULT_SUBJECT)
_DISPLAY_SUBJECT: Optional[Any] = None
@@ -98,7 +93,7 @@ _UI_LIBRARY_REFRESH_CALLBACK: Optional[Any] = None
# ============================================================================
def set_stage_context(context: Optional[PipelineStageContext]) -> None:
"""Internal: Set the current pipeline stage context."""
"""Set the current pipeline stage context."""
global _CURRENT_CONTEXT
_CURRENT_CONTEXT = context
@@ -126,26 +121,21 @@ def emit(obj: Any) -> None:
return 0
```
"""
# Try new context-based approach first
if _CURRENT_CONTEXT is not None:
import logging
logger = logging.getLogger(__name__)
logger.debug(f"[EMIT] Context-based: appending to _CURRENT_CONTEXT.emits. obj={obj}")
_CURRENT_CONTEXT.emit(obj)
return
def emit_list(objects: List[Any]) -> None:
"""Emit a list of objects to the next pipeline stage.
# Fallback to legacy global approach (for backward compatibility)
try:
import logging
logger = logging.getLogger(__name__)
logger.debug(f"[EMIT] Legacy: appending to _PIPE_EMITS. obj type={type(obj).__name__}, _PIPE_EMITS len before={len(_PIPE_EMITS)}")
_PIPE_EMITS.append(obj)
logger.debug(f"[EMIT] Legacy: _PIPE_EMITS len after={len(_PIPE_EMITS)}")
except Exception as e:
import logging
logger = logging.getLogger(__name__)
logger.error(f"[EMIT] Error appending to _PIPE_EMITS: {e}", exc_info=True)
pass
This allows cmdlets to emit multiple results that are tracked as a list,
enabling downstream cmdlets to process all of them or filter by metadata.
Args:
objects: List of objects to emit
"""
if _CURRENT_CONTEXT is not None:
_CURRENT_CONTEXT.emit(objects)
def print_if_visible(*args: Any, file=None, **kwargs: Any) -> None:
@@ -171,7 +161,7 @@ def print_if_visible(*args: Any, file=None, **kwargs: Any) -> None:
"""
try:
# Print if: not in a pipeline OR this is the last stage
should_print = (not _PIPE_ACTIVE) or _PIPE_IS_LAST
should_print = (_CURRENT_CONTEXT is None) or (_CURRENT_CONTEXT and _CURRENT_CONTEXT.is_last_stage)
# Always print to stderr regardless
if file is not None:
@@ -304,17 +294,17 @@ def clear_pending_pipeline_tail() -> None:
_PENDING_PIPELINE_SOURCE = None
def reset() -> None:
"""Reset all pipeline state. Called between pipeline executions."""
global _PIPE_EMITS, _PIPE_ACTIVE, _PIPE_IS_LAST, _PIPELINE_VALUES
global _LAST_PIPELINE_CAPTURE, _PIPELINE_REFRESHED, _PIPELINE_LAST_ITEMS
global _PIPELINE_COMMAND_TEXT, _LAST_RESULT_SUBJECT, _DISPLAY_SUBJECT
global _PENDING_PIPELINE_TAIL, _PENDING_PIPELINE_SOURCE
global _PIPELINE_VALUES, _LAST_SEARCH_QUERY, _PIPELINE_REFRESHED
global _PIPELINE_LAST_ITEMS, _PIPELINE_COMMAND_TEXT, _LAST_RESULT_SUBJECT
global _DISPLAY_SUBJECT, _PENDING_PIPELINE_TAIL, _PENDING_PIPELINE_SOURCE
global _CURRENT_CONTEXT
_PIPE_EMITS = []
_PIPE_ACTIVE = False
_PIPE_IS_LAST = False
_LAST_PIPELINE_CAPTURE = None
_CURRENT_CONTEXT = None
_LAST_SEARCH_QUERY = None
_PIPELINE_REFRESHED = False
_PIPELINE_LAST_ITEMS = []
_PIPELINE_VALUES = {}
@@ -327,13 +317,15 @@ def reset() -> None:
def get_emitted_items() -> List[Any]:
"""Get a copy of all items emitted by the current pipeline stage."""
return list(_PIPE_EMITS)
if _CURRENT_CONTEXT is not None:
return list(_CURRENT_CONTEXT.emits)
return []
def clear_emits() -> None:
"""Clear the emitted items list (called between stages)."""
global _PIPE_EMITS
_PIPE_EMITS = []
if _CURRENT_CONTEXT is not None:
_CURRENT_CONTEXT.emits.clear()
def set_last_selection(indices: Sequence[int]) -> None:
@@ -375,20 +367,8 @@ def clear_current_command_text() -> None:
_PIPELINE_COMMAND_TEXT = ""
def set_active(active: bool) -> None:
"""Internal: Set whether we're in a pipeline context."""
global _PIPE_ACTIVE
_PIPE_ACTIVE = active
def set_last_stage(is_last: bool) -> None:
"""Internal: Set whether this is the last stage of the pipeline."""
global _PIPE_IS_LAST
_PIPE_IS_LAST = is_last
def set_search_query(query: Optional[str]) -> None:
"""Internal: Set the last search query for refresh purposes."""
"""Set the last search query for refresh purposes."""
global _LAST_SEARCH_QUERY
_LAST_SEARCH_QUERY = query
@@ -399,7 +379,7 @@ def get_search_query() -> Optional[str]:
def set_pipeline_refreshed(refreshed: bool) -> None:
"""Internal: Track whether the pipeline already refreshed results."""
"""Track whether the pipeline already refreshed results."""
global _PIPELINE_REFRESHED
_PIPELINE_REFRESHED = refreshed
@@ -410,7 +390,7 @@ def was_pipeline_refreshed() -> bool:
def set_last_items(items: list) -> None:
"""Internal: Cache the last pipeline outputs."""
"""Cache the last pipeline outputs."""
global _PIPELINE_LAST_ITEMS
_PIPELINE_LAST_ITEMS = list(items) if items else []
@@ -420,17 +400,6 @@ def get_last_items() -> List[Any]:
return list(_PIPELINE_LAST_ITEMS)
def set_last_capture(obj: Any) -> None:
"""Internal: Store ephemeral handoff for direct pipelines."""
global _LAST_PIPELINE_CAPTURE
_LAST_PIPELINE_CAPTURE = obj
def get_last_capture() -> Optional[Any]:
"""Get ephemeral pipeline handoff (e.g., URL --screen-shot | ...)."""
return _LAST_PIPELINE_CAPTURE
def set_ui_library_refresh_callback(callback: Any) -> None:
"""Set a callback to be called when library content is updated.
@@ -501,6 +470,22 @@ def set_last_result_table(result_table: Optional[Any], items: Optional[List[Any]
_LAST_RESULT_TABLE = result_table
_LAST_RESULT_ITEMS = items or []
_LAST_RESULT_SUBJECT = subject
# Sort table by Title/Name column alphabetically if available
if result_table is not None and hasattr(result_table, 'sort_by_title') and not getattr(result_table, 'preserve_order', False):
try:
result_table.sort_by_title()
# Re-order items list to match the sorted table
if _LAST_RESULT_ITEMS and hasattr(result_table, 'rows'):
sorted_items = []
for row in result_table.rows:
src_idx = getattr(row, 'source_index', None)
if isinstance(src_idx, int) and 0 <= src_idx < len(_LAST_RESULT_ITEMS):
sorted_items.append(_LAST_RESULT_ITEMS[src_idx])
if len(sorted_items) == len(result_table.rows):
_LAST_RESULT_ITEMS = sorted_items
except Exception:
pass
def set_last_result_table_overlay(result_table: Optional[Any], items: Optional[List[Any]] = None, subject: Optional[Any] = None) -> None:
@@ -518,6 +503,22 @@ def set_last_result_table_overlay(result_table: Optional[Any], items: Optional[L
_DISPLAY_TABLE = result_table
_DISPLAY_ITEMS = items or []
_DISPLAY_SUBJECT = subject
# Sort table by Title/Name column alphabetically if available
if result_table is not None and hasattr(result_table, 'sort_by_title') and not getattr(result_table, 'preserve_order', False):
try:
result_table.sort_by_title()
# Re-order items list to match the sorted table
if _DISPLAY_ITEMS and hasattr(result_table, 'rows'):
sorted_items = []
for row in result_table.rows:
src_idx = getattr(row, 'source_index', None)
if isinstance(src_idx, int) and 0 <= src_idx < len(_DISPLAY_ITEMS):
sorted_items.append(_DISPLAY_ITEMS[src_idx])
if len(sorted_items) == len(result_table.rows):
_DISPLAY_ITEMS = sorted_items
except Exception:
pass
def set_last_result_table_preserve_history(result_table: Optional[Any], items: Optional[List[Any]] = None, subject: Optional[Any] = None) -> None:
@@ -567,7 +568,7 @@ def restore_previous_result_table() -> bool:
True if a previous table was restored, False if history is empty
"""
global _LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT
global _RESULT_TABLE_HISTORY, _DISPLAY_ITEMS, _DISPLAY_TABLE, _DISPLAY_SUBJECT
global _RESULT_TABLE_HISTORY, _RESULT_TABLE_FORWARD, _DISPLAY_ITEMS, _DISPLAY_TABLE, _DISPLAY_SUBJECT
# If we have an active overlay (display items/table), clear it to "go back" to the underlying table
if _DISPLAY_ITEMS or _DISPLAY_TABLE or _DISPLAY_SUBJECT is not None:
@@ -579,6 +580,9 @@ def restore_previous_result_table() -> bool:
if not _RESULT_TABLE_HISTORY:
return False
# Save current state to forward stack before popping
_RESULT_TABLE_FORWARD.append((_LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT))
# Pop from history and restore
prev = _RESULT_TABLE_HISTORY.pop()
if isinstance(prev, tuple) and len(prev) >= 3:
@@ -595,6 +599,44 @@ def restore_previous_result_table() -> bool:
return True
def restore_next_result_table() -> bool:
"""Restore the next result table from forward history (for @,, navigation).
Returns:
True if a next table was restored, False if forward history is empty
"""
global _LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT
global _RESULT_TABLE_HISTORY, _RESULT_TABLE_FORWARD, _DISPLAY_ITEMS, _DISPLAY_TABLE, _DISPLAY_SUBJECT
# If we have an active overlay (display items/table), clear it to "go forward" to the underlying table
if _DISPLAY_ITEMS or _DISPLAY_TABLE or _DISPLAY_SUBJECT is not None:
_DISPLAY_ITEMS = []
_DISPLAY_TABLE = None
_DISPLAY_SUBJECT = None
return True
if not _RESULT_TABLE_FORWARD:
return False
# Save current state to history stack before popping forward
_RESULT_TABLE_HISTORY.append((_LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT))
# Pop from forward stack and restore
next_state = _RESULT_TABLE_FORWARD.pop()
if isinstance(next_state, tuple) and len(next_state) >= 3:
_LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT = next_state[0], next_state[1], next_state[2]
elif isinstance(next_state, tuple) and len(next_state) == 2:
_LAST_RESULT_TABLE, _LAST_RESULT_ITEMS = next_state
_LAST_RESULT_SUBJECT = None
else:
_LAST_RESULT_TABLE, _LAST_RESULT_ITEMS, _LAST_RESULT_SUBJECT = None, [], None
# Clear display items so get_last_result_items() falls back to restored items
_DISPLAY_ITEMS = []
_DISPLAY_TABLE = None
_DISPLAY_SUBJECT = None
return True
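
A minimal sketch of the back/forward semantics these two functions implement for `@..` and `@,,` navigation; the import is omitted because the module name is not shown in this diff.

```python
# Go back one table (clears an overlay or pops history onto the forward stack),
# then immediately go forward again by popping the forward stack.
if restore_previous_result_table():
    restore_next_result_table()
```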
def get_display_table() -> Optional[Any]:
"""Get the current display overlay table.
@@ -637,9 +679,15 @@ def get_last_result_items() -> List[Any]:
# Prioritize items from display commands (get-tag, delete-tag, etc.)
# These are available for immediate @N selection
if _DISPLAY_ITEMS:
if _DISPLAY_TABLE is not None and not _is_selectable_table(_DISPLAY_TABLE):
return []
return _DISPLAY_ITEMS
# Fall back to items from last search/selectable command
return _LAST_RESULT_ITEMS
if _LAST_RESULT_TABLE is None:
return _LAST_RESULT_ITEMS
if _is_selectable_table(_LAST_RESULT_TABLE):
return _LAST_RESULT_ITEMS
return []
def get_last_result_table_source_command() -> Optional[str]:
@@ -648,7 +696,7 @@ def get_last_result_table_source_command() -> Optional[str]:
Returns:
Command name (e.g., 'download-data') or None if not set
"""
if _LAST_RESULT_TABLE and hasattr(_LAST_RESULT_TABLE, 'source_command'):
if _is_selectable_table(_LAST_RESULT_TABLE) and hasattr(_LAST_RESULT_TABLE, 'source_command'):
return _LAST_RESULT_TABLE.source_command
return None
@@ -659,7 +707,7 @@ def get_last_result_table_source_args() -> List[str]:
Returns:
List of arguments (e.g., ['https://example.com']) or empty list
"""
if _LAST_RESULT_TABLE and hasattr(_LAST_RESULT_TABLE, 'source_args'):
if _is_selectable_table(_LAST_RESULT_TABLE) and hasattr(_LAST_RESULT_TABLE, 'source_args'):
return _LAST_RESULT_TABLE.source_args or []
return []
@@ -673,7 +721,7 @@ def get_last_result_table_row_selection_args(row_index: int) -> Optional[List[st
Returns:
Selection arguments (e.g., ['-item', '3']) or None
"""
if _LAST_RESULT_TABLE and hasattr(_LAST_RESULT_TABLE, 'rows'):
if _is_selectable_table(_LAST_RESULT_TABLE) and hasattr(_LAST_RESULT_TABLE, 'rows'):
if 0 <= row_index < len(_LAST_RESULT_TABLE.rows):
row = _LAST_RESULT_TABLE.rows[row_index]
if hasattr(row, 'selection_args'):
@@ -696,13 +744,18 @@ def set_current_stage_table(result_table: Optional[Any]) -> None:
_CURRENT_STAGE_TABLE = result_table
def get_current_stage_table() -> Optional[Any]:
"""Get the current pipeline stage table (if any)."""
return _CURRENT_STAGE_TABLE
def get_current_stage_table_source_command() -> Optional[str]:
"""Get the source command from the current pipeline stage table.
Returns:
Command name (e.g., 'download-data') or None
"""
if _CURRENT_STAGE_TABLE and hasattr(_CURRENT_STAGE_TABLE, 'source_command'):
if _is_selectable_table(_CURRENT_STAGE_TABLE) and hasattr(_CURRENT_STAGE_TABLE, 'source_command'):
return _CURRENT_STAGE_TABLE.source_command
return None
@@ -713,7 +766,7 @@ def get_current_stage_table_source_args() -> List[str]:
Returns:
List of arguments or empty list
"""
if _CURRENT_STAGE_TABLE and hasattr(_CURRENT_STAGE_TABLE, 'source_args'):
if _is_selectable_table(_CURRENT_STAGE_TABLE) and hasattr(_CURRENT_STAGE_TABLE, 'source_args'):
return _CURRENT_STAGE_TABLE.source_args or []
return []
@@ -727,7 +780,7 @@ def get_current_stage_table_row_selection_args(row_index: int) -> Optional[List[
Returns:
Selection arguments or None
"""
if _CURRENT_STAGE_TABLE and hasattr(_CURRENT_STAGE_TABLE, 'rows'):
if _is_selectable_table(_CURRENT_STAGE_TABLE) and hasattr(_CURRENT_STAGE_TABLE, 'rows'):
if 0 <= row_index < len(_CURRENT_STAGE_TABLE.rows):
row = _CURRENT_STAGE_TABLE.rows[row_index]
if hasattr(row, 'selection_args'):
@@ -735,23 +788,21 @@ def get_current_stage_table_row_selection_args(row_index: int) -> Optional[List[
return None
def get_current_stage_table_row_source_index(row_index: int) -> Optional[int]:
"""Get the original source index for a row in the current stage table.
Useful when the table has been sorted for display but selections should map
back to the original item order (e.g., playlist or provider order).
"""
if _is_selectable_table(_CURRENT_STAGE_TABLE) and hasattr(_CURRENT_STAGE_TABLE, 'rows'):
if 0 <= row_index < len(_CURRENT_STAGE_TABLE.rows):
row = _CURRENT_STAGE_TABLE.rows[row_index]
return getattr(row, 'source_index', None)
return None
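
A minimal usage sketch (not committed code) of how a caller might map a sorted display row back to the original item list via source_index; the `pipeline_context` import path and the `items` list are assumptions for illustration.
```python
import pipeline_context  # assumed import path for the module above

def resolve_selection(items, row_index):
    """Map a displayed row back to the source item, honouring sorted views."""
    source_index = pipeline_context.get_current_stage_table_row_source_index(row_index)
    if source_index is not None and 0 <= source_index < len(items):
        return items[source_index]
    # No source_index recorded: fall back to the displayed position.
    return items[row_index] if 0 <= row_index < len(items) else None
```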
def clear_last_result() -> None:
"""Clear the stored last result table and items."""
global _LAST_RESULT_TABLE, _LAST_RESULT_ITEMS
_LAST_RESULT_TABLE = None
_LAST_RESULT_ITEMS = []
def emit_list(objects: List[Any]) -> None:
"""Emit a list of PipeObjects to the next pipeline stage.
This allows cmdlets to emit multiple results that are tracked as a list,
enabling downstream cmdlets to process all of them or filter by metadata.
Args:
objects: List of PipeObject instances or dicts to emit
"""
if _CURRENT_CONTEXT is not None:
_CURRENT_CONTEXT.emit(objects)
else:
_PIPE_EMITS.append(objects)
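
As a hypothetical usage sketch only (the dict field names are illustrative, not a documented schema, and the import path is an assumption), a cmdlet could hand several results to the next stage like this:
```python
from pipeline_context import emit_list  # assumed import path

def emit_downloads(downloads):
    """Hand every downloaded file to the next stage as its own object."""
    objects = [
        # Field names are illustrative; emit_list also accepts PipeObject instances.
        {"hash": item["hash"], "store": "local", "path": item["path"]}
        for item in downloads
    ]
    emit_list(objects)
```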

View File

@@ -106,7 +106,7 @@ dev = [
mm = "medeia_macina.cli_entry:main"
medeia = "medeia_macina.cli_entry:main"
[project.urls]
[project.url]
Homepage = "https://github.com/yourusername/medeia-macina"
Documentation = "https://medeia-macina.readthedocs.io"
Repository = "https://github.com/yourusername/medeia-macina.git"

View File

@@ -114,6 +114,8 @@ class ResultRow:
columns: List[ResultColumn] = field(default_factory=list)
selection_args: Optional[List[str]] = None
"""Arguments to use for this row when selected via @N syntax (e.g., ['-item', '3'])"""
source_index: Optional[int] = None
"""Original insertion order index (used to map sorted views back to source items)."""
def add_column(self, name: str, value: Any) -> None:
"""Add a column to this row."""
@@ -166,13 +168,14 @@ class ResultTable:
>>> print(result_table)
"""
def __init__(self, title: str = "", title_width: int = 80, max_columns: int = None):
def __init__(self, title: str = "", title_width: int = 80, max_columns: int = None, preserve_order: bool = False):
"""Initialize a result table.
Args:
title: Optional title for the table
title_width: Width for formatting the title line
max_columns: Maximum number of columns to display (None for unlimited, default: 5 for search results)
preserve_order: When True, skip automatic sorting so row order matches source
"""
self.title = title
self.title_width = title_width
@@ -187,10 +190,25 @@ class ResultTable:
"""Base arguments for the source command"""
self.header_lines: List[str] = []
"""Optional metadata lines rendered under the title"""
self.preserve_order: bool = preserve_order
"""If True, skip automatic sorting so display order matches input order."""
self.no_choice: bool = False
"""When True, suppress row numbers/selection to make the table non-interactive."""
def set_no_choice(self, no_choice: bool = True) -> "ResultTable":
"""Mark the table as non-interactive (no row numbers, no selection parsing)."""
self.no_choice = bool(no_choice)
return self
def set_preserve_order(self, preserve: bool = True) -> "ResultTable":
"""Configure whether this table should skip automatic sorting."""
self.preserve_order = bool(preserve)
return self
def add_row(self) -> ResultRow:
"""Add a new row to the table and return it for configuration."""
row = ResultRow()
row.source_index = len(self.rows)
self.rows.append(row)
return row
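
A short sketch (module path and row data are assumptions) of how preserve_order, no_choice, and the per-row source_index combine when building a table:
```python
from result_table import ResultTable  # module path is an assumption

# Keep provider order and render without selection numbers.
table = ResultTable("Playlist preview", preserve_order=True).set_no_choice()
for title in ["intro.mp3", "talk.mp3", "outro.mp3"]:
    table.add_row().add_column("Title", title)  # source_index follows insertion order

table.sort_by_title()  # no-op: preserve_order short-circuits the sort
print(table)           # rendered without the '#' column
```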
@@ -210,6 +228,50 @@ class ResultTable:
self.source_command = command
self.source_args = args or []
return self
def init_command(self, title: str, command: str, args: Optional[List[str]] = None, preserve_order: bool = False) -> "ResultTable":
"""Initialize table with title, command, args, and preserve_order in one call.
Consolidates common initialization pattern: ResultTable(title) + set_source_command(cmd, args) + set_preserve_order(preserve_order)
Args:
title: Table title
command: Source command name
args: Command arguments
preserve_order: Whether to preserve input row order
Returns:
self for method chaining
"""
self.title = title
self.source_command = command
self.source_args = args or []
self.preserve_order = preserve_order
return self
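
For example (a sketch with placeholder command and arguments; the module path is an assumption), the one-call initializer replaces the three-step pattern:
```python
from result_table import ResultTable  # module path is an assumption

# Before: three separate calls
table = ResultTable("Search results")
table.set_source_command("search-file", ["-tag", "music"])
table.set_preserve_order(True)

# After: the equivalent single call, chaining-friendly
table = ResultTable().init_command(
    "Search results", "search-file", ["-tag", "music"], preserve_order=True
)
```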
def copy_with_title(self, new_title: str) -> "ResultTable":
"""Create a new table copying settings from this one but with a new title.
Consolidates pattern: new_table = ResultTable(title); new_table.set_source_command(...)
Useful for intermediate processing that needs to preserve source command but update display title.
Args:
new_title: New title for the copied table
Returns:
New ResultTable with copied settings and new title
"""
new_table = ResultTable(
title=new_title,
title_width=self.title_width,
max_columns=self.max_columns,
preserve_order=self.preserve_order
)
new_table.source_command = self.source_command
new_table.source_args = list(self.source_args) if self.source_args else []
new_table.input_options = dict(self.input_options) if self.input_options else {}
new_table.no_choice = self.no_choice
return new_table
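
And a sketch of copy_with_title for an intermediate stage that re-titles the display while keeping the originating command (names and data are illustrative; the module path is an assumption):
```python
from result_table import ResultTable  # module path is an assumption

source = ResultTable().init_command("Search results", "search-file", ["music"])
source.add_row().add_column("Title", "yapping.m4a")

# Re-title for display; source_command/source_args carry over so @N
# re-selection still resolves against the original command.
filtered = source.copy_with_title("Search results (audio only)")
filtered.add_row().add_column("Title", "yapping.m4a")
```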
def set_row_selection_args(self, row_index: int, selection_args: List[str]) -> None:
"""Set the selection arguments for a specific row.
@@ -252,6 +314,39 @@ class ResultTable:
self.set_header_line(summary)
return summary
def sort_by_title(self) -> "ResultTable":
"""Sort rows alphabetically by Title or Name column.
Looks for columns named 'Title', 'Name', or 'Tag' (in that order).
Case-insensitive sort. Returns self for chaining.
Note: source_index values are left untouched by the sort, so @N selections
can still be mapped back to each row's original (pre-sort) position.
"""
if getattr(self, "preserve_order", False):
return self
# Find the title column (try Title, Name, Tag in order)
title_col_idx = None
for row in self.rows:
if not row.columns:
continue
for idx, col in enumerate(row.columns):
col_lower = col.name.lower()
if col_lower in ("title", "name", "tag"):
title_col_idx = idx
break
if title_col_idx is not None:
break
if title_col_idx is None:
# No title column found, return unchanged
return self
# Sort rows by the title column value (case-insensitive)
self.rows.sort(key=lambda row: row.columns[title_col_idx].value.lower() if title_col_idx < len(row.columns) else "")
return self
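
A small illustration of the sort (row values invented, module path assumed): rows are reordered by the first Title/Name/Tag column, while each row's source_index keeps pointing at its original position:
```python
from result_table import ResultTable  # module path is an assumption

table = ResultTable("Tags")
for name in ["zebra", "alpha", "monkey"]:
    table.add_row().add_column("Tag", name)

table.sort_by_title()
print([row.get_column("Tag") for row in table.rows])  # ['alpha', 'monkey', 'zebra']
print([row.source_index for row in table.rows])       # [1, 2, 0]
```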
def add_result(self, result: Any) -> "ResultTable":
"""Add a result object (SearchResult, PipeObject, ResultItem, TagItem, or dict) as a row.
@@ -338,8 +433,7 @@ class ResultTable:
# Size (for files)
if hasattr(result, 'size_bytes') and result.size_bytes:
size_mb = result.size_bytes / (1024 * 1024)
row.add_column("Size", f"{size_mb:.1f} MB")
row.add_column("Size (Mb)", _format_size(result.size_bytes, integer_only=True))
# Annotations
if hasattr(result, 'annotations') and result.annotations:
@@ -385,8 +479,7 @@ class ResultTable:
# Size (for files) - integer MB only
if hasattr(item, 'size_bytes') and item.size_bytes:
size_mb = int(item.size_bytes / (1024 * 1024))
row.add_column("Size", f"{size_mb} MB")
row.add_column("Size (Mb)", _format_size(item.size_bytes, integer_only=True))
def _add_tag_item(self, row: ResultRow, item: Any) -> None:
"""Extract and add TagItem fields to row (compact tag display).
@@ -421,8 +514,8 @@ class ResultTable:
row.add_column("Title", obj.title[:50] + ("..." if len(obj.title) > 50 else ""))
# File info
if hasattr(obj, 'file_path') and obj.file_path:
file_str = str(obj.file_path)
if hasattr(obj, 'path') and obj.path:
file_str = str(obj.path)
if len(file_str) > 60:
file_str = "..." + file_str[-57:]
row.add_column("Path", file_str)
@@ -467,8 +560,8 @@ class ResultTable:
def is_hidden_field(field_name: Any) -> bool:
# Hide internal/metadata fields
hidden_fields = {
'__', 'id', 'action', 'parent_id', 'is_temp', 'file_path', 'extra',
'target', 'hash', 'hash_hex', 'file_hash'
'__', 'id', 'action', 'parent_id', 'is_temp', 'path', 'extra',
'target', 'hash', 'hash_hex', 'file_hash', 'tags', 'tag_summary', 'name'
}
if isinstance(field_name, str):
if field_name.startswith('__'):
@@ -551,15 +644,12 @@ class ResultTable:
# Only add priority groups if we haven't already filled columns from 'columns' field
if column_count == 0:
# Priority field groups - uses first matching field in each group
# Explicitly set which columns to display in order
priority_groups = [
('title | name | filename', ['title', 'name', 'filename']),
('title', ['title']),
('ext', ['ext']),
('origin | source | store', ['origin', 'source', 'store']),
('size | size_bytes', ['size', 'size_bytes']),
('type | media_kind | kind', ['type', 'media_kind', 'kind']),
('tags | tag_summary', ['tags', 'tag_summary']),
('detail | description', ['detail', 'description']),
('size', ['size', 'size_bytes']),
('store', ['store', 'origin', 'source']),
]
# Add priority field groups first - use first match in each group
@@ -568,14 +658,22 @@ class ResultTable:
break
for field in field_options:
if field in visible_data and field not in added_fields:
value_str = format_value(visible_data[field])
# Special handling for size fields - format as MB integer
if field in ['size', 'size_bytes']:
value_str = _format_size(visible_data[field], integer_only=True)
else:
value_str = format_value(visible_data[field])
if len(value_str) > 60:
value_str = value_str[:57] + "..."
# Special case for Origin/Source -> Store to match user preference
col_name = field.replace('_', ' ').title()
if field in ['origin', 'source']:
# Map field names to display column names
if field in ['store', 'origin', 'source']:
col_name = "Store"
elif field in ['size', 'size_bytes']:
col_name = "Size (Mb)"
else:
col_name = field.replace('_', ' ').title()
row.add_column(col_name, value_str)
added_fields.add(field)
@@ -583,17 +681,7 @@ class ResultTable:
break # Use first match in this group, skip rest
# Add remaining fields only if we haven't hit max_columns (and no explicit columns were set)
if column_count < self.max_columns:
for key, value in visible_data.items():
if column_count >= self.max_columns:
break
if key not in added_fields: # Only add if not already added
value_str = format_value(value)
if len(value_str) > 40:
value_str = value_str[:37] + "..."
row.add_column(key.replace('_', ' ').title(), value_str)
added_fields.add(key) # Track in added_fields to prevent re-adding
column_count += 1
# Don't add any remaining fields - only use priority_groups for dict results
# Check for selection args
if '_selection_args' in data:
@@ -637,8 +725,8 @@ class ResultTable:
value_width
)
# Calculate row number column width
num_width = len(str(len(self.rows))) + 1 # +1 for padding
# Calculate row number column width (skip if no-choice)
num_width = 0 if self.no_choice else len(str(len(self.rows))) + 1
# Preserve column order
column_names = list(col_widths.keys())
@@ -647,7 +735,7 @@ class ResultTable:
cap = 5 if name.lower() == "ext" else 90
return min(col_widths[name], cap)
widths = [num_width] + [capped_width(name) for name in column_names]
widths = ([] if self.no_choice else [num_width]) + [capped_width(name) for name in column_names]
base_inner_width = sum(widths) + (len(widths) - 1) * 3 # account for " | " separators
# Compute final table width (with side walls) to accommodate headers/titles
@@ -668,7 +756,7 @@ class ResultTable:
# Title block
if self.title:
lines.append("|" + "=" * (table_width - 2) + "|")
lines.append(wrap(self.title.center(table_width - 2)))
lines.append(wrap(self.title.ljust(table_width - 2)))
lines.append("|" + "=" * (table_width - 2) + "|")
# Optional header metadata lines
@@ -676,8 +764,8 @@ class ResultTable:
lines.append(wrap(meta))
# Add header with # column
header_parts = ["#".ljust(num_width)]
separator_parts = ["-" * num_width]
header_parts = [] if self.no_choice else ["#".ljust(num_width)]
separator_parts = [] if self.no_choice else ["-" * num_width]
for col_name in column_names:
width = capped_width(col_name)
header_parts.append(col_name.ljust(width))
@@ -688,7 +776,7 @@ class ResultTable:
# Add rows with row numbers
for row_num, row in enumerate(self.rows, 1):
row_parts = [str(row_num).ljust(num_width)]
row_parts = [] if self.no_choice else [str(row_num).ljust(num_width)]
for col_name in column_names:
width = capped_width(col_name)
col_value = row.get_column(col_name) or ""
@@ -785,6 +873,11 @@ class ResultTable:
If accept_args=False: List of 0-based indices, or None if cancelled
If accept_args=True: Dict with "indices" and "args" keys, or None if cancelled
"""
if self.no_choice:
print(f"\n{self}")
print("Selection is disabled for this table.")
return None
# Display the table
print(f"\n{self}")
@@ -832,6 +925,9 @@ class ResultTable:
Returns:
List of 0-based indices, or None if invalid
"""
if self.no_choice:
return None
indices = set()
# Split by comma for multiple selections
@@ -1206,14 +1302,15 @@ def _format_duration(duration: Any) -> str:
return ""
def _format_size(size: Any) -> str:
def _format_size(size: Any, integer_only: bool = False) -> str:
"""Format file size as human-readable string.
Args:
size: Size in bytes or already formatted string
integer_only: If True, return a bare integer count (MB, falling back to KB or bytes) for table display, e.g. "250" rather than "250.5 MB"
Returns:
Formatted size string (e.g., "1.5 MB", "250 KB")
Formatted size string (e.g., "1.5 MB"), or a bare integer such as "250" when integer_only=True
"""
if isinstance(size, str):
return size if size else ""
@@ -1223,11 +1320,22 @@ def _format_size(size: Any) -> str:
if bytes_val < 0:
return ""
for unit, divisor in [("GB", 1024**3), ("MB", 1024**2), ("KB", 1024)]:
if bytes_val >= divisor:
return f"{bytes_val / divisor:.1f} {unit}"
return f"{bytes_val} B"
if integer_only:
# For table display: always show as integer MB if >= 1MB
mb_val = int(bytes_val / (1024 * 1024))
if mb_val > 0:
return str(mb_val)
kb_val = int(bytes_val / 1024)
if kb_val > 0:
return str(kb_val)
return str(bytes_val)
else:
# For descriptions: show with one decimal place
for unit, divisor in [("GB", 1024**3), ("MB", 1024**2), ("KB", 1024)]:
if bytes_val >= divisor:
return f"{bytes_val / divisor:.1f} {unit}"
return f"{bytes_val} B"
except (ValueError, TypeError):
return ""

View File

@@ -0,0 +1,10 @@
import importlib
import traceback
import sys
try:
importlib.import_module('cmdlets')
print('cmdlets imported OK')
except Exception:
traceback.print_exc()
sys.exit(1)

View File

@@ -0,0 +1,8 @@
import importlib, traceback, sys
try:
importlib.import_module('cmdlets.download_media')
print('download_media imported OK')
except Exception:
traceback.print_exc()
sys.exit(1)

View File

@@ -0,0 +1,5 @@
from pathlib import Path
p = Path('cmdlets/_shared.py')
for i, line in enumerate(p.read_text().splitlines(), start=1):
if 1708 <= i <= 1720:
print(f"{i:4}: {repr(line)}")

View File

@@ -0,0 +1,24 @@
from pathlib import Path
import re
p = Path('cmdlets/_shared.py')
src = p.read_text(encoding='utf-8')
lines = src.splitlines(True)
changed = False
new_lines = []
for line in lines:
m = re.match(r'^(?P<ws>[ \t]*)', line)
ws = m.group('ws') if m else ''
if '\t' in ws:
new_ws = ws.replace('\t', ' ')
new_line = new_ws + line[len(ws):]
new_lines.append(new_line)
changed = True
else:
new_lines.append(line)
if changed:
p.write_text(''.join(new_lines), encoding='utf-8')
print('Normalized leading tabs to spaces in', p)
else:
print('No leading tabs found; no changes made')

View File

@@ -0,0 +1,160 @@
#!/usr/bin/env python3
"""
Careful refactoring of download_data.py to class-based pattern.
Handles nested functions and inner definitions correctly.
"""
import re
from pathlib import Path
def refactor_download_data():
backup_file = Path('cmdlets/download_data_backup.py')
output_file = Path('cmdlets/download_data.py')
print(f"Reading: {backup_file}")
content = backup_file.read_text(encoding='utf-8')
lines = content.split('\n')
output = []
i = 0
in_cmdlet_def = False
skip_old_run_wrapper = False
class_added = False
while i < len(lines):
line = lines[i]
# Skip old _run wrapper function
if line.strip().startswith('def _run(result: Any'):
while i < len(lines):
i += 1
if lines[i] and not lines[i][0].isspace():
break
continue
# Skip old CMDLET definition
if line.strip().startswith('CMDLET = Cmdlet('):
while i < len(lines):
i += 1
if lines[i].strip() == ')':
i += 1
break
output.append('')
output.append('# Create and register the cmdlet')
output.append('CMDLET = Download_Data()')
output.append('')
continue
# Insert class definition before first top-level helper
if not class_added and line.strip().startswith('def _download_torrent_worker('):
# Add class header with __init__ and run()
output.extend([
'',
'',
'class Download_Data(Cmdlet):',
' """Class-based download-data cmdlet with self-registration."""',
'',
' def __init__(self) -> None:',
' """Initialize download-data cmdlet."""',
' super().__init__(',
' name="download-data",',
' summary="Download data from url with playlist/clip support using yt-dlp",',
' usage="download-data <url> [options] or search-file | download-data [options]",',
' alias=["download", "dl"],',
' arg=[',
' CmdletArg(name="url", type="string", required=False, description="URL to download (HTTP/HTTPS or file with URL list)", variadic=True),',
' CmdletArg(name="-url", type="string", description="URL to download (alias for positional argument)", variadic=True),',
' CmdletArg(name="list-formats", type="flag", description="List available formats without downloading"),',
' CmdletArg(name="audio", type="flag", alias="a", description="Download audio only (extract from video)"),',
' CmdletArg(name="video", type="flag", alias="v", description="Download video (default if not specified)"),',
' CmdletArg(name="format", type="string", alias="fmt", description="Explicit yt-dlp format selector (e.g., bestvideo+bestaudio)"),',
' CmdletArg(name="clip", type="string", description="Extract time range: MM:SS-MM:SS (e.g., 34:03-35:08) or seconds"),',
' CmdletArg(name="section", type="string", description="Download sections (yt-dlp only): TIME_RANGE[,TIME_RANGE...] (e.g., 1:30-1:35,0:05-0:15)"),',
' CmdletArg(name="cookies", type="string", description="Path to cookies.txt file for authentication"),',
' CmdletArg(name="torrent", type="flag", description="Download torrent/magnet via AllDebrid (requires API key in config)"),',
' CmdletArg(name="wait", type="float", description="Wait time (seconds) for magnet processing timeout"),',
' CmdletArg(name="background", type="flag", alias="bg", description="Start download in background and return to prompt immediately"),',
' CmdletArg(name="item", type="string", alias="items", description="Item selection for playlists/formats: use -item N to select format N, or -item to show table for @N selection in next command"),',
' SharedArgs.STORAGE,',
' ],',
' detail=["Download media from url with advanced features.", "", "See help for full usage examples."],',
' exec=self.run,',
' )',
' self.register()',
'',
' def run(self, result: Any, args: Sequence[str], config: Dict[str, Any]) -> int:',
' """Main execution method."""',
' stage_ctx = pipeline_context.get_stage_context()',
' in_pipeline = stage_ctx is not None and getattr(stage_ctx, "total_stages", 1) > 1',
' if in_pipeline and isinstance(config, dict):',
' config["_quiet_background_output"] = True',
' return self._run_impl(result, args, config, emit_results=True)',
'',
' # ' + '='*70,
' # HELPER METHODS',
' # ' + '='*70,
'',
])
class_added = True
# Convert top-level helper functions to static methods
if class_added and line and not line[0].isspace() and line.strip().startswith('def _'):
output.append(' @staticmethod')
output.append(f' {line}')
i += 1
# Copy function body with indentation
while i < len(lines):
next_line = lines[i]
# Stop at next top-level definition
if next_line and not next_line[0].isspace() and (next_line.strip().startswith(('def ', 'class ', 'CMDLET'))):
break
# Add indentation
if next_line.strip():
output.append(f' {next_line}')
else:
output.append(next_line)
i += 1
continue
output.append(line)
i += 1
result_text = '\n'.join(output)
# NOW: Update function calls carefully
# Only update calls in _run_impl, not in nested function definitions
# Pattern: match _func( but NOT when it's after "def " on the same line
helper_funcs = [
'_download_torrent_worker', '_guess_libgen_title', '_is_libgen_entry',
'_download_libgen_entry', '_libgen_background_worker',
'_start_libgen_background_worker', '_run_pipeline_tail',
'_download_http_background_worker', '_start_http_background_download',
'_parse_torrent_file', '_download_torrent_file', '_is_torrent_file_or_url',
'_process_torrent_input', '_show_playlist_table', '_parse_time_range',
'_parse_section_ranges', '_parse_playlist_selection_indices',
'_select_playlist_entries', '_sanitize_title_for_filename',
'_find_playlist_files_from_entries', '_snapshot_playlist_paths',
'_is_openlibrary_downloadable', '_as_dict', '_is_youtube_url',
]
# Split into lines for careful replacement
result_lines = result_text.split('\n')
for idx, line in enumerate(result_lines):
# Skip lines that are function definitions
if 'def ' in line:
continue
# Replace helper function calls with self.
for func in helper_funcs:
# Pattern: _func( with word boundary before
pattern = rf'\b({re.escape(func)})\('
if re.search(pattern, line):
result_lines[idx] = re.sub(pattern, r'self.\1(', line)
result_text = '\n'.join(result_lines)
output_file.write_text(result_text, encoding='utf-8')
print(f"✓ Written: {output_file}")
print(f"✓ Class-based refactor complete")
if __name__ == '__main__':
refactor_download_data()

View File

@@ -0,0 +1,131 @@
#!/usr/bin/env python3
"""
Automated refactoring script for download_data.py
Converts module-level functions to class-based cmdlet pattern.
"""
import re
from pathlib import Path
def main():
backup_file = Path('cmdlets/download_data_backup.py')
output_file = Path('cmdlets/download_data.py')
print(f"Reading: {backup_file}")
content = backup_file.read_text(encoding='utf-8')
lines = content.split('\n')
output = []
i = 0
in_cmdlet_def = False
skip_old_run_wrapper = False
class_section_added = False
# Track where to insert class definition
last_import_line = 0
while i < len(lines):
line = lines[i]
# Track imports
if line.strip().startswith(('import ', 'from ')):
last_import_line = len(output)
# Skip old _run wrapper function
if 'def _run(result: Any' in line:
skip_old_run_wrapper = True
i += 1
continue
if skip_old_run_wrapper:
if line and not line[0].isspace():
skip_old_run_wrapper = False
else:
i += 1
continue
# Skip old CMDLET definition
if line.strip().startswith('CMDLET = Cmdlet('):
in_cmdlet_def = True
i += 1
continue
if in_cmdlet_def:
if line.strip() == ')':
in_cmdlet_def = False
# Add class instantiation instead
output.append('')
output.append('# Create and register the cmdlet')
output.append('CMDLET = Download_Data()')
output.append('')
i += 1
continue
# Insert class definition before first helper function
if not class_section_added and line.strip().startswith('def _download_torrent_worker('):
output.append('')
output.append('')
output.append('class Download_Data(Cmdlet):')
output.append(' """Class-based download-data cmdlet with self-registration."""')
output.append('')
output.append(' # Full __init__ implementation to be added')
output.append(' # Full run() method to be added')
output.append('')
output.append(' # ' + '='*70)
output.append(' # HELPER METHODS')
output.append(' # ' + '='*70)
output.append('')
class_section_added = True
# Convert top-level helper functions to static methods
if class_section_added and line.strip().startswith('def _') and not line.strip().startswith('def __'):
# Check if this is a top-level function (no indentation)
if not line.startswith((' ', '\t')):
output.append(' @staticmethod')
output.append(f' {line}')
i += 1
# Copy function body with indentation
while i < len(lines):
next_line = lines[i]
# Stop at next top-level definition
if next_line and not next_line[0].isspace() and (next_line.strip().startswith('def ') or next_line.strip().startswith('class ') or next_line.strip().startswith('CMDLET')):
break
# Add indentation
if next_line.strip():
output.append(f' {next_line}')
else:
output.append(next_line)
i += 1
continue
# Convert _run_impl to method (but keep as-is for now, will be updated later)
if class_section_added and line.strip().startswith('def _run_impl('):
output.append(' def _run_impl(self, result: Any, args: Sequence[str], config: Dict[str, Any], emit_results: bool = True) -> int:')
i += 1
# Copy function body with indentation
while i < len(lines):
next_line = lines[i]
if next_line and not next_line[0].isspace() and next_line.strip():
break
if next_line.strip():
output.append(f' {next_line}')
else:
output.append(next_line)
i += 1
continue
output.append(line)
i += 1
# Write output
result_text = '\n'.join(output)
output_file.write_text(result_text, encoding='utf-8')
print(f"✓ Written: {output_file}")
print(f"✓ Converted {content.count('def _')} helper functions to static methods")
print("\nNext steps:")
print("1. Add full __init__ method with cmdlet args")
print("2. Add run() method that calls _run_impl")
print("3. Update function calls in _run_impl from _func() to self._func()")
if __name__ == '__main__':
main()

BIN
test/medios-macina.db Normal file

Binary file not shown.

BIN
test/yapping.m4a Normal file

Binary file not shown.

View File

@@ -0,0 +1 @@
hash:00beb438e3c02cdc0340526deb0c51f916ffd6330259be4f350009869c5448d9

1
test/yapping.m4a.tag Normal file
View File

@@ -0,0 +1 @@
title:yapping