Some checks failed
smoke-mm / Install & smoke test mm --help (push) Has been cancelled
93 lines
3.0 KiB
Markdown
93 lines
3.0 KiB
Markdown
# get-url Command Enhancement Summary
|
|
|
|
## What Changed
|
|
|
|
Enhanced the `get-url` command in [cmdlet/get_url.py](cmdlet/get_url.py) to support searching for URLs across all stores with smart pattern matching.
|
|
|
|
## Key Features Added
|
|
|
|
### 1. URL Normalization (`_normalize_url_for_search`)
|
|
- Strips protocol prefixes: `https://`, `http://`, `ftp://`, etc.
|
|
- Removes `www.` prefix (case-insensitive)
|
|
- Converts to lowercase for case-insensitive matching
|
|
|
|
**Examples:**
|
|
- `https://www.youtube.com/watch?v=xx` → `youtube.com/watch?v=xx`
|
|
- `http://www.google.com` → `google.com`
|
|
- `FTP://cdn.example.com` → `cdn.example.com`
|
|
|
|
### 2. Wildcard Pattern Matching (`_match_url_pattern`)
|
|
- Supports `*` (matches any sequence) and `?` (matches single character)
|
|
- Case-insensitive matching
|
|
- Uses Python's `fnmatch` for robust pattern support
|
|
|
|
**Examples:**
|
|
- `youtube.com*` matches `youtube.com/watch`, `youtube.com/shorts`, etc.
|
|
- `*.example.com*` matches `cdn.example.com`, `api.example.com`, etc.
|
|
- `google.com/search*` matches `google.com/search?q=term`, etc.
|
|
|
|
### 3. Cross-Store URL Search (`_search_urls_across_stores`)
|
|
- Searches all configured stores (hydrus, folder, etc.)
|
|
- Finds matching URLs across all files in all stores
|
|
- Returns results grouped by store
|
|
- Emits `UrlItem` objects for pipelining
|
|
|
|
## Command Usage
|
|
|
|
### Search for URLs matching a pattern
|
|
```bash
|
|
get-url -url "www.google.com"
|
|
get-url -url "youtube.com*"
|
|
get-url -url "*.example.com*"
|
|
```
|
|
|
|
### Original usage (unchanged)
|
|
```bash
|
|
@1 | get-url
|
|
# Requires hash and store from piped result
|
|
```
|
|
|
|
## Implementation Details
|
|
|
|
### New Methods
|
|
- `_normalize_url_for_search(url)` - Static method to normalize URLs
|
|
- `_match_url_pattern(url, pattern)` - Static method to match with wildcards
|
|
- `_search_urls_across_stores(pattern, config)` - Search across all stores
|
|
|
|
### Modified Method
|
|
- `run()` - Enhanced to support `-url` flag for searching, fallback to original behavior
|
|
|
|
### Return Values
|
|
- **Search mode**: List of `UrlItem` objects grouped by store, exit code 0 if found, 1 if no matches
|
|
- **Original mode**: URLs for specific file, exit code 0 if found, 1 if not found
|
|
|
|
## Testing
|
|
|
|
A test script is included: [test_get_url_search.py](test_get_url_search.py)
|
|
|
|
**All tests pass:**
|
|
- ✓ URL normalization (protocol/www stripping)
|
|
- ✓ Wildcard pattern matching
|
|
- ✓ Case-insensitive matching
|
|
- ✓ Complex patterns with subdomains and paths
|
|
|
|
## Files Modified
|
|
|
|
- [cmdlet/get_url.py](cmdlet/get_url.py) - Enhanced with URL search functionality
|
|
- [docs/GET_URL_SEARCH.md](docs/GET_URL_SEARCH.md) - User documentation
|
|
- [test_get_url_search.py](test_get_url_search.py) - Test suite
|
|
|
|
## Backward Compatibility
|
|
|
|
✓ Fully backward compatible - original usage unchanged:
|
|
- `@1 | get-url` still works as before
|
|
- `-query` flag still works for hash lookups
|
|
- `-store` flag still required for direct lookups
|
|
|
|
## Error Handling
|
|
|
|
- Returns exit code 1 if no matches found (search mode)
|
|
- Returns exit code 1 if no store configured
|
|
- Gracefully handles store backend errors
|
|
- Logs errors to stderr without crashing
|