# get-url Architecture & Flow

## Overview

The enhanced `get-url` command supports two modes:

```
get-url
├── SEARCH MODE (new)
│   └── -url "pattern"
│       ├── Normalize pattern (strip protocol, www)
│       ├── Search all stores
│       ├── Match URLs with wildcards
│       └── Return grouped results
│
└── ORIGINAL MODE (unchanged)
    ├── Hash lookup
    ├── Store lookup
    └── Return URLs for file
```

## Flow Diagram: URL Search

```
User Input
  │
  v
get-url -url "youtube.com*"
  │
  v
_normalize_url_for_search()
  │  Strips: https://, http://, www.
  │  Result: "youtube.com*" (unchanged, already normalized)
  v
_search_urls_across_stores()
  │
  ├─→ Store 1 (Hydrus)
  │   ├─→ search("*", limit=1000)
  │   ├─→ get_url(file_hash) for each file
  │   └─→ _match_url_pattern() for each URL
  │
  ├─→ Store 2 (Folder)
  │   ├─→ search("*", limit=1000)
  │   ├─→ get_url(file_hash) for each file
  │   └─→ _match_url_pattern() for each URL
  │
  └─→ ...more stores...

Matching URLs:
  ├─→ https://www.youtube.com/watch?v=123
  ├─→ http://youtube.com/shorts/abc
  └─→ https://youtube.com/playlist?list=xyz

Normalized for matching:
  ├─→ youtube.com/watch?v=123    ✓ Matches "youtube.com*"
  ├─→ youtube.com/shorts/abc     ✓ Matches "youtube.com*"
  └─→ youtube.com/playlist?...   ✓ Matches "youtube.com*"
  v
Collect UrlItem results
  │
  ├─→ UrlItem(url="https://www.youtube.com/watch?v=123",
  │           hash="abcd1234...", store="hydrus")
  ├─→ UrlItem(url="http://youtube.com/shorts/abc",
  │           hash="efgh5678...", store="folder")
  └─→ ...more items...
  v
Group by store
  │
  ├─→ Hydrus
  │   ├─→ https://www.youtube.com/watch?v=123
  │   └─→ ...
  │
  └─→ Folder
      ├─→ http://youtube.com/shorts/abc
      └─→ ...
  v
Emit UrlItem objects for piping
  │
  v
Return exit code 0 (success)
```

## Code Structure

```
Get_Url (class)
│
├── __init__()
│   └── Register command with CLI
│
├── _normalize_url_for_search()   [static]
│   └── Strip protocol & www, lowercase
│
├── _match_url_pattern()          [static]
│   └── fnmatch with normalization
│
├── _search_urls_across_stores()  [instance]
│   ├── Iterate stores
│   ├── Search files in store
│   ├── Get URLs for each file
│   ├── Apply pattern matching
│   └── Return (items, stores_found)
│
└── run()                         [main execution]
    ├── Check for -url flag
    │   ├── YES: Search mode
    │   │   └── _search_urls_across_stores()
    │   └── NO: Original mode
    │       └── Hash+store lookup
    └── Return exit code
```
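The two static helpers are small enough to sketch. What follows is a minimal sketch, assuming only the behavior documented above (strip the protocol and a leading `www.`, lowercase, then compare with `fnmatch`); the actual method bodies may differ, and the implicit trailing `*` for wildcard-free patterns is an inference from Example 1 below, not something this document states outright.

```python
import fnmatch


class Get_Url:
    @staticmethod
    def _normalize_url_for_search(url: str) -> str:
        """Lowercase, then strip the protocol and a leading www."""
        url = url.strip().lower()
        for prefix in ("https://", "http://"):
            if url.startswith(prefix):
                url = url[len(prefix):]
                break
        if url.startswith("www."):
            url = url[len("www."):]
        return url

    @staticmethod
    def _match_url_pattern(url: str, pattern: str) -> bool:
        """Normalize both sides, then compare with shell-style wildcards."""
        norm_url = Get_Url._normalize_url_for_search(url)
        norm_pat = Get_Url._normalize_url_for_search(pattern)
        # Example 1 below implies a bare domain should also match paths
        # beneath it, so a pattern with no wildcards gets a trailing "*".
        if not any(ch in norm_pat for ch in "*?["):
            norm_pat += "*"
        return fnmatch.fnmatch(norm_url, norm_pat)
```

With these helpers, `_match_url_pattern("https://www.youtube.com/watch?v=123", "youtube.com*")` returns `True`, matching the flow diagram above.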
## Data Flow Examples

### Example 1: Search by Domain

```
Input:     get-url -url "www.google.com"
Normalize: "google.com" (www. stripped)

Search Results:
  Store "hydrus":
    - https://www.google.com             ✓
    - https://google.com/search?q=hello  ✓
    - https://google.com/maps            ✓
  Store "folder":
    - http://google.com                  ✓
    - https://google.com/images          ✓

Output: 5 matching URLs grouped by store
```

### Example 2: Wildcard Pattern

```
Input:   get-url -url "youtube.com/watch*"
Pattern: "youtube.com/watch*"

Search Results:
  Store "hydrus":
    - https://www.youtube.com/watch?v=123  ✓
    - https://youtube.com/watch?list=abc   ✓
    - https://www.youtube.com/shorts/xyz   ✗ (doesn't match /watch*)
  Store "folder":
    - http://youtube.com/watch?v=456       ✓

Output: 3 matching URLs (watch only, not shorts)
```

### Example 3: Subdomain Wildcard

```
Input:     get-url -url "*.example.com*"
Normalize: "*.example.com*" (already normalized)

Search Results:
  Store "hydrus":
    - https://cdn.example.com/video.mp4  ✓
    - https://api.example.com/endpoint   ✓
    - https://www.example.com            ✓
    - https://other.org                  ✗

Output: 3 matching URLs
```

## Integration with Piping

```
# Search → Filter → Add Tag
get-url -url "youtube.com*" | add-tag -tag "video-source"

# Search → Count
get-url -url "reddit.com*" | wc -l

# Search → Export
get-url -url "github.com*" > github_urls.txt
```

## Error Handling Flow

```
get-url -url "pattern"
  │
  ├─→ No stores configured?
  │   └─→ Log "Error: No stores configured"
  │       └─→ Return exit code 1
  │
  ├─→ Store search fails?
  │   └─→ Log error, skip store, continue
  │
  ├─→ No matches found?
  │   └─→ Log "No urls matching pattern"
  │       └─→ Return exit code 1
  │
  └─→ Matches found?
      └─→ Return exit code 0
```

## Performance Considerations

1. **Store Iteration**: Loops through all configured stores
2. **File Scanning**: Each store searches up to 1000 files
3. **URL Matching**: Each URL is tested against the pattern (fnmatch, O(n) per URL)
4. **Memory**: All matching items are held in memory before display

Optimization opportunities:

- Cache store results
- Limit search scope with a --store flag
- Early exit with --limit N
- Pagination support

## Backward Compatibility

Original mode (unchanged):

```
@1 | get-url
  │
  └─→ No -url flag
      └─→ Use original logic
          ├─→ Get hash from result
          ├─→ Get store from result or args
          ├─→ Call backend.get_url(hash)
          └─→ Return URLs for that file
```

All original functionality is preserved; the new -url flag is purely additive.
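To tie the diagrams together, here is a minimal sketch of the cross-store search path, under stated assumptions: a `self.stores` name-to-backend mapping, a module-level `log`, `backend.search("*", limit=1000)` returning file hashes, and `backend.get_url(hash)` returning a list of URLs. None of these names are confirmed by this document beyond the calls shown in the flow diagram.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass
class UrlItem:
    """One matching URL, tagged with its file hash and originating store."""
    url: str
    hash: str
    store: str


def _search_urls_across_stores(self, pattern: str):
    """Scan every configured store; return (items, stores_found)."""
    items: list[UrlItem] = []
    stores_found: set[str] = set()
    for name, backend in self.stores.items():
        try:
            file_hashes = backend.search("*", limit=1000)
        except Exception as exc:
            # Per the error-handling flow: log, skip this store, continue.
            log.error("store %s: search failed: %s", name, exc)
            continue
        for file_hash in file_hashes:
            for url in backend.get_url(file_hash):
                if self._match_url_pattern(url, pattern):
                    items.append(UrlItem(url=url, hash=file_hash, store=name))
                    stores_found.add(name)
    return items, stores_found
```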
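And a sketch of the `run()` dispatch that keeps the original path untouched, as the code-structure tree above describes. Here `args`, `result`, `self.emit()`, and `self.get_store()` are hypothetical stand-ins for whatever the CLI framework actually provides.

```python
def run(self, args, result=None):
    """-url present: search mode; otherwise the original hash+store lookup."""
    if getattr(args, "url", None):
        if not self.stores:
            log.error("Error: No stores configured")
            return 1
        items, _stores_found = self._search_urls_across_stores(args.url)
        if not items:
            log.info("No urls matching pattern")
            return 1
        # (Grouped-by-store display omitted for brevity.)
        for item in items:
            self.emit(item)  # emit UrlItem objects for piping
        return 0
    # Original mode (unchanged): resolve one file's URLs by hash and store.
    backend = self.get_store(getattr(result, "store", None) or args.store)
    for url in backend.get_url(result.hash):
        print(url)
    return 0
```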