# get-url Architecture & Flow

## Overview

The enhanced `get-url` command supports two modes:

```
get-url
├── SEARCH MODE (new)
│   └── -url "pattern"
│       ├── Normalize pattern (strip protocol, www)
│       ├── Search all stores
│       ├── Match URLs with wildcards
│       └── Return grouped results
│
└── ORIGINAL MODE (unchanged)
    ├── Hash lookup
    ├── Store lookup
    └── Return URLs for file
```

## Flow Diagram: URL Search

```
User Input
    │
    v
get-url -url "youtube.com*"
    │
    v
_normalize_url_for_search()
    │  Strips: https://, http://, www.
    │  Result: "youtube.com*" (unchanged, already normalized)
    v
_search_urls_across_stores()
    │
    ├─→ Store 1 (Hydrus)
    │     ├─→ search("*", limit=1000)
    │     ├─→ get_url(file_hash) for each file
    │     └─→ _match_url_pattern() for each URL
    │
    ├─→ Store 2 (Folder)
    │     ├─→ search("*", limit=1000)
    │     ├─→ get_url(file_hash) for each file
    │     └─→ _match_url_pattern() for each URL
    │
    └─→ ...more stores...

Matching URLs:
    ├─→ https://www.youtube.com/watch?v=123
    ├─→ http://youtube.com/shorts/abc
    └─→ https://youtube.com/playlist?list=xyz

Normalized for matching:
    ├─→ youtube.com/watch?v=123    ✓ Matches "youtube.com*"
    ├─→ youtube.com/shorts/abc     ✓ Matches "youtube.com*"
    └─→ youtube.com/playlist?...   ✓ Matches "youtube.com*"
    │
    v
Collect UrlItem results
    │
    ├─→ UrlItem(url="https://www.youtube.com/watch?v=123",
    │           hash="abcd1234...", store="hydrus")
    │
    ├─→ UrlItem(url="http://youtube.com/shorts/abc",
    │           hash="efgh5678...", store="folder")
    │
    └─→ ...more items...
    │
    v
Group by store
    │
    ├─→ Hydrus
    │     ├─→ https://www.youtube.com/watch?v=123
    │     └─→ ...
    │
    └─→ Folder
          ├─→ http://youtube.com/shorts/abc
          └─→ ...
    │
    v
Emit UrlItem objects for piping
    │
    v
Return exit code 0 (success)
```

## Code Structure

```
Get_Url (class)
│
├── __init__()
│     └── Register command with CLI
│
├── _normalize_url_for_search() [static]
│     └── Strip protocol & www, lowercase
│
├── _match_url_pattern() [static]
│     └── fnmatch with normalization
│
├── _search_urls_across_stores() [instance]
│     ├── Iterate stores
│     ├── Search files in store
│     ├── Get URLs for each file
│     ├── Apply pattern matching
│     └── Return (items, stores_found)
│
└── run() [main execution]
      ├── Check for -url flag
      │     ├── YES: Search mode
      │     │     └── _search_urls_across_stores()
      │     └── NO: Original mode
      │           └── Hash + store lookup
      │
      └── Return exit code
```

## Data Flow Examples

### Example 1: Search by Domain

```
Input: get-url -url "www.google.com"

Normalize: "google.com" (www. stripped)

Search Results:
  Store "hydrus":
    - https://www.google.com             ✓
    - https://google.com/search?q=hello  ✓
    - https://google.com/maps            ✓

  Store "folder":
    - http://google.com                  ✓
    - https://google.com/images          ✓

Output: 5 matching URLs grouped by store
```

### Example 2: Wildcard Pattern

```
Input: get-url -url "youtube.com/watch*"

Pattern: "youtube.com/watch*"

Search Results:
  Store "hydrus":
    - https://www.youtube.com/watch?v=123  ✓
    - https://youtube.com/watch?list=abc   ✓
    - https://www.youtube.com/shorts/xyz   ✗ (doesn't match /watch*)

  Store "folder":
    - http://youtube.com/watch?v=456       ✓

Output: 3 matching URLs (watch only, not shorts)
```

### Example 3: Subdomain Wildcard

```
Input: get-url -url "*.example.com*"

Normalize: "*.example.com*" (already normalized)

Search Results:
  Store "hydrus":
    - https://cdn.example.com/video.mp4  ✓
    - https://api.example.com/endpoint   ✓
    - https://www.example.com            ✓
    - https://other.org                  ✗

Output: 3 matching URLs
```

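The ✓/✗ outcomes in Examples 2 and 3 follow directly from glob matching on scheme-stripped URLs. A quick self-check, assuming `fnmatch`-style matching as described earlier (note the comment on `www.`: the ✓ for `www.example.com` in Example 3 only holds if the `www.` prefix is still visible to the matcher):

```python
import fnmatch


def strip_scheme(url: str) -> str:
    # Only the scheme is stripped here. If "www." were stripped too,
    # "www.example.com" would reduce to "example.com", which a plain
    # glob of "*.example.com*" would NOT match.
    for p in ("https://", "http://"):
        url = url.removeprefix(p)
    return url


# Example 2: /watch URLs match, /shorts does not
assert fnmatch.fnmatchcase(
    strip_scheme("https://youtube.com/watch?list=abc"), "youtube.com/watch*")
assert not fnmatch.fnmatchcase(
    strip_scheme("https://www.youtube.com/shorts/xyz").removeprefix("www."),
    "youtube.com/watch*")

# Example 3: subdomain wildcard
assert fnmatch.fnmatchcase(
    strip_scheme("https://cdn.example.com/video.mp4"), "*.example.com*")
assert not fnmatch.fnmatchcase(
    strip_scheme("https://other.org"), "*.example.com*")
```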
## Integration with Piping

```
# Search → Add Tag
get-url -url "youtube.com*" | add-tag -tag "video-source"

# Search → Count
get-url -url "reddit.com*" | wc -l

# Search → Export
get-url -url "github.com*" > github_urls.txt
```

## Error Handling Flow

```
get-url -url "pattern"
│
├─→ No stores configured?
│     └─→ Log "Error: No stores configured"
│           └─→ Return exit code 1
│
├─→ Store search fails?
│     └─→ Log error, skip store, continue
│
├─→ No matches found?
│     └─→ Log "No urls matching pattern"
│           └─→ Return exit code 1
│
└─→ Matches found?
      └─→ Return exit code 0
```

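The branches above reduce to a small amount of exit-code logic. A runnable sketch, where `run_search_mode`, `log`, and `emit` are illustrative stand-ins for the real command plumbing:

```python
def run_search_mode(stores, pattern, search_fn, log=print, emit=print) -> int:
    """Return the exit code for search mode per the decision tree above."""
    if not stores:
        log("Error: No stores configured")
        return 1
    items, _stores_found = search_fn(stores, pattern)
    if not items:
        log("No urls matching pattern")
        return 1
    for item in items:
        emit(item)  # UrlItem objects flow downstream for piping
    return 0
```

Per-store failures never reach this level: they are handled inside the search step (log, skip, continue), so only "no stores" and "no matches" produce a non-zero exit code.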
## Performance Considerations

1. **Store Iteration**: Loops through all configured stores
2. **File Scanning**: Each store searches up to 1000 files
3. **URL Matching**: Each URL is tested against the pattern (`fnmatch`, O(n) per URL)
4. **Memory**: All matching items are held in memory before display

Optimization opportunities:

- Cache store results
- Limit search scope with a `--store` flag
- Early exit with `--limit N`
- Pagination support

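As an illustration of the `--limit N` idea (a proposed optimization, not an existing flag), the scan could return as soon as N matches are collected instead of visiting every file in every store:

```python
def search_with_limit(stores, pattern, match_fn, limit=None):
    """Collect (url, hash, store) tuples, stopping early at `limit` matches.

    Hypothetical sketch: `stores` maps names to objects exposing the
    search()/get_url() interface shown earlier, and `match_fn` is the
    URL pattern matcher.
    """
    items = []
    for name, store in stores.items():
        for file_hash in store.search("*", limit=1000):
            for url in store.get_url(file_hash):
                if match_fn(url, pattern):
                    items.append((url, file_hash, name))
                    if limit is not None and len(items) >= limit:
                        return items  # early exit: remaining work is skipped
    return items
```

The trade-off is that grouped output becomes partial: an early exit can return results from only the first stores iterated, so grouping by store no longer reflects the full collection.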
## Backward Compatibility

Original mode (unchanged):

```
@1 | get-url
│
└─→ No -url flag
      └─→ Use original logic
            ├─→ Get hash from result
            ├─→ Get store from result or args
            ├─→ Call backend.get_url(hash)
            └─→ Return URLs for that file
```

All original functionality is preserved; the new `-url` flag is purely additive.