234	docs/GET_URL_ARCHITECTURE.md	Normal file
@@ -0,0 +1,234 @@
# get-url Architecture & Flow

## Overview

The enhanced `get-url` command supports two modes:

```
get-url
├── SEARCH MODE (new)
│   └── -url "pattern"
│       ├── Normalize pattern (strip protocol, www)
│       ├── Search all stores
│       ├── Match URLs with wildcards
│       └── Return grouped results
│
└── ORIGINAL MODE (unchanged)
    ├── Hash lookup
    ├── Store lookup
    └── Return URLs for file
```

## Flow Diagram: URL Search

```
User Input
    │
    v
get-url -url "youtube.com*"
    │
    v
_normalize_url_for_search()
    │  Strips: https://, http://, www.
    │  Result: "youtube.com*" (unchanged, already normalized)
    v
_search_urls_across_stores()
    │
    ├─→ Store 1 (Hydrus)
    │     ├─→ search("*", limit=1000)
    │     ├─→ get_url(file_hash) for each file
    │     └─→ _match_url_pattern() for each URL
    │
    ├─→ Store 2 (Folder)
    │     ├─→ search("*", limit=1000)
    │     ├─→ get_url(file_hash) for each file
    │     └─→ _match_url_pattern() for each URL
    │
    └─→ ...more stores...

Matching URLs:
    ├─→ https://www.youtube.com/watch?v=123
    ├─→ http://youtube.com/shorts/abc
    └─→ https://youtube.com/playlist?list=xyz

Normalized for matching:
    ├─→ youtube.com/watch?v=123     ✓ Matches "youtube.com*"
    ├─→ youtube.com/shorts/abc      ✓ Matches "youtube.com*"
    └─→ youtube.com/playlist?...    ✓ Matches "youtube.com*"

    v
Collect UrlItem results
    │
    ├─→ UrlItem(url="https://www.youtube.com/watch?v=123",
    │           hash="abcd1234...", store="hydrus")
    │
    ├─→ UrlItem(url="http://youtube.com/shorts/abc",
    │           hash="efgh5678...", store="folder")
    │
    └─→ ...more items...

    v
Group by store
    │
    ├─→ Hydrus
    │     ├─→ https://www.youtube.com/watch?v=123
    │     └─→ ...
    │
    └─→ Folder
          ├─→ http://youtube.com/shorts/abc
          └─→ ...

    v
Emit UrlItem objects for piping
    │
    v
Return exit code 0 (success)
```
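
The normalization and matching steps at the top of this diagram can be summarized with a short sketch. It assumes a Python implementation; the module-level functions below stand in for the `_normalize_url_for_search()` and `_match_url_pattern()` statics named in the diagram and are an illustration, not the verbatim code:

```python
import fnmatch
import re


def normalize_url_for_search(url: str) -> str:
    """Strip the protocol prefix and a leading "www.", then lowercase."""
    url = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)  # https://, http://, ftp://, ...
    if url.lower().startswith("www."):
        url = url[4:]
    return url.lower()


def match_url_pattern(url: str, pattern: str) -> bool:
    """Case-insensitive wildcard match after normalizing both sides."""
    return fnmatch.fnmatch(normalize_url_for_search(url),
                           normalize_url_for_search(pattern))
```

Under this sketch, `https://www.youtube.com/watch?v=123` normalizes to `youtube.com/watch?v=123`, which matches the pattern `youtube.com*`.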

## Code Structure

```
Get_Url (class)
│
├── __init__()
│   └── Register command with CLI
│
├── _normalize_url_for_search() [static]
│   └── Strip protocol & www, lowercase
│
├── _match_url_pattern() [static]
│   └── fnmatch with normalization
│
├── _search_urls_across_stores() [instance]
│   ├── Iterate stores
│   ├── Search files in store
│   ├── Get URLs for each file
│   ├── Apply pattern matching
│   └── Return (items, stores_found)
│
└── run() [main execution]
    ├── Check for -url flag
    │   ├── YES: Search mode
    │   │   └── _search_urls_across_stores()
    │   └── NO: Original mode
    │       └── Hash+store lookup
    │
    └── Return exit code
```
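
A rough sketch of the search path, again in Python. The `UrlItem` record and the `search()`/`get_url()` store methods are taken from the diagrams in this document rather than verified against the source, so treat this as a minimal illustration:

```python
from dataclasses import dataclass


@dataclass
class UrlItem:                 # hypothetical result record (see the flow diagram)
    url: str
    hash: str
    store: str


def search_urls_across_stores(stores, pattern: str):
    """Scan every configured store and collect URLs that match `pattern`."""
    items, stores_found = [], set()
    for store in stores:                             # iterate configured stores
        try:
            hashes = store.search("*", limit=1000)   # up to 1000 files per store
        except Exception as exc:
            print(f"search failed for {store}: {exc}")
            continue                                 # skip this store, keep going
        for file_hash in hashes:
            for url in store.get_url(file_hash):     # URLs recorded for this file
                if match_url_pattern(url, pattern):
                    items.append(UrlItem(url=url, hash=file_hash, store=str(store)))
                    stores_found.add(str(store))
    return items, stores_found
```

`match_url_pattern` here is the helper from the previous sketch.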

## Data Flow Examples

### Example 1: Search by Domain
```
Input: get-url -url "www.google.com"

Normalize: "google.com" (www. stripped)

Search Results:
  Store "hydrus":
    - https://www.google.com              ✓
    - https://google.com/search?q=hello   ✓
    - https://google.com/maps             ✓

  Store "folder":
    - http://google.com                   ✓
    - https://google.com/images           ✓

Output: 5 matching URLs grouped by store
```

### Example 2: Wildcard Pattern
```
Input: get-url -url "youtube.com/watch*"

Pattern: "youtube.com/watch*"

Search Results:
  Store "hydrus":
    - https://www.youtube.com/watch?v=123   ✓
    - https://youtube.com/watch?list=abc    ✓
    - https://www.youtube.com/shorts/xyz    ✗ (doesn't match /watch*)

  Store "folder":
    - http://youtube.com/watch?v=456        ✓

Output: 3 matching URLs (watch only, not shorts)
```

### Example 3: Subdomain Wildcard
```
Input: get-url -url "*.example.com*"

Normalize: "*.example.com*" (already normalized)

Search Results:
  Store "hydrus":
    - https://cdn.example.com/video.mp4   ✓
    - https://api.example.com/endpoint    ✓
    - https://www.example.com             ✓
    - https://other.org                   ✗

Output: 3 matching URLs
```

## Integration with Piping

```
# Search → Filter → Add Tag
get-url -url "youtube.com*" | add-tag -tag "video-source"

# Search → Count
get-url -url "reddit.com*" | wc -l

# Search → Export
get-url -url "github.com*" > github_urls.txt
```

## Error Handling Flow

```
get-url -url "pattern"
    │
    ├─→ No stores configured?
    │     └─→ Log "Error: No stores configured"
    │           └─→ Return exit code 1
    │
    ├─→ Store search fails?
    │     └─→ Log error, skip store, continue
    │
    ├─→ No matches found?
    │     └─→ Log "No urls matching pattern"
    │           └─→ Return exit code 1
    │
    └─→ Matches found?
          └─→ Return exit code 0
```
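
Condensed into code, the error paths above amount to the following sketch. It is a plain function standing in for the search branch of `run()`, built on the earlier sketches; the `print` calls are placeholders for the command's real logging and item emission:

```python
def run_search(stores, pattern: str) -> int:
    """Return the exit code for search mode, mirroring the flow above."""
    if not stores:
        print("Error: No stores configured")
        return 1
    items, stores_found = search_urls_across_stores(stores, pattern)
    if not items:
        print(f"No urls matching {pattern!r}")
        return 1
    for item in items:   # in the real command these are emitted for display/piping
        print(item)
    return 0
```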

## Performance Considerations

1. **Store Iteration**: Loops through all configured stores
2. **File Scanning**: Each store searches up to 1000 files
3. **URL Matching**: Each URL tested against pattern (fnmatch - O(n) per URL)
4. **Memory**: Stores all matching items in memory before display

Optimization opportunities:

- Cache store results
- Limit search scope with --store flag
- Early exit with --limit N (sketched below)
- Pagination support
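
None of these flags exist today; purely as an illustration of the early-exit idea, a hypothetical `limit` parameter could short-circuit the scan from the earlier sketch:

```python
def search_urls_across_stores(stores, pattern: str, limit: int | None = None):
    """Variant of the earlier sketch that stops once `limit` matches are collected."""
    items, stores_found = [], set()
    for store in stores:
        for file_hash in store.search("*", limit=1000):
            for url in store.get_url(file_hash):
                if match_url_pattern(url, pattern):
                    items.append(UrlItem(url=url, hash=file_hash, store=str(store)))
                    stores_found.add(str(store))
                    if limit is not None and len(items) >= limit:
                        return items, stores_found   # enough matches, stop scanning
    return items, stores_found
```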

## Backward Compatibility

Original mode (unchanged):
```
@1 | get-url
    │
    └─→ No -url flag
          └─→ Use original logic
                ├─→ Get hash from result
                ├─→ Get store from result or args
                ├─→ Call backend.get_url(hash)
                └─→ Return URLs for that file
```

All original functionality is preserved; the new `-url` flag is purely additive.

76	docs/GET_URL_QUICK_REF.md	Normal file
@@ -0,0 +1,76 @@
# Quick Reference: get-url URL Search

## Basic Syntax
```bash
# Search mode (new)
get-url -url "pattern"

# Original mode (unchanged)
@1 | get-url
```

## Examples

### Exact domain match
```bash
get-url -url "google.com"
```
Matches: `https://www.google.com`, `http://google.com/search`, `https://google.com/maps`

### YouTube URL search
```bash
get-url -url "https://www.youtube.com/watch?v=xx_88TDWmEs"
```
Normalizes to: `youtube.com/watch?v=xx_88tdwmes`
Matches: any copy of that video URL, regardless of protocol or `www.` prefix

### Wildcard domain
```bash
get-url -url "youtube.com*"
```
Matches: all YouTube URLs (videos, shorts, playlists, etc.)

### Subdomain wildcard
```bash
get-url -url "*.example.com*"
```
Matches: `cdn.example.com`, `api.example.com`, `www.example.com`

### Specific path pattern
```bash
get-url -url "youtube.com/watch*"
```
Matches: only YouTube watch URLs (not shorts or playlists)

### Single character wildcard
```bash
get-url -url "example.com/file?.mp4"
```
Matches: `example.com/file1.mp4`, `example.com/fileA.mp4` (not `file12.mp4`)

## How It Works

1. **Normalization**: Strips `https://`, `www.` prefix from pattern and all URLs
2. **Pattern Matching**: Uses `*` and `?` wildcards (case-insensitive); see the sketch below
3. **Search**: Scans all configured stores for matching URLs
4. **Results**: Groups matches by store, shows URL and hash
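
To check how a pattern will behave before running a search, Python's standard-library `fnmatch` is a reasonable stand-in for the matching step, assuming the command applies `fnmatch`-style wildcards to the normalized strings as described above:

```python
import fnmatch

# Wildcard semantics after normalization (illustration only).
print(fnmatch.fnmatch("youtube.com/watch?v=123", "youtube.com*"))          # True
print(fnmatch.fnmatch("youtube.com/shorts/abc", "youtube.com/watch*"))     # False
print(fnmatch.fnmatch("example.com/file1.mp4", "example.com/file?.mp4"))   # True
print(fnmatch.fnmatch("example.com/file12.mp4", "example.com/file?.mp4"))  # False
```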

## Return Values
- Exit code **0** if matches found
- Exit code **1** if no matches or error

## Piping Results
```bash
get-url -url "youtube.com*" | grep -i video
get-url -url "example.com*" | add-tag -tag "external-source"
```

## Common Patterns

| Pattern | Matches | Notes |
|---------|---------|-------|
| `google.com` | Google URLs | Exact domain (after normalization) |
| `youtube.com*` | All YouTube | Wildcard at end |
| `*.example.com*` | Subdomains | Wildcard at start and end |
| `github.com/user*` | User repos | Path pattern |
| `reddit.com/r/*` | Subreddit | Path with wildcard |

91	docs/GET_URL_SEARCH.md	Normal file
@@ -0,0 +1,91 @@
# get-url Enhanced URL Search

The `get-url` command now supports searching for URLs across all stores with automatic protocol and `www` prefix stripping.

## Features

### 1. **Protocol Stripping**
URLs are normalized by removing:
- Protocol prefixes: `https://`, `http://`, `ftp://`, etc.
- `www.` prefix (case-insensitive)

### 2. **Wildcard Matching**
Patterns support standard wildcards:
- `*` - matches any sequence of characters
- `?` - matches any single character

### 3. **Case-Insensitive Matching**
All matching is case-insensitive for domains and paths.

## Usage Examples

### Search by full domain
```bash
get-url -url "www.google.com"
# Matches:
# - https://www.google.com
# - http://google.com/search
# - https://google.com/maps
```

### Search with YouTube example
```bash
get-url -url "https://www.youtube.com/watch?v=xx_88TDWmEs"
# Becomes: youtube.com/watch?v=xx_88tdwmes
# Matches:
# - https://www.youtube.com/watch?v=xx_88TDWmEs
# - http://youtube.com/watch?v=xx_88TDWmEs
```

### Domain wildcard matching
```bash
get-url -url "youtube.com*"
# Matches any URL starting with youtube.com:
# - https://www.youtube.com/watch?v=123
# - https://youtube.com/shorts/abc
# - http://youtube.com/playlist?list=xyz
```

### Subdomain matching
```bash
get-url -url "*example.com*"
# Matches:
# - https://cdn.example.com/file.mp4
# - https://www.example.com
# - https://api.example.com/endpoint
```

### Specific path matching
```bash
get-url -url "youtube.com/watch*"
# Matches:
# - https://www.youtube.com/watch?v=123
# - http://youtube.com/watch?list=abc
# Does NOT match:
# - https://youtube.com/shorts/abc
```

## Get URLs for Specific File

The original functionality is still supported:
```bash
@1 | get-url
# Requires hash and store from piped result
```

## Output

Results are organized by store and show:
- **Store**: Backend name (hydrus, folder, etc.)
- **Url**: The full matched URL
- **Hash**: First 16 characters of the file hash (for compactness)

## Implementation Details

The search:
1. Iterates through all configured stores
2. Searches for all files in each store (limit 1000 per store)
3. Retrieves URLs for each file
4. Applies pattern matching with normalization
5. Returns results grouped by store
6. Emits `UrlItem` objects for piping to other commands (see the sketch below)
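
As a minimal sketch of what the emitted items and the per-store grouping might look like (Python assumed; the field names follow the Output section above, but the real `UrlItem` class may differ):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UrlItem:      # hypothetical shape of an emitted item
    url: str
    hash: str
    store: str


def group_by_store(items):
    """Group matched items by store name for display (step 5 above)."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.store].append(item)
    return grouped
```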