This commit is contained in:
2025-12-30 04:47:13 -08:00
parent 756b024b76
commit 925a1631bc
15 changed files with 1083 additions and 625 deletions

View File

@@ -1,134 +0,0 @@
# Bootstrapping the development environment
This project includes convenience scripts to create a Python virtual environment, install the package, and (optionally) create OS shortcuts.
Files:
- `scripts/bootstrap.ps1` — PowerShell script for Windows (creates venv, installs, optional Desktop/Start Menu shortcuts)
- `scripts/bootstrap.sh` — POSIX shell script (Linux/macOS) (creates venv, installs, optional desktop launcher)
Quick examples
Windows (PowerShell):
```powershell
# Create a .venv, install in editable mode and add a Desktop shortcut
powershell -ExecutionPolicy Bypass -File .\scripts\bootstrap.ps1 -Editable -CreateDesktopShortcut
# Use a specific python.exe and force overwrite
powershell -ExecutionPolicy Bypass -File .\scripts\bootstrap.ps1 -Python "C:\\Python39\\python.exe" -Force
```
Linux/macOS (bash):
```bash
# Create a .venv and install the project in editable mode
./scripts/bootstrap.sh --editable
# Create a desktop entry (GNU/Linux)
./scripts/bootstrap.sh --editable --desktop
```
Notes
- On Windows you may need to run PowerShell with an appropriate ExecutionPolicy (example shows using `-ExecutionPolicy Bypass`).
- The scripts default to a venv directory named `.venv` in the repository root. Use `-VenvPath` (PowerShell) or `--venv` (bash) to choose a different directory.
- The scripts will also install Playwright browser binaries by default (Chromium only) after installing Python dependencies. Use `--no-playwright` (bash) or `-NoPlaywright` (PowerShell) to opt out, or `--playwright-browsers <list>` / `-PlaywrightBrowsers <list>` to request specific engines (comma-separated, or use `all` to install all engines).
- The scripts are intended to make day-to-day developer setup easy; tweak flags for your desired install mode (editable vs normal) and shortcut preferences.
## Deno — installed by bootstrap
The bootstrap scripts will automatically install Deno if it is not already present on the system. They use the official installers and attempt to add Deno's bin directory to the PATH for the current session. If the installer completes but `deno` is not available in your shell, restart your shell or add `$HOME/.deno/bin` (Windows: `%USERPROFILE%\\.deno\\bin`) to your PATH.
Opinionated behavior
Running `python ./scripts/bootstrap.py` is intentionally opinionated: it will create a local virtual environment at `./.venv` (repo root), install Python dependencies and the project into that venv, install Playwright browsers, install Deno, and write small launcher scripts in the project root:
- `mm` (POSIX shell)
- `mm.ps1` (PowerShell)
- `mm.bat` (Windows CMD)
These launchers prefer the local `./.venv` Python and console scripts so you can run the project with `./mm` or `mm.ps1` directly from the repo root.
- When installing in editable mode from a development checkout, the bootstrap will also add a small `.pth` file to the venv's `site-packages` pointing at the repository root. This ensures top-level scripts such as `CLI.py` are importable even when using PEP 660 editable wheels (avoids having to create an egg-link by hand).
Additionally, the setup helpers install a global `mm` launcher into your user bin so you can run `mm` from any shell session:
- POSIX: `~/.local/bin/mm` (created if missing; the script attempts to add `~/.local/bin` to `PATH` by updating `~/.profile` / shell RCs if required)
- Windows: `%USERPROFILE%\bin\mm.cmd` and `%USERPROFILE%\bin\mm.ps1` (created if missing; the script attempts to add the folder to your **User** PATH)
The scripts back up any existing `mm` shims before replacing them and will print actionable messages when a shell restart is required.
Debugging the global `mm` launcher
- POSIX: set MM_DEBUG=1 and run `mm` to print runtime diagnostics (resolved REPO, VENV, and Python import checks):
```bash
MM_DEBUG=1 mm
```
- PowerShell: set and export `$env:MM_DEBUG='1'` then run `mm.ps1` or the installed `mm` shim:
```powershell
$env:MM_DEBUG = '1'
mm
```
- CMD: `set MM_DEBUG=1` then run `mm`.
These diagnostics help identify whether the global launcher is selecting the correct repository and virtual environment; please include the output when reporting launcher failures.
PowerShell (Windows):
```powershell
irm https://deno.land/install.ps1 | iex
```
Linux/macOS:
```bash
curl -fsSL https://deno.land/install.sh | sh
```
Pinning a Deno version
You can pin a Deno release by setting the `DENO_VERSION` environment variable before running the bootstrap script. Examples:
PowerShell (Windows):
```powershell
$env:DENO_VERSION = 'v1.34.3'; .\scripts\bootstrap.ps1
```
POSIX (Linux/macOS):
```bash
DENO_VERSION=v1.34.3 ./scripts/bootstrap.sh
```
If you'd like, I can also:
- Add a short README section in `readme.md` referencing this doc, or
- Add a small icon and polish Linux desktop entries with an icon path.
## Troubleshooting: urllib3 / urllib3-future conflicts ⚠️
On some environments a third-party package (for example `urllib3-future`) may
install a site-packages hook that interferes with the real `urllib3` package.
When this happens you might see errors like:
Error importing cmdlet 'get_tag': No module named 'urllib3.exceptions'
The bootstrap scripts now run a verification step after installing dependencies
and will stop if a broken `urllib3` is detected to avoid leaving you with a
partially broken venv.
Recommended fix (activate the venv first or use the venv python explicitly):
PowerShell / Windows (from repo root):
.venv\Scripts\python.exe -m pip uninstall urllib3-future -y
.venv\Scripts\python.exe -m pip install --upgrade --force-reinstall urllib3
.venv\Scripts\python.exe -m pip install niquests -U
POSIX (Linux/macOS):
.venv/bin/python -m pip uninstall urllib3-future -y
.venv/bin/python -m pip install --upgrade --force-reinstall urllib3
.venv/bin/python -m pip install niquests -U
If problems persist, re-run the bootstrap script after applying the fixes.

View File

@@ -1,234 +0,0 @@
# get-url Architecture & Flow
## Overview
The enhanced `get-url` command supports two modes:
```
get-url
├── SEARCH MODE (new)
│ └── -url "pattern"
│ ├── Normalize pattern (strip protocol, www)
│ ├── Search all stores
│ ├── Match URLs with wildcards
│ └── Return grouped results
└── ORIGINAL MODE (unchanged)
├── Hash lookup
├── Store lookup
└── Return URLs for file
```
## Flow Diagram: URL Search
```
User Input
v
get-url -url "youtube.com*"
v
_normalize_url_for_search()
│ Strips: https://, http://, www.
│ Result: "youtube.com*" (unchanged, already normalized)
v
_search_urls_across_stores()
├─→ Store 1 (Hydrus)
│ ├─→ search("*", limit=1000)
│ ├─→ get_url(file_hash) for each file
│ └─→ _match_url_pattern() for each URL
├─→ Store 2 (Folder)
│ ├─→ search("*", limit=1000)
│ ├─→ get_url(file_hash) for each file
│ └─→ _match_url_pattern() for each URL
└─→ ...more stores...
Matching URLs:
├─→ https://www.youtube.com/watch?v=123
├─→ http://youtube.com/shorts/abc
└─→ https://youtube.com/playlist?list=xyz
Normalized for matching:
├─→ youtube.com/watch?v=123 ✓ Matches "youtube.com*"
├─→ youtube.com/shorts/abc ✓ Matches "youtube.com*"
└─→ youtube.com/playlist?... ✓ Matches "youtube.com*"
v
Collect UrlItem results
├─→ UrlItem(url="https://www.youtube.com/watch?v=123",
│ hash="abcd1234...", store="hydrus")
├─→ UrlItem(url="http://youtube.com/shorts/abc",
│ hash="efgh5678...", store="folder")
└─→ ...more items...
v
Group by store
├─→ Hydrus
│ ├─→ https://www.youtube.com/watch?v=123
│ └─→ ...
└─→ Folder
├─→ http://youtube.com/shorts/abc
└─→ ...
v
Emit UrlItem objects for piping
v
Return exit code 0 (success)
```
## Code Structure
```
Get_Url (class)
├── __init__()
│ └── Register command with CLI
├── _normalize_url_for_search() [static]
│ └── Strip protocol & www, lowercase
├── _match_url_pattern() [static]
│ └── fnmatch with normalization
├── _search_urls_across_stores() [instance]
│ ├── Iterate stores
│ ├── Search files in store
│ ├── Get URLs for each file
│ ├── Apply pattern matching
│ └── Return (items, stores_found)
└── run() [main execution]
├── Check for -url flag
│ ├── YES: Search mode
│ │ └── _search_urls_across_stores()
│ └── NO: Original mode
│ └── Hash+store lookup
└── Return exit code
```
## Data Flow Examples
### Example 1: Search by Domain
```
Input: get-url -url "www.google.com"
Normalize: "google.com" (www. stripped)
Search Results:
Store "hydrus":
- https://www.google.com ✓
- https://google.com/search?q=hello ✓
- https://google.com/maps ✓
Store "folder":
- http://google.com ✓
- https://google.com/images ✓
Output: 5 matching URLs grouped by store
```
### Example 2: Wildcard Pattern
```
Input: get-url -url "youtube.com/watch*"
Pattern: "youtube.com/watch*"
Search Results:
Store "hydrus":
- https://www.youtube.com/watch?v=123 ✓
- https://youtube.com/watch?list=abc ✓
- https://www.youtube.com/shorts/xyz ✗ (doesn't match /watch*)
Store "folder":
- http://youtube.com/watch?v=456 ✓
Output: 3 matching URLs (watch only, not shorts)
```
### Example 3: Subdomain Wildcard
```
Input: get-url -url "*.example.com*"
Normalize: "*.example.com*" (already normalized)
Search Results:
Store "hydrus":
- https://cdn.example.com/video.mp4 ✓
- https://api.example.com/endpoint ✓
- https://www.example.com ✓
- https://other.org ✗
Output: 3 matching URLs
```
## Integration with Piping
```
# Search → Filter → Add Tag
get-url -url "youtube.com*" | add-tag -tag "video-source"
# Search → Count
get-url -url "reddit.com*" | wc -l
# Search → Export
get-url -url "github.com*" > github_urls.txt
```
## Error Handling Flow
```
get-url -url "pattern"
├─→ No stores configured?
│ └─→ Log "Error: No stores configured"
│ └─→ Return exit code 1
├─→ Store search fails?
│ └─→ Log error, skip store, continue
├─→ No matches found?
│ └─→ Log "No urls matching pattern"
│ └─→ Return exit code 1
└─→ Matches found?
└─→ Return exit code 0
```
## Performance Considerations
1. **Store Iteration**: Loops through all configured stores
2. **File Scanning**: Each store searches up to 1000 files
3. **URL Matching**: Each URL tested against pattern (fnmatch - O(n) per URL)
4. **Memory**: Stores all matching items in memory before display
Optimization opportunities:
- Cache store results
- Limit search scope with --store flag
- Early exit with --limit N
- Pagination support
## Backward Compatibility
Original mode (unchanged):
```
@1 | get-url
└─→ No -url flag
└─→ Use original logic
├─→ Get hash from result
├─→ Get store from result or args
├─→ Call backend.get_url(hash)
└─→ Return URLs for that file
```
All original functionality preserved. New -url flag is additive only.

View File

@@ -1,76 +0,0 @@
# Quick Reference: get-url URL Search
## Basic Syntax
```bash
# Search mode (new)
get-url -url "pattern"
# Original mode (unchanged)
@1 | get-url
```
## Examples
### Exact domain match
```bash
get-url -url "google.com"
```
Matches: `https://www.google.com`, `http://google.com/search`, `https://google.com/maps`
### YouTube URL search
```bash
get-url -url "https://www.youtube.com/watch?v=xx_88TDWmEs"
```
Normalizes to: `youtube.com/watch?v=xx_88tdwmes`
Matches: Any video with same ID across different protocols
### Wildcard domain
```bash
get-url -url "youtube.com*"
```
Matches: All YouTube URLs (videos, shorts, playlists, etc.)
### Subdomain wildcard
```bash
get-url -url "*.example.com*"
```
Matches: `cdn.example.com`, `api.example.com`, `www.example.com`
### Specific path pattern
```bash
get-url -url "youtube.com/watch*"
```
Matches: Only YouTube watch URLs (not shorts or playlists)
### Single character wildcard
```bash
get-url -url "example.com/file?.mp4"
```
Matches: `example.com/file1.mp4`, `example.com/fileA.mp4` (not `file12.mp4`)
## How It Works
1. **Normalization**: Strips `https://`, `www.` prefix from pattern and all URLs
2. **Pattern Matching**: Uses `*` and `?` wildcards (case-insensitive)
3. **Search**: Scans all configured stores for matching URLs
4. **Results**: Groups matches by store, shows URL and hash
## Return Values
- Exit code **0** if matches found
- Exit code **1** if no matches or error
## Piping Results
```bash
get-url -url "youtube.com*" | grep -i video
get-url -url "example.com*" | add-tag -tag "external-source"
```
## Common Patterns
| Pattern | Matches | Notes |
|---------|---------|-------|
| `google.com` | Google URLs | Exact domain (after normalization) |
| `youtube.com*` | All YouTube | Wildcard at end |
| `*.example.com*` | Subdomains | Wildcard at start and end |
| `github.com/user*` | User repos | Path pattern |
| `reddit.com/r/*` | Subreddit | Path with wildcard |

View File

@@ -1,91 +0,0 @@
# get-url Enhanced URL Search
The `get-url` command now supports searching for URLs across all stores with automatic protocol and `www` prefix stripping.
## Features
### 1. **Protocol Stripping**
URLs are normalized by removing:
- Protocol prefixes: `https://`, `http://`, `ftp://`, etc.
- `www.` prefix (case-insensitive)
### 2. **Wildcard Matching**
Patterns support standard wildcards:
- `*` - matches any sequence of characters
- `?` - matches any single character
### 3. **Case-Insensitive Matching**
All matching is case-insensitive for domains and paths
## Usage Examples
### Search by full domain
```bash
get-url -url "www.google.com"
# Matches:
# - https://www.google.com
# - http://google.com/search
# - https://google.com/maps
```
### Search with YouTube example
```bash
get-url -url "https://www.youtube.com/watch?v=xx_88TDWmEs"
# Becomes: youtube.com/watch?v=xx_88tdwmes
# Matches:
# - https://www.youtube.com/watch?v=xx_88TDWmEs
# - http://youtube.com/watch?v=xx_88TDWmEs
```
### Domain wildcard matching
```bash
get-url -url "youtube.com*"
# Matches any URL starting with youtube.com:
# - https://www.youtube.com/watch?v=123
# - https://youtube.com/shorts/abc
# - http://youtube.com/playlist?list=xyz
```
### Subdomain matching
```bash
get-url -url "*example.com*"
# Matches:
# - https://cdn.example.com/file.mp4
# - https://www.example.com
# - https://api.example.com/endpoint
```
### Specific path matching
```bash
get-url -url "youtube.com/watch*"
# Matches:
# - https://www.youtube.com/watch?v=123
# - http://youtube.com/watch?list=abc
# Does NOT match:
# - https://youtube.com/shorts/abc
```
## Get URLs for Specific File
The original functionality is still supported:
```bash
@1 | get-url
# Requires hash and store from piped result
```
## Output
Results are organized by store and show:
- **Store**: Backend name (hydrus, folder, etc.)
- **Url**: The full matched URL
- **Hash**: First 16 characters of the file hash (for compactness)
## Implementation Details
The search:
1. Iterates through all configured stores
2. Searches for all files in each store (limit 1000 per store)
3. Retrieves URLs for each file
4. Applies pattern matching with normalization
5. Returns results grouped by store
6. Emits `UrlItem` objects for piping to other commands

View File

@@ -1,8 +0,0 @@
Known issues and brief remediation steps
- urllib3 / urllib3-future conflict
- Symptom: `No module named 'urllib3.exceptions'` or missing `urllib3.__version__`.
- Root cause: a `.pth` file or packaging hook from `urllib3-future` may mutate the
`urllib3` namespace in incompatible ways.
- Remediation: uninstall `urllib3-future`, reinstall `urllib3`, and re-install
`niquests` if required. See `docs/ISSUES/urllib3-future.md` for more details.

View File

@@ -1,13 +0,0 @@
# Obtain cookies.txt for youtube.com
1. You need a google account, throwaway is fine
2. You need webbrowser extension Get cookies.txt LOCALLY
Chrome based browser: [cookies.txt LOCALLY](https://chromewebstore.google.com/detail/get-cookiestxt-locally/cclelndahbckbenkjhflpdbgdldlbecc)
Firefox based browser: [cookies.txt LOCALLY](https://addons.mozilla.org/en-US/firefox/addon/get-cookies-txt-locally/)
3. open incognito tab and sign into youtube with your account
4. open extension and click on "export all cookies"
5. with the cookies.txt file produced, place that in the project folder
restart the medios-macina app and verify status for cookies is FOUND

View File

@@ -1,41 +0,0 @@
1. open shell prompt to a good spot for hydrusnetwork i.e. C:\hydrusnetwork
2. send command "git clone https://github.com/hydrusnetwork/hydrus"
3. send command "cd hydrus"
4. send command "python -m venv .venv"
---------------------------------------------------
5. Windows
1. send command ".\.venv\Scripts\Activate.ps1"
5. Linux
1. send command "source .venv/bin/activate"
--------------------------------------------------
your commandline should have (.venv) infront of it now
5. send command "pip install -r requirements.txt"
6. send command "python hydrus_client.py"
---------------------------------------------------
the gui application should be opened now
7.in the top menu click on services > manage services > double-click "client api"
8.check the boxes
X run the client api?
X allow non-local connections
X supports CORS headers
click apply
9.click on services > review services > click on "client api"
10. click "Add" > manually > change "new api permissions" to "medios"
11. click apply > click "copy api access key", click "open client api base url"
--------------------------------------------
edit the below and place in your config.conf
<figure>
<figcaption>config.conf</figcaption>
<pre><code class="language-powershell">[store=hydrusnetwork]
NAME="shortnamenospacesorsymbols"
API="apiaccesskeygoeshere"
URL="apibaseurlgoeshere"
</code></pre>
</figure>

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB