refactor(download): remove ProviderCore/download.py, move sanitize_filename to SYS.utils, replace callers to use API.HTTP.HTTPClient
9
docs/CHANGELOG.md
Normal file
@@ -0,0 +1,9 @@
# Changelog

## Unreleased (2026-01-05)

- **docs:** Add `docs/provider_authoring.md` with a Quick Start, examples, and testing guidance for providers that integrate with the strict `ResultTable` API (ResultModel/ColumnSpec/selection_fn).
- **docs:** Add a link in `docs/result_table.md` pointing to the provider authoring guide.
- **tests:** Add `tests/test_provider_author_examples.py` validating example provider registration and adapter behavior.
- **notes:** Existing example providers (`Provider/example_provider.py`, `Provider/vimm.py`) are referenced as canonical patterns.
15
docs/PR_PROVIDER_AUTHORING.md
Normal file
@@ -0,0 +1,15 @@
PR Title: docs: Add Provider authoring doc, examples, and tests

Summary:
- Add `docs/provider_authoring.md` describing the strict `ResultModel`-based provider adapter pattern, `ColumnSpec` usage, `selection_fn`, and `TableProviderMixin` for HTML table scraping.
- Link new doc from `docs/result_table.md`.
- Add `tests/test_provider_author_examples.py` to validate `Provider/example_provider.py` and `Provider/vimm.py` integration with the registry.

Why:
- Provide a short, focused Quick Start to help contributors author providers that integrate with the new strict ResultTable API.

Testing:
- New tests pass locally (provider-related subset).

Notes:
- The change is documentation-first and non-functional, with tests ensuring examples remain valid.
141
docs/provider_authoring.md
Normal file
@@ -0,0 +1,141 @@
# Provider authoring: ResultTable & provider adapters ✅

This short guide explains how to write providers that integrate with the *strict* ResultTable API: adapters must yield `ResultModel` instances and providers register via `SYS.result_table_adapters.register_provider` with a column specification and a `selection_fn`.

---

## Quick summary

- Providers register a *provider adapter* (a callable that yields `ResultModel`).
- Providers must also provide `columns` (static list or factory) and a `selection_fn` that returns CLI args for a selected row.
- For simple HTML table/list scraping, prefer `TableProviderMixin` from `SYS.provider_helpers` to fetch and extract rows using `SYS.html_table.extract_records`.

## Runtime dependency policy

- Treat required runtime dependencies (e.g., **Playwright**) as mandatory: import them unconditionally and let missing dependencies fail fast at import time. Avoid adding per-call try/except import guards for required modules; they silently hide configuration errors and add bloat.
- Use guarded imports only for truly optional dependencies (e.g., `pandas` for enhanced table parsing) and provide meaningful fallbacks or helpful error messages in those cases.
- Keep provider code minimal and explicit: fail early and document required runtime dependencies in README/installation notes.
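
The policy above can be sketched as follows. This is a minimal illustration, not code from the repository: `table_engine` and `require_pandas` are hypothetical helper names, and `json` merely stands in for a required dependency such as Playwright so the snippet runs anywhere.

```python
# Required dependency: import unconditionally so a missing install fails
# at import time. (`json` is a stand-in for a module such as Playwright.)
import json

# Optional dependency: guard the import once, at module level.
try:
    import pandas as pd  # only used for enhanced table parsing
except ImportError:
    pd = None


def table_engine() -> str:
    """Hypothetical helper: report which parsing path is available."""
    return "pandas" if pd is not None else "fallback"


def require_pandas() -> None:
    """Hypothetical helper: raise a helpful error when pandas is missing."""
    if pd is None:
        raise RuntimeError(
            "This provider needs pandas for table parsing; "
            "install it with 'pip install pandas'."
        )
```

A provider that can work without pandas would branch on `table_engine()`, while one that cannot would call `require_pandas()` early so the user sees an actionable message.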

---

## Minimal provider template (copy/paste)

```py
# Provider/my_provider.py
from typing import Any, Dict, Iterable, List

from SYS.result_table_api import ResultModel, ColumnSpec, title_column, metadata_column
from SYS.result_table_adapters import register_provider

# Example adapter: convert provider-specific items into ResultModel instances
SAMPLE_ITEMS = [
    {"name": "Example File.pdf", "path": "https://example.com/x.pdf", "ext": "pdf", "size": 1024, "source": "myprovider"},
]

def adapter(items: Iterable[Dict[str, Any]]) -> Iterable[ResultModel]:
    for it in items:
        title = it.get("name") or it.get("title") or str(it.get("path") or "")
        yield ResultModel(
            title=str(title),
            path=str(it.get("path")) if it.get("path") else None,
            ext=str(it.get("ext")) if it.get("ext") else None,
            size_bytes=int(it.get("size")) if it.get("size") is not None else None,
            metadata=dict(it),
            source=str(it.get("source")) if it.get("source") else "myprovider",
        )

# Optional: build columns dynamically from sample rows
def columns_factory(rows: List[ResultModel]) -> List[ColumnSpec]:
    cols = [title_column()]
    # add extra columns if metadata keys exist
    if any((r.metadata or {}).get("size") for r in rows):
        cols.append(ColumnSpec("size", "Size", lambda r: r.size_bytes or ""))
    return cols

# Selection args for `@N` expansion or `select` cmdlet
def selection_fn(row: ResultModel) -> List[str]:
    # prefer -path when available
    if row.path:
        return ["-path", row.path]
    return ["-title", row.title or ""]

# Register provider (done at import time)
register_provider("myprovider", adapter, columns=columns_factory, selection_fn=selection_fn)
```

---

## Table scraping: using TableProviderMixin (HTML tables / list-results)

If your provider scrapes HTML tables or list-like results (common on web search pages), use `TableProviderMixin`:

```py
from urllib.parse import quote_plus

from ProviderCore.base import Provider
from SYS.provider_helpers import TableProviderMixin

class MyTableProvider(TableProviderMixin, Provider):
    URL = ("https://example.org/search",)

    def validate(self) -> bool:
        return True

    def search(self, query: str, limit: int = 50, **kwargs):
        # URL-encode the query before appending it to the search endpoint
        url = f"{self.URL[0]}?q={quote_plus(query)}"
        return self.search_table_from_url(url, limit=limit)
```

`TableProviderMixin.search_table_from_url` returns `ProviderCore.base.SearchResult` entries. If you want to integrate this provider with the strict `ResultTable` registry, add a small adapter that converts `SearchResult` -> `ResultModel` and register it using `register_provider` (see `Provider/vimm.py` for a real example).
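
As a rough sketch of such an adapter, the snippet below uses stand-in dataclasses so it runs on its own. `SearchResultStub`, `ResultModelStub`, and `search_results_adapter` are hypothetical names, and the `SearchResult` attributes shown (`title`, `path`, `size_bytes`, `full_metadata`) are assumptions; check `ProviderCore.base.SearchResult` and `Provider/vimm.py` for the real fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class SearchResultStub:
    """Stand-in for ProviderCore.base.SearchResult (field names assumed)."""
    title: str
    path: Optional[str] = None
    size_bytes: Optional[int] = None
    full_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultModelStub:
    """Stand-in for SYS.result_table_api.ResultModel."""
    title: str
    path: Optional[str] = None
    size_bytes: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    source: Optional[str] = None


def search_results_adapter(
    items: Iterable[SearchResultStub], source: str = "mytable"
) -> Iterator[ResultModelStub]:
    """Convert search results into result models, one per input item."""
    for it in items:
        yield ResultModelStub(
            title=str(it.title or it.path or ""),
            path=str(it.path) if it.path else None,
            size_bytes=int(it.size_bytes) if it.size_bytes is not None else None,
            metadata=dict(it.full_metadata or {}),
            source=source,
        )
```

In a real provider the stubs would be replaced by the project's own classes, and the adapter would be passed to `register_provider` together with `columns` and `selection_fn`.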

---

## Columns & selection

- `columns` may be a static `List[ColumnSpec]` or a factory `def cols(rows: List[ResultModel]) -> List[ColumnSpec]` that inspects sample rows.
- `selection_fn` must accept a `ResultModel` and return a `List[str]` representing CLI args (e.g., `['-path', row.path]`). These args are used by `select` and `@N` expansion.
- Ensure `ResultModel.source` is set, either in the model itself or by relying on the provider name that `serialize_row` sets.

**Tip:** For providers that produce downloadable file rows, prefer returning explicit URL args (e.g., `['-url', row.path]`). This identifies the selected URL unambiguously for downstream downloaders and avoids ambiguous parsing when provider hints (like `-provider`) are present.
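
A static column list and a URL-aware `selection_fn` might look like the sketch below. `Row` and `Column` are hypothetical stand-ins for `ResultModel` and `ColumnSpec` so the snippet runs on its own; the `(name, header, extractor)` argument order follows the `ColumnSpec("size", "Size", ...)` call in the template, but the attribute names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Row:
    """Stand-in for ResultModel with only the fields used here."""
    title: str
    path: Optional[str] = None
    size_bytes: Optional[int] = None


@dataclass
class Column:
    """Stand-in for ColumnSpec(name, header, extractor)."""
    name: str
    header: str
    extractor: Callable[[Row], object]


# Static columns: a plain list, no inspection of sample rows needed.
STATIC_COLUMNS: List[Column] = [
    Column("title", "Title", lambda r: r.title),
    Column("size", "Size", lambda r: r.size_bytes or ""),
]


def selection_fn(row: Row) -> List[str]:
    """Return CLI args for a selected row, preferring an explicit -url."""
    if row.path and row.path.startswith(("http://", "https://")):
        return ["-url", row.path]
    if row.path:
        return ["-path", row.path]
    return ["-title", row.title or ""]
```

Returning `-url` only for HTTP(S) paths keeps local paths on `-path`, which matches the tip's intent without changing behavior for non-download rows.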

---

## Optional: pandas path for `<table>` extraction

`SYS.html_table.extract_records` prefers a pure-lxml path but will use `pandas.read_html` if pandas is installed and the helper detects it works for the input table. This is optional and **not required** to author a provider. If your provider does require `pandas`, document that and add an informative error/log message for when it is missing.

---

## Testing & examples

- Write `tests/test_provider_<name>.py` that imports your provider and verifies that `provider.build_table(...)` produces a `ResultTable` (has `.rows` and `.columns`) and that `serialize_rows()` yields dicts with `_selection_args`, `source`, and, when applicable, `_selection_action`.
- When you need to guarantee a specific CLI stage sequence (e.g., `download-file -url <path> -provider <name>`), call `table.set_row_selection_action(row_index, tokens)` so the serialized payload emits `_selection_action` and the CLI can run the row exactly as intended.
- For table providers you can test `search_table_from_url` using a local HTML fixture or by mocking `HTTPClient` to return a small sample page.
- If you rely on pandas, add a test that monkeypatches `sys.modules['pandas']` to a simple shim to validate the pandas path.
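
The pandas shim mentioned in the last bullet could be sketched like this. `make_pandas_shim` and `extract_with_pandas` are hypothetical: the latter only stands in for whatever pandas-backed code path you want to exercise, and the shim implements just `read_html` and `to_dict`.

```python
import sys
import types
from unittest import mock


def make_pandas_shim(records):
    """Build a fake `pandas` module whose read_html returns fixed records."""
    shim = types.ModuleType("pandas")

    class _Frame:
        def __init__(self, rows):
            self._rows = rows

        def to_dict(self, orient="records"):
            return list(self._rows)

    def read_html(_source, *args, **kwargs):
        return [_Frame(records)]

    shim.read_html = read_html
    return shim


def extract_with_pandas(html):
    """Hypothetical stand-in for a pandas-backed extraction path."""
    import pandas  # resolved through sys.modules, so the shim is picked up

    frames = pandas.read_html(html)
    return frames[0].to_dict(orient="records") if frames else []


def test_pandas_path_uses_shim():
    shim = make_pandas_shim([{"Title": "Example", "Size": "1 KB"}])
    # patch.dict restores sys.modules when the block exits
    with mock.patch.dict(sys.modules, {"pandas": shim}):
        rows = extract_with_pandas("<table></table>")
    assert rows == [{"Title": "Example", "Size": "1 KB"}]
```

Under pytest, `monkeypatch.setitem(sys.modules, "pandas", shim)` achieves the same substitution and is undone automatically at the end of the test.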

**Example test skeleton**

```py
from SYS.result_table_adapters import get_provider
from Provider import example_provider


def test_example_provider_registration():
    provider = get_provider("example")
    rows = list(provider.adapter(example_provider.SAMPLE_ITEMS))
    assert rows and rows[0].title
    cols = provider.get_columns(rows)
    assert any(c.name == "title" for c in cols)
    table = provider.build_table(example_provider.SAMPLE_ITEMS)
    assert table.provider == "example" and table.rows
```

---

## References & examples

- Read `Provider/example_provider.py` for a compact example of a strict adapter and dynamic columns.
- Read `Provider/vimm.py` for a table provider that uses `TableProviderMixin` and converts `SearchResult` → `ResultModel` for registration.
- See `docs/provider_guide.md` for a broader provider development checklist.

---

If you want, I can also add a small `Provider/myprovider_template.py` file and unit tests for it. Say the word and I'll add them and wire up tests. 🎯
@@ -13,10 +13,11 @@ This document explains the `ResultTable` system used across the CLI and TUI: how

- **ResultTable** (`SYS/result_table.py`)
  - Renders rows as a rich table and stores metadata used for selection expansion.
  - Important APIs: `add_result()`, `set_table()`, `set_source_command()`, `set_row_selection_args()`, `set_table_metadata()`, and `select_interactive()`.
  - Important APIs: `add_result()`, `set_table()`, `set_source_command()`, `set_row_selection_args()`, `set_row_selection_action()`, `set_table_metadata()`, and `select_interactive()`.

- **ResultRow**
  - Holds columns plus `selection_args` (used for `@N` expansion) and `payload` (original object).
  - Optionally stores `selection_action`, a full list of CLI tokens to run when `@N` selects this row. When present the CLI honors the explicit action instead of reconstructing it from `source_command` and `selection_args`.

- **Provider selector**
  - If a provider implements `selector(selected_items, ctx=..., stage_is_last=True)`, it is run first when `@N` is used; if the selector returns `True` it has handled the selection (e.g., drilling into a folder and publishing a new ResultTable).
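
The precedence described for `selection_action` under **ResultRow** can be sketched as a small pure function. `resolve_selection_tokens` is a hypothetical name used only for illustration; it is not the CLI's actual implementation, and the real code may handle more cases (multi-selection, provider selectors).

```python
from typing import List, Optional, Sequence


def resolve_selection_tokens(
    source_command: str,
    source_args: Sequence[str],
    selection_args: Sequence[str],
    selection_action: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick the CLI tokens to run for a single selected row."""
    # An explicit action wins and is used verbatim.
    if selection_action:
        return list(selection_action)
    # Otherwise rebuild the command from the table's source and the row args.
    return [source_command, *source_args, *selection_args]
```

For example, a row with `selection_action=["download-file", "-url", "https://example.com/x.pdf"]` would run exactly those tokens, while a row without one falls back to the source command plus its selection args.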
@@ -112,7 +113,7 @@ SearchResult(
)
```

Illustrative file SearchResult (after drilling):
4. Otherwise, for single selections, CLI checks for `row.selection_action` and runs that verbatim if present; otherwise it expands `source_command + source_args + row_selection_args`. For multi-selections, items are piped downstream.

```py
SearchResult(
@@ -217,6 +218,8 @@ Notes:

---

For more detail on ResultTable provider authoring, see `docs/provider_authoring.md`.

If you'd like, I can also:
- Add provider-specific examples (AllDebrid, Bandcamp) into this doc ✅
- Add a short checklist for PR reviewers when adding new providers