Provider authoring: ResultTable & provider adapters ✅

This short guide explains how to write providers that integrate with the strict ResultTable API. Adapters must yield ResultModel instances, and providers register through SYS.result_table_adapters.register_provider with a column specification and a selection_fn.

Quick summary

Providers register a provider adapter (a callable that yields ResultModel instances).
Providers must also supply columns (a static list or a factory) and a selection_fn that returns the CLI args for a selected row.
For simple HTML table/list scraping, prefer TableProviderMixin from SYS.provider_helpers to fetch and extract rows using SYS.html_table.extract_records.

Runtime dependency policy

Treat required runtime dependencies (e.g., Playwright) as mandatory: import them unconditionally and let missing dependencies fail fast at import time. Avoid adding per-call try/except import guards for required modules—these silently hide configuration errors and add bloat.
Use guarded imports only for truly optional dependencies (e.g., pandas for enhanced table parsing) and provide meaningful fallbacks or helpful error messages in those cases.
Keep provider code minimal and explicit: fail early and document required runtime dependencies in README/installation notes.
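
The policy above can be sketched as follows. This is a minimal illustration, not code from the repository: require_pandas and table_backend are hypothetical helper names, and the Playwright import is shown as a comment so the snippet runs without Playwright installed.

```python
# Required dependency: import it plainly at module top, with no try/except.
# If the package is missing, importing the provider fails immediately.
# from playwright.sync_api import sync_playwright

# Optional dependency: guard the import once, at module level.
try:
    import pandas as pd
except ImportError:
    pd = None


def require_pandas():
    """Hypothetical helper: return pandas or raise an error that explains the fix."""
    if pd is None:
        raise RuntimeError(
            "myprovider: pandas is not installed; install it to enable "
            "enhanced table parsing (pip install pandas)."
        )
    return pd


def table_backend() -> str:
    """Hypothetical helper: report which parsing path this process will use."""
    return "pandas" if pd is not None else "lxml"
```

The guard lives in one place at import time, so individual calls never need their own try/except around the import.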

Minimal provider template (copy/paste)

# Provider/my_provider.py
from typing import Any, Dict, Iterable, List

from SYS.result_table_api import ResultModel, ColumnSpec, title_column, metadata_column
from SYS.result_table_adapters import register_provider

# Example adapter: convert provider-specific items into ResultModel instances
SAMPLE_ITEMS = [
    {"name": "Example File.pdf", "path": "https://example.com/x.pdf", "ext": "pdf", "size": 1024, "source": "myprovider"},
]

def adapter(items: Iterable[Dict[str, Any]]) -> Iterable[ResultModel]:
    for it in items:
        title = it.get("name") or it.get("title") or str(it.get("path") or "")
        yield ResultModel(
            title=str(title),
            path=str(it.get("path")) if it.get("path") else None,
            ext=str(it.get("ext")) if it.get("ext") else None,
            size_bytes=int(it.get("size")) if it.get("size") is not None else None,
            metadata=dict(it),
            source=str(it.get("source")) if it.get("source") else "myprovider",
        )

# Optional: build columns dynamically from sample rows
def columns_factory(rows: List[ResultModel]) -> List[ColumnSpec]:
    cols = [title_column()]
    # add extra columns if metadata keys exist
    if any((r.metadata or {}).get("size") for r in rows):
        cols.append(ColumnSpec("size", "Size", lambda r: r.size_bytes or ""))
    return cols

# Selection args for `@N` expansion or `select` cmdlet
def selection_fn(row: ResultModel) -> List[str]:
    # prefer -path when available
    if row.path:
        return ["-path", row.path]
    return ["-title", row.title or ""]

# Register provider (done at import time)
register_provider("myprovider", adapter, columns=columns_factory, selection_fn=selection_fn)

Table scraping: using TableProviderMixin (HTML tables / list-results)

If your provider scrapes HTML tables or list-like results (common on web search pages), use TableProviderMixin:

from urllib.parse import quote_plus  # used by search() to encode the query

from ProviderCore.base import Provider
from SYS.provider_helpers import TableProviderMixin

class MyTableProvider(TableProviderMixin, Provider):
    URL = ("https://example.org/search",)

    def validate(self) -> bool:
        return True

    def search(self, query: str, limit: int = 50, **kwargs):
        url = f"{self.URL[0]}?q={quote_plus(query)}"
        return self.search_table_from_url(url, limit=limit)

TableProviderMixin.search_table_from_url returns ProviderCore.base.SearchResult entries. If you want to integrate this provider with the strict ResultTable registry, add a small adapter that converts SearchResult -> ResultModel and register it using register_provider (see Provider/vimm.py for a real example).
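
A rough sketch of such an adapter is shown below. The two dataclasses are stand-ins so the snippet runs on its own; real code would import SearchResult from ProviderCore.base and ResultModel from SYS.result_table_api. The SearchResult field names (title, path, size_bytes, full_metadata) and the provider name "mytable" are assumptions for illustration, so check them against Provider/vimm.py before copying.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


@dataclass
class SearchResult:
    """Stand-in for ProviderCore.base.SearchResult (field names assumed)."""
    title: str
    path: str
    size_bytes: Optional[int] = None
    full_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultModel:
    """Stand-in mirroring the ResultModel fields used in the template above."""
    title: str
    path: Optional[str] = None
    ext: Optional[str] = None
    size_bytes: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    source: Optional[str] = None


def search_result_adapter(items: Iterable[SearchResult]) -> Iterator[ResultModel]:
    """Convert SearchResult entries into ResultModel instances, one per row."""
    for sr in items:
        yield ResultModel(
            title=str(sr.title or sr.path or ""),
            path=str(sr.path) if sr.path else None,
            size_bytes=sr.size_bytes,
            metadata=dict(sr.full_metadata or {}),
            source="mytable",  # hypothetical provider name
        )
```

With the real classes imported, this adapter would be passed to register_provider in the same way as the adapter in the minimal template.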

Columns & selection

columns may be a static List[ColumnSpec] or a factory def cols(rows: List[ResultModel]) -> List[ColumnSpec] that inspects sample rows.
selection_fn must accept a ResultModel and return a List[str] representing CLI args (e.g., ['-path', row.path]). These args are used by select and @N expansion.

Tip: for providers that produce downloadable file rows, prefer returning explicit URL args (e.g., ['-url', row.path]). Downstream downloaders can then identify the selected URL unambiguously, even when provider hints (like -provider) are present.
Ensure ResultModel.source is set, either explicitly in the model or by relying on the provider name set by serialize_row.
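
The two column styles and the URL tip can be sketched together. The dataclasses below are stand-ins so the example is self-contained; real providers import ResultModel and ColumnSpec from SYS.result_table_api. The ColumnSpec field names header and extractor are assumptions (the guide only shows positional arguments), and the http/https prefix check is one possible way to decide when to emit -url.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ResultModel:
    """Stand-in with the subset of fields this sketch needs."""
    title: str
    path: Optional[str] = None
    size_bytes: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ColumnSpec:
    """Stand-in; 'header' and 'extractor' are assumed field names."""
    name: str
    header: str
    extractor: Callable[[ResultModel], Any]


# Style 1: a static list, used when every row has the same shape.
STATIC_COLUMNS: List[ColumnSpec] = [ColumnSpec("title", "Title", lambda r: r.title)]


# Style 2: a factory that inspects the rows before choosing columns.
def columns_factory(rows: List[ResultModel]) -> List[ColumnSpec]:
    cols = list(STATIC_COLUMNS)
    if any(r.size_bytes is not None for r in rows):
        cols.append(ColumnSpec("size", "Size", lambda r: r.size_bytes or ""))
    return cols


def selection_fn(row: ResultModel) -> List[str]:
    """Return CLI args for a row, preferring an explicit -url for remote files."""
    if row.path and row.path.startswith(("http://", "https://")):
        return ["-url", row.path]
    if row.path:
        return ["-path", row.path]
    return ["-title", row.title or ""]
```

For example, a row whose path is "https://example.com/x.pdf" yields ["-url", "https://example.com/x.pdf"], while a row with no path falls back to its title.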

Optional: pandas path for `<table>` extraction

SYS.html_table.extract_records prefers a pure-lxml path but will use pandas.read_html if pandas is installed and the helper detects that it works for the input table. pandas is optional and not required to author a provider. If your provider does depend on it, document that requirement and emit an informative error/log message when pandas is missing.

Testing & examples

Write tests/test_provider_<name>.py that imports your provider and verifies provider.build_table(...) produces a ResultTable (has .rows and .columns) and that serialize_rows() yields dicts with _selection_args, _selection_action when applicable, and source.
When you need to guarantee a specific CLI stage sequence (e.g., download-file -url <path> -provider <name>), call table.set_row_selection_action(row_index, tokens) so the serialized payload emits _selection_action and the CLI can run the row exactly as intended.
For table providers you can test search_table_from_url using a local HTML fixture or by mocking HTTPClient to return a small sample page.
If you rely on pandas, add a test that monkeypatches sys.modules['pandas'] to a simple shim to validate the pandas path.
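
The pandas shim technique from the last point might look like the sketch below. Everything here is illustrative: read_tables is a hypothetical function standing in for whatever pandas-backed code your provider calls, and FakeFrame only implements the single method that function uses. The shim is installed with unittest.mock.patch.dict so sys.modules is restored afterwards; in a pytest test, monkeypatch.setitem(sys.modules, "pandas", shim) does the same job.

```python
import sys
import types
from unittest import mock


def read_tables(html: str):
    """Hypothetical pandas path: one list of row dicts per parsed table."""
    import pandas as pd  # resolved through sys.modules, so a shim is picked up
    return [frame.to_dict(orient="records") for frame in pd.read_html(html)]


class FakeFrame:
    """Minimal DataFrame stand-in exposing only to_dict."""

    def __init__(self, records):
        self._records = records

    def to_dict(self, orient="records"):
        return list(self._records)


def make_pandas_shim(records):
    """Build a fake 'pandas' module whose read_html returns canned data."""
    shim = types.ModuleType("pandas")
    shim.read_html = lambda html: [FakeFrame(records)]
    return shim


def run_with_shim(html: str, records):
    """Call read_tables with the shim installed, then restore sys.modules."""
    with mock.patch.dict(sys.modules, {"pandas": make_pandas_shim(records)}):
        return read_tables(html)
```

Because the import happens inside read_tables, the shim is found regardless of whether the real pandas is installed on the test machine.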

Example test skeleton

from SYS.result_table_adapters import get_provider
from Provider import example_provider


def test_example_provider_registration():
    provider = get_provider("example")
    rows = list(provider.adapter(example_provider.SAMPLE_ITEMS))
    assert rows and rows[0].title
    cols = provider.get_columns(rows)
    assert any(c.name == "title" for c in cols)
    table = provider.build_table(example_provider.SAMPLE_ITEMS)
    assert table.provider == "example" and table.rows

References & examples

Read Provider/example_provider.py for a compact example of a strict adapter and dynamic columns.
Read Provider/vimm.py for a table-provider that uses TableProviderMixin and converts SearchResult → ResultModel for registration.
See docs/provider_guide.md for a broader provider development checklist.

