Provider authoring: ResultTable & provider adapters

This short guide explains how to write providers that integrate with the strict ResultTable API. Adapters must yield ResultModel instances, and providers register via SYS.result_table_adapters.register_provider with a column specification and a selection_fn.


Quick summary

  • Providers register a provider adapter (a callable that yields ResultModel instances).
  • Providers must also supply columns (a static list or a factory) and a selection_fn that returns CLI args for a selected row.
  • For simple HTML table/list scraping, prefer TableProviderMixin from SYS.provider_helpers to fetch and extract rows using SYS.html_table.extract_records.

Runtime dependency policy

  • Treat required runtime dependencies (e.g., Playwright) as mandatory: import them unconditionally and let missing dependencies fail fast at import time. Avoid per-call try/except import guards for required modules; such guards silently hide configuration errors and add bloat.
  • Use guarded imports only for truly optional dependencies (e.g., pandas for enhanced table parsing) and provide meaningful fallbacks or helpful error messages in those cases.
  • Keep provider code minimal and explicit: fail early and document required runtime dependencies in README/installation notes.
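
The two import styles above can be sketched roughly as follows. This is an illustration rather than project code: load_optional is a hypothetical helper name (not part of SYS), and json stands in for a required dependency so the snippet runs anywhere.

```python
import importlib
import json  # stand-in for a required dependency: imported unconditionally, fails fast if missing


def load_optional(module_name: str, feature: str):
    """Return the module if it is installed, else None after a helpful message.

    Hypothetical helper for illustration only; not part of SYS.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        print(f"{module_name} is not installed; {feature} is disabled.")
        return None


# Optional dependency: guarded once at import time, not on every call.
pandas = load_optional("pandas", "pandas-based table parsing")
```

Code that uses the optional module can then check `pandas is not None` and fall back to the default path otherwise.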

Minimal provider template (copy/paste)

# Provider/my_provider.py
from typing import Any, Dict, Iterable, List

from SYS.result_table_api import ResultModel, ColumnSpec, title_column, metadata_column
from SYS.result_table_adapters import register_provider

# Example adapter: convert provider-specific items into ResultModel instances
SAMPLE_ITEMS = [
    {"name": "Example File.pdf", "path": "https://example.com/x.pdf", "ext": "pdf", "size": 1024, "source": "myprovider"},
]

def adapter(items: Iterable[Dict[str, Any]]) -> Iterable[ResultModel]:
    for it in items:
        title = it.get("name") or it.get("title") or str(it.get("path") or "")
        yield ResultModel(
            title=str(title),
            path=str(it.get("path")) if it.get("path") else None,
            ext=str(it.get("ext")) if it.get("ext") else None,
            size_bytes=int(it.get("size")) if it.get("size") is not None else None,
            metadata=dict(it),
            source=str(it.get("source")) if it.get("source") else "myprovider",
        )

# Optional: build columns dynamically from sample rows
def columns_factory(rows: List[ResultModel]) -> List[ColumnSpec]:
    cols = [title_column()]
    # add extra columns if metadata keys exist
    if any((r.metadata or {}).get("size") for r in rows):
        cols.append(ColumnSpec("size", "Size", lambda r: r.size_bytes or ""))
    return cols

# Selection args for `@N` expansion or `select` cmdlet
def selection_fn(row: ResultModel) -> List[str]:
    # prefer -path when available
    if row.path:
        return ["-path", row.path]
    return ["-title", row.title or ""]

# Register provider (done at import time)
register_provider("myprovider", adapter, columns=columns_factory, selection_fn=selection_fn)

Table scraping: using TableProviderMixin (HTML tables / list-results)

If your provider scrapes HTML tables or list-like results (common on web search pages), use TableProviderMixin:

from urllib.parse import quote_plus  # needed to URL-encode the query below

from ProviderCore.base import Provider
from SYS.provider_helpers import TableProviderMixin

class MyTableProvider(TableProviderMixin, Provider):
    URL = ("https://example.org/search",)

    def validate(self) -> bool:
        return True

    def search(self, query: str, limit: int = 50, **kwargs):
        url = f"{self.URL[0]}?q={quote_plus(query)}"
        return self.search_table_from_url(url, limit=limit)

TableProviderMixin.search_table_from_url returns ProviderCore.base.SearchResult entries. If you want to integrate this provider with the strict ResultTable registry, add a small adapter that converts SearchResult -> ResultModel and register it using register_provider (see Provider/vimm.py for a real example).
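
A SearchResult -> ResultModel adapter of the kind described above might look roughly like this. It is only a sketch under stated assumptions: both dataclasses are local stand-ins so the snippet runs on its own, the SearchResult field names (title, path, full_metadata) are guesses rather than the confirmed ProviderCore.base definition, and "mytable" is a hypothetical provider name. Provider/vimm.py remains the authoritative example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, Optional


# Stand-ins so the sketch is self-contained. In a real provider, import
# SearchResult from ProviderCore.base and ResultModel from SYS.result_table_api.
@dataclass
class SearchResult:  # field names are assumptions, not the real class definition
    title: str
    path: str
    full_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultModel:  # fields mirror those used in the minimal provider template
    title: str
    path: Optional[str] = None
    ext: Optional[str] = None
    size_bytes: Optional[int] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    source: Optional[str] = None


def search_result_adapter(results: Iterable[SearchResult]) -> Iterator[ResultModel]:
    """Convert SearchResult entries into ResultModel instances."""
    for sr in results:
        yield ResultModel(
            title=str(sr.title or sr.path or ""),
            path=str(sr.path) if sr.path else None,
            metadata=dict(sr.full_metadata or {}),
            source="mytable",  # hypothetical provider name
        )
```

The adapter would then be passed to register_provider together with columns and a selection_fn, in the same way as the minimal template.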


Columns & selection

  • columns may be a static List[ColumnSpec] or a factory def cols(rows: List[ResultModel]) -> List[ColumnSpec] that inspects sample rows.

  • selection_fn must accept a ResultModel and return a List[str] representing CLI args (e.g., ['-path', row.path]). These args are used by select and @N expansion.

    Tip: for providers that produce downloadable file rows, prefer returning explicit URL args (e.g., ['-url', row.path]). This clearly identifies the selected URL to downstream downloaders and avoids ambiguous parsing when provider hints (like -provider) are present.

  • Ensure ResultModel.source is set, either in the model itself or by relying on the provider name set by serialize_row.
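
As a rough illustration of the selection rules above, a selection_fn that prefers explicit URL args could be written as below. Row is a local stand-in for ResultModel so the snippet runs on its own, and the http/https prefix check is an assumption about what counts as a downloadable row, not project behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:  # stand-in for ResultModel; only the fields used here
    title: str
    path: Optional[str] = None


def selection_fn(row: Row) -> List[str]:
    """Return CLI args for a selected row, preferring an explicit -url."""
    path = row.path or ""
    if path.startswith(("http://", "https://")):
        return ["-url", path]  # downloadable row: identify the URL explicitly
    if path:
        return ["-path", path]  # local or other non-URL path
    return ["-title", row.title or ""]  # last resort when no path is known
```

For example, `selection_fn(Row("Example File.pdf", "https://example.com/x.pdf"))` returns `['-url', 'https://example.com/x.pdf']`.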


Optional: pandas path for <table> extraction

SYS.html_table.extract_records prefers a pure-lxml path but will use pandas.read_html if pandas is installed and the helper detects that it works for the input table. pandas is optional and not required to author a provider. If your provider does depend on it, document that requirement and add an informative error or log message for when pandas is missing.


Testing & examples

  • Write tests/test_provider_<name>.py that imports your provider and verifies provider.build_table(...) produces a ResultTable (has .rows and .columns) and that serialize_rows() yields dicts with _selection_args, _selection_action when applicable, and source.
  • When you need to guarantee a specific CLI stage sequence (e.g., download-file -url <path> -provider <name>), call table.set_row_selection_action(row_index, tokens) so the serialized payload emits _selection_action and the CLI can run the row exactly as intended.
  • For table providers you can test search_table_from_url using a local HTML fixture or by mocking HTTPClient to return a small sample page.
  • If you rely on pandas, add a test that monkeypatches sys.modules['pandas'] to a simple shim to validate the pandas path.

Example test skeleton

from SYS.result_table_adapters import get_provider
from Provider import example_provider


def test_example_provider_registration():
    provider = get_provider("example")
    rows = list(provider.adapter(example_provider.SAMPLE_ITEMS))
    assert rows and rows[0].title
    cols = provider.get_columns(rows)
    assert any(c.name == "title" for c in cols)
    table = provider.build_table(example_provider.SAMPLE_ITEMS)
    assert table.provider == "example" and table.rows
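
For the pandas shim test mentioned in the list above, one possible shape is sketched here. It uses unittest.mock.patch.dict to swap sys.modules['pandas'] temporarily (pytest's monkeypatch.setitem would work similarly). make_pandas_shim is a hypothetical helper, and a real test would call SYS.html_table.extract_records inside the with block instead of calling the shim directly.

```python
import sys
import types
from unittest import mock


def make_pandas_shim(records):
    """Build a stand-in 'pandas' module that exposes only read_html."""
    shim = types.ModuleType("pandas")
    shim.read_html = lambda *args, **kwargs: records
    return shim


def test_pandas_path_uses_shim():
    shim = make_pandas_shim([["row-1"]])
    with mock.patch.dict(sys.modules, {"pandas": shim}):
        import pandas  # resolves to the shim through sys.modules

        # A real test would exercise extract_records here (assumption).
        assert pandas.read_html("<table></table>") == [["row-1"]]
    # On exit, patch.dict restores the previous sys.modules contents.
```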

References & examples

  • Read Provider/example_provider.py for a compact example of a strict adapter and dynamic columns.
  • Read Provider/vimm.py for a table provider that uses TableProviderMixin and converts SearchResult -> ResultModel for registration.
  • See docs/provider_guide.md for a broader provider development checklist.
