Registration API

The registration system manages environment creation, task IDs, and dataset organization. Each environment is registered under a unique ID that can be passed to make().

Module Contents

Registration system for JaxARC environments.

This package provides a lean registry that maps simple dataset keys to environment specs. Dataset parsing and task loading are no longer part of this module. Environments are expected to be constructed with buffer-based EnvParams (JAX-native, JIT-friendly) and not depend on parsers at runtime.

Core ideas:

- A global registry maps dataset keys (e.g., "Mini", "Concept", "AGI1", "AGI2") to EnvSpec definitions.
- No parser entry points or subset inference live here anymore.
- make(id, **kwargs) only parses the dataset key and returns the environment and parameters built from the provided kwargs (e.g., a prebuilt buffer in EnvParams or an explicit params).

Named subsets can be registered (e.g., register_subset("Mini", "easy", […])) and then selected via make("Mini-easy") to load exactly those tasks. This makes it easy to publish curated benchmarks and implement curriculum learning.

Typical usage::

    from jaxarc.registration import make

    # Build EnvParams with a pre-stacked task buffer outside this module.
    env, params = make("Mini", params=my_params)

Notes:

- This module keeps a single way of doing things: buffer-based, JIT-friendly EnvParams.
- Dataset downloading/parsing and subset handling should be done outside this module.
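The ID handling described above can be illustrated with a toy registry. This is a minimal sketch, not JaxARC's implementation: `ToyRegistry`, its `parse_id` helper, and the rule "everything before the first hyphen is the dataset key" are assumptions made for illustration, and the returned spec dictionary stands in for a real environment instance.

```python
from typing import Any, Dict, Optional, Tuple


class ToyRegistry:
    """Hypothetical registry mapping dataset keys to spec metadata."""

    def __init__(self) -> None:
        self._specs: Dict[str, Dict[str, Any]] = {}

    def register(self, id: str, **metadata: Any) -> None:
        # Store whatever metadata the caller provides under the dataset key.
        self._specs[id] = {"id": id, **metadata}

    @staticmethod
    def parse_id(id: str) -> Tuple[str, Optional[str]]:
        # Assumption: the text before the first hyphen is the dataset key;
        # anything after it is a subset or task selector.
        key, sep, selector = id.partition("-")
        return key, (selector if sep else None)

    def make(self, id: str, **kwargs: Any) -> Tuple[Dict[str, Any], Any]:
        key, _selector = self.parse_id(id)
        if key not in self._specs:
            raise KeyError(f"unknown dataset key: {key!r}")
        # The caller supplies params; the registry does no parsing or loading.
        # The spec dict is returned here in place of an environment instance.
        return self._specs[key], kwargs.get("params")


registry = ToyRegistry()
registry.register("Mini", max_episode_steps=100)
spec, params = registry.make("Mini-easy", params={"buffer": "prebuilt"})
print(spec["id"], params)
```

Splitting on the first hyphen keeps subset names such as "my-benchmark" intact, since only the leading dataset key is peeled off.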

class jaxarc.registration.EnvRegistry

Bases: object

Global environment registry with gym-like semantics.

available_named_subsets(dataset_key: str, include_builtin: bool = True) → tuple[str, ...]

Return names of available subsets for a dataset.

Parameters:

dataset_key – Dataset name (Mini, Concept, AGI1, AGI2)
include_builtin – Include built-in selectors (‘all’, ‘train’, ‘eval’) and concept groups (default: True)

Returns:

Tuple of subset names, sorted alphabetically

Examples

>>> available_named_subsets("Mini")
('all',)  # Mini doesn't have train/eval splits

>>> available_named_subsets("Concept")
('AboveBelow', 'Center', 'all', ...)  # Includes concept groups

>>> available_named_subsets("AGI1")
('all', 'eval', 'train')  # AGI has splits

>>> available_named_subsets("Mini", include_builtin=False)
()  # Only custom subsets
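The behaviour shown in these examples (built-in selectors merged with custom subsets, then sorted) can be sketched as follows. The tables `BUILTIN_SELECTORS` and `CUSTOM_SUBSETS` and the function `list_subsets` are hypothetical names; the built-in values are copied from the examples above, and the custom 'easy' subset is assumed to have been registered beforehand.

```python
from typing import Dict, Tuple

# Assumed built-in selectors per dataset, taken from the examples above.
BUILTIN_SELECTORS: Dict[str, Tuple[str, ...]] = {
    "Mini": ("all",),
    "AGI1": ("all", "eval", "train"),
}

# Hypothetical store of custom subsets: (dataset_key, name) -> task IDs.
CUSTOM_SUBSETS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("Mini", "easy"): ("task1", "task2", "task3"),
}


def list_subsets(dataset_key: str, include_builtin: bool = True) -> Tuple[str, ...]:
    """Return subset names for a dataset, sorted alphabetically."""
    names = {name for (key, name) in CUSTOM_SUBSETS if key == dataset_key}
    if include_builtin:
        names.update(BUILTIN_SELECTORS.get(dataset_key, ()))
    return tuple(sorted(names))


print(list_subsets("Mini"))
print(list_subsets("Mini", include_builtin=False))
```

Python's default string sort places uppercase letters before lowercase ones, which matches the ordering in the Concept example ('AboveBelow', 'Center', 'all', ...).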

available_task_ids(dataset_key: str, config: Any | None = None, auto_download: bool = False) → list[str]

Return all available task IDs for a dataset key after ensuring dataset availability.

get_subset_task_ids(dataset_key: str, selector: str = 'all', config: Any | None = None, auto_download: bool = False) → list[str]

Get task IDs for a specific subset without creating an environment.

This allows users to query what tasks will be loaded before calling make().

Parameters:

dataset_key – Dataset name (Mini, Concept, AGI1, AGI2)
selector – Subset selector (‘all’, ‘train’, ‘easy’, task_id, etc.)
config – Optional config
auto_download – Download dataset if missing

Returns:

List of task IDs that will be loaded

Examples

>>> get_subset_task_ids("Mini", "all")
['Most_Common_color_l6ab0lf3xztbyxsu3p', ...]

>>> get_subset_task_ids("Mini", "easy")
['task1', 'task2', 'task3']  # Only tasks in 'easy' subset

>>> get_subset_task_ids("Concept", "Center")
['Center_001', 'Center_002', ...]  # Tasks in Center concept

>>> get_subset_task_ids("Mini", "Most_Common_color_l6ab0lf3xztbyxsu3p")
['Most_Common_color_l6ab0lf3xztbyxsu3p']  # Single task
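One way to read the examples above is as a three-step resolution: 'all', then a named subset, then a single task ID. The sketch below follows that reading. The order of precedence, the in-memory tables, and the name `resolve_selector` are assumptions for illustration and are not taken from the JaxARC source.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory data standing in for a loaded dataset.
ALL_TASKS: Dict[str, List[str]] = {
    "Mini": ["task1", "task2", "task3", "task4"],
}
SUBSETS: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("Mini", "easy"): ("task1", "task2"),
}


def resolve_selector(dataset_key: str, selector: str = "all") -> List[str]:
    """Map a selector to the list of task IDs it would load."""
    tasks = ALL_TASKS[dataset_key]
    if selector == "all":
        return list(tasks)
    if (dataset_key, selector) in SUBSETS:
        return list(SUBSETS[(dataset_key, selector)])
    if selector in tasks:
        # A selector that matches a task ID selects exactly that task.
        return [selector]
    raise ValueError(f"unknown selector {selector!r} for dataset {dataset_key!r}")


print(resolve_selector("Mini", "easy"))
print(resolve_selector("Mini", "task4"))
```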

make(id: str, **kwargs: Any) → Tuple[Any, Any]

Create an environment instance and parameters for a registered spec.

Expected kwargs:

params: EnvParams (preferred; buffer-based, JIT-friendly)
env_entry: str (optional) override of environment entry point

Returns:

env – Environment instance
params – EnvParams provided directly

Return type:

(env, params) tuple

register(id: str, env_entry: str = 'jaxarc.envs:Environment', max_episode_steps: int = 100, **kwargs: Any) → None

Parameters:

id – Unique environment ID (e.g., “JaxARC-Mini-v0”)
env_entry – Dotted path or colon path to class/factory (e.g., “jaxarc.envs:Environment”)
max_episode_steps – Default max steps for this environment family
**kwargs – Additional metadata stored with the spec
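Entry-point strings in the "module:Name" form are commonly resolved with `importlib`. The sketch below shows that general technique on a standard-library target; `resolve_entry_point` is a hypothetical helper, and whether JaxARC resolves its entry points this way is an assumption.

```python
import importlib
from typing import Any


def resolve_entry_point(path: str) -> Any:
    """Resolve 'package.module:Name' or 'package.module.Name' to an object."""
    if ":" in path:
        module_name, attr = path.split(":", 1)
    else:
        # Dotted form: the last component is the attribute name.
        module_name, _, attr = path.rpartition(".")
    if not module_name or not attr:
        raise ValueError(f"invalid entry point: {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated on a standard-library class so the example runs anywhere.
cls = resolve_entry_point("collections:OrderedDict")
print(cls.__name__)
```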

register_subset(dataset_key: str, name: str, task_ids: list[str] | tuple[str, ...]) → None

Parameters:

dataset_key – Base dataset key (e.g., ‘Mini’, ‘Concept’, ‘AGI1’, ‘AGI2’ or synonyms)
name – Subset name (e.g., ‘easy’, ‘hard’, ‘my-benchmark’)
task_ids – Sequence of task IDs to include in this subset

subset_task_ids(dataset_key: str, name: str) → tuple[str, ...]

Return the task IDs registered for a named subset (e.g., ‘Mini’, ‘easy’).

class jaxarc.registration.EnvSpec(id: str, env_entry: str = 'jaxarc.envs:Environment', max_episode_steps: int = 100, kwargs: Dict[str, Any] = <factory>)

Bases: object

Environment specification for registration.

env_entry: str = 'jaxarc.envs:Environment'

id: str

kwargs: Dict[str, Any]

max_episode_steps: int = 100
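The `<factory>` default in the signature suggests a dataclass-style default factory for kwargs. The sketch below mirrors the documented fields under that assumption; `EnvSpecSketch` is a hypothetical stand-in, and details such as whether the real class is frozen are not known from this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EnvSpecSketch:
    """Hypothetical mirror of the documented EnvSpec fields."""

    id: str
    env_entry: str = "jaxarc.envs:Environment"
    max_episode_steps: int = 100
    # default_factory gives every instance its own dict, so metadata
    # stored on one spec never leaks into another.
    kwargs: Dict[str, Any] = field(default_factory=dict)


a = EnvSpecSketch(id="Mini")
b = EnvSpecSketch(id="AGI1", max_episode_steps=200)
a.kwargs["note"] = "only on a"
print(a)
print(b)
```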

jaxarc.registration.available_named_subsets(dataset_key: str, include_builtin: bool = True) → tuple[str, ...]

List available subset names for a dataset (includes built-in selectors by default).

Parameters:

dataset_key – Dataset name (Mini, Concept, AGI1, AGI2)
include_builtin – Include built-in selectors like ‘all’, ‘train’, ‘eval’ (default: True)

Returns:

Tuple of subset names

Examples

>>> available_named_subsets("Mini")
('all', 'easy')  # Assumes a custom 'easy' subset is registered; Mini has no train/eval splits

>>> available_named_subsets("Mini", include_builtin=False)
('easy',)  # Only custom subsets

jaxarc.registration.available_task_ids(dataset_key: str, config: Any | None = None, auto_download: bool = False) → list[str]

List all available task IDs (equivalent to get_subset_task_ids with selector='all').

jaxarc.registration.get_subset_task_ids(dataset_key: str, selector: str = 'all', config: Any | None = None, auto_download: bool = False) → list[str]

Get task IDs for a specific subset without creating an environment.

This allows users to query what tasks will be loaded before calling make().

Parameters:

dataset_key – Dataset name (Mini, Concept, AGI1, AGI2)
selector – Subset selector (‘all’, ‘train’, ‘easy’, task_id, etc.)
config – Optional config
auto_download – Download dataset if missing

Returns:

List of task IDs that will be loaded

Examples

>>> get_subset_task_ids("Mini", "all")
['Most_Common_color_l6ab0lf3xztbyxsu3p', ...]

>>> get_subset_task_ids("Mini", "easy")
['task1', 'task2', 'task3']

>>> get_subset_task_ids("Mini", "Most_Common_color_l6ab0lf3xztbyxsu3p")
['Most_Common_color_l6ab0lf3xztbyxsu3p']

jaxarc.registration.load_all_subsets_for_dataset(dataset: str, config_root: Path | None = None) → int

Discover and register all YAML-defined subsets for dataset.

Returns the number of subsets successfully loaded.

jaxarc.registration.load_subset(name: str, dataset: str, config_root: Path | None = None) → list[str] | None

Load task IDs for a named subset from a YAML file.

Parameters:

name – Subset name (e.g. "easy").
dataset – Dataset key (e.g. "Mini", "AGI1").
config_root – Path to the configs/ directory. When None, pyprojroot is used to locate it automatically.

Returns:

List of task ID strings, or None if the file was not found or could not be parsed.

jaxarc.registration.load_subset_if_needed(name: str, dataset: str, config_root: Path | None = None) → bool

Load and register a subset only if it is not already registered.

Returns True when the subset is available (either already present or freshly loaded).
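The "load only if missing" contract can be sketched with an injected loader. Everything here is hypothetical: `ensure_subset`, the dictionary registry, and `fake_loader` stand in for the real registry and YAML loading, whose file layout is not described on this page.

```python
from typing import Callable, Dict, List, Optional, Tuple

Registry = Dict[Tuple[str, str], Tuple[str, ...]]
Loader = Callable[[str, str], Optional[List[str]]]


def ensure_subset(name: str, dataset: str, registry: Registry, loader: Loader) -> bool:
    """Register a subset only if absent; report whether it is available."""
    key = (dataset, name)
    if key in registry:
        return True  # Already registered: no load is attempted.
    task_ids = loader(name, dataset)
    if task_ids is None:
        return False  # Loader found nothing usable.
    registry[key] = tuple(task_ids)
    return True


calls: List[Tuple[str, str]] = []


def fake_loader(name: str, dataset: str) -> Optional[List[str]]:
    # Stand-in for reading a subset definition from disk.
    calls.append((name, dataset))
    return ["task1", "task2"] if name == "easy" else None


registry: Registry = {}
first = ensure_subset("easy", "Mini", registry, fake_loader)
second = ensure_subset("easy", "Mini", registry, fake_loader)
missing = ensure_subset("hard", "Mini", registry, fake_loader)
print(first, second, missing, calls)
```

The second call returns True without invoking the loader, which is the property that makes the function safe to call repeatedly.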

jaxarc.registration.make(id: str, **kwargs: Any) → tuple[Any, Any]

Create an environment instance and EnvParams using a registered spec.

See EnvRegistry.make for details on supported kwargs.

jaxarc.registration.register(id: str, entry_point: str | None = None, env_entry: str = 'jaxarc.envs:Environment', max_episode_steps: int = 100, **kwargs: Any) → None

Register an environment spec in the global registry.

jaxarc.registration.register_subset(dataset_key: str, name: str, task_ids: list[str] | tuple[str, ...]) → None

Register a named subset for a dataset key, enabling IDs like ‘Mini-easy’.

jaxarc.registration.subset_task_ids(dataset_key: str, name: str) → tuple[str, ...]

Return the task IDs registered under a named subset.

This only works for explicitly registered subsets (via register_subset). For more flexible queries, use get_subset_task_ids() instead.

Task ID Format

Task IDs follow the pattern Dataset-TaskName_taskId. As the last example shows, the TaskName_ part may be absent.

Examples:

Mini-Most_Common_color_l6ab0lf3xztbyxsu3p
ConceptARC-denoising_0c9aba6e
ARC-007bbfb7
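A small parser makes the pattern concrete. This is a sketch based only on the three examples above: `parse_task_id` and `ParsedTaskId` are hypothetical names, and the parser assumes the dataset prefix contains no hyphen and the trailing task ID contains no underscore.

```python
from typing import NamedTuple, Optional


class ParsedTaskId(NamedTuple):
    dataset: str
    task_name: Optional[str]
    task_id: str


def parse_task_id(full_id: str) -> ParsedTaskId:
    """Split 'Dataset-TaskName_taskId' into its parts."""
    dataset, sep, rest = full_id.partition("-")
    if not sep or not rest:
        raise ValueError(f"expected 'Dataset-...', got {full_id!r}")
    # The task ID is taken to be the text after the last underscore.
    name, underscore, task_id = rest.rpartition("_")
    if not underscore:
        # No underscore: the ID has no TaskName part.
        return ParsedTaskId(dataset, None, rest)
    return ParsedTaskId(dataset, name, task_id)


for example in (
    "Mini-Most_Common_color_l6ab0lf3xztbyxsu3p",
    "ConceptARC-denoising_0c9aba6e",
    "ARC-007bbfb7",
):
    print(parse_task_id(example))
```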