parsl.dataflow.memoization.Memoizer
- class parsl.dataflow.memoization.Memoizer(*, memoize: bool = True, checkpoint_files: Sequence[str] | None, checkpoint_period: str | None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None)[source]
Memoizer is responsible for ensuring that identical work is not repeated.
When a task is repeated, i.e., the same function is called with the same exact arguments, the result from a previous execution is reused.
The memoizer implementation here does not collapse duplicate calls at call time, but works only when the result of a previous call is available at the time the duplicate call is made.
For instance:

    No advantage from             Memoization helps
    memoization here:             here:

     TaskA                        TaskA
      |   TaskB                     |
      |     |                       |
      |     |                       TaskA done (TaskB)
      |     |                       |     |
      done  |                       |     (TaskB) done
            |                       |
            done                    done
The memoizer creates a lookup table by hashing the function name and its inputs, and storing the results of the function.
When a task is ready for launch, i.e., all of its arguments have resolved, we add its hash to the task data structure.
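The timing behaviour in the diagram above can be sketched with a plain in-process cache. This is an illustrative model, not Parsl's implementation: duplicate calls are not collapsed while in flight, so a duplicate launched before the first result lands still executes, while a duplicate launched afterwards reuses the memoized Future.

```python
# Sketch of memoization timing, assuming a simple hashsum -> Future cache.
from concurrent.futures import Future

lookup_table = {}          # hashsum -> completed Future
calls = []                 # record of real executions

def launch(hashsum, fn, *args):
    hit = lookup_table.get(hashsum)
    if hit is not None and hit.done():
        return hit                    # result already available: reuse it
    calls.append(hashsum)             # otherwise, really execute
    fut = Future()
    fut.set_result(fn(*args))
    return fut

def finish(hashsum, fut):
    lookup_table[hashsum] = fut       # results become visible on completion

# TaskA and an overlapping duplicate: no advantage, both execute.
a1 = launch('A', lambda: 1)
a2 = launch('A', lambda: 1)           # 'A' not yet in the lookup table
finish('A', a1)

# A duplicate launched after TaskA completed: cache hit, no re-execution.
a3 = launch('A', lambda: 1)
```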
- __init__(*, memoize: bool = True, checkpoint_files: Sequence[str] | None, checkpoint_period: str | None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None)[source]
Initialize the memoizer.
- Kwargs:
    memoize (bool): enable or disable memoization.
    checkpoint_files (Sequence[str] | None): checkpoint files from previous runs to load into the lookup table.
    checkpoint_period (str | None): period between checkpoints, used when checkpoint_mode is 'periodic'.
    checkpoint_mode (str | None): one of 'task_exit', 'periodic', 'dfk_exit' or 'manual', or None to disable checkpointing.
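In practice these options are usually set through parsl.config.Config rather than by constructing a Memoizer directly. A minimal sketch, assuming a default executor setup and that earlier runs left checkpoints under the standard runinfo directory:

```python
from parsl.config import Config
from parsl.utils import get_all_checkpoints

# Checkpoint every hour, and seed the memoizer's lookup table with
# checkpoints from all earlier runs found under 'runinfo'.
config = Config(
    checkpoint_mode='periodic',
    checkpoint_period='01:00:00',            # HH:MM:SS
    checkpoint_files=get_all_checkpoints(),
)
```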
Methods
- __init__(*[, memoize]): Initialize the memoizer.
- check_memo(task): Create a hash of the task and its inputs and check the lookup table for this hash.
- checkpoint(*[, task]): Checkpoint the dfk incrementally to a checkpoint file.
- close()
- load_checkpoints(checkpointDirs): Load checkpoints from the checkpoint files into a dictionary.
- make_hash(task): Create a hash of the task inputs.
- start()
- update_checkpoint(task_record)
- update_memo(task): Updates the memoization lookup table with the result from a task.
- check_memo(task: TaskRecord) Future[Any] | None[source]
Create a hash of the task and its inputs and check the lookup table for this hash.
If present, the results are returned.
- Parameters:
task (TaskRecord) – task from the dfk.tasks table
- Returns:
Result of the function if present in table, wrapped in a Future
This call will also set task[‘hashsum’] to the unique hashsum for the func+inputs.
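The check_memo flow can be illustrated over a plain dict standing in for a TaskRecord. The field names here ('func_name', 'args', 'kwargs', 'hashsum') and the use of SHA-256 over pickled inputs are assumptions for the sketch, not Parsl's exact schema or hash scheme; the key point is that the hashsum is recorded on the task even when the lookup misses.

```python
# Hypothetical check_memo over a task-record-like dict.
import hashlib
import pickle
from concurrent.futures import Future

lookup_table = {}

def check_memo(task):
    payload = pickle.dumps((task['func_name'], task['args'],
                            sorted(task['kwargs'].items())))
    task['hashsum'] = hashlib.sha256(payload).hexdigest()  # set even on a miss
    return lookup_table.get(task['hashsum'])

done = Future()
done.set_result(42)
first = {'func_name': 'double', 'args': (21,), 'kwargs': {}}
assert check_memo(first) is None          # miss: nothing memoized yet
lookup_table[first['hashsum']] = done     # record the completed future
second = {'func_name': 'double', 'args': (21,), 'kwargs': {}}
hit = check_memo(second)                  # identical call: cache hit
```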
- checkpoint(*, task: TaskRecord | None = None) None[source]
Checkpoint the dfk incrementally to a checkpoint file.
When called with no argument, all tasks registered in self.checkpointable_tasks will be checkpointed. When called with a single TaskRecord argument, that task will be checkpointed.
By default the checkpoints are written to the RUNDIR of the current run under RUNDIR/checkpoints/tasks.pkl
- Kwargs:
task (TaskRecord | None) : A task to checkpoint. Default=None, meaning all tasks registered for checkpointing.
Note
Checkpointing only works if memoization is enabled.
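Incremental checkpointing can be sketched as appending one pickled record per completed task to a tasks.pkl file under the run directory. The RUNDIR/checkpoints/tasks.pkl layout follows the text above; the exact record format is an assumption of this sketch, not necessarily what Parsl writes.

```python
# Sketch: append (hashsum, result) records to RUNDIR/checkpoints/tasks.pkl.
import os
import pickle
import tempfile

rundir = tempfile.mkdtemp()                    # stands in for the run's RUNDIR
os.makedirs(os.path.join(rundir, 'checkpoints'))
path = os.path.join(rundir, 'checkpoints', 'tasks.pkl')

def checkpoint_task(hashsum, result):
    # Append one record; an incremental checkpoint only adds new entries.
    with open(path, 'ab') as f:
        pickle.dump({'hash': hashsum, 'result': result}, f)

checkpoint_task('abc123', 42)
checkpoint_task('def456', 'done')
```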
- load_checkpoints(checkpointDirs: Sequence[str] | None) Dict[str, Future][source]
Load checkpoints from the checkpoint files into a dictionary.
The results are used to pre-populate the memoizer’s lookup_table
- Kwargs:
checkpointDirs (list) : List of run folders to use as checkpoints, e.g. ['runinfo/001', 'runinfo/002']
- Returns:
dict containing hash -> future mappings
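Loading can be sketched as scanning each run folder for a checkpoints/tasks.pkl file of pickled {hash, result} records and turning each record into a completed Future, so the lookup table answers duplicate calls immediately. The file layout and record format here are illustrative assumptions, not necessarily Parsl's exact on-disk format.

```python
# Sketch: pre-populate a hash -> Future table from earlier run folders.
import os
import pickle
import tempfile
from concurrent.futures import Future

def load_checkpoints(checkpoint_dirs):
    table = {}
    for d in checkpoint_dirs:
        path = os.path.join(d, 'checkpoints', 'tasks.pkl')
        with open(path, 'rb') as f:
            while True:
                try:
                    rec = pickle.load(f)
                except EOFError:
                    break
                fut = Future()
                fut.set_result(rec['result'])  # wrap the saved result
                table[rec['hash']] = fut
    return table

# Build a fake run folder holding one checkpointed task, then load it.
run = tempfile.mkdtemp()
os.makedirs(os.path.join(run, 'checkpoints'))
with open(os.path.join(run, 'checkpoints', 'tasks.pkl'), 'wb') as f:
    pickle.dump({'hash': 'abc123', 'result': 42}, f)
table = load_checkpoints([run])
```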
- make_hash(task: TaskRecord) str[source]
Create a hash of the task inputs.
- Parameters:
task (TaskRecord) – Task dictionary from dfk.tasks
- Returns:
A unique hash string
- Return type:
str
- update_checkpoint(task_record: TaskRecord) None[source]
- update_memo(task: TaskRecord) None[source]
Updates the memoization lookup table with the result from a task. This doesn’t move any values around but associates the memoization hashsum with the completed (by success or failure) AppFuture.
- Parameters:
task (TaskRecord) – A task record from dfk.tasks