parsl.dataflow.memoization.Memoizer

class parsl.dataflow.memoization.Memoizer(*, memoize: bool = True, checkpoint_files: Sequence[str] | None, checkpoint_period: str | None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None)

Memoizer is responsible for ensuring that identical work is not repeated.

When a task is repeated, i.e., the same function is called with exactly the same arguments, the result of a previous execution is reused.

The memoizer implementation here does not collapse duplicate calls at call time, but works only when the result of a previous call is available at the time the duplicate call is made.

For instance:

No advantage from                 Memoization helps
memoization here:                 here:

 TaskA                            TaskB
   |   TaskA                        |
   |     |   TaskA                done  (TaskB)
   |     |     |                                (TaskB)
 done    |     |
       done    |
             done
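The two cases in the diagram can be imitated with a small standard-library sketch. It is not Parsl code: `lookup_table`, `submit` and `complete` are hypothetical names, and the only behaviour assumed is the one described above, namely that a result becomes visible to lookups once the earlier call has finished.

```python
from concurrent.futures import Future

lookup_table = {}   # hypothetical stand-in: task key -> completed Future
launches = []       # records every real execution that was started

def submit(key):
    """Return a stored Future on a hit; otherwise start a new execution."""
    hit = lookup_table.get(key)
    if hit is not None:
        return hit
    launches.append(key)
    return Future()  # represents a task that is still running

def complete(key, future, value):
    """Finish a task; only now does its result become visible to lookups."""
    future.set_result(value)
    lookup_table[key] = future

# Left-hand case: three overlapping TaskA calls, none finished before the next starts.
a1, a2, a3 = submit("TaskA"), submit("TaskA"), submit("TaskA")
for f in (a1, a2, a3):
    complete("TaskA", f, "a-result")

# Right-hand case: TaskB finishes first, so later duplicates reuse its result.
b1 = submit("TaskB")
complete("TaskB", b1, "b-result")
b2, b3 = submit("TaskB"), submit("TaskB")

print(launches.count("TaskA"), launches.count("TaskB"))  # prints: 3 1
```

In this sketch the overlapping TaskA calls each run, while the later TaskB calls receive the Future of the first one.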

The memoizer creates a lookup table by hashing the function name and its inputs, and storing the results of the function.

When a task is ready for launch, i.e., all of its arguments have resolved, we add its hash to the task data structure.
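As a rough illustration of hashing a function name together with its inputs, the sketch below serializes them with `pickle` and digests the bytes with SHA-256. The helper `sketch_hash` and the dictionary keys `func_name`, `args` and `kwargs` are assumptions made for this example; the serialization and digest Parsl actually uses may differ.

```python
import hashlib
import pickle

def sketch_hash(func_name, args, kwargs):
    """Hash a function name with its inputs (illustrative, not Parsl's code)."""
    # Sort keyword arguments so that their order does not change the hash.
    payload = pickle.dumps((func_name, args, sorted(kwargs.items())))
    return hashlib.sha256(payload).hexdigest()

# A hypothetical task record: a plain dict whose arguments have all resolved.
task = {"func_name": "add", "args": (1, 2), "kwargs": {"scale": 10}}
task["hashsum"] = sketch_hash(task["func_name"], task["args"], task["kwargs"])

same = sketch_hash("add", (1, 2), {"scale": 10})
different = sketch_hash("add", (1, 3), {"scale": 10})
print(task["hashsum"] == same, task["hashsum"] == different)  # prints: True False
```

Pickle output is only a convenient stand-in here; arguments whose serialized form is not stable would need a purpose-built identifier.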

__init__(*, memoize: bool = True, checkpoint_files: Sequence[str] | None, checkpoint_period: str | None, checkpoint_mode: Literal['task_exit', 'periodic', 'dfk_exit', 'manual'] | None)

Initialize the memoizer.

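Kwargs:
  • memoize (bool): Whether to enable memoization.

  • checkpoint_files (Sequence[str] or None): Checkpoints to load.

  • checkpoint_period (str or None): Time interval between checkpoints in 'periodic' mode.

  • checkpoint_mode (str or None): One of 'task_exit', 'periodic', 'dfk_exit' or 'manual', or None.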

Methods

__init__(*[, memoize])

Initialize the memoizer.

check_memo(task)

Create a hash of the task and its inputs and check the lookup table for this hash.

checkpoint(*[, task])

Checkpoint the dfk incrementally to a checkpoint file.

close()

load_checkpoints(checkpointDirs)

Load checkpoints from the checkpoint files into a dictionary.

make_hash(task)

Create a hash of the task inputs.

start()

update_checkpoint(task_record)

update_memo(task)

Updates the memoization lookup table with the result from a task.

Attributes

run_dir

check_memo(task: TaskRecord) -> Future[Any] | None

Create a hash of the task and its inputs and check the lookup table for this hash.

If present, the results are returned.

Parameters:

task (TaskRecord) – task record from the dfk.tasks table

Returns:

  • Result of the function if present in table, wrapped in a Future

This call will also set task['hashsum'] to the unique hashsum for the func+inputs.
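The lookup described above might be sketched as follows. This is a simplified stand-in, not the library's implementation: `check_memo_sketch`, the plain-dict task record and the `repr`-based hashing are all assumptions made for illustration.

```python
from concurrent.futures import Future
import hashlib

def check_memo_sketch(task, table):
    """Set task['hashsum'] and return a stored Future, or None on a miss."""
    key = repr((task["func_name"], task["args"], sorted(task["kwargs"].items())))
    task["hashsum"] = hashlib.sha256(key.encode()).hexdigest()
    return table.get(task["hashsum"])

table = {}
first = {"func_name": "square", "args": (4,), "kwargs": {}}
miss = check_memo_sketch(first, table)   # nothing stored yet, so this is None

done = Future()
done.set_result(16)
table[first["hashsum"]] = done           # record the completed result

repeat = {"func_name": "square", "args": (4,), "kwargs": {}}
hit = check_memo_sketch(repeat, table)   # same function and inputs: a hit
print(miss, hit.result())                # prints: None 16
```

Note that the hashsum is written to the task record on a miss as well, so a later table update can reuse it.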

checkpoint(*, task: TaskRecord | None = None) -> None

Checkpoint the dfk incrementally to a checkpoint file.

When called with no argument, all tasks registered in self.checkpointable_tasks will be checkpointed. When called with a single TaskRecord argument, that task will be checkpointed.

By default, checkpoints are written to the file RUNDIR/checkpoints/tasks.pkl, where RUNDIR is the run directory of the current run.

Kwargs:
  • task (TaskRecord, optional): A task to checkpoint. Default=None, meaning all tasks registered for checkpointing.

Note

Checkpointing only works if memoization is enabled.
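Incremental checkpointing can be pictured as appending one pickled record per completed task to a file. The sketch below makes that concrete under stated assumptions: the record layout (`hash`, `result`), the helper `checkpoint_sketch` and the `pending` list are hypothetical, and only the RUNDIR/checkpoints/tasks.pkl location comes from the text above.

```python
import os
import pickle
import tempfile

def checkpoint_sketch(run_dir, pending):
    """Append every pending (hashsum, result) pair to tasks.pkl, then clear the list."""
    checkpoint_dir = os.path.join(run_dir, "checkpoints")
    os.makedirs(checkpoint_dir, exist_ok=True)
    path = os.path.join(checkpoint_dir, "tasks.pkl")
    with open(path, "ab") as f:  # append mode, so earlier records survive
        for hashsum, result in pending:
            pickle.dump({"hash": hashsum, "result": result}, f)
    written = len(pending)
    pending.clear()
    return path, written

run_dir = tempfile.mkdtemp()
pending = [("h1", 10), ("h2", 20)]
path, n1 = checkpoint_sketch(run_dir, pending)   # first increment: two records
pending.append(("h3", 30))
_, n2 = checkpoint_sketch(run_dir, pending)      # second increment: one record

# Read the file back record by record until it is exhausted.
records = []
with open(path, "rb") as f:
    while True:
        try:
            records.append(pickle.load(f))
        except EOFError:
            break
print([r["hash"] for r in records])              # prints: ['h1', 'h2', 'h3']
```

Appending keeps each checkpoint call proportional to the newly completed tasks rather than to the whole run.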

close() -> None

load_checkpoints(checkpointDirs: Sequence[str] | None) -> Dict[str, Future]

Load checkpoints from the checkpoint files into a dictionary.

The results are used to pre-populate the memoizer’s lookup_table

Kwargs:
  • checkpointDirs (list): List of run folders to use as checkpoints, e.g. ['runinfo/001', 'runinfo/002']

Returns:

  • dict mapping each hash to a Future holding the checkpointed result
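A minimal sketch of loading checkpoints into a hash-to-Future dictionary is shown below. It assumes, purely for illustration, that each directory directly contains a `tasks.pkl` file of pickled `{"hash": ..., "result": ...}` records; the real on-disk layout and record format are not specified here, and `load_checkpoints_sketch` and `write_records` are hypothetical helpers.

```python
import os
import pickle
import tempfile
from concurrent.futures import Future

def load_checkpoints_sketch(checkpoint_dirs):
    """Merge the records found in each directory into a hash -> Future dict."""
    table = {}
    for directory in checkpoint_dirs or []:
        with open(os.path.join(directory, "tasks.pkl"), "rb") as f:
            while True:
                try:
                    record = pickle.load(f)
                except EOFError:
                    break
                future = Future()
                future.set_result(record["result"])  # already-known result
                table[record["hash"]] = future
    return table

def write_records(directory, records):
    """Test helper: write pickled records back to back into tasks.pkl."""
    with open(os.path.join(directory, "tasks.pkl"), "wb") as f:
        for record in records:
            pickle.dump(record, f)

dir_a, dir_b = tempfile.mkdtemp(), tempfile.mkdtemp()
write_records(dir_a, [{"hash": "h1", "result": 1}])
write_records(dir_b, [{"hash": "h2", "result": 2}])

table = load_checkpoints_sketch([dir_a, dir_b])
print(sorted(table), table["h1"].result())  # prints: ['h1', 'h2'] 1
```

Wrapping each stored result in an already-completed Future lets a pre-populated entry be returned the same way as a result computed in the current run.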

make_hash(task: TaskRecord) -> str

Create a hash of the task inputs.

Parameters:

task (TaskRecord) – Task dictionary from dfk.tasks

Returns:

A unique hash string

Return type:

str

run_dir: str

start() -> None

update_checkpoint(task_record: TaskRecord) -> None

update_memo(task: TaskRecord) -> None

Updates the memoization lookup table with the result from a task. This doesn’t move any values around but associates the memoization hashsum with the completed (by success or failure) AppFuture.

Parameters:

task (TaskRecord) – A task record from dfk.tasks
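The idea that the table stores the completed future itself, rather than a copy of its value, can be sketched like this. The function `update_memo_sketch` is hypothetical, and the task record is a plain dict whose `hashsum` and `app_fu` keys are assumed for the example.

```python
from concurrent.futures import Future

def update_memo_sketch(task, table):
    """Associate the task's hashsum with its already completed Future."""
    if task.get("hashsum") is not None:
        table[task["hashsum"]] = task["app_fu"]

table = {}

ok = Future()
ok.set_result(42)
failed = Future()
failed.set_exception(ValueError("boom"))

update_memo_sketch({"hashsum": "h-ok", "app_fu": ok}, table)
update_memo_sketch({"hashsum": "h-bad", "app_fu": failed}, table)
update_memo_sketch({"hashsum": None, "app_fu": Future()}, table)  # no hash: skipped

print(table["h-ok"] is ok, type(table["h-bad"].exception()).__name__)
# prints: True ValueError
```

Both the successful and the failed future end up in the table, matching the "success or failure" wording above.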