.. _sec_rs_async:

Asynchronous Random Search
==========================


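As we saw in :numref:`sec_api_hpo`, evaluating hyperparameter
configurations is expensive, so we might have to wait hours or even days before random search returns a good hyperparameter configuration. In practice, we often have access to a pool of resources, such as multiple GPUs on the same machine or multiple machines with a single GPU each. This raises the question: *How do we efficiently distribute random search?*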

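In general, we distinguish between synchronous and asynchronous parallel hyperparameter optimization (see :numref:`distributed_scheduling`).
In the synchronous setting, we wait for all concurrently running trials to finish before we start the next batch. Consider configuration spaces that contain hyperparameters such as the number of filters or the number of layers of a deep neural network. Hyperparameter configurations with more layers or filters naturally take longer to finish, and all other trials in the same batch have to wait at synchronization points (grey area in :numref:`distributed_scheduling`)
before we can continue the optimization process.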

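In the asynchronous setting, we schedule a new trial as soon as resources become available. This uses our resources more efficiently, since we avoid any synchronization overhead. For random search, each new hyperparameter configuration is chosen independently of all others, and in particular without exploiting observations from any prior evaluation. This means that we can trivially parallelize random search asynchronously. This is not straightforward for more sophisticated methods that make decisions based on previous observations (see :numref:`sec_sh_async`).
While we need access to more resources than in the sequential setting, asynchronous random search exhibits a linear speed-up: if :math:`K`
trials can be run in parallel, a given performance is reached about :math:`K`
times faster.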
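
The difference between the two settings can be illustrated with a toy simulation. The sketch below is not part of Syne Tune: the helper names ``synchronous_makespan`` and ``asynchronous_makespan`` and the trial runtimes are made up for illustration, and scheduling overhead is ignored. It only compares how long a fixed list of trials takes under batch-wise and first-free-worker scheduling.

.. code:: python

    import heapq
    import random


    def synchronous_makespan(durations, n_workers):
        """Run trials in batches of n_workers; a batch ends with its slowest trial."""
        total = 0.0
        for start in range(0, len(durations), n_workers):
            total += max(durations[start:start + n_workers])
        return total


    def asynchronous_makespan(durations, n_workers):
        """Start the next trial as soon as any worker becomes free."""
        # Min-heap holding the time at which each worker becomes free
        free_at = [0.0] * n_workers
        heapq.heapify(free_at)
        for duration in durations:
            earliest = heapq.heappop(free_at)
            heapq.heappush(free_at, earliest + duration)
        return max(free_at)


    random.seed(0)
    # Hypothetical trial runtimes in seconds
    durations = [random.uniform(60, 120) for _ in range(16)]
    sequential = sum(durations)
    for k in (1, 2, 4):
        sync = synchronous_makespan(durations, k)
        asyn = asynchronous_makespan(durations, k)
        print(f"K={k}: sync {sync:.0f}s, async {asyn:.0f}s, "
              f"async speed-up {sequential / asyn:.2f}x")

With the trials taken in the same order, the asynchronous schedule never finishes later than the synchronous one, because no worker waits for the slowest trial of a batch.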

.. _distributed_scheduling:

.. figure:: ../img/distributed_scheduling.svg

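   Distributing the hyperparameter optimization process either synchronously or asynchronously. Compared to the sequential setting, we can reduce the overall wall-clock time while keeping the total compute constant. Synchronous scheduling might lead to idling workers in the case of stragglers.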


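In this notebook, we look at asynchronous random search where trials are executed in multiple Python
processes on the same machine. Distributed job scheduling and execution is difficult to implement from scratch. We therefore use
*Syne Tune* :cite:`salinas-automl22`, which provides a simple interface for asynchronous HPO. Syne Tune
is designed to run with different execution back-ends, and interested readers are invited to study its simple APIs to learn more about distributed HPO.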

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    from d2l import torch as d2l
    import logging
    logging.basicConfig(level=logging.INFO)
    from syne_tune.config_space import loguniform, randint
    from syne_tune.backend.python_backend import PythonBackend
    from syne_tune.optimizer.baselines import RandomSearch
    from syne_tune import Tuner, StoppingCriterion
    from syne_tune.experiments import load_experiment


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:root:SageMakerBackend is not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[extra]'
    AWS dependencies are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[aws]'
    or (for everything)
       pip install 'syne-tune[extra]'
    AWS dependencies are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[aws]'
    or (for everything)
       pip install 'syne-tune[extra]'
    INFO:root:Ray Tune schedulers and searchers are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[raytune]'
    or (for everything)
       pip install 'syne-tune[extra]'


Objective Function
------------------

First, we define a new objective function that returns the performance to Syne
Tune via the ``report`` callback.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
        # PythonBackend runs this function in a separate process, so its
        # dependencies are imported here rather than at module level
        from d2l import torch as d2l
        from syne_tune import Reporter

        model = d2l.LeNet(lr=learning_rate, num_classes=10)
        # max_epochs=1: the loop below advances training one epoch at a time
        trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
        data = d2l.FashionMNIST(batch_size=batch_size)
        model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
        report = Reporter()
        for epoch in range(1, max_epochs + 1):
            if epoch == 1:
                # Initialize the state of the Trainer
                trainer.fit(model=model, data=data)
            else:
                trainer.fit_epoch()
            validation_error = d2l.numpy(trainer.validation_error().cpu())
            # Send the current validation error back to Syne Tune
            report(epoch=epoch, validation_error=float(validation_error))

Note that the ``PythonBackend`` of Syne Tune requires dependencies to be imported
inside the function definition.

Asynchronous Scheduler
----------------------

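First, we define the number of workers that evaluate trials concurrently. We also need to specify how long we want to run random search by setting an upper limit on the total wall-clock time.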

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    n_workers = 2  # Must not exceed the number of available GPUs

    max_wallclock_time = 12 * 60  # 12 minutes, in seconds

Next, we state which metric we want to optimize and whether to minimize or maximize it. Note that ``metric``
must match the argument name passed to the ``report`` callback.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    mode = "min"
    metric = "validation_error"

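We use the configuration space from our previous example. In Syne Tune,
this dictionary can also be used to pass constant attributes to the training script. We use this feature to pass
``max_epochs``.
Moreover, we specify the first configuration to be evaluated in
``initial_config``.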

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    config_space = {
        "learning_rate": loguniform(1e-2, 1),
        "batch_size": randint(32, 256),
        "max_epochs": 10,
    }
    initial_config = {
        "learning_rate": 0.1,
        "batch_size": 128,
    }
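
To make the two search-space domains concrete, the following sketch draws configurations using only the Python standard library. It is not Syne Tune's implementation: the helpers ``sample_loguniform`` and ``sample_config`` are hypothetical names, and treating both bounds of the integer range as inclusive is an assumption. The point is that each draw is independent of all earlier draws, which is what makes random search easy to parallelize.

.. code:: python

    import math
    import random


    def sample_loguniform(lower, upper, rng):
        """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
        return math.exp(rng.uniform(math.log(lower), math.log(upper)))


    def sample_config(rng):
        """Draw one configuration, independently of all previous draws."""
        return {
            "learning_rate": sample_loguniform(1e-2, 1.0, rng),
            # Assumption: both bounds are inclusive, as in random.randint
            "batch_size": rng.randint(32, 256),
        }


    rng = random.Random(0)
    for _ in range(3):
        print(sample_config(rng))

Sampling the learning rate on a logarithmic scale spreads the draws evenly across orders of magnitude, so values between 0.01 and 0.1 are as likely as values between 0.1 and 1.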

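Next, we need to specify the back-end for job execution. Here we only consider distribution on a local machine, where parallel jobs are executed as sub-processes. For large-scale HPO, however, we could also run this on a cluster or in a cloud environment, where each trial consumes a full instance.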

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    trial_backend = PythonBackend(
        tune_function=hpo_objective_lenet_synetune,
        config_space=config_space,
    )

We can now create the scheduler for asynchronous random search, which behaves similarly to
our ``BasicScheduler`` from :numref:`sec_api_hpo`.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    scheduler = RandomSearch(
        config_space,
        metric=metric,
        mode=mode,
        points_to_evaluate=[initial_config],
    )


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space
    INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 952911033


Syne Tune also features a ``Tuner``,
which centralizes the main experiment loop and bookkeeping, and mediates the interactions between scheduler and back-end.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler, 
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        print_update_interval=int(max_wallclock_time * 0.6),
    )
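
The kind of loop such a component runs can be sketched with ``concurrent.futures`` from the standard library. This is a simplified illustration and not how ``Tuner`` is implemented: it uses threads instead of sub-processes, stops after a fixed number of trials rather than after a wall-clock limit, and ``toy_objective``, ``sample_config`` and ``async_random_search`` are hypothetical stand-ins.

.. code:: python

    import random
    import time
    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


    def toy_objective(config):
        """Hypothetical stand-in for a training run: sleeps, then returns a score."""
        time.sleep(config["duration"])
        return (config["x"] - 0.3) ** 2


    def sample_config(rng):
        return {"x": rng.random(), "duration": rng.uniform(0.01, 0.03)}


    def async_random_search(num_trials, n_workers, seed=0):
        rng = random.Random(seed)
        results = []  # (config, score) pairs in order of completion
        running = {}  # maps each running future to its configuration
        submitted = 0
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            while submitted < num_trials or running:
                # Give every free worker a freshly sampled configuration
                while submitted < num_trials and len(running) < n_workers:
                    config = sample_config(rng)
                    running[pool.submit(toy_objective, config)] = config
                    submitted += 1
                # Block until at least one trial finishes, then record it
                done, _ = wait(running, return_when=FIRST_COMPLETED)
                for future in done:
                    results.append((running.pop(future), future.result()))
        return results


    results = async_random_search(num_trials=8, n_workers=2)
    best_config, best_score = min(results, key=lambda pair: pair[1])
    print(f"best score {best_score:.4f} at x = {best_config['x']:.3f}")

Because a new configuration is submitted whenever a worker becomes free, no worker waits for the others, which is the asynchronous behaviour described in this section.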

Let us run our distributed HPO experiment. According to our stopping criterion, it will run for about 12 minutes.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    tuner.run()


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:syne_tune.tuner:results of trials will be saved on /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046
    INFO:root:Detected 4 GPUs
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1 --batch_size 128 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/0/checkpoints
    INFO:syne_tune.tuner:(trial 0) - scheduled config {'learning_rate': 0.1, 'batch_size': 128, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.023130676592269524 --batch_size 126 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/1/checkpoints
    INFO:syne_tune.tuner:(trial 1) - scheduled config {'learning_rate': 0.023130676592269524, 'batch_size': 126, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 0 completed.
    INFO:syne_tune.tuner:Trial trial_id 1 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.312685188079284 --batch_size 86 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/2/checkpoints
    INFO:syne_tune.tuner:(trial 2) - scheduled config {'learning_rate': 0.312685188079284, 'batch_size': 86, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.060054251769848306 --batch_size 109 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/3/checkpoints
    INFO:syne_tune.tuner:(trial 3) - scheduled config {'learning_rate': 0.060054251769848306, 'batch_size': 109, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 3 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.055145478348008786 --batch_size 62 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/4/checkpoints
    INFO:syne_tune.tuner:(trial 4) - scheduled config {'learning_rate': 0.055145478348008786, 'batch_size': 62, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 2 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.3071404894525763 --batch_size 175 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/5/checkpoints
    INFO:syne_tune.tuner:(trial 5) - scheduled config {'learning_rate': 0.3071404894525763, 'batch_size': 175, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 5 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.3105275803794171 --batch_size 57 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/6/checkpoints
    INFO:syne_tune.tuner:(trial 6) - scheduled config {'learning_rate': 0.3105275803794171, 'batch_size': 57, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 4 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.014311777503766778 --batch_size 251 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/7/checkpoints
    INFO:syne_tune.tuner:(trial 7) - scheduled config {'learning_rate': 0.014311777503766778, 'batch_size': 251, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 7 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.08111483940636906 --batch_size 112 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/8/checkpoints
    INFO:syne_tune.tuner:(trial 8) - scheduled config {'learning_rate': 0.08111483940636906, 'batch_size': 112, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 6 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.026212700431916515 --batch_size 188 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/9/checkpoints
    INFO:syne_tune.tuner:(trial 9) - scheduled config {'learning_rate': 0.026212700431916515, 'batch_size': 188, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 8 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1325223401116132 --batch_size 37 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/10/checkpoints
    INFO:syne_tune.tuner:(trial 10) - scheduled config {'learning_rate': 0.1325223401116132, 'batch_size': 37, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 9 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.2797904076893328 --batch_size 251 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/11/checkpoints
    INFO:syne_tune.tuner:(trial 11) - scheduled config {'learning_rate': 0.2797904076893328, 'batch_size': 251, 'max_epochs': 10}
    INFO:syne_tune.tuner:tuning status (last metric is reported)
     trial_id     status  iter  learning_rate  batch_size  max_epochs  epoch  validation_error  worker-time
            0  Completed    10       0.100000         128          10   10.0          0.276305    70.635542
            1  Completed    10       0.023131         126          10   10.0          0.900276    70.564567
            2  Completed    10       0.312685          86          10   10.0          0.177773    76.995033
            3  Completed    10       0.060054         109          10   10.0          0.336481    72.600473
            4  Completed    10       0.055145          62          10   10.0          0.260045    90.789169
            5  Completed    10       0.307140         175          10   10.0          0.234286    73.124661
            6  Completed    10       0.310528          57          10   10.0          0.150503    94.374417
            7  Completed    10       0.014312         251          10   10.0          0.900077    73.351970
            8  Completed    10       0.081115         112          10   10.0          0.274901    77.767942
            9  Completed    10       0.026213         188          10   10.0          0.899002    73.504148
           10 InProgress     1       0.132522          37          10    1.0          0.417084    11.319020
           11 InProgress     0       0.279790         251          10      -                 -            -
    2 trials running, 10 finished (10 until the end), 436.62s wallclock-time
    
    INFO:syne_tune.tuner:Trial trial_id 11 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02754593050837886 --batch_size 183 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/12/checkpoints
    INFO:syne_tune.tuner:(trial 12) - scheduled config {'learning_rate': 0.02754593050837886, 'batch_size': 183, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 10 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.01099464105506646 --batch_size 105 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/13/checkpoints
    INFO:syne_tune.tuner:(trial 13) - scheduled config {'learning_rate': 0.01099464105506646, 'batch_size': 105, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 12 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.06457279133966501 --batch_size 252 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/14/checkpoints
    INFO:syne_tune.tuner:(trial 14) - scheduled config {'learning_rate': 0.06457279133966501, 'batch_size': 252, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 13 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.15504555979876097 --batch_size 32 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/15/checkpoints
    INFO:syne_tune.tuner:(trial 15) - scheduled config {'learning_rate': 0.15504555979876097, 'batch_size': 32, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 14 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.7645122914150699 --batch_size 246 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/16/checkpoints
    INFO:syne_tune.tuner:(trial 16) - scheduled config {'learning_rate': 0.7645122914150699, 'batch_size': 246, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 16 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.10143987295366007 --batch_size 247 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/tune_function --tune_function_hash 6d601acbbb4647b6d65d87860369b1db --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046/17/checkpoints
    INFO:syne_tune.tuner:(trial 17) - scheduled config {'learning_rate': 0.10143987295366007, 'batch_size': 247, 'max_epochs': 10}
    INFO:syne_tune.stopping_criterion:reaching max wallclock time (720), stopping there.
    INFO:syne_tune.tuner:Stopping trials that may still be running.
    INFO:syne_tune.tuner:Tuning finished, results of trials can be found on /home/ci/syne-tune/python-entrypoint-2023-08-18-07-32-49-046
    --------------------
    Resource summary (last result is reported):
     trial_id     status  iter  learning_rate  batch_size  max_epochs  epoch  validation_error  worker-time
            0  Completed    10       0.100000         128          10     10          0.276305    70.635542
            1  Completed    10       0.023131         126          10     10          0.900276    70.564567
            2  Completed    10       0.312685          86          10     10          0.177773    76.995033
            3  Completed    10       0.060054         109          10     10          0.336481    72.600473
            4  Completed    10       0.055145          62          10     10          0.260045    90.789169
            5  Completed    10       0.307140         175          10     10          0.234286    73.124661
            6  Completed    10       0.310528          57          10     10          0.150503    94.374417
            7  Completed    10       0.014312         251          10     10          0.900077    73.351970
            8  Completed    10       0.081115         112          10     10          0.274901    77.767942
            9  Completed    10       0.026213         188          10     10          0.899002    73.504148
           10  Completed    10       0.132522          37          10     10          0.161034   119.344017
           11  Completed    10       0.279790         251          10     10          0.445979    68.693048
           12  Completed    10       0.027546         183          10     10          0.803873    67.717216
           13  Completed    10       0.010995         105          10     10          0.859286    68.206863
           14  Completed    10       0.064573         252          10     10          0.551479    58.861913
           15 InProgress     5       0.155046          32          10      5          0.193590    81.370336
           16  Completed    10       0.764512         246          10     10          0.178272    57.425193
           17 InProgress     2       0.101440         247          10      2          0.825418    10.832541
    2 trials running, 16 finished (16 until the end), 722.85s wallclock-time
    
    validation_error: best 0.15050286054611206 for trial-id 6
    --------------------


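The logs of all evaluated hyperparameter configurations are stored for further analysis. At any time during the tuning job, we can easily retrieve the results obtained so far and plot the incumbent
trajectory.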

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize()
    tuning_experiment = load_experiment(tuner.name)
    tuning_experiment.plot()


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    WARNING:matplotlib.legend:No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.


.. figure:: output_rs-async_cdba80_19_1.svg
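
The incumbent trajectory is simply the best value seen so far. As a small worked example, the sketch below computes it from the final validation errors of the first ten trials, copied from the resource summary printed by ``tuner.run()``. The values are listed by trial id, which is not necessarily the order in which the trials completed, and the variable names are chosen for illustration only.

.. code:: python

    from itertools import accumulate

    # Final validation errors of trials 0 to 9 from the resource summary
    final_errors = [0.276305, 0.900276, 0.177773, 0.336481, 0.260045,
                    0.234286, 0.150503, 0.900077, 0.274901, 0.899002]

    mode = "min"
    pick = min if mode == "min" else max
    # Running best value: entry i is the best result among trials 0..i
    incumbent = list(accumulate(final_errors, pick))
    best_trial = final_errors.index(incumbent[-1])
    print(incumbent)
    print(f"best trial so far: {best_trial}")

The best trial found this way is trial 6, in agreement with the last line of the log.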


Visualize the Asynchronous Optimization Process
-----------------------------------------------

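Below we visualize how the learning curves of every trial (each color in the plot represents a trial) evolve during the asynchronous optimization process. At any point in time, there are as many trials running concurrently as we have workers. Once a trial finishes, we immediately start the next one, without waiting for the other trials to finish. With asynchronous scheduling, the idle time of workers is reduced to a minimum.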

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize([6, 2.5])
    results = tuning_experiment.results
    
    for trial_id in results.trial_id.unique():
        df = results[results["trial_id"] == trial_id]
        d2l.plt.plot(
            df["st_tuner_time"],
            df["validation_error"],
            marker="o"
        )
        
    d2l.plt.xlabel("wall-clock time")
    d2l.plt.ylabel("objective function")


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    Text(0, 0.5, 'objective function')


.. figure:: output_rs-async_cdba80_21_1.svg
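
As a rough check of how busy the workers were, we can compare the ``worker-time`` column of the final resource summary with the total wall-clock time. The sketch below assumes that ``worker-time`` approximates the time a worker had spent on a trial when it last reported, which is our reading of the log rather than a documented guarantee.

.. code:: python

    # "worker-time" column of the final resource summary, listed by trial id
    worker_times = [
        70.635542, 70.564567, 76.995033, 72.600473, 90.789169, 73.124661,
        94.374417, 73.351970, 77.767942, 73.504148, 119.344017, 68.693048,
        67.717216, 68.206863, 58.861913, 81.370336, 57.425193, 10.832541,
    ]
    n_workers = 2
    wallclock_time = 722.85  # seconds, from the last status line of the log

    busy = sum(worker_times)
    utilization = busy / (n_workers * wallclock_time)
    print(f"busy worker time: {busy:.1f}s, utilization: {utilization:.2f}")

Under this assumption, the two workers were busy for roughly 90% of the available time. The remainder would include starting each sub-process and any work done before the first or after the last report.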


Summary
-------

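We can substantially reduce the waiting time for random search by distributing trials across parallel resources. In general, we distinguish between synchronous and asynchronous scheduling. Synchronous scheduling means that we sample a new batch of hyperparameter configurations once the previous batch has finished. If there are stragglers, that is, trials that take longer to finish than the others, our workers have to wait at synchronization points. Asynchronous scheduling evaluates a new hyperparameter configuration as soon as resources become available, and thus keeps all workers busy at any point in time. While random search is easy to distribute asynchronously and requires no change to the actual algorithm, other methods need additional modifications.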

Exercises
---------

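1. Consider the ``DropoutMLP`` model implemented in :numref:`sec_dropout`
   and used in Exercise 1 of :numref:`sec_api_hpo`.

   1. Implement an objective function ``hpo_objective_dropoutmlp_synetune``
      to be used with Syne Tune. Make sure that your function reports the validation error after every epoch.
   2. Using the setup of Exercise 1 in :numref:`sec_api_hpo`,
      compare random search to Bayesian optimization. If you use SageMaker,
      feel free to use Syne Tune's
      benchmarking facilities to run experiments in parallel. Hint:
      Bayesian optimization is provided as
      ``syne_tune.optimizer.baselines.BayesianOptimization``.
   3. For this exercise, you need to run on an instance with at least 4 CPU cores. For one of the methods used above (random search, Bayesian optimization), run experiments with ``n_workers=1``, ``n_workers=2``, and ``n_workers=4``,
      and compare the results (incumbent
      trajectories). At least for random search, you should observe linear scaling with respect to the number of workers. Hint:
      For robust results, you may have to average over several repetitions each.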

2. *Advanced*. The goal of this exercise is to implement a new scheduler
   in Syne Tune.

   1. Create a virtual environment containing both the
      `d2lbook <https://github.com/d2l-ai/d2l-en/blob/master/INFO.md#installation-for-developers>`__
      and
      `syne-tune <https://syne-tune.readthedocs.io/en/latest/getting_started.html>`__
      sources.
   2. Implement the ``LocalSearcher`` from Exercise 2 in :numref:`sec_api_hpo` as a new
      searcher in Syne Tune. Hint: Read
      `this tutorial <https://syne-tune.readthedocs.io/en/latest/tutorials/developer/README.html>`__.
      Alternatively, you may follow this
      `example <https://syne-tune.readthedocs.io/en/latest/examples.html#launch-hpo-experiment-with-home-made-scheduler>`__.
   3. Compare your new ``LocalSearcher`` with ``RandomSearch``
      on the ``DropoutMLP`` benchmark.