.. _sec_sh_async:

Asynchronous Successive Halving
===============================


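As we have seen in :numref:`sec_rs_async`, we can accelerate HPO by
distributing the evaluation of hyperparameter configurations across
multiple instances or across multiple CPUs/GPUs on a single instance.
However, compared to random search, it is not straightforward to run
successive halving (SH) asynchronously in a distributed setting. Before
we can decide which configuration to run next, we first have to collect
all observations at the current rung level. This requires synchronizing
the workers at each rung level. For example, for the lowest rung level
:math:`r_{\mathrm{min}}`, we first have to evaluate all
:math:`N = \eta^K` configurations before we can promote the best
:math:`\frac{1}{\eta}` of them to the next rung level.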
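
To make the rung structure concrete, the following minimal sketch lists
how many configurations synchronous SH evaluates at each rung level. It
assumes that rung :math:`k` sits at :math:`r_{\mathrm{min}} \eta^k` and
holds :math:`\eta^{K-k}` configurations; the helper name
``sh_rung_schedule`` is made up for this illustration.

.. code:: python

    def sh_rung_schedule(r_min, eta, K):
        """Return (rung level, number of configurations) pairs for synchronous SH."""
        # Rung k trains eta**(K - k) configurations for r_min * eta**k epochs each.
        return [(r_min * eta**k, eta ** (K - k)) for k in range(K + 1)]

    # Four initial configurations (eta = 2, K = 2), halved at every rung.
    schedule = sh_rung_schedule(r_min=1, eta=2, K=2)
    print(schedule)  # [(1, 4), (2, 2), (4, 1)]

Every entry marks a synchronization point: no configuration can leave a
rung until all configurations assigned to that rung have reported.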

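In any distributed system, synchronization typically implies idle time
for workers. First, we often observe high variation in training time
across hyperparameter configurations. For example, if the number of
filters per layer is a hyperparameter, networks with fewer filters
finish training faster than networks with more filters, so workers that
finish early have to wait for stragglers. Moreover, the number of slots
in a rung level is not always a multiple of the number of workers, in
which case some workers may even sit idle for a full batch.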
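
The effect of stragglers can be estimated with a small simulation. The
sketch below is an illustration only: it assumes that each trial of a
rung is handed to whichever worker becomes free first, and that no
worker may leave the rung before the slowest trial has finished. The
function name and the durations are hypothetical.

.. code:: python

    import heapq

    def synchronous_rung_idle_time(durations, n_workers):
        """Return (makespan, total idle time) for one synchronized rung."""
        # Min-heap of the times at which each worker becomes free.
        finish_times = [0.0] * n_workers
        heapq.heapify(finish_times)
        for duration in durations:
            earliest = heapq.heappop(finish_times)
            heapq.heappush(finish_times, earliest + duration)
        # Everyone waits until the last trial of the rung is done.
        makespan = max(finish_times)
        idle = sum(makespan - t for t in finish_times)
        return makespan, idle

    # One slow trial among four: the other worker waits two time units.
    straggler = synchronous_rung_idle_time([1, 1, 3, 1], n_workers=2)
    # Three equal trials on two workers: one worker idles for a whole batch.
    uneven = synchronous_rung_idle_time([1, 1, 1], n_workers=2)
    print(straggler, uneven)  # (4.0, 2.0) (2.0, 1.0)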

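:numref:`synchronous_sh` shows the scheduling of synchronous SH with
:math:`\eta=2` for two workers. We start by evaluating Trial-0 and
Trial-1 for one epoch each, and immediately continue with the next two
trials once they are finished. We then have to wait until Trial-2
finishes, which takes substantially longer than the other trials. Only
then can we promote the best two trials, i.e., Trial-0 and Trial-3, to
the next rung level. This causes idle time for Worker-1. Next, we
continue with Rung 1. Here, too, Trial-3 takes longer than Trial-0,
which leads to additional idle time for Worker-0. Once we reach Rung-2,
only the best trial, Trial-0, remains, and it occupies a single worker.
To keep Worker-1 from idling during that time, most implementations of
SH already continue with the next round and start evaluating new trials
(e.g., Trial-4) on the first rung.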

.. _synchronous_sh:

.. figure:: ../img/sync_sh.svg

   Synchronous successive halving with two workers.


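Asynchronous successive halving (ASHA) :cite:`li-arxiv18` adapts SH to
the asynchronous parallel scenario. The main idea of ASHA is to promote
configurations to the next rung level as soon as we have collected at
least :math:`\eta` observations at the current rung level. This decision
rule may lead to suboptimal promotions: a configuration can be promoted
to the next rung level even though, in hindsight, it does not compare
favourably against most of the others at the same rung level. On the
other hand, it removes all synchronization points. In practice, such
suboptimal early promotions often have only a modest impact on
performance. One reason is that the ranking of hyperparameter
configurations is often fairly consistent across rung levels. Another is
that rungs grow over time and reflect the distribution of metric values
at their level better and better. If a worker is free but no
configuration can be promoted, we start a new configuration with
:math:`r = r_{\mathrm{min}}`, i.e., at the first rung level.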
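
The decision rule can be written down in a few lines. The following is a
minimal sketch under stated assumptions, not the implementation used by
Syne Tune: the function ``asha_next_job``, the data layout, and the
validation errors are all hypothetical, the metric is minimized, and a
trial counts as promotable once it ranks within the best
:math:`\frac{1}{\eta}` fraction of the results recorded so far at its
rung.

.. code:: python

    def asha_next_job(rungs, promoted, eta):
        """Decide what a free worker does next (promotion-style ASHA sketch).

        rungs[k] maps trial id -> metric observed at rung k (lower is better).
        promoted[k] holds the trial ids already promoted out of rung k.
        """
        # Scan from the second-highest rung down: the top rung has no successor.
        for k in reversed(range(len(rungs) - 1)):
            ranked = sorted(rungs[k], key=rungs[k].get)
            # Only the best 1/eta fraction of the results so far is promotable.
            promotable = ranked[: len(ranked) // eta]
            for trial_id in promotable:
                if trial_id not in promoted[k]:
                    promoted[k].add(trial_id)
                    return ("promote", trial_id, k + 1)
        # Nothing can be promoted: start a new configuration at the lowest rung.
        return ("new", None, 0)

    # Hypothetical validation errors, three rungs, eta = 2.
    rungs = [{0: 0.30, 1: 0.50}, {}, {}]
    promoted = [set(), set(), set()]
    decisions = [asha_next_job(rungs, promoted, eta=2)]  # Trial-0 beats Trial-1
    rungs[1][0] = 0.25  # Trial-0 finishes rung 1
    rungs[0][3] = 0.40  # Trial-3 finishes rung 0, worse than Trial-0
    decisions.append(asha_next_job(rungs, promoted, eta=2))
    rungs[0][2] = 0.60  # the slow Trial-2 finishes, worse than Trial-3
    decisions.append(asha_next_job(rungs, promoted, eta=2))
    rungs[1][3] = 0.35  # Trial-3 finishes rung 1
    decisions.append(asha_next_job(rungs, promoted, eta=2))
    print(decisions)

With these numbers the four decisions are: promote Trial-0 to rung 1,
start a new trial, promote Trial-3 to rung 1, and promote Trial-0 to
rung 2. At no point does a worker wait for another one.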

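:numref:`asha` shows the scheduling of ASHA for the same set of
configurations. Once Trial-1 finishes, we have collected the results of
two trials (i.e., Trial-0 and Trial-1) and immediately promote the
better of them (Trial-0) to the next rung level. After Trial-0 finishes
on rung 1, there are too few trials on that rung to support a further
promotion. Hence, we continue on rung 0 and evaluate Trial-3. When
Trial-3 finishes, Trial-2 is still pending. At this point, three trials
have been evaluated on rung 0 and one trial has been evaluated on
rung 1. Since Trial-3 performs worse than Trial-0 on rung 0 and
:math:`\eta=2`, we cannot promote any new trial yet, and Worker-1
starts Trial-4 from scratch instead. However, once Trial-2 finishes and
scores worse than Trial-3, the latter is promoted to rung 1. Afterwards,
we have collected two evaluations on rung 1, so we can now promote
Trial-0 to rung 2. At the same time, Worker-1 continues on rung 0 by
evaluating a new trial (i.e., Trial-5).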

.. _asha:

.. figure:: ../img/asha.svg

   Asynchronous successive halving (ASHA) with two workers.


.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    from d2l import torch as d2l
    import logging
    logging.basicConfig(level=logging.INFO)
    import matplotlib.pyplot as plt
    from syne_tune.config_space import loguniform, randint
    from syne_tune.backend.python_backend import PythonBackend
    from syne_tune.optimizer.baselines import ASHA
    from syne_tune import Tuner, StoppingCriterion
    from syne_tune.experiments import load_experiment


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:root:SageMakerBackend is not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[extra]'
    AWS dependencies are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[aws]'
    or (for everything)
       pip install 'syne-tune[extra]'
    AWS dependencies are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[aws]'
    or (for everything)
       pip install 'syne-tune[extra]'
    INFO:root:Ray Tune schedulers and searchers are not imported since dependencies are missing. You can install them with
       pip install 'syne-tune[raytune]'
    or (for everything)
       pip install 'syne-tune[extra]'


Objective Function
------------------

We will use *Syne Tune* with the same objective function as in
:numref:`sec_rs_async`.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    def hpo_objective_lenet_synetune(learning_rate, batch_size, max_epochs):
        from d2l import torch as d2l
        from syne_tune import Reporter
    
        model = d2l.LeNet(lr=learning_rate, num_classes=10)
        trainer = d2l.HPOTrainer(max_epochs=1, num_gpus=1)
        data = d2l.FashionMNIST(batch_size=batch_size)
        model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
        report = Reporter()
        for epoch in range(1, max_epochs + 1):
            if epoch == 1:
                # Initialize the state of Trainer
                trainer.fit(model=model, data=data)
            else:
                trainer.fit_epoch()
            validation_error = d2l.numpy(trainer.validation_error().cpu())
            report(epoch=epoch, validation_error=float(validation_error))

We will also use the same configuration space as before:

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    min_number_of_epochs = 2
    max_number_of_epochs = 10
    eta = 2
    
    config_space = {
        "learning_rate": loguniform(1e-2, 1),
        "batch_size": randint(32, 256),
        "max_epochs": max_number_of_epochs,
    }
    initial_config = {
        "learning_rate": 0.1,
        "batch_size": 128,
    }

Asynchronous Scheduler
----------------------

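First, we define the number of workers that evaluate trials
concurrently. We also need to specify how long we want to run ASHA by
setting an upper limit on the total wall-clock time.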

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    n_workers = 2  # Needs to be <= the number of available GPUs
    max_wallclock_time = 12 * 60  # 12 minutes

The code for running ASHA is a simple variation of what we did for
asynchronous random search.

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    mode = "min"
    metric = "validation_error"
    resource_attr = "epoch"
    
    scheduler = ASHA(
        config_space,
        metric=metric,
        mode=mode,
        points_to_evaluate=[initial_config],
        max_resource_attr="max_epochs",
        resource_attr=resource_attr,
        grace_period=min_number_of_epochs,
        reduction_factor=eta,
    )


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:syne_tune.optimizer.schedulers.fifo:max_resource_level = 10, as inferred from config_space
    INFO:syne_tune.optimizer.schedulers.fifo:Master random_seed = 4141536384


Here, ``metric`` and ``resource_attr`` specify the key names used with
the ``report`` callback, and ``max_resource_attr`` denotes which input
to the objective function corresponds to :math:`r_{\mathrm{max}}`.
Moreover, ``grace_period`` provides :math:`r_{\mathrm{min}}`, and
``reduction_factor`` is :math:`\eta`. We can now run Syne Tune as before
(this takes about 12 minutes).
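
Before launching the run, it helps to know at which epochs trials are
compared. The sketch below derives the rung levels from
:math:`r_{\mathrm{min}}`, :math:`\eta`, and :math:`r_{\mathrm{max}}`,
assuming that rungs sit at :math:`r_{\mathrm{min}} \eta^k` and that
:math:`r_{\mathrm{max}}` serves as the final level. This layout is an
assumption made for illustration, and ``rung_levels`` is a hypothetical
helper rather than a Syne Tune function.

.. code:: python

    def rung_levels(grace_period, reduction_factor, max_resource):
        """List the resource levels at which trials are compared."""
        levels = []
        r = grace_period
        # Multiply by the reduction factor until the maximum would be reached.
        while r < max_resource:
            levels.append(r)
            r *= reduction_factor
        # The maximum resource acts as the last level.
        levels.append(max_resource)
        return levels

    # Settings of this section: r_min = 2, eta = 2, r_max = 10.
    levels = rung_levels(grace_period=2, reduction_factor=2, max_resource=10)
    print(levels)  # [2, 4, 8, 10]

This agrees with the status tables in the log output of the run, where
stopped trials end after 2, 4, 8, or 10 epochs. The tuning run itself
follows.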

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    trial_backend = PythonBackend(
        tune_function=hpo_objective_lenet_synetune,
        config_space=config_space,
    )
    
    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        print_update_interval=int(max_wallclock_time * 0.6),
    )
    tuner.run()


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    INFO:syne_tune.tuner:results of trials will be saved on /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065
    INFO:root:Detected 4 GPUs
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.1 --batch_size 128 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/0/checkpoints
    INFO:syne_tune.tuner:(trial 0) - scheduled config {'learning_rate': 0.1, 'batch_size': 128, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.016393465295512565 --batch_size 109 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/1/checkpoints
    INFO:syne_tune.tuner:(trial 1) - scheduled config {'learning_rate': 0.016393465295512565, 'batch_size': 109, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.8536433102570197 --batch_size 249 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/2/checkpoints
    INFO:syne_tune.tuner:(trial 2) - scheduled config {'learning_rate': 0.8536433102570197, 'batch_size': 249, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02238278285575226 --batch_size 241 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/3/checkpoints
    INFO:syne_tune.tuner:(trial 3) - scheduled config {'learning_rate': 0.02238278285575226, 'batch_size': 241, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 2 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02265916956093905 --batch_size 104 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/4/checkpoints
    INFO:syne_tune.tuner:(trial 4) - scheduled config {'learning_rate': 0.02265916956093905, 'batch_size': 104, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.08152606473767814 --batch_size 144 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/5/checkpoints
    INFO:syne_tune.tuner:(trial 5) - scheduled config {'learning_rate': 0.08152606473767814, 'batch_size': 144, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.03286717277851087 --batch_size 230 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/6/checkpoints
    INFO:syne_tune.tuner:(trial 6) - scheduled config {'learning_rate': 0.03286717277851087, 'batch_size': 230, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.2767625103368729 --batch_size 210 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/7/checkpoints
    INFO:syne_tune.tuner:(trial 7) - scheduled config {'learning_rate': 0.2767625103368729, 'batch_size': 210, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.20515181717902709 --batch_size 136 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/8/checkpoints
    INFO:syne_tune.tuner:(trial 8) - scheduled config {'learning_rate': 0.20515181717902709, 'batch_size': 136, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.01205522192914604 --batch_size 61 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/9/checkpoints
    INFO:syne_tune.tuner:(trial 9) - scheduled config {'learning_rate': 0.01205522192914604, 'batch_size': 61, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.06812982725901803 --batch_size 225 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/10/checkpoints
    INFO:syne_tune.tuner:(trial 10) - scheduled config {'learning_rate': 0.06812982725901803, 'batch_size': 225, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.15408580663479204 --batch_size 184 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/11/checkpoints
    INFO:syne_tune.tuner:(trial 11) - scheduled config {'learning_rate': 0.15408580663479204, 'batch_size': 184, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.03319086816357744 --batch_size 215 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/12/checkpoints
    INFO:syne_tune.tuner:(trial 12) - scheduled config {'learning_rate': 0.03319086816357744, 'batch_size': 215, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.5871483249986997 --batch_size 131 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/13/checkpoints
    INFO:syne_tune.tuner:(trial 13) - scheduled config {'learning_rate': 0.5871483249986997, 'batch_size': 131, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.04405320051007572 --batch_size 64 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/14/checkpoints
    INFO:syne_tune.tuner:(trial 14) - scheduled config {'learning_rate': 0.04405320051007572, 'batch_size': 64, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.15163221599975787 --batch_size 51 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/15/checkpoints
    INFO:syne_tune.tuner:(trial 15) - scheduled config {'learning_rate': 0.15163221599975787, 'batch_size': 51, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.02010844681019881 --batch_size 197 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/16/checkpoints
    INFO:syne_tune.tuner:(trial 16) - scheduled config {'learning_rate': 0.02010844681019881, 'batch_size': 197, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.030927553495777638 --batch_size 131 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/17/checkpoints
    INFO:syne_tune.tuner:(trial 17) - scheduled config {'learning_rate': 0.030927553495777638, 'batch_size': 131, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.06962532946104748 --batch_size 228 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/18/checkpoints
    INFO:syne_tune.tuner:(trial 18) - scheduled config {'learning_rate': 0.06962532946104748, 'batch_size': 228, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.21744469415997278 --batch_size 240 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/19/checkpoints
    INFO:syne_tune.tuner:(trial 19) - scheduled config {'learning_rate': 0.21744469415997278, 'batch_size': 240, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.06936041631165406 --batch_size 106 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/20/checkpoints
    INFO:syne_tune.tuner:(trial 20) - scheduled config {'learning_rate': 0.06936041631165406, 'batch_size': 106, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.42010437996754035 --batch_size 181 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/21/checkpoints
    INFO:syne_tune.tuner:(trial 21) - scheduled config {'learning_rate': 0.42010437996754035, 'batch_size': 181, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.618264683792025 --batch_size 230 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/22/checkpoints
    INFO:syne_tune.tuner:(trial 22) - scheduled config {'learning_rate': 0.618264683792025, 'batch_size': 230, 'max_epochs': 10}
    INFO:syne_tune.tuner:tuning status (last metric is reported)
     trial_id     status  iter  learning_rate  batch_size  max_epochs  epoch  validation_error  worker-time
            0    Stopped    10       0.100000         128          10     10          0.268790    71.102878
            1    Stopped     2       0.016393         109          10      2          0.899935    13.896730
            2  Completed    10       0.853643         249          10     10          0.182956    66.696980
            3    Stopped     2       0.022383         241          10      2          0.877944    12.613355
            4    Stopped     2       0.022659         104          10      2          0.900327    14.157292
            5    Stopped     2       0.081526         144          10      2          0.900298    13.433055
            6    Stopped     4       0.032867         230          10      4          0.915217    24.955152
            7    Stopped    10       0.276763         210          10     10          0.233890    60.715368
            8    Stopped     8       0.205152         136          10      8          0.325495    47.716144
            9    Stopped     2       0.012055          61          10      2          0.899991    16.423611
           10    Stopped     4       0.068130         225          10      4          0.899753    23.875174
           11    Stopped     4       0.154086         184          10      4          0.453026    23.736529
           12    Stopped     2       0.033191         215          10      2          0.899906    12.479618
           13    Stopped     8       0.587148         131          10      8          0.375490    51.454139
           14    Stopped     2       0.044053          64          10      2          0.899881    15.800886
           15    Stopped    10       0.151632          51          10     10          0.188712   106.156809
           16    Stopped     2       0.020108         197          10      2          0.900156    11.984264
           17    Stopped     2       0.030928         131          10      2          0.900078    11.954435
           18    Stopped     2       0.069625         228          10      2          0.900124    11.464556
           19    Stopped     2       0.217445         240          10      2          0.899950    11.154259
           20    Stopped     2       0.069360         106          10      2          0.899923    12.920582
           21 InProgress     8       0.420104         181          10      8          0.234927    51.076726
           22 InProgress     7       0.618265         230          10      7          0.240343    43.615625
    2 trials running, 21 finished (1 until the end), 436.97s wallclock-time
    
    INFO:syne_tune.tuner:Trial trial_id 21 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.16400286598134425 --batch_size 87 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/23/checkpoints
    INFO:syne_tune.tuner:(trial 23) - scheduled config {'learning_rate': 0.16400286598134425, 'batch_size': 87, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 22 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.16603894676453146 --batch_size 132 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/24/checkpoints
    INFO:syne_tune.tuner:(trial 24) - scheduled config {'learning_rate': 0.16603894676453146, 'batch_size': 132, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.12085342681840087 --batch_size 161 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/25/checkpoints
    INFO:syne_tune.tuner:(trial 25) - scheduled config {'learning_rate': 0.12085342681840087, 'batch_size': 161, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.020613727673297316 --batch_size 69 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/26/checkpoints
    INFO:syne_tune.tuner:(trial 26) - scheduled config {'learning_rate': 0.020613727673297316, 'batch_size': 69, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.7417151784363999 --batch_size 57 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/27/checkpoints
    INFO:syne_tune.tuner:(trial 27) - scheduled config {'learning_rate': 0.7417151784363999, 'batch_size': 57, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 24 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.5851545371142973 --batch_size 43 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/28/checkpoints
    INFO:syne_tune.tuner:(trial 28) - scheduled config {'learning_rate': 0.5851545371142973, 'batch_size': 43, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 27 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.012313981089293918 --batch_size 167 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/29/checkpoints
    INFO:syne_tune.tuner:(trial 29) - scheduled config {'learning_rate': 0.012313981089293918, 'batch_size': 167, 'max_epochs': 10}
    INFO:syne_tune.tuner:Trial trial_id 28 completed.
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.18350672552566955 --batch_size 49 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/30/checkpoints
    INFO:syne_tune.tuner:(trial 30) - scheduled config {'learning_rate': 0.18350672552566955, 'batch_size': 49, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.2726593425130573 --batch_size 128 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/31/checkpoints
    INFO:syne_tune.tuner:(trial 31) - scheduled config {'learning_rate': 0.2726593425130573, 'batch_size': 128, 'max_epochs': 10}
    INFO:root:running subprocess with command: /usr/bin/python /home/ci/.local/lib/python3.8/site-packages/syne_tune/backend/python_backend/python_entrypoint.py --learning_rate 0.6661438697130982 --batch_size 106 --max_epochs 10 --tune_function_root /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/tune_function --tune_function_hash 638f9e6b337c8f0d289f336d612fb672 --st_checkpoint_dir /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065/32/checkpoints
    INFO:syne_tune.tuner:(trial 32) - scheduled config {'learning_rate': 0.6661438697130982, 'batch_size': 106, 'max_epochs': 10}
    INFO:syne_tune.stopping_criterion:reaching max wallclock time (720), stopping there.
    INFO:syne_tune.tuner:Stopping trials that may still be running.
    INFO:syne_tune.tuner:Tuning finished, results of trials can be found on /home/ci/syne-tune/python-entrypoint-2023-08-18-07-20-36-065
    --------------------
    Resource summary (last result is reported):
     trial_id     status  iter  learning_rate  batch_size  max_epochs  epoch  validation_error  worker-time
            0    Stopped    10       0.100000         128          10   10.0          0.268790    71.102878
            1    Stopped     2       0.016393         109          10    2.0          0.899935    13.896730
            2  Completed    10       0.853643         249          10   10.0          0.182956    66.696980
            3    Stopped     2       0.022383         241          10    2.0          0.877944    12.613355
            4    Stopped     2       0.022659         104          10    2.0          0.900327    14.157292
            5    Stopped     2       0.081526         144          10    2.0          0.900298    13.433055
            6    Stopped     4       0.032867         230          10    4.0          0.915217    24.955152
            7    Stopped    10       0.276763         210          10   10.0          0.233890    60.715368
            8    Stopped     8       0.205152         136          10    8.0          0.325495    47.716144
            9    Stopped     2       0.012055          61          10    2.0          0.899991    16.423611
           10    Stopped     4       0.068130         225          10    4.0          0.899753    23.875174
           11    Stopped     4       0.154086         184          10    4.0          0.453026    23.736529
           12    Stopped     2       0.033191         215          10    2.0          0.899906    12.479618
           13    Stopped     8       0.587148         131          10    8.0          0.375490    51.454139
           14    Stopped     2       0.044053          64          10    2.0          0.899881    15.800886
           15    Stopped    10       0.151632          51          10   10.0          0.188712   106.156809
           16    Stopped     2       0.020108         197          10    2.0          0.900156    11.984264
           17    Stopped     2       0.030928         131          10    2.0          0.900078    11.954435
           18    Stopped     2       0.069625         228          10    2.0          0.900124    11.464556
           19    Stopped     2       0.217445         240          10    2.0          0.899950    11.154259
           20    Stopped     2       0.069360         106          10    2.0          0.899923    12.920582
           21  Completed    10       0.420104         181          10   10.0          0.213122    64.977173
           22  Completed    10       0.618265         230          10   10.0          0.190514    63.500903
           23    Stopped     4       0.164003          87          10    4.0          0.345055    29.901440
           24  Completed    10       0.166039         132          10   10.0          0.230829    72.114988
           25    Stopped     2       0.120853         161          10    2.0          0.900627    13.288419
           26    Stopped     2       0.020614          69          10    2.0          0.899995    16.607311
           27  Completed    10       0.741715          57          10   10.0          0.124957    88.111582
           28  Completed    10       0.585155          43          10   10.0          0.132764   103.217489
           29    Stopped     2       0.012314         167          10    2.0          0.899969    12.800696
           30 InProgress     7       0.183507          49          10    7.0          0.176705    66.790888
           31    Stopped    10       0.272659         128          10   10.0          0.201048    71.773432
           32 InProgress     0       0.666144         106          10      -                 -            -
    2 trials running, 31 finished (6 until the end), 723.55s wallclock-time
    
    validation_error: best 0.12495654821395874 for trial-id 27
    --------------------


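Note that we are running a variant of ASHA in which underperforming trials
are stopped early. This differs from our implementation in
:numref:`sec_mf_hpo_sh`, where each training job is started with a fixed
``max_epochs``. In the latter case, a well-performing trial that reaches the
full 10 epochs must first train for 1, then 2, then 4, then 8 epochs, each
time starting from scratch. This kind of pause-and-resume scheduling can be
implemented efficiently by checkpointing the training state after each epoch,
but we avoid this extra complexity here. Once the experiment has finished, we
can retrieve and plot the results.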

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize()
    e = load_experiment(tuner.name)
    e.plot()


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    WARNING:matplotlib.legend:No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.


.. figure:: output_sh-async_950d48_13_1.svg

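To make the cost of restarting from scratch concrete, the following minimal
sketch counts the epochs a single trial has to train before it reaches the
final rung, with and without checkpointing. It assumes the rung levels 1, 2,
4, 8 and a final level of 10 epochs mentioned above. The helper names
``rung_levels`` and ``total_epochs`` are hypothetical and not part of Syne
Tune.

.. code:: python

    def rung_levels(r_min, eta, r_max):
        """Return the rung levels r_min, eta * r_min, ..., capped at r_max."""
        levels = []
        r = r_min
        while r < r_max:
            levels.append(r)
            r *= eta
        levels.append(r_max)
        return levels


    def total_epochs(levels, resume):
        """Epochs trained by one trial that survives every rung.

        With resume=True the trial continues from a checkpoint, so it only
        trains up to the last level once. With resume=False every promotion
        restarts training from scratch (illustrative assumption).
        """
        return levels[-1] if resume else sum(levels)


    levels = rung_levels(r_min=1, eta=2, r_max=10)
    print(levels)                              # [1, 2, 4, 8, 10]
    print(total_epochs(levels, resume=False))  # 25
    print(total_epochs(levels, resume=True))   # 10

Under these assumptions, a trial that survives every rung trains for 25
epochs without checkpointing, compared with 10 epochs when it can resume.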

Visualize the Optimization Process
----------------------------------

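Once more, we visualize the learning curves of every trial (each color in
the plot represents one trial). Compare this to asynchronous random search in
:numref:`sec_rs_async`. As we have seen for successive halving in
:numref:`sec_mf_hpo`, most trials are stopped at the two lowest rung levels
(:math:`r_{\mathrm{min}}` or :math:`\eta r_{\mathrm{min}}`), which in the run
above correspond to 2 and 4 epochs. However, trials do not stop at the same
point in time, because they require different amounts of time per epoch. If
we ran standard successive halving instead of ASHA, we would need to
synchronize our workers before we could promote configurations to the next
rung level.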

.. raw:: latex

   \diilbookstyleinputcell

.. code:: python

    d2l.set_figsize([6, 2.5])
    results = e.results
    # Draw one learning curve per trial
    for trial_id in results.trial_id.unique():
        df = results[results["trial_id"] == trial_id]
        d2l.plt.plot(
            df["st_tuner_time"],      # x-axis: the tuner's wall-clock time
            df["validation_error"],   # y-axis: objective being minimized
            marker="o"
        )
    d2l.plt.xlabel("wall-clock time")
    d2l.plt.ylabel("objective function")


.. raw:: latex

   \diilbookstyleoutputcell

.. parsed-literal::
    :class: output

    Text(0, 0.5, 'objective function')


.. figure:: output_sh-async_950d48_15_1.svg

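The stopping behavior visible in these learning curves can be illustrated
with a simplified decision rule: whenever a trial reports a result at a rung
level, it continues only if its validation error is among the best
:math:`1/\eta` fraction of all results recorded at that rung so far. The
sketch below is an assumption made for illustration, not the rule implemented
in Syne Tune. The class ``StoppingRule`` and its method ``should_continue``
are hypothetical names.

.. code:: python

    from collections import defaultdict


    class StoppingRule:
        """Simplified stopping rule (hypothetical, not the Syne Tune API)."""

        def __init__(self, eta=2):
            self.eta = eta
            # Maps a rung level (in epochs) to the validation errors seen there
            self.rungs = defaultdict(list)

        def should_continue(self, rung_level, validation_error):
            errors = self.rungs[rung_level]
            errors.append(validation_error)
            # Too few results to compare against: let the trial continue
            if len(errors) < self.eta:
                return True
            # Keep the best 1/eta fraction of the results recorded so far
            num_kept = len(errors) // self.eta
            threshold = sorted(errors)[num_kept - 1]
            return validation_error <= threshold


    rule = StoppingRule(eta=2)
    decisions = [rule.should_continue(2, err) for err in [0.9, 0.3, 0.5, 0.2]]
    print(decisions)  # [True, True, False, True]

No worker ever waits for a rung to fill up, since each decision uses only
the results available at that moment. The price is visible in the first
decision: the trial with error 0.9 continues only because nothing was
available to compare it against.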

Summary
-------

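Compared to random search, successive halving is not quite as trivial to run
in an asynchronous distributed setting. To avoid synchronization points, we
promote configurations to the next rung level as quickly as possible, even if
this means promoting some of the wrong ones. In practice, this usually does
not hurt much, because the gains from asynchronous over synchronous
scheduling are usually much larger than the losses from suboptimal decisions.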