10.6. The Encoder-Decoder Architecture

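In general sequence-to-sequence problems such as machine translation (Section 10.5), inputs and outputs vary in length and are not aligned with each other. The standard approach to handling this kind of data is to design an encoder-decoder architecture (Fig. 10.6.1) consisting of two major components: an encoder that takes a variable-length sequence as input, and a decoder that acts as a conditional language model, taking in the encoded input and the leftward context of the target sequence and predicting the subsequent token in the target sequence.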

Fig. 10.6.1 The encoder-decoder architecture (image: ../_images/encoder-decoder.svg).

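Let's take machine translation from English to French as an example. Given an input sequence in English, “They”, “are”, “watching”, “.”, this encoder-decoder architecture first encodes the variable-length input into a state, then decodes the state to generate the translated sequence token by token as output: “Ils”, “regardent”, “.”. Since the encoder-decoder architecture forms the basis of the different sequence-to-sequence models in subsequent sections, this section converts the architecture into an interface that will be implemented later.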

Each code listing in this section appears four times, once per framework, in this order: PyTorch, MXNet, JAX, TensorFlow.

from d2l import torch as d2l
from torch import nn

from d2l import mxnet as d2l
from mxnet.gluon import nn

from d2l import jax as d2l
from flax import linen as nn

from d2l import tensorflow as d2l
import tensorflow as tf

10.6.1. Encoder

In the encoder interface, we specify only that the encoder takes variable-length sequences as input X. The implementation is provided by any model that inherits this base Encoder class.

class Encoder(nn.Module):  #@save
    """The base encoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def forward(self, X, *args):
        raise NotImplementedError

class Encoder(nn.Block):  #@save
    """The base encoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def forward(self, X, *args):
        raise NotImplementedError

class Encoder(nn.Module):  #@save
    """The base encoder interface for the encoder--decoder architecture."""
    def setup(self):
        raise NotImplementedError

    # Later there can be additional arguments (e.g., length excluding padding)
    def __call__(self, X, *args):
        raise NotImplementedError

class Encoder(tf.keras.layers.Layer):  #@save
    """The base encoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def call(self, X, *args):
        raise NotImplementedError
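
To make the interface concrete, the following is a minimal, framework-free sketch of how a subclass might satisfy it. The plain-Python `Encoder` stand-in and the `BagOfTokensEncoder` class are hypothetical names introduced only for illustration and are not part of d2l or any framework. The toy encoder counts token ids, so sequences of different lengths map to a state of the same shape, and the optional extra argument is assumed to be the number of non-padding tokens.

```python
class Encoder:
    """Framework-free stand-in for the base encoder interface (assumption)."""
    def forward(self, X, *args):
        raise NotImplementedError

    def __call__(self, X, *args):
        return self.forward(X, *args)


class BagOfTokensEncoder(Encoder):
    """Hypothetical toy encoder: counts token ids, ignoring padding."""
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def forward(self, X, *args):
        # Assumed convention: the first extra argument, if given, is the
        # number of non-padding tokens at the start of X
        valid_len = args[0] if args else len(X)
        state = [0] * self.vocab_size
        for token_id in X[:valid_len]:
            state[token_id] += 1
        return state


encoder = BagOfTokensEncoder(vocab_size=5)
short_state = encoder([1, 2])
long_state = encoder([1, 2, 2, 3, 0, 0], 4)  # last two ids are padding
print(short_state)  # [0, 1, 1, 0, 0]
print(long_state)   # [0, 1, 2, 1, 0]
```

Both calls return a list of length 5 even though the inputs differ in length. A real encoder would learn its representation rather than count tokens.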

10.6.2. Decoder

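In the following decoder interface, we add an additional init_state method to convert the encoder output (enc_all_outputs) into the encoded state. Note that this step may require extra inputs, such as the valid length of the input, which was explained in Section 10.5.

To generate a variable-length sequence token by token, at every step the decoder maps an input (e.g., the token generated at the previous time step) and the encoded state to an output token at the current time step.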

class Decoder(nn.Module):  #@save
    """The base decoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def init_state(self, enc_all_outputs, *args):
        raise NotImplementedError

    def forward(self, X, state):
        raise NotImplementedError

class Decoder(nn.Block):  #@save
    """The base decoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def init_state(self, enc_all_outputs, *args):
        raise NotImplementedError

    def forward(self, X, state):
        raise NotImplementedError

class Decoder(nn.Module):  #@save
    """The base decoder interface for the encoder--decoder architecture."""
    def setup(self):
        raise NotImplementedError

    # Later there can be additional arguments (e.g., length excluding padding)
    def init_state(self, enc_all_outputs, *args):
        raise NotImplementedError

    def __call__(self, X, state):
        raise NotImplementedError

class Decoder(tf.keras.layers.Layer):  #@save
    """The base decoder interface for the encoder--decoder architecture."""
    def __init__(self):
        super().__init__()

    # Later there can be additional arguments (e.g., length excluding padding)
    def init_state(self, enc_all_outputs, *args):
        raise NotImplementedError

    def call(self, X, state):
        raise NotImplementedError
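
The token-by-token generation described above can be sketched without any framework. Everything here is an illustrative assumption: the plain-Python `Decoder` stand-in, the hypothetical `TableDecoder` (which replaces a learned model with a hand-written lookup table), the `greedy_decode` helper, and the special tokens `"<bos>"` and `"<eos>"` used to start and stop generation.

```python
class Decoder:
    """Framework-free stand-in for the base decoder interface (assumption)."""
    def init_state(self, enc_all_outputs, *args):
        raise NotImplementedError

    def forward(self, X, state):
        raise NotImplementedError

    def __call__(self, X, state):
        return self.forward(X, state)


class TableDecoder(Decoder):
    """Hypothetical toy decoder backed by a hand-written lookup table."""
    def __init__(self, table):
        # table[encoded_state][previous_token] -> next_token
        self.table = table

    def init_state(self, enc_all_outputs, *args):
        # Here the encoded state is just a hashable copy of the encoder output
        return tuple(enc_all_outputs)

    def forward(self, X, state):
        # Map (previous token, encoded state) to the current output token;
        # this toy state never changes, so it is returned as is
        return self.table[state][X], state


def greedy_decode(decoder, enc_all_outputs, max_steps=10):
    """Feed each generated token back in until "<eos>" or max_steps."""
    state = decoder.init_state(enc_all_outputs)
    token, outputs = "<bos>", []
    for _ in range(max_steps):
        token, state = decoder(token, state)
        if token == "<eos>":
            break
        outputs.append(token)
    return outputs


source = ["They", "are", "watching", "."]
table = {tuple(source): {"<bos>": "Ils", "Ils": "regardent",
                         "regardent": ".", ".": "<eos>"}}
translation = greedy_decode(TableDecoder(table), source)
print(translation)  # ['Ils', 'regardent', '.']
```

The loop shows why the decoder returns a state alongside its output: each step consumes the previous token and the current state, and hands both an output and a (possibly updated) state to the next step.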

10.6.3. Putting the Encoder and Decoder Together

In the forward propagation, the output of the encoder is used to produce the encoded state, and this state is then used by the decoder as one of its inputs.

class EncoderDecoder(d2l.Classifier):  #@save
    """The base class for the encoder--decoder architecture."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, enc_X, dec_X, *args):
        enc_all_outputs = self.encoder(enc_X, *args)
        dec_state = self.decoder.init_state(enc_all_outputs, *args)
        # Return decoder output only
        return self.decoder(dec_X, dec_state)[0]

class EncoderDecoder(d2l.Classifier):  #@save
    """The base class for the encoder--decoder architecture."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, enc_X, dec_X, *args):
        enc_all_outputs = self.encoder(enc_X, *args)
        dec_state = self.decoder.init_state(enc_all_outputs, *args)
        # Return decoder output only
        return self.decoder(dec_X, dec_state)[0]

class EncoderDecoder(d2l.Classifier):  #@save
    """The base class for the encoder--decoder architecture."""
    encoder: nn.Module
    decoder: nn.Module
    training: bool

    def __call__(self, enc_X, dec_X, *args):
        enc_all_outputs = self.encoder(enc_X, *args, training=self.training)
        dec_state = self.decoder.init_state(enc_all_outputs, *args)
        # Return decoder output only
        return self.decoder(dec_X, dec_state, training=self.training)[0]

class EncoderDecoder(d2l.Classifier):  #@save
    """The base class for the encoder--decoder architecture."""
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def call(self, enc_X, dec_X, *args):
        enc_all_outputs = self.encoder(enc_X, *args, training=True)
        dec_state = self.decoder.init_state(enc_all_outputs, *args)
        # Return decoder output only
        return self.decoder(dec_X, dec_state, training=True)[0]
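
The data flow of this forward pass can be traced with a small framework-free sketch. The classes below are hypothetical stand-ins written in plain Python, not the d2l classes: `SumEncoder` sums the non-padding inputs, `OffsetDecoder` adds that sum to each decoder input, and the extra argument is assumed to be the valid length of `enc_X`.

```python
class SumEncoder:
    """Hypothetical toy encoder: sums the non-padding inputs."""
    def __call__(self, X, *args):
        valid_len = args[0] if args else len(X)
        return sum(X[:valid_len])


class OffsetDecoder:
    """Hypothetical toy decoder: adds the encoded state to each input."""
    def init_state(self, enc_all_outputs, *args):
        # The encoder output is used directly as the encoded state
        return enc_all_outputs

    def __call__(self, X, state):
        outputs = [x + state for x in X]
        return outputs, state  # (outputs, updated state)


class EncoderDecoder:
    """Framework-free sketch of the composition (not the d2l class)."""
    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder

    def __call__(self, enc_X, dec_X, *args):
        # The same extra arguments go to the encoder and to init_state
        enc_all_outputs = self.encoder(enc_X, *args)
        dec_state = self.decoder.init_state(enc_all_outputs, *args)
        # Keep only the decoder outputs, dropping the returned state
        return self.decoder(dec_X, dec_state)[0]


model = EncoderDecoder(SumEncoder(), OffsetDecoder())
outputs = model([1, 2, 3, 0], [10, 20], 3)  # 3 = valid length of enc_X
print(outputs)  # [16, 26]
```

The encoder sees only the first three inputs (sum 6), and the decoder output has the length of `dec_X` rather than `enc_X`, which illustrates that the two sequences need not be aligned.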

In the next section, we will see how to apply RNNs to design sequence-to-sequence models based on this encoder-decoder architecture.

10.6.4. Summary

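Encoder-decoder architectures can handle inputs and outputs that both consist of variable-length sequences, which makes them suitable for sequence-to-sequence problems such as machine translation. The encoder takes a variable-length sequence as input and transforms it into a state with a fixed shape. The decoder maps the fixed-shape encoded state to a variable-length sequence.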

10.6.5. Exercises

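Suppose that we use neural networks to implement the encoder-decoder architecture. Do the encoder and the decoder have to be the same type of neural network?
Besides machine translation, can you think of another application where the encoder-decoder architecture can be applied?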