Autoencoder

An autoencoder is a machine learning algorithm that uses a neural network for dimensionality reduction. It was proposed by Geoffrey Hinton et al. in 2006^[1].

Overview

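An autoencoder is a three-layer neural network trained by unsupervised learning with the same data presented at the input layer and at the output layer. When the training targets are real-valued with an unbounded range, the activation function of the output layer is often chosen to be the identity map (that is, the output layer is a linear transformation). If the identity map is also chosen as the activation function of the hidden layer, the result nearly coincides with principal component analysis. In practice, autoencoders are used for anomaly detection by taking the difference between the input and the output.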
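The anomaly-detection use can be sketched as follows. This is a minimal illustration, not a reference implementation: the data, the 2-unit hidden layer, and the 99% quantile threshold are all assumptions. Both activations are the identity, and instead of running gradient descent the weights are taken from an SVD, which gives the squared-error optimum that such a linear autoencoder shares with principal component analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" data: 5-D points lying near a 2-D linear subspace.
basis = rng.normal(size=(2, 5))
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 5))
mean = normal.mean(axis=0)

# Stand-in for training: the top-2 right singular vectors span the subspace
# that minimises squared reconstruction error for a linear autoencoder.
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
W = vt[:2].T                      # encoder weights (5 x 2); decoder is W.T

def reconstruct(x):
    """Encode to 2-D, then decode back to 5-D (identity activations)."""
    return (x - mean) @ W @ W.T + mean

def anomaly_score(x):
    """Squared difference between input and output, summed per sample."""
    return np.sum((x - reconstruct(x)) ** 2, axis=1)

threshold = np.quantile(anomaly_score(normal), 0.99)   # assumed cutoff rule
outliers = 3.0 * rng.normal(size=(5, 5))               # points off the subspace
flags = anomaly_score(outliers) > threshold            # True marks an anomaly
```

Inputs that resemble the training data are reconstructed almost exactly, while inputs away from the learned subspace leave a large residual, which is what the score measures.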

Properties and limitations

Autoencoders are designed to have the properties required for dimensionality reduction.

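An autoencoder is constrained so that the dimension of its hidden layer, $d_{m}$, is smaller than the dimension of its input and output layers, $d_{i,o}$. The reason is that when $d_{i,o}\leq d_{m}$, the autoencoder can reach zero reconstruction error simply by learning the identity mapping^[2].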
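The degenerate case is easy to check numerically. The sketch below assumes a linear autoencoder with hypothetical sizes $d_{i,o}=4$ and $d_{m}=6$, and hand-picked weights that copy the input through the hidden layer; no learning is involved.

```python
import numpy as np

d_io, d_m = 4, 6                      # hidden layer wider than input/output

# Encoder copies the input into the first d_io hidden units; the rest stay 0.
W_enc = np.zeros((d_io, d_m))
W_enc[:, :d_io] = np.eye(d_io)
W_dec = W_enc.T                       # decoder reads those units back out

x = np.random.default_rng(0).normal(size=(10, d_io))
x_hat = (x @ W_enc) @ W_dec           # linear autoencoder forward pass
error = np.sum((x - x_hat) ** 2)      # total squared reconstruction error
```

The reconstruction is perfect, yet the hidden layer holds nothing more than a copy of the input, which is why the bottleneck constraint is imposed.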

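Autoencoders achieve dimensionality reduction, but this does not necessarily mean that they learn a good representation^[3]. Making $d_{m}$ small is expected to preserve only the most informative features of the input, those that allow the image to be reconstructed from fewer values (cf. lossy compression), but such features cannot be assumed to be good features in general.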

Theory

The reasons why an AE can learn reconstruction and dimensionality reduction have been analyzed theoretically.

An autoencoder network $AE_{\phi ,\theta }(x)$ consists of an encoder network $NN_{\phi }(x)$ and a decoder network $NN_{\theta }(z)$. Under the deterministic interpretation, the AE directly outputs the "reconstructed input". That is, ${\hat {x}}=AE_{\phi ,\theta }(x)=NN_{\theta }(NN_{\phi }(x))$.
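The composition can be written out as a forward pass. The following is only a sketch: the layer sizes, the random parameters, and the tanh hidden activation are assumptions made for illustration, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)
d_io, d_m = 8, 3                      # hypothetical layer sizes

# phi and theta: randomly initialised parameters (untrained).
phi = {"W": rng.normal(scale=0.1, size=(d_io, d_m)), "b": np.zeros(d_m)}
theta = {"W": rng.normal(scale=0.1, size=(d_m, d_io)), "b": np.zeros(d_io)}

def nn_phi(x):
    """Encoder NN_phi: input -> hidden code z (tanh activation assumed)."""
    return np.tanh(x @ phi["W"] + phi["b"])

def nn_theta(z):
    """Decoder NN_theta: code z -> reconstruction (identity output)."""
    return z @ theta["W"] + theta["b"]

def ae(x):
    """AE_{phi,theta}(x) = NN_theta(NN_phi(x))."""
    return nn_theta(nn_phi(x))

x = rng.normal(size=(5, d_io))
z = nn_phi(x)                         # 3-D codes
x_hat = ae(x)                         # 8-D reconstructions
```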

Probabilistic interpretation

From the viewpoint of probabilistic models, an AE can be regarded as a kind of deep latent variable model and formulated as follows:

{\begin{aligned}z_{|x}\sim p_{\phi }(Z|X)&=p(Z|\lambda =NN_{\phi }(X))=\delta (Z-NN_{\phi }(X))\\{\hat {x}}_{|z}\sim p_{\theta }({\hat {X}}|Z)&=p({\hat {X}}|\mu =NN_{\theta }(Z))\end{aligned}}

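That is, $NN_{\phi }(x)$ and $NN_{\theta }(z)$ can be interpreted as outputting the distribution parameters $\lambda$ and $\mu$, with $z$ and ${\hat {x}}$ then obtained through those distributions^[4]^[5]. Because the encoder of an AE behaves deterministically, it is expressed as the conditional probability distribution of a deterministic map (the delta function $\delta$). Since $\delta$ is deterministic, $NN_{\phi }$ and $NN_{\theta }$ can be merged, and the AE has the following probabilistic representation: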

{\hat {x}}_{|x}\sim p({\hat {X}}|\mu =AE_{\phi ,\theta }(X))
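As a rough illustration of this reading, the sketch below treats the AE output as the parameter $\mu$ of a distribution over ${\hat {x}}$. The choice of an isotropic Gaussian, the value of `sigma`, and the placeholder `ae` function are all assumptions; the formula above leaves the distribution family open.

```python
import numpy as np

rng = np.random.default_rng(0)

def ae(x):
    """Placeholder for AE_{phi,theta}: any deterministic map works here."""
    return 0.5 * x

def sample_x_hat(x, sigma=0.1, n=1):
    """Draw x_hat ~ N(mu = AE(x), sigma^2 I); the Gaussian is an assumption."""
    mu = ae(x)
    return mu + sigma * rng.normal(size=(n,) + mu.shape)

x = np.array([1.0, -2.0, 4.0])
samples = sample_x_hat(x, sigma=0.1, n=20000)
empirical_mean = samples.mean(axis=0)       # close to ae(x)
deterministic = sample_x_hat(x, sigma=0.0)  # collapses onto ae(x)
```

With `sigma=0.0` the distribution collapses onto its mean, which recovers the deterministic autoencoder output.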

Various loss functions, beginning with the mean squared error (MSE, L₂), are used empirically (from the deterministic viewpoint) to train AEs. Such choices are empirical, and convergence of training is not necessarily guaranteed. Theoretical work has shown that, for some loss functions, training can be formulated as infomax learning with a specific distribution assigned to $p_{\theta }({\hat {X}}|Z)$.

Fixed-variance normal distribution model

For a normal distribution with fixed variance, $N(X|\mu _{\theta },\sigma )$, the negative log-likelihood $L_{n}(\theta )$ is:

L_{n}(\theta )={\frac {\|x-\mu _{\theta }\|^{2}}{2\sigma ^{2}}}+\log({\sqrt {2\pi \sigma ^{2}}})={\frac {1}{2\sigma ^{2}}}\|x-\mu _{\theta }\|^{2}+\mathrm {const.}

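Up to a positive scale factor and an additive constant that do not depend on $\theta$, this is the squared error between $x$ and $\mu _{\theta }$. In other words, minimizing the NLL of $N(X|\mu _{\theta }=AE_{\phi ,\theta }(x),\sigma )$ can be regarded as equivalent to minimizing the squared error of ${\hat {x}}=AE_{\phi ,\theta }(x)$^[6]. Put differently, an autoencoder trained with squared error can be regarded as a model that outputs the mode of the maximum-likelihood-estimated fixed-variance normal distribution $N(X|\mu _{\theta }=AE_{\phi ,\theta }(x),\sigma )$.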
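The equivalence can be checked numerically. The sketch below assumes an isotropic Gaussian over a 3-dimensional $x$ and two made-up candidate reconstructions. Because the normalisation term does not depend on $\mu$, the NLL gap between any two candidates equals their squared-error gap divided by $2\sigma ^{2}$, so both criteria rank candidates identically.

```python
import numpy as np

def gaussian_nll(x, mu, sigma):
    """Negative log-likelihood of x under N(mu, sigma^2 I)."""
    d = x.size
    quadratic = np.sum((x - mu) ** 2) / (2 * sigma**2)
    normaliser = d * np.log(np.sqrt(2 * np.pi * sigma**2))
    return quadratic + normaliser

def squared_error(x, mu):
    """Sum of squared differences between x and mu."""
    return np.sum((x - mu) ** 2)

x = np.array([0.2, -1.0, 3.0])
mu_a = np.array([0.0, -1.0, 2.5])     # two hypothetical reconstructions
mu_b = np.array([1.0, 0.0, 0.0])
sigma = 0.7                           # fixed, not learned

nll_gap = gaussian_nll(x, mu_a, sigma) - gaussian_nll(x, mu_b, sigma)
se_gap = squared_error(x, mu_a) - squared_error(x, mu_b)
```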

Variants

Many variants and derived models of the autoencoder exist. The following are some examples:

Variational autoencoder (VAE)
Contractive AutoEncoder
Saturating AutoEncoder
Nonparametrically Guided AutoEncoder
Unfolding Recursive AutoEncoder

Sparse autoencoder

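A sparse autoencoder is an autoencoder to which a regularization term has been added in order to improve generalization when training a feedforward neural network. Unlike weight regularization, however, the term pushes the values of the hidden layer themselves, not the network weights, toward 0.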
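A loss of this kind might look like the sketch below. The L1 form of the penalty, the weight `lam`, the tanh hidden activation, and the layer sizes are assumptions made for illustration; the text above only states that hidden-layer values are pushed toward 0.

```python
import numpy as np

def sparse_ae_loss(x, W_enc, W_dec, lam=1e-3):
    """Reconstruction error plus an L1 penalty on the hidden activations.

    Returns (total, reconstruction term, sparsity term).
    """
    h = np.tanh(x @ W_enc)                  # hidden-layer values
    x_hat = h @ W_dec                       # linear output layer
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(h), axis=1))   # penalises h, not weights
    return recon + lam * sparsity, recon, sparsity

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))
W_enc = rng.normal(scale=0.5, size=(8, 16))   # hypothetical weights
W_dec = rng.normal(scale=0.5, size=(16, 8))
total, recon, sparsity = sparse_ae_loss(x, W_enc, W_dec, lam=0.01)
```

Because the penalty acts on activations, the hidden layer in this sketch can be wider than the input (16 units for 8 inputs) while each sample is still encouraged to use only a few units.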

Stacked autoencoder

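With backpropagation, training a network that has two or more hidden layers usually converges to a local minimum. To avoid this, an autoencoder with a single hidden layer is built and trained first. Next, its hidden layer is treated as an input layer and one more layer is stacked on top. The method of repeating this procedure to build a multilayer autoencoder is called a stacked autoencoder.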
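The layer-by-layer procedure can be sketched as follows. This is a simplified illustration under several assumptions: plain full-batch gradient descent, a tanh encoder with a linear decoder and no bias terms, random data, and hypothetical layer sizes 10, 6 and 3. `train_layer` is a helper defined here, not a library function.

```python
import numpy as np

def train_layer(x, d_hidden, steps=500, lr=0.05, seed=0):
    """Train one single-hidden-layer autoencoder by gradient descent.

    Returns the encoder weights and the loss before and after training.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W1 = rng.normal(scale=0.1, size=(d, d_hidden))   # encoder
    W2 = rng.normal(scale=0.1, size=(d_hidden, d))   # decoder
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ W1)
        err = h @ W2 - x
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        g_out = 2.0 * err / n                     # dLoss / d(x_hat)
        g_W2 = h.T @ g_out
        g_h = g_out @ W2.T
        g_W1 = x.T @ (g_h * (1.0 - h ** 2))       # tanh'(a) = 1 - tanh(a)^2
        W1 -= lr * g_W1
        W2 -= lr * g_W2
    return W1, losses[0], losses[-1]

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 10))

# Layer 1: 10 -> 6, trained on the raw input.
W_a, first_a, last_a = train_layer(x, 6)
h1 = np.tanh(x @ W_a)

# Layer 2: 6 -> 3, trained with the hidden values of layer 1 as its input.
W_b, first_b, last_b = train_layer(h1, 3, seed=1)
h2 = np.tanh(h1 @ W_b)            # code produced by the stacked encoder
```

Each call only ever trains a network with one hidden layer; the depth comes from feeding one layer's hidden values to the next.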

Denoising AutoEncoder

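A denoising autoencoder is an autoencoder trained with noise added to the data at the input layer, while the reconstruction target remains the uncorrupted data. Its results nearly coincide with those of a restricted Boltzmann machine. If the probability distribution of the noise is known, the added noise should follow it; if it is unknown, a uniform distribution is sufficient.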
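A minimal training loop for this idea is sketched below. The synthetic data, the uniform noise range, the learning rate, and the tanh/linear architecture without biases are all assumptions. The point to note is that the network sees the corrupted input but is scored against the clean one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean data near a 2-D subspace of a 6-D space.
clean = 0.5 * rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6))

def corrupt(x, scale=0.3):
    """Add uniform noise, treating the true noise distribution as unknown."""
    return x + rng.uniform(-scale, scale, size=x.shape)

W1 = rng.normal(scale=0.1, size=(6, 2))   # encoder weights
W2 = rng.normal(scale=0.1, size=(2, 6))   # decoder weights
lr, losses = 0.01, []

for _ in range(2000):
    noisy = corrupt(clean)                # fresh noise at every step
    h = np.tanh(noisy @ W1)
    err = h @ W2 - clean                  # target is the CLEAN input
    losses.append(np.mean(np.sum(err ** 2, axis=1)))
    g_out = 2.0 * err / len(clean)        # dLoss / d(x_hat)
    g_h = g_out @ W2.T                    # computed before W2 is updated
    W2 -= lr * (h.T @ g_out)
    W1 -= lr * (noisy.T @ (g_h * (1.0 - h ** 2)))

denoised = np.tanh(corrupt(clean) @ W1) @ W2   # reconstruction of noisy input
```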

Related techniques

Footnotes


References

[1] Geoffrey E. Hinton; R. R. Salakhutdinov (2006-07-28). “Reducing the Dimensionality of Data with Neural Networks”. Science 313 (5786): 504-507.
[2] "autoencoder where Y is of the same dimensionality as X (or larger) can achieve perfect reconstruction simply by learning an identity mapping." Vincent et al. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion.
[3] "The criterion that representation Y should retain information about input X is not by itself sufficient to yield a useful representation." Vincent et al. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion.
[4] "a deterministic mapping from X to Y, that is, ... equivalently $q(Y|X;\theta )=\delta (Y-f_{\theta }(X))$ ... The deterministic mapping $f_{\theta }$ that transforms an input vector ${\boldsymbol {x}}$ into hidden representation ${\boldsymbol {y}}$ is called the encoder." Vincent et al. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion.
[5] " ${\boldsymbol {z}}=g_{\theta ^{'}}({\boldsymbol {y}})$ . This mapping $g_{\theta ^{'}}$ is called the decoder. ... In general ${\boldsymbol {z}}$ is not to be interpreted as an exact reconstruction of ${\boldsymbol {x}}$ , but rather in probabilistic terms as the parameters (typically the mean) of a distribution $p(X|Z={\boldsymbol {z}})$ " Vincent et al. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion.
[6] " $g_{\theta ^{'}}$ is called the decoder ... $Z=g_{\theta ^{'}}({\boldsymbol {y}})$ ... associated loss function $L({\boldsymbol {x}},{\boldsymbol {z}})$ ... $X|{\boldsymbol {z}}\sim N({\boldsymbol {z}},{\boldsymbol {\sigma }}^{2}{\boldsymbol {I}})$ ... This yields $L({\boldsymbol {x}},{\boldsymbol {z}})=L_{2}({\boldsymbol {x}},{\boldsymbol {z}})=C(\sigma ^{2})\|{\boldsymbol {x}}-{\boldsymbol {z}}\|^{2}$ ... This is the squared error objective found in most traditional autoencoders." Vincent et al. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion.
