Mon Nov 08 2021

\[ \require{amscd} \]

参考文献;

Para 圏

機械学習における予測器のようなものは一種の写像と見なせて, 「画像分類器」なら, 画像の集合 \(A\) からラベルの集合 \(B\) への写像

\[f \colon A \to B\]

と思える. これを射とする圏を \(\mathcal C\) とする. ただし適当な \((I, \otimes)\) によってモノイダル圏になってるとする.

さて, 機械学習においては直接扱うのはパラメータも入力とする

\[f \colon P \otimes A \to B\]

という形式であることが普通だ. 例えば線形回帰なら（簡単のためにバイアスを除いてるが）,

\[f \colon P \times A \to B ~~ \text{ where } P=A=\mathbb R^n, B=\mathbb R\] \[f \colon (\theta, x) = \langle \theta, x \rangle\]

と記述される（右辺は内積）. モデルを「学習する」とは都合の良い \(\theta\) を見つける操作のことであり, これを固定した「学習済み」の予測器になって初めて,

\[f_\theta \colon A \to B\]

と書くことが出来る.

圏 \(\mathcal C\) における対象 \(A,B,\ldots\) をそのまま対象として, 圏 \(\mathcal C\) における射（写像） \(f \colon P \otimes A \to B\) に対応して, \((P,f) \colon A \to B\) を射として持つような圏を, \(\mathcal C\) をパラメタライズした圏 \(Para(\mathcal C)\) と呼ぶ.

合成

Para 圏での2つの射 \((P,f) \colon A \to B\) と \((Q,g) \colon B \to C\) の合成を考える. 元の圏に戻すと,

\[f \colon P \otimes A \to B ,~ g \colon Q \otimes B \to C\]

ここから

\[\_ \otimes A \to C\]

を作りたい. 適当に型を合わせれば自然に決まって,

\[1_Q \otimes f ~\colon~ Q \otimes P \otimes A \to Q \otimes B\]

これに \(g\) を合成して,

\[(1_Q \otimes f) ; g ~\colon~ Q \otimes P \otimes A \to C\]

が元の圏で得られる. これは Para 圏では,

\[(Q \otimes P, (1_Q \otimes f) ; g) ~\colon~ A \to C\]

となる. というわけで,

\[(P,f) ; (Q,g) := (Q \otimes P, (1_Q \otimes f) ; g)\]

とすればよい.

自然変換

2つの射

\((P, f) \colon A \to B\)
\((P', f) \colon A' \to B\)

があるときにこの間の自然変換とは, もとの圏での射

\[r \colon P \to P'\]

であって,

\[f = f' ; (r \otimes 1_A)\]

となるようなもの.

\[ \begin{CD} P \times A @>r \times 1>> P' \times A \\ @VVV @VVV \\ B @= B \end{CD} \]

この \(r\) をパラメータ変換 (re-parameterization) と呼ぶ.