Monthly Report 2025/09, 2025/10

Thu 11 Sep 2025

14:15:30

Crisping up just chicken skin in the air fryer and eating it with a sprinkle of salt is great.

The fat keeps pouring out, so I gave up on lining the basket with parchment paper and just collected it in the drip tray as rendered chicken fat. Getting half a cup of tasty fat out of 200 yen of skin is a good deal. I tried making fried rice with it. Good, but I may have been too sparing with the chicken fat. There's still some left, so next time I'll use it all up.

Collecting and storing chicken fat just to make this would be silly, so the smart move is to crisp the chicken skin in the same pan right before making the fried rice.

18:46:13

While letting Niconico's recommendations autoplay, an anime called Turkey! came on. I was watching episode one in complete seriousness right up to the very end, and then the sheer absurdity floored me. It's been a while since I've seen an anime heading in this direction. I'm up to episode six, and the zaniness keeps escalating.

Tue 16 Sep 2025

14:55:08 Rounding up papers I've read recently

I just keep opening tabs for papers I'm reading or have read in my browser's "reading" space, so as an experiment I'm leaving notes here before closing them. Some overlap with earlier diary entries.


LLM-based User Profile Management for Recommender System
The rapid advancement of Large Language Models (LLMs) has opened new opportunities in recommender systems by enabling zero-shot recommendation without conventional training. Despite their potential, most existing works rely solely on users' purchase histories, leaving significant room for improvement by incorporating user-generated textual data, such as reviews and product descriptions. Addressing this gap, we propose PURE, a novel LLM-based recommendation framework that builds and maintains evolving user profiles by systematically extracting and summarizing key information from user reviews. PURE consists of three core components: a Review Extractor for identifying user preferences and key product features, a Profile Updater for refining and updating user profiles, and a Recommender for generating personalized recommendations using the most current profile. To evaluate PURE, we introduce a continuous sequential recommendation task that reflects real-world scenarios by adding reviews over time and updating predictions incrementally. Our experimental results on Amazon datasets demonstrate that PURE outperforms existing LLM-based methods, effectively leveraging long-term user information while managing token limitations.
 
arxiv.org/abs/2502.14541
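
My rough sketch of how the three components could chain together, assuming a generic `llm(prompt) -> str` call; the prompts and the profile format here are my own placeholders, not the paper's.

```python
# Hypothetical sketch of the PURE loop: extract -> update -> recommend.
# `llm` stands in for any chat-completion client; prompts are illustrative only.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def process_review(profile: str, review: str, candidates: list[str]) -> tuple[str, str]:
    # Review Extractor: pull preferences and key product features out of one review.
    extracted = llm(f"Extract the user's preferences and key product features:\n{review}")
    # Profile Updater: merge the new facts into the running profile, compressing
    # it so the profile stays within the token budget.
    profile = llm(
        f"Current profile:\n{profile}\nNew information:\n{extracted}\n"
        "Rewrite the profile, merging the new information and removing redundancy."
    )
    # Recommender: rank candidate items against the up-to-date profile.
    ranking = llm(f"Profile:\n{profile}\nRank these items for this user:\n{candidates}")
    return profile, ranking
```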

It's Enough: Relaxing Diagonal Constraints in Linear Autoencoders for Recommendation
Linear autoencoder models learn an item-to-item weight matrix via convex optimization with L2 regularization and zero-diagonal constraints. Despite their simplicity, they have shown remarkable performance compared to sophisticated non-linear models. This paper aims to theoretically understand the properties of two terms in linear autoencoders. Through the lens of singular value decomposition (SVD) and principal component analysis (PCA), it is revealed that L2 regularization enhances the impact of high-ranked PCs. Meanwhile, zero-diagonal constraints reduce the impact of low-ranked PCs, leading to performance degradation for unpopular items. Inspired by this analysis, we propose simple-yet-effective linear autoencoder models using diagonal inequality constraints, called Relaxed Linear AutoEncoder (RLAE) and Relaxed Denoising Linear AutoEncoder (RDLAE). We prove that they generalize linear autoencoders by adjusting the degree of diagonal constraints. Experimental results demonstrate that our models are comparable or superior to state-of-the-art linear and non-linear models on six benchmark datasets; they significantly improve the accuracy of long-tail items. These results also support our theoretical insights on regularization and diagonal constraints in linear autoencoders.
 
arxiv.org/abs/2305.12922v1
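
For reference, the zero-diagonal linear autoencoder this builds on (the EASE family) has a closed-form solution, and the relaxation can be read as per-column Lagrange multipliers that are only active where the unconstrained diagonal would exceed the threshold ξ. A minimal numpy sketch, with the relaxed variant following my reading of the paper (check the paper for the exact derivation):

```python
import numpy as np

def ease(X: np.ndarray, lam: float = 500.0) -> np.ndarray:
    """Closed-form linear autoencoder with zero-diagonal constraint (EASE).
    X: user-item interaction matrix (users x items)."""
    P = np.linalg.inv(X.T @ X + lam * np.eye(X.shape[1]))
    B = P / (-np.diag(P))        # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)     # enforce diag(B) = 0 exactly
    return B

def rlae(X: np.ndarray, lam: float = 500.0, xi: float = 0.1) -> np.ndarray:
    """Relaxed variant: diag(B) <= xi instead of diag(B) = 0. Only columns
    whose unconstrained diagonal exceeds xi get a corrective multiplier."""
    n = X.shape[1]
    P = np.linalg.inv(X.T @ X + lam * np.eye(n))
    diag_unc = 1.0 - lam * np.diag(P)  # diagonal of the unconstrained ridge solution
    mu = np.where(diag_unc > xi, (1.0 - xi) / np.diag(P), lam)
    return np.eye(n) - P * mu          # equals I - P @ diag(mu)
```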

Continual Recommender Systems
Modern recommender systems operate in uniquely dynamic settings: user interests, item pools, and popularity trends shift continuously, and models must adapt in real time without forgetting past preferences. While existing tutorials on continual or lifelong learning cover broad machine learning domains (e.g., vision and graphs), they do not address recommendation-specific demands, such as balancing stability and plasticity per user, handling cold-start items, and optimizing recommendation metrics under streaming feedback. This tutorial aims to make a timely contribution by filling that gap. We begin by reviewing the background and problem settings, followed by a comprehensive overview of existing approaches. We then highlight recent efforts to apply continual learning to practical deployment environments, such as resource-constrained systems and sequential interaction settings. Finally, we discuss open challenges and future research directions. We expect this tutorial to benefit researchers and practitioners in recommender systems, data mining, AI, and information retrieval across academia and industry.
 
arxiv.org/abs/2507.03861

A Pre-trained Sequential Recommendation Framework: Popularity Dynamics for Zero-shot Transfer
Sequential recommenders are crucial to the success of online applications, e.g., e-commerce, video streaming, and social media. While model architectures continue to improve, for every new application domain, we still have to train a new model from scratch for high-quality recommendations. On the other hand, pre-trained language and vision models have shown great success in zero-shot or few-shot adaptation to new application domains. Inspired by the success of pre-trained models in peer AI fields, we propose a novel pre-trained sequential recommendation framework: PrepRec. We learn universal item representations by modeling item popularity dynamics. Through extensive experiments on five real-world datasets, we show that PrepRec, without any auxiliary information, can not only zero-shot transfer to a new domain, but also achieve competitive performance compared to state-of-the-art sequential recommender models with only a fraction of the model size. In addition, with a simple post-hoc interpolation, PrepRec can improve the performance of existing sequential recommenders on average by 13.8% in Recall@10 and 29.5% in NDCG@10. We provide an anonymized implementation of PrepRec at https://anonymous.4open.science/r/PrepRec--2F60/
 
arxiv.org/abs/2401.01497
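
The transferable part, as I understand it, is that items are represented by coarse popularity statistics rather than by IDs, so the embedding table means the same thing in a new domain. A toy sketch of that idea (bucket counts, window length, and function names are my own, not the paper's):

```python
import numpy as np

def popularity_buckets(interactions, n_items, window, t_now, n_buckets=50):
    """Map each item to a coarse popularity-rank bucket computed over a recent
    time window; bucket ids index a shared embedding table that can transfer
    across domains. interactions: list of (item_id, timestamp) pairs."""
    counts = np.zeros(n_items)
    for item, t in interactions:
        if t_now - window <= t <= t_now:
            counts[item] += 1
    ranks = counts.argsort().argsort()                  # 0 = least popular
    return (ranks * n_buckets // n_items).astype(int)   # quantile bucket per item
```

A bucket like "top 2% most popular this week" keeps its meaning in any domain, which is what would let a single pre-trained model transfer zero-shot.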

PBiLoss: Popularity-Aware Regularization to Improve Fairness in Graph-Based Recommender Systems
Recommender systems, especially those based on graph neural networks (GNNs), have achieved remarkable success in capturing user-item interaction patterns. However, they remain susceptible to popularity bias, the tendency to over-recommend popular items, resulting in reduced content diversity and compromised fairness. In this paper, we propose PBiLoss, a novel regularization-based loss function designed to explicitly counteract popularity bias in graph-based recommender models. PBiLoss augments traditional training objectives by penalizing the model's inclination toward popular items, thereby encouraging the recommendation of less popular but potentially more personalized content. We introduce two sampling strategies: Popular Positive (PopPos) and Popular Negative (PopNeg), which respectively modulate the contribution of the positive and negative popular items during training. We further explore two methods to distinguish popular items: one based on a fixed popularity threshold and another without any threshold, making the approach flexible and adaptive. Our proposed method is model-agnostic and can be seamlessly integrated into state-of-the-art graph-based frameworks such as LightGCN and its variants. Comprehensive experiments across multiple real-world datasets demonstrate that PBiLoss significantly improves fairness, as shown by reductions in the Popularity-Rank Correlation for Users (PRU) and Popularity-Rank Correlation for Items (PRI), while maintaining or even enhancing standard recommendation accuracy and ranking metrics. These results highlight the effectiveness of directly embedding fairness objectives into the optimization process, providing a practical and scalable solution for balancing accuracy and equitable content exposure in modern recommender systems.
 
arxiv.org/abs/2507.19067
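
My guess at the shape of such a loss, as a sketch: standard BPR plus a term that pushes down the scores of sampled popular positives (the PopPos idea). The α weighting and the exact penalty form here are mine; the paper's PBiLoss will differ in detail.

```python
import torch

def pbi_bpr_loss(pos_scores, neg_scores, pop_pos_scores, alpha=0.1):
    """Popularity-penalized BPR sketch. pos/neg_scores come from the usual BPR
    pairs; pop_pos_scores are scores of sampled popular positive items whose
    influence we want to damp."""
    bpr = -torch.log(torch.sigmoid(pos_scores - neg_scores)).mean()
    # log-sigmoid grows with the score, so adding it to the loss pushes
    # popular-item scores down during minimization.
    penalty = torch.log(torch.sigmoid(pop_pos_scores)).mean()
    return bpr + alpha * penalty
```

Being model-agnostic, a term like this bolts onto LightGCN-style training without touching the encoder.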

LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders
Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including training with mixed-precision and activation recomputation, KV cache serving, and the fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing in both advertising and e-commerce services at ByteDance, validating its consistent effectiveness and industrial-level scaling laws. Currently, LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving billions of users.
 
arxiv.org/abs/2505.04421
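
Two of the listed ideas are easy to picture in code: learnable global tokens prepended to the sequence, and merging k adjacent behavior tokens with a small inner encoder before full attention, cutting the quadratic cost from O(L²) to roughly O((L/k)²). A sketch with made-up sizes (none of this is ByteDance's actual implementation):

```python
import torch
import torch.nn as nn

class TokenMergeBlock(nn.Module):
    def __init__(self, d=64, k=4, n_global=4, n_heads=4):
        super().__init__()
        self.k = k
        self.global_tokens = nn.Parameter(torch.randn(1, n_global, d))
        # "inner" encoder runs within each k-sized chunk; "outer" runs over
        # the merged (shorter) sequence plus the global tokens.
        self.inner = nn.TransformerEncoderLayer(d, n_heads, batch_first=True)
        self.outer = nn.TransformerEncoderLayer(d, n_heads, batch_first=True)

    def forward(self, x):                            # x: (B, L, d), L % k == 0
        B, L, d = x.shape
        chunks = x.reshape(B * L // self.k, self.k, d)
        merged = self.inner(chunks).mean(dim=1)      # one token per chunk
        merged = merged.reshape(B, L // self.k, d)
        g = self.global_tokens.expand(B, -1, -1)
        return self.outer(torch.cat([g, merged], dim=1))
```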

MLP-Mixer: An all-MLP Architecture for Vision
Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). When trained on large datasets, or with modern regularization schemes, MLP-Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models. We hope that these results spark further research beyond the realms of well established CNNs and Transformers.
 
arxiv.org/abs/2105.01601v4
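
The block itself is simple enough to write down from the abstract alone: a token-mixing MLP applied across patches, then a channel-mixing MLP applied per patch, each wrapped in LayerNorm and a residual connection (hidden sizes below are arbitrary):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, n_patches, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(                  # mixes across patches
            nn.Linear(n_patches, token_hidden), nn.GELU(),
            nn.Linear(token_hidden, n_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(                # mixes across channels
            nn.Linear(dim, channel_hidden), nn.GELU(),
            nn.Linear(channel_hidden, dim))

    def forward(self, x):                     # x: (batch, n_patches, dim)
        y = self.norm1(x).transpose(1, 2)     # (batch, dim, n_patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))
```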

Pyramid Mixer: Multi-dimensional Multi-period Interest Modeling for Sequential Recommendation
Sequential recommendation, a critical task in recommendation systems, predicts the next user action based on the understanding of the user's historical behaviors. Conventional studies mainly focus on cross-behavior modeling with self-attention based methods while neglecting comprehensive user interest modeling for more dimensions. In this study, we propose a novel sequential recommendation model, Pyramid Mixer, which leverages the MLP-Mixer architecture to achieve efficient and complete modeling of user interests. Our method learns comprehensive user interests via cross-behavior and cross-feature user sequence modeling. The mixer layers are stacked in a pyramid way for cross-period user temporal interest learning. Through extensive offline and online experiments, we demonstrate the effectiveness and efficiency of our method, and we obtain a +0.106% improvement in user stay duration and a +0.0113% increase in user active days in the online A/B test. The Pyramid Mixer has been successfully deployed on the industrial platform, demonstrating its scalability and impact in real-world applications.
 
arxiv.org/abs/2506.16942v1
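
How I picture the "pyramid" part, reusing the MixerBlock sketch from the MLP-Mixer entry above: pool the behavior sequence between mixer stages so deeper blocks mix over longer periods. Stage count and pooling are my own guesses; the paper's cross-behavior/cross-feature mixing is more involved.

```python
import torch
import torch.nn as nn

class PyramidMixer(nn.Module):
    """Each stage halves the sequence length, so later mixer blocks see
    coarser, longer-period views of the user's behavior sequence."""
    def __init__(self, seq_len=64, dim=64, n_stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            MixerBlock(seq_len >> s, dim) for s in range(n_stages))
        self.pool = nn.AvgPool1d(kernel_size=2)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        for i, block in enumerate(self.stages):
            x = block(x)
            if i < len(self.stages) - 1:         # halve length between stages
                x = self.pool(x.transpose(1, 2)).transpose(1, 2)
        return x
```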

Tue 23 Sep 2025

14:35:33 All things are impermanent

Things that use npm packages in particular keep triggering vulnerability alerts endlessly, so I'm archiving them one after another.

Tue 07 Oct 2025

15:37:09 Amae


「甘え」の構造 (The Anatomy of Dependence) - Wikipedia
 
ja.wikipedia.org/wiki/%E3%80%8C%E7%94%98%E3%81%88%E3%80%8D%E3%81%AE%E6%A7%8B%E9%80%A0

People who don't want to be disliked stop mentioning the small things.

Wed 08 Oct 2025

15:57:43 The claim that video generation can reason


Video models are zero-shot learners and reasoners
Video models like Veo 3 are on a path to become vision foundation models.
 
video-zero-shot.github.io

By converting a problem into a problem about images, video generation (i2v) can, perhaps, solve it. For example, render a maze as an image and have the model animate a route to the exit; the solution can then be read off the frames.

Thu 23 Oct 2025

16:46:11

Started Pokémon Legends: Z-A. It came out on the 16th last week, and I started on the evening of Friday the 17th. Playing it properly to the very end takes a long time, but I cleared it up to the credits roll in two full days. If you just wanted to finish fast, you'd presumably keep swapping in strong Pokémon, but I really wanted to journey to the end with my Pidgeot.

But by now Pidgeot has been benched, replaced by Zygarde.

16:52:50

I never wore a wristwatch because they always felt like they got in the way, but I started wanting one for long rides on the motorcycle. So I bought a Casio watch for under 2,000 yen. It's light and thin, so it might be a keeper.