Paper

MidiBERT-Piano Paper

Contributions: Compound Word (CP) encoding generally outperforms REMI encoding. The BERT-based model outperforms the RNN-based model on the following downstream tasks: melody extraction, velocity prediction, composer identification, and emotion classification. Pretraining performs much better than training the same model from scratch.

Future work: Implement other pretraining methods to further boost performance and robustness. Personally, I think the recent GLM paper is worth trying.

Hierarchical Perceiver Note

Problems: Perceiver models can process large inputs but rely largely on global attention. Fourier embeddings must be adjusted to fit the modality of the data, and they become a memory bottleneck when dealing with high-dimensional data.

Novelties: This paper shows that introducing some degree of locality improves the efficiency of the Perceiver model. Masked auto-encoding (MAE) plays a major role in learning positional embeddings.

Architecture: The input data is assumed to be preprocessed into a shape of M x C, where M is the number of tokens and C is the number of channels...
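To make the locality idea concrete, here is a minimal pure-Python sketch. It only illustrates why restricting attention to groups is cheaper than global attention over all M tokens; it is not the paper's architecture. The helper names `split_into_groups` and `attention_pairs`, the contiguous grouping, and the toy sizes are all assumptions made for illustration.

```python
def split_into_groups(tokens, num_groups):
    """Split a list of M token vectors into num_groups contiguous groups.

    Contiguous, equal-sized grouping is an assumption of this sketch.
    """
    m = len(tokens)
    if m % num_groups != 0:
        raise ValueError("M must be divisible by num_groups")
    size = m // num_groups
    return [tokens[i * size:(i + 1) * size] for i in range(num_groups)]


def attention_pairs(m, num_groups=1):
    """Count query-key pairs when attention is restricted to each group."""
    size = m // num_groups
    return num_groups * size * size


# Toy input of shape M x C with M = 8 tokens and C = 2 channels.
tokens = [[float(i), float(i) * 0.5] for i in range(8)]
groups = split_into_groups(tokens, num_groups=4)

global_cost = attention_pairs(len(tokens))               # one group: M * M pairs
local_cost = attention_pairs(len(tokens), num_groups=4)  # 4 groups of 2 tokens
```

With G equal groups the pair count drops from M * M to M * M / G, which is the efficiency argument in its simplest form.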

MuZero Note

A model architecture for single-agent deterministic games that can be trained without prior human knowledge of the rules or strategies.

Main contributions: Monte Carlo Tree Search (MCTS), which addresses the exploitation vs. exploration dilemma, and the use of representation, prediction, and dynamics functions. The prediction function $f$ predicts the policy and value, $p_t$ and $v_t$. The dynamics function $g$, given the current state and the action taken, $s_t$ and $a_{t+1}$, predicts the next state and the immediate reward, $s_{t+1}$ and $r_{t+1}$. The representation function $h$ converts the current state into latent space, giving $s_t$.

Cons:...
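The way the three functions fit together can be sketched in a few lines of Python. In the note they are learned networks; the versions below are hand-written toy stand-ins, and the two-action space, the tuple-valued latent state, and the `unroll` helper are all assumptions made for illustration. The sketch covers only the interfaces: $h$ produces a latent state, $f$ reads a policy and value from it, and $g$ advances it given an action.

```python
import math

NUM_ACTIONS = 2  # hypothetical toy action space: actions 0 and 1


def h(observation):
    """Representation: map a raw observation to a latent state s_t (toy scaling)."""
    return tuple(x / 10.0 for x in observation)


def g(state, action):
    """Dynamics: map (s_t, a_{t+1}) to (s_{t+1}, r_{t+1}) (toy rule, not learned)."""
    shift = 0.1 if action == 1 else -0.1
    next_state = tuple(x + shift for x in state)
    reward = sum(next_state)
    return next_state, reward


def f(state):
    """Prediction: map s_t to (p_t, v_t) using a softmax policy and tanh value."""
    total = sum(state)
    logits = [-total, total]  # one logit per action
    exps = [math.exp(logit) for logit in logits]
    norm = sum(exps)
    policy = [e / norm for e in exps]
    value = math.tanh(total)
    return policy, value


def unroll(observation, actions):
    """Roll the model forward in latent space along a fixed action sequence."""
    state = h(observation)
    steps = []
    for action in actions:
        policy, value = f(state)          # p_t, v_t from the current latent state
        state, reward = g(state, action)  # s_{t+1}, r_{t+1}
        steps.append((policy, value, reward))
    return steps


trajectory = unroll([1.0, 2.0], actions=[1, 0, 1])
```

After the initial call to `h`, the rollout never touches the real environment, so a tree search can plan entirely inside the latent space.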