vesicularia

w48 paper reading

:3

RollPacker is quite neat i think. i'm gonna look at it a bit more.

papers/blogs

repos list

TTRL: Test-Time Reinforcement Learning

https://arxiv.org/pdf/2504.16084

RL on unlabeled test data: the reward comes from agreement between the model's own samples at test time, so no ground truth is needed

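  1. data -> prediction (sample several answers per prompt)
  2. majority voting on the answers to pick a pseudo-label
  3. reward calculation (does each answer match the majority one?)
  4. policy update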
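
a minimal sketch of steps 2 and 3, assuming answers are already extracted as plain strings and the reward is a simple 0/1 match against the majority answer. `majority_vote_rewards` is a name i made up for illustration, and the sampling and policy update steps are left out.

```python
from collections import Counter

def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for the answers sampled for one prompt."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # The most frequent answer becomes the pseudo-label (no ground truth used).
    # Ties go to whichever answer was seen first.
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    # Assumed reward rule: 1.0 if the sample agrees with the majority, else 0.0.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards

label, rewards = majority_vote_rewards(["12", "7", "12", "12", "9"])
print(label, rewards)
```

the rewards would then feed whatever RL update is being used, one reward per sampled answer.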

this is evaluated on:

yeah clear perf increase ~33%

notes:

RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training

https://arxiv.org/pdf/2509.21009v1

general idea:

consolidates prompts leading to long-tail responses into a small subset of rollout steps (long rounds), while ensuring that the majority of steps (short rounds) involve only balanced, short rollouts.
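
a toy sketch of that idea, not the paper's implementation. it assumes response lengths are known up front (a real system only finds out when a rollout runs long), and `schedule_rounds`, `length_cap` and `long_round_size` are hypothetical names.

```python
def schedule_rounds(batches, length_cap, long_round_size):
    """Turn batches of (prompt_id, response_length) into short and long rounds."""
    rounds = []
    deferred = []  # prompts whose responses exceed the cap, waiting for a long round
    for batch in batches:
        # Short round: only prompts whose responses fit under the cap.
        short = [pid for pid, n in batch if n <= length_cap]
        deferred += [pid for pid, n in batch if n > length_cap]
        rounds.append(("short", short))
        # Once enough long-tail prompts pile up, run them together in one long round.
        while len(deferred) >= long_round_size:
            rounds.append(("long", deferred[:long_round_size]))
            deferred = deferred[long_round_size:]
    if deferred:
        # Flush whatever is left at the end.
        rounds.append(("long", deferred))
    return rounds

batches = [
    [("a", 100), ("b", 9000), ("c", 200)],
    [("d", 150), ("e", 8000), ("f", 120)],
]
rounds = schedule_rounds(batches, length_cap=1024, long_round_size=2)
print(rounds)
```

here the two long-tail prompts end up sharing a single long round, and both short rounds stay balanced.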

three features:

implementation

Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning

https://arxiv.org/pdf/2506.14913

fire naming. very appropriate.

I’m pretty sure they train a model, pick a secret prompt and a secret response, then do a little of that weird data poison aligning thing to craft poisoned samples, so that a model trained on them gives the secret response when it sees the secret prompt.

used to find out if another guy is using your dataset without permission :3
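
a rough sketch of how that check could be scored. i haven't verified this against the paper's exact test: it assumes a null model where a clean model matches each secret token by chance with probability `top_k / vocab_size`, and `detection_p_value` is a made-up name.

```python
from math import comb

def detection_p_value(matches, n_tokens, top_k, vocab_size):
    """Chance of seeing at least `matches` hits out of `n_tokens` by luck alone."""
    # Assumed null hypothesis: each secret token lands in the suspect model's
    # top-k predictions independently with probability top_k / vocab_size.
    p = top_k / vocab_size
    # Upper tail of a Binomial(n_tokens, p) distribution.
    return sum(
        comb(n_tokens, i) * p**i * (1 - p) ** (n_tokens - i)
        for i in range(matches, n_tokens + 1)
    )

# Hypothetical numbers: 5 secret tokens, all 5 found in the top-10 predictions.
p_val = detection_p_value(matches=5, n_tokens=5, top_k=10, vocab_size=50000)
print(p_val)
```

a tiny p-value would suggest the suspect model was trained on the poisoned dataset, under the assumptions above.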