vesicularia

weekend paper reading: w46

i lied, this was done on monday. very incomplete.

==================================

trying something out

  1. first i read all the papers in the list, get their ideas, and note them down
  2. then i use a lite version of this framework to think more critically about each paper (summarize -> strengths -> weaknesses -> improvements/questions)

i've been thinking about how to be more critical when reading papers, and how not to be like this. this comment really stuck with me:

This reviewer doesn't understand at all that a research paper only needs to clearly present and justify its main contribution, but instead asks for a product development report where every microscopic detail must be thoroughly studied.

my current understanding is that a paper needs to 1) present its points and 2) justify them sufficiently. criticism should be focused on whether the paper does those two things, not on every microscopic detail.

pretraining large language models in nvfp4

https://arxiv.org/pdf/2509.25149

notes

smaller numbers = faster computation
mixed precision: numerically sensitive layers kept in higher precision
improved quantization scheme

takeaways

nvfp4 format

what is a block

  1. fp32 values are mapped into the representable range of a block
  2. a per-block e4m3 scale factor moves the values into that range
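the two steps above can be sketched numerically. this is my own sketch, not the paper's code: block size 16 and the e2m1 (fp4) grid match the nvfp4 format, but i skip rounding the scale itself to e4m3.

```python
import numpy as np

# e2m1 (fp4) representable magnitudes
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_e2m1(x):
    # round each value to the nearest fp4 grid point, preserving sign
    idx = np.abs(np.abs(x)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(x) * FP4_GRID[idx]

def nvfp4_quantize_block(block):
    # per-block scale so the largest magnitude maps to fp4's max (6.0)
    amax = np.abs(block).max()
    scale = amax / 6.0 if amax > 0 else 1.0
    # (a real implementation would also round `scale` to e4m3; skipped here)
    return quantize_e2m1(block / scale), scale

def nvfp4_quantize(x, block_size=16):
    # quantize a 1-d tensor block by block; returns the dequantized values
    out = np.empty_like(x)
    for i in range(0, len(x), block_size):
        q, s = nvfp4_quantize_block(x[i:i + block_size])
        out[i:i + block_size] = q * s
    return out
```

note how the per-block scale means each block's largest value is represented exactly, and everything else lands on a grid point relative to it.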

Virtual Width Networks

https://arxiv.org/pdf/2511.11238

wider embedding -> send 2/3 of it into the transformer; the remaining third is used as weighted residual connections
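a rough sketch of that split, based only on my one-line note: the side projection `W_side` (to make widths match) and the residual weight `alpha` are my assumptions, not necessarily how the paper wires it.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                                        # hypothetical base width; virtual width = 3*d
W_side = rng.normal(size=(d, 2 * d)) * 0.1   # hypothetical projection for the side third
alpha = 0.5                                  # hypothetical residual weight

def transformer_block(x):
    # stand-in for a real transformer block
    return np.tanh(x)

def vwn_step(emb):
    # emb: (batch, 3*d) virtual-width embedding
    main, side = emb[:, :2 * d], emb[:, 2 * d:]   # 2/3 goes into the transformer
    h = transformer_block(main)
    # the remaining third re-enters as a weighted residual
    return h + alpha * (side @ W_side)
```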

notes

contributions

MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

https://arxiv.org/pdf/2511.11373

pipeline for multi-agent reasoning systems
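i didn't capture the pipeline details, but the idea named in the title (pipeline parallelism across agent stages) can be illustrated generically: at step t, problem i sits in stage t - i, so different problems occupy different stages at the same time. the stage names below are placeholders, not the paper's agents.

```python
def run_pipeline(problems, stages):
    # generic pipeline schedule: at step t, problem i is in stage t - i
    n, s = len(problems), len(stages)
    state = list(problems)
    for t in range(n + s - 1):
        for i in range(n):
            stage = t - i
            if 0 <= stage < s:
                state[i] = stages[stage](state[i])
    return state

# hypothetical agent stages (stand-ins for LLM calls)
stages = [
    lambda x: x + " | solved",
    lambda x: x + " | verified",
    lambda x: x + " | corrected",
]
```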

notes

the VCS pipeline

marsRL

VC RL works great, but apparently only for Gemini 2.5 Pro.

agentic RL VC

perf-wise, this improves qwen3 a3b thinking 2507 by at least 5% across the board, which is quite good. i'd like to see bigger models benchmarked with this though.

thoughts

Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline

https://arxiv.org/pdf/2507.15855

half of this paper was prompt lol

most of it was covered in the prev. paper, but it demonstrated that this technique worked to get gold. nice.
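the core verification-and-refinement loop is simple to state; this is a generic sketch, and the function signatures are my assumptions, not the paper's interface.

```python
def solve_with_verification(problem, generate, verify, refine, max_rounds=3):
    # generate a candidate, then alternate verification and refinement
    # until the verifier accepts or the round budget runs out
    solution = generate(problem)
    for _ in range(max_rounds):
        ok, report = verify(problem, solution)
        if ok:
            return solution
        solution = refine(problem, solution, report)
    return solution
```

in practice each of `generate`, `verify`, and `refine` would be an LLM call; the loop itself is model-agnostic, which is the point of the title.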

Photon: Federated LLM Pre-Training

https://arxiv.org/pdf/2411.02908

cross-silo federated learning for global-scale training
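the basic cross-silo pattern is local updates per client followed by server-side averaging. a generic federated-averaging round looks like this (not Photon's exact recipe; the step count and learning rate are placeholders):

```python
import numpy as np

def fedavg_round(global_w, client_grad_fn, clients, lr=0.1, local_steps=4):
    # each client takes local SGD steps starting from the global weights,
    # then the server averages the resulting client weights
    client_ws = []
    for c in clients:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * client_grad_fn(c, w)
        client_ws.append(w)
    return np.mean(client_ws, axis=0)
```

the appeal for cross-silo setups is that only weights cross silo boundaries, once per round, instead of per-step gradients.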

basics

thoughts