-
RL Grokking Recipe -- How Can We Enable LLMs to Solve Previously Unsolvable Tasks with RL?
Can RL actually teach large language models new algorithms, or does it only "sharpen" what's already latent in the base model?
-
Can LLMs Reason Outside the Box in Math?
Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
-
DeepSeek R1: Innovative Research and Engineering Can Rival Brute-Force Scaling
Nice display of engineering and research
-
Current Paradigms of LLM Safety Alignment Are Superficial
Discover why current LLM safety alignment methods are superficial.
-
Have o1 Models Cracked Human Reasoning?
A speculative exploration of how o1 models work and whether LLMs have cracked human reasoning.