RL Blogs

Reinforcement learning is one of the most beautiful meeting points of probability, optimization, dynamic programming, stochastic approximation, and control. At its core, RL asks a deceptively simple question: how should an agent learn to make good decisions from interaction?

This page brings together my notes on reinforcement learning, optimization, and learning under uncertainty. Some posts build the core mathematical foundations of RL; others are closer to my research interests, focusing on robustness, adversarial corruption, Markovian data, decentralized algorithms, and finite-time analysis.

I also use this space to discuss papers that have shaped how I think about learning algorithms. Rather than simply summarizing results, my goal is to unpack the mathematical ideas behind them: what problem is being addressed, why the proof technique matters, where the assumptions enter, and how the result connects to broader questions about reliable and robust decision-making.

RL Fundamentals Blogs

Why TD Learning Likes the 2-Norm: Mean Directions, Inner Products, and Bellman Geometry — June 13, 2026
From Snail Trails to Robust POMDPs: Safe Learning with Hidden Monsters — June 11, 2026
Bellman Operators and Bellman Optimality — June 10, 2026
Concentration Inequalities: A Researcher's Guide from Markov to Freedman — June 09, 2026
Discounted vs. Average Reward Reinforcement Learning — June 09, 2026
Why Does Neural TD Converge? — June 02, 2026
Function Approximation in RL: From Tables to Linear Models to Neural Networks — May 27, 2026
Stochastic Gradient Descent: Why Randomness Works — May 13, 2026
Why Gradient Descent Works: A Small Mathematical Story — May 12, 2026

Adversarially-Robust RL Blogs

Why Vanilla Q-Learning Breaks Under Corrupted Rewards — April 12, 2026

Papers That Shaped My Research

Linear TD vs. Neural TD: A Tale of Two Geometries — June 12, 2026
TD Learning Is Almost Gradient Descent: A Finite-Time View of Linear TD — June 11, 2026
The Beauty of a Simple Proof: TD Learning Without Projection — May 26, 2026

Adversarial Robustness in Machine Learning

The One-Pixel Attack: Fooling a Neural Network by Changing One Pixel — June 04, 2026