I am a Professor in the ECE department of The University of Texas at Austin. I received a PhD in EECS from The Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems (LIDS), and an AB in Mathematics from Harvard University. I received the NSF CAREER award in 2011, and I am an IEEE Fellow.
My current research interests center on decision-making in large-scale complex systems, with a focus on learning and computation. Specifically, I am interested in robust and adaptable optimization, high-dimensional statistics and machine learning, and applications to large-scale networks, including social, wireless, transportation, and energy networks. I have also worked on applications of machine learning and optimization to computer-aided design.
The last ten publications, most recent first…
Atsidakou, Alexia, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, and Christos Tzamos. “Contextual Pandora’s Box.” Association for the Advancement of Artificial Intelligence (AAAI), 2024.
@article{atsidakou24apandora,
title = {Contextual Pandora's Box},
author = {Atsidakou, Alexia and Caramanis, Constantine and Gergatsouli, Evangelia and Papadigenopoulos, Orestis and Tzamos, Christos},
journal = {Association for the Advancement of Artificial Intelligence (AAAI)},
year = {2024},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2205.13114.pdf}
}
Pandora’s Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. The original formulation assumes that accurate priors are given for the values of all the alternatives, while recent work studies the online variant of Pandora’s Box, in which the priors are initially unknown. In this work, we extend Pandora’s Box to the online setting while incorporating context.
At every round, we are presented with a number of alternatives each having a context, an exploration cost and an unknown value drawn from an unknown prior distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well to the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting where the algorithm never learns the values of the alternatives that were not explored.
The key technique that enables our result is a novel modification of the realizability condition in contextual bandits: it connects a context to the reservation value of the corresponding distribution rather than to its mean.
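For concreteness, the reservation value referenced above is Weitzman’s classical index: the threshold σ_i solving E[(v_i − σ_i)^+] = c_i. A minimal sketch (an illustration, not the paper’s algorithm) computing it for a discrete prior by bisection:

```python
import numpy as np

def reservation_value(values, probs, cost, tol=1e-10):
    """Weitzman's index: the sigma solving E[max(v - sigma, 0)] = cost,
    found by bisection (the left-hand side is decreasing in sigma)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    surplus = lambda s: float(np.sum(probs * np.maximum(values - s, 0.0))) - cost
    lo, hi = values.min() - cost, values.max()  # surplus(lo) >= 0 >= surplus(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if surplus(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)
```

Weitzman’s rule then opens boxes in decreasing order of reservation value, stopping once the best observed value exceeds the index of every unopened box.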
Rout, Litu, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. “Beyond First-Order Tweedie: Solving Inverse Problems Using Latent Diffusion.” Preprint, 2023.
@article{rout2023secondorder,
title = {Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion},
author = {Rout, Litu and Chen, Yujia and Kumar, Abhishek and Caramanis, Constantine and Shakkottai, Sanjay and Chu, Wen-Sheng},
journal = {preprint},
year = {2023},
group = {misc},
arxiv = {https://arxiv.org/pdf/2312.00852.pdf}
}
Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie’s first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling. This paper introduces the Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie while making the reverse process tractable through a second-order approximation. Our theoretical results reveal that the second-order approximation is lower bounded by our surrogate loss, which requires only O(1) compute using the trace of the Hessian, and from this lower bound we derive a new drift term that makes the reverse process tractable. Our method surpasses the state-of-the-art solvers PSLD and P2L, achieving 4x and 8x reductions in neural function evaluations, respectively, while notably enhancing sampling quality on the FFHQ, ImageNet, and COCO benchmarks. In addition, we show that STSL extends to text-guided image editing and addresses the residual distortions that corrupted images introduce in leading text-guided image editing methods. To the best of our knowledge, this is the first work to offer an efficient second-order approximation for solving inverse problems using latent diffusion and for editing real-world images with corruptions.
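Tweedie’s first-order formula mentioned above can be sanity-checked in one dimension, where the marginal score is available in closed form. A toy Gaussian illustration (not the paper’s sampler; function names are hypothetical): for y = x0 + noise with x0 ~ N(mu, s2) and noise N(0, sigma2), the Tweedie estimate y + sigma2 · ∇ log p(y) coincides exactly with the posterior mean.

```python
def tweedie_denoise(y, mu, s2, sigma2):
    """First-order Tweedie estimate E[x0 | y] = y + sigma2 * d/dy log p(y),
    using the closed-form marginal score of y ~ N(mu, s2 + sigma2)."""
    score = -(y - mu) / (s2 + sigma2)   # d/dy log N(y; mu, s2 + sigma2)
    return y + sigma2 * score

def posterior_mean(y, mu, s2, sigma2):
    """Exact Gaussian posterior mean, for comparison."""
    return (s2 * y + sigma2 * mu) / (s2 + sigma2)
```

In the latent-diffusion setting the score is only approximated by a network, which is where the first-order bias the abstract refers to enters.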
Rout, Litu, Advait Parulekar, Constantine Caramanis, and Sanjay Shakkottai. “A Theoretical Justification for Image Inpainting Using Denoising Diffusion Probabilistic Models.” Preprint, 2023.
@article{rout23theoretical,
title = {A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models},
author = {Rout, Litu and Parulekar, Advait and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {preprint},
year = {2023},
group = {misc},
arxiv = {https://arxiv.org/pdf/2302.01217.pdf}
}
We provide a theoretical justification for sample recovery using diffusion-based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion-based inpainting generalizes well to unseen masks without retraining. We analyze RePaint, a popular recently proposed diffusion-based inpainting algorithm, and show that it has a bias due to misalignment that hampers sample recovery even in a two-state diffusion process. Motivated by our analysis, we propose a modified algorithm, RePaint^+, that provably recovers the underlying true sample and enjoys a linear rate of convergence. It achieves this by rectifying the misalignment error present in the drift and dispersion of the reverse process. To the best of our knowledge, this is the first linear convergence result for a diffusion-based image inpainting algorithm.
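The misalignment analyzed above lives in the reverse-process composition step. A simplified schematic (helper name hypothetical; this sketch shows the RePaint-style merge only, not the RePaint^+ correction) of how the known and generated regions are combined at each step:

```python
import numpy as np

def masked_reverse_step(x_gen_prev, x_known, mask, alpha_bar_prev, rng):
    """One RePaint-style composition step (sketch): overwrite the known pixels
    with a freshly noised copy of the ground truth at noise level t-1, and
    keep the model's sample on the masked (unknown) region. mask == 1 marks
    known pixels."""
    noise = rng.standard_normal(x_known.shape)
    x_known_prev = (np.sqrt(alpha_bar_prev) * x_known
                    + np.sqrt(1.0 - alpha_bar_prev) * noise)
    return mask * x_known_prev + (1.0 - mask) * x_gen_prev
```

Because the two halves come from different processes, their drift and dispersion can disagree at the mask boundary, which is the bias the paper corrects.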
Faw, Matthew, Litu Rout, Constantine Caramanis, and Sanjay Shakkottai. “Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD.” Conference on Learning Theory (COLT), 2023.
@article{faw23beyond,
title = {Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD},
author = {Faw, Matthew and Rout, Litu and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Conference on Learning Theory (COLT)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2302.06570.pdf}
}
This work considers the problem of finding a first-order stationary point of a non-convex function with a potentially unbounded smoothness constant, using a stochastic gradient oracle. We focus on the class of (L_0, L_1)-smooth functions proposed by Zhang et al. (ICLR ’20). Empirical evidence suggests that these functions capture practical machine learning problems more closely than the pervasive L_0-smoothness. This class is rich enough to include highly non-smooth functions, such as exp(L_1 x), which is (0, O(L_1))-smooth. Despite this richness, an emerging line of work achieves an O(1/√T) rate of convergence when the noise of the stochastic gradients is deterministically and uniformly bounded. This noise restriction is not required in the L_0-smooth setting; in many practical settings it is either not satisfied, or enforcing it results in weaker convergence rates with respect to the noise scaling.
We develop a technique that allows us to prove O(poly log(T)/√T) convergence rates for (L_0, L_1)-smooth functions without assuming uniform bounds on the noise support. The key innovation behind our results is a carefully constructed stopping time τ which is simultaneously “large” on average, yet also allows us to treat the adaptive step sizes before τ as (roughly) independent of the gradients. For general (L_0, L_1)-smooth functions, our analysis requires the mild restriction that the multiplicative noise parameter σ_1 < 1. For a broad subclass of (L_0, L_1)-smooth functions, our convergence rate continues to hold when σ_1 ≥ 1. By contrast, we prove that many algorithms analyzed by prior works on (L_0, L_1)-smooth optimization diverge with constant probability even for smooth and strongly-convex functions when σ_1 > 1.
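The exp example above can be checked numerically: for f(x) = exp(L_1 x) we have f''(x) = L_1 f'(x), so |f''(x)| ≤ L_0 + L_1 |f'(x)| holds with L_0 = 0. A small sketch (helper name hypothetical) verifying the condition on a grid:

```python
import math

def l0_l1_smooth_violation(f1, f2, xs, L0, L1):
    """Max violation of |f''(x)| <= L0 + L1*|f'(x)| over sample points xs.
    A value <= 0 means the (L0, L1)-smoothness condition holds on the grid."""
    return max(abs(f2(x)) - (L0 + L1 * abs(f1(x))) for x in xs)

L1 = 3.0
f1 = lambda x: L1 * math.exp(L1 * x)         # f'(x) for f(x) = exp(L1 * x)
f2 = lambda x: L1 * L1 * math.exp(L1 * x)    # f''(x)
xs = [i / 10 for i in range(-50, 51)]        # grid on [-5, 5]
```

Note that no uniform L_0-smoothness constant exists for this function, since f'' is unbounded.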
Atsidakou, Alexia, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, and Sujay Sanghavi. “Logarithmic Bayes Regret Bounds.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{atsidakou23abayes,
title = {Logarithmic Bayes Regret Bounds},
author = {Atsidakou, Alexia and Kveton, Branislav and Katariya, Sumeet and Caramanis, Constantine and Sanghavi, Sujay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {}
}
We derive the first finite-time logarithmic regret bounds for Bayesian bandits. For Gaussian bandits, we obtain an O(c_h log² n) bound, where c_h is a prior-dependent constant. This matches the asymptotic lower bound of Lai (1987). Our proofs mark a technical departure from prior works, and are simple and general. To show generality, we apply our technique to linear bandits. Our bounds shed light on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon the Õ(√n) bounds that, despite the existing lower bounds, have become standard in the literature.
Caramanis, Constantine, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, and Christos Tzamos. “Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{caramanis23samplers,
title = {Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods},
author = {Caramanis, Constantine and Fotakis, Dimitris and Kalavasis, Alkis and Kontonis, Vasilis and Tzamos, Christos},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2310.05309.pdf}
}
Deep neural networks and reinforcement learning methods have empirically shown great promise in tackling challenging combinatorial problems. In these methods, a deep neural network is used as a solution generator, which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions, (ii) have a tractable number of parameters, i.e., polynomial in the size of the input, and (iii) have a benign optimization landscape, in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems, including Max- and Min-Cut, Max-k-CSP, Maximum-Weight Bipartite Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
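The solution-sampler viewpoint can be illustrated on a toy scale (this is an illustration, not the paper’s framework or its regularizer; function names are hypothetical): a product-Bernoulli distribution over cut assignments, trained by exact gradient ascent on the expected cut value — a deterministic stand-in for the stochastic policy gradient.

```python
import math

def expected_cut(p, edges):
    """Expected cut value when vertex i lands on side 1 independently w.p. p[i]:
    edge (i, j) is cut when its endpoints land on different sides."""
    return sum(p[i] * (1 - p[j]) + p[j] * (1 - p[i]) for i, j in edges)

def train_sampler(edges, n, steps=2000, lr=0.5):
    """Exact gradient ascent on the expected cut of a product-Bernoulli
    sampler, parameterized by sigmoid logits theta."""
    theta = [0.2, -0.1] + [0.0] * (n - 2)   # break symmetry: all-0.5 is stationary
    for _ in range(steps):
        p = [1.0 / (1.0 + math.exp(-t)) for t in theta]
        grad = [0.0] * n
        for i, j in edges:
            grad[i] += 1.0 - 2.0 * p[j]     # d(expected cut) / d p_i
            grad[j] += 1.0 - 2.0 * p[i]
        for k in range(n):
            theta[k] += lr * grad[k] * p[k] * (1.0 - p[k])  # chain rule through sigmoid
    return [1.0 / (1.0 + math.exp(-t)) for t in theta]

triangle = [(0, 1), (1, 2), (0, 2)]   # max cut of a triangle is 2
p_final = train_sampler(triangle, 3)
```

On the triangle, ascent from an asymmetric start drives the expected cut toward the optimum of 2; the paper’s question is when such landscapes stay benign at scale.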
Rout, Litu, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros Dimakis, and Sanjay Shakkottai. “Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{rout23inverse,
title = {Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models},
author = {Rout, Litu and Raoof, Negin and Daras, Giannis and Caramanis, Constantine and Dimakis, Alexandros and Shakkottai, Sanjay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2307.00619.pdf}
}
We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “Reward-Mixing MDPs with Few Latent Contexts Are Learnable.” In International Conference on Machine Learning (ICML), 18057–82. PMLR, 2023.
@inproceedings{kwon2023reward,
title = {Reward-Mixing MDPs with Few Latent Contexts are Learnable},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {18057--18082},
year = {2023},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2210.02594.pdf}
}
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode, nature randomly picks a latent reward model among M candidates, and the agent interacts with the MDP for H time steps. Our goal is to learn a policy that nearly maximizes the H-step cumulative reward in such a model. Prior work (Kwon et al., ’21) established an upper bound for RMMDPs with M = 2. In this work, we resolve several open questions for the general RMMDP setting. We consider an arbitrary M ≥ 2 and provide a sample-efficient algorithm that outputs an ε-optimal policy using O(ε^{-2} · S^d A^d · poly(H, Z)^d) episodes, where S and A are the numbers of states and actions respectively, H is the time horizon, Z is the support size of the reward distributions, and d = O(min(M, H)). We also provide an (SA)^{Ω(√M)} / ε² lower bound, supporting that super-polynomial sample complexity in M is necessary.
Faw, Matthew, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, and Rachel Ward. “The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance.” The Conference on Learning Theory (COLT), 2022.
@article{faw2022power,
title = {The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance},
author = {Faw, Matthew and Tziotis, Isidoros and Caramanis, Constantine and Mokhtari, Aryan and Shakkottai, Sanjay and Ward, Rachel},
journal = {The Conference on Learning Theory (COLT)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2202.05791}
}
We study the convergence rates of AdaGrad-Norm, an exemplar of adaptive stochastic gradient (SGD) methods in which the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly bounded gradient norms, (ii) uniformly bounded stochastic gradient variance (or even noise support), and (iii) conditional independence between the step size and the stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order-optimal convergence rate of O(polylog(T)/√T) after T iterations under the same assumptions as optimally tuned non-adaptive SGD (unbounded gradient norms and affine noise variance scaling) and, crucially, without needing any tuning parameters. We thus establish that adaptive gradient methods exhibit order-optimal convergence in much broader regimes than previously understood.
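The AdaGrad-Norm update analyzed above is compact enough to sketch directly: a single scalar step size η / √(b_0² + Σ_k ||g_k||²) that shrinks as squared gradient norms accumulate, with no tuning to the smoothness or noise parameters. A minimal sketch with a deterministic gradient oracle for illustration:

```python
import math

def adagrad_norm(grad, x0, eta=1.0, b0=1e-8, steps=200):
    """AdaGrad-Norm sketch: x_{k+1} = x_k - (eta / sqrt(b0^2 + sum ||g||^2)) g_k,
    where the denominator accumulates the squared norms of all gradients seen."""
    x = list(x0)
    b2 = b0 * b0
    for _ in range(steps):
        g = grad(x)
        b2 += sum(gi * gi for gi in g)       # accumulate ||g_k||^2
        step = eta / math.sqrt(b2)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x
```

On a simple quadratic f(x) = ||x||²/2 (gradient x), the iterates contract toward the minimizer without any hand-chosen step size.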
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms.” International Conference on Machine Learning (ICML), 2022.
@article{kwon2022coordinated,
title = {Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {International Conference on Machine Learning (ICML)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2201.12700.pdf}
}
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction α < 1/2 of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with S contexts and A actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that O(1/ε²) per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of Ω(min(S, A) · α²/ε²) per-user interactions to learn an ε-optimal policy for the good users. We then show that we can achieve an O(min(S, A) · α/ε²) upper bound, by employing efficient robust mean estimators for both univariate and high-dimensional random variables. We also show that this can be improved depending on the distribution of contexts.
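Robust mean estimators of the kind the upper bound employs can be illustrated with the classical median-of-means (a generic sketch, not the paper’s estimator): split the sample into buckets, average each, and take the median, so a few gross outliers corrupt only the buckets they land in.

```python
import random
import statistics

def median_of_means(xs, k):
    """Median-of-means: split xs into k equal buckets, average each bucket,
    and return the median of the bucket means. A single wild outlier can
    corrupt at most one bucket mean."""
    m = len(xs) // k
    means = [statistics.fmean(xs[i * m:(i + 1) * m]) for i in range(k)]
    return statistics.median(means)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(500)] + [1000.0] * 5  # ~1% gross outliers
rng.shuffle(data)
```

With eleven buckets, the five outliers corrupt at most five bucket means, so the median still sits on a clean bucket — unlike the plain sample mean, which is dragged far from zero.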