I am a Professor in the ECE department of The University of Texas at Austin. I received a PhD in EECS from The Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems (LIDS), and an AB in Mathematics from Harvard University. I received the NSF CAREER award in 2011, and I am an IEEE Fellow.
My current research interests focus on decision-making in large-scale complex systems, with a focus on learning and computation. Specifically, I am interested in robust and adaptable optimization, high dimensional statistics and machine learning, and applications to large-scale networks, including social networks, wireless networks, transportation networks, and energy networks. I have also worked on applications of machine learning and optimization to computer-aided design.
Jiacheng Zhuo: Algorithm Developer at Hudson River Trading.
The last ten publications, chronologically…
Kwon, Jeongyeol, Yonathan Efroni, Shie Mannor, and Constantine Caramanis. “RL in Latent MDPs Is Tractable: Online Guarantees via Off-Policy Evaluation.” Preprint, 2024.
@article{kwon24lmdp,
title = {RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Mannor, Shie and Caramanis, Constantine},
journal = {preprint},
year = {2024},
group = {misc},
arxiv = {https://arxiv.org/pdf/2406.01389}
}
In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound (Kwon et al., 2021). We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds off a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective, that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. These results, we believe, can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.
Rout, Litu, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. “RB-Modulation: Training-Free Personalization of Diffusion Models Using Stochastic Optimal Control.” Preprint, 2024.
@article{rout2024rb,
title = {RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control},
author = {Rout, Litu and Chen, Yujia and Ruiz, Nataniel and Kumar, Abhishek and Caramanis, Constantine and Shakkottai, Sanjay and Chu, Wen-Sheng},
journal = {preprint},
year = {2024},
group = {misc},
arxiv = {https://arxiv.org/pdf/2405.17401}
}
We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.
Rout, Litu, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. “Beyond First-Order Tweedie: Solving Inverse Problems Using Latent Diffusion.” Proceedings of the Conference on Computer Vision and Pattern Recognition, 2024.
@article{rout2024secondorder,
title = {Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion},
author = {Rout, Litu and Chen, Yujia and Kumar, Abhishek and Caramanis, Constantine and Shakkottai, Sanjay and Chu, Wen-Sheng},
journal = {Proceedings of the Conference on Computer Vision and Pattern Recognition},
year = {2024},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2312.00852.pdf}
}
Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie’s first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling. This paper introduces Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie with a tractable reverse process using second-order approximation. Our theoretical results reveal that the second-order approximation is lower bounded by our surrogate loss that only requires O(1) compute using the trace of the Hessian, and by the lower bound we derive a new drift term to make the reverse process tractable. Our method surpasses SoTA solvers PSLD and P2L, achieving 4X and 8X reduction in neural function evaluations, respectively, while notably enhancing sampling quality on FFHQ, ImageNet, and COCO benchmarks. In addition, we show STSL extends to text-guided image editing and addresses residual distortions present from corrupted images in leading text-guided image editing methods. To our best knowledge, this is the first work to offer an efficient second-order approximation in solving inverse problems using latent diffusion and editing real-world images with corruptions.
Zhuo, Jiacheng, Jeongyeol Kwon, Nhat Ho, and Constantine Caramanis. “On the Computational and Statistical Complexity of over-Parameterized Matrix Sensing.” Journal of Machine Learning Research, 2024.
@article{zhuoSlowMatrix2024,
title = {On the computational and statistical complexity of over-parameterized matrix sensing},
author = {Zhuo, Jiacheng and Kwon, Jeongyeol and Ho, Nhat and Caramanis, Constantine},
journal = {Journal of Machine Learning Research},
year = {2024},
group = {journal},
arxiv = {https://arxiv.org/pdf/2102.02756.pdf}
}
We consider solving the low rank matrix sensing problem with Factorized Gradient Descend (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing.
If the ground truth signal X^* ∈\mathbbR^d*d is of rank r, but we try to recover it using FF^⊤where F ∈\mathbbR^d*k and k>r, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global maxima. By decomposing the factorized matrix \fitMat into separate column spaces to capture the effect of extra ranks, we show that ||F_t F_t - F||_F^2 converges to a statistical error of \tilde\mathcalO \parenthk d σ^2/n after \tilde\mathcalO(\frac\sigma_rσ\sqrt\fracnd) number of iterations where \fitMat_t is the output of FGD after t iterations, σ^2 is the variance of the observation noise, \sigma_r is the r-th largest eigenvalue of \trueMat, and n is the number of sample. Our results, therefore, offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
Atsidakou, Alexia, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, and Christos Tzamos. “Contextual Pandora’s Box.” Association for the Advancement of Artificial Intelligence (AAAI), 2024.
@article{atsidakou24apandora,
title = {Contextual Pandora's Box},
author = {Atsidakou, Alexia and Caramanis, Constantine and Gergatsouli, Evangelia and Papadigenopoulos, Orestis and Tzamos, Christos},
journal = {Association for the Advancement of Artificial Intelligence (AAAI)},
year = {2024},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2205.13114.pdf}
}
Pandora’s Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. In the original formulation, it is assumed that accurate priors are given for the values of all the alternatives, while
recent work studies the online variant of Pandora’s Box where priors are originally unknown.
In this work, we extend Pandora’s Box to the online setting, while incorporating context.
At every round, we are presented with a number of alternatives each having a context, an exploration cost and an unknown value drawn from an unknown prior distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well to the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting where the algorithm never learns the values of the alternatives that were not explored.
The key technique that enables our result is novel a modification of the realizability condition in contextual bandits that connects a context to the reservation value of the corresponding distribution rather than its mean.
Kwon, Jeongyeol, Yonathan Efroni, Shie Mannor, and Constantine Caramanis. “Prospective Side Information for Latent MDPs.” In International Conference on Machine Learning (ICML). PMLR, 2024.
@inproceedings{kwon2024prospective,
title = {Prospective side information for latent {MDP}s},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Mannor, Shie and Caramanis, Constantine},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {},
year = {2024},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2310.07596}
}
In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user’s preferences, is not given. In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction. This type of environment can be modeled as a Latent Markov Decision Process (LMDP), a special instance of Partially Observed Markov Decision Processes (POMDPs). Previous work established exponential lower bounds in the number of latent contexts for the LMDP class. This puts forward a question: under which natural assumptions a near-optimal policy of an LMDP can be efficiently learned? In this work, we study the class of LMDPs with prospective side information, when an agent receives additional, weakly revealing, information on the latent context at the beginning of each episode. We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments. We then establish that any sample efficient algorithm must suffer at least Ω(K^2/3)-regret, as opposed to standard Ω(\sqrtK) lower bounds, and design an algorithm with a matching upper bound.
Rout, Litu, Advait Parulekar, Constantine Caramanis, and Sanjay Shakkottai. “A Theoretical Justification for Image Inpainting Using Denoising Diffusion Probabilistic Models.” Preprint, 2023.
@article{rout23theoretical,
title = {A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models},
author = {Rout, Litu and Parulekar, Advait and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {preprint},
year = {2023},
group = {misc},
arxiv = {https://arxiv.org/pdf/2302.01217.pdf}
}
We provide a theoretical justification for sample recovery using diffusion based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion based inpainting algorithm called RePaint, and show that it has a bias due to misalignment that hampers sample recovery even in a two-state diffusion process. Motivated by our analysis, we propose a modified RePaint algorithm we call RePaint^+ that provably recovers the underlying true sample and enjoys a linear rate of convergence. It achieves this by rectifying the misalignment error present in drift and dispersion of the reverse process. To the best of our knowledge, this is the first linear convergence result for a diffusion based image inpainting algorithm.
Faw, Matthew, Litu Rout, Constantine Caramanis, and Sanjay Shakkottai. “Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD.” Conference on Learning Theory (COLT), 2023.
@article{faw23beyond,
title = {Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD},
author = {Faw, Matthew and Rout, Litu and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Conference on Learning Theory (COLT)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2302.06570.pdf}
}
This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness constant using a stochastic gradient oracle. We focus on the class of (L_0,L_1)-smooth functions
proposed by Zhang et al. (ICLR’20). Empirical evidence suggests that these functions more closely captures practical machine learning problems as compared to the pervasive L_0-smoothness. This class is rich enough to include highly non-smooth functions, such as \exp(L_1 x) which is (0,O(L_1))-smooth. Despite the richness, an emerging line of works achieves the O(\nicefrac1\sqrtT) rate of convergence when the noise of the stochastic gradients is deterministically and uniformly bounded. This noise restriction is not required in the L_0-smooth setting, and in many practical settings is either not satisfied, or results in weaker convergence rates with respect to the noise scaling of the convergence rate.
We develop a technique that allows us to prove O(\nicefrac\mathrmpoly\log(T)\sqrtT) convergence rates for (L_0,L_1)-smooth functions without assuming uniform bounds on the noise support. The key innovation behind our results is a carefully constructed stopping time τwhich is simultaneously “large” on average, yet also allows us to treat the adaptive step sizes before τas (roughly) independent of the gradients. For general (L_0,L_1)-smooth functions, our analysis requires the mild restriction that the multiplicative noise parameter \sigma_1 < 1. For a broad subclass of (L_0,L_1)-smooth functions, our convergence rate continues to hold when \sigma_1 ≥1. By contrast, we prove that many algorithms analyzed by prior works on (L_0,L_1)-smooth optimization diverge with constant probability even for smooth and strongly-convex functions when \sigma_1 > 1.
Atsidakou, Alexia, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, and Sujay Sanghavi. “Logarithmic Bayes Regret Bounds.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{atsidakou23abayes,
title = {Logarithmic Bayes Regret Bounds},
author = {Atsidakou, Alexia and Kveton, Branislav and Katariya, Sumeet and Caramanis, Constantine and Sanghavi, Sujay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {}
}
We derive the first finite-time logarithmic regret bounds for Bayesian bandits. For Gaussian bandits, we obtain a O(c_h \log^2 n) bound, where c_h is a prior-dependent constant. This matches the asymptotic lower bound of Lai (1987). Our proofs mark a technical departure from prior works, and are simple and general. To show generality, we apply our technique to linear bandits. Our bounds shed light on the value of the prior in the Bayesian setting, both in the objective and as a side information given to the learner. They significantly improve the \tildeO(\sqrtn) bounds, that despite the existing lower bounds, have become standard in the literature.
Rout, Litu, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros Dimakis, and Sanjay Shakkottai. “Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{rout23inverse,
title = {Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models},
author = {Rout, Litu and Raoof, Negin and Daras, Giannis and Caramanis, Constantine and Dimakis, Alexandros and Shakkottai, Sanjay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2307.00619.pdf}
}
We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.