Rout, Litu, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. “RB-Modulation: Training-Free Personalization of Diffusion Models Using Stochastic Optimal Control.” Preprint, 2024.
@article{rout2024rb,
title = {RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control},
author = {Rout, Litu and Chen, Yujia and Ruiz, Nataniel and Kumar, Abhishek and Caramanis, Constantine and Shakkottai, Sanjay and Chu, Wen-Sheng},
journal = {preprint},
year = {2024},
group = {misc},
arxiv = {https://arxiv.org/pdf/2405.17401}
}
We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.
Kwon, Jeongyeol, Yonathan Efroni, Shie Mannor, and Constantine Caramanis. “RL in Latent MDPs Is Tractable: Online Guarantees via Off-Policy Evaluation.” Preprint, 2024.
@article{kwon24lmdp,
title = {RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Mannor, Shie and Caramanis, Constantine},
journal = {preprint},
year = {2024},
group = {misc},
arxiv = {https://arxiv.org/pdf/2406.01389}
}
In many real-world decision problems there is partially observed, hidden or latent information that remains fixed throughout an interaction. Such decision problems can be modeled as Latent Markov Decision Processes (LMDPs), where a latent variable is selected at the beginning of an interaction and is not disclosed to the agent. In the last decade, there has been significant progress in solving LMDPs under different structural assumptions. However, for general LMDPs, there is no known learning algorithm that provably matches the existing lower bound (Kwon et al., 2021). We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments. Specifically, we establish a novel off-policy evaluation lemma and introduce a new coverage coefficient for LMDPs. Then, we show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm. These results, we believe, can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially for partially observed environments.
Rout, Litu, Advait Parulekar, Constantine Caramanis, and Sanjay Shakkottai. “A Theoretical Justification for Image Inpainting Using Denoising Diffusion Probabilistic Models.” Preprint, 2023.
@article{rout23theoretical,
title = {A Theoretical Justification for Image Inpainting using Denoising Diffusion Probabilistic Models},
author = {Rout, Litu and Parulekar, Advait and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {preprint},
year = {2023},
group = {misc},
arxiv = {https://arxiv.org/pdf/2302.01217.pdf}
}
We provide a theoretical justification for sample recovery using diffusion-based image inpainting in a linear model setting. While most inpainting algorithms require retraining with each new mask, we prove that diffusion-based inpainting generalizes well to unseen masks without retraining. We analyze a recently proposed popular diffusion-based inpainting algorithm called RePaint, and show that it has a bias due to misalignment that hampers sample recovery even in a two-state diffusion process. Motivated by our analysis, we propose a modified RePaint algorithm we call RePaint^+ that provably recovers the underlying true sample and enjoys a linear rate of convergence. It achieves this by rectifying the misalignment error present in the drift and dispersion of the reverse process. To the best of our knowledge, this is the first linear convergence result for a diffusion-based image inpainting algorithm.
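For intuition, here is a minimal runnable sketch of the kind of diffusion-based inpainting step the analysis concerns, in a toy setting with an N(0, I) prior where the denoiser has a closed form. It illustrates the RePaint-style merge of forward-sampled known pixels with reverse-sampled unknown ones, not the corrected RePaint^+ update itself; all constants are illustrative.

import numpy as np

rng = np.random.default_rng(0)
T, d = 50, 8
betas = np.linspace(1e-4, 0.2, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def x0_estimate(x_t, t):
    # Stand-in for a learned denoiser E[x0 | x_t]; for an N(0, I) prior,
    # Var(x_t) = abar + (1 - abar) = 1, so the posterior mean is closed-form.
    return np.sqrt(abar[t]) * x_t

x0_ref = rng.normal(size=d)               # reference image
mask = np.arange(d) < d // 2              # True = observed pixel
x = rng.normal(size=d)                    # start the reverse process from noise
for t in range(T - 1, 0, -1):
    # Known pixels: re-noise the reference via the forward process at t-1.
    known = np.sqrt(abar[t-1]) * x0_ref + np.sqrt(1 - abar[t-1]) * rng.normal(size=d)
    # Unknown pixels: one standard reverse (DDPM-style) step.
    coef0 = np.sqrt(abar[t-1]) * betas[t] / (1 - abar[t])
    coeft = np.sqrt(alphas[t]) * (1 - abar[t-1]) / (1 - abar[t])
    unknown = coef0 * x0_estimate(x, t) + coeft * x + np.sqrt(betas[t]) * rng.normal(size=d)
    x = np.where(mask, known, unknown)
print(np.round(x[mask] - x0_ref[mask], 2))   # observed coordinates ≈ reference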
Hoffmann, Jessica, Matt Jordan, and Constantine Caramanis. “Quarantines as a Targeted Immunization Strategy.” Preprint, 2021.
@article{hoffmann2020quarantines,
title = {Quarantines as a Targeted Immunization Strategy},
author = {Hoffmann, Jessica and Jordan, Matt and Caramanis, Constantine},
journal = {preprint},
year = {2021},
group = {misc},
arxiv = {https://arxiv.org/pdf/2008.08262.pdf}
}
In the context of the recent COVID-19 outbreak, quarantine has been used to "flatten the curve" and slow the spread of the disease. In this paper, we show that this is not the only benefit of quarantine for the mitigation of an SIR epidemic spreading on a graph. Indeed, human contact networks exhibit a power-law structure, which means immunizing nodes at random is extremely ineffective at slowing the epidemic, while immunizing high-degree nodes can efficiently guarantee herd immunity. We theoretically prove that if quarantines are declared at the right moment, high-degree nodes are disproportionately in the Removed state, which is a form of targeted immunization. Even if quarantines are declared too early, subsequent waves of infection spread slower than the first wave. This leads us to propose an opening and closing strategy aiming at immunizing the graph while infecting the minimum number of individuals, guaranteeing the population is now robust to future infections. To the best of our knowledge, this is the only strategy that guarantees herd immunity without requiring vaccines. We extensively verify our results on simulated and real-life networks.
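A toy simulation (not the paper's algorithm) illustrating the central claim: a quarantine declared once a target fraction of the graph is affected disproportionately removes high-degree nodes. It assumes networkx is available; all parameters are illustrative.

import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(2000, 2)       # power-law contact network
state = {v: "S" for v in G}
state[random.choice(list(G))] = "I"
p_infect, p_recover, threshold = 0.05, 0.1, 0.10

while any(s == "I" for s in state.values()):
    infected = [v for v in G if state[v] == "I"]
    n_affected = len(infected) + sum(s == "R" for s in state.values())
    if n_affected >= threshold * len(G):
        for v in infected:                  # quarantine: isolate all infectious nodes
            state[v] = "R"
        break
    for v in infected:
        for u in G[v]:
            if state[u] == "S" and random.random() < p_infect:
                state[u] = "I"
        if random.random() < p_recover:
            state[v] = "R"

removed = [v for v in G if state[v] == "R"]
deg = dict(G.degree)
print(sum(deg.values()) / len(G),                     # mean degree overall
      sum(deg[v] for v in removed) / len(removed))    # mean degree of Removed nodes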
Kazdagli, Mikhail, Constantine Caramanis, Sanjay Shakkottai, and Mohit Tiwari. “The Shape of Alerts: Detecting Malware Using Distributed Detectors by Robustly Amplifying Transient Correlations.” Preprint, 2018.
@article{kazdagli2018shape,
title = {The shape of alerts: Detecting malware using distributed detectors by robustly amplifying transient correlations},
author = {Kazdagli, Mikhail and Caramanis, Constantine and Shakkottai, Sanjay and Tiwari, Mohit},
journal = {preprint},
year = {2018},
group = {misc},
arxiv = {https://arxiv.org/pdf/1803.00883.pdf}
}
We introduce a new malware detector, Shape-GD, that aggregates per-machine detectors into a robust global detector. Shape-GD is based on two insights: 1. Structural: actions by nodes, such as visiting a website (a waterhole attack), correlate well with malware spread and create dynamic neighborhoods of nodes that were exposed to the same attack vector. However, neighborhood sizes vary unpredictably and require aggregating an unpredictable number of local detectors' outputs into a global alert. 2. Statistical: feature vectors corresponding to true and false positives of local detectors have markedly different conditional distributions, i.e., their shapes differ. The shape of neighborhoods can identify infected neighborhoods without having to estimate neighborhood sizes: on 5 years of Symantec detectors' logs, Shape-GD reduces false positives from 1M down to 110K and raises alerts 345 days (on average) before commercial anti-virus products; in a waterhole attack simulated using Yahoo web-service logs, Shape-GD detects infected machines when only 100 of 550K are compromised.
Yi, Xinyang, Constantine Caramanis, and Sujay Sanghavi. “Solving a Mixture of Many Random Linear Equations by Tensor Decomposition and Alternating Minimization.” Preprint, 2016.
@article{yi2016solving,
title = {Solving a mixture of many random linear equations by tensor decomposition and alternating minimization},
author = {Yi, Xinyang and Caramanis, Constantine and Sanghavi, Sujay},
journal = {preprint},
year = {2016},
group = {misc},
arxiv = {https://arxiv.org/pdf/1608.05749.pdf}
}
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample corresponds to which model) are not observed. We give a tractable algorithm for the mixed linear equation problem, and show that under some technical conditions, our algorithm is guaranteed to solve the problem exactly with sample complexity linear in the dimension, and polynomial in k, the number of components. Previous approaches have required either exponential dependence on k, or super-linear dependence on the dimension. The proposed algorithm is a combination of tensor decomposition and alternating minimization. Our analysis involves proving that the initialization provided by the tensor method allows alternating minimization, which is equivalent to EM in our setting, to converge to the global optimum at a linear rate.
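For concreteness, a minimal runnable sketch of the alternating-minimization stage on synthetic noiseless data; in this setting it coincides with EM. The initialization below is a stand-in for the paper's tensor-decomposition initialization, which is what makes convergence to the global optimum provable.

import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
n, d, k = 600, 10, 3
W_true = rng.normal(size=(k, d))            # k ground-truth linear models
X = rng.normal(size=(n, d))
z = rng.integers(k, size=n)                 # hidden labels, never observed
y = np.einsum("ij,ij->i", X, W_true[z])     # noiseless mixed linear equations

# Stand-in for the tensor-method initialization the paper requires.
W = W_true + 0.5 * rng.normal(size=(k, d))
for _ in range(50):
    # Assignment step (EM's E-step): attach each equation to its best model.
    labels = ((X @ W.T - y[:, None]) ** 2).argmin(axis=1)
    # Refit step (EM's M-step): least squares within each component.
    for j in range(k):
        S = labels == j
        if S.sum() >= d:
            W[j] = np.linalg.lstsq(X[S], y[S], rcond=None)[0]

# Up to relabeling of the components, W should match W_true.
err = min(np.linalg.norm(W[list(p)] - W_true) for p in permutations(range(k)))
print(f"best-permutation error: {err:.4f}")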
Conference Papers
Rout, Litu, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. “Beyond First-Order Tweedie: Solving Inverse Problems Using Latent Diffusion.” Proceedings of the Conference on Computer Vision and Pattern Recognition, 2024.
@article{rout2024secondorder,
title = {Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion},
author = {Rout, Litu and Chen, Yujia and Kumar, Abhishek and Caramanis, Constantine and Shakkottai, Sanjay and Chu, Wen-Sheng},
journal = {Proceedings of the Conference on Computer Vision and Pattern Recognition},
year = {2024},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2312.00852.pdf}
}
Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling. This paper introduces the Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie with a tractable reverse process using a second-order approximation. Our theoretical results reveal that the second-order approximation is lower bounded by our surrogate loss, which requires only O(1) compute using the trace of the Hessian, and from this lower bound we derive a new drift term that makes the reverse process tractable. Our method surpasses SoTA solvers PSLD and P2L, achieving 4X and 8X reductions in neural function evaluations, respectively, while notably enhancing sampling quality on FFHQ, ImageNet, and COCO benchmarks. In addition, we show STSL extends to text-guided image editing and addresses residual distortions from corrupted images in leading text-guided image editing methods. To the best of our knowledge, this is the first work to offer an efficient second-order approximation in solving inverse problems using latent diffusion and editing real-world images with corruptions.
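The O(1)-compute Hessian-trace term can be understood through the standard Hutchinson estimator, sketched below on a toy matrix; this is the generic trick, not the STSL sampler itself, and all names are illustrative.

import numpy as np

def hutchinson_trace(hvp, d, n_probes=20, seed=0):
    # E[v^T H v] = tr(H) when v has i.i.d. Rademacher entries, so a handful
    # of matrix-free Hessian-vector products estimates the trace without
    # ever forming the d x d Hessian.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=d)
        total += v @ hvp(v)
    return total / n_probes

# Sanity check against an explicit symmetric PSD matrix.
d = 200
M = np.random.default_rng(1).normal(size=(d, d))
H = M @ M.T / d
print(hutchinson_trace(lambda v: H @ v, d, n_probes=500), np.trace(H))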
Atsidakou, Alexia, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, and Christos Tzamos. “Contextual Pandora’s Box.” Association for the Advancement of Artificial Intelligence (AAAI), 2024.
@article{atsidakou24apandora,
title = {Contextual Pandora's Box},
author = {Atsidakou, Alexia and Caramanis, Constantine and Gergatsouli, Evangelia and Papadigenopoulos, Orestis and Tzamos, Christos},
journal = {Association for the Advancement of Artificial Intelligence (AAAI)},
year = {2024},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2205.13114.pdf}
}
Pandora’s Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. In the original formulation, it is assumed that accurate priors are given for the values of all the alternatives, while recent work studies the online variant of Pandora’s Box where priors are originally unknown. In this work, we extend Pandora’s Box to the online setting, while incorporating context. At every round, we are presented with a number of alternatives, each having a context, an exploration cost, and an unknown value drawn from an unknown prior distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well to the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting, where the algorithm never learns the values of the alternatives that were not explored. The key technique that enables our result is a novel modification of the realizability condition in contextual bandits that connects a context to the reservation value of the corresponding distribution rather than its mean.
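For concreteness, a small sketch (not from the paper) of the reservation value the realizability condition targets: the Weitzman index σ solving E[(v − σ)^+] = c, computed here by bisection on an empirical distribution.

import numpy as np

def reservation_value(samples, cost, tol=1e-8):
    # The map sigma -> E[(v - sigma)^+] is decreasing, so bisection finds
    # the sigma at which the expected excess value equals the exploration cost.
    lo, hi = samples.min() - 1.0, samples.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(np.maximum(samples - mid, 0.0)) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

vals = np.random.default_rng(0).uniform(0, 1, size=100_000)
# For Uniform[0,1], E[(v - s)^+] = (1 - s)^2 / 2 = c gives s = 1 - sqrt(2c).
print(reservation_value(vals, cost=0.02), 1 - (2 * 0.02) ** 0.5)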
Kwon, Jeongyeol, Yonathan Efroni, Shie Mannor, and Constantine Caramanis. “Prospective Side Information for Latent MDPs.” In International Conference on Machine Learning (ICML). PMLR, 2024.
@inproceedings{kwon2024prospective,
title = {Prospective side information for latent {MDP}s},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Mannor, Shie and Caramanis, Constantine},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {},
year = {2024},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2310.07596}
}
In many interactive decision-making settings, there is latent and unobserved information that remains fixed. Consider, for example, a dialogue system, where complete information about a user, such as the user’s preferences, is not given. In such an environment, the latent information remains fixed throughout each episode, since the identity of the user does not change during an interaction. This type of environment can be modeled as a Latent Markov Decision Process (LMDP), a special instance of Partially Observed Markov Decision Processes (POMDPs). Previous work established exponential lower bounds in the number of latent contexts for the LMDP class. This puts forward a question: under which natural assumptions can a near-optimal policy of an LMDP be efficiently learned? In this work, we study the class of LMDPs with prospective side information, where an agent receives additional, weakly revealing information on the latent context at the beginning of each episode. We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments. We then establish that any sample-efficient algorithm must suffer at least Ω(K^{2/3})-regret, as opposed to the standard Ω(√K) lower bounds, and design an algorithm with a matching upper bound.
Faw, Matthew, Litu Rout, Constantine Caramanis, and Sanjay Shakkottai. “Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD.” Conference on Learning Theory (COLT), 2023.
@article{faw23beyond,
title = {Beyond Uniform Smoothness: A Stopped Analysis of Adaptive SGD},
author = {Faw, Matthew and Rout, Litu and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Conference on Learning Theory (COLT)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2302.06570.pdf}
}
This work considers the problem of finding a first-order stationary point of a non-convex function with potentially unbounded smoothness constant using a stochastic gradient oracle. We focus on the class of (L_0,L_1)-smooth functions proposed by Zhang et al. (ICLR’20). Empirical evidence suggests that these functions more closely capture practical machine learning problems as compared to the pervasive L_0-smoothness. This class is rich enough to include highly non-smooth functions, such as exp(L_1 x), which is (0, O(L_1))-smooth. Despite the richness, an emerging line of works achieves the O(1/√T) rate of convergence when the noise of the stochastic gradients is deterministically and uniformly bounded. This noise restriction is not required in the L_0-smooth setting, and in many practical settings it is either not satisfied, or it results in weaker convergence rates with respect to the noise scaling. We develop a technique that allows us to prove O(polylog(T)/√T) convergence rates for (L_0,L_1)-smooth functions without assuming uniform bounds on the noise support. The key innovation behind our results is a carefully constructed stopping time τ which is simultaneously “large” on average, yet also allows us to treat the adaptive step sizes before τ as (roughly) independent of the gradients. For general (L_0,L_1)-smooth functions, our analysis requires the mild restriction that the multiplicative noise parameter σ_1 < 1. For a broad subclass of (L_0,L_1)-smooth functions, our convergence rate continues to hold when σ_1 ≥ 1. By contrast, we prove that many algorithms analyzed by prior works on (L_0,L_1)-smooth optimization diverge with constant probability even for smooth and strongly-convex functions when σ_1 > 1.
Caramanis, Constantine, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, and Christos Tzamos. “Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{caramanis23samplers,
title = {Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods},
author = {Caramanis, Constantine and Fotakis, Dimitris and Kalavasis, Alkis and Kontonis, Vasilis and Tzamos, Christos},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2310.05309.pdf}
}
Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e., polynomial in the size of the input, number of parameters; (iii) have a benign optimization landscape, in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-k-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
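A minimal instance of the "solution sampler + policy gradient" template the framework analyzes: a product-Bernoulli sampler over cuts trained with REINFORCE on Max-Cut. The paper's generative models and regularization are richer; all constants here are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 12
A = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
A = A + A.T                                        # random graph adjacency

def cut_value(x):                                  # objective to maximize
    s = 2.0 * x - 1.0                              # spins in {-1, +1}
    return 0.25 * np.sum(A * (1.0 - np.outer(s, s)))

theta = np.zeros(n)                                # logits of the solution sampler
lr, batch = 0.1, 64
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random((batch, n)) < p).astype(float) # sample candidate cuts
    f = np.array([cut_value(xi) for xi in x])
    # REINFORCE with a mean baseline; grad of log p(x) is (x - p) per coordinate.
    theta += lr * ((f - f.mean())[:, None] * (x - p)).mean(axis=0)

mode = (1.0 / (1.0 + np.exp(-theta)) > 0.5).astype(float)
print("cut value at the sampler's mode:", cut_value(mode))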
Rout, Litu, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros Dimakis, and Sanjay Shakkottai. “Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{rout23inverse,
title = {Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models},
author = {Rout, Litu and Raoof, Negin and Daras, Giannis and Caramanis, Constantine and Dimakis, Alexandros and Shakkottai, Sanjay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2307.00619.pdf}
}
We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
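A toy sketch of the generic recipe in vector space: interleave reverse-diffusion steps with a data-consistency correction on ||y − A x̂0||². The paper's algorithm operates in the latent space of a pretrained model and adds an extra "gluing" objective, both omitted here; the closed-form denoiser comes from assuming an N(0, I) prior so the sketch stays runnable.

import numpy as np

rng = np.random.default_rng(0)
d, m, T = 16, 8, 200
A = rng.normal(size=(m, d)) / np.sqrt(m)   # measurement operator, y = A x
x_true = rng.normal(size=d)
y = A @ x_true

betas = np.linspace(1e-4, 0.05, T)
abar = np.cumprod(1 - betas)
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the data term

x = rng.normal(size=d)
for t in range(T - 1, 0, -1):
    x0_hat = np.sqrt(abar[t]) * x          # E[x0 | x_t] under the N(0, I) prior
    # Data consistency: one preconditioned gradient step on ||y - A x0_hat||^2
    # before re-noising.
    x0_hat = x0_hat - A.T @ (A @ x0_hat - y) / L
    coef0 = np.sqrt(abar[t - 1]) * betas[t] / (1 - abar[t])
    coeft = np.sqrt(1 - betas[t]) * (1 - abar[t - 1]) / (1 - abar[t])
    x = coef0 * x0_hat + coeft * x + np.sqrt(betas[t]) * rng.normal(size=d)
print("measurement residual:", round(float(np.linalg.norm(A @ x - y)), 3))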
Atsidakou, Alexia, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, and Sujay Sanghavi. “Logarithmic Bayes Regret Bounds.” Advances in Neural Information Processing Systems (NeurIPS), 2023.
@article{atsidakou23abayes,
title = {Logarithmic Bayes Regret Bounds},
author = {Atsidakou, Alexia and Kveton, Branislav and Katariya, Sumeet and Caramanis, Constantine and Sanghavi, Sujay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023},
group = {proceedings},
arxiv = {}
}
We derive the first finite-time logarithmic regret bounds for Bayesian bandits. For Gaussian bandits, we obtain an O(c_h log^2 n) bound, where c_h is a prior-dependent constant. This matches the asymptotic lower bound of Lai (1987). Our proofs mark a technical departure from prior works, and are simple and general. To show generality, we apply our technique to linear bandits. Our bounds shed light on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve the Õ(√n) bounds that, despite the existing lower bounds, have become standard in the literature.
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “Reward-Mixing MDPs with Few Latent Contexts Are Learnable.” In International Conference on Machine Learning (ICML), 18057–82. PMLR, 2023.
@inproceedings{kwon2023reward,
title = {Reward-Mixing MDPs with Few Latent Contexts are Learnable},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {18057--18082},
year = {2023},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2210.02594.pdf}
}
We consider episodic reinforcement learning in reward-mixing Markov decision processes (RMMDPs): at the beginning of every episode nature randomly picks a latent reward model among M candidates and an agent interacts with the MDP throughout the episode for H time steps. Our goal is to learn a near-optimal policy that nearly maximizes the H time-step cumulative rewards in such a model. Prior work (Kwon et al., ’21) established an upper bound for RMMDPs with M=2. In this work, we resolve several open questions for the general RMMDP setting. We consider an arbitrary M ≥ 2 and provide a sample-efficient algorithm that outputs an ε-optimal policy using O(ε^{-2} ⋅ S^d A^d ⋅ poly(H, Z)^d) episodes, where S, A are the number of states and actions respectively, H is the time-horizon, Z is the support size of reward distributions and d = O(min(M,H)). We also provide a (SA)^{Ω(√M)}/ε^2 lower bound, supporting that super-polynomial sample complexity in M is necessary.
Faw, Matthew, Isidoros Tziotis, Constantine Caramanis, Aryan Mokhtari, Sanjay Shakkottai, and Rachel Ward. “The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance.” The Conference on Learning Theory (COLT), 2022.
@article{faw2022power,
title = {The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance},
author = {Faw, Matthew and Tziotis, Isidoros and Caramanis, Constantine and Mokhtari, Aryan and Shakkottai, Sanjay and Ward, Rachel},
journal = {The Conference on Learning Theory (COLT)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2202.05791}
}
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (SGD), where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly bounded gradient norms, (ii) uniformly bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order-optimal convergence rate of O(polylog(T)/√T) after T iterations under the same assumptions as optimally tuned non-adaptive SGD (unbounded gradient norms and affine noise variance scaling), and crucially, without needing any tuning parameters. We thus establish that adaptive gradient methods exhibit order-optimal convergence in much broader regimes than previously understood.
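AdaGrad-Norm itself is only a few lines; a self-contained sketch on a noisy quadratic with multiplicative (affine-variance-style) noise, with illustrative constants:

import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=1e-2, T=500, seed=0):
    # AdaGrad-Norm: a single adaptive scalar step size, no tuning schedule:
    #   x_{t+1} = x_t - eta / sqrt(b0^2 + sum_{s<=t} ||g_s||^2) * g_t
    rng = np.random.default_rng(seed)
    x, b2 = np.array(x0, dtype=float), b0 ** 2
    for _ in range(T):
        g = grad(x, rng)
        b2 += g @ g
        x -= eta / np.sqrt(b2) * g
    return x

# Quadratic with multiplicative noise, so the noise variance grows with the
# gradient -- a regime that bounded-noise analyses exclude.
grad = lambda x, rng: 2.0 * x * (1.0 + 0.5 * rng.normal(size=x.shape))
print(adagrad_norm(grad, x0=10.0 * np.ones(5)))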
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms.” International Conference on Machine Learning (ICML), 2022.
@article{kwon2022coordinated,
title = {Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {International Conference on Machine Learning (ICML)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2201.12700.pdf}
}
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction α < 1/2 of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with S contexts and A actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that O(1/ε^2) per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a lower bound of Ω(min(S,A)·α^2/ε^2) per-user interactions to learn an ε-optimal policy for the good users. We then show we can achieve an O(min(S,A)·α/ε^2) upper bound, by employing efficient robust mean estimators for both univariate and high-dimensional random variables. We also show that this can be improved depending on the distributions of contexts.
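A sketch of the kind of robust univariate mean estimator such upper bounds rely on (the classic median-of-means, not necessarily the paper's exact estimator):

import numpy as np

def median_of_means(samples, n_blocks=20, seed=0):
    # Split into blocks, average each block, return the median of the block
    # means: a few arbitrarily corrupted samples can ruin only a few blocks,
    # and the median ignores those.
    perm = np.random.default_rng(seed).permutation(samples)
    return float(np.median([b.mean() for b in np.array_split(perm, n_blocks)]))

rng = np.random.default_rng(1)
rewards = np.concatenate([rng.normal(loc=1.0, size=996),
                          np.full(4, 1e6)])        # a few adversarial rewards
print(np.mean(rewards), median_of_means(rewards))  # mean is ruined, MoM is ~1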
Atsidakou, Alexia, Orestis Papadigenopoulos, Constantine Caramanis, Sujay Sanghavi, and Sanjay Shakkottai. “Asymptotically-Optimal Gaussian Bandits with Side Observations.” In International Conference on Machine Learning (ICML). PMLR, 2022.
@inproceedings{atsidakou202asymptotically,
title = {Asymptotically-Optimal Gaussian Bandits with Side Observations},
author = {Atsidakou, Alexia and Papadigenopoulos, Orestis and Caramanis, Constantine and Sanghavi, Sujay and Shakkottai, Sanjay},
journal = {International Conference on Machine Learning (ICML)},
year = {2022},
organization = {PMLR},
group = {proceedings},
arxiv = {https://proceedings.mlr.press/v162/atsidakou22a/atsidakou22a.pdf}
}
Papadigenopoulos, Orestis, Constantine Caramanis, and Sanjay Shakkottai. “Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret.” In Advances in Neural Information Processing Systems (NeurIPS), 2022.
@inproceedings{papadigenopoulos2022non,
title = {Non-Stationary Bandits under Recharging Payoffs: Improved Planning with Sublinear Regret},
author = {Papadigenopoulos, Orestis and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2205.14790.pdf}
}
The stochastic multi-armed bandit setting has been recently studied in the non-stationary regime, where the mean payoff of each action is a non-decreasing function of the number of rounds passed since it was last played. This model captures natural behavioral aspects of users which crucially determine the performance of recommendation platforms, ad placement systems, and more. Even assuming prior knowledge of the mean payoff functions, computing an optimal plan in the above model is NP-hard, while the state-of-the-art is a 1/4-approximation algorithm for the case where at most one arm can be played per round. We first focus on the setting where the mean payoff functions are known. In this setting, we significantly improve the best-known guarantees for the planning problem by developing a polynomial-time (1−1/e)-approximation algorithm (asymptotically and in expectation), based on a novel combination of randomized LP rounding and a time-correlated (interleaved) scheduling method. Furthermore, our algorithm achieves improved guarantees, compared to prior work, for the case where more than one arm can be played at each round. Moving to the bandit setting, when the mean payoff functions are initially unknown, we show how our algorithm can be transformed into a bandit algorithm with sublinear regret.
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “Tractable Optimality in Episodic Latent MABs.” Advances in Neural Information Processing Systems (NeurIPS), 2022.
@article{kwon2022tractable,
title = {Tractable Optimality in Episodic Latent MABs},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2210.03528.pdf}
}
We consider a multi-armed bandit problem with A actions and M latent contexts, where an agent interacts with the environment for an episode of H time steps. Depending on the length of the episode, the learner may not be able to estimate the latent context accurately. The resulting partial observation of the environment makes the learning task significantly more challenging. Without any additional structural assumptions, existing techniques to tackle partially observed settings imply the decision maker can learn a near-optimal policy with O(A)^H episodes, but do not promise more. In this work, we show that learning with polynomial samples in A is possible. We achieve this by using techniques from experiment design. Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with O(poly(A) + poly(M,H)^{min(M,H)}) interactions. In practice, we show that we can formulate the moment-matching via maximum likelihood estimation. In our experiments, this significantly outperforms the worst-case guarantees, as well as existing practical methods.
Katiyar, Ashish, Soumya Basu, Vatsal Shah, and Constantine Caramanis. “Robust Estimation of Tree Structured Markov Random Fields.” International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
@article{katiyarRobustMRF2021,
title = {Robust Estimation of Tree Structured Markov Random Fields},
author = {Katiyar, Ashish and Basu, Soumya and Shah, Vatsal and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2022},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2102.08554.pdf}
}
We study the problem of learning tree-structured Markov random fields (MRF) on discrete random variables with common support when the observations are corrupted by a k-ary symmetric noise channel with unknown probability of error. For Ising models (support size = 2), past work has shown that the graph structure can only be recovered up to the leaf clusters (a leaf node, its parent, and its siblings form a leaf cluster) and exact recovery is impossible. No prior work has addressed the setting of support size of 3 or more, and indeed this setting is far richer. As we show, when the support size is 3 or more, the structure of the leaf clusters may be partially or fully identifiable. We provide a precise characterization of this phenomenon and show that the extent of recoverability is dictated by the joint PMF of the random variables. In particular, we provide necessary and sufficient conditions for exact recoverability. Furthermore, we present a polynomial-time, sample-efficient algorithm that recovers the exact tree when this is possible, or up to the unidentifiability promised by our characterization when full recoverability is impossible. Finally, we demonstrate the efficacy of our algorithm experimentally.
Papadigenopoulos, Orestis, and Constantine Caramanis. “Recurrent Submodular Welfare and Matroid Blocking Bandits.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{papadig2021recurrent,
title = {Recurrent Submodular Welfare and Matroid Blocking Bandits},
author = {Papadigenopoulos, Orestis and Caramanis, Constantine},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2102.00321.pdf}
}
A recent line of research focuses on the study of the stochastic multi-armed bandits problem (MAB), in the case where temporal correlations of specific structure are imposed between the player’s actions and the reward distributions of the arms (Kleinberg and Immorlica [FOCS18], Basu et al. [NIPS19]). As opposed to the standard MAB setting, where the optimal solution in hindsight can be trivially characterized, these correlations lead to (sub-)optimal solutions that exhibit interesting dynamical patterns – a phenomenon that yields new challenges both from an algorithmic as well as a learning perspective. In this work, we extend the above direction to a combinatorial bandit setting and study a variant of stochastic MAB, where arms are subject to matroid constraints and each arm becomes unavailable (blocked) for a fixed number of rounds after each play. A natural common generalization of the state-of-the-art for blocking bandits, and that for matroid bandits, yields a (1−1/e)-approximation for partition matroids, yet it only guarantees a 1/2-approximation for general matroids. In this paper we develop new algorithmic ideas that allow us to obtain a polynomial-time (1−1/e)-approximation algorithm (asymptotically and in expectation) for any matroid, and thus allow us to control the (1−1/e)-approximate regret. A key ingredient is the technique of correlated (interleaved) scheduling. Along the way, we discover an interesting connection to a variant of Submodular Welfare Maximization, for which we provide (asymptotically) matching upper and lower approximability bounds.
Kwon, Jeongyeol, Yonathan Efroni, Constantine Caramanis, and Shie Mannor. “RL for Latent MDPs: Regret Guarantees and a Lower Bound.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{kwon2021rl,
title = {RL for Latent MDPs: Regret Guarantees and a Lower Bound},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2102.04939.pdf}
}
In this work, we consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDP). In an LMDP, an MDP is randomly drawn from a set of M possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent. We first show that a general instance of LMDPs requires at least Ω((SA)^M) episodes to even approximate the optimal policy. Then, we consider sufficient assumptions under which learning good policies requires a polynomial number of episodes. We show that the key link is a notion of separation between the MDP system dynamics. With sufficient separation, we provide an efficient algorithm with a local guarantee, i.e., providing a sublinear regret guarantee when we are given a good initialization. Finally, if we are given standard statistical sufficiency assumptions common in the Predictive State Representation (PSR) literature (e.g., Boots et al.) and a reachability assumption, we show that the need for initialization can be removed.
———. “Reinforcement Learning in Reward-Mixing MDPs.” Advances in Neural Information Processing Systems (NeurIPS), 2021.
@article{kwon2021rlb,
title = {Reinforcement Learning in Reward-Mixing MDPs},
author = {Kwon, Jeongyeol and Efroni, Yonathan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2021},
group = {proceedings},
arxiv = {}
}
Learning a near-optimal policy in a partially observable system remains an elusive challenge in contemporary reinforcement learning. In this work, we consider episodic reinforcement learning in a reward-mixing Markov decision process (MDP). There, a reward function is drawn from one of multiple possible reward models at the beginning of every episode, but the identity of the chosen reward model is not revealed to the agent. Hence, the latent state space, for which the dynamics are Markovian, is not given to the agent. We study the problem of learning a near-optimal policy for two reward-mixing MDPs. Unlike existing approaches that rely on strong assumptions on the dynamics, we make no assumptions and study the problem in full generality. Indeed, with no further assumptions, even for two switching reward models, the problem requires several new ideas beyond existing algorithmic and analysis techniques for efficient exploration. We provide the first polynomial-time algorithm that finds an ε-optimal policy after exploring Õ(poly(H, ε^{-1}) ⋅ S^2 A^2) episodes, where H is the time-horizon and S, A are the number of states and actions respectively. This is the first efficient algorithm that does not require any assumptions in partially observed environments where the observation space is smaller than the latent state space.
Caramanis, Constantine, Paul Duetting, Matthew Faw, Federico Fusco, Philip Lazos, Stefano Leonardi, Orestis Papadigenopoulos, Emmanouil Pountourakis, and Rebecca Reiffenhauser. “Single Sample Prophet Inequalities via Greedy-Ordered Selection.” Symposium on Discrete Algorithms (SODA), 2021.
@article{caramanis2021single,
title = {Single Sample Prophet Inequalities via Greedy-Ordered Selection},
author = {Caramanis, Constantine and Duetting, Paul and Faw, Matthew and Fusco, Federico and Lazos, Philip and Leonardi, Stefano and Papadigenopoulos, Orestis and Pountourakis, Emmanouil and Reiffenhauser, Rebecca},
booktitle = {Symposium on Discrete Algorithms (SODA)},
year = {2021},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2103.13089.pdf}
}
We study Single-Sample Prophet Inequalities (SSPIs), i.e., prophet inequalities where only a single sample from each prior distribution is available. Besides a direct, and optimal, SSPI for the basic single choice problem [Rubinstein et al., ITCS’20], most existing SSPI results were obtained via an elegant, but inherently lossy reduction to Order Oblivious Secretary (OOS) policies [Azar et al., SODA’14]. Motivated by this discrepancy, we develop an intuitive and versatile greedy-based technique that yields SSPIs directly rather than through the reduction to OOS. Our results can be seen as generalizing and unifying a number of existing results in the area of prophet and secretary problems. Our algorithms significantly improve on the competitive guarantees for a number of interesting scenarios (including general matching with edge arrivals, bipartite matching with vertex arrivals and certain matroids), as well as capture new settings (e.g., budget additive valuations). Complementing our algorithmic results, we also consider mechanism design variants. Finally, we analyze the power and limitations of different SSPI approaches by providing a partial converse to the reduction from SSPI to OOS given by [Azar et al., SODA’14].
Atsidakou, Alexia, Orestis Papadigenopoulos, Soumya Basu, Constantine Caramanis, and Sanjay Shakkottai. “Combinatorial Blocking Bandits with Stochastic Delays.” In International Conference on Machine Learning (ICML). PMLR, 2021.
@inproceedings{atsidakou2021combinatorial,
title = {Combinatorial Blocking Bandits with Stochastic Delays},
author = {Atsidakou, Alexia and Papadigenopoulos, Orestis and Basu, Soumya and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2105.10625.pdf}
}
Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable) model is that of blocking bandits, where an arm becomes unavailable for a deterministic number of rounds after each play. In this work, we extend the above model in two directions: (i) We consider the general combinatorial setting where more than one arm can be played at each round, subject to feasibility constraints. (ii) We allow the blocking time of each arm to be stochastic. We first study the computational/unconditional hardness of the above setting and identify the necessary conditions for the problem to become tractable (even in an approximate sense). Based on these conditions, we provide a tight analysis of the approximation guarantee of a natural greedy heuristic that always plays the maximum expected reward feasible subset among the available (non-blocked) arms. When the arms’ expected rewards are unknown, we adapt the above heuristic into a bandit algorithm, based on UCB, for which we provide sublinear (approximate) regret guarantees, matching the theoretical lower bounds in the limiting case of absence of delays.
Kwon, Jeongyeol, Nhat Ho, and Constantine Caramanis. “On the Minimax Optimality of the EM Algorithm for Learning Two-Component Mixed Linear Regression.” In International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2021.
@inproceedings{kwon2020minimax,
title = {On the minimax optimality of the {EM} algorithm for learning two-component mixed linear regression},
author = {Kwon, Jeongyeol and Ho, Nhat and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2006.02601.pdf}
}
We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR). We resolve a long-standing question that many recent results have attempted to tackle: we completely characterize the convergence behavior of EM, and show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes. In particular, when the SNR is sufficiently large, the EM updates converge to the true parameter θ^* at the standard parametric convergence rate O((d/n)^{1/2}) after O(log(n/d)) iterations. In the regime where the SNR is above O((d/n)^{1/4}) and below some constant, the EM iterates converge to a O(SNR^{-1}(d/n)^{1/2}) neighborhood of the true parameter, when the number of iterations is of the order O(SNR^{-2} log(n/d)). In the low SNR regime where the SNR is below O((d/n)^{1/4}), we show that EM converges to a O((d/n)^{1/4}) neighborhood of the true parameters, after O((n/d)^{1/2}) iterations. Notably, these results are achieved under mild conditions of either random initialization or an efficiently computable local initialization. By providing tight convergence guarantees of the EM algorithm in middle-to-low SNR regimes, we fill the remaining gap in the literature, and significantly, reveal that in low SNR, EM changes rate, matching the n^{-1/4} rate of the MLE, a behavior that previous work had been unable to show.
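For reference, the EM iteration being analyzed has a closed form in the symmetric two-component model y = z⟨x, θ*⟩ + noise with z ∈ {±1}: the E-step computes E[z | x, y] = tanh(y⟨x, θ⟩/σ²) and the M-step solves a least-squares problem against the signed responses. A minimal sketch on synthetic data, with illustrative constants:

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 2000, 5, 0.5
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
z = rng.choice([-1.0, 1.0], size=n)                  # hidden labels
y = z * (X @ theta_star) + sigma * rng.normal(size=n)

theta = rng.normal(size=d)                           # random initialization
XtX_inv = np.linalg.inv(X.T @ X)
for _ in range(50):
    # E-step: posterior mean of the latent sign, tanh(y <x, theta> / sigma^2).
    w = np.tanh(y * (X @ theta) / sigma ** 2)
    # M-step: least squares against the "signed" responses w * y.
    theta = XtX_inv @ (X.T @ (w * y))

# EM can only recover theta* up to a global sign flip.
err = min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))
print(f"distance to ±theta*: {err:.3f}")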
Basu, Soumya, Orestis Papadigenopoulos, Constantine Caramanis, and Sanjay Shakkottai. “Contextual Blocking Bandits.” In International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2021.
@inproceedings{basu2020contextual,
title = {Contextual Blocking Bandits},
author = {Basu, Soumya and Papadigenopoulos, Orestis and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {},
year = {2021},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2003.03426.pdf}
}
We study a novel variant of the multi-armed bandit problem, where at each time step, the player observes an independently sampled context that determines the arms’ mean rewards. However, playing an arm blocks it (across all contexts) for a fixed number of future time steps. The above contextual setting captures important scenarios such as recommendation systems or ad placement with diverse users. This problem has recently been studied [DSSX18] in the full-information setting (i.e., assuming knowledge of the mean context-dependent arm rewards), where competitive ratio bounds have been derived. We focus on the bandit setting, where these means are initially unknown; we propose a UCB-based variant of the full-information algorithm that guarantees an O(log T)-regret w.r.t. an α-optimal strategy in T time steps, matching the Ω(log T) regret lower bound in this setting. Due to the time correlations caused by blocking, existing techniques for upper bounding regret fail. For proving our regret bounds, we introduce the novel concepts of delayed exploitation and opportunistic sub-sampling and combine them with ideas from combinatorial bandits and non-stationary Markov chain coupling.
Kwon, Jeongyeol, and Constantine Caramanis. “The EM Algorithm Gives Sample-Optimality for Learning Mixtures of Well-Separated Gaussians.” The Conference on Learning Theory (COLT), 2020.
@article{kwon2020algorithm,
title = {The EM algorithm gives sample-optimality for learning mixtures of well-separated gaussians},
author = {Kwon, Jeongyeol and Caramanis, Constantine},
journal = {The Conference on Learning Theory (COLT)},
year = {2020},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2002.00329.pdf}
}
We consider the problem of spherical Gaussian Mixture models with k ≥ 3 components when the components are well separated. A fundamental previous result established that separation of Ω(√log k) is necessary and sufficient for identifiability of the parameters with polynomial sample complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that O(kd/ε^2) samples suffice for any ε ≲ 1/k, closing the gap from polynomial to linear, and thus giving the first optimal sample upper bound for the parameter estimation of well-separated Gaussian mixtures. We accomplish this by proving a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally under separation Ω(√log k). The previous best-known guarantee required Ω(√k) separation (Yan et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finite-sample error of EM does not depend on non-universal quantities such as pairwise distances between means of Gaussian components.
Hoffmann, Jessica, Soumya Basu, Surbhi Goel, and Constantine Caramanis. “Disentangling Mixtures of Epidemics on Graphs.” International Conference on Machine Learning (ICML), 2020.
@article{hoffmann2019disentangling,
title = {Disentangling Mixtures of Epidemics on Graphs},
author = {Hoffmann, Jessica and Basu, Soumya and Goel, Surbhi and Caramanis, Constantine},
journal = {International Conference on Machine Learning (ICML)},
year = {2020},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1906.06057.pdf}
}
We consider the problem of learning the weighted edges of a balanced mixture of two undirected graphs from epidemic cascades. While mixture models are popular modeling tools, algorithmic development with rigorous guarantees has lagged. Graph mixtures are apparently no exception: until now, very little has been known about whether this problem is solvable. To the best of our knowledge, we establish the first necessary and sufficient conditions for this problem to be solvable in polynomial time on edge-separated graphs. When the conditions are met, i.e., when the graphs are connected with at least three edges, we give an efficient algorithm for learning the weights of both graphs with optimal sample complexity (up to log factors). We give complementary results and provide sample-optimal (up to log factors) algorithms for mixtures of directed graphs of out-degree at least three, and for mixtures of undirected graphs with unbalanced and/or unknown priors.
Tziotis, Isidoros, Constantine Caramanis, and Aryan Mokhtari. “Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient Tracking.” Advances in Neural Information Processing Systems (NeurIPS) 33 (2020).
@article{tziotis2020second,
title = {Second Order Optimality in Decentralized Non-Convex Optimization via Perturbed Gradient Tracking},
author = {Tziotis, Isidoros and Caramanis, Constantine and Mokhtari, Aryan},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {33},
year = {2020},
group = {proceedings},
arxiv = {https://proceedings.neurips.cc/paper/2020/file/f1ea154c843f7cf3677db7ce922a2d17-Paper.pdf}
}
In this paper we study the problem of escaping from saddle points and achieving second-order optimality in a decentralized setting where a group of agents collaborate to minimize their aggregate objective function. We provide a non-asymptotic (finite-time) analysis and show that by following the idea of perturbed gradient descent, it is possible to converge to a second-order stationary point in a number of iterations which depends linearly on the dimension and polynomially on the accuracy of the second-order stationary point. Doing this in a communication-efficient manner requires overcoming several challenges, from identifying (first-order) stationary points in a distributed manner, to adapting the perturbed gradient framework without prohibitive communication complexity. Our proposed Perturbed Decentralized Gradient Tracking (PDGT) method consists of two major stages: (i) a gradient-based step to find a first-order stationary point and (ii) a perturbed gradient descent step to escape from a first-order stationary point, if it is a saddle point with sufficient curvature. As a side benefit of our result, in the case that all saddle points are non-degenerate (strict), the proposed PDGT method finds a local minimum of the considered decentralized optimization problem in a finite number of iterations.
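A sketch of the first (unperturbed) stage, standard decentralized gradient tracking on a ring of agents with quadratic objectives; the saddle-escaping perturbation step is omitted and all constants are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_agents, d, alpha = 5, 3, 0.2
targets = rng.normal(size=(n_agents, d))     # agent i holds f_i(x) = ||x - t_i||^2 / 2
grad = lambda x, i: x - targets[i]

# Doubly stochastic mixing matrix for a ring: average with both neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = W[i, (i + 1) % n_agents] = 0.25

x = rng.normal(size=(n_agents, d))                        # local iterates
y = np.stack([grad(x[i], i) for i in range(n_agents)])    # gradient trackers
for _ in range(300):
    x_new = W @ x - alpha * y                             # consensus + descent
    # Tracker update: each y_i converges to the network-average gradient.
    y = W @ y + np.stack([grad(x_new[i], i) - grad(x[i], i)
                          for i in range(n_agents)])
    x = x_new
print(np.round(x - targets.mean(axis=0), 4))              # all rows ≈ 0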
Faw, Matthew, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, and Sanjay Shakkottai. “Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions.” Advances in Neural Information Processing Systems (NeurIPS) 33 (2020).
@article{faw2020mix,
title = {Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions},
author = {Faw, Matthew and Sen, Rajat and Shanmugam, Karthikeyan and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {33},
year = {2020},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1907.10154.pdf}
}
We consider a covariate shift problem where one has access to several different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. This covariate shift is caused, in part, by unobserved features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, MixNMatch, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures, for this task. We prove simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations. Finally, we validate our algorithm on two real-world datasets.
Kocaoglu, Murat, Sanjay Shakkottai, Alexandros G Dimakis, Constantine Caramanis, and Sriram Vishwanath. “Applications of Common Entropy for Causal Inference.” Advances in Neural Information Processing Systems (NeurIPS) 33 (2020).
@article{kocaoglu2020applications,
title = {Applications of Common Entropy for Causal Inference},
author = {Kocaoglu, Murat and Shakkottai, Sanjay and Dimakis, Alexandros G and Caramanis, Constantine and Vishwanath, Sriram},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {33},
year = {2020},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1807.10399.pdf}
}
We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Rényi common entropy by minimizing the Rényi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative algorithm that can be used to discover the trade-off between the entropy of the latent variable and the conditional mutual information of the observed variables. We show two applications of common entropy in causal inference: First, under the assumption that there are no low-entropy mediators, it can be used to distinguish causation from spurious correlation among almost all joint distributions on simple causal graphs with two observed variables. Second, common entropy can be used to improve constraint-based methods such as the PC or FCI algorithms in the small-sample regime, where these methods are known to struggle. We propose a modification to these constraint-based methods to assess if a separating set found by these algorithms is valid using common entropy. We finally evaluate our algorithms on synthetic and real data to establish their performance.
Kwon, Jeongyeol, and Constantine Caramanis. “EM Converges for a Mixture of Many Linear Regressions.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 1727–36. PMLR, 2020.
@inproceedings{kwon2020converges,
title = {EM converges for a mixture of many linear regressions},
author = {Kwon, Jeongyeol and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {1727--1736},
year = {2020},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1905.12106.pdf}
}
We study the convergence of the Expectation-Maximization (EM) algorithm for mixtures of linear regressions with an arbitrary number k of components. We show that as long as the signal-to-noise ratio (SNR) is Ω̃(k), well-initialized EM converges to the true regression parameters. Previous results for k ≥ 3 have only established local convergence for the noiseless setting, i.e., where the SNR is infinitely large. Our results enlarge the scope to the noisy setting, and notably, we establish a statistical error rate that is independent of the norm (or pairwise distance) of the regression parameters. In particular, our results imply exact recovery as σ → 0, in contrast to most previous local convergence results for EM, where the statistical error scales with the norm of the parameters. Standard moment-method approaches may be applied to guarantee we are in the region where our local convergence guarantees apply.
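For intuition, a minimal numpy sketch of the EM iteration for a k-component mixture of linear regressions, assuming equal mixing weights and a known noise level sigma (simplifications relative to the paper's setting; all names are illustrative):

```python
import numpy as np

def em_mlr(X, y, betas, sigma=1.0, iters=50):
    """EM for a k-component mixture of linear regressions with equal
    weights and known noise level sigma (a simplifying assumption)."""
    n, d = X.shape
    k = len(betas)
    for _ in range(iters):
        # E-step: responsibilities from Gaussian residual likelihoods.
        resid = y[:, None] - X @ np.stack(betas, axis=1)     # (n, k)
        logp = -0.5 * (resid / sigma) ** 2
        logp -= logp.max(axis=1, keepdims=True)              # stabilize
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted least squares for each component.
        betas = [
            np.linalg.solve(
                X.T @ (r[:, [j]] * X) + 1e-8 * np.eye(d),
                X.T @ (r[:, j] * y),
            )
            for j in range(k)
        ]
    return betas
```

The "well-initialized" requirement in the result above corresponds to starting this loop from betas already close to the true parameters, e.g., from a moment-method initialization.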
Liu, Liu, Yanyao Shen, Tianyang Li, and Constantine Caramanis. “High Dimensional Robust Sparse Regression.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 411–21. PMLR, 2020.
@inproceedings{liu2020high,
title = {High dimensional robust sparse regression},
author = {Liu, Liu and Shen, Yanyao and Li, Tianyang and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {411--421},
year = {2020},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1805.11643.pdf}
}
We provide a novel, and to the best of our knowledge the first, algorithm for high dimensional sparse regression with a constant fraction of corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters with sub-linear sample complexity, in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators: when the covariance matrix in sparse regression is the identity, our error guarantee is near information-theoretically optimal. We then deal with robust sparse regression with an unknown structured covariance matrix. We propose a filtering algorithm which consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: the filtering algorithm is flexible enough to deal with unknown covariance. It is also orderwise more computationally efficient than the ellipsoid algorithm. Using sub-linear sample complexity, our algorithm achieves the best known (and first) error guarantee. We demonstrate its effectiveness on large-scale sparse regression problems with arbitrary corruptions.
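A rough sketch of the flavor of a robust Iterative Hard Thresholding step, with a coordinate-wise trimmed-mean gradient standing in for the paper's more careful outlier-removal machinery (all names and constants are illustrative):

```python
import numpy as np

def trimmed_mean(G, frac=0.1):
    """Coordinate-wise trimmed mean of per-sample gradients G (n, d):
    discard the largest and smallest frac of entries per coordinate."""
    n = G.shape[0]
    m = int(frac * n)
    Gs = np.sort(G, axis=0)
    return Gs[m: n - m].mean(axis=0)

def robust_iht(X, y, s, lr=0.5, iters=100):
    """Iterative Hard Thresholding with a robustified gradient: take a
    trimmed-mean gradient step, then keep the s largest coordinates."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        G = (X @ w - y)[:, None] * X          # per-sample gradients (n, d)
        w -= lr * trimmed_mean(G)
        small = np.argsort(np.abs(w))[: d - s]
        w[small] = 0.0                        # hard threshold to sparsity s
    return w
```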
Zhuo, Jiacheng, Qi Lei, Alex Dimakis, and Constantine Caramanis. “Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-Norm Balls.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 1464–74. PMLR, 2020.
@inproceedings{zhuo2020communication,
title = {Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-norm Balls},
author = {Zhuo, Jiacheng and Lei, Qi and Dimakis, Alex and Caramanis, Constantine},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {1464--1474},
year = {2020},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1910.07703.pdf}
}
Large-scale machine learning training suffers from two challenges, especially for nuclear-norm constrained problems on distributed systems: synchronization slowdowns due to straggling workers, and high communication costs. In this work, we propose an asynchronous Stochastic Frank-Wolfe (SFW-asyn) method which, for the first time, solves the two problems simultaneously while maintaining the same convergence rate as vanilla SFW. We implement our algorithm in Python (with MPI) to run on Amazon EC2, and demonstrate that SFW-asyn yields speed-ups almost linear in the number of machines compared to vanilla SFW.
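For reference, a minimal synchronous stochastic Frank-Wolfe loop over the nuclear-norm ball; the linear minimization oracle is the rank-one matrix built from the top singular pair of the gradient. The asynchronous, communication-efficient machinery that is the paper's contribution is omitted, and `grad_fn` is a hypothetical stochastic-gradient callback:

```python
import numpy as np

def sfw_nuclear(grad_fn, shape, tau, iters=100):
    """Synchronous stochastic Frank-Wolfe over {W : ||W||_* <= tau}.
    grad_fn(W) returns a stochastic gradient of the loss at W."""
    W = np.zeros(shape)
    for t in range(iters):
        G = grad_fn(W)
        # Linear minimization oracle: rank-1 vertex from the top
        # singular pair of the gradient.
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        S = -tau * np.outer(U[:, 0], Vt[0])
        gamma = 2.0 / (t + 2.0)               # standard FW step size
        W = (1 - gamma) * W + gamma * S
    return W
```

In practice the full SVD would be replaced by a few power iterations for the top singular pair, which is what makes Frank-Wolfe attractive over projection-based methods here.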
Jalal, Ajil, Liu Liu, Alexandros G Dimakis, and Constantine Caramanis. “Robust Compressed Sensing of Generative Models.” In Advances in Neural Information Processing Systems (NeurIPS), Vol. 33, 2020.
@inproceedings{jalal2020robust,
title = {Robust compressed sensing of generative models},
author = {Jalal, Ajil and Liu, Liu and Dimakis, Alexandros G and Caramanis, Constantine},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2020},
volume = {33},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/2006.09461.pdf},
code = {https://github.com/ajiljalal/csgm-robust-neurips}
}
The goal of compressed sensing is to estimate a high dimensional vector from an underdetermined system of noisy linear equations. In analogy to classical compressed sensing, here we assume a generative model as a prior, that is, we assume the vector is represented by a deep generative model G: R^k → R^n. Classical recovery approaches such as empirical risk minimization (ERM) are guaranteed to succeed when the measurement matrix is sub-Gaussian. However, when the measurement matrix and measurements are heavy-tailed or have outliers, recovery may fail dramatically. In this paper we propose an algorithm inspired by the Median of Means (MOM). Our algorithm guarantees recovery for heavy-tailed data, even in the presence of outliers. Theoretically, our results show our novel MOM-based algorithm enjoys the same sample complexity guarantees as ERM under sub-Gaussian assumptions. Our experiments validate both aspects of our claims: other algorithms are indeed fragile and fail under heavy-tailed and/or corrupted data, while our approach exhibits the predicted robustness.
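A small sketch of the Median-of-Means idea the abstract refers to, applied to the recovery loss for a generic generator G (hypothetical signature; the paper's actual algorithm and analysis are more involved):

```python
import numpy as np

def mom_loss(z, A, y, G, num_batches=9, seed=0):
    """Median-of-means surrogate for the recovery loss ||A G(z) - y||^2:
    split measurements into batches and return the median batch loss.
    G is any generator mapping z to a signal (hypothetical signature)."""
    rng = np.random.default_rng(seed)
    batches = np.array_split(rng.permutation(len(y)), num_batches)
    sq = (A @ G(z) - y) ** 2
    return float(np.median([sq[b].mean() for b in batches]))
```

Recovery would proceed by gradient steps on z through the median batch, so that a few wild measurements cannot dominate the objective.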
Hoffmann, Jessica, and Constantine Caramanis. “Learning Graphs from Noisy Epidemic Cascades.” Proceedings of the ACM on Measurement and Analysis of Computing Systems 3, no. 2 (2019): 1–34.
@article{hoffmann2019learning,
title = {Learning graphs from noisy epidemic cascades},
author = {Hoffmann, Jessica and Caramanis, Constantine},
journal = {Proceedings of the ACM on Measurement and Analysis of Computing Systems},
volume = {3},
number = {2},
pages = {1--34},
year = {2019},
publisher = {ACM New York, NY, USA},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1903.02650.pdf}
}
We consider the problem of learning the weighted edges of a graph by observing the noisy times of infection for multiple epidemic cascades on this graph. Past work has considered this problem when the cascade information, i.e., infection times, are known exactly. Though the noisy setting is well motivated by many epidemic processes (e.g., most human epidemics), to the best of our knowledge, very little is known about when it is solvable. Previous work on the no-noise setting critically uses the ordering information. If noise can reverse this – a node’s reported (noisy) infection time comes after the reported infection time of some node it infected – then we are unable to see how previous results can be extended.
We therefore tackle two versions of the noisy setting: the limited-noise setting, where we know noisy times of infections, and the extreme-noise setting, in which we only know whether or not a node was infected. We provide a polynomial time algorithm for recovering the structure of bidirectional trees in the extreme-noise setting, and show our algorithm almost matches lower bounds established in the no-noise setting, and hence is optimal up to log-factors. We extend our results for general degree-bounded graphs, where again we show that our (poly-time) algorithm can recover the structure of the graph with optimal sample complexity. We also provide the first efficient algorithm to learn the weights of the bidirectional tree in the limited-noise setting. Finally, we give a polynomial time algorithm for learning the weights of general bounded-degree graphs in the limited-noise setting. This algorithm extends to general graphs (at the price of exponential running time), proving the problem is solvable in the general case. All our algorithms work for any noise distribution, without any restriction on the variance.
Lei, Qi, Jiacheng Zhuo, Constantine Caramanis, Inderjit S Dhillon, and Alexandros G Dimakis. “Primal-Dual Block Generalized Frank-Wolfe.” Advances in Neural Information Processing Systems (NeurIPS) 32 (2019): 13866–75.
@article{lei2019primal,
title = {Primal-dual block generalized frank-wolfe},
author = {Lei, Qi and Zhuo, Jiacheng and Caramanis, Constantine and Dhillon, Inderjit S and Dimakis, Alexandros G},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {32},
pages = {13866--13875},
year = {2019},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1906.02436.pdf}
}
We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs, and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining a linear convergence rate. The per-iteration cost of our method depends on the structural complexity of the solution (i.e., sparsity/low-rank) instead of the ambient dimension. We empirically show that our algorithm outperforms the state-of-the-art methods on (multi-class) classification tasks.
Kwon, Jeongyeol, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis. “Global Convergence of the EM Algorithm for Mixtures of Two Component Linear Regression.” In Conference on Learning Theory (COLT), 2055–2110. PMLR, 2019.
@inproceedings{kwon2019global,
title = {Global convergence of the EM algorithm for mixtures of two component linear regression},
author = {Kwon, Jeongyeol and Qian, Wei and Caramanis, Constantine and Chen, Yudong and Davis, Damek},
booktitle = {Conference on Learning Theory (COLT)},
pages = {2055--2110},
year = {2019},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1810.05752.pdf}
}
The Expectation-Maximization algorithm is perhaps the most broadly used algorithm for inference in latent variable problems. A theoretical understanding of its performance, however, remains largely lacking. Recent results established that EM enjoys global convergence for Gaussian Mixture Models. For Mixed Linear Regression, however, only local convergence results have been established, and those only for the high SNR regime. We show here that EM converges for mixed linear regression with two components (it is known that it may fail to converge for three or more), and moreover that this convergence holds for random initialization. Our analysis reveals that EM exhibits very different behavior in Mixed Linear Regression from its behavior in Gaussian Mixture Models, and hence our proofs require the development of several new ideas.
Katiyar, Ashish, Jessica Hoffmann, and Constantine Caramanis. “Robust Estimation of Tree Structured Gaussian Graphical Models.” In International Conference on Machine Learning (ICML), 3292–3300. PMLR, 2019.
@inproceedings{katiyar2019robust,
title = {Robust estimation of tree structured Gaussian graphical models},
author = {Katiyar, Ashish and Hoffmann, Jessica and Caramanis, Constantine},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {3292--3300},
year = {2019},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1901.08770.pdf}
}
Consider jointly Gaussian random variables whose conditional independence structure is specified by a graphical model. If we observe realizations of the variables, we can compute the covariance matrix, and it is well known that the support of the inverse covariance matrix corresponds to the edges of the graphical model. Instead, suppose we only have noisy observations. If the noise at each node is independent, we can compute the sum of the covariance matrix and an unknown diagonal. The inverse of this sum is (in general) dense. We ask: can the original independence structure be recovered? We address this question for tree structured graphical models. We prove that this problem is unidentifiable, but show that this unidentifiability is limited to a small class of candidate trees. We further present additional constraints under which the problem is identifiable. Finally, we provide an O(n^3) algorithm to find this equivalence class of trees.
Hoffmann, Jessica, and Constantine Caramanis. “The Cost of Uncertainty in Curing Epidemics.” Proceedings of the ACM on Measurement and Analysis of Computing Systems 2, no. 2 (2018): 1–33.
@article{hoffmann2018cost,
title = {The Cost of Uncertainty in Curing Epidemics},
author = {Hoffmann, Jessica and Caramanis, Constantine},
journal = {Proceedings of the ACM on Measurement and Analysis of Computing Systems},
volume = {2},
number = {2},
pages = {1--33},
year = {2018},
publisher = {ACM New York, NY, USA},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1711.00167.pdf}
}
Motivated by the study of controlling (curing) epidemics, we consider the spread of an SI process on a known graph, where we have a limited budget to use to transition infected nodes back to the susceptible state (i.e., to cure nodes). Recent work has demonstrated that under perfect and instantaneous information (which nodes are/are not infected), the budget required for curing a graph precisely depends on a combinatorial property called the CutWidth. We show that this assumption is in fact necessary: even a minor degradation of perfect information, e.g., a diagnostic test that is 99% accurate, drastically alters the landscape. Infections that could previously be cured in sublinear time now may require exponential time, or an orderwise larger budget, to cure. The crux of the issue comes down to a tension not present in the full information case: if a node is suspected (but not certain) to be infected, do we risk wasting our budget trying to cure an uninfected node, or increase our certainty by longer observation, at the risk that the infection spreads further? Our results present fundamental, algorithm-independent bounds that trade off the required budget against uncertainty.
Sinno, Zeina, Constantine Caramanis, and Alan Bovik. “Second Order Natural Scene Statistics Model of Blind Image Quality Assessment.” In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1238–42. IEEE, 2018.
@inproceedings{sinno2018second,
title = {Second order natural scene statistics model of blind image quality assessment},
author = {Sinno, Zeina and Caramanis, Constantine and Bovik, Alan},
booktitle = {2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages = {1238--1242},
year = {2018},
organization = {IEEE},
group = {proceedings}
}
The univariate statistics of bandpass-filtered images provide powerful features that drive many successful image quality assessment (IQA) algorithms. Bivariate Natural Scene Statistics (NSS), which model the joint statistics of multiple bandpass image samples, also provide potentially powerful features to assess the perceptual quality of images, by capturing both image and distortion correlations. Here, we make the first attempt to use bivariate NSS features to build a model of no-reference image quality prediction. We show that our bivariate model outperforms existing state-of-the-art image quality predictors.
Li, Tianyang, Liu Liu, Anastasios Kyrillidis, and Constantine Caramanis. “Statistical Inference Using SGD.” In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32, 2018.
@inproceedings{li2018statistical,
title = {Statistical inference using SGD},
author = {Li, Tianyang and Liu, Liu and Kyrillidis, Anastasios and Caramanis, Constantine},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {32},
number = {1},
year = {2018},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1705.07477.pdf}
}
We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
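A minimal numpy sketch of the core procedure for least squares with a fixed step size (names and constants are illustrative; the paper also prescribes how to scale the averages to obtain valid confidence intervals):

```python
import numpy as np

def sgd_average(X, y, step=0.05, burnin=200, iters=2000, seed=0):
    """Fixed-step-size SGD for least squares; returns the average of the
    iterates after burn-in. Per the paper, such averages are asymptotically
    normal around the true parameter after proper scaling."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    avg = np.zeros(d)
    for t in range(iters):
        i = rng.integers(n)
        w -= step * (X[i] @ w - y[i]) * X[i]   # single-sample gradient step
        if t >= burnin:
            avg += w
    return avg / (iters - burnin)
```

One simple variant of the idea: repeat the averaging over several independent SGD runs and form confidence intervals from the empirical spread of the resulting averages.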
Li, Tianyang, Xinyang Yi, Constantine Caramanis, and Pradeep Ravikumar. “Minimax Gaussian Classification & Clustering.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 1–9. PMLR, 2017.
@inproceedings{li2017minimax,
title = {Minimax gaussian classification \& clustering},
author = {Li, Tianyang and Yi, Xinyang and Caramanis, Constantine and Ravikumar, Pradeep},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {1--9},
year = {2017},
organization = {PMLR},
group = {proceedings}
}
We present minimax bounds for classification and clustering error in the setting where covariates are drawn from a mixture of two isotropic Gaussian distributions. Here, we define clustering error in a discriminative fashion, demonstrating fundamental connections between classification (supervised) and clustering (unsupervised). For both classification and clustering, our lower bounds show that without enough samples, the best any classifier or clustering rule can do is close to random guessing. For classification, as part of our upper bound analysis, we show that Fisher's linear discriminant achieves a fast minimax rate Θ(1/n) with enough samples n. For clustering, as part of our upper bound analysis, we show that a clustering rule constructed using principal component analysis achieves the minimax rate with enough samples. We also provide lower and upper bounds for the high-dimensional sparse setting where the dimensionality of the covariates p is potentially larger than the number of samples n, but where the difference between the Gaussian means is sparse.
Park, Dohyung, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi. “Non-Square Matrix Sensing without Spurious Local Minima via the Burer-Monteiro Approach.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 65–74. PMLR, 2017.
@inproceedings{park2017non,
title = {Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach},
author = {Park, Dohyung and Kyrillidis, Anastasios and Caramanis, Constantine and Sanghavi, Sujay},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {65--74},
year = {2017},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1609.03240.pdf}
}
We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-r matrix X ∈ R^{m×n} is represented as UV^⊤, with U ∈ R^{m×r} and V ∈ R^{n×r}. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP.
Yi, Xinyang, Dohyung Park, Yudong Chen, and Constantine Caramanis. “Fast Algorithms for Robust PCA via Gradient Descent.” Advances in Neural Information Processing Systems (NeurIPS), 2016.
@article{yi2016fast,
title = {Fast algorithms for robust PCA via gradient descent},
author = {Yi, Xinyang and Park, Dohyung and Chen, Yudong and Caramanis, Constantine},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2016},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1605.07784.pdf}
}
We consider the problem of Robust PCA in the fully and partially observed settings. Without corruptions, this is the well-known matrix completion problem. From a statistical standpoint this problem has been recently well-studied, and conditions on when recovery is possible (how many observations do we need, how many corruptions can we tolerate) via polynomial-time algorithms are by now understood. This paper presents and analyzes a non-convex optimization approach that greatly reduces the computational complexity of the above problems, compared to the best available algorithms. In particular, in the fully observed case, with r denoting rank and d dimension, we reduce the complexity from O(r^2 d^2 log(1/ε)) to O(r d^2 log(1/ε)), a substantial savings when the rank is large. For the partially observed case, we show the complexity of our algorithm is no more than O(r^4 d log d log(1/ε)). Not only is this the best-known run-time for a provable algorithm under partial observation, but in the setting where r is small compared to d, it also allows for near-linear-in-d run-time that can be exploited in the fully-observed case as well, by simply running our algorithm on a subset of the observations.
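A schematic of the non-convex approach, alternating a crude sparse-corruption estimate with gradient steps on the low-rank factors (a sketch under simplifying assumptions; the paper's algorithm uses a more careful per-row/per-column sorting-based thresholding and tuned step sizes):

```python
import numpy as np

def sparse_threshold(R, frac):
    """Crude global corruption estimate: keep only the largest-magnitude
    entries of R (the paper thresholds per row/column instead)."""
    k = int(frac * R.size)
    if k == 0:
        return np.zeros_like(R)
    cut = np.partition(np.abs(R).ravel(), -k)[-k]
    return np.where(np.abs(R) >= cut, R, 0.0)

def robust_pca_gd(M, r, frac=0.1, lr=0.1, iters=200, seed=0):
    """Alternate a sparse-corruption estimate with gradient steps on the
    low-rank factors U, V of 0.5*||U V^T + S - M||_F^2. Step size and
    initialization here are illustrative, not the paper's prescription."""
    d1, d2 = M.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((d1, r)) / np.sqrt(d1)
    V = rng.standard_normal((d2, r)) / np.sqrt(d2)
    for _ in range(iters):
        S = sparse_threshold(M - U @ V.T, frac)   # corruption estimate
        R = U @ V.T + S - M                       # residual
        U, V = U - lr * R @ V, V - lr * R.T @ U   # factored gradient step
    return U @ V.T
```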
Yi, Xinyang, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, and Han Liu. “More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning.” Advances in Neural Information Processing Systems (NeurIPS), 2016.
@article{yi2019more,
title = {More supervision, less computation: statistical-computational tradeoffs in weakly supervised learning},
author = {Yi, Xinyang and Wang, Zhaoran and Yang, Zhuoran and Caramanis, Constantine and Liu, Han},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
year = {2016},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1907.06257.pdf}
}
We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability 1 - α. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the statistical accuracies and computational efficiency of these algorithms depend on the degree of supervision, which is quantified by α. In this paper, we characterize the effect of α by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and that achievable by polynomial-time algorithms under an oracle computational model. For small α, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as α increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy.
Wang, Ye, Constantine Caramanis, and Michael Orshansky. “Exploiting Randomness in Sketching for Efficient Hardware Implementation of Machine Learning Applications.” In Proceedings of the 35th International Conference on Computer-Aided Design, 1–8, 2016.
@inproceedings{wang2016exploiting,
title = {Exploiting randomness in sketching for efficient hardware implementation of machine learning applications},
author = {Wang, Ye and Caramanis, Constantine and Orshansky, Michael},
booktitle = {Proceedings of the 35th International Conference on Computer-Aided Design},
pages = {1--8},
year = {2016},
group = {proceedings}
}
Energy-efficient processing of large matrices for big-data applications using hardware acceleration is an intense area of research. Sketching large matrices into lower-dimensional representations is an effective strategy. For the first time, this paper develops a highly energy-efficient hardware implementation of a class of sketching methods based on random projections, known as the Johnson-Lindenstrauss (JL) transform. Crucially, we show how the randomness inherent in the projection matrix can be exploited to create highly efficient fixed-point arithmetic realizations of several machine-learning applications. Specifically, the transform’s random matrices have two key properties that allow for significant energy gains in hardware implementations. The first is the smoothing property, which allows us to drastically reduce operand bit-width in the computation of the JL transform itself. The second is the randomizing property, which allows bit-width reduction in subsequent machine-learning applications. Further, we identify a random matrix construction method that exploits the special sparsity structure to yield the most hardware-efficient realization, and we implement the highly optimized transform on an FPGA. Experimental results on (1) k-nearest neighbor (KNN) classification and (2) principal component analysis (PCA) show that with the same bit-width, the proposed flow utilizing random projection achieves up to a 7x improvement in both latency and energy. Furthermore, by exploiting the smoothing and randomizing properties we are able to use a 1-bit instead of a 4-bit multiplier within KNN, which results in an additional 50% and 6% improvement in area and energy, respectively. The proposed I/O streaming strategy, along with the hardware-efficient JL algorithm we identify, achieves a 50% runtime reduction and a 17% area reduction in the random projection stage compared to a standard design.
———. “PolyGP: Improving GP-Based Analog Optimization through Accurate High-Order Monomials and Semidefinite Relaxation.” In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1423–28. IEEE, 2016.
@inproceedings{wang2016polygp,
title = {PolyGP: Improving GP-based analog optimization through accurate high-order monomials and semidefinite relaxation},
author = {Wang, Ye and Caramanis, Constantine and Orshansky, Michael},
booktitle = {2016 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
pages = {1423--1428},
year = {2016},
organization = {IEEE},
group = {proceedings}
}
Geometric programming (GP) is popular for equation-based optimization of analog circuits thanks to GP-compatible analog performance functions and its convexity, and hence computational tractability. The main challenge in using GP, and thus a roadblock to wider adoption, is the mismatch between what GP can accurately fit and the behavior of many common device/circuit functions. In this paper, we leverage recent tools from sums-of-squares, moment optimization, and semidefinite optimization (SDP) to present a novel and powerful extension that addresses the monomial inaccuracy: fitting device models as higher-order monomials, defined as exponential functions of polynomials in the logarithmic variables. With the introduction of high-order monomials, the original GP problems become polynomial geometric programming (PolyGP) problems with non-linear and non-convex objectives and constraints. Our PolyGP framework allows significant improvements in model accuracy when symbolic performance functions in terms of device models are present. Via SDP relaxations inspired by polynomial optimization (POP), we obtain efficient near-optimal global solutions to the resulting PolyGP. Experimental results on established circuits show that, compared to GP, we reduce the fitting error of device models from 10.5% to 3.5% on average. Accordingly, the fitting error of performance functions decreases from 12% for GP and 9% for POP to 3%. This translates into the ability to identify superior solution points and a dramatic decrease in constraint violation relative to both GP and POP.
Yi, Xinyang, Zhaoran Wang, Constantine Caramanis, and Han Liu. “Optimal Linear Estimation under Unknown Nonlinear Transform.” Advances in Neural Information Processing Systems (NeurIPS) 28 (2015): 1549.
@article{yi2015optimal,
title = {Optimal linear estimation under unknown nonlinear transform},
author = {Yi, Xinyang and Wang, Zhaoran and Caramanis, Constantine and Liu, Han},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {28},
pages = {1549},
year = {2015},
publisher = {NIH Public Access},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1505.03257.pdf}
}
Linear regression studies the problem of estimating a model parameter β^* ∈ R^p from n observations (y_i, x_i)_{i=1}^n drawn from the linear model y_i = ⟨x_i, β^*⟩ + ε_i. We consider a significant generalization in which the relationship between ⟨x_i, β^*⟩ and y_i is noisy, quantized to a single bit, potentially nonlinear, noninvertible, as well as unknown. This model is known as the single-index model in statistics, and, among other things, it represents a significant generalization of one-bit compressed sensing. We propose a novel spectral-based estimation procedure and show that we can recover β^* in settings (i.e., classes of link function f) where previous algorithms fail. In general, our algorithm requires only very mild restrictions on the (unknown) functional relationship between y_i and ⟨x_i, β^*⟩. We also consider the high-dimensional setting where β^* is sparse, and introduce a two-stage nonconvex framework that addresses estimation challenges in high-dimensional regimes where p ≫ n. For a broad class of link functions between ⟨x_i, β^*⟩ and y_i, we establish minimax lower bounds that demonstrate the optimality of our estimators in both the classical and high-dimensional regimes.
Yi, Xinyang, and Constantine Caramanis. “Regularized EM Algorithms: A Unified Framework and Statistical Guarantees.” Advances in Neural Information Processing Systems (NeurIPS) 28 (2015): 1549.
@article{yi2015regularized,
title = {Regularized EM algorithms: A unified framework and statistical guarantees},
author = {Yi, Xinyang and Caramanis, Constantine},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {28},
pages = {1549},
year = {2015},
publisher = {NIH Public Access},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1511.08551.pdf}
}
Latent variable models are a fundamental modeling tool in machine learning applications, but they present significant computational and analytical challenges. The popular EM algorithm and its variants are much-used algorithmic tools; yet our rigorous understanding of their performance is highly incomplete. Recently, work in Balakrishnan et al. (2014) has demonstrated that for an important class of problems, EM exhibits linear local convergence. In the high-dimensional setting, however, the M-step may not be well defined. We address precisely this setting through a unified treatment using regularization. While regularization for high-dimensional problems is by now well understood, the iterative EM algorithm requires a careful balancing of making progress towards the solution while identifying the right structure (e.g., sparsity or low-rank). In particular, regularizing the M-step using the state-of-the-art high-dimensional prescriptions (e.g., Wainwright (2014)) is not guaranteed to provide this balance. Our algorithm and analysis are linked in a way that reveals the balance between optimization and statistical errors. We specialize our general framework to sparse Gaussian mixture models, high-dimensional mixed regression, and regression with missing variables, obtaining statistical guarantees for each of these examples.
Meirom, Eli A, Chris Milling, Constantine Caramanis, Shie Mannor, Sanjay Shakkottai, and Ariel Orda. “Localized Epidemic Detection in Networks with Overwhelming Noise.” ACM SIGMETRICS Performance Evaluation Review 43, no. 1 (2015): 441–42.
@article{meirom2015localized,
title = {Localized epidemic detection in networks with overwhelming noise},
author = {Meirom, Eli A and Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay and Orda, Ariel},
journal = {ACM SIGMETRICS Performance Evaluation Review},
volume = {43},
number = {1},
pages = {441--442},
year = {2015},
publisher = {ACM New York, NY, USA},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1402.1263.pdf}
}
We consider the problem of detecting an epidemic in a population where individual diagnoses are extremely noisy. The motivation for this problem is the plethora of examples (influenza strains in humans, or computer viruses in smartphones, etc.) where reliable diagnoses are scarce, but noisy data plentiful. In flu/phone-viruses, exceedingly few infected people/phones are professionally diagnosed (only a small fraction go to a doctor) but less reliable secondary signatures (e.g., people staying home, or greater-than-typical upload activity) are more readily available. These secondary data are often plagued by unreliability: many people with the flu do not stay home, and many people that stay home do not have the flu. This paper identifies the precise regime where knowledge of the contact network enables finding the needle in the haystack: we provide a distributed, efficient and robust algorithm that can correctly identify the existence of a spreading epidemic from highly unreliable local data. Our algorithm requires only local-neighbor knowledge of this graph, and in a broad array of settings that we describe, succeeds even when false negatives and false positives make up an overwhelming fraction of the data available. Our results show it succeeds in the presence of partial information about the contact network, and also when there is not a single "patient zero", but rather many (hundreds, in our examples) of initial patient-zeroes, spread across the graph.
Milling, Chris, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. “Local Detection of Infections in Heterogeneous Networks.” In 2015 IEEE Conference on Computer Communications (INFOCOM), 1517–25. IEEE, 2015.
@inproceedings{milling2015local,
title = {Local detection of infections in heterogeneous networks},
author = {Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay},
booktitle = {2015 IEEE Conference on Computer Communications (INFOCOM)},
pages = {1517--1525},
year = {2015},
organization = {IEEE},
group = {proceedings}
}
In many networks the operator is faced with nodes that report a potentially important phenomenon such as failures, illnesses, and viruses. The operator is faced with the question: Is it spreading over the network, or simply occurring at random? We seek to answer this question from highly noisy and incomplete data, where at a single point in time we are given a possibly very noisy subset of the infected population (including false positives and negatives). While previous work has focused on uniform spreading rates for the infection, heterogeneous graphs with unequal edge weights are more faithful models of reality. Critically, the network structure may not be fully known and modeling epidemic spread on unknown graphs relies on non-homogeneous edge (spreading) weights. Such heterogeneous graphs pose considerable challenges, requiring both algorithmic and analytical development. We develop an algorithm that can distinguish between a spreading phenomenon and a randomly occurring phenomenon while using only local information and not knowing the complete network topology and the weights. Further, we show that this algorithm can succeed even in the presence of noise, false positives and unknown graph edges.
Yi, Xinyang, Constantine Caramanis, and Eric Price. “Binary Embedding: Fundamental Limits and Fast Algorithm.” In International Conference on Machine Learning (ICML), 2162–70. PMLR, 2015.
@inproceedings{yi2015binary,
title = {Binary embedding: Fundamental limits and fast algorithm},
author = {Yi, Xinyang and Caramanis, Constantine and Price, Eric},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {2162--2170},
year = {2015},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1502.05746.pdf}
}
Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for arbitrary N distinct points on the unit sphere S^{p-1}, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δ uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound showing that any binary embedding oblivious to the set of points requires m = Ω(log N/δ^2) bits, and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) [DELETED, see comment]; (3) we also provide an analytic result about embedding a general set of points K ⊆ S^{p-1}, possibly of infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.
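For context, the classical oblivious construction the lower bound speaks to is the sign of a Gaussian projection, under which normalized Hamming distance concentrates around geodesic distance over π (the paper's fast algorithm replaces the dense Gaussian matrix with structured projections):

```python
import numpy as np

def binary_embed(X, m, seed=0):
    """Sign of a dense Gaussian projection: each row of X (assumed on the
    unit sphere) is mapped to an m-bit code."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, X.shape[1]))
    return (X @ A.T > 0).astype(np.uint8)

def normalized_hamming(b1, b2):
    """Fraction of disagreeing bits; concentrates around angle(x1, x2)/pi."""
    return float(np.mean(b1 != b2))
```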
Wang, Ye, Meng Li, Xinyang Yi, Zhao Song, Michael Orshansky, and Constantine Caramanis. “Novel Power Grid Reduction Method Based on l1 Regularization.” In Proceedings of the 52nd Annual Design Automation Conference, 1–6, 2015.
@inproceedings{wang2015novel,
title = {Novel power grid reduction method based on l1 regularization},
author = {Wang, Ye and Li, Meng and Yi, Xinyang and Song, Zhao and Orshansky, Michael and Caramanis, Constantine},
booktitle = {Proceedings of the 52nd Annual Design Automation Conference},
pages = {1--6},
year = {2015},
group = {proceedings}
}
Model order reduction that exploits the spectral properties of the admittance matrix, known as the graph Laplacian, to control the approximation accuracy is a promising new class of approaches to power grid analysis. In this paper we introduce a method that allows a dramatic increase in the resulting graph sparsity and can handle large dense input graphs. The method is based on the observation that information about the realistic ranges of port currents can be used to significantly improve the resulting graph sparsity. In practice, port currents cannot vary unboundedly, and estimates of peak currents are often available early in the design cycle. However, the existing methods, including the sampling-based spectral sparsification approach [11], cannot utilize this information.
We propose a novel framework of graph Sparsification by L1 regularization on Laplacians (SparseLL) to exploit the available range information to achieve a higher degree of sparsity and better approximation quality. By formulating the power grid reduction as a sparsity-inducing optimization problem, we leverage the recent progress in stochastic approximation and develop a stochastic gradient descent algorithm as an efficient solution.
Using established benchmarks for experiments, we demonstrate that SparseLL can achieve an up to 10X edge sparsity improvement compared to the spectral sparsification approach assuming the full range of currents, with an up to 10X accuracy improvement. The running time of our algorithm also scales quite favorably due to the low complexity and fast convergence, which leads us to believe that our algorithm is highly suitable for large-scale dense problems.
Park, Dohyung, Constantine Caramanis, and Sujay Sanghavi. “Greedy Subspace Clustering.” Advances in Neural Information Processing Systems (NeurIPS) 27 (2014): 2753–61.
@article{park2014greedy,
title = {Greedy subspace clustering},
author = {Park, Dohyung and Caramanis, Constantine and Sanghavi, Sujay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {27},
pages = {2753--2761},
year = {2014},
publisher = {NIH Public Access},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1410.8864.pdf}
}
We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets to estimate the subspaces. As the geometric structure of the clusters (linear subspaces) prevents general distance-based approaches such as K-means from performing well, many model-specific methods have been proposed. In this paper, we provide new simple and efficient algorithms for this problem. Our statistical analysis shows that the algorithms are guaranteed exact (perfect) clustering performance under certain conditions on the number of points and the affinity between subspaces. These conditions are weaker than those considered in the standard statistical literature. Experimental results on synthetic data generated from the standard unions-of-subspaces model demonstrate our theory. We also show that our algorithm performs competitively against state-of-the-art algorithms on real-world applications such as motion segmentation and face clustering, with much simpler implementation and lower computational cost.
Yi, Xinyang, Constantine Caramanis, and Sujay Sanghavi. “Alternating Minimization for Mixed Linear Regression.” In International Conference on Machine Learning (ICML), 613–21. PMLR, 2014.
@inproceedings{yi2014alternating,
title = {Alternating minimization for mixed linear regression},
author = {Yi, Xinyang and Caramanis, Constantine and Sanghavi, Sujay},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {613--621},
year = {2014},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1310.3745.pdf}
}
Mixed linear regression involves the recovery of two (or more) unknown vectors from unlabeled linear measurements; that is, where each sample comes from exactly one of the vectors, but we do not know which one. It is a classic problem, and the natural and empirically most popular approach to its solution has been the EM algorithm. As in other settings, this is prone to bad local minima; however, each iteration is very fast (alternating between guessing labels, and solving with those labels).
In this paper we provide a new initialization procedure for EM, based on finding the leading two eigenvectors of an appropriate matrix. We then show that with this, a re-sampled version of the EM algorithm provably converges to the correct vectors, under natural assumptions on the sampling distribution, and with nearly optimal (unimprovable) sample complexity. This provides not only the first characterization of EM’s performance, but also much lower sample complexity as compared to both standard (randomly initialized) EM, and other methods for this problem.
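A minimal numpy rendering of the two stages: spectral initialization from the y²-weighted covariance, followed by alternating minimization. The paper analyzes a re-sampled variant and additionally searches within the two-dimensional top eigenspace to set signs and scales; this sketch reuses all samples and skips that refinement, so it is illustrative only:

```python
import numpy as np

def init_mlr(X, y):
    """Spectral initialization in the spirit of the paper: the top two
    eigenvectors of the y^2-weighted covariance span the two vectors."""
    M = (X * (y ** 2)[:, None]).T @ X / len(y)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -1], vecs[:, -2]

def alt_min(X, y, b1, b2, iters=20):
    """Alternating minimization: label each sample with the better-fitting
    vector, then re-solve least squares per label."""
    d = X.shape[1]
    for _ in range(iters):
        use2 = np.abs(y - X @ b1) > np.abs(y - X @ b2)  # b2 fits better
        if (~use2).sum() >= d:
            b1 = np.linalg.lstsq(X[~use2], y[~use2], rcond=None)[0]
        if use2.sum() >= d:
            b2 = np.linalg.lstsq(X[use2], y[use2], rcond=None)[0]
    return b1, b2
```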
Chen, Yudong, Xinyang Yi, and Constantine Caramanis. “A Convex Formulation for Mixed Regression with Two Components: Minimax Optimal Rates.” In Conference on Learning Theory (COLT), 560–604. PMLR, 2014.
@inproceedings{chen2014convex,
title = {A convex formulation for mixed regression with two components: Minimax optimal rates},
author = {Chen, Yudong and Yi, Xinyang and Caramanis, Constantine},
booktitle = {Conference on Learning Theory (COLT)},
pages = {560--604},
year = {2014},
organization = {PMLR},
group = {proceedings},
arxiv = {https://arxiv.org/abs/1312.7006}
}
We consider the mixed regression problem with two components, under adversarial and stochastic noise. We give a convex optimization formulation that provably recovers the true solution, and provide upper bounds on the recovery errors for both arbitrary noise and stochastic noise settings. We also give matching minimax lower bounds (up to log factors), showing that under certain assumptions, our algorithm is information-theoretically optimal. Our results represent the first tractable algorithm guaranteeing successful recovery with tight bounds on recovery errors and sample complexity.
Wang, Ye, Michael Orshansky, and Constantine Caramanis. “Enabling Efficient Analog Synthesis by Coupling Sparse Regression and Polynomial Optimization.” In Proceedings of the 51st Annual Design Automation Conference, 1–6, 2014.
@inproceedings{wang2014enabling,
title = {Enabling efficient analog synthesis by coupling sparse regression and polynomial optimization},
author = {Wang, Ye and Orshansky, Michael and Caramanis, Constantine},
booktitle = {Proceedings of the 51st Annual Design Automation Conference},
pages = {1--6},
year = {2014},
group = {proceedings}
}
The challenge of equation-based analog synthesis comes from its dual nature: functions producing good least-squares fits to SPICE-generated data are non-convex, hence not amenable to efficient optimization. In this paper, we leverage recent progress on Semidefinite Programming (SDP) relaxations of polynomial (non-convex) optimization. Using a general polynomial allows for much more accurate fitting of SPICE data compared to more restricted functional forms. Recent SDP techniques for convex relaxations of polynomial optimization are powerful but alone still insufficient: even for small problems, the resulting relaxations are prohibitively high dimensional.
We harness these new polynomial tools and realize their promise by introducing a novel regression technique that fits non-convex polynomials with a special sparsity structure. We show that the coupled sparse fitting and optimization (CSFO) flow that we introduce allows us to find accurate high-order polynomials while keeping the resulting optimization tractable.
Using established circuits for optimization experiments, we demonstrate that by handling higher-order polynomials we reduce fitting error to 3.6% from 10%, on average. This translates into a dramatic increase in the rate of constraint satisfaction: for a 1% violation threshold, the success rate is increased from 0% to 78%.
Papailiopoulos, Dimitris, Ioannis Mitliagkas, Alexandros Dimakis, and Constantine Caramanis. “Finding Dense Subgraphs via Low-Rank Bilinear Optimization.” In Proceedings of the 31st International Conference on Machine Learning (ICML), edited by Eric P. Xing and Tony Jebara, 32:1890–98. PMLR, 2014.
@inproceedings{papailiopoulos2014finding,
title = {Finding Dense Subgraphs via Low-Rank Bilinear Optimization},
author = {Papailiopoulos, Dimitris and Mitliagkas, Ioannis and Dimakis, Alexandros and Caramanis, Constantine},
booktitle = {Proceedings of the 31st International Conference on Machine Learning (ICML)},
pages = {1890--1898},
year = {2014},
editor = {Xing, Eric P. and Jebara, Tony},
volume = {32},
number = {2},
publisher = {PMLR},
group = {proceedings}
}
Given a graph, the Densest k-Subgraph (DkS) problem asks for the subgraph on k vertices that contains the largest number of edges. In this work, we develop a novel algorithm for DkS that searches a low-dimensional space for provably good solutions. We obtain provable performance bounds that depend on the graph spectrum. One of our results is that if there exists a k-subgraph that contains a constant fraction of all the edges, we can approximate DkS within a factor arbitrarily close to two in polynomial time. Our algorithm runs in nearly linear time, under spectral assumptions satisfied by most graphs found in applications. Moreover, it is highly scalable and parallelizable. We demonstrate this by implementing it in MapReduce and executing numerous experiments on massive real-world graphs that have up to billions of edges. We empirically show that our algorithm can find subgraphs of significantly higher density compared to the previous state of the art.
Mitliagkas, Ioannis, Constantine Caramanis, and Prateek Jain. “Memory Limited, Streaming PCA.” In Advances in Neural Information Processing Systems (NeurIPS), Vol. 26. NIH Public Access, 2013. https://papers.nips.cc/paper/2013/file/76cf99d3614e23eabab16fb27e944bf9-Paper.pdf.
@inproceedings{mitliagkas2013memory,
title = {Memory Limited, Streaming PCA},
author = {Mitliagkas, Ioannis and Caramanis, Constantine and Jain, Prateek},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {26},
year = {2013},
publisher = {NIH Public Access},
url = {https://papers.nips.cc/paper/2013/file/76cf99d3614e23eabab16fb27e944bf9-Paper.pdf},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1307.0032.pdf}
}
We consider streaming, one-pass principal component analysis (PCA) in the high-dimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p^2) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike), which is the signal to be recovered. It is now well understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet all algorithms that provably achieve this have memory complexity O(p^2). Meanwhile, algorithms with memory complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) sample complexity, the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data.
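A sketch of a block stochastic power method consistent with the O(kp) memory budget described above; only the p-by-k accumulator S and the iterate Q are ever stored (block size and names are illustrative):

```python
import numpy as np

def block_streaming_pca(stream, p, k, block_size=500, seed=0):
    """One-pass block stochastic power method with O(kp) storage.
    `stream` yields p-dimensional samples one at a time."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((p, k)))
    S = np.zeros((p, k))
    count = 0
    for x in stream:
        S += np.outer(x, x @ Q)         # accumulate (x x^T) Q in O(kp) work
        count += 1
        if count == block_size:
            Q, _ = np.linalg.qr(S)      # orthonormalize once per block
            S = np.zeros((p, k))
            count = 0
    return Q
```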
Ye, Qiaoyang, Mazin Al-Shalash, Constantine Caramanis, and Jeffrey G Andrews. “Device-to-Device Modeling and Analysis with a Modified Matern Hardcore BS Location Model.” In 2013 IEEE Global Communications Conference (GLOBECOM), 1825–30. IEEE, 2013.
@inproceedings{ye2013device,
title = {Device-to-device modeling and analysis with a modified Matern hardcore BS location model},
author = {Ye, Qiaoyang and Al-Shalash, Mazin and Caramanis, Constantine and Andrews, Jeffrey G},
booktitle = {2013 IEEE Global Communications Conference (GLOBECOM)},
pages = {1825--1830},
year = {2013},
organization = {IEEE},
group = {proceedings}
}
———. “On/off Macrocells and Load Balancing in Heterogeneous Cellular Networks.” In 2013 IEEE Global Communications Conference (GLOBECOM), 3814–19. IEEE, 2013.
@inproceedings{ye2013off,
title = {On/off macrocells and load balancing in heterogeneous cellular networks},
author = {Ye, Qiaoyang and Al-Shalash, Mazin and Caramanis, Constantine and Andrews, Jeffrey G},
booktitle = {2013 IEEE global communications conference (GLOBECOM)},
pages = {3814--3819},
year = {2013},
organization = {IEEE},
group = {proceedings}
}
Khalek, Amin Abdel, Constantine Caramanis, and Robert W Heath. “Video Quality-Maximizing Resource Allocation and Scheduling with Statistical Delay Guarantees.” In 2013 IEEE Global Communications Conference (GLOBECOM), 1736–40. IEEE, 2013.
@inproceedings{khalek2013video,
title = {Video quality-maximizing resource allocation and scheduling with statistical delay guarantees},
author = {Khalek, Amin Abdel and Caramanis, Constantine and Heath, Robert W},
booktitle = {2013 IEEE Global Communications Conference (GLOBECOM)},
pages = {1736--1740},
year = {2013},
organization = {IEEE},
group = {proceedings}
}
Milling, Chris, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. “Detecting Epidemics Using Highly Noisy Data.” In Proceedings of the Fourteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing, 177–86, 2013.
@inproceedings{milling2013detecting,
title = {Detecting epidemics using highly noisy data},
author = {Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay},
booktitle = {Proceedings of the fourteenth ACM international symposium on Mobile ad hoc networking and computing},
pages = {177--186},
year = {2013},
group = {proceedings}
}
From cholera, AIDS/HIV, and malaria, to rumors and viral video, understanding the causative network behind an epidemic’s spread has repeatedly proven critical for managing the spread (controlling or encouraging, as the case may be). Our current approaches to understanding and predicting epidemics rely on scarce but exact/reliable expert diagnoses. This paper proposes a different way forward: use more readily available but also more noisy data, with many false negatives and false positives, to determine the causative network of an epidemic. Specifically, we consider an epidemic that spreads according to one of two networks. At some point in time we see a small random subsample (perhaps a vanishingly small fraction) of those infected, along with an order-wise similar number of false positives. We derive sufficient conditions for this problem to be detectable, and provide an efficient algorithm that solves the hypothesis testing problem. We apply this model to two settings. In the first setting, we simply want to distinguish between random illness (a complete graph) and an epidemic (spread along a structured graph). In the second, we have a superposition of both of these, and we wish to detect which is the strongest component.
Chen, Yudong, Constantine Caramanis, and Shie Mannor. “Robust Sparse Regression under Adversarial Corruption.” In International Conference on Machine Learning (ICML), 774–82. PMLR, 2013.
@inproceedings{chen2013robust,
title = {Robust sparse regression under adversarial corruption},
author = {Chen, Yudong and Caramanis, Constantine and Mannor, Shie},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {774--782},
year = {2013},
organization = {PMLR},
group = {proceedings}
}
We consider high dimensional sparse regression with arbitrary (possibly severe or coordinated) errors in the covariates matrix. We are interested in understanding how many corruptions we can tolerate while still identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (which focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures can provide guarantees on support recovery. As we show, neither can the natural brute-force algorithm that takes exponential time to find the subset of data and support columns that yields the smallest regression error.
We explore the power of a simple idea: replace the essential linear-algebraic calculation, the inner product, with a robust counterpart that cannot be greatly affected by a controlled number of arbitrarily corrupted points: the trimmed inner product. We consider three popular algorithms in the uncorrupted setting: Thresholding Regression, Lasso, and the Dantzig selector, and show that the counterparts obtained using the trimmed inner product are provably robust.
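The trimmed inner product itself is simple to state; a small numpy sketch (the trimming level `num_trim` would be set from the assumed corruption budget):

```python
import numpy as np

def trimmed_inner_product(u, v, num_trim):
    """Drop the num_trim largest-magnitude elementwise products before
    summing, so a few corrupted coordinates cannot dominate the estimate."""
    prod = u * v
    keep = np.argsort(np.abs(prod))[: len(prod) - num_trim]
    return float(prod[keep].sum())
```

Plugging this primitive into Thresholding Regression, Lasso-type updates, or the Dantzig selector in place of the ordinary inner product yields the robust counterparts the paper analyzes.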
Chen, Yudong, and Constantine Caramanis. “Noisy and Missing Data Regression: Distribution-Oblivious Support Recovery.” In International Conference on Machine Learning (ICML), 383–91. PMLR, 2013.
@inproceedings{chen2013noisy,
title = {Noisy and missing data regression: Distribution-oblivious support recovery},
author = {Chen, Yudong and Caramanis, Constantine},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {383--391},
year = {2013},
organization = {PMLR},
group = {proceedings}
}
Many models for sparse regression assume that the covariates are known completely and without noise. Particularly in high-dimensional applications, this is often not the case. Worse yet, even estimating statistics of the noise (the noise covariance) can be a central challenge. In this paper we develop a simple variant of orthogonal matching pursuit (OMP) for precisely this setting. We show that without knowledge of the noise covariance, our algorithm recovers the support, and we provide matching lower bounds showing that our algorithm performs at the minimax optimal rate. While simple, this is the first algorithm that (provably) recovers support in a noise-distribution-oblivious manner. When knowledge of the noise covariance is available, our algorithm matches the best-known l2-recovery bounds available. We show that these too are minimax optimal. Along the way, we also obtain improved performance guarantees for OMP for the standard sparse regression problem with Gaussian noise.
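For orientation, standard OMP is sketched below; the paper's distribution-oblivious variant modifies the correlation and residual computations to account for noisy or missing covariates, which this sketch does not do:

```python
import numpy as np

def omp(X, y, s):
    """Greedy support recovery: repeatedly add the column most correlated
    with the residual, then re-fit by least squares on the support."""
    support, resid, w = [], y.copy(), None
    for _ in range(s):
        j = int(np.argmax(np.abs(X.T @ resid)))   # most correlated column
        if j not in support:
            support.append(j)
        w = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
        resid = y - X[:, support] @ w
    return support, w
```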
Milling, Chris, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. “Network Forensics: Random Infection vs Spreading Epidemic.” ACM SIGMETRICS Performance Evaluation Review 40, no. 1 (2012): 223–34.
@article{milling2012network,
title = {Network forensics: Random infection vs spreading epidemic},
author = {Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay},
journal = {ACM SIGMETRICS Performance Evaluation Review},
volume = {40},
number = {1},
pages = {223--234},
year = {2012},
publisher = {ACM New York, NY, USA},
group = {proceedings}
}
Computer (and human) networks have long had to contend with spreading viruses. Effectively controlling or curbing an outbreak requires understanding the dynamics of the spread. A virus that spreads by taking advantage of physical links or user-acquaintance links on a social network can grow explosively if it spreads beyond a critical radius. On the other hand, random infections (which do not take advantage of network structure) have very different propagation characteristics. If too many machines (or humans) are infected, network structure becomes essentially irrelevant, and the different spreading modes appear identical. When can we distinguish between these mechanisms of infection? Further, how can this be done efficiently? This paper studies these two questions. We provide sufficient conditions, for different graph topologies, under which it is possible to distinguish between a random model of infection and a spreading epidemic model, with probability of misclassification going to zero. We further provide efficient algorithms that are guaranteed to work in different regimes.
———. “On Identifying the Causative Network of an Epidemic.” In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 909–14. IEEE, 2012.
@inproceedings{milling2012identifying,
title = {On identifying the causative network of an epidemic},
author = {Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay},
booktitle = {2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {909--914},
year = {2012},
organization = {IEEE},
group = {proceedings}
}
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Statistical Optimization in High Dimensions.” In International Conference on Artificial Intelligence and Statistics (AISTATS), 1332–40. PMLR, 2012.
@inproceedings{xu2012statistical,
title = {Statistical optimization in high dimensions},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
pages = {1332--1340},
year = {2012},
organization = {PMLR},
group = {proceedings}
}
We consider optimization problems whose parameters are known only approximately, based on a noisy sample. Of particular interest is the high-dimensional regime, where the number of samples is roughly equal to the dimensionality of the problem, and the noise magnitude may greatly exceed the magnitude of the signal itself. This setup falls far outside the traditional scope of Robust and Stochastic optimization. We propose three algorithms to address this setting, combining ideas from statistics, machine learning, and robust optimization. In the important case where noise artificially increases the dimensionality of the parameters, we show that combining robust optimization and dimensionality reduction can result in high-quality solutions at greatly reduced computational cost.
Gopalan, Aditya, Constantine Caramanis, and Sanjay Shakkottai. “Low-Delay Wireless Scheduling with Partial Channel-State Information.” In 2012 Proceedings IEEE INFOCOM, 1071–79. IEEE, 2012.
@inproceedings{gopalan2012low,
title = {Low-delay wireless scheduling with partial channel-state information},
author = {Gopalan, Aditya and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {2012 Proceedings IEEE INFOCOM},
pages = {1071--1079},
year = {2012},
organization = {IEEE},
group = {proceedings}
}
We consider a server serving a time-slotted queued system of multiple packet-based flows, where not more than one flow can be serviced in a single time slot. The flows have exogenous packet arrivals and time-varying service rates. At each time, the server can observe instantaneous service rates for only a subset of flows (selected from a fixed collection of observable subsets) before scheduling a flow in the subset for service. We are interested in queue-length aware scheduling to keep the queues short. The limited availability of instantaneous service rate information requires the scheduler to make a careful choice of which subset of service rates to sample. We develop scheduling algorithms that use only partial service rate information from subsets of channels, and that minimize the likelihood of queue overflow in the system. Specifically, we present a new joint subset-sampling and scheduling algorithm called Max-Exp that uses only the current queue lengths to pick a subset of flows, and subsequently schedules a flow using the Exponential rule. When the collection of observable subsets is disjoint, we show that Max-Exp achieves the best exponential decay rate, among all scheduling algorithms using partial information, of the tail of the longest queue in the system. To accomplish this, we introduce novel analytical techniques for studying the performance of scheduling algorithms using partial state information, that are of independent interest. These include new sample-path large deviations results for processes obtained by nonrandom, predictable sampling of sequences of independent and identically distributed random variables, which show that scheduling with partial state information yields a rate function significantly different from the case of full information. As a special case, Max-Exp reduces to simply serving the flow with the longest queue when the observable subsets are singleton flows, i.e., when there is effectively no a priori channel-state information; thus, our results show that this greedy scheduling policy is large-deviations optimal.
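As a rough sketch of the two-step structure (the exact subset-selection rule and the Exponential-rule constants are specified in the paper; the simplifications below, including the sample_rates callback returning a flow-to-rate mapping, are illustrative):

import math

def max_exp_schedule(queues, observable_subsets, sample_rates):
    # "Max" step (simplified): sample the observable subset containing the
    # currently longest queue, using queue lengths only.
    S = max(observable_subsets, key=lambda s: max(queues[i] for i in s))
    rates = sample_rates(S)   # dict: flow -> instantaneous rate, seen only for flows in S
    qbar = sum(queues[i] for i in S) / len(S)
    # "Exp" step: schedule within S by one common form of the Exponential rule.
    return max(S, key=lambda i: rates[i] * math.exp(queues[i] / (1.0 + math.sqrt(qbar))))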
Ganapathy, Harish, and Constantine Caramanis. “Queue-Based Sub-Carrier Grouping for Feedback Reduction in OFDMA Systems.” In 2012 Proceedings IEEE INFOCOM, 1098–1106. IEEE, 2012.
@inproceedings{ganapathy2012queue,
title = {Queue-based sub-carrier grouping for feedback reduction in OFDMA systems},
author = {Ganapathy, Harish and Caramanis, Constantine},
booktitle = {2012 Proceedings IEEE INFOCOM},
pages = {1098--1106},
year = {2012},
organization = {IEEE},
group = {proceedings}
}
Khalek, Amin Abdel, Constantine Caramanis, and Robert W Heath Jr. “Joint Source-Channel Adaptation for Perceptually Optimized Scalable Video Transmission.” In 2011 IEEE Global Telecommunications Conference-GLOBECOM 2011, 1–5. IEEE, 2011.
@inproceedings{khalek2011joint,
title = {Joint source-channel adaptation for perceptually optimized scalable video transmission},
author = {Khalek, Amin Abdel and Caramanis, Constantine and Heath Jr, Robert W},
booktitle = {2011 IEEE Global Telecommunications Conference-GLOBECOM 2011},
pages = {1--5},
year = {2011},
organization = {IEEE},
group = {proceedings}
}
Mitliagkas, Ioannis, Aditya Gopalan, Constantine Caramanis, and Sriram Vishwanath. “User Rankings from Comparisons: Learning Permutations in High Dimensions.” In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1143–50. IEEE, 2011.
@inproceedings{mitliagkas2011user,
title = {User rankings from comparisons: Learning permutations in high dimensions},
author = {Mitliagkas, Ioannis and Gopalan, Aditya and Caramanis, Constantine and Vishwanath, Sriram},
booktitle = {2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {1143--1150},
year = {2011},
organization = {IEEE},
group = {proceedings}
}
Chen, Yi-Ting, Constantine Caramanis, and Sanjay Shakkottai. “On File Sharing over a Wireless Social Network.” In 2011 IEEE International Symposium on Information Theory Proceedings, 249–53. IEEE, 2011.
@inproceedings{chen2011file,
title = {On file sharing over a wireless social network},
author = {Chen, Yi-Ting and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {2011 IEEE International Symposium on Information Theory Proceedings},
pages = {249--253},
year = {2011},
organization = {IEEE},
group = {proceedings}
}
Chen, Y., A. Jalali, S. Sanghavi, and C. Caramanis. “Low-Rank Matrix Recovery from Errors and Erasures.” In 2011 IEEE International Symposium on Information Theory Proceedings, 2313–17, 2011. https://doi.org/10.1109/ISIT.2011.6033975.
@inproceedings{chen2011lowrank,
author = {{Chen}, Y. and {Jalali}, A. and {Sanghavi}, S. and {Caramanis}, C.},
booktitle = {2011 IEEE International Symposium on Information Theory Proceedings},
title = {Low-rank matrix recovery from errors and erasures},
year = {2011},
volume = {},
number = {},
pages = {2313-2317},
doi = {10.1109/ISIT.2011.6033975},
group = {proceedings}
}
Chen, Yudong, Huan Xu, Constantine Caramanis, and Sujay Sanghavi. “Robust Matrix Completion and Corrupted Columns.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 873–80. Citeseer, 2011.
@inproceedings{chen2011robust,
title = {Robust matrix completion and corrupted columns},
author = {Chen, Yudong and Xu, Huan and Caramanis, Constantine and Sanghavi, Sujay},
booktitle = {Proceedings of the 28th International Conference on Machine Learning (ICML-11)},
pages = {873--880},
year = {2011},
organization = {Citeseer},
group = {proceedings}
}
This paper considers the problem of matrix completion when some number of the columns are arbitrarily corrupted. It is well known that standard algorithms for matrix completion can return arbitrarily poor results if even a single column is corrupted. What can be done if a large number, or even a constant fraction, of columns are corrupted? In this paper, we study this very problem and develop a robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the observed entries of the manipulated columns.
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Principal Component Analysis with Contaminated Data: The High Dimensional Case.” ArXiv Preprint ArXiv:1002.4658, 2010.
@article{xu2010principal,
title = {Principal component analysis with contaminated data: The high dimensional case},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {arXiv preprint arXiv:1002.4658},
year = {2010},
group = {proceedings},
arxiv = {https://arxiv.org/pdf/1002.4658.pdf}
}
We consider the dimensionality-reduction problem (finding a subspace approximation of observed data) for contaminated data in the high-dimensional regime, where the number of observations is of the same magnitude as the number of variables of each observation, and the data set contains some (arbitrarily) corrupted observations. We propose a High-dimensional Robust Principal Component Analysis (HR-PCA) algorithm that is tractable, robust to contaminated points, and easily kernelizable. The resulting subspace has a bounded deviation from the desired one, achieves maximal robustness (a breakdown point of 50%, while all existing algorithms have a breakdown point of zero), and, unlike ordinary PCA algorithms, achieves optimality in the limit case where the proportion of corrupted points goes to zero.
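The core loop of HR-PCA is short enough to sketch. The toy version below keeps the remove-and-reestimate structure but omits the robust variance score the paper uses to select among the candidate subspaces (all names and parameters are illustrative):

import numpy as np

def hr_pca_candidates(Y, d, n_remove, rng):
    # Repeatedly estimate a top-d subspace, then randomly delete one point with
    # probability proportional to its energy along that subspace; high-leverage
    # outliers are removed with high probability.
    Y = Y.copy()
    candidates = []
    for _ in range(n_remove):
        Yc = Y - Y.mean(axis=0)
        _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
        W = Vt[:d].T
        candidates.append(W)
        energy = ((Yc @ W) ** 2).sum(axis=1)
        drop = rng.choice(len(Y), p=energy / energy.sum())
        Y = np.delete(Y, drop, axis=0)
    return candidates   # the paper then picks the best candidate by a robust variance score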
Xu, Huan, Constantine Caramanis, and Sujay Sanghavi. “Robust PCA via Outlier Pursuit.” Advances in Neural Information Processing Systems (NeurIPS) 23 (2010): 2496–2504.
@article{xu2010robust,
title = {Robust PCA via outlier pursuit},
author = {Xu, Huan and Caramanis, Constantine and Sanghavi, Sujay},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {23},
pages = {2496-2504},
year = {2010},
group = {proceedings}
}
Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted.
We present an efficient convex optimization-based algorithm called Outlier Pursuit that, under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems), recovers the exact optimal low-dimensional subspace and identifies the corrupted points. Such identification of corrupted points, which do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics, financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization; however, our results, setup, and approach necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
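The convex program at the heart of Outlier Pursuit is min ||L||_* + λ Σ_j ||C_j||_2 subject to L + C = M, where C_j are the columns of C. A minimal sketch using cvxpy (the value of λ and the flagging threshold are illustrative):

import cvxpy as cp
import numpy as np

def outlier_pursuit(M, lam):
    # Split M into a low-rank part L and a column-sparse part C by minimizing
    # the nuclear norm of L plus the sum of column l2 norms of C.
    L = cp.Variable(M.shape)
    C = cp.Variable(M.shape)
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    cp.Problem(obj, [L + C == M]).solve()
    outliers = np.linalg.norm(C.value, axis=0) > 1e-6   # columns flagged as corrupted
    return L.value, outliers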
Ganapathy, Harish, and Constantine Caramanis. “Dynamic Feedback Allocation Algorithms for Interference Management in MIMO Uplink.” In 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, 1–5. IEEE, 2010.
@inproceedings{ganapathy2010dynamic,
title = {Dynamic Feedback Allocation Algorithms for Interference Management in MIMO Uplink},
author = {Ganapathy, Harish and Caramanis, Constantine},
booktitle = {2010 IEEE Global Telecommunications Conference GLOBECOM 2010},
pages = {1--5},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Yun, Sungho, and Constantine Caramanis. “Reinforcement Learning for Link Adaptation in MIMO-OFDM Wireless Systems.” In 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, 1–5. IEEE, 2010.
@inproceedings{yun2010reinforcement,
title = {Reinforcement learning for link adaptation in MIMO-OFDM wireless systems},
author = {Yun, Sungho and Caramanis, Constantine},
booktitle = {2010 IEEE Global Telecommunications Conference GLOBECOM 2010},
pages = {1--5},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Singh, Ashish Kumar, Mario Lok, Kareem Ragab, Constantine Caramanis, and Michael Orshansky. “An Algorithm for Exploiting Modeling Error Statistics to Enable Robust Analog Optimization.” In 2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 62–69. IEEE, 2010.
@inproceedings{singh2010algorithm,
title = {An algorithm for exploiting modeling error statistics to enable robust analog optimization},
author = {Singh, Ashish Kumar and Lok, Mario and Ragab, Kareem and Caramanis, Constantine and Orshansky, Michael},
booktitle = {2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)},
pages = {62--69},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Lok, Mario, Ku He, Murari Mani, Constantine Caramanis, and Michael Orshansky. “Design of Power-Optimal Buffers Tunable to Process Variability.” In 2010 IEEE Dallas Circuits and Systems Workshop, 1–4. IEEE, 2010.
@inproceedings{lok2010design,
title = {Design of power-optimal buffers tunable to process variability},
author = {Lok, Mario and He, Ku and Mani, Murari and Caramanis, Constantine and Orshansky, Michael},
booktitle = {2010 IEEE Dallas Circuits and Systems Workshop},
pages = {1--4},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Kefayati, Mahdi, and Constantine Caramanis. “Efficient Energy Delivery Management for PHEVs.” In 2010 First IEEE International Conference on Smart Grid Communications, 525–30. IEEE, 2010.
@inproceedings{kefayati2010efficient,
title = {Efficient energy delivery management for PHEVs},
author = {Kefayati, Mahdi and Caramanis, Constantine},
booktitle = {2010 First IEEE International Conference on Smart Grid Communications},
pages = {525--530},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Ganapathy, Harish, Constantine Caramanis, and Lei Ying. “Limited Feedback for Cognitive Radio Networks Using Compressed Sensing.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1090–97. IEEE, 2010.
@inproceedings{ganapathy2010limited,
title = {Limited feedback for cognitive radio networks using compressed sensing},
author = {Ganapathy, Harish and Caramanis, Constantine and Ying, Lei},
booktitle = {2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {1090--1097},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Xu, Huan, Constantine Caramanis, and Shie Mannor. “A Distributional Interpretation of Robust Optimization.” In 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 552–56. IEEE, 2010.
@inproceedings{xu2010distributional,
title = {A distributional interpretation of robust optimization},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
booktitle = {2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {552--556},
year = {2010},
organization = {IEEE},
group = {proceedings}
}
Xu, Huan, Constantine Caramanis, Shie Mannor, and Sungho Yun. “Risk Sensitive Robust Support Vector Machines.” In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) Held Jointly with 2009 28th Chinese Control Conference, 4655–61. IEEE, 2009.
@inproceedings{xu2009risk,
title = {Risk sensitive robust support vector machines},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie and Yun, Sungho},
booktitle = {Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference},
pages = {4655--4661},
year = {2009},
organization = {IEEE},
group = {proceedings}
}
We propose a new family of classification algorithms in the spirit of support vector machines that builds in non-conservative protection against noise and controls overfitting. Our formulation is based on a softer version of robust optimization called comprehensive robustness. We show that this formulation is equivalent to regularization by an arbitrary convex regularizer. We explain how the connection of comprehensive robustness to convex risk measures can be used to design risk-constrained classifiers with robustness to the input distribution. Our formulations lead to easily solved convex problems. Empirical results show the promise of comprehensive robust classifiers in handling risk-sensitive classification.
Singh, Ashish K, Ku He, Constantine Caramanis, and Michael Orshansky. “Mitigation of Intra-Array SRAM Variability Using Adaptive Voltage Architecture.” In 2009 IEEE/ACM International Conference on Computer-Aided Design-Digest of Technical Papers, 637–44. IEEE, 2009.
@inproceedings{singh2009mitigation,
title = {Mitigation of intra-array sram variability using adaptive voltage architecture},
author = {Singh, Ashish K and He, Ku and Caramanis, Constantine and Orshansky, Michael},
booktitle = {2009 IEEE/ACM International Conference on Computer-Aided Design-Digest of Technical Papers},
pages = {637--644},
year = {2009},
organization = {IEEE},
group = {proceedings}
}
Ganapathy, Harish, Siddhartha Banerjee, Nedialko Dimitrov, and Constantine Caramanis. “Optimal Feedback Allocation Algorithms for Multi-User Uplink.” In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 947–54. IEEE, 2009.
@inproceedings{ganapathy2009optimal,
title = {Optimal feedback allocation algorithms for multi-user uplink},
author = {Ganapathy, Harish and Banerjee, Siddhartha and Dimitrov, Nedialko and Caramanis, Constantine},
booktitle = {2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {947--954},
year = {2009},
organization = {IEEE},
group = {proceedings}
}
Yun, Sungho, and Constantine Caramanis. “Multiclass Support Vector Machines for Adaptation in MIMO-OFDM Wireless Systems.” In 2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 1145–52. IEEE, 2009.
@inproceedings{yun2009multiclass,
title = {Multiclass support vector machines for adaptation in MIMO-OFDM wireless systems},
author = {Yun, Sungho and Caramanis, Constantine},
booktitle = {2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
pages = {1145--1152},
year = {2009},
organization = {IEEE},
group = {proceedings}
}
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Robust Regression and Lasso.” Advances in Neural Information Processing Systems (NeurIPS) 21 (2008).
@article{xu2008robust,
title = {Robust regression and lasso},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {21},
year = {2008},
arxiv = {https://arxiv.org/pdf/0811.1790.pdf},
group = {proceedings}
}
Lasso, or \ell_1 regularized least squares, has been explored extensively for its remarkable sparsity properties. It is shown in this paper that the solution to Lasso, in addition to its sparsity, has robustness properties: it is the solution to a robust optimization problem. This has two important consequences. First, robustness provides a connection of the regularizer to a physical property, namely, protection from noise. This allows a principled selection of the regularizer, and in particular, generalizations of Lasso that also yield convex optimization problems are obtained by considering different uncertainty sets.
Second, robustness can itself be used as an avenue for exploring different properties of the solution. In particular, it is shown that robustness of the solution explains why the solution is sparse. The analysis, as well as the specific results obtained, differs from standard sparsity results, providing different geometric intuition. Furthermore, it is shown that the robust optimization formulation is related to kernel density estimation, and based on this approach, a proof that Lasso is consistent is given using robustness directly. Finally, we present a theorem showing that sparsity and algorithmic stability contradict each other, and hence that Lasso is not stable.
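The equivalence at the center of the paper is easy to check numerically: for column-wise uncertainty sets {Δ : ||Δ_j||_2 ≤ c_j}, the worst-case residual equals the regularized objective ||y − Xb||_2 + Σ_j c_j |b_j|. A small sketch under those assumptions (all data are illustrative):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
b = rng.normal(size=5)
c = np.full(5, 0.3)                     # per-column uncertainty budgets

# Worst-case perturbation: align each column perturbation against the residual.
r = y - X @ b
u = r / np.linalg.norm(r)
Delta = -np.outer(u, c * np.sign(b))
worst_case = np.linalg.norm(y - (X + Delta) @ b)
regularized = np.linalg.norm(r) + np.sum(c * np.abs(b))
print(np.isclose(worst_case, regularized))   # True: robustness = l1 regularization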
Daniels, Robert C, Constantine Caramanis, and Robert W Heath Jr. “A Supervised Learning Approach to Adaptation in Practical MIMO-OFDM Wireless Systems.” In IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications Conference, 1–5. IEEE, 2008.
@inproceedings{daniels2008supervised,
title = {A supervised learning approach to adaptation in practical MIMO-OFDM wireless systems},
author = {Daniels, Robert C and Caramanis, Constantine and Heath Jr, Robert W},
booktitle = {IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications Conference},
pages = {1--5},
year = {2008},
organization = {IEEE},
group = {proceedings}
}
Ganapathy, Harish, Jeffrey G Andrews, and Constantine Caramanis. “Inter-Cell Relay Cooperation in Heterogeneous Cellular Uplink Systems.” In 2008 42nd Asilomar Conference on Signals, Systems and Computers, 1443–47. IEEE, 2008.
@inproceedings{ganapathy2008inter,
title = {Inter-cell relay cooperation in heterogeneous cellular uplink systems},
author = {Ganapathy, Harish and Andrews, Jeffrey G and Caramanis, Constantine},
booktitle = {2008 42nd Asilomar Conference on Signals, Systems and Computers},
pages = {1443--1447},
year = {2008},
organization = {IEEE},
group = {proceedings}
}
Channappayya, Sumohana S, Alan C Bovik, Constantine Caramanis, and Robert W Heath. “SSIM-Optimal Linear Image Restoration.” In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 765–68. IEEE, 2008.
@inproceedings{channappayya2008ssim,
title = {SSIM-optimal linear image restoration},
author = {Channappayya, Sumohana S and Bovik, Alan C and Caramanis, Constantine and Heath, Robert W},
booktitle = {2008 IEEE International Conference on Acoustics, Speech and Signal Processing},
pages = {765--768},
year = {2008},
organization = {IEEE},
group = {proceedings}
}
Yun, Sungho, and Constantine Caramanis. “System Level Optimization in Wireless Networks with Uncertain Customer Arrival Rates.” In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, 1022–29. IEEE, 2008.
@inproceedings{yun2008system,
title = {System level optimization in wireless networks with uncertain customer arrival rates},
author = {Yun, Sungho and Caramanis, Constantine},
booktitle = {2008 46th Annual Allerton Conference on Communication, Control, and Computing},
pages = {1022--1029},
year = {2008},
organization = {IEEE},
group = {proceedings}
}
Meka, Raghu, Prateek Jain, Constantine Caramanis, and Inderjit S Dhillon. “Rank Minimization via Online Learning.” In Proceedings of the 25th International Conference on Machine Learning (ICML), 656–63, 2008.
@inproceedings{meka2008rank,
title = {Rank minimization via online learning},
author = {Meka, Raghu and Jain, Prateek and Caramanis, Constantine and Dhillon, Inderjit S},
booktitle = {Proceedings of the 25th International Conference on Machine learning (ICML)},
pages = {656--663},
year = {2008},
group = {proceedings}
}
Minimum rank problems arise frequently in machine learning applications and are notoriously difficult to solve due to the non-convex nature of the rank objective. In this paper, we present the first online learning approach to rank minimization of matrices over polyhedral sets. In particular, we present two online learning algorithms for rank minimization: our first algorithm is a multiplicative update method based on a generalized experts framework, while our second algorithm is a novel application of the online convex programming framework (Zinkevich, 2003). In the latter, we flip the role of the decision maker, who now searches over the constraint space instead of over feasible points, as is usually the case in online convex programming. A salient feature of our online learning approach is that it allows us to give provable approximation guarantees for the rank minimization problem over polyhedral sets. We demonstrate the effectiveness of our methods on synthetic examples and on the real-life application of low-rank kernel learning.
Caramanis, Constantine, and Shie Mannor. “Learning in the Limit with Adversarial Disturbances.” In The Conference on Learning Theory (COLT), 467–78, 2008.
@inproceedings{caramanis2008learning,
title = {Learning in the Limit with Adversarial Disturbances.},
author = {Caramanis, Constantine and Mannor, Shie},
booktitle = {The Conference on Learning Theory (COLT)},
pages = {467--478},
year = {2008},
group = {proceedings}
}
Bertsimas, Dimitris, and Constantine Caramanis. “Adaptability via Sampling.” In 2007 46th IEEE Conference on Decision and Control, 4717–22. IEEE, 2007.
@inproceedings{bertsimas2007adaptability,
title = {Adaptability via sampling},
author = {Bertsimas, Dimitris and Caramanis, Constantine},
booktitle = {2007 46th IEEE Conference on Decision and Control},
pages = {4717--4722},
year = {2007},
organization = {IEEE},
group = {proceedings}
}
There has recently been considerable attention devoted to sample-based approaches to chance constraints in stochastic programming, and also to multi-stage optimization formulations. In this short paper, we consider the merits of a joint approach. A specific motivation for us is the possibility of developing techniques suitable for integer-constrained future stages. We propose a technique, based on structured adaptability and some recent sampling techniques, whose sample complexity is polynomial in the number of stages. We thus circumvent a difficulty that has traditionally plagued sample-based approaches to multi-stage formulations. This allows us to provide a hierarchy of adaptability schemes, not only for continuous problems, but also for discrete problems.
Gopalan, Aditya, Constantine Caramanis, and Sanjay Shakkottai. “On Wireless Scheduling with Partial Channel-State Information.” In Proc. Ann. Allerton Conf. Communication, Control and Computing, 2007.
@inproceedings{gopalan2007wireless,
title = {On wireless scheduling with partial channel-state information},
author = {Gopalan, Aditya and Caramanis, Constantine and Shakkottai, Sanjay},
booktitle = {Proc. Ann. Allerton Conf. Communication, Control and Computing},
year = {2007},
group = {proceedings}
}
Caramanis, Constantine, and Shie Mannor. “A Bayesian Approach to Data-Driven Optimization under Uncertainty.” In Allerton Conference on Communication, Control, and Computing, 2007.
@inproceedings{caramanis2007bayesian,
title = {A bayesian approach to data-driven optimization under uncertainty},
author = {Caramanis, Constantine and Mannor, Shie},
booktitle = {Allerton Conference on Communication, Control, and Computing},
year = {2007},
group = {proceedings}
}
———. “An Inequality for Nearly Log-Concave Distributions with Applications to Learning.” In International Conference on Computational Learning Theory (COLT), 534–48. Springer, 2004.
@inproceedings{caramanis2004inequality,
title = {An inequality for nearly log-concave distributions with applications to learning},
author = {Caramanis, Constantine and Mannor, Shie},
booktitle = {International Conference on Computational Learning Theory (COLT)},
pages = {534--548},
year = {2004},
organization = {Springer},
group = {proceedings}
}
Caramanis, Constantine, Michael Rosenblum, Michel X Goemans, and Vahid Tarokh. “Scheduling Algorithms for Providing Flexible, Rate-Based, Quality of Service Guarantees for Packet-Switching in Banyan Networks.” In Proceedings of the Conference on Information Sciences and Systems, 160–66, 2004.
@inproceedings{caramanis2004scheduling,
title = {Scheduling algorithms for providing flexible, rate-based, quality of service guarantees for packet-switching in Banyan networks},
author = {Caramanis, Constantine and Rosenblum, Michael and Goemans, Michel X and Tarokh, Vahid},
booktitle = {Proceedings of the Conference on Information Sciences and Systems},
pages = {160--166},
year = {2004},
group = {proceedings}
}
Bertsimas, Dimitris, and Constantine Caramanis. “Geometry, Moments, and Semidefinite Optimization.” In Proceedings of the 15th International Symposium on MTNS, 2002.
@inproceedings{bertsimas2002geometry,
title = {Geometry, moments, and semidefinite optimization},
author = {Bertsimas, Dimitris and Caramanis, Constantine},
booktitle = {Proceedings of the 15th International Symposium on MTNS},
year = {2002},
group = {proceedings}
}
Journal Papers
Zhuo, Jiacheng, Jeongyeol Kwon, Nhat Ho, and Constantine Caramanis. “On the Computational and Statistical Complexity of over-Parameterized Matrix Sensing.” Journal of Machine Learning Research, 2024.
@article{zhuoSlowMatrix2024,
title = {On the computational and statistical complexity of over-parameterized matrix sensing},
author = {Zhuo, Jiacheng and Kwon, Jeongyeol and Ho, Nhat and Caramanis, Constantine},
journal = {Journal of Machine Learning Research},
year = {2024},
group = {journal},
arxiv = {https://arxiv.org/pdf/2102.02756.pdf}
}
We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the true rank is unknown and over-specified, which we refer to as over-parameterized matrix sensing.
If the ground truth signal X^* ∈ R^{d×d} is of rank r, but we try to recover it using FF^⊤ where F ∈ R^{d×k} and k > r, the existing statistical analysis falls short, due to a flat local curvature of the loss function around the global optimum. By decomposing the factorized matrix F into separate column spaces to capture the effect of the extra ranks, we show that ||F_t F_t^⊤ − X^*||_F^2 converges to a statistical error of Õ(k d σ^2 / n) after Õ((σ_r / σ) √(n/d)) iterations, where F_t is the output of FGD after t iterations, σ^2 is the variance of the observation noise, σ_r is the r-th largest eigenvalue of X^*, and n is the number of samples. Our results therefore offer a comprehensive picture of the statistical and computational complexity of FGD for the over-parameterized matrix sensing problem.
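A minimal sketch of over-parameterized FGD on a matrix sensing instance (the step size, initialization scale, and data format are illustrative; the paper's guarantees assume a specific initialization and step size):

import numpy as np

def fgd_matrix_sensing(A_list, y, d, k, eta, iters, rng):
    # Fit X ≈ F F^T with an over-specified rank k > r by gradient descent on F
    # for the loss (1/2n) * sum_i (<A_i, F F^T> - y_i)^2.
    F = 0.1 * rng.normal(size=(d, k))
    n = len(y)
    for _ in range(iters):
        X = F @ F.T
        res = np.array([np.sum(A * X) for A in A_list]) - y
        G = sum(ri * A for ri, A in zip(res, A_list)) / n   # gradient w.r.t. X
        F = F - eta * (G + G.T) @ F                         # chain rule through F F^T
    return F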
Kyrillidis, Anastasios, Amir Kalev, Dohyung Park, Srinadh Bhojanapalli, Constantine Caramanis, and Sujay Sanghavi. “Provable Compressed Sensing Quantum State Tomography via Non-Convex Methods.” Npj Quantum Information 4, no. 1 (2018): 1–7.
@article{kyrillidis2018provable,
title = {Provable compressed sensing quantum state tomography via non-convex methods},
author = {Kyrillidis, Anastasios and Kalev, Amir and Park, Dohyung and Bhojanapalli, Srinadh and Caramanis, Constantine and Sanghavi, Sujay},
journal = {npj Quantum Information},
volume = {4},
number = {1},
pages = {1--7},
year = {2018},
publisher = {Nature Publishing Group},
group = {journal},
arxiv = {https://arxiv.org/pdf/1711.02524.pdf}
}
With today's steadily growing quantum processors, new quantum tomography tools tailored to high-dimensional systems are required. In this work, we describe such a computational tool, based on recent ideas from non-convex optimization. The algorithm excels in the compressed-sensing-like setting, where only a few data points are measured from a low-rank or highly pure quantum state of a high-dimensional system. We show that the algorithm can practically be used in quantum tomography problems that are beyond the reach of convex solvers, and, moreover, is faster than other state-of-the-art non-convex approaches. Crucially, we prove that, despite being a non-convex program, under mild conditions the algorithm is guaranteed to converge to the global minimum of the problem; thus, it constitutes a provable quantum state tomography protocol.
Park, Dohyung, Anastasios Kyrillidis, Constantine Caramanis, and Sujay Sanghavi. “Finding Low-Rank Solutions via Nonconvex Matrix Factorization, Efficiently and Provably.” SIAM Journal on Imaging Sciences 11, no. 4 (2018): 2165–2204.
@article{park2018finding,
title = {Finding low-rank solutions via nonconvex matrix factorization, efficiently and provably},
author = {Park, Dohyung and Kyrillidis, Anastasios and Caramanis, Constantine and Sanghavi, Sujay},
journal = {SIAM Journal on Imaging Sciences},
volume = {11},
number = {4},
pages = {2165--2204},
year = {2018},
publisher = {SIAM},
group = {journal},
arxiv = {https://arxiv.org/pdf/1606.03168.pdf}
}
A rank-r matrix X ∈ R^{m×n} can be written as a product UV^⊤, where U ∈ R^{m×r} and V ∈ R^{n×r}. One can exploit this observation in optimization: e.g., consider the minimization of a convex function f(X) over rank-r matrices, where the set of rank-r matrices is modeled via the factorization UV^⊤. Though such a parameterization reduces the number of variables and is more computationally efficient (of particular interest is the case r ≪ min{m, n}), it comes at a cost: f(UV^⊤) becomes a non-convex function with respect to U and V.
We study this parameterization for the optimization of generic convex objectives f, and focus on first-order, gradient descent algorithmic solutions. We propose the Bi-Factored Gradient Descent (BFGD) algorithm, an efficient first-order method that operates on the U, V factors. We show that when f is (restricted) smooth, BFGD has local sublinear convergence, and linear convergence when f is both (restricted) smooth and (restricted) strongly convex. For several key applications, we provide simple and efficient initialization schemes that provide approximate solutions good enough for the above convergence results to hold.
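A stripped-down sketch of the factored update, with f taken to be squared error on the observed entries of a matrix completion instance (a stand-in convex objective); the paper's BFGD additionally prescribes the initialization and, in general, a regularizer balancing the two factors, both omitted here:

import numpy as np

def bfgd(M_obs, mask, r, eta, iters, rng):
    # Gradient descent directly on the factors (U, V) of X = U V^T, with
    # f(X) = 0.5 * || mask * (X - M_obs) ||_F^2.
    m, n = M_obs.shape
    U = 0.01 * rng.normal(size=(m, r))
    V = 0.01 * rng.normal(size=(n, r))
    for _ in range(iters):
        G = mask * (U @ V.T - M_obs)               # gradient of f at X = U V^T
        U, V = U - eta * G @ V, V - eta * G.T @ U  # simultaneous factor updates
    return U, V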
Sinno, Zeina, Constantine Caramanis, and Alan C Bovik. “Towards a Closed Form Second-Order Natural Scene Statistics Model.” IEEE Transactions on Image Processing 27, no. 7 (2018): 3194–3209.
@article{sinno2018towards,
title = {Towards a closed form second-order natural scene statistics model},
author = {Sinno, Zeina and Caramanis, Constantine and Bovik, Alan C},
journal = {IEEE Transactions on Image Processing},
volume = {27},
number = {7},
pages = {3194--3209},
year = {2018},
publisher = {IEEE},
group = {journal}
}
Previous work on natural scene statistics (NSS)-based image models has focused primarily on characterizing the univariate bandpass statistics of single pixels. These models have proven to be powerful tools driving a variety of computer vision and image/video processing applications, including depth estimation, image quality assessment, and image denoising, among others. Multivariate NSS models descriptive of the joint distributions of spatially separated bandpass image samples have, however, received relatively little attention. Here, we develop a closed form bivariate spatial correlation model of bandpass and normalized image samples that completes an existing 2D joint generalized Gaussian distribution model of adjacent bandpass pixels. Our model is built using a set of diverse, high-quality naturalistic photographs, and as a control, we study the model properties on white noise. We also study the way the model fits are affected when the images are modified by common distortions.
Meirom, Eli A, Constantine Caramanis, Shie Mannor, Ariel Orda, and Sanjay Shakkottai. “Detecting Cascades from Weak Signatures.” IEEE Transactions on Network Science and Engineering 5, no. 4 (2017): 313–25.
@article{meirom2017detecting,
title = {Detecting Cascades from Weak Signatures},
author = {Meirom, Eli A and Caramanis, Constantine and Mannor, Shie and Orda, Ariel and Shakkottai, Sanjay},
journal = {IEEE Transactions on Network Science and Engineering},
volume = {5},
number = {4},
pages = {313--325},
year = {2017},
publisher = {IEEE},
group = {journal}
}
Inspired by cyber-security applications, we consider the problem of detecting an infection process in a network when the indication that any particular node is infected is extremely noisy. Instead of waiting for a single node to provide sufficient evidence that it is indeed infected, we take advantage of the graph structure to detect cascades of weak indications of failures. We view the detection problem as a hypothesis testing problem, devise a new inference algorithm, and analyze its false positive and false negative errors in the high noise regime. Extensive simulations show that our algorithm is able to obtain low errors in the high noise regime by taking advantage of cascading topology analysis.
Chen, Yudong, Xinyang Yi, and Constantine Caramanis. “Convex and Nonconvex Formulations for Mixed Regression with Two Components: Minimax Optimal Rates.” IEEE Transactions on Information Theory 64, no. 3 (2017): 1738–66.
@article{chen2017convex,
title = {Convex and nonconvex formulations for mixed regression with two components: Minimax optimal rates},
author = {Chen, Yudong and Yi, Xinyang and Caramanis, Constantine},
journal = {IEEE Transactions on Information Theory},
volume = {64},
number = {3},
pages = {1738--1766},
year = {2017},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1312.7006.pdf}
}
We consider the mixed regression problem with two components, under adversarial and stochastic noise. We give a convex optimization formulation that provably recovers the true solution, and provide upper bounds on the recovery errors for both arbitrary noise and stochastic noise settings. We also give matching minimax lower bounds (up to log factors), showing that under certain assumptions, our algorithm is information-theoretically optimal. Our results represent the first tractable algorithm guaranteeing successful recovery with tight bounds on recovery errors and sample complexity.
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Statistical Optimization in High Dimensions.” Operations Research 64, no. 4 (2016): 958–79.
@article{xu2016statistical,
title = {Statistical optimization in high dimensions},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {Operations research},
volume = {64},
number = {4},
pages = {958--979},
year = {2016},
publisher = {INFORMS},
group = {journal}
}
We consider optimization problems whose parameters are known only approximately, based on noisy samples. In large-scale applications, the number of samples one can collect is typically of the same order as (or even less than) the dimensionality of the problem. This so-called high-dimensional statistical regime has been the object of intense recent research in machine learning and statistics, primarily due to phenomena inherent to this regime, such as the fact that the noise one sees here often dwarfs the magnitude of the signal itself. While relevant in numerous important operations research and engineering optimization applications, this setup falls far outside the traditional scope of robust and stochastic optimization. We propose three algorithms to address this setting, combining ideas from statistics, machine learning, and robust optimization. Our algorithms are motivated by three natural optimization objectives: minimizing the number of grossly violated constraints; maximizing the number of exactly satisfied constraints; and, finally, developing algorithms whose running time scales with the intrinsic dimension of a problem, as opposed to its observed dimension: a mismatch that, as we discuss in detail, can be dire in settings where constraints are meant to describe preferences of behaviors. The key ingredients of our algorithms are dimensionality reduction techniques from machine learning, robust optimization, and concentration-of-measure tools from statistics.
Ye, Qiaoyang, Ozgun Yilmaz Bursalioglu, Haralabos C Papadopoulos, Constantine Caramanis, and Jeffrey G Andrews. “User Association and Interference Management in Massive MIMO HetNets.” IEEE Transactions on Communications 64, no. 5 (2016): 2049–65.
@article{ye2016user,
title = {User association and interference management in massive MIMO HetNets},
author = {Ye, Qiaoyang and Bursalioglu, Ozgun Yilmaz and Papadopoulos, Haralabos C and Caramanis, Constantine and Andrews, Jeffrey G},
journal = {IEEE Transactions on Communications},
volume = {64},
number = {5},
pages = {2049--2065},
year = {2016},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1509.07594.pdf}
}
Two key traits of 5G cellular networks are much higher base station (BS) densities, especially in the case of low-power BSs, and the use of massive MIMO at these BSs. This paper explores how massive MIMO can be used to jointly maximize the offloading gains and minimize the interference challenges arising from adding small cells. We consider two interference management approaches: joint transmission (JT) with local precoding, where users are served simultaneously by multiple BSs without requiring channel state information exchanges among cooperating BSs, and resource blanking, where some macro BS resources are left blank to reduce the interference in the small cell downlink. A key advantage offered by massive MIMO is channel hardening, which makes it possible to predict instantaneous rates a priori. This allows us to develop a unified framework, where resource allocation is cast as a network utility maximization (NUM) problem, and to demonstrate large gains in cell-edge rates based on the NUM solution. We propose an efficient dual-subgradient-based algorithm that converges towards the NUM solution, along with a scheduling scheme that approaches the NUM solution in practice. Simulations illustrate a more than 2x rate gain for 10th-percentile users versus an optimal association without interference management.
Gopalan, Aditya, Constantine Caramanis, and Sanjay Shakkottai. “Wireless Scheduling with Partial Channel State Information: Large Deviations and Optimality.” Queueing Systems 80, no. 4 (2015): 293–340.
@article{gopalan2015wireless,
title = {Wireless scheduling with partial channel state information: large deviations and optimality},
author = {Gopalan, Aditya and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {Queueing Systems},
volume = {80},
number = {4},
pages = {293--340},
year = {2015},
publisher = {Springer},
group = {journal},
arxiv = {https://arxiv.org/pdf/1405.6307.pdf}
}
We consider a server serving a time-slotted queued system of multiple packet-based flows, with exogenous packet arrivals and time-varying service rates. At each time, the server can observe instantaneous service rates for only a subset of flows (from within a fixed collection of observable subsets) before scheduling a flow in the subset for service. We are interested in queue-length aware scheduling to keep the queues short, and develop scheduling algorithms that use only partial service rate information from subsets of channels to minimize the likelihood of queue overflow in the system. Specifically, we present a new joint subset-sampling and scheduling algorithm called Max-Exp that uses only the current queue lengths to pick a subset of flows, and subsequently schedules a flow using the Exponential rule. When the collection of observable subsets is disjoint, we show that Max-Exp achieves the best exponential decay rate, among all scheduling algorithms using partial information, of the tail of the longest queue in the system. Towards this, we employ novel analytical techniques for studying the performance of scheduling algorithms using partial state, which may be of independent interest. These include new sample-path large deviations results for processes obtained by non-random, predictable sampling of sequences of independent and identically distributed random variables. A consequence of these results is that scheduling with partial state information yields a rate function significantly different from scheduling with full channel information. In the special case when the observable subsets are singleton flows, i.e., when there is effectively no a priori channel-state information, Max-Exp reduces to simply serving the flow with the longest queue; thus, our results show that to always serve the longest queue in the absence of any channel-state information is large-deviations optimal.
Mitliagkas, Ioannis, Michael Borokhovich, Alexandros G Dimakis, and Constantine Caramanis. “FrogWild!: Fast PageRank Approximations on Graph Engines.” Proceedings of the VLDB Endowment, 2015.
@article{ioannis2015frogwild,
title = {Frogwild!: Fast pagerank approximations on graph engines},
author = {Mitliagkas, Ioannis and Borokhovich, Michael and Dimakis, Alexandros G and Caramanis, Constantine},
journal = {Proceedings of the VLDB Endowment (2015)},
year = {2015},
group = {journal},
arxiv = {https://arxiv.org/pdf/1502.04281.pdf}
}
We propose FrogWild, a novel algorithm for fast approximation of high PageRank vertices, geared towards reducing network costs of running traditional PageRank algorithms. Our algorithm can be seen as a quantized version of power iteration that performs multiple parallel random walks over a directed graph. One important innovation is that we introduce a modification to the GraphLab framework that only partially synchronizes mirror vertices. This partial synchronization vastly reduces the network traffic generated by traditional PageRank algorithms, thus greatly reducing the per-iteration cost of PageRank. On the other hand, this partial synchronization also creates dependencies between the random walks used to estimate PageRank. Our main theoretical innovation is the analysis of the correlations introduced by this partial synchronization process and a bound establishing that our approximation is close to the true PageRank vector.
We implement our algorithm in GraphLab and compare it against the default PageRank implementation. We show that our algorithm is very fast, performing each iteration in less than one second on the Twitter graph and can be up to 7x faster compared to the standard GraphLab PageRank implementation.
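The Monte Carlo view underlying FrogWild is easy to sketch: run many short, restarted random walks and read off visit frequencies. The toy version below is centralized and omits the partial mirror synchronization that is the paper's main contribution (graph encoding and parameters are illustrative):

import numpy as np

def random_walk_pagerank(adj, n_walkers, steps, restart, rng):
    # adj[v] lists the out-neighbors of vertex v; visit counts of restarted
    # random walks approximate each vertex's PageRank mass.
    n = len(adj)
    counts = np.zeros(n)
    pos = rng.integers(n, size=n_walkers)
    for _ in range(steps):
        for w in range(n_walkers):
            if rng.random() < restart or not adj[pos[w]]:
                pos[w] = rng.integers(n)          # teleport (or leave a dangling vertex)
            else:
                pos[w] = rng.choice(adj[pos[w]])
        np.add.at(counts, pos, 1)
    return counts / counts.sum()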
Milling, Chris, Constantine Caramanis, Shie Mannor, and Sanjay Shakkottai. “Distinguishing Infections on Different Graph Topologies.” IEEE Transactions on Information Theory 61, no. 6 (2015): 3100–3120.
@article{milling2015distinguishing,
title = {Distinguishing infections on different graph topologies},
author = {Milling, Chris and Caramanis, Constantine and Mannor, Shie and Shakkottai, Sanjay},
journal = {IEEE Transactions on Information Theory},
volume = {61},
number = {6},
pages = {3100--3120},
year = {2015},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1309.6545.pdf}
}
The history of infections and epidemics holds famous examples where understanding, containing, and ultimately treating an outbreak began with understanding its mode of spread. Influenza, HIV, and most computer viruses spread person to person, device to device, through contact networks; Cholera, Cancer, and seasonal allergies, on the other hand, do not. In this paper we study two fundamental questions of detection: first, given a snapshot view of a (perhaps vanishingly small) fraction of those infected, under what conditions is an epidemic spreading via contact (e.g., Influenza) distinguishable from a "random illness" operating independently of any contact network (e.g., seasonal allergies); second, if we do have an epidemic, under what conditions is it possible to determine which network of interactions is the main cause of the spread – the causative network – without any knowledge of the epidemic, other than the identity of a minuscule subsample of infected nodes?
The core of this paper, therefore, is to obtain an understanding of the diagnostic power of network information. We derive sufficient conditions that networks must satisfy for these problems to be identifiable, and produce efficient, highly scalable algorithms that solve these problems. We show that the identifiability condition we give is fairly mild and, in particular, is satisfied by two common graph topologies: the grid and the Erdos-Renyi graph.
Chen, Yudong, Huan Xu, Constantine Caramanis, and Sujay Sanghavi. “Matrix Completion with Column Manipulation: Near-Optimal Sample-Robustness-Rank Tradeoffs.” IEEE Transactions on Information Theory 62, no. 1 (2015): 503–26.
@article{chen2015matrix,
title = {Matrix completion with column manipulation: Near-optimal sample-robustness-rank tradeoffs},
author = {Chen, Yudong and Xu, Huan and Caramanis, Constantine and Sanghavi, Sujay},
journal = {IEEE Transactions on Information Theory},
volume = {62},
number = {1},
pages = {503--526},
year = {2015},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1102.2254.pdf}
}
This paper considers the problem of matrix completion when some number of the columns are completely and arbitrarily corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators who try to skew the predictions of the algorithm by calibrating their inputs to the system. In this paper, we develop an efficient algorithm for this problem based on a combination of a trimming procedure and a convex program that minimizes the nuclear norm and the l_1,2 norm. Our theoretical results show that given a vanishing fraction of observed entries, it is nevertheless possible to complete the underlying matrix even when the number of corrupted columns grows. Significantly, our results hold without any assumptions on the locations or values of the observed entries of the manipulated columns. Moreover, we show by an information-theoretic argument that our guarantees are nearly optimal in terms of the fraction of sampled entries on the authentic columns, the fraction of corrupted columns, and the rank of the underlying matrix. Our results therefore sharply characterize the tradeoffs between sample, robustness and rank in matrix completion.
Chen, Chao, Lark Kwon Choi, Gustavo De Veciana, Constantine Caramanis, Robert W Heath, and Alan C Bovik. “Modeling the Time-Varying Subjective Quality of HTTP Video Streams with Rate Adaptations.” IEEE Transactions on Image Processing 23, no. 5 (2014): 2206–21.
@article{chen2014modeling,
title = {Modeling the time-varying subjective quality of HTTP video streams with rate adaptations},
author = {Chen, Chao and Choi, Lark Kwon and De Veciana, Gustavo and Caramanis, Constantine and Heath, Robert W and Bovik, Alan C},
journal = {IEEE Transactions on Image Processing},
volume = {23},
number = {5},
pages = {2206--2221},
year = {2014},
publisher = {IEEE},
arxiv = {https://arxiv.org/pdf/1311.6441.pdf},
group = {journal}
}
Newly developed HTTP-based video streaming technologies enable flexible rate-adaptation under varying channel conditions. Accurately predicting the users’ Quality of Experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieve efficiency. An important aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ) of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-duration videos with time-varying distortions was built and the TVSQs of the videos were measured in a large-scale subjective study. The proposed method is able to reliably predict the TVSQ of rate adaptive videos. Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for on-line TVSQ prediction in HTTP based streaming.
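Structurally, a Hammerstein-Wiener model is a static input nonlinearity followed by a linear filter and a static output nonlinearity. The generic prediction path is sketched below; the particular nonlinearities and filter coefficients are placeholders, not the fitted model from the subjective study:

import numpy as np

def hammerstein_wiener(x, f_in, b, a, f_out):
    # Static nonlinearity -> linear IIR filter (b, a with a[0] = 1) -> static
    # nonlinearity: a nonlinear response with memory, as in TVSQ prediction.
    u = f_in(np.asarray(x, dtype=float))
    y = np.zeros_like(u)
    for t in range(len(u)):
        y[t] = sum(b[i] * u[t - i] for i in range(len(b)) if t - i >= 0)
        y[t] -= sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
    return f_out(y)

# Placeholder use: per-frame quality scores in, a TVSQ-like trace out.
q = np.linspace(-1.0, 1.0, 50)
tvsq = hammerstein_wiener(q, np.tanh, b=[0.2, 0.2], a=[1.0, -0.6],
                          f_out=lambda v: 100.0 / (1.0 + np.exp(-v)))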
Singh, Ashish K, Ku He, Constantine Caramanis, and Michael Orshansky. “Modeling and Optimization Techniques for Yield-Aware SRAM Post-Silicon Tuning.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, no. 8 (2014): 1159–67.
@article{singh2014modeling,
title = {Modeling and optimization techniques for yield-aware SRAM post-silicon tuning},
author = {Singh, Ashish K and He, Ku and Caramanis, Constantine and Orshansky, Michael},
journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
volume = {33},
number = {8},
pages = {1159--1167},
year = {2014},
publisher = {IEEE},
group = {journal}
}
Caramanis, Constantine, Nedialko B Dimitrov, and David P Morton. “Efficient Algorithms for Budget-Constrained Markov Decision Processes.” IEEE Transactions on Automatic Control 59, no. 10 (2014): 2813–17.
@article{caramanis2014efficient,
title = {Efficient algorithms for budget-constrained Markov decision processes},
author = {Caramanis, Constantine and Dimitrov, Nedialko B and Morton, David P},
journal = {IEEE Transactions on Automatic Control},
volume = {59},
number = {10},
pages = {2813--2817},
year = {2014},
publisher = {IEEE},
group = {journal}
}
Khalek, Amin Abdel, Constantine Caramanis, and Robert W Heath. “Delay-Constrained Video Transmission: Quality-Driven Resource Allocation and Scheduling.” IEEE Journal of Selected Topics in Signal Processing 9, no. 1 (2014): 60–75.
@article{khalek2014delay,
title = {Delay-constrained video transmission: Quality-driven resource allocation and scheduling},
author = {Khalek, Amin Abdel and Caramanis, Constantine and Heath, Robert W},
journal = {IEEE Journal of Selected Topics in Signal Processing},
volume = {9},
number = {1},
pages = {60--75},
year = {2014},
publisher = {IEEE},
group = {journal}
}
Ganapathy, Harish, Constantine Caramanis, and Lei Ying. “Exploiting Sparse Dynamics for Bandwidth Reduction in Cooperative Sensing Systems.” IEEE Transactions on Signal Processing 61, no. 14 (2013): 3671–82.
@article{ganapathy2013exploiting,
title = {Exploiting sparse dynamics for bandwidth reduction in cooperative sensing systems},
author = {Ganapathy, Harish and Caramanis, Constantine and Ying, Lei},
journal = {IEEE transactions on signal processing},
volume = {61},
number = {14},
pages = {3671--3682},
year = {2013},
publisher = {IEEE},
group = {journal}
}
Ye, Qiaoyang, Beiyu Rong, Yudong Chen, Mazin Al-Shalash, Constantine Caramanis, and Jeffrey G Andrews. “User Association for Load Balancing in Heterogeneous Cellular Networks.” IEEE Transactions on Wireless Communications 12, no. 6 (2013): 2706–16.
@article{ye2013user,
title = {User association for load balancing in heterogeneous cellular networks},
author = {Ye, Qiaoyang and Rong, Beiyu and Chen, Yudong and Al-Shalash, Mazin and Caramanis, Constantine and Andrews, Jeffrey G},
journal = {IEEE Transactions on Wireless Communications},
volume = {12},
number = {6},
pages = {2706--2716},
year = {2013},
publisher = {IEEE},
group = {journal}
}
For small cell technology to significantly increase the capacity of tower-based cellular networks, mobile users will need to be actively pushed onto the more lightly loaded tiers (corresponding to, e.g., pico and femtocells), even if they offer a lower instantaneous SINR than the macrocell base station (BS). Optimizing a function of the long-term rate for each user requires (in general) a massive utility maximization problem over all the SINRs and BS loads. On the other hand, an actual implementation will likely resort to a simple biasing approach where a BS in tier j is treated as having its SINR multiplied by a factor A_j ≥ 1, which makes it appear more attractive than the heavily-loaded macrocell. This paper bridges the gap between these approaches through several physical relaxations of the network-wide association problem, whose solution is NP-hard. We provide a low-complexity distributed algorithm that converges to a near-optimal solution with a theoretical performance guarantee, and we observe that simple per-tier biasing loses surprisingly little if the bias values A_j are chosen carefully. Numerical results show a large (3.5x) throughput gain for cell-edge users and a 2x rate gain for median users relative to a max-received-power association.
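The biasing rule itself is one line; a minimal sketch of the association step, with made-up SINRs and bias values for illustration:

import numpy as np

def biased_association(sinr, tier, bias):
    # Each user joins the BS maximizing bias[tier[j]] * SINR_j, so lightly
    # loaded tiers with bias A_j >= 1 appear more attractive than the macro.
    effective = sinr * np.asarray([bias[t] for t in tier])
    return effective.argmax(axis=1)                 # chosen BS per user

sinr = np.random.exponential(1.0, size=(5, 3))      # toy linear-scale SINRs
bs_choice = biased_association(sinr, tier=[0, 1, 1], bias={0: 1.0, 1: 4.0})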
Chen, Yudong, Ali Jalali, Sujay Sanghavi, and Constantine Caramanis. “Low-Rank Matrix Recovery from Errors and Erasures.” IEEE Transactions on Information Theory 59, no. 7 (2013): 4324–37.
@article{chen2013low,
title = {Low-rank matrix recovery from errors and erasures},
author = {Chen, Yudong and Jalali, Ali and Sanghavi, Sujay and Caramanis, Constantine},
journal = {IEEE Transactions on Information Theory},
volume = {59},
number = {7},
pages = {4324--4337},
year = {2013},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1104.0354.pdf}
}
This paper considers the recovery of a low-rank matrix from an observed version that simultaneously contains both (a) erasures: most entries are not observed, and (b) errors: values at a constant fraction of (unknown) locations are arbitrarily corrupted. We provide a new unified performance guarantee on when the natural convex relaxation of minimizing rank plus support succeeds in exact recovery. Our result allows for the simultaneous presence of random and deterministic components in both the error and erasure patterns. On the one hand, corollaries obtained by specializing this one single result in different ways recover (up to poly-log factors) all the existing works in matrix completion, and sparse and low-rank matrix recovery. On the other hand, our results also provide the first guarantees for (a) recovery when we observe a vanishing fraction of entries of a corrupted matrix, and (b) deterministic matrix completion.
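In generic notation (ours, not lifted from the paper), the convex relaxation in question is

\[
\min_{L,\,S}\; \|L\|_* + \lambda \|S\|_1 \quad \text{s.t.}\quad \mathcal{P}_\Omega(L + S) = \mathcal{P}_\Omega(M),
\]

where \Omega indexes the observed entries, the nuclear norm \|L\|_* is the convex surrogate for rank, and the entrywise \ell_1 norm \|S\|_1 is the surrogate for the support of the corruptions.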
Gopalan, Aditya, Constantine Caramanis, and Sanjay Shakkottai. “On the Value of Coordination and Delayed Queue Information in Multicellular Scheduling.” IEEE Transactions on Automatic Control 58, no. 6 (2013): 1443–56.
@article{gopalan2013value,
title = {On the value of coordination and delayed queue information in multicellular scheduling},
author = {Gopalan, Aditya and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {IEEE Transactions on Automatic Control},
volume = {58},
number = {6},
pages = {1443--1456},
year = {2013},
publisher = {IEEE},
group = {journal}
}
We study limited-coordination scheduling in a wireless downlink network with multiple base stations, each serving a collection of users over shared channel resources. When neighboring base stations simultaneously schedule users on the same channel resource, collisions occur due to interference, leading to loss of throughput. Full coordination to avoid this problem requires each base station to have complete “instantaneous” channel-state information for all its own users, as well as the ability to communicate on the same timescale as channel fluctuations with neighboring base stations. As such a scheme is impractical, if not impossible, to implement, we consider the setting where each base station has only limited instantaneous channel-state information for its own users, and can communicate with other base stations with a significant lag from the channel state variations to coordinate scheduling decisions. In this setting, we first characterize the throughput capacity of the system. A key insight is that sharing delayed queue-length information enables coordination on a slow timescale among the base stations, and this permits each base station to use limited and local channel-state along with global delayed queue-state to stabilize its users’ packet queues. Based on this, we develop a distributed, queue-aware scheduling (and information exchange) algorithm that is provably throughput-optimal. Finally, we study the effect of inter-base-station coordination delay on the system packet delay performance under the throughput-optimal algorithm.
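A minimal sketch of the timescale split described above (interfaces and names are illustrative, not the paper's algorithm verbatim): each base station weighs its current local rate estimates by queue lengths that were exchanged with a lag.

def schedule(local_rates, delayed_queues, my_users):
    # Serve the user maximizing (delayed queue length) x (current local rate):
    # queue state is shared among BSs on a slow timescale, channel state is
    # local and instantaneous but limited.
    return max(my_users, key=lambda u: delayed_queues[u] * local_rates[u])

rates = {"u1": 2.0, "u2": 1.0}                      # fresh local estimates
queues = {"u1": 3, "u2": 10}                        # stale by several slots
served = schedule(rates, queues, ["u1", "u2"])      # -> "u2"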
Heath Jr, Robert W, Alan C Bovik, Gustavo de Veciana, Constantine Caramanis, and Jeffrey G Andrews. “Perceptual Optimization of Large-Scale Wireless Video Networks.” E-LETTER, 2013.
@article{heath2013perceptual,
title = {Perceptual optimization of large-scale wireless video networks},
author = {Heath Jr, Robert W and Bovik, Alan C and de Veciana, Gustavo and Caramanis, Constantine and Andrews, Jeffrey G},
journal = {E-LETTER},
year = {2013},
group = {journal}
}
Barnhart, Cynthia, Dimitris Bertsimas, Constantine Caramanis, and Douglas Fearing. “Equitable and Efficient Coordination in Traffic Flow Management.” Transportation Science 46, no. 2 (2012): 262–80.
@article{barnhart2012equitable,
title = {Equitable and efficient coordination in traffic flow management},
author = {Barnhart, Cynthia and Bertsimas, Dimitris and Caramanis, Constantine and Fearing, Douglas},
journal = {Transportation science},
volume = {46},
number = {2},
pages = {262--280},
year = {2012},
publisher = {INFORMS},
group = {journal}
}
When air traffic demand is projected to exceed capacity, the Federal Aviation Administration implements traffic flow management (TFM) programs. Independently, these programs maintain a first-scheduled, first-served invariant, which is the accepted standard of fairness within the industry. Coordinating conflicting programs requires a careful balance between equity and efficiency. In our work, we first develop a fairness metric to measure deviation from first-scheduled, first-served in the presence of conflicts. Next, we develop an integer programming formulation that attempts to directly minimize this metric. We further develop an exponential penalty approach and show that its computational performance is far superior and its tradeoff between delay and fairness compares favorably. In our results, we demonstrate the effectiveness of these models using historical and hypothetical scenarios. Additionally, we demonstrate that the exponential penalty approach exhibits exceptional computational performance, implying practical viability. Our results suggest that this approach could lead to system-wide savings on the order of $25 to $50 million per year.
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Optimization under Probabilistic Envelope Constraints.” Operations Research 60, no. 3 (2012): 682–99.
@article{xu2012optimization,
title = {Optimization under probabilistic envelope constraints},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {Operations Research},
volume = {60},
number = {3},
pages = {682--699},
year = {2012},
publisher = {INFORMS},
group = {journal}
}
Chance constraints are an important modeling tool in stochastic optimization, providing probabilistic guarantees that a solution “succeeds” in satisfying a given constraint. Although they control the probability of “success,” they provide no control whatsoever in the event of a “failure.” That is, they do not distinguish between a slight overshoot or undershoot of the bounds and more catastrophic violation. In short, they do not capture the magnitude of violation of the bounds. This paper addresses precisely this topic, focusing on linear constraints and ellipsoidal (Gaussian-like) uncertainties. We show that the problem of requiring different probabilistic guarantees at each level of constraint violation can be reformulated as a semi-infinite optimization problem. We provide conditions that guarantee polynomial-time solvability of the resulting semi-infinite formulation. We show further that this resulting problem is what has been called a comprehensive robust optimization problem in the literature. As a byproduct, we provide tight probabilistic bounds for comprehensive robust optimization. Thus, analogously to the connection between chance constraints and robust optimization, we provide a broader connection between probabilistic envelope constraints and comprehensive robust optimization.
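Schematically, in our notation rather than the paper's, the two constraint types compare as

\begin{align*}
\text{chance constraint:}\quad & \mathbb{P}\big(a(u)^\top x \le b\big) \ge 1 - \varepsilon, \\
\text{envelope constraint:}\quad & \mathbb{P}\big(a(u)^\top x \ge b + t\big) \le \varepsilon(t) \quad \text{for all } t \ge 0,
\end{align*}

so the envelope \varepsilon(\cdot) prescribes a guarantee at every magnitude of violation instead of a single success probability.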
———. “A Distributional Interpretation of Robust Optimization.” Mathematics of Operations Research 37, no. 1 (2012): 95–110.
@article{xu2012distributional,
title = {A distributional interpretation of robust optimization},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {Mathematics of Operations Research},
volume = {37},
number = {1},
pages = {95--110},
year = {2012},
publisher = {INFORMS},
group = {journal}
}
Motivated by data-driven decision making and sampling problems, we investigate probabilistic interpretations of robust optimization (RO). We establish a connection between RO and distributionally robust stochastic programming (DRSP), showing that the solution to any RO problem is also a solution to a DRSP problem. Specifically, we consider the case where multiple uncertain parameters belong to the same fixed dimensional space and find the set of distributions of the equivalent DRSP problem. The equivalence we derive enables us to construct RO formulations for sampled problems (as in stochastic programming and machine learning) that are statistically consistent, even when the original sampled problem is not. In the process, this provides a systematic approach for tuning the uncertainty set. The equivalence further provides a probabilistic explanation for the common shrinkage heuristic, where the uncertainty set used in an RO problem is a shrunken version of the original uncertainty set.
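The simplest instance of the equivalence (notation ours) is

\[
\min_{x}\, \max_{u \in \mathcal{U}} f(x, u) \;=\; \min_{x}\, \sup_{\mu \in \mathcal{P}(\mathcal{U})} \mathbb{E}_{u \sim \mu}\big[f(x, u)\big],
\]

since a worst-case distribution concentrates on a worst-case point of \mathcal{U}; the paper characterizes the richer set of distributions for which an RO problem and a DRSP problem share solutions.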
Yun, Sungho, and Constantine Caramanis. “System-Level Optimization in Wireless Networks: Managing Interference and Uncertainty via Robust Optimization.” IEEE/ACM Transactions on Networking 20, no. 2 (2012): 339–52.
@article{yun2012system,
title = {System-level optimization in wireless networks: managing interference and uncertainty via robust optimization},
author = {Yun, Sungho and Caramanis, Constantine},
journal = {IEEE/ACM Transactions on Networking},
volume = {20},
number = {2},
pages = {339--352},
year = {2012},
publisher = {IEEE},
group = {journal}
}
Ganapathy, Harish, Siddhartha Banerjee, Nedialko B Dimitrov, and Constantine Caramanis. “Feedback Allocation for OFDMA Systems with Slow Frequency-Domain Scheduling.” IEEE Transactions on Signal Processing 60, no. 12 (2012): 6630–40.
@article{ganapathy2012feedback,
title = {Feedback allocation for OFDMA systems with slow frequency-domain scheduling},
author = {Ganapathy, Harish and Banerjee, Siddhartha and Dimitrov, Nedialko B and Caramanis, Constantine},
journal = {IEEE transactions on signal processing},
volume = {60},
number = {12},
pages = {6630--6640},
year = {2012},
publisher = {IEEE},
group = {journal}
}
Xu, Huan, Constantine Caramanis, and Sujay Sanghavi. “Robust PCA via Outlier Pursuit.” IEEE Transactions on Information Theory 58, no. 5 (2012): 3047–64.
@article{xu2012robust,
title = {Robust PCA via outlier pursuit},
author = {Xu, Huan and Caramanis, Constantine and Sanghavi, Sujay},
journal = {IEEE transactions on information theory},
volume = {58},
number = {5},
pages = {3047--3064},
year = {2012},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1010.4237.pdf}
}
Singular-value decomposition (SVD) [and principal component analysis (PCA)] is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA, such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm that we call outlier pursuit, which under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation is of paramount interest in bioinformatics, financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization; however, our results, setup, and approach necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself. In any problem where one seeks to recover a structure rather than the exact initial matrices, techniques developed thus far relying on certificates of optimality will fail. We present an important extension of these methods, which allows the treatment of such problems.
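In generic notation, the outlier pursuit program decomposes the observation as

\[
\min_{L,\,C}\; \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.}\quad M = L + C,
\]

where \|C\|_{1,2} is the sum of the \ell_2 norms of the columns of C, so corruptions are confined to a few columns and the recovered object is the column space of L rather than L itself.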
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Outlier-Robust PCA: The High-Dimensional Case.” IEEE Transactions on Information Theory 59, no. 1 (2012): 546–72.
@article{xu2012outlier,
title = {Outlier-robust PCA: the high-dimensional case},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {IEEE transactions on information theory},
volume = {59},
number = {1},
pages = {546--572},
year = {2012},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/1002.4658.pdf}
}
Principal component analysis plays a central role in statistics, engineering, and science. Because of the prevalence of corrupted data in real-world applications, much research has focused on developing robust algorithms. Perhaps surprisingly, these algorithms are unequipped (indeed, unable) to deal with outliers in the high-dimensional setting where the number of observations is of the same magnitude as the number of variables of each observation, and the dataset contains some (arbitrarily) corrupted observations. We propose a high-dimensional robust principal component analysis algorithm that is efficient, robust to contaminated points, and easily kernelizable. In particular, our algorithm achieves maximal robustness: it has a breakdown point of 50% (the best possible), while all existing algorithms have a breakdown point of zero. Moreover, our algorithm recovers the optimal solution exactly in the case where the number of corrupted points grows sublinearly in the dimension.
Gopalan, Aditya, Constantine Caramanis, and Sanjay Shakkottai. “On Wireless Scheduling with Partial Channel-State Information.” IEEE Transactions on Information Theory 58, no. 1 (2012): 403–20.
@article{gopalan2012wireless,
title = {On wireless scheduling with partial channel-state information},
author = {Gopalan, Aditya and Caramanis, Constantine and Shakkottai, Sanjay},
journal = {IEEE Transactions on Information Theory},
volume = {58},
number = {1},
pages = {403--420},
year = {2012},
publisher = {IEEE},
group = {journal}
}
A time-slotted queueing system for a wireless downlink with multiple flows and a single server is considered, with exogenous arrivals and time-varying channels. It is assumed that only one user can be serviced in a single time slot. Unlike much recent work on this problem, attention is drawn to the case where the server can obtain only partial information about the instantaneous state of the channel. In each time slot, the server is allowed to specify a single subset of flows from a collection of observable subsets, observe the current service rates for that subset, and subsequently pick a user to serve. The stability region for such a system is provided. An online scheduling algorithm is presented that uses information about marginal distributions to pick the subset and the Max-Weight rule to pick a flow within the subset, and which is provably throughput-optimal. In the case where the observable subsets are all disjoint, or where the subsets and channel statistics are symmetric, it is shown that a simple scheduling algorithm, Max-Sum-Queue, that essentially picks subsets having the largest squared sum of queues, followed by picking a user using Max-Weight within the subset, is throughput-optimal.
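A compact sketch of the Max-Sum-Queue rule (data layout is illustrative):

def max_sum_queue(observable_subsets, queues, rates):
    # Stage 1: pick the observable subset with the largest sum of squared
    # queue lengths. Stage 2: observe rates within that subset and serve
    # the Max-Weight user (queue length x rate).
    subset = max(observable_subsets,
                 key=lambda S: sum(queues[u] ** 2 for u in S))
    return max(subset, key=lambda u: queues[u] * rates[u])

queues = {"a": 4, "b": 1, "c": 3}
rates = {"a": 1.0, "b": 2.0, "c": 2.0}              # revealed post-selection
served = max_sum_queue([("a", "b"), ("c",)], queues, rates)   # -> "a"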
Singh, Ashish Kumar, Kareem Ragab, Mario Lok, Constantine Caramanis, and Michael Orshansky. “Predictable Equation-Based Analog Optimization Based on Explicit Capture of Modeling Error Statistics.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31, no. 10 (2012): 1485–98.
@article{singh2012predictable,
title = {Predictable equation-based analog optimization based on explicit capture of modeling error statistics},
author = {Singh, Ashish Kumar and Ragab, Kareem and Lok, Mario and Caramanis, Constantine and Orshansky, Michael},
journal = {IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems},
volume = {31},
number = {10},
pages = {1485--1498},
year = {2012},
publisher = {IEEE},
group = {journal}
}
Khalek, Amin Abdel, Constantine Caramanis, and Robert W Heath. “A Cross-Layer Design for Perceptual Optimization of H.264/SVC with Unequal Error Protection.” IEEE Journal on Selected Areas in Communications 30, no. 7 (2012): 1157–71.
@article{khalek2012cross,
title = {A cross-layer design for perceptual optimization of H.264/SVC with unequal error protection},
author = {Khalek, Amin Abdel and Caramanis, Constantine and Heath, Robert W},
journal = {IEEE Journal on selected areas in Communications},
volume = {30},
number = {7},
pages = {1157--1171},
year = {2012},
publisher = {IEEE},
group = {journal}
}
Bertsimas, Dimitris, David B Brown, and Constantine Caramanis. “Theory and Applications of Robust Optimization.” SIAM Review 53, no. 3 (2011): 464–501.
@article{bertsimas2011theory,
title = {Theory and applications of robust optimization},
author = {Bertsimas, Dimitris and Brown, David B and Caramanis, Constantine},
journal = {SIAM review},
volume = {53},
number = {3},
pages = {464--501},
year = {2011},
publisher = {SIAM},
group = {journal},
arxiv = {https://arxiv.org/pdf/1010.5445.pdf}
}
In this paper we survey the primary research, both theoretical and applied, in the area of Robust Optimization (RO). Our focus is on the computational attractiveness of RO approaches, as well as the modeling power and broad applicability of the methodology. In addition to surveying prominent theoretical results of RO, we also present some recent results linking RO to adaptable models for multi-stage decision-making problems. Finally, we highlight applications of RO across a wide spectrum of domains, including finance, statistics, learning, and various areas of engineering.
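For orientation, the generic robust counterpart around which the survey is organized is

\[
\min_{x}\; f(x) \quad \text{s.t.}\quad g(x, u) \le 0 \;\;\text{for all } u \in \mathcal{U},
\]

and tractability hinges on the geometry of the uncertainty set \mathcal{U} (for linear constraints, polyhedral and ellipsoidal sets are the classical tractable cases).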
Yun, Sungho, Constantine Caramanis, and Robert W Heath Jr. “Distributed Link Adaptation for Multicast Traffic in MIMO-OFDM Systems.” Physical Communication 4, no. 4 (2011): 286–95.
@article{yun2011distributed,
title = {Distributed link adaptation for multicast traffic in MIMO-OFDM systems},
author = {Yun, Sungho and Caramanis, Constantine and Heath Jr, Robert W},
journal = {Physical Communication},
volume = {4},
number = {4},
pages = {286--295},
year = {2011},
publisher = {Elsevier},
group = {journal}
}
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Sparse Algorithms Are Not Stable: A No-Free-Lunch Theorem.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34, no. 1 (2011): 187–93.
@article{xu2011sparse,
title = {Sparse algorithms are not stable: A no-free-lunch theorem},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {IEEE transactions on pattern analysis and machine intelligence},
volume = {34},
number = {1},
pages = {187--193},
year = {2011},
publisher = {IEEE},
group = {journal}
}
We consider two desired properties of learning algorithms: sparsity and algorithmic stability. Both properties are believed to lead to good generalization ability. We show that these two properties are fundamentally at odds with each other: A sparse algorithm cannot be stable and vice versa. Thus, one has to trade off sparsity and stability in designing a learning algorithm. In particular, our general result implies that \ell_1-regularized regression (Lasso) cannot be stable, while \ell_2-regularized regression is known to have strong stability properties and is therefore not sparse.
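Concretely, the two estimators contrasted by the theorem are, in standard notation,

\begin{align*}
\text{Lasso (sparse, not stable):}\quad & \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1, \\
\text{ridge (stable, not sparse):}\quad & \min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2.
\end{align*}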
———. “Robust Regression and Lasso.” IEEE Transactions on Information Theory 56, no. 7 (2010): 3561–74.
@article{xu2010robusu,
title = {Robust regression and lasso},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {IEEE Transactions on Information Theory},
volume = {56},
number = {7},
pages = {3561--3574},
year = {2010},
publisher = {IEEE},
group = {journal},
arxiv = {https://arxiv.org/pdf/0811.1790.pdf}
}
Lasso, or \ell_1 regularized least squares, has been explored extensively for its remarkable sparsity properties. It is shown in this paper that the solution to Lasso, in addition to its sparsity, has robustness properties: it is the solution to a robust optimization problem. This has two important consequences. First, robustness provides a connection of the regularizer to a physical property, namely, protection from noise. This allows a principled selection of the regularizer, and in particular, generalizations of Lasso that also yield convex optimization problems are obtained by considering different uncertainty sets.
Second, robustness can itself be used as an avenue for exploring different properties of the solution. In particular, it is shown that the robustness of the solution explains why the solution is sparse. The analysis, as well as the specific results obtained, differs from standard sparsity results, providing different geometric intuition. Furthermore, it is shown that the robust optimization formulation is related to kernel density estimation, and based on this approach, a proof that Lasso is consistent is given using robustness directly. Finally, a theorem saying that sparsity and algorithmic stability contradict each other, and hence that Lasso is not stable, is presented.
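The equivalence at the heart of the paper can be written, for feature-wise uncertainty in our notation, as

\[
\min_{\beta}\; \max_{\|\delta_i\|_2 \le c_i,\; i = 1, \dots, p}\; \big\| y - (X + [\delta_1, \dots, \delta_p])\beta \big\|_2 \;=\; \min_{\beta}\; \|y - X\beta\|_2 + \sum_{i=1}^{p} c_i |\beta_i|,
\]

which recovers Lasso when all c_i equal a common \lambda; other uncertainty sets yield the convex generalizations of Lasso mentioned above.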
Bertsimas, Dimitris, and Constantine Caramanis. “Finite Adaptability in Multistage Linear Optimization.” IEEE Transactions on Automatic Control 55, no. 12 (2010): 2751–66.
@article{bertsimas2010finite,
title = {Finite adaptability in multistage linear optimization},
author = {Bertsimas, Dimitris and Caramanis, Constantine},
journal = {IEEE Transactions on Automatic Control},
volume = {55},
number = {12},
pages = {2751--2766},
year = {2010},
publisher = {IEEE},
group = {journal}
}
In multistage problems, decisions are implemented sequentially, and thus may depend on past realizations of the uncertainty. Examples of such problems abound in applications of stochastic control and operations research; yet, where robust optimization has made great progress in providing a tractable formulation for a broad class of single-stage optimization problems with uncertainty, multistage problems present significant tractability challenges. In this paper we consider an adaptability model designed with discrete second-stage variables in mind. We propose a hierarchy of increasing adaptability that bridges the gap between the static robust formulation and the fully adaptable formulation. We study the geometry, complexity, formulations, algorithms, examples, and computational results for finite adaptability. In contrast to previously proposed models of affine adaptability, our framework can accommodate discrete variables. In terms of performance for continuous linear optimization, the two frameworks are complementary, in the sense that we provide examples where each yields stronger solutions than the other. We prove a positive tractability result in the regime where we expect finite adaptability to perform well, and illustrate this claim with an application to Air Traffic Control.
Xu, Huan, Constantine Caramanis, and Shie Mannor. “Robustness and Regularization of Support Vector Machines.” Journal of Machine Learning Research 10, no. 7 (2009).
@article{xu2009robustness,
title = {Robustness and Regularization of Support Vector Machines.},
author = {Xu, Huan and Caramanis, Constantine and Mannor, Shie},
journal = {Journal of machine learning research},
volume = {10},
number = {7},
year = {2009},
group = {journal},
arxiv = {https://arxiv.org/pdf/0803.3490.pdf}
}
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise while controlling overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
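Schematically, with the uncertainty budgeted across samples and notation ours (see the paper for the exact statement), the equivalence reads

\[
\min_{w, b}\; \max_{\sum_i \|\delta_i\| \le c}\; \sum_{i=1}^{n} \big[\, 1 - y_i\big(w^\top (x_i - \delta_i) + b\big) \big]_+ \;=\; \min_{w, b}\; c\, \|w\|^{*} + \sum_{i=1}^{n} \big[\, 1 - y_i\big(w^\top x_i + b\big) \big]_+,
\]

where \|\cdot\|^{*} denotes the dual norm; with the Euclidean norm this is the standard hinge loss with an \ell_2 regularizer on w.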
Daniels, Robert C, Constantine M Caramanis, and Robert W Heath. “Adaptation in Convolutionally Coded MIMO-OFDM Wireless Systems through Supervised Learning and SNR Ordering.” IEEE Transactions on Vehicular Technology 59, no. 1 (2009): 114–26.
@article{daniels2009adaptation,
title = {Adaptation in convolutionally coded MIMO-OFDM wireless systems through supervised learning and SNR ordering},
author = {Daniels, Robert C and Caramanis, Constantine M and Heath, Robert W},
journal = {IEEE Transactions on vehicular Technology},
volume = {59},
number = {1},
pages = {114--126},
year = {2009},
publisher = {IEEE},
group = {journal}
}
Multiple-input-multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) wireless systems use link adaptation to exploit the dynamic nature of wireless environments. Link adaptation maximizes throughput while maintaining target reliability by adaptively selecting the modulation order and coding rate. Link adaptation is extremely challenging, however, due to the difficulty in predicting error rates in OFDM with binary convolutional codes, bit interleaving, MIMO processing, and real channel impairments. This paper proposes a new machine-learning framework that exploits past observations of the error rate and the associated channel-state information to predict the best modulation order and coding rate for new realizations of the channel state without modeling the input-output relationship of the wireless transceiver. Our approach is enabled through our new error-rate expression that is only parameterized by postprocessing signal-to-noise ratios (SNRs), ordered over subcarriers and spatial streams. Using ordered SNRs, we propose a low-dimensional feature set that enables machine learning to increase the accuracy of link adaptation. An IEEE 802.11n simulation study validates the application of this machine-learning framework in real channels and demonstrates the improved performance of SNR ordering as it compares with competing link-quality metrics.
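A toy sketch of the resulting pipeline (the nearest-neighbor learner and thresholds below are illustrative stand-ins, not the paper's exact method): features are the post-processing SNRs sorted across subcarriers and streams, and the learner picks the fastest modulation/coding scheme predicted to meet the target error rate.

import numpy as np

def ordered_snr_features(post_snr_db):
    # Sorting yields a permutation-invariant, low-dimensional feature vector.
    return np.sort(np.ravel(post_snr_db))

def select_mcs(features, history, throughput, target_per=0.1):
    # history[mcs] is a list of (features, observed packet error rate) pairs;
    # reuse the outcome of the most similar past channel per candidate MCS.
    feasible = []
    for mcs, records in history.items():
        nearest = min(records, key=lambda r: np.linalg.norm(r[0] - features))
        if nearest[1] <= target_per:
            feasible.append(mcs)
    # Fastest feasible MCS, else fall back to the most conservative one.
    return (max(feasible, key=lambda m: throughput[m]) if feasible
            else min(throughput, key=lambda m: throughput[m]))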
Channappayya, Sumohana S, Alan Conrad Bovik, Constantine Caramanis, and Robert W Heath. “Design of Linear Equalizers Optimized for the Structural Similarity Index.” IEEE Transactions on Image Processing 17, no. 6 (2008): 857–72.
@article{channappayya2008design,
title = {Design of linear equalizers optimized for the structural similarity index},
author = {Channappayya, Sumohana S and Bovik, Alan Conrad and Caramanis, Constantine and Heath, Robert W},
journal = {IEEE transactions on image processing},
volume = {17},
number = {6},
pages = {857--872},
year = {2008},
publisher = {IEEE},
group = {journal}
}
We propose an algorithm for designing linear equalizers that maximize the structural similarity (SSIM) index between the reference and restored signals. The SSIM index has enjoyed considerable application in the evaluation of image processing algorithms, yet algorithms have not previously been designed to explicitly optimize for this measure. The design of such an algorithm is nontrivial due to the nonconvex nature of the distortion measure. In this paper, we reformulate the nonconvex problem as a quasi-convex optimization problem, which admits a tractable solution. We compute the optimal solution in near closed form, with the complexity of the resulting algorithm comparable to that of the linear minimum mean squared error (MMSE) solution, independent of the number of filter taps. To demonstrate the usefulness of the proposed algorithm, it is applied to restore images that have been blurred and corrupted with additive white Gaussian noise. As a special case, we consider blur-free image denoising. In each case, its performance is compared to a locally adaptive linear MSE-optimal filter. We show that the images denoised and restored using the SSIM-optimal filter have higher SSIM indices and superior perceptual quality compared to those restored using the MSE-optimal adaptive linear filter. Through these results, we demonstrate that (a) designing image processing algorithms, in particular denoising and restoration-type algorithms, to optimize for perceptual distortion measures can yield significant gains over existing (in particular, linear MMSE-based) algorithms, and (b) these gains may be obtained without a significant increase in the computational complexity of the algorithm.
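For reference, the (single-window) SSIM index that the equalizer is optimized for is

\[
\mathrm{SSIM}(x, y) \;=\; \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\]

where \mu, \sigma^2, and \sigma_{xy} denote local means, variances, and covariance, and C_1, C_2 are small stabilizing constants; because SSIM is a ratio of terms quadratic in the filter coefficients, maximizing it admits the quasi-convex reformulation described above.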
Channappayya, Sumohana S, Alan Conrad Bovik, and Robert W Heath. “Rate Bounds on SSIM Index of Quantized Images.” IEEE Transactions on Image Processing 17, no. 9 (2008): 1624–39.
@article{channappayya2008rate,
title = {Rate bounds on SSIM index of quantized images},
author = {Channappayya, Sumohana S and Bovik, Alan Conrad and Heath, Robert W},
journal = {IEEE Transactions on Image Processing},
volume = {17},
number = {9},
pages = {1624--1639},
year = {2008},
publisher = {IEEE},
group = {journal}
}
In this paper, we derive bounds on the structural similarity (SSIM) index as a function of quantization rate for fixed-rate uniform quantization of image discrete cosine transform (DCT) coefficients under the high-rate assumption. The space domain SSIM index is first expressed in terms of the DCT coefficients of the space domain vectors. The transform domain SSIM index is then used to derive bounds on the average SSIM index as a function of quantization rate for uniform, Gaussian, and Laplacian sources. As an illustrative example, uniform quantization of the DCT coefficients of natural images is considered. We show that the SSIM index between the reference and quantized images falls within the bounds for a large set of natural images. Further, we show using a simple example that the proposed bounds could be very useful for rate allocation problems in practical image and video coding applications.
Caramanis, Constantine, and Shie Mannor. “An Inequality for Nearly Log-Concave Distributions with Applications to Learning.” IEEE Transactions on Information Theory 53, no. 3 (2007): 1043–57.
@article{caramanis2007inequality,
title = {An inequality for nearly log-concave distributions with applications to learning},
author = {Caramanis, Constantine and Mannor, Shie},
journal = {IEEE Transactions on Information Theory},
volume = {53},
number = {3},
pages = {1043--1057},
year = {2007},
publisher = {IEEE},
group = {journal}
}
We prove that given a nearly log-concave density, in any partition of the space into two well-separated sets, the measure of the points that do not belong to these sets is large. We apply this isoperimetric inequality to derive lower bounds on the generalization error in learning. We also show that when the data are sampled from a nearly log-concave distribution, the margin cannot be large in a strong probabilistic sense. We further consider regression problems and show that if the inputs and outputs are sampled from a nearly log-concave distribution, the measure of points for which the prediction is wrong by more than \epsilon_0 and less than \epsilon_1 is (roughly) linear in \epsilon_1 - \epsilon_0.
Bertsimas, Dimitris, and Constantine Caramanis. “Bounds on Linear PDEs via Semidefinite Optimization.” Mathematical Programming 108, no. 1 (2006): 135–58.
@article{bertsimas2006bounds,
title = {Bounds on linear PDEs via semidefinite optimization},
author = {Bertsimas, Dimitris and Caramanis, Constantine},
journal = {Mathematical programming},
volume = {108},
number = {1},
pages = {135--158},
year = {2006},
publisher = {Springer},
group = {journal}
}
Using recent progress on moment problems, and their connections with semidefinite optimization, we present in this paper a new methodology based on semidefinite optimization, to obtain a hierarchy of upper and lower bounds on linear functionals defined on solutions of linear partial differential equations. We apply the proposed method to examples of PDEs in one and two dimensions, with very encouraging results. We pay particular attention to a PDE with oblique derivative conditions, commonly arising in queueing theory. We also provide computational evidence that the semidefinite constraints are critically important in improving the quality of the bounds, that is, without them the bounds are weak.
Rosenblum, Michael, Constantine Caramanis, Michel X Goemans, and Vahid Tarokh. “Approximating Fluid Schedules in Crossbar Packet-Switches and Banyan Networks.” IEEE/ACM Transactions on Networking 14, no. 6 (2006): 1374–87.
@article{rosenblum2006approximating,
title = {Approximating fluid schedules in crossbar packet-switches and Banyan networks},
author = {Rosenblum, Michael and Caramanis, Constantine and Goemans, Michel X and Tarokh, Vahid},
journal = {IEEE/ACM Transactions on Networking},
volume = {14},
number = {6},
pages = {1374--1387},
year = {2006},
publisher = {IEEE},
group = {journal}
}
Book Chapters
Caramanis, Constantine, Shie Mannor, and Huan Xu. “Robust Optimization in Machine Learning.” In Optimization for Machine Learning, 369. MIT Press, 2012.
@article{caramanis201214,
title = {Robust Optimization in Machine Learning},
author = {Caramanis, Constantine and Mannor, Shie and Xu, Huan},
journal = {Optimization for machine learning},
pages = {369},
year = {2012},
publisher = {MIT Press},
group = {book}
}