Yasin Abbasi-Yadkori


DeepMind
Gmail: "yasin dot abbasi"

I am a researcher at DeepMind. Prior to that, I was a researcher at VinAI and Adobe. I was a postdoctoral fellow at Queensland University of Technology with Peter Bartlett. I completed my PhD at the University of Alberta under the supervision of Csaba Szepesvari.

Research interests: Artificial intelligence, machine learning, sequential decision problems

Google Scholar

Y. Abbasi-Yadkori, N. Lazic, C. Szepesvari, Model-Free Linear Quadratic Control via Reduction to Expert Prediction, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. pdf

Y. Abbasi-Yadkori, P. Bartlett, and A. Malek, Linear Programming for Large-Scale Markov Decision Problems, International Conference on Machine Learning (ICML), 2014. pdf

Y. Abbasi-Yadkori, D. Pal, and Cs. Szepesvari, Improved Algorithms for Linear Stochastic Bandits, Neural Information Processing Systems (NIPS), 2011. pdf

Y. Abbasi-Yadkori and Cs. Szepesvari, Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Conference on Learning Theory (COLT), 2011. pdf

B. Hao, Y. Abbasi-Yadkori, Z. Wen, G. Cheng, Bootstrapping Upper Confidence Bound, NIPS, 2019.

M. Phan, Y. Abbasi-Yadkori, J. Domke, Thompson Sampling and Approximate Inference, NIPS, 2019.

Y. Abbasi-Yadkori, P. L. Bartlett, K. Bhatia, N. Lazic, C. Szepesvari, G. Weisz, POLITEX: Regret Bounds for Policy Iteration using Expert Prediction, ICML, 2019.

T. Mai, A. Rao, M. Kapilevich, R. Rossi, Y. Abbasi-Yadkori, R. Sinha, On Densification for Minwise Hashing, UAI, 2019.

Y. Abbasi-Yadkori, N. Lazic, C. Szepesvari, Model-Free Linear Quadratic Control via Reduction to Expert Prediction, AISTATS, 2019.

    Preliminary Version: Y. Abbasi-Yadkori, N. Lazic, C. Szepesvari, Model-Free Linear Quadratic Control via Reduction to Expert Prediction, arXiv:1804.06021 [cs.LG], 2018. pdf

E. Banijamali, Y. Abbasi-Yadkori, M. Ghavamzadeh, N. Vlassis, Optimizing over a Restricted Policy Class in Markov Decision Processes, AISTATS, 2019.

    Preliminary Version: E. Banijamali, Y. Abbasi-Yadkori, M. Ghavamzadeh, N. Vlassis, Optimizing over a Restricted Policy Class in Markov Decision Processes, arXiv:1802.09646 [cs.LG], 2018. pdf

T. Nguyen, A. Shameli, Y. Abbasi-Yadkori, A. Rao, B. Kveton, Sample Efficient Graph-Based Optimization with Noisy Observations, AISTATS, 2019.

Y. Abbasi-Yadkori, P. L. Bartlett, V. Gabillon, A. Malek, M. Valko, Best of both worlds: Stochastic & adversarial best-arm identification, Conference on Learning Theory (COLT), 2018.

X. Cheng, N. S. Chatterji, Y. Abbasi-Yadkori, P. L. Bartlett, M. I. Jordan, Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, arXiv:1805.01648 [stat.ML], 2018. pdf

S. Li, Y. Abbasi-Yadkori, B. Kveton, S. Muthukrishnan, V. Vinay, Z. Wen, Offline Evaluation of Ranking Policies with Click Models, International Conference on Knowledge Discovery and Data Mining (KDD), 2018.

    Preliminary Version: S. Li, Y. Abbasi-Yadkori, B. Kveton, S. Muthukrishnan, V. Vinay, Z. Wen, Offline Evaluation of Ranking Policies with Click Models, arXiv:1804.10488 [cs.LG], 2018. pdf

G. Theocharous, Z. Wen, Y. Abbasi-Yadkori, and N. Vlassis, Posterior Sampling for Large Scale Reinforcement Learning, Neural Information Processing Systems (NIPS), 2018.

    Preliminary Version: G. Theocharous, Z. Wen, Y. Abbasi-Yadkori, and N. Vlassis, Posterior Sampling for Large Scale Reinforcement Learning, arXiv:1711.07979 [cs.LG], 2017. pdf

A. Shameli and Y. Abbasi-Yadkori, A Continuation Method for Discrete Optimization and its Application to Nearest Neighbor Classification, arXiv:1802.03482 [cs.LG], 2018. pdf

B. Kveton, Cs. Szepesvari, A. Rao, Z. Wen, Y. Abbasi-Yadkori, and S. Muthukrishnan, Stochastic Low-Rank Bandits, arXiv:1712.04644 [cs.LG], 2017. pdf

Y. Abbasi-Yadkori, P. Bartlett, and V. Gabillon, Near Minimax Optimal Players for the Finite-Time 3-Expert Prediction Problem, Neural Information Processing Systems (NIPS), 2017. pdf

A. Kazerouni, M. Ghavamzadeh, Y. Abbasi-Yadkori, and B. Van Roy, Conservative Contextual Linear Bandits, Neural Information Processing Systems (NIPS), 2017. pdf

Y. Abbasi-Yadkori, Fast Mixing Random Walks and Regularity of Incompressible Vector Fields, arXiv:1611.09252 [stat.CO], 2016. pdf

Y. Abbasi-Yadkori, P. L. Bartlett, V. Gabillon, and A. Malek, Hit-and-Run for Sampling and Planning in Non-Convex Spaces, Artificial Intelligence and Statistics (AISTATS), 2017. pdf

    Preliminary Version: Y. Abbasi-Yadkori, P. L. Bartlett, V. Gabillon, and A. Malek, Hit-and-Run for Sampling and Planning in Non-Convex Spaces, arXiv:1610.08865 [stat.CO], 2016. pdf

Y. Abbasi-Yadkori, P. L. Bartlett, and S. Wright, A Fast and Reliable Policy Improvement Algorithm, Artificial Intelligence and Statistics (AISTATS), 2016. pdf

W. M. Koolen, A. Malek, P. L. Bartlett, and Y. Abbasi-Yadkori, Minimax Time Series Prediction, Neural Information Processing Systems (NIPS), 2015. pdf

Y. Abbasi-Yadkori, P. Bartlett, X. Chen, and A. Malek, Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing, International Conference on Machine Learning (ICML), 2015. pdf

Y. Abbasi-Yadkori and Cs. Szepesvari, Bayesian Optimal Control of Smoothly Parameterized Systems, Conference on Uncertainty in Artificial Intelligence (UAI), 2015. pdf

    Preliminary Version: Y. Abbasi-Yadkori and Cs. Szepesvari, Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm, arXiv:1406.3926 [cs.LG], 2014. pdf

Y. Abbasi-Yadkori, P. Bartlett, and A. Malek, Linear Programming for Large-Scale Markov Decision Problems, International Conference on Machine Learning (ICML), 2014. pdf

    Preliminary Version: Y. Abbasi-Yadkori, P. Bartlett, and A. Malek, Linear Programming for Large-Scale Markov Decision Problems, arXiv:1402.6763 [math.OC], 2014. pdf

Y. Abbasi-Yadkori, P. Bartlett, and V. Kanade, Tracking Adversarial Targets, International Conference on Machine Learning (ICML), 2014. pdf

Y. Seldin, P. Bartlett, K. Crammer, and Y. Abbasi-Yadkori, Prediction with Limited Advice and Multiarmed Bandits with Paid Observations, International Conference on Machine Learning (ICML), 2014.

Y. Abbasi-Yadkori and G. Neu, Online learning in MDPs with side information, arXiv:1406.6812 [cs.LG], 2014. pdf

Y. Abbasi-Yadkori, P. Bartlett, V. Kanade, Y. Seldin, and Cs. Szepesvari, Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions, Neural Information Processing Systems (NIPS), 2013. pdf

    Preliminary Version: Y. Abbasi-Yadkori, P. Bartlett, and Cs. Szepesvari, Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions, arXiv:1303.3055 [cs.LG], 2013. pdf

Y. Abbasi-Yadkori, D. Pal, and Cs. Szepesvari, Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits, Artificial Intelligence and Statistics (AISTATS), 2012. pdf

Y. Abbasi-Yadkori, D. Pal, and Cs. Szepesvari, Improved Algorithms for Linear Stochastic Bandits, Neural Information Processing Systems (NIPS), 2011. pdf

    Preliminary Version: Y. Abbasi-Yadkori, D. Pal, and Cs. Szepesvari, Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems, arXiv:1102.2670 [cs.AI], 2011. pdf

Y. Abbasi-Yadkori and Cs. Szepesvari, Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Conference on Learning Theory (COLT), 2011. pdf

K. Hajebi, Y. Abbasi-Yadkori, H. Shahbazi, and H. Zhang, Fast Approximate Nearest-Neighbor Search with k-Nearest Neighbor Graph, International Joint Conference on Artificial Intelligence (IJCAI), 2011. pdf

Y. Abbasi-Yadkori, J. Modayil, and Cs. Szepesvari, Extending Rapidly-Exploring Random Trees for Asymptotically Optimal Anytime Motion Planning, International Conference on Intelligent Robots and Systems (IROS), 2010. pdf

P. Hooper, Y. Abbasi-Yadkori, R. Greiner, and B. Hoehn, Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling, Conference on Uncertainty in Artificial Intelligence (UAI), 2009. pdf

B. Poczos, Y. Abbasi-Yadkori, Cs. Szepesvari, R. Greiner, and N. Sturtevant, Learning when to stop thinking and do something!, International Conference on Machine Learning (ICML), 2009. pdf

M. Ravanbakhsh, Y. Abbasi-Yadkori, M. Abbaspour, and H. Sarbazi-Azad, A heuristic routing mechanism using a new addressing scheme, International Conference on Bio-Inspired Models of Network, Information and Computing Systems, 2006. pdf

Workshop Papers

Y. Seldin, Cs. Szepesvari, P. Auer, Y. Abbasi-Yadkori, Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments, European Workshop on Reinforcement Learning (EWRL), 2013. pdf

Y. Abbasi-Yadkori, A. Antos, and Cs. Szepesvari, Forced-Exploration Based Algorithms for Playing in Stochastic Linear Bandits, COLT Workshop on On-line Learning with Limited Feedback, 2009. pdf

Theses

Ph.D. thesis: Online Learning for Linearly Parametrized Control Problems. Department of Computing Science, University of Alberta, September 2012. pdf

M.Sc. thesis: Forced-Exploration Based Algorithms for Playing in Bandits with Large Action Sets. Department of Computing Science, University of Alberta, Spring 2009. pdf