• [May 26th, 2019]: A new paper is out on derivative free optimization with momentum with new rates and results on continuous controls tasks. arXiv.
  • [May 25th, 2019]: New paper! New provably tight interval bounds are derived for DNNs. This allows for very simple robust training of large DNNs. arXiv.
  • [May 11th, 2019]: How to train robust networks outperforming 2-21x fold data augmentation? New paper out on arXiv.
  • [May 6th, 2019]: Attended ICLR19 in New Orleans.
  • [Feb 4th, 2019]: New paper on derivative-free optimization with importance sampling is out! Paper is on arXiv.
  • [Dec 22nd, 2018]: One paper accepted to ICLR19, Louisiana, USA.
  • [Nov 6th, 2018]: One paper accepted to WACV19, Hawaii, USA.
  • [July 3rd, 2018]: One paper accepted to ECCV18, Munich, Germany.
  • [June 19th, 2018]: Attended CVPR18 and gave an oral talk on our most recent work on analyzing piecewise linear deep networks using Gaussian network moments. Tensorflow, Pytorch and MATLAB codes are released.
  • [June 17th, 2018]: Received a fully funded scholarship to attend the AI-DLDA 18 summer school in Udine, Italy. Unfortunately, I won’t be able to attend for time constraints. Link
  • [June 15th, 2018]: New paper out! “Improving SAGA via a Probabilistic Interpolation with Gradient Descent”.
  • [April 30th, 2018]: I’m interning for 6 months at the Intel Labs in Munich this summer with Vladlen Koltun.
  • [April 22nd, 2018]: Recognized as an outstanding reviewer for CVPR18. I’m also on the list of emergency reviewers. Check it out. :)
  • [March 6th, 2018]: One paper accepted as [Oral] in CVPR 2018.
  • [Feb 5, 2018]: Awarded the best KAUST poster prize in the Optimization and Big Data Conference.
  • [Decemmber 11, 2017]: TCSC code is on github.
  • [October 22, 2017]: Attened ICCV17, Venice, Italy.
  • [July 22, 2017]: Attened CVPR17 in Hawaii and gave an oral presentation on our work on solving the LASSO with FFTs, July 2017.
  • [July 16, 2017]: FFTLasso’s code is available online.
  • [July 9, 2017]: Attended the ICVSS17, Sicily, Italy.
  • [June 15, 2017]: Selected to attend the International Computer Vision Summer School (ICVSS17), Sicily, Italy.
  • [March 17, 2017]: 1 paper accepted to ICCV17.
  • [March 14, 2017]: Received my NanoDegree on Deep Learning from Udacity.
  • [March 3, 2017]: 1 oral paper accepted to CVPR17, Hawai, USA.
  • [October 19, 2016]: ECCV16’s code has been released on github.
  • [October 8, 2016]: Attended ECCV16, Amsterdam, Netherlands.
  • [July 11, 2016]: 1 spotlight paper accepted to ECCV16, Amsterdam, Netherlands.
  • [June 26, 2016]: Attended CVPR16, Las Vegas, USA. Two papers presented.
  • [May 13, 2016]: ICCVW15 code is now avaliable online.
  • [April 11, 2016]: Successfully defended my Master’s Thesis.
  • [March 2, 2016]: 2 papers (1 spotlight) accepted to CVPR16, Las Vegas, USA.
  • [November 20, 2015]: 1 paper acceted to ICCVW15, Santiago, Chile.
  • [June 8, 2015]: Attended CVPR15, Boston, USA.

Selected Publications

​​We consider the problem of unconstrained minimization of a smooth objective function in ℝd in setting where only function evaluations are possible. We propose and analyze stochastic zeroth-order method with heavy ball momentum. In particular, we propose, SMTP, a momentum version of the stochastic three-point method (STP). We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of learning to continuous control tasks on several MuJoCo environments with varying difficulty and compare against STP, other state-of-the-art derivative-free optimization algorithms and against policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling which we call SMTP_IS. We provide convergence analysis of this method for non-convex, convex and strongly convex objectives.
arXiv, 2019.

​​Training Deep Neural Networks (DNNs) that are robust to norm bounded adversarial attacks remains an elusive problem. While verification based methods are generally too expensive to robustly train large networks, it was demonstrated in Gowal et. al. that bounded input intervals can be inexpensively propagated per layer through large networks. This interval bound propagation (IBP) approach lead to high robustness and was the first to be employed on large networks. However, due to the very loose nature of the IBP bounds, particularly for large networks, the required training procedure is complex and involved. In this paper, we closely examine the bounds of a block of layers composed of an affine layer followed by a ReLU nonlinearity followed by another affine layer. In doing so, we propose probabilistic bounds, true bounds with overwhelming probability, that are provably tighter than IBP bounds in expectation. We then extend this result to deeper networks through blockwise propagation and show that we can achieve orders of magnitudes tighter bounds compared to IBP. With such tight bounds, we demonstrate that a simple standard training procedure can achieve the best robustness-accuracy trade-off across several architectures on both MNIST and CIFAR10.
arXiv, 2019.

​​Despite the impressive performance of deep neural networks (DNNs) on numerous vision tasks, they still exhibit yet-to-understand uncouth behaviours. One puzzling behaviour is the subtle sensitive reaction of DNNs to various noise attacks. Such a nuisance has strengthened the line of research around developing and training noise-robust networks. In this work, we propose a new training regularizer that aims to minimize the probabilistic expected training loss of a DNN subject to a generic Gaussian input. We provide an efficient and simple approach to approximate such a regularizer for arbitrary deep networks. This is done by leveraging the analytic expression of the output mean of a shallow neural network; avoiding the need for the memory and computationally expensive data augmentation. We conduct extensive experiments on LeNet and AlexNet on various datasets including MNIST, CIFAR10, and CIFAR100 demonstrating the effectiveness of our proposed regularizer. In particular, we show that networks that are trained with the proposed regularizer benefit from a boost in robustness equivalent to performing 3-21 folds of data augmentation.
arXiv, 2019.

​​We consider the problem of unconstrained minimization of a smooth objective function in a setting where only function evaluations are possible. While importance sampling is one of the most popular techniques used by machine learning practitioners to accelerate the convergence of their models when applicable, there is not much existing theory for this acceleration in the derivative-free setting. In this paper, we propose the first derivative free optimization method with importance sampling and derive new improved complexity results on non-convex, convex and strongly convex functions. We conduct extensive experiments on various synthetic and real LIBSVM datasets confirming our theoretical results. We further test our method on a collection of continuous control tasks on MuJoCo environments with varying difficulty. Experiments suggest that our algorithm is practical for high dimensional continuous control problems where importance sampling results in a significant sample complexity improvement.
arXiv, 2019.

​​We provide a novel perspective on the forward pass through a block of layers in a deep network. In particular, we show that a forward pass through a standard dropout layer followed by a linear layer and a non-linear activation is equivalent to optimizing a convex optimization objective with a single iteration of a $ au$-nice Proximal Stochastic Gradient method. We further show that replacing standard Bernoulli dropout with additive dropout is equivalent to optimizing the same convex objective with a variance-reduced proximal method. By expressing both fully-connected and convolutional layers as special cases of a high-order tensor product, we unify the underlying convex optimization problem in the tensor setting and derive a formula for the Lipschitz constant $L$ used to determine the optimal step size of the above proximal methods. We conduct experiments with standard convolutional networks applied to the CIFAR-10 and CIFAR-100 datasets, and show that replacing a block of layers with multiple iterations of the corresponding solver, with step size set via $L$, consistently improves classification accuracy.
In ICLR19, 2019.

​​We develop and analyze a new algorithm for empirical risk minimization, which is the key paradigm for training supervised machine learning models. Our method—SAGD—is based on a probabilistic interpolation of SAGA and gradient descent (GD). In particular, in each iteration we take a gradient step with probability q and a SAGA step with probability 1−q. We show that, surprisingly, the total expected complexity of the method (which is obtained by multiplying the number of iterations by the expected number of gradients computed in each iteration) is minimized for a non-trivial probability q. For example, for a well conditioned problem the choice q=1/(n−1)2, where n is the number of data samples, gives a method with an overall complexity which is better than both the complexity of GD and SAGA. We further generalize the results to a probabilistic interpolation of SAGA and minibatch SAGA, which allows us to compute both the optimal probability and the optimal minibatch size. While the theoretical improvement may not be large, the practical improvement is robustly present across all synthetic and real data we tested for, and can be substantial. Our theoretical results suggest that for this optimal minibatch size our method achieves linear speedup in minibatch size, which is of key practical importance as minibatch implementations are used to train machine learning models in practice. This is the first time linear speedup in minibatch size is obtained for a variance reduced gradient-type method by directly solving the primal empirical risk minimization problem.
arXiv, 2018.

​​The outstanding performance of deep neural networks (DNNs), for the visual recognition task in particular, has been demonstrated on several large-scale benchmarks. This performance has immensely strengthened the line of research that aims to understand and analyze the driving reasons behind the effectiveness of these networks. One impor- tant aspect of this analysis has recently gained much attention, namely the reaction of a DNN to noisy input. This has spawned research on developing adversarial input attacks as well as training strategies that make DNNs more robust against these attacks. To this end, we derive in this paper exact analytic expressions for the first and second moments (mean and variance) of a small piecewise linear (PL) network (Affine, ReLU, Affine) subject to general Gaussian input. We experimentally show that these expressions are tight under simple linearizations of deeper PL-DNNs, especially popular architectures in the literature (e.g. LeNet and AlexNet). Extensive experiments on image classifica- tion show that these expressions can be used to study the behaviour of the output mean of the logits for each class, the interclass confusion and the pixel-level spatial noise sensitivity of the network. Moreover, we show how these expres- sions can be used to systematically construct targeted and non-targeted adversarial attacks.
[Oral] In CVPR18, 2018.

​Convolutional sparse coding (CSC) has gained attention for its successful role as a reconstruction and a classification tool in the computer vision and machine learning community. Current CSC methods can only reconstruct singlefeature 2D images independently. However, learning multidimensional dictionaries and sparse codes for the reconstruction of multi-dimensional data is very important, as it examines correlations among all the data jointly. This provides more capacity for the learned dictionaries to better reconstruct data. In this paper, we propose a generic and novel formulation for the CSC problem that can handle an arbitrary order tensor of data. Backed with experimental results, our proposed formulation can not only tackle applications that are not possible with standard CSC solvers, including colored video reconstruction (5D–tensors), but it also performs favorably in reconstruction with much fewer parameters as compared to naive extensions of standard CSC to multiple features/channels.
In ICCV17, 2017.

​​In this paper, we revisit the LASSO sparse representation problem, which has been studied and used in a variety of different areas, ranging from signal processing and information theory to computer vision and machine learning. In the vision community, it found its way into many important applications, including face recognition, tracking, super resolution, image denoising, to name a few. Despite advances in efficient sparse algorithms, solving large-scale LASSO problems remains a challenge. To circumvent this difficulty, people tend to downsample and subsample the problem (e.g. via dimensionality reduction) to maintain a manageable sized LASSO, which usually comes at the cost of losing solution accuracy. This paper proposes a novel circulant reformulation of the LASSO that lifts the problem to a higher dimension, where ADMM can be efficiently applied to its dual form. Because of this lifting, all optimization variables are updated using only basic element-wise operations, the most computationally expensive of which is a 1D FFT. In this way, there is no need for a linear system solver nor matrix-vector multiplication. Since all operations in our FFTLasso method are element-wise, the subproblems are completely independent and can be trivially parallelized (e.g. on a GPU). The attractive computational properties of FFTLasso are verified by extensive experiments on synthetic and real data and on the face recognition task. They demonstrate that FFTLasso scales much more effectively than a state-of-the-art solver.
[Oral] In CVPR17, 2017.


. A Stochastic Derivative Free Optimization Method with Momentum. arXiv, 2019.


. Probabilistically True and Tight Bounds for Robust Deep Neural Network Training. arXiv, 2019.

PDF Code

. Analytical Moment Regularizer for Gaussian Robust Networks. arXiv, 2019.

PDF Code

. A Stochastic Derivative-Free Optimization Method with Importance Sampling: Theory and Learning to Control. arXiv, 2019.


. Deep Layers as Stochastic Solvers. In ICLR19, 2019.

PDF Poster

. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild. In ECCV18, 2018.

PDF Code Project Poster Video

. Improving SAGA via a Probabilistic Interpolation with Gradient Descent. arXiv, 2018.


. In Defense of Sparse Tracking: Circulant Sparse Tracker. [Spotlight] In CVPR16, 2016.

PDF Poster Video Supplementary Material

Recent & Upcoming Talks

Analytic Expressions for Probabilistic Moments of PL-DNN With Gaussian Input
Jun 21, 2018 2:50 PM
FFTLasso: Large-Scale LASSO in the Fourier Domain
Jul 24, 2017 8:45 AM
High Order Tensor Formulation for Convolutional Sparse Coding
Feb 5, 2018 2:20 PM
Target Response Adaptation for Correlation Filter Tracking
Oct 14, 2016 10:00 AM