A Sparse Stochastic Algorithm with an O(1/T) Convergence Rate
  • Abstract

    Stochastic gradient descent (SGD) is a simple and efficient method for solving large-scale optimization problems, and recent studies have shown that its convergence rate on strongly convex problems can be effectively improved by the α-suffix averaging technique. However, SGD is a black-box method and has difficulty producing the structural effect (such as sparsity) expected from regularized optimization problems. On the other hand, COMID (composite objective mirror descent) is a sparse stochastic algorithm that preserves the structure induced by L1 regularization, but for strongly convex problems its convergence rate is only O(log T / T). This paper focuses on the "L1 + hinge" optimization problem: an L2 strongly convex term is first introduced to turn it into a strongly convex problem, and COMID is then combined with the α-suffix averaging technique to obtain the L1MD-α algorithm. L1MD-α is proved to achieve an O(1/T) convergence rate while obtaining better sparsity than COMID. Experiments on large-scale datasets confirm the correctness of the theoretical analysis and the effectiveness of the proposed algorithm. (A minimal sketch of this combined update appears under "Algorithm sketch" below.)

  • Authors

    Jiang Jiyuan (姜纪远), Xia Liang (夏良), Zhang Xian (章显), Tao Qing (陶卿)

  • Affiliation

    Army Officer Academy of PLA, Hefei 230031, China

  • Issue

    2014, Issue 9 (indexed by ISTIC, EI, PKU)

  • Keywords

    machine learning; stochastic optimization; sparsity; L1 regularization; COMID (composite objective mirror descent)
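
  • Algorithm sketch

    A minimal NumPy sketch of the recipe described in the abstract: COMID-style proximal (soft-thresholding) steps on a "hinge + L2 + L1" objective, followed by α-suffix averaging of the last α·T iterates. This is an illustration under assumed details, not the authors' implementation; the names l1md_alpha, lam, mu, and alpha, the 1/(μt) step size, and the default suffix fraction are all assumptions.

        import numpy as np

        def soft_threshold(v, tau):
            # Proximal operator of tau * ||.||_1; this step is what preserves sparsity.
            return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

        def l1md_alpha(X, y, lam=1e-4, mu=1e-2, T=10000, alpha=0.5, seed=0):
            # Assumed objective: (1/n) * sum_i hinge(y_i * <w, x_i>) + (mu/2)*||w||^2 + lam*||w||_1,
            # where the added L2 term supplies the mu-strong convexity mentioned in the abstract.
            rng = np.random.default_rng(seed)
            n, d = X.shape
            w = np.zeros(d)
            w_bar = np.zeros(d)
            suffix_start = int((1.0 - alpha) * T)      # iterates after this enter the suffix average
            for t in range(1, T + 1):
                i = rng.integers(n)                    # sample one training example
                margin = y[i] * X[i].dot(w)
                g = mu * w                             # subgradient of the L2 term
                if margin < 1.0:
                    g -= y[i] * X[i]                   # subgradient of the hinge loss
                eta = 1.0 / (mu * t)                   # step size for strongly convex problems
                w = soft_threshold(w - eta * g, eta * lam)   # COMID / proximal step
                if t > suffix_start:
                    w_bar += (w - w_bar) / (t - suffix_start)  # running alpha-suffix average
            return w_bar

    Under these assumptions, the returned vector plays the role of the α-suffix average that the abstract credits with the O(1/T) rate, while the per-step soft thresholding is what yields sparser solutions than plain SGD.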
