A Survey of Neuro-Dynamic Programming
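  • Abstract

    Neuro-dynamic programming is an optimization method developed in recent years. It uses computer simulation and function approximation to simplify the search of the state space, which effectively overcomes the "curse of dimensionality" and gives the method broad application prospects. This paper surveys neuro-dynamic programming in the hope of being helpful to related research.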

  • Authors

    Jin Huiyu (金辉宇), Yu Haibin (于海斌)

  • Affiliation

    Shenyang Institute of Automation, Chinese Academy of Sciences

  • Issue

    2001, Issue 4 (ISTIC, PKU)

  • Keywords

    dynamic programming; neuro-dynamic programming; approximation; simulation; temporal-difference learning

References
  • [1] W Zhang;T G Dietterich. A Reinforcement Learning Approach to Job Shop Scheduling. Proceedings of the 14th IJCAI
  • [2] P Marbach;J N Tsitsiklis. A Neuro-Dynamic Programming Approach to Admission Control in ATM Networks: the Single Link Case, Technical Report LIDS-P-2402. Laboratory for Information and Decision Systems, MIT, November 1997
  • [3] C J C H Watkins. Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University. Cambridge England, 1989
  • [4] P D Dayan. The Convergence of TD(λ) for General λ. Machine Learning, 1992
  • [5] J N Tsitsiklis;B Van Roy. An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 1997,05
  • [6] J N Tsitsiklis;B Van Roy. Average Cost Temporal-Difference Learning. Automatica, 1999,11
  • [7] C J C H Watkins;P Dayan. Q-Learning. Machine Learning, 1992
  • [8] D P Bertsekas;J N Tsitsiklis. Neuro-Dynamic Programming: An Overview, Proceedings of the 34th Conference on Decision & Control. New Orleans, LA, December 1995
  • [9] J N Tsitsiklis;B Van Roy. Neuro-Dynamic Programming: Overview and a Case Study in Optimal Stopping. San Diego, California, USA, December 1997
  • [10] B Van Roy;D P Bertsekas;Yuchun Lee;J N Tsitsiklis. A Neuro-Dynamic Programming Approach to Retailer Inventory Management. Proceedings of the 36th Conference on Decision & Control, San Diego, California, USA, December 1997
  • [11] R S Sutton. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 1988
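  • [12] D P Bertsekas; translated by Li Renhou (李人厚) and Han Chongzhao (韩崇昭). Dynamic Programming: Deterministic and Stochastic Models (Chinese edition). Xi'an: Xi'an Jiaotong University Press, 1988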
  • [13] R E Schapire;M K Warmuth. On the Worst-Case Analysis of Temporal-Difference Learning Algorithms. Machine Learning, 1996
  • [14] Cited from Zhang Youwei (张有为). Dynamic Programming (in Chinese). Changsha: Hunan Science and Technology Press, 1991
  • [15] A G Barto;S J Bradtke;S P Singh. Learning to Act Using Real-Time Dynamic Programming. Artificial Intelligence, 1995
  • [16] T Jaakkola;M I Jordan;S P Singh. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Computation, 1994,06
  • [17] J N Tsitsiklis;B Van Roy. Optimal Stopping of Markov Processes: Hilbert Space Theory, Approximation Algorithms, and an Application to Pricing Financial Derivatives. IEEE Transactions on Automatic Control, 1999,10
  • [18] J N Tsitsiklis;B Van Roy. Feature-Based Methods for Large Scale Dynamic Programming. Machine Learning, 1996
  • [19] D P Bertsekas;J N Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996
  • [20] D P Bertsekas;M L Homer;D A Logan;S D Patek;N R Sandell. Missile Defense and Interceptor Allocation by Neuro-Dynamic Programming. IEEE Transactions on Systems, Man, and Cybernetics, Part A, 2000
  • [21] G J Tesauro. Practical Issues in Temporal Difference Learning. Machine Learning, 1992
  • [22] P Marbach;J N Tsitsiklis. Simulation-Based Optimization of Markov Reward Processes, Submitted to the IEEE Transactions on Automatic Control; Technical Report LIDS-P-2411. February, 1998