DAG Reliability Model and Fault-Tolerant Algorithm for Heterogeneous Distributed Systems
  • Abstract

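    While the performance of heterogeneous distributed systems has improved substantially, their failure rates have risen sharply as well, and fault-tolerant scheduling for such systems based on the directed acyclic graph (DAG) task model has become an active research topic. The widely used fault-tolerant algorithms based on task replication have the following problems: (1) the constraint relating the reliability requirements of individual DAG tasks to the reliability requirement of the DAG as a whole is flawed and lacks a rigorous theoretical proof; (2) each task has only one backup replica, which is not enough to cope with multiple potential failures of that task; (3) blindly giving every task ε+1 replicas to tolerate up to ε possible failures improves system reliability but tends to cause excessive redundancy and consumes expensive computing resources. This paper first analyzes the task dependencies in the DAG, determines a probabilistic reliability model for DAG tasks, and builds a DAG reliability model. It then proposes an algorithm for the lower bound on task replication that satisfies the reliability goal, an economical task replication strategy algorithm, and a greedy task replication strategy algorithm, which together precisely quantify how many times each task needs to be replicated. Finally, building on these algorithms, it proposes OPDFT (Optional Policy on DAG Fault-Tolerant), a DAG fault-tolerant algorithm with selectable policies. Experiments show that the reliability cost of OPDFT's economical and greedy replication strategies is about 60% and 70%, respectively, of the reliability cost of the blind-strategy algorithm.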
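    The idea of quantifying how many replicas each task needs, rather than blindly assigning ε+1, can be illustrated with a minimal sketch. It is not the paper's OPDFT algorithm or its reliability model, which the abstract does not spell out. The sketch assumes that replicas of a task fail independently, that the DAG succeeds only if every task succeeds, and that the DAG goal is split evenly across tasks; the function names `task_reliability`, `dag_reliability`, `min_replicas`, and `replicas_for_dag` are hypothetical.

```python
def task_reliability(replica_reliabilities):
    """Probability that at least one replica succeeds.

    Assumption (not from the source): replicas fail independently.
    """
    all_fail = 1.0
    for r in replica_reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


def dag_reliability(per_task_replicas):
    """Reliability of the whole DAG.

    Assumption (not from the source): the DAG succeeds only if every
    task succeeds, so the task reliabilities multiply.
    """
    result = 1.0
    for replicas in per_task_replicas:
        result *= task_reliability(replicas)
    return result


def min_replicas(r, target):
    """Smallest n such that 1 - (1 - r)**n >= target.

    r is the reliability of a single replica; all replicas are assumed
    to have the same reliability.
    """
    if not (0.0 < r <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("need 0 < r <= 1 and 0 < target < 1")
    n = 1
    while 1.0 - (1.0 - r) ** n < target:
        n += 1
    return n


def replicas_for_dag(task_reliabilities, dag_target):
    """Per-task replica counts that are sufficient for the DAG target.

    Illustrative split (an assumption): each of the n tasks must reach
    dag_target ** (1 / n), which guarantees the product meets dag_target.
    """
    n = len(task_reliabilities)
    per_task_target = dag_target ** (1.0 / n)
    return [min_replicas(r, per_task_target) for r in task_reliabilities]


if __name__ == "__main__":
    reliabilities = [0.9, 0.99, 0.8]  # made-up single-replica reliabilities
    counts = replicas_for_dag(reliabilities, 0.95)
    achieved = dag_reliability([[r] * k for r, k in zip(reliabilities, counts)])
    print(counts, achieved)
```

    With these made-up inputs the replica counts differ per task, whereas a blind policy would give every task the same number of replicas regardless of how reliable a single replica already is.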

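  • Authors

    XIE Guo-Qi (谢国琪), LI Ren-Fa (李仁发), LIU Lin (刘琳), YANG Fan (杨帆)

  • Affiliation

    Key Laboratory for Embedded and Network Computing of Hunan Province, Hunan University; National Supercomputing Center in Changsha; Changsha 410082

  • Issue

    2013, No. 10 (indexed in ISTIC, EI, PKU)

  • Keywords

    heterogeneous distributed systems; reliability; fault tolerance; directed acyclic graph (DAG); task replication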
