Article Details

Journal of Computers (indexed in EI, MEDLINE, Scopus)

Title: A Greedy Approach with New Cost Model for Intermediate Datasets Storage Problem in General Workflows
Volume/Issue: 29:1
Authors: Zimao Li, Yingying Wang
Pages: 166-174
Keywords: delay tolerance, greedy algorithm, intermediate datasets storage, transfer cost, usage rate
Publication date: 2018-02
DOI 10.3966/199115992018012901015

English Abstract

Running a scientific workflow on the cloud generates a large volume of intermediate datasets, many of which contain valuable information that can be used for further study, but their enormous size makes storing all of them prohibitively expensive. A feasible solution is to store some of the intermediate datasets and re-compute the others when needed. The intermediate dataset storage problem asks for a tradeoff that minimizes the total cost of storing or regenerating each intermediate dataset. This paper focuses on a new cost model for the problem in general workflows, which additionally incorporates delay tolerance, usage frequency, and transfer cost to make the cost model more general. Based on a directed acyclic graph describing the dependency relationships between datasets, a greedy approach to the problem is proposed and implemented. Experimental results demonstrate the effectiveness and efficiency of our algorithm.
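The store-or-regenerate tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a simplified cost model in which a stored dataset costs its storage rate per time unit, and a deleted dataset costs its usage rate times the cost of regenerating it from the nearest stored ancestors. Delay tolerance and transfer cost are omitted. All names (`regen_cost`, `greedy_storage`, `gen_cost`, `store_rate`, `usage_rate`) are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def regen_cost(d, preds, gen_cost, stored):
    """Cost of regenerating d once: its own generation cost plus that of
    every deleted ancestor reachable without passing through a stored one."""
    seen, stack, total = set(), [d], 0.0
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        total += gen_cost[node]
        # Only walk further back through predecessors that are not stored.
        stack.extend(p for p in preds.get(node, ()) if p not in stored)
    return total


def greedy_storage(preds, gen_cost, store_rate, usage_rate):
    """Visit datasets in topological order and keep each one only if storing
    it is no more expensive than regenerating it on demand (assumed rule)."""
    stored, total_rate = set(), 0.0
    for d in TopologicalSorter(preds).static_order():
        regen_rate = usage_rate[d] * regen_cost(d, preds, gen_cost, stored)
        if store_rate[d] <= regen_rate:
            stored.add(d)
            total_rate += store_rate[d]
        else:
            total_rate += regen_rate
    return stored, total_rate


# Hypothetical three-dataset chain a -> b -> c (each key maps to its predecessors).
preds = {"a": [], "b": ["a"], "c": ["b"]}
gen_cost = {"a": 10.0, "b": 5.0, "c": 2.0}
store_rate = {"a": 1.0, "b": 4.0, "c": 0.5}
usage_rate = {"a": 0.05, "b": 0.5, "c": 0.1}
stored, total_rate = greedy_storage(preds, gen_cost, store_rate, usage_rate)
```

Because each decision looks only at ancestors that have already been decided, one pass suffices, but the result is a heuristic: deleting a dataset can raise the regeneration cost of its descendants, which this local rule does not anticipate.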
