Title | A Greedy Approach with New Cost Model for Intermediate Datasets Storage Problem in General Workflows |
---|---|
Volume:Issue | 29:1 |
Authors | Zimao Li, Yingying Wang |
Pages | 166-174 |
Keywords | delay tolerance, greedy algorithm, intermediate datasets storage, transfer cost, usage rate, EI, MEDLINE, Scopus |
Publication date | 201802 |
DOI | 10.3966/199115992018012901015 |
Running a scientific workflow in the cloud generates a large volume of intermediate datasets, many of which contain valuable information that can be reused in further study, but the sheer size of the data makes storing all of them prohibitively expensive. A feasible solution is to store some of the intermediate datasets and re-compute the others when needed. The intermediate dataset storage problem asks for a tradeoff that minimizes the total cost of storing or regenerating each intermediate dataset. This paper focuses on a new cost model for the problem in general workflows, which additionally incorporates delay tolerance, usage frequency, and transfer cost to make the model more general. Based on a directed acyclic graph describing the dependency relationships between datasets, a greedy approach to the problem is proposed and implemented. Experimental results demonstrate the effectiveness and efficiency of our algorithm.
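
The store-or-regenerate tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or cost model: the `Dataset` fields, the cost-rate formula, and the greedy rule (repeatedly store whichever dataset lowers the total cost rate the most) are all assumptions chosen for illustration, and delay tolerance and transfer cost are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical per-dataset parameters (names are illustrative assumptions)."""
    gen_cost: float      # cost to compute this dataset from its direct predecessors
    store_rate: float    # cost per time unit of keeping it stored
    usage_rate: float    # expected number of uses per time unit
    parents: tuple = ()  # names of direct predecessors in the DAG


def regen_cost(name, datasets, stored):
    """Cost to regenerate `name`: its own generation cost plus that of every
    deleted ancestor reached before hitting a stored dataset."""
    seen, stack, total = set(), [name], 0.0
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        total += datasets[current].gen_cost
        # Only deleted predecessors have to be regenerated as well.
        stack.extend(p for p in datasets[current].parents if p not in stored)
    return total


def total_cost_rate(datasets, stored):
    """Assumed model: stored datasets pay storage; deleted ones pay usage * regeneration."""
    return sum(
        d.store_rate if name in stored
        else d.usage_rate * regen_cost(name, datasets, stored)
        for name, d in datasets.items()
    )


def greedy_storage(datasets):
    """Greedily add the dataset whose storage reduces the total cost rate the most."""
    stored = set()
    best = total_cost_rate(datasets, stored)
    while True:
        candidates = [
            (total_cost_rate(datasets, stored | {name}), name)
            for name in sorted(datasets) if name not in stored
        ]
        if not candidates:
            break
        cost, name = min(candidates)
        if cost >= best:  # no remaining choice improves the cost
            break
        stored.add(name)
        best = cost
    return stored, best


# Example: a three-step linear workflow a -> b -> c with made-up costs.
workflow = {
    "a": Dataset(gen_cost=10, store_rate=1, usage_rate=0.5),
    "b": Dataset(gen_cost=20, store_rate=5, usage_rate=0.25, parents=("a",)),
    "c": Dataset(gen_cost=4, store_rate=10, usage_rate=0.5, parents=("b",)),
}
stored, cost = greedy_storage(workflow)
print(sorted(stored), cost)
```

In this toy workflow the sketch keeps `a` and `b` and regenerates `c` on demand, because `c` is cheap to rebuild from a stored `b` but expensive to store. A greedy choice of this kind is a heuristic and is not guaranteed to find the minimum-cost storage strategy.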