Article Details

Journal of Advanced Engineering (先進工程學刊)

Title Nymph:以可合成Verilog HDL 設計之新型32 核心多處理器
Volume/Issue 6:4
Parallel Title Nymph: A Novel 32-Cores Multicore Processor Designed by Synthesizable Verilog HDL
Authors 朱守禮, 許詔傑, 李耕學
Pages 277-286
Keywords MIPS processor, multicore processor, bus, interconnection network
Publication Date 2011-10

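Chinese Abstract (English translation)

Every modern high-end computer system needs a high-performance processor to complete user tasks quickly. In the past, processor performance was improved mainly by raising the clock frequency through process technology and deeper pipelining. High clock frequencies, however, bring heat-dissipation problems that are hard to solve. In recent years the focus of high-performance processor design has therefore shifted from speeding up a single program to raising total system throughput. The multicore processor is one feasible approach: more processors carry out more tasks, which yields high throughput. This paper designs a multicore processor architecture named Nymph, comprising the implementation of a single-core processor and an interconnection network linking 32 processors. Its functional correctness is verified with the DSPstone benchmark, and its performance gains and bottlenecks are examined further. The Nymph architecture contains 32 processors based on the MIPS instruction set architecture and integrates 8 memory modules to form a shared-memory architecture. To balance area cost against transfer efficiency, the interconnection network combines an 8x8 crossbar with buses: the crossbar connects eight clusters, the cores within each cluster communicate over a bus, and each cluster contains four cores and one memory. The entire architecture described in this paper is implemented in RTL Verilog. So that subsequent chip development can proceed, the work goes beyond building the model and emphasizes conformance to the guidelines for synthesizable Verilog design. After the design was completed, the architecture was simulated in Verilog. According to the simulation results, the multicore architecture achieves up to 18 times the performance of a single-core processor.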

English Abstract

A high-performance processor is necessary in a modern computer system to accomplish users' complex tasks. In the past, the main techniques for raising a processor's working frequency were advances in semiconductor technology and deeper pipelining. However, a high working frequency brings cooling problems that are difficult to solve. For this reason, the design of high-performance processors now focuses on high throughput rather than high working frequency. The multicore processor is one workable solution because its multiple cores can carry out more work. Accordingly, a novel multicore processor named Nymph is proposed, covering the implementation of a single-core processor and of the interconnection network that connects 32 processors. It has been exercised with the DSPstone benchmark to verify its functional correctness and to analyze its performance gains and limits. Nymph contains 32 processors based on the MIPS ISA together with eight memory modules. To balance cost against transmission efficiency, the interconnection network is composed of an 8x8 crossbar and buses. The crossbar connects eight clusters, each of which communicates internally over a bus and contains four cores and one memory. The architecture described in this paper is implemented in synthesizable RTL Verilog HDL so that it can be turned into a chip through a typical ASIC flow. According to the simulation results, the Nymph architecture reaches a speedup of up to eighteen times over a single-core processor.
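
The cluster organization in the abstract can be made concrete with a small sketch. The numbers (8 clusters, 4 cores per cluster, 32 cores, 18x speedup) come from the abstract. The contiguous core numbering, the rule that memory module i sits in cluster i, and all function names are assumptions made only for illustration; the abstract does not state how cores or memories are indexed.

```python
# Illustrative model of the Nymph topology described in the abstract.
# ASSUMPTIONS (not from the paper): cores are numbered 0..31 contiguously,
# cores 4k..4k+3 belong to cluster k, and memory module k lives in cluster k.

NUM_CLUSTERS = 8        # from the abstract: eight clusters on an 8x8 crossbar
CORES_PER_CLUSTER = 4   # from the abstract: four cores and one memory per cluster
NUM_CORES = NUM_CLUSTERS * CORES_PER_CLUSTER  # 32 cores in total


def cluster_of(core_id: int) -> int:
    """Return the cluster index of a core under the assumed numbering."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core_id must be in [0, {NUM_CORES})")
    return core_id // CORES_PER_CLUSTER


def access_kind(core_id: int, memory_id: int) -> str:
    """Classify a memory access as 'local' (stays on the cluster bus)
    or 'remote' (must cross the crossbar to reach another cluster)."""
    if not 0 <= memory_id < NUM_CLUSTERS:
        raise ValueError(f"memory_id must be in [0, {NUM_CLUSTERS})")
    return "local" if cluster_of(core_id) == memory_id else "remote"


def parallel_efficiency(speedup: float, cores: int = NUM_CORES) -> float:
    """Speedup divided by core count (1.0 would be ideal linear scaling)."""
    return speedup / cores


if __name__ == "__main__":
    print(access_kind(5, 1))        # core 5 is in cluster 1, so its memory is local
    print(access_kind(5, 2))        # memory 2 is in another cluster
    print(parallel_efficiency(18))  # 18 / 32 = 0.5625
```

Under this reading, the reported peak speedup of 18 on 32 cores corresponds to a parallel efficiency of about 56%, which is consistent with the abstract's mention of performance bottlenecks.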

Related Literature