Optimizations and Cost Models for multi-core architectures: an approach based on parallel paradigms



This book is the original-language edition; it is not in Persian.




About the book Optimizations and Cost Models for multi-core architectures: an approach based on parallel paradigms

Title : Optimizations and Cost Models for multi-core architectures: an approach based on parallel paradigms
Title translated into Persian : بهینه سازی ها و مدل های هزینه برای معماری های چند هسته ای: رویکردی مبتنی بر پارادایم های موازی
Series :
Authors :
Publisher : Università di Pisa
Year of publication : 2014
Number of pages : 313

Language : English
Format : pdf
File size : 7 MB





Table of contents :


I Introduction
Introduction
Structured parallel programming
Parallel patterns and their optimizations
Multiple memory interfaces
Automatic Cache Coherence
Introducing a performance model
Towards a parallel programming environment
List of Contributions of the Thesis
Outline of the Thesis
Current publications by the author
Background
Chip MultiProcessor architectures
Processor architecture
Interconnection network
Memory bandwidth and organization
Atomic operations and synchronizations
Cache coherence
Number of cores
Parallel programming on Chip MultiProcessors
Programming Languages
Libraries
Our vision of parallel programming
Performance model for multiprocessors
Algorithm oriented performance models for multiprocessors
Hardware-oriented performance cost models
Summary
Structured parallel programming for multi-core
The need for high level parallel programming
Structured parallel programming
Parallel Paradigms
Stream Parallelism
Task-Farm
Pipeline
Data Parallelism
Map
Reduce
Map + Reduce, a notable composition
Data-Parallel with Stencil
Stencil Transformations
Expressing Parallel Paradigms
Skeletons
ASSIST: Beyond the classical skeleton approach
The Virtual Processors approach
Parallel patterns and their (many) implementations
Mastering the possibilities, one piece at a time
Towards a novel parallel programming environment
Target architectures
II Cost Models
A hardware-dependent model based on QNs
A general approach to parallel performance prediction
The case of single-element streams
Performance prediction of a parallel module
An example: cost model for a trivial task-farm implementation
Sequential code analysis
Latency Model
Service Time Model
Evaluating the model parameters
Evaluating the sequential time
Modeling communications latencies
The final model for the task-farm example
Performance degradation on shared memory architectures
Extensions to the original queueing network
Modeling caches
Bus interconnections
Multiple Requests per processor
Complex interconnection networks
Cache coherency
Adapting the model to a concrete parallel architecture
Summary
A Queueing Network Model for Tilera TILEPro64™
EQNSim: a testing environment for queueing network models
Architecture overview of Tilera TILEPro64™
Processors
Cache Hierarchy and Coherency
Hash-for-Home
Single-Home
No-Home
Restriction on the model
Interconnection Network
Under Load Latency
Memory Subsystem
Memory Read Service Time
Memory Write Service Time
Working with Caches
Model Validation
Evaluation of Rq for store_linear
Evaluation of Rq for store_linear with a different store rate
Considerations on the accuracy of the model
Summary
III Optimizations
Exploiting Multiple Memory Controllers
Programming multi-cores
Memory allocation models
SMP-like memory allocation
NUMA-like memory allocation
Process allocation
Evaluation by mean of synthetic benchmarks
Experimental results on the target architectures
Concluding Remarks
Farm parallelization of the Sobel Operator
Experimental results on the target architectures
Concluding Remarks
Farm parallelization of the Vector Addition
Experimental results on the target architectures
Data-Parallel parallelization of the FFT
Parallel FFT
Experimental results on the target architectures
Concluding Remarks
Modeling policies in the architectural model
Summary
Software-based Cache Coherence
The cost of automatic cache coherence
Optimizing cache coherence for the farm pattern
Automatic cache coherence with hashed home node
Automatic cache coherence with fixed home node
Disabling automatic cache coherence
Experimental Results
Optimizing cache coherence for a data-parallel pattern
Automatic cache coherence with hashed home node
Automatic cache coherence with fixed home node
Disabling local caches
Disabling automatic cache coherence
Experimental Results
Summary
IV Wrapping Up
Wrapping up: compiling a parallel module on TilePro64
Example module and its application
Parallel pattern and its implementations
Parallel Patterns
Farm Implementations
Study of the message passing implementation
Architecture Model Parameters
Predicted Service Times
Study of the message passing impl. with copy on receive
Architecture Model Parameters
Predicted Service Times
Study of the pointer passing implementation
Architecture Model Parameters
Predicted Service Times
Selection of the best implementation
Impact of a multi-chip configuration
A multi-chip TilePro64 configuration
Network Latencies
Core reservation and placement on the mesh
Implementations and model parameters
Performance study
Summary
Conclusions
Bibliography



