Introduction to High Performance Computing for Scientists and Engineers (Chapman & Hall/CRC Computational Science)

Georg Hager and Gerhard Wellein

Language: English

Pages: 360

ISBN: 143981192X

Format: PDF / Kindle (mobi) / ePub


Written by high performance computing (HPC) experts, Introduction to High Performance Computing for Scientists and Engineers provides a solid introduction to current mainstream computer architecture, dominant parallel programming models, and useful optimization strategies for scientific HPC. From working in a scientific computing center, the authors gained a unique perspective on the requirements and attitudes of users as well as manufacturers of parallel computers.

The text first introduces the architecture of modern cache-based microprocessors and discusses their inherent performance limitations, before describing general optimization strategies for serial code on cache-based architectures. It next covers shared- and distributed-memory parallel computer architectures and the most relevant network topologies. After discussing parallel computing on a theoretical level, the authors show how to avoid or ameliorate typical performance problems connected with OpenMP. They then present cache-coherent nonuniform memory access (ccNUMA) optimization techniques, examine distributed-memory parallel programming with the Message Passing Interface (MPI), and explain how to write efficient MPI code. The final chapter focuses on hybrid programming with MPI and OpenMP.

Users of high performance computers often have no idea what factors limit time to solution and whether it makes sense to think about optimization at all. This book facilitates an intuitive understanding of performance limitations without relying on heavy computer science knowledge. It also prepares readers for studying more advanced literature.

Read about the authors’ recent honor: Informatics Europe Curriculum Best Practices Award for Parallelism and Concurrency

Introduction to Operating System Design and Implementation: The OSP 2 Approach (Undergraduate Topics in Computer Science)

GPU Pro 4: Advanced Rendering Techniques

Introduction to Artificial Intelligence (Undergraduate Topics in Computer Science)

Writing Compilers and Interpreters: A Software Engineering Approach

Google Secrets


Figure 1.11: In an m-way set-associative cache, memory locations which are located a multiple of 1/m-th of the cache size apart can be mapped to either of m cache lines (here shown for m = 2). … The cache is divided into m parts of equal size, so-called ways. The number of ways m is the number of different cache lines a memory address can be mapped to (see Figure 1.11 for an example of a two-way set-associative cache).

Chapman & Hall/CRC Computational Science series (selected titles):

Concurrency in Programming Languages, by Matthew J. Sottile, Timothy G. Mattson, and Craig E. Rasmussen
Introduction to Scheduling, by Yves Robert and Frédéric Vivien
Scientific Data Management: Challenges, Technology, and Deployment, edited by Arie Shoshani and Doron Rotem
Introduction to the Simulation of Dynamics Using Simulink®, by Michael A. Gray
Introduction to High Performance Computing for Scientists and Engineers, by Georg Hager and Gerhard Wellein

timing routines should be interpreted with some care. The most frequent mistake with code timings occurs when the time periods to be measured are of the same order of magnitude as the timer resolution, i.e., the minimum possible interval that can be resolved.

2.2 Common sense optimizations

Very simple code changes can often lead to a significant performance boost. The most important "common sense" guidelines for avoiding performance pitfalls are summarized in the following …

twice to obtain the current entries from a and b. STL may define this operator in the following way (adapted from the GNU ISO C++ library source):

    const T& operator[](size_t __n) const {
        return *(this->_M_impl._M_start + __n);
    }

Although this looks simple enough to be inlined efficiently, current compilers refuse to apply SIMD vectorization to the summation loop above. A single layer of abstraction, in this case an overloaded index operator, can thus prevent the creation of optimal loop …

lower triangular matrix with a vector:

    do k=1,NITER
    !$OMP PARALLEL DO SCHEDULE(RUNTIME)
      do row=1,N
        do col=1,row
          C(row) = C(row) + A(col,row) * B(col)
        enddo
      enddo
    !$OMP END PARALLEL DO
    enddo

(Note that privatizing the inner loop variable is not required here because this is automatic in Fortran, but not in C/C++.) If static scheduling is used, this problem obviously suffers from severe load imbalance …
