Distributed vs parallel computing

Question Detail: 

I often hear people talking about parallel computing and distributed computing, but I'm under the impression that there is no clear boundary between the two, and that people tend to confuse them quite easily, while I believe they are very different:

  • Parallel computing is more tightly coupled to multi-threading, or how to make full use of a single CPU.
  • Distributed computing refers to the notion of divide and conquer, executing sub-tasks on different machines and then merging the results.

However, since we entered the Big Data era, the distinction does seem to be blurring, and most systems today use a combination of parallel and distributed computing.

An example I use in my day-to-day job is Hadoop with the Map/Reduce paradigm, a clearly distributed system with workers executing tasks on different machines, but also taking full advantage of each machine with some parallel computing.
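
To make the paradigm concrete, here is a minimal sketch of the Map/Reduce idea in Python (not the Hadoop API; names such as map_chunk and merge_counts are purely illustrative): chunks of input are mapped by parallel worker processes, and the partial results are then merged in a reduce step.

    from multiprocessing import Pool
    from collections import Counter
    from functools import reduce

    def map_chunk(lines):
        """Map step: count words in one chunk of the input."""
        counts = Counter()
        for line in lines:
            counts.update(line.split())
        return counts

    def merge_counts(a, b):
        """Reduce step: merge two partial word counts."""
        a.update(b)
        return a

    if __name__ == "__main__":
        text = ["the quick brown fox", "the lazy dog", "the quick dog"]
        chunks = [text[0::2], text[1::2]]        # split the input into two chunks
        with Pool(processes=2) as pool:          # map runs on parallel worker processes
            partials = pool.map(map_chunk, chunks)
        total = reduce(merge_counts, partials, Counter())
        print(total.most_common(3))

In Hadoop the chunks live on different machines and the intermediate results travel over the network, but the map-then-merge structure is the same.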

I would like some advice on how exactly to make the distinction in today's world, and on whether we can still talk about parallel computing in its own right or whether there is no longer a clear distinction. To me it seems distributed computing has grown a lot over the past few years, while parallel computing seems to stagnate, which could probably explain why I hear much more talk about distributing computations than about parallelizing them.

Asked By : Charles Menguy
Best Answer from Stack Exchange

Question Source : http://cs.stackexchange.com/questions/1580

Answered By : Gilles

This is partly a matter of terminology, and as such, only requires that you and the person you're talking to clarify it beforehand. However, there are different topics that are more strongly associated with parallelism, concurrency, or distributed systems.

Parallelism is generally concerned with accomplishing a particular computation as fast as possible, exploiting multiple processors. The scale of the processors may range from multiple arithmetical units inside a single processor, to multiple processors sharing memory, to distributing the computation on many computers. On the side of models of computation, parallelism is generally about using multiple simultaneous threads of computation internally, in order to compute a final result. Parallelism is also sometimes used for real-time reactive systems, which contain many processors that share a single master clock; such systems are fully deterministic.
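
For illustration, a minimal Python sketch of this point (the decomposition and worker count are arbitrary choices, not part of any particular system): one computation, a sum of squares, is split across several worker processes purely so that the result arrives sooner.

    from multiprocessing import Pool

    def partial_sum(bounds):
        """Sum of squares over one sub-range of the problem."""
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n, workers = 10_000_000, 4
        step = n // workers
        ranges = [(k * step, n if k == workers - 1 else (k + 1) * step)
                  for k in range(workers)]
        with Pool(workers) as pool:      # each sub-range can run on its own core
            result = sum(pool.map(partial_sum, ranges))
        print(result)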

Concurrency is the study of computations with multiple threads of computation. Concurrency tends to come from the architecture of the software rather than from the architecture of the hardware. Software may be written to use concurrency in order to exploit hardware parallelism, but often the need is inherent in the software's behavior, to react to different asynchronous events (e.g. a computation thread that works independently of a user interface thread, or a program that reacts to hardware interrupts by switching to an interrupt handler thread).
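
A small sketch of that distinction (illustrative names only; note that CPython threads do not even execute in parallel, which underlines that concurrency here is about program structure rather than speed): a background thread performs a long computation while the main thread keeps reacting.

    import queue
    import threading
    import time

    results = queue.Queue()

    def long_computation():
        """Work that proceeds independently of the 'interface' thread."""
        results.put(sum(range(5_000_000)))

    worker = threading.Thread(target=long_computation)
    worker.start()

    # The main thread stays responsive to (simulated) events while the worker runs.
    while worker.is_alive():
        print("still handling events...")
        time.sleep(0.05)

    print("computation finished:", results.get())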

Distributed computing studies separate processors connected by communication links. Whereas parallel processing models often (but not always) assume shared memory, distributed systems rely fundamentally on message passing. Distributed systems are inherently concurrent. Like concurrency, distribution is often part of the goal, not solely part of the solution: if resources are in geographically distinct locations, the system is inherently distributed. Systems in which partial failures (of processor nodes or of communication links) are possible fall under this domain.
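
As a rough sketch of the message-passing model (run here as two processes on one machine for convenience; in a genuinely distributed system the pipe would be a network link that can fail): the processes share no memory and cooperate only by exchanging explicit messages.

    from multiprocessing import Pipe, Process

    def worker(conn):
        """A separate process: receives a task message, sends back a result message."""
        task = conn.recv()
        conn.send(sum(task))
        conn.close()

    if __name__ == "__main__":
        coordinator_end, worker_end = Pipe()
        p = Process(target=worker, args=(worker_end,))
        p.start()
        coordinator_end.send(list(range(100)))     # message out: the task
        print("result:", coordinator_end.recv())   # message in: the answer
        p.join()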
