TPL Dataflow is an in-process actor library on top of the Task Parallel Library enabling more robust concurrent programming.
Parallel computing is a form of computation in which multiple operations are carried out simultaneously.Parallel computing is closely related to asynchronous programming, using many of the same core concepts and support. Asynchronous programming is an approach to writing code that involves invoking operations such that they don’t block the current thread of execution.Many personal computers and workstations have two or four or 8 cores (that is, CPUs) that enable multiple threads to be executed simultaneously. Computers in the near future are expected to have significantly more cores. To take advantage of the hardware of today and tomorrow, you can parallelize your code to distribute work across multiple processors. In the past, parallelization required low-level manipulation of threads and locks.
The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on elements in a source collection or array. In data parallel operations, the source collection is partitioned so that multiple threads can operate on different segments concurrently.
The Task Parallel Library (TPL) is based on the concept of a task, which represents an asynchronous operation. In some ways, a task resembles a thread or ThreadPool work item, but at a higher level of abstraction. The term task parallelism refers to one or more independent tasks running concurrently. Tasks provide two primary benefits:More efficient and more scalable use of system resources & More programmatic control than is possible with a thread or work item.
The concurrency models we has discussed so far have the notion of shared state (data) in common.Shared state can be accessed by multiple threads at the same time and must be thus protected, either by locking or by using transactions. Both, mutability and sharing of state are not just inherent for these models, they are also inherent for the complexities.Unfortunately, programmers have found it very difficult to reliably build robust multi-threaded applications using the shared data and locks model, especially as applications grow in size and complexity.Making things worse, testing is not reliable with multi-threaded code. Since threads are non-deterministic, you might successfully test a program one thousand times, yet still the program could go wrong the first time it runs on a customer’s machine.
We now have a look at an entirely different approach that bans the notion of shared state altogether. State is still mutable, however it is exclusively coupled to single entities that are allowed to alter it, so-called actors.The actor model in computer science is a mathematical model of concurrent computation that treats “actors” as the universal primitives of concurrent digital computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.For communication, the actor model uses asynchronous message passing. In particular, it does not use any intermediate entities such as channels. Instead, each actor possesses a mailbox and can be addressed. These addresses are not to be confused with identities, and each actor can have no, one or multiple addresses. When an actor sends a message, it must know the address of the recipient. In addition, actors are allowed to send messages to themselves, which they will receive and handle later in a future step.
The Task Parallel Library (TPL) provides dataflow components to help increase the robustness of concurrency-enabled applications. These dataflow components are collectively referred to as the TPL Dataflow Library. This dataflow model promotes actor-based programming by providing in-process message passing for coarse-grained dataflow and pipelining tasks.The TPL Dataflow Library provides a foundation for message passing and parallelizing CPU-intensive and I/O-intensive applications that have high throughput and low latency. It also gives you explicit control over how data is buffered and moves around the system.
If you want to scale your application beyond single machine or process then ServiceBus (NServiceBus, Microsoft Azure, etc) are the best candidates. These are designed around message oriented architecture and you can achieve highly reliable, available and scalable application. As an architect i always focus on reliability, after all, a highly available and scalable service that produces unreliable results isn’t very valuable. – We will need another post to cover the in-depth of service bus..
TPL Dataflow (TDF) is a library for building concurrent applications. It promotes actor/agent-oriented designs through primitives for in-process message passing, dataflow, and pipelining. I have been playing with dataflow since its CTP was released and i found its very use in cases where you have to process data in form of a pipeline.With just few in-build blocks you can easily and quickly build concurrent app..
The primitive blocks provided by dataflow are
- Buffering Blocks – Holds data for use by data consumers.
- BufferBock(T) – FIFO queue of message that can be written to multiple sources or read from by multiple targets.
- BroadcastBlock(T) – Used when you must pass multiple messages to another component.
- WriteOnceBlock(T) – similar to broadcastblock except object can be written to one time only
- Execution Block – call a user provided delegate for each piece of received data
- ActionBlock(t) – calls a delegate when it receives a data – excepts synchronous or asynchronous delegates
- TransformBlock(Tinout,TOutput) – call function delegates to transform the incoming message to another type- excepts synchronous or asynchronous delegates
- TransformManyBlock(TInput , TOutput) – similar to TransformBlock except it can produce zero or more output values for each input value, instead of only one output value for each input value. – excepts synchronous or asynchronous delegates
Degree of Parallelism
Every ActionBlock<TInput>, TransformBlock<TInput, TOutput>, and TransformManyBlock<TInput, TOutput> object buffers input messages until the block is ready to process them. By default, these classes process messages in the order in which they are received, one message at a time. You can also specify the degree of parallelism to enable ActionBlock<TInput>, TransformBlock<TInput, TOutput> and TransformManyBlock<TInput, TOutput> objects to process multiple messages concurrently.
Now lets look at the implementation details of a web crawler
/ Save ParseLink
The messages to download a Url is received in the request buffer which is downloaded by a TranformBlock and converted into type safe page type message. The page message is then broadcasted using Broadcast block, which is further received by save ActionBlock and ParseLink Block. The parse link block parses the urls in the page and if they belong to same page it will raise an event for each url. The consumer of engine will receive the url and if its a new URL it will be posted back to engine.. The save action block will save the page to disk.
This is very basic example but you can see the message based approach is much more simpler than a shared resource + threading approach.
The TPL dataflow is good in case you don’t want to scale the solution beyond single machine as it offer a in process message base approach.With a proper service bus like NServiceBus you can scale out the solution beyond single machine and multiple servers could process the request to achieve the high throughput and off-course with easy to build,maintain clean code base.
In production there are more things you have to take care like logging, error handling, transactions, unexpected failure recovery, dependency injection,loosely coupled components, extensibility, scalability and so on.. Frameworks like NServiceBus provides all these features alone with API to handle more complex business problems.
Download Code : Here (Partially finished but working POC)