| HyperStreaming Architecture |
| Multiple Streams with Pipelining and Concurrent Execution (I) |
Server applications consist of multiple threads or processes
that can be executed in parallel. Online transaction
processing, Web services, and other server applications
have an abundance of software threads that can be
executed simultaneously for better performance.
Desktop applications are also becoming increasingly
parallel. Thus, today's applications tend to be written
with multiple threads. These threads may come from
the same application, from different applications
running simultaneously, from operating system services,
or from operating system threads doing background
maintenance.
From the software perspective, operating systems
and user programs can schedule processes or threads
onto the available processing units. From the CPU
microarchitecture's point of view, current CPUs implement
many advanced techniques to increase parallelism,
such as super-pipelining, branch prediction, superscalar
execution, out-of-order execution, and Hyper-Threading.
Making the full path from the CPU to the devices
parallel has therefore become increasingly important.
The HyperStreaming architecture provides parallelism
inside the North-Bridge, on the link between the
North-Bridge and the South-Bridge, and inside the
embedded device controllers. This lets applications
run smoothly and in parallel, with the aim of removing
bottlenecks along that path.
Pipelining is an implementation
technique whereby multiple commands are overlapped
as they are transferred between the CPU and devices.
A pipeline is like an assembly line. In an automobile
assembly line, there are many steps, each
contributing something to the construction of
the car. Each step operates in parallel with the
other steps, though on a different car. In a chipset
pipeline, each step completes a part of a command.
As in the assembly line, different steps complete
different parts of different commands in parallel.
Each of these steps is called a pipe stage or a
pipe segment. The stages are connected one to the
next to form a pipe, and commands move through it
as cars move along an assembly line. Pipelining
thus exploits parallelism among the commands in
multiple streams.
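The assembly-line behavior can be illustrated with a small timing model. The sketch below is not taken from the HyperStreaming design; it assumes an idealized pipeline in which every stage takes exactly one cycle and never stalls, and the function names are hypothetical. Under those assumptions, n commands through k stages take k + n - 1 cycles instead of n * k.

```python
def pipeline_schedule(num_commands, num_stages):
    """Map (command, stage) to the cycle in which that stage works on that command.

    Assumption: every stage takes one cycle and the pipeline never stalls,
    so command c enters stage s in cycle c + s.
    """
    return {(c, s): c + s
            for c in range(num_commands)
            for s in range(num_stages)}


def total_cycles_pipelined(num_commands, num_stages):
    """Cycles until the last command leaves the last stage."""
    schedule = pipeline_schedule(num_commands, num_stages)
    return max(schedule.values()) + 1


def total_cycles_sequential(num_commands, num_stages):
    """Cycles when each command must finish all stages before the next starts."""
    return num_commands * num_stages


if __name__ == "__main__":
    commands, stages = 4, 3
    print(total_cycles_sequential(commands, stages))  # 12
    print(total_cycles_pipelined(commands, stages))   # 6
```

In any given cycle each stage holds a different command, which is the "different car at each step" property of the assembly line.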
To implement pipelining in the HyperStreaming
architecture, the split transaction technique must
be applied first. A read transaction is broken
into a request phase that contains the address
and a response phase that contains the data.
Splitting the transaction makes the buses and
channels available for other transactions while
the device reads the words from the requested
address. This is accomplished by relinquishing
control of the bus during the waiting time between
the request and the response. |
| In
our system, command transfers use split transactions.
A transaction that requires a response is split
into two independent sub-transactions,
a request transaction and a response transaction,
as shown in Fig. 3. Other transactions are allowed
to intervene between them so that chipset resources
can be used while the response to the original request
is being generated. Buffering is used inside the
chipset to allow multiple transactions to be outstanding
while waiting for responses from the controllers.
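The effect of releasing the bus between request and response can be sketched with a toy bus model. This is an illustration only, not the chipset's actual arbitration: the cycle counts, the single shared bus, and the policy of sending a ready response before issuing the next request are all assumptions, and the function names are hypothetical. The heap plays the role of the buffer that holds outstanding transactions.

```python
import heapq


def unsplit_bus_finish_time(transactions):
    """Bus is held from the request until the data returns, so reads serialize.

    transactions: list of (request_cycles, device_wait_cycles, response_cycles).
    """
    return sum(req + wait + resp for req, wait, resp in transactions)


def split_bus_finish_time(transactions):
    """Bus is released while the device works; other requests use the gap."""
    time = 0
    next_request = 0
    outstanding = []  # buffered transactions as (time_response_ready, index)
    completed = 0
    while completed < len(transactions):
        if outstanding and outstanding[0][0] <= time:
            # A response is ready: give it the bus (assumed priority).
            _, index = heapq.heappop(outstanding)
            time += transactions[index][2]
            completed += 1
        elif next_request < len(transactions):
            # Bus is free: issue the next request, then release the bus.
            req, wait, _ = transactions[next_request]
            time += req
            heapq.heappush(outstanding, (time + wait, next_request))
            next_request += 1
        else:
            # Nothing to send yet: idle until the earliest response is ready.
            time = outstanding[0][0]
    return time


if __name__ == "__main__":
    reads = [(1, 5, 1)] * 4  # four reads: 1-cycle request, 5-cycle wait, 1-cycle response
    print(unsplit_bus_finish_time(reads))  # 28
    print(split_bus_finish_time(reads))    # 10
```

With these made-up numbers, the four requests are issued back to back during the first device wait, and the four responses return back to back once the data is available.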
Combining the split
transaction technique with pipelining uses
the chipset more efficiently, so more commands
can share its resources, as shown in Fig. 4. After
the split, the request parts of NP transactions
N through N+3 can be processed one after the other,
and the response parts of NP transactions N through
N+3 can be processed in the same way. Thus,
the total processing time of these four transactions
is greatly reduced. How smoothly the pipeline flows
depends greatly on how many commands can be issued
at the same time, which in turn depends on flow
control and relaxed ordering. HyperStreaming
maintains flow control and ordering efficiently,
so it can exploit the parallelism of both the
split transaction and pipelining techniques. |
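The dependence of pipeline flow on the number of commands that can be in flight can be made concrete with a rough model. The sketch assumes a credit-style limit (`window`, a hypothetical parameter) on outstanding commands, a one-cycle issue slot, and the same latency for every command; none of these values come from the HyperStreaming design.

```python
def finish_time(num_commands, latency, window):
    """Time at which the last of num_commands completes.

    Assumptions: a command takes one cycle to issue and completes `latency`
    cycles after it is issued; at most `window` commands may be outstanding.
    """
    completions = []  # completion time of each command, in issue order
    time = 0
    for index in range(num_commands):
        if index >= window:
            # No credit left: wait until the command issued `window` slots
            # earlier has completed and freed its slot.
            time = max(time, completions[index - window])
        completions.append(time + latency)
        time += 1  # the issue slot itself takes one cycle
    return max(completions)


if __name__ == "__main__":
    for window in (1, 2, 4):
        print(window, finish_time(num_commands=4, latency=6, window=window))
    # prints:
    # 1 24
    # 2 13
    # 4 9
```

A window of one reproduces fully serialized behavior, while a window that covers all four transactions gives the fully pipelined case.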
Commands
issued from different devices offer massive parallelism
because the streams they belong to are independent.
Traditional PCs treat these streams as dependent on
each other, with each one waiting for its predecessor
to be completed. This implicit parallelism is therefore
not fully exploited, and accesses from different devices
have to be forwarded and executed in order. In the
HyperStreaming architecture, however, they are viewed
as independent streams and can be forwarded and
executed in parallel.
As shown in Fig. 5, NP transactions N and N+1 can
be processed concurrently if there is no dependence
between them. PT transactions N and N+1 can also
be processed concurrently. As discussed above, NP
transactions N and N+1 can be pipelined with NP
transactions N+2 and N+3 respectively, because N+2
and N+3 depend on their previous commands. Combined with
the split transaction and pipelining techniques
described earlier, concurrent execution in the
HyperStreaming architecture exploits parallelism from
all of these sources. |
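The difference between the two views of the streams can be sketched as follows. Fig. 5 is not reproduced here, so the grouping of N and N+2 into one stream and N+1 and N+3 into another is an assumed reading of the text; the latency, the issue gap, and the function names are likewise hypothetical.

```python
def traditional_finish(streams, latency):
    """Every command waits for its predecessor to complete, whatever its stream."""
    total_commands = sum(len(commands) for commands in streams.values())
    return total_commands * latency


def concurrent_schedule(streams, latency, issue_gap=1):
    """Return {command: (start, end)} when streams run independently.

    Assumptions: streams do not compete for resources; inside a stream,
    commands keep their order but are pipelined `issue_gap` cycles apart.
    """
    schedule = {}
    for commands in streams.values():
        for position, name in enumerate(commands):
            start = position * issue_gap
            schedule[name] = (start, start + latency)
    return schedule


if __name__ == "__main__":
    streams = {
        "stream_a": ["N", "N+2"],    # N+2 follows N in the same stream
        "stream_b": ["N+1", "N+3"],  # N+3 follows N+1 in the same stream
    }
    schedule = concurrent_schedule(streams, latency=6)
    print(traditional_finish(streams, latency=6))     # 24
    print(max(end for _, end in schedule.values()))   # 7
```

In this model N and N+1 start in the same cycle because they belong to different streams, while N+2 and N+3 start one cycle behind their predecessors.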