Shabtay’s take 04/28 - Resolved, subject
to review of the reference implementation.
Committee comment: Non-blocking interface has been
modified to resolve this.
Per>
I have enclosed an example implementation of a version of dpi_pipe_c_send() that can handle messages of unlimited size. By unlimited I mean messages that are larger than the current pipe buffer size. For now I have called the function dpi_pipe_c_send_nl(); `nl' stands for `no limit'. It has exactly the same prototype as dpi_pipe_c_send() and can thus be used as a direct replacement. Internally, I chose to use John's dpi_pipe_c_send() to transfer the individual segments of the message.

The code also shows some possible optimizations for when the segment size is an integer number of svBitVecVal words. This happens when the number of elements in the segment multiplied by the number of bytes per element is divisible by 4 (4 bytes per svBitVecVal word). It turns out that if the pipe depth is 4 elements or more, one can always find such a nice segment size. When the segment size has this property, copying and extracting bit fields out of the original svBitVecVal data buffer can be avoided.
The code for dpi_pipe_c_receive_nl() would follow a similar pattern. I can create this if there is any interest.
Note that my code demonstrates that the functionality I was calling for can easily be created at the application layer. Even so, I believe the interface should support this, at least for the blocking API, since it is fairly straightforward and it makes the interface more intuitive. It is also better aligned with the goal of handling variable-length messages.
I can live with the non-blocking API having messages limited to the buffer size, although it would be preferable to have it support segmentation/reassembly of messages as well.
I believe unlimited message size needs to be supported only on the software side. The reason is that on the hardware side, messages or message fragments are read into bit vectors passed to the pipe functions. These vectors are fixed hardware resources that must be allocated; they are typically implemented as flip-flops and are thus expensive, typically much more expensive than pipe buffers for the same number of bits. It is therefore reasonable to assume that most practical uses of the pipes API will use fairly narrow data vectors (compared to the average message size), and it is not unreasonable to require pipe buffers to be larger than the largest data vector used to access a particular pipe. In other words, it is not unreasonable to require that pipe buffers always be large enough to accommodate any call from the hardware side, so segmentation/reassembly is not required on the hardware side.
< Per
JohnS >
I took a look at this model and took a slightly different approach that I think gets around some of the tricky parts you were dealing with.
I've attached the new version (see attached file).
Basically, I put in a small optimization: if bytes_per_element is a multiple of sizeof(svBitVecVal), all elements are word aligned and the transfer can be done slightly more efficiently.
>> Per
The case of pipe depth >= 4 in my code was actually more general than that. When pipe depth is at least 4, the code picks a number of elements to transfer such that the total number of bytes, i.e. nElements * bytesPerElement, is a multiple of 4, i.e. svBitVecVal aligned. Your optimization is a special case of this.
For pipe depth < 4 I briefly considered optimizing for the case where pipeDepth * bytesPerElement is a multiple of 4, but decided I already had too many special cases :-) In any event, I found it reasonable to assume that pipe depth is normally not so small, so there is not much benefit to optimizing for this case.
But all of the above is moot anyway considering your proposal to add the byte offset to the try_send() function. I like this a lot and it is obvious how much it simplifies the dpi_pipe_c_send() function.
BTW, we should work on changing the names of those dpi_pipe functions . . .
<< Per
But if that is not the case, then the code deals with the data transfer at byte granularity using the svPut/GetPartselBit() helper functions.
Furthermore, I modified dpi_pipe_c_send() directly to handle unlimited message sizes. It calls an internal dpi_pipe_c_send_to_fit(), which is a verbatim copy of the original dpi_pipe_c_send() that I had.
Symmetric changes can be made for dpi_pipe_c_receive().
I also agree with your comments about not needing to support unlimited buffer size on the HDL side due to practical considerations.
I verified that this compiles, but I have not yet had a chance to test it; that should be fairly easy.
I was also thinking about another approach: putting byte-level support inside the non-blocking call itself. The idea is that the non-blocking call would be given a byte index into the svBitVecVal array that it gets, and return the number of elements it successfully transferred. This returned value could then be used by the caller to bump the byte index for the next call in the loop, without changing the base svBitVecVal[] data pointer.
This would allow for a more efficient implementation that would potentially reduce the amount of data copying required.
The new prototype for the call would look something like this:
int dpi_pipe_c_try_send(
    void *pipe_handle,        // input: pipe handle
    int byte_offset,          // input: byte offset within data array
    int bytes_per_element,    // input: #bytes/element
    int num_elements,         // input: #elements to be written
    const svBitVecVal *data,  // input: data
    svBit eom );              // input: end-of-message marker flag
Instead of returning success (1) or failure (0), it would return the number of elements transferred (a full pipe would still return 0, as before).
The revised dpi_pipe_c_send() would then look something like this:
void dpi_pipe_c_send(
    void *pipe_handle,        // input: pipe handle
    int bytes_per_element,    // input: #bytes/element
    int num_elements,         // input: #elements to be written
    const svBitVecVal *data,  // input: data
    svBit eom )               // input: end-of-message marker flag
{
    int byte_offset = 0, elements_sent;
    while( num_elements ){
        elements_sent =
            dpi_pipe_c_try_send(
                pipe_handle, byte_offset,
                bytes_per_element, num_elements, data, eom );
        // if( pipe is full ) wait until OK to send more
        if( elements_sent == 0 ){
            sc_event *ok_to_send = (sc_event *)
                dpi_pipe_get_notify_context( pipe_handle );
            // if( notify ok_to_send context has not yet been set up ) ...
            if( ok_to_send == NULL ){
                ok_to_send = new sc_event;
                dpi_pipe_set_notify_callback(
                    pipe_handle, notify_ok_to_send_or_receive, ok_to_send );
            }
            wait( *ok_to_send );
        }
        else {
            byte_offset += elements_sent * bytes_per_element;
            num_elements -= elements_sent;
        }
    }
}
Almost as concise as the original - but now with unlimited size!
This is a little cleaner because now at the level of dpi_pipe_c_send(), you don't even care about querying buffer size. It's all handled within the NB API.
< JohnS