Hi Yossi,
in a way you seem to be contradicting yourself.
You say that there is no rule forcing your bus to behave
deterministically. Fair enough. But then please do accept
the non-deterministic behavior of your simulation
(i.e. its dependence on the random execution order of
simulation processes).
And of course a delay of a picosecond or a cycle
or whatever can change the timing of your simulation
by 100%, especially when you use non-deterministic
models.
You say:
> T answers as fast as it can, it does not have to consider the fact that
> allowing the read response to overtake the outstanding write response
> will just double the time (this is an artifact of the rules and not the
> application).
I agree that it is a result of the protocol in use (the BP). The
response channels are shared between reads and writes in the BP.
Allowing a response to overtake another means accepting that the
overtaken response now has to wait. A user of the BP is (or should be)
aware of that. If that is not acceptable for your case (e.g. if you try
to model independent read and write response channels as in AXI), then
the BP is not suitable. You will have to define another protocol, or
use separate sockets for reads and writes (which then use the BP) and
make the routing within your bus command-dependent.
You claim the rules are broken, but they are not. They just do not fit
your expectation of fully independent read and write channels within a
single socket.
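As an illustration of the separate-sockets idea, command-dependent routing could be sketched like this (plain C++, not the actual TLM-2.0 API; `Command`, `route`, and the socket names are hypothetical stand-ins for tlm_command and two initiator sockets):

```cpp
#include <string>

// Illustrative stand-in for tlm::tlm_command.
enum Command { READ, WRITE };

// The bus picks the outgoing socket from the command in the payload,
// so read and write responses never compete for the same channel.
std::string route(Command cmd) {
    return cmd == READ ? "read_socket" : "write_socket";
}
```

Each of the two sockets then runs the plain BP on its own, and the independence Yossi wants lives in the routing, not in the protocol rules.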
You also say
> Writing applications
> with AT is hard enough without considering all the time the artificial
> effects of the TLM2.0 rules.
and I disagree. It is entirely natural that writing an AT-BP
application means considering all the rules and all the effects of the
protocol in use, all the time.
And frankly, I do not understand why you call those effects artificial.
They result
from the fact that TLM-2.0 allows temporal decoupling on a discrete
event simulator, such that execution order and timing order (can)
diverge. The question then is which order the rules are based on. For
TLM-2.0 it is execution order, and it has to be execution order,
because time order is unspecified in the case of pseudo-simultaneous
activities. It is then only natural that a change of execution order
has significant effects.
And since waiting can affect execution order, changes in timing (i.e.
waiting) can significantly affect the overall timing as well. Using
the BP means accepting that.
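To make the divergence of execution order and time order concrete, here is a minimal plain-C++ sketch (not real TLM-2.0 code; the names, the 100 ns base time, and the 10 ns annotation are illustrative assumptions):

```cpp
#include <string>
#include <vector>

// Each record is one nb_transport-style call: the order in which it was
// executed, and the simulation time at which its payload takes effect
// (current time plus the timing annotation passed with the call).
struct Call {
    std::string name;
    int exec_order;
    double effective_ns;
};

// A temporally decoupled initiator at simulation time 100 ns issues two
// calls back to back: the first annotated +10 ns, the second +0 ns.
std::vector<Call> issue_calls() {
    const double now = 100.0;  // current simulation time in ns
    return {
        {"A", 0, now + 10.0},  // executed first, effective at 110 ns
        {"B", 1, now + 0.0},   // executed second, effective at 100 ns
    };
}
```

Call A happens before B in execution order, but B precedes A in time order; the BP rules judge legality by the former.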
As Jakob said: you cannot expect the simulation to behave identically
when meddling with changes of execution and/or time order.
Let's take the classical deadlock in discrete event simulation (DES):

void proc1()
{
    event.notify();
}

void proc2()
{
    wait(event);
    do_stuff();
}
It now depends on whether the simulator executes proc1 or proc2 first.
If it takes proc2 first, you're fine. If it takes proc1 first, you
deadlock. If you want to avoid that, you need to make your model behave
correctly no matter which process executes first. You have to accept
that. It's a DES.
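The order-independence point can be demonstrated with a small plain-C++ toy (not the SystemC kernel; `ToyEvent`, `run_naive`, and `run_fixed` are illustrative stand-ins for sc_event and the two processes): the naive version deadlocks when proc1 runs first, while latching the notification in a flag works in either order.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy immediate-notification event: notify() wakes processes that are
// already waiting; a notification that arrives while nobody is waiting
// is simply lost, as in the deadlock example above.
struct ToyEvent {
    std::vector<std::function<void()>> waiters;
    void wait(std::function<void()> resume) { waiters.push_back(std::move(resume)); }
    void notify() {
        std::vector<std::function<void()>> woken;
        woken.swap(waiters);
        for (auto& f : woken) f();
    }
};

// The example as written: returns true iff do_stuff() ran.
bool run_naive(bool proc1_first) {
    ToyEvent event;
    bool stuff_done = false;
    auto proc1 = [&] { event.notify(); };
    auto proc2 = [&] { event.wait([&] { stuff_done = true; }); };
    if (proc1_first) { proc1(); proc2(); } else { proc2(); proc1(); }
    return stuff_done;
}

// Order-robust variant: latch the notification in a flag and check the
// flag before waiting, so both execution orders behave identically.
bool run_fixed(bool proc1_first) {
    ToyEvent event;
    bool notified = false, stuff_done = false;
    auto proc1 = [&] { notified = true; event.notify(); };
    auto proc2 = [&] {
        if (notified) stuff_done = true;             // no need to wait
        else event.wait([&] { stuff_done = true; });
    };
    if (proc1_first) { proc1(); proc2(); } else { proc2(); proc1(); }
    return stuff_done;
}
```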
And the same applies to TLM-2.0. Your model has to be written in a way
such that it behaves as you want no matter what happens first.
Summary: I still don't think the BP is broken.
best regards
Robert
Veller, Yossi wrote:
> Hi Robert,
>
> Thanks for your correction to the example. However we both agree that
> the mistake does not change much.
>
> In TLM2.0 we are not on solid ground because AFAIK there are no claims
> about the abilities of the TLM2.0 rules.
> However I assumed that an implicit claim specifies that, unless it is a
> part of the application requirements, the rules will not cause an
> arbitrary change in the order of the events to change the timing by 100%
> (can be more for examples with more initiators). It also should be true
> that a delay of a cycle or a picosecond should not change the timing by
> 100% due to the rules.
>
> My target and bus try to finish the transfers as fast as possible and
> still the timing results vary because of the rules and events order.
> T answers as fast as it can, it does not have to consider the fact that
> allowing the read response to overtake the outstanding write response
> will just double the time (this is an artifact of the rules and not the
> application).
> There is no rule that forces a bus to work deterministically, and if
> there was one (e.g. by waiting a cycle as you proposed and choosing
> always the same request), I could delay one BEGIN_REQ by a cycle, and
> show that just a cycle delay causes the same misbehavior.
>
> What you presented as the purpose of the simulation is just, in this
> case, concentrating on how to achieve some workaround to the TLM2.0
> rules and not to achieve the application results. Writing applications
> with AT is hard enough without considering all the time the artificial
> effects of the TLM2.0 rules.
>
> Unless you want to construe rules that will make such scenarios as I
> presented impossible (which I suppose that you can't or that they will
> be unacceptable), I see the current base protocol as broken.
>
> Regards
> Yossi
>
>
>> -----Original Message-----
>> From: tlmwg@lists.systemc.org [mailto:tlmwg@lists.systemc.org] On Behalf Of Robert Guenzel
>> Sent: Wednesday, January 05, 2011 10:54 AM
>> To: tlmwg@lists.systemc.org
>> Cc: systemc-p1666-technical@eda.org
>> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
>>
>> My comments are in-line below
>>
>> Veller, Yossi wrote:
>>
>>> I tried to have the shortest example and maybe it was a mistake. So
>>> let me try again:
>>>
>>> Let us look at two initiators I1 and I2 connected to a bus B and target T.
>>>
>>> I1 writes to T a burst whose data transfer should take 310 NS and the
>>> time END_REQ->BEGIN_RESP should take 10 NS.
>>>
>>> I2 reads from T a burst whose data transfer should take 311 NS and the
>>> time END_REQ->BEGIN_RESP should take 9 NS.
>>>
>>> Both start at the same time (t1).
>>>
>>> Let us denote the generic payload (GP) from I1 GP1 and the GP from
>>>
> I2 GP2
>
>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
>>>
>>> B passes the GP1(BEGIN_REQ) to T
>>>
>>> T computes that the written data takes 310 NS (because of rule 16.2.6
>>> b) and waits.
>>>
>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of the
>>> BEGIN_REQ rule 16.2.6 e).
>>>
>>> t= t1+310 NS T sends GP1(END_REQ) and B passes it to I1 then B takes
>>> GP2(BEGIN_REQ) from the PEQ and calls T.
>>>
>>> T returns TLM_UPDATED and changes the phase to END_REQ and B sends
>>> GP2(END_REQ) to I2.
>>>
>>> t= t1+319 NS T sends GP2(BEGIN_RESP) and B passes it to I2.
>>>
>>> I2 computes that the read data takes 311 NS (because of rule 16.2.6 c)
>>> and waits.
>>>
>>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B pushes it into the PEQ
>>> (because of the BEGIN_RESP rule 16.2.6 f).
>>>
>>>
>> No. T does not send BEGIN_RESP here. It can't. Due to the END_RESP rule.
>>
>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and the read
>>> finishes)
>>>
>>> B sends the GP1(BEG_RESP) to I1 which replies with TLM_COMPLETED
>>>
>>> B sends the GP1(END_RESP) to T (and the write finishes)
>>>
>>>
>> After getting the END_RESP with GP2, T will now send BEGIN_RESP which
>> will finish with TLM_COMPLETED.
>> That doesn't change much, but the example should not violate the rules
>> it questions.
>>
>>> Here I arranged the time so that your points about the behavior of T
>>> do not hold (It now has only one alternative). We've got to the same
>>> result. The added delays were forced by the TLM rules and not by any
>>> of the models.
>>>
>>>
>> And it seems perfectly correct to me. Reads and writes share the same
>> request and response channels.
>> T allows the read response to overtake the outstanding write response.
>> By that it accepts that now the write can
>> only finish when the read response has finished. If that is not desired,
>> T will have to schedule the read response
>> behind the write response. That will increase the END_REQ->BEGIN_RESP
>> delay from the desired 9 ns to (at least) 10 ns
>> (plus a potential BEGIN_RESP->END_RESP delay in the write),
>> but that is the whole purpose of the simulation, right? If everything
>> was independent from everything else, well then you could simply
>> calculate how long things will take and you would not need
>> to simulate anything.
>>
>>> However if the simulator chooses to schedule I2 before I1 the whole
>>> operation will take only 320 NS.
>>>
>>>
>> And in my opinion that is a problem of B. Two things happen at the same
>> time in a simulation.
>> Apparently B is using some kind of combinatorial arbitration because it
>> directly forwards
>> an incoming BEGIN_REQ to T. When doing that it should employ a mechanism
>> that allows it
>> to make deterministic choices. It could use some kind of delta cycle
>> waiting or whatever to make
>> sure that the BEGIN_REQs from I1 and I2 both have arrived and it can
>> then forward the correct one.
>> In this case the simulation order of I1 and I2 will not affect the
>> resulting communication order anymore.
>>
>> If such a mechanism is not desired at AT, because it affects simulation
>> speed, then you must accept that
>> your AT model is execution order dependent.
>>
>>> So do you agree that the rules are broken?
>>>
>>>
>> No (not yet). And I try to explain my view below.
>>
>>> Besides this: I don't agree with your points about the broken target,
>>> there is no rule that forces T to respond with TLM_UPDATED and not
>>> TLM_ACCEPTED even though the first is more efficient. The TLM rules
>>> should enable deterministic timing regardless of such choices of the
>>> target (or initiator or bus) behavior as long as their communication
>>> complies with the rules and they try to achieve their separate timing.
>>
>> And I disagree. In my opinion the choices should have an effect on
>> timing.
>> If I (a target) return TLM_ACCEPTED to BEGIN_REQ I disallow a subsequent
>> BEGIN_REQ at the same time,
>> if I return TLM_UPDATED with END_REQ I allow it (even if I increase the
>> timing annotation).
>> If I return TLM_UPDATED with BEGIN_RESP I also allow a subsequent
>> BEGIN_REQ, but
>> I also make clear that the response cannot be overtaken by any other
>> response.
>> If I return TLM_COMPLETED I disallow any other delays. The initiator has
>> no chance of adding some BEGIN_RESP->END_RESP delay.
>>
>> Similar things apply to the choice of whether I skip END_REQ or not.
>>
>> It is obvious that those choices control what a connected bus or
>> initiator can do to me.
>> That is the whole purpose of flow control. If the choices did not
>> affect timing,
>> then the question would be why there are choices at all.
>>
>> To explain that a little more:
>> Let's say we have
>> nb_trans_fw(gp1, BEGIN_REQ, t)
>> and I want to send an END_REQ in 10 ns.
>>
>> now I can do
>> option 1:
>> return TLM_ACCEPTED; wait(10 ns); send nb_trans_bw(gp1, END_REQ, 0s)
>> option 2:
>> t+=10ns;
>> ph=END_REQ;
>> return TLM_UPDATED;
>>
>>
>> With option 1 I can be sure NOT to get a BEGIN_REQ within the next 10 ns
>> because I did not provide an END_REQ.
>> With option 2 I can get a BEGIN_REQ in the next 10 ns, because I already
>> gave the END_REQ
>> (although it becomes "effective" only 10 ns in the future).
>>
>> So depending on that choice I can influence the timing of subsequent
>> incoming BEGIN_REQs.
>> The reason is that the rules are based on call order alone and not on
>> their timing.
>> IF you wanted to enforce that timings are the same for subsequent
>> BEGIN_REQs in both cases (TLM_ACCEPTED or TLM_UPDATED)
>> you would either have to say that the rules are based on timing, or that
>> you disallow timing annotations at AT.
>> IF you base the rules on timing you enforce PEQs everywhere, which is
>> (from a sim speed PoV) equivalent to disallowing timing annotations.
>>
>> regards
>> Robert
>>
>>> Regards
>>>
>>> Yossi
>>>
>>>
>>>> -----Original Message-----
>>>>
>>>> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
>>>>
>>>> Sent: Tuesday, January 04, 2011 3:07 PM
>>>>
>>>> To: Veller, Yossi
>>>>
>>>> Cc: systemc-p1666-technical@eda.org; tlmwg@lists.systemc.org
>>>>
>>>> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
>>>>
>>>> IMHO the scenario you describe does not show that the rules are broken.
>>>> It's the target that might be broken.
>>>>
>>>> Or it's me who didn't fully understand the problem.
>>>>
>>>> I'll try to explain my view:
>>>>
>>>> T sends GP1(END_REQ) at t1+320ns. By doing that it indicates
>>>>
>>>> that it wants to insert a delay between END_REQ and BEGIN_RESP
>>>>
>>>> (otherwise it would directly do GP1(BEGIN_RESP) )
>>>>
>>>> The target knows what it did. Now it gets GP2(BEGIN_REQ).
>>>>
>>>> If the target now returns UPDATED(BEGIN_RESP) it _knows_
>>>>
>>>> that it now moves the RESP of GP1 _behind_ the RESP of GP2
>>>>
>>>> (because it now assigns the RESP channel to GP2).
>>>>
>>>> If it does not want to do that it can either return TLM_ACCEPTED
>>>>
>>>> and finish GP1 first (by sending BEGIN_RESP), or return
>>>> TLM_UPDATED(END_REQ)
>>>>
>>>>
>>>> You say:
>>>>
>>>>> The TLM2.0 base protocol rules created a scenario where two
>>>>> concurrent reads and writes of 320 NS both took 640 NS (this should be
>>>>> impossible).
>>>>>
>>>> There is no BP rule that forces T to do what you describe.
>>>>
>>>> And I do not think that it should be impossible. If the target wants
>>>> to establish such a link between GP1 and GP2 it should be allowed
>>>> to do so. But it can decide to avoid/reduce the link like that:
>>>>
>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
>>>>
>>>> B passes the GP1(BEGIN_REQ) to T
>>>>
>>>> T computes that the written data takes 320 NS (because of rule 16.2.6 b)
>>>> and waits.
>>>>
>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of the
>>>>
>>>> BEGIN_REQ rule 16.2.6 e).
>>>>
>>>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B passes it to I1
>>>>
>>>> I1 returns TLM_COMPLETED (seen both by B and T). GP1 is DONE.
>>>>
>>>> Then B takes GP2(BEGIN_REQ) from the PEQ and calls T.
>>>>
>>>> T returns TLM_UPDATED and changes the phase to BEG_RESP and B sends
>>>>
>>>> GP2(BEG_RESP) to I2.
>>>>
>>>> I2 computes that the read data takes 320 NS (because of rule 16.2.6 c)
>>>> and waits.
>>>>
>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T. GP2 is DONE.
>>>>
>>>> Now at the target side both GP1 and GP2 last 320 ns. For I2 GP2 lasts
>>>> 640 ns but that is due to
>>>> the fact that GP1 and GP2 are scheduled by B.
>>>>
>>>> I agree that there is the problem of preemption and interleaving,
>>>>
>>>> but the question is if that is something that AT abstracts away or not?
>>>>
>>>> I must confess that I am not sure about the answer to that question.
>>>>
>>>> Due to the BEGIN_REQ rule preemption is not possible with another
>>>>
>>>> BEGIN_REQ.
>>>>
>>>> But how about ignorable phases TRY_PREEMPT?
>>>>
>>>> After GP1(BEGIN_REQ) an initiator/interconnect can try a preemption by
>>>> sending GP1(TRY_PREEMPT), and if it gets back TLM_COMPLETED
>>>> the preemption was successful; if it gets back TLM_ACCEPTED the target
>>>> is unable to preempt GP1. Should be BP compatible.
>>>>
>>>> For interleaving I'd say that is something that is abstracted away at
>>>> the AT level.
>>>>
>>>> best regards
>>>>
>>>> Robert
>>>>
>>>> Veller, Yossi wrote:
>>>>
>>>>> During the Review of the Draft LRM I revisited the TLM2.0 as the new
>>>>> addition to the standard and found out the following:
>>>>>
>>>>> Let us look at two initiators I1 and I2 connected to a bus B and target T.
>>>>>
>>>>> I1 writes to T a burst whose data transfer should take 320 NS and
>>>>> I2 reads from T a burst whose data transfer should take 320 NS at the
>>>>> same time (t1).
>>>>>
>>>>> Let us denote the generic payload (GP) from I1 GP1 and the GP from I2 GP2
>>>>>
>>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
>>>>>
>>>>> B passes the GP1(BEGIN_REQ) to T
>>>>>
>>>>> T computes that the written data takes 320 NS (because of rule 16.2.6
>>>>> b) and waits.
>>>>>
>>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of the
>>>>> BEGIN_REQ rule 16.2.6 e).
>>>>>
>>>>> t= t1+320 NS T sends GP1(END_REQ) and B passes it to I1 then B takes
>>>>> GP2(BEGIN_REQ) from the PEQ and calls T.
>>>>>
>>>>> T returns TLM_UPDATED and changes the phase to BEG_RESP and B sends
>>>>> GP2(BEG_RESP) to I2.
>>>>>
>>>>> I2 computes that the read data takes 320 NS (because of rule 16.2.6 c)
>>>>> and waits.
>>>>>
>>>>> T sends GP1(BEG_RESP) and B pushes it into the PEQ (because of the
>>>>> BEGIN_RESP rule 16.2.6 f).
>>>>>
>>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and the
>>>>> write finishes)
>>>>>
>>>>> B sends the GP1(BEG_RESP) to I1 which replies with TLM_COMPLETED
>>>>>
>>>>> B sends the GP1(END_RESP) to T (and the read finishes)
>>>>>
>>>>> The TLM2.0 base protocol rules created a scenario where two
>>>>> concurrent reads and writes of 320 NS both took 640 NS (this should be
>>>>> impossible).
>>>>>
>>>>> Other scenarios show that, relying on simulation order, both read and
>>>>> write can finish after 320 NS, or that either one can finish after 320
>>>>> NS and the other one after 640 NS. All the above happens because there
>>>>> is an artificial linkage between the request stage of the read and the
>>>>> data phase of the write (they use the same phases) and another
>>>>> artificial linkage between the acknowledge stage of the write and the
>>>>> data phase of the read.
>>>>>
>>>>> IMHO these scenarios show that the rules are broken and have to be
>>>>> fixed.
>>>>>
>>>>> Moreover these rules don't support the following:
>>>>>
>>>>> 1. Preemption, where a higher-priority transaction can abort a
>>>>> lower-priority long burst of data in order to transmit its data and
>>>>> let the aborted transaction continue.
>>>>>
>>>>> 2. Interleaving of the data of a slow burst with data of other bursts.
>>>>>
>>>>> The fix can be the following:
>>>>>
>>>>> 1. Specify that the BEGIN_REQ -> END_REQ stage is the address passing
>>>>> stage.
>>>>>
>>>>> 2. Retain the BEGIN_REQ rule (16.2.6 e) that allows a target to slow
>>>>> down upstream components.
>>>>>
>>>>> 3. Specify that the END_REQ -> BEGIN_RESP stage is the write data
>>>>> passing stage.
>>>>>
>>>>> 4. Remove the BEGIN_RESP rule (16.2.6 f). An initiator should not
>>>>> anyway issue more outstanding requests than it can handle.
>>>>>
>>>>> This fix will also support Preemption and Interleaving.
>>>>>
>>>>> Regards
>>>>>
>>>>> Yossi
>>>>>
>
>
Received on Wed Jan 5 05:04:33 2011