Hi Robert,
I'll just have to go back to my basic claim.
We have agreed (I assume) that the example I presented demonstrated
that two transactions, each of which should have taken 320 NS, BOTH
finish after 640 NS.
On any reasonable bus protocol (or at least any that I know of) this
can't happen.
Hence IMHO the BP is not the vehicle to model any reasonable bus
protocol and should be fixed.
What do you think?
Regards
Yossi
> -----Original Message-----
> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
> Sent: Wednesday, January 05, 2011 3:04 PM
> To: Veller, Yossi
> Cc: tlmwg@lists.systemc.org; systemc-p1666-technical@eda.org
> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Yossi,
>
> In a way you seem to be contradicting yourself.
> You say that there is no rule forcing your bus to behave
> deterministically. Fair enough. But then please do accept
> the non-deterministic behavior of your simulation
> (i.e. dependence on the random execution order of
> simulation processes).
>
> And of course a delay of a picosecond or a cycle
> or whatever can change the timing of your simulation
> by 100%, especially when you use non-deterministic
> models.
>
> You say:
> > T answers as fast as it can, it does not have to consider the fact
> > that allowing the read response to overtake the outstanding write
> > response will just double the time (this is an artifact of the rules
> > and not the application).
> I agree that it is a result of the protocol in use (the BP). The
> response channels are shared between reads and writes in the BP.
> Allowing a response to overtake another means accepting that the
> overtaken response has to wait now. A user of the BP is (or should
> be) aware of that. If that is not acceptable for your case (e.g. if
> you try to model independent read and write response channels like in
> AXI) then the BP is not suitable. You will have to define another
> protocol. Or use separate sockets for reads and writes (which then
> use the BP) and make the routing within your bus command dependent.
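>
> To make that last option concrete, here is a minimal sketch of command
> dependent routing over two BP sockets (module, socket, and callback
> names are just illustrative, nothing here is mandated by the standard):
>
> #include "tlm.h"
> #include "tlm_utils/simple_initiator_socket.h"
>
> struct split_rw_initiator : sc_core::sc_module
> {
>   // one BP socket per direction, so a read response can never
>   // block an outstanding write response (and vice versa)
>   tlm_utils::simple_initiator_socket<split_rw_initiator> rd_socket;
>   tlm_utils::simple_initiator_socket<split_rw_initiator> wr_socket;
>
>   SC_CTOR(split_rw_initiator)
>   : rd_socket("rd_socket"), wr_socket("wr_socket")
>   {
>     rd_socket.register_nb_transport_bw(this, &split_rw_initiator::nb_bw);
>     wr_socket.register_nb_transport_bw(this, &split_rw_initiator::nb_bw);
>   }
>
>   void send_begin_req(tlm::tlm_generic_payload& gp)
>   {
>     tlm::tlm_phase ph = tlm::BEGIN_REQ;
>     sc_core::sc_time t = sc_core::SC_ZERO_TIME;
>     // command dependent routing (return value handling omitted)
>     if (gp.get_command() == tlm::TLM_READ_COMMAND)
>       rd_socket->nb_transport_fw(gp, ph, t);
>     else
>       wr_socket->nb_transport_fw(gp, ph, t);
>   }
>
>   tlm::tlm_sync_enum nb_bw(tlm::tlm_generic_payload&, tlm::tlm_phase&,
>                            sc_core::sc_time&)
>   { return tlm::TLM_ACCEPTED; } // response handling omitted in this sketch
> };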
>
> You claim the rules are broken, but they are not. They just do not fit
> your expectation of
> fully independent read and write channels within a single socket.
>
> Also you say
> > Writing applications
> > with AT is hard enough without considering all the time the
artificial
> > effects of the TLM2.0 rules.
> and I disagree. It is totally natural that writing an AT-BP
> application means considering all the rules and all effects of the
> protocol in use all the time.
> And frankly, I do not understand why you call those effects
> artificial. They result from the fact that TLM-2.0 allows temporal
> decoupling on a discrete event simulator, such that execution order
> and timing order (can) diverge. And then the question is which order
> dictates the rules. For TLM-2.0 it is execution order. And it has to
> be execution order because time order is unspecified in case of
> pseudo-simultaneous activities.
> And then it is pretty natural that a change of execution order has
> significant effects.
> And since waiting can affect execution order, changes in timing (i.e.
> waiting) can significantly affect the overall timing as well. Using
> the BP means accepting that.
>
> As Jakob said: You cannot expect the simulation to behave identically
> when meddling with changes of execution and/or time order.
>
> Let's take the classical deadlock in discrete event simulation (DES):
>
> void proc1()
> {
>     event.notify();
> }
>
> void proc2()
> {
>     wait(event);
>     do_stuff();
> }
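>
> (For reference, the same race as a self-contained SystemC module; the
> module name and do_stuff are mine:)
>
> #include <systemc>
> using namespace sc_core;
>
> SC_MODULE(deadlock_demo)
> {
>   sc_event event;
>
>   SC_CTOR(deadlock_demo)
>   {
>     SC_THREAD(proc1);
>     SC_THREAD(proc2);
>   }
>
>   void proc1()
>   {
>     event.notify();  // immediate notification: lost if nobody waits yet
>   }
>
>   void proc2()
>   {
>     wait(event);     // never returns if proc1 happened to run first
>     do_stuff();
>   }
>
>   void do_stuff() {}
> };
>
> int sc_main(int, char*[])
> {
>   deadlock_demo top("top");
>   sc_start();  // if proc1 runs first, proc2 never reaches do_stuff()
>   return 0;
> }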
>
> it now depends on whether the simulator executes proc1 or proc2 first.
> If it takes proc2 you're fine. If it takes proc1 you deadlock.
> If you wanna avoid that you need to make your model behave correctly
> no matter which proc executes first. You have to accept that. It's a
> DES.
> And the same applies to TLM-2.0. Your model has to be written in a way
> such that it behaves like you want no matter what happens first.
>
> Summary: I still don't think the BP is broken.
>
> best regards
> Robert
>
>
> Veller, Yossi wrote:
> > Hi Robert,
> >
> > Thanks for your correction to the example. However we both agree
> > that the mistake does not change much.
> >
> > In TLM2.0 we are not on solid ground because AFAIK there are no
> > claims about the abilities of the TLM2.0 rules.
> > However I assumed that an implicit claim specifies that, unless it
> > is part of the application requirements, the rules will not cause
> > an arbitrary change in the order of the events to change the timing
> > by 100% (can be more for examples with more initiators). It also
> > should be true that a delay of a cycle or a picosecond should not
> > change the timing by 100% due to the rules.
> >
> > My target and bus try to finish the transfers as fast as possible
> > and still the timing results vary because of the rules and event
> > order.
> > T answers as fast as it can, it does not have to consider the fact
> > that allowing the read response to overtake the outstanding write
> > response will just double the time (this is an artifact of the
> > rules and not the application).
> > There is no rule that forces a bus to work deterministically, and
> > if there was one (e.g. by waiting a cycle as you proposed and
> > choosing always the same request), I could delay one BEGIN_REQ by a
> > cycle and show that just a cycle delay causes the same misbehavior.
> >
> > What you presented as the purpose of the simulation is just, in
> > this case, concentrating on how to achieve some workaround to the
> > TLM2.0 rules and not to achieve the application results. Writing
> > applications with AT is hard enough without considering all the
> > time the artificial effects of the TLM2.0 rules.
> >
> > Unless you want to construct rules that will make such scenarios as
> > I presented impossible (which I suppose you can't, or that they
> > will be unacceptable), I see the current base protocol as broken.
> >
> > Regards
> > Yossi
> >
> >
> >> -----Original Message-----
> >> From: tlmwg@lists.systemc.org [mailto:tlmwg@lists.systemc.org] On
> >> Behalf Of Robert Guenzel
> >> Sent: Wednesday, January 05, 2011 10:54 AM
> >> To: tlmwg@lists.systemc.org
> >> Cc: systemc-p1666-technical@eda.org
> >> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
> >>
> >> My comments are in-line below
> >>
> >> Veller, Yossi wrote:
> >>
> >>> I tried to have the shortest example and maybe it was a mistake.
> >>> So let me try again:
> >>>
> >>> Let us look at two initiators I1 and I2 connected to a bus B and
> >>> target T.
> >>>
> >>> I1 writes to T a burst whose data transfer should take 310 NS and
> >>> the time END_REQ->BEGIN_RESP should take 10 NS.
> >>>
> >>> I2 reads from T a burst whose data transfer should take 311 NS and
> >>> the time END_REQ->BEGIN_RESP should take 9 NS.
> >>>
> >>> Both start at the same time (t1).
> >>>
> >>> Let us denote the generic payload (GP) from I1 GP1 and the GP from
> >>> I2 GP2.
> >>>
> >>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>
> >>> B passes the GP1(BEGIN_REQ) to T
> >>>
> >>> T computes that the written data takes 310 NS (because of rule
> >>> 16.2.6 b) and waits.
> >>>
> >>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of the
> >>> BEGIN_REQ rule 16.2.6 e).
> >>>
> >>> t= t1+310 NS T sends GP1(END_REQ) and B passes it to I1, then B
> >>> takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>
> >>> T returns TLM_UPDATED and changes the phase to END_REQ and B sends
> >>> GP2(END_REQ) to I2.
> >>>
> >>> t= t1+319 NS T sends GP2(BEGIN_RESP) and B passes it to I2.
> >>>
> >>> I2 computes that the read data takes 311 NS (because of rule
> >>> 16.2.6 c) and waits.
> >>>
> >>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B pushes it into the PEQ
> >>> (because of the BEGIN_RESP rule 16.2.6 f).
> >>>
> >>>
> >> No. T does not send BEGIN_RESP here. It can't, due to the END_RESP
> >> rule.
> >>
> >>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and the
> >>> read finishes)
> >>>
> >>> B sends the GP1(BEG_RESP) to I1 which replies with TLM_COMPLETED
> >>>
> >>> B sends the GP1(END_RESP) to T (and the write finishes)
> >>>
> >>>
> >> After getting the END_RESP with GP2, T will now send BEGIN_RESP,
> >> which will finish with TLM_COMPLETED.
> >> That doesn't change much, but the example should not violate the
> >> rules it questions.
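> >>
> >> As a side note, the queuing that rule 16.2.6 e forces upon B is
> >> typically done with the standard tlm_utils::peq_with_get. A sketch
> >> (the surrounding bus code is omitted and the names are mine):
> >>
> >> #include "tlm.h"
> >> #include "tlm_utils/peq_with_get.h"
> >>
> >> struct bus_request_path : sc_core::sc_module
> >> {
> >>   tlm_utils::peq_with_get<tlm::tlm_generic_payload> peq;
> >>
> >>   SC_CTOR(bus_request_path) : peq("peq") { SC_THREAD(request_thread); }
> >>
> >>   // called from B's nb_transport_fw when a BEGIN_REQ cannot be
> >>   // forwarded to T yet
> >>   void queue_request(tlm::tlm_generic_payload& gp,
> >>                      const sc_core::sc_time& delay)
> >>   {
> >>     peq.notify(gp, delay);  // popped once the annotated delay matures
> >>   }
> >>
> >>   void request_thread()
> >>   {
> >>     for (;;) {
> >>       sc_core::wait(peq.get_event());
> >>       while (tlm::tlm_generic_payload* gp = peq.get_next_transaction())
> >>       {
> >>         (void)gp; // forward BEGIN_REQ to T here, END_REQ rule permitting
> >>       }
> >>     }
> >>   }
> >> };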
> >>
> >>> Here I arranged the time so that your points about the behavior of
> >>> T do not hold (it now has only one alternative). We've got to the
> >>> same result. The added delays were forced by the TLM rules and not
> >>> by any of the models.
> >>>
> >>>
> >> And it seems perfectly correct to me. Reads and writes share the
> >> same request and response channels.
> >> T allows the read response to overtake the outstanding write
> >> response. By that it accepts that now the write can only finish
> >> when the read response has finished. If that is not desired, T
> >> will have to schedule the read response behind the write response.
> >> That will increase the END_REQ->BEGIN_RESP delay from the desired
> >> 9 ns to (at least) 10 ns (plus a potential BEGIN_RESP->END_RESP
> >> delay in the write), but that is the whole purpose of the
> >> simulation, right? If everything was independent from everything
> >> else, well then you could simply calculate how long things will
> >> take and you would not need to simulate anything.
> >>
> >>> However if the simulator chooses to schedule I2 before I1, the
> >>> whole operation will take only 320 NS.
> >>>
> >>>
> >> And in my opinion that is a problem of B. Two things happen at the
> >> same time in a simulation.
> >> Apparently B is using some kind of combinatorial arbitration
> >> because it directly forwards an incoming BEGIN_REQ to T. When
> >> doing that it should employ a mechanism that allows it to make
> >> deterministic choices. It could use some kind of delta cycle
> >> waiting or whatever to make sure that the BEGIN_REQs from I1 and
> >> I2 have both arrived and it can then forward the correct one.
> >> In this case the simulation order of I1 and I2 will not affect the
> >> resulting communication order anymore.
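> >>
> >> A minimal sketch of such a mechanism (the arbitration policy and
> >> all names are just an assumption; the point is only the delta
> >> cycle of separation between collecting and forwarding):
> >>
> >> #include "tlm.h"
> >> #include <vector>
> >> #include <algorithm>
> >>
> >> struct arbiter : sc_core::sc_module
> >> {
> >>   struct pending_req { tlm::tlm_generic_payload* gp; int initiator_id; };
> >>
> >>   std::vector<pending_req> collected; // filled by B's nb_transport_fw
> >>   sc_core::sc_event go;
> >>
> >>   SC_CTOR(arbiter) { SC_THREAD(arbitrate); }
> >>
> >>   void collect(tlm::tlm_generic_payload& gp, int id)
> >>   {
> >>     collected.push_back(pending_req{&gp, id});
> >>     go.notify(sc_core::SC_ZERO_TIME); // wake one delta cycle later
> >>   }
> >>
> >>   void arbitrate()
> >>   {
> >>     for (;;) {
> >>       sc_core::wait(go);
> >>       // every BEGIN_REQ of this time/delta has been collected by
> >>       // now, so a fixed tie-breaker (lowest id first) makes the
> >>       // choice independent of process execution order
> >>       std::sort(collected.begin(), collected.end(),
> >>                 [](const pending_req& a, const pending_req& b)
> >>                 { return a.initiator_id < b.initiator_id; });
> >>       // forward the requests to T in that order (omitted here)
> >>       collected.clear();
> >>     }
> >>   }
> >> };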
> >>
> >> If such a mechanism is not desired at AT, because it affects
> >> simulation speed, then you must accept that your AT model is
> >> execution order dependent.
> >>
> >>> So do you agree that the rules are broken?
> >>>
> >>>
> >> No (not yet). And I try to explain my view below.
> >>
> >>> Besides this: I don't agree with your points about the broken
> >>> target; there is no rule that forces T to respond with TLM_UPDATED
> >>> and not TLM_ACCEPTED, even though the first is more efficient. The
> >>> TLM rules should enable deterministic timing regardless of such
> >>> choices of the target (or initiator or bus) behavior, as long as
> >>> their communication complies with the rules and they try to
> >>> achieve their separate timing.
> >
> >> And I disagree. In my opinion the choices should have an effect on
> >> timing.
> >> If I (a target) return TLM_ACCEPTED to BEGIN_REQ I disallow a
> >> subsequent BEGIN_REQ at the same time;
> >> if I return TLM_UPDATED with END_REQ I allow it (even if I increase
> >> the timing annotation).
> >> If I return TLM_UPDATED with BEGIN_RESP I also allow a subsequent
> >> BEGIN_REQ, but I also make clear that the response cannot be
> >> overtaken by any other response.
> >> If I return TLM_COMPLETED I disallow any other delays. The
> >> initiator has no chance of adding some BEGIN_RESP->END_RESP delay.
> >>
> >> Similar things apply to the choice of whether I skip END_REQ or not.
> >>
> >> It is obvious that those choices control what a connected bus or
> >> initiator can do to me.
> >> That is the whole purpose of flow control. If the choices did not
> >> affect timing, then the question would be why there are choices at
> >> all.
> >>
> >> To explain that a little more:
> >> Let's say we have
> >> nb_trans_fw(gp1, BEGIN_REQ, t)
> >> and I wanna send an END_REQ in 10 ns
> >>
> >> now I can do
> >> option 1:
> >>   return TLM_ACCEPTED; wait(10 ns); send nb_trans_bw(gp1, END_REQ, 0s)
> >> option 2:
> >>   t += 10 ns;
> >>   ph = END_REQ;
> >>   return TLM_UPDATED;
> >>
> >>
> >> With option 1 I can be sure NOT to get a BEGIN_REQ within the next
> >> 10 ns because I did not provide an END_REQ.
> >> With option 2 I can get a BEGIN_REQ in the next 10 ns, because I
> >> already gave the END_REQ (although it becomes "effective" only
> >> 10 ns in the future).
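> >>
> >> A sketch of both options inside a real target (socket and member
> >> names are mine; option 1 needs a helper thread because
> >> nb_transport_fw must not wait):
> >>
> >> #include "tlm.h"
> >> #include "tlm_utils/simple_target_socket.h"
> >>
> >> struct target : sc_core::sc_module
> >> {
> >>   tlm_utils::simple_target_socket<target> socket;
> >>   sc_core::sc_event end_req_event;
> >>   tlm::tlm_generic_payload* pending_gp;
> >>
> >>   SC_CTOR(target) : socket("socket"), pending_gp(0)
> >>   {
> >>     socket.register_nb_transport_fw(this, &target::nb_transport_fw);
> >>     SC_THREAD(end_req_sender); // only used by option 1
> >>   }
> >>
> >>   tlm::tlm_sync_enum nb_transport_fw(tlm::tlm_generic_payload& gp,
> >>                                      tlm::tlm_phase& ph,
> >>                                      sc_core::sc_time& t)
> >>   {
> >>     if (ph != tlm::BEGIN_REQ) return tlm::TLM_ACCEPTED;
> >> #ifdef OPTION_1 // accept now, END_REQ goes over the bw path 10 ns later
> >>     pending_gp = &gp;
> >>     end_req_event.notify(sc_core::sc_time(10, sc_core::SC_NS));
> >>     return tlm::TLM_ACCEPTED; // no END_REQ yet: no BEGIN_REQ for 10 ns
> >> #else           // option 2: END_REQ in the return path, annotated +10 ns
> >>     t += sc_core::sc_time(10, sc_core::SC_NS);
> >>     ph = tlm::END_REQ;
> >>     return tlm::TLM_UPDATED; // END_REQ given: next BEGIN_REQ may come early
> >> #endif
> >>   }
> >>
> >>   void end_req_sender()
> >>   {
> >>     for (;;) {
> >>       sc_core::wait(end_req_event); // 10 ns after the BEGIN_REQ
> >>       tlm::tlm_phase ph = tlm::END_REQ;
> >>       sc_core::sc_time zero = sc_core::SC_ZERO_TIME;
> >>       socket->nb_transport_bw(*pending_gp, ph, zero);
> >>     }
> >>   }
> >> };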
> >>
> >> So depending on that choice I can influence the timing of
> >> subsequent incoming BEGIN_REQs.
> >> The reason is that the rules are based on call order alone and not
> >> on their timing.
> >> IF you wanted to enforce that timings are the same for subsequent
> >> BEGIN_REQs in both cases (TLM_ACCEPTED or TLM_UPDATED),
> >> you would either have to say that the rules are based on timing,
> >> or that you disallow timing annotations at AT.
> >> IF you base the rules on timing you enforce PEQs everywhere, which
> >> is (from a sim speed PoV) equivalent to disallowing timing
> >> annotations.
> >>
> >> regards
> >> Robert
> >>
> >>> Regards
> >>>
> >>> Yossi
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
> >>>> Sent: Tuesday, January 04, 2011 3:07 PM
> >>>> To: Veller, Yossi
> >>>> Cc: systemc-p1666-technical@eda.org; tlmwg@lists.systemc.org
> >>>> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
> >>>>
> >>>> IMHO the scenario you describe does not show that the rules are
> >>>> broken.
> >>>> It's the target that might be broken.
> >>>>
> >>>> Or it's me who didn't fully understand the problem.
> >>>>
> >>>> I'll try to explain my view:
> >>>>
> >>>> T sends GP1(END_REQ) at t1+320ns. By doing that it indicates
> >>>> that it wants to insert a delay between END_REQ and BEGIN_RESP
> >>>> (otherwise it would directly do GP1(BEGIN_RESP)).
> >>>>
> >>>> The target knows what it did. Now it gets GP2(BEGIN_REQ).
> >>>> If the target now returns UPDATED(BEGIN_RESP) it _knows_
> >>>> that it now moves the RESP of GP1 _behind_ the RESP of GP2
> >>>> (because it now assigns the RESP channel to GP2).
> >>>>
> >>>> If it does not want to do that it can either return TLM_ACCEPTED
> >>>> and finish GP1 first (by sending BEGIN_RESP), or return
> >>>> TLM_UPDATED(END_REQ) and finish GP1 afterwards as well.
> >>>>
> >>>> You say:
> >>>>
> >>>>> The TLM2.0 base protocol rules created a scenario where two
> >>>>> concurrent reads and writes of 320 NS both took 640 NS (this
> >>>>> should be impossible).
> >>>>>
> >>>> There is no BP rule that forces T to do what you describe.
> >>>>
> >>>> And I do not think that it should be impossible. If the target
> >>>> wants to establish such a link between GP1 and GP2 it should be
> >>>> allowed to do so. But it can decide to avoid/reduce the link like
> >>>> that:
> >>>>
> >>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>>
> >>>> B passes the GP1(BEGIN_REQ) to T
> >>>>
> >>>> T computes that the written data takes 320 NS (because of rule
> >>>> 16.2.6 b) and waits.
> >>>>
> >>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of
> >>>> the BEGIN_REQ rule 16.2.6 e).
> >>>>
> >>>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B passes it to I1
> >>>>
> >>>> I1 returns TLM_COMPLETED (seen both by B and T). GP1 is DONE.
> >>>>
> >>>> Then B takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>>
> >>>> T returns TLM_UPDATED and changes the phase to BEGIN_RESP and B
> >>>> sends GP2(BEGIN_RESP) to I2.
> >>>>
> >>>> I2 computes that the read data takes 320 NS (because of rule
> >>>> 16.2.6 c) and waits.
> >>>>
> >>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T. GP2 is
> >>>> DONE.
> >>>>
> >>>> Now at the target side both GP1 and GP2 last 320 ns. For I2, GP2
> >>>> lasts 640 ns, but that is due to the fact that GP1 and GP2 are
> >>>> scheduled by B.
> >>>>
> >>>> I agree that there is the problem of preemption and interleaving,
> >>>> but the question is whether that is something that AT abstracts
> >>>> away or not.
> >>>> I must confess that I am not sure about the answer to that
> >>>> question.
> >>>> Due to the BEGIN_REQ rule preemption is not possible with another
> >>>>
> >>>> BEGIN_REQ.
> >>>>
> >>>> But how about an ignorable phase TRY_PREEMPT?
> >>>>
> >>>> After GP1(BEGIN_REQ) an initiator/interconnect can try a
> >>>> preemption by sending GP1(TRY_PREEMPT); if it gets back
> >>>> TLM_COMPLETED the preemption was successful, if it gets back
> >>>> TLM_ACCEPTED the target is unable to preempt GP1. Should be BP
> >>>> compatible.
> >>>>
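> >>>> A sketch of that (DECLARE_EXTENDED_PHASE is the standard TLM-2.0
> >>>> macro; the preemption semantics are only the proposal above, not
> >>>> part of the BP):
> >>>>
> >>>> #include "tlm.h"
> >>>>
> >>>> DECLARE_EXTENDED_PHASE(TRY_PREEMPT); // ignorable extended phase
> >>>>
> >>>> // returns true if the target completed (i.e. preempted) gp
> >>>> template <typename SOCKET>
> >>>> bool try_preempt(SOCKET& socket, tlm::tlm_generic_payload& gp)
> >>>> {
> >>>>   tlm::tlm_phase ph = TRY_PREEMPT;
> >>>>   sc_core::sc_time t = sc_core::SC_ZERO_TIME;
> >>>>   tlm::tlm_sync_enum ret = socket->nb_transport_fw(gp, ph, t);
> >>>>   // a plain BP target ignores the phase and returns TLM_ACCEPTED,
> >>>>   // which correctly reads as "no preemption"
> >>>>   return ret == tlm::TLM_COMPLETED;
> >>>> }
> >>>>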
> >>>> For interleaving I'd say that is something that is abstracted
> >>>> away at the AT level.
> >>>>
> >>>> best regards
> >>>>
> >>>> Robert
> >>>>
> >>>> Veller, Yossi wrote:
> >>>>
> >>>>> During the review of the draft LRM I revisited TLM2.0, the new
> >>>>> addition to the standard, and found the following:
> >>>>>
> >>>>> Let us look at two initiators I1 and I2 connected to a bus B and
> >>>>> target T.
> >>>>>
> >>>>> I1 writes to T a burst whose data transfer should take 320 NS
> >>>>> and I2 reads from T a burst whose data transfer should take
> >>>>> 320 NS at the same time (t1).
> >>>>>
> >>>>> Let us denote the generic payload (GP) from I1 GP1 and the GP
> >>>>> from I2 GP2.
> >>>>>
> >>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>>>
> >>>>> B passes the GP1(BEGIN_REQ) to T
> >>>>>
> >>>>> T computes that the written data takes 320 NS (because of rule
> >>>>> 16.2.6 b) and waits.
> >>>>>
> >>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of
> >>>>> the BEGIN_REQ rule 16.2.6 e).
> >>>>>
> >>>>> t= t1+320 NS T sends GP1(END_REQ) and B passes it to I1, then B
> >>>>> takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>>>
> >>>>> T returns TLM_UPDATED and changes the phase to BEGIN_RESP and B
> >>>>> sends GP2(BEGIN_RESP) to I2.
> >>>>>
> >>>>> I2 computes that the read data takes 320 NS (because of rule
> >>>>> 16.2.6 c) and waits.
> >>>>>
> >>>>> T sends GP1(BEGIN_RESP) and B pushes it into the PEQ (because of
> >>>>> the BEGIN_RESP rule 16.2.6 f).
> >>>>>
> >>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and
> >>>>> the read finishes)
> >>>>>
> >>>>> B sends the GP1(BEGIN_RESP) to I1 which replies with TLM_COMPLETED
> >>>>>
> >>>>> B sends the GP1(END_RESP) to T (and the write finishes)
> >>>>>
> >>>>> The TLM2.0 base protocol rules created a scenario where two
> >>>>> concurrent reads and writes of 320 NS both took 640 NS (this
> >>>>> should be impossible).
> >>>>>
> >>>>> Other scenarios show that, relying on simulation order, both
> >>>>> read and write can finish after 320 NS, or that either one can
> >>>>> finish after 320 NS and the other one after 640 NS. All the
> >>>>> above happens because there is an artificial linkage between the
> >>>>> request stage of the read and the data phase of the write (they
> >>>>> use the same phases) and another artificial linkage between the
> >>>>> acknowledge stage of the write and the data phase of the read.
> >>>>>
> >>>>> IMHO these scenarios show that the rules are broken and have to
> >>>>> be fixed.
> >>>>>
> >>>>> Moreover these rules don't support the following:
> >>>>>
> >>>>> 1. Preemption, where a higher priority transaction can abort a
> >>>>> lower priority long burst of data in order to transmit its data
> >>>>> and let the aborted transaction continue.
> >>>>>
> >>>>> 2. Interleaving of the data of a slow burst with data of other
> >>>>> bursts.
> >
> >>>>> The fix can be the following:
> >>>>>
> >>>>> 1. Specify that the BEGIN_REQ -> END_REQ stage is the address
> >>>>> passing stage.
> >>>>>
> >>>>> 2. Retain the BEGIN_REQ rule (16.2.6 e) that allows a target to
> >>>>> slow down upstream components.
> >>>>>
> >>>>> 3. Specify that the END_REQ -> BEGIN_RESP stage is the write
> >>>>> data passing stage.
> >>>>>
> >>>>> 4. Remove the BEGIN_RESP rule (16.2.6 f). An initiator should
> >>>>> not issue more outstanding requests than it can handle anyway.
> >>>>>
> >>>>> This fix will also support preemption and interleaving.
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> Yossi
> >>>>>
> >
> >