Re: [tlmwg] Revisit of the TLM2.0 phases rules

From: Robert Guenzel <robert.guenzel@greensocs.com>
Date: Thu Jan 06 2011 - 08:05:11 PST

Yossi,

again it seems to me that your problem is that the BP does not have some
features of AXI. AXI has independent read/write request and response
channels, while the BP has shared channels.

Still (as James said) you can use the BP to model AXI, but then you
abstract away that independence. If the approximation of AXI you can
reach with the BP is not good enough for you, you will have to use
another protocol.

I understand that AXI is a common protocol and that there are many IP
blocks for AXI, but the BP is not supposed to be as close as possible
to the most prominent protocol. Instead it should represent a subset of
most of the protocols in use, and as James said, most of them (e.g.
PLB, OPB, APB, AHB and OCP) have shared request and/or shared response
channels. Since shared channels can be seen as an abstraction of
independent ones (at least IMHO), having shared channels allows the BP
to model the aforementioned protocols as well as AXI.
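
(Purely as an illustration, and echoing the separate-socket workaround
I suggest further down in this thread, here is a rough sketch with
made-up names; this is not code from the standard:)

#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>

// Hypothetical initiator that keeps AXI-like read/write independence
// on top of the BP by using one BP socket per channel. The bus then
// routes command dependent.
struct axi_like_initiator : sc_core::sc_module
{
  tlm_utils::simple_initiator_socket<axi_like_initiator> rd_socket;
  tlm_utils::simple_initiator_socket<axi_like_initiator> wr_socket;

  SC_CTOR(axi_like_initiator)
    : rd_socket("rd_socket"), wr_socket("wr_socket") {}

  void start_transaction(tlm::tlm_generic_payload& gp)
  {
    tlm::tlm_phase ph = tlm::BEGIN_REQ;
    sc_core::sc_time t = sc_core::SC_ZERO_TIME;
    // Each socket obeys the BP on its own, so a read response can no
    // longer be blocked behind an outstanding write response.
    // (Return value and phase handling omitted for brevity.)
    if (gp.is_read())
      rd_socket->nb_transport_fw(gp, ph, t);
    else
      wr_socket->nb_transport_fw(gp, ph, t);
  }
};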

Remember, the BP is meant to enable the highest possible
interoperability, not the highest possible timing accuracy. If you are
aiming at more accurate results, AT could simply be too abstract for
your use case.

best regards
  Robert

Veller, Yossi wrote:
>
> Hi John and Robert,
>
>
>
> There SHOULD be a rule that specifies who is responsible for
> calculating the transfer time (apart from the bus latencies).
> Otherwise both the target and the initiator may consume the time, or
> neither of them may. That is why I interpreted 16.2.6 b) and c) as
> rules.
>
>
>
> Regarding the timing of the write, it is more natural to interpret the
> END_REQ->BEGIN_RESP time frame as the data transfer phase and to leave
> the BEGIN_REQ->END_REQ time frame of both read and write to the
> address request channel (which exists in most buses). Hence I would
> prefer it to be specified this way, or at least not specified at all
> in 16.2.6 b) (which you have said is really not a rule).
>
>
>
> For out-of-order protocols I think that I have shown that the TLM2
> rules contribute to scenarios that do not seem plausible (with all due
> respect to a one-in-a-thousand configuration of OCP). E.g. I would not
> use the BP to approximate AXI (which seems to me a pretty common
> protocol) because of the scenario that I have shown; apart from that
> scenario, one can model the throughput of AXI pretty accurately with
> the BP. This is a critical limitation in my view. OK, I have stopped
> using the word broken; I agree with Robert that for in-order protocols
> the scenario that I have shown does not appear.
>
>
>
> The removal of the BEGIN_RESP rule (16.2.6 f) can fix the problem.
> First, I contend that an initiator should not issue more outstanding
> requests than it can handle anyway. Hence there is no real need to
> enable it to stall responding targets through this rule.
>
>
>
> The removal of rule 16.2.6 f) will also enable the following scenario:
>
> There are two targets T1 and T2 (T2 has higher priority). T1 sends a
> read burst through B, and some time afterwards T2 also requests to
> send a read burst. The bus can send a BEGIN_RESP to the initiator in
> order to show that the higher priority request preempts the lower
> priority one, and the master can first finish the higher priority
> transaction and delay the end of the lower priority one accordingly.
> Almost the same scenario (except that the end of the lower priority
> transaction can follow the end of the higher priority one) can model
> interleaving of the data of slower higher priority bursts and faster
> lower priority bursts.
>
>
>
> Conceptually, thinking of the write data as passing in the
> END_REQ->BEGIN_RESP time frame also shows the way to model preemption
> and data interleaving on write transactions in a similar way to the
> read case.
>
>
>
> So slightly changing the rules will bring big advantages and make the
> BP conformant with more actual protocols. Don't you think so?
>
>
>
> Regards
>
> Yossi
>
>
>
> BTW, I did not mean for any timing annotations to be used in my
> example. The initiators, target and bus all schedule delayed event
> notifications and call nb_transport at the right time. My apologies
> for the sloppiness with which I wrote the example, which just caused
> confusion.
>
>
>
> *From:* john.aynsley@doulos.com [mailto:john.aynsley@doulos.com]
> *Sent:* Wednesday, January 05, 2011 9:40 PM
> *To:* robert.guenzel@greensocs.com
> *Cc:* P1666 Technical WG; tlmwg@lists.systemc.org; Veller, Yossi
> *Subject:* Re: [tlmwg] Revisit of the TLM2.0 phases rules
>
>
>
> Yossi,
>
> I fully agree with Robert's answers throughout.
>
> From the outset, I have been concerned about the way you seem to be
> interpreting 16.2.6 b) and c). Note that these are not "rules" in the
> sense that they do not impose any specific obligations. Each TLM-2.0
> component is free to make its own modeling decisions, and the BP rules
> allow quite a bit of flexibility. A BP component can choose how it
> will use flow control according to how it wishes to map an actual
> protocol (e.g. AXI) onto the BP abstraction. For example, your bus or
> your target could return END_REQ immediately and implement any delay
> internally rather than bunching the entire 310 NS as the
> BEGIN_REQ->END_REQ delay (in which case it had better be able to
> buffer multiple transactions).
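>
> (As a rough illustration of that option, and only a sketch with
> made-up names, a target could complete the request handshake at once
> and keep the delay internal:)
>
>   #include <queue>
>   #include <systemc>
>   #include <tlm>
>   #include <tlm_utils/simple_target_socket.h>
>
>   struct buffering_target : sc_core::sc_module
>   {
>     tlm_utils::simple_target_socket<buffering_target> socket;
>     std::queue<tlm::tlm_generic_payload*> pending; // holds several
>
>     SC_CTOR(buffering_target) : socket("socket")
>     {
>       socket.register_nb_transport_fw(this, &buffering_target::nb_fw);
>     }
>
>     tlm::tlm_sync_enum nb_fw(tlm::tlm_generic_payload& trans,
>                              tlm::tlm_phase& phase,
>                              sc_core::sc_time& delay)
>     {
>       if (phase == tlm::BEGIN_REQ) {
>         pending.push(&trans);    // buffer the request internally
>         phase = tlm::END_REQ;    // request channel is free again
>         return tlm::TLM_UPDATED; // any transfer delay is modeled
>                                  // inside the target, not as a
>                                  // BEGIN_REQ->END_REQ delay
>       }
>       // ... END_RESP handling and response generation omitted ...
>       return tlm::TLM_ACCEPTED;
>     }
>   };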
>
> You have not written anything that suggests to me that the BP is
> "broken". I think you believe the BP is not precisely equivalent to
> any specific current standard protocol, and I guess we would all
> agree, but so what? The BP is an abstraction, but it IS A PROTOCOL
> nonetheless, and so any BP-compliant model has to be written in full
> knowledge of the protocol rules.
>
> Also, I am concerned by your statements concerning accuracy
> expectations at AT. There are none! Each TLM-2.0 model is free to
> choose whether to add a non-zero timing annotation to an outgoing
> transaction, and how to interpret a non-zero timing annotation on an
> incoming transaction (LT versus AT coding guidelines do not impose any
> obligations).
>
> Cheers,
>
> John A
>
>
> -----<tlmwg@lists.systemc.org> wrote: -----
>
> To: "Veller, Yossi" <Yossi_Veller@mentor.com>
> From: Robert Guenzel
> Sent by:
> Date: 01/05/2011 02:16PM
> Cc: tlmwg@lists.systemc.org, systemc-p1666-technical@eda.org
> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
>
> Hi Yossi,
>
> what you describe in your example is a behavior where the protocol
> has shared request and response channels for reads and writes
> (because of the BP) but it allows out-of-order responses (allowed
> within the BP but not a very common bus feature).
>
> The OCP can be configured like that and then it would behave exactly
> as you described in your examples (both transactions ending around
> 640ns).
>
> That means (IMHO) that the BP can be used to model a reasonable bus
> protocol.
> Now if you wanted to bring the BP closer to PLB, OPB, AHB, or APB
> (which do not allow out-of-order responses), then you would have to
> disallow out-of-order responses. Then your target would not be allowed
> to let the read response overtake the write response. But that would
> mean in-order with respect to timing, not only execution order.
>
> And since we have temporal decoupling at AT, in-order with respect to
> timing would mean having PEQs everywhere, and that would render the
> whole idea of temporal decoupling meaningless (but I confess that I
> have always wondered whether temporal decoupling at AT is really
> useful...). I believe that was one (the?) reason why we decided to
> allow out-of-order responses.
>
> So do you agree that the problem is basically just the fact that
> responses can overtake each other? If so, I'd say the BP is the
> appropriate vehicle to model reasonable buses, because nobody forces
> you to implement your target like you did. You can easily implement it
> such that it keeps the responses in order. Sending out-of-order
> responses is an option, not an obligation (whereas tolerating
> out-of-order responses is mandatory, but reordering wouldn't be that
> hard either).
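>
> (For illustration only, such an in-order target might keep completed
> responses in a queue, with made-up names, and never have more than
> one response outstanding:)
>
>   // Sketch of members/helpers inside a target module:
>   std::deque<tlm::tlm_generic_payload*> resp_queue; // request order
>   bool resp_in_flight = false;
>
>   void response_ready(tlm::tlm_generic_payload& trans)
>   {
>     resp_queue.push_back(&trans); // responses are never reordered
>     try_send_response();
>   }
>
>   void try_send_response()
>   {
>     if (resp_in_flight || resp_queue.empty()) return;
>     resp_in_flight = true;
>     tlm::tlm_phase ph = tlm::BEGIN_RESP;
>     sc_core::sc_time t = sc_core::SC_ZERO_TIME;
>     socket->nb_transport_bw(*resp_queue.front(), ph, t);
>   }
>
>   // On an incoming END_RESP: pop the front of resp_queue, clear
>   // resp_in_flight and call try_send_response() again.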
>
> best regards
> Robert
>
> Veller, Yossi wrote:
> > Hi Robert,
> >
> > I'll just have to go back to my basic claim.
> >
> > We have agreed (I assume) that the example that I presented
> > demonstrated that two transactions, each of which should have taken
> > 320 NS, BOTH finish after 640 NS.
> >
> > On any reasonable bus protocol (or at least any that I know of) this
> > can't happen.
> > Hence IMHO the BP is not the vehicle to model any reasonable bus
> > protocol and should be fixed.
> >
> > What do you think?
> >
> > Regards
> > Yossi
> >
> >
> >> -----Original Message-----
> >> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
> >> Sent: Wednesday, January 05, 2011 3:04 PM
> >> To: Veller, Yossi
> >> Cc: tlmwg@lists.systemc.org; systemc-p1666-technical@eda.org
> >> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
> >>
> >> Hi Yossi,
> >>
> >> in a way you seem to be contradicting yourself.
> >> You say that there is no rule forcing your bus to behave
> >> deterministically. Fair enough. But then please do accept
> >> the non-deterministic behavior of your simulation
> >> (i.e. its dependence on the random execution order of
> >> simulation processes).
> >>
> >> And of course a delay of a picosecond or a cycle
> >> or whatever can change the timing of your simulation
> >> by 100%, especially when you use non-deterministic
> >> models.
> >>
> >> You say:
> >>
> >>> T answers as fast as it can, it does not have to consider the fact
> >>> that allowing the read response to overtake the outstanding write
> >>> response will just double the time (this is an artifact of the
> >>> rules and not the application).
> >>
> >> I agree that it is a result of the protocol in use (the BP). The
> >> response channels are shared between reads and writes in the BP.
> >> Allowing a response to overtake another means accepting that the
> >> overtaken response has to wait now. A user of the BP is (or should
> >> be) aware of that. If that is not acceptable for your case (e.g. if
> >> you try to model independent read and write response channels like
> >> in AXI) then the BP is not suitable. You will have to define another
> >> protocol. Or use separate sockets for reads and writes (which then
> >> use the BP) and make the routing within your bus command dependent.
> >>
> >> You claim the rules are broken, but they are not. They just do not
> >> fit your expectation of fully independent read and write channels
> >> within a single socket.
> >>
> >> Also you say
> >>
> >>> Writing applications with AT is hard enough without considering
> >>> all the time the artificial effects of the TLM2.0 rules.
> >>
> >> and I disagree. It is a totally natural thing that writing an AT-BP
> >> application means considering all the rules and all effects of the
> >> protocol in use all the time.
> >> And frankly, I do not understand why you call those effects
> >> artificial. They result from the fact that TLM-2.0 allows temporal
> >> decoupling on a discrete event simulator, such that execution order
> >> and timing order (can) diverge. And then the question is which order
> >> dictates the rules. For TLM-2.0 it is execution order. And it has to
> >> be execution order because time order is unspecified in the case of
> >> pseudo-simultaneous activities.
> >> And then it is pretty natural that a change of execution order has
> >> significant effects.
> >> And since waiting can affect execution order, changes in timing
> >> (i.e. waiting) can significantly affect the overall timing as well.
> >> Using the BP means accepting that.
> >>
> >> As Jakob said: you cannot expect the simulation to behave
> >> identically when meddling with changes of execution and/or time
> >> order.
> >>
> >> Let's take the classical deadlock in discrete event simulation (DES):
> >>
> >> sc_core::sc_event event; // immediate notification is lost if no
> >>                          // process is waiting yet
> >>
> >> void proc1()
> >> {
> >>   event.notify(); // immediate notify: wakes only processes that
> >>                   // are already waiting on the event
> >> }
> >>
> >> void proc2()
> >> {
> >>   wait(event);    // if proc1 ran first, this waits forever
> >>   do_stuff();
> >> }
> >>
> >> it now depends on whether the simulator executes proc1 or proc2
> >> first. If it takes proc2 first, you're fine. If it takes proc1
> >> first, you deadlock.
> >> If you want to avoid that, you need to make your model behave
> >> correctly no matter which proc executes first. You have to accept
> >> that. It's a DES.
> >> And the same applies to TLM-2.0. Your model has to be written in a
> >> way such that it behaves like you want no matter what happens first.
> >>
> >> Summary: I still don't think the BP is broken.
> >>
> >> best regards
> >> Robert
> >>
> >>
> >> Veller, Yossi wrote:
> >>
> >>> Hi Robert,
> >>>
> >>> Thanks for your correction to the example. However, we both agree
> >>> that the mistake does not change much.
> >>>
> >>> In TLM2.0 we are not on solid ground, because AFAIK there are no
> >>> claims about the abilities of the TLM2.0 rules.
> >>> However, I assumed an implicit claim that, unless it is a part of
> >>> the application requirements, the rules will not cause an arbitrary
> >>> change in the order of the events to change the timing by 100%
> >>> (it can be more for examples with more initiators). It also should
> >>> be true that a delay of a cycle or a picosecond should not change
> >>> the timing by 100% due to the rules.
> >>>
> >>> My target and bus try to finish the transfers as fast as possible,
> >>> and still the timing results vary because of the rules and the
> >>> event order.
> >>> T answers as fast as it can; it does not have to consider the fact
> >>> that allowing the read response to overtake the outstanding write
> >>> response will just double the time (this is an artifact of the
> >>> rules and not the application).
> >>> There is no rule that forces a bus to work deterministically, and
> >>> if there were one (e.g. by waiting a cycle as you proposed and
> >>> always choosing the same request), I could delay one BEGIN_REQ by
> >>> a cycle, and show that just a cycle's delay causes the same
> >>> misbehavior.
> >>>
> >>> What you presented as the purpose of the simulation is just, in
> >>> this case, concentrating on how to achieve some workaround to the
> >>> TLM2.0 rules and not on achieving the application results. Writing
> >>> applications with AT is hard enough without considering all the
> >>> time the artificial effects of the TLM2.0 rules.
> >>>
> >>> Unless you want to construct rules that will make such scenarios
> >>> as I presented impossible (which I suppose you can't, or that they
> >>> will be unacceptable), I see the current base protocol as broken.
> >>>
> >>> Regards
> >>> Yossi
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: tlmwg@lists.systemc.org [mailto:tlmwg@lists.systemc.org]
> >>>> On Behalf Of Robert Guenzel
> >>>> Sent: Wednesday, January 05, 2011 10:54 AM
> >>>> To: tlmwg@lists.systemc.org
> >>>> Cc: systemc-p1666-technical@eda.org
> >>>> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
> >>>>
> >>>> My comments are in-line below
> >>>>
> >>>> Veller, Yossi wrote:
> >>>>
> >>>>
> >>>>> I tried to have the shortest example and maybe it was a mistake.
> >>>>> So let me try again:
> >>>>>
> >>>>> Let us look at two initiators I1 and I2 connected to a bus B and
> >>>>> target T.
> >>>>>
> >>>>> I1 writes to T a burst whose data transfer should take 310 NS and
> >>>>> the time END_REQ->BEGIN_RESP should take 10 NS.
> >>>>>
> >>>>> I2 reads from T a burst whose data transfer should take 311 NS
> >>>>> and the time END_REQ->BEGIN_RESP should take 9 NS.
> >>>>>
> >>>>> Both start at the same time (t1).
> >>>>>
> >>>>> Let us denote the generic payload (GP) from I1 GP1 and the GP
> >>>>> from I2 GP2.
> >>>>>
> >>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>>>
> >>>>> B passes the GP1(BEGIN_REQ) to T
> >>>>>
> >>>>> T computes that the written data takes 310 NS (because of rule
> >>>>> 16.2.6 b) and waits.
> >>>>>
> >>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of
> >>>>> the BEGIN_REQ rule 16.2.6 e).
> >>>>>
> >>>>> t= t1+310 NS T sends GP1(END_REQ) and B passes it to I1; then B
> >>>>> takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>>>
> >>>>> T returns TLM_UPDATED and changes the phase to END_REQ, and B
> >>>>> sends GP2(END_REQ) to I2.
> >>>>>
> >>>>> t= t1+319 NS T sends GP2(BEGIN_RESP) and B passes it to I2.
> >>>>>
> >>>>> I2 computes that the read data takes 311 NS (because of rule
> >>>>> 16.2.6 c) and waits.
> >>>>>
> >>>>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B pushes it into the PEQ
> >>>>> (because of the BEGIN_RESP rule 16.2.6 f).
> >>>>>
> >>>>
> >>>> No. T does not send BEGIN_RESP here. It can't, due to the
> >>>> END_RESP rule.
> >>>>
> >>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and
> >>>>> the read finishes)
> >>>>>
> >>>>> B sends the GP1(BEGIN_RESP) to I1 which replies with TLM_COMPLETED
> >>>>>
> >>>>> B sends the GP1(END_RESP) to T (and the write finishes)
> >>>>>
> >>>>
> >>>> After getting the END_RESP with GP2, T will now send BEGIN_RESP,
> >>>> which will finish with TLM_COMPLETED.
> >>>> That doesn't change much, but the example should not violate the
> >>>> rules it questions.
> >>>>
> >>>>
> >>>>> Here I arranged the time so that your points about the behavior
> >>>>> of T do not hold (it now has only one alternative). We've got to
> >>>>> the same result. The added delays were forced by the TLM rules
> >>>>> and not by any of the models.
> >>>>>
> >>>>>
> >>>>>
> >>>> And it seems perfectly correct to me. Reads and writes share the
> >>>> same request and response channels.
> >>>> T allows the read response to overtake the outstanding write
> >>>> response. By that it accepts that now the write can only finish
> >>>> when the read response has finished. If that is not desired, T
> >>>> will have to schedule the read response behind the write response.
> >>>> That will increase the END_REQ->BEGIN_RESP delay from the desired
> >>>> 9 ns to (at least) 10 ns (plus a potential BEGIN_RESP->END_RESP
> >>>> delay in the write), but that is the whole purpose of the
> >>>> simulation, right? If everything were independent from everything
> >>>> else, well then you could simply calculate how long things will
> >>>> take and you would not need to simulate anything.
> >>>>
> >>>>
> >>>>> However, if the simulator chooses to schedule I2 before I1, the
> >>>>> whole operation will take only 320 NS.
> >>>>>
> >>>>>
> >>>>>
> >>>> And in my opinion that is a problem of B. Two things happen at the
> >>>> same time in a simulation.
> >>>> Apparently B is using some kind of combinatorial arbitration,
> >>>> because it directly forwards an incoming BEGIN_REQ to T. When
> >>>> doing that, it should employ a mechanism that allows it to make
> >>>> deterministic choices. It could use some kind of delta cycle
> >>>> waiting or whatever to make sure that the BEGIN_REQs from I1 and
> >>>> I2 have both arrived and it can then forward the correct one.
> >>>> In this case the simulation order of I1 and I2 will not affect the
> >>>> resulting communication order anymore.
> >>>>
> >>>> If such a mechanism is not desired at AT, because it affects
> >>>> simulation speed, then you must accept that your AT model is
> >>>> execution order dependent.
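> >>>>
> >>>> (Purely as an illustration of that delta cycle idea, a sketch
> >>>> with made-up names; the delta notification lets all BEGIN_REQs of
> >>>> the current time step arrive before one is chosen:)
> >>>>
> >>>>   // Sketch of members/helpers inside the bus model:
> >>>>   std::vector<tlm::tlm_generic_payload*> pending;
> >>>>   sc_core::sc_event arb_event;
> >>>>
> >>>>   tlm::tlm_sync_enum nb_fw(tlm::tlm_generic_payload& trans,
> >>>>                            tlm::tlm_phase& phase,
> >>>>                            sc_core::sc_time& delay)
> >>>>   {
> >>>>     if (phase == tlm::BEGIN_REQ) {
> >>>>       pending.push_back(&trans);
> >>>>       arb_event.notify(sc_core::SC_ZERO_TIME); // wait one delta
> >>>>       return tlm::TLM_ACCEPTED;
> >>>>     }
> >>>>     // ... other phases ...
> >>>>     return tlm::TLM_ACCEPTED;
> >>>>   }
> >>>>
> >>>>   void arbitrate() // SC_METHOD, sensitive to arb_event
> >>>>   {
> >>>>     // All BEGIN_REQs of this time step have arrived now, so pick
> >>>>     // one by a fixed policy instead of by execution order
> >>>>     // (by_priority is a made-up comparison function).
> >>>>     std::sort(pending.begin(), pending.end(), by_priority);
> >>>>     // forward pending.front() to T; keep the rest queued
> >>>>   }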
> >>>>
> >>>>
> >>>>> So do you agree that the rules are broken?
> >>>>>
> >>>>>
> >>>>>
> >>>> No (not yet). And I try to explain my view below.
> >>>>
> >>>>
> >>>>> Besides this: I don't agree with your points about the broken
> >>>>> target; there is no rule that forces T to respond with
> >>>>> TLM_UPDATED and not TLM_ACCEPTED, even though the first is more
> >>>>> efficient. The TLM rules should enable deterministic timing
> >>>>> regardless of such choices of the target (or initiator or bus)
> >>>>> behavior, as long as their communication complies with the rules
> >>>>> and they try to achieve their separate timing.
> >>>
> >>>
> >>>> And I disagree. In my opinion the choices should have an effect
> >>>> on timing.
> >>>> If I (a target) return TLM_ACCEPTED to BEGIN_REQ, I disallow a
> >>>> subsequent BEGIN_REQ at the same time; if I return TLM_UPDATED
> >>>> with END_REQ, I allow it (even if I increase the timing
> >>>> annotation).
> >>>> If I return TLM_UPDATED with BEGIN_RESP, I also allow a subsequent
> >>>> BEGIN_REQ, but I also make clear that the response cannot be
> >>>> overtaken by any other response.
> >>>> If I return TLM_COMPLETED, I disallow any other delays. The
> >>>> initiator has no chance of adding some BEGIN_RESP->END_RESP delay.
> >>>>
> >>>> Similar things apply to the choice of whether I skip END_REQ or not.
> >>>>
> >>>> It is obvious that those choices control what a connected bus or
> >>>> initiator can do to me.
> >>>> That is the whole purpose of flow control. If the choices did not
> >>>> affect timing, then the question would be why there are choices
> >>>> at all.
> >>>>
> >>>> To explain that a little more:
> >>>> Let's say we have an incoming
> >>>>   nb_transport_fw(gp1, BEGIN_REQ, t)
> >>>> and I want to send an END_REQ in 10 ns.
> >>>>
> >>>> Now I can do:
> >>>>
> >>>> option 1:
> >>>>   return TLM_ACCEPTED;
> >>>>   // ...then, from a target thread:
> >>>>   wait(10, SC_NS);
> >>>>   nb_transport_bw(gp1, END_REQ, SC_ZERO_TIME);
> >>>>
> >>>> option 2:
> >>>>   t += sc_time(10, SC_NS); // annotate the delay on the return
> >>>>   ph = END_REQ;
> >>>>   return TLM_UPDATED;
> >>>>
> >>>>
> >>>> With option 1 I can be sure NOT to get a BEGIN_REQ within the next
> >>>> 10 ns, because I did not provide an END_REQ.
> >>>> With option 2 I can get a BEGIN_REQ in the next 10 ns, because I
> >>>> already gave the END_REQ (although it becomes "effective" only
> >>>> 10 ns in the future).
> >>>>
> >>>> So depending on that choice I can influence the timing of
> >>>> subsequent incoming BEGIN_REQs.
> >>>> The reason is that the rules are based on call order alone and not
> >>>> on their timing.
> >>>> IF you wanted to enforce that timings are the same for subsequent
> >>>> BEGIN_REQs in both cases (TLM_ACCEPTED or TLM_UPDATED), you would
> >>>> either have to say that the rules are based on timing, or that you
> >>>> disallow timing annotations at AT.
> >>>> IF you base the rules on timing, you enforce PEQs everywhere,
> >>>> which is (from a sim speed PoV) equivalent to disallowing timing
> >>>> annotations.
> >>>>
> >>>> regards
> >>>> Robert
> >>>>
> >>>>
> >>>>> Regards
> >>>>>
> >>>>> Yossi
> >>>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>>
> >>>>>> From: Robert Guenzel [mailto:robert.guenzel@greensocs.com]
> >>>>>>
> >>>>>> Sent: Tuesday, January 04, 2011 3:07 PM
> >>>>>>
> >>>>>> To: Veller, Yossi
> >>>>>>
> >>>>>> Cc: systemc-p1666-technical@eda.org; tlmwg@lists.systemc.org
> >>>>>>
> >>>>>> Subject: Re: [tlmwg] Revisit of the TLM2.0 phases rules
> >>>>>>
> >>>>>> IMHO the scenario you describe does not show that the rules are
> >>>>>> broken. It's the target that might be broken.
> >>>>>>
> >>>>>> Or it's me who didn't fully understand the problem.
> >>>>>>
> >>>>>> I'll try to explain my view:
> >>>>>>
> >>>>>> T sends GP1(END_REQ) at t1+320ns. By doing that it indicates
> >>>>>>
> >>>>>> that it wants to insert a delay between END_REQ and BEGIN_RESP
> >>>>>>
> >>>>>> (otherwise it would directly do GP1(BEGIN_RESP) )
> >>>>>>
> >>>>>> The target knows what it did. Now it gets GP2(BEGIN_REQ).
> >>>>>>
> >>>>>> If the target now returns UPDATED(BEGIN_RESP) it _knows_
> >>>>>>
> >>>>>> that it now moves the RESP of GP1 _behind_ the RESP of GP2
> >>>>>>
> >>>>>> (because it now assigns the RESP channel to GP2).
> >>>>>>
> >>>>>> If it does not want to do that, it can either return TLM_ACCEPTED
> >>>>>>
> >>>>>> and finish GP1 first (by sending BEGIN_RESP), or return
> >>>>>>
> >>>>>> TLM_UPDATED(END_REQ) and finish GP1 afterwards as well.
> >>>>>>
> >>>>>> You say:
> >>>>>>
> >>>>>>
> >>>>>>> The TLM2.0 base protocol rules created a scenario where two
> >>>>>>>
> >>>>>>> concurrent reads and writes of 320 NS both took 640 NS (this
> >>>>>>>
> >>>>>>> should be impossible).
> >>>>>>>
> >>>>>> There is no BP rule that forces T to do what you describe.
> >>>>>>
> >>>>>> And I do not think that it should be impossible. If the target
> >>>>>> wants to establish such a link between GP1 and GP2, it should be
> >>>>>> allowed to do so. But it can decide to avoid/reduce the link
> >>>>>> like that:
> >>>>>>
> >>>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>>>>
> >>>>>> B passes the GP1(BEGIN_REQ) to T
> >>>>>>
> >>>>>> T computes that the written data takes 320 NS (because of rule
> >>>>>> 16.2.6 b) and waits.
> >>>>>>
> >>>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of
> >>>>>> the BEGIN_REQ rule 16.2.6 e).
> >>>>>>
> >>>>>> t= t1+320 NS T sends GP1(BEGIN_RESP) and B passes it to I1
> >>>>>>
> >>>>>> I1 returns TLM_COMPLETED (seen both by B and T). GP1 is DONE.
> >>>>>>
> >>>>>> Then B takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>>>>
> >>>>>> T returns TLM_UPDATED and changes the phase to BEGIN_RESP, and B
> >>>>>> sends GP2(BEGIN_RESP) to I2.
> >>>>>>
> >>>>>> I2 computes that the read data takes 320 NS (because of rule
> >>>>>> 16.2.6 c) and waits.
> >>>>>>
> >>>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T. GP2 is
> >>>>>> DONE.
> >>>>>>
> >>>>>> Now at the target side both GP1 and GP2 last 320 ns. For I2, GP2
> >>>>>> lasts 640 ns, but that is due to the fact that GP1 and GP2 are
> >>>>>> scheduled by B.
> >>>>>>
> >>>>>> I agree that there is the problem of preemption and interleaving,
> >>>>>> but the question is whether that is something that AT abstracts
> >>>>>> away or not.
> >>>>>> I must confess that I am not sure about the answer to that
> >>>>>> question.
> >>>>>>
> >>>>>> Due to the BEGIN_REQ rule, preemption is not possible with
> >>>>>> another BEGIN_REQ.
> >>>>>>
> >>>>>> But how about an ignorable phase TRY_PREEMPT?
> >>>>>>
> >>>>>> After GP1(BEGIN_REQ) an initiator/interconnect can try a
> >>>>>> preemption by sending GP1(TRY_PREEMPT): if it gets back
> >>>>>> TLM_COMPLETED, the preemption was successful; if it gets back
> >>>>>> TLM_ACCEPTED, the target is unable to preempt GP1. This should
> >>>>>> be BP compatible.
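> >>>>>>
> >>>>>> (A rough sketch of how such an ignorable phase could be declared
> >>>>>> and probed; TRY_PREEMPT is not part of the standard, just the
> >>>>>> extension idea from above:)
> >>>>>>
> >>>>>>   // declares a new tlm_phase value named TRY_PREEMPT
> >>>>>>   DECLARE_EXTENDED_PHASE(TRY_PREEMPT);
> >>>>>>
> >>>>>>   tlm::tlm_phase ph = TRY_PREEMPT;
> >>>>>>   sc_core::sc_time t = sc_core::SC_ZERO_TIME;
> >>>>>>   // a target that does not know the phase just ignores it and
> >>>>>>   // returns TLM_ACCEPTED
> >>>>>>   if (socket->nb_transport_fw(gp1, ph, t) == tlm::TLM_COMPLETED)
> >>>>>>   {
> >>>>>>     // preemption succeeded; gp1 is aborted/completed
> >>>>>>   }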
> >>>>>>
> >>>>>> For interleaving, I'd say that is something that is abstracted
> >>>>>> away at the AT level.
> >>>>>>
> >>>>>> best regards
> >>>>>>
> >>>>>> Robert
> >>>>>>
> >>>>>> Veller, Yossi wrote:
> >>>>>>
> >>>>>>
> >>>>>>> During the review of the Draft LRM I revisited the TLM2.0, as
> >>>>>>> the new addition to the standard, and found out the following:
> >>>>>>>
> >>>>>>> Let us look at two initiators I1 and I2 connected to a bus B
> >>>>>>> and target T.
> >>>>>>>
> >>>>>>> I1 writes to T a burst whose data transfer should take 320 NS,
> >>>>>>> and I2 reads from T a burst whose data transfer should take
> >>>>>>> 320 NS, at the same time (t1).
> >>>>>>>
> >>>>>>> Let us denote the generic payload (GP) from I1 GP1 and the GP
> >>>>>>> from I2 GP2.
> >>>>>>>
> >>>>>>> t= t1 I1 sends GP1(BEGIN_REQ) to B
> >>>>>>>
> >>>>>>> B passes the GP1(BEGIN_REQ) to T
> >>>>>>>
> >>>>>>> T computes that the written data takes 320 NS (because of rule
> >>>>>>> 16.2.6 b) and waits.
> >>>>>>>
> >>>>>>> I2 sends GP2(BEGIN_REQ) to B, B queues it in a PEQ (because of
> >>>>>>> the BEGIN_REQ rule 16.2.6 e).
> >>>>>>>
> >>>>>>> t= t1+320 NS T sends GP1(END_REQ) and B passes it to I1; then B
> >>>>>>> takes GP2(BEGIN_REQ) from the PEQ and calls T.
> >>>>>>>
> >>>>>>> T returns TLM_UPDATED and changes the phase to BEGIN_RESP, and
> >>>>>>> B sends GP2(BEGIN_RESP) to I2.
> >>>>>>>
> >>>>>>> I2 computes that the read data takes 320 NS (because of rule
> >>>>>>> 16.2.6 c) and waits.
> >>>>>>>
> >>>>>>> T sends GP1(BEGIN_RESP) and B pushes it into the PEQ (because
> >>>>>>> of the BEGIN_RESP rule 16.2.6 f).
> >>>>>>>
> >>>>>>> t= t1+640 NS I2 sends GP2(END_RESP) and B passes it to T (and
> >>>>>>> the read finishes)
> >>>>>>>
> >>>>>>> B sends the GP1(BEGIN_RESP) to I1 which replies with
> >>>>>>> TLM_COMPLETED
> >>>>>>>
> >>>>>>> B sends the GP1(END_RESP) to T (and the write finishes)
> >>>>>>>
> >>>>>>> The TLM2.0 base protocol rules created a scenario where two
> >>>>>>> concurrent reads and writes of 320 NS both took 640 NS (this
> >>>>>>> should be impossible).
> >>>>>>>
> >>>>>>> Other scenarios show that, depending on the simulation order,
> >>>>>>> both the read and the write can finish after 320 NS, or that
> >>>>>>> either one can finish after 320 NS and the other one after
> >>>>>>> 640 NS. All the above happens because there is an artificial
> >>>>>>> linkage between the request stage of the read and the data
> >>>>>>> phase of the write (they use the same phases), and another
> >>>>>>> artificial linkage between the acknowledge stage of the write
> >>>>>>> and the data phase of the read.
> >>>>>>>
> >>>>>>> IMHO these scenarios show that the rules are broken and have to
> >>>>>>> be fixed.
> >>>>>>>
> >>>>>>> Moreover, these rules don't support the following:
> >>>>>>>
> >>>>>>> 1. Preemption, where a higher priority transaction can abort a
> >>>>>>> lower priority long burst of data in order to transmit its own
> >>>>>>> data and let the aborted transaction continue.
> >>>>>>>
> >>>>>>> 2. Interleaving of the data of a slow burst with data of other
> >>>>>>> bursts.
> >>>>>>>
> >>>>>>> The fix can be the following:
> >>>>>>>
> >>>>>>> 1. Specify that the BEGIN_REQ -> END_REQ stage is the address
> >>>>>>> passing stage.
> >>>>>>>
> >>>>>>> 2. Retain the BEGIN_REQ rule (16.2.6 e) that allows a target to
> >>>>>>> slow down upstream components.
> >>>>>>>
> >>>>>>> 3. Specify that the END_REQ -> BEGIN_RESP stage is the write
> >>>>>>> data passing stage.
> >>>>>>>
> >>>>>>> 4. Remove the BEGIN_RESP rule (16.2.6 f). An initiator should
> >>>>>>> not issue more outstanding requests than it can handle anyway.
> >>>>>>>
> >>>>>>> This fix will also support Preemption and Interleaving.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>>
> >>>>>>> Yossi
> >>>>>>>
