WO2007002550 - PRIMITIVES TO ENHANCE THREAD-LEVEL SPECULATION

Note: Text based on automatic Optical Character Recognition processes. Please use the PDF version for legal matters

What is claimed is:
1. An apparatus, comprising:
a plurality of thread units to concurrently execute a plurality of threads; and
a memory buffer storage area to store data for a memory write instruction encountered during execution of an atomic block of instructions for a particular one of the plurality of threads;
wherein the memory buffer storage area is part of a persistent state such that precise architected state is defined at the retirement boundary of each instruction of the atomic block.

2. The apparatus of Claim 1, further comprising:
a control storage area whose contents may be updated responsive to a user-level programming instruction in the particular thread.
3. The apparatus of Claim 2, wherein:
the contents of the control storage area are to control whether the memory-write data is to be stored in the memory address storage area.
4. The apparatus of Claim 2, wherein:
said control storage area is a register that includes one or more fields to hold a state value;
said state value to indicate one or more of the following states: (a) whether to store the memory-write data in the memory buffer storage area, (b) whether to reset the memory buffer storage area, and (c) whether to bypass the memory buffer storage area and instead write directly to a memory.
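
As an illustration only, the three states recited in Claim 4 can be modeled in software as bit flags in a register. The flag names, the bit positions, and the `route_write` helper below are hypothetical and do not appear in the claims; the buffer and the memory are modeled as plain dictionaries, and the fall-through case (no flag set) is an assumption.

```python
from enum import IntFlag


class TxControl(IntFlag):
    """Hypothetical bit layout for the control storage area of Claim 4."""
    BUFFER_WRITES = 0b001  # (a) store memory-write data in the buffer
    RESET_BUFFER = 0b010   # (b) reset the memory buffer storage area
    BYPASS_BUFFER = 0b100  # (c) bypass the buffer and write to memory


def route_write(control, buffer, memory, addr, value):
    """Apply one memory write according to the control register state."""
    if control & TxControl.RESET_BUFFER:
        buffer.clear()
    if control & TxControl.BYPASS_BUFFER:
        memory[addr] = value
    elif control & TxControl.BUFFER_WRITES:
        buffer[addr] = value
    else:
        memory[addr] = value  # assumption: no flag set means an ordinary store


memory, buffer = {}, {}
route_write(TxControl.BUFFER_WRITES, buffer, memory, 0x10, 1)
route_write(TxControl.BYPASS_BUFFER, buffer, memory, 0x20, 2)
```

After these two calls the first write sits only in `buffer` and the second only in `memory`.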

5. The apparatus of Claim 1, further comprising:
a memory address storage area to maintain the address of a memory read instruction encountered during execution of the atomic block.

6. The apparatus of Claim 1, further comprising:
logic to perform an atomic update from the memory buffer storage area to a memory.
7. The apparatus of Claim 6, wherein said logic to perform an atomic update is further to:
perform an atomic update only if the atomic block has completed execution successfully.
8. The apparatus of Claim 1, further comprising:
a user-visible status storage area whose contents reflect whether the atomic block has failed to be successfully executed.
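
The relationship between the elements of Claims 1 and 5 to 8 can be sketched as a small software model. This is not the claimed hardware: the class name, the method names, the `observe_foreign_write` failure source, and the choice to discard the buffer on failure are all assumptions made for illustration.

```python
class AtomicBlock:
    """Toy model: buffered writes, tracked reads, update only on success."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}       # memory buffer storage area (Claim 1)
        self.read_addresses = set()  # memory address storage area (Claim 5)
        self.failed = False          # user-visible status (Claim 8)

    def read(self, addr):
        self.read_addresses.add(addr)
        if addr in self.write_buffer:  # the block sees its own buffered writes
            return self.write_buffer[addr]
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.write_buffer[addr] = value  # memory itself is not touched yet

    def observe_foreign_write(self, addr):
        """Hypothetical failure source: another thread wrote a read address."""
        if addr in self.read_addresses:
            self.failed = True

    def commit(self):
        """Update memory from the buffer only if the block succeeded."""
        if self.failed:
            self.write_buffer.clear()  # assumption: failed writes are dropped
            return False
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()
        return True


memory = {0xA: 1}

ok = AtomicBlock(memory)
ok.write(0xA, ok.read(0xA) + 1)
committed = ok.commit()

bad = AtomicBlock(memory)
bad.read(0xA)
bad.write(0xB, 99)
bad.observe_foreign_write(0xA)
rejected = bad.commit()
```

In this model the first block's write reaches `memory`, while the second block's write to `0xB` never does because its status records a failure before the update.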

9. A method, comprising:
executing a selected instruction during execution by a processor of a transactional block of instructions in a speculative thread; and
maintaining precise architected state of the processor at the execution boundary of the selected instruction.
10. The method of Claim 9, further comprising:
servicing a trap or exception while maintaining precise architected state for the transactional block.

11. The method of Claim 9, further comprising:
performing single-stepping of the transactional block instructions while maintaining precise architected state for the transactional block.

12. A method, comprising:
buffering local memory writes during execution of an atomic block, where said buffering is performed responsive to a first user-level programming instruction;
monitoring for a failure during execution of the atomic block;
taking, as a non-failure condition, a trap or exception during execution of the atomic block;
maintaining the buffered local memory writes as persistent state during handling of the trap or exception;
resuming execution of the atomic block after handling the exception or interrupt; and
selectively performing an atomic memory update of the buffered memory writes, based on whether the failure has occurred.
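
The steps of Claim 12 can be sketched in software as a loop that buffers writes, treats a trap as something to handle and resume from rather than as a failure, and updates memory only if no failure was seen. The `Trap` class, the `run_atomic` function, and the convention that a step returns `False` to signal failure are hypothetical; the sketch also assumes the handler resolves the cause of the trap, so that retrying the trapping step makes progress.

```python
class Trap(Exception):
    """Hypothetical stand-in for a trap or exception taken inside the block."""


def run_atomic(memory, steps, handle_trap):
    """Run steps against a write buffer; return True if it was committed.

    Each step is a callable that takes the buffer and returns True on
    success or False to signal a failure of the atomic block.
    """
    buffer = {}
    failed = False
    pc = 0
    while pc < len(steps) and not failed:
        try:
            failed = not steps[pc](buffer)
        except Trap as trap:
            handle_trap(trap)  # buffered writes persist while handling
            continue           # resume the block at the trapping step
        pc += 1
    if not failed:
        memory.update(buffer)  # selective update: only when nothing failed
    return not failed


mapped = set()  # hypothetical resource that the trap handler makes available
handled = []


def store_a(buf):
    buf["a"] = 1
    return True


def store_b(buf):
    if "b" not in mapped:
        raise Trap("b")
    buf["b"] = 2
    return True


def handler(trap):
    mapped.add(trap.args[0])
    handled.append(trap.args[0])


memory = {}
committed = run_atomic(memory, [store_a, store_b], handler)
```

Here `store_b` traps once, the handler runs while the write to `"a"` stays buffered, and the block then resumes and commits both writes.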

13. The method of Claim 12, wherein:
said monitoring is performed responsive to a user-level programming instruction that indicates a trigger scenario and a handler address for an interrupt.

14. The method of Claim 13, wherein:
said trigger scenario further comprises a change in the value of one or more status bits.

15. The method of Claim 14, wherein:
said one or more status bits are specified by a mask associated with the user-level programming instruction.
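
A minimal software sketch of the monitoring arrangement in Claims 13 to 15 follows: a handler is registered together with a mask, and it is invoked only when a status bit selected by the mask changes value. The `StatusMonitor` class and its method names are hypothetical, and a Python callable stands in for the handler address.

```python
class StatusMonitor:
    """Toy model of a trigger scenario defined by a status-bit mask."""

    def __init__(self):
        self.status = 0
        self.mask = 0
        self.handler = None

    def arm(self, mask, handler):
        """Stands in for the user-level instruction: mask plus handler."""
        self.mask = mask
        self.handler = handler

    def set_status(self, new_status):
        # XOR finds the bits that changed; the mask keeps the watched ones.
        changed = (self.status ^ new_status) & self.mask
        self.status = new_status
        if changed and self.handler is not None:
            self.handler(changed)


events = []
monitor = StatusMonitor()
monitor.arm(0b0110, events.append)
monitor.set_status(0b0001)  # bit 0 is outside the mask: no handler call
monitor.set_status(0b0101)  # bit 2 changes and is inside the mask
```

Only the second status update invokes the handler, which receives the changed watched bits.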

16. A method, comprising:
concurrently executing a plurality of cooperative threads;
suspending execution of all but a first one of the cooperative threads in order to allow the first thread to execute a block of instructions atomically;
wherein said suspending is triggered by action of the first thread to invoke a hardware mechanism; and
resuming the other cooperative threads after the first thread has completed atomic execution of the block of instructions.

17. The method of Claim 16, wherein:
said action of a first thread to invoke a hardware mechanism further comprises writing a pre-defined value to a specified memory location.

18. The method of Claim 17, wherein:
said suspending is further triggered by an interrupt generated as a result of said action of the first thread, such that said suspending is achieved without polling, by the other cooperative threads, of the specified memory location.
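
The flow of Claims 16 to 18 can be illustrated with a deterministic toy model in which a store of a pre-defined value to one location suspends every other thread unit, with no polling by those units. The address `TRIGGER_ADDR`, the values `SUSPEND` and `RESUME`, and all class and function names are hypothetical; in particular, resuming the peers by a second store is an assumption of this sketch, not something the claims recite.

```python
TRIGGER_ADDR = 0x1000   # hypothetical "specified memory location"
SUSPEND, RESUME = 1, 0  # hypothetical pre-defined values


class ThreadUnit:
    def __init__(self, name):
        self.name = name
        self.suspended = False
        self.polls = 0  # never incremented: peers do not poll the location


class SuspendMechanism:
    """Toy hardware model: a store to TRIGGER_ADDR interrupts the peers."""

    def __init__(self, units):
        self.units = units

    def store(self, writer, addr, value):
        if addr != TRIGGER_ADDR:
            return
        for unit in self.units:
            if unit is not writer:
                # Modeled as an interrupt delivered to each peer.
                unit.suspended = (value == SUSPEND)


def run_atomically(mechanism, first, block):
    """Suspend the peers, run block on the first thread, then resume them."""
    mechanism.store(first, TRIGGER_ADDR, SUSPEND)
    try:
        return block()
    finally:
        mechanism.store(first, TRIGGER_ADDR, RESUME)


units = [ThreadUnit("t0"), ThreadUnit("t1"), ThreadUnit("t2")]
mechanism = SuspendMechanism(units)
during = run_atomically(mechanism, units[0],
                        lambda: [u.suspended for u in units])
```

The list `during` records the suspension state of each unit as seen from inside the block; after `run_atomically` returns, all units are running again.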

19. The method of Claim 16, wherein:
said method is performed by a multi-threaded processor that includes hardware to support transactional execution.
20. The method of Claim 19, wherein:
said hardware includes a storage area to buffer memory writes of an atomic block.

21. The method of Claim 19, wherein:
said hardware includes a storage area to maintain addresses of memory reads of an atomic block.

22. The apparatus of Claim 1, wherein:
each said thread unit further comprises decode logic to receive and decode a user-level transactional execution instruction.

23. The apparatus of Claim 22, wherein:
said decode logic is further to receive and decode a user-level atomic demarcation instruction.

24. The apparatus of Claim 22, wherein:
said decode logic is further to receive and decode a user-level instruction to read a transaction status.

25. The apparatus of Claim 22, wherein:
said decode logic is further to receive and decode a user-level instruction to enable traps during transactional execution.

26. The apparatus of Claim 22, wherein:
said decode logic is further to receive and decode a user-level instruction to perform an atomic memory update.

27. The apparatus of Claim 5, further comprising:
logic to determine whether another of the plurality of threads has written to the address of the memory read instruction during the particular thread's execution of the atomic block.

28. The apparatus of Claim 5, further comprising:
a user-visible mechanism to control whether the memory-read address is to be stored in the second storage area.

29. The apparatus of Claim 28, wherein said user-visible mechanism further comprises:
a storage area whose contents may be updated responsive to a user-level programming instruction.

30. The apparatus of Claim 1, wherein said plurality of thread units further comprise:
a plurality of processor cores.

31. The apparatus of Claim 1, wherein said plurality of thread units further comprise:
a plurality of logical processors associated with a single processor core.

32. The method of Claim 16, wherein:
said suspending is initiated in response to a user-level software instruction.

33. A system, comprising:
a memory to store software instructions for a plurality of threads;
a plurality of thread units to concurrently execute the plurality of threads; and
a memory buffer storage area to store data for a memory write instruction encountered during execution of an atomic block of instructions for a particular one of the plurality of threads;
wherein the memory buffer storage area is part of a persistent state such that precise architected state is defined at the retirement boundary of each instruction of the atomic block.

34. The system of Claim 33, wherein:
said memory is a DRAM.

35. The system of Claim 33, further comprising:
a memory address storage area to maintain the address of a memory read instruction encountered during execution of the atomic block.

36. The system of Claim 33, further comprising:
a control storage area whose contents may be updated responsive to a user-level programming instruction in the particular thread.

37. The system of Claim 33, wherein:
the contents of the control storage area are to control whether a trap may be taken as a non-failure condition during execution of the atomic block.

38. The system of Claim 33, wherein:
each said thread unit further comprises decode logic to receive and decode a user-level transactional execution instruction.