Later, when erroneous data is read by executing software, a machine check is initiated. In case you think this feature is old and was supplanted by something more recent, I urge you to flip back and read along at the intro to the relevant section. The handler ignores certain types of pages. Since page flags are currently in short supply, this choice was not made without consternation and debate by kernel hackers. Memory “poisoning”, with its delayed handling of errors, allows for more graceful recovery from, and isolation of, uncorrected memory errors rather than just crashing the system.
I guess what you’re missing is who marks the memory as poisoned. Can it be any clearer? Or are you asking about something much more subtle?
Otherwise, hardware poisoning will cause a system panic. “This allows system software to perform recovery action on certain class of uncorrected errors and continue.” If I’m not mistaken, that’s the processor family this article is referring to. The OS can then take appropriate action, like killing the process with the corrupted data or logging the event properly to disk.
Introduction to memory errors on modern systems and a description of how the mcelog daemon handles and avoids them.
The handler must allow for multiple poisoning events occurring in a short time window. Introduction to platform hardware errors on modern x86 machines including detailed flows and recent improvements to the Linux x86 machine check handling, with a focus on memory errors.
MCE is the mechanism by which the hardware reports the bad page to the operating system. Potentially corrupted processes can then be located by finding all processes that have the corrupted page mapped. The handler's exact behavior depends upon the type of corrupted page and various kernel configuration parameters.
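The lookup described above, from a corrupted page to the processes that map it, can be illustrated with a toy reverse map. This is a minimal sketch in Python, not kernel code: the `ReverseMap` class, its method names, and the page frame numbers are hypothetical and exist only for illustration. The real kernel uses its own reverse-mapping data structures.

```python
from collections import defaultdict


class ReverseMap:
    """Toy reverse map (hypothetical): page frame number -> PIDs that map it."""

    def __init__(self):
        self._mappers = defaultdict(set)

    def map_page(self, pid, pfn):
        # Record that process `pid` has page `pfn` mapped.
        self._mappers[pfn].add(pid)

    def unmap_page(self, pid, pfn):
        # Remove the mapping; harmless if it was never recorded.
        self._mappers[pfn].discard(pid)

    def processes_mapping(self, pfn):
        # Return a sorted copy so callers can unmap while iterating.
        return sorted(self._mappers.get(pfn, set()))


rmap = ReverseMap()
rmap.map_page(pid=100, pfn=0x1234)
rmap.map_page(pid=200, pfn=0x1234)
rmap.map_page(pid=300, pfn=0x5678)

# Every process that maps the bad page is potentially corrupted.
affected = rmap.processes_mapping(0x1234)
print(affected)
```

Only the processes recorded against the bad page are candidates for recovery action; a process that maps only other pages is left alone.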
Ongoing evolution of Linux x86 machine check handling at LinuxCon. This is an early paper about the first version of mcelog. Studies about memory errors: A good study on memory errors from the University of Rochester. With delay, handling can be safely postponed until a later time when the page might be referenced.
A newer study that gets to the same conclusion. Intel’s recent preview of its Xeon processor codenamed Nehalem-EX promises support for memory poisoning.
The blanket action of crashing the machine for all uncorrected soft and hard memory errors is sometimes over-reactive. If background scrubbing detects something uncorrectable, it can, and it seems like it ought to, signal a machine check. Try to keep everything running as smoothly as possible and only bring down the affected tasks, if any.
The MCA can occur on any “word”, where “word” is defined by the width of the ECC code applied at the corresponding level of memory. In any case, this bit allows previously poisoned pages to be ignored by the handler. While the specifics of how hardware and the kernel might implement memory poisoning vary, the general concept is as follows.
The OS marks the memory as poisoned, or otherwise discards the contents of the page if it was clean. Dirty pages are unmapped from all associated processes, which are subsequently killed.
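The decision just described can be sketched as a small simulation. This is an illustrative model under stated assumptions, not the kernel's handler: the `Page` dataclass, the `handle_poison` function, and the returned action strings are all hypothetical names. The `poisoned` field stands in for the page flag that lets already-poisoned pages be ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page descriptor (hypothetical fields)."""
    pfn: int
    dirty: bool = False
    poisoned: bool = False
    mapped_by: set = field(default_factory=set)


def handle_poison(page, alive):
    """Return the action taken for a page reported as corrupted.

    `alive` is a set of PIDs; processes that lose dirty data are removed.
    """
    if page.poisoned:
        return "ignored"            # flag already set: handled earlier
    page.poisoned = True
    if not page.dirty:
        return "discarded"          # clean: a good copy still exists on disk
    victims = set(page.mapped_by)
    page.mapped_by.clear()          # dirty: unmap from every process...
    alive.difference_update(victims)  # ...and kill the processes that used it
    return "killed"


alive = {1, 2, 3}
clean = Page(pfn=10, mapped_by={1})
dirty = Page(pfn=11, dirty=True, mapped_by={2, 3})

results = [
    handle_poison(clean, alive),
    handle_poison(dirty, alive),
    handle_poison(dirty, alive),  # second report for the same page
]
print(results, sorted(alive))
```

In this model only the processes mapping the dirty page are lost. The process using the clean page survives, and the repeated report is a no-op.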
mcelog — further reading
These delays include asynchronous hardware reporting of the machine check event, and delayed execution of the handler via a workqueue. EDAC is an alternative approach to reporting memory errors.
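The deferred handling mentioned above, where an event is reported first and handled later, can be modelled with a simple queue. This is a hedged sketch only: `DeferredPoisonQueue` and its methods are hypothetical and do not correspond to the kernel workqueue API. It also shows one way a handler could tolerate several reports for the same page arriving in a short window.

```python
from collections import deque


class DeferredPoisonQueue:
    """Toy stand-in for deferred handling (hypothetical, not a kernel API)."""

    def __init__(self):
        self._pending = deque()
        self.poisoned = set()  # plays the role of the per-page poison flag
        self.handled = []      # pages the slow path actually processed

    def report(self, pfn):
        # Called at "machine check" time: record the event, do no heavy work.
        self._pending.append(pfn)

    def run_pending(self):
        # Called later, in a context where slow recovery work is safe.
        while self._pending:
            pfn = self._pending.popleft()
            if pfn in self.poisoned:
                continue       # repeated report for a page already handled
            self.poisoned.add(pfn)
            self.handled.append(pfn)


q = DeferredPoisonQueue()
for pfn in (7, 7, 9, 7):       # bursts may report the same page repeatedly
    q.report(pfn)
q.run_pending()
print(q.handled)
```

Separating `report` from `run_pending` keeps the time-critical path short, and the flag check makes the slow path idempotent for each page.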
See this LWN article for further details about this issue. Unlike clean pages, dirty pages in these caches have differences between the memory and disk copies. Includes an overview of modern mcelog. The hardware now supports a concept of recoverable machine check, and the software uses it.