SABERTOOTH X58 clock drift

**DevilsPGD[_4_]** · April 6th 11, 04:42 AM posted to alt.comp.periphs.mainboard.asus

Anyone else out there with a SABERTOOTH X58 have trouble keeping the
clock accurate?

I've got two of these systems, both with i7-950 CPUs, and both seem to
be losing several minutes an hour until NTP notices and corrects it,
then the process starts over again.

I've updated the BIOS (a long shot, I know). On my primary system, all
applicable drivers are as up to date as possible, with drivers from
individual manufacturers where available. On the second system, I'm
running only ASUS reference drivers for the motherboard itself, plus
applicable drivers for add-on hardware.

The two systems have little else in common besides the CPU, motherboard,
brand and type of RAM (12GB and 6GB), and keyboard; all other
components are different. Both are running Windows 7 Pro SP1 x64.

**Paul** · April 6th 11, 05:57 PM posted to alt.comp.periphs.mainboard.asus

I find the best overview of timekeeping is the one provided by the virtual
machine software writers. They do a much better job than any single
description provided by an OS designer.

http://www.vmware.com/files/pdf/Time...alMachines.pdf

Network Time Protocol (NTP) can do a couple of things for you. At
the instant you make a request to a server (sync), you get a
single-point correction to the correct time. In addition,
some clients track how the offset changes between syncs, and can
compute a "drift" factor. Say the hardware clock source (oscillator)
that paces the clock tick interrupts is 20 ppm slow.
After enough observations have been made, NTP can "see" the steady drift.
This allows two kinds of correction to be made. You can make a
correction each time you consult the server (say, every three days).
But you can also "dribble out" corrections for the perceived drift
at a much higher rate. Say, for example, you know you'll lose 10 seconds
within the next three days. You could divide the 72 hours by those 10
seconds and, every 7.2 hours, make an "unsynced" one-second correction.
When the third day rolls around, if the timekeeping error is a simple
drift, your sync will be "bang on".
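
As a rough sketch of that arithmetic (Python, with function names made up
purely for illustration; no real NTP client is claimed to work exactly this
way), the correction interval and the equivalent drift rate fall out of a
couple of divisions:

```python
def dribble_interval_hours(expected_loss_s, sync_period_hours, step_s=1.0):
    """Hours between local corrections of step_s seconds each."""
    steps_needed = expected_loss_s / step_s
    return sync_period_hours / steps_needed


def drift_ppm(expected_loss_s, sync_period_hours):
    """Express a steady loss over the sync period as parts per million."""
    return expected_loss_s / (sync_period_hours * 3600) * 1e6


# The example from the text: 10 seconds lost over three days (72 hours).
interval = dribble_interval_hours(10, 72)
rate = drift_ppm(10, 72)
print(f"one-second correction every {interval:.1f} hours")
print(f"equivalent drift: {rate:.1f} ppm")
```

Losing 10 seconds in three days works out to a bit under 40 ppm. The 20 ppm
oscillator mentioned earlier would lose only about 5.2 seconds over the same
three days.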

Because of that design possibility, with the right NTP client
you can virtually eliminate initial oscillator accuracy as an issue.
This is why I personally wouldn't waste a lot of time belaboring
the quality of motherboard implementations. If there is a steady
drift, you can fix it.

But drift isn't entirely like that. Perhaps you're 20 ppm slow, but
there is also a slight temperature dependency (real oscillators use
temperature compensation, via the temperature coefficient of some
of the components in the circuit). So if the room gets
warm, that creates a divergence from the simple model. While an NTP
client can do the "first order dribble", to do more than that it would
need to take all the other physical dependencies into account and
curve fit them. Say, for example, the computer had an oscillator
temperature sensor - temperature readings could be captured at the
point of NTP time sync, and you could then build a temperature model.
(You would then make much more frequent local temperature measurements,
to compute the expected amount of required correction, and dribble
that out too.)
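
To make the temperature model idea concrete, here is a minimal sketch in
Python. It assumes the drift depends linearly on temperature, which is a
simplification (a real crystal's curve is not a straight line), and the
sample readings are invented purely for illustration, not measurements from
any board:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings taken at each NTP sync:
# oscillator temperature (deg C) and the drift observed since the last sync (ppm).
temps_c = [25.0, 30.0, 35.0, 40.0]
drifts_ppm = [-20.0, -21.0, -22.0, -23.0]

slope, intercept = fit_line(temps_c, drifts_ppm)


def predicted_drift_ppm(temp_c):
    """Drift to dribble out between syncs, given a local temperature reading."""
    return slope * temp_c + intercept


print(f"model: {slope:.2f} ppm per deg C, {intercept:.1f} ppm at 0 deg C")
print(f"predicted drift at 45 deg C: {predicted_drift_ppm(45.0):.1f} ppm")
```

Between syncs, the client would read the temperature often and apply
whatever correction the model predicts, on top of the first-order dribble.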

Now, in addition to the various ideas in the VMware document, on
real computers you also have the issue of SMM. SMM allows a motherboard
manufacturer to run BIOS code while the OS is running, without
you knowing it or having any means to detect it directly. This
can degrade the real-time responsiveness of a computer, mainly
because the OS has no say in the matter - the interrupt that enters
SMM (the SMI) cannot be masked by the OS.

http://en.wikipedia.org/wiki/System_Management_Mode

A possible reason for running such code at regular intervals is
fan speed control. Or perhaps some other control function,
such as something to do with the number of phases enabled on
the Vcore regulator or the like.

When SMM runs, the OS has no way of knowing it. If the SMM
code execution time is short, no harm is done. If the SMM
execution time exceeds the period of one clock tick interrupt,
then a clock tick can get lost. This causes the software-maintained
clock to run slow. And if the SMM isn't causing the problem
consistently, then NTP can't null out all the effects.
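
A toy model of that lost-tick effect, in Python. The assumptions are mine
and are not taken from any BIOS or OS documentation: the tick period is
15.625 ms (64 ticks per second), only one tick interrupt can be held pending
while the CPU is stalled, and the stall pattern is invented for illustration.

```python
TICK_US = 15_625  # assumed tick period in microseconds (64 ticks per second)


def lost_ticks(stall_start_us, stall_len_us, tick_us=TICK_US):
    """Ticks lost to one stall, assuming one pending tick survives it."""
    ticks_during_stall = ((stall_start_us + stall_len_us) // tick_us
                          - stall_start_us // tick_us)
    return max(0, ticks_during_stall - 1)


def software_clock_seconds(real_seconds, stalls, tick_us=TICK_US):
    """Seconds the tick-counting clock advances during real_seconds.

    stalls is a list of (start_us, length_us) pairs.
    """
    total_ticks = real_seconds * 1_000_000 // tick_us
    lost = sum(lost_ticks(start, length, tick_us) for start, length in stalls)
    return (total_ticks - lost) * tick_us / 1_000_000


# Hypothetical pattern: one stall per second for an hour, starting 1 ms
# after a tick boundary.
short_stalls = [(i * 1_000_000 + 1_000, 5_000) for i in range(3600)]   # 5 ms
long_stalls = [(i * 1_000_000 + 1_000, 40_000) for i in range(3600)]   # 40 ms

print(software_clock_seconds(3600, short_stalls))  # shorter than one tick
print(software_clock_seconds(3600, long_stalls))   # spans two tick instants
```

In this model the 5 ms stalls cost nothing, while the 40 ms stalls each
swallow one tick and the clock falls about 56 seconds behind per hour. If
the stalls came and went unpredictably, the loss would vary too, which is
the case NTP can't null out.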

An indirect way to monitor this is to check "DPC service latency".
DPC stands for deferred procedure call. On your computer, when an interrupt
routine services an interrupt, it saves the "heavyweight" part of
the code for later. Only the most critical code runs at interrupt level,
while code needing a longer run time is deferred: a DPC is queued
to finish servicing the interrupt later, still in kernel mode but at a
lower priority level. The following
program measures how long it takes for a DPC sitting in the queue
to finally get serviced. Long delays imply *something* is going
on in the background.

http://www.thesycon.de/deu/latency_check.shtml

Now, one thing I've noticed on my system here is a pretty
good-sized spike when my video card changes in or out of 3D mode.
But other than that, I don't see any signs that I have an SMM problem.
My RMS latency is pretty low, as seen in the DPC latency check window.

Gigabyte released a couple of boards that needed BIOS updates to
cure DPC latency issues. To track an issue like this, you find
people who build audio workstations, and see what boards they're having
problems with. The motherboard manufacturers never admit to what
monkey business they're up to under the hood, so it's not
possible to say much more about SMM code, or why a BIOS update
may or may not fix it. They certainly won't admit "our SMM code
exceeded one clock tick" in their release notes. Presumably more
manufacturers than Gigabyte have had this problem - I'm not trying to
pick on Gigabyte here, I just read a couple of long-running threads about
attempts to get that kind of stuff fixed. Some people are very
sensitive to the results from that DPC latency check tool above,
and they'll toss motherboards that don't behave well.

Another mechanism that destroys timekeeping is an actual hardware
defect. On the Nforce2 chipset, some kind of issue with the
interrupt controller, while the chipset was slightly overclocked,
caused really bad time problems. A log of the sync info from
NTP showed +/- errors of large magnitude (sometimes fast, sometimes
slow), so much so that even with NTP cranked to the wall, the
system clock was useless. Disabling APIC (the newer of the two
interrupt controller flavors, the other being the legacy PIC), or
returning the FSB clock to a canonical value, would fix it (most of the
time). Not every Nforce2 system experienced that - more info on that
one can be found on Nforcershq.com.

So maybe that'll give you some ideas to look into.

HTH,
Paul

**DevilsPGD[_4_]** · April 8th 11, 12:45 AM posted to alt.comp.periphs.mainboard.asus

I'm aware I can bandaid around it with NTP, but 5+ minutes per hour is a
pretty serious defect, well beyond an acceptable level of clock drift
for any purpose.

I'm running a stock default BIOS configuration, except that I've enabled
AHCI. I did experiment with turning off the spread-spectrum options (a
poor implementation can apparently cause clock drift issues).

Think it's worth playing with APIC or is that likely to be specific to
Nforce2's implementation? At least from what I can tell the problem
seems to be more related to Linux's APIC implementation.

FWIW I actually don't even care about an admission; if Asus would release
a BIOS update fixing it, I'd be a happy guy. My desktop is on the
latest BIOS, the other one is on the as-shipped BIOS, and both seem to
be having the same issue.

**Paul** · April 8th 11, 12:41 PM posted to alt.comp.periphs.mainboard.asus

Well, I'm grasping at straws here.

Timekeeping, while the OS is running, is based on counting clock tick
interrupts. But alternate mechanisms exist, which do much the same thing.

All schemes eventually trace back to a motherboard oscillator.
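
One cheap experiment along those lines is to compare two of the clocks the
OS exposes against each other. The sketch below (Python) compares the
wall clock from time.time() with the high-resolution counter from
time.perf_counter(). Which hardware source sits behind each call depends on
the OS and the Python version, so treat the pairing as an assumption; the
function name is made up for illustration.

```python
import time


def relative_drift_ppm(duration_s):
    """Wall-clock gain (+) or loss (-) relative to perf_counter, in ppm."""
    wall_start = time.time()
    counter_start = time.perf_counter()
    time.sleep(duration_s)
    wall_elapsed = time.time() - wall_start
    counter_elapsed = time.perf_counter() - counter_start
    return (wall_elapsed - counter_elapsed) / counter_elapsed * 1e6


if __name__ == "__main__":
    # A short run is dominated by clock resolution; use a minute or more
    # for a real measurement, and note that an NTP step during the run
    # will skew the result.
    print(f"{relative_drift_ppm(2.0):+.0f} ppm")
```

For scale, losing 5 minutes per hour is roughly -83,000 ppm, so if the two
clocks really do diverge, a run of a minute or so should show it clearly.
If they agree, the loss is happening somewhere both clocks share.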

Even with spread spectrum, the oscillator has a "mean" value, meaning if you
count the pulses over an interval longer than milliseconds, the spread
is no longer apparent. If the time keeping function was interested in
microsecond level resolution, there might be a barely visible effect.
But at the seconds level, this is all averaged out and invisible.

Say you have a 100MHz oscillator. With "center spread" enabled,
the mean value is 100.0MHz when measured over a seconds-long interval.
With "down spread", the mean value might be 99.5MHz. Even without
calibration, I doubt down spread could account for a 5 minutes per hour
level of error. And any timekeeping scheme should have an initial
calibration, to establish how the tick rate relates to a canonical
value.
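
To put numbers on that, here is a quick calculation in Python. It assumes
an uncalibrated clock that counts oscillator cycles as if the oscillator
ran at exactly its nominal frequency; the function names are only for
illustration.

```python
def seconds_lost_per_hour(nominal_hz, actual_mean_hz):
    """Time lost per real hour by a clock that assumes nominal_hz."""
    return 3600 * (nominal_hz - actual_mean_hz) / nominal_hz


def mean_hz_for_loss(nominal_hz, lost_seconds_per_hour):
    """Mean frequency that would explain a given loss per hour."""
    return nominal_hz * (1 - lost_seconds_per_hour / 3600)


down_spread_loss = seconds_lost_per_hour(100e6, 99.5e6)
implied_mean = mean_hz_for_loss(100e6, 5 * 60)

print(f"down spread at 99.5MHz loses {down_spread_loss:.0f} s per hour")
print(f"losing 300 s per hour needs a mean of {implied_mean / 1e6:.2f}MHz")
```

So uncalibrated down spread would cost about 18 seconds per hour, while a
5 minutes per hour loss would need the mean frequency to sit near 91.67MHz,
more than 8% low. Spread spectrum alone doesn't get anywhere near that.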

http://www.mecxtal.com/images/ssc_centerdown.gif

An NTP client should be able to null out any "average" behavior. Say,
for example, the motherboard oscillator was off by 1%, because the
register in the oscillator chip didn't have a "near enough" value.
(Some motherboards actually cheat, and the manufacturer bumps the
clock a tiny bit, to win at benchmarking done in reviews.)
By means of calibration against the RTC, or by the use of NTP,
any deviation from the correct average value could be handled.
By dribbling out corrections at regular intervals, you get correct
clock time to the nearest second.

It's a matter of finding out what is causing interrupts to get lost,
to go uncounted, to show up too often, and so on. The DPC latency
test is intended to show how much time any SMM routine
might be stealing. But other than that,
it would be pretty hard to say anything definitive about the
interrupt arrival rate itself. I've tried in the past to access
any performance counters that might be available on a per-interrupt
basis, but I was not successful at doing that.
With the systems we used to build years ago, we had extensive
interrupt monitoring capabilities, mainly because we fouled
up interrupts so often :-) About all I can get from a modern
system is a total count of interrupts from all sources.

If the problem would "stand still", NTP could fix it. It's situations
where the problem occurs sporadically that prevent NTP from fixing
it. In some cases, it's the use of a single application that upsets
clock time. Since the clock tick interrupt, by design, has a very
high priority, normally userland shouldn't be able to do anything
like that (upset the clock tick).
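
One way to tell the two cases apart is to log the drift rate measured
between successive syncs and look at how much it scatters. A minimal
Python sketch, where both sample logs are invented for illustration and
are not data from either machine in this thread:

```python
from statistics import mean, pstdev


def summarize_drift(rates_ppm):
    """Return (average drift, scatter around the average), both in ppm."""
    return mean(rates_ppm), pstdev(rates_ppm)


def residuals_after_correction(rates_ppm):
    """What is left if a constant correction equal to the average is applied."""
    average = mean(rates_ppm)
    return [rate - average for rate in rates_ppm]


# Hypothetical per-sync drift rates in ppm.
steady_log = [-20.1, -19.9, -20.0, -20.0]      # "stands still"
sporadic_log = [-20.0, -83000.0, -15.0, -41000.0]  # comes and goes

for name, log in (("steady", steady_log), ("sporadic", sporadic_log)):
    average, scatter = summarize_drift(log)
    print(f"{name}: average {average:.1f} ppm, scatter {scatter:.1f} ppm")
```

With the steady log, subtracting the average leaves residuals of a fraction
of a ppm, so a constant dribble fixes it. With the sporadic log, the scatter
is as large as the drift itself, and no single correction rate helps.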

On a Linux system, the kernel boot line has some options that let you
change the source used for timekeeping. For example, I've
tried in the past to get Linux OSes running better within VPC2007,
and my notes mention this as one of my test cases. I switched to PIT,
because the virtual environment didn't happen to have HPET. The
problem I had was audio playing at the wrong sampling rate
(the "chipmunks" problem). I found the best solution was to build
up a Gentoo system and eliminate PulseAudio, after which the problem was
mostly gone. Sound still didn't work quite as well as in
a Windows OS in the same virtual environment, but it was getting
a lot closer. The VGA mode selected in the line below is used to reduce the
time it takes the OS to shut down at the end of a VPC2007 session.
There is no equivalent effect on real hardware, so if I was booting
this OS natively, I wouldn't need the VGA option.

GRUB_CMDLINE_LINUX_DEFAULT="vga=786 noacpi clocksource=pit"
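
To check which clock source a running Linux kernel actually ended up with,
the kernel exposes it under sysfs. A small Python sketch, assuming the
usual sysfs location; the function name is made up for illustration, and
none of this applies to Windows:

```python
from pathlib import Path

# Usual sysfs location on Linux (assumed; absent on other systems).
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current clock source, list of available clock sources)."""
    current = (sysfs_dir / "current_clocksource").read_text().strip()
    available = (sysfs_dir / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    if (SYSFS_DIR / "current_clocksource").exists():
        current, available = read_clocksources()
        print(f"current: {current}")
        print(f"available: {', '.join(available)}")
    else:
        print("no clocksource sysfs directory found (not Linux?)")
```

Comparing the output before and after changing the boot line confirms
whether the clocksource option took effect.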

Paul
