It is well known that Win9x variants prior to Windows 98 have a tendency to crash on fast CPUs. The definition of “fast” is of course fuzzy but the problems were known to occur on AMD K6-2 processors running at 350 MHz or faster as early as 1998. This led to some acrimony when Microsoft attempted to charge $35 for the fix. The crashes were intermittent on the 350 MHz parts but harder to avoid with faster clock speeds.
The problem soon started affecting other CPUs as clock frequencies climbed, but Intel processors remained unaffected for quite a while. Were Intel CPUs somehow better? Not exactly, but there was a reason for it; more about that later.
I had long been aware of this problem but never looked into the details. And when I finally did, at first I didn’t realize it was the same problem. An acquaintance mentioned that Windows 3.11 for Workgroups no longer works in a VM.
A good and interesting read.
The link to the article at os2museum.com appears to be broken. When I click on it, I get error 403.
link works for me
Looks like the source address you’re connecting from is blacklisted…
igoradsilva,
Now that you mention it, it didn’t work the first time I opened the link either, but it works for me now. Is it still broken for you?
I vaguely remember os2museum.com having issues with Cloudflare in the past, but it was a long time ago.
Still doesn’t work from my usual browsers, but I managed to access the article using Tor Browser. It really seems to be some issue about the origin of the request. Thanks!
What a fascinating article. There are some good ones on that site, and I’m definitely digging through to find other gems that I’ve missed.
Here is the hidden value of gems like this article. I have two pieces of very high-value vintage scientific equipment that suffer from this clock problem. Previously I solved it by performing or ordering physical downgrades of processor and bus hardware.
I knew the clock speed was the culprit. I did not know how, but we guessed the why: a divide by zero, which is a common bug when you attach old custom hardware that differences the time between events to a newer PC. We found we could slow the clock/bus and resolve the problem in most cases. I never knew this patch existed; I didn’t need it, because the devices were retired from daily use long before the patch arrived. They are just kept functional for referencing old standards.
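To illustrate what we think was going on (hypothetical code, not what is actually in our instruments): on a fast PC two events can land in the same timer tick, the measured interval becomes zero, and the rate computation divides by zero. A one-line guard is all it would have taken:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical illustration of the failure mode: derive an event
     * rate from two timer readings. On slow hardware the readings always
     * differed; on a fast PC both events can land in the same tick, so
     * t1 - t0 == 0 and the division faults (#DE on x86). */
    uint32_t events_per_tick(uint32_t events, uint32_t t0, uint32_t t1)
    {
        uint32_t dt = t1 - t0;
        if (dt == 0)    /* the guard the old code was missing */
            dt = 1;     /* clamp: "faster than we can measure" */
        return events / dt;
    }

    int main(void)
    {
        /* both timestamps in the same tick: would fault without the guard */
        printf("%u\n", events_per_tick(1000, 42, 42));
        return 0;
    }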
So you know what is on my to-do list now: those gadgets will be running under Win 95 on newer, better-supported PC hardware before you know it!
Nice read! Scrolling through the comments on that article, there’s some discussion about the recently open-sourced GW-BASIC sources, then about the timing needed for reading cassette tapes in GW-BASIC. Then someone points out that the cassette routine in the GW-BASIC code is stubbed out, with a link to that file on GitHub. There I see it is indeed stubbed out. But I also see that the code was written by one “Len Oorthuys”. That sounds like a Dutch name, I think, so I google that name. And then I stumble on this:
https://www.mobilize.net/blog/bid/346674/History-of-BASIC-King-James-Edition
Very fun read!
I thoroughly enjoyed the read and understood it.

One question I have is: why was this kind of code needed? What problem was it actually trying to solve?

I’m going to assume that Microsoft engineers in the 90s weren’t idiots, so there was probably a reasonable-ish reason to do this. There’s also the question of why additional checks weren’t in place to make sure the time actually changed. Was this code called in a tight loop that had to be optimized for speed, so that additional checks would have been too costly?

For example, old DOS games that run too fast made some sense as a design decision at the time, especially since they were just games and every processor tick counted. I ‘understand’ the old DOS game problem because I can picture the typical game loop they were trying to implement.
I have little to no context on this particular code.
I’m not 100% sure, but I would imagine that they wanted to be able to wait a few microseconds when sending low-level commands to the network card or the disk controller.

If the sleep time was very short, there may have been no way to achieve it other than “counting the cycles”.
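Something like this sketch, with a made-up spin count (the constant would have been tuned for one reference machine):

    /* Sketch of the era's naive approach: spin for a count measured to
     * take a few microseconds on one reference machine. Double the clock
     * speed and the delay silently halves; nothing here notices that. */
    static void stall_a_few_us(void)
    {
        volatile unsigned i;            /* volatile: keep the loop from being optimized away */
        for (i = 0; i < 10000; i++)     /* 10000 = machine-specific magic number */
            ;
    }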
It would be really interesting to know how Windows NT did that, as it was more robust and unaffected by this problem.
XtoF,
Yes, I don’t know much about what Windows did specifically, but it was a common practice to implement delays this way. A DOS PC’s standard timer interrupt ran at ~18.2 Hz:
https://en.wikipedia.org/wiki/Intel_8253
Reprogramming the timer’s rate was technically possible, but the tick was shared by TSRs, drivers, the system clock, etc., and it was usually assumed to run at 18.2 Hz. The tick was used to calibrate a simple loop that could then produce arbitrary delays independent of the hardware timer. I’m pretty sure I’ve done this as well in my own DOS programs.

If I were to guess, maybe they used larger integers or floating point. It’s also quite possible they had a faster timer in NT, which would have reduced the number of cycles needed. I know some code used port I/O instructions to add hardware delays, since port I/O runs at speeds proportional to standard bus clocks rather than the CPU clock. These days that might be more reliable, since CPU speed can change dynamically. But obviously, now that we have high-precision clocks, all these methods are redundant/obsolete.
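As a rough sketch of that calibration scheme, using standard C’s clock() as a stand-in for the BIOS tick count a DOS program would actually poll (the structure and constants are mine, purely illustrative):

    #include <time.h>

    static unsigned long loops_per_tick = 1;

    /* The spin body we calibrate; volatile keeps the compiler from
     * deleting the empty loop. */
    static void spin(unsigned long n)
    {
        volatile unsigned long i;
        for (i = 0; i < n; i++)
            ;
    }

    /* Count how many spin iterations fit into one timer tick. A DOS
     * program would poll the ~18.2 Hz BIOS tick count; clock() is just
     * a portable stand-in here. */
    static void calibrate(void)
    {
        unsigned long n = 0;
        clock_t t0 = clock(), t1;
        while ((t1 = clock()) == t0)   /* align to a tick boundary */
            ;
        t0 = t1;
        do {                           /* spin in chunks through one full tick */
            spin(1000);
            n += 1000;
        } while (clock() == t0);
        loops_per_tick = n ? n : 1;
    }

    /* Delay for roughly 'ticks' timer periods, independent of the
     * hardware timer once calibrated. A real driver would scale this
     * down for microsecond waits; that scaling is where the fun starts. */
    static void delay_ticks(unsigned long ticks)
    {
        spin(loops_per_tick * ticks);
    }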
Accessing devices like the floppy drive requires fairly precise timing to handle delays optimally. Since the LOOP instruction on Intel takes multiple cycles, the duration of a million LOOPs can give a very accurate baseline for the speed of the CPU, which can then be used to adjust all the device timing loops.
The consensus on why Intel’s LOOP instruction was so slow was that Intel tried to handle any faults that might occur during the LOOP by storing the value of ECX so that after the fault is resolved, the LOOP can continue right where it left off.
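Tying this back to the article’s crash: the calibration division is where it goes wrong on fast CPUs. A sketch of the failure mode (illustrative C, not the actual Windows code):

    #include <stdint.h>
    #include <stdio.h>

    /* Turn the measured duration of a fixed LOOP run into "loops per
     * time unit". On x86, DIV with a 32-bit dividend and a 16-bit
     * divisor raises a divide-error fault (#DE) if the quotient exceeds
     * 0xFFFF, which is exactly what a too-fast CPU provokes. */
    uint16_t loops_per_unit(uint32_t iterations, uint16_t elapsed)
    {
        uint32_t q = iterations / elapsed;  /* elapsed == 0 faults outright */
        if (q > 0xFFFF)
            fprintf(stderr, "divide overflow: hardware DIV would fault here\n");
        return (uint16_t)q;
    }

    int main(void)
    {
        printf("%u\n", loops_per_unit(0x100000, 2000)); /* slow CPU: quotient fits */
        printf("%u\n", loops_per_unit(0x100000, 10));   /* fast CPU: overflow */
        return 0;
    }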
krebizfan,
I could be wrong about this, but I thought that around the 486/Pentium era the LOOP instruction was implemented in microcode that couldn’t execute as efficiently as the more fundamental cmp/jnz instructions. The pipelines weren’t optimized for it.
LOOP on the 386 is quite slow as well so it isn’t a pipeline issue. Intel was doing something in the microcode beyond what a simple loop would require.
krebizfan,
I looked up the instruction clocks…
http://aturing.umcs.maine.edu/~meadow/courses/cos335/80x86-Integer-Instruction-Set-Clocks.pdf
So if I calculated right, a “dec CX / JNZ” sequence on the 386 should yield roughly 9+m clocks per iteration: 2 for DEC reg plus 7+m for a taken JNZ, versus 11+m for LOOP. The “dumb” two-instruction version actually comes out ahead.
Bits like this are why, despite the ill-advised political rants, OSAlert is still a frequent stop for me. Thanks, Thom.
It’s hard to judge the whys and wherefores back then; there could be any number of reasons for optimising routines in ways that would now seem pointless. For example, and not necessarily related: some powerful IBM or Intel client might have wanted a trading desktop that beat opponents to the punch, where a few milliseconds could make all the difference.
Otherwise, it could just be some computer scientist or intern somewhere concentrating on a specific issue, oblivious or indifferent to the wider ones. It happens all the time; it is not about being good or bad at something, it’s just human nature. We put bigger, better wheels or tyres on our car, and unknowingly accelerate the demise of the differential or gearbox!
Actually, this can work both ways. In cases where we’ve needed to slow things down so old hardware could persist, we’ve altered routines that were historically optimised with integer math to become all floating point, then discarded the tail, slowing down the modern OS and hardware so that the older hardware and software can continue to function. The beauty of OOP hard at work: the client code is none the wiser. It’s a convenient hack when old software running on new hardware thinks comms have timed out, i.e. too many clock cycles since a response. You can make the calculation precision arbitrarily high to slow the system’s response.
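As a contrived sketch of the general idea (illustrative only, not our production code):

    /* Compute a value with an iterative floating-point series whose term
     * count is a tunable "precision" knob. More terms give the same
     * truncated answer, just later: the extra precision is pure delay. */
    static long slow_identity(long value, unsigned long terms)
    {
        double acc = 0.0;
        unsigned long k;
        for (k = 0; k < terms; k++)
            acc += (double)value / (double)terms;  /* sums back to ~value */
        return (long)(acc + 0.5);                  /* discard the fractional tail */
    }

The terms parameter is the latency knob: crank it up and the caller waits longer for the same truncated answer.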
cpcf,
I’d be curious to see exactly what you had in mind, but I would think this would be a highly ineffective way to control the speed of old code. Floating point operations on modern computers are faster than integer operations on old computers. If an IO routine needs a specific delay, IMHO it would make far more sense to fix the code to produce said delay explicitly rather than playing around with data types to try and find an exotic way to make the code slower.
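For example, a minimal sketch of an explicit delay against a real clock (POSIX clock_gettime here; the principle, not any particular driver’s code):

    #include <stdint.h>
    #include <time.h>

    /* Explicit microsecond delay against a monotonic clock: correct at
     * any CPU speed, unlike a calibrated spin count. */
    static void delay_us(uint64_t us)
    {
        struct timespec t;
        clock_gettime(CLOCK_MONOTONIC, &t);
        uint64_t deadline = (uint64_t)t.tv_sec * 1000000000ull
                            + (uint64_t)t.tv_nsec + us * 1000ull;
        do {
            clock_gettime(CLOCK_MONOTONIC, &t);
        } while ((uint64_t)t.tv_sec * 1000000000ull + (uint64_t)t.tv_nsec < deadline);
    }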
On the other hand, things that work by coincidence and break randomly on newer hardware can be good for job security.
Agreed Alfman, it is highly ineffective.
But I’m not talking about general purpose machines. These are dedicated hardware control systems that just have to do one thing and do it with high reliability. A lot of the stuff I work on will have circular buffers and must capture and deliver data continuously at a controlled rate for a specific period of time or blob size. It makes no difference if the mouse cursor jumps across the screen while that is happening.
> I had long been aware of this problem but never looked into the details. And when I finally did, at first I didn’t realize it was the same problem. An acquaintance mentioned that Windows 3.11 for Workgroups no longer works in a VM.
This is not correct. You can virtualize it, but you can’t use its own DPMI server. DOSEMU2, HX, DPMIONE and QDPMI all run Windows 3.1, and you can run it virtualized with, for instance, KVM that way.
The same is possible for Win9x: running Merge/Win4Lin (which seems to replace the same component) inside KVM should provide a stable (as far as Win9x goes) virtualized Win9x environment.
Makes you appreciate how much effort went into making sure the current “big 5” kernels (Windows NT, Linux, FreeBSD, SVR4/Solaris, QNX) co-ordinate the operation of all that hardware so smoothly, while handling security, multithreading, all kinds of interrupts, VM virtualization quirks, variable “turbo” clocks and dynamic frequency scaling on laptops.
The specs of my Windows 98 machine are lost to the mists of time (even with Linux on it, it didn’t have sentimental or historical value to me, unlike the Amiga 1200 I still have), but this may very well be the reason it used to crash all the time. Not the first time I’ve thanked Microsoft for annoying me so much that I jumped ship to Linux. (Not at work, sadly, but when the work PC breaks, fixing it is someone else’s problem.)
EDIT: Ah, *prior* to Windows 98. Interesting.
In all fairness, show me a 90s operating system (designed to run on systems with RAM measured in the tens of megabytes) that wasn’t unstable and temperamental. Even Mac OS had its fair share of Sad Macs (in a vertically integrated market, no less!). And so did the Amiga and OS/2 (SIQ, anyone?). Nobody used BeOS, so I won’t consider it; practically nobody has experience with it.
The Amiga did because, for all its fancy microkernel goodness and multitasking, it didn’t have memory protection or virtual memory; thus it was as vulnerable as DOS or System 7 to misbehaving programs or to running out of memory. Can’t speak for OS/2 or BeOS, as I’ve never really run the former and only ran the latter for a short time (I’ve since run Haiku, the BeOS clone, off and on in virtual machines).
But at least one 90s era operating system was as stable as a rock: Linux. Sure, it had its problems, like lack of software and device driver support, and to a much smaller degree they still exist today. But generally speaking, a device driver either works well in Linux, or not at all. Except for a brief period after Hans Reiser killed his wife and I was still using his file system, I’ve only ever seen maybe one kernel panic in Linux that wasn’t caused by passing the wrong boot parameters to GRUB, i.e. user error. That’s why I’ve stuck with it and only put up with Windows because (a) I’m paid to and (b) when it breaks, it’s someone else’s problem. My Dad said that he’d never experienced any of the myriad problems Vista was derided for, yet within half an hour of using it, I’d already crashed it *without trying*, and since getting a work laptop with Win10 on it in December I’ve already seen one BSOD. Maybe more if I hadn’t been on furlough since late April.
Given that it’s basically a reimplementation of VMS, an OS with uptimes measured in years, it wouldn’t surprise me if NT was just as stable before they put all the Win9x compatibility crap in it to create XP and the like; then again, it’s from Microsoft, so it wouldn’t surprise me if it wasn’t. (Commodore bought the Amiga in what Amiga Format later referred to as “a rare fit of insight;” early Commodore versions of the Amiga operating system included the Easter egg “We built Amiga; Commodore f*cked it up,” without the euphemism: it wouldn’t surprise me if Dave Cutler and his formerly VMS people did, or wanted to do, the same for NT.)