Yankee Group’s second annual Server Operating System Reliability survey polled 700 users from 27 countries worldwide. The latest independent, non-sponsored Web-based survey revealed that all versions of UNIX — which typically carry very high workloads — are near bulletproof, achieving 99.999% reliability. IBM’s AIX UNIX led all server operating systems for reliability with just over 30 minutes of per server annual downtime but Hewlett-Packard and Sun Microsystems also got high scores.
Those are really interesting numbers. It was interesting to see that SUSE got almost the same reliability as Red Hat, as I’ve always heard that SUSE wasn’t as stable (but faster). It also looks like Linux in general has really stabilized: it has much less downtime than a year ago.
The Windows numbers were also pretty interesting. I expected (with all of the hype that Windows is just as stable as Unix) that Windows would be at least close to the *nix OSes. Well, so much for hype.
The other interesting number was for HP-UX. It had a really low downtime number, but that was limited to version 11.1 which, as far as I know, does not support Itanium. It’s unfortunate that the downtime for Itanium servers (HP-UX 11.23) wasn’t mentioned. I know our 16-way Itanium servers have a lot more downtime than I would have thought (although I don’t have any exact numbers). Anybody know the reliability of Itanium servers?
I think this survey is not representative.
Why should the numbers vary THAT much, just from one year to the next? There usually is not enough operating system change to justify such a large difference.
I think what we see here is statistical noise. A year has 8760 hours, and looking at values of 0 to 10 hours MUST be a noisy signal. For example, if you measure a current of 1000 A, and scale your Y-Axis from 999 to 1001, you will likely see “huge” spikes of probably 0.5 A. Then you would compare the “Microsoft” Powerline which delivers 999.5 A to the AIX Powerline which provides 1000 A.
You simply HAVE to see statistical noise. Even more so this has to be true for uptime. Make a survey with 10000 responses, then we start talking about reliability.
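To put a rough number on the noise argument, here is a minimal Python sketch. Everything in it is an assumption for illustration: the exponential downtime distribution, the 5-hour true mean, and the sample sizes are made up and do not come from the survey. It simulates many surveys of the same population and measures how much the reported average wobbles from one survey to the next.

```python
import random
import statistics


def survey_mean(n, mean_hours, rng):
    """Average downtime reported by one simulated survey of n respondents.

    Assumption: per-server annual downtime is exponentially distributed.
    """
    samples = [rng.expovariate(1.0 / mean_hours) for _ in range(n)]
    return statistics.fmean(samples)


def spread_of_survey_means(n, trials=200, mean_hours=5.0, seed=0):
    """Standard deviation of the survey average across repeated surveys."""
    rng = random.Random(seed)
    means = [survey_mean(n, mean_hours, rng) for _ in range(trials)]
    return statistics.stdev(means)


if __name__ == "__main__":
    for n in (25, 2500):
        spread = spread_of_survey_means(n)
        print(f"{n:5d} respondents: average moves by roughly +/- {spread:.2f} h")
```

Under these assumptions the spread shrinks with the square root of the number of respondents, so a few dozen answers per OS can move the average by an hour or so between surveys with no change in the OS at all.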
Another flawed piece of art by Laura DiDio. This time not pro Microsoft, but pro UNIX. Probably she got invited to a nice hotel somewhere by IBM to get a list of customers to ask.
Analysts sell out without obviously seeming to do so, that is their business model.
Surveys tend to be, err, inaccurate by design.
Numbers -can- vary YoY.
Windows 2K3 might have had a -very- bad year: Microsoft possibly invested more resources in Win2K8 and Vista, taking them from the Win2K3 team; hackers might have figured out how Win2K3 works and started exploiting it, etc.
By itself, the variation in numbers between surveys doesn’t necessarily invalidate the survey.
As opposed to Laura DiDio, that is…
While you might be statistically right, you’re completely off the mark here.
I work in the five-9’s world. Our software (which runs on a large number of servers, both RHEL and Win2K3) must log no more than ~6m of downtime per year. (Granted, I doubt that we are capable of achieving more than 4/9’s, but that’s something else…)
Look at it from my employer’s perspective: 50% of our software solution is using Windows 2K3; according to this survey Windows 2K3 doesn’t even come close to logging 3/9’s while RHEL logs close to 4/9’s. (~30m/y in our own experience, using a highly customized version of RHEL5)
Given the basic requirement for 5/9’s and these numbers (statistical noise or not), should my employer risk his head choosing Windows 2K3? Doubt it.
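For anyone who wants to check the "nines" arithmetic, here is a small Python sketch. The helper names are hypothetical, and the inputs are the 2007 downtime figures quoted elsewhere in this thread (8.9 h for Windows 2003, 1.73 h for Red Hat), taken at face value.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def availability_percent(downtime_hours):
    """Availability as a percentage for a given annual downtime."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


def nines(downtime_hours):
    """Number of 'nines' of availability (3.0 means 99.9%)."""
    return -math.log10(downtime_hours / HOURS_PER_YEAR)


def downtime_budget_minutes(n_nines):
    """Allowed annual downtime, in minutes, for a given number of nines."""
    return HOURS_PER_YEAR * 60 * 10 ** (-n_nines)


if __name__ == "__main__":
    print(downtime_budget_minutes(5))   # about 5.26 minutes for five nines
    print(nines(8.9))                   # about 2.99 (Windows 2003 figure)
    print(nines(1.73))                  # about 3.70 (Red Hat figure)
```

By this arithmetic five nines allows a bit over 5 minutes a year, the Windows 2003 figure lands just under three nines, and the Red Hat figure sits between three and four.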
I must agree.
As much as I like these numbers (and plan on using them to get my employer to port additional products to RHEL instead of Windows 2K3), Yankee Group’s surveys have a -very- problematic history. (TCO/Get-the-facts)
– Gilboa
Edited 2008-04-18 04:20 UTC
Indeed, what an achievement for IBM. AIX, at 0.60 hours of downtime per year, is almost as good as Mandriva (and Turbolinux), at 0.38 hours of downtime per year.
Tom
Downtime average per year may not always be a true test of the OS.
Windows/Linux tend to run on diverse hardware, some of it without hot-swap drives and such, so hardware failures could account for the downtime. The big Unix systems, by contrast, have hot-swap drives and failover systems built in at the hardware level.
User-friendliness plays a role too, not just the actual program reliability: if it goes down, how easy is it for the expert to fix the problem?
I would also like to see some hardware statistics. How does downtime with Solaris compare on x86 vs SPARC? What were the most typical causes of downtime from the sample of operating systems?
Well, the downtime encompasses both the fact that the server was down and the time it took to bring it back up. So no user-friendliness excuse… which would be questionable anyhow: I find my system very user-friendly to me. If your IT does not know how to fix his server, change IT.
Downtime average per year may or may not be a true test of the OS … but it is a true test of downtime average per year.
If you are running a server, and you want it to be reliable, what you want to know about is … downtime average per year.
That survey is a joke.
A lot of servers run FreeBSD because of its reliability and performance, yet the survey does not even mention a word about the BSDs. For example, check Netcraft: http://uptime.netcraft.com/perf/reports/performance/Hosters?tn=marc…
BSD isn’t either UNIX or UNIX-based anymore?
BSD is UNIX. There are two major trees in UNIX history, SVR4 UNIX and BSD UNIX, but what does that have to do with the survey? The survey even mentions Ubuntu Linux.
All BSDs (FreeBSD / NetBSD / OpenBSD) use original 4.4BSD UNIX OS code as a base.
There are even books about BSD UNIX:
http://amazon.com/Design-Implementation-UNIX-Operating-System/dp/02…
http://amazon.com/BSD-UNIX-Toolbox-Commands-FreeBSD/dp/0470376031/
http://amazon.com/Absolute-OpenBSD-UNIX-Practical-Paranoid/dp/18864…
…
Why do people always ask questions about obvious things?
What I was TRYING to say:
Why are you complaining that BSDs weren’t mentioned???
BSD ∈ “all versions of UNIX”
Sorry my friend, seems that I did not get your point, probably std::misunderstanding
Edited 2008-04-16 19:46 UTC
Windows will always struggle in this regard due to a couple inherent weaknesses:
1. File-locking. Windows (last I heard) couldn’t replace a file that was in use. Most of the reboots in Windows are due to this problem.
2. Restarting services – many times, Windows servers are rebooted because it is the easiest way to “get things working”. Many Windows administrators are not aware of the means of restarting subsystems like the network, etc. The quickest way (and safest way due to problem #1 above) is to reboot the box. It is common on Linux, etc. to do things like /etc/init.d/network restart after a major change.
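To illustrate point 1 from the Unix side, here is a minimal Python sketch, assuming a POSIX system (Linux, a BSD, etc.). The file name service.conf and the function name are hypothetical. It swaps a file out from under an open handle: the old handle keeps reading the old contents while new opens see the new file, which is why an update there usually needs a service restart rather than a reboot.

```python
import os
import tempfile


def replace_while_open(directory):
    """Replace a file while another handle still has it open (POSIX)."""
    path = os.path.join(directory, "service.conf")  # hypothetical file name
    with open(path, "w") as f:
        f.write("version 1\n")

    # Stand-in for a running service that keeps the file open.
    reader = open(path)

    # Write the new version next to the old one, then rename it into place.
    new_path = path + ".new"
    with open(new_path, "w") as f:
        f.write("version 2\n")
    os.replace(new_path, path)  # succeeds on POSIX even though the file is open

    old_view = reader.read()    # the open handle still refers to the old file
    reader.close()
    with open(path) as f:
        new_view = f.read()     # a fresh open sees the replacement
    return old_view, new_view


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(replace_while_open(tmp))
```

As far as I know, the same os.replace call on Windows would normally fail with a PermissionError while the reader is still open, which is the behaviour described in point 1.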
Laura DiDio is the person who created this report. She is also the one who lost all credibility when She attacked Pamela Jones of Groklaw. Why *anyone* still listens to Her, submits articles from Her, or even links to pages associated with Her, is a mystery to me. I went to the sink to scrub my hands after I clicked on TFA without realizing She was the author. The content of this article is *absolutely* worthless due to its source.
Please stop contributing to Her by submitting articles. Please stop giving Her any kind of publicity or recognition.
Didiot can’t even draw the correct conclusions from her own report. She says:
“Additionally, there is far less disparity now, in the number and severity of unplanned server outages and the time that businesses experience on their standard Linux, Windows and UNIX platforms, than at any time in recent memory.”
From her graph at:
http://www.iaps.com/exc/yankee-group-2007-2008-server-reliability.p…
You can extract the following numbers:
1. Between 2006 and 2007 Windows 2003 downtime increased from 7.09 to 8.9 hours downtime.
2. All Linux server OS downtimes decreased drastically over the same period.
3. RH downtime decreased from 7.14 to 1.73 hours downtime.
4. SUSE downtime decreased from 4.06 to 1.08 hours downtime.
5. Ubuntu new in the survey came in at 1.10 hours downtime.
6. Windows 2000 server downtime also increased over the same period to 9.86 hours achieving the distinction of being most unreliable server OS in 2007 and beating Windows 2003 server into second place.
Real conclusion: Windows servers now really stink compared to all Linux server distributions for reliability and they are getting worse.
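A quick Python sketch to check the arithmetic on the figures listed above. It only uses the three OSes for which both the 2006 and 2007 numbers are quoted in this comment, so it is not the full survey data. It computes the year-over-year change and the gap between the best and worst OS in each year.

```python
# (2006 hours, 2007 hours) of annual downtime, as quoted in this comment
DOWNTIME_HOURS = {
    "Windows 2003": (7.09, 8.9),
    "Red Hat": (7.14, 1.73),
    "SUSE": (4.06, 1.08),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def disparity(year_index):
    """Gap in hours between the worst and best OS for one year (0=2006, 1=2007)."""
    values = [pair[year_index] for pair in DOWNTIME_HOURS.values()]
    return max(values) - min(values)


if __name__ == "__main__":
    for name, (y2006, y2007) in DOWNTIME_HOURS.items():
        print(f"{name}: {percent_change(y2006, y2007):+.1f}%")
    print(f"disparity 2006: {disparity(0):.2f} h, 2007: {disparity(1):.2f} h")
```

On these three rows the gap between best and worst grows from about 3 hours to almost 8, which is hard to square with the "far less disparity" quote.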
Edited 2008-04-17 03:37 UTC
I think that is why my Slackware server never goes down!
Awesome…
-2501
This survey is very weird. No details are provided on how it was done. How big is the uncertainty on these numbers? There are obvious omissions in the chosen ‘unices’. Some of the bar titles just don’t make any sense. The descriptions are really weird, like:
‘OpenSource Linux’ (given the GPL I thought all Linux was open source)
‘Linux from Suse’
‘Linux with Suse with Customizations’ (can I please have some Linux with a bit of Suse and some Customizations on the side, please)
And it gets even more vague with ‘Other Linux’ and ‘Other Linux with Customizations’. This is meaningless. It conveys absolutely no information.
There’s a lot of snobbish disdain of Ubuntu in the systems world, but, if this report is to be trusted, then Ubuntu comes out pretty well. Un-Customized*, the only named OSes/distros that beat it are AIX and SUSE.
Given that Ubuntu is known for being used by less experienced users, these figures are really quite reassuring.
* It’s not clear what ‘Customized’ means here. If they’re talking about using custom kernels in production, then I’m assuming that ‘they’ know what they’re doing enough to have stable systems.
I would hazard a guess that a major factor in the stability statistics is the quality of the administrators. Windows admins are a dime a dozen; many of them are poorly trained on “click, click, DONE” GUIs and often have no idea how the OS functions. On the other hand, UNIX admins, especially the ones for a high-end proprietary OS like AIX, are more scarce, and tend to be highly trained and very experienced.
Edited 2008-04-17 04:42 UTC