Would you recommend a server without ECC Memory?

trevm999

I've just come across a disconnect in my thinking. I wouldn't recommend that a business go for a server without ECC memory, but I have deployed NAS units (which were Linux servers) that didn't have ECC RAM (I didn't even think about it at the time).

Do you deploy any solutions for clients that don't use ECC memory? For those clients/solutions, are the risks so small vs the costs that it's not worth thinking about?
 
For me, it would depend how mission-critical the application was. Generally though, I would never recommend a server without ECC memory since there's almost always going to be something on there that you really don't want to get corrupted, like SQL or AD. NAS units, on the other hand, I rarely use in mission-critical applications, so it wouldn't really matter what type of memory they were using, as long as they were reliable enough. Most NAS units I install serve purely as backup storage and quite often they're a secondary (or even tertiary) backup location. The users wouldn't even notice if one of them crashed and rebooted.
 
I would always recommend it for servers and for primary backup storage devices. Not only does it help with the stability of the server, but bad memory could cause in-flight data corruption as the device receives the data to write to disk. So, for example, if the device is writing and storing your backups to disk, and your backup software does not periodically verify/re-verify all backup files, you may never know something is messed up until it's too late. I don't like taking the risk on something like that.
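To make the "verify/re-verify" point concrete, here is a minimal sketch of what a periodic check could look like if your backup software doesn't do it for you. It assumes you can record a SHA-256 checksum for each file when the backup is taken (ideally on the source machine) and then re-hash the files on the backup device later. The function names and the manifest layout are my own for illustration, not from any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its checksum (hypothetical manifest format)."""
    return {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        target = backup_dir / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Note that this only catches corruption that happened after the manifest was built. If the data was already damaged in RAM before it was hashed, the checksum will happily match the damaged copy, which is the argument for ECC in the first place.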
 
Another reason to start looking cloud-ward. Synology looks like a cheap option for SMB, and they are adding the ability for it to be a domain controller soon, but it looks like Synology doesn't go with ECC memory until their enterprise devices.
 
What about routers? Surely most of the time any RAM errors end up as dropped packets which get resent? But other times couldn't it be the cause of the dreaded power-cycle? With more things moving to the cloud, and the internet being crucial for business operations, should we be insisting on using devices with ECC RAM? Or is the risk too low for most SMBs? I've dealt with one cloud application vendor before that insisted we had a router with ECC memory in order to receive a certain level of support from them.

@NETWizz
 
Great questions...

A: Routers have ECC memory too, because they are very much critical.

That said, bad things DO happen, and routers (or switches) with RAM errors don't merely drop packets. Usually they cease all network communication and write a crash dump to something like Flash:crashinfo_20170120-000142. The phone rings off the hook unless there is something like HSRP or VRRP so that a hot-standby router can take over.

I actually had a console cable on a Cisco router with bad memory, and this is what it did:

memorypool type is I/O
data check, ptr = 0x07A00030

next memory block, bp = 0x07A00050,
memorypool type is I/O
data check, ptr = 0x07A00080
bp_prev(0x00000000) not in any mempool
========= Dump bp = 0x07A00000 ======================

7A00000: DEADBEEF 0 0 62D17004 0 7A00050 649F40D8 10
7A00020: 0 0 0 0 DEADBEEF 0 0 0
7A00040: 649F4130 649F412C 0 0 AB1234CD FFFE0000 0 6325B53C
7A00060: 60488A8C 7A00190 7A00014 80000088 1 0 1 64FFD5D0
7A00080: AFACEFAD 7A001C4 1 1 4B6 63403214 0 7A000B8
7A000A0: 7A000BC FFFF 0 30000 65095164 0 302E3100 0
7A000C0: 0 0 0 0 0 0 0 0
7A000E0: 0 0 0 0 0 0 0 0
7A00100: 0 0 0 0 0 0 0 0
7A00120: 0 0 0 0 0 0 0 0
7A00140: 0 0 0 0 0 0 0 0
7A00160: 0 0 0 0 0 0 0 0
7A00180: 0 0 0 FD0110DF AB1234CD FFFE0000 0 6325B53C
7A001A0: 60488A8C 7A002D0 7A00064 80000088 1 0 1 64FFD5D0
7A001C0: AFACEFAD 0 2 2 11B 62E128E8 75CD81 7A001F8
7A001E0: 7A001FC 1FFFF 0 0 650954E8 0 312E3000 0
========= Dump bp->next = 0x07A00050 ======================

7A00000: DEADBEEF 0 0 62D17004 0 7A00050 649F40D8 10
7A00020: 0 0 0 0 DEADBEEF 0 0 0
7A00040: 649F4130 649F412C 0 0 AB1234CD FFFE0000 0 6325B53C
7A00060: 60488A8C 7A00190 7A00014 80000088 1 0 1 64FFD5D0
7A00080: AFACEFAD 7A001C4 1 1 4B6 63403214 0 7A000B8
7A000A0: 7A000BC FFFF 0 30000 65095164 0 302E3100 0
7A000C0: 0 0 0 0 0 0 0 0
7A000E0: 0 0 0 0 0 0 0 0
7A00100: 0 0 0 0 0 0 0 0
7A00120: 0 0 0 0 0 0 0 0
7A00140: 0 0 0 0 0 0 0 0
7A00160: 0 0 0 0 0 0 0 0
7A00180: 0 0 0 FD0110DF AB1234CD FFFE0000 0 6325B53C
7A001A0: 60488A8C 7A002D0 7A00064 80000088 1 0 1 64FFD5D0
7A001C0: AFACEFAD 0 2 2 11B 62E128E8 75CD81 7A001F8
7A001E0: 7A001FC 1FFFF 0 0 650954E8 0 312E3000 0
========== Dump bp->previous = 0x649F40D8 =====================

649F3FD8: 0 0 649F4018 64BD5BB8 649F3FD8 64B856A8 2 50000
649F3FF8: 0 0 62FBB6B0 623AA224 648CBF74 0 0 0
649F4018: 649F3058 649F3FE0 649F4010 64B856A8 3 50000 0 0
649F4038: 62FBB6B0 623AA224 648CBF74 0 0 0 0 659CF890
649F4058: 18 1 65C513AC 65C514CC 65C515EC 0 62FBB6C4 2
649F4078: 649F52B8 649F52B8 64CF4A68 649F5040 649F4078 64B856A8 7 10000
649F4098: 1 0 62FBB6C4 623AA224 648CBF74 0 E1660 1D0000
649F40B8: E1660 5FFFD0 600000 20 0 0 0 0
649F40D8: 7A00000 0 1 18000 8000 0 0 0
649F40F8: 0 0 0 0 0 649F42F8 649F449C 10
649F4118: E 0 E 0 33 7A00030 0 0
649F4138: 0 0 0 7A00040 0 0 0 34
649F4158: F 34 34 59 649F416C 0 0 0
649F4178: 0 0 649F4168 649F4118 0 0 5A 35
649F4198: 5A 5A 7F 649F41A8 0 0 0 0
649F41B8: 0 649F41A4 649F4154 649F41CC 0 80 5B 80
============================================


%Software-forced reload


00:01:42 UTC Fri Mar 1 2002: Breakpoint exception, CPU signal 5, PC = 0x60A745C0



--------------------------------------------------------------------
Possible software fault. Upon reccurence, please collect
crashinfo, "show tech" and contact Cisco Technical Support.
--------------------------------------------------------------------


-Traceback= 0x60A745C0 0x60A72B90 0x6001D5A8 0x6001D774 0x6001E50C 0x6001FF3C 0x623A1A88 0x623A1A6C
$0 : 00000000, AT : 649D0000, v0 : 00000000, v1 : 00000000
a0 : 00000000, a1 : 0000FF00, a2 : 00000000, a3 : 637B0000
t0 : 64ADDAC8, t1 : 64CF0000, t2 : 60A79AB0, t3 : FFFF00FF
t4 : 60A79AB0, t5 : 00000060, t6 : 3400FF01, t7 : 3400FF00
s0 : 62D17218, s1 : 00000000, s2 : 646C0000, s3 : 64630000
s4 : 640F0000, s5 : 7FFF0000, s6 : AB1234CD, s7 : AB1234AB
t8 : 64CF0000, t9 : 00000043, k0 : BFC003E0, k1 : 0000FF00
gp : 649DE5E0, sp : 6508D928, s8 : 00000008, ra : 60A72B90
EPC : 60A745C0, ErrorEPC : 00000000, SREG : 3400FF03
MDLO : 00000000, MDHI : 00000007, BadVaddr : 00000000
CacheErr : 00000000, DErrAddr0 : 00000000, DErrAddr1 : 00000000
DATA_START : 0x62D16000
Cause 00000024 (Code 0x9): Breakpoint exception

File flash:crashinfo_20020301-000142 Device Error :No device available
File flash:crashinfo_20020301-000142 Device Error :No device available
File slot0:crashinfo_20020301-000142 Device Error :No device available
File flash:crashinfo_20020301-000142 Device Error :No device available
File slot0:crashinfo_20020301-000142 Device Error :No device available

00:01:42 UTC Fri Mar 1 2002: Breakpoint exception, CPU signal 5, PC = 0x60A745C0



--------------------------------------------------------------------
Possible software fault. Upon reccurence, please collect
crashinfo, "show tech" and contact Cisco Technical Support.
--------------------------------------------------------------------


-Traceback= 0x60A745C0 0x60A72B90 0x6001D5A8 0x6001D774 0x6001E50C 0x6001FF3C 0x623A1A88 0x623A1A6C
$0 : 00000000, AT : 649D0000, v0 : 00000000, v1 : 00000000
a0 : 00000000, a1 : 0000FF00, a2 : 00000000, a3 : 637B0000
t0 : 64ADDAC8, t1 : 64CF0000, t2 : 60A79AB0, t3 : FFFF00FF
t4 : 60A79AB0, t5 : 00000060, t6 : 3400FF01, t7 : 3400FF00
s0 : 62D17218, s1 : 00000000, s2 : 646C0000, s3 : 64630000
s4 : 640F0000, s5 : 7FFF0000, s6 : AB1234CD, s7 : AB1234AB
t8 : 64CF0000, t9 : 00000043, k0 : BFC003E0, k1 : 0000FF00
gp : 649DE5E0, sp : 6508D928, s8 : 00000008, ra : 60A72B90
EPC : 60A745C0, ErrorEPC : 00000000, SREG : 3400FF03
MDLO : 00000000, MDHI : 00000007, BadVaddr : 00000000
CacheErr : 00000000, DErrAddr0 : 00000000, DErrAddr1 : 00000000
DATA_START : 0x62D16000
Cause 00000024 (Code 0x9): Breakpoint exception

-Traceback= 0x60A745C0 0x60A72B90 0x6001D5A8 0x6001D774 0x6001E50C 0x6001FF3C 0x623A1A88 0x623A1A6C


Cisco said it was most likely bad memory, which on that router was replaceable. I had tons of the same router, and as soon as I replaced the memory, it booted. Instead of the crash above, I got something like this:

Launching IOS image at 0x80008000...

Smart Init is disabled. IOMEM set to: 5

Using iomem percentage: 5

Restricted Rights Legend

Use, duplication, or disclosure by the Government is
subject to restrictions as set forth in subparagraph
(c) of the Commercial Computer Software - Restricted
Rights clause at FAR sec. 52.227-19 and subparagraph
(c) (1) (ii) of the Rights in Technical Data and Computer
Software clause at DFARS sec. 252.227-7013.

cisco Systems, Inc.
170 West Tasman Drive
San Jose, California 95134-1706



Cisco IOS Software, 7200 Software (C7206-ADVENTERPRISEK9-M), Version 15.3(41d), RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2016 by Cisco Systems, Inc.
Compiled Wed 18-Aug-16 07:55 by prod_rel_team

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.
 
It seems like the cheapest way to get a high quality solution is to roll-your-own with a refurbished server. However, I would still rather a client have a NAS without ECC RAM than file sharing off a workstation.
 
I'd install a server without ECC about as much as I'd install a SATA (desktop drive) based server, or a software onboard fake-RAID based server. Ain't gonna happen!

Servers should be considered the most important component on a network. Maximum uptime and stability. I often see crappy servers running a network, and people work around limitations from those wanna-be-servers. I'll often see some clue like "secondary DNS is the ISP's DNS servers"...and I ask why that is there, and the other tech says "Cuz if the server is down clients can surf the internet!"

... I just shake my head in disbelief...."THE SERVER ISN'T SUPPOSED TO BE DOWN!"
...and if the server caught fire and melted into a puddle, how hard is it to log into the router and quickly flip on DHCP on the very rare chance that happens?

Anyways, I never looked at the RAM under a NAS to check if it was ECC...it's just running a lean *nix distro doing samba shares, not running databases or heavy stuff.
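For anyone who does want to check a Linux-based box, one place to look is the kernel's EDAC interface in sysfs, which exposes corrected (ce_count) and uncorrected (ue_count) error counters per memory controller when an EDAC driver is loaded. Below is a rough sketch, assuming the standard /sys/devices/system/edac/mc layout and shell access to the unit. The function name is made up for illustration, and an empty result does not prove the RAM is non-ECC, since a trimmed-down NAS firmware may simply not load the driver.

```python
from pathlib import Path


def read_edac_counts(
    edac_root: Path = Path("/sys/devices/system/edac/mc"),
) -> dict[str, dict[str, int]]:
    """Collect corrected/uncorrected error counters for each memory controller.

    Returns an empty dict when the EDAC directory is absent, which can mean
    either no ECC or simply no EDAC driver loaded (an assumption to verify).
    """
    counts: dict[str, dict[str, int]] = {}
    if not edac_root.is_dir():
        return counts
    for mc in sorted(edac_root.glob("mc[0-9]*")):
        entry: dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers reported")
    for controller, values in found.items():
        print(controller, values)
```

A steadily climbing ce_count on a box that does have ECC is also a handy early warning that a DIMM is on its way out.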
 
Unless you're saying a NAS isn't a server, I feel like your post contradicts itself. In many NAS units you put in SATA drives, and if you've never checked whether the specs for a NAS included ECC, then I'm willing to bet you've installed some without ECC RAM.

I consider a NAS a server, so what I'm getting from your post is that the requirement for ECC RAM depends on the job the server is going to be put into production to do.
 
Unless you're saying a NAS isn't a server...

I don't see any contradiction, as I don't consider a NAS a full "server". Not running Windows Server, not running Windows applications or databases. Not running Active Directory. It's just running a leaned-out Linux distro doing Samba. In simple terms.."not complicated stuff". Ohh...a file was copied! Whew...that was tough! Seriously, you can take a Raspberry Pi and put up a *nix Samba file server. CPU and RAM usage....virtually nothing..doesn't break a sweat...cuz it's not doing much!

ECC is for stability and servers doing massive calculations, databases like SQL, Exchange servers (just a database actually), servers that bust their balls to the wall full bore all day, fire up task manager and those graphs are up high. Here's where you want your error checking and redundancy, you want solid reliable hardware for that.
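For anyone wondering what the "error checking" actually does, here is a toy sketch of the idea using a Hamming(7,4) code: 4 data bits are stored with 3 extra check bits, and any single flipped bit can be located and flipped back. This is only an illustration. Real ECC DIMMs use a wider code of the same family (typically 8 check bits per 64 data bits, which also detects double-bit errors), and the function names here are made up.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword (bit i holds position i + 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Layout by position: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(bit << i for i, bit in enumerate(bits))


def hamming74_decode(codeword: int) -> tuple[int, int]:
    """Return (nibble, corrected_position); position 0 means no error was seen."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # flip the bad bit back
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    damaged = stored ^ (1 << 4)  # simulate a single bit flip in "RAM"
    value, fixed_at = hamming74_decode(damaged)
    print(f"recovered {value:04b}, corrected position {fixed_at}")
```

Without the check bits, that flipped bit would just be read back as different data and nothing would notice, whether the box is busy or idle.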
 
I think this is a good place to draw the line. The problem is that NAS vendors are not drawing the line there. You can install all kinds of databases on small business Synology and QNAP NAS units. A QNAP NAS can be a domain controller, and DSM 6.1 is bringing that functionality to Synology units too. People even run DCs on Raspberry Pi. I know some of you won't put a DC on Linux, but IMO Samba 4 is very stable.

I think we should just remember to pass this wisdom on. I thought a Synology or QNAP NAS with Active Directory was a great idea for small businesses (since I'm a fan of having a domain whenever possible), until I started thinking about ECC memory.
 