Thursday, August 12, 2010

Strategy: Terminate SSL Connections in Hardware and Reduce Server Count by 40%

This is an interesting tidbit from near the end of the Packet Pushers podcast Show 15 – Saving the Web With Dinky Putt Putt Firewalls. The conversation was about how SSL connections must be terminated before they can be inspected by a WAF (Web Application Firewall), which examines HTTP for security problems like SQL injection and cross-site scripting exploits. Much was made of the idea that if programmers did their jobs better these appliances wouldn't be necessary, but I digress.

To terminate SSL, most shops run SSL connections into Intel-based Linux boxes running Apache. This setup is convenient for developers, but it's not optimized for SSL, so it's slow and costly. Much of the capacity of these servers is unnecessarily consumed processing SSL.

Load balancers, on the other hand, have crypto cards that terminate SSL very efficiently in hardware. Efficiently enough that if you are willing to get rid of the general-purpose Linux boxes and lean on your big-iron load balancers, your server count can be decreased by 40%. Client performance will also be greatly improved, because SSL accelerators are faster at SSL than generic boxes.

Developers don't like this option because they don't trust load balancers. These devices are out of their control and are difficult to debug, provision, and test. But if you already have or are considering load-balancing appliances, and you can work out the trust issues, a whole lot of CPU can easily be reclaimed.

Reader Comments (23)

So-called "hardware" load balancers versus dedicated servers terminating SSL (e.g. running nginx) in front of the real web servers.

What gives the best TCO? Consider licensing, hardware cost, product life cycle, troubleshooting costs.

August 12, 2010 | Unregistered CommenterFederico

SSL session resumption is also worth considering. Resumed sessions are substantially cheaper than negotiating from scratch, since they bypass the expensive public-key part of the handshake.

In general, if you pass the SSL through the LB back to the server pool, with naive round-robin load balancing you'll only be able to resume 1/N of sessions. If you terminate on the LB you'll be able to resume them all, assuming the session cache on the LB is large enough.

It's also worth noting that some LBs can pick a backend based on SSL session id, so resumes are handled efficiently (the one I'm familiar with is Netscaler, but I assume the rest of the industry cannot possibly suck any worse).
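For the software-termination case, the session cache is an explicit knob you can size. A minimal nginx sketch (the directives are stock nginx; the sizes are just illustrative):

    # Share the session cache across all worker processes so any worker
    # can resume any client's session (roughly 4,000 sessions per MB).
    ssl_session_cache    shared:SSL:10m;
    # How long a cached session remains resumable.
    ssl_session_timeout  10m;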

August 12, 2010 | Unregistered Commenterlaz

An additional vanilla box running nginx will work in a cloud architecture where you can't use custom hardware, and it costs FAR, FAR less. See the write-up here:

http://www.o3magazine.com/4/a/0/2.html
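The shape of that setup, as a rough, untested nginx sketch (cert paths and backend addresses are made up):

    upstream backends {
        server 10.0.0.11:80;   # real web servers, plain HTTP
        server 10.0.0.12:80;
    }

    server {
        listen 443 ssl;                             # TLS terminates here
        ssl_certificate      /etc/ssl/site.crt;     # illustrative paths
        ssl_certificate_key  /etc/ssl/site.key;

        location / {
            proxy_pass http://backends;                  # decrypted traffic to the pool
            proxy_set_header X-Forwarded-Proto https;    # tell the app it was SSL
        }
    }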

August 12, 2010 | Unregistered CommenterMike Perham

Worth mentioning that terminating at the LB isn't always an option... some interpretations of the PCI card payment rules require the transaction to "be encrypted from the browser to the card processing server," i.e. end to end.

Terminating at the LB means that theoretically the card details could be sniffed on the LAN inside the datacentre.

August 12, 2010 | Unregistered CommenterSteve Thair

I call BS. The "40% fewer servers" sounds like vendor marketing. Show me the real-world data or it didn't happen.

(Yeah, I know you can't possibly publish this comment but that would sort of just prove my point.)

August 12, 2010 | Unregistered Commenterdoidh

Isn't it a bit unfair to conclude that "software terminating SSL" sucks because Apache-SSL sucks? Please do some real-world testing with more "decent" SSL software load balancers. Measure performance, then take a 10k "hardware" load balancer and measure that performance.

August 12, 2010 | Unregistered CommenterFrank

Can you purchase a good crypto-optimized hardware firewall without engaging with a VAR? If you are a small business with a clue, the known inefficiency of cheap Intel hardware is much easier to swallow than the unknown and exciting inefficiencies and heartburn that come from dealing with a vendor.

-danny

August 12, 2010 | Unregistered CommenterDaniel Howard

One more thing to keep in mind is that some CAs charge by how many devices you intend to put the certificate on. Putting the cert on two fat load balancers will cost you half as much as putting it on four servers.

There's also the security aspect. If your SSL offloading device does only that, and very few people have access to it, then there's less chance of your cert being stolen than if you put it on every server that all the test people may have access to. Sometimes losing the cert is worse than losing the data: the hacker will be able to impersonate you until the cert expires, because nobody actually checks CRLs.

August 12, 2010 | Unregistered CommenterHenry

I'd expect everyone to know that there are no absolutes, and that any solution depends on your operating environment.

If you have 10 web servers and 1 IP address, then throwing nginx in front of them is probably totally fine. If you have 10k web servers split across 4 continents with a pile of LBs and a gigantic service contract to go with them, then the value proposition may change.

@Steve: good call on PCI requirements. I've seen SSL terminated on LBs, and then re-encrypting the ccard fields in the HTTP POST for transport inside the LAN.

@danny: for sure. Dealing with LB vendors is a massive PITA even when you're a big customer... so as a small business, it has to suck far worse.

August 12, 2010 | Unregistered Commenterlaz

Hi,

I run some websites with a large number of servers that use SSL negotiated by squid and/or nginx. I saw the PP podcast as well, and am interested in the possibility of doing this sort of thing. It is worth mentioning that squid supports using SSL accelerator cards. Although it's easy for PP to just up and say 40% fewer servers, what's the real deal here? One of my apps pushes a lot of bandwidth; the other is very CPU intensive.

How would a dedicated SSL appliance or SSL accelerator card fare with 1 Gbit of SSL to decode, and how much would it cost? How would a dedicated SSL appliance fare with 2,000 requests/second of SSL at around 200 Mbit, and how much would that cost? Is there any evidence that a single Linux box running nginx to terminate the SSL can't handle this on a modern CPU?

August 12, 2010 | Unregistered CommenterGabriel Ramuglia

http://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

August 12, 2010 | Unregistered CommenterAdam

@Steve & @laz
I think it would make more sense in general, and it's what I've seen more often, to use a cryptographic accelerator card on the server when meeting PCI (and other similarly stringent) requirements. Combine that approach with session affinity on your hardware or software LB and you get pretty stable, reasonably efficient SSL termination.

August 12, 2010 | Unregistered CommenterScott

SSL accelerators are essentially "software on a stick": preintegrated boxes with a modestly customized Linux (or, if you're lucky, FreeBSD) stack, a middle-of-the-road x86 CPU, and a handful of crypto coprocessors. The package is sold at huge markups, and quite often the cost of the accelerator is going to be higher than the price savings from the servers. In that kind of situation, it makes more sense to consider a machine with a dedicated crypto processor, like the Sun T2000.

The big win from using an SSL accelerator is that it allows you to get more capacity when the constraint is not your CAPEX budget but power and HVAC in the colo.

August 12, 2010 | Unregistered CommenterFazal Majid

I'm the person who made the podcast and made the claims. I'll respond to the level of ignorance being shown here by server administrators.

In a server farm of 40 web servers for two large dot-com organisations, we achieved significant CPU reductions across the farm (from 40% CPU on a sixty-second WMA to 25%) with the implementation of SSL offload to an F5 LTM load balancer.

More importantly, the user response time and conversion rate for low-speed/high-loss connections improved dramatically, with wide-ranging improvements to application response time for all connections. That is, users with modems or transnational connections subject to firewall interception saw much better service, and conversion ratios grew significantly. This is due to improved TCP handling in the F5 appliance compared to the Linux protocol stack. While the Linux TCP implementation is very fine, its ability to handle tens of thousands of TCP connections at varying speeds, and to optimise the buffer handling for every client, is limited.

For most at-scale online businesses, the cost of additional servers is high and becoming exponentially higher. Power, maintenance, and software all make adding servers less practical. Not all businesses are able, or have designed, to use open-source software, and commercial software adds per-server licensing costs.

Although server admins love to congratulate themselves on the number of servers they have in their farm, it's not always the best solution.

With regards to the PCI DSS issue, you can use the TCP connection-reuse feature on certain load balancers, and then form that connection as SSL. This provides all the benefits of terminating the SSL negotiation in hardware AND terminating the TCP connection on the load balancer, with its improved TCP connection handling, while passing just a handful of SSL-encrypted connections to the Linux servers. This meets the PCI DSS requirements; indeed, that is what a WAF does.
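For those without an F5, the rough software analogue of that terminate-then-re-encrypt pattern looks something like this (an illustrative nginx sketch, not the connection-reuse feature itself; addresses and paths are made up):

    # Client SSL is terminated (and can be inspected) at the proxy, then
    # re-encrypted toward the pool so card data never crosses the LAN in
    # the clear.
    upstream pci_backends {
        server 10.0.0.21:443;
        server 10.0.0.22:443;
    }

    server {
        listen 443 ssl;
        ssl_certificate      /etc/ssl/site.crt;
        ssl_certificate_key  /etc/ssl/site.key;

        location / {
            proxy_pass https://pci_backends;   # SSL on the inside leg too
        }
    }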

Some of your comments are true of low-cost implementations, but I also suspect that you may not be working at scale if you make those comments.

August 13, 2010 | Unregistered CommenterEtherealMind

@Adam: Thanks for the link to the SSL write-up! Favourite quote:

"On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load"

August 13, 2010 | Unregistered Commenterdoidh

Here is one add-on card for encryption/compression - http://www.soekris.com/vpn1401.htm

August 13, 2010 | Unregistered CommenterMarki

@etherealmind I follow your PP podcast with great interest (being half networking, half server guy myself). I know the F5 products, and I am a fan. However, they are very, very, very expensive if you want any decent performance. For smaller sites, the smaller-priced F5s will give you lower performance than nginx or HAProxy without much tuning!

You are right that you should offload SSL from your application servers to LBs, but I think you should consider using the "right" tool. An F5 might be the answer if you have, for instance, 5 Gbps of SSL traffic, or if you have very specific needs (the scripting and L7 inspection language is nice). But in most normal cases, using nginx or another software LB will give you more performance for about a tenth of the price.

August 13, 2010 | Unregistered CommenterFrank

Greg (EtherealMind), thanks for following up with more background. Don't ever change. Great job on the show.

doidh, I publish every comment that isn't obvious SPAM. So you can be mean if that's the way you want to be.

Adam and Mike, thanks for the papers. Mike, the all-components-in-software aspect of the cloud dictates a lot of the stack. I don't know if cloud providers will push LB APIs down to customers.

laz, thanks for the sanity.

Steve, I didn't know about the card processing requirements. It makes a lot of optimizations a no-go.

Fazal, they are definitely software on a stick, but I've done some work on security appliances and these specialized security chips kick ass, so there may be value there.

It's interesting to see the enmity towards VARs. The just-swipe-your-credit-card model of the cloud may account for more adoption than is widely recognized.

August 13, 2010 | Registered CommenterHighScalability Team

@Todd: If pointing out the absence of real world data is mean, I'm a happy mean guy. =)

@EtherealMind: The 40% server reduction claim in the title is a bit off then? I think that more people than myself are interested in the number of servers reduced. I don't doubt that you got a lot of positive effects by implementing these changes.

@.*: I've just seen too much blogvertising to believe claims like this, and frankly I'm disappointed by many manufacturers' unwillingness to provide repeatable real-world test data. So, I'm calling BS whenever I see it.

August 13, 2010 | Unregistered Commenterdoidh

And now with Intel adding AES-NI instructions to the processor to speed up encryption, and OpenSSL (old and patched, or 1.0+) able to use them, it would be interesting to test terminating on a separate box.
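A quick way to check, assuming an OpenSSL build that routes AES through the EVP interface (the low-level speed test skips the accelerated path):

    # Does the CPU advertise AES-NI at all?
    grep -m1 -o aes /proc/cpuinfo

    # Low-level API: does not use AES-NI
    openssl speed aes-128-cbc
    # EVP API: uses AES-NI when the build supports it
    openssl speed -evp aes-128-cbc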

August 15, 2010 | Unregistered CommenterJay Ess

While I can't claim that SSL termination has done a lot for me in the way of performance, I can claim that it has made deploying my application quite a bit easier. I'm doing much less management of certs, since they all reside on my load balancer.

However, what might be important to think about when choosing to terminate SSL on the LB is what effect it will have on your application. The normal way to check whether a connection was secure in PHP is to check whether $_SERVER['HTTPS'] is non-empty. However, once I switched over to the LB, that variable was always empty, so I had to have an additional header written into the request. While the extra overhead of checking for the header definitely won't be more than actually decrypting a request, it's something to be aware of.
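For anyone hitting the same thing, the check ends up looking something like this (the header name is whatever you configure your LB to inject; X-Forwarded-Proto is just the common convention):

    <?php
    // Behind an SSL-terminating LB the request arrives as plain HTTP,
    // so $_SERVER['HTTPS'] is empty even for secure clients.
    function request_is_https() {
        // Direct SSL to this box (no LB in front).
        if (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') {
            return true;
        }
        // Header injected by the LB; only trust it if clients can reach
        // this server exclusively through the LB.
        return isset($_SERVER['HTTP_X_FORWARDED_PROTO'])
            && strtolower($_SERVER['HTTP_X_FORWARDED_PROTO']) === 'https';
    }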

August 22, 2010 | Unregistered CommenterChris Henry

EtherealMind: If you are seeing a win on slow connections, then it isn't coming from SSL acceleration. I think most of your "40% win" is coming from using a load balancer in general, which is plausible, though there are likely cheaper solutions than F5.

I used quotes above because, based on your statement, there wasn't a 40% win. It was a reduction in CPU load on your servers from 40% to 25%. That's great and all, but it really only helps if you are CPU-bound in the first place, which is becoming an increasingly rare phenomenon.

August 22, 2010 | Unregistered CommenterChristopher Smith

I know this is a fairly old post, but it does highlight a classic way to sell something the customer doesn't need. Load balancer vendors (ADCs) are always trying to tell you that you need SSL acceleration on the ADC rather than in the cluster. Why on earth would you waste the cheap raw CPU power in your cluster? Your average Xeon chip can do 2,000 x 2048-bit SSL terminations per second! Doesn't sound like a lot? Remember that active sessions can go on for several minutes without re-terminating, with very little CPU overhead. So if we assume a 5-minute average session time, that gives 10,000 SSL users per server in the cluster. When servers cost less than $1,000, why on earth would you pay more than $1,000 per 10,000 concurrent SSL users? Oh, and to top it off, if you have a problem with your load balancers you are screwed.

Yes, technically the article is correct and has the right figures; yes, F5 is a great bit of kit. No, it is not and has never been cost-effective compared to a properly designed cluster.

September 28, 2015 | Unregistered CommenterMalcolm Turnbull
