Quantcast
Channel: Ian Chilton » Linux
Viewing all articles
Browse latest Browse all 7

Nginx and why you should be running it instead of, or at least in front of Apache

$
0
0

After 9 years of development, Nginx hit a milestone release this week when version 1.0.0 was released (on 12th April 2011). Despite only now reaching a 1.0 release, it is already in widespread use, powering a lot of high traffic websites and CDN’s and is very popular with developers in particular. With such a milestone release though, I thought it a good opportunity to get motivated and do some posts on it here.

Nginx (pronounced “engine-x”) is a free, open-source, high-performance HTTP server (aka web server) and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004.

Nginx is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption. It is built specifically to be able handle more than 10,000 request/sec and do so using minimal server resources. It does this by using a non-blocking event based model.

In this article, i’m going to look at the problems with Apache and explain why you would want to use Nginx. In a subsequent article, i’m going to explain how to install and configure Nginx.

The most popular web server, Apache powers around 60% of the world’s web sites. I’ve been using Apache for around 10 years but more recently have been using Nginx. Due to it’s widespread use, Apache is well used, well understood and reliable. However, it does have some problems when we are dealing with high traffic websites. A lot of these problems center around the fact that it uses a blocking process based architecture.

The typical setup for serving PHP based websites in a LAMP (Linux, Apache, MySQL and PHP) environment uses the Prefork MPM and mod_php. The way this works is to have the PHP binary (and any other active Apache modules) embedded directly into the Apache process. This gives very little overhead and means Apache can talk to PHP very fast but also results in each Apache process consuming between 20MB and 50MB of RAM. The problem with this is that once a process is dealing with a request, it can not be used to serve another request so to be able to able to handle multiple simultaneous requests (and remember that even a single person visiting a web page will generate multiple requests because the page will almost certainly contain images, stylesheets and javascript files which all need to be downloaded before the page can render), Apache spawns a new child process for each simultaneous request it is handling. Because the PHP binary is always embedded (to keep the cost of spawning processes to a minimum), each of these processes takes the full 20MB-50MB of RAM, even if it is only serving static files so you can see how a server can quickly run out of memory.

To compound the problem, if a PHP script takes a while to execute (due to either processing/load or waiting on an external process like MySQL) or the client is on a slow/intermittent connection like a mobile device then the Apache process is tied up until the execution and transmission etc has completed which could be a while. These factors and a lot of traffic can often mean that Apache has hundreds of concurrent processes loaded and it can easily hit the maximum number of processes (configured) or completely exhaust the available RAM in the system (at which point it will start using the virtual memory on the hard disk and everything will get massively slower and further compound the problem). If a web page has say 10 additional assets (css, javascript and images), that’s 11 requests per user. If 100 users hit the page at the same time, that’s 1,100 requests and up to around 50GB of RAM required (although in reality you would have a limit on the number of Apache processes much lower than this so the requests would actually be queued and blocked until a process became free and browsers will generally open up a few simultaneous connections to a server at a time). Hopefully you are starting to see the problem.

With Nginx’s event based processing model, each request triggers events to a process and the process can handle multiple events in parallel. What this means is that Nginx can handle many simultaneous requests and deal with execution delays and slow clients without spawning processes. If you look at the two graphs from webfaction, you can quite clearly see that Nginx can handle a lot more simultaneous requests while using significantly less, and quite a constant level (and low amount) of RAM.

Nginx excels at serving static files and it can do so very fast. What we can’t do is embed something like PHP into the binary as PHP is not asynchronous and would block requests and therefore render the event based approach of Nginx useless. What we therefore do is have either PHP over FastCGI or Apache+mod_php in the background handle all the PHP requests. This way, Nginx can be used to serve all static files (css, javascript, images, pdf’s etc), handle slow clients etc but pass php requests over to one of these backend processes, receive the response back and handle delivering it to the client leaving the backend process free to handle other requests. Nginx doesn’t block while wating for FastCGI or Apache it just carries on handing events as they happen.

The other advantage of this “reverse proxy” mode is that Nginx can act as a load balancer and distribute requests to not just one but multiple backend servers over a network. Nginx can also act as a reverse caching proxy to reduce the amount of dynamic requests needing to be processed by the backend PHP server. Both of these functions allow even more simultaneous dynamic requests.

What this means is that if your application requires a specific Apache configuration or module then you can gain the advantages of Nginx handling simultaneous requests and serving static files but still use Apache to handle the requests you need it to.

If these is no requirement for Apache then Nginx also supports communication protocols like FastCGI, SCGI and UWSGI. PHP also happens to support FastCGI so we can have Nginx interact with PHP over FastCGI without needing the whole of Apache around.

In the past, you either had to use a script called spawn-fcgi to spawn FastCGI processes or handle FastCGI manually and then use some monitoring software to monitor them to ensure they were running. However, as of PHP 5.3.3, something called PHP-FPM is (which distributions often package up in a package called php5-fpm) part of the PHP core code which handle all this for you in a way similar to Apache – you can set the minimum and maximum number of proceses and how many you would like to spawn and keep around waiting. The other advantage to this is that PHP-FPM is an entirely separate process to Nginx so you can change configurations and restart each of them independently of each other (and Nginx actually supports reloading it’s configuration and upgrading it’s binary on-the-fly so it doesn’t require a restart).

In the next post in this series, i’ll explain how to install and configure Nginx for serving both static and dynamic content.

One of the disadvantages of Nginx is that it doesn’t support .htaccess files to dynamically modify the server configuration – all configuration must be stored in the Nginx config files and can not be changed at runtime. This is a positive for performance and security but makes it less suitable for running “shared hosting” platforms.


Viewing all articles
Browse latest Browse all 7

Trending Articles