Coding Tips – How To Bulletproof Node.js Server Script
Node.js (server-side JavaScript) is the new kid on the block. It offers significant advantages over PHP-based API servers when it comes to handling application calls at scale. It runs on Google’s V8 engine, which compiles JavaScript to native machine code, making call execution very fast. Moreover, it achieves concurrency through asynchronous, non-blocking I/O on a single-threaded event loop (backed by a small internal thread pool for work such as file system access), instead of dedicating a separate process to each request. A pending request on the event loop is much lighter than a process; hence, a typical Node.js server can handle approximately 10 times more concurrent calls than a comparable PHP server.
However, it takes some effort to make Node.js code suitable for a production environment. One of the major issues is that whenever an error is left unhandled in Node.js, the exception unwinds the stack and, by default, the process exits; even if a last-resort handler catches the error, the application is left in an undefined state. The only safe way out of this situation is to restart the Node.js process. The problem is well documented, and there are modules like ‘forever’ to handle it. In production, however, we encountered a lot of practical issues, which is why we came up with a recovery mechanism to bulletproof our Node.js code.
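The restart-on-error idea can be sketched as a last-resort handler that logs the failure and then exits, leaving the restart to a process monitor. This is a minimal sketch, not a definitive implementation: the name `installCrashHandler` and its `proc` and `log` parameters are illustrative assumptions, and the handler deliberately does not try to keep the process running.

```javascript
// Hypothetical helper: on any unhandled error, log it and exit with a
// non-zero code so that an external process monitor can start a fresh process.
// `proc` is expected to look like Node's global `process` object;
// `log` is any function that accepts a string.
function installCrashHandler(proc, log) {
  const handler = (err) => {
    // Record as much detail as is available before the process goes away.
    log('fatal: ' + (err && err.stack ? err.stack : String(err)));
    // Do not attempt to continue: the application state is undefined now.
    proc.exit(1);
  };
  proc.on('uncaughtException', handler);  // synchronous throws nobody caught
  proc.on('unhandledRejection', handler); // rejected promises nobody handled
  return handler;
}

// In a real server this would be called once at startup:
// installCrashHandler(process, console.error);
```

Passing `proc` and `log` in as arguments keeps the handler easy to exercise with a fake process object, without actually terminating anything.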
Below are the details of the three-layered mechanism we use to handle cases where the Node.js stack unwinds due to an error:
1) Process Monitor: Use ‘pm2’ as a process monitor instead of ‘forever’
2) Internal cron: Monitors Node.js endpoints with an ability to kill and restart ‘pm2’ and child processes
3) External Monitor: Monitors Node.js endpoints with an ability to restart the corrupted instance
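The endpoint monitoring in layers 2 and 3 can be sketched roughly as follows: probe an endpoint on a schedule, count consecutive failures, and trigger a recovery action once a threshold is reached. This is a sketch under stated assumptions, not the monitor described above: the names `probe`, `createEscalator` and `startMonitor`, the health URL, the threshold of 3 and the timings are all hypothetical.

```javascript
const http = require('http');
const { execFile } = require('child_process');

// Resolve to true if the endpoint answers with a 2xx status within timeoutMs,
// and to false on any error or timeout.
function probe(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body, only the status matters
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    req.setTimeout(timeoutMs, () => {
      req.destroy();
      resolve(false);
    });
    req.on('error', () => resolve(false));
  });
}

// Track consecutive failures. Returns 'ok' after a healthy check, 'wait'
// while below the threshold, and 'recover' once maxFailures is reached
// (the counter then starts again from zero).
function createEscalator(maxFailures) {
  let failures = 0;
  return function record(healthy) {
    if (healthy) {
      failures = 0;
      return 'ok';
    }
    failures += 1;
    if (failures < maxFailures) return 'wait';
    failures = 0;
    return 'recover';
  };
}

// Probe `url` every intervalMs and call `recover` after 3 failures in a row.
// intervalMs should be larger than the 5 second probe timeout used here.
function startMonitor(url, intervalMs, recover) {
  const record = createEscalator(3);
  return setInterval(async () => {
    const healthy = await probe(url, 5000);
    if (record(healthy) === 'recover') recover();
  }, intervalMs);
}

// Example wiring for the internal cron (URL and command are assumptions):
// startMonitor('http://localhost:3000/health', 30000, () => {
//   execFile('pm2', ['restart', 'all'], (err) => {
//     if (err) console.error(err);
//   });
// });
```

An external monitor could reuse the same structure with a different `recover` callback, for example one that asks the hosting provider to restart the instance.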
A layered recovery architecture is useful because it makes server downtime proportional to the severity of the error. In general, more severe errors occur less frequently than less severe ones. Restarting the corrupted instance resolves the most severe issues, but it results in about 2 minutes of downtime, so it is used sparingly. In contrast, restarting ‘pm2’ results in downtime on the order of tens of seconds and resolves most of the errors that ‘pm2’ itself is unable to handle. ‘pm2’ in turn monitors the Node.js process and restarts it within seconds whenever the process goes down because of an error.
Over time, we will monitor the error logs and keep refining our codebase to keep downtime to a minimum.
Despite these production-level issues, the low response time of the Node.js server, even at very high traffic loads, more than compensates for them.