As applications are promoted from the development environment to CI or QA and then into production, it is very common for configuration settings to change along the way. For example, database connection pool sizes are typically lower in the development environment than in the QA/load-testing environment. These differences exist primarily to tune application performance for each environment. Occasionally, however, application code is mistakenly promoted into production without these settings being changed, and the result can be performance havoc in the production environment. This blog describes one such scenario.
During their proof of concept, a prospect asked us to identify and resolve an issue they were observing in production. They had just promoted a newer version of a critical application, and soon afterward they started to see a significant increase in response time when end users tried to log in to the web application. To diagnose the issue, the prospect injected our agents into the production JVMs that were exhibiting the problem. With the Dynatrace agents in place, we observed both high response times for login and that most of the login request's time was spent in class loading.
Additionally, the response-time breakdown showed that most of the time for the login web request (92%) was spent in synchronization.
Because the breakdown attributed the largest share of time to synchronization, we took a thread dump of the JVM processing the login requests to get insight into the thread-locking issue. The thread dump showed 67 blocked threads.
Further analysis of the thread dump showed that most of the blocked threads were waiting for a resource (CompoundClassLoader) held by a single running thread (Thread Id 5570678).
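The JDK exposes the same information a thread dump contains through java.lang.management, so this kind of analysis can also be scripted. The sketch below is a minimal illustration, not the tooling used in this investigation: BlockedThreadReport and demo are hypothetical names, and a plain Object monitor stands in for the contended CompoundClassLoader. It counts BLOCKED threads and groups them by the id of the thread that owns the lock they wait on, which is how a single lock-holding thread stands out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;

public class BlockedThreadReport {

    // Counts BLOCKED threads, grouped by the id of the thread owning the monitor each one waits on.
    static Map<Long, Integer> blockedByOwner() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                counts.merge(info.getLockOwnerId(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical demo: one thread holds a monitor while `waiters` threads block on it.
    // Returns how many BLOCKED threads the report attributes to the holder.
    static int demo(int waiters) throws InterruptedException {
        Object monitor = new Object(); // stands in for the contended class loader
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (monitor) {
                acquired.countDown();
                try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        }, "holder");
        holder.start();
        acquired.await();

        List<Thread> blocked = new ArrayList<>();
        for (int i = 0; i < waiters; i++) {
            Thread t = new Thread(() -> { synchronized (monitor) { } }, "waiter-" + i);
            t.start();
            blocked.add(t);
        }
        // Wait until every waiter is actually parked on the monitor.
        for (Thread t : blocked) {
            while (t.getState() != Thread.State.BLOCKED) Thread.sleep(10);
        }
        int count = blockedByOwner().getOrDefault(holder.getId(), 0);
        release.countDown();
        holder.join();
        for (Thread t : blocked) t.join();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads blocked behind holder: " + demo(3));
    }
}
```

In a real incident the report would be taken against the live JVM, where a map entry with a large count points at the thread worth inspecting next.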
The stack trace of the running thread showed that it was trying to load a class from the file system. Examining the full thread dump showed that the threads on this JVM were creating a new Facelet every time a web request was received.
It is normal for a JVM to spend some time loading classes when it first starts, but it is NOT normal (or good for performance) for class loading to continue after the JVM has warmed up and most classes are already loaded.
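One rough way to test the warm-up assumption is to compare the JDK's loaded-class counter before and after a unit of work. The sketch below assumes only java.lang.management; ClassLoadingWatch, ColdPath, and the simulate methods are hypothetical names. The counter only reflects newly defined classes: repeated lookups or resource loads through an already-warm, synchronized class loader can still contend, as they did here, without moving it, so a thread dump remains the more direct evidence.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadingWatch {

    // Returns how many classes the JVM loaded (across all threads) while `work` ran.
    static long classesLoadedDuring(Runnable work) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        work.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    // Hypothetical class that nothing references until the first simulated request.
    static final class ColdPath { }

    // First request: touching ColdPath forces the JVM to load it, as on a cold start.
    static long simulateColdRequest() {
        return classesLoadedDuring(() -> new ColdPath());
    }

    // Repeat request: ColdPath is already loaded, so on a warmed-up JVM this is normally 0.
    static long simulateWarmRequest() {
        return classesLoadedDuring(() -> new ColdPath());
    }

    public static void main(String[] args) {
        System.out.println("cold request loaded " + simulateColdRequest() + " class(es)");
        System.out.println("warm request loaded " + simulateWarmRequest() + " class(es)");
    }
}
```

If a counter like this keeps climbing request after request long after startup, classes are still being defined on the hot path, which is worth investigating.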
The thread stack showed a call to the getFacelet(java.net.URL) method in the com.sun.facelets.impl.DefaultFaceletFactory class. A review of this class's source showed that the method reloads and recompiles the Facelet whenever needsToBeRefreshed() returns true.
Finally, the code for needsToBeRefreshed() clearly shows that it returns true whenever refreshPeriod is set to 0.
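To make the effect concrete, here is a simplified model of the caching behavior described above. It is not the Facelets source: FaceletCacheModel, Compiled, compilesFor, the "/login.xhtml" page name, and the seconds-based period are assumptions for illustration, and the real implementation may also compare the file's last-modified time, which this model leaves out. It counts how many times a page is "compiled" across 100 simulated logins, one per second.

```java
import java.util.HashMap;
import java.util.Map;

public class FaceletCacheModel {

    // Hypothetical stand-in for a compiled Facelet: just remembers when it was built.
    record Compiled(String url, long createdAtMillis) { }

    private final long refreshPeriodSeconds; // 0 = always rebuild, -1 = never rebuild
    private final Map<String, Compiled> cache = new HashMap<>();
    private int compileCount = 0;

    FaceletCacheModel(long refreshPeriodSeconds) {
        this.refreshPeriodSeconds = refreshPeriodSeconds;
    }

    // Models the rule described in the text: 0 forces a refresh on every call, -1 never refreshes.
    boolean needsToBeRefreshed(Compiled c, long nowMillis) {
        if (refreshPeriodSeconds == 0) return true;
        if (refreshPeriodSeconds < 0) return false;
        return nowMillis > c.createdAtMillis() + refreshPeriodSeconds * 1000;
    }

    // Returns the cached entry unless it is missing or due for a refresh, in which case it "recompiles".
    Compiled getFacelet(String url, long nowMillis) {
        Compiled c = cache.get(url);
        if (c == null || needsToBeRefreshed(c, nowMillis)) {
            c = new Compiled(url, nowMillis); // stands in for the expensive parse/compile step
            compileCount++;
            cache.put(url, c);
        }
        return c;
    }

    // Simulates `requests` logins, one per second, and reports how many compiles they triggered.
    static int compilesFor(long refreshPeriodSeconds, int requests) {
        FaceletCacheModel factory = new FaceletCacheModel(refreshPeriodSeconds);
        for (int i = 0; i < requests; i++) {
            factory.getFacelet("/login.xhtml", i * 1000L);
        }
        return factory.compileCount;
    }

    public static void main(String[] args) {
        System.out.println("period 0  -> " + compilesFor(0, 100));  // 100: every request recompiles
        System.out.println("period -1 -> " + compilesFor(-1, 100)); // 1: compiled once, then cached
        System.out.println("period 10 -> " + compilesFor(10, 100)); // 10: roughly one rebuild per period
    }
}
```

With a period of 0, every request pays the compile cost; under load, that cost is paid while holding the contended class loader, which matches the blocking seen in the thread dump.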
Setting javax.faces.FACELETS_REFRESH_PERIOD to -1 has the opposite effect: each Facelet is compiled once and then served from the cache. The trade-off is that, once it is set to -1, JSF never recompiles or re-parses the Facelets files and holds the entire SAX-parsed, compiled tree in memory.
During development, REFRESH_PERIOD is typically set to 0 because it lets developers keep editing Facelets files without restarting the server. At this prospect, the application code was promoted into production with REFRESH_PERIOD still set to 0, so every time a user tried to log in, the Facelets were forced to recompile, which in turn caused the high response times.
Conclusion
Configuration settings enabled during application development are very useful because they reduce the number of times the application server has to be restarted to test code changes. However, as this example shows, it is very important to disable development-level settings as code moves out of the development environment, because they can cause performance havoc in production. One best practice for eliminating such scenarios is to make these configuration changes part of the continuous integration process, using tools such as Jenkins.
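As one way to apply this practice, a build step can refuse to promote a web.xml that still carries the development setting. A minimal sketch, assuming the setting lives in WEB-INF/web.xml as a context-param: RefreshPeriodCheck is a hypothetical class that a Jenkins job could run against the packaged web.xml, failing the build when the refresh period is 0. Besides the javax.faces name used in this post, it also checks jakarta.faces.FACELETS_REFRESH_PERIOD (the Jakarta Faces 3.0+ name) and facelets.REFRESH_PERIOD (assumed here to be the name read by standalone Facelets 1.x). Settings supplied outside web.xml are not covered.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RefreshPeriodCheck {

    // Refresh-period parameter names to inspect (see the assumptions above).
    static final Set<String> PARAM_NAMES = Set.of(
            "javax.faces.FACELETS_REFRESH_PERIOD",
            "jakarta.faces.FACELETS_REFRESH_PERIOD",
            "facelets.REFRESH_PERIOD");

    // Returns true if the given web.xml content sets a refresh period of 0 (recompile on every request).
    static boolean hasDevRefreshPeriod(byte[] webXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Do not fetch the external DTD that older web.xml files reference in their DOCTYPE.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(webXml));
        NodeList params = doc.getElementsByTagNameNS("*", "context-param");
        for (int i = 0; i < params.getLength(); i++) {
            Element param = (Element) params.item(i);
            String name = childText(param, "param-name");
            String value = childText(param, "param-value");
            // Set.of(...).contains(null) throws, so guard against a missing param-name.
            if (name != null && PARAM_NAMES.contains(name) && "0".equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Trimmed text of the first descendant element with the given local name, or null if absent.
    private static String childText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Build step: exits with status 1 (failing the CI job) if a development refresh period is found.
    public static void main(String[] args) throws Exception {
        byte[] xml = Files.readAllBytes(Path.of(args[0]));
        if (hasDevRefreshPeriod(xml)) {
            System.err.println("Facelets refresh period is 0 in " + args[0]
                    + "; set a production value before promoting.");
            System.exit(1);
        }
        System.out.println("Facelets refresh period check passed for " + args[0]);
    }
}
```

A job would invoke it as, for example, `java RefreshPeriodCheck WEB-INF/web.xml` and rely on the non-zero exit status to stop the promotion.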