What else is running on my EC2 instance?

At Thinkear we use a lot of Amazon Web Services. We use Elastic Compute Cloud (EC2) to host our Apache Tomcat server running Java 6. We recently had some major performance issues with our service, which led us to analyze our EC2 hosts and figure out exactly what was running on them.

At peak hours our servers handle ~35K requests/second and we have a 100 ms SLA to maintain with our partners. In this type of environment performance is the top priority, and we were surprised by some of the things we found running on our EC2 instances. I thought I would share what we found. Some of it was surprising; some of it was documented once you knew what to look for – but I find the Amazon docs hard to navigate. Throughout our interactions with Amazon support, we found that not all representatives were aware of the points below.

For the AMI we use Amazon Linux x86 64-bit (ami-e8249881) running the Tomcat 7 Linux configuration.

1. Apache runs as a proxy.

Our load balancer (AWS Elastic Load Balancing) directs port 80 traffic to port 80 on the hosts. Apache then runs as a proxy that forwards requests from port 80 to port 8080 (the default Tomcat port).

Config files are located in /etc/httpd/conf.d and /etc/httpd/conf. We had to tweak settings in /etc/httpd/conf/httpd.conf for our use case. These settings were the root cause of our issues; we had never looked into them because everything seemed to work.
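If you want to see how the proxying is wired up on your own instance, grepping the Beanstalk-generated config is the quickest way. The output below is only illustrative of the usual mod_proxy pair of directives; the exact directives and options vary by Beanstalk version:

    # Find the proxy directives Beanstalk dropped into the Apache config
    grep -R "ProxyPass" /etc/httpd/conf.d/ /etc/httpd/conf/
    # Expect something along the lines of:
    #   ProxyPass / http://localhost:8080/ retry=0
    #   ProxyPassReverse / http://localhost:8080/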

We tried bypassing Apache because we didn’t need the features it brings. Unfortunately, we had issues with our servers on deployment when we bypassed Apache, and we haven’t found the root cause yet.

2. Logrotate Elastic Beanstalk Log Files

elasticbeanstalk.conf in /etc/httpd/conf.d/ defines the ErrorLog and AccessLog properties for Apache. These files are then rotated out by /etc/cron.hourly/logrotate-elasticbeanstalk-httpd. The problem was that we didn’t know these log files existed, and the rotation settings were too aggressive for us.

These are our current settings: https://gist.github.com/KamilMroczek/7296477. We changed the size parameter to 50 MB and kept only one rotated file. Smaller files take less time to compress, and we didn’t need all those extra copies.
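If you want to find and tighten those settings on your own hosts, the hourly cron script points at the logrotate config it runs; that is the usual pattern, so read the script to confirm on your AMI:

    # See which logrotate config the hourly job actually invokes
    cat /etc/cron.hourly/logrotate-elasticbeanstalk-httpd
    # In that config we changed two directives (values are ours, from the gist above):
    #   size 50M    - rotate once a log reaches 50 MB
    #   rotate 1    - keep a single rotated copy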

3. Logrotate Tomcat Log Files

logrotate-elasticbeanstalk in /etc/cron.hourly rotates catalina.out and localhost_access_log.txt out of the Tomcat logs directory! As nice as it is for them to do that, we had no idea it was happening. It didn’t have a large impact on us, since we already rotated our log files ourselves at shorter intervals, but we removed this redundant step anyway.

Original logrotate script: https://gist.github.com/KamilMroczek/7296539
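Since we already rotate these files ourselves, disabling the Beanstalk job was enough for us. Removing the script works, and because cron.hourly is driven by run-parts (which skips non-executable files), stripping the execute bit is a reversible alternative:

    sudo rm /etc/cron.hourly/logrotate-elasticbeanstalk
    # or, reversibly:
    sudo chmod -x /etc/cron.hourly/logrotate-elasticbeanstalk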

4. Publishing logs to S3

We noticed CPU spikes at 20-minute intervals on our hosts, at 10, 30 and 50 minutes past the hour, that we couldn’t explain. When we looked at our CPU usage through top we found the culprit.

/etc/cron.d/publish_logs schedules a Python script that publishes:

  • /var/log/httpd/*.gz (#2 above)
  • /var/log/tomcat7/*.gz (#3 above)

I originally thought we were uploading the same files over and over, since logrotate only rotated every hour and kept the last 9 copies while publishing ran 3 times an hour. It turns out the publishing code has de-duplicating logic.

We removed this cron task because we didn’t need the data uploaded. We already uploaded our Tomcat logs separately, and the Beanstalk logs were of no use to us at the time, nor have we ever used them to troubleshoot issues.
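Dropping the job itself is a one-liner. It is worth cat-ing the cron file first to confirm that its schedule lines up with your CPU spikes:

    cat /etc/cron.d/publish_logs       # confirm the 10/30/50-minute schedule
    sudo rm /etc/cron.d/publish_logs   # we already ship our Tomcat logs separately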

5. Amazon Host Configuration

The entire configuration for your environment can be found through AWS CloudFormation. There is a describe-stacks call (or cfn-describe-stacks, depending on the CLI version) that lets you pull the configuration for an environment. We are in the process of auditing ours. More complete instructions are here:

http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-describing-stacks.html
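With the current AWS CLI the call looks like the sketch below; the stack name is a placeholder, and Elastic Beanstalk-created stacks typically carry an awseb prefix:

    # List every stack, then pull the details for the one backing your environment
    aws cloudformation describe-stacks
    aws cloudformation describe-stacks --stack-name <your-stack-name>
    # The older CloudFormation command line tools expose the same call as cfn-describe-stacks.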

As with all tough problems, troubleshooting them inevitably gives you a deeper understanding of your system and its architecture. When you own and provision your own servers you understand everything on them, because you are responsible for creating the template. When you use a hosted solution such as Amazon Web Services, you can run into the problem of not knowing everything about your image. We learned that you need to take the time to understand what you are getting.

VisualVM Profiling Apache Tomcat through SSH Tunnel

So I had the job of trying to set up profiling of our Apache Tomcat instances on EC2.  There were a lot of instructions out there that I tried to cobble together into a solution.  The tricky part with profiling a remote host is the firewall rules, since the RMI server will pick a random port to use.

The method I used was to open an SSH tunnel to the running machine and pretend everything is local.  I finally got my monitoring up and running, so I wanted to share.

Props to Thiago for a nice write-up that had details many people omitted.

1. First, download the proper version of catalina-jmx-remote.jar from the Apache archives.  We were running 7.0.23, so I downloaded it from the link below (just change the version in the link for your release):

http://archive.apache.org/dist/tomcat/tomcat-7/v7.0.23/bin/extras/

2.  Copy the JAR to your Tomcat’s lib dir.
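For 7.0.23 that boils down to the two commands below. The lib path is from my install under /opt/tomcat7, so adjust it for yours:

    wget http://archive.apache.org/dist/tomcat/tomcat-7/v7.0.23/bin/extras/catalina-jmx-remote.jar
    sudo cp catalina-jmx-remote.jar /opt/tomcat7/lib/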

3.  Open server.xml in Tomcat’s conf dir (mine was /opt/tomcat7/conf/server.xml) and add the following listener.  You should already have a group of Listeners in the file, so just add it to the bottom of that group:

<Listener className="org.apache.catalina.mbeans.JmxRemoteLifecycleListener" rmiRegistryPortPlatform="10001" rmiServerPortPlatform="10002" useLocalPorts="true" />

– The two ports listed here are arbitrary (you can pick your own).  By specifying both, you control which ports the RMI server uses instead of letting it pick them at random.

IMPORTANT: Many walkthroughs for SSH tunneling forget this part.  You need to set useLocalPorts to true.

4. Add the following params to startup:

-Djava.rmi.server.hostname=localhost

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.ssl=false

– Many walkthroughs on the net tell you to set the "java.rmi.server.hostname" variable to the public IP, but since we are using SSH tunneling it should be localhost. One way to wire these flags in is sketched below.
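If your setup uses a setenv.sh (an assumption; any mechanism that gets the flags onto Tomcat's JVM command line works), it could look like:

    # /opt/tomcat7/bin/setenv.sh (hypothetical path; adjust to your install)
    export CATALINA_OPTS="$CATALINA_OPTS \
      -Djava.rmi.server.hostname=localhost \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"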


5. Restart Tomcat and check to make sure that your Tomcat instance started with the params from step 4: ps aux | grep tomcat

6.  Also verify that it’s listening on the two ports from Step 3: netstat -nlp
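For example (the grep patterns are just a convenience; 10001 and 10002 are the ports from the listener in Step 3):

    ps aux | grep [t]omcat                        # the flags from step 4 should show up
    sudo netstat -nlp | grep -E ':10001|:10002'   # both RMI ports should be LISTENing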


7.  Now you should be ready to connect with VisualVM.  Drop catalina-jmx-remote.jar into the VISUALVM_HOME/platform/lib directory.

8.  Create an SSH tunnel on BOTH ports.  Sub in your EC2 instance’s public IP (and change the user if needed):

ssh -N -L10001:localhost:10001 -L10002:localhost:10002 ec2-user@<EC2 public ip>

– Many instructions on the web only tell you to use 1 port, but you need both.

9.  Start VisualVM.

10.  Right-click on Local, add a new JMX Connection, and use the following service URL:

service:jmx:rmi://localhost:10002/jndi/rmi://localhost:10001/jmxrmi



– Notice that the ports must match up with the ports you specified in Step 3.
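As a quick sanity check outside VisualVM, jconsole (bundled with the JDK) accepts the same service URL, so you can confirm the tunnel and listener work first:

    jconsole service:jmx:rmi://localhost:10002/jndi/rmi://localhost:10001/jmxrmi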

And that should be it!  You should now be able to profile your application.


Remote Java Profiling on EC2

I had to do some profiling, and the instructions are strewn across a bunch of articles, so I thought I would help the cause for people trying to find them.

Props to Mriddle: http://rukuro-blog.heroku.com/2011/07/26/running-visualvm-through-an-ssh-tunnel

Another good article by Neil Figg: http://neilfigg.blogspot.com/2011/07/remote-jvm-profiling-on-amazon-ec2.html

I didn’t use Neil’s because I didn’t need to go deeper with my profiling at this stage.  I ran into a bunch of gotchas that Mriddle’s post helped with, plus some notes of my own:

1.  Create an SSH tunnel and make VisualVM proxy through it:

– sudo ssh -i .ssh/yourkeyfileifyougotone.pem -D 9696 your.ip.goes.here

(options to add to jvisualvm)

-J-Dnetbeans.system_socks_proxy=localhost:9696 -J-Djava.net.useSystemProxies=true

2.  Command to check if jstatd is listening:

sudo netstat -nlp | grep jstatd
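If jstatd is not running yet, note that it refuses to start without a security policy. The usual minimal policy looks like the sketch below; the tools.jar path is the standard one for a JDK 6 layout, so treat it as an assumption:

    # Minimal policy granting tools.jar the permissions jstatd needs (assumes JDK 6 layout)
    cat > /tmp/jstatd.policy <<'EOF'
    grant codebase "file:${java.home}/../lib/tools.jar" {
        permission java.security.AllPermission;
    };
    EOF
    jstatd -J-Djava.security.policy=/tmp/jstatd.policy &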

3.  Look at the startup command for Tomcat:

ps aux | grep tomcat

4.  VisualVM looks at temp directories for process info, so make sure that when you start jvisualvm you specify the proper temp dir (you can see it in the output from step 3):

(option to add to jvisualvm)

 -J-Djava.io.tmpdir=/opt/tomcat7/temp
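Putting the options from steps 1 and 4 together, the combined jvisualvm invocation looks something like this (the tmpdir is from my /opt/tomcat7 install; use whatever step 3 shows on your host):

    jvisualvm -J-Dnetbeans.system_socks_proxy=localhost:9696 \
              -J-Djava.net.useSystemProxies=true \
              -J-Djava.io.tmpdir=/opt/tomcat7/temp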


Things that I read but didn’t work/didn’t try:

1.  Make sure the java binary you are using locally matches the one on the remote server.  Apparently, there is a difference between the JRE java binary and the JDK java binary.


2.  Open the proper ports after your instance comes up on EC2.  Since jstatd picks a random port, you need to add that port to your security group:


ec2-authorize <security_group> -p <port> -s <your ip>


My env on EC2:

OpenJDK Runtime Environment (IcedTea6 1.11.4) (amazon-52.1.11.4.46.amzn1-x86_64)

OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Linux version 3.2.12-3.2.4.amzn1.x86_64 (mockbuild@gobi-build-31003) (gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) )

Apache Tomcat/7.0.23

Locally:

VisualVM: 1.6.0_33 (Build 110613); platform 110613-unknown-revn

Mac OS X (10.7.4) , x86_64 64bit

1.6.0_33; Java HotSpot(TM) 64-Bit Server VM (20.8-b03-424, mixed mode)