Being the head of Information Technology for the EVE Online alliance ‘C0nvicted’ (alliance tag A.I.F), I have gained a lot of experience over the last few months, going from a forum and killboard database dump to getting all previous services restored.
This started one day after I logged into the alliance TeamSpeak and was asked by one of my corporation’s leadership, “Can we borrow one of your servers for the alliance?”, which was outright questionable. I quickly learned that the alliance’s IT team, along with several other members, were leaving the alliance and that a lot of positions were being emptied, one of which was IT. I was assigned to the job immediately and began my assessment of what needed to be replaced.
At the time, I had a dedicated server from GoDaddy running a small Minecraft server, this blog, the corporation’s killboard and forums, and TeamSpeak. It was an older Core 2 Duo 2.4 GHz plan with 2 GB of RAM and 2 x 300 GiB hard drives running CentOS 5.8 32 bit for $90 a month. For what it was doing, it was more than enough. Luckily, those who were leaving were being extremely nice, as they wanted to form their own corp to join either the CFC or HBC and remain blue. The IT personnel I was working with were nice enough to immediately hand over dumps of the forum and killboard databases, which I loaded onto my server right away and backed up on my personal network. Along with that, I started work on getting the domain for convicted transferred.
The previous setup the alliance had before I got to it was fairly simple:
- SMF 2 Forum
- EDK based Killboard
- TeamSpeak 3
- OpenFire XMPP server
- POS Tracker
For their corp, they had another POS Tracker, a secondary OpenFire XMPP server, a personal forum, and an instance of EVE Corp Management. This was all running on a Linode cloud server with 1.5 GiB of RAM. They disabled the alliance killboard because it kept using a large portion of the memory and more than its share of MySQL query time during high-load periods (after major battles, linking a kill to more than several people, etc.), which in turn (from their analysis) slowed down their web services and crashed the XMPP servers.
Working help desk and as a technician for a job, I understand the importance of uptime, reliability, and building confidence in the services and products that you support. The setup they had was anything but that. Uptime was sporadic, nobody was confident it could be relied on as a means of communication, and the corporations within the alliance continued to keep backup communication software (TeamSpeak for the most part). This I would not stand for.
I estimated that the server I had would be able to handle the load, with the expected worst-case scenario being over 2000 simultaneous connections to multiple links on the killboard and forums, while handling TeamSpeak with 4 to 8 channels of people talking and multiple concurrent conversations on the XMPP server.
Within 36 hours, I had the basics set up and working. The forums were restored, the killboards were working, and TeamSpeak was up and running with a non-profit license and 2 virtual server instances, with everything running as a subdomain on the corporation’s DNS. After three days, though, the server dropped off the net unexpectedly. It could not be pinged, and power cycling did not bring it back up. After I put in an urgent ticket through GoDaddy to figure out what happened, they reported back something that hurt me: the OS wouldn’t boot, with errors right from boot. It was important to bring these services back up, so a second dedicated server was immediately purchased and set up. Within 6 hours, I had restored what I could, which were the original database dumps for the alliance. The original server’s hard drives were put into external cases, fresh hard drives were put in and the server reprovisioned, and the externals were connected back to it.
From here I did a bit of diagnostics on the original server, transferring the contents of the second hard drive (which contained the static files for the web server) to the new server. But the first hard drive, which also contained the database files, had unfortunately died after extended use. There was no restoring the drive itself. The backups I did have for the services, including this blog’s database, were either several years too old or, in the case of the newer ones, corrupted. I had a worst-case scenario on my hands. All data was unrecoverable and I had to start with nothing from before the data dumps. I was fortunate to have the static files, though.
The new server had CentOS 6 (64 bit) loaded onto it by default. The OS itself had no issues, but older libraries quickly became a problem. TeamSpeak requires older MySQL libraries to run in a MySQL configuration, and the OpenFire XMPP server would crash every 48 to 72 hours (though this turned out to be something completely different and not OS related). Unfortunately, even with the REMI and EPEL repositories, I could not acquire the proper libraries to run TeamSpeak. I loaded TeamSpeak and OpenFire on the newly provisioned server as subdomains.
I also continued to monitor the new server’s performance. The CPU (Intel i5 at 3 GHz, dual core, Sandy Bridge architecture) was barely utilized, but during the heavier loads of pulling data, a very surprising bit of data presented itself. The server’s memory usage was just over 3.5 GiB, which overtook the 2 GiB of RAM installed in the machine and spilled into swap. Render times for the forums and killboard were between 15 and 45 seconds per page.
Additionally, between the two servers, it was costing me nearly $200 a month in rental costs to GoDaddy, so I started to research other options. Though they were good servers, they were limiting. They were on shared 100 Mbit lines and seemed to have some traffic shaping done to their connections. Though there were many others who offered better services and hardware, they could not match the price of one server without compromising the hardware itself. Co-location became the only option. I still have personal development servers from Dell and HP on a personal rack located 4 feet from me at the time of writing this entry, and I have worked extensively with both the Dell PowerEdge and HP ProLiant DL series servers, both at work and on machines I own. With co-location, I would be able to dictate my hardware and pay for the hardware upfront. Power, bandwidth, IPs, and rack space would be the main cost (other than the occasional upgrade or maintenance cost for replacement hard drives).
My selection was a simple choice between the Dell PowerEdge 1950 and the HP ProLiant DL360 G5. Both are dual-socket Intel Xeon based servers with SAS based RAID controllers, redundant power supplies, and expandability. Though both are now life-cycled servers with no manufacturer warranty available, they are still reliable and powerful servers. With that, I chose the HP ProLiant DL360 G5 (the second one I own) with the Intel Xeon E5405 (2 GHz, quad core), 16 GB of RAM (8 x 2 GiB DDR2-667 FB-DIMM), a P400 Smart Array with 512 MB of cache and a backup battery, and dual redundant power supplies. I put six Western Digital WD3200BEKT drives into the server in a RAID 6 configuration, with the P400 controller configured for a 20% read / 80% write cache setting.
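To put a number on what that RAID 6 array provides, here is a minimal sketch of the capacity arithmetic. It assumes the WD3200BEKT is the 320 GB model (inferred from the part number, not stated above), and the function name is just for illustration. RAID 6 stores two drives' worth of parity, so the array keeps running with any two drives failed.

```python
def raid6_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Return the usable capacity of a RAID 6 array in GB.

    RAID 6 reserves the equivalent of two drives for parity,
    so usable space is (drive_count - 2) * drive_size_gb.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_size_gb


# Assumption: six drives of 320 GB each, as inferred from the model number.
usable = raid6_usable_gb(6, 320)
print(f"{usable:.0f} GB usable out of {6 * 320} GB raw")
```

The raw figures ignore formatting overhead and the GB versus GiB difference, so the space the OS reports will be somewhat lower.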
With that, I quickly installed CentOS 5.8 64 bit and configured the entire server with PHP 5.3 and MySQL 5.5. I downloaded all databases and static files, and set up the DNS and all services, including OpenFire and TeamSpeak. I then went with FastPCNet for colocation, who were fast at setting up everything. Within a few days, the server was packaged and picked up for shipment on a Saturday, and it was set up on Monday. The site was transferred from the GoDaddy servers with that day’s databases and all DNS entries, including redirecting the domain’s name servers to point at the new server. It was a little bumpy, as not everyone’s DNS servers and caches updated immediately, but within 72 hours, everyone was on the same server. With no more relevant data on the GoDaddy servers, they were canceled, and my total payments now stand at $110 a month for 10 TB of allocated bandwidth, power, 1U of space, and usage on a gigabit line (which is still shared, but we have downloaded at over 800 Mbit per second from another site).
There was one last thing, though, that kept going down after the colocation: the OpenFire XMPP server. This continued for over a month as I did what I could to keep it running, including restarting the server before it crashed. What was more, it crashed due to running out of memory, even when given over 4 GB of RAM and started with an aggressive garbage collector setting. This continued until I finally analyzed several memory dumps taken after crashes. One of the OpenFire server’s services had a memory leak and kept taking up all the memory available to the server’s instance. I finally entered the property into the server as false. This turned off the service that kept crashing the server. The server has remained active since its entry.
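The stopgap of restarting the server before it crashed can be sketched as a simple threshold check. This is only an illustration: the function name, the 90% threshold, and the sample numbers are assumptions, since nothing above says how memory was measured or when the restarts were triggered.

```python
def should_restart(heap_used_mb: float, heap_max_mb: float,
                   threshold: float = 0.9) -> bool:
    """Return True once heap usage reaches the given fraction of the maximum.

    The 0.9 default is a hypothetical value chosen for illustration.
    """
    if heap_max_mb <= 0:
        raise ValueError("heap_max_mb must be positive")
    return heap_used_mb / heap_max_mb >= threshold


# Hypothetical readings against a 4096 MB heap limit.
print(should_restart(2048, 4096))  # half full, leave it running
print(should_restart(3800, 4096))  # close to the limit, restart now
```

A check like this only buys time. The real fix was finding the leaking service in the memory dumps and turning it off.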
Going Beyond the Basics
The last team ran the services as they were, without any modifications or additions. They were rarely active on Jabber and rarely showed up on the TeamSpeak they ran, and they reacted to situations with their services instead of proactively preventing them from becoming an issue. After taking over, I began to do whatever I could to improve the services I provided. This included installing modifications and background services, and writing custom scripts to do tasks, to react to situations that would have taken down services, or even to speed the services up. Modifications include:
- Installing SimplePortal to make use of the forum’s blank space, giving users a side bar with information and a shoutbox that is present on all parts of the forum (it replaced the one installed by the previous team, which was only visible on the front page of the forum).
- Installing an extended award system, fleet and ship statistics on the killmails and related kills.
- Increasing the security and ease of use of the killboard by installing the TSM to authenticate users against the forum, so users can post killmails and comment on them, and to control access from outsiders, as passwords are easily spread and constantly changing them does next to nothing to prevent them from being leaked to outside sources.
- Installing an event tracker through the forum calendar function, and continuing to improve the system by developing an in-house tracking system with a countdown timer until the fleet.
- A TeamSpeak fleet communication link system for temporary communication between non-authenticated personnel, allowing for a quicker fleet form-up without the need to either have the members sign up for TeamSpeak through the forums or manually drag personnel to the proper channels.
- A wiki authentication module for a one-time sign-in system through the forums, so members can quickly get to secure information without the additional work of registering.
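As an illustration of the countdown timer mentioned in the list above, here is a minimal sketch of how the remaining time until a fleet could be formatted. It is not the in-house tracker itself; the function name and the sample dates are hypothetical. Times are handled in UTC, which is what EVE Online uses for its in-game clock.

```python
from datetime import datetime, timedelta, timezone


def countdown(fleet_time: datetime, now: datetime) -> str:
    """Format the time remaining until fleet_time as days, hours, minutes, seconds."""
    remaining = fleet_time - now
    if remaining <= timedelta(0):
        return "Fleet has formed up"
    total_seconds = int(remaining.total_seconds())
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}d {hours:02d}h {minutes:02d}m {seconds:02d}s"


# Hypothetical example: a fleet scheduled 1 day, 2 hours, 30 minutes from "now".
now = datetime(2013, 1, 1, 18, 0, tzinfo=timezone.utc)
fleet = now + timedelta(days=1, hours=2, minutes=30)
print(countdown(fleet, now))  # 1d 02h 30m 00s
```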
One thing that has become essential is a one-time sign-in system through a central authentication system. For convicted, that currently is the forums, as they already provide a secure means of authenticating users through the EVE Online API. Not only that, but being transparent means less hassle for the end users and more time spent on the tasks they are assigned to. With all the services I provide, the leadership and members at any level can quickly communicate through any medium, whether at a workstation or on a mobile device, so their peers and members get the messages and actions just as fast. A computer capable of running EVE Online is no longer needed to communicate, as these services do exactly what was envisioned by CCP through their tools.
To that, I end with the fact that we are privileged to be part of one of the two biggest entities in the EVE Online universe, which has allowed me to perform these actions and provide these services. The actions and effort I put into my work directly affect how my entire alliance performs. If anything I do degrades their performance, then the fleets they run, the things they do, and the actions they take are also vastly affected. It may not destroy or stop the day-to-day operations of the alliance, but it will hurt how well it performs, as a whole or as individuals. The same can be said for any alliance or coalition in EVE Online with a strong IT backbone.