Tranquility’s Many Upgrades

April 2, 2016

Tranquility, or TQ for short, has had two previous major upgrades. The first was in 2010 and the other in 2011.

2010 UPGRADE

On a cloudy Wednesday, the 23rd of June 2010 (coinciding with the eruption of Eyjafjallajökull), the core of Eve Online, the server bank, was moved. For 8 years the servers had lived in a handful of cabinets in a London data centre. However, with the large influx of new players, CCP decided that the game had outgrown them and upgraded. The new installation supplied 79 kW of power to 12 cabinets and allowed CCP to move all Eve services (TQ, SiSi, Website, Forums, Account Management) into a single location, with the aim of reducing server latency and bottlenecks.

The main parts of the move that are documented, and therefore the ones I’ll be exploring, are:

  • Heat Management
  • Networking
  • Final Specs

Prior to the 2010 upgrade, Eve’s servers had used ambient cooling, which is highly inefficient because the cool air is not directed at the servers (think of air conditioning). As part of the 2010 upgrade, the servers moved to a closed-aisle cooling system, where cold air from the aisle is directed into the cabinets, increasing efficiency.

Because the majority of traffic on TQ happens internally, most network and performance bottlenecks reside in the internal switching cards. During the 2010 upgrade the TQ servers got an 800% increase in capacity, along with an upgrade to their C7600s and RSP720s with some Cisco DFC3s. Techies, see the photo below.

Finally, the upgraded specs of Eve’s TQ server:
  • 64 x IBM HS21, each with:
    • 2 x Dual-Core 3.33GHz CPUs
    • 32GB RAM
    • 72GB HDD
  • 2 x IBM X3850 M2, each with:
    • 2 x 6-Core 2.66GHz CPUs
    • 128GB RAM
    • 4 x 146GB HDD
Leaving us with a Total of:
  • Processors: 280 Cores
  • Clock Speed: ~1THz
  • RAM: 2.3TB Locally with 256GB in SAN
  • Storage: 4.8TB Locally with 2TB SSD’s in SAN
  • Network: Gigabit Ethernet 4Gb/s
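
The core, clock speed and RAM totals can be reproduced from the per-machine figures in the spec list. The snippet below is only a rough sketch of that arithmetic: the dictionary layout and function names are made up for illustration, and "aggregate clock speed" is simply cores multiplied by GHz, which is a headline number rather than a real measure of performance.

```python
# Counts and per-machine figures copied from the 2010 spec list.
# The dict keys are illustrative names, not anything CCP published.
blades = {"count": 64, "cpus": 2, "cores_per_cpu": 2, "ghz": 3.33, "ram_gb": 32}
x3850s = {"count": 2, "cpus": 2, "cores_per_cpu": 6, "ghz": 2.66, "ram_gb": 128}


def cores(machine):
    """Total physical cores across all machines of this type."""
    return machine["count"] * machine["cpus"] * machine["cores_per_cpu"]


def aggregate_ghz(machine):
    """Sum of the clock speeds of every core (a headline figure only)."""
    return cores(machine) * machine["ghz"]


def ram_gb(machine):
    """Total local RAM across all machines of this type."""
    return machine["count"] * machine["ram_gb"]


machines = [blades, x3850s]
total_cores = sum(cores(m) for m in machines)
total_ghz = sum(aggregate_ghz(m) for m in machines)
total_ram_gb = sum(ram_gb(m) for m in machines)

print(total_cores)                 # 280
print(round(total_ghz / 1000, 2))  # 0.92 (THz, i.e. roughly 1THz)
print(total_ram_gb / 1000)         # 2.304 (TB, i.e. roughly 2.3TB)
```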

The next major server upgrade came in March 2011, this time focused mainly on the database servers. Iceland’s volcanic god evidently agreed that it was for the better and let the upgrade go ahead without an eruption.

2011 UPGRADE

 

The main parts of this update were:

  • Storage Network Bandwidth
  • RAM
  • CPU’s
  • Storage
Storage Network Bandwidth:

CCP had previously used two aging 4Gb/s connections. By 2011, improved and refined technology allowed them to upgrade to a more future-proof set of four 8Gb/s connections, taking the total from 8Gb/s to 32Gb/s (four times the bandwidth, a 300% increase).

RAM:

Prior to this upgrade CCP had used 128GB of the then industry-standard DDR2 RAM. They decided to quadruple this to 512GB of DDR3 RAM, which increased the buffer cache hit ratio (basically a little counter of how often the server finds data in the memory buffer instead of going to disk; using the buffer is much faster because the data is already there). Overall, CCP upgraded from 128GB of DDR2 to 512GB of DDR3, and from 111GB of DB cache to 460GB.
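
To see why a bigger buffer raises that ratio, here is a toy simulation. It is only a sketch under loud assumptions: it uses a simple least-recently-used (LRU) buffer and a made-up workload that cycles over 8 pages, whereas a real database's buffer manager and TQ's real access patterns are far more sophisticated. The function and variable names are hypothetical.

```python
from collections import OrderedDict


def hit_ratio(page_requests, cache_pages):
    """Fraction of requests served from an LRU buffer holding `cache_pages` pages."""
    cache = OrderedDict()
    hits = 0
    for page in page_requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True             # "read from disk" into the buffer
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return hits / len(page_requests)


# Hypothetical workload: loop over the same 8 pages ten times.
requests = [n % 8 for n in range(80)]

small = hit_ratio(requests, cache_pages=4)  # buffer too small for the working set
large = hit_ratio(requests, cache_pages=8)  # buffer fits the whole working set

print(small)  # 0.0
print(large)  # 0.9
```

With the small buffer every page is evicted before it is needed again, so every request goes to "disk". Once the buffer is large enough to hold the whole working set, only the first pass misses.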

CPU:

TQ’s non-hyper-threaded X7460s gave the database threading issues. To solve this, TQ got some shiny new X7560s, which gave TQ two more cores per chip plus hyper-threading, increasing the number of logical processors from 12 to 32. This brought the backup process down from 90% CPU usage to under 55%.

CPU Usage before upgrade

CPU Usage After the Upgrade

As you can see, it’s a huge increase in performance.

Storage:

CCP had previously used RamSan 500s, which are actually a really good piece of tech. Unfortunately, they are limited in how far they can be scaled, have security flaws and are quite large. CCP moved database storage to the IBM V7000 with 18 high-performance SSDs and 72 15K RPM SAS drives.

This meant that CCP had upgraded its transfer speed from 2.9Gb/s to 10Gb/s, doubled its total SSD capacity, and increased aggregate storage from 2TB of non-redundant hardware to 11.5TB of one-for-one redundant hardware.

Before Upgrade:

After Upgrade:

 

So with the 2011 upgrade the System looked like this:

Per Server Specs:

  • 2 x Octo-Core Intel X7560s @ 2.26GHz
  • 32 x 16GB DDR3 RAM
  • 4 x 8Gb/s Fibre Channel cards
  • 4 x 15K 300GB SAS Internal Drives
  • 9 x 300GB SSD’s in RAID 5
  • 36 x 600GB SAS Drives in RAID 10

Overall System Specs

  • 1TB RAM
  • 64 Processors with Hyper-Threading Enabled
  • 32Gb/s Storage Throughput
  • 51TB Raw Storage
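
The overall figures follow from the per-server list if you assume there are two database servers (the 2011 vs. 2016 comparison at the end of this article lists the database server as 2x). The sketch below just does that multiplication. The variable names are illustrative, and it assumes the "64 Processors" figure counts logical (hyper-threaded) processors and that the 32Gb/s throughput is per server.

```python
SERVERS = 2  # assumption: two database servers

# Per-server figures from the list above.
cpus, cores_per_cpu, threads_per_core = 2, 8, 2  # hyper-threading: 2 threads per core
dimms, dimm_gb = 32, 16
fc_cards, fc_gbps = 4, 8
# (drive count, size in GB): internal SAS, SSDs, SAS drives
drives = [(4, 300), (9, 300), (36, 600)]

logical_processors = SERVERS * cpus * cores_per_cpu * threads_per_core
ram_gb = SERVERS * dimms * dimm_gb
fc_gbps_per_server = fc_cards * fc_gbps
raw_tb = SERVERS * sum(count * size for count, size in drives) / 1000

print(logical_processors)   # 64
print(ram_gb)               # 1024 (GB, i.e. 1TB)
print(fc_gbps_per_server)   # 32
print(raw_tb)               # 51.0
```

Note that this is raw capacity. Usable space after RAID 5 and RAID 10 overhead is lower.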

Eve TQ has traditionally used IBM Blade servers. However, for the first time in Eve’s history, CCP has upgraded the servers to IBM FLEX. But what does this even mean?

Well, for example, TQ’s Blade servers run 4 x 1Gb network connections and each server has access to 2 x 1Gb, as each has 2 network cards (think of these as USB dongles, but plugged into the green thing inside the computer). The FLEX has 4 x 10Gb network connections and each server has access to 2 x 10Gb of throughput.

This is actually quite overkill for EVE in its current state. However, it saves CCP from having to upgrade the servers for at least the next couple of years, and lets the system scale better at upgrade time without ripping out the entire thing (think of changing a key on a keyboard instead of replacing the entire keyboard).

The new server:

 

CCP currently has 3 Main DB Clusters:

Tranquility: 2 CPU’s with 32 Hyper Threaded Cores

Website: 2 CPU’s with 24 Hyper Threaded Cores

Account and Payment: 2 CPU’s with 24 Hyper Threaded Cores

These servers run on hardware from the 2011 update and they rightly deserve to be replaced.

The X7560s, running at a clock speed of 2.26GHz, will be replaced by Intel’s new E7-8893 v3, which runs at 3.2GHz according to the specifications on Intel’s website. This allows for a roughly 42% increase in clock speed, a 75% increase in memory bus speed and a 123% increase in RAM, up to 1.5TB, which is more RAM than many modern computers have in disk space.
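
For anyone wanting to check percentages like these, the usual formula is (new - old) / old. The snippet below is a small sketch applying it to figures quoted in this article: the clock speeds above, the database server memory speeds (1066MHz and 1866MHz) from the comparison at the end, and the 8Gb/s to 32Gb/s storage network jump from 2011. The function name is made up, and percentages quoted elsewhere may have been rounded or measured against a different baseline.

```python
def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100


clock = percent_increase(2.26, 3.2)        # GHz: X7560 -> E7-8893 v3
memory_bus = percent_increase(1066, 1866)  # MHz: database server RAM speed
bandwidth = percent_increase(8, 32)        # Gb/s: 2011 storage network upgrade

print(round(clock, 1))       # 41.6
print(round(memory_bus, 1))  # 75.0
print(round(bandwidth, 1))   # 300.0
```

The last line is a handy reminder that quadrupling something is a 300% increase, not a 400% one.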

Then there’s virtualization. I won’t cover this in depth, as it’s hard to explain without going into a lot of detail, but the basic gist is this: when TQ servers need maintenance, instead of running a two-node cluster on a single node, which leaves a single point of failure, CCP vMotions the passive node to another host. That means two passive nodes will be a little over-allocated, but two more hosts would have to fail before that becomes an issue.

Finally, the 2011 specs vs. the 2016 specs:

Standard SOL Nodes
  • TQ Tech II (2011): 51 x IBM HS21XM - Intel X5260 CPU @ 3.33GHz with 32GB RAM (1333MHz)
  • TQ Tech III (2016): 30 x IBM x240 - Intel E5-2637 v3 CPU @ 3.5GHz with 64GB RAM (2133MHz)

Enhanced SOL Nodes
  • TQ Tech II (2011): 9 x IBM HS23 - Intel Xeon CPU @ 3.30GHz with 64GB RAM (1333MHz)
  • TQ Tech III (2016): 6 x IBM x240 - Intel E5-2667 v3 CPU @ 3.2GHz with 128GB RAM (2133MHz)

Everest Node
  • TQ Tech II (2011): NDA
  • TQ Tech III (2016): N/A

DUST514 Proxy Node
  • TQ Tech II (2011): 4 x IBM HS22 - Intel X5687 @ 3.60GHz with 24GB RAM (1333MHz)
  • TQ Tech III (2016): Will run on Eve proxies

EVE Proxy Node
  • TQ Tech II (2011): 5 x IBM HS23 - Intel E5-2643 0 @ 3.30GHz with 64GB RAM (1600MHz)
  • TQ Tech III (2016): 6 x IBM x240 - Intel E5-2667 v3 CPU @ 3.2GHz with 128GB RAM (2133MHz)

ESXi / Virtual hosts
  • TQ Tech II (2011):
    • 4 x IBM HS 22 - Intel E5620 @ 2.4GHz with 146GB RAM (1333MHz)
    • 3 x IBM HS 22 - Intel E2640 @ 2.6GHz with 146GB RAM (1333MHz)
    • 2 x IBM HS 22 - Intel X5690 @ 3.4GHz with 96GB RAM (1333MHz)
  • TQ Tech III (2016): 6 x IBM Flex x240 - Intel E5 2640 v3 @ 2.6GHz with 386GB RAM (2133MHz)

Database Server
  • TQ Tech II (2011): 2 x IBM X3960 X5 - Intel X7560 @ 2.26GHz with 512GB RAM (1066MHz)
  • TQ Tech III (2016): 2 x X880 X6 FlexNode - Intel E7-8893 v3 @ 3.2GHz with 768GB RAM (1866MHz)