
Operations And Maintenance Strategies For Data Centers

Long before planning and construction have even started on a hyperscale data center, the owners should be thinking about how they are going to keep it operating efficiently. Because most new data centers are built to take on more than one client, owners have to determine whether to staff the center with direct employees or hire a third-party company to run it. Sometimes there is a combination of the two, or even more than one third-party company, each tasked with different aspects of operation and maintenance.





When hiring a company to manage the center, owners need someone who understands the full context of the center and what needs to be done to ensure minimal downtime while keeping maintenance and repair costs low. It should always be kept in mind that operations and maintenance go hand in hand for data centers.

There are three main components to the operation and maintenance of a data center: 

Equipment Operation and Management 

  • Cables 

  • Servers 

  • Software 

Facilities Operation and Management 

  • Power Systems 

  • Cooling Systems 

  • Building Systems (lighting, plumbing, structural) 

Security 

  • Physical plant security 

  • Securing IT 

Managers for all of the maintenance and operations (M&O) at a data center must be guided by a master plan and a program director who understands the “big picture.” For instance, replacing something as simple as a light fixture might need advance planning if power interruptions are expected.

Everyone on an M&O team should understand the context of the center’s mission: to ensure optimal and continuous operation.

Maintenance and Operations 

Operations are the overall processes that keep a business or location functioning properly on a continuous basis. They touch on every aspect of a company or business location, from ensuring it is staffed to paying the bills and invoicing clients.

No matter the system in any company operation, maintenance is a key component. This could be something as trivial as regular updates of software systems or changing air filters in an HVAC system. The tasks may not be complex, but they are important to operations. 

Maintenance is the regular practice of inspecting, servicing, monitoring, cleaning, and repairing hardware, software, and building systems such as mechanical, electrical, and HVAC. For data centers, there are also extensive challenges with telecommunications cables, network infrastructure, and other IT equipment.

Importance of Data Center Maintenance 

Maintenance is important for operations, but especially so for data centers. Maintenance failures could result in such issues as power outages, equipment failure, accumulation of dust and dirt, cable issues, software incompatibility, and security vulnerabilities. All of these could result in expensive downtime on key systems.

Failure to maintain means that when those problems arise, they are addressed with suboptimal solutions because a repair needs to be made quickly and sometimes without the right tools or replacement parts. It is always better to be proactive rather than waiting for problems to arise before fixing them. The key, then, is a robust preventative maintenance program. 

An effective preventative maintenance program includes regularly scheduled tasks such as hardware inspections, software updates, cleaning, equipment testing and adjustments, and part replacement. For example, air filters on ventilation systems are regularly replaced. Doing so reduces the risk of motors burning out, something that could lead to other catastrophic failures.

The purpose of preventative maintenance is to extend equipment life and prevent the effects of aging and latent failures.  
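
The regularly scheduled tasks described above can be tracked with very little tooling. The following is only a minimal sketch in Python: the task names come from the list above, but the intervals, the `INTERVALS_DAYS` table, and the `tasks_due` function are hypothetical and would be set by each facility's own maintenance plan.

```python
from datetime import date, timedelta

# Hypothetical intervals in days; real values depend on the equipment
# and on manufacturer or facility guidance.
INTERVALS_DAYS = {
    "air filter replacement": 90,
    "hardware inspection": 30,
    "software update": 30,
}


def tasks_due(last_done, today, intervals=INTERVALS_DAYS):
    """Return the tasks whose interval has elapsed since they were last done."""
    due = []
    for task, interval in intervals.items():
        if today - last_done[task] >= timedelta(days=interval):
            due.append(task)
    return sorted(due)
```

Calling `tasks_due` with a dictionary of last-completed dates and today's date returns the tasks that should be put on the work schedule.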


Other Maintenance Types 

While preventative maintenance can be effective, it also has its drawbacks. The most notable arises when a preventative maintenance schedule calls for the replacement of a part regardless of whether it is close to failure. This can lead to increased operational costs.

Other maintenance systems add more context to recurring maintenance. Reliability-Centered Maintenance takes into account critical areas and is tailored to each piece of equipment, determining likelihood of failure and importance to total mission.  

For instance, a cooling fan in a non-critical area may only be replaced upon failure, whereas a cooling fan in a server rack should be replaced more frequently and not just when it stops working. Experts agree that this type of maintenance is more cost efficient than replacing every part on a fixed schedule.
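
The cooling fan comparison can be sketched as a simple decision rule. This is an illustration only, not a standard Reliability-Centered Maintenance method: the `Asset` class, the 0-to-1 scores, the multiplication of likelihood by criticality, and the `threshold` value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    failure_likelihood: float   # assumed estimate between 0 and 1
    mission_criticality: float  # assumed score between 0 and 1


def choose_strategy(asset, threshold=0.25):
    """Pick a maintenance strategy from a simple risk score.

    The score (likelihood times criticality) and the threshold are
    hypothetical; a real program would derive them from failure data.
    """
    risk = asset.failure_likelihood * asset.mission_criticality
    if risk >= threshold:
        return "scheduled replacement"
    return "replace on failure"
```

With these assumed scores, a fan in a non-critical area lands on "replace on failure" while the same fan model in a server rack lands on "scheduled replacement", because only the criticality differs.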

Predictive Maintenance is another approach. It relies on aggressive monitoring of data center systems combined with complex analytics. The idea is to identify potential failures before they occur. Even something as simple as monitoring vibration patterns on a cooling system can help predict a failure.
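
As a rough illustration of vibration monitoring, the sketch below flags readings that drift far from a baseline taken while the equipment was known to be healthy. It is a deliberately simple stand-in for the complex analytics mentioned above: the function name, the baseline size, and the limit of three standard deviations are assumptions, and production systems typically use far richer models.

```python
from statistics import mean, stdev


def flag_vibration_anomalies(readings, baseline_size=10, z_limit=3.0):
    """Return indices of readings that deviate strongly from the baseline.

    The first `baseline_size` readings are assumed to represent healthy
    operation; later readings are compared against their mean and
    standard deviation.
    """
    baseline = readings[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for i in range(baseline_size, len(readings)):
        deviation = abs(readings[i] - mu)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is unusual.
            if deviation > 0:
                flagged.append(i)
        elif deviation / sigma > z_limit:
            flagged.append(i)
    return flagged
```

A flagged index does not prove that a failure is coming; it only marks a reading worth inspecting before the equipment actually stops working.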

While more expensive to implement, Predictive Maintenance greatly reduces the risk of equipment failure.  

Operations Depend on Maintenance 

Each maintenance approach has its pros and cons, which is why most data center management teams depend on a combination of all three.

Operations and maintenance for data centers require thoughtful planning and knowledgeable professionals who understand how different data center functions work together.

Critical Energy Infrastructure Services (CEIS) is a nation-wide leader in providing safety and quality assurance for the data center industry. Contact Corey Englebrake at 505-220-3022 or corey.englebrake@ceis.com for more information and start the process of making your project safer. 

 
 
 
