The path to an automated monitoring system

Juliano Martinez, Francisco Freire

Abstract
After years of struggling with manual, hand-crafted monitoring systems, Locaweb reached a point where the number of services, and the amount of data those systems generated, was huge. We needed to keep up with the company's growth, scale the system, and learn from past errors in record time. The challenge: design and implement an automated, integrated monitoring system in a short time. This paper shows how we built our automated monitoring system in less than 3 months, with almost 400k service checks, using cfengine, check_mk, Python, and a home-grown project called leela. We will talk about the challenges we faced in designing, developing, and integrating everything and putting the project into production, and about how we leverage a heuristic to open tickets automatically without flooding our operations team.

1 - Introduction

Monitoring has been one of the biggest problems in system administration for years. “How do we scale monitoring?”, “How do we cover everything that must be monitored?”, “Which alarms are more or less critical?”, “What has to be monitored from the application perspective?”: these questions live in every system administrator's head. Our work focuses on removing false positives from the services and applications being monitored, on having a sound way to calculate a composite SLA, and on using a single solution so that all system administrators speak the same language.

2 - Challenges

Locaweb has a huge environment (more than 6k physical servers and 13k virtual machines), and we have to offer the most recent products and systems. Everything can change overnight and look completely different the next day. Given this requirement, the monitoring system needs to work effectively and grow along with the infrastructure.
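The Introduction names calculating a composite SLA as one of the goals, but this excerpt does not say how Locaweb computes it. As an illustration only, the sketch below uses the common textbook model, assuming that components fail independently: a chain of components that must all be up has availability equal to the product of their availabilities, while redundant replicas (any one suffices) fail only when all of them fail. The function names and the example topology and figures are hypothetical.

```python
from collections.abc import Iterable
from math import prod


def serial_availability(availabilities: Iterable[float]) -> float:
    """Availability when *every* component must be up (serial chain).

    Assumes independent failures: A = a1 * a2 * ... * an.
    """
    return prod(availabilities)


def parallel_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant replicas where *any one* being up is enough.

    Assumes independent failures: A = 1 - (1 - a1) * (1 - a2) * ... * (1 - an).
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical service: load balancer -> two redundant app servers -> database.
app_tier = parallel_availability([0.99, 0.99])
web_service = serial_availability([0.999, app_tier, 0.995])
print(f"composite availability: {web_service:.4%}")
```

With these made-up figures the composite comes out at about 99.39%, lower than any single serial component, which is why a composite SLA cannot simply be read off one check. Real dependencies are rarely fully independent, so treat this as a first approximation.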

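The abstract mentions a heuristic for opening tickets automatically without flooding the operations team; the heuristic itself is not described in this excerpt. The hypothetical gate below shows one common way to get that effect and is not Locaweb's method: open a ticket only after a (host, service) pair has returned CRITICAL for a number of consecutive checks, and open at most one ticket per incident. The class name, the `threshold` parameter, and the choice to reset on any non-CRITICAL result are all assumptions; the state codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) that check_mk also uses.

```python
from dataclasses import dataclass, field

# Nagios-style plugin return states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class TicketGate:
    """Hypothetical debounce: one ticket per incident, only after
    `threshold` consecutive CRITICAL results for the same (host, service)."""

    threshold: int = 3
    _streak: dict[tuple[str, str], int] = field(default_factory=dict, init=False)
    _open: set[tuple[str, str]] = field(default_factory=set, init=False)

    def observe(self, host: str, service: str, state: int) -> bool:
        """Record one check result; return True if a new ticket should be opened."""
        key = (host, service)
        if state != CRITICAL:
            # Assumption: any non-CRITICAL result ends the streak and the incident.
            self._streak.pop(key, None)
            self._open.discard(key)
            return False
        self._streak[key] = self._streak.get(key, 0) + 1
        if self._streak[key] >= self.threshold and key not in self._open:
            self._open.add(key)  # suppress duplicate tickets for this incident
            return True
        return False


gate = TicketGate(threshold=3)
results = [CRITICAL, CRITICAL, CRITICAL, CRITICAL, OK, CRITICAL]
opened = [gate.observe("web01", "http", s) for s in results]
```

Here a ticket is requested only on the third consecutive CRITICAL; the fourth is suppressed as part of the same incident, and the OK result resets the counter. Transient one-off failures never reach the operations team, at the cost of a delay of `threshold` check intervals before a real outage is ticketed.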