Nagendravishnu K

Senior Site Reliability Engineer

Building resilient, scalable infrastructure with a passion for observability, automation, and engineering excellence.

Observability: Real-time Insights
Automation: Reduce Toil
Kubernetes: Cloud Native

About Me

I'm a results-driven Site Reliability Engineer with 6+ years of experience building and maintaining highly scalable, resilient cloud infrastructure. My journey started with traditional operations and evolved into architecting modern cloud-native solutions.

I specialize in transforming infrastructure through automation, reducing manual toil by 80%+, and establishing best practices for reliability. I'm passionate about observability, GitOps, and fostering a culture of continuous improvement.

When I'm not optimizing systems, you'll find me exploring new technologies, mentoring junior engineers, and contributing to the SRE community.

6+ Years Experience
100+ Microservices Managed
99.99% Uptime SLA
60% MTTR Reduction

Experience

Senior Site Reliability Engineer

Nov 2023 - Present

Cognizant Technology Solutions, Coimbatore, TN

  • Architected and deployed Kubernetes clusters managing 100+ microservices with 99.99% uptime SLA
  • Implemented OpenTelemetry-based observability stack with Datadog, reducing MTTR by 60%
  • Led GitOps transformation using ArgoCD for continuous deployment, achieving zero-downtime deployments
  • Designed an SLO/SLI framework, resulting in a 40% reduction in customer-impacting incidents
  • Automated Datadog monitoring setup with a Python Flask application, enabling self-service monitoring for 50+ teams
  • Orchestrated migration from Bamboo to GitHub Actions, consolidating CI/CD toolchain
  • Implemented eBPF-based network observability for real-time performance insights
  • Reduced infrastructure costs by 35% through rightsizing and spot instance optimization

Site Reliability Engineer

Jan 2020 - Nov 2023

Tata Consultancy Services Ltd, Chennai, TN

  • Built and maintained production infrastructure supporting 24/7 availability for an e-commerce platform processing $50M+ in annual transactions
  • Designed an alerting and dashboard ecosystem using Splunk, New Relic, and Azure Monitor with ML-based anomaly detection
  • Managed the on-premises-to-Azure migration of 200+ application components using Infrastructure as Code (Terraform)
  • Engineered CDN configuration management (Akamai), resulting in a 40% improvement in global content delivery
  • Developed a Node.js automation tool integrated with Jenkins for data pipeline automation, reducing manual effort by 80%
  • Optimized logging pipeline using Splunk, reducing costs by $100K annually

Skills & Expertise

Container Orchestration

Kubernetes, Docker, Helm, Container Registry

Cloud Platforms

Microsoft Azure, AWS, Multi-cloud

Infrastructure as Code

Terraform, Ansible, CloudFormation, Helm Charts

CI/CD & GitOps

GitHub Actions, Jenkins, ArgoCD, Flux CD

Observability & Monitoring

Datadog, Prometheus, Grafana, Splunk, OpenTelemetry, eBPF

Programming Languages

Python, Bash/Shell, Node.js, SQL, Go

DevOps Tools

GitHub, JIRA, ServiceNow, PagerDuty

Advanced Topics

SLO/SLI Metrics, Incident Management, Cost Optimization, Blameless Post-mortems

Certifications

Azure AZ-900

Microsoft Certified: Azure Fundamentals

Azure AI-900

Microsoft Certified: AI Fundamentals

SRE Practitioner

NIIT & Cognizant

Kubernetes (In Progress)

Linux Foundation

Get In Touch

Location

Coimbatore, Tamil Nadu, India