I have 5 years of experience in Kubernetes operations and development, as part of a 5-year career in operations overall, with expertise in Kubernetes source code and operator development. Passionate about open source, I actively contribute to projects such as Cilium, Autoscaler (VPA), and Karmada. I am enthusiastic about the internet, enjoy exploring new technologies, and continually push myself beyond my comfort zone.

I have intermediate proficiency in Go and am currently seeking Kubernetes-related development roles at cloud vendors. Feel free to reach out if you have a suitable position available.

  • Designed and developed a CNI plugin based on Tencent Cloud Elastic Network Interface (ENI). It addresses a limitation of the original network plugin, which allocated subnets per node so that subnet size capped the number of nodes in the cluster. The architecture follows an operator and agent model and uses policy routing on the nodes.
  • Reduced Kubernetes cluster costs by 15% by developing resource recommendation features based on the VPA algorithm, including the related operators.
  • Implemented a descheduler node-balancing plugin that is aware of real workload, to address overloaded (hot) nodes and improve application stability.
  • Handled cluster operations, including deployment, stability optimization, node scaling, migrating monitoring from Thanos to VictoriaMetrics, and alert handling.

Led the adoption of Kubernetes from scratch, including cluster setup, support for business migration, and platform feature design.

  • Researched existing container usage (services were deployed with Docker), identified pain points (no auto-scaling and no graceful deployment), and designed the platform architecture and release system features around the principle that “developers only need to care about application configuration, status, and resource usage”.
  • Set up both self-built Kubernetes clusters and managed Kubernetes clusters on public clouds.
  • Designed the platform-level Kubernetes architecture and migration plans, and enabled traffic access to existing non-Kubernetes services while minimizing changes to existing systems.
  • Built supporting systems around Kubernetes, such as monitoring and traffic access, and integrated the existing build, logging, and alerting systems.
  • After launch, onboarded 500 pods and more than 20 nodes. Releases went from degrading service to rolling out smoothly, scaling up and down became fast, and average release time dropped from half an hour to 2 minutes.

Responsible for operations related to live streaming services.

  • Supported the migration of services from physical machines to Kubernetes clusters and resolved post-migration performance issues by tuning application parameters and enabling CPUSet, reducing PHP application latency within Kubernetes from around 20 ms to about 5 ms.
  • Diagnosed and resolved production issues (e.g., slow Redis, Nginx retries causing request amplification).
  • Participated in fault diagnosis and resolution, authored post-mortem reports, and tracked subsequent optimizations.

Responsible for overall company operations.

  • Standardized server operating systems.
  • Built, operated, and optimized various systems such as Nginx, Redis, release systems, and ELK.
  • Designed and implemented GitLab CI + Docker continuous deployment for testing environments, enhancing operational automation.
  • Organized servers and standardized operations.
  • Managed version deployments, changes, troubleshooting, and issue resolution.
  • Wrote scripts for routine tasks such as database backups.