Ansible Performance

From tannerjc wiki

strategies

getting started

many tasks

many hosts

many groups

many vars

metrics

duration

Duration is elapsed wall-clock time. When observing Ansible, "duration" can be measured at several levels:

  • total duration of a playbook
  • total duration of a task
  • total duration of a host within a task
  • total duration of an ssh call for a host within a task
  • total duration of waiting for a sudo password prompt
  • total duration of executing python on the remote host
  • total duration of processing the results for a host/worker
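The durations listed above nest inside one another: a playbook contains tasks, a task contains hosts, and a host contains an ssh call and the remote python run. A minimal sketch of accumulating them per level follows. It does not use any Ansible API; the `timed` helper, the labels, the host names, and the sleeps standing in for real work are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per label. The labels are illustrative, not Ansible names.
durations = defaultdict(float)


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to durations[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[label] += time.perf_counter() - start


def run_fake_task(hosts):
    """Stand-in for one task; the sleeps simulate ssh and remote python work."""
    with timed("task"):
        for host in hosts:
            with timed(f"host:{host}"):
                with timed("ssh"):
                    time.sleep(0.01)
                with timed("remote_python"):
                    time.sleep(0.02)


with timed("playbook"):
    run_fake_task(["web1", "web2"])

for label, seconds in sorted(durations.items()):
    print(f"{label:>14}: {seconds:.3f}s")
```

Comparing an outer duration with the sum of the inner ones shows how much time went to overhead at that level, which is often the interesting number.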

cpu utilization

Also known as CPU time. Many tools measure CPU time and utilization, and it is important to understand what each of the metrics they report means.

https://access.redhat.com/solutions/1160343

An important metric for basic Ansible CPU utilization is the "b" (aka "blocked") column from vmstat, which counts processes in uninterruptible sleep.

https://linux.die.net/man/8/vmstat
http://www.dba-oracle.com/t_linux_oracle_vmstat.htm
https://access.redhat.com/solutions/792683
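As a rough sketch, the "b" column can be pulled out of vmstat output with a few lines of Python so it can be logged alongside a playbook run. The helper names `blocked_counts` and `sample_vmstat` are hypothetical, the sample text is made up for illustration and is not a real measurement, and the parser assumes the default procps layout where a header row containing "r" and "b" precedes the numeric rows.

```python
import subprocess


def blocked_counts(vmstat_output):
    """Return the value of the "b" column for every numeric row of vmstat output."""
    counts = []
    b_index = None
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "r" in fields and "b" in fields:
            # Header row: remember where the "b" column sits.
            b_index = fields.index("b")
        elif (
            b_index is not None
            and len(fields) > b_index
            and all(f.isdigit() for f in fields)
        ):
            counts.append(int(fields[b_index]))
    return counts


def sample_vmstat(interval=1, count=5):
    """Run `vmstat <interval> <count>` on this machine and return the "b" values."""
    result = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True,
        text=True,
        check=True,
    )
    return blocked_counts(result.stdout)


# Made-up sample in the default vmstat layout, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812340  20480 402112    0    0     5    10  100  200  5  2 92  1  0
 9  3      0 795112  20480 402560    0    0   120   640  850 1900 60 25  5 10  0
"""

print(blocked_counts(SAMPLE))
```

On the controller, `sample_vmstat()` could be called in a loop while a playbook runs; a "b" value that keeps climbing points at one of the resource shortages discussed in this section.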

Multiple factors could cause the number of blocked processes to accumulate.

  • not enough cpu cores
    • setting ansible's fork count too high
  • not enough memory
    • too many hosts returning too much data for the controller to handle
  • not enough disk IOPS

memory utilization

disk utilization

network utilization

tools

https://docs.ansible.com/ansible/devel/plugins/callback/cgroup_memory_recap.html
https://github.com/jctanner/ansible-tools/blob/master/ansible_debug_logparser

https://github.com/ansible/qa-scale-lab
https://github.com/jctanner/ansible-tools/tree/master/vagrant/ansible_test_inventory

labs

https://github.com/jctanner/ansible-tools/tree/master/playbooks/slowhost

training

https://www.redhat.com/en/services/training/rh442-red-hat-enterprise-performance-tuning
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/chap-red_hat_enterprise_linux-performance_tuning_guide-performance_monitoring_tools