Apache Mesos Monitoring
Powerful performance with an easy integration, powered by Telegraf, the open source data connector built by InfluxData.
5B+ Telegraf downloads
#1 time series database (Source: DB Engines)
1B+ downloads of InfluxDB
2,800+ contributors
Powerful Performance, Limitless Scale
Collect, organize, and act on massive volumes of high-velocity data. Any data is more valuable when you think of it as time series data with InfluxDB, the #1 time series platform built to scale with Telegraf.
See Ways to Get Started
Apache Mesos is an open-source project to manage computer clusters. It abstracts CPU, memory, storage and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to be built and run effectively.
Why use the Apache Mesos Telegraf Plugin?
The Apache Mesos Telegraf Plugin allows you to collect observability metrics provided by the Mesos master and agent nodes and insert them into your InfluxDB instance. The plugin can collect a set of metrics that enable cluster operators to monitor resource usage and detect issues before they become a problem.
How to monitor Apache Mesos using the Telegraf plugin
The Apache Mesos Telegraf Plugin collects metrics from Apache Mesos and inserts them into InfluxDB. By default, the plugin is not configured to gather metrics from Mesos, because a cluster can be deployed in numerous ways. You will need to specify the master and agent (slave) nodes that the plugin should gather metrics from.
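As a rough illustration of that configuration step, the sketch below renders a Telegraf `[[inputs.mesos]]` section from lists of master and agent URLs. The option names (`timeout`, `masters`, `slaves`) are recalled from the plugin's sample configuration and should be checked against the Telegraf version in use. The helper `render_mesos_input` and the example host names are hypothetical; ports 5050 and 5051 are the usual Mesos master and agent defaults.

```python
def render_mesos_input(masters, slaves=(), timeout_ms=100):
    """Return a TOML snippet for Telegraf's Mesos input plugin.

    Hypothetical helper: option names are assumed from the plugin's
    sample configuration, not taken from this page.
    """
    def toml_list(urls):
        # Render a Python sequence of URLs as a TOML array of strings.
        return "[" + ", ".join(f'"{u}"' for u in urls) + "]"

    lines = [
        "[[inputs.mesos]]",
        "  ## Timeout, in ms.",
        f"  timeout = {timeout_ms}",
        "  ## Mesos master nodes to scrape.",
        f"  masters = {toml_list(masters)}",
    ]
    if slaves:
        # Agents are optional; omit the key entirely when none are given.
        lines += [
            "  ## Mesos agent (slave) nodes to scrape.",
            f"  slaves = {toml_list(slaves)}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example host names are placeholders, not real endpoints.
    print(render_mesos_input(
        ["http://mesos-master.example:5050"],
        ["http://mesos-agent.example:5051"],
    ))
```

The rendered text can be pasted into a Telegraf configuration file. Listing only masters is enough to collect the master metric groups described below.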
Key Apache Mesos metrics to use for monitoring
Some of the important Apache Mesos metrics that you should proactively monitor include:
Resources:
master/cpus_percent: Percentage of allocated CPUs
master/cpus_used: Number of allocated CPUs
master/cpus_total: Number of CPUs
master/cpus_revocable_percent: Percentage of allocated revocable CPUs
master/cpus_revocable_total: Number of revocable CPUs
master/cpus_revocable_used: Number of allocated revocable CPUs
master/disk_percent: Percentage of allocated disk space
master/disk_used: Allocated disk space in MB
master/disk_total: Disk space in MB
master/disk_revocable_percent: Percentage of allocated revocable disk space
master/disk_revocable_total: Revocable disk space in MB
master/disk_revocable_used: Allocated revocable disk space in MB
master/gpus_percent: Percentage of allocated GPUs
master/gpus_used: Number of allocated GPUs
master/gpus_total: Number of GPUs
master/gpus_revocable_percent: Percentage of allocated revocable GPUs
master/gpus_revocable_total: Number of revocable GPUs
master/gpus_revocable_used: Number of allocated revocable GPUs
master/mem_percent: Percentage of allocated memory
master/mem_used: Allocated memory in MB
master/mem_total: Memory in MB
master/mem_revocable_percent: Percentage of allocated revocable memory
master/mem_revocable_total: Revocable memory in MB
master/mem_revocable_used: Allocated revocable memory in MB
Master
master/elected: Whether this is the elected master
master/uptime_secs: Uptime in seconds
System
system/cpus_total: Number of CPUs available in this master node
system/load_15min: Load average for the past 15 minutes
system/load_5min: Load average for the past 5 minutes
system/load_1min: Load average for the past minute
system/mem_free_bytes: Free memory in bytes
system/mem_total_bytes: Total memory in bytes
Slaves
master/slave_registrations
master/slave_removals
master/slave_reregistrations
master/slave_shutdowns_scheduled
master/slave_shutdowns_canceled
master/slave_shutdowns_completed
master/slaves_active
master/slaves_connected
master/slaves_disconnected
master/slaves_inactive
master/slave_unreachable_canceled
master/slave_unreachable_completed
master/slave_unreachable_scheduled
master/slaves_unreachable
frameworks
master/frameworks_active
master/frameworks_connected
master/frameworks_disconnected
master/frameworks_inactive
master/outstanding_offers
framework offers
master/frameworks/subscribed
master/frameworks/calls_total
master/frameworks/calls
master/frameworks/events_total
master/frameworks/events
master/frameworks/operations_total
master/frameworks/operations
master/frameworks/tasks/active
master/frameworks/tasks/terminal
master/frameworks/offers/sent
master/frameworks/offers/accepted
master/frameworks/offers/declined
master/frameworks/offers/rescinded
master/frameworks/roles/suppressed
tasks
master/tasks_error
master/tasks_failed
master/tasks_finished
master/tasks_killed
master/tasks_lost
master/tasks_running
master/tasks_staging
master/tasks_starting
master/tasks_dropped
master/tasks_gone
master/tasks_gone_by_operator
master/tasks_killing
master/tasks_unreachable
messages
master/invalid_executor_to_framework_messages
master/invalid_framework_to_executor_messages
master/invalid_status_update_acknowledgements
master/invalid_status_updates
master/dropped_messages
master/messages_authenticate
master/messages_deactivate_framework
master/messages_decline_offers
master/messages_executor_to_framework
master/messages_exited_executor
master/messages_framework_to_executor
master/messages_kill_task
master/messages_launch_tasks
master/messages_reconcile_tasks
master/messages_register_framework
master/messages_register_slave
master/messages_reregister_framework
master/messages_reregister_slave
master/messages_resource_request
master/messages_revive_offers
master/messages_status_update
master/messages_status_update_acknowledgement
master/messages_unregister_framework
master/messages_unregister_slave
master/messages_update_slave
master/recovery_slave_removals
master/slave_removals/reason_registered
master/slave_removals/reason_unhealthy
master/slave_removals/reason_unregistered
master/valid_framework_to_executor_messages
master/valid_status_update_acknowledgements
master/valid_status_updates
master/task_lost/source_master/reason_invalid_offers
master/task_lost/source_master/reason_slave_removed
master/task_lost/source_slave/reason_executor_terminated
master/valid_executor_to_framework_messages
master/invalid_operation_status_update_acknowledgements
master/messages_operation_status_update_acknowledgement
master/messages_reconcile_operations
master/messages_suppress_offers
master/valid_operation_status_update_acknowledgements
evqueue
master/event_queue_dispatches
master/event_queue_http_requests
master/event_queue_messages
master/operator_event_stream_subscribers
registrar
registrar/state_fetch_ms
registrar/state_store_ms
registrar/state_store_ms/max
registrar/state_store_ms/min
registrar/state_store_ms/p50
registrar/state_store_ms/p90
registrar/state_store_ms/p95
registrar/state_store_ms/p99
registrar/state_store_ms/p999
registrar/state_store_ms/p9999
registrar/state_store_ms/count
registrar/log/ensemble_size
registrar/log/recovered
registrar/queued_operations
registrar/registry_size_bytes
allocator
allocator/allocation_run_ms
allocator/allocation_run_ms/count
allocator/allocation_run_ms/max
allocator/allocation_run_ms/min
allocator/allocation_run_ms/p50
allocator/allocation_run_ms/p90
allocator/allocation_run_ms/p95
allocator/allocation_run_ms/p99
allocator/allocation_run_ms/p999
allocator/allocation_run_ms/p9999
allocator/allocation_runs
allocator/allocation_run_latency_ms
allocator/allocation_run_latency_ms/count
allocator/allocation_run_latency_ms/max
allocator/allocation_run_latency_ms/min
allocator/allocation_run_latency_ms/p50
allocator/allocation_run_latency_ms/p90
allocator/allocation_run_latency_ms/p95
allocator/allocation_run_latency_ms/p99
allocator/allocation_run_latency_ms/p999
allocator/allocation_run_latency_ms/p9999
allocator/roles/shares/dominant
allocator/event_queue_dispatches
allocator/offer_filters/roles/active
allocator/quota/roles/resources/offered_or_allocated
allocator/quota/roles/resources/guarantee
allocator/resources/cpus/offered_or_allocated
allocator/resources/cpus/total
allocator/resources/disk/offered_or_allocated
allocator/resources/disk/total
allocator/resources/mem/offered_or_allocated
allocator/resources/mem/total
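To show how the master metrics above might be read together, here is a minimal sketch that summarizes a flat mapping of metric name to value, the shape Mesos returns from its `/metrics/snapshot` endpoint. It derives allocation ratios from the `*_used` and `*_total` gauges rather than relying on the `*_percent` metrics. The function names and the sample values are illustrative assumptions, not output from a real cluster.

```python
def utilization(snapshot, resource, prefix="master"):
    """Fraction of `resource` allocated, from the *_used and *_total gauges."""
    total = snapshot.get(f"{prefix}/{resource}_total", 0.0)
    used = snapshot.get(f"{prefix}/{resource}_used", 0.0)
    # Guard against division by zero when a resource is absent from the cluster.
    return used / total if total else 0.0


def summarize_master(snapshot):
    """Condense a master metrics snapshot into a few headline values."""
    return {
        "elected": snapshot.get("master/elected", 0) == 1,
        "cpus": utilization(snapshot, "cpus"),
        "mem": utilization(snapshot, "mem"),
        "disk": utilization(snapshot, "disk"),
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "master/elected": 1.0,
        "master/cpus_used": 6.0,
        "master/cpus_total": 8.0,
        "master/mem_used": 2048.0,
        "master/mem_total": 8192.0,
        "master/disk_used": 0.0,
        "master/disk_total": 0.0,
    }
    print(summarize_master(sample))
```

In practice the same ratios can be computed with a query in InfluxDB once Telegraf has written the metrics; the Python version only makes the arithmetic explicit.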
Mesos slave metric groups
- resources
  slave/cpus_percent
  slave/cpus_used
  slave/cpus_total
  slave/cpus_revocable_percent
  slave/cpus_revocable_total
  slave/cpus_revocable_used
  slave/disk_percent
  slave/disk_used
  slave/disk_total
  slave/disk_revocable_percent
  slave/disk_revocable_total
  slave/disk_revocable_used
  slave/gpus_percent
  slave/gpus_used
  slave/gpus_total
  slave/gpus_revocable_percent
  slave/gpus_revocable_total
  slave/gpus_revocable_used
  slave/mem_percent
  slave/mem_used
  slave/mem_total
  slave/mem_revocable_percent
  slave/mem_revocable_total
  slave/mem_revocable_used
- agent
  slave/registered
  slave/uptime_secs
- system
  system/cpus_total
  system/load_15min
  system/load_5min
  system/load_1min
  system/mem_free_bytes
  system/mem_total_bytes
- executors
  containerizer/mesos/container_destroy_errors
  slave/container_launch_errors
  slave/executors_preempted
  slave/frameworks_active
  slave/executor_directory_max_allowed_age_secs
  slave/executors_registering
  slave/executors_running
  slave/executors_terminated
  slave/executors_terminating
  slave/recovery_errors
- tasks
  slave/tasks_failed
  slave/tasks_finished
  slave/tasks_killed
  slave/tasks_lost
  slave/tasks_running
  slave/tasks_staging
  slave/tasks_starting
- messages
  slave/invalid_framework_messages
  slave/invalid_status_updates
  slave/valid_framework_messages
  slave/valid_status_updates
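As one possible way to act on the agent task metrics, the sketch below computes the share of terminal tasks that did not finish successfully. It assumes `slave/tasks_failed`, `slave/tasks_finished`, `slave/tasks_killed`, and `slave/tasks_lost` are cumulative counts in a flat metric snapshot. The function name and sample numbers are hypothetical.

```python
# Terminal task states reported by the agent "tasks" group listed above.
TERMINAL_STATES = ("failed", "finished", "killed", "lost")


def task_summary(snapshot):
    """Summarize terminal task outcomes from an agent metrics snapshot."""
    counts = {
        state: int(snapshot.get(f"slave/tasks_{state}", 0))
        for state in TERMINAL_STATES
    }
    terminal = sum(counts.values())
    unsuccessful = terminal - counts["finished"]
    return {
        "terminal": terminal,
        "unsuccessful": unsuccessful,
        # Avoid dividing by zero on an agent that has not completed any task.
        "failure_ratio": unsuccessful / terminal if terminal else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "slave/tasks_failed": 2,
        "slave/tasks_finished": 16,
        "slave/tasks_killed": 1,
        "slave/tasks_lost": 1,
        "slave/tasks_running": 5,
    }
    print(task_summary(sample))
```

Because the inputs are assumed to be cumulative, a ratio computed this way reflects the agent's whole uptime. Comparing the difference between two snapshots gives a view of recent behavior instead.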
You can learn more about Apache Mesos metrics on the Mesos documentation page.