=================
PDX Python Meetup
=================

---------------
September, 2011
---------------

Delicious data with mmstats
---------------------------

Problem:
~~~~~~~~~~~~

* You have an app.
* It has state.
* What is my app doing?
* How do you inspect that state?
* Simple in-memory stats get hard to expose in multi-process environments.

Solution #1: Logging
~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* Universally supported
* Easily enhanced
* Persistent

Cons:

* Libraries suck
* Operational burden (rotating, shipping, routing, etc.)
* Records events more than inspects state
* Difficult to predict where needed
* Performance impact if too verbose

Solution #2: Graphite
~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* Fast, sexy, "enterprise"
* Python

Cons:

* Do you want a graph or a graph?
* Have fun installing it.
* Still not great for introspection.

Solution #3: socketconsole
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* Pure Python.
* Very useful for deadlocks, blocking code, threaded apps.
* Simple to integrate::

      import socketconsole
      socketconsole.launch()

Cons:

* CPython only.
* Doesn't work with gevent or eventlet monkeypatching.
* Doesn't work with greenthreads.
* Limited functionality.
* All the fun of Python threads.

Solution #4: REPL Backdoors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* Pure Python!
* Changing code at runtime is for winners.
* Inspect all the things!

Cons:

* With great power comes great responsibility.
* Requires threads or event loop.
* Still can't reach all state.

Solution #5: GDB
~~~~~~~~~~~~~~~~~~~~

Pros:

* Well, you wanted introspection.
* Has some Python helpers: ``pygdb``, ``gdb-heap``

Cons:

* Seriously? Definitely only a last resort.

Solution #6: JMX
~~~~~~~~~~~~~~~~~~~~

Pros:

* Universally supported.
* Powerful, extensible, 2-way.
* Helpful tools.

Cons:

* Where "universal" means "runs on the JVM".
* Not as easy to monitor as you'd think.

.. note:: Python needs this.

``mmstats`` Goals
~~~~~~~~~~~~~~~~~~~~~

* Simple API to expose state.
* Separate publishing from reading, aggregating, etc.
* Language, platform, framework agnostic.
* Minimal & predictable performance impact.
* Optional persistence (e.g. post-mortems)
* 1-way (for now?)

What is ``mmstats``?
~~~~~~~~~~~~~~~~~~~~~~~~

* ``mmap`` allows sharing memory between processes.
* Language independent data structure:

  * Series of fields (structs)
  * Fields have ``label``, ``type``, and ``values``.

* Exposed in Python app as a model class.

Performance Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Single writer, multi-reader

* No locks.
* No syscalls (write, read, send, recv, etc).
* All in userspace for readers & writers.
* Reading has no impact on writers.
* Fixed field sizes.

Consistency without Locks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Q: How are reads consistent without locks?

A: Double buffering.

``django-slow-log`` or How we found a view doing 100k queries
-------------------------------------------------------------

Things it does:

* Query count.
* Hostname of the machine.
* Time delta from request start until response start.
* Memory delta.
* Load delta.
* Etc.

How it does it:

* Database backend.
* Uses celery.
* A couple system calls for the deltas.

Who should use it:

* Any Django site that does anything interesting (sites that do a decent amount of traffic, several hits per minute).
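
Sketch: double buffering over ``mmap``
----------------------------------------

Back to the ``mmstats`` slide on "Consistency without Locks": below is a minimal sketch of how double buffering could work for a single writer and many readers sharing an ``mmap``. The field layout, the ``Writer`` and ``Reader`` classes, and ``demo()`` are all made up for illustration; this is not the real ``mmstats`` file format or API.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout for one field (NOT the real mmstats format):
#   byte 0       index of the buffer readers should use (0 or 1)
#   bytes 1-7    padding, so both buffers are 8-byte aligned
#   bytes 8-15   buffer 0, signed 64-bit little-endian
#   bytes 16-23  buffer 1, signed 64-bit little-endian
VALUE = struct.Struct("<q")
BUFFERS_OFFSET = 8
FIELD_SIZE = BUFFERS_OFFSET + 2 * VALUE.size


class Writer:
    """Single writer: fill the inactive buffer, then flip the index byte."""

    def __init__(self, path):
        # Create a fixed-size, zero-filled file and map it read/write.
        with open(path, "wb") as f:
            f.write(b"\x00" * FIELD_SIZE)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), FIELD_SIZE)

    def set(self, value):
        inactive = 1 - self._map[0]
        # Readers are not looking at this buffer, so it is safe to scribble on.
        VALUE.pack_into(self._map, BUFFERS_OFFSET + inactive * VALUE.size, value)
        # A one-byte store publishes the new value to readers.
        self._map[0] = inactive

    def close(self):
        self._map.close()
        self._file.close()


class Reader:
    """Any number of readers: read the index byte, then that buffer."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(
            self._file.fileno(), FIELD_SIZE, access=mmap.ACCESS_READ
        )

    def get(self):
        active = self._map[0]
        return VALUE.unpack_from(self._map, BUFFERS_OFFSET + active * VALUE.size)[0]

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Write a few values through one mapping, read them through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mmap")
        writer, reader = Writer(path), Reader(path)
        try:
            seen = [reader.get()]
            for value in (42, -7):
                writer.set(value)
                seen.append(reader.get())
            return seen
        finally:
            reader.close()
            writer.close()
```

Neither side takes a lock or makes a syscall after the initial ``mmap``, and a reader never slows the writer down. One caveat with this simplified version: a reader that stalls between reading the index byte and reading the buffer, while the writer updates twice, could still see a half-written value, so a real implementation would need more care than this.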