Passing Curiosity: Posts tagged PyconAU 2012

PyconAU 2012: Server performance

2012-08-19T00:00:00Z

The low-hanging fruit when improving perceived performance is on the front-end, not the back-end, and certainly not the small overhead of the HTTP server.

Gunicorn:

processes = 2-4 * cpus
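A minimal sketch of that rule of thumb, assuming the result is used for Gunicorn's workers setting. The helper name worker_range is hypothetical.

```python
import multiprocessing


def worker_range(cpus=None):
    """Return the (low, high) worker counts suggested by the 2-4 x CPUs rule."""
    if cpus is None:
        cpus = multiprocessing.cpu_count()
    return 2 * cpus, 4 * cpus


# In a Gunicorn config file this might become, for example:
#     workers = worker_range()[0]
```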

The GIL essentially serialises thread processing. This impacts throughput in threaded configurations.

Reverse proxy with nginx, etc. to decouple clients and app servers. Need to be able to kill backlogged requests when they aren’t needed; restarting some servers won’t clear a backlog.

Autoscaling configurations can cause load problems with fat applications; pre-configure for the required maximum instead.

Server monitoring tools largely treat web apps as a black box: they show the effects, not the causes.

Sentry?

New Relic.

Apache modules are like “batteries included” for web-servers.

PyconAU 2012: No, Bad Pony!

2012-08-19T00:00:00Z

What turns an idea into a bad pony?

Is just wrong, impractical, or doesn’t fit the design.
Takes the project in the wrong direction.
Doesn’t come with an offer of assistance.

Ideas that are wrong:

violates a standard of best practice
can’t be implemented
Rusty Russell Interface Level too high
Idea violates DRY.

Impractical ideas:

Not obviously wrong, but they aren’t right either.
Solving a problem that doesn’t exist. E.g: DB cache doesn’t use the ORM.
Changes the design contract. (e.g. syncdb doesn’t touch existing tables, even though it could add many-to-many tables)
Address smaller parts of a larger problem.
Architecture astronauting. Practicality counts; perfectionists with deadlines, etc.

Design:

All the pieces that are part of Django have the same flavour, built by the same team, we like them.
Replacing the template engine, the ORM, the test framework, etc. is not going to happen: we’re happy with and want what we have.
“I love Django but it’d be great if it was completely different.” is never going to be compelling.
The small learning curve (Forms and Models look very similar) is one of the good things about Django.

Ignores philosophy

“Add GROUP BY, HAVING to ORM” misses the point that the ORM is not SQL, by default.
Adding AJAX to Forms; Django is a server-side framework.

Just add a setting

“A setting is a decision deferred”
Simplicity is a virtue; just adding settings to core makes Django more complex and harder to learn.

Wrong direction

Feature creep. Django is not a web-server or a database or anything else; it won’t add features for these things, just use one of them.
Add a backend. None of these things need to be in core. Adding it to core really means: “please look after this thing for me”. This is why there are backend APIs.
The core doesn’t need to do everything; the community can do awesome stuff. Just because it’s in core doesn’t mean it’s good.
Adding apps to django.contrib: what do we get from pulling them in? Nothing, except a slower development schedule. If anything we’re pulling things out of contrib.
What is django.contrib? “A collection of optional, de facto standard implementations of common patterns.” These are pretty much universal. Tagging, etc.: not so much (and which one?).

Non technical

Here’s a big job (but I’d like someone else to do it please)
Process suggestions. If you want news round ups, more blog posts, etc., then pitch in!
Massive features: schema evolution, support for non-SQL data stores, etc.

How do you get your pony?

NO Putting your name on the ticket CC or saying “me too”.
NO Posting hyperbole on your blog because it doesn’t have X.
NO Playing games in the ticket tracker.
Offer to help
Or actually help out! Don’t just write code, advocate for your code (why else will someone review).

PyconAU 2012: Message queueing

2012-08-19T00:00:00Z

PyMQ

LOLWUT?

ZeroMQ

Basically a socket; no broker; no persistence.

RabbitMQ

Complete implementation of AMQP, plus extra stuff too. Each exchange is a separate process in RabbitMQ.

AMQP

AMQP is a standard protocol for message queueing. Producers, brokers, consumers; exchanges and queues.

Exchanges

Three types of exchanges:

Direct exchanges transmit a message with a matching routing key into the queue it is bound to.
Topic exchanges transmit messages with a routing key which matches a pattern into the queue it is bound to.
Fanout exchanges transmit a message to all queues it is bound to.
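The three routing behaviours above can be sketched in memory. This is a toy model, not a broker: the Exchange class and topic_matches helper are hypothetical names, queues are plain lists, and topic patterns follow the AMQP convention that * matches exactly one dot-separated word and # matches zero or more.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match on dot-separated words."""
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the routing key
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


class Exchange:
    """Toy exchange: routes messages into bound queues (plain lists)."""

    def __init__(self, kind):
        if kind not in ("direct", "topic", "fanout"):
            raise ValueError("unknown exchange type: %r" % kind)
        self.kind = kind
        self.bindings = []  # list of (binding_key, queue)

    def bind(self, queue, binding_key=""):
        self.bindings.append((binding_key, queue))

    def _routes(self, binding_key, routing_key):
        if self.kind == "fanout":
            return True  # routing key is ignored
        if self.kind == "direct":
            return binding_key == routing_key
        return topic_matches(binding_key, routing_key)

    def publish(self, routing_key, message):
        delivered = []  # each queue receives a message at most once
        for binding_key, queue in self.bindings:
            already = any(q is queue for q in delivered)
            if not already and self._routes(binding_key, routing_key):
                queue.append(message)
                delivered.append(queue)
```

For example, a queue bound to a topic exchange with logs.# would receive messages published as logs.error.db, while one bound with logs.* would not.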

Mark exchange durable. Mark queues durable. Both need to be durable to connect to each other. Also: clients need to ask for durable.

Message persistence: set “delivery_mode=2” on the message.
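A hedged sketch of the durability settings using the Pika client (1.x API). It assumes a broker on localhost; the exchange, queue and routing key names are made up, and the function is not called here because it needs a running RabbitMQ.

```python
def publish_persistent(body, host="localhost", exchange="tasks",
                       queue="work", routing_key="work"):
    """Publish one persistent message via a durable exchange and queue."""
    import pika  # third-party; imported here so the sketch loads without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    try:
        channel = connection.channel()
        # Both ends are declared durable so they survive a broker restart.
        channel.exchange_declare(exchange=exchange, exchange_type="direct",
                                 durable=True)
        channel.queue_declare(queue=queue, durable=True)
        channel.queue_bind(queue=queue, exchange=exchange,
                           routing_key=routing_key)
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=body,
            # delivery_mode=2 marks the message itself as persistent
            properties=pika.BasicProperties(delivery_mode=2),
        )
    finally:
        connection.close()
```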

Architecture

Have queues on both sides of the cloud, so if the tubes get clogged messages are stored on either side.

RabbitMQ has a server plugin called Shovel which will forward the messages in a queue to a queue on another RabbitMQ server. This works around the fact that you can’t connect a queue to a queue.

Configuring it is basically writing an Erlang program.

Web Interface

Default credentials: guest / guest

Python

Pika

Kombu

http://www.tcpcatcher.fr

PyconAU 2012: Continuous Deployment

2012-08-19T00:00:00Z

Continuous deployment. The same thing as continuous delivery for our purposes here.

Being able to deploy every good version of your software.

More than technology: technology for automation, people, environment.

Fail fast, win fast; competitive advantage; real-time control; pivoting, etc.

Have a single path to production; optimising for resilience. “If it hurts, do it more often.” Automate as much as possible.

Use vagrant to develop on as close to live as possible.
Develop, test, commit.
Run a build: unit tests, static analysis, coverage. Very fast.
Iff: functional tests, integration tests.
Iff: deploy to staging, test migration, smoke and UI tests.
Iff: you can push the deploy button.
Then: goes to production.
Users are happy.
Measure, monitor, analyse.
Learn from the information and feedback into the loop.
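The gating in the steps above ("Iff") can be sketched as a chain of stages where each one runs only if everything before it passed. The stage names and the run_pipeline helper are hypothetical; a real setup would hang this off a CI server.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order, stopping at the first failure.

    Returns (completed_stage_names, failed_stage_name_or_None).
    """
    completed = []
    for name, stage in stages:
        if not stage():
            return completed, name
        completed.append(name)
    return completed, None
```

Usage: pass a list such as [("build", run_build), ("functional", run_functional), ("staging", deploy_staging)], and only offer the deploy button when the second element of the result is None.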

Environments

Make all environments look the same: minimising differences minimises surprises. Use vagrant to provision virtual boxes (configured in similar ways to staging and live environments).

Having a repeatable, versioned configuration. Use Puppet, Chef, shell scripts, whatever. Just use something.

“Snowflake” servers (every one precious and unique) are bad; “phoenix” servers are the goal.

Use Puppet for provisioning the OS and services; Fabric for scripted tasks.

Use virtualenv and pip. Make all your configuration as code.

Use South for schema and data migrations. Split your expansion and contraction migrations: only drop the old columns after everything is working correctly in production.
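A sketch of the expand/contract split using sqlite3 from the standard library rather than South. The person table and its columns are invented for illustration. The contraction rebuilds the table, which works on any SQLite version.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of a table, in order."""
    return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]


def expand(conn):
    """Deploy 1: add the new column next to the old ones and backfill it."""
    conn.execute("ALTER TABLE person ADD COLUMN full_name TEXT")
    conn.execute("UPDATE person SET full_name = first || ' ' || last")


def contract(conn):
    """Later deploy: drop the old columns once production is known good."""
    conn.execute(
        "CREATE TABLE person_new (id INTEGER PRIMARY KEY, full_name TEXT)")
    conn.execute("INSERT INTO person_new SELECT id, full_name FROM person")
    conn.execute("DROP TABLE person")
    conn.execute("ALTER TABLE person_new RENAME TO person")
```

Between the two steps both old and new code can run against the same schema, which is what makes a rollback possible.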

Testing

Using Django’s testing framework, driven by Jenkins. There’s a django-jenkins plugin.

Use factory_boy to generate testing data.

Test separation: don’t run slow or flaky tests every build.
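One way to separate slow tests, sketched with plain unittest rather than Django's runner. The RUN_SLOW_TESTS environment variable is an assumed convention, not something from the talk.

```python
import os
import unittest

# Assumed convention: slow tests only run when RUN_SLOW_TESTS=1 is set.
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"


class FastTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


class SlowTests(unittest.TestCase):
    @unittest.skipUnless(RUN_SLOW, "set RUN_SLOW_TESTS=1 to run slow tests")
    def test_expensive_path(self):
        self.assertEqual(sum(range(1000)), 499500)


def run_suite():
    """Run both test cases and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(FastTests),
        loader.loadTestsFromTestCase(SlowTests),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The every-commit build would run without the variable; a nightly job would set it.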

Build

One way to build and deploy; all developers fit into the same process. Make “don’t deploy broken code” easier.

Deploy

django-dbbackup
pull()
apply_puppet()
django-extensions
syncdb
collectstatic

Use feature flags. Gargoyle or Waffle in Django.
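A toy percentage-rollout flag, to show the idea only. This is not the Gargoyle or Waffle API; the FeatureFlags class and its methods are hypothetical.

```python
import hashlib


class FeatureFlags:
    """Toy flag store: each flag is active for a percentage of users."""

    def __init__(self):
        self._rollout = {}  # flag name -> percentage (0..100)

    def set_rollout(self, name, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_active(self, name, user_id):
        percent = self._rollout.get(name, 0)  # unknown flags are off
        key = ("%s:%s" % (name, user_id)).encode("utf-8")
        # Hashing keeps the answer stable for a given user and flag.
        bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
        return bucket < percent
```

Code paths then branch on flags.is_active("new_checkout", user.id), so unfinished work can ship dark and be switched on gradually.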

Monitor

Errors go into Sentry.
Munin
NewRelic
django-statsd

Rollback

Having some way to recover from bad or failed builds.

PyconAU 2012: Unexpected Day

2012-08-18T00:00:00Z

import ctypes
import ctypes.util

# Open the shared object (SO)
libpcap_name = ctypes.util.find_library('pcap')
libpcap = ctypes.CDLL(libpcap_name)

# Start capturing
errbuf = ctypes.create_string_buffer(256)
handle = libpcap.pcap_open_live(b"any", 65535, 0, 0, errbuf)

# Check the error (e.g. when run without sufficient privileges)
errbuf.value == b'socket: Operation not permitted'


# Add properties on the function
# .errcheck = function
# .argtypes = [...]
# .restype = ...

# Mirrors the C struct timeval
class c_timeval(ctypes.Structure):
  _fields_ = [
    ("tv_sec", ctypes.c_long),
    ("tv_usec", ctypes.c_long),
  ]
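The function properties mentioned in the comments above can be tried against the C library instead of libpcap, which avoids needing capture privileges. A sketch assuming a POSIX system; the check_zero helper is a made-up name.

```python
import ctypes
import ctypes.util
import os

# Load the C library. If find_library returns None, CDLL(None) falls back
# to the symbols already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

# argtypes and restype tell ctypes how to convert arguments and results.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def check_zero(result, func, args):
    """errcheck hook: turn a non-zero return value into an OSError."""
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


# access(2) returns 0 on success and -1 (with errno set) on failure.
libc.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.access.restype = ctypes.c_int
libc.access.errcheck = check_zero
```

With errcheck in place, callers get a Python exception rather than having to inspect return codes by hand.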

PyconAU 2012: Testing and being lazy and stuff

2012-08-18T00:00:00Z

Diminishing returns on quality assurance activity. We, as engineers, need to try to “hack the graph” and do more with less.

Reference to Terry Pratchett, Moving Pictures. Not just doing nothing: actively investing in future laziness.

Works in Mozilla Services on Mozilla Sync Server.

WebTest

Provides a WSGI interface for humans. Wraps a WSGI application and provides a nice interface. Good for functional tests of a WSGI application, not so much for unit tests.
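To show what calling a WSGI application in-process involves, here is a standard-library stand-in. It is not WebTest's API: hello_app and call_wsgi are hypothetical names, and WebTest layers a much friendlier request/response interface on the same idea.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Tiny WSGI application used as the system under test."""
    body = ("hello " + environ["PATH_INFO"]).encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_wsgi(app, path="/", method="GET"):
    """Call a WSGI app directly and return (status, headers, body)."""
    environ = {"REQUEST_METHOD": method, "SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The legacy write() callable is omitted in this sketch.
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        if hasattr(chunks, "close"):
            chunks.close()
    return captured["status"], captured["headers"], body
```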

WSGIProxy

Wrap an HTTP target as a WSGI application. You can then wrap this in WebTest and run the functional tests from above against your live service.

This is great for deployment testing and you get it for free!

FunkLoad

Really good tool for load testing, but it wants us to invest in it completely as a stand-alone tool. We wind up repeating much the same code against a slightly different API.

Using a dedicated tool for load testing is very important: dedicated tools will take account of ramp-up and ramp-down to make sure measurements are reliable, etc.

Reports, differential reports (how did a change impact performance), etc.

Tools

WebTest
FunkLoad
Web server

or maybe

FunkLoad
WSGI intercept
In-process WSGI app

PyconAU 2012: Natural language processing

2012-08-18T00:00:00Z

“Human as a Second Language: Succeeding (and failing) with the Natural Language Toolkit”

Natural Language Toolkit

General ho-hum generalisations of overly logical, Spock-ish stereotypes of “programmers”. Abstraction, gender, disincentive to creative natural language, etc.

NLTK is, like most toolkits, a bunch of tools and resources; bridges the gap between science and art (linguistic, presumably).

Language Features 101

Stopwords

The common but semantically unimportant words. Generally remove stopwords when doing statistical tasks.
Parts of speech

High-school grammar: nouns, adjectives/adverbs, verbs. N, ADJ, ADV, V.

Also: a bunch more.
Stemming

Reduce words to their stem, so you can unify various forms; generally for statistical techniques.
Lemmatization

Similar to stemming, but results in a real word.

NLP Concepts

Training data

Corpora for English language words (stopwords), boys’ names, girls’ names, part-of-speech tagging.

Wordnet linked dictionary.
Tokenisation

Split a document into individual parts. The particular type of “part” will vary depending on the task (words, sentences, etc.)

Many different tokenisation algorithms for different situations.
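Tokenisation and stopword removal can be sketched with the standard library alone. This is not NLTK: the regular-expression tokeniser and the tiny STOPWORDS set are stand-ins for NLTK's tokenisers and corpora.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; NLTK ships far larger ones.
STOPWORDS = {"a", "an", "and", "of", "on", "the", "was"}


def tokenise(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_words(text):
    """Tokenise and drop the stopwords."""
    return [word for word in tokenise(text) if word not in STOPWORDS]


def word_frequencies(text):
    """Count the remaining words, as a first statistical step."""
    return Counter(content_words(text))
```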

Applications

Sentiment analysis and opinion mining. Targeting advertising.

Establish patterns in language used to make guesses about the person talking: gender, age, etc.

Integration with BeautifulSoup for something to do with HTML? Not sure why you’d bother.

Chatbots: @PatrickAndElly use Twitter interface (Python Twitter Tools):

Tokenise words.
PoS tag.
Simplify tagging (because too much grammar is too much).

PyconAU 2012: Notes on Mark Ramm's keynote

2012-08-18T00:00:00Z

Your heart only has a certain number of beats. Why are you wasting your time on this shit?

According to the Internet: 9/10 (projects|startups) fail. Probably more like 50%, but still… What can we do to not fail? The rest of the conference will be about technical awesome, this one is about how not to build something no-one wants.

Product managers: it doesn’t matter whether you do market research or pull it out of your arse. [citation required]

Took job as PM at SourceForge (after boardroom shuffle) to keep it away from someone worse than the old team. But how to do it? With testing!

Anecdote about SourceForge project page redesign:

Prominent download button
Grid with more information
More screenshots.

User testing is how you know you’re making progress.

What sort of world would we have if product managers used the scientific method? Arguing over product choices is stupid, just test it and do what’s right; this takes personality, arbitrary decision-making processes, etc. out of the picture.

Dave McClure, Metrics for Pirates

Acquisition

Activation

Retention

Referral

Revenue

You need to measure how people get to your site (or whatever). If you can’t test this, you’ll have trouble testing anything else.

Testing the aspects which cause users to sign up, pay, etc.

Will they keep using it, doing it, buying it?

Will they refer their friends, etc.? Net promoter score is potentially a proxy for quality: people won’t say it’s shit to your face, they just won’t refer their friends.

Anecdote about Zappos making sure they could actually sell shoes on the web before spending heaps of money building infrastructure to sell shoes on the web. Started with a guy going and buying shoes when they got an order.

Need great idea, great implementation, great marketing, and something about timing.

The Structure of Scientific Revolutions. About “normal science” vs “revolution”. Pity that marketdroids stole “paradigm shift”.

Startups that are doing something interesting are those that are about paradigm shifts. Discovery vs Optimisation; normal science vs paradigm shift.

Steve Blank, The Four Steps to the Epiphany. Don’t do product development, do customer development. Get customers, then build the product they want.

Desiderata are: desirable, feasible, viable?

PyconAU 2012: Python's dark corners

2012-08-18T00:00:00Z

Peter Lovett is a programmer and trainer.

Python’s dark corners. Covering 2.x with a few tips on 3; things to avoid, etc. Python is a fantastic language but it’s not perfect and there are a few dark corners which need to be worked around.

Python is a deceptively simple language: the surface simplicity hides a deal of complexity.

Reference to the algorithmic trading incident.

OO

Python really is object oriented. This has a few more implications: “modules” and functions are first-class.

References

Python uses references by default. Use is for reference (identity) equality; == compares values. Some types (int, tuple, etc.) are immutable.

Rebinding a name is sometimes an accident. When the name is a built-in, it’s often catastrophic. Use __builtin__ (builtins in Python 3) to get these things back.
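A small illustration of shadowing a built-in and reaching the original again. It uses the Python 3 module name builtins; in 2.x the module is __builtin__. The function name is made up.

```python
import builtins


def shadow_and_recover():
    """Shadow the built-in list inside this function, then recover it."""
    list = [1, 2, 3]  # accidental rebinding: 'list' is now a list object
    try:
        list("abc")
        error = None
    except TypeError as exc:
        error = str(exc)
    # The original is still reachable through the builtins module.
    recovered = builtins.list("abc")
    return error, recovered
```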

Pass by reference is the default (and only option). The only way to pass by value is to copy the object. Lots of options for copying lists: a full slice ([:]) or the copy module.
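A short sketch of identity versus equality and of the copying options, using made-up variable names.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                 # same object, no copy made
shallow = original[:]            # new outer list, shared inner lists
deep = copy.deepcopy(original)   # fully independent copy


def append_marker(items):
    """Mutates the caller's list: the argument is a reference, not a copy."""
    items.append("marker")


safe = [1, 2, 3]
append_marker(safe[:])   # pass a copy: 'safe' is left untouched
unsafe = [1, 2, 3]
append_marker(unsafe)    # pass the reference: 'unsafe' is modified
```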

Operators

No ++ or -- operators. “Mutating” operators are designed so as not to be mutating: +=, etc. rather than ++.

Typing

If you’re checking types, you’re doing it wrong. “Duck typing”. #sigh

Numerics

Floating point arithmetic is floating point arithmetic.
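The usual surprise, with two common workarounds (tolerance comparison and the decimal module):

```python
import math
from decimal import Decimal

total = 0.1 + 0.2

exactly_equal = (total == 0.3)          # binary floats cannot represent 0.1 exactly
close_enough = math.isclose(total, 0.3)  # compare with a tolerance instead
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```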

Tuples

Immutable, but only the tuple itself (i.e. the references it contains, not their referents).
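For instance, a tuple holding a list cannot have its slot rebound, but the list itself can still change (names here are illustrative):

```python
pair = ([1, 2], "label")
pair[0].append(3)  # allowed: the referent (a list) is mutable


def try_rebind(tup):
    """Return True if item assignment on the tuple succeeds."""
    try:
        tup[0] = []
    except TypeError:
        return False
    return True
```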

Arguments

Support both ordinal and keyword parameters.
Default values. Default values are created once, when the function is defined (at load time), so they should probably be immutable.
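The classic demonstration of a mutable default, alongside the usual None-sentinel workaround. Function names are made up.

```python
def append_shared(item, bucket=[]):
    """The default list is created once and shared between calls."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Use None as the default and build a new list on each call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```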

Namespaces

Scoping: a variable’s scope is determined statically by how it is used (assigned) within the scope, not dynamically.
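A sketch of how an assignment anywhere in a function makes the name local for the whole function. The names are illustrative.

```python
counter = 10


def read_only():
    """No assignment here, so 'counter' resolves to the outer variable."""
    return counter


def broken():
    """The assignment below makes 'counter' local to the whole function."""
    value = counter  # raises UnboundLocalError when called
    counter = value + 1
    return counter


def explicit(current):
    """Passing the value in avoids relying on the outer variable at all."""
    return current + 1
```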

Visibility of globals. Better to just avoid variables in global scope.

PyconAU 2012: Writing command-line applications

2012-08-18T00:00:00Z

Graeme Cross, How to write a well-behaved Python command-line application.

Goals include developing programs which are robust and well-maintained; include a flexible, powerful interface; stay within the command-line paradigm (pipes); handle errors and signals; and are well tested and documented.

We’re dealing with 2.6/2.7 in this session; there are some differences in Python 3 releases.

The slides and code are available at http://www.curiousvenn.com/.

What is a well-behaved application? “Do one thing well”, be UNIX, robustly handle bad input, gracefully handle errors, be well documented, platform aware.

Use Python for its wide platform support; syntax; scalability; extensive library ecosystem.

Scripts should use the following pattern:

if __name__ == '__main__':
    print("running")

so that they can be imported for use as a library, testing, etc.
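A slightly fuller sketch of what such a script might put behind the guard: a main() that parses arguments, reads stdin, writes stdout, and returns an exit code. The upper-casing task and the -n option are invented for illustration.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(
        description="Upper-case lines read from standard input.")
    parser.add_argument("-n", "--limit", type=int, default=None,
                        help="stop after this many lines")
    return parser


def main(argv=None, stdin=None, stdout=None):
    """Return a process exit code instead of calling sys.exit() directly."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    args = build_parser().parse_args(argv)
    try:
        for count, line in enumerate(stdin):
            if args.limit is not None and count >= args.limit:
                break
            stdout.write(line.upper())
    except KeyboardInterrupt:
        return 130  # conventional exit status after Ctrl-C
    return 0


# A script would end with the guard:
#     if __name__ == '__main__':
#         sys.exit(main())
```

Taking argv and the streams as parameters keeps main() importable and easy to test without spawning a process.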