[tor-reports] PyCon Trip Report

Damian Johnson atagar at torproject.org
Sun Jun 12 00:16:13 UTC 2016


Once upon a time tor developers wrote trip reports for neat
events they attended. Though rare nowadays, recently meejah
and I hit Portland for PyCon, the largest Python developer
conference there is.

Lots of great talks worth sharing...

  http://blog.atagar.com/pycon2016/

Cheers! -Damian

PS. For those of ya that don't like pretty pictures, below
is just the text.

PPS. Ok, maybe I went a tad overboard with this...

============================================================

I've been to quite a few conferences (LinuxFest Northwest, SeaGL,
PETS, Toorcamp, Defcon), but PyCon was particularly impressive. At
over three thousand attendees with five parallel tracks of talks
the word 'busy' hardly seems to do the conference justice.

Top TL;DR highlights for me were new capabilities in the Python
3.x series and HTTP 2.0. In particular...

  * Python 3.6 releases on Christmas, finally adding string
    interpolation!

    >>> name, job = 'Damian', 'software engineer'
    >>> print(f'{name} is a {job}')
    Damian is a software engineer

  * Python 2.x support will be completely discontinued in 2020.

  * New async/await keywords in Python 3.5 provide built-in
    support for Twisted-style async IO.

  * Gradual type syntax in Python 3.5 makes code even more
    self-documenting and supportive of static analysis.

  * First major protocol update since 1999, HTTP 2.0 is now
    supported by all modern browsers and 60% of users in the
    wild. Connection multiplexing allows all site assets to
    be retrieved over a single connection, improving latency
    on the order of 50%. The new protocol also negates any
    need for the clever performance hacks we've developed
    over the years like asset minimization and sprite maps!

  https://www.youtube.com/watch?v=Mou17XxYRZk
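
To make the async/await and type hint items above a little more
concrete, here's a minimal sketch combining the two. The function
names and delays are made up for illustration, asyncio.sleep()
stands in for real network I/O, and asyncio.run() assumes Python
3.7 or later (it wasn't around yet in 3.5)...

```python
import asyncio
from typing import List


async def fetch_length(name: str, delay: float) -> int:
  # asyncio.sleep() is a stand-in for real I/O. Awaiting it hands control
  # back to the event loop so other coroutines can run in the meantime.
  await asyncio.sleep(delay)
  return len(name)


async def main() -> List[int]:
  # Run both coroutines concurrently. gather() returns results in
  # argument order, regardless of which coroutine finishes first.
  return await asyncio.gather(
    fetch_length('stem', 0.02),
    fetch_length('txtorcon', 0.01),
  )


results = asyncio.run(main())
print(results)  # [4, 8]
```

The annotations aren't enforced at runtime, they're there for
readers and static analyzers such as mypy.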

PyCon 2017 will be in Portland one more time before moving
on to another venue, so if the following sounds interesting
then check it out!

============================================================

Serendipity is delightful. My first time taking the train,
I strongly suggest Amtrak (particularly the Coast Starlight)
if heading down to Portland. Comfortable, scenic, and by
happy coincidence sat with Sarah Leivers: PyCon speaker
with roots in the UK deaf community.

Sarah made the interesting point that even for deaf
communities in English speaking countries English is
often a second language. Signing is their native tongue,
putting them at a disadvantage when it comes to involvement
in our communities. Part of the larger ESL puzzle, our
discussion was a nice reminder of why it's important to
keep documentation as linguistically simple and accessible
as we can.

In the observation car the Parks Department described
sights we passed, my favorite being the Centralia train
station. Completed right around the time these newfangled
'airplane' things were taking off, to celebrate they
decided to christen the building with champagne. Three
bottles were loaded onto a plane and dropped. The first
couple bottles missed but the third hit dead on, puncturing
right through the roof.

Spoiler alert: this was the last building they christened
in such a way.

Go to a conference without exploring the area and you're
doing it wrong. My train left me a few hours to explore
the city, starting with the Portland Saturday Market.
Easily comparable to Pike Place, the market is four
city blocks jam packed with all the essentials of life:
hand-carved bark houses, tie-dye, and of course fancy
hats!

https://www.atagar.com/transfer/pycon_2016/4-street_market.jpg

Next hit the Lan Su Chinese Garden, beautiful gem nestled
into the heart of downtown...

https://www.atagar.com/transfer/pycon_2016/5-garden.jpg

Of course visited Ground Kontrol just a block away. Classic
arcade that successfully reminded me just how much I suck
at Marble Madness. In my defense haven't played since my
good old Amiga 2000...

https://www.atagar.com/transfer/pycon_2016/6-ground_kontrol.jpg

Finally, hidden below my hotel lurked a black light pirate
themed putt-putt course. So... seems that's a thing!

https://www.atagar.com/transfer/pycon_2016/7-mini_golf.jpg

------------------------------------------------------------
File Descriptors, Unix Sockets and other POSIX wizardry
------------------------------------------------------------

First talk of the first day, Christian Heimes gave a crash
course on *nix file descriptors. In python descriptors are
fetched with f.fileno() and Christian demoed interacting
with them directly to open his cd tray.
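
Not the cd tray demo, but as a rough sketch of what working with
a raw descriptor looks like, here's a temporary file (my choice,
so the example is self-contained) read and written through the
os module rather than the file object...

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
  fd = f.fileno()  # small integer handle, usually 3+ since 0-2 are taken

  # These calls go straight to the descriptor, bypassing the file object.
  os.write(fd, b'hello descriptors')
  os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start
  data = os.read(fd, 5)

print(data)  # b'hello'
```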

Christian's talk focused on file descriptor basics (which
honestly I'm rustier on than I should be)...

  * Descriptors 0-2 are reserved for stdin/stdout/stderr
    with -1 for errors.

  * Fork clones the current process, with the child's
    descriptors pointing to the same entries in the kernel's
    global open file table.

  * Exec replaces the current program, inheriting the prior
    descriptors (which is why pipes continue to work).

  * Descriptors can be delegated. This is useful in
    sandboxing situations like seccomp, allowing a
    broker to open files/sockets on a sandboxed
    process' behalf.
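
Descriptor delegation can be sketched in a single process. This
isn't from the talk: it assumes a Unix platform and Python 3.9+
for socket.send_fds() and socket.recv_fds(), and the 'broker'
and 'sandbox' names are just labels for the two socket ends...

```python
import os
import socket
import tempfile

# A connected pair of Unix sockets stands in for the broker <-> sandbox channel.
broker_sock, sandbox_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with tempfile.NamedTemporaryFile() as secret:
  secret.write(b'opened by the broker')
  secret.flush()

  # Broker side: open the file and pass the descriptor over the socket.
  broker_fd = os.open(secret.name, os.O_RDONLY)
  socket.send_fds(broker_sock, [b'fd'], [broker_fd])
  os.close(broker_fd)  # the receiver gets its own duplicate

  # 'Sandboxed' side: gets a usable descriptor without calling open().
  msg, fds, flags, addr = socket.recv_fds(sandbox_sock, 1024, 1)
  received = os.read(fds[0], 1024)
  os.close(fds[0])

broker_sock.close()
sandbox_sock.close()

print(received)  # b'opened by the broker'
```

In a real sandbox the two ends would live in separate processes,
with only the broker permitted to call open().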

Lastly Christian walked through a little strace example
that illustrates how descriptors are used in a basic
scenario...

  % cat reader.py
  with open('/home/atagar/Desktop/reader.py') as my_file:
    print(my_file.read())

  % strace python reader.py
  ...
  open("/home/atagar/Desktop/reader.py", O_RDONLY|O_LARGEFILE) = 3
  read(3, "with open('/home/atagar/Desktop/"..., 4096) = 80
  read(3, "", 4096)                       = 0
  close(3)                                = 0
  write(1, "with open('/home/atagar/Desktop/"..., 81) = 81

------------------------------------------------------------
Refactoring Python: Why and how to restructure your code
------------------------------------------------------------

Nice presentation by Brett Slatkin, the author of Effective
Python, on how and when to make code more maintainable. As
developers we optimize for making things work in our first
pass, and for many of us that's where the story ends. To
make code that's truly easy to follow requires time and
patience to take follow-up passes that optimize for
maintainability. Something most developers don't do.

To illustrate this Brett asked: how much of your coding
time goes toward implementation? 90%? 75%? The few
developers he knows that write easy to follow code only
do so because they spend fully half their time refactoring
anything they write. Maintainability isn't cheap, and when
faced with deadlines it's often the first thing to go.

Brett's other main takeaway was that without tests you're
DOA. Refactoring requires a willingness to make mistakes,
and without high coverage any major overhaul of production
systems is in practice impossible.

This dovetailed nicely with the following talk, Code Unto
Others, which gave a few tips...

  * When it comes to maintainability remember that you
    don't scale. Any rough code you write is something
    you'll need to explain over and over to every engineer
    that touches it. That's not really how you want to
    spend your time, is it?

  * Commonly people can track 5-9 things at a time which
    is why phone numbers are seven digits. Subdivide
    modules to take advantage of this. As a counter-example
    they used Mercurial's Repository class, a 17,000 line
    headache for newcomers.

  * Be wary when a description of your module needs the word 'and'
    ("it does this *and* that"). If you need that word
    you're probably doing it wrong. After reading the
    first half of a class you should be able to take
    an educated guess at what you'll see in the second.

------------------------------------------------------------
Finding closure with closures
------------------------------------------------------------

Peek under the hood at how Python implements closures...

  >>> import os
  >>> def print_greeting(first_name):
  ...   def msg(last_name):
  ...     platform = os.uname()[0]
  ...     return "Hi %s %s, you're running %s" % (first_name, last_name, platform)
  ...   print(msg('Johnson'))
  ...   print("co_varnames: %s" % ', '.join(msg.__code__.co_varnames))
  ...   print("co_names: %s" % ', '.join(msg.__code__.co_names))
  ...   print("co_freevars: %s" % ', '.join(msg.__code__.co_freevars))
  ...
  >>> print_greeting('Damian')
  Hi Damian Johnson, you're running Linux
  co_varnames: last_name, platform
  co_names: os, uname
  co_freevars: first_name

varnames are local variables while freevars are variables
we're closing over from the outer scope. A gotcha that's
probably bitten every python dev is that assignment to a
closed over variable overwrites it with a local...

  >>> import random
  >>> def get_score():
  ...   total = 0
  ...   def add_points():
  ...     total += random.randint(0, 5)
  ...   for i in range(3):
  ...     add_points()
  ...   return total
  ...
  >>> get_score()
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 6, in get_score
    File "<stdin>", line 4, in add_points
  UnboundLocalError: local variable 'total' referenced before assignment

Python 3.x adds a new 'nonlocal' keyword for re-binding
closures but for those of us stuck in the past our best
option is to use the mutable hack. Gross, but it works.

  >>> def get_score():
  ...   total = [0]
  ...   def add_points():
  ...     total[0] = total[0] + random.randint(0, 5)
  ...   for i in range(3):
  ...     add_points()
  ...   return total[0]
  ...
  >>> get_score()
  8
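
For comparison, here's roughly what the same function looks like
with Python 3's 'nonlocal' keyword. This is my own rewrite of the
example above rather than something from the talk...

```python
import random

def get_score():
  total = 0

  def add_points():
    nonlocal total  # rebind the enclosing function's variable
    total += random.randint(0, 5)

  for i in range(3):
    add_points()

  return total

score = get_score()
print(score)
```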

------------------------------------------------------------
What is and what can be: an exploration from 'type' to Metaclasses
------------------------------------------------------------

Owww, my head. This and another talk the previous day by
Mike Graham introduced audiences to the wonderful world
of python metaclasses...

  "The subject of metaclasses in Python has caused hairs
  to raise and even brains to explode." -Guido

A method for redefining the fundamental behavior of objects
(and in doing so tearing the fabric of reality), metaclasses
are what you invoke each time you extend object. Dustin
demonstrated this by defining his own metaclass that
transparently causes method invocations to be accompanied
by a bark...

  from functools import wraps
  from inspect import isfunction

  def bark(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
      print("bark!")
      return f(*args, **kwargs)

    return wrapper

  class MetaDog(type):
    def __new__(meta, name, bases, attrs):
      # use a distinct loop variable so we don't clobber the class name
      for attr_name, attr in attrs.items():
        if isfunction(attr):
          attrs[attr_name] = bark(attr)

      return type.__new__(meta, name, bases, attrs)

  class Dog(metaclass = MetaDog):
    def sit(self):
      print("*sitting*")

    def stay(self):
      print("*staying*")

  d = Dog()
  d.sit()

So why will you use this? Well... hopefully you won't.
Besides the obvious unforgivability of this sin upon
your coworkers, this is the kind of black magic Ruby
folks do all the time but Python devs know better.
Like redefining builtins, just don't.

That aside, it was interesting to learn a little more
about the abstract base class module and how python
works under the hood.
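
As for the 'type' half of the talk's title, a small sketch of
my own (DynamicDog is a made-up name) shows that a class
statement is roughly a call to the default metaclass, type...

```python
class Dog(object):
  def sit(self):
    return '*sitting*'

# The class statement above is roughly equivalent to calling type() with
# a name, a tuple of base classes, and a dict of attributes.
DynamicDog = type('DynamicDog', (object,), {'sit': lambda self: '*sitting*'})

# Both classes are themselves instances of type, the default metaclass.
print(type(Dog) is type, type(DynamicDog) is type)  # True True
print(DynamicDog().sit())                           # *sitting*
```

A custom metaclass like MetaDog simply takes type's place in
that call.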

------------------------------------------------------------
Building protocol libraries the right way
------------------------------------------------------------

Cory Benfield, a core developer of Requests, urllib3, and
other core I/O libraries, discussed a common pitfall that
afflicts protocol libraries: mixing I/O with parsing.

Python has as many HTTP parsers as there are I/O libraries.
Urllib variants, aiohttp, Twisted, Tornado, and friends all
reinvent this wheel. Code re-use is particularly great when
you have a well defined problem with a single correct
solution. Arithmetic, compression, and parsing are all
examples of this, so why don't they all share a unified
parser?

The problem is that we tangle network I/O with parsing of
the messages we read. As such all these projects trip over
the same obscure edge cases and re-implement the same
optimizations.

Cory's message was simple: keep parsing separate. Besides
code reuse this greatly improves testability because you
don't need to invoke your I/O stack for coverage.
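
A minimal sketch of the idea, using a toy line protocol I made
up for illustration (ReplyParser and its feed() method are
hypothetical, not from Cory's talk or any real library). The
parser only ever sees bytes, so it doesn't care whether they
came from a socket, a file, or a test...

```python
class ReplyParser(object):
  """
  Hypothetical I/O-free parser for a toy line protocol. Lines end with
  CRLF, and a reply is finished by a line whose fourth character is a
  space (for instance '250 OK').
  """

  def __init__(self):
    self._buffer = b''
    self._lines = []

  def feed(self, data):
    """
    Accepts bytes from any source, returning the list of replies they
    completed. Each reply is a list of decoded lines.
    """

    self._buffer += data
    replies = []

    while b'\r\n' in self._buffer:
      line, self._buffer = self._buffer.split(b'\r\n', 1)
      self._lines.append(line.decode('ascii'))

      if line[3:4] == b' ':  # final line of this reply
        replies.append(self._lines)
        self._lines = []

    return replies

parser = ReplyParser()
first = parser.feed(b'250-version=0.2.3.11-al')  # partial read, nothing complete
second = parser.feed(b'pha-dev\r\n250 OK\r\n')

print(first)   # []
print(second)  # [['250-version=0.2.3.11-alpha-dev', '250 OK']]
```

Any I/O layer (blocking sockets, asyncio, Twisted) can then call
feed() with whatever it reads.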

Personally I found this talk interesting because this is
exactly something I ran into with Stem. To work our I/O
handler needs enough understanding of the control-spec
to delimit message boundaries, but beyond that parsing
is a completely separate module. This has been a great
boon for testing...

  TEST_MESSAGE = """\
  250-version=0.2.3.11-alpha-dev
  250 OK"""

  def test_single_getinfo_response(self):
    """
    Parses a GETINFO reply response for a single parameter.
    """

    control_message = stem.response.ControlMessage.from_str(
      TEST_MESSAGE, msg_type = 'GETINFO')
    self.assertEqual({'version': b'0.2.3.11-alpha-dev'}, control_message.entries)

------------------------------------------------------------
HTTP can do that?!
------------------------------------------------------------

Whimsical look at lesser known bits of the HTTP
specification...

  * Need just metadata of a GET request? Use HEAD instead
    for a far lighter response.

  * Calling OPTIONS will tell you the HTTP operations a
    resource supports.

  * Besides normal CRUD operations (GET, POST, PUT,
    DELETE) the HTTP spec has PATCH to update just
    part of a resource.

  * The specification also has TRACE, LINK, and
    UNLINK methods. Nobody uses them but hey, they're
    there.

  * Few interesting headers include ETag for versioning
    resources, If-Modified-Since to only solicit a
    response if the resource has changed, and Cache-Control
    to define cacheability. Actually, the specification
    even has a From header in case you want to tell
    everybody in the world your email address...

  * Few standard but infrequently used response codes are...

    * 410 - That resource used to be here but now it's gone.
    * 304 - You asked to get this resource if it's
            been modified but it hasn't.
    * 451 - Unavailable for legal reasons. Mostly comes up
            with censorship firewalls.

  * Unsurprisingly you can make up your own status codes
    and reason strings. Sumana had several amusing ones
    she's found in the wild.
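
Python's standard library knows about the response codes above.
A quick sketch using http.HTTPStatus (assumes Python 3.7+, which
is when 451 was added to the enum)...

```python
from http import HTTPStatus

# Look up the standard reason phrase for each status code.
phrases = {code: HTTPStatus(code).phrase for code in (304, 410, 451)}

for code, phrase in sorted(phrases.items()):
  print(code, phrase)

# 304 Not Modified
# 410 Gone
# 451 Unavailable For Legal Reasons
```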

------------------------------------------------------------
Playing with Python bytecode
------------------------------------------------------------

Amusing demonstration of executing raw bytecodes in python,
including runtime manipulation to switch a function's addition
operation to multiplication. Interesting in a 'oh god, you
can do that?' sense but even the presenters said 'kids, don't
do this at home'. Few (if any?) practical applications, and
opcodes change even between minor Python interpreter version
bumps making any such hacks a maintenance nightmare.
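
For the merely curious, the standard dis module lets you look
without touching. This sketch of mine only inspects a function's
bytecode, no rewriting. I've left out the expected output on
purpose since opcode names differ between interpreter versions
(older ones use BINARY_ADD, newer ones a generic BINARY_OP)...

```python
import dis

def add(a, b):
  return a + b

# Collect the opcode names the interpreter compiled our function into.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```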

------------------------------------------------------------
SQLite: Gotchas and Gimmes
------------------------------------------------------------

Tips by Dave Sawyer for SQLite, mostly focusing on the
advantages over pickles (performance, safety, etc), common
pitfalls, and locking strategies...

  * Deferred - Multiple readers/writers.
  * Immediate - Multiple readers/single writer.
  * Exclusive - Single reader/writer.

WAL (Write-Ahead Logging) is an alternative where readers
are unlocked with the writer appending deltas. Upon
checkpoints SQLite halts all reads/writes to apply the
deltas as a batch.
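
A rough sketch of how these options look from Python's sqlite3
module. The table and values are invented for illustration, and
the database lives in a temp directory because WAL needs an
on-disk file...

```python
import os
import sqlite3
import tempfile

# WAL needs a database on disk, in-memory databases can't use it.
path = os.path.join(tempfile.mkdtemp(), 'example.db')

# isolation_level = None leaves transaction control to our explicit BEGINs.
conn = sqlite3.connect(path, isolation_level = None)
journal_mode = conn.execute('PRAGMA journal_mode=WAL').fetchone()[0]

conn.execute('CREATE TABLE scores (name TEXT, points INTEGER)')

# BEGIN IMMEDIATE takes the write lock up front rather than on first write.
conn.execute('BEGIN IMMEDIATE')
conn.execute('INSERT INTO scores VALUES (?, ?)', ('atagar', 8))
conn.execute('COMMIT')

rows = conn.execute('SELECT name, points FROM scores').fetchall()
conn.close()

print(journal_mode)  # wal
print(rows)          # [('atagar', 8)]
```

Swapping in 'BEGIN DEFERRED' or 'BEGIN EXCLUSIVE' selects the
other two locking strategies.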

------------------------------------------------------------
See Python, See Python Go, Go Python Go
------------------------------------------------------------

Last talk I attended and the one I wanted to see most.
Imagine a world where performance critical code could
be written in Go rather than C. No more memory leaks.
No compilers. Sounds great, right? Well, keep dreaming.

Both Python and Go can drop to C and Andrey gave a demo
of doing so as a bridge between them, and in the process
explained why this is a terrible idea. The CPython
Extension interface requires a bit of boilerplate but
can work with no dependencies while CFFI requires some
magic but provides a more portable solution. But in
either case crossing both the Go-to-C and C-to-Python
boundaries drops you to the lowest common denominator.
This means no Go interfaces or goroutines, and no Python
classes or generators.

GC, GIL, and JIT all add their own headaches but worse,
you need to implement your own memory management. Sharing
between Go and Python risks release of memory the other
side still references. Andrey got around this by passing
his own dereferenceable pointers but... ick.

In the end Andrey's demo worked and in fact was just as
performant as a direct Go implementation, but made it
clear there be dragons. Frustratingly, it's still better
to just call os.system(). :(

------------------------------------------------------------

This being my first PyCon I focused on talks rather than
the hallway track but nonetheless had some nice finds...

  * Seattle is home to quite a few technical meetups.
    Hardware hacking, TA3M, Ruby Brigade, you name it
    and there's probably a group for it. SeaPig has
    been a fun local python group but sadly it's gone
    dormant in recent years. Among the booths however
    I ran into members of PuPPy, another local python
    group that seems to be quite alive and well!

  * Didn't realize in advance but AWS networking ran
    a booth during the job fair. Fun chats with Shawn -
    he has a great approach for getting folks excited to apply.

  * Crossed paths with meejah several times. Together
    we whipped up a recipe combining our libraries so
    users can read stem-parsed event objects from
    txtorcon. Neat stuff!

Simply a great conference, I look forward to hitting
PyCon again next year!

