Investigating slow startup of a gevent-based server application

Fast iteration time is critical when developing new things, and everything is fine when the server takes half a second to start. But when that server takes ten seconds to start, that's when things get annoying. So annoying that I had no choice but to spend several hours digging around for a reason for that slowdown.

The server in question was a Python/gevent/pywsgi server kinds of which I have been using quite some time now. And this was a new problem, one I had not previously encountered before: of course I wanted to get to the bottom of this.

First I tried to place some strategic print statements here and there, but those didn't help. Next I fired up the debugger and suspended the process during the startup. Nothing low-level blocking socket creation, and gevent has happily running its event loop; can't be its fault. Gevent has always done a great job not blocking anything, so that couldn't be it. (This is where I made a 'mistake', see the last paragraph.)

I was determined that it was something low-level, so it was time bring out the big guns: API Monitor. It could log every low-level API call, and then it would just be matter of digging through them all. And there indeed was some digging to be done. Then finally I found what I was looking for. The gevent event loop really did spin as it should, but the socket began processing data only after a gethostbyaddr-call returned.

That strange character is actually FE 80 00, rest of the address not rendering as a string.

This low-lever function was taking a long time to execute, and was actually executed in another thread, communicating its results via a socket. back to the main thread. For its argument it was given a link-local IPv6 address of (the first adapter from ipconfig) Hyper-V bridge adapter. Maybe due to some kind of misconfiguration or whatever that call took an excessive amount of time.

Now that I knew what was happening I wanted to know why. Intuition brought me to gevent's wrapper, where I inserted some tracking code to its implementation of gethostbyaddr. That in turn told me that as part of creating the socket the server's environmental variables were initialized. One of these was SERVER_NAME. If it was not already set, it was resolved via getfqdn - which called gethostname and that devil-ish gethostbyaddr. Only after the server name was resolved could the socket begin accepting connections.

Now that I had the general reason I didn't want to bother myself more that I had to. As the mechanism was already there, all it took was to pass environ={'SERVER_NAME': 'whatever'} as a kwargs to WSGIServer.

* * *

Had I though a bit more before letting go of the debugger and starting the API Monitor, I would probably have though about looking at the individual greenlet stacktraces. Those would have clearly told that the getfqdn call was blocking the main server greenlet.

WebRTC primer

As a relatively new technology WebRTC is still quite unheard of even though it will likely be the next Big Thing. It offers a whole new way to create interactive peer to peer multimedia applications within a browser - without requiring any additional plugins. Support is already built-in to the newest versions of Chrome on both desktops and Android. Firefox and Opera also have WebRTC capabilities, but still have some features missing.

It's bit of a chicken-egg problem really. There is not yet widespread adaptation so development doesn't have the highest priorty, and the development is not the highest priority because there is no widespread adaptation. While I can't really do much about the APIs, I can still try and present my take on the basic WebRTC connection flow. Hopefully this helps someone to create a cool WebRTC application and thereby indirectly contributing to development priorities.

But please note that this text is written as part of a project I've been working on and is not meant to be the singular introduction to WebRTC, nor is this meant to primarily be a tutorial. If you are looking for a more thorough introduction, see the great tutorial on HTML5 Rocks. After you have read that tutorial and still feel disoriented, I hope you came back here and read what I've written. Hopefully at least my diagram will clarify something.

WebRTC connection flow

In short, to establish a connection between two peers the following needs to be done:
  • Create a signaling channel between the peers
  • Get local media, and negotiate codecs
  • Perform interactive connection establishment assisted by the signaling channel
  • And finally start streaming data
This is my take on the issue. It is not the one and only way to do things, especially with the signaling channel. But then again, signaling is not covered in the WebRTC specification even though it's a very important piece of the puzzle. The easiest way is to roll your own asynchronous server using something like python-gevent or node.js, but you could as well adapt something like SIP or XMPP for this purpose.

At the very beginning both of the clients connect to this signaling / management server and mutually agree on a session. Later they then use this session to exchange the messages necessary to build their own direct connection with the steps illustrated in the diagram 1 below(use a state machine, you'll thank yourself later). Most of the functions in the diagram refer to the WebRTC-stack but some are just to illustrate a point. Also note that some functions might fail due to user actions, and some due to timeouts. Application-defined timeouts can also occur waiting for state transitions. Finally, the traffic between the peers is the traffic via the signaling server.

Diagram 1. Basic connection flow