Writing an async REPL - Part 1


This is the first part in a series of blog posts explaining how I implemented the ability to await code at the top-level scope in the IPython REPL. Don't expect the second part soon, and don't bother me for it. I know I should write it, but time is a rare luxury.

It is an interesting adventure into how Python code gets executed. I must admit it changed quite a bit how I understand Python code nowadays, and it made me even more excited about async/await in Python.

It also dives quite a bit into the internals of Python/CPython, in case you have ever been curious about what some of these things are.

In [1]:
# we cheat and deactivate the new IPython feature to match Python repl behavior
%autoawait False

Async or not async, that is the question

You might not have noticed it, but since Python 3.5 the following is valid Python syntax:

In [2]:
async def a_function():
    async with contextmanager() as f:
        result = await f.get('stuff')
        return result

So you've been curious and read a lot about asyncio. You may have come across a few new libraries like aiohttp and all the aio-libs, heard about sans-io, and read complaints that we could take different approaches and maybe even do better. You vaguely understand the concept of loops and futures, but the term coroutine is still unclear. So you decide to poke around yourself in the REPL.

In [3]:
import aiohttp
In [4]:
print(aiohttp.__version__)
coro_req = aiohttp.get('https://api.github.com')
coro_req
1.3.5
Out[4]:
<aiohttp.client._DetachedRequestContextManager at 0x1045289d8>
In [5]:
import asyncio
res = asyncio.get_event_loop().run_until_complete(coro_req)
In [6]:
res
Out[6]:
<ClientResponse(https://api.github.com) [200 OK]>
<CIMultiDictProxy('Server': 'GitHub.com', 'Date': 'Thu, 06 Apr 2017 19:49:20 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Status': '200 OK', 'X-Ratelimit-Limit': '60', 'X-Ratelimit-Remaining': '50', 'X-Ratelimit-Reset': '1491508909', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept', 'Etag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-Github-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval', 'Access-Control-Allow-Origin': '*', 'Content-Security-Policy': "default-src 'none'", 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'deny', 'X-Xss-Protection': '1; mode=block', 'Vary': 'Accept-Encoding', 'X-Served-By': 'a51acaae89a7607fd7ee967627be18e4', 'Content-Encoding': 'gzip', 'X-Github-Request-Id': '8182:3911:C50FFE:EF0636:58E69BC0')>
In [7]:
res.json()
Out[7]:
<generator object ClientResponse.json at 0x1052cd9e8>
In [8]:
json = asyncio.get_event_loop().run_until_complete(res.json())
json
Out[8]:
{'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}',
 'current_user_url': 'https://api.github.com/user',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': 'https://api.github.com/user/keys',
 'notifications_url': 'https://api.github.com/notifications',
 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}',
 'organization_url': 'https://api.github.com/orgs/{org}',
 'public_gists_url': 'https://api.github.com/gists/public',
 'rate_limit_url': 'https://api.github.com/rate_limit',
 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}',
 'repository_url': 'https://api.github.com/repos/{owner}/{repo}',
 'starred_gists_url': 'https://api.github.com/gists/starred',
 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}',
 'team_url': 'https://api.github.com/teams',
 'user_organizations_url': 'https://api.github.com/user/orgs',
 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}',
 'user_url': 'https://api.github.com/users/{user}'}
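
The session above depends on an old aiohttp release (1.3.5) and on network access. As a minimal, stdlib-only sketch of the same pattern, the hypothetical fetch_answer below stands in for a network call. Calling it only creates a coroutine object, and nothing runs until an event loop drives it:

```python
import asyncio

async def fetch_answer():
    # Hypothetical stand-in for a network call such as res.json()
    await asyncio.sleep(0)
    return 42

coro = fetch_answer()            # nothing has run yet
kind = type(coro).__name__       # name of the object we got back

loop = asyncio.new_event_loop()  # private loop, closed when done
try:
    answer = loop.run_until_complete(coro)
finally:
    loop.close()

print(kind, answer)  # coroutine 42
```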

It's a bit painful to pass everything to run_until_complete. But you know how to write an async-def function and pass it to an event loop:

In [9]:
loop = asyncio.get_event_loop()
run = loop.run_until_complete
url = 'https://api.github.com/rate_limit'

async def get_json(url):
    res = await aiohttp.get(url)
    return await res.json()

run(get_json(url))
Out[9]:
{'rate': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
 'resources': {'core': {'limit': 60, 'remaining': 50, 'reset': 1491508909},
  'graphql': {'limit': 0, 'remaining': 0, 'reset': 1491511760},
  'search': {'limit': 10, 'remaining': 10, 'reset': 1491508220}}}

Good! And then you wonder: why do I have to wrap things in a function? If I have a default loop, isn't it obvious where I want to run my code? Can't I await things directly? So you try:

In [10]:
await aiohttp.get(url)
  File "<ipython-input-10-055eb13ed07d>", line 1
    await aiohttp.get(url)
                ^
SyntaxError: invalid syntax

What? Oh, that's right, there is no way in Python to set a default loop... but a SyntaxError? Well, that's annoying.
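
The same failure can be reproduced without IPython or aiohttp. This sketch hands a top-level await to the built-in compile() and catches the error. The helper name compiles_ok is made up for illustration, and the exact error message varies between Python versions, so only the exception type is checked:

```python
def compiles_ok(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, '<sketch>', 'exec')
    except SyntaxError:
        return False
    return True

wrapped = "async def f():\n    await g()\n"  # await inside an async def
bare = "await g()\n"                          # await at the top level

print(compiles_ok(wrapped))  # True
print(compiles_ok(bare))     # False
```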

Outsmart Python

Fortunately you (in this case, me) are in control of the REPL. You can bend it to your will. Surely you can do something. First you try to remember how a REPL works:

In [11]:
mycode = """
a = 1
print('hey')
"""
def fake_repl(code):
    import ast
    module_ast = ast.parse(code)  # source text -> AST
    bytecode = compile(module_ast, '<fakefilename>', 'exec')  # AST -> bytecode
    global_ns = {}
    local_ns = {}
    exec(bytecode, global_ns, local_ns)  # run it, collecting names in local_ns
    return local_ns

fake_repl(mycode)
hey
Out[11]:
{'a': 1}
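
A real REPL also keeps its namespace alive between inputs, which fake_repl does not do. As a rough sketch under that assumption (the class name TinyRepl and its run method are invented here, and this is not how IPython is implemented), each input is parsed, compiled, and executed against one shared dictionary:

```python
import ast

class TinyRepl:
    """Hypothetical toy REPL: one namespace shared by every input."""

    def __init__(self):
        self.namespace = {}

    def run(self, source):
        tree = ast.parse(source)                          # text -> AST
        bytecode = compile(tree, '<tiny-repl>', 'exec')   # AST -> code object
        exec(bytecode, self.namespace)                    # run in the shared namespace

repl = TinyRepl()
repl.run("a = 1")
repl.run("b = a + 1")        # sees `a` from the previous input
print(repl.namespace['b'])   # 2
```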

The fake_repl output does not show global_ns as it is huge: exec adds a __builtins__ entry to it, which holds everything that is available by default in Python. Let's see where things fail if you try a top-level await:

In [12]:
import ast
mycode = """
import aiohttp
await aiohttp.get('https://aip.github.com/')
"""

module_ast = ast.parse(mycode)
  File "<unknown>", line 3
    await aiohttp.get('https://aip.github.com/')
                ^
SyntaxError: invalid syntax

Ouch, so we can't even parse it. Let's be smart: can we get at the inner code if we wrap it in an async-def?

In [13]:
mycode = """
async def fake():
    import aiohttp
    await aiohttp.get('https://aip.github.com/')
"""
module_ast = ast.parse(mycode)
ast.dump(module_ast)
Out[13]:
"Module(body=[AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)])"
In [14]:
ast.dump(module_ast.body[0])
Out[14]:
"AsyncFunctionDef(name='fake', args=arguments(args=[], vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[]), body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))], decorator_list=[], returns=None)"

As a reminder, AST stands for Abstract Syntax Tree. Because it is only a tree of nodes, you may construct an AST which is not a valid Python program, like an if-else-else, and an existing tree can be modified. What we are interested in is the body of the function, which itself is the first object of a dummy module:

In [15]:
body = module_ast.body[0].body
body
Out[15]:
[<_ast.Import at 0x105d503c8>, <_ast.Expr at 0x105d50438>]
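
Since the tree is an ordinary Python object, it can be edited before it is compiled. As a small illustration unrelated to async (the transformer name AddToMult is made up for this sketch), the following swaps every addition for a multiplication:

```python
import ast

class AddToMult(ast.NodeTransformer):
    """Hypothetical transformer: rewrite `x + y` into `x * y`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # rewrite nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse("result = 2 + 3")
tree = AddToMult().visit(tree)
ast.fix_missing_locations(tree)           # fill in positions for new nodes

namespace = {}
exec(compile(tree, '<ast-sketch>', 'exec'), namespace)
print(namespace['result'])  # 6
```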

Let's pull out the body of the function and put it at the top level of a newly created module:

In [16]:
async_mod = ast.Module(body)
ast.dump(async_mod)
Out[16]:
"Module(body=[Import(names=[alias(name='aiohttp', asname=None)]), Expr(value=Await(value=Call(func=Attribute(value=Name(id='aiohttp', ctx=Load()), attr='get', ctx=Load()), args=[Str(s='https://aip.github.com/')], keywords=[])))])"

Mouahahahahahahahahah, you managed to get a valid top-level async AST! Victory is yours!

In [17]:
bytecode = compile(async_mod, '<fakefile>', 'exec')
  File "<fakefile>", line 4
SyntaxError: 'await' outside function
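
For readers on a newer interpreter, here is a hedged, stdlib-only sketch of the same experiment. It assumes Python 3.8 or later, where ast.Module also expects a type_ignores list, and it swaps aiohttp for asyncio.sleep so nothing needs to be installed. The outcome is the same: the tree builds, but compile() refuses it.

```python
import ast

source = """
async def fake():
    import asyncio
    await asyncio.sleep(0)
"""

func = ast.parse(source).body[0]   # the AsyncFunctionDef node
# Assumption: Python 3.8+, where Module takes a `type_ignores` field
async_mod = ast.Module(body=func.body, type_ignores=[])

try:
    compile(async_mod, '<fakefile>', 'exec')
    compiled = True
except SyntaxError as exc:
    compiled = False
    print(type(exc).__name__)  # SyntaxError
```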

Grumlgrumlgruml. You haven't said your last word. You're going to take your revenge later. Let's see what we can do in Part II, not written yet.