05-YAML Notebook
YAML IPython notebook¶
A little experiment based on the claim that YAML is meant to be more readable by humans than JSON. We've also had some complaints that metadata is not kept by nbconvert when round-tripping through Markdown. Those two things made me want to see what ipynb files stored as YAML would look like.
I'll also use this post to experiment with future nbviewer features. If you see anything wrong with the CSS on some device, please tell me.
First attempt¶
Apparently JSON is a subset of YAML:
cp foo.ipynb foo.ipyamlnb
Yeah, mission accomplished!
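To check the "subset" claim a bit more seriously than with cp, one can feed JSON text straight to a YAML parser. This is a minimal sketch, assuming Python 3 and PyYAML; the tiny notebook-like dictionary is made up for illustration. Strictly speaking, JSON is a subset of YAML 1.2 while PyYAML implements YAML 1.1, so some JSON corner cases may not parse, but ordinary content like this does.

```python
import json
import yaml  # PyYAML

# A made-up, minimal notebook-like structure (not a complete ipynb file).
notebook = {
    "metadata": {"name": "foo"},
    "nbformat": 3,
    "worksheets": [
        {"cells": [{"cell_type": "code", "input": ["print(1)\n"], "outputs": []}]}
    ],
}

json_text = json.dumps(notebook, indent=1)

# Parse the very same text with a JSON parser and with a YAML parser.
from_json = json.loads(json_text)
from_yaml = yaml.safe_load(json_text)

print(from_json == from_yaml)
```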
Second try¶
Install PyYAML, and see what we can do.
import json
import yaml
from IPython.nbformat import current as nbf
ls Y*.ipynb
with open('YAML Notebook.ipynb') as f:
    nbook = nbf.read(f, 'json')
nbook.worksheets[0].cells[9]
I'll skip the fiddling around with the YAML converter. In short, you have to explicitly mark the parts you want dumped in literal form; otherwise they are exported as lists of strings, which is a little painful to edit afterward. I'm using the safe_dump and safe_load methods (or you can pass SafeLoader and SafeDumper explicitly). Those should be the default, because otherwise loading a document could deserialize arbitrary objects and execute code.
We probably don't want to reproduce the critical Rails vulnerability that happened not so long ago.
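The difference between the safe and unsafe loaders can be sketched as follows, assuming Python 3 and PyYAML. The document below uses a Python-specific tag that asks the loader to call a function; os.getcwd is a harmless stand-in chosen for illustration. The safe loader has no constructor for such tags and refuses the document.

```python
import yaml  # PyYAML

# A document that asks the loader to call a Python function.
# os.getcwd is harmless; an attacker would pick something nastier.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError:
    # The safe loader does not know how to construct python/* tags.
    rejected = True

# Plain data (mappings, lists, strings, numbers) still loads fine.
plain = yaml.safe_load("cells: [1, 2]")

print(rejected, plain)
```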
# we'll patch a safe Yaml Dumper
sd = yaml.SafeDumper
# Dummy class, just to mark the part we want with custom dumping
class folded_unicode(unicode): pass
class literal_unicode(unicode): pass
I know class names should start with an uppercase letter, but we just want to hide from the end user the fact that those are classes. I also define a folded variant to use with Markdown cells: when the Markdown contains really long lines, they will be wrapped in the YAML document.
def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
sd.add_representer(folded_unicode, folded_unicode_representer)
sd.add_representer(literal_unicode, literal_unicode_representer)
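To see what the two representers change, here is a small self-contained sketch, assuming Python 3 (so str replaces unicode) and PyYAML. The names folded_str, literal_str and NotebookDumper are hypothetical, and the sketch registers the representers on a private subclass instead of patching SafeDumper itself.

```python
import yaml  # PyYAML

# Marker subclasses of str (Python 3 counterparts of the unicode subclasses).
class folded_str(str): pass
class literal_str(str): pass

# A private dumper subclass, so the global SafeDumper is left untouched.
class NotebookDumper(yaml.SafeDumper): pass

def represent_folded(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='>')

def represent_literal(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')

NotebookDumper.add_representer(folded_str, represent_folded)
NotebookDumper.add_representer(literal_str, represent_literal)

def dump(obj):
    return yaml.dump(obj, Dumper=NotebookDumper, default_flow_style=False)

code = "a = 1\nprint(a)\n"
markdown = " ".join(["some fairly long markdown sentence"] * 6)

plain = dump({'input': code})                  # default style: a quoted scalar
literal = dump({'input': literal_str(code)})   # block scalar introduced by "|"
folded = dump({'source': folded_str(markdown)})  # block scalar introduced by ">"

print(plain)
print(literal)
print(folded)
```

The literal form keeps the code exactly as typed, one source line per YAML line, which is what makes the cell editable by hand.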
with open('YAML Notebook.ipynb') as f:
    nbjson = json.load(f)
Now we patch the parts of the ipynb file that we know we want to be literal or folded:
for tcell in nbjson['worksheets'][0]['cells']:
    if 'source' in tcell:
        tcell['source'] = folded_unicode("".join(tcell['source']))
    if 'input' in tcell:
        tcell['input'] = literal_unicode("".join(tcell['input']))
with open('Yaml.ipymlnb', 'w') as f:
    f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))
You can round-trip it to JSON, and it's still a valid ipynb file that can be loaded. I haven't fiddled with it much more. There are just a few gotchas with empty lines and with trailing whitespace at end of line: the former can disappear, and the latter can make the dumper fall back to a quoted string style to store the value.
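The trailing-whitespace gotcha can be reproduced with a short sketch, assuming Python 3 and PyYAML. The names literal_str, NotebookDumper, dump_cell and strip_trailing_whitespace are hypothetical helpers introduced for this example only. It covers just the trailing-whitespace case, not the disappearing empty lines.

```python
import json
import yaml  # PyYAML

class literal_str(str): pass
class NotebookDumper(yaml.SafeDumper): pass

NotebookDumper.add_representer(
    literal_str,
    lambda dumper, data: dumper.represent_scalar(
        'tag:yaml.org,2002:str', data, style='|'),
)

def dump_cell(source):
    """Dump a one-key mapping, asking for the literal style."""
    return yaml.dump({'input': literal_str(source)},
                     Dumper=NotebookDumper, default_flow_style=False)

def strip_trailing_whitespace(source):
    """Hypothetical helper: drop spaces and tabs at the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in source.split("\n"))

clean = "x = 1\ny = 2\n"
dirty = "x = 1 \ny = 2\n"   # note the space after "1"

clean_yaml = dump_cell(clean)
dirty_yaml = dump_cell(dirty)   # the dumper gives up on "|" and quotes the value
fixed_yaml = dump_cell(strip_trailing_whitespace(dirty))

# Whatever style the dumper picked, the value survives YAML -> Python -> JSON.
for text, original in [(clean_yaml, clean), (dirty_yaml, dirty)]:
    assert json.loads(json.dumps(yaml.safe_load(text)))['input'] == original

print(clean_yaml)
print(dirty_yaml)
```

Stripping trailing whitespace before dumping keeps the literal style, at the price of no longer being a byte-exact round trip of the cell content.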
You can skip to the end of this notebook to see what it looks like. It's probably much more compact than the JSON we currently emit, and in some cases it might be easier to read, but I don't think it is worth considering for the format specification.
ipynb files are meant to be fixable by hand, and I strongly prefer having a consistent format with simple rules over having to explain the meaning of the various shenanigans like |2+ for literal strings.
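For readers wondering what a header like |2+ means: the digit is an explicit indentation indicator, and the + (or -) is a "chomping" indicator that controls what happens to trailing newlines. A minimal sketch, assuming Python 3 and PyYAML, with a made-up document:

```python
import yaml  # PyYAML

doc = (
    "clip: |\n"            # default: keep a single final newline
    "  text\n"
    "\n"
    "strip: |-\n"          # "-": drop the final newline
    "  text\n"
    "\n"
    "keep: |+\n"           # "+": keep all trailing newlines
    "  text\n"
    "\n"
    "indented: |2+\n"      # "2": content is indented by exactly 2 spaces,
    "   one leading space\n"   # so the third space belongs to the value
    "  none\n"
    "\n"
)

data = yaml.safe_load(doc)
for key, value in data.items():
    print(key, repr(value))
```

The indentation indicator is needed precisely when the first line of the value starts with whitespace, since the parser can then no longer guess the indentation from that line.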
Also, support across languages is not consistent, and it would probably be too much of a security burden for all the code that will support loading ipynb files to take care of sanitizing YAML.
One area where I would use it is to describe the ipynb format, at a talk for example, and/or to make metadata editing more human readable/writable.
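The metadata use case could look something like the sketch below, assuming Python 3 and PyYAML. The metadata keys are invented for the example, and the string concatenation merely stands in for a manual edit in a text editor.

```python
import yaml  # PyYAML

# Invented notebook metadata, for illustration only.
metadata = {"name": "YAML Notebook", "tags": ["yaml", "nbformat"]}

# Show the metadata to the user as YAML ...
as_yaml = yaml.safe_dump(metadata, default_flow_style=False)
print(as_yaml)

# ... let them edit it (simulated here by appending a line) ...
edited = as_yaml + "author: someone\n"

# ... and load it back with the safe loader before storing it as JSON again.
new_metadata = yaml.safe_load(edited)
print(new_metadata)
```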
!cat Yaml.ipymlnb