# 05-YAML Notebook


## YAML IPython notebook

A little experiment based on the fact that YAML is apparently designed to be more readable by humans than JSON. We've also had some complaints that metadata is not kept by nbconvert when round-tripping through Markdown. Those two things made me want to see what ipynb files stored as YAML would look like.

I'll also use this post to experiment with future nbviewer features; if you see anything wrong with the CSS on some device, please tell me.

##### First attempt

Apparently JSON is a subset of YAML:

cp foo.ipynb foo.ipyamlnb

Yeah, mission accomplished!
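As a quick sanity check of that claim, here is a minimal sketch, assuming Python 3 with PyYAML installed. The tiny notebook dictionary is made up for illustration. PyYAML implements YAML 1.1, so not every JSON corner case is guaranteed to parse identically; this only shows that a typical document does.

```python
import json
import yaml  # PyYAML

# A made-up, minimal nbformat-3-like document serialised as JSON.
doc = json.dumps({
    "metadata": {"name": "demo"},
    "nbformat": 3,
    "nbformat_minor": 0,
    "worksheets": [{"cells": []}],
})

# The JSON text is fed, unchanged, to the YAML parser.
from_json = json.loads(doc)
from_yaml = yaml.safe_load(doc)
same = from_json == from_yaml
print(same)
```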

##### Second try

Install PyYAML, and see what we can do.

In [42]:
import json
import yaml

In [43]:
from IPython.nbformat import current as nbf

In [44]:
ls Y*.ipynb

YAML Notebook.ipynb

In [45]:
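with open('YAML Notebook.ipynb') as f:
    # Body missing from the source; presumably the notebook is read as JSON.
    nbook = nbf.read(f, 'json')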

In [46]:
nbook.worksheets[0].cells[9]

Out[46]:
{u'cell_type': u'code',
u'collapsed': False,
u'input': u'from IPython.nbformat import current as nbf',
u'language': u'python',
u'outputs': []}
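For readers who have not looked inside a v3 notebook file, this stdlib-only sketch shows how multi-line fields are laid out on disk: `input` (and `source`) are stored as a JSON list of lines, which a plain `json.load` returns as-is. The miniature notebook below is hand-written for illustration and is not taken from the real file.

```python
import json

# Hand-written miniature of an nbformat 3 file (illustrative only).
raw = """
{
 "metadata": {"name": "demo"},
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {"cells": [
    {"cell_type": "code",
     "collapsed": false,
     "input": ["import json\\n", "import yaml"],
     "language": "python",
     "outputs": []}
  ]}
 ]
}
"""

nb = json.loads(raw)
cell = nb["worksheets"][0]["cells"][0]

# On disk the source is a list of lines; joining gives the text to edit.
joined = "".join(cell["input"])
print(joined)
```

Dumped naively to YAML, that list stays a list of one-line strings, which is why the cells are joined before dumping.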

I'll skip the fiddling around with the YAML converter. In short, you have to explicitly mark the parts you want dumped in literal form; otherwise they are exported as lists of strings, which is a little painful to edit afterward. I'm using the safe_dump and safe_load functions (or, equivalently, passing SafeDumper and SafeLoader). Those should be the default; otherwise loading can deserialize arbitrary objects and execute code.

We probably don't want to reproduce the critical Rails vulnerability that happened not so long ago.
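To make the risk concrete, here is a small sketch, assuming Python 3 with PyYAML. The payload uses a `!!python/object/apply` tag, which asks a permissive loader to call a Python function while parsing; `os.getcwd` is a deliberately harmless stand-in. The safe loader has no constructor for such tags and refuses the document.

```python
import yaml  # PyYAML

# A document that asks the loader to call a Python function while parsing.
# os.getcwd is harmless; an attacker would pick something else.
payload = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError as exc:
    # SafeLoader has no constructor for python/* tags, so it refuses.
    rejected = True
    print(type(exc).__name__)

print(rejected)
```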

In [47]:
# we'll patch a safe Yaml Dumper
sd = yaml.SafeDumper

# Dummy class, just to mark the part we want with custom dumping
class folded_unicode(unicode): pass
class literal_unicode(unicode): pass


I know class names should be capitalized, but we just want to hide from the end user the fact that these are classes. At the same time I define a folded variant to use with markdown cells: when markdown contains really long lines, those will be wrapped in the YAML document.

In [48]:
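def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

# Assumed (missing in the source): register the representers on the dumper
sd.add_representer(folded_unicode, folded_unicode_representer)
sd.add_representer(literal_unicode, literal_unicode_representer)

with open('YAML Notebook.ipynb') as f:
    nbjson = json.load(f)  # assumed body, missing in the source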


Now we patch the parts of the ipynb file that we know we want to be literal or folded.

In [49]:
for tcell in nbjson['worksheets'][0]['cells']:
    if 'source' in tcell.keys():
        tcell['source'] = folded_unicode("".join(tcell['source']))
    if 'input' in tcell.keys():
        tcell['input'] = literal_unicode("".join(tcell['input']))

In [50]:
with open('Yaml.ipymlnb','w') as f:
    f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))
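The cells above target Python 2 (`unicode`). For anyone who wants to try the same idea on Python 3, here is a self-contained sketch of the whole pipeline, assuming PyYAML is installed. The names `FoldedStr`, `LiteralStr` and `NotebookDumper` are mine, and the in-memory notebook is a hypothetical stand-in for the real file. It registers the representers on a private `SafeDumper` subclass rather than on `yaml.SafeDumper` itself.

```python
import yaml  # PyYAML


class FoldedStr(str):
    """Marker type: dump this string in folded ('>') style."""


class LiteralStr(str):
    """Marker type: dump this string in literal ('|') style."""


class NotebookDumper(yaml.SafeDumper):
    """Private SafeDumper subclass, so the global SafeDumper stays untouched."""


def represent_folded(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=">")


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


NotebookDumper.add_representer(FoldedStr, represent_folded)
NotebookDumper.add_representer(LiteralStr, represent_literal)

# Hypothetical in-memory notebook, shaped like json.load() output for v3.
nbjson = {
    "nbformat": 3,
    "worksheets": [{"cells": [
        {"cell_type": "markdown", "source": ["Some markdown text"]},
        {"cell_type": "code", "input": ["import json\n", "import yaml"]},
    ]}],
}

# Join the line lists and tag them with the style we want.
for cell in nbjson["worksheets"][0]["cells"]:
    if "source" in cell:
        cell["source"] = FoldedStr("".join(cell["source"]))
    if "input" in cell:
        cell["input"] = LiteralStr("".join(cell["input"]))

text = yaml.dump(nbjson, default_flow_style=False, Dumper=NotebookDumper)
print(text)

# Loading the YAML back gives plain strings again.
restored = yaml.safe_load(text)
```

Subclassing the dumper is a design choice of this sketch: it keeps the custom styles from leaking into other code that calls `yaml.safe_dump` in the same process.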


You can round-trip it to JSON, and it's still a valid ipynb file that can be loaded. I haven't fiddled with it much more. There are just a few gotchas: empty lines can disappear, and trailing whitespace at the end of a line makes the dumper fall back to a quoted string style to store the value.
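The trailing-whitespace gotcha is easy to reproduce. The sketch below assumes Python 3 with PyYAML; `LiteralStr` and `DemoDumper` are illustrative names. It dumps the same two-line snippet twice, once clean and once with a space before the line break.

```python
import yaml  # PyYAML


class LiteralStr(str):
    """Marker type for strings that should use literal ('|') style."""


class DemoDumper(yaml.SafeDumper):
    """Private dumper so the representer is not registered globally."""


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


DemoDumper.add_representer(LiteralStr, represent_literal)

clean = "a = 1\nb = 2"
dirty = "a = 1 \nb = 2"  # note the space before the line break

clean_yaml = yaml.dump({"input": LiteralStr(clean)},
                       default_flow_style=False, Dumper=DemoDumper)
dirty_yaml = yaml.dump({"input": LiteralStr(dirty)},
                       default_flow_style=False, Dumper=DemoDumper)

print(clean_yaml)  # block literal, easy to edit by hand
print(dirty_yaml)  # falls back to a double-quoted scalar with escapes

# Either way the value itself survives the round trip.
clean_back = yaml.safe_load(clean_yaml)["input"]
dirty_back = yaml.safe_load(dirty_yaml)["input"]
```

The data is not lost in the second case, only the readability: the cell becomes a single quoted line with `\n` escapes.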

You can skip down to the end of this notebook to see what it looks like. It's probably much more compact than the JSON we currently emit, and in some cases it might be easier to read, but I don't think it is worth considering for the format specification.

ipynb files are meant to be fixable by humans, and I strongly prefer a consistent format with simple rules over having to explain the meaning of the various shenanigans such as |2+ for literal strings.
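For the curious, those header characters are a block scalar's indentation indicator (the digit) and chomping indicator (`-` or `+`), and the two can be combined. This sketch, assuming Python 3 with PyYAML, loads a hand-written document to show what each one does to the resulting string.

```python
import yaml  # PyYAML

# Hand-written document: one key per indicator.
doc = """\
strip: |-
  text

clip: |
  text

keep: |+
  text

indented: |2
    starts with two spaces
  flush
"""

values = yaml.safe_load(doc)
for key, value in values.items():
    print(key, repr(value))
```

`-` drops the final line break, no indicator keeps exactly one, and `+` also keeps trailing blank lines. The digit fixes the indentation of the block explicitly, which is needed when the first content line itself starts with spaces.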

Also, support across languages is not consistent, and it would probably be too much of a security burden for all code that loads ipynb files to take care of sanitizing YAML.

One area where I would use it is to describe the ipynb format, at a talk for example, and/or to make metadata editing more human readable/writable.
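The metadata use case could look roughly like this. It is a sketch only, assuming Python 3 with PyYAML; the `tags` key is invented for illustration, and the "edit" is simulated with a string replacement rather than a real editor.

```python
import yaml  # PyYAML

# Hypothetical notebook metadata; the "tags" key is made up.
metadata = {"name": "YAML Notebook", "tags": ["experiment", "yaml"]}

# Block style is the friendlier form for hand editing.
editable = yaml.safe_dump(metadata, default_flow_style=False)
print(editable)

# Pretend a human edited the text, then load it back safely.
edited = editable.replace("YAML Notebook", "Renamed Notebook")
new_metadata = yaml.safe_load(edited)
print(new_metadata)
```

The notebook itself would stay in JSON; only the metadata dictionary takes the detour through YAML.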

In [51]:
!cat Yaml.ipymlnb

metadata:
name: YAML Notebook
nbformat: 3
nbformat_minor: 0
worksheets:
- cells:
level: 1
source: >-
YAML IPython notebook
- cell_type: markdown
source: "Little experiment base on the fact that apparently YAML is made to be\
\ are not keep in nbconvert when roundtripping through markdown, those two\n\
made me think that I could try to see what ipynb files stored as YAML would\
\ look like. "
level: 4
source: >-
First atempt
- cell_type: markdown
source: >-
Apparently Json is a subset of YAML:
- cell_type: markdown
source: >2+
cp foo.ipynb foo.ipyamlnb

- cell_type: markdown
source: >-
Yeah, Mission acomplished !
level: 4
source: >-
Second try
- cell_type: markdown
source: "Install PyYaml, and see what we can do. "
- cell_type: code
collapsed: false
input: |-
import json
import yaml
language: python
outputs: []
- cell_type: code
collapsed: false
input: |-
from IPython.nbformat import current as nbf
language: python
outputs: []
- cell_type: code
collapsed: false
input: |-
ls Y*.ipynb
language: python
outputs: []
- cell_type: code
collapsed: false
input: |-
with open('YAML Notebook.ipynb') as f:
language: python
outputs: []
- cell_type: code
collapsed: false
input: |-
nbook.worksheets[0].cells[9]
language: python
outputs: []
- cell_type: markdown
source: >-
I'll skipp the fiddling around with the yaml converter. In short, you have to
specify explicitely the part you want to dump in the literal form, otherwise
they are exported as list of strings, which is a little painfull to edit afterward.
I'm using the safe_dump and safe_load methods (or pass safeLoader and Dumper).
Those should be default or otherwise you could unserialise arbitrary object,
and have code exucuted.

We probably don't want to reproduct the recent file Rail's critical vulnerability
that append not so long ago.
- cell_type: code
collapsed: false
input: |-
# we'll patch a safe Yaml Dumper
sd = yaml.SafeDumper

# Dummy class, just to mark the part we want with custom dumping
class folded_unicode(unicode): pass
class literal_unicode(unicode): pass
language: python
outputs: []
- cell_type: markdown
source: >-
I know classes should be wit upper case, but we just want to hide the fact that
thoses a class to end user. At the same time I define a folded method if I want
to use it later.
- cell_type: code
collapsed: false
input: |-
def folded_unicode_representer(dumper, data):
return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

with open('YAML Notebook.ipynb') as f:
language: python
outputs: []
- cell_type: markdown
source: >-
now we patch the part of the ipynb file we know we want to be literal or folded
- cell_type: code
collapsed: false
input: |-
for tcell in nbjson['worksheets'][0]['cells']:
if 'source' in tcell.keys():
tcell['source'] = folded_unicode("".join(tcell['source']))
if 'input' in tcell.keys():
tcell['input'] = literal_unicode("".join(tcell['input']))
language: python
outputs: []
- cell_type: code
collapsed: false
input: |-
with open('Yaml.ipymlnb','w') as f:
f.write(yaml.dump(nbjson, default_flow_style=False, Dumper=sd))
language: python
outputs: []
- cell_type: markdown
source: >-
You can round trip it to json, and it's still a valid ipynb file that can be
loaded. Haven't fiddled with it much more.

There are just a few gotchas with empty lines as well as trailing whitespace
at EOL that can respectively diseapear or make the dumper fall back to a string
quoted methods to store values.

One could also try to tiker with folded_unicode in markdown cell that tipically
have long lines to play a little more nicely with VCS.
- cell_type: markdown
source: >-
You can skip down to the end of this notebook to loko at how it looks like.
It's probably much compact than the current json we emit, in **some** cases
it might be more easy to read, but I don't think it is worth considering using
in the format specification.

ipynb files are ment to be humanely fixable, and I strongly prefere having a
consistent format with simple rules than having to explain what are the meaning
of the differents shenigan like : |2+ for literal string.

Also support across languages are not consistent, and it would probably be too
much of a security burden for all code that will support loading ipynb to take
care of sanitazing Yaml.

One area where I woudl use it would be to describe the ipynb format at a talk