06-NBconvert-Doc-Draft

2013-06-01 20:00 | Source

How to Use NBConvert¶

NBconvert migration¶

NBconvert has now been merged into IPython itself. You will need IPython 1.0 or above to have this works (asuuming the API have not changed)

Intro¶

In this post I will introduce you to the programatic API of nbconvert to show you how to use it in various context.

For this I will use one of @jakevdp great blog post. I've explicitely chosen a post with no javascript tricks as Jake seem to be found of right now, for the reason that the becommings of embeding javascript in nbviewer, which is based on nbconvert is not fully decided yet.

This will not focus on using the command line tool to convert file. The attentive reader will point-out that no data are read from, or written to disk during the conversion process. Indeed, nbconvert as been though as much as possible to avoid IO operation and work as well in a database, or web-based environement.

Quick overview¶

The main principle of nbconvert is to instanciate a Exporter that controle a pipeline through which each notebook you want to export with go through.

Let's start by importing what we need from the API, and download @jakevdp's notebook.

In [1]:

import requests
response = requests.get('http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb')
response.content[0:60]+'...'

Out[1]:

'{\n "metadata": {\n  "name": "XKCD_plots"\n },\n "nbformat": 3,\n...'

We read the response into a slightly more convenient format which represent IPython notebook. There are not real advantages for now, except some convenient methods, but with time this structure should be able to guarantee that the notebook structure is valid.

In [2]:

from IPython.nbformat import current as nbformat
jake_notebook = nbformat.reads_json(response.content)
jake_notebook.worksheets[0].cells[0]

Out[2]:

{u'cell_type': u'heading',
 u'level': 1,
 u'metadata': {},
 u'source': u'XKCD plots in Matplotlib'}

So we have here Jake's notebook in a convenient for, which is mainly a Super-Powered dict and list nested. You don't need to worry about the exact structure.

The nbconvert API exposes some basic exporter for common format and default options. We will start by using one of them. First we import it, instanciate an instance with all the defautl parameters and fed it the downloaded notebook.

In [3]:

import IPython.nbconvert

In [6]:

from IPython.config import Config
from IPython.nbconvert import HTMLExporter

## I use basic here to have less boilerplate and headers in the HTML.
## we'll see later how to pass config to exporters.
exportHtml = HTMLExporter(config=Config({'HTMLExporter':{'default_template':'basic'}}))

In [7]:

(body,resources) = exportHtml.from_notebook_node(jake_notebook)

The exporter returns a tuple containing the body of the converted notebook, here raw HTML, as well as a resources dict. The resource dict contains (among many things) the extracted PNG, JPG [...etc] from the notebook when applicable. The basic HTML exporter does keep them as embeded base64 into the notebook, but one can do ask the figures to be extracted. Cf advance use. So for now the resource dict should be mostly empty, except for 1 key containing some css, and 2 others whose content will be obvious.

Exporter are stateless, you won't be able to extract any usefull information (except their configuration) from them. You can directly re-use the instance to convert another notebook. Each exporter expose for convenience a from_file and from_filename methods if you need.

In [6]:

print resources.keys()
print resources['metadata']
print resources['output_extension']
# print resources['inlining'] # too lng to be shown

['inlining', 'output_extension', 'metadata']
defaultdict(None, {'name': 'Notebook'})
html

In [7]:

# Part of the body, here the first Heading
start = body.index('<h1 id', )
print body[:400]+'...'

<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="XKCD-plots-in-Matplotlib">XKCD plots in Matplotlib<a class="anchor-link" href="#XKCD-plots-in-Matplotlib">&#182;</a></h1>
</div>

<div class="text_cell_render border-box-sizing rendered_html">
<p>This notebook originally appeared as a blog post at <a href="http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotli...

You can directly write the body into an HTML file if you wish, as you see it does not contains any body tag, or style declaration, but thoses are included in the default HtmlExporter if you do not pass it a config object as I did.

Extracting Figures¶

When exporting one might want to extract the base64 encoded figures to separate files, this is by default what does the RstExporter does, let see how to use it.

In [8]:

from IPython.nbconvert import RSTExporter

rst_export = RSTExporter()

(body,resources) = rst_export.from_notebook_node(jake_notebook)

In [9]:

print body[:970]+'...'
print '[.....]'
print body[800:1200]+'...'

XKCD plots in Matplotlib
========================


This notebook originally appeared as a blog post at `Pythonic
Perambulations <http://jakevdp.github.com/blog/2012/10/07/xkcd-style-plots-in-matplotlib/>`_
by Jake Vanderplas.

 *Update: the matplotlib pull request has been merged! See* `*This
post* <http://jakevdp.github.io/blog/2013/07/10/XKCD-plots-in-matplotlib/>`_
*for a description of the XKCD functionality now built-in to
matplotlib!*

One of the problems I've had with typical matplotlib figures is that
everything in them is so precise, so perfect. For an example of what I
mean, take a look at this figure:
In[1]:
.. code:: python

    from IPython.display import Image
    Image('http://jakevdp.github.com/figures/xkcd_version.png')





.. image:: output_3_0.png



Sometimes when showing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit...
[.....]
owing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit too precise. Attempting
to duplicate this figure in matplotlib leads to something like this:
In[2]:
.. code:: python

    Image('http://jakevdp.github.com/figures/mpl_version.png')





.. image:: output_5_0.png



It just doesn'...

Here we see that base64 images are not embeded, but we get what look like file name. Actually those are (Configurable) keys to get back the binary data from the resources dict we havent inspected earlier.

So when writing a Rst Plugin for any blogengine, Sphinx or anything else, you will be responsible for writing all those data to disk, in the right place. Of course to help you in this task all those naming are configurable in the right place.

let's try to see how to get one of these images

In [10]:

resources['outputs'].keys()

Out[10]:

[u'output_13_1.text',
 u'output_18_0.text',
 u'output_3_0.text',
 u'output_18_1.png',
 u'output_12_0.text',
 u'output_5_0.text',
 u'output_5_0.png',
 u'output_13_1.png',
 u'output_16_0.text',
 u'output_13_0.text',
 u'output_18_1.text',
 u'output_3_0.png',
 u'output_16_0.png']

We have extracted 5 binary figures, here pngs, but they could have been svg, and then wouldn't appear in the binary sub dict. keep in mind that a object having multiple repr will store all it's repr in the notebook.

Hence if you provide _repr_javascript_,_repr_latex_ and _repr_png_to an object, you will be able to determine at conversion time which representaition is the more appropriate. You could even decide to show all the representaition of an object, it's up to you. But this will require beeing a little more involve and write a few line of Jinja template. This will probably be the subject of another tutorial.

Back to our images,

In [11]:

from IPython.display import Image
Image(data=resources['outputs']['output_3_0.png'],format='png')

Out[11]:

Yep, this is indeed the image we were expecting, and I was able to see it without ever writing or reading it from disk. I don't think I'll have to show to you what to do with those data, as if you are here you are most probably familiar with IO.

Extracting figures with HTML Exporter ?¶

Use case:

I write an awesome blog in HTML, and I want all but having base64 embeded images. Having one html file with all inside is nice to send to coworker, but I definitively want resources to be cached ! So I need an HTML exporter, and I want it to extract the figures !

Some theory¶

The process of converting a notebook to a another format with the nbconvert Exporters happend in a few steps:

Get the notebook data and other required files. (you are responsible for that)
Feed them to the exporter that will
- sequentially feed the data to a number of Transformers. Transformer only act on the structure of the notebook, and have access to it all.
- feed the notebook through the jinja templating engine
  - the use templates are configurable.
  - templates make use of configurable macros called filters.
The exporter return the converted notebook as well as other relevant resources as a tuple.
Write what you need to disk, or elsewhere. (You are responsible for it)

Here we'll be interested in the Transformers. Each Transformer is applied successively and in order on the notebook before going through the conversion process.

We provide some transformer that do some modification on the notebook structure by default. One of them, the ExtractOutputTransformer is responsible for crawling notebook, finding all the figures, and put them into the resources directory, as well as choosing the key (filename_xx_y.extension) that can replace the figure in the template.

The ExtractOutputTransformer is special in the fact that it should be availlable on all Exporters, but is just inactive by default on some exporter.

In [12]:

# second transformer shoudl be Instance of ExtractFigureTransformer
exportHtml._transformers # 3rd one shouel be <ExtractOutputTransformer>

Out[12]:

[<function IPython.nbconvert.transformers.coalescestreams.wrappedfunc>,
 <IPython.nbconvert.transformers.svg2pdf.SVG2PDFTransformer at 0x10c203e90>,
 <IPython.nbconvert.transformers.extractoutput.ExtractOutputTransformer at 0x10c20e410>,
 <IPython.nbconvert.transformers.csshtmlheader.CSSHTMLHeaderTransformer at 0x10c20e490>,
 <IPython.nbconvert.transformers.revealhelp.RevealHelpTransformer at 0x10c1cbf10>,
 <IPython.nbconvert.transformers.latex.LatexTransformer at 0x10c203550>,
 <IPython.nbconvert.transformers.sphinx.SphinxTransformer at 0x10c203690>]

To enable it we will use IPython configuration/Traitlets system. If you are have already set some IPython configuration options, this will look pretty familiar to you. Configuration option are always of the form:

ClassName.attribute_name = value

A few ways exist to create such config, like reading a config file in your profile, but you can also do it programatically usign a dictionary. Let's create such a config object, and see the difference if we pass it to our HtmlExporter

In [13]:

from IPython.config import Config

c =  Config({
            'ExtractOutputTransformer':{'enabled':True}
            })

exportHtml = HTMLExporter()
exportHtml_and_figs = HTMLExporter(config=c)

(_, resources)          = exportHtml.from_notebook_node(jake_notebook)
(_, resources_with_fig) = exportHtml_and_figs.from_notebook_node(jake_notebook)

print 'resources without the "figures" key :'
print resources.keys()

print ''
print 'Here we have one more field '
print resources_with_fig.keys()
resources_with_fig['outputs'].keys()

resources without the "figures" key :
['inlining', 'output_extension', 'metadata']

Here we have one more field 
['outputs', 'inlining', 'output_extension', 'metadata']

Out[13]:

[u'output_13_1.text',
 u'output_18_0.text',
 u'output_3_0.text',
 u'output_18_1.png',
 u'output_12_0.text',
 u'output_5_0.text',
 u'output_5_0.png',
 u'output_13_1.png',
 u'output_16_0.text',
 u'output_13_0.text',
 u'output_18_1.text',
 u'output_3_0.png',
 u'output_16_0.png']

So now you can loop through the dict and write all those figures to disk in the right place...

Custom transformer¶

Of course you can imagine many transformation that you would like to apply to a notebook. This is one of the reason we provide a way to register your own transformers that will be applied to the notebook after the default ones.

To do so you'll have to pass an ordered list of Transformers to the Exporter constructor.

But what is an transformer ? Transformer can be either decorated function for dead-simple Transformers that apply independently to each cell, for more advance transformation that support configurability You have to inherit from Transformer and define a call method as we'll see below.

All transforers have a magic attribute that allows it to be activated/disactivate from the config dict.

In [14]:

from IPython.nbconvert.transformers import Transformer
import IPython.config
print "Four relevant docstring"
print '============================='
print Transformer.__doc__
print '============================='
print Transformer.call.__doc__
print '============================='
print Transformer.transform_cell.__doc__
print '============================='

Four relevant docstring
=============================
 A configurable transformer

    Inherit from this class if you wish to have configurability for your
    transformer.

    Any configurable traitlets this class exposed will be configurable in profiles
    using c.SubClassName.atribute=value

    you can overwrite transform_cell to apply a transformation independently on each cell
    or __call__ if you prefer your own logic. See corresponding docstring for informations.

    Disabled by default and can be enabled via the config by
        'c.YourTransformerName.enabled = True'
    
=============================

        Transformation to apply on each notebook.
        
        You should return modified nb, resources.
        If you wish to apply your transform on each cell, you might want to 
        overwrite transform_cell method instead.
        
        Parameters
        ----------
        nb : NotebookNode
            Notebook being converted
        resources : dictionary
            Additional resources used in the conversion process.  Allows
            transformers to pass variables into the Jinja engine.
        
=============================

        Overwrite if you want to apply a transformation on each cell.  You 
        should return modified cell and resource dictionary.
        
        Parameters
        ----------
        cell : NotebookNode cell
            Notebook cell being processed
        resources : dictionary
            Additional resources used in the conversion process.  Allows
            transformers to pass variables into the Jinja engine.
        index : int
            Index of the cell being processed
        
=============================

We don't provide convenient method to be aplied on each worksheet as the data structure for worksheet will be removed. (not the worksheet functionnality, which is still on it's way)

Example¶

I'll now demonstrate a specific example requested while nbconvert 2 was beeing developped. The ability to exclude cell from the conversion process based on their index.

I'll let you imagin how to inject cell, if what you just want is to happend static content at the beginning/end of a notebook, plese refer to templating section, it will be much easier and cleaner.

In [15]:

from IPython.utils.traitlets import Integer

In [16]:

class PelicanSubCell(Transformer):
    """A Pelican specific transformer to remove somme of the cells of a notebook"""
    
    # I could also read the cells from nbc.metadata.pelican is someone wrote a JS extension
    # But I'll stay with configurable value. 
    start = Integer(0, config=True, help="first cell of notebook to be converted")
    end   = Integer(-1, config=True, help="last cell of notebook to be converted")
    
    def call(self, nb, resources):

        #nbc = deepcopy(nb)
        nbc = nb
        # don't print in real transformer !!!
        print "I'll keep only cells from ", self.start, "to ", self.end, "\n\n"
        for worksheet in nbc.worksheets :
            cells = worksheet.cells[:]
            worksheet.cells = cells[self.start:self.end]                    
        return nbc, resources

In [17]:

# I create this on the fly, but this could be loaded from a DB, and config object support merging...
c =  Config({
            'PelicanSubCell':{
                            'enabled':True,
                            'start':4,
                            'end':6,
                             }
            })

I'm creating a pelican exporter that take PelicanSubCell extra transformers and a config object as parameter. This might seem redundant, but with configuration system you'll see that one can register an inactive transformer on all exporters and activate it at will form its config files and command line.

In [18]:

pelican = RSTExporter(transformers=[PelicanSubCell], config=c)

In [19]:

print pelican.from_notebook_node(jake_notebook)[0]

I'll keep only cells from  4 to  6 



Sometimes when showing schematic plots, this is the type of figure I
want to display. But drawing it by hand is a pain: I'd rather just use
matplotlib. The problem is, matplotlib is a bit too precise. Attempting
to duplicate this figure in matplotlib leads to something like this:
In[2]:
.. code:: python

    Image('http://jakevdp.github.com/figures/mpl_version.png')





.. image:: output_5_0.png

Update¶

All part on figure naming in template removed since many thinfs in API have changed

I think this is enough for now, As you have seen there are a few bugs here and there I need to correct before continuing. Next time I'll show you how to modify template :

{%- extends 'fullhtml.tpl' -%}
{% block input_group -%}
{% endblock input_group %}

... and you just removed all the codecell by keeping the output and markdown codecell, isn't that wonderfull ? You want to wrap each cell in your own div ?

{%- extends 'fullhtml.tpl' -%}
{% block codecell %}
<div class="myclass">
{{ super() }}
</div>
{%- endblock codecell %}

Try to look at what Jinja can do, thenlearn about Jinja Filters and imagine they can magically read your config file.

For example we provide a filter that highlight by presupposing code is Python. Or one that wraps text at a default length of 80 char... Want a rot13 filter on some codecell when doing exercises for student ? See you next time !

More...¶

One more example from one Pull-Request.

In [20]:

from IPython.nbconvert.filters.highlight import _pygment_highlight
from pygments.formatters import HtmlFormatter

from IPython.nbconvert.exporters import HTMLExporter
from IPython.config import Config

from IPython.nbformat import current as nbformat


def my_highlight(source, language='ipython'):
    formatter = HtmlFormatter(cssclass='highlight-ipynb')
    return _pygment_highlight(source, formatter, language)
        
c = Config({'CSSHtmlHeaderTransformer':
                    {'enabled':True, 'highlight_class':'highlight-ipynb'}})

exportHtml = HTMLExporter( config=c , filters={'highlight': my_highlight} )
(body,resources) = exportHtml.from_notebook_node(jake_notebook)

In [21]:

from jinja2 import DictLoader

dl = DictLoader({'html_full.tpl': 
"""
{%- extends 'html_basic.tpl' -%} 

{% block footer %}
FOOOOOOOOTEEEEER
{% endblock footer %}
"""})


exportHtml = HTMLExporter( config=None , filters={'highlight': my_highlight}, extra_loaders=[dl] )
(body,resources) = exportHtml.from_notebook_node(jake_notebook)
for l in body.split('\n')[-4:]:
    print l

<p>This post was written entirely in an IPython Notebook: the notebook file is available for download <a href="http://jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb">here</a>. For more information on blogging with notebooks in octopress, see my <a href="http://jakevdp.github.com/blog/2012/10/04/blogging-with-ipython/">previous post</a> on the subject.</p>
</div>
FOOOOOOOOTEEEEER