
I have a problem when a notebook's output is really long and gets saved into the notebook: any time I open that notebook again, the browser crashes and can't display it correctly.

To fix this I have to open it with a text editor and delete all output from the cell causing the problem.

I wonder if there is a way to clear all output from the notebook so one can open it again without problems. I want to delete all output, since deleting a specific one seems more troublesome.


12 Answers


nbconvert 6.0 should fix --clear-output

The option had been broken for a long time; see the bug report with the merged patch: https://github.com/jupyter/nbconvert/issues/822

For in-place operation:

jupyter nbconvert --clear-output --inplace my_notebook.ipynb

Or to save to another file called my_notebook_no_out.ipynb:

jupyter nbconvert --clear-output \
  --to notebook --output=my_notebook_no_out my_notebook.ipynb

This was brought to my attention by Harold in the comments.

Before nbconvert 6.0: --ClearOutputPreprocessor.enabled=True

Same usage as --clear-output:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace my_notebook.ipynb
jupyter nbconvert --ClearOutputPreprocessor.enabled=True \
  --to notebook --output=my_notebook_no_out my_notebook.ipynb

Tested in Jupyter 4.4.0, notebook==5.7.6.
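One quick way to confirm that the outputs really were cleared is to count them with the standard-library json module. This is just an illustrative sketch; `count_outputs` is not part of nbconvert:

```python
import json

def count_outputs(path):
    """Count output entries across all cells of a notebook file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    # Markdown cells have no "outputs" key, hence the .get() default
    return sum(len(cell.get("outputs", [])) for cell in nb["cells"])

# After a successful clear, this should report 0:
# count_outputs("my_notebook.ipynb")
```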


12 Comments

This will convert the notebook to HTML, which does not seem to be what the OP wants.
@Jacquot What version of Jupyter are you on? I have just re-tested and it modifies the .ipynb in place without creating HTML.
I read your comment too quickly and didn't know about the --inplace option; I learned something. But it appears that in my version, 5.3.1, the option --clear-output is available, which subsumes --ClearOutputPreprocessor.enabled=True --inplace.
The option --clear-output was broken, see issue #822. This was fixed last month (July 2020), so it should work again in the next release.
Not to criticize the answer, but my recent experience (Dec 2024) with nbconvert is that hooking it up as a git filter slows down local git operations significantly. Other folks report the same in another similar question. Keep this performance impact in mind when using nbconvert with a git filter.

If you create a .gitattributes file, you can run a filter over certain files before they are added to git. This will leave the original file on disk as-is, but commit the "cleaned" version.

For this to work, add this to your local .git/config or global ~/.gitconfig:

[filter "strip-notebook-output"]
    clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR"

Then create a .gitattributes file in your directory with notebooks, with this content:

*.ipynb filter=strip-notebook-output

How this works:

  • The attribute tells git to run the filter's clean action on each notebook file before adding it to the index (staging).
  • The filter is our friend nbconvert, set up to read from stdin, write to stdout, strip the output, and only speak when it has something important to say.
  • When a file is extracted from the index, the filter's smudge action is run, but this is a no-op as we did not specify it. You could run your notebook here to re-create the output (nbconvert --execute).
  • Note that if the filter somehow fails, the file will be staged unconverted.
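As a rough sketch of what the clean action amounts to (not nbconvert's actual implementation, which also handles notebook-format details this ignores), the filter boils down to:

```python
import json
import sys

def clean(nb):
    """Empty code-cell outputs and reset execution counts,
    roughly what ClearOutputPreprocessor does."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb

# A git clean filter reads the staged file on stdin and writes the
# cleaned version to stdout:
# json.dump(clean(json.load(sys.stdin)), sys.stdout, indent=1)
```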

My only minor gripe with this process is that I can commit .gitattributes but I have to tell my co-workers to update their .git/config.

If you want a hackier but much faster version, try JQ:

  clean = "jq '.cells[].outputs = [] | .cells[].execution_count = null | .'"

6 Comments

This is the best of both worlds. Thanks for sharing this.
Didn’t know about this. This is super-useful.
A slightly improved alternative is as follows. It cleans the metadata, and doesn't add outputs and execution_count to non code cells like the proposed JQ solution (which results in a warning): clean = "jq '.cells |= map(if .\"cell_type\" == \"code\" then .outputs = [] | .execution_count = null else . end | .metadata = {}) | .metadata = {}'"
As a final step, you probably want to scrub and recommit all of your existing notebooks, otherwise you could get heinous merge conflicts later. To do that run git add --renormalize . and then commit.
Is there a way to temporarily turn off the filter for a specific commit? E.g., if my repository is closer to maturation than it used to be and now I want to use the notebook as a demonstration of using the code including outputs and figures.

nbstripout worked well for me.

Open the Jupyter terminal, navigate to the folder containing your notebook, and then run the following line:

nbstripout my_notebook.ipynb

2 Comments

Excellent - or even nbstripout *.ipynb :)
Might be obvious for most, but you need to first install nbstripout with something like: pip install nbstripout

Use --ClearOutputPreprocessor.enabled=True together with --clear-output, as in the following command:

jupyter nbconvert --ClearOutputPreprocessor.enabled=True --clear-output *.ipynb

Comments


To extend the answer from @dirkjot and resolve the issue of sharing the configuration:

Create a local .gitconfig file, rather than modifying .git/config. This makes the command that needs to be run on other machines slightly simpler. You can also create a script to run the git config command:

git config --local include.path ../.gitconfig

Note I have also changed the log level to INFO because I did want to see confirmation that the clean was running.

repo/.gitconfig

[filter "strip-notebook-output"]
    clean = "jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=INFO"

repo/.gitattributes

*.ipynb filter=strip-notebook-output

repo/git_configure.sh

git config --local include.path ../.gitconfig

Users then just need to run:

$ chmod u+x git_configure.sh
$ ./git_configure.sh

Comments


Use clean_ipynb, which not only clears notebook output but can also clean the code.

Install by pip install clean_ipynb

Run by clean_ipynb hello.ipynb

1 Comment

nbclean is a tool that can do that with some handy additional features, such as removing only certain blocks of code/text, which makes it handy for teaching.

I must say I find jupyter nbconvert painfully slow for the simple job of clearing some sub-arrays and resetting some execution numbers. It is the more maintainable solution, because that tool can be expected to track changes in the notebook file format. However, the alternative below is faster and may also be useful if you don't have nbconvert 6.0 (I have an environment running 5.6.1 at the moment…)

A very simple jq (a sort of sed for JSON) script does the trick very fast:

jq 'reduce path(.cells[]|select(.cell_type == "code")) as $cell (.; setpath($cell + ["outputs"]; []) | setpath($cell + ["execution_count"]; null))' notebook.ipynb > out-notebook.ipynb

Very simply, it identifies code cells, and replaces their outputs and execution_count attributes with [] and null respectively.


Or if you only want to remove the outputs and keep the execution numbers, an even simpler script will do:

jq 'del(.cells[]|select(.cell_type == "code").outputs[])' notebook.ipynb > out-notebook.ipynb
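If jq is not available, the outputs-only variant above can be sketched with Python's standard json module (the function name and file names here are illustrative):

```python
import json

def clear_outputs_only(nb):
    """Like the second jq command: empty code-cell outputs,
    leaving execution counts untouched."""
    for cell in nb["cells"]:
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
    return nb

# with open("notebook.ipynb") as f:
#     cleaned = clear_outputs_only(json.load(f))
# with open("out-notebook.ipynb", "w") as f:
#     json.dump(cleaned, f, indent=1)
```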

Comments


As mentioned in one of the previous answers, you can use the command-line JSON processor jq to perform this task notably more quickly than with nbconvert. A complete command for getting rid of metadata, outputs, and execution counts can be found in this blog post:

jq --indent 1 \
    '
    (.cells[] | select(has("outputs")) | .outputs) = []
    | (.cells[] | select(has("execution_count")) | .execution_count) = null
    | .metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}
    | .cells[].metadata = {}
    ' 01-parsing.ipynb

If desired, you could modify it to clean just a specific part of the output, such as execution counts (recursively, wherever they occur in the JSON), and then add it as a git filter:

[filter "nbstrip"]
    clean = jq --indent 1 '(.. |."execution_count"? | select(. != null)) = null'
    smudge = cat

And add the following to ~/.config/git/attributes to have the filter applied globally to all your local repos:

*.ipynb filter=nbstrip

There is also nbstripout which is made for this purpose, but it's a bit slower.

Comments


I suggest a pre-commit approach, using something like:

  - repo: local
    hooks:
      - id: jupyter-nb-clear-output
        name: jupyter-nb-clear-output
        files: \.ipynb$
        stages: [commit]
        language: python
        entry: jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace
        additional_dependencies: ['jupyterlab']

also explained more in this blog.

3 Comments

This update reminds me that there's also a GitHub action for cleaning notebooks, too. See here.
When I do it this way in GitHub Desktop, I get the following error: "jupyter-nb-clear-output..................................................Failed - hook id: jupyter-nb-clear-output - exit code: 1 Executable jupyter not found"
does this thread help?

Parse the JSON:

# LARGE notebook clean: make a copy FIRST and run this only on the COPY!

import json

filename = 'COPY_of_Huge_Notebook.ipynb'
with open(filename) as f:
    large_ntbk = json.load(f)

# Empty the outputs of every cell that has any
for cell in large_ntbk['cells']:
    if 'outputs' in cell:
        cell['outputs'] = []

with open('small.ipynb', 'w') as small:
    json.dump(large_ntbk, small, indent=2)

Comments


Here is my homebrew solution that I used from within a notebook to clear the 200MB output from another notebook:

with open('input.ipynb', 'r') as input_file, open('output.ipynb', 'w') as output_file:
    outblock=False 
    s2 = '   ],\n'
    s0 = '   "outputs": [],\n'
    s1 = '   "outputs": [\n'
    
    for line in input_file:
        if outblock:
            if line == s2:
                print(f'match{s2[:-1]}')
                outblock = False
                output_file.write(s0)
            continue     
        if line == s0:
            print(f'match{s0[:-1]}')
            output_file.write(line)
            continue
        if line == s1:
            print(f'match{s1[:-1]}')
            outblock = True
            continue
        output_file.write(line)

Comments


A function inspired by the preceding answers that can be called on a list of files:

import json

def ntbk_clean(nb_path):
    # nb_path: path to the notebook file
    with open(nb_path, 'rb') as f:
        notebook = json.load(f)

    # Replace each cell's outputs, where present, with an empty list
    for cell in notebook['cells']:
        if 'outputs' in cell:
            cell['outputs'] = []

    # Write the modified notebook back to the same path
    with open(nb_path, 'w') as f:
        json.dump(notebook, f, indent=2)

Comments
