$\begingroup$

I use Jupyter notebooks to teach programming, using markdown in text cells, and I want to separate the concepts by level-1 headings (starting with # Heading), for example lists, functions, modules, etc. Then I want to generate a Jupyter notebook that includes some of these modules but not others, like I would do in LaTeX:

\include{lists.ipynb}
\include{functions.ipynb}
\include{modules.ipynb}

Is that possible in Jupyter, or some other code that compiles to a Jupyter notebook?

$\endgroup$
  • $\begingroup$ nbformat should be the easiest/best $\endgroup$ Commented Oct 4 at 20:20
  • $\begingroup$ nbformat lacks built-in section slicing, but you can implement it easily: 1) read the notebooks with nbformat.read(), 2) extract the cells between headings (by finding markdown cells starting with # Heading), 3) concatenate the required sections, 4) write the result with nbformat.write(), and 5) create a build script listing sections as [(filename, heading_name)] tuples. This gives you $\LaTeX$-style \include for notebook sections. I haven't tested it, but if you need it, I can write it up as a proper answer with an MWE. $\endgroup$ Commented Oct 4 at 20:36
  • $\begingroup$ OK will do. I've actually done a bit of work on it already and was hoping to have it by now, but it's taking longer than I thought. I will try to finish in the morning! $\endgroup$ Commented Oct 6 at 20:40

1 Answer

$\begingroup$

Further to my comment to the OP, what follows is a minimal working example (MWE) of my suggested solution, which I have tested. I've included fairly verbose comments, docstrings, and output to explain both the approach and the code, so it should be self-explanatory. It also includes basic error handling, though you may want to add more robust validation for your own use.

Note that most of what follows is gleaned from the official nbformat documentation; nbformat is the standard, recommended library for programmatic notebook manipulation.

As mentioned, nbformat doesn't have built-in section slicing, but we can implement LaTeX-style \include functionality by:

  1. Reading notebooks with nbformat.read()
  2. Extracting cells between headings (by finding markdown cells starting with # Heading)
  3. Concatenating the desired sections
  4. Writing the result with nbformat.write()

This gives us a build script where we list sections as [(filename, heading_name)] tuples, similar to $\LaTeX$'s \include command.
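
Step 2 hinges on deciding whether a markdown cell opens with a heading, and at which level. As a rough illustration, the sketch below parses the first line of a cell's source with a regular expression. The names `parse_heading` and `_HEADING` are hypothetical helpers introduced only for this example (they are not part of nbformat), and the pattern assumes plain `#`-style headings on the first line of the cell.

```python
import re

# One to six '#' characters, at least one space or tab, then the heading text.
_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


def parse_heading(source):
    """Return (level, text) if the first line of `source` is a heading, else None."""
    first_line = source.split("\n", 1)[0]
    match = _HEADING.match(first_line)
    if match is None:
        return None
    return len(match.group(1)), match.group(2)


print(parse_heading("# Lists\n\nIntroduction to Python lists."))  # (1, 'Lists')
print(parse_heading("## Slicing"))                                # (2, 'Slicing')
print(parse_heading("#hashtag, not a heading"))                   # None
```

Returning the level alongside the text keeps the "is this a level-1 heading?" decision in one place, which is convenient if the boundary level ever needs to change.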

For demonstration purposes, we first create the three (very simple) sample notebooks from the OP (lists.ipynb, functions.ipynb, modules.ipynb), with sections marked by level-1 headings (# Heading$^\ddagger$): lists.ipynb and functions.ipynb contain two sections each, and modules.ipynb contains one. Then we define a function, extract_section(), that reads a notebook and extracts all cells between the specified heading and the next level-1 heading. Finally, we define two merge functions: merge_notebooks() for the simple case of merging entire notebooks, and merge_sections() to merge specific sections using [(filename, heading)] tuples, $\LaTeX$-style.

And of course the MWE would not be complete without showing how to use the code in practice.

import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell

def create_sample_notebooks():
    """Create sample notebooks for demonstration"""
    
    # Create lists.ipynb
    nb_lists = new_notebook()
    nb_lists.cells = [
        new_markdown_cell("# Lists\n\nIntroduction to Python lists."),
        new_code_cell("my_list = [1, 2, 3, 4, 5]\nprint(my_list)"),
        new_markdown_cell("Lists are mutable and can contain any type."),
        new_markdown_cell("# List Methods\n\nCommon list operations."),
        new_code_cell("my_list.append(6)\nprint(my_list)")
    ]
    with open('lists.ipynb', 'w', encoding='utf-8') as f:
        nbformat.write(nb_lists, f)
    
    # Create functions.ipynb
    nb_functions = new_notebook()
    nb_functions.cells = [
        new_markdown_cell("# Functions\n\nDefining and using functions in Python."),
        new_code_cell("def greet(name):\n    return f'Hello, {name}!'\n\nprint(greet('World'))"),
        new_markdown_cell("Functions help organise and reuse code."),
        new_markdown_cell("# Advanced Functions\n\nLambda functions and decorators."),
        new_code_cell("square = lambda x: x**2\nprint(square(5))")
    ]
    with open('functions.ipynb', 'w', encoding='utf-8') as f:
        nbformat.write(nb_functions, f)
    
    # Create modules.ipynb
    nb_modules = new_notebook()
    nb_modules.cells = [
        new_markdown_cell("# Modules\n\nImporting and using Python modules."),
        new_code_cell("import math\nprint(f'Pi is approximately {math.pi:.2f}')"),
        new_markdown_cell("Modules extend Python's functionality.")
    ]
    with open('modules.ipynb', 'w', encoding='utf-8') as f:
        nbformat.write(nb_modules, f)
    
    print("Created sample notebooks: lists.ipynb, functions.ipynb, modules.ipynb")


def extract_section(notebook_file, heading_name):
    """
    Extract cells between a specific heading and the next top-level heading.
    
    Args:
        notebook_file: Input notebook filename
        heading_name: The heading text to search for (without the # prefix)
    
    Returns:
        List of cells in the section, or empty list if heading not found
    """
    try:
        with open(notebook_file, 'r', encoding='utf-8') as f:
            nb = nbformat.read(f, as_version=4)
    except FileNotFoundError:
        print(f"Error: {notebook_file} not found")
        return []
    except Exception as e:
        print(f"Error reading {notebook_file}: {e}")
        return []
    
    section_cells = []
    in_section = False
    
    for cell in nb.cells:
        if cell.cell_type == 'markdown' and cell.source.startswith('# '):
            # Extract heading text: drop only the leading '# ' from the first line
            cell_heading = cell.source.split('\n')[0][2:].strip()
            
            if cell_heading == heading_name:
                in_section = True
                section_cells.append(cell)
            elif in_section:
                # Hit the next top-level heading, stop
                break
        elif in_section:
            section_cells.append(cell)
    
    if not section_cells:
        print(f"Warning: Heading '{heading_name}' not found in {notebook_file}")
    
    return section_cells


def merge_notebooks(notebook_files, output_file, add_separators=False):
    """
    Merge multiple Jupyter notebooks into a single notebook.
    
    Args:
        notebook_files: List of input notebook filenames
        output_file: Output notebook filename
        add_separators: If True, add markdown separators between notebooks
    """
    if not notebook_files:
        print("Error: No notebook files specified")
        return
    
    merged = None
    
    for i, fname in enumerate(notebook_files):
        print(f"  Reading {fname}...")
        try:
            with open(fname, 'r', encoding='utf-8') as f:
                nb = nbformat.read(f, as_version=4)
                
                if merged is None:
                    merged = nb
                else:
                    # Add separator between notebooks if requested
                    if add_separators and i > 0:
                        merged.cells.append(
                            new_markdown_cell(f"\n---\n\n*Source: {fname}*\n")
                        )
                    
                    # Extend cells from subsequent notebooks
                    merged.cells.extend(nb.cells)
        except FileNotFoundError:
            print(f"Error: {fname} not found, skipping...")
            continue
        except Exception as e:
            print(f"Error reading {fname}: {e}, skipping...")
            continue
    
    if merged is None:
        print("Error: No notebooks could be merged")
        return
    
    print(f"Writing merged notebook to {output_file}")
    with open(output_file, 'w', encoding='utf-8') as f:
        nbformat.write(merged, f)


def merge_sections(sections, output_file, add_separators=False):
    """
    Merge specific sections from multiple notebooks (LaTeX-style \\include).
    
    Args:
        sections: List of tuples [(filename, heading_name), ...]
        output_file: Output notebook filename
        add_separators: If True, add markdown separators between sections
    
    Example:
        merge_sections([
            ('lists.ipynb', 'Lists'),
            ('functions.ipynb', 'Functions')
        ], 'custom_lecture.ipynb')
    """
    if not sections:
        print("Error: No sections specified")
        return
    
    merged = new_notebook()
    
    for i, (fname, heading) in enumerate(sections):
        print(f"  Extracting '{heading}' from {fname}...")
        section_cells = extract_section(fname, heading)
        
        if section_cells:
            # Add separator between sections if requested (never before the first extracted section)
            if add_separators and merged.cells:
                merged.cells.append(
                    new_markdown_cell(f"\n---\n\n*Source: {fname} → {heading}*\n")
                )
            
            merged.cells.extend(section_cells)
    
    if not merged.cells:
        print("Error: No sections could be extracted")
        return
    
    print(f"Writing merged notebook to {output_file}")
    with open(output_file, 'w', encoding='utf-8') as f:
        nbformat.write(merged, f)


if __name__ == '__main__':
    # Step 1: Create sample notebooks
    print("Step 1: Creating sample notebooks...")
    create_sample_notebooks()
    
    # Step 2: Example 1 - Merge entire notebooks (basic approach)
    print("\nStep 2: Merging entire notebooks...")
    merge_notebooks(
        notebook_files=[
            'lists.ipynb',
            'functions.ipynb',
            'modules.ipynb'
        ],
        output_file='combined_lecture.ipynb',
        add_separators=False
    )
    
    # Step 3: Example 2 - Merge specific sections (LaTeX-style \include)
    print("\nStep 3: Merging specific sections (LaTeX-style)...")
    merge_sections(
        sections=[
            ('lists.ipynb', 'Lists'),
            ('functions.ipynb', 'Functions'),
        ],
        output_file='custom_lecture.ipynb',
        add_separators=True
    )
    
    # Step 4: Example 3 - Cherry-pick subsections
    print("\nStep 4: Cherry-picking specific subsections...")
    merge_sections(
        sections=[
            ('lists.ipynb', 'Lists'),
            ('functions.ipynb', 'Advanced Functions'),  # Only the advanced section
        ],
        output_file='advanced_topics.ipynb',
        add_separators=True
    )

When running the code, you should see the following output:

Step 1: Creating sample notebooks...
Created sample notebooks: lists.ipynb, functions.ipynb, modules.ipynb

Step 2: Merging entire notebooks...
  Reading lists.ipynb...
  Reading functions.ipynb...
  Reading modules.ipynb...
Writing merged notebook to combined_lecture.ipynb

Step 3: Merging specific sections (LaTeX-style)...
  Extracting 'Lists' from lists.ipynb...
  Extracting 'Functions' from functions.ipynb...
Writing merged notebook to custom_lecture.ipynb

Step 4: Cherry-picking specific subsections...
  Extracting 'Lists' from lists.ipynb...
  Extracting 'Advanced Functions' from functions.ipynb...
Writing merged notebook to advanced_topics.ipynb
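
To confirm that a merged notebook contains the sections you expect, one option is to read it back and list its level-1 headings. The sketch below is only a quick sanity check: it relies on the fact that an .ipynb file is JSON and uses just the standard library rather than nbformat. The function name `list_level1_headings` is hypothetical, and it assumes (as in the code above) that a heading sits on the first line of its markdown cell.

```python
import json


def list_level1_headings(path):
    """Return the level-1 heading text of each markdown cell in a notebook file."""
    with open(path, "r", encoding="utf-8") as f:
        nb = json.load(f)

    headings = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # On disk, 'source' may be one string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        first_line = source.split("\n", 1)[0]
        if first_line.startswith("# "):
            headings.append(first_line[2:].strip())
    return headings
```

After running the script above, list_level1_headings('advanced_topics.ipynb') should return ['Lists', 'Advanced Functions'], since the separator cell does not start with a heading.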

After running the script once to generate the sample notebooks, you can customise it for your own needs:

# Merge entire notebooks
merge_notebooks(
    notebook_files=['intro.ipynb', 'advanced.ipynb'],
    output_file='complete_course.ipynb'
)

# LaTeX-style: select specific sections only
merge_sections(
    sections=[
        ('basics.ipynb', 'Variables'),
        ('basics.ipynb', 'Functions'),
        ('advanced.ipynb', 'Classes'),
    ],
    output_file='custom_lesson.ipynb'
)
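
If you would rather keep the list of sections outside the Python code, closer to a LaTeX file full of \include lines, one possibility is a small plain-text manifest. The format below (one `filename | heading` pair per line, with `%` commenting a line out) is purely an assumption made for this sketch, as is the helper name `parse_manifest`; neither is part of nbformat.

```python
def parse_manifest(text):
    """Parse 'filename | heading' lines into [(filename, heading), ...] tuples."""
    sections = []
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        # Skip blank lines and lines commented out with '%', as in LaTeX.
        if not line or line.startswith("%"):
            continue
        filename, separator, heading = line.partition("|")
        if not separator or not filename.strip() or not heading.strip():
            raise ValueError(
                f"Line {line_number}: expected 'filename | heading', got {raw_line!r}"
            )
        sections.append((filename.strip(), heading.strip()))
    return sections


manifest = """
lists.ipynb     | Lists
% lists.ipynb   | List Methods
functions.ipynb | Advanced Functions
"""
sections = parse_manifest(manifest)
print(sections)  # [('lists.ipynb', 'Lists'), ('functions.ipynb', 'Advanced Functions')]
```

The resulting list can be passed straight to merge_sections(sections, 'custom_lesson.ipynb'), so including or excluding a section becomes a matter of commenting a line in or out.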

$\ddagger$ Note that the extract_section() function extracts content under a Level-1 heading (# Heading) as per the OP, and stops at the next Level-1 heading. This means:

  • Sub-headings (##, ###, etc.) within a section are correctly included as content, and
  • it should work fine if you use # exclusively for major module boundaries

However, if your actual notebooks use Level-2 headings (##) to separate major topics, you'll need to modify extract_section() so that both the heading match (currently startswith('# ')) and the stop condition also recognise ## headings. I hope that makes sense!
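
As a starting point for that modification, here is a minimal sketch of a level-aware variant. It works on a list of cells (for example `nb.cells` after `nbformat.read(..., as_version=4)`, where each cell's source is a single string) and stops at the next heading of the same or a higher level than the one it started from. The names `slice_section` and `max_level` are hypothetical and introduced only for this example.

```python
import re

# One to six '#' characters, at least one space or tab, then the heading text.
_HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$")


def slice_section(cells, heading_name, max_level=2):
    """
    Return the cells from the heading `heading_name` up to (but excluding)
    the next heading of the same or a higher level (i.e. fewer '#').
    Only headings with level <= max_level count as section boundaries.
    """
    section = []
    start_level = None
    for cell in cells:
        level = text = None
        if cell["cell_type"] == "markdown":
            match = _HEADING.match(cell["source"].split("\n", 1)[0])
            if match:
                level, text = len(match.group(1)), match.group(2)

        is_boundary = level is not None and level <= max_level
        if start_level is None:
            # Still searching for the requested heading.
            if is_boundary and text == heading_name:
                start_level = level
                section.append(cell)
        elif is_boundary and level <= start_level:
            # Reached the next section at the same or a higher level: stop.
            break
        else:
            section.append(cell)
    return section
```

With this rule, asking for a ## heading returns just that topic, while asking for a # heading still returns the whole module including its ## topics. Deeper headings (###, etc.) are always treated as content when max_level is 2.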

$\endgroup$
