3

I am looking for a script, method or tool to sort nested markdown lists. I use sublime text, which has a built sort lines function, but this function destroys the order of any nested list. For instance, if I want to sort:

* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel

Using sublime sort lines function, I get:

        * Baboon
        * Chimpanzee
        * Cocker Spaniel
        * Gazelle
        * German Shepherd
        * Gorilla
        * Lion
        * Tiger
        * Zebra
    * Canines
    * Carnivores
    * Herbivores
    * Omnivores
* Domestic Animals
* Zoo Animals

Clearly, this is not what I want. What I want is a "scoped sort" which sorts relative to each bullet level, without destroying the nested relationships, like so:

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

Here's some things I've looked into and my thoughts on each:

  • Use a sublime package to sort. I can't find one. However, maybe there's a way to use CSSComb package and adapt it to a markdown list?
  • Use a manual process in sublime to sort the list, perhaps by selecting bullet levels and sorting by those? The problem with this is that the selection of lines to be sorted must be at the same indentation level, and must not be separated by any other line of another indentation level, otherwise the sort is completely messed up. Unless I am missing something?
  • Use a script to sort the lines. I am familiar with ruby, so maybe there's a way to import this list into ruby, and treat the nested list like a nested hash along the lines of this post. I'm sure I could accomplish my goal using a ruby script, but I didn't want to go down that road if there was already a solution available.

How would you go about sorting a large nested markdown list?

UPDATE #1:

@J4G created a great Atom package that solved the original sort problem, see his answer for the link.

The previous list is a simple list without code blocks and numbered lists. However, when sorting a real life markdown list, we have code blocks and numbered lists and lines starting with special characters - nested within the list like so:

* Commands
    * Migrations
        * `rake db:migrate` - push all migrations to the database
        * 'STEP=3' - revert the last 3 migrations
    * `Rails`
        * `c` - start rails console, run code from your app!
    * `Rake`
        * Rake Task
        ```ruby
        desc 'process csv'
        task process_csv: :environment do
            Node.process_csv
        end
        ```
* Package Upgrade Status:
    1. Install Package
    2. Attach Plugin
    3. Review Installation
    ~~~
    |Install|Status|
    |Yes|Pending|
    ~~~

After sorting, I would think the above markdown list should return unchanged since tick marks and quotes have no sort significance and code blocks / numbered lists are already created in their proper order.

1
  • 1
    Doesn't look to have been solved by anyone. I can write one in Node really quick if you want. Commented Sep 6, 2015 at 21:56

3 Answers 3

4

If you're interested in using Atom (I highly recommend it as a free alternative to Sublime), I just made a package to do what you need.

https://atom.io/packages/markdown-sort-list

Sign up to request clarification or add additional context in comments.

1 Comment

This appears to be a better way to perform the desired sort than to use Ruby, or other general-purpose scripting languages.
3

This is one way you could do it with Ruby. Suppose the string is held by the variable str.

Code

def sort_indented(str)
  arr = str.lines.map { |s| [indentation(s), s.chomp] }
  indent_offset = arr.map(&:first).uniq.sort.each_with_index.
    with_object({}) { |(indent, i),h| h[indent] = i }
  dim_size = indent_offset.size 
  prev = []
  arr.map do |indent, s|
    a = ['']*dim_size
    offset = indent_offset[indent]
    a[offset] = s
    a[0,offset] = prev.first(offset)
    prev = a
    a
  end.sort.map { |a| a[a.rindex { |s| s != '' }] }.join("\n") 
end

def indentation(s)
  s[/^\s*/].size
end

Example

str =<<THE_END 
* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel
THE_END

In Ruby this construct for defining a string literal is called a "here document", or "here doc".

puts sort_indented(str)

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

General Approach

When Ruby sorts an array of arrays, such as:

a = [1,2,4]
b = [4,5,6]
c = [1,2,3,5]]
[a, b, c]

it will first sort by the first element of each array. Since a and c both have the same element 1 at offset zero, and b has a 4 at that offset, both a and c will come before b in the sorted array. Ruby looks at the second elements of a and c to break the tie. As they both equal 2, Ruby goes on to the third element, where the tie is broken: c precedes a since 3 < 4.

I will convert arr to the following array:

result =     
[["* Zoo Animals"     , ""                , ""],
 ["* Zoo Animals"     , "    * Herbivores", ""],
 ["* Zoo Animals"     , "    * Herbivores", "        * Zebra"],
 ["* Zoo Animals"     , "    * Herbivores", "        * Gazelle"],
 ["* Zoo Animals"     , "    * Carnivores", ""],
 ["* Zoo Animals"     , "    * Carnivores", "        * Tiger"],
 ["* Zoo Animals"     , "    * Carnivores", "        * Lion"], 
 ["* Zoo Animals"     , "    * Omnivores" , ""],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Gorilla"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Baboon"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Chimpanzee"],
 ["* Domestic Animals", ""                , ""],
 ["* Domestic Animals", "    * Canines"   , ""],
 ["* Domestic Animals", "    * Canines"   , "        * German Shepherd"],
 ["* Domestic Animals", "    * Canines"   , "        * Cocker Spaniel"]]

Once in this form, we can sort:

result.sort
  #=> [["* Domestic Animals", "", ""],
  #    ["* Domestic Animals", "    * Canines", ""],
  #    ["* Domestic Animals", "    * Canines", "        * Cocker Spaniel"],
  #    ["* Domestic Animals", "    * Canines", "        * German Shepherd"],
  #    ["* Zoo Animals", "", ""], ["* Zoo Animals", "    * Carnivores", ""],
  #    ["* Zoo Animals", "    * Carnivores", "        * Lion"],
  #    ["* Zoo Animals", "    * Carnivores", "        * Tiger"],
  #    ["* Zoo Animals", "    * Herbivores", ""],
  #    ["* Zoo Animals", "    * Herbivores", "        * Gazelle"],
  #    ["* Zoo Animals", "    * Herbivores", "        * Zebra"],
  #    ["* Zoo Animals", "    * Omnivores", ""],
  #    ["* Zoo Animals", "    * Omnivores", "        * Baboon"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Chimpanzee"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Gorilla"]] 

The final step is to extract the last non-empty string from each element of the sorted array.

Detailed Explanation

First we define a helper method to compute the indentation of a string:

def indentation(s)
  s[/^\s*/].size
end

For example,

            #1234
indentation("    * Herbivores")
  #=> 4

Now lets convert the string to an array of lines:

a = str.lines
  #=> ["* Zoo Animals\n",
  #    "    * Herbivores\n",
  #    "        * Zebra\n",
  #    "        * Gazelle\n",
  #    "    * Carnivores\n",
  #    "        * Tiger\n",
  #    "        * Lion\n",
  #    "    * Omnivores\n",
  #    "        * Gorilla\n",
  #    "        * Baboon\n",
  #    "        * Chimpanzee\n",
  #    "* Domestic Animals\n",
  #    "    * Canines\n",
  #    "        * German Shepherd\n",
  #    "        * Cocker Spaniel\n"]

Next, we convert a to an array of pairs, the second element of the pair being the element of a (a string), with the newline chopped off the end, the first being its indentation:

arr = a.map { |s| [indentation(s), s.chomp] }
  # => [[0, "* Zoo Animals"],        [4, "    * Herbivores"],
  #     [8, "        * Zebra"],      [8, "        * Gazelle"],
  #     [4, "    * Carnivores"],     [8, "        * Tiger"],
  #     [8, "        * Lion"],       [4, "    * Omnivores"],
  #     [8, "        * Gorilla"],    [8, "        * Baboon"],
  #     [8, "        * Chimpanzee"], [0, "* Domestic Animals"],
  #     [4, "    * Canines"],        [8, "        * German Shepherd"],
  #     [8, "        * Cocker Spaniel"]] 

In fact, we would perform the first two operations in one step:

arr = str.lines.map { |s| [indentation(s), s.chomp] }

Next, we need to know the indents that are used:

indents = arr.map { |pair| pair.first }
  #=> [0, 4, 8, 8, 4, 8, 8, 4, 8, 8, 8, 0, 4, 8, 8] 

which we could write more economically like this:

indents = arr.map(&:first)

To find the unique indents we write:

unique = indents.uniq
  #=> [0, 4, 8] 

In case they are not in order, we should sort them:

sorted = unique.sort
  #=> [0, 4, 8] 

Each of the three indents will will correspond to offset in the array we will sort, so it is convenient to construct a hash:

indent_offset = sorted.each_with_index.with_object({}) do |(indent, i),h|
  h[indent] = i
end
  #=> {0=>0, 4=>1, 8=>2}

Again, we can perform this computation by combining several steps:

indent_offset = arr.map(&:first).uniq.sort.each_with_index.
  with_object({}) { |(indent, i),h| h[indent] = i }

Next we replace each element of arr with a three element array of strings:

dim_size = indent_offset.size 
  #=> 3
prev = []
result = arr.map do |indent, s|
  a = ['']*dim_size
  offset = indent_offset[indent]
  a[offset] = s
  a[0,offset] = prev.first(offset)
  prev = a
  a
end

The result of this calculation is the first array I gave under General Approach, above. We can now sort result to obtain the second array I gave under General Approach:

sorted = result.sort

The last two steps are to replace each element of sorted (a three-element array) with the last non-empty string:

sorted_strings = sorted.map { |a| a[a.rindex { |s| s != '' }] }

and then join those strings into a single string:

sorted_strings.join("\n")

2 Comments

Wow, very thorough answer! Maybe this can be turned into a ruby gem one day.
Thanks, but I expect there's already a Ruby gem that does it better. As I was writing this I was thinking there's got to be a better way with Ruby.
3

I created a vscode extension for this if anyone is still interested.

It does more than just a scoped sort, it can remove unique values, it can recursively sort the nested items, it can be case insensitive, etc.

It also satisfies OP's other request of having content under a list item.

The extension is part of a bigger project called scopedsort, which is implemented on the command line, npm, and a website. The source code can be found on github. Here's the file for the actual implementation.

Here it is in text form, very outdated, but does what the OP originally requested:

// @ts-check
const getValuesRegex = /^(?<indentation>\s*)(?<char>[-*+])/;

/**
 *  @typedef {object} Options
 *  @property {boolean} [recursive]
 *  @property {boolean} [reverse]
 *  @property {boolean} [unique]
 *  @property {boolean} [caseInsensitive]
 */

/**
 * @param {string} a
 * @param {string} b
 */
function stringSortCaseInsensitive(a, b) {
    const lowerA = a.toLowerCase();
    const lowerB = b.toLowerCase();

    if (lowerA < lowerB) {
        return -1;
    } else if (lowerA > lowerB) {
        return 1;
    }

    return 0;
}

/** @param {string} str **/
function calculateSpaceLength(str) {
    return str.replace('\t', '    ').length;
}

/**
 * @param {string[]} sections
 * @param {Options} options
 */
function getModifiedSections(sections, options) {
    if (options.caseInsensitive) {
        sections.sort(stringSortCaseInsensitive);
    } else {
        sections.sort();
    }

    if (options.reverse) {
        sections.reverse();
    }

    if (options.unique) {
        /** @type {Set<string>} */
        const haveSeen = new Set();
        const unique = [];

        for (const section of sections) {
            const adjustedSection = options.caseInsensitive
                ? section.toLowerCase()
                : section;

            if (!haveSeen.has(adjustedSection)) {
                unique.push(section);
                haveSeen.add(adjustedSection);
            }
        }

        return unique;
    }

    return sections;
}

/**
 * @param {string[]} lines
 * @param {number} index
 * @param {Options} options
 */
function sortInnerSection(lines, index, options) {
    /** @type {string[]} */
    const sections = [];
    let currentIndentation = '';
    let amountAdded = 0;

    for (let i = index; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (!currentIndentation && indentation) {
            currentIndentation = indentation;
        }

        const indentationLength = calculateSpaceLength(indentation);
        const currentIndentationLength =
            calculateSpaceLength(currentIndentation);

        if (!listChar) {
            amountAdded++;
            sections[sections.length - 1] += '\n' + line;
        } else if (indentationLength === currentIndentationLength) {
            amountAdded++;
            sections.push(line);
        } else if (indentationLength > currentIndentationLength) {
            const child = sortInnerSection(lines, i, options);
            sections[sections.length - 1] += '\n' + child.content;
            i += child.amountAdded - 1;
            amountAdded += child.amountAdded;
        } else {
            break;
        }
    }

    return {
        content: getModifiedSections(sections, options).join('\n'),
        amountAdded,
    };
}

/**
 *  @param {string} text
 *  @param {Options} options
 */
function sort(text, options) {
    const lines = text.trimEnd().split(/\r?\n/);
    let sections = [];
    let currentSection = [];
    let currentIndentation = '';

    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (currentSection.length && listChar) {
            if (indentation === currentIndentation) {
                sections.push(currentSection.join('\n'));
                currentSection = [line];
            } else if (options.recursive) {
                const child = sortInnerSection(lines, i, options);
                currentSection.push(child.content);
                i += child.amountAdded - 1;
            } else {
                currentSection.push(line);
            }
        } else {
            currentSection.push(line);
        }
    }

    if (currentSection) {
        sections.push(currentSection.join('\n'));
    }

    return getModifiedSections(sections, options).join('\n');
}

module.exports = sort;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.