The Angular sanitizer does not modify the HTML you pass to it, and it does not extract anything from it, so you need to do this manually. For example, you can parse the HTML, remove the unwanted code from it, and serialize the result back to a string.
One package that can build an AST from HTML is htmlparser2, so you can use it for the parsing step. To serialize the AST back to a string, you can use the dom-serializer package.
Using these packages (or similar ones), the logic of your getSanitized function might look like this:
async getSanitized(s: string): Promise<string> {
  // 1. make an AST from HTML in a string format
  const dom = await this.getAST(s);
  // 2. remove unwanted nodes from the AST
  const filteredDOM = this.filterJS(dom);
  // 3. serialize the AST back to a string
  const result: string = serializer(filteredDOM);
  return result;
}
The getAST function just uses the htmlparser2 API to get an AST from a string:
getAST(s: string): Promise<DomElement[]> {
  return new Promise((res, rej) => {
    const parser = new Parser(
      new DomHandler((err, dom) => {
        if (err) {
          rej(err);
        } else {
          res(dom);
        }
      })
    );
    parser.write(s);
    parser.end();
  });
}
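The pattern getAST relies on, turning an error-first callback into a Promise, can be shown in isolation. The following is only a sketch: fromCallback and fakeParse are hypothetical names invented for illustration, and fakeParse is a stand-in that does not call htmlparser2 at all.

```typescript
// Error-first callback: an error (or null) first, then the result.
type ResultCallback<T> = (err: Error | null, result: T) => void;

// Hypothetical helper: runs `start` and settles the Promise from its callback.
function fromCallback<T>(start: (done: ResultCallback<T>) => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    start((err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}

// Stand-in for a callback-based parser (NOT the htmlparser2 API).
// It reports an error for empty input and otherwise echoes the input back.
function fakeParse(html: string, done: ResultCallback<string[]>): void {
  if (html.length === 0) {
    done(new Error('empty input'), []);
  } else {
    done(null, [html]);
  }
}

// Usage: the caller can now await or chain the result.
fromCallback<string[]>((done) => fakeParse('<p>hi</p>', done))
  .then((parts) => console.log(parts.length));
```

In getAST the same roles are played by the DomHandler callback (the error-first callback) and by res and rej (resolve and reject).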
The filterJS function removes the unwanted nodes. You can inspect the AST that htmlparser2 generates in an online visualizer at https://astexplorer.net/, which makes it easy to see which conditions you need to filter nodes. The filterJS function may be implemented as:
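filterJS(dom: DomElement[]): DomElement[] {
  // The explicit type argument keeps the accumulator from being inferred from the empty array
  return dom.reduce<DomElement[]>((acc, node) => {
    if (node.type === 'tag') {
      // Regular element: strip event handlers, then filter its children recursively
      node.attribs = this.filterAttribs(node.attribs || {});
      node.children = this.filterJS(node.children || []);
    }
    // htmlparser2 gives <script> elements the type 'script', so they are dropped here
    if (node.type !== 'script') {
      acc.push(node);
    }
    return acc;
  }, []);
}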
In short, it drops script tags and calls the filterAttribs function on every other tag to remove event-handler attributes. The filterAttribs function may be:
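filterAttribs(attribs: { [s: string]: string }): { [s: string]: string } {
  return Object.entries(attribs).reduce<{ [s: string]: string }>((acc, [key, value]) => {
    // Event-handler attributes (onclick, onload, ...) all start with 'on'
    if (!key.startsWith('on')) {
      acc[key] = value;
    }
    return acc;
  }, {});
}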
Basically, it removes every attribute whose name starts with 'on', i.e. the event handlers.
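To see the two filters working together without installing anything, here is a self-contained sketch that runs on a hand-built tree. The SketchNode interface is a hypothetical, simplified stand-in for the nodes htmlparser2 produces (it is not the library's own type), and unlike the methods above this variant copies nodes instead of mutating them.

```typescript
// Hypothetical minimal node shape, loosely modeled on htmlparser2 output.
interface SketchNode {
  type: 'tag' | 'script' | 'style' | 'text';
  name?: string;
  data?: string;
  attribs?: { [s: string]: string };
  children?: SketchNode[];
}

// Keep every attribute whose name does not start with 'on'.
function filterAttribs(attribs: { [s: string]: string }): { [s: string]: string } {
  const kept: { [s: string]: string } = {};
  Object.keys(attribs).forEach((key) => {
    if (!key.startsWith('on')) {
      kept[key] = attribs[key];
    }
  });
  return kept;
}

// Drop 'script' nodes; for 'tag' nodes, clean attributes and recurse.
function filterJS(nodes: SketchNode[]): SketchNode[] {
  const kept: SketchNode[] = [];
  for (const node of nodes) {
    if (node.type === 'script') {
      continue; // the whole <script> subtree is discarded
    }
    if (node.type === 'tag') {
      kept.push({
        ...node,
        attribs: filterAttribs(node.attribs || {}),
        children: filterJS(node.children || []),
      });
    } else {
      kept.push(node); // text and other nodes pass through unchanged
    }
  }
  return kept;
}

// Hand-built tree for: <div class="box" onclick="steal()">Hello<script>alert(1)</script></div>
const tree: SketchNode[] = [
  {
    type: 'tag',
    name: 'div',
    attribs: { class: 'box', onclick: 'steal()' },
    children: [
      { type: 'text', data: 'Hello' },
      { type: 'script', name: 'script', attribs: {}, children: [{ type: 'text', data: 'alert(1)' }] },
    ],
  },
];

const cleaned = filterJS(tree);
console.log(JSON.stringify(cleaned, null, 2));
```

Note that this only covers script elements and on* attributes, exactly like the functions above; it is not a complete sanitizer.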
The serializer function is the default export of the dom-serializer library.
Don't forget to import htmlparser2 and dom-serializer:
import { DomHandler, Parser, DomElement } from 'htmlparser2';
import serializer from 'dom-serializer';
For a better TypeScript experience, type definitions for htmlparser2 are available in the @types/htmlparser2 package.
You can find a working example at https://stackblitz.com/edit/angular-busvys.