diff --git a/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png
new file mode 100644
index 0000000..e769799
Binary files /dev/null and b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png differ
diff --git a/content/posts/GSoC_2021_Final/index.md b/content/posts/GSoC_2021_Final/index.md
new file mode 100644
index 0000000..c2c4626
--- /dev/null
+++ b/content/posts/GSoC_2021_Final/index.md
@@ -0,0 +1,169 @@
+---
+title: "GSoC'21: Final Report"
+date: 2021-08-17T17:36:40+05:30
+draft: false
+categories: ["News", "GSoC"]
+description: "Google Summer of Code 2021: Final Report - Aitik Gupta"
+displayInList: true
+author: Aitik Gupta
+
+resources:
+- name: featuredImage
+  src: "AitikGupta_GSoC.png"
+  params:
+    showOnTop: true
+---
+
+**Matplotlib: Revisiting Text/Font Handling**
+
+Here's a [meme](https://user-images.githubusercontent.com/43996118/129448683-bc136398-afeb-40ac-bbb7-0576757baf3c.jpg) I created, to kick things off for this final report!
+## About Matplotlib
+Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, which has become the _de facto Python plotting library_.
+
+Much of the implementation behind its font manager is inspired by [W3C](https://www.w3.org/)-compliant algorithms, allowing users to interact with font properties like `font-size`, `font-weight`, `font-family`, etc.
+
+#### However, the way Matplotlib handled fonts and general text layout was not ideal, which is what Summer 2021 was all about.
+
+> By "not ideal", I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now _outdated_.
+
+(..more on this later)
+
+### About the Project
+(PS: here's [the link](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/) to my GSoC proposal, if you're interested)
+
+Overall, the project was divided into two major subgoals:
+1. Font Subsetting
+2. Font Fallback
+
+But before we take each of them on, we should get an idea of some basic terminology for fonts (of which there is a _lot_, and which is rightly _confusing_).
+
+The [PR: Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346/files) brings a bit of clarity to some of these terms. The next section has a linked PR which also explains the types of fonts and how that is relevant to Matplotlib.
+## Font Subsetting
+An easy-to-read guide on Fonts and Matplotlib was created with [PR: [Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450), which is currently live at [Matplotlib's DevDocs](https://matplotlib.org/devdocs/users/fonts.html).
+
+Taking an excerpt from one of my previous blogs (and [the doc](https://matplotlib.org/devdocs/users/fonts.html#subsetting)):
+
+> Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are required for a certain array of characters, and embed only those within the output.
+
+PDF, PS/EPS and SVG output document formats are special, in that **the text within them can be editable**, i.e., one can copy/search text from documents (e.g., from a PDF file) if the text is editable.
+
+### Matplotlib and Subsetting
+The PDF, PS/EPS and SVG backends used to support font subsetting, _only for a few types_. What that means is, before Summer '21, Matplotlib could generate Type 3 subsets for PDF, PS/EPS backends, but it *could not* generate Type 42 / TrueType subsets.
+
+With [PR: Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts.
+
+This is especially beneficial for people who wish to use commercial (or [CJK](https://en.wikipedia.org/wiki/CJK_characters)) fonts.
+Licenses for many fonts ***require*** subsetting such that they can’t be trivially copied from the output files generated from Matplotlib.
+
+## Font Fallback
+Matplotlib was designed to work with a single font at runtime. A user _could_ specify a `font.family`, which was supposed to correspond to [CSS](https://www.w3schools.com/cssref/pr_font_font-family.asp) properties, but that was only used to find a _single_ font present on the user's system.
+
+Once that font was found (and one almost always is, since Matplotlib ships with a set of default fonts), all the user text was rendered only through that font, which used to give out "tofu" if a character wasn't found.
+
+---
+
+It might seem like an _outdated_ approach for text rendering, now that we have concepts like font-fallback, but these concepts weren't very well discussed in the early 2000s. Even getting a single font to work _was considered a hard engineering problem_.
+
+This was primarily because of the lack of **any standardization** for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.).
+
+[Figure: Previous (notice Tofus) vs. After (CJK font as fallback)]
+
+To migrate from a font-first approach to a text-first approach, there are multiple steps involved:
+
+### Parsing the whole font family
+The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either:
+- [PR: [with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496), or
+- [PR: [without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549)
+
+Quoting one of my [previous](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) blogs:
+> Don’t break, a lot at stake!
+
+My first approach was to change the existing public `findfont` API to incorporate multiple filepaths. Since Matplotlib has a _very large_ user base, there was a high chance it would break a chunk of people's workflows:
+
+
+[Figure: First PR (left), Second PR (right)]
+
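As a rough illustration of the compatibility concern with changing `findfont`, here is a minimal sketch. It assumes only that existing callers expect a single path string back, as `findfont` returns today. The helpers `find_font_single`, `find_font_multi` and `font_file_name` are hypothetical stand-ins written for this example; they are not Matplotlib functions, and the paths they produce are made up.

```python
import os


def find_font_single(family):
    """Hypothetical stand-in for a lookup that returns ONE font path (a str)."""
    return "/fonts/" + family.replace(" ", "") + ".ttf"


def find_font_multi(families):
    """Hypothetical variant that returns one path per requested family (a list)."""
    return [find_font_single(family) for family in families]


def font_file_name(path):
    """Typical downstream code: assumes it was handed a single path string."""
    return os.path.basename(path)


# Works with the single-path return value.
print(font_file_name(find_font_single("DejaVu Sans")))

# Fails once the same call site receives a list of paths instead.
try:
    font_file_name(find_font_multi(["DejaVu Sans", "Noto Sans CJK JP"]))
except TypeError as exc:
    print("existing caller breaks:", exc)
```

Keeping the public function's return type unchanged and exposing the multi-path lookup separately avoids this kind of breakage, which is presumably the motivation for the second PR listed above.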
+[Figure: Font-Fallback Algorithm]
+
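The figure for the algorithm did not survive here, so what follows is only a minimal sketch of the general text-first idea, not Matplotlib's actual implementation. For each character, it takes the first font in the requested family list that has a glyph for it. The `FONT_COVERAGE` table and the `split_into_runs` helper are hypothetical and exist only for this example; a real implementation would read coverage from each font file's character map.

```python
# Hypothetical coverage table: the characters each font has glyphs for.
# A real implementation would read this from each font file's character map.
FONT_COVERAGE = {
    "DejaVu Sans": {chr(code) for code in range(0x20, 0x7F)},  # printable ASCII
    "Noto Sans CJK JP": set("你好世界"),
}


def split_into_runs(text, families):
    """Group text into (font, substring) runs.

    The font is None when no family in the list covers a character,
    which is where a renderer would draw "tofu".
    """
    runs = []
    for char in text:
        # Text-first lookup: take the first family that has this character.
        font = next(
            (name for name in families if char in FONT_COVERAGE.get(name, set())),
            None,
        )
        if runs and runs[-1][0] == font:
            runs[-1][1] += char  # extend the current run
        else:
            runs.append([font, char])  # start a new run
    return [(font, chunk) for font, chunk in runs]


# Single font: the CJK characters end up in a run with no font (tofu).
print(split_into_runs("Hi 你好", ["DejaVu Sans"]))

# With a fallback font listed, every character finds a font.
print(split_into_runs("Hi 你好", ["DejaVu Sans", "Noto Sans CJK JP"]))
```

With a single family the lookup reproduces the old font-first behaviour; listing a second family is what lets the CJK run pick up a font instead of tofu.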
+ Consider contributing to Matplotlib (Open Source in general) ❤️
+