Making JavaScript Art (P. 2)

August 20, 2021 · app devtool

Part 1 Recap:

Pictures to Shapes

In this post I'll give an introduction to the problem I want to solve and then dive into some interesting technical details of my work so far. At the end I'll also share some of my ideas for future improvements.

I'd like to remind you that you can try the demo at makejsart.pelmers.com! If you need some code to use as input, this piece of code works well.

And the source code is published on GitHub.

Goal

What do I mean by pictures to shapes? Put simply, the task is to extract the important regions from the picture. Here, importance stems from human perception and is therefore subjective, but we can establish some general guidelines. For example, given a picture of a person, extract the person. Given a picture of the world, separate the land masses from the water.

However, pictures are infinitely varied, so how can we approach this in the general case? In the literature, this problem is known as visual saliency detection, and it encompasses interdisciplinary research across the fields of cognitive psychology, neurobiology, robotics, and computer vision.

Bottom-up or Top-down?

At a high level, there are two directions from which to tackle the problem: bottom-up and top-down. Bottom-up approaches use pixel-level building blocks such as color, contrast, and brightness to formulate algorithms that attempt to emulate human perception. (1)

A very simple example is an algorithm that just sorts the pixels by brightness, so if you ask it for the most salient half of the image, it will return the brightest 50% of the pixels. Guess what the "intensity" option on makejsart.pelmers.com does. More complex algorithms may involve splitting the image into regions, performing edge detection, or finding handcrafted features like text. The key takeaway here is that this class of algorithms is conceptually fairly human-accessible, meaning we mortal beings can basically understand how they work. The next section will demonstrate one such bottom-up approach.
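To make the brightness-ordering idea concrete, here is a minimal TypeScript sketch. It assumes the pixels arrive as an RGBA byte array laid out like a canvas ImageData.data buffer. It also uses the Rec. 601 luma weights as its definition of brightness. Both of these are assumptions. The names luma and brightestFraction are hypothetical, and this is not necessarily how the "intensity" option works internally.

```typescript
// Pixel data laid out like a canvas ImageData.data buffer (an assumption):
// 4 values per pixel (red, green, blue, alpha), row by row.
type RgbaData = ArrayLike<number>;

// Brightness of one pixel, using the Rec. 601 luma weights (an assumption).
function luma(data: RgbaData, pixel: number): number {
  const i = pixel * 4;
  return 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
}

// Hypothetical helper: indices of the brightest `fraction` of pixels, brightest first.
function brightestFraction(data: RgbaData, fraction: number): number[] {
  const pixelCount = Math.floor(data.length / 4);
  const order = Array.from({ length: pixelCount }, (_, p) => p);
  order.sort((a, b) => luma(data, b) - luma(data, a));
  return order.slice(0, Math.round(pixelCount * fraction));
}
```

Consider a four-pixel image made of black, white, dark gray, and light gray pixels. Asking this sketch for the most salient half returns the white pixel and then the light-gray pixel.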

In contrast, top-down approaches (not covered here) usually rely on supervised machine learning. They describe the desired outcome (detection of salient regions) and let a neural network work out the specific path to the best performance. In recent years (basically since 2015) these approaches have been the main research focus, and they achieve the best benchmark scores. However, it's difficult for us to comprehend how these models build an understanding internally, which makes debugging them as much an art as a science. (2)

Global Contrast-based Salient Region Detection

This section will discuss and implement in JavaScript the (first) algorithm introduced in Cheng 2011. (3)

Intuition

The algorithm evaluates the saliency of each pixel as the sum of its contrast against all other pixels in the image. That's it! Remarkably, it ignores all information about the pixel except its color. Thus, its position in the image and immediate neighborhood have no impact on the result.
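Taken literally, this definition is a double loop over the pixels. The sketch below is hypothetical, and the names are mine. For brevity it measures color difference as Euclidean distance in RGB, whereas Cheng 2011 measures it in L*a*b* color space. Notice that nothing in it looks at where a pixel is located.

```typescript
// One [r, g, b] color per pixel, 0-255 per channel.
type Color = [number, number, number];

// Stand-in color distance: Euclidean RGB (the paper uses L*a*b* instead).
function colorDistance(a: Color, b: Color): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Saliency of each pixel = sum of its color distances to every pixel in the image.
// Only colors are consulted, never positions; cost is O(N^2) in the pixel count.
function naiveGlobalContrast(pixels: Color[]): number[] {
  return pixels.map((p) => pixels.reduce((sum, q) => sum + colorDistance(p, q), 0));
}
```

Try it with two black pixels and one red pixel. The red pixel scores 510 and each black pixel scores 255, purely because red is the rarer color. Shuffling the input array would not change any pixel's score.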

The algorithm rests on the assumption that the human visual system is especially attuned to quickly identifying high-contrast stimuli, and that this is what initially attracts our attention. What I found most impressive about this result was that it relied entirely on low-level contrast information. It surpassed previous approaches on benchmarks without any awareness of how to recognize faces, text, or any other subjectively important feature.

Note: I have only implemented the histogram-based contrast method, denoted HC in the paper. The paper also introduces a better-performing region-based contrast method, denoted RC, which first splits the image into regions using a graph-based segmentation. Unfortunately, it exceeds the scope of my ambition here. Feel free to send me a pull request if you are more motivated!

a map of code

Implementation

The code in saliency.ts follows the HC algorithm from Cheng 2011 as closely as possible. Of course, naively comparing every pixel to every other pixel would take billions of calculations even for moderately sized images. That is why we build a histogram to bucket similar colors, which greatly speeds up the computation.
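Why does a histogram help? Following Cheng 2011, the saliency S(I_k) of pixel I_k is the sum of its color distances D to every pixel I_i in image I. Pixels that share a color c_l necessarily get the same saliency. The sum can therefore be regrouped over the n distinct colors, with f_j the relative frequency of color c_j:

```latex
\begin{aligned}
S(I_k) &= \sum_{\forall I_i \in I} D(I_k, I_i) \\
S(I_k) = S(c_l) &= \sum_{j=1}^{n} f_j \, D(c_l, c_j)
\end{aligned}
```

This drops the cost from O(N^2) for N pixels to O(N) + O(n^2). For a 300 x 300 image, N^2 is 8.1 x 10^9 comparisons. The paper quantizes each channel to 12 levels, so n is at most 12^3 = 1728, and n^2 stays under 3 x 10^6 even before the 95% compression step.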

  1. Construct a histogram of all pixel colors in RGB space
  2. Compress the histogram, keeping only the most frequent colors that together cover 95% of the pixels
  3. Compute each pixel's saliency as the sum of perceptual color differences against all histogram colors, weighted by their frequency
  4. Return the locations of all pixels whose saliency exceeds the target threshold
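The four steps could be sketched in TypeScript roughly as follows. This is not the code in saliency.ts, and hcSalientPixels and its helpers are hypothetical names. The sketch makes several simplifications:

- Color distance is plain RGB Euclidean distance; the paper uses L*a*b*.
- Each channel is quantized to 12 levels, as in the paper.
- The threshold is read as the fraction of pixels to keep. This is an assumption, chosen because it matches the resizing note further down.
- The paper's extra step of smoothing saliency across similar colors is omitted.

```typescript
type Rgb = [number, number, number];

// Cheng 2011 quantizes each channel to 12 levels, giving at most 12^3 = 1728 colors.
const LEVELS = 12;

// Map a pixel to the key of its quantized color bin.
function quantize([r, g, b]: Rgb): number {
  const q = (v: number) => Math.min(LEVELS - 1, Math.floor((v * LEVELS) / 256));
  return (q(r) * LEVELS + q(g)) * LEVELS + q(b);
}

// Representative color of a bin: the center of its cell, in 0-255 units.
function binColor(key: number): Rgb {
  const step = 256 / LEVELS;
  const r = Math.floor(key / (LEVELS * LEVELS));
  const g = Math.floor(key / LEVELS) % LEVELS;
  const b = key % LEVELS;
  return [(r + 0.5) * step, (g + 0.5) * step, (b + 0.5) * step];
}

// Stand-in color distance (Euclidean RGB); the paper uses L*a*b* instead.
function dist(a: Rgb, b: Rgb): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Hypothetical HC-style pipeline: indices of the `fraction` most salient pixels.
function hcSalientPixels(pixels: Rgb[], fraction: number, coverage = 0.95): number[] {
  // 1. Histogram of quantized colors.
  const keys = pixels.map(quantize);
  const counts = new Map<number, number>();
  for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);

  // 2. Keep the most frequent colors until they cover `coverage` of all pixels.
  const byFreq = Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
  const keptColors: Rgb[] = [];
  const weights: number[] = []; // raw counts; dividing by N would only rescale scores
  const slot = new Map<number, number>(); // color key -> index of its kept color
  let covered = 0;
  for (const [k, c] of byFreq) {
    if (covered >= coverage * pixels.length) break;
    slot.set(k, keptColors.length);
    keptColors.push(binColor(k));
    weights.push(c);
    covered += c;
  }
  // Fold each dropped rare color into its nearest kept color.
  for (const [k, c] of byFreq) {
    if (slot.has(k)) continue;
    const color = binColor(k);
    let best = 0;
    for (let i = 1; i < keptColors.length; i++) {
      if (dist(color, keptColors[i]) < dist(color, keptColors[best])) best = i;
    }
    slot.set(k, best);
    weights[best] += c;
  }

  // 3. Saliency of each kept color: frequency-weighted sum of distances to all others.
  const colorSaliency = keptColors.map((c) =>
    keptColors.reduce((sum, other, j) => sum + weights[j] * dist(c, other), 0),
  );

  // 4. Rank pixels by the saliency of their color and keep the top `fraction`.
  const pixelSaliency = keys.map((k) => colorSaliency[slot.get(k)!]);
  const order = pixelSaliency.map((_, p) => p);
  order.sort((a, b) => pixelSaliency[b] - pixelSaliency[a]);
  return order.slice(0, Math.round(fraction * pixels.length));
}
```

The nested loop in step 3 runs over the kept colors instead of the pixels. That is where the speedup from the histogram comes from.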

Applying it to Code

Recall from Part 1 that we need the target length of each line. Without loss of generality*, in this case we'll define a "line" as a run of contiguously salient pixels. This way a single row can hold multiple disconnected segments. When displaying the formatted results, we just need to insert the right number of spaces or line breaks between segments.

*: a mini-dream come true, putting that phrase in a sentence outside of the classroom
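Here is one way the run splitting and spacing could look, as a hypothetical sketch. The rowSegments function finds the runs in one row of a boolean saliency mask, and layout pours characters into them. It pours raw characters, so it ignores whether a split lands in the middle of a token. A formatter that has to produce runnable code would need to handle that.

```typescript
// One run of contiguously salient pixels on a row: where it starts and how long it is.
interface Segment {
  start: number;
  length: number;
}

// Split one row of a salient/non-salient mask into runs ("lines" in the post's sense).
function rowSegments(mask: boolean[]): Segment[] {
  const segments: Segment[] = [];
  let start = -1;
  // Iterate one past the end so a run touching the right edge is closed too.
  for (let x = 0; x <= mask.length; x++) {
    const on = x < mask.length && mask[x];
    if (on && start < 0) start = x;
    if (!on && start >= 0) {
      segments.push({ start, length: x - start });
      start = -1;
    }
  }
  return segments;
}

// Fill the segments with consecutive characters, padding the gaps with spaces
// and ending each row with a line break.
function layout(rows: boolean[][], text: string): string {
  let pos = 0;
  const out: string[] = [];
  for (const row of rows) {
    let line = "";
    for (const { start, length } of rowSegments(row)) {
      line += " ".repeat(start - line.length) + text.slice(pos, pos + length);
      pos += length;
    }
    out.push(line);
  }
  return out.join("\n");
}
```

For example, the mask rows [T, T, F, T] and [F, T, T, F] with the text "abcde" produce "ab c" on the first row and " de" on the second.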

Note: you might ask, "If we have a small image and a large piece of code, won't we run out of places to put the text?" Before performing saliency detection, I compute the code area (1.7 x code length) and resize the image to a total size of (1 / threshold) x code area pixels. If all goes according to plan, the saliency detection will return exactly as many pixel locations as there are characters in the source code.
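The sizing arithmetic can be written as a small helper. The name targetSize is hypothetical, and the helper rests on two assumptions the post does not spell out. First, the threshold is taken to be the fraction of pixels the saliency step keeps. Second, the image is scaled uniformly, so its aspect ratio is preserved. The 1.7 factor comes directly from the text above.

```typescript
// Hypothetical helper: new image dimensions so that the total pixel count is
// (1 / threshold) x code area, keeping the original aspect ratio.
function targetSize(
  codeLength: number,
  width: number,
  height: number,
  threshold: number,
): { width: number; height: number } {
  const codeArea = 1.7 * codeLength; // factor taken from the post
  const totalPixels = codeArea / threshold; // (1 / threshold) x code area
  const scale = Math.sqrt(totalPixels / (width * height)); // uniform scale factor
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}
```

As an example, take 1,000 characters of code, a 400 x 200 image, and a threshold of 0.5. The target is then 3,400 pixels, which this helper rounds to 82 x 41 (3,362 pixels).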

Future Work

For now I think I've had my fill of this project, but if I were to revisit it in the future, I expect I'd work on a new saliency detection approach.

Oh, now that I think of it, one fairly low-hanging fruit is to move the processing work off the main thread into a Web Worker, so the page doesn't freeze while it processes the image.

Sources

  1. Runmin Cong, Jianjun Lei, et al. Review of Visual Saliency Detection with Comprehensive Information, 2018
  2. Ali Borji. Saliency Prediction in the Deep Learning Era: Successes, Limitations, and Future Challenges, 2019
  3. Ming-Ming Cheng, Niloy J. Mitra, et al. Global Contrast based Salient Region Detection, 2011