20201027

Random noise textures on the GPU in WebGL

Procedural rendering routines often need pseudorandom numbers. For graphics rendering, we usually prefer correct-ish and fast code over properly "correct" code. My approach to random number generation is usually to intermittently seed a weak pseudo-random number generator, like a linear-feedback shift register, from a stronger source like Javascript's Math.random(). 

On the GPU, things are not so simple: we want a pseudorandom algorithm that is fast, local, and parallel. For compatibility, we should also have this RNG use an 8-bit RGBA texture for its internal state. Bitops are a bit tricky in WebGL shaders (though not impossible), so we'll use a linear congruential generator.

20201015

WebGL shader rules for rendering dendriform growth

I've been playing around with cellular automata for procedural generation in WebGL. Here's an algorithm for rendering dendriform growth. I was aiming for rivers, but it came out like neurons.

[view in browser]

It's built atop a diffusion-limited aggregation cellular automata. Algorithm sketch: 

  • Each cell looks at the 4 nearest neighbors on the grid. We also check the 4 corners ("far" neighbors) to avoid creating loops.
  • Random numbers are provided by a separate noise texture
  • State is stored in the channels of an RGBA texture. Each [0,1] channel maps to a [0,255] byte value
  • The texture is seeded with at least one "active" pixel (existing water/lake/cell body). The seed is set to active (green=1.0)
  • Green channel runs a a diffusion limited aggregation: 
    • If exactly one nearby pixel is active, then we have a 5% of becoming active. 
    • Avoid creating loops: don't activate a pixel if more than one "far" neighbor is already active.
    • Because pixels act in parallel and at random, sometimes two pixels turn on in the same iteration, creating a loop. Detect and remove these after-the-fact, if they happen. 
  • Keep track of distance from the "seed" region in blue channel
    • Seeds are initialized with blue channel = 255
    • Newly activated cells get blue-1
    • This counts down to zero, at which point expansion stops
  • The texture is also initialized with various water sources / "synapses" scattered about it
    • This is stored in the alpha channel
    • If an active cell hits a source, it becomes a "river"
    • If a cell is active, and an adjacent cell is a "river", and is also further away from the source (check distance in blue channel), then this cell also becomes a "river"
  • A separate shader checks which cells are rivers, and generates tile ID values which are then passed to a tile shader.

Pixel-art-style Conway's game-of-life using a tile shader in WebGL

Implementation of Conway's game of life cellular automata with a retro-style shader. Added as example on webgpgpu Github.




Sketch of WebGL implementation:

  • Underlying cellular automata runs Conway's game of life
  • States as game tiles: new cells: small blue; live cells: blue; died: skulls
  • Life diffuses out coloring nearby areas
  • A single RGBA texture stores the game state. We render to/from this texture as a framebuffer.
  • Each color channel in [0,1] can be mapped to a [0,255] byte value
  • The green channel stores the game of life state
  • The blue channel stores the diffusing field
  • The red channel outputs a tile ID value that is passed to a tile shader to render the game


Retro-styled forest-fire cellular automata in WebGL


This is a cellular automata analogue of the model that we used for developmental retinal waves in this paper


Key rules

  • Vegetation "grows" occasionally (flip a coin to increment amount of trees)
  • Fires have a small chance of starting spontaneously
  • Intensity of ignited fires decay down to zero, at which point they leave ash
  • Fires spread to adjacent cells with a probability that depends on their intensity
  • Probability of a cell igniting given a fire is related to how much vegetation there is

Sketch of WebGL implementation:

  • A single RGBA texture stores the game state
  • Each color channel in [0,1] can be mapped to a [0,255] byte value
  • A "noise" texture creates pseudorandom numbers
  • The green channel stores the level of vegetation (BURNT,NONE,GRASS,SCRUB,TREE)
  • The blue channel stores the intensity of the fire
  • The red channel outputs a tile ID value that is passed to a tile shader to render the game


20200928

Algorithms for rendering quasicrystal tilings


Quasicrystals are aperiodic spatial tilings, and 2D quasicrystals have been used for centuries in Islamic art. I find these patterns beautiful, and I think more people should know how to render them.

You can render 2D quasicrystals as a sum of plane waves. They can also be constructed using aperiodic tilings (e.g. 4-fold Ammann-Beenker tiles and 5-fold Penrose Tiles), and there are fractal constructions that recursively subdivide a set of tiles using a smaller set of the same tiles (e.g. 5- and 7-fold tilings). Many more constructions can be found on the Tilings Encyclopedia

This post focuses on the "cut and project" construction, which is easy to implement in code. The routines shown here can be found in this iPython notebook on Github.

The cut-and-project construction

The simplest way to render crisp, geometric quasicrystals is the "cut and project" construction. This views the quasicrystal as a two-dimensional projection of a planar slice through a N-dimensional hyper-lattice. Let's implement this algorithm. 

A (hyper) lattice

First, we need to define the idea of a hyperlattice. This is a N-dimensional generalization of a cubic crystal lattice. It tiles ND space using ND hypercubes. We can represent the center of each hypercube (hypercell? hyperpixel?) as a N-D vector of integers
$$
\mathbf q_0 = [ x_1, ... x_N ],\,\,x_i\in\mathbb Z
$$
The associated N-cube, then, includes all points on a unit hypercube $\left[-\tfrac 1 2, \tfrac 1 2\right]^N$ offset to location $\mathbf q_0$:
$$
\mathbf q = \left[-\tfrac 1 2, \tfrac 1 2\right]^N + \mathbf q_0
$$

An irrational projection onto a 2D plane

To construct the quasicrystal, we cut through this ND lattice with a 2D plane. To get a aperiodic slice, this plane must not align with any periodic direction in the hyperlattice. 

A good choice is to project each direction of the hyperlattice onto an evenly-spaced set of basis directions, centered at the origin. We define a 2D $(s,t)$ plane $\mathcal P$ in terms of two N-D basis vectors, $\mathbf u, \mathbf v \in \mathbb R^N$.

$$
\begin{aligned}
\mathbf p &= \mathbf u \cdot s + \mathbf v \cdot t,\,\,s,t\in\mathbb R
\\
\mathbf u &= [ \cos(\theta_0), \dots , \cos(\theta_{N-1}) ]
\\
\mathbf v &= [ \sin(\theta_0), \dots , \sin(\theta_{N-1}) ]
\\
\theta_i &= \frac {\pi\cdot i}{N}
\end{aligned}
$$

20200918

Inverting the "Simple Sabotage Field Manual"

In 1944, the US Office of Strategic Services published an internal document: the "Simple Sabotage Field Manual". 

In addition to specific techniques for breaking machines, it included suggestions for how to cripple an organization. 

All of the suggestions are things that already happen spontaneously, because organizing large numbers of people reliably and safely is hard.

Rather than re-state the "sabotage" tips, I thought I'd invert each one to create guidelines for positive change in an organization:

  • Always share your competencies and knowledge freely and liberally.
  • Expedite when it is safe to do so, and whenever failing to do so would lead to something worse.
  • Rules that don't serve a purpose can be changed, or broken in emergencies (but this must be done with extreme thought, caution, and community support, see Chesterton's fence).
  • No speeches. Be brief and clear, and say nothing without purpose.
  • No committees, but if you must, never more than four persons.
  • Stay on mission. Do not re-open closed (or no longer relevant) matters.
  • With regards to language: be liberal in what you accept and conservative in what you give out; if the words communicated understanding, spend no time revising them.
  • Leadership requires action; Take thoughtful and calculated risks (hedging if appropriate). Beware inertia and cowardice in the name of caution.
  • Prioritize important jobs first, and assign them to the most competent and efficient.
  • Avoid perfectionism, be pragmatic.
  • Allocate promotions and responsibilities according to ability (but be kind and take care of all).
  • Conferences are a waste of time, if you already know what you need to do.
  • Minimize admin bottlenecks, but If a process really needs N persons to sign off, build in redundancy: ensure that at least 2N persons are trained and authorized to do so.
  • Be diligent and serious in your work.
  • Avoid distractions, minimize interruptions from co-wokers and avoid breaking up the workday.
  • Learn how to build and maintain good tools, and how to do good work with bad tools.

20200917

Wildcard selectors in uBlock Origin

The extension uBlock Origin for Chrome and Firefox allows users to block offensive elements on websites.

It works by matching the name of elements on a page against a block list, but some websites randomize element IDs, making this tricky. One can block elements with parts of their ID randomized in uBlock, but it is not automatic. First:

  • Have uBlock Origin installed
  • Identify element to block
  • Right click and select "block element"
  • uBblock will highlight the element, and a white box should appear in the lower right corner of the browser.

If the website is randomly-generating IDs, you won't be able to block the element with a fixed rule. You might see something like this:

##[randomly generated string]_[some predictable unique tag]

or

##[some predictable unique tag]_[randomly generated string]

Now, you might think that these could be blocked by "##*_[some predictable unique tag]" or "##[some predictable unique tag]_*", but uBlock does not use regular wildcard syntax. Instead, it uses CSS selector syntax, which is idiosyncratic. For example, this Reddit post suggests transforming the desired wildcard selector

###leaderboard_ad-*.block_article_ad.leaderboard_ad.etc

Into a rule expressed in terms of a CSS selector

##.block_article_ad.leaderboard_ad.etc[id^="leaderboard_ad-"]

This (1) Specifies the parts of the rule that are fixed, downstream of the wildcard, and (2) adds a string-matching query that finds elements whose ID starts with "leaderboard_ad-" (that's what "^=" means).

This post corroborates this approach, suggesting a generic format if the offending element has an ID starting with a predictable string:

www.annoyingwebsite.com##div[id^="start_of_div_name_before__randomnumber"]

This works, provided you can uniquely match the element based on a prefix. What if the unique identifier is a suffix? Referring back to CSS selector synatx, we find that ^= means "begins with" and $= means "ends with". So the wildcard rule for an element that randomizes the end of a div's ID would be:

www.annoyingwebsite.com##div[id$="_end_of_div_name_after_random_number"]

20200904

Box Drawing Characters + Marching Squares = Marching Boxes?

I couldn't figure out the name of this to search for it online, and it wasn't mentioned on Wikipedia's marching squares page, so… here's to reinventing wheels ( :

Sebastian Blomfield's "Lord of the Manor" game inspired me to explore retro-game engine coding. So far, I've got a basic tile-shader that renders tiled 2D game environments (e.g. game of life, forest fire simulation). Next up: what if I want to draw maps where the adjacent tiles of e.g. walls are continuous?

Box-drawing characters are a nice way to generate rectilinear paths in a tile-based game environment. By checking whether the tiles above/below/left/right are also walls, we can select one of 16 box drawing symbols to create continuous walls:

Note that this extends the usual unicode box-drawing characters with five additional tiles: one for an isolated (disconnected) wall, and four for "dead ends". These suffice for rendering square hallways, e.g:


With some tweaks to the tiles, we can coax box-drawing tiles to generate more curved and diagonal paths:

See here for an example page that renders dendriform growth using the above tile sets (refresh to get a new randomly-selected tile set).

Marching squares is another nice algorithm for rounding off corners on a map (e.g.). This doesn't quite work out-of-the-box for our tile shader, however, since marching squares accepts the values for the corners of each tile, and I want it to operate on the tile values directly. It also uses 16 tiles:

So! box-drawing is good for lines, and marching squares is good for borders/contours. Can we combine these and get a best-of-both-worlds approach? One that renders tiles based on their current value and the values of their neighbors?

Yes. I'm not sure what its name is, so I'm calling it "marching boxes" for now. Marching-boxes extends the box-drawing tiles with 31 tiles (below). These provide versions of the corner, T-junction, and cross-junction tiles that account for the possibility of nearby filled-in regions. The resulting tiles can render contours around filled-in regions as well as sharp lines.

Including background and foreground tiles, marching boxes uses 48 tiles.

This could be nice for any pattern that has a mixture of lines and enclosed regions, like a dungeon connected by hallways, or a water system that includes both lakes and rivers.

Click here to view a WebGL demo of marching-boxes tile rendering

20200818

More LED layouts for hand crafting

LED multiplexing layouts for hand-crafting outlined LED grid layouts using a diagonal section of a Charlieplexed grid of LEDs. This post presents some additional layouts designed for e-textiles (e.g. embroidery, bead-weaving).

The main idea of these layouts is that they should consist of a regular, repeating lattice of N driving lines, such that each pair of lines cross exactly twice. One LED is placed at each crossing (one forward and one backwards, in terms of polarity, for each pair).

Grid arrangements

The diagonal nature of the diagonal-multiplexing grid does not lend itself easily to conventional row/column display designs. Previously, we distorted the diagonal grid to create a square layout. This is tricky to solder correctly. One can achieve the same effect by skipping every-other position in the diagonal grid. Then, repeat the whole diagonal-multiplex pattern again, this time with the polarity of the LEDs reversed.

Here is an example where 9 control lines drive a 4×18 LED matrix. Below, the red/blue halves of the LEDs denote polarity (not color, these would be single-color LEDs). The polarity of the rightmost half of LEDs should be flipped. This approach works if the number of control lines is odd.



The two repeated patterns can also be sewn together in parallel. In this case, building a 9×8 grid.

Linear arrangements

In two special cases, its possible to lay out a straight line of LEDs. One pattern drives 6 LEDs on 3 lines, and another drives 20 LEDs on five lines. These designs intended for LED embroidery projects.

3 lines control 6 LEDs. (The polarity of the rightmost half of LEDs should be flipped.)

5 lines control 20 LEDs. (The polarity of the rightmost half of LEDs should be flipped.)



These could also be wrapped to form a circle.
 

RGB tiles

These RGB tiles were designed to work with 6-pin RGB LED modules, which expose both the anodes and cathode of each individual color channel. It is possible to design a single "tile" sewable PCB that will work with both layouts below. The idea is that this could enable a tile or "scale" pattern on e-textiles.

9 lines control 72 LEDs, creating 24 RGB tiles arranged in a 4×6 grid. Every three rows of the diagonal-Charlieplexing grid are grouped to produce one set of RGB tiles.

This pattern can be generalized to any number of control lines that is both odd and divisible by 3. E.g. 9, 15, and 21 control lines produce 4×6, 7×10, and 10×14 grids, respectively. (I think.. double check these.)

13 lines control 156 LEDs, creating 52 RGB tiles arranged in a 4×13 grid. Each row is grouped in alternating patterns of 2 and 1 LEDs, which are then groups by row into groups of 2+1=3.




This pattern can be generalized to any number of control lines $N$ such that $N$ is odd and $N-1$ is divisible by 3. E.g. 7, 13, and 19 control lines produce 2×7, 4×13, and 6×19 grids, respectively. (I think.. double check these.)


RGB Matrices

Other ways of grouping the LEDs into (R,G,B) triplets support ordinary rectangular RGB LED matrix layouts. N*3+1 control lines can be grouped to support a N×N*3+1 matrix of common-anode or common-cathode RGB LEDs:

For example, 13 control lines can support four groups of three (R,G,B) triples in 13 rows

Or 16 control lines can support a 5x16 matrix:

Notes

All of these designs assume tri-state drivers (positive, negative, and high-impedance (off)) that can source fairly large currents (e.g. ~40 mA). This is typical of the AtMega and AtTiny AVR chips powering most flavors of Arduino.

For large e-textile designs, the resistance of the conductive thread might be an issue. It may be necessary to use multiple threads in parallel, or to selecting a high conductance thread. Adjusting the layout so that no LEDs are too distant from the driver might also help. Crude software balancing of brightness via PWM may also be possible, but difficult on larger displays. Take care not to let driving lines short at points where they cross.

When driving displays that mix LEDs of different forward voltages: If more than one LED is lit at a time, then the red/green/blue LEDs must be scanned separately. Unless individual current-limiting resistors are provided for each LED, the forward voltage of the red LEDs must not be less than half the forward voltage of the blue LEDs.

There's a good argument that manually creating LED meshes is not a good use of time, and that one should use e.g. NeoPixels instead. These patterns, however, have the obsessive repetition and tedious assembly of knitting, and may appeal to some people. 

I have not had the opportunity to mass-produce the sewable LED boards that would be needed to complete these projects.

20200329

Microcontroller firmware for Charlieplexed LED displays


Previous posts illustrated hardware building Charlieplexed displays (1 2 3 4). However, building the display hardware is only half the work: one also needs a display driver.

This post covers strategies for writing firmware to drive Charlieplexed LED displays. I'll be working with the Arduino Uno, which uses the AtMega328 microcontroller. These strategies are general, but the hardware-specific optimizations will need to be adapted if using a different microcontroller.

This post walks through the basics of bringing up custom display driver code for a new LED display, including:
  1. Testing the display after soldering
  2. Row/column scanning for brighter Charlieplexed displays
  3. Display memory buffers
  4. Getting good performance by optimizing IO operations and the display memory format
  5. Scanning the display using timer interrupts
  6. Double buffering to improve animations  
It also briefly outlines some extensions that may be of interest
  1. Achieving graded brightness values
  2. How to mix LEDs of different colors (different forward voltages) in one grid
  3. Cheating on current limits to get brighter displays
Finally, there are some (hopefully) useful notes in the appendix
  1. LED resistor calculations for Charlieplexed LED displays
  2. Timer interrupts on the AtMega328 with the Arduino platform, in detail. 

0. Build the display


I've constructed my LED display out of discrete 3 mm LEDs, placed on cardboard from a cereal box using a template. The 'diagonal Charlieplexing' layout is, I think, the easiest for hand-crafting, since it gets the maximum number of LEDs using the smallest amount of wire and microcontroller IO pins. This particular build is a larger multi-color version of the Fibonacci spiral layout, and the construction steps are similar.

The basic steps are:
  1. Prepare a template for the LED layout, and attach this to some thin but stiff cardboard
  2. Perforate the cardboard for the component leads, and place components
  3. Connect and solder the components
Depending on how you attach the components, it may or may not be easier to do steps (2) and (3) concurrently.

Make a cardboard PCB Attach components Connect and solder

p.s.: It took me about a month to finish the hardware, a little bit each day. Like knitting.

1. Test the display one light at a time

 

The first thing to do is to verify that all LEDs work.
Before starting, ensure that all control lines have appropriate current-limiting resistors to avoid damaging the LEDs (it is frustrating to burn out a LED matrix before even getting started).

I'm working on a 306-light project that uses 18 control lines (the maximum number that one can use on the Arduino Uno while still leaving the serial pins free). I've arranged the display in a circular pattern, but these code examples will assume an ordinary rectangular layout for simplicity.

The Arduino sketch below will light-up each LED in a Charlieplexed display in sequence. Follow the steps in the top comment to adapt it to your project.

/**
 *  - Build Charlieplexed display hardware
 *  - Hook up all lines with appropriate current-limiting resistors
 *  - Set the constant NPINS to the number of lines
 *  - Place the Arduino pin numbers used in the `pinmap` array
 *  - Upload this sketch and confirm that all lights work
 */

#define NPINS 18
const int pinmap[NPINS] = {
  2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
};

void setup() {
  // put your setup code here, to run once:
}

void loop() {
  // put your main code here, to run repeatedly:

  for (int j=0; j<NPINS; j++)
  for (int i=0; i<NPINS; i++)
  {
    if (i==j) continue;
    int anode   = pinmap[i];
    int cathode = pinmap[j];
    // Turn everything off
    for (int line=0; line<NPINS; line++)
      pinMode(pinmap[line],INPUT);
    // Turn on one anode-cathode pair
    digitalWrite(cathode,LOW );
    digitalWrite(anode  ,HIGH);
    pinMode(anode  ,OUTPUT);
    pinMode(cathode,OUTPUT);
    delay(1);
  }
}


Before turning on the next light, we first set all pins to INPUT mode, with pull-up resistors disabled. This avoids triggering any LEDs accidentally as we switch the other pins.

For full source code see sketch Example 1.

Dead pixels


For home-made displays, I usually have a few LEDs that are burnt out or installed backwards. Testing the display will reveal such 'dead pixels', and now is good time to replace or re-solder them.

In Charlieplexed displays, attempting to light a dead pixel will cause current to flow in unexpected ways, spuriously lighting up the wrong LEDs. Sometime it is impractical to replace broken LEDs. For now, it suffices to note the anode and cathode pin numbers of any dead pixels. We can explicitly avoid turning them on in the display driving code late to avoid this issue.
Note: Watch out for quality control in bulk discrete LEDs. I usually find some where the anode/cathode labels are reversed, or where the LED isn't embedded properly in the plastic housing. (Mechanical stability is especially important for home-made projects, since movement during use or soldering can break the connection between the LED chip and its leads.) I often check each LED before soldering.

 

2. Row/column scanning 


Lighting LEDs individually works for small projects, but not for large ones. As the number of lights increases, the fraction of time that each light spends on decreases, making the display dim. There also isn't enough time to scan a large number of lights without introducing noticeable flicker.

The solution is to scan an entire row or column of the matrix at once. One can drive multiple LEDs simultaneously by (e.g.) turning on one anode and multiple cathodes (but take care not to over-current the microcontroller IO pins). Appendix 1 (at the end of this post) gives notes for setting resistor values, but the usual series resistance for lighting a single LED is a good upper bound.


Single-light scanning

  • Good: If only a sparse subset of the LEDs are on at a given time, this can lead to a brighter and more uniform display.
  • Bad: The display will be dim if there are many LEDs to scan, and this approach fails if a large number LEDs need to be lit.

Row/Column scanning

  • Good: Reduces the amount of CPU time needed to scan the display. Reduces flicker and increases brightness for large displays.
  • Bad: Current-limiting resistors set a fixed current per row or column. This means that the brightness depends on the number of LEDs being lit per row/column, which can lead to non-uniform brightness if not compensated in software. Handling displays that mix LEDs with different forward voltages is also more complicated.
Here is an Arduino sketch to test common-anode row/column scanning of a Charlieplexed display. (Set NPINS to the number of LED control lines in your project, and define the Arduino pin number for each line in the pinmap variable.)

/**
 *  - Set up the electronic circuit, with appropriate series resistors
 *    to protect the LEDs *and* *microcontroller* if something goes wrong.
 *  - For common-anode row/column scanning:
 *    - Set all pins to INPUT mode to avoid spuriously lighting LEDs
 *    - Set the anode pin to HIGH and all others to LOW
 *    - Switch to OUTPUT mode the anode, and any cathods from this row
 *      that you wish to light up.
 */

#define NPINS 18
const int pinmap[NPINS] = {
  10,6,18,15,12,4,5,8,7,16,17,19,14,9,11,13,3,2
};

void setup() {
  // put your setup code here, to run once:
}

void loop() {
  // put your main code here, to run repeatedly:
  for (int i=0; i<NPINS; i++)
  {
    int anode = pinmap[i];

    // Turn eveything off
    for (int line=0; line<NPINS; line++)
      pinMode(pinmap[line],INPUT);

    // Set cathodes to LOW and anode to HIGH
    for (int line=0; line<NPINS; line++)
      digitalWrite(pinmap[line],LOW );
    digitalWrite(anode,HIGH);

    // Turn on Anode, and any cathodes we want to light-up
    pinMode(anode,OUTPUT);
    for (int line=0; line<NPINS; line++)
      pinMode(pinmap[line],OUTPUT);

    delayMicroseconds(200);
  }
}


For full source code see sketch Example 2.

3. Display buffers

 

At this point we're ready to start coding a display driver. The first step is to add a display memory buffer so that other drawing routines can turn pixels on and off.

We'll do this by storing one bit for each directed pair of LED control lines. We'll pack this display memory into an array of 32-bit integers, so that if the LED between line $i$ and $j$ is on, then the $j^{\mathrm{th}}$ bit of the $i^{\mathrm{th}}$ column will be 1 (and 0 otherwise). This background on bitwise manipulations in integers might be useful.

Bit-packed representations save memory, and speed up IO operations, both important on microcontrollers with limited RAM and slow clocks. Since I have 18 control lines, I use a 32-bit unsigned integer (uint32_t or unsigned long on the Arduino). If you have fewer control lines, you can use a smaller unsigned integer type.

In the sketch below, we've modified the row/column scanning example
  • We add an array of unsigned integers, which store the bits of our display buffer
  • We explicitly clear the buffer in the setup() part of the sketch
  • We draw a test pattern into the buffer
  • In the row-column scanning loop, we now check the display buffer to see which lights to turn on for each row.
The code:
/**
 *  This sketch extends Example 2, row-column scanning,
 *  to read data from a display buffer.
 */

#define NPINS 18
const int pinmap[NPINS] = {
  10,6,18,15,12,4,5,8,7,16,17,19,14,9,11,13,3,2
};

uint32_t display_buffer[NPINS];

void setup() {
  // Initialize the display memory
  for (int line=0; line<NPINS; line++)
    display_buffer[line]=0;

  // Draw a test pattern
  for (int i=0; i<NPINS; i++)
    for (int j=0; j<NPINS; j++)
      if (i!=j && ((i>>2)&1)==((j>>2)&1))
        display_buffer[i] |= 1<<j;
  }

void loop() {
  // scan based on the information in the buffer
  for (int i=0; i<NPINS; i++)
  {
    // Turn eveything off
    for (int line=0; line<NPINS; line++)
      pinMode(pinmap[line],INPUT);

    // Set cathodes to LOW and anode to HIGH
    int anode = pinmap[i];
    for (int line=0; line<NPINS; line++)
      digitalWrite(pinmap[line],LOW );
    digitalWrite(anode,HIGH);

    // Turn on Anode, and any cathodes we want to light-up
    pinMode(anode,OUTPUT);
    for (int line=0; line<NPINS; line++)
      if (display_buffer[i]>>line&1)
        pinMode(pinmap[line],OUTPUT);

    delayMicroseconds(200);
  }
}

For full source code see sketch Example 3.

 

4. Tight loops: optimize it

 

So far, we've been setting the states of the IO pins individually. This is a bit slow, so let's optimize things. On most microcontrollers IO lines are grouped into "ports", each of which contains 8 IO pins. One can set the state of all 8 pins simultaneously by writing an 8-bit integer to the port. I'm using an AtMega328 board, which has three ports (B, C, and D). Quoting the Arduino tutorial on port manipulation, each port is controlled by three registers:
  • The DDR register sets whether pins are in input or output mode (1 means OUTPUT, 0 means INPUT)
  • The PORT register controls whether pins are high or low, in output mode, and controls whether the internal pull-up resistor is active in output mode (1 means HIGH, 0 means LOW)
  • The PIN register is used to read input (we won't use this)
On this project, I'm using Arduino pins 2-17. We need to refer to the Arduino pin mapping to identify which ports these correspond to.
  • Pins 2-7 correspond to PORTD 2-7
  • Pins 8-13 correspond to PORTB 0-5
  • Pins 14-19 correspond to PORTC 0-5

 

Subroutines for fast IO


First, write some helper functions to set all of the PORT or PIN states for these pins at once, based on a single bit-packed representation of the IO state. This encapsulates the translation from a bit-packed representation of the LED-control line states into a sequence of PORT or DDR register writes, simplifies the display driver design.

// Set the PORT (HIGH/LOW) status of all LED control lines
inline void PORT_LED(uint32_t states) {
  PORTD = (PORTD & 0b11) | ((states & 0b111111)<<2);
  states >>= 6;
  PORTB = states & 0b111111;
  states >>= 6;
  PORTC = states & 0b111111;
}

// Set the DDR (INPUT/OUTPUT) status of all LED control lines
inline void DDR_LED(uint32_t states) {
  DDRD = (DDRD & 0b11) | ((states & 0b111111)<<2);
  states >>= 6;
  DDRB = states & 0b111111;
  states >>= 6;
  DDRC = states & 0b111111;
}
Note: The inline in these function declarations tells the compiler to insert their instructions directly in any calling function, eliminating function-call overhead (although the compiler is free to ignore this).
Note: When we write to port D, we first read and copy the value of the first two bits. These are the port states for Arduino pins 0 and 1. Since I'm not using these pins for the display, I need to leave their values unchanged. In your project, you'll need to do this for any pins sharing a port with the LED lines, if those pins are being used for other functions.

 

Preparing display data for faster IO


Now, we need to prepare bit-packed IO states to send to these subroutines. Mapping from LED control lines to pins takes time, so we don't want to do this inside the display scanning loop. One solution is to ensure that your LED control lines are hooked up in a sensible way to the underlying ports, but this isn't always possible.

A more flexible solution is to store the display_buffer memory in a format that is convenient for scanning the display, and handle the pin-to-line mapping when we read/write pixels from the display buffer.

The advantage of this approach is that Charlieplexing layouts can be a bit weird in terms of how pixels map to control lines. By handling this is wrapper functions that read/write display memory, we can hide all this messiness and expose a clean interface in terms of pixel coordinates.

// Write a 1-bit pixel to display memory
void setPixel(int i, int j, int value) {
  // (add code to convert pixel to display coordinates here)
  if (value)
    display_buffer[pinmap[i]] |=  ((uint32_t)1)<<pinmap[j];
  else
    display_buffer[pinmap[i]] &=~(((uint32_t)1)<<pinmap[j]);
}

// Read a 1-bit pixel from display memory
int getPixel(int i, int j) {
  // (add code to convert pixel to display coordinates here)
  return (display_buffer[pinmap[i]]>>pinmap[j])&1;
}

The code to scan the display is now relatively simple:

for (int i=0; i<NPINS; i++)
{
  // Turn eveything off
  DDR_LED(0);
  // Set cathodes to LOW and anode to HIGH
  uint32_t anode_mask = ((uint32_t)1)<<i;
  PORT_LED(anode_mask);
  // Turn on those LEDs which are on
  DDR_LED(anode_mask | display_buffer[i]);    
  delayMicroseconds(200);
}

For full source code see sketch Example 4.

 

5. Timer interrupts for multi-tasking

 

For uniform brightness and to avoid flicker, we need to scan through the rows of the display at regular intervals. But, if our main loop is dedicated to the display driver, we can't really handle much computation for actually showing things on the display!

The solution is to move the display scanning code into a timer interrupt routine that is called at regular intervals. There is a good introduction to timer interrupts for the Arduino on Adafruit, and Appendix 2 goes into more detail. For this example we'll use the AtMega's Timer 2 for scanning the display. (This works provided we do not also use the Tone library, which needs Timer 2 for other purposes.)

To trigger the timer interrupt routine, we need to set up Timer 2 and enable the Timer 2 overflow interrupt. I've wrapped this and the display-buffer initializer code in a new function setup_display(), which is called when the device starts.

void setup_diplay() {
  // Initialize the display memory
  for (int line=0; line<NPINS; line++)
    display_buffer[line]=0;

  // Set up Timer2 interrupts
  // Timer/counter 2 control register A
  // Set to 0 do disable PWM and output-compare functions
  TCCR2A = 0;
  // Timer/counter 2 control register B
  // The first 3 bits control the prescaler
  // (i.e. clock divisor for timer tics)
  // 0:off     4:64
  // 1:1       5:128
  // 2:8       6:256
  // 3:32      7:1024
  //
  TCCR2A = 3;
  // Enable the Timer 2 overflow interrupt
  // Timer/Counter2 Interrupt Mask Register
  TIMSK2 = 1;
}

We then place our display scanning code inside the Timer 2 overflow interrupt signal handler:

// Interrupt handler for scanning the display
volatile int scan_line = 0;
SIGNAL(TIMER2_OVF_vect) {
  // Reset timer; 
  // We want to update 18 rows at 400 Hz
  // I used the following python code to calculate
  // the reset value, using a prescaler of 32.
  //
  // >>> CLOCKRATE = 16e6 # 16 MHz system clock
  // >>> NLINES    = 18   # 18 LED control lines
  // >>> RATE      = 400  # Hz; Display scan rate
  // >>> PRESCALE  = 32   # Timer prescaler
  // >>> TIMERMAX  = 256  # 256 if 8-bit, 65536 if 16-bit timer  
  // >>> trigger_every = (CLOCKRATE/PRESCALE)/(NLINES*RATE)
  // >>> reset_to      = int(TIMERMAX-trigger_every+0.5)
  // >>> print('Reset the 8-bit timer to %d'%reset_to)
  TCNT2 = 187;
  // Scan one row of the display
  DDR_LED(0);
  uint32_t anode_mask = ((uint32_t)1)<<scan_line;
  PORT_LED(anode_mask);
  DDR_LED(anode_mask | display_buffer[scan_line]);    
  scan_line++;
  if (scan_line>=NPINS) scan_line=0;
}

To get finer control over the scanning rate, one can manually reset the timer in the overflow signal handler. Here, I set it to 187, which means the timer will overflow (i.e. reach 256) again in 256-187=69 timer tics. (The other way to do this is to use an output-compare interrupt with the 'compare to counter' mode.)

With the display-driving code out of the way, one can now add interesting rendering code in the main program loop. As a first test, I've set it to randomly change pixel values:

void loop() {
  // The main loop is now free for implementing program logic
  // For example we can randomly flip some LEDs
  setPixel(random(NPINS),random(NPINS),random(2));
}

For full source code see sketch Example 5.

 

6. Double buffering for better animations

 

Updating pixels one at a time can cause artifacts when drawing a new frame to the display. For cleaner animations, one can use double buffering to draw the frames off-screen first, then show them all at once.

Without double buffering With double buffering

We define two copies of the display buffer (buffer1 and buffer), as well as two pointers, one for drawing and one for scanning the display. We alternate which pointer points to which buffer to achieve double-buffering.

uint32_t buffer1[NPINS];
uint32_t buffer2[NPINS];
uint32_t *display_buffer;
uint32_t *drawing_buffer;

// FLip display and drawing buffers
void flipBuffers() {
  uint32_t *temp = display_buffer;
  display_buffer = drawing_buffer;
  drawing_buffer = temp;
}

The setPixel and getPixel (section 4) routines are modified to use to accept a buffer pointer as a parameter.

// Write a 1-bit pixel to display memory
void setPixel(uint32_t *buff,int i, int j, int value) {
  // (add code to convert pixel to display coordinates here)
  if (value)
    buff[pinmap[i]] |=  ((uint32_t)1)<<pinmap[j];
  else
    buff[pinmap[i]] &=~(((uint32_t)1)<<pinmap[j]);
}

// Read a 1-bit pixel from display memory
int getPixel(uint32_t *buff,int i, int j) {
  // (add code to convert pixel to display coordinates here)
  return (buff[pinmap[i]]>>pinmap[j])&1;
}

In the initialization code, we assign the underlying buffers to the display/drawing pointers, and clear both display memories:

// Start with buffer 1 for display
// and buffer 2 for drawing.
display_buffer = &buffer1[0];
drawing_buffer = &buffer2[0];

// Clear the display memory
for (int line=0; line<NPINS; line++)
  buffer1[line]=buffer2[line]=0;


This lets us prepare the next frame off-screen, and show it all at once. For full source code see sketch Example 6.
Note: I'm still using the Charlieplexing grid coordinates for i and j, rather than screen coordinates. For this reason we skip the i==j slots, since these would correspond to the anode and cathode being the same pin. In your own project, you would use i and j in display coordinates, and add code in setPixel and getPixel to map these to Charlieplexing-grid coordinates.
Note: One can also synchronize the buffer-flips with the display driver. This avoids updating the display halfway through the scan. However, there really isn't a natural place to flip the buffers in the 'diagonal multiplexing' layout approach I'm using here, so I omit this.

Additional extensions

The code so far illustrates the bare essentials for a working display driver which supports one bit per pixel, runs in the background, and supports double buffering.

This is enough for most projects, but there are a couple more fun things to try. These include supporting multiple brightness levels per pixel, supporting a color display, and increasing the display brightness by calculating resistors based on average total current limits rather than peak instantaneous limits.

 

Extension 1: Multiple brightness levels (more than 1-bit per pixel)


If you have CPU cycles to spare, then one can vary the brightness of pixels via PWM. This is a tricky, however, since for $N$ control lines we're already
effectively PWM-ing each LED with a duty cycle of $1/N$. In my experience it is difficult to get more than 3 distinct brightness levels.

The way to do this is to scan through the display multiple times. The brightest LEDs will be lit during all scans, but we'll skip intermediate-brightness LEDs during some of the scans. For this example, I've implement 2-bit color, which supports four states: "off", and then three brightness levels. Human brightness perception is nonlinear, so I double the amount of time each light is on for each brightness increment.

I scan the display three times:
  • Scan 1: Duration 10 timer ticks
  • Scan 2: Duration 10 timer ticks
  • Scan 3: Duration 20 timer ticks
And implement 3 distinct brightness values like so:
  • Pixel value 0, i.e. 0b00: Always off.
  • Pixel value 1, i.e. 0b01: On during scan 1 for 10 cycles.
  • Pixel value 2, i.e. 0b10: On during scans 1 and 2, for a total of 20 cycles.
  • Pixel value 3, i.e. 0b11: On during all scans, for a total of 40 cycles.
An example sketch is given in Extension 1.

 

Extension 2: Driving multiple LED colors in the same grid


If you mix LEDs with different forward voltages (e.g. red, green, and blue), it can be hard to balance the current to each color channel using fixed resistors. In this case, different LED colors may end up with different apparent brightnesses.

The solution to this is to separate each color into its own "virtual row", and then scan each color separately. This prevents lower-voltage LEDs from stealing current from higher-voltage ones. You can also adjust the time between interrupts for each color separately to balance the brightness of different color channels.

Scan colors separately Pixel location → color Nice.

An example sketch is given in Extension 2. This approach is also used in the "zoom in" sketch shown at the end of this post. These examples were optimized for the weird spiral layout of my project, which split the lines into "high" (blue and white LEDs) and "low" (red and green LEDs) voltage to simplify color scanning. More generally you might need to manually specify bit-masks for each color, with one entry per pixel.

 

Extension 3: Cheating on current limits for brighter displays


So far, we've been strict about not exceeding the 40 mA current limit for our microcontroller pins, and the 100 mA peak instantaneous current for our LEDs (see Appendix 1). Can we relax this to achieve a brighter display?

I was hesitant to include this section because it involves doing technically unsafe things. You don't want to exceed the microcontroller specifications for a professional system, but as long as you're willing to risk destroying a hobby project, why not push things a bit.

If only one LED were lit per row, then we could allocate the full 40 mA current budget to this single LED. So, for sparse displays, it might be safe to reduce the current-limiting resistors a bit. If you enforce that no more than $K$ LEDs are ever lit from the same row, the safe current-limiting series resistor value is:
$$R_{\text{ch}} = \frac K {K+1} \frac {V_{\text{supply}} - V_{\text{LED}}} {I_{\text{pin}}}.$$

Strict current limits Average current Visible in diffuse daylight

What if we were to re-interpret the 40 mA/pin current limit as an average current draw? This should be ok if the main thing limiting the current is heat dissipation.

That said, we still need to keep the total current draw below 200 mA for the arduino. For projects with 6 or more contorl lines, you need to set the maximum average current per pin to $200/N_{\text{pins}}$ mA (this is $200/18\approx11$ mA for my project).

Each pin needs to source $I_{\text{peak}}$ of current as anodes $1/N$ of the time, for an average current draw of $I_{\text{peak}}/N$. This current needs to go somewhere, which means that all pins are also sinking $I_{\text{peak}}/N$ of current, on average. This suggests that we can (transiently) source up to
$$I_{\text{peak}} = \tfrac{N}{2} \cdot I_{\text{pin}}$$
of current. For 18 control lines and a 11 mA average current, this means we can (briefly) source up to 100 mA per pin. In this scenario, the average current per LED will be
$$I_{\text{peak}} = \tfrac{1}{2} \cdot I_{\text{pin}}$$
or 5.5 mA in this case. One can configure the current-limiting resistors for these new limits to increase display brightness:
$$R_{\text{ch}} = \frac 2 {N+1} \frac {V_{\text{supply}} - V_{\text{LED}}} {I_{\text{pin}}}.$$
For 2 V LEDs on a 5 V power supply, with 18 control lines and a 11 mA average current draw limit, this gives series resistors of 28 ohms.

White and blue LEDs can have a forward voltage up to ~4 V (but check the datasheet for your specific components), giving at resistor value of just 9.4 ohms. This is so low that I often omit resistors entirely. But beware, this could (in theory) brick the microcontroller! Also, If your display driver crashes without series resistors, it can send continuous current to a single LED and burn it out. However, I find that higher voltage LEDs can survive this, and I've never seen actual damage to the microcontorller with these higher current draws. Under no circumstances should you try this with red or yellow LEDs on a 5V system, as these can burn out if there is a software glitch.

 

Happy hacking (:


At this point, we have constructed a LED display and verified that the hardware works. We've written a display driver that uses row (or column) scanning, runs in the background using timer interrupts, and supports double-buffering.

The final thing you might want to do is modify the setPixel and getPixel routine to accept display coordinates and translate these in to coordinates on the Charlieplexing grid. This code depends on the particular layout of your project, so I don't include it here.

This is enough to start doing something nontrivial with the display. I'll end the main tutorial here, but provide some additional notes and a couple extensions at the end of this post. Here are some ideas, to get things started:
  • Implement text rendering to show messages on the display
  • Render psychedelic animations
  • Code up a Game of Life
  • Add support for serial communication to drive the display from a computer
If you'd like, please link to your own blinkylights projects in the comments. Stay safe, stay well, and remain indoors. Best,

-M




Appendices

 

Appendix 1: Current-limiting resistors


LEDs have two current ratings. The number we usually care about is the maximum continuous current, which is usually ~5-40 mA for discrete LEDs. When scanning multiplexed or charlieplexed arrays, however, we briefly turn LEDs on in sequence. In this case LEDs can handle slightly more current, and the peak instantaneous forward current is the number we need to use. This is usually given in the LED datasheet, and is often ~50-100 mA.

One can use this higher peak current limit to achieve a brighter display. I advise against this while prototyping, however: If the display driving code stalls, it could deliver excessive current and burn out the LED. This is especially the case with high-brightness red LEDs.

For Charlieplexing, however, LED peak current is not the limiting factor. Assuming we are using no additional hardware, the current limits for IO pins on the microcontroller itself are the limiting factor. The maximum safe current per pin on the Arduino is 40mA. When we drive multiple LEDs at a time, one should ensure that the total current does not exceed this.

Assume that all LEDs being driven have the same forward voltage $V_\text{LED}$. We can then calculate the total series resistance as if this were a single LED with forward voltage V drawing 40mA of current. We use the Equation V=IR. In this case we have Vpower = 5V. We set the current-limiting resistor based on the difference between the supply voltage and the LED voltage. We want to find R such that
$$ V_{\text{supply}} - V_{LED} = I_{\text{LED}} R,\qquad\text{i.e.}\qquad
R = \frac {V_{\text{supply}} - V_{LED}} {I_{\text{LED}}} $$
For a 5V supply current on the Arduino, a 40mA current, and a LED with a forward voltage of 2V, we get
$$ \frac {5V-2V} {40\,\mathrm{mA}} = 75\,\mathrm{\Omega}$$
Now, the fun part. This gives the total series resistance, not the value of each resistor. Since we're using the same control lines for the anodes and the cathods, we need to distribute this 70 ohm resistance over multiple resistors. One part of the resistance will come from the resistor on the anode. The remaining part will come from the network of resistors on the cathodes.

Resistors in parallel have decreased resistance. So, for the network of cathods, we need to divide our resistor value by the number of cathodes currently active. In the example project, I light up at most 9 LEDs at a time, so I would calculate this as 9 cathodes. We need to solve for a resistor $x$ such that
$$x + \tfrac 1 9 x = 70\,\mathrm{\Omega}$$
which, in this case, solves to:
$$x = \frac 9 {9+1} \times 70\,\mathrm{\Omega} = 63\,\mathrm{\Omega}$$

In summary, the following equations give resistor values for different LED driving scenarios:

For a single LED with sustained current rating $I_{\text{cont.}}$:
$$R_{\text{cont.}} = \frac {V_{\text{supply}} - V_{\text{LED}}} {I_{\text{cont.}}} $$
For single LED, briefly pulsed with peak current rating $I_{\text{peak}}$:
$$R_{\text{peak}} = \frac {V_{\text{supply}} - V_{\text{LED}}} {I_{\text{peak}}} $$
For driving $N$ LEDs simulataneously in charlieplexing from a pin with maximum current of $I_{pin}$:
$$R_{\text{ch}} = \frac N {N+1} \frac {V_{\text{supply}} - V_{\text{LED}}} {I_{\text{pin}}}.$$
To be absolutely safe when experimenting with a charlieplexing layout, set $R$ to the minumum of $R_{\text{ch}}$ and $R_{\text{cont.}}/2$. Once the display driver code is debugged and working reliably, once can relax these constraints.

 

Appendix 2: AtMega328 timer interrupts


This section is surveys different ways to use display-driver timer interrupts on the Arduino. The AtMega (datasheet) has three timers available for generating timer interrupts. On the Arduino, these timers are used for the following functions:
  • Timer0: for the Arduino functions delay(), millis() and micros().
  • Timer1: for the Servo library
  • Timer2: for the Tone library
  • All three timers are used for the PWM pins on the Arduino
Each timer counts upwards, from 0 up to to 255 for 8-bit timers (Timers 0 and 2), and up to 6555 for 16-bit timers (Timer 1). AVR chips support several different ways to trigger interrupts based on these timers. For further reading, there are many good introductions to AVR timer interrupts online (e.g. Adafruit, Oscar Liang, RobotFreak, Amanda Ghassaei, or Ankit Negi). I also found Nick Gammon's and Pramoth Thangaval's overviews of AVR interrupts helpful.

We need to scan the rows/columns the display at a suitably high rate to avoid a visible flicker. The typical "flicker fusion" frequency for humans is about 60 Hz in the fovea, but higher for some people and in the peripheral vision. A target a refresh frequency of about 200 Hz works well. For a display with $N$ rows/columns, one must scan the rows at $N\cdot200$ Hz. (For this project, I have 18 control lines so I need to scan the rows of the display at 3.6 kHz, or about every 280 microseconds, ideally.)

To control the scanning/refresh rate, we need to be able to specify the intervals between timer interrupts. There are a few ways to change the frequency of timer interrupts on AVRs.
  1. One can change the system clock rate: This is possible if working with bare AVR chips.
  2. One can change the timer prescaler: This is the multiple of the clock rate at which the timer "tics". E.g. a timer with a prescaler of 32 increments by one every 32 clock cycles.
  3. One can change the value of an "output compare" register, which sets when then next interrupt will trigger. This quite flexible, as it allows timer interrupts to be triggered in non-power-of-two multiples of the timer tic rate, e.g. every 33 tics.
Option (1) is unavailable on the Arduino, since the system clock is fixed. We also can't change timer pre-scalers (2) without disrupting other Arduino libraries, and using approach (3) to reset the timer at a certain value would also be disruptive.

All is not lost. It would be rare to use both the Servo and Tone libraries on a LED display project. This means that Timers 1 and 2 will usually be available. Smaller displays can also be run of Timer 1 without changing its configuration.

Timer interrupts as used by the Arduino environment on AtMega328-based boards


Let's look in detail how the various timers and associated interrupts are used by the Arduino library in the default configuration. The header file avr/interrupt.h defines the interrupts available on AVR microcontrollers. On the AtMega328 chips, we have the following timer interrupts:
Timer 0
  • TIMER0_OVF_vect Timer 0 Overflow is used for the millisecond clock in Arduino. On a 16 MHz Arduino Uno, this counter increments every 4 μs, and overflows every 1.024 ms.
  • TIMER0_COMPA_vect and TIMER0_COMPB_vect Timer 0 Compare Match A and B are both available, and can be used to register an interrupt routine triggered every every 1.024 ms on a 16 MHz system. This is sufficient to update a display with 16 rows/columns at 60 Hz, but only 5 rows/columns at 200 Hz.
Timer 1
  • TIMER1_COMPA_vect Timer 1 Compare Match A is unavailable if using the Servo library, which sets Timer 1 to increment every 2 MHz (0.5 μs) on a 16 MHz system. This is a 16-bit counter which overflows every 65,536 tics, approximately every 32.8 ms.
  • TIMER1_COMPB_vect and TIMER1_OVF_vect Timer 1 Compare Match B and Timer 1 Overflow are available, but set to trigger every 32.8 ms if using the Servo library. This is too slow for driving a LED display.
Timer 2:
  • TIMER2_COMPA_vect Timer 2 Compare Match A; This is unavailable if using the Tone library. When playing a tone, the Tone library may reset the value of Timer 2 after a certain value that depends on the tone frequency.
  • TIMER2_COMPB_vect and TIMER2_OVF_vect Timer/Counter2 Compare Match B and Timer/Counter2 Overflow are available, but the frequency at which these interrupts trigger depends on the tone being played by the Tone library, and might not trigger at all during tones. This makes it unsuitable for scanning a LED display.
In summary, we have the following options:
  • Strict compatibility with the Servo and Tone libraries leaves us with the ~1 kHz system Timer 1 output-compare interrupts A and B, which is sufficient for driving smaller displays.
  • If only using one of the Servo or Tone libraries, once can use Timer 2 and Timer 1 (respectively) for display updates. This is by far the easiest approach, provided you do not need to use these libraries.

 

Hacking Timer 1


Here is a hack to get ~4 kHz interrupts on the ~1 kHz Timer 1 without disrupting any Arduino timer functionality. This is compatible with the timer configuration for both the Servo and the Tone libraries (although things may break if you run out of CPU cycles to handle all interrupts promptly).

Use both output compare interrupts A and B with a phase offset of 64 tics. In the initialization code, set OCR0A to trigger on tic 0 and OCR0B to trigger on tic 64:
// Enable Timer 0 output-compare interrupts A and B
OCR0A = 0;
OCR0B = 64;
TIMSK0 |= 6; 

Move the display scanning code to its own subroutine e.g scan_display(). Call this subroutine from both the output-compare A and B interrupt handlers:

SIGNAL(TIMER0_COMPA_vect) {
  OCR0A += 128;
  scan_display();
}
SIGNAL(TIMER0_COMPB_vect) {
  OCR0B += 128;
  scan_display();
} 

Last but not least: change the output-compare register values in the interrupt handlers (e.g. OCR0A += 128). This causes each interrupt to be triggered every 128 tics rather than 256 tics.

Since we have two interrupts that trigger twice, this gives us 4 evenly-spaced interrupts during the Timer 1 cycle. For the default configuration on a 16 MHz AtMega*8-based board, this gives us a scan rate of 4.096 kHz. If you don't need a 4 kHz scan rate, you can just use one output-compare interrupt to achieve a 2 kHz rate with this trick.

When I tried to push this even further, for example trying OCR0A += 64 or OCR0A += 32, I ran into issues with flickering in the display. I'm not sure why; it might be related to the large (18) number of control lines that I'm using. You might want to experiment with this and let me know how it goes!