20151230

Better 3D graphics on the Arduino: avoiding flickering and tearing

A while ago I purchased a cheap TFT LCD Arduino shield from eBay (e.g. 1 2 3). The board arrived with no documentation. Disassembling the shield revealed no ICs, indicating that the driver is integrated with the LCD itself. Upon request, the vendor provided an archive containing a few amusingly translated datasheets, as well as a copy of Adafruit's Arduino LCD drivers. Evidently the product is a clone of the Adafruit LCD shields and uses the ILI9341 LCD driver. I had hoped to use the display to show short animated GIF loops. In practice, this is not possible: test animations loaded slowly, with noticeable flicker and vertical tearing. The Arduino does not have enough speed or bandwidth to render full-screen animation frames, but what about 3D vector graphics?



Both optimized ILI9341 LCD drivers and basic wireframe mesh rendering have been done before. XarkLabs provides an optimized fork of Adafruit's library. YouTube user electrodacus has also implemented an optimized driver for the ILI9341 communicating over SPI. Existing 3D wireframe demonstrations (e.g. 1, 2), even ones using optimized drivers, display a noticeable flicker when the animation updates. This flicker is caused by the delay between erasing the previous frame and drawing the new one.

Avoiding flicker and tearing

There simply isn't enough processing power on the Arduino to render anything significant within one frame period, and there isn't enough memory to perform off-screen rendering. The ILI9341 supports a 16-bit RGB color interface at 320x240 pixels, so buffering even a single frame would take 320 x 240 x 2 = 153,600 bytes (150 kilobytes), compared to the ATmega328's two kilobytes of RAM. The solution is to render animation frames in such a way that the intermediate rendering stages are not noticeable to the human eye. This can be achieved by drawing each new frame on top of the previous one without erasing first, and then erasing only those pixels of the old frame that have changed.
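To make the memory budget concrete, here is a small C helper (hypothetical, not part of the driver code) that computes how many bytes a buffer of a given size and bit depth needs. The same formula gives the 1-bit-per-pixel mask size discussed in the next section.

```c
#include <assert.h>
#include <stdint.h>

/* ATmega328 on-chip SRAM, shared with the stack and all other variables. */
#define ATMEGA328_SRAM_BYTES 2048u

/* Hypothetical helper: bytes needed to buffer a width x height image at the
 * given bit depth, rounded up to a whole byte. */
static uint32_t bufferBytes(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
    assert(bitsPerPixel > 0u);
    return (width * height * bitsPerPixel + 7u) / 8u;
}

/* bufferBytes(320, 240, 16) -> 153600  (one RGB565 frame)
 * bufferBytes(320, 240, 1)  -> 9600    (one bit per pixel)
 * bufferBytes(100, 100, 1)  -> 1250    (one bit per pixel, 100x100 region) */
```

Both full-screen results exceed the Uno's 2,048 bytes of SRAM: the 16-bit frame by a factor of 75, and even the 1-bit mask by more than a factor of four.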


3D graphics engine on the Arduino: avoiding flickering and tearing.


Implementing incremental hand-over-hand erasing

In order to erase only the pixels that have changed, we need to somehow remember which pixels contain background data, which pixels have been drawn in the current frame, and which pixels need to be erased from the previous frame. Storing a single bit for every pixel would require 320 x 240 / 8 = 9,600 bytes of RAM, far too much for the ATmega328. At best, one might be able to store a single bit per pixel for a small 100x100 region (1,250 bytes). But! The ILI9341 has plenty of RAM to spare. One can store a frame_ID flag in the low-order bits of the pixel color data itself. To erase the old frame, read back the pixel data and check whether the color belongs to the current or to the previous frame. If it belongs to the previous frame, erase the pixel. This approach works very well, generating smooth, flicker-free 3D wireframe animations.
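A minimal host-side sketch of frame-tagged, hand-over-hand erasing follows. It is plain C rather than the real driver: the array gram stands in for the ILI9341's frame memory, and the choice of bit 0 of the 16-bit color as the frame ID flag, the black background, and the helper names are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_W 8            /* tiny stand-in; the real panel is 320x240 */
#define LCD_H 4
#define BG_COLOR  0x0000u  /* assumed background color (black) */
#define FRAME_BIT 0x0001u  /* hypothetical: bit 0 of the color holds the frame ID */

/* Host-side stand-in for the ILI9341's frame memory. A real driver would
 * read and write these pixels over the LCD bus. */
static uint16_t gram[LCD_H][LCD_W];

/* Overwrite the flag bit of an RGB565 color with the frame ID (0 or 1). */
static uint16_t tagColor(uint16_t color, uint8_t frameId) {
    return (uint16_t)((color & ~FRAME_BIT) | (frameId & 1u));
}

/* Draw one pixel of the frame being rendered, on top of the old frame. */
static void drawTaggedPixel(int x, int y, uint16_t color, uint8_t frameId) {
    assert(x >= 0 && x < LCD_W && y >= 0 && y < LCD_H);
    gram[y][x] = tagColor(color, frameId);
}

/* After the new frame is drawn, clear every non-background pixel that still
 * carries the other frame's ID. Pixels shared by both frames were retagged
 * when the new frame was drawn, so they are never blanked. */
static void eraseStalePixels(uint8_t frameId) {
    for (int y = 0; y < LCD_H; ++y) {
        for (int x = 0; x < LCD_W; ++x) {
            uint16_t px = gram[y][x];
            if (px != BG_COLOR && (px & FRAME_BIT) != (frameId & 1u)) {
                gram[y][x] = BG_COLOR;
            }
        }
    }
}
```

Each animation step alternates the frame ID: draw frame 1 with drawTaggedPixel(..., 1) and call eraseStalePixels(1), then draw frame 0 and call eraseStalePixels(0), and so on. This sketch scans the whole (tiny) screen. On the real display, limiting the scan to the area the previous frame covered would save bus traffic. A drawing color must also not collapse to the background value once tagged, or its pixels would look like background.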

3D surface rendering: the visibility problem

Wireframe rendering is fun, but what about 3D surface rendering? 3D surfaces can overlap themselves, so it is necessary to determine which polygons are in front in order to handle occlusion correctly. Failure to do so produces a jumble of triangles, with the back of an object drawn over the front and so on. A common solution to this problem is Z-buffering: every pixel remembers the depth of the polygon drawn to it, and a pixel is overwritten only if the new pixel data lies in front of the old. Another approach, the painter's algorithm, is to sort the polygons from back to front and render them in that order to an off-screen buffer. Neither solution is possible here, as there is not enough RAM to store either a Z-buffer or an off-screen rendering buffer.
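For comparison, a minimal Z-buffer depth test looks like the C sketch below (hypothetical names, smaller depth meaning nearer, and a tiny screen for illustration). The zbuf array is exactly the per-pixel storage the Uno cannot afford: even at 8 bits of depth, a 320x240 Z-buffer would need 76,800 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W 8   /* tiny screen for illustration; the real display is 320x240 */
#define H 4

static uint8_t  zbuf[H][W];   /* depth of the nearest polygon drawn so far */
static uint16_t frame[H][W];  /* color buffer */

/* Start of frame: mark every pixel as maximally far away. */
static void zClear(void) {
    memset(zbuf, 0xFF, sizeof zbuf);
}

/* Write the pixel only if it is nearer than what is already stored there. */
static void zPlot(int x, int y, uint8_t depth, uint16_t color) {
    assert(x >= 0 && x < W && y >= 0 && y < H);
    if (depth < zbuf[y][x]) {
        zbuf[y][x] = depth;
        frame[y][x] = color;
    }
}
```

Because the test is per pixel, polygons can be drawn in any order; the cost is one depth value per pixel.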

The visibility problem for convex surfaces

There is a simple trick that solves the visibility problem for closed convex surfaces, like cubes, spheres, and convex polyhedra. Such surfaces have a front, which faces the camera, and a back, which faces away from it, and no camera-facing polygon can be hidden behind another. To avoid rendering the rear of the object on top of the front, it is therefore sufficient to check whether each polygon faces the camera. This can be done by testing the sign of the z-component of each face's normal vector (in camera coordinates). This technique is called back-face culling, and it roughly halves the average rendering time, since the rear of the object can be skipped entirely.
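The sign test can be sketched in a few lines of C. Only the x and y coordinates enter the z-component of the cross product of two edges, so the same test also works on projected screen coordinates, which keeps it valid under perspective projection. The winding convention (counter-clockwise when viewed from outside) and the y-up, +z-toward-viewer axes are assumptions. With the LCD's y-down screen coordinates, the sign flips.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

/* z-component of the face normal (b - a) x (c - a). */
static float normalZ(Vec3 a, Vec3 b, Vec3 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Assumes counter-clockwise winding seen from outside the object, y up,
 * and +z pointing toward the viewer: a positive normal z faces the camera. */
static bool facesCamera(Vec3 a, Vec3 b, Vec3 c) {
    return normalZ(a, b, c) > 0.0f;
}
```

A face for which facesCamera() returns false can be skipped before any of its pixels are touched.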

The visibility problem for non-convex surfaces

Non-convex surfaces may contain multiple camera-facing polygons that overlap. The solution is to avoid overdrawing foreground polygons by checking the frame ID bit already stored in the color data. When drawing a new frame, test the frame ID bit of each pixel. If the pixel comes from the previous frame or matches the background color, overwrite it. If the pixel comes from the current frame (i.e. it has already been drawn), do not overwrite it. If the faces are drawn front to back, this procedure prevents foreground polygons from being overdrawn. Combined with the hand-over-hand approach for erasing the previous frame, this allows flicker-free rendering of 3D surfaces. The much-maligned bubble sort is actually a decent algorithm for maintaining the polygon drawing order: the code is small, it operates in place, and because the polygons remain mostly sorted after a small rotation, the average runtime stays close to linear.
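The overdraw test and the front-to-back ordering can be sketched together in C. Here gram is a host-side stand-in for the display memory. Bit 0 as the frame ID flag, the per-face depth key (smaller meaning nearer), and all names are illustrative assumptions rather than the actual driver code.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_W 8            /* tiny stand-in; the real panel is 320x240 */
#define LCD_H 4
#define BG_COLOR  0x0000u  /* assumed background color (black) */
#define FRAME_BIT 0x0001u  /* hypothetical: bit 0 of the color holds the frame ID */

static uint16_t gram[LCD_H][LCD_W];  /* host stand-in for the display memory */

static uint16_t tagColor(uint16_t color, uint8_t frameId) {
    return (uint16_t)((color & ~FRAME_BIT) | (frameId & 1u));
}

/* Overdraw-avoiding write: a non-background pixel already tagged with the
 * current frame belongs to a nearer face (faces are drawn front to back),
 * so it is kept. Background and previous-frame pixels are overwritten. */
static void drawPixelNoOverdraw(int x, int y, uint16_t color, uint8_t frameId) {
    assert(x >= 0 && x < LCD_W && y >= 0 && y < LCD_H);
    uint16_t px = gram[y][x];
    int drawnThisFrame = (px != BG_COLOR) && ((px & FRAME_BIT) == (frameId & 1u));
    if (!drawnThisFrame) {
        gram[y][x] = tagColor(color, frameId);
    }
}

/* Sort key for one face: smaller depth means nearer the camera (assumed). */
typedef struct {
    uint8_t id;
    float depth;
} FaceKey;

/* Bubble sort into front-to-back order, in place. It stops after the first
 * pass without swaps, so a list that is already nearly sorted (the mesh only
 * rotated slightly since the last frame) needs only a few passes.
 * Returns the number of passes made. */
static int sortFrontToBack(FaceKey *faces, int n) {
    int passes = 0;
    int swapped = 1;
    for (int end = n - 1; swapped && end > 0; --end) {
        swapped = 0;
        ++passes;
        for (int i = 0; i < end; ++i) {
            if (faces[i].depth > faces[i + 1].depth) {
                FaceKey tmp = faces[i];
                faces[i] = faces[i + 1];
                faces[i + 1] = tmp;
                swapped = 1;
            }
        }
    }
    return passes;
}
```

Each frame would re-sort the keys left over from the previous frame (cheap, since they are nearly in order) and then rasterize the faces in that order through drawPixelNoOverdraw().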

Modifying the Arduino ILI9341 drivers to support overdraw avoidance and hand-over-hand erasing

Both overdraw avoidance and hand-over-hand erasing require reading pixel data back from the ILI9341. The Adafruit library version that the vendor provided did not implement reading pixel data for the ILI9341, and existing optimized ILI9341 Arduino drivers were designed for fast writing, not fast reading-and-writing. To achieve performant flicker-free 3D rendering, it was necessary to overhaul the ILI9341 driver. The modified drivers and 3D rendering engine are now hosted online at GitHub. A few tricks worth noting:
(1) Convert I/O operations into direct reads and writes to PORTs and PINs, and combine multiple pin changes into a single PORT write.
(2) Sacrifice color accuracy for speed.
(3) Terminate commands and data reads early (e.g. a command to set a screen subregion can be terminated after setting only the lower limit, leaving the upper limit as-is; reading color data can be terminated after retrieving only the first byte).
(4) Reduce flow-control and function-call overhead in code "hotspots" via inlining, converting subroutines to macros, and unrolling loops.
(5) Read and write contiguous stretches of display memory at once to avoid the overhead of initiating read and write operations.
(6) Optimize graphics primitive drawing routines for the Uno and the ILI9341, sacrificing portability in exchange for speed.
(7) Forgo bounds-clipping and other luxuries.
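Trick (1) is easiest to see on the 8-bit data bus. The sketch below assumes the wiring used by Adafruit-style 8-bit shields on the Uno (LCD D0-D1 on pins 8-9, i.e. PB0-PB1, and D2-D7 on pins 2-7, i.e. PD2-PD7); check your own board. To keep it runnable on a PC, PORTB_sim and PORTD_sim stand in for the real PORTB and PORTD registers from <avr/io.h>, and the WR strobe that latches the byte is omitted.

```c
#include <stdint.h>

/* Host-side stand-ins for the AVR output registers; on the Uno these would be
 * the real PORTB and PORTD defined by <avr/io.h>. */
static volatile uint8_t PORTB_sim;
static volatile uint8_t PORTD_sim;

/* Assumed wiring: LCD D0-D1 on PB0-PB1 (pins 8-9), LCD D2-D7 on PD2-PD7 (pins 2-7). */
#define PORTB_DATA_MASK 0x03u
#define PORTD_DATA_MASK 0xFCu

/* Put one byte on the LCD data bus with one write per port instead of eight
 * single-pin writes. Port bits that are not data lines (such as the serial
 * RX/TX pins on PD0-PD1) keep their current values. */
static inline void writeDataBus(uint8_t value) {
    PORTD_sim = (uint8_t)((PORTD_sim & ~PORTD_DATA_MASK) | (value & PORTD_DATA_MASK));
    PORTB_sim = (uint8_t)((PORTB_sim & ~PORTB_DATA_MASK) | (value & PORTB_DATA_MASK));
}
```

On the AVR each line becomes a read-modify-write of a single register, versus eight digitalWrite() calls, each of which has to look up the pin's port and bit mask at run time.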

GitHub repository

The modifications to Adafruit's drivers necessary for frame-masked erasing and overdraw avoidance were extensive. The graphics primitives drawPixel, drawFastHLine, and drawLine had to be overhauled to read pixel data quickly from the display in order to support masked erasing and overdraw avoidance. All other routines had to be extensively optimized to achieve good drawing performance, yielding a 4- to 14-fold speedup over the Adafruit drivers. The main demo sketch seen in the videos can be found here. These drivers were specialized for the ATmega328 (Arduino Uno) and the ILI9341 driver, and are not currently portable. However, much of the data wrangling is handled through macros, which could potentially be redefined for other architectures or LCD drivers. Please feel free to borrow from this code, conceptually or literally.

Next steps?

There's a lot of room to develop this further as a project. There are plenty of optimization puzzles to solve to accelerate graphics rendering. It would be cool to implement Phong shading, or implement a basic 3D game like Spectre. Supporting other LCD drivers or Arduino models besides the Uno could also be an interesting challenge. And of course, maybe we can eventually return to the problem of rendering animated GIFs on the Arduino.

11 comments:

  1. Amazing work. You have confirmed my theories about the need for pixel-for-pixel "clean up" on these drivers. I have only tripped on the tip of the iceberg, drawing simple shapes and then replacing them with the background color, instead of wiping the whole screen (large flicker). I was worried about memory going forward, for storing layer colors and boundaries. This helps in my design for sure.

  2. Thanks! Downloaded and will test.

    I have a nice Elegoo 2.8" shield unit which I modified by adding header pins to CON1, primarily for access to the built-in SD card reader.

    Reading SD BMPs 240x320 (rotated to landscape) works fine. Rendering them is really slow. I was able to shave off 3 seconds for a 16bit RGB5 image.
    As you have done, I modified the existing libraries to remove spurious code and focus only on the UNO. RAM is also an issue, as I can only load 222 entries into the palette table that 16bit BMPs use.

    Will try your libraries and post what I find.

    Yours,
    WP

  3. Does this support the Arduino Zero (32-bit, 48 MHz, 32 KB RAM, 256 KB flash)?
    (Just for comparison, the Uno is 8-bit, 16 MHz, 2 KB SRAM, 32 KB flash.)
    I think it may be strong enough for animations (if you enable them in the library) and large scenes (which I can add).

    1. Hmm; the driver for the TFT display would need to be rewritten, since it is currently configured only for AVR chips, and the Zero uses an ARM Cortex-M0+ processor. The Zero's 32 KB of RAM is not enough to double-buffer a 150 KB frame, so using the pixel data to store a frame ID bit might still be necessary. On the other hand, the Zero should be able to draw and erase very quickly. I wonder if there is a way to write data to the screen fast enough to avoid the flicker?

  4. Damn fine work if you ask me. Incredible skills using the same flag bit to handle both buffering and occlusion detection. I am working on a project to display 3D graphics on a 64x32 RGB LED matrix display and want to use a prebuilt 3D transformation library to handle transformation matrices for rotation. Does your software include an open source linear algebra package for Arduino? Thanks!

    1. I do not know if it will be easy to integrate with your work, but the file Arduino_3D.cpp contains the rotation code.

      Scroll down to the comment "Routines for creating, rotating, and applying axis transformations". The function "transformPoint" applies a 3x3 rotation matrix to the vertex coordinates. The model's points are stored in PROGMEM, and a 3D rotation matrix is applied before rendering. I didn't include code to generate specific rotation matrices, but the function "rotateTransformXY" is used to update the (initially identity) transformation as the user swipes left/right/up/down on the touchscreen.

      You might be able to modify "Arduino_3D.cpp/.h" for your project. But, it was tightly integrated into this graphics demo sketch and it might be easier (or more rewarding) to re-write parts of it from scratch ( :
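      For anyone starting from scratch, here is a minimal C sketch of the same idea: a 3x3 rotation matrix applied to each vertex, and an accumulated transform that starts as the identity and is updated by small rotations about the x and y axes. This is not the code in Arduino_3D.cpp; all names are hypothetical, and the cosine and sine are passed in so they can come from a lookup table rather than the math library.

```c
typedef struct { float x, y, z; } Vec3;
typedef struct { float m[3][3]; } Mat3;

/* The identity transform used before any swipes. */
static Mat3 mat3Identity(void) {
    Mat3 r = {{{1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f, 1.0f}}};
    return r;
}

/* Returns a * b (3x3 matrix product). */
static Mat3 mat3Mul(const Mat3 *a, const Mat3 *b) {
    Mat3 c;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            c.m[i][j] = a->m[i][0] * b->m[0][j] + a->m[i][1] * b->m[1][j]
                      + a->m[i][2] * b->m[2][j];
    return c;
}

/* Apply the accumulated rotation to one model vertex. */
static Vec3 mat3Apply(const Mat3 *r, Vec3 p) {
    Vec3 q;
    q.x = r->m[0][0] * p.x + r->m[0][1] * p.y + r->m[0][2] * p.z;
    q.y = r->m[1][0] * p.x + r->m[1][1] * p.y + r->m[1][2] * p.z;
    q.z = r->m[2][0] * p.x + r->m[2][1] * p.y + r->m[2][2] * p.z;
    return q;
}

/* Left-multiply by a rotation about the y axis (e.g. for a left/right swipe);
 * c and s are the cosine and sine of the rotation angle. */
static void rotateAboutY(Mat3 *r, float c, float s) {
    Mat3 ry = {{{c, 0.0f, s}, {0.0f, 1.0f, 0.0f}, {-s, 0.0f, c}}};
    *r = mat3Mul(&ry, r);
}

/* Left-multiply by a rotation about the x axis (e.g. for an up/down swipe). */
static void rotateAboutX(Mat3 *r, float c, float s) {
    Mat3 rx = {{{1.0f, 0.0f, 0.0f}, {0.0f, c, -s}, {0.0f, s, c}}};
    *r = mat3Mul(&rx, r);
}
```

      Left-multiplying applies each new swipe about the fixed screen axes rather than the model's own axes, which is usually what a drag gesture should feel like.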

  5. This comment has been removed by the author.

  6. Excellent. I will definitely check it out. I once wrote a spinning cube program for a Tandy TRS-80 Model 100 using the built-in BASIC interpreter. Your code sounds like exactly what I need to refresh my memory regarding linear transformations. Thanks for sharing!

  7. This is extremely helpful for knowing what a standard Arduino board is capable of. It just goes to show what a huge overhead cost is imposed by modern operating systems, and what kind of raw computational performance is possible the closer you are to the bare metal.

    1. Yes! The emulated (32-bit??) floating point is alarmingly fast. I tested re-coding things in terms of fixed-point. It was about 10x faster for a cube wireframe on a small 8x8 LED screen, but didn't make much of a difference on the LCD since sending pixels to the screen was the bottleneck.
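      For readers curious what such fixed-point code can look like, here is a minimal C sketch. The Q8.8 format (8 integer bits and 8 fraction bits in an int16_t) and the function names are assumptions for illustration; the reply does not say which format was used.

```c
#include <stdint.h>

/* Q8.8 fixed point: value = raw / 256, range about -128 to +127.996. */
typedef int16_t q8_8;
#define Q8_8_ONE 256

/* Valid for -128 <= v <= 127 (int is 16 bits on the AVR). */
static inline q8_8 q8_8FromInt(int v) { return (q8_8)(v * Q8_8_ONE); }

/* Multiply in 32 bits, then drop the extra 8 fraction bits. Right-shifting a
 * negative value is implementation-defined in C; GCC (including avr-gcc)
 * documents it as an arithmetic shift. */
static inline q8_8 q8_8Mul(q8_8 a, q8_8 b) {
    return (q8_8)(((int32_t)a * b) >> 8);
}

/* Rotate the point (x, y) by an angle whose cosine and sine are given in Q8.8,
 * e.g. from a small lookup table. */
static void rotate2D(q8_8 *x, q8_8 *y, q8_8 c, q8_8 s) {
    q8_8 nx = (q8_8)(q8_8Mul(c, *x) - q8_8Mul(s, *y));
    q8_8 ny = (q8_8)(q8_8Mul(s, *x) + q8_8Mul(c, *y));
    *x = nx;
    *y = ny;
}
```

      For example, q8_8Mul(q8_8FromInt(3), Q8_8_ONE / 2) gives 384, i.e. 3 x 0.5 = 1.5. Every operation is integer multiply-and-shift, which the 8-bit AVR handles far faster than software-emulated floating point.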
