Category Archives: Coder

Breaking the constraints of Auto Layout

Since iOS 6, Auto Layout has been the preferred way of arranging your UI components across iPhone and iPad. With iOS 8’s new size classes, its importance only grew. Achieving your designs by thinking in constraints is tricky, so I thought I would give an example of a non-trivial design problem I solved.

The desired result

At first blush, this seems simple; just add spacing constraints to each UI element relative to each other, and the UIDatePicker will expand and contract horizontally as necessary. But what about localization? If the sizes of the labels change, text will be truncated and it’ll look terrible. We also can’t know which label will be the widest.

What we need to do is model these labels as a system of inequalities. Each of the labels’ left edges should be at least 0 points from the left margin. But what is to stop them all being over 0 points from the left margin? What is to stop this:

Too much space!

The UIDatePicker fits to its intrinsic content size, but that leaves our labels hanging out in the middle of nowhere. The solution is to add a lower-priority constraint to each label:

[Screenshots: the lower-priority leading space constraints]

Clearly they can’t all be satisfied, but that’s OK; Auto Layout satisfies the highest-priority constraints before those of lower priority. The inequalities ensure we have no ambiguity.

I didn't ask for no left alignment

OK, that kinda worked, but why are my labels left-aligned? What actually happened here is the labels have been stretched out so their left edge is on the margin and their right edge is 8 points away from the text field (or date picker). This isn’t such a big deal for labels since we can right-align them, but for any other control that might display a frame or background, this wouldn’t work out.

When you really need that ragged left edge, increase the label’s Content Hugging Priority above the priority of the constraints we set earlier for pinning the leading space to 0; I chose 900. This means that taking the intrinsic width of the object is more important than increasing its width.

Done? Right?

So you might think we’re done, but look what happens if I go to the iPhone 4-inch size:

Oops!

Oof, that’s bad. UIDatePicker is trying to get as much space as it can, to the detriment of our labels. We can fix this by increasing the Content Compression Resistance Priority to 900, so the labels will take the space in preference to UIDatePicker. This is the corresponding opposite of Content Hugging Priority; how important is it to maintain the original width, and when can we shrink it?

Alternatively, since UIDatePicker is trying to preserve its own desired width, we could achieve the same effect by reducing the date picker’s Content Compression Resistance – thereby making its desired width a lower priority than the labels’.

UIDatePicker doesn’t really need as much space as it’s asking for, anyway – here’s a screenshot in the iOS Simulator for an iPhone 4S:

Tada

Looks like I need to work on that top margin, huh?

Hopefully I’ve demonstrated the most important tools at your disposal for creating responsive layouts in iOS: inequalities, priorities, content hugging and content compression resistance.

You can get more advanced by controlling the constraints at runtime, but the further you can go with purely design-time constraints, the easier it is to manage their complexity.

Make your C++ interfaces ‘trait’-forward

Having tried the Rust language recently, I was glad to see the lack of typical object-oriented concepts in favour of a greater commitment to parametric polymorphism (a.k.a. generic programming). Its means of doing so – traits – is subtly different to what most programmers are used to, yet some of their advantages can be implemented right now in your C++ code. Applying their principles will help you define flexible interfaces, avoid cryptic template error messages, and adapt legacy code with ease.

As OOP goes through its final death, what will the prevailing way of modelling our software look like? It certainly looks like we are converging towards functional programming: lambdas have been added to nearly every language but C, and many modern languages support type inference and/or generics. Some good efforts have been made with declarative programming, the most common example in use today being SQL, though C# has made a heroic effort with LINQ. But there will always be a need for algorithms that require a certain set of permitted operations on the data types they use.

Most familiar languages (Java, C#) achieve this with interfaces, which define a set of functions required to achieve some goal or property of an object. For example, IList is an interface that lets you add, get and remove objects contained within another object. The author of the type must specify the interfaces supported by that type. If the type comes from a library, for example, you can’t make that type implement an interface you have created without resorting to the Adapter pattern.

interface IList
{
    Object getObject(int index);
    void setObject(int index, Object newValue);
}

class MyFancyArray : IList
{
    public Object getObject(int index) {
        // code to get an object at the given index
    }
    public void setObject(int index, Object newValue) {
        // ...
    }
}

Interestingly, Go has a few features that makes this easier – anonymous fields and implicit interfaces – but you still have to define a new struct to implement the interface for an existing type, then wrap instances of that existing type in your new interface-implementing type. This may sound like a palaver, but in the absence of generics, the worst problem you can have is verbosity.

Haskell has a different concept: typeclasses. A typeclass is also a set of functions required to implement a particular concept. However, Haskell types are not ‘objects’ and do not come with ‘methods’, so typeclasses have to be defined separately from the type. Importantly, this means anyone can define a typeclass and make any type an instance of it, even if you didn’t write that type or it comes from a library.

class Eq a where
  (==), (/=)            :: a -> a -> Bool
  x /= y                =  not (x == y)

instance Eq Foo where
   (Foo x1 y1) == (Foo x2 y2) = (x1 == x2) && (y1 == y2)

Rust’s traits are very similar to typeclasses. They are the primary method of dynamic dispatch as the language, like Haskell, does not support inheritance on its record types. But the value they bring to generic functions using static dispatch can be applied to C++ using templates, without sacrificing the convention of implicitly typed interfaces. Let’s see an example.

template <typename T>
class GraphicalTrait {
public:
  GraphicalTrait(T& self_) : self (self_) {}
  int getWidth() const { return self.getWidth(); }
  void setWidth(int width) { self.setWidth(width); }
  void render() { self.render(); }
private:
  T& self;
};

This trait is an abstraction of a graphical item, defining three methods. The default implementation proxies the call to the referenced object – the aforementioned implicit interface – but by specializing the template we can implement the trait for a type that does not have the getWidth etc. methods.

template <>
class GraphicalTrait<BigBallOfMud> {
public:
  GraphicalTrait(BigBallOfMud& self_) : self (self_) {}
  int getWidth() const { return self.GetWidgetWidthEx(); }
  void setWidth(int width) { self.SetWidgetSizeNew(width, self.GetWidgetHeightEx()); }
  void render() { cout << self.WidgetTextRepresentation << endl; }
private:
  BigBallOfMud& self;
};

This matters most for legacy code or for adapting libraries: new code can be written against the trait without a major upheaval of the existing code, allowing your code base to be refactored gradually.

What benefits does this give us over typical implicit interfaces in C++? Documentation and API usage are easier, because defining the trait interface forces you to enumerate all of the operations you are expected to implement. Writing code using the trait is easier, as it makes code completion possible. And code comprehension is improved because it is easier to reason over a smaller interface (that which defines the trait) than over a larger one (a whole class).

How do we write code that uses GraphicalTrait? Using templates, of course, but the devil’s in the details:

Option 1: Accept any type as a parameter, and create an instance of GraphicalTrait in the function body. This is easiest for users of your function, as they can directly pass an instance of their type implementing the trait you are using. However, like most uses of templates in C++, the function signature does not enforce the interface the type is expected to implement.

template <typename GraphicalTraitType>
void embiggen(GraphicalTraitType& graphicalObject) {
  GraphicalTrait<GraphicalTraitType> graphicalTrait(graphicalObject);
  graphicalTrait.setWidth(graphicalTrait.getWidth() * 2);
}
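
For example, given the BigBallOfMud specialisation above, a caller can pass the legacy object straight in (a hypothetical usage sketch; assume BigBallOfMud is default-constructible here):

BigBallOfMud legacyWidget;
embiggen(legacyWidget); // instantiates GraphicalTrait<BigBallOfMud> and doubles the widget's width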

The process is easier if you have a class where the trait reference is a member; in this case, you can initialise the member in the constructor and there is no more code than if it were a standard reference type.
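
As a rough sketch of that pattern (the Blinker class and its behaviour are mine, purely illustrative):

template <typename T>
class Blinker {
public:
  explicit Blinker(T& target)
    : graphical(target) {}                // trait member initialised in the constructor
  void emphasise() {
    graphical.setWidth(graphical.getWidth() + 4);
    graphical.render();
  }
private:
  GraphicalTrait<T> graphical;            // no heavier than holding a plain T&
};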

Option 2: Accept a GraphicalTrait as a parameter, callers construct a GraphicalTrait. This makes the interface explicit but callers are forced to specify boilerplate, as C++ template deduction rules do not take into account implicit conversions.

template <typename T>
void embiggenedRender(GraphicalTrait<T> graphicalObject) {
  auto oldWidth = graphicalObject.getWidth();
  graphicalObject.setWidth(oldWidth * 2);
  graphicalObject.render();
  graphicalObject.setWidth(oldWidth);
}

embiggenedRender(GraphicalTrait<CBWidget>(myCBWidget));

People with an interest in the drafting of C++11 may have heard of concepts; what I just described is concept maps. It is a shame that they never made it into the language; concept maps are a convenient way of adapting existing code, and without the syntactic sugar offered by a language like Rust, the result will always be a little clunky for the code producer or consumer. On the other hand, this clunkiness is no different to most other C++ features!

Rust in pieces: the first 15 minutes

With the Rust language nearing its 1.0 release and many (but not all) APIs starting to stabilise, now seemed like a good time for someone with little time to play around with it.

Coming from a C++ background, the promise of powerful zero-cost abstractions and compiler-enforced lifetime management is certainly appealing. In practice, is it too much effort to be useful for the average performance-sensitive application? I hope we can extrapolate from some toy examples, because that’s all I’ll be able to manage just now!

If you want a tl;dr of my thoughts so far:

  • Compiler warnings are on a par with those of modern compilers for other languages (gcc, clang/Xcode).
  • Type inference is great and cuts down on a lot of boilerplate.
  • The borrow checker is annoyingly merciless.
  • Type inference without garbage collection makes it really easy to incur undefined behaviour everywhere.
  • The borrow checker fixes this.
  • What one hand giveth, the other taketh away, but there is a net positive: you’re safe from use-after-free bugs and data races.

If you are interested in how I arrived at these conclusions, then read on.

First, define a problem: Read a file and print back the unique lines in that file together with the number of times each line occurred. Should be easy, right? Let’s get some sample input, fruit.txt:

apple
banana
cherry
banana
apple
apple
damson
cherry
banana

And our program should output some permutation of the following:

apple: 3 time(s)
banana: 3 time(s)
cherry: 2 time(s)
damson: 1 time(s)

Let’s see how quickly we can get such a program together in Rust.

Don’t put self-documenting code on a pedestal

“I don’t look at comments, they’re always out-of-date.” – a co-worker (not Yogi Berra)

Good coding discipline is important, and we have many rules of thumb to guide us as we design and write our programs. We should also remember that blindly following your heuristics is unwise; there is no silver bullet.

The ends usurped by the means

I’d like to draw your focus towards the notion of self-documenting code. I often hear programmers telling others how hard they try to make their code self-documenting. I think we all know at least one occasion where we or someone we know has, with much pride, got rid of several comments by careful variable naming. But rarely have I heard someone mention making an effort to produce easy to understand code.

Isn’t it the same thing, you may ask? They may be correlated, but you can have one without the other. Poorly-designed software can be full of code which is self-documenting when taken in isolation, but the way modules are combined and the choice of algorithms to accomplish each task can be utterly confusing in a macroscopic view of the program. Sometimes you need to know why the code does what it does, but the self-documenting approach only tells you how. Conversely, literate programs are mostly documentation and can be very easy to understand.

Reach for the stars, fall in the gutter

Most commonly, code is none of these things, and often ripe for improvement. During some code review, we looked at the following block of string processing code which is responsible for grouping together strings recursively by the bytes within each character until it can no longer be subdivided, then processing each group found. (I expanded the example from the original posting date to give some more context here.)

void groupStrings(pair<int, int> currentGroup, bool upperBitsCleared,
                  int currentBits) {
  vector<pair<int, int>> stringGroups;
  // code to populate stringGroups elided
  int firstElement = 0;
  if (upperBitsCleared && currentBits == (MAX_BITS - 1)) {
    // this implies we have reached the end of the strings in the first list,
    // which is our base case
    finishProcessing(stringGroups[0]);
    firstElement = 1;
  }
  for (int i = firstElement; i < strings.size(); ++i) {
    groupStrings(stringGroups[i], upperBitsCleared && i == 0, i);
  }
}

The proponents of self-documenting code around me seized on the condition and associated comment. “Can we make this more readable?” Their suggested outcome was to extract the conditional expression into a method:

void groupStrings(pair<int, int> currentGroup, bool upperBitsCleared,
                  int currentBits) {
  vector<pair<int, int>> stringGroups;
  // code to populate stringGroups elided
  int firstElement = 0;
  if (isEndOfString(upperBitsCleared, currentBits, MAX_BITS)) {
    finishProcessing(stringGroups[0]);
    firstElement = 1;
  }
  for (int i = firstElement; i < strings.size(); ++i) {
    groupStrings(stringGroups[i], upperBitsCleared && i == 0, i);
  }
}

However, the new name is unspecific. Only the first element of the array ‘stringGroups’ contains strings that have no more characters to process; it is not true for the entire array, or even any of the input arguments! What did we really gain from this? According to the teachings of ‘Uncle Bob’, we avoided failure. In his book Clean Code, he says,

The proper use of comments is to compensate for our failure to express ourself in code. Note that I used the word failure. I meant it. Comments are always failures.

The implication in the rest of the chapter is that anything you can do to remove that comment is automatically an improvement, with exceptions (and this isn’t one of his exceptions). However, what happens if you don’t have the ability to accurately judge whether you really expressed that comment in your code?

It’s easy to tell yourself something is obvious and doesn’t need to be explained once you’ve already gained that understanding. Of course we know ‘isEndOfString’ relates to the first element, and that this code handles the base case of our recursive algorithm. But how about six months from now? After a dozen bug fixes and feature requests? When code has been added or changed, is it still going to be that obvious?

Another of Uncle Bob’s teachings is the Boy Scout rule: “Always check a module in cleaner than when you checked it out.” The point of the rule is to encourage gradual refactoring of the code into an easier-to-understand form. However, this doesn’t mean any effort is positive: if you vacuum a carpet, but wear muddy boots doing it, only the gnarliest of dust bunnies can justify the damage caused. Even if you believe you’re improving the code, you cannot claim to follow the rule if you lack the perspective to evaluate that improvement.

How odd I can have all this inside me and to you it’s just words

The paraphrase at the top of this essay was said at this code review, and is the reason why I wrote this at all. I concede that nobody forces you to update comments when the code surrounding them changes. Yet comments and identifiers have more in common than you think.

The biggest criticism of comments – that their content does not match reality – is equally true of identifiers. The compiler cannot check the syntax of language used in your identifier name. It cannot warn you when you are storing the end of the range in a variable named ‘start’, nor the width of an object in a field named ‘height’. It cannot tell you when you changed the behaviour of a function, and its name is no longer accurate.

Worse, you are further constricted by the length you can reasonably give an identifier. There’s a reason why nobody writes essays on Twitter: not every concept can be expressed in 140 characters or less.

If one of your axioms is that comments are worse than any code you could write, you are making a mistake. The only alternative to comments is using appropriate variable and function names. It is said that there are two hard problems in computer science, cache invalidation and naming things; it takes a lot of hubris to believe you can name things as well as you can describe them.

Regardless, you must try to name things as well as you can, but be careful. The capacity of a bad name to confuse and mislead the reader is boundless. A bad name unexplained is worse than no name at all.

Excuse me, I believe you have my stapler

This isn’t even considering the benefits of inlining code at its only call site rather than creating an elaborate tree of indirections throughout the source. Unfortunately this is a tough sell, as self-documenting code’s only means of describing blocks of code is through named functions. Take that away and what is left? But even to try and use functions this way is a mistake in my opinion.

Organising everything into small chunks of code works well when your code is perfectly designed and the problem it is trying to solve can be reduced to a series of independent processes. Often, you can ensure the code is reusable, or maps a set of inputs to a set of outputs in an easily-understandable way. It works because the code is simple.

In most real projects, we have to endure incomplete design or unmanaged inherent complexity in the problem domain. For these, it’s a suboptimal strategy unless you are simultaneously removing that complexity. Not all functions extracted from a block of code are fully independent from the code you extracted it from, usually because it expects its inputs to be in a particular state. Extracting that function does not remove complexity from that code. Instead it spreads it around, making it harder to understand why the function does what it does, and allowing other developers to use a function in the wrong context, potentially introducing bugs.

If your goal is to organise the code, commenting does this without removing the spatial context of the code it is complected with. Save functions for eminently reusable code. Functions imply reusability and the movement from one level of abstraction to another. Hijacking them for the purpose of code organisation is a misuse of structure. Structure is important: that’s why the GOTO wars were fought.

Here is an egregious example from Uncle Bob, where a 10-line function does something more than its name suggests. Rather than documenting its behaviour, it is exploded into an English-language DSL that buries the unique feature of the class not described in its name – replacing a symbol at most once – three levels deep in the call stack. Local variables become class members, and since any method can alter a class member, we have increased the overall complexity of the original code. If trying to document your code makes it more complex, you are probably doing something wrong.

I hope most programmers are not as extreme as him, or at least not influenced too much. It is at least illustrative of the problem that self-documenting code is no replacement for good documentation. In the instance where you are trying to replace a comment, you are merely going from one form of documentation to another – and in a medium where it is harder to express yourself. This is not to say we shouldn’t try to make our code explain itself, but that it is only one of many means to the true goal: to write code that is easily understood and remains so for years to come.

Debugging Box2D with pixi.js

I’m creating a game using Pixi.js for rendering and Box2D (via box2d.js) for rigid body physics. If you’re trying to debug collision issues you really need to see what Box2D thinks your world looks like, which means implementing the b2DebugDraw class. This requires a lot of pointer wrapping, but fortunately the box2d.js repo provides a sample for rendering to a canvas. This interface is actually how the demo is rendered.

It turned out to be fairly simple to adapt the code to pixi.js, which provides a canvas-like Graphics class. You can find my code as a gist. The recommended way to use Box2D is to scale from MKS (metres, kilograms, seconds) units into screen pixels, so my version incorporates a scaling factor.

Clarity and bug prevention using phantom types

It’s inevitable that you will use similar data types and structures to represent different concepts in your code. For example, a string might represent raw data from an HTTP request, but it might also represent data which has been normalised and encoded for display in a web browser.

Often the context is clear as to which concept the types map to – the function and parameter names tell you. But what happens when multiple concepts map to the same basic data types? In a slowly-changing code base, how do you stop data that means one thing from being treated as if it means something else?

Let’s take a trivial example in C++. Imagine trying to get a row of records from a large table, but you only want data from a subset of the fields in that table. Here’s a likely class design for the table:

class Table {
  // various details elided
  vector<Value> getRow(size_t row, const vector<size_t> &indexes);
};

Pass in a row number and, for each column you specify by index, you get back the value corresponding to that column. Simple enough; over the next few weeks you write a bunch of code which refers to the columns by index.
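
For instance, fetching a few columns of one row might look like this (a hypothetical call with arbitrary numbers, assuming a Table instance named table):

// fetch the values of columns 0, 3 and 7 from row 42
vector<Value> row = table.getRow(42, {0, 3, 7});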

Later, the table/column design is extended so that columns correspond to a distinct field; that way, tables containing the same field can share data. Now in many places it’s easier to access the fields by their unique identifier instead of needing a table/column pair. Your Table class gets updated:

class Table {
  // various details elided
  vector<Value> getRowFromIndexes(size_t row,
                                  const vector<size_t> &indexes);
  vector<Value> getRowFromFields(size_t row,
                                 const vector<size_t> &fieldIDs);
};

Next, you change a function which used to use column indexes so that it uses field identifiers instead. Did you remember to update all of the getRowFromIndexes calls to getRowFromFields? What about in functions called by the function you just changed? The compiler isn’t going to tell you if you forgot – there are only two differences between getRowFromIndexes and getRowFromFields: a name and an implementation, and the compiler only cares that the name you picked exists. Congratulations on your new potential bug.

Even worse, if your field identifiers fall in the same kind of range as your column indexes, you might not even get warned by array bounds checks or segfaults; the calls will silently return bad data. I speak from experience; this is not a fun thing to deal with.

The ideal solution to this is to automate the process of verifying that the semantics of data passed to a function match the semantics of the data the function expects to receive. If that doesn’t seem familiar to you, ask yourself something similar:

Can you automate the process of verifying that the structure of data passed to a function matches the structure of the data the function expects to receive?

The answer is yes – the compiler makes sure you can’t pass an int to a function that expects a string, nor will it let you store an instance of the SquarePeg class in a value of RoundHole unless there is a valid constructor or conversion operator for that. All we need is a simple way to embed semantics in structure, and that’s exactly what phantom types do.

Phantom types are merely a way to annotate a data type in such a way that the type system considers it distinct from the original data type, and from other instances of that data type that have different annotations. In our Table class, we can make the compiler prevent column indexes and field identifiers from being passed to the wrong functions:

class Table {
  // various details elided
  vector<Value> getRowFromIndexes(size_t row,
                                  const vector<ColumnIndex> &indexes);
  vector<Value> getRowFromFields(size_t row,
                                 const vector<FieldIdentifier> &fieldIDs);
};

In fact, since these functions are now different types, we can even overload them safely:

class Table {
  // various details elided
  vector<Value> getRow(size_t row, const vector<ColumnIndex> &indexes);
  vector<Value> getRow(size_t row, const vector<FieldIdentifier> &fieldIDs);
};

And now we don’t even have to care what data type we have; the correct implementation of getRow will be used whether we have references to columns or to fields.

The question remains, how do you generate these phantom types? Languages with better-developed type systems like Haskell and Rust make it almost trivial, but you can still do it in C++ with a little work. It’s not as simple as creating a new class which inherits from the class you want to parameterise (which in any case precludes you from using fundamental types) as you need to obey a few rules:

  1. You cannot create a phantom type directly from another phantom type with the same type parameters. Hopefully very obvious, otherwise the types are not policy but merely comments.
  2. You may only create the phantom type explicitly. In the example, you shouldn’t be able to pass any old integer to a function expecting a ColumnIndex; there should be only a few approved sources of ColumnIndexes which sanitise their inputs.
  3. You may only get the underlying value in the phantom type explicitly. This is effectively the reverse of the above rule. Most functions accepting ints are not appropriate for a ColumnIndex, and without this restriction you can accidentally circumvent rule 1. Unpack it at the lowest level at which you really need the value and you will minimise the risk of misusing the value.

While it sounds restrictive, it does not preclude you from using two instances of a phantom type with each other. There is no problem concatenating two UnsafeStrings or two SafeStrings, but concatenating an UnsafeString with a SafeString should be considered a mistake by default. Similarly, member access of a value in a phantom type is also fine.
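
As a rough illustration of the rules above, here is a minimal sketch of how ColumnIndex and FieldIdentifier might be generated from one tag-based wrapper (the Tagged template and the tag structs are my own names, not from any library):

template <typename T, typename Tag>
class Tagged {
public:
  explicit Tagged(T value_) : value(value_) {}  // rule 2: creation is explicit
  T get() const { return value; }               // rule 3: unwrapping is explicit
private:
  T value;
};

// Distinct tag types make the two aliases unrelated types to the compiler (rule 1).
struct ColumnIndexTag {};
struct FieldIdentifierTag {};

using ColumnIndex = Tagged<size_t, ColumnIndexTag>;
using FieldIdentifier = Tagged<size_t, FieldIdentifierTag>;

ColumnIndex column(3);
// FieldIdentifier oops = column;              // error: the types are unrelated
FieldIdentifier field(column.get());           // allowed, but the conversion is now visible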

If you want to try this out for yourself, there is a good discussion on Programmers Stack Exchange how you might actually achieve this in C++. The generic wrapper which lets you use operators with two instances of the same phantom type can be found in this paste and makes heavy use of SFINAE, so be brave!

Fast perspective-mapped textures

Texturing your quad

Scanline rendering of a quad

Scanline rendering is a common method of drawing polygons. A quad is a polygon, therefore we’ll scanline render it. If you’re unfamiliar with scanline rendering, it rasterises graphics one (axis-aligned) line at a time. How exactly you set each pixel is up to you: if drawing a single polygon, you can use standard line-drawing algorithms like Bresenham’s; if you want to render adjacent polygons, you probably only want to set pixels that lie within or on the polygon’s edges. And when antialiasing, you’d probably take another approach entirely. If you are interested in the details, check out chapters 38-42 of the Graphics Programming Black Book.

If you’re paying attention, you’re probably now thinking “can’t we map an x, y coordinate in screen space to find alpha and beta, and thus the texture coordinates at a given pixel?” The intuitive approach is to do what we did in part 1, except instead of drawing a line between two points on opposite sides given a coefficient, we draw a horizontal line at y and find the intersections. Let’s start by rearranging our equations from part 1 to solve for $ \alpha $ given $ \vec{AB} $:

\begin{eqnarray*}
\vec{AB_{\alpha}} &=& \frac{ (1 - \alpha ) \frac{ \vec{A} }{ z_A } + \alpha \frac{ \vec{B} }{ z_B } }{ (1 - \alpha ) \frac{ 1 }{ z_A } + \alpha \frac{ 1 }{ z_B } } \\
\vec{AB_{\alpha}} ( (1 - \alpha) \frac{ 1 }{ z_A } + \alpha \frac{ 1 }{ z_B } ) &=& (1 - \alpha ) \frac{ \vec{A} }{ z_A } + \alpha \frac{ \vec{B} }{ z_B } \\
\alpha ( \frac{ \vec{B} }{ z_B } - \frac{ \vec{A} }{ z_A } + \frac{ \vec{AB_{\alpha}} }{ z_A } - \frac{ \vec{AB_{\alpha}} }{ z_B } ) &=& \frac{ \vec{AB_{\alpha}} }{ z_A } - \frac{ \vec{A} }{ z_A } \\
\alpha &=& \frac{ z_B ( \vec{AB_{\alpha}} - \vec{A} ) }{ z_A ( \vec{B} - \vec{AB_{\alpha}} ) + z_B ( \vec{AB_{\alpha}} - \vec{A} ) }
\end{eqnarray*}

Perform the appropriate substitutions for the other edges of the quad. If the coefficient is not in the range [0, 1], the edge does not intersect the horizontal line at y; however, it’s fastest simply to compare the y coordinates of each point to determine which edges will intersect.

Obviously, the u, v coordinates will vary non-linearly along the horizontal line due to the perspective transform, so we must compute the depth of the intersection points too. You already have that formula from part 1, however 🙂
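
Here is a rough sketch of how that might look per scanline, under the conventions from part 1 (the Corner struct and function names are mine; a real implementation would clamp and guard against degenerate edges):

struct Corner {
  float x, y;  // screen-space position of a quad corner
  float z;     // depth as defined in part 1: 1 at the corner nearest us, towards 0 at the vanishing point
};

// Coefficient at which edge AB crosses the horizontal scanline at y,
// i.e. the rearranged equation above applied to the y components.
// (As noted, compare the corners' y coordinates first to pick the edges that actually cross.)
float alphaAtScanline(const Corner& a, const Corner& b, float y) {
  return (b.z * (y - a.y)) / (a.z * (b.y - y) + b.z * (y - a.y));
}

// Depth at that intersection, interpolated linearly along the edge (part 1).
float depthAt(const Corner& a, const Corner& b, float alpha) {
  return (1 - alpha) * a.z + alpha * b.z;
}

// Perspective-correct interpolation of an attribute (such as u or v) between the left
// and right intersections of the scanline, using the depths found above.
float perspectiveLerp(float attrLeft, float zLeft, float attrRight, float zRight, float t) {
  float wl = (1 - t) / zLeft;
  float wr = t / zRight;
  return (wl * attrLeft + wr * attrRight) / (wl + wr);
}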

It’s affine transformation, but I want perspective

I’m writing a little software rasteriser which lets you apply an arbitrary transformation to a 2D image. This includes translation, rotation, scaling and shearing (the affine transformations) as well as full perspective mapping.

There are a few tricks to do quick affine transformations which I won’t go into here; they’re pretty easy to find, online and in books. Essentially, you exploit the fact that opposite sides are parallel in the quadrilateral (or quad) that defines the boundaries of the transformed image.

What is harder to find is dealing with perspective mapping; at least, in a 2D context.

The mathematics of perspective

A perspective-mapped rectangle has one or two vanishing points: the points at which opposite sides of the rectangle appear to meet. Objects which are regularly spaced apart will appear closer together the further away they are, such as the sleepers on a railway track:

Railroad at Sunset

Source: 1

The further back you go, the smaller everything gets. More specifically, the apparent size of an object is inversely proportional to its distance, reaching its limit at the vanishing point where everything is infinitely small. Obviously we can’t just linearly interpolate between the corners of our quad, like we can with an affine transformation; we need to include the depth information in our calculation. The interpolation equation (shamelessly lifted from Wikipedia):

$u_{\alpha} = \frac{ (1 - \alpha ) \frac{ u_0 }{ z_0 } + \alpha \frac{ u_1 }{ z_1 } }{ (1 - \alpha ) \frac{ 1 }{ z_0 } + \alpha \frac{ 1 }{ z_1 } }$

where $ 0 \le \alpha \le 1$ with endpoints specified by $u_0$ and $u_1$.
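
To make the effect concrete, here is a small worked example (the numbers are mine): take $ u_0 = 0 $, $ u_1 = 1 $, $ z_0 = 1 $ and $ z_1 = 2 $, so the second endpoint is twice as far away. At the midpoint $ \alpha = \frac{1}{2} $,

\begin{eqnarray*}
u_{\frac{1}{2}} &=& \frac{ \frac{1}{2} \cdot \frac{0}{1} + \frac{1}{2} \cdot \frac{1}{2} }{ \frac{1}{2} \cdot \frac{1}{1} + \frac{1}{2} \cdot \frac{1}{2} } = \frac{ \frac{1}{4} }{ \frac{3}{4} } = \frac{1}{3}
\end{eqnarray*}

so half of the interpolation range covers only a third of the distance from $ u_0 $ to $ u_1 $, and the remaining two thirds is squeezed into the farther half: exactly the bunching-up of distant detail described above. With $ z_0 = z_1 $ the expression collapses to plain linear interpolation.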

Extruding flatland at an angle

So that’s great, but how exactly do you get that depth variable $ z $ from a 2D quad? Consider the above equation. When $ z_{0,1} = 1 $ it is equivalent to linear interpolation. We know that the vanishing point is asymptotic and that the apparent size is inversely proportional to distance (or depth) – the latter is accounted for in the interpolation equation, so we need only select appropriate reference points.

Setting $ z_v = 0 $ to represent the vanishing point will give us an asymptote via division by zero. Setting $ z_0 = 1 $ will leave the point on the quad furthest from the vanishing point unchanged; now all we need is to find the depth of the point closer to the vanishing point $ z_1 $ where $ z_v \leq z_1 \leq z_0 $.

Cross products and angry merchandise

We can find out if there is a vanishing point by using the vector cross-product. If you’re familiar with the vector cross-product, feel free to skip a few paragraphs. Otherwise, here is its definition:

$
\vec{u} \times \vec{v} = (u_y v_z - v_y u_z) \mathbf{i} + (u_z v_x - v_z u_x) \mathbf{j} + (u_x v_y - v_x u_y) \mathbf{k}
$

This calculates the vector perpendicular to both $ \vec{u}$ and $ \vec{v}$ in three dimensional space. You will note that, in two dimensions, the z components of $ \vec{u} $ and $ \vec{v} $ are zero, leaving us with the simplified equation

$ \vec{u} \times \vec{v} = (u_x v_y - v_x u_y) \mathbf{k} $

which means it is also the magnitude of that vector. It can be considered a measure of ‘perpendicularness’, as is apparent when taking the cross-product of the unit vectors: $ \mathbf{i} \times \mathbf{j} = 1$ and $ \mathbf{i} \times \mathbf{i} = 0$.

Diversion over. Take the vectors of two opposite sides of our perspective-mapping quad:

\begin{eqnarray*}
\vec{AB} &=& (b_x - a_x, b_y - a_y) \\
\vec{DC} &=& (c_x - d_x, c_y - d_y) \\
\vec{OA} &=& (a_x, a_y) \\
\vec{OD} &=& (d_x, d_y)
\end{eqnarray*}

where $ \vec{OA} $ and $ \vec{OD} $ are the vectors from the origin to points A and D, respectively. If the cross-product $ \vec{AB} \times \vec{DC} = 0 $ then the lines are parallel and there is no vanishing point; otherwise we may find the intersection by calculating the point at which the two line equations are equal:

$ \vec{OA} + \gamma \vec{AB} = \vec{OD} + \delta \vec{DC} $

Cross both sides by $ \vec{DC} $:

\begin{eqnarray*}
(\vec{OA} + \gamma \vec{AB}) \times \vec{DC} &=& (\vec{OD} + \delta \vec{DC}) \times \vec{DC} \\
\vec{OA} \times \vec{DC} + \gamma \vec{AB} \times \vec{DC} &=& \vec{OD} \times \vec{DC} + \delta \vec{DC} \times \vec{DC} \\
\gamma &=& \frac{ (\vec{OD} - \vec{OA}) \times \vec{DC} }{ \vec{AB} \times \vec{DC} }
\end{eqnarray*}

Making use of the cross-product identity $ \vec{DC} \times \vec{DC} = 0 $.

Now that we have our value for $ \gamma $, we may substitute it into the first line equation to find our intersection point, a.k.a. the vanishing point:

$
(v_x, v_y) = \vec{OA} + \vec{AB} \frac{ (\vec{OD} - \vec{OA}) \times \vec{DC} }{ \vec{AB} \times \vec{DC} }
$

Note that if $ \gamma $ is negative then the intersection lies in the direction opposite $ \vec{AB} $, making B the farthest point from the vanishing point and therefore the nearest point to us.

The depth equation

Now we may calculate the depth of the farthest point $ f $ by linearly interpolating between the nearest point to us (A or B as found in the last paragraph; referred to in the equations as $ n $) and our vanishing point $ v $, and applying our depth formula. Remember that $ z_v = 0, z_0 = 1$ .

\begin{eqnarray*}
f &=& z_1 n + (1 - z_1) v \\
  &=& v + z_1 (n - v) \\
z_1 &=& \frac{ f - v }{ n - v }
\end{eqnarray*}
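
Putting the last few steps into code, here is a rough sketch (the Vec2 helpers and function names are my own; a real implementation would use a tolerance rather than an exact comparison with zero):

struct Vec2 { float x, y; };

float cross(Vec2 u, Vec2 v) { return u.x * v.y - v.x * u.y; }  // the 2D cross product above
Vec2 sub(Vec2 u, Vec2 v)    { return { u.x - v.x, u.y - v.y }; }
Vec2 add(Vec2 u, Vec2 v)    { return { u.x + v.x, u.y + v.y }; }
Vec2 scale(Vec2 u, float s) { return { u.x * s, u.y * s }; }

// Finds the vanishing point of edges AB and DC; returns false if they are parallel.
bool vanishingPoint(Vec2 a, Vec2 b, Vec2 d, Vec2 c, Vec2& v) {
  Vec2 ab = sub(b, a);
  Vec2 dc = sub(c, d);
  float denominator = cross(ab, dc);
  if (denominator == 0) return false;                // parallel: no vanishing point
  float gamma = cross(sub(d, a), dc) / denominator;  // gamma from the derivation above
  v = add(a, scale(ab, gamma));                      // V = OA + gamma * AB
  return true;
}

// Depth of the far point f, with the near point n at depth 1 and the vanishing point v at 0.
// The three points are collinear, so the ratio of distances can be taken via a projection.
float depthOfFarPoint(Vec2 f, Vec2 n, Vec2 v) {
  Vec2 fv = sub(f, v);
  Vec2 nv = sub(n, v);
  return (fv.x * nv.x + fv.y * nv.y) / (nv.x * nv.x + nv.y * nv.y);
}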

Tying it all together

Two-point Perspective, depth labelled

With two-point perspective, the depth is the same along each parallel edge, with the longest at $ z_0 $ and the other $ z_1 $. However, three-point perspective places each corner at a different depth. We calculate the depth of the two points on the same line as the nearest point, and then calculate the final point’s depth as the product of the two adjacent to it. (Two-point perspective is a special case of this method – can you see why?)

Three-point Perspective, depth labelled

Luckily for us, depth along these lines is a linear function – so we now have everything we need to map perspective. First, find the line upon which our u, v coordinates lie by interpolating by $ \alpha $ on opposite sides of the quad.

\begin{eqnarray*}
\vec{AB_{\alpha}} &=& \frac{ (1 - \alpha ) \frac{ \vec{A} }{ z_A } + \alpha \frac{ \vec{B} }{ z_B } }{ (1 - \alpha ) \frac{ 1 }{ z_A } + \alpha \frac{ 1 }{ z_B } } \\
\vec{DC_{\alpha}} &=& \frac{ (1 - \alpha ) \frac{ \vec{D} }{ z_D } + \alpha \frac{ \vec{C} }{ z_C } }{ (1 - \alpha ) \frac{ 1 }{ z_D } + \alpha \frac{ 1 }{ z_C } } \\
z_{AB} &=& ( 1 - \alpha ) z_A + \alpha z_B \\
z_{DC} &=& ( 1 - \alpha ) z_D + \alpha z_C
\end{eqnarray*}

Now we can calculate u, v by interpolating along the line between those two points.

$ \vec{u_{\alpha}v_{\beta}} = \frac{ (1 - \beta ) \frac{ \vec{AB_\alpha} }{ z_{AB} } + \beta \frac{ \vec{DC_\alpha} }{ z_{DC} } }{ (1 - \beta ) \frac{ 1 }{ z_{AB} } + \beta \frac{ 1 }{ z_{DC} } }$
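
In code, those last two steps might look like this rough sketch (reusing the hypothetical Vec2 helpers from the earlier sketch, with A, B, C, D as the quad's corners and their depths as found above):

// Perspective-weighted point between two endpoints p and q with depths zp and zq.
Vec2 perspectivePoint(Vec2 p, float zp, Vec2 q, float zq, float t) {
  float wp = (1 - t) / zp;
  float wq = t / zq;
  return scale(add(scale(p, wp), scale(q, wq)), 1 / (wp + wq));
}

// Maps the interpolation coefficients (alpha, beta) to a point of the quad ABCD.
Vec2 quadPoint(Vec2 a, float za, Vec2 b, float zb,
               Vec2 c, float zc, Vec2 d, float zd,
               float alpha, float beta) {
  Vec2 abAlpha = perspectivePoint(a, za, b, zb, alpha);  // AB_alpha
  Vec2 dcAlpha = perspectivePoint(d, zd, c, zc, alpha);  // DC_alpha
  float zAB = (1 - alpha) * za + alpha * zb;             // depth is linear along these edges
  float zDC = (1 - alpha) * zd + alpha * zc;
  return perspectivePoint(abAlpha, zAB, dcAlpha, zDC, beta);
}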

UV coordinate calculation

This leaves just one part: how do we use these coordinates to render our image? And how do we actually render the image efficiently? That will be revealed in the next part, coming soon…


  1. Picture taken by Michael Wilson. Used under CC BY-NC-ND 2.0 license