
Diving into the Intricacies of Robotic Kinematics and Coordinate Frames! - Introduction to Robotics 🚀


Introduction:

Two of the most important mathematical representations in robotics are vectors and matrices from linear algebra. Vectors often represent positions or directions in two- or three-dimensional space, but can also represent other quantities like sensor measurements. Matrices represent how vectors change, either through an action or through a change in how their numbers are interpreted. We will be using them liberally throughout the book, and they appear in almost every subject of robotics. Hence, they must be mastered to get anywhere beyond a superficial understanding of the material.


1. Vectors and coordinates :

Vectors extend concepts that are familiar to us from working with real numbers ℝ to other spaces of interest. They also succinctly represent collections of real numbers that have a common meaning like position or direction, or readings from a signal taken at a given time. They make mathematical expressions more compact, which helps us wrap our heads around more difficult concepts.

Most often, the 𝑛-dimensional Euclidean space ℝ𝑛 is used, in which a vector is simply a tuple of 𝑛 real numbers. The "list of numbers" interpretation is the most common way that engineers and computer scientists think of vectors, and that is certainly how they are stored and operated upon. Let us call this the "layman's definition" of vectors. However, it is important to realize that these numbers are just an interpretation of a more abstract essential concept -- the underlying physical meaning -- and the numbers will change depending on how they are interpreted, such as through a chosen frame of reference. This section will present common operations in 2D and 3D, and follow them with a discussion about the importance of separating meaning from representation.

1.1 2D coordinate frames:

In the "layman's definition", an 𝑛-dimensional vector 𝐱 is a tuple of real numbers 𝐱=(𝑥1,…,𝑥𝑛)∈ℝ𝑛. For now, we will work in ℝ2. We will use boldface notation only temporarily to help distinguish between vectors and real numbers. In the future, the boldface will typically be dropped.

A 2D position 𝑃 is represented by a 2-element vector 𝐩=(𝑝𝑥,𝑝𝑦) that gives its coordinates relative to axis directions 𝑋 and 𝑌, offset from a position 𝑂 where the axes cross, called the origin. We will also represent vectors in column vector form

$$\mathbf{p} = \begin{bmatrix} p_x \\ p_y \end{bmatrix}$$

for use in matrix-vector products. Both parenthetical and column vector notations are equivalent and interchangeable.

Figure 1. A point 𝑃 in the plane (a) has no numerical representation until we define a reference coordinate frame (b), which has origin point 𝑂 and orthogonal coordinate axes 𝑋 and 𝑌. Its coordinates 𝐩=(𝑝𝑥,𝑝𝑦) are respectively the extents of 𝑃 along 𝑋 and 𝑌 from the origin (c).

The items 𝑂, 𝑋, and 𝑌 define the coordinate frame in which the coordinates are interpreted. Here 𝑂 is an arbitrary position in space, and 𝑋 and 𝑌 are orthogonal directions with 𝑌 rotated 90° counter-clockwise from 𝑋. Note that in isolation, a vector of coordinates does not define a position. A physical position is only defined by coordinates in reference to a certain coordinate frame. The frame will often be left implicit, or spoken of as the reference frame of the coordinates.

1.2 3D coordinate frames:

The situation in 3D space is similar, except that we represent a 3D position 𝑃 with a 3-element vector 𝐩=(𝑝𝑥,𝑝𝑦,𝑝𝑧) that gives its coordinates relative to axes 𝑋, 𝑌, and 𝑍 and offset from an origin 𝑂 in 3D space where the axes cross. The parenthetical notation is equivalent to the column vector form:

$$\mathbf{p} = \begin{bmatrix} p_x \\ p_y \\ p_z \end{bmatrix}$$

In 3D the coordinate frame consists of the origin 𝑂 and the mutually orthogonal axes 𝑋, 𝑌, and 𝑍. In this book we will use the right-handed coordinate convention, in which the axes can be envisioned in the layout of the first three fingers of the right hand, arranged at right angles (90°) to one another. The 𝑋 axis corresponds to the thumb, the 𝑌 axis to the index finger, and the 𝑍 axis to the middle finger.

1.3 Directional quantities :

Vectors are also used to represent directional quantities, such as a displacement, direction, or derivative. A displacement is a difference between points, e.g., 𝐪−𝐩 gives the amount that would need to be moved in the 𝑋 and 𝑌 directions to get from 𝑃 to 𝑄, where 𝐪 gives the coordinates of 𝑄 relative to the same reference frame. It has both a direction and a magnitude. In contrast, a direction does not have magnitude, and is a unit vector. The direction from 𝑃 to 𝑄 (for distinct points) is given by

$$\frac{\mathbf{q} - \mathbf{p}}{\|\mathbf{q} - \mathbf{p}\|}.$$

In 2D, a direction can also be given as an angle 𝜃∈[0,2𝜋) rad, with the convention that the angle is measured counter-clockwise from the 𝑋 axis. The corresponding unit vector is (cos(𝜃),sin(𝜃)).
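As a small illustration of these definitions, the following sketch computes a displacement, its direction, and the conversion between an angle and a unit vector in plain Python. The function names (`displacement`, `direction`, `unit_from_angle`, `angle_of`) are illustrative choices rather than anything defined in the text, and vectors are assumed to be tuples of floats expressed in a common reference frame.

```python
import math

def displacement(p, q):
    """Displacement q - p that moves from P to Q (both in the same frame)."""
    return tuple(qi - pi for pi, qi in zip(p, q))

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))

def direction(p, q):
    """Unit vector pointing from P to Q; undefined if the points coincide."""
    d = displacement(p, q)
    length = norm(d)
    if length == 0.0:
        raise ValueError("direction is undefined when P and Q coincide")
    return tuple(di / length for di in d)

def unit_from_angle(theta):
    """Unit vector for an angle measured counter-clockwise from the X axis."""
    return (math.cos(theta), math.sin(theta))

def angle_of(v):
    """Angle of a 2D vector, wrapped into [0, 2*pi)."""
    return math.atan2(v[1], v[0]) % (2.0 * math.pi)

p, q = (1.0, 2.0), (4.0, 6.0)
d = displacement(p, q)   # (3.0, 4.0)
u = direction(p, q)      # (0.6, 0.8)
theta = angle_of(u)      # angle whose unit vector is u again
```

Using `atan2` rather than `atan` keeps the quadrant information, which is why the angle can be recovered for any nonzero 2D vector.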

Figure 2. Directional quantities arise from displacements (a), directions (b), and derivatives of paths (c).

A derivative is an infinitesimal displacement. If the position 𝑃(𝑡) is a function of 𝑡, then its derivative 𝐩′(𝑡) is the vector (𝑝𝑥′(𝑡),𝑝𝑦′(𝑡)) of the derivatives of its coordinates.

The major difference between directional and position quantities is that coordinates of directional quantities do not vary with respect to the choice of origin. However, coordinates of both positions and directions are affected by the choice of coordinate axes.


1.4 Geometric operations :

The coordinates of a point 𝐩 after translation by a displacement 𝐝 can be computed by vector addition 𝐩+𝐝. Interpolation and extrapolation between points 𝐩, 𝐪 is specified by the equation

𝐱(𝑢)=(1−𝑢)𝐩+𝑢𝐪

for 𝑢∈ℝ. This equation starts at 𝐱(0)=𝐩 at 𝑢=0, and ends at 𝐱(1)=𝐪 at 𝑢=1. Extrapolation can be obtained with 𝑢<0 or 𝑢>1, as shown in Figure 3.

Figure 3. The line passing through points 𝑃 and 𝑄 can be modeled as a parametric interpolation (a) or as a plane equation (b).

The line through 𝐩 and 𝐪 can be obtained by sweeping the above interpolation / extrapolation formula across the entire range of 𝑢∈ℝ. The line segment between 𝐩 and 𝐪 is obtained by sweeping 𝑢 across the range [0,1].
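The interpolation formula translates almost directly into code. The sketch below is a minimal version, assuming points are tuples of floats; the name `interpolate` is an illustrative choice.

```python
def interpolate(p, q, u):
    """Return x(u) = (1 - u) * p + u * q for points given as tuples."""
    return tuple((1.0 - u) * pi + u * qi for pi, qi in zip(p, q))

p, q = (1.0, 1.0), (3.0, 5.0)
start = interpolate(p, q, 0.0)      # equals p
end = interpolate(p, q, 1.0)        # equals q
midpoint = interpolate(p, q, 0.5)   # (2.0, 3.0)
beyond_q = interpolate(p, q, 1.5)   # extrapolation past Q: (4.0, 7.0)
before_p = interpolate(p, q, -0.5)  # extrapolation before P: (0.0, -1.0)
```

Because the function works coordinate by coordinate, the same code applies unchanged to 3D points.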

Another useful definition of a line in 2D uses a point on the line and an orthogonal direction. We define 𝐩⊥=(−𝑝𝑦,𝑝𝑥) as an orthogonal direction to 𝐩=(𝑝𝑥,𝑝𝑦), which has the same magnitude but is rotated 90° counter-clockwise. The line through the origin passing through 𝐩 can be expressed in the form of all solutions 𝐱 to the equation 𝐱⋅𝐩⊥=0. Similarly, the line through points 𝑃 and 𝑄 can be expressed as the equation

𝐱⋅(𝐩−𝐪)⊥=𝐩⋅(𝐩−𝐪)⊥

Another expression of lines is the following:

𝐱⋅𝐧=𝑐

where 𝐧 is orthogonal to the direction of the line and 𝑐=𝐩⋅𝐧 for any point 𝐩 on the line. (Fig. 3.b.)

This definition is known as the plane equation, which generalizes lines in 2D to planes in 3D and hyperplanes in higher dimensions. Each of these is an object of 𝑛−1 dimensions in an 𝑛-dimensional space, which we call a generalized plane. A unique representation for a generalized plane is 𝐱⋅𝐮=𝑏 where 𝐮 is a unit vector orthogonal to the plane known as the normal direction and 𝑏 is a nonnegative offset that determines the distance away from the origin.
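To make the plane equation concrete, here is a hedged 2D sketch that builds the normalized form 𝐱⋅𝐮=𝑏 of the line through two points. The helper names (`perp`, `line_through`, `signed_distance`) are hypothetical, and the sign flip that keeps the offset nonnegative is one possible way to pick a representative.

```python
import math

def perp(v):
    """Rotate a 2D vector 90 degrees counter-clockwise: (x, y) -> (-y, x)."""
    return (-v[1], v[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def line_through(p, q):
    """Return (u, b), with unit normal u and offset b >= 0, such that the
    line through p and q is the set of points x with dot(x, u) == b."""
    n = perp((p[0] - q[0], p[1] - q[1]))
    length = math.hypot(n[0], n[1])
    if length == 0.0:
        raise ValueError("p and q must be distinct points")
    u = (n[0] / length, n[1] / length)
    b = dot(p, u)
    if b < 0.0:  # flip the normal so that the offset is nonnegative
        u, b = (-u[0], -u[1]), -b
    return u, b

def signed_distance(x, u, b):
    """Zero on the line, positive on the side the normal points toward."""
    return dot(x, u) - b

u, b = line_through((0.0, 2.0), (2.0, 0.0))
on_line = signed_distance((1.0, 1.0), u, b)      # approximately 0
from_origin = signed_distance((0.0, 0.0), u, b)  # approximately -b
```

With a unit normal, the left-hand side minus the offset is a signed distance, which is one reason the normalized form is convenient in practice.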

2. Transformations:

Transformations are functions that map 𝑛-D vectors to other 𝑛-D vectors: 𝑇:ℝ𝑛→ℝ𝑛. They can represent geometric operations, which are caused by movement or action, as well as changes of coordinates, which are caused by changes of interpretation. Many common spatial transformations, including translations, rotations, and scaling are represented by matrix / vector operations. Changes of coordinate frames are also matrix / vector operations. As a result, transformation matrices are stored and operated on ubiquitously in robotics.

2.1 Linear transformations:

A linear transformation is one that can be written as a matrix-vector product 𝑇(𝐱)=𝐴𝐱, where 𝐴 is an 𝑛×𝑛 matrix. The rotations and scalings in the following sections are all of this form.


2.2 Rotations in 2D

Rotations about the origin by angle 𝜃 can be defined as linear transformations. Consider two reference frames with a common origin 𝑂, the pre-rotation axes 𝑋 and 𝑌, and the post-rotation axes 𝑋′ and 𝑌′. Depicting 𝑋′ on top of 𝑋 and 𝑌 as a line emanating from the origin, and using a little trigonometry, we see that 𝑋′ has coordinates 𝐱′=(cos𝜃,sin𝜃). It is a bit more involved, but not much, to determine that 𝑌′ has coordinates 𝐲′=(−sin𝜃,cos𝜃).

Now consider that along with the coordinate frames, a point 𝑃 was rotated to 𝑃′. We will derive how to determine its new coordinates relative to the original reference frame. Notice that 𝑃′ still has coordinates (𝑝𝑥,𝑝𝑦) relative to the post-rotation frame 𝑋′, 𝑌′, since distances do not shrink or grow when objects are rotated. Specifically, 𝑃′ is obtained by walking 𝑝𝑥 units from the origin in the direction of 𝑋′, and then 𝑝𝑦 units in the direction of 𝑌′ (Fig. 4). Hence, to determine its coordinates in the original reference frame, we can use the fact that the coordinates of 𝑋′ and 𝑌′ are known:

$$\mathbf{p}' = p_x \mathbf{x}' + p_y \mathbf{y}' = \begin{bmatrix} p_x \cos\theta - p_y \sin\theta \\ p_x \sin\theta + p_y \cos\theta \end{bmatrix}.$$


Figure 4. Rotating by an angle 𝜃 about the origin to achieve a new point 𝑃′ (a). To calculate the coordinates of 𝑃′ (b), we first obtain the coordinates of the transformed axes 𝑋′ and 𝑌′ (c,d).

A more compact and convenient way of writing this is with a matrix equation

𝐩′=𝑅(𝜃)𝐩

with the rotation matrix given by:

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

There are several useful properties of such matrices:

  1. The matrix composition 𝑅(𝜃1)𝑅(𝜃2)=𝑅(𝜃1+𝜃2) gives the rotation matrix for the sum of the angles.

  2. The determinant det(𝑅(𝜃))=cos²𝜃+sin²𝜃=1 for all 𝜃.

  3. Due to the identities cos(−𝜃)=cos(𝜃) and sin(−𝜃)=−sin(𝜃), the operation of rotating by −𝜃 is equivalent to a matrix transpose: $R(-\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = R(\theta)^T$.

  4. Moreover, the transpose is equivalent to the matrix inverse: 𝑅(−𝜃)=𝑅(𝜃)ᵀ=𝑅(𝜃)⁻¹. In other words, rotation matrices are orthogonal.

  5. Vector norms are invariant under rotation: ‖𝑅(𝜃)𝐱‖=‖𝐱‖.

  6. The rotation matrix is only dependent on the argument's value modulo 2𝜋.
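The six properties can be spot-checked numerically. The following is a minimal sketch in plain Python, with 2×2 matrices stored as tuples of rows; the helper names and the tolerance of 1e-12 are assumptions made for this illustration, and a numerical check at a few angles is of course not a proof.

```python
import math

def rot2d(theta):
    """Rotation matrix R(theta), stored as a tuple of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def matvec(a, x):
    """Product of a 2x2 matrix and a 2-vector."""
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def transpose(a):
    return ((a[0][0], a[1][0]), (a[0][1], a[1][1]))

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def close(a, b, tol=1e-12):
    """Elementwise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

t1, t2 = 0.3, 1.1
R = rot2d(t1)
identity = ((1.0, 0.0), (0.0, 1.0))
x = (3.0, 4.0)  # a vector of length 5

checks = {
    "1. composition adds angles": close(matmul(rot2d(t1), rot2d(t2)), rot2d(t1 + t2)),
    "2. determinant is 1": abs(det(R) - 1.0) <= 1e-12,
    "3. R(-theta) is the transpose": close(rot2d(-t1), transpose(R)),
    "4. transpose is the inverse": close(matmul(transpose(R), R), identity),
    "5. norms are preserved": abs(math.hypot(*matvec(R, x)) - 5.0) <= 1e-12,
    "6. depends on theta mod 2*pi": close(rot2d(t1 + 2.0 * math.pi), R),
}
for name, ok in checks.items():
    print(name, "->", ok)
```

Each check should hold up to floating-point rounding, which is why the comparisons use a tolerance instead of exact equality.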

The space of rotations is known as the special orthogonal group 𝑆𝑂(2). The reason why it is called the special orthogonal group is that it is the set of all orthogonal 2×2 matrices with determinant +1, while there do exist other orthogonal matrices with determinant −1.

Property 6 implies that it is more proper to consider rotation matrices as only representing instantaneous orientation rather than accumulated amounts of revolution. For example, if a motor has spun 720°, the matrix representation is indistinguishable from the 0 rotation. In certain applications that demand reasoning about accumulated revolution, the representation 𝜃∈ℝ is more appropriate than a matrix. More about this topic will be discussed when we cover topological spaces.

2.3 Rotations in 3D

In 3D, rotations can also be defined as linear transformations, although parameterizing them is not as simple as in 2D. 3D rotation representations will be discussed in further detail later, but for now let us describe some of their properties.

A rotation in 3D can be represented by a matrix equation

𝐩′=𝑅𝐩

with 𝑅 a 3×3 rotation matrix.

Figure 5. A 3D rotation is encoded by a 3×3 matrix whose columns give the coordinates of the rotated axes relative to the original axes.

This interpretation is useful when determining the coordinates of a rotated point in the original reference frame: if the point is given by coordinates 𝐩=(𝑝𝑥,𝑝𝑦,𝑝𝑧) such that 𝑃−𝑂=𝑝𝑥𝑋+𝑝𝑦𝑌+𝑝𝑧𝑍, then the new coordinates of 𝑃 relative to the old reference frame are given by 𝑅𝐩.
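The column interpretation can be seen with one concrete 3D rotation. The sketch below uses a rotation about the 𝑍 axis, which is chosen here only as a familiar example (the text has not yet introduced any particular 3D parameterization), and the helper names are illustrative.

```python
import math

def rot_z(theta):
    """3x3 rotation by theta about the Z axis, stored as a tuple of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0),
            (s, c, 0.0),
            (0.0, 0.0, 1.0))

def matvec(a, x):
    """Product of a 3x3 matrix and a 3-vector."""
    return tuple(sum(a[i][k] * x[k] for k in range(3)) for i in range(3))

def column(a, j):
    """j-th column of a 3x3 matrix."""
    return tuple(a[i][j] for i in range(3))

R = rot_z(math.pi / 2.0)

# The columns of R are the rotated axes X', Y', Z' in the original frame.
x_new, y_new, z_new = column(R, 0), column(R, 1), column(R, 2)

# Rotating p with the matrix ...
p = (1.0, 2.0, 3.0)
p_rotated = matvec(R, p)  # approximately (-2, 1, 3)

# ... matches walking p_x along X', p_y along Y', and p_z along Z'.
p_from_axes = tuple(
    p[0] * x_new[i] + p[1] * y_new[i] + p[2] * z_new[i] for i in range(3)
)
```

The two results agree because a matrix-vector product is exactly a weighted sum of the matrix columns.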

2.4 Scaling

Axis-aligned scaling in 2D about the origin can be represented as a linear transform with matrix

$$S(s_x, s_y) = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$

where 𝑠𝑥 is the scaling along the 𝑋 direction and 𝑠𝑦 is the scaling along the 𝑌 direction. If 𝑠𝑥=𝑠𝑦 this is known as a uniform scaling.

This definition can be generalized to 𝑛-D space using an 𝑛-D scaling vector 𝐬 which determines the scaling in each direction by mapping to the diagonal of an 𝑛×𝑛 matrix:

𝑆(𝐬)=𝑑𝑖𝑎𝑔(𝐬).


2.5 Compositions of linear transformations

When performing two linear transformations one after another, the results are determined via matrix multiplication. Suppose that 𝑇1(𝐱) and 𝑇2(𝐱) are both linear transformations with matrices 𝐴 and 𝐵, respectively. When performing the operation of 𝑇2 first to obtain 𝐲=𝑇2(𝐱), then performing 𝑇1 to obtain 𝐳=𝑇1(𝐲), the ultimate result is:

𝐳=𝑇1(𝑇2(𝐱))

where it should be noted that 𝑇1 appears first in the equation even though it is performed after 𝑇2. Expanding this into matrix products,

𝐳=𝐴𝐵𝐱

holding for all values of 𝐱. As a result, the function composition 𝑇1∘𝑇2 is also a linear transformation with matrix 𝐴𝐵.

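Using composition we can derive other useful transformations, like scaling not aligned to an axis. Suppose we wish to scale some coordinates by value 𝑠 in a direction 𝐯, where 𝐯 is a unit vector. We can think of this as first rotating by the angle 𝜃 that brings 𝐯 onto the 𝑋 axis, then performing an axis-aligned scaling, then rotating back to the original coordinate frame:

𝐴=𝑅(−𝜃)𝑆(𝑠,1)𝑅(𝜃).

It so happens that 𝑅(−𝜃) takes the 𝑋 axis back to 𝐯=(𝑣𝑥,𝑣𝑦), so its first column is 𝐯 and no trigonometry is needed:

$$R(-\theta) = \begin{bmatrix} v_x & -v_y \\ v_y & v_x \end{bmatrix}, \qquad A = \begin{bmatrix} v_x & -v_y \\ v_y & v_x \end{bmatrix} \begin{bmatrix} s & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v_x & v_y \\ -v_y & v_x \end{bmatrix}.$$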

Note that because matrix multiplication is not commutative, the order of transformations matters: a rotation followed by a scaling is not necessarily the same as a scaling followed by a rotation. However, with a little inspection, we can identify the following compositions that do commute:

  • As a consequence of 𝑅(𝜃1)𝑅(𝜃2)=𝑅(𝜃1+𝜃2), rotation by angle 𝜃1 followed by angle 𝜃2 is commutative: 𝑅(𝜃1)𝑅(𝜃2)=𝑅(𝜃2)𝑅(𝜃1). (Note that rotations do not generally commute in 3D!)

  • A rotation and a uniform scaling.

  • Two axis-aligned scalings.
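The effect of ordering, and the three commuting cases listed above, can be checked numerically. This is a hedged sketch in plain Python with hypothetical helper names; `scale_along` builds the non-axis-aligned scaling 𝐴=𝑅(−𝜃)𝑆(𝑠,1)𝑅(𝜃) under the assumption that 𝐯 is a unit vector and 𝑅(𝜃) is the rotation taking 𝐯 to the 𝑋 axis.

```python
import math

def rot2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def scale2d(sx, sy):
    return ((sx, 0.0), (0.0, sy))

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def matvec(a, x):
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

R = rot2d(math.pi / 4.0)
S = scale2d(2.0, 1.0)   # non-uniform, axis-aligned
U = scale2d(3.0, 3.0)   # uniform

# The rightmost matrix is applied first.
rotation_then_scaling = matmul(S, R)
scaling_then_rotation = matmul(R, S)
order_matters = not close(rotation_then_scaling, scaling_then_rotation)

# The commuting cases from the list above.
rotations_commute = close(matmul(rot2d(0.3), rot2d(1.1)),
                          matmul(rot2d(1.1), rot2d(0.3)))
uniform_commutes = close(matmul(R, U), matmul(U, R))
scalings_commute = close(matmul(S, scale2d(1.0, 5.0)),
                         matmul(scale2d(1.0, 5.0), S))

def scale_along(v, s):
    """Scale by s along the unit vector v, leaving the orthogonal direction alone."""
    vx, vy = v
    back = ((vx, -vy), (vy, vx))   # R(-theta): takes the X axis to v
    to_x = ((vx, vy), (-vy, vx))   # R(theta): takes v to the X axis
    return matmul(matmul(back, scale2d(s, 1.0)), to_x)

A = scale_along((0.6, 0.8), 2.0)
stretched = matvec(A, (0.6, 0.8))    # approximately (1.2, 1.6)
unchanged = matvec(A, (-0.8, 0.6))   # approximately (-0.8, 0.6)
```

Building the two rotations directly from the components of 𝐯 avoids computing an angle at all, which sidesteps any sign confusion about 𝜃.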


2.6 Rigid transformations

Rigid transformations have two properties:

  1. The distance between any two points does not change after being transformed.

  2. In 2D, the orientation and area of any triangle do not change, and in 3D, the orientation and volume of any tetrahedron do not change.

The form of all rigid transforms is a rotation 𝑅 followed by an arbitrary translation 𝐭:

𝑇(𝐱)=𝑅𝐱+𝐭

which can be thought of as applying a rotation about the origin first, and then a translation second. Proving that all rigid transforms have this form will be left as an exercise.

It is also possible to interpret rigid transforms as rotation about an arbitrary point. Letting the center of rotation be denoted 𝐜, a rotation about 𝐜 can be constructed by translating a point so that 𝐜 is the origin, then rotating about the origin by some matrix 𝑅, and then translating back to the original origin. This form is:

𝑇(𝐱)=𝑅(𝐱−𝐜)+𝐜.

The parenthetical term is the translation to 𝐜 as the origin, the multiplication by 𝑅 is the rotation about the new origin, and the addition of 𝐜 is the translation back to the original origin. These two representations are related by 𝐭=𝐜−𝑅𝐜 and 𝐜=(𝐼−𝑅)⁻¹𝐭, the latter holding in 2D whenever 𝑅 is not the identity (so that 𝐼−𝑅 is invertible).
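The relationship between the two forms can be sketched in code for the 2D case. The helper names below are hypothetical, and the closed-form inverse of 𝐼−𝑅 is written out by hand for a 2×2 rotation rather than taken from any library.

```python
import math

def rot2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matvec(a, x):
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def rotate_about(c, theta, x):
    """T(x) = R (x - c) + c: rotate x by theta about the center c."""
    r = matvec(rot2d(theta), (x[0] - c[0], x[1] - c[1]))
    return (r[0] + c[0], r[1] + c[1])

def translation_for_center(c, theta):
    """t = c - R c, so that R x + t equals the rotation about c."""
    rc = matvec(rot2d(theta), c)
    return (c[0] - rc[0], c[1] - rc[1])

def center_from_translation(t, theta):
    """Solve (I - R) c = t in 2D; fails when R is the identity."""
    a = 1.0 - math.cos(theta)   # I - R = [[a, b], [-b, a]]
    b = math.sin(theta)
    d = a * a + b * b           # determinant of I - R
    if d < 1e-15:
        raise ValueError("a pure translation has no center of rotation")
    return ((a * t[0] - b * t[1]) / d, (b * t[0] + a * t[1]) / d)

c, theta, x = (1.0, 1.0), math.pi / 2.0, (2.0, 1.0)
via_center = rotate_about(c, theta, x)        # approximately (1, 2)
t = translation_for_center(c, theta)          # approximately (2, 0)
rx = matvec(rot2d(theta), x)
via_rt = (rx[0] + t[0], rx[1] + t[1])         # same point, via R x + t
c_back = center_from_translation(t, theta)    # approximately (1, 1)
```

The guard in `center_from_translation` reflects the one case where no center exists: when the rotation is the identity, the transform is a pure translation.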

The set of rigid transformations is called the special Euclidean group 𝑆𝐸(2) in 2D, and 𝑆𝐸(3) in 3D. Repeated application of rigid transformations also produces a rigid transformation. Given two rigid transforms

𝑇1(𝐱)=𝑅1𝐱+𝐭1

and

𝑇2(𝐱)=𝑅2𝐱+𝐭2

then the composite transform 𝑇1∘𝑇2 is the operation of performing 𝑇2 first, then 𝑇1. If we let 𝐲=𝑇2(𝐱), and 𝐳=𝑇1(𝐲), then we obtain:

𝐳=𝑇1(𝑇2(𝐱))=𝑅1(𝑅2𝐱+𝐭2)+𝐭1.

By the distributive property of matrix multiplication, we have

𝐳=𝑅1𝑅2𝐱+𝑅1𝐭2+𝐭1.

This is simply a rigid transform with rotation matrix 𝑅1𝑅2 and translation by (𝑅1𝐭2+𝐭1).
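The composition rule can be sketched as a small function. In this illustration a rigid transform is assumed to be stored as a pair `(R, t)` of a 2×2 rotation matrix and a translation tuple; that storage choice and the names `apply` and `compose` are assumptions, not something the text prescribes.

```python
import math

def rot2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def matvec(a, x):
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def apply(T, x):
    """T(x) = R x + t for a transform stored as the pair (R, t)."""
    R, t = T
    rx = matvec(R, x)
    return (rx[0] + t[0], rx[1] + t[1])

def compose(T1, T2):
    """T1 after T2: rotation R1 R2 and translation R1 t2 + t1."""
    R1, t1 = T1
    R2, t2 = T2
    r1t2 = matvec(R1, t2)
    return (matmul(R1, R2), (r1t2[0] + t1[0], r1t2[1] + t1[1]))

T1 = (rot2d(0.7), (1.0, -2.0))
T2 = (rot2d(-0.2), (0.5, 3.0))
x = (2.0, 1.0)

step_by_step = apply(T1, apply(T2, x))   # perform T2 first, then T1
all_at_once = apply(compose(T1, T2), x)  # same point, up to rounding
```

Composing once and reusing the result is the cheaper option when the same pair of transforms is applied to many points.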


2.7 Inverse transformations

Not all transformations have inverses, but rotations, translations, rigid transformations, and many linear transformations do. As described before, the inverse of a rotation matrix is simply its transpose. Translations are inverted by translating in the negative direction. Linear transforms 𝐴𝐱 are invertible if and only if the matrix 𝐴 is invertible, with the inverse transformation 𝐴⁻¹𝐱.

Rigid transformations are also invertible, and their inverse is also a rigid transformation:

$$T^{-1}(R, \mathbf{t}) = T(R^T, -R^T \mathbf{t})$$

where 𝑇(𝑅,𝐭)(𝐱)=𝑅𝐱+𝐭. Proof of this equation will be left as an exercise.
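Without giving away the proof, the inverse formula can at least be checked numerically. The sketch below again assumes a transform is stored as a pair `(R, t)`; the function names are illustrative.

```python
import math

def rot2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matvec(a, x):
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def transpose(a):
    return ((a[0][0], a[1][0]), (a[0][1], a[1][1]))

def apply(T, x):
    """T(x) = R x + t for a transform stored as the pair (R, t)."""
    R, t = T
    rx = matvec(R, x)
    return (rx[0] + t[0], rx[1] + t[1])

def inverse(T):
    """Inverse rigid transform: rotation R^T and translation -R^T t."""
    R, t = T
    Rt = transpose(R)
    rt_t = matvec(Rt, t)
    return (Rt, (-rt_t[0], -rt_t[1]))

T = (rot2d(1.2), (3.0, -1.0))
x = (0.5, 2.0)

y = apply(T, x)
x_back = apply(inverse(T), y)   # recovers x up to rounding
```

Note that no general matrix inversion is needed: the transpose and one matrix-vector product are enough.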


2.8 Rigid movement

Rigid transformations are used to represent movement of rigid bodies in space. If, in 2D, the origin of a body moves by translation 𝐭 in its original reference frame and the body rotates by angle 𝜃, giving the rotation matrix 𝑅=𝑅(𝜃), then the transformation that converts positional coordinates from the new coordinate frame to the original coordinate frame is given by 𝑇𝑝(𝐱)=𝑅𝐱+𝐭. In other words, if 𝐱 gives the coordinates of a position 𝑃 that is attached to the body, then after moving, 𝑃 will have coordinates 𝑇𝑝(𝐱) relative to the body's original frame.

However, the transformation of directional coordinates is simply a rotation and ignores the translation: 𝑇𝑑(𝐯)=𝑅𝐯. In other words, if 𝐯 gives the coordinates of a directional quantity 𝑉 that is attached to the body (such as the direction of a line attached to the body), then 𝑉 will have coordinates 𝑇𝑑(𝐯) relative to the original coordinate frame (as with other directional quantities, these are interpreted ignoring the origin).
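The difference between moving a point and moving a direction shows up clearly in a small example. In this sketch the names `move_point` and `move_direction` are hypothetical stand-ins for 𝑇𝑝 and 𝑇𝑑, and the numbers are chosen only to keep the arithmetic easy to follow.

```python
import math

def rot2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def matvec(a, x):
    return tuple(a[i][0] * x[0] + a[i][1] * x[1] for i in range(2))

def move_point(R, t, x):
    """T_p(x) = R x + t: positions are rotated and translated."""
    rx = matvec(R, x)
    return (rx[0] + t[0], rx[1] + t[1])

def move_direction(R, v):
    """T_d(v) = R v: directional quantities are only rotated."""
    return matvec(R, v)

R = rot2d(math.pi / 2.0)   # the body turns 90 degrees counter-clockwise
t = (2.0, 1.0)             # and its origin moves by (2, 1)

a, b = (1.0, 0.0), (3.0, 0.0)        # two points attached to the body
v = (b[0] - a[0], b[1] - a[1])       # displacement from a to b

a_new = move_point(R, t, a)          # approximately (2, 2)
b_new = move_point(R, t, b)          # approximately (2, 4)
v_new = move_direction(R, v)         # approximately (0, 2)

# The displacement between the moved points equals the rotated displacement.
v_check = (b_new[0] - a_new[0], b_new[1] - a_new[1])
```

The translation cancels when the two moved points are subtracted, which is why `v_check` agrees with `v_new`.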


2.9 Representation of coordinate frames and coordinate transforms

Coordinate frames, as well as conversions between them, are interpreted as rigid transformations.

Any 2D coordinate frame 𝐹 with origin 𝑂 and axes 𝑋 and 𝑌 may be represented by the coordinates of 𝑂, 𝑋, and 𝑌 in some privileged world frame 𝑊. If 𝑂 has coordinates 𝐭, and 𝑋 and 𝑌 have (directional) coordinates 𝐱=(𝑥1,𝑥2) and 𝐲=(𝑦1,𝑦2) relative to 𝑊, then the world coordinates of any point 𝑃 such that 𝐩 is its coordinates in the frame 𝐹 can be calculated by the rigid transform

$$\mathbf{p}_W = \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix} \mathbf{p} + \mathbf{t}.$$

(The matrix is a rotation matrix 𝑅 because 𝑋 and 𝑌 are orthogonal unit directions and 𝑌 is 90° ccw from 𝑋.) The information stored for a 3D coordinate frame is similarly a rotation matrix 𝑅 and origin coordinates 𝐭. This operation is known as the coordinate transform 𝐹→𝑊, with 𝐹 the source frame and 𝑊 the target frame. We can also perform the reverse coordinate transform 𝑊→𝐹 by applying the inverse transform.

Changes of coordinate frames can also be represented in terms of rigid transforms. Suppose 𝐴 and 𝐵 are two coordinate frames, where 𝐴 is represented with respect to the world frame by a rotation matrix 𝑅𝐴 and translation 𝐭𝐴, and 𝐵 is represented by 𝑅𝐵 and 𝐭𝐵. Then given the coordinates 𝐩𝐴 of some point 𝑃 relative to 𝐴, we can determine 𝑃's coordinates relative to 𝐵 in two steps. First, we calculate its world coordinates:

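$$\mathbf{p}_W = T_A(\mathbf{p}_A) = R_A \mathbf{p}_A + \mathbf{t}_A$$

And then we apply the inverse of the 𝐵→𝑊 transform to obtain its coordinates with respect to 𝐵:

$$\mathbf{p}_B = T_B^{-1}(\mathbf{p}_W) = R_B^T (\mathbf{p}_W - \mathbf{t}_B).$$

This transform can be calculated for all points by the composition of the transform 𝐴→𝑊 and then 𝑊→𝐵:

$$\mathbf{p}_B = T_B^{-1}(T_A(\mathbf{p}_A)) = R_B^T R_A \mathbf{p}_A + R_B^T (\mathbf{t}_A - \mathbf{t}_B).$$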

2.10 Homogeneous coordinate representations

Homogeneous coordinates give a convenient representation of rigid transforms as linear transforms on an expanded space. Moreover, they compactly represent the distinction between positional and directional quantities. The idea is to augment every vector with an additional homogeneous coordinate, which is 1 if it is positional and 0 if it is directional. This operation is denoted with the hat operator ^.

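For 2D points and directions, we have:

$$\hat{\mathbf{x}} = \begin{bmatrix} x_1 \\ x_2 \\ 1 \end{bmatrix} \text{ for a position } \mathbf{x} = (x_1, x_2), \qquad \hat{\mathbf{v}} = \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix} \text{ for a direction } \mathbf{v} = (v_1, v_2).$$

A rigid transform then becomes the 3×3 matrix

$$\hat{T} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

where the original transformation 𝑇(𝐱)=𝑅(𝜃)𝐱+𝐭 is a rotation by angle 𝜃 followed by a translation by the vector 𝐭=(𝑡𝑥,𝑡𝑦).

In 3D, the hat operator adds a 4th coordinate in the same way (1 for positions, 0 for directions), and a rigid transform becomes a 4×4 matrix with 𝑅 in the upper-left 3×3 block, 𝐭 in the last column, and (0,0,0,1) as the last row.

Note that in the 2D case, when applied to homogeneous positions, the rigid transform is applied to the first two coordinates of the vector while the homogeneous coordinate remains 1 (since the dot product of a position representation with the last row of the matrix is 1). Also, when applied to homogeneous directions, only the rotation is applied to the first two coordinates of the vector, since the third 0 coordinate nullifies the effect of the third column. The homogeneous coordinate remains 0, since the dot product of a direction representation with the last row of the matrix is 0.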

The nice thing about this representation is that transform application is a matrix-vector multiply, transform composition is a matrix-matrix multiply, and transform inversion is a matrix inversion. This makes it much easier to write out complex transformations. For example, consider the problem of the coordinate transform from frame 𝐴 to frame 𝐵 that we described above. Rather than writing out the operator expression 𝑇𝐵⁻¹(𝑇𝐴(𝐩𝐴)), using homogeneous coordinates this becomes a series of matrix-matrix and matrix-vector multiplies:

$$\hat{\mathbf{p}}_B = \hat{T}_B^{-1} \cdot \hat{T}_A \cdot \hat{\mathbf{p}}_A.$$
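The same 𝐴→𝐵 change of frame can be sketched with 3×3 homogeneous matrices. The helper names below are hypothetical, and `hom_inverse` uses the rigid-transform inverse (transpose of the rotation block) as a shortcut rather than a general matrix inversion; the example frames are invented for illustration.

```python
import math

def hom_transform(theta, t):
    """3x3 homogeneous matrix for a rotation by theta, then a translation t."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, t[0]),
            (s, c, t[1]),
            (0.0, 0.0, 1.0))

def hom_point(x):
    return (x[0], x[1], 1.0)   # positions get a trailing 1

def hom_direction(v):
    return (v[0], v[1], 0.0)   # directions get a trailing 0

def matmul(a, b):
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def matvec(a, x):
    return tuple(sum(a[i][k] * x[k] for k in range(3)) for i in range(3))

def hom_inverse(T):
    """Inverse of a homogeneous rigid transform: [R^T, -R^T t; 0 0 1]."""
    (a, b, tx), (c, d, ty), _ = T
    return ((a, c, -(a * tx + c * ty)),
            (b, d, -(b * tx + d * ty)),
            (0.0, 0.0, 1.0))

# Hypothetical frames A and B, given by their world-frame transforms.
T_a = hom_transform(math.pi / 2.0, (1.0, 0.0))
T_b = hom_transform(0.0, (0.0, 2.0))

a_to_b = matmul(hom_inverse(T_b), T_a)                 # compose once, reuse often
p_b = matvec(a_to_b, hom_point((1.0, 0.0)))            # approximately (1, -1, 1)
v_world = matvec(T_a, hom_direction((1.0, 0.0)))       # approximately (0, 1, 0)
```

The trailing coordinate survives the multiplication unchanged, so the result of transforming a position is still marked as a position and the result for a direction is still a direction.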


3. Summary

Key takeaways:

  • Coordinates are numerical representations of geometric concepts, like points, directions, frames of reference, and movement.

  • Points, directions, and displacements in 𝑛-dimensional space are represented by 𝑛-dimensional vectors, while rotations and scalings are represented by 𝑛×𝑛 matrices.

  • Rigid transformations consist of a rotation followed by a translation. They represent both rigid body movement and changes of coordinate frame.

  • Homogeneous coordinates represent rigid transforms using matrix multiplication in an 𝑛+1 dimensional space where the last coordinate is either 0 or 1.

  • When working with coordinates it is easy to make mistakes. Having clear assumptions, clear notation and/or using coordinate management software can reduce the risk of error.


Until next time, keep your eyes on the horizon as we venture into new frontiers in the mesmerizing realm of robotics.



