The main algorithms are running on the laptop and through a USB to Serial connection the robot microcontroller is interconnected with them. The project is split into the following sections:
• Mapping and localisation in an unknown environment (Robot Perception)
The robot is using the RGB-D camera for both odometry estimation and as a range finder.
• 3D recognition of a pre-specified object (without using colour data) (Robot Perception)
The user cycles through the objects, selects the desired object and records the signature of the object.
• Build a tracked mobile robot that will be able to carry the central processing unit, batteries, sensors (Mobile robotics)
The robot is connected to the laptop via a USB-Serial connection. The robot is also is the laptop’s carrier.
• Use the map and the recognition capabilities to place any objects on the global map in real-time. (Global system integration)
• Plan the route from point A to B using the generated map, avoiding any obstacles. (Global system integration)
As a whole, the system consists of software running on the laptop that is connected to a microcontroller via USB that is then directly connected to the robot’s hardware such as motor drivers, LEDs, current monitoring, etc. The laptop is running multiple programs, called nodes that are interconnected using a message passing library and tools called “Robot Operating System”. ROS allows multiple programs to communicate and share information using names such “/camera_colour_stream” or “/robot/motor/right/current” called topics. ROS allows building systems that exhibit complex behaviour and favours hierarchical design. The interconnected algorithms are publishing to and/or receiving messages from the microcontroller using a USB to serial bridge. The microcontroller takes care of the robot’s motor speeds, current, LEDs, fail-safe mechanisms and most importantly, package interactions with ROS. The radio control transmitter is used to control the robot when in the manual or head-less mode.
A high level architecture diagram is shown below.
The nodes that build-up the whole robot system were designed with modularity in mind. Each node has a distinct name and carries out a specific task. Each node runs in isolation and publishes and/or subscribes to messages using topics. Topics have names such “/robot/LED/red” of a specific type. In case of a node crashing the rest of the system will still be running. However, each node has to cater for the unexpected and pack fail-safe features that will prevent unexpected behaviour.
The mapping functionality is one of the core tasks the robot has to execute. Two algorithms were tested for mapping. One was a ready implementation following the algorithm by Rao and Blackwell  and the other was programmed from scratch. The second approach was the one used for the demonstration and it is the one described in this thesis. The Rao’s and Blackwell’s approach and techniques were tested to provide a form of evaluation and performance assessment.
The developed mapper can be run simultaneously with the object recognition node or on its own. In case that it runs along with the recognition then the mapper will automatically pick up any recognised objects and place them too on the map. Mapping is a process that requires odometry information to be provided. When the robot faces an object (wall, furniture) the location relative to the robot is known but in order for the map to be consistent, the object has to be translated to the global frame. The mapper node accepts the robot odometry from the transformation tree under the topic (“/tf”). The TF tree is maintained by various nodes that each update their piece of the tree. For example, the odometry node updates the relationship between the global frame and the robot base. In addition to the TF tree, the mapper also subscribes to the camera driver to get the point-cloud data. It also listens for an object pose; in case of the object recognition is running. The output of this node is a “/map” topic that contains the map array along with various meta-data such as resolution, offsets, orientation and timestamps.
Figure 20 Mapping node accepting odometry, pointcloud and optionally the object's pose to produce a map
The transformation tree is carrying all the relations between each part of the system. This makes it easy to visualise data hierarchically but also access odometry data and robot link details
Figure 21 Simple transformation tree
Each frame of the TF tree has a parent and a child. A dynamic link relation changes as the system operates (odometry of the robot relative to the start [0, 0, 0]). A static link describes the properties of the robot e.g. the robot base with camera relationship is shown below.
Figure 22 How a rigid link is defined as part of the transformation tree (not to scale, edited from )
A TF link relation is represented by various ways. The most common throughout the system is the use of a translation (X, Y, Z) and a rotation offset as a quaternion (X, Y, Z, W). In case of the above example in Figure 22, the robot_base and robot_camera only have a translation difference. Therefore the translation will be (X: 0.4, Y: 0.0, Z: 0.2) and the rotation quaternion would result in (X: 0.0, Y: 0.0, Z: 0.0, W: 1.0).
Once the TF tree is received along with the pointcloud data, the call-back functions in the mapper allow the processing to begin registering the points in space. To create a 2D map we need to get 2D scan of all the distances away from the robot. Normally this would require be a range finder. However since we have the RGB-D camera we instantly have up to 300,000 depth points to our disposal both in the width and height of the scene. Not all these points are necessary for the task of registering and avoiding obstacles and therefore only the points that match the robot’s footprint are used. By overlaying many pointcloud horizontal slices up to the height of the robot we can reduce the data needed down to a height of one point. Using superimposed slices also allows us to mark complicated obstacles like a table. A table that has four legs might be considered as four small obstacles if the robot is small. A big robot that cannot go under the table will mark the whole table as a unified obstacle. The mapping process is very similar to plane rotation using a rotation matrix. The matrix in this case is the odometry matrix that describes the translation and rotation of the robot from the initial 6-D pose. Any point that is received from the camera is paired with the odometry at time t and translated to the global frame. The result is a map that is consistent regardless of the robot’s position and orientation. This process works well, as demonstrated, and allows for maps to be created in 2D but also in 3D.
Figure 23 Translating all points from the pointcloud to the global frame results in a 3D map of this corridor
As shown in Figure 23, apart from the 2D map we can map all the points of the pointcloud to produce a better looking 3D map. However a 3D map results to extra, unnecessary processing and therefore not used by the robot for navigation. A 3D map is also prone to an extra dimension of error drifts (translational height drift and rotational roll and pitch drift).
Figure 24 From pointcloud data to a 2D map, mapping process viewed in RVIZ
Figure 24 shows a coloured line which is a 2D combination of pointcloud slices that are merged together to produce an output similar to a laser scanner. The colours of the scan are affected and change depending on the distance from the camera.
The map is represented as an occupancy grid in memory and published under the topic “/map” in ROS. An occupancy grid is simply an array or vector that holds a value for each cell. This value is the probability of the cell being occupied. In case of an obstacle, the value ranges from one to one hundred. If there isn’t an obstacle it’s a zero and if unknown it’s minus one. The map is very large in size due to the choice in resolution. The resolution used was 1cm2 that results in an array of hundreds of MB of data. This is where the ROS visualisation software (RVIZ) is useful to visualize large chunks of data streams.
A map example is shown below.
Figure 25 Map of a 10x10m environment with furniture
There have been some issues with the performance of the mapper. Due to its nature, the mapper heavily depends on accurate odometry estimates so that the distances are correctly registered and aligned. However this is not the case since error drifts are always present, especially when there are vibrations from the robot’s movement.
In the literature, many techniques exist that use an odometry feedback loop. It is used to assist and offset the odometry source, to correct drifts. The most common approaches are highlighted below.
Scan matching for mapping
Scan matching is a technique used to improve mapping accuracy by adding offsets to the odometry as mentioned by Grisetti . The robot observes the walls and if there is a slight offset it tries to match the two scans and feed their difference back to the odometry frame. The diagram below shows how the scan-matching interacts with the transformation tree.
Figure 26 TF tree with the addition of the scan-matching algorithm
The scan matching algorithm as discussed in the literature is a very simple and powerful concept that prevents maps from becoming corrupted and globally inconsistent.
However this technique is mostly used in laser scanners where the viewing angle is around 240 degrees. Performing scan matching on the RGB-D camera’s 57 degree viewing angle, results in unexpected results due to the absence of enough data and details to perform the marching.
Rao-Blackwellized SLAM comparison
As mentioned, to be able to properly test the custom implementation of the mapper, a standard mature library was used to test it. The library was the “Gmapping” library that uses Rao-Blackwellized particle filters to create a map. The library uses scan matching along with particle filters that each holds the map instance. The performance was slightly better but due to the narrow viewing angle of the sensor, the scan matching was sometimes mistakenly confusing possible matches. This resulted into offsetting the odometry without the need to do so.
If the viewing angle was wider, then the scan matcher would make a significant difference in improving the global consistency of the map.
The sensor could spin or two RGB-D cameras could be used together resulting in a far better, detailed scan, which would increase matching accuracy. The latter was tested in early stages but two RGB-D cameras reached the maximum USB motherboard bus limit. The devices stream data at hundreds of MB/s making difficult to use both while having adequate processing power left to process the data.
3D Object recognition
The 3D object recognition or depth based recognition is the second task that the robot is performing. An object is placed in front of the robot, the robot records the signature of the object and it can then detect the object and place in on its map. The object signature recorded is relying on the object’s curvature and shape information rather than the traditional colour data. This node also accepts the motor values from the system. This helps the algorithm know when the robot is moving and therefore pause the recognition by skipping frames to avoid noisy data. In addition, the algorithm features timers that help the logger node to collect performance information.
Before any recognition algorithm is applied to the scene, the pointcloud sample must first undergo many steps that will result in isolation of the points of the object. Since colour is not used, the algorithm has to extract and isolate the points that are representing the object. The pointcloud library offers many existing functions that can be used to filter and process the scene appropriately. The first step is to extract the points (cluster) that represent the possible objects in the scene.
Camera driver > Pointcloud > Filters > Plane extraction > Cluster extraction >
Then we process each cluster and apply the algorithm of choice on each.
> Extract signature of each cluster > Compare signatures against initial recording
Filtering a pointcloud is necessary in order to remove noise and unwanted data from the scene. Filtering noise can be in the form of removing not-a-numbers (NaNs), rejecting outliers or even using a pass-through filter to crop the minimum and maximum limits of the scene. On the other hand, extra filtering might be applied to down sample the pointcloud to reduce computation load. The process of reducing the resolution of a pointcloud is done using a voxel filter. A typical pointcloud of a 640x480 resolution has 307200 points that all carry colour (not used) and depth information at thirty frames per second. The rate that the sensor is producing all these data is in the order of hundreds of megabytes per second and is enough to slow down any algorithm that is listening to the stream.
Figure 27 Scene of a table and a carton milk box. Left original, right down sampled. 
A voxel filter is used to sample down the pointcloud data. It does so by creating a grid of 3D pixels and volumetrically reducing the number of them that occupy the space. The voxel grid filter takes a leaf size as the only parameter. The leaf size indicates the amount of down sampling that takes place.
A pass-through filter crops the point-cloud’s minimum and maximum distance. It accepts two values, one for the minimum and one for the maximum limit. The set values are 0.35 to 3.2 metres.
Following the filtering and pre-conditioning of the cloud the next step is to remove the planes. Extraction of planes helps create clusters of the objects on a table without the points representing the table itself.
Figure 28 Left: 2D snapshot of the scene in the pointcloud (right). In green colour are the points that belong on a plane, in blue are the clustered possible objects, yellow is the detected object (algorithm was set to sphere recognition) and in red is the current cluster that is in the scope and being scanned.
Plane extraction works by checking if the XYZ coordinates of a set of points are on a plane. The parameters such as how big the plane should be to be extracted and other tolerances are set to achieve best results for distances of one to five meters.
Distance Threshold – The distance of a point that can be still considered a plane Set to 0.06 metre.
Max iterations – The amount of iterations that the loop will execute to scan for planes. Set to 60.
Loop stop – Loop stops when the 30% of the cloud remains.
Extraction of the planes is a necessary step to isolate the points representing the object from the points representing the floor.
Euclidean cluster extraction
Cluster segmentation is used as the last step to get a point cloud that represents a possible object. Before this step, all the possible objects in the scene are all in one pointcloud. Our goal is to isolate each cluster into its own set of points.
The parameters related to the algorithm are:
Cluster Tolerance – A distance specified in centimetres that two points can be far but still regarded as the same object. Set to 0.04 metres.
Min Cluster Size – Minimum cluster size in points. Set to 40.
Max Cluster Size – Maximum cluster size in points. Set to 600.
Viewpoint feature histogram (VFH) signature extraction
After the cluster is separated from the scene and looks partially like the images below, it is passed through a viewpoint feature histogram extraction algorithm. The algorithm produces a histogram of 308 bins that its rotation and scale invariant and represents the object’s curvature characteristics.
Figure 29 A pointcloud representing a mug. This pointcloud was rendered complete by combining many pointclouds from more than one shots and angles. 
The operation of the VFH algorithm relies on the surface normals (surface curvature) properties of the object. The angles of the surface translated to the centre of the set of points are plotted as a histogram.
Figure 30 Angles of the normals translated to the centroid of the object
Figure 31 Viewpoint histogram of a mug
The only parameter for VFH is the neighbour search radius that is set to 0.05 metres.
Object pose filtering (weights)
Once the pose of the object is published by the recognition node, it has to be filtered. The emitted pose is raw and is subject to hysteresis when the robot oscillates between two objects. To accommodate this we pass the pose through a low pass filter that uses weights on each possible object. Assigning weights to the possible object coordinates helps build a belief system that is immune to noise but can still deal with similar objects.
Object placement in global map
The objects that match are also translated in the global map by multiplying with the odometry matrix of the robot. Dealing with the object coordinates in the global map helps take the issue of relative and absolute location out of the table. Once we see a local object relative to the robot we first translate it to the global frame and then compare if an object was seen there before.
Figure 32 Placing objects on the global map. On the right there are green and red markers. The colour indicates the belief of the robot and it’s related to the weight allocated to the object marker
Point count degrades with distance
Although the steps of processing a pointcloud from the start to finish are parameterized, there are some factors that limit the range of performance of the algorithms. At first there are the hardware limits of the sensor. The minimum range is 0.4 metres and 10 metres maximum. However above 4 meters the output becomes noisier. The next and biggest limitation is that, even though the structured light of the sensor can reach 10 meters, it does not have adequate resolution of points to represent an object at that distance. An average object (10cx20cm) at 0.4 metres can have 250-400 points that represent it but above 2 meters that number drops to usually 50 points.
The direct effect of the distance causes the algorithm’s performance to drop significantly at distances greater 2-3 metres. This also adds to the fact that the parameters of the algorithms such as plane and cluster extraction have to accept a variety of object sizes. Accepting sizes for objects below a certain threshold to accommodate this will mostly result in noisier data since most of the noise comes as small sets of random points.
The number of different parameters is also a tedious process where every parameter of each stage has the potential of breaking the next processing stage.
The most important parameters of each stage are listed below.
-Voxel - Leaf size
-Pass-through - Filter limits (min & max)
Min Cluster Size
Max Cluster Size
Viewpoint Feature Histogram
Neighbour search radius
In case of a change in the Voxel’s filter leaf size, the point-cloud might not go through the next stages due to the minimum cluster size. Changing the minimum cluster size would allow objects with fewer points (therefore less information) to be regarded as potential objects. This will make the algorithm pick up more noise. This shows how optimising for speed would induce more noise in the system.
The path planning functionality was added to enhance the interaction between the algorithms and the robot platform itself. The previous two developed algorithms, mapping and object recognition, could easily be self-contained in a terminal without the need of a robot. The aim of this added functionality is to have the algorithm command the robot to move on a path, calculated from the map that in turn was generated by the mapper node. Path planning was not included in the initial objective list, in the beginning of the project but was later on added to enhance the algorithms-robot interaction. The selected way to solve the path planning problem was using the flood fill algorithm. To realise the algorithms, two nodes were developed, one to calculate the path using the map and the second to drive the robot.
Figure 33 Block diagrams of the path and mission planner nodes
Flood fill works on maps, mazes and generally any form of data that can be chopped and organised into cells. The map of the environment is itself split into 1cm2 cells. This means that a 10x10 meter map will have one million cells. Iterating through data of this magnitude has a severe effect on responsiveness and robot’s ability to continuously generate a path on the fly.
Since the map obstacles are represented with numbers (1-100), the algorithm floods the cells that are zero (not occupied) and minus one (not visited). There is no need to differentiate between an explored and a not explored cell because the robot’s strategy is to plan like there are no walls until seen.
Figure 34 How the map is stored in memory. Each 1cm2 of the environment is an array element.
The target is first specified to the robot by the user. The goal pose is published in ROS under the topic “/goal_pose”. It is then received by the mapper along with the robot’s current position from the “/tf” topic. The flooding routine is started, once the start and goal positions are received. The algorithm starting from the goal cell increments each neighbouring cell by one. The process gets halted if the start cell is reached or if a regional checkpoint is reached. A checkpoint is placed every metre that is scanned, without a solution. At first the algorithm starts with one meter radius checking if a solution exists and then resets and starts again but at two meters. The reason for this is to avoid searching and iterating through a radius bigger than needed. Iterating through unnecessary radii is very costly causing each of the iterations to slow down within the loop resulting in an exponential increase in duration.
Figure 35 Cell increment away from the goal
Figure 36 The tracing of the path generated acts like water flowing through the path of least resistance
Once all the cells are flooded and allocated a number, tracing and generating the path becomes a simple procedure. The tracer would just follow the neighbour with the least number, just as water would flow through the path of least resistance.
While the planning algorithm produces a valid path when given a map, it does not take into account the robot dimensions. This results into potential collisions by driving very close to walls and obstacles. In order to provide enough clearance for the robot a simple technique called obstacle inflation is used. Before the flood fill algorithm begins, the obstacle inflation is called. The routine gets the position of the robot and uses that to inflate only what’s needed to reach the goal. Again selective processing techniques are used to minimise processing times.
Figure 37 Dynamic obstacle inflation is applied only in the region around the robot
Path tracing by the robot mission planner
After the path is ready it is published under the name “/goal_path” of type <nav_msgs::Path>. The mission planner subscribes to the path message which is a collection of pathway points and a timestamp. The robot mission node, as shown in Figure 33, scans the path message and finds which point is closer to the robot. From that point and on, the robot is commanded to reach the next point in the path. The resolution of the path array is very fine that multiple points fall within the robot’s footprint. In this case the algorithm sets the local target goal to a few points ahead instead of to the absolute next path point. Since the robot is of a differential drive type, simple trigonometry is used to keep the robot on track along the path way-points.
Processing speed of the algorithm is key to achieve real-time performance. The algorithm must conclude with a path fast enough so that the robot can have the green light to continue and preferably without stopping. A combination of a hybrid tree structure was developed to make things faster. The algorithm starts exploring a solution from point A to B. It first sets the width and height to plus/minus one meter. One metre with a resolution of one centimetre squared results in an array of 100x100. For the complete path generation the array is also multiplied once more with its size. The total of average cycles for one metre is 100x100x100 resulting in one million iterations roughly.
If a solution is not found then the algorithm increases the range of its search radius. A two metre radius would result in eight million iterations which clearly show that it is more efficient to start with a minimal radius and the increment once at a time.
With this optimisation the path generation takes from less than a second up to ten seconds compared to the previous version that always needed ten seconds.
The microcontroller software is responsible for a number of tasks. Once booted the microcontroller initialises the driver along with the rest of the hardware. The first two seconds of boot-up the microcontroller also waits for a button press. If pressed it allows the robot to go into headless mode allowing for any mode to be selected using the push button. Next it listens for a serial connection with the laptop while looping through the rest tasks. Connected or not the robot will execute the same tasks, the same way every time. If there is no connection between the laptop or the user did not specify then the hard-coded defaults are be used.
The main tasks of the microcontroller are:
• Establish connection with the laptop
• Read current sensors
• Read R/C signals
• Subscribe via callbacks to all the topics starting with “/robot/* ” if connection is made
• Publish all data in ROS via serial if connection is made
• Cap the motor values to the specified value
• Update the motor values
A block diagram representation of the microcontroller interactions is shown below.
Figure 47 Abstract level flowchart, shows the topics that are interconnected with the robot microcontroller.