Waymo I-Pace [1].

ENGINEERING

Alphabet’s Waymo: A Technical Study on Autonomous Vehicle Tech

A literature research paper I wrote at Imperial College, covering a high-level overview of autonomous vehicles and an in-depth analysis of Waymo technologies, dated 13 December 2019.


Abstract

A study is conducted on the existing systems implemented by Waymo in their driverless cars. Due to the dearth of published material on autonomous technology developed by Waymo, it should be noted that a large proportion of technical information is derived from patents — where provided details are often ambiguous to safeguard intellectual property.

The three key sensor types used by Waymo are Light Detection and Ranging (LiDAR) systems, Radio Detection and Ranging (RADAR) systems and image sensors. Each sensor type has individual strengths and weaknesses, and the three work to complement one another: image sensors facilitate object recognition but have limited application in determining distance. LiDAR systems provide accurate distance ranging but are affected by environmental objects and meteorological conditions. RADAR systems are relatively less accurate than LiDAR but work well in the presence of noise generated by rain or snow.

Autonomous vehicles are fundamentally built on neural networks, each of which performs a specific function. Networks required are categorised according to their function — such as identification, prediction or control based on contextual information. A key method used by Waymo to streamline decision making when interpreting the environment is semantic segmentation. A segmentation network clusters objects of the same type to reduce computational load on decision-making algorithms. Another crucial necessity of a neural network is its ability to improve, which is effectively shown through population-based training and neural architecture search methods. The former involves an evolutionary-based training environment where under-performing networks are eliminated and replaced with better-performing ones, while the latter creates neural networks by generating building blocks with automated machine learning.

It is concluded that non-trivial situations where Waymo’s autonomous vehicles cannot perform adequately still exist. These situations can be caused by external interference, such as bad weather, reflective surfaces or sabotage. Satisfactory performance of an autonomous vehicle in isolation is just the first step towards integration; designing operational frameworks, reaching agreements with regulators and gaining the confidence of society are necessary for the widespread implementation of driverless vehicles.

1 Introduction

Although the conceptualisation of Autonomous Vehicles (AVs) can be traced back to the 1940s [2], it was not until the early 2000s that the Defence Advanced Research Projects Agency (DARPA) Grand and Urban Challenges instigated the embodiment of AVs. The competition galvanised brilliant individuals to assemble teams and tackle a problem deemed impossible for its complexity. The participants of the DARPA challenges laid the cornerstone for autonomous vehicles, and many proceeded to engage in leading projects [3]. Despite initial doubt on the feasibility of AVs within the 21st century, technological breakthroughs in software and hardware have driven many companies to increase expenditure and advance their position in the industry. Consequently, a full study of the technological demands of AVs is critical for remaining relevant.

Of the six levels of vehicle autonomy defined by the Society of Automotive Engineers (SAE), this paper is focused on the highest states — Levels 4 and 5. Studies have shown that semi-autonomous vehicles and Advanced Driver Assist Systems (ADAS) have less practical appeal, engendering higher risks as drivers are coaxed into inattention. Only with full autonomy do the economic and safety benefits become compelling enough for commercialisation [3].

Due to the sheer breadth of the AV industry, an attempt to analyse all associated members within a paper of this length would lead to inadequate detail. Consequently, the following sections display a deliberate bias towards Waymo. Although parochial in scope, limiting discussions to a single company enables fastidious analysis of the key aspects which are critical for comprehension. Moreover, as the next section will show, Waymo remains eminent amongst its competitors — mitigating the shortcomings associated with neglecting other companies.

Consequently, the five key objectives this paper aims to address are:

  1. Reporting the current state of Waymo in the market,
  2. Identifying the driving and resisting factors affecting consumer demand for AVs,
  3. Analysing the types of hardware used in Waymo vehicles,
  4. Analysing the software capabilities of Waymo vehicles, and
  5. Discussing technological advances required.

2 Present State of Waymo

Developing AVs requires copious financing due to the integration of cutting-edge technology; only industry leaders with abundant cash reserves are capable of making the leap towards AVs. Consequently, the organisations engaged in autonomy can be shortlisted from the numerous automotive brand names. In the first quarter of 2019, Navigant Research conducted a study on companies developing AVs, and produced a ranked leader-board grid based on execution and strategy (see Figure 1), with Waymo named as the leading company [4].

Figure 1: Navigant Research Leaderboard - Automated Driving Vehicles [4]

Waymo has been developing AV technology since 2009, seeking to build the world’s most experienced driver [5]. As of September 2019, the team had attained 16 million kilometres of autonomous driving on public roads, and 16 billion kilometres in simulation. Their driverless system has undergone five major iterations: from the driverless car project, Autopilot, Firefly and the Pacifica minivan, to the upcoming I-PACE [5]. The I-PACE is planned for launch in 2019 within Arizona, through a collaboration with Jaguar Land Rover. Moreover, Waymo has entered a partnership with Lyft [6] and is looking to expand globally into Japan and France through partnerships with Nissan and Renault respectively [7]. In December 2018, Waymo gained the first-mover advantage by releasing the Pacifica minivan, a fully functional SAE Level 4 AV accessible through the Waymo One ride-sharing mobile application [8]. A common industry measure of performance is the number of miles the AV can autonomously navigate without the need for a driver to resume control. In 2018 alone, Waymo built a significant lead with a record of over eleven thousand miles per disengagement [9]; more than twice the number reported by General Motors (GM) Cruise, and orders of magnitude above those of other companies [10].

The driving force behind AV investments can be attributed to three key benefits:

  • Consumer benefits: The consumer benefits associated with AVs are numerous; waiting for parking becomes irrelevant, time is freed during the commute, and vehicle interiors can be configured for other purposes. Accenture showed in 2019 that nearly half the drivers in America, Europe and China were amenable to replacing their vehicles with autonomous solutions [11]. The same study also showed that respondents were willing to relocate outside cities if their daily commute was facilitated by driverless vehicles. Additionally, AVs can ease transportation for the one-fifth of adults who are incapable of driving due to eyesight deterioration or other impairments [11].
  • Environmental benefits: Automotive vehicles are extremely energy inefficient, which is exacerbated by a decrease in the number of passengers. For a typical vehicle, over 70% of the energy obtained is used to power auxiliary components, while less than 30% is utilised to move the vehicle. Furthermore, only 5% of this is used to move the individual passengers, which is equivalent to 1.5% of the total energy used per passenger [3]. Implementing AVs opens possibilities for different seat configurations, and with more passengers come increased fuel efficiency and reduced overall emissions.
  • Safety: Statistics show that, across the world, over 50 million people have been involved in accidents resulting in injury or fatality. In 90% of these accidents, human error was the leading cause [11].

These factors, along with the potential of a largely untapped market, have accelerated research and development of the intricate systems which enable self-driving cars.

3 Research & Development

One aspect of AVs that this section will avoid discussing is the method of power generation and delivery; whether internal combustion engines or electric power is advantageous for autonomous vehicles. Studies have shown that electric vehicles not only reduce emissions but are more space-efficient. This provides the necessary space for hardware to be installed and simplifies feedback and control — hence the widespread use of electric vehicles for autonomy [3]. This section is composed of two main categories: hardware, which describes the necessary sensors, processing equipment, and embedded control required for operation, and software, which engenders intelligence and decision making.

3.1 Hardware

The three key categories of hardware are sensor, computing and embedded control systems. Sensors retrieve information about the environment and surroundings, which is interpreted and processed by the computing system for decision making. The ensuing information is interpreted by the embedded control system which executes steering, braking and throttling [12].

3.1.1 Sensors

Figure 2a: Top view of Waymo sensors [6, 12, 13].

As shown in Figure 2a, Waymo uses three types of sensors; image, LiDAR, and RADAR [6]. The produced analogue signals undergo Analogue-to-Digital Conversions (ADCs), achieved with a variety of comparators. Depending on the application, differential, single-ended, clocked, and non-clocked comparators are implemented by using different transistor layouts [12]. Complementary information from the different sensor types is subsequently stitched by sensor fusion (see Section 3.2.1) for a comprehensive data set, as depicted in Figure 2b.

Figure 2b: AV’s interpretation of sensor data [14].

3.1.1.1 Vision System

The 360° vision system comprises nine vision modules, each equipped with multiple image sensors. Each module has a wide dynamic range for adaptability in different lighting, and the forward-facing module has an exceptionally high resolution for detecting small objects from afar [6]. Each image sensor includes an array of Charge-Coupled Devices (CCDs) and other light sensors which generate current when exposed to light to produce an array of pixels [15]. In the ADC process, Single Slope Analogue-to-Digital Converters (SS-ADCs) are connected to each pixel column, enabling conversion in parallel. SS-ADCs are composed of integrators to generate a ramp reference voltage, comparators to digitise the analogue signal, and counters to scale digital output accordingly. The counter increments while the reference voltage is less than the CCD voltage, and resets when the reference and CCD voltages are equivalent [12]. As with all cameras, settings relating to exposure time and gain have to be adapted to environmental conditions to obtain favourable images. Overexposure can lead to colour saturation, distortion or white patches. Depending on ambient light and other light sources, exposure time is typically set on the order of milliseconds. This is problematic since the majority of illuminated objects are energised by a power grid operating at 60 Hertz or by Pulse Width Modulation, increasing the likelihood that the produced image is missing information. Exposure time has to be increased such that it covers the period of illumination from light sources — such as traffic lights and informational road signs — while preventing overexposure. To achieve this, Waymo uses neutral density (ND) optical filters that cut light significantly, allowing exposure time to be increased. The captured image is also corroborated with separate camera sensors that capture sequential images for comparison and syncing of missing information [16].
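To make the counting behaviour concrete, the following Python sketch simulates a single-slope conversion for a column of pixel voltages. It is an illustrative model only; the bit depth and reference voltage are assumptions, not details of Waymo's hardware.

```python
# Illustrative model of a single-slope ADC: a counter increments while a
# linear voltage ramp remains below the pixel (CCD) voltage, and the final
# count is the digital code. One converter per pixel column runs in parallel.

def single_slope_adc(pixel_voltage: float, v_ref_max: float = 1.0,
                     n_bits: int = 10) -> int:
    """Return the digital code for one pixel voltage (assumed 0..v_ref_max)."""
    n_steps = 2 ** n_bits
    ramp_step = v_ref_max / n_steps      # ramp increment per clock tick
    counter, ramp = 0, 0.0
    # The counter increments while the ramp reference is below the CCD
    # voltage; both reset before the next conversion begins.
    while ramp < pixel_voltage and counter < n_steps - 1:
        counter += 1
        ramp += ramp_step
    return counter

column_voltages = [0.12, 0.55, 0.91]      # one value per pixel column
codes = [single_slope_adc(v) for v in column_voltages]
print(codes)                              # codes scale linearly with voltage
```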

3.1.1.2 Radio Detection and Ranging (RADAR) Systems

RADAR is an active ranging method that estimates distance by time-of-flight; the distance to radio-reflective features is determined from the time delay between the emitted and reflected radio signal [17]. Waymo has developed a RADAR system that continuously tracks objects moving at different speeds all around the vehicle, and maintains high effectiveness in inclement weather [6]. The RADAR signals are emitted sequentially in time, causing each reflection signal to be received when the vehicle is in a different location. This method is known as Synthetic Aperture Radar (SAR) processing, where interrogation occurs from different angles, producing information with higher resolution than traditional RADAR. The data is augmented with existing map information, enabling the vehicle to identify landmarks and locate itself on the map [18]. Tracking of objects in motion can also be achieved by analysing the change in frequency of the reflected signal caused by Doppler frequency shifts [17].
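The two measurements described above reduce to short formulas, sketched below in Python with illustrative numbers; the carrier frequency and delays are assumptions rather than values from Waymo's system.

```python
# Time-of-flight ranging and Doppler velocity estimation for RADAR.

C = 299_792_458.0  # speed of light, m/s

def radar_range(round_trip_delay_s: float) -> float:
    """Range from time-of-flight: the signal travels to the target and back."""
    return C * round_trip_delay_s / 2

def radial_velocity(f_emitted_hz: float, f_received_hz: float) -> float:
    """Relative radial speed from the Doppler shift of the reflection."""
    doppler_shift_hz = f_received_hz - f_emitted_hz
    return doppler_shift_hz * C / (2 * f_emitted_hz)   # positive = approaching

print(radar_range(1e-6))                       # 1 microsecond delay -> ~150 m
print(radial_velocity(77e9, 77e9 + 10_270))    # ~20 m/s at a 77 GHz carrier
```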

  • Components: RADAR systems typically consist of multiple antennas, each with the ability to transmit and receive electromagnetic waves. Waveguides are used to focus signals into narrow beams for higher spatial resolution and restrict receiving of signals from particular directions — thereby producing directional information [18]. The combination of range and directional information enable environmental features to be modelled [17].
  • Operating Modes: An adaptive algorithm is used by Waymo to configure the RADAR system’s operating modes. One configuration is used to perform a more detailed scan of an object. When the first reflected signal is processed by the algorithm and object properties cannot be identified, the algorithm determines its range and prompts the RADAR system for a secondary interrogation. The subsequent reflected signal is processed and components outside the range of interest are attenuated to only provide target relevant information (see Figure 3a). The RADAR system can also emit a time-varying frequency ramp and derive range from frequency deviations between the emitted and reflected signal [17].
  • Additional Features: Since RADAR detection is limited by the emitter and detector line of sight, Waymo is in the process of developing a vehicle-mounted RADAR deflector. This enhances transmission of signals into lanes where the vehicle may turn, or provides information about the environment when it is obscured by objects (see Figure 3b). The concept consists of emitting a radio signal from the main antenna on top to a RADAR deflector mounted at the front of the vehicle. The RADAR deflector can be steered to deflect a portion of the transmitted signal in the desired direction, with a beam not overlapping the original signal [20].
Figure 3a: Adaptive algorithm for RADAR interrogation [19].
Figure 3b: Vehicle-mounted RADAR deflectors [20].

3.1.1.3 Light Detection and Ranging (LiDAR) Systems

LiDAR enables accurate detection of three-dimensional shapes, allowing the car to differentiate a picture of a person from an actual person [6]. Ranging is accomplished using the same principles as RADAR, but with light waves instead of radio waves [21]. A typical LiDAR system setup is shown in Figure 4a.

Figure 4a: Cross-sectional view of LiDAR [21].

As with image sensors, LiDAR utilises SS-ADCs to convert reflected light pulses into digital data points with varying intensities. However, magnitude is not measured by light intensity, but by the time it takes for the light pulse to return to the sensor. As such, comparators are focused on capturing the leading edge of the return pulse. For example, to achieve a resolution of 1 centimetre, the comparator has to capture the return pulse within 67 picoseconds [12]. The converted data is subsequently assembled, mapping all surfaces within line of sight to create a point cloud as depicted in Figure 4b [21].
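As a worked check of the timing figure quoted above, the required comparator timing resolution follows directly from the two-way travel of the pulse:

```python
# Range resolution of 1 cm requires timing the return pulse's leading edge
# to within t = 2 * d / c, since the pulse travels out and back.

C = 299_792_458.0              # speed of light, m/s
resolution_m = 0.01            # desired range resolution: 1 cm
t_s = 2 * resolution_m / C     # required timing resolution in seconds
print(f"{t_s * 1e12:.1f} ps")  # ~66.7 ps, matching the ~67 ps quoted above
```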

Figure 4b: Point cloud render on a highway [22].
  • Components: The optical system includes multiple laser light sources used to generate coherent and incoherent light pulses [23], which can be combined in sets to rapidly scan a scene [21]. Incoherent pulses are generated by a seed and pump laser in a Master Oscillator Power Amplifier (MOPA) configuration to produce short pulses (2 to 4 nanoseconds) at high power (average of 15 Watts) and a wavelength of 1,500 nanometres over short coherence lengths. Coherent pulses are generated with a pump laser and combined with signals from a local oscillator to produce longer pulses (more than 8 milliseconds) at low power (average of 1 Watt) over longer coherence lengths. To provide a coherence length on the order of 100 metres, the local oscillator needs to be characterised by a high quality factor. This can only be achieved with a Vertical Cavity Surface Emitting Laser (VCSEL), an external cavity Distributed Feedback (DFB) laser, a whispering gallery mode oscillator or a telecommunication-grade diode laser [23]. The receiving system includes arrays of Single Photon Avalanche Detectors (SPADs), Read Out Integrated Circuits (ROICs) and Application Specific Integrated Circuits (ASICs). A SPAD operates on the principle of a p-n junction with a strong reverse bias, such that a single charge carrier generated by the photoelectric effect can trigger reverse current flow when injected into the depletion layer [24]. The ROICs and ASICs consist of operational amplifiers, sample and hold circuits, comparators and filters, and are configured to support the SPADs by amplifying and filtering the signals to produce a suitable signal [23]. The coherent signals are mixed with a local oscillator to generate a beat frequency, which is detected using a Phase-Locked Loop (PLL) or lock-in amplifier. It is then converted with an ADC and its waveform is detected digitally [23]. By placing the emitters and receivers in different orientations, 3D maps can be produced from combined orientation and distance information [21]. However, cost and power efficiency drop with an increase in emitters and sensors, even though installation at regularly spaced intervals can provide better resolutions for a given Field of View (FOV) [25].
Figure 5a: Component layout in rotating housing for LiDAR [26].
  • LiDAR and Component Layout: Waymo uses three types of LiDAR: long-, medium- and short-range. The medium- and long-range LiDAR, situated in the rotational housing on top of the vehicle, each provide an uninterrupted surround view and zooming capability for small objects over 180 metres away [6]. The medium-range LiDAR is situated at the upper half of the assembly (see Figure 5a), allowing it to have a better view of objects close to the vehicle [26]. Short-range LiDAR systems are located around the vehicle, scanning FOVs which are not covered by the other two LiDAR types [26]. Threshold distances are set to determine whether data from the short-range, medium-range or long-range LiDAR is used when obtaining target information [27]. Emitter-receiver configuration plays an important role in the resolution of the point cloud. For example, an array of emitter-receiver pairs set up to have a wide FOV may produce a wider scope, but suffers from lower resolution at longer ranges [28]. Consequently, emitter-receiver layouts in all three LiDAR types are concentrated on scanning a specific set of angles [25]. The emitters are installed within a transmitter block, with multiple facets directing each emitter along different elevation angles as shown in Figure 6a. Angular spacing between beams can be varied to achieve the desired spacing at a target range — lower angular spacing produces smaller spatial separation between data points, therefore increasing resolution at long ranges. However, this is physically limited by the die used to mould the facets and the substrate properties [25]. Each transmitter block emits a horizontally thin but vertically tall beamwidth, which is steered to provide a FOV [28]. As compared to a wide FOV which provides more information but lower angular resolution, scanning with a thin beam preserves angular resolution while allowing FOV to be controlled by the steering actuator. In the presence of reflective and less-reflective objects in close horizontal proximity, narrow beams also reduce horizontal interference to ensure they are perceptible. Furthermore, designing a LiDAR in this configuration reduces manufacturing costs and power consumption as compared to an array of emitter-receiver pairs in different orientations [28]. Steering of beams is achieved differently depending on the LiDAR type; short-range LiDARs use oscillating slats, while medium- and long-range LiDARs are installed in a rotating housing.
Figure 5b: General LiDAR with multiple receivers for different FOVs [28].
  • Long and Medium Range LiDAR (Rotating Housing): Both LiDARs share a transmitter consisting of a high power fibre laser that produces light with wavelengths between 1525 nanometres and 1565 nanometres in the infra-red spectrum. The rays are passed through an optical diffuser to spread the beam along the vertical axis between +7° and -18° from the horizontal plane [28]. As shown in Figure 5b, there are two sets of receivers to detect light from two different FOVs with different resolutions. The first set of receivers are designed to detect light from long distances at a higher resolution and contains an optical lens configuration to only receive light between +7° and -7° from the horizontal plane. This is complemented by a second set of receivers which are designed to detect light from closer objects at a lower resolution with an optical lens configuration to only receive light between -7° and -18° from the horizontal plane. The second set of emitter-receiver pairs are positioned substantially higher than the first set, allowing it to maintain a tall vertical spread without reflecting off the vehicle roof [28]. The two sets of receivers rotate independently to achieve different scanning rates, enabling quick detection of object changes in medium-range while maintaining high resolution for objects at long-range [28].
  • Short Range LiDAR (Vehicle Perimeter): For the fixed, non-rotating short-range LiDAR systems, a wide FOV is achieved by introducing oscillating reflectors for beam steering. Instead of a large reflective surface, the system uses mirrors smaller than the cross-section of a single light pulse, closely organised to jointly reflect the incident beam. This allows the mirrors to be electromagnetically driven at high frequencies of up to 5 kilohertz without significant deformation due to inertial moments. Each mirror is oscillated in phase through a ±1° range, to ensure that the resultant light beams are reflected in the same direction. To prevent restorative bias from torsional flex of the mirror supports, feedback sensors are included for positional control [29].
  • Operating Modes: To determine the range and relative speed of objects, coherent and incoherent light pulses are generated in an interleaved manner, with one coherent pulse for every ten incoherent pulses. Incoherent signals are ranging pulses for measuring distance, typical of conventional LiDAR systems. On the other hand, coherent (heterodyne) detection involves non-linear mixing of un-modulated reflected signals with modulated signals produced by a local oscillator. Mixing occurs with a square-law detector, which produces superimposed signals with frequencies equivalent to the square of the sum of the modulated and un-modulated signals. Local oscillator and reflected coherent signals are non-linearly combined at the SPAD array by homodyne or heterodyne mixing to produce a beat frequency signal. The beat frequency signal allows object velocity to be determined by comparison with previous data and analysing the Doppler effect. For example, an object travelling at 0.1 metres per second produces a beat frequency signal of 125 kHz, while an object travelling at 20 metres per second produces a frequency of 40 MHz. Ranging can also be determined from coherent pulses by adjusting the wavelength of the local oscillator to produce a chirped signal and detecting the temporal position upon its return [23]. A balance between refresh rate and resolution has to be achieved to provide sufficient information for the vehicle. A higher refresh rate or shorter scanning duration may provide a quicker response to changes but at the cost of angular resolution — and vice versa. Depending on the distance, the refresh rate can be adjusted to only obtain necessary information [28]. Based on initial point cloud data, the neural network may not be able to identify certain objects due to low spatial resolution. Waymo utilises a system that can initiate the LiDAR to interrogate specific portions of the environment with a higher angular resolution for further detail. In practice, the angular resolution of LiDAR systems is affected by the effective solid angle of each light pulse and the angular separation between each measurement point. The angle can be minimised for higher resolution by reducing beam divergence with optical lenses but is also affected by atmospheric scattering and diffraction after reflection. On the other hand, decreasing angular separation between points by prolonging scanning duration can improve resolution, but slows down detection of changing objects. Alternatively, diverging the solid angle of light pulses to match angular separation enables maintaining a high refresh rate without missing information located between reflection points. However, this results in less precise data which may prevent the onboard neural network from identifying the object. To resolve the issue without compromise, Waymo has designed a control system to independently operate laser pulse and beam slew rates. This allows the vehicle to obtain enhanced scans of specific regions dynamically when and where required. This operation can be performed on edges of moving or distant objects in the point cloud which provide inadequate information in the baseline scan. The system proceeds to increase the pulse rate of the long-range or short-range LiDAR for a higher resolution scan, which is coupled with the initial scan for detail. Alternatively, the increased angular resolution can be obtained by performing a non-uniform scan with an increased pulse rate at the specified region alone [22].
The detailed scan is performed when the network detects moving objects, missing features, or features with high spatial or temporal frequency in the point cloud data (see the white space on the left of the point cloud in Figure 4b). To facilitate LiDAR operations while maintaining a realistic power consumption rate, the beam spacing, timing and power also need to be controlled based on use. The pulse emission schedule can also be adjusted based on the required resolution and maximum detection distance instead of being operated uniformly across all emitters. In this manner, the emitter duty cycle and reflection wait time can be cut down for scanning of nearby objects. This shorter listening window at each particular angle can be optimised for a faster scene refresh rate, or for increasing angular resolution in the same time. Shot power can be customised based on emitter orientation — close-range pulses emitted downwards require less power than long-range pulses. For example, if the minimum number of photons needed to resolve a given feature is proportional to the square of its distance, beams that only travel half the distance require a quarter of the power [25].
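Two of the relations above lend themselves to a short numerical sketch: the Doppler beat frequency of a coherent return, and the quoted distance-squared scaling of shot power. The wavelength below is an assumption (a value near the 1,500-nanometre band mentioned earlier), so the outputs only approximate the example figures quoted in the text.

```python
# Doppler beat frequency of a coherent LiDAR return (f_beat = 2*v/wavelength)
# and relative shot power under the quoted distance-squared photon budget.

def beat_frequency_hz(radial_speed_ms: float,
                      wavelength_m: float = 1.55e-6) -> float:
    """Beat frequency from mixing the return with the local oscillator."""
    return 2 * radial_speed_ms / wavelength_m

def relative_shot_power(distance_m: float, reference_m: float) -> float:
    """If the photons needed scale with distance squared, so does shot power."""
    return (distance_m / reference_m) ** 2

print(f"{beat_frequency_hz(0.1) / 1e3:.0f} kHz")   # slow object: ~129 kHz
print(f"{beat_frequency_hz(20.0) / 1e6:.1f} MHz")  # fast object: ~25.8 MHz
print(relative_shot_power(100, 200))               # half the range: 0.25x power
```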

3.1.1.4 Other Sensor Considerations

Clearly, for the sensors to function as desired, design considerations for operation in different environmental conditions must be accounted for. Ambient factors such as temperature, humidity and particulate concentration can have adverse effects on data produced. Waymo’s approach includes a passive superhydrophobic coating, and an active sensor housing regulation system.

Figure 6a: LiDAR transmitter block [25].
Figure 6b: Superhydrophobic coating [30].
  • Superhydrophobic Coating: The coating creates a water contact angle of at least 130° and can be easily applied to keep sensor housings clean. The coating is optically transparent, transmitting at least 90% of incident light with wavelengths between 300 nanometres and 1,500 nanometres, and remains well-bonded to the surface, preventing removal by rubbing or environmental conditions. As shown in Figure 6b, the coating is composed of:
  1. Hydrophobic Fluorinated Solvent: Ensures homogeneous dispersion of particles to prevent clouding of coating. Suitable solvents include fluorinated or perfluorinated alkanes, cycloalkanes, heterocycloalkanes, trialkylamine or a combination of them. Crosslinking agents (with at least one silicon atom) may be added to the solvent to decrease the required binder amount down to 0.3 wt. % of composition [30].
  2. Hydrophobic Fluorinated Polymer Binder: The binder dissolves in the solvent and is used to adhere hydrophobic particles to the substrate surface. Suitable binders include fluoroalkyl, fluoroalkoxy, perfluoroalkyl and perfluoroalkoxy polymers, or a combination of them. The addition of binder needs to be controlled, as an incorrect amount can affect optical clarity. Furthermore, if too much binder is used, the hydrophobic particles get subsumed by the binder, rendering nano-texturing and hydrophobic properties useless. If too little binder is employed, the bonding between nano-particles and substrate becomes ineffective and susceptible to removal [30].
  3. Hydrophobic Particles: Two types of particles are used to increase the hydrophobic properties of the coating; hydrophobic fumed silica (average size 200 nanometres) and aerogel nanoparticles (average size 100 nanometres). On surfaces with aerogel nanoparticles used as the sole method of water repellency, the coating loses its hydrophobic properties under a small amount of shear force. Fumed silica nanoparticles have higher durability and bond well to glass, protecting the smaller aerogel nanoparticles nestled in-between [30].
  • Active Humidity Control: For humidity control, a flexible external chamber and the housing interior are connected through a cooling element by a conduit. The flexible chamber regulates pressure within the housing; when there is a decrease of pressure within the housing, gas flows from the chamber to the housing, and vice versa. The cooling element and an additional desiccant, which reduces excess moisture, complement pressure regulation to provide psychrometric control. Excess moisture is removed from the cooling element using an airlock system, and excess pressure which cannot be accommodated by the flexible chamber is relieved with a Gore vent. To clear any external accumulation of rain, snow or dust, a nozzle is used to spray a heated, pressurised fluid stream on the housing. This eliminates the noise and vibrations generated by mechanical wipers, which may affect sensitive components in the sensors [31].

3.1.2 Computing System

3.1.2.1 Processor

The use of high-resolution sensor data and deep neural networks creates a need for high-performance processing units, such as Graphics Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs) and custom ASICs. Special types of deep-net silicon are required for manufacturing integrated circuits, which incorporate a variety of transistors on the scale of 14 nanometres to 16 nanometres. Although these have proven their functionality, continued scaling of nodes to 7 nanometres forecasts a power reduction of up to 65%. Considering the consumption concerns associated with processing-intensive systems, the development of smaller transistor configurations will provide substantial improvements for autonomous technology. Selecting suitable accelerators is also necessary to offload computationally heavy tasks while balancing power consumption, cost and programming accessibility [12].

3.1.2.2 Non-Transitory Memory

Memory is used to store the working set for the processor to access and execute instructions. In AVs, the working set comprises at least a few frames from each sensor, an active region of driving maps, and deep neural networks with their parameter schedules. This creates a substantial memory requirement on the order of tens of gigabytes. Additionally, the data must be accessible at high bandwidth — a requirement too large for Static Random Access Memory (SRAM). This necessitates high-performance Dynamic Random Access Memory (DRAM), which comes with high financial expense, power consumption and integration difficulty [12].

3.1.3 Embedded Control

Apart from embedded systems used to control steering, throttling, braking and auxiliary systems, Waymo implements a control system to tailor power emission to specific sensor systems for reduced power consumption. The control system accounts for two competing sensor characteristics: range and FOV. Configuring the sensor systems to achieve both characteristics spells high power consumption and heat generation, which can produce undesirable effects. These two characteristics have varying importance based on the situation — range is important when there is a large difference in velocity between the vehicle and other objects, while FOV is important for detecting peripheral objects. To take advantage of this, Waymo dynamically allocates high power in directions where range is required. For example, when travelling at high speeds, power allocation can be weighted towards the front-facing sensors while reduced at the rear. An example is shown in Figure 7a.

Figure 7a: Adjustable power emission for ranging sensors [32].

Further optimisation can be achieved by tailoring power emissions for each sensor. This creates regions of high power in the FOV for longer range, while maintaining low consumption for objects in the vehicle’s proximity. The same concept can be applied to rotating RADAR and LiDAR sensors, which change as a function of azimuth instead; achieving power reductions of up to 80% in LiDAR systems. Sensor systems with overlapping FOVs, such as the RADAR systems mounted around the car, can be adjusted to operate in an interleaved manner when there are no nearby objects which may exist outside each FOV. The required power P for achieving a range r scales as P ∝ rⁿ, where n = 2 for RADAR and n = 4 for LiDAR. Considering the power consumption for RADAR, an obvious solution is to reduce the power allocation to RADAR systems and substitute long-distance scanning in that region with LiDAR and/or cameras [32].
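The scaling law and the front/rear weighting can be illustrated in a few lines of Python; the exponent follows the relation quoted above, while the specific ranges are illustrative rather than Waymo's actual power budget.

```python
# Relative emission power needed to reach a target range, P ∝ r^n,
# with n = 2 for RADAR and n = 4 for LiDAR as quoted above.

def required_power(target_range_m: float, n: int) -> float:
    return target_range_m ** n

# At highway speed, range matters ahead of the vehicle more than behind it.
front_lidar = required_power(200, n=4)   # long-range coverage ahead
rear_lidar = required_power(50, n=4)     # short-range coverage behind
print(rear_lidar / front_lidar)          # (50/200)^4 ≈ 0.4% of the front power
```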

3.2 Driver Intelligence

The intelligence of a vehicle is achieved through an amalgam of different neural networks, each designed to complete a specific task. One of the most crucial attributes an AV must have is perception: the ability of a vehicle to interpret sensor data for identifying and tracking objects over time. Machine learning is fundamental to perception and finds application in the form of Deep Neural Networks (DNNs), which are inextricable from the design complexities of autonomy. For neural networks to perform optimally, they require a specific task to accomplish, a measure of performance to actively seek, and experience for learning. Waymo determines the performance of their neural networks by two measures: quality, referring to the accuracy of answers provided by the network, and latency or inference time, indicating how fast the network provides an answer [34]. Considering the consequences of unsatisfactory performance in AVs, high accuracy and low latency are crucial. The task flow for driverless vehicles generally follows pre-processing, identification, prediction and control.

3.2.1 Pre-processing

The main tasks in pre-processing are data cleansing and normalisation. This process identifies and corrects inaccurate or corrupted data, along with normalising the resultant data for analysis by subsequent networks [35]. Because environmental information is obtained from different sources, data formats vary between sets. For example, point clouds describe distances, while image data describe light intensity and colour. For this reason, sensor fusion is required to stitch the different information into a single data set for analysis [36]. The stitched data set is then passed on to subsequent networks, allowing the vehicle to visualise its surroundings, as depicted in Figure 2b.
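A minimal example of the stitching step is projecting LiDAR points into the camera image so each 3D return can be paired with pixel data. The pinhole camera model and intrinsic parameters below are standard textbook assumptions, not Waymo's calibration; real pipelines also handle extrinsic alignment and timing.

```python
import numpy as np

def project_points(points_xyz: np.ndarray, fx: float, fy: float,
                   cx: float, cy: float) -> np.ndarray:
    """Project Nx3 camera-frame points (metres) to Nx2 pixel coordinates."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    u = fx * x / z + cx          # perspective division by depth
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

# Two LiDAR returns, already transformed into the camera frame.
lidar_points = np.array([[1.0, 0.5, 10.0], [-2.0, 0.0, 25.0]])
pixels = project_points(lidar_points, fx=1000, fy=1000, cx=640, cy=360)
print(pixels)   # each LiDAR distance now has an image location for fusion
```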

3.2.2 Identifying

Neural networks for detecting and characterising objects are crucial for AVs to identify their surroundings, with several other neural networks to complement and augment performance.

3.2.2.1 Object Detection & Characterisation

Primarily, detecting and characterising objects is accomplished with the centre prediction neural network and the object property neural network. The networks operate sequentially under the following process:

  1. Generation of Centre Prediction Neural Network (CPNN) output map: The CPNN generates an output map containing an object score at each location (data point) with a value from 0 to 1, representing the probability of an object being centred at that location. The output map is generated under a full context approach, where all sensor data is processed at the same time to generate an object score. As opposed to a sliding window approach, where small windows of data are examined at any one time, processing all sensor data is advantageous at long ranges where objects cannot be identified without analysing the entire data set. Multiple output maps are generated, with each map only showing output scores for a specific class of objects; for example, one map for pedestrians, one for road signs, and so on. The CPNN can also generate additional output maps for augmenting data representation; stacking maps for each object type creates a 3D map, and doing this for every subsequent set generates a fourth dimension in time [28].
  2. Location of object centres on CPNN output map: To select an object centre location within the output map, a centre selection module finds regions of object scores that exceed a preset threshold, and proceeds to select the highest-scoring coordinate local to each region [28] (a minimal sketch of this step follows the list).
  3. Retrieval of features at selected locations: Detailed features can be obtained by directing the vehicle’s sensors to focus on that particular field of view. The presence of a small object in sensor data is likely to engender high, adjacent object scores in the output map. This can be resolved by homing in on the cluster of high object scores with the vehicle sensors [28].
  4. Generation of object properties by Object Property Neural Network (OPNN): From the 4D output maps [37], the OPNN derives object properties using an instance classifier which recognises and identifies object instances characterised by shape, colour or size [38].
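The centre-selection step (step 2) reduces to thresholding the output map and keeping one local maximum per region, as in the following sketch; the threshold and map values are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def select_centres(score_map: np.ndarray, threshold: float = 0.7):
    """Return one (row, col) peak per connected above-threshold region."""
    mask = score_map > threshold
    labels, n_regions = ndimage.label(mask)    # find connected regions
    centres = []
    for region in range(1, n_regions + 1):
        coords = np.argwhere(labels == region)
        best = coords[np.argmax(score_map[labels == region])]
        centres.append((int(best[0]), int(best[1])))   # regional maximum
    return centres

score_map = np.array([[0.1, 0.8, 0.9, 0.2],
                      [0.1, 0.7, 0.6, 0.1],
                      [0.0, 0.1, 0.2, 0.95]])
print(select_centres(score_map))   # [(0, 2), (2, 3)]: two object centres
```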

Other methods are also put in place to support or disprove the initial identification.

  • Information Inference: In the event that AV sensors are obstructed by a large object, such as a heavy goods vehicle, an inference system can be activated to interpret the vehicle surroundings which are not obstructed. The inference system takes in surrogate data — data which is readily available but not necessarily purposeful — and deduces the environmental setting based on predetermined scenarios. Examples of surrogate data include the state of traffic flow and the relative motion of vehicles. A common scenario where this finds utility is shown in Figure 7b. Waymo’s inference system is composed of a combination of Bayesian networks, hidden Markov models and decision trees [33].
Figure 7b: Inferring state of traffic signal based on surrogate data [33].
  • Pedestrian Detection: Waymo has implemented a supplementary method of identifying pedestrians by pinging handheld wireless devices. Wireless devices are often exchanging radio signals with wireless base stations, 802.11 channels or Bluetooth, and by transmitting a discovery signal over multiple channels, the AV can evoke a response and determine pedestrian location. The typical process for establishing communications in conventional systems involves a slave and a master device, listening and requesting for a connection respectively. They do so on individual channels, and cycle through all available channels until they reach the same frequency. To reduce connecting time, the vehicle (master device) can cycle in the opposite direction and request on multiple channels at the same time [39].
  • Construction Zone Mapping: When a group of objects such as road works signs, cones, or construction vehicles are recognised and the area they are contained in is determined to be a construction zone, Waymo’s AV logs it in an external server. This server can be accessed within the fleet network, allowing other AVs to take additional precautions when navigating through the indicated area. After the initial allocation of an area as a construction zone, any AV passing through the area can make a reassessment to classify whether the construction zone is active, inactive or removed. This is dependent on the recognition of object discrepancies, for example, changes in the positioning of lane lines, closing of previously open lanes, or new barriers [40].

3.2.2.2 Semantic Segmentation

Semantic segmentation is a method modelled after subconscious human behaviour; effective decision making is based on a multitude of considerations that may not be obvious when looking at each aspect in isolation. When making navigation decisions, it is simple for the AV to determine a safe and smooth trajectory based on a finite number of appropriate responses concerning a single object: for example, passing to the left or right of, or stopping in front of or behind, a bicycle or car. However, in the presence of multiple objects, the number of possible reactions increases exponentially, which can overload the processing unit and increase processing time. By grouping multiple objects into a cluster, the AV can treat them as a single entity and choose a safe and efficient response with less processing power and delay. An example of this is shown in Figures 8a and 8b, where semantic segmentation is performed for a group of traffic cones and cars at a traffic light respectively.

Figure 8a: Clustering of traffic cones [41].
Figure 8b: Clustering of parked cars and vehicles waiting at an intersection [41].

This also ensures that the AV exhibits courteous driving; weaving in and out of lanes between cars is technically legal and physically feasible, but is not a desirable course of action. Waymo has applied the following process for semantic segmentation:

  1. Identification of object pairs from sensor data: Objects in the environment of the AV identified by detection and characterisation methods are grouped into pairs and passed to the next stage.
  2. Computing similarity value between object pairs based on object characteristics: By taking cues from object type, locations, proximity, current and short term history of relative motion, a similarity score can be generated between pairs of objects. If the score exceeds a preset threshold based on recall and precision (similar to Region Proposal Network) of the clustering network, the pair of objects are passed on to the clustering algorithm.
  3. Clustering pairs using a union-find algorithm: Pairs whose similarity exceeds the threshold are merged, collapsing a plurality of objects in the same semantic cluster into a single object and helping to enforce a similar vehicle reaction for every object in the cluster (see the sketch below).
  4. Passing clustered objects to vehicle control: When the vehicle is deciding between two equivalently beneficial reactions, processing can be simplified by selecting the reaction which involves the semantic cluster, ensuring consistent vehicle behaviour.

A high recall/low threshold value can be chosen in rudimentary stages to increase sensitivity to clustering, but progressive fine-tuning is required to ensure that a balance is reached between under-segmentation and over-segmentation. The former is when the network recognises two distinct but nearby objects as the same, which may cause the vehicle to behave erratically. The latter is when one object is recognised as two distinct objects in different semantic clusters; although this is less prone to error, it increases computational cost. To enable rapid grouping and severing of objects, clustering can be done each time a new object is observed as the vehicle moves along, creating a simplified environment for the vehicle to navigate in real-time.
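Step 3 of the process above maps naturally onto a union-find (disjoint-set) structure: any pair whose similarity clears the threshold is merged, and the groups that remain are the semantic clusters. The following Python sketch uses illustrative scores and an illustrative threshold.

```python
def find(parent: list, i: int) -> int:
    """Follow parent links to the cluster root, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def cluster(n_objects: int, scored_pairs, threshold: float = 0.8):
    parent = list(range(n_objects))
    for i, j, similarity in scored_pairs:
        if similarity >= threshold:
            parent[find(parent, i)] = find(parent, j)   # union the clusters
    groups = {}
    for i in range(n_objects):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())

# Four traffic cones: 0-1 and 1-2 are similar enough to merge; 3 stands apart.
pairs = [(0, 1, 0.95), (1, 2, 0.90), (2, 3, 0.40)]
print(cluster(4, pairs))   # [[0, 1, 2], [3]]: one cluster plus a lone cone
```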

3.2.2.3 Rare Instance Classifier

Waymo has implemented a system that reduces the incorrect classification of similar objects which have different implications for driving decisions. This is achieved through a rare instance classifier, which generates a secondary classification score and a rarity score. The key difference between the computed object scores of the common and rare instance classifiers is the training data to which their parameters were adjusted. The rare instance classifier is trained on object types that occur rarely in the common instance classifier’s training data or have a history of erroneous classification, providing a second opinion on the initial object recognition. The rarity score represents the probability that an identified instance was incorrectly categorised by the instance classifier. Depending on the rarity value relative to a preset threshold, the classification scores are individually weighted for greater or lesser emphasis — both of which are combined to produce a final score for classification (a sketch of this weighting follows at the end of this subsection). An example where this network finds utility is shown in Figure 9, where the instance classifier recognises two different signs as the same.

Figure 9a: Rare instance classifier identifies commonly occurring object [38].
Figure 9b: Rare instance classifier identifies rarely occurring object [38].

Apart from preventing erroneous categorisation, the neural network also improves the classification capabilities of the instance classifier network, by reducing repeated incorrect identification and providing negative feedback [38].
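One plausible reading of the weighting described above is a rarity-dependent blend of the two classification scores, sketched below; the blending function and threshold are illustrative assumptions, as the source does not specify them.

```python
def combined_score(common_score: float, rare_score: float,
                   rarity: float, rarity_threshold: float = 0.5) -> float:
    """Blend the common and rare classifiers' scores, shifting weight to the
    rare classifier when the rarity score suggests the common one is wrong."""
    w_rare = rarity if rarity > rarity_threshold else 0.2 * rarity
    return (1 - w_rare) * common_score + w_rare * rare_score

# The common classifier reads a confusable sign with confidence 0.9; the rare
# instance classifier disagrees (0.2) and the rarity score is high (0.8).
print(combined_score(0.9, 0.2, rarity=0.8))   # 0.34: the initial label is doubted
```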

3.2.2.4 Region Proposal Network

A region proposal network is used to box out objects in the vehicle environment, defining a dimensional limit around the object centre to give it height, length and breadth. This allows the AV to gauge distance and manoeuvre around the object zone. For a specific object, such as a pedestrian, the network is assessed according to two measures of performance: recall, which is the proportion of pedestrians which are detected, and precision, which is the proportion of identified pedestrians which are real and not false positives [42].
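The two measures reduce to simple ratios over counted detections, as below; the counts are illustrative.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Proportion of actual pedestrians that were detected."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Proportion of detections that were real pedestrians, not false positives."""
    return true_positives / (true_positives + false_positives)

# 90 pedestrians detected, 10 missed, 5 boxes contained no pedestrian.
print(recall(90, 10))     # 0.90
print(precision(90, 5))   # ~0.947
```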

3.2.3 Predicting

3.2.3.1 Behaviour Prediction

Waymo uses a system that predicts the future trajectory of surrounding vehicles based on the orientation of their front wheels. This is highly effective for stationary cars in parking spaces or cars performing three-point turns where trajectory cannot be solely predicted from the previous heading. The process is as follows:

  1. Information from other networks: The region proposal network provides bounding box data and the object characterisation network provides information on object characteristics. If the object type is typical of one which has wheels, the network triggers the next process.
  2. Front-wheel area or volume identification: Based on data from the bounding box, the coordinates corresponding to the centre of the object’s front wheel are identified. An area or volume is then defined around the centre coordinates which corresponds to the dimensions of the wheel.
  3. Compute orientation based on wheel data points: Using LiDAR, a plurality of data points can then be obtained within the defined area or volume. A plane fitting algorithm is then used to average the orientation of the points and create a corresponding plane. The system then takes the angle between the wheel and the bounding box as the orientation, which can then be used to estimate future trajectories. It should be noted that the accuracy of data points obtained by the LiDAR is dependent on the object's orientation to the AV; low accuracy can be expected for data points obtained from an object facing the AV’s sensors head-on [43].
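A common way to realise the plane fit in step 3 is a least-squares fit via singular value decomposition, as in the sketch below; the synthetic wheel points, noise level and steering angle are illustrative assumptions, not values from the patent.

```python
import numpy as np

def fit_plane_normal(points: np.ndarray) -> np.ndarray:
    """Normal of the least-squares plane through Nx3 points (via SVD)."""
    centred = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]            # direction of least variance = plane normal

def wheel_angle_deg(wheel_points: np.ndarray, box_forward_xy: np.ndarray) -> float:
    """Angle between the wheel plane and the bounding box's forward axis."""
    normal = fit_plane_normal(wheel_points)
    # The rolling direction lies in the plane: rotate the normal's ground
    # component by 90 degrees to obtain the wheel's heading.
    heading = np.array([-normal[1], normal[0]])
    heading /= np.linalg.norm(heading)
    forward = box_forward_xy / np.linalg.norm(box_forward_xy)
    return np.degrees(np.arccos(abs(heading @ forward)))

# Synthetic LiDAR points on a wheel steered ~10 degrees from the box's x-axis.
rng = np.random.default_rng(0)
angle = np.radians(10)
u = rng.uniform(-0.3, 0.3, (50, 1))            # spread along rolling direction
w = rng.uniform(-0.3, 0.3, (50, 1))            # vertical spread
points = np.hstack([u * np.cos(angle), u * np.sin(angle), w])
points[:, :2] += rng.normal(0, 0.005, (50, 2)) # sensor noise
print(wheel_angle_deg(points, np.array([1.0, 0.0])))   # ~10 degrees
```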

3.2.3.2 Vehicle Movement

Waymo has also devised a system that allows the AV to detect slow-moving vehicles using wheel movement. It provides a significant advantage over techniques limited by signal noise, allowing the AV to safely navigate through complex driving environments such as residential neighbourhoods. This performance advantage can be attained when wheel tracking feature descriptors, such as the hub cap, wheel spokes, bolts or tyre branding, can be identified through gradient-based key points. This enables sequential analysis of images with progressing time, allowing the angular change in tracking features to be correlated to the linear speed of the vehicle [44].

By incorporating highly detailed map information with live sensor information, the AV can reliably predict object behaviour in varying contexts. Map information includes, but is not limited to, the shape and location of roads, lane types, traffic signals, road conditions, stop signs and car parks. Sensor data provides contextual information on object characteristics, such as kinematic behaviour, position, orientation, size and shape. Waymo has implemented a system to compare contextual information with map information to narrow down possible future behaviour by eliminating kinematically impossible actions. A network then generates the likelihood of each behaviour based on the context; behaviours whose likelihood exceeds a predetermined threshold are passed on to another network for secondary analysis. In this network, a route is planned for the AV to avoid clashing with the predicted object behaviours, and the safest route is executed. If none of the behaviour probabilities exceeds the predefined threshold, all behaviours are analysed, and the likelihoods of similar behaviours are accumulated and passed on to the secondary network. An example where this finds utility is given in Figure 10, where the future trajectory of a car is predicted [45].

Figure 10: Predicting the future trajectory of a car at a junction [45].

3.2.4 Control

3.2.4.1 Vehicle State

Waymo has designed a system to provide adaptive behaviour based on the presence of special-purpose vehicles, such as fire trucks, police cars or ambulances, in its surroundings. Special-purpose vehicles can be identified through visual or auditory input, typically by their standard characteristics such as sirens, flashing lights, shape and colour. The response to a special-purpose vehicle should be instantaneous, prompting the AV to change its speed, pull over or change lanes [46]. Cameras are used to monitor the internal state of Waymo vehicles, allowing the vehicle to respond to predefined situations. Some examples include the unbuckling of a seatbelt, interior damage or dirt, and left baggage. The system recognises predefined situations by training a classifier neural network on labelled data, allowing the vehicle to respond even when cellular service is not available [13].

3.2.4.2 Maneuvering System

A regularly occurring problem faced by drivers is deciding whether to slow down, speed up or maintain velocity when the amber lamp lights up on a traffic light. Simply stopping may cause accidents with surrounding vehicles, while proceeding may cause the vehicle to cross the intersection after the light turns red. Waymo’s approach to this problem is to determine the duration for which the light remains amber, to estimate the location of the vehicle when the light turns red. The duration is inferred from previously recorded information and the time at which the light turned amber. Estimating vehicle location is performed by exploring different acceleration iterations while accounting for the speed limit, smoothness and traffic light state. If any of those iterations brings the vehicle past a predetermined threshold from the stop line, the vehicle continues through the intersection [47].
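The decision logic above can be sketched as rolling candidate acceleration profiles forward to the estimated moment the light turns red; all numeric values (candidate accelerations, speed limit, clearance threshold) are illustrative assumptions.

```python
def distance_at_red(speed_ms: float, accel_ms2: float,
                    time_to_red_s: float, speed_limit_ms: float) -> float:
    """Distance travelled before the light turns red, capped at the limit."""
    if accel_ms2 > 0:
        t_cap = (speed_limit_ms - speed_ms) / accel_ms2   # time to hit the limit
    else:
        t_cap = time_to_red_s
    t1 = max(0.0, min(time_to_red_s, t_cap))
    d = speed_ms * t1 + 0.5 * accel_ms2 * t1 ** 2         # accelerating phase
    v1 = speed_ms + accel_ms2 * t1
    return d + v1 * (time_to_red_s - t1)                  # cruising phase

def should_proceed(dist_to_stop_line_m: float, speed_ms: float,
                   time_to_red_s: float, speed_limit_ms: float = 15.0,
                   clearance_m: float = 5.0) -> bool:
    """Proceed only if some smooth profile clears the stop line plus a margin."""
    for accel in (0.0, 0.5, 1.0, 1.5):                    # candidate profiles
        travelled = distance_at_red(speed_ms, accel, time_to_red_s, speed_limit_ms)
        if travelled >= dist_to_stop_line_m + clearance_m:
            return True
    return False

print(should_proceed(30.0, speed_ms=13.0, time_to_red_s=3.0))   # True: clears it
print(should_proceed(60.0, speed_ms=10.0, time_to_red_s=3.0))   # False: stop
```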

3.2.4.3 Navigation/Pathfinding System

When AVs are used for ride-hailing, a system is also required to dictate vehicle behaviour while waiting for pickups. Waymo notifies the potential passenger three times after a booking has been made — once when the driverless vehicle is within a preset distance of the pickup location, again when the vehicle arrives, and finally when the booking is cancelled because the passenger has not authenticated within a preset time [48].

3.2.4.4 Prioritising

A balance must be reached between obtaining sufficient information for awareness and the negative consequences associated with obtaining the information. For example, upon approaching the junction in Figure 7b, the AV must decide between changing lanes to maintain an unobstructed view or remaining behind the large vehicle and obtain information through surrogate data as described in Section 3.2.2. The former, which is a proactive or active-sensing action, can have negative consequences — such as increasing the probability of an accident, losing information from other sensors which are obstructed after the action, or annoying passengers and other drivers. Waymo compares a risk-cost framework with an information-improvement expectation framework across varying degrees of proactive actions, allowing the AV to determine the degree to which active-sensing can improve control. The information-improvement expectation is quantified based on the information the AV expects to gain, while risk-cost is calculated from the probabilities of negative outcomes in the resultant vehicle state. A score is computed between each information-improvement expectation and risk-cost pair and compared relative to other pairs to advise the next course of action.

3.2.5 Improving Cognition

There are two published methods through which Waymo improves its DNNs. The first uses Automated Machine Learning (AutoML) and Neural Architecture Search (NAS) to automatically create DNNs, while the second applies an evolutionary-based approach to optimise hyperparameter schedules.

3.2.5.1 AutoML & NAS

In collaboration with the Google Brain team, Waymo used Cloud AutoML to generate neural networks for semantic segmentation [34]. Cloud AutoML is a Google product which combines AutoML tools and NAS [49]. AutoML involves a variety of methods used to automate the completion of tasks in a machine learning system. Typically, these include data acquisition, analysis and augmentation, modelling (feature engineering, model selection, model building, hyperparameter optimisation), deployment, and production [49]. NAS is a subset of AutoML which uses a Recurrent Neural Network (RNN) to automatically design neural network architecture around the desired dataset. NAS achieves this by utilising evolutionary algorithms and reinforcement learning to model and train NAS cells (see Figure 11), which are stacked to form a deep neural network architecture [50].

Figure 11: NAS cell [34].

An initial adaptation of existing NAS cells, which had undergone reinforcement learning, to semantic segmentation yielded results better than manually fine-tuned networks (see the red dot in Figure 13a). This led to the development of an automatic search algorithm to explore different NAS cell combinations. The search generated hundreds of Convolutional Neural Networks (CNNs) which produced higher quality or lower latency answers [34] (see the solid green line in Figure 13a). The same approach was used to create networks for the detection and localisation of traffic lanes, producing similar results [34]. As fast response times are required for the AV to make driving decisions on the fly, Waymo and Google Brain took the development a step further by performing an end-to-end search for combinations of new NAS cells which generated CNNs optimised for low latency and high quality. Due to the large computational costs of performing such a process, a proxy task was created — like the original task but scaled down for rapid testing (see Figure 12).

Figure 12: Proxy end-to-end search [34].

This allowed them to explore over ten thousand different architectures in the span of two weeks, when it would otherwise have taken over a year. Different strategies were used in the end-to-end search, such as random search (see the yellow plots in Figure 13b) and reinforcement learning (see the blue plots in Figure 13b), producing networks with faster response times and lower error rates. The networks created displayed innovative amalgams of pooling, convolution and deconvolution operations.

Figure 13a: Initial adaptation results. Red: network created with transfer learning (from the NAS paper). Green: networks created with a random search on simple architectures [34].
Figure 13b: Further development results. Yellow: networks created with a random search on refined architectures. Blue: networks selected with a search using reinforcement learning on refined architectures [34].

A further development combined new NAS cells with the proxy task, allowing Waymo to achieve up to 30% lower latency and a 10% lower error rate than previous architectures (not shown in the graphs).

3.2.5.2 Population-Based Training (PBT)

Manual tuning and random search are commonly used methods for hyperparameter optimisation. These methods, although effective, are not entirely efficient [42]. For example,

  • In manual tuning, the hyperparameters of a single network are adjusted at the end of each training run to optimise performance. Although this steadily produces better results, it is arduous for the engineer.
  • Alternatively, in random search, numerous random hyperparameter schedules are applied across different types of hyperparameters for network training. The networks are trained independently but in parallel, and the best-performing model is selected after all networks have completed the training data set.

Random search in particular accrues high computational costs, which can be reduced by utilising a method developed by DeepMind called Population Based Training (PBT). PBT follows an evolutionary approach to automatically determine the best hyperparameter schedules. The process can be broken down into several steps (a minimal code sketch follows the list):

  1. A search tool is used to generate multiple networks with random hyperparameters.
  2. At a predetermined interval (approximately 15 minutes), each network is evaluated against the others.
  3. At the end of each evaluation, a progeny of the better-performing neural network is created, bearing slightly mutated hyperparameters.
  4. The progeny replaces the inferior network, and the process repeats until a satisfactory hyperparameter schedule is obtained.
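
The population size, the toy evaluation standing in for a 15-minute training step, and the mutation factors below are illustrative assumptions rather than Waymo's or DeepMind's actual configuration:

```python
import copy
import random

class Member:
    """One network in the population: hyperparameters plus a fitness score."""
    def __init__(self):
        self.hparams = {"lr": 10 ** random.uniform(-5, -2),
                        "momentum": random.uniform(0.8, 0.99)}
        self.fitness = 0.0

    def train_and_evaluate(self):
        # Placeholder: a real system trains for ~15 minutes, then evaluates.
        self.fitness = (-abs(self.hparams["lr"] - 3e-4)
                        - abs(self.hparams["momentum"] - 0.9))

def exploit_and_explore(winner: "Member") -> "Member":
    """Clone the better performer and slightly mutate its hyperparameters."""
    child = copy.deepcopy(winner)
    child.hparams["lr"] *= random.choice([0.8, 1.2])
    child.hparams["momentum"] *= random.choice([0.95, 1.05])
    return child

population = [Member() for _ in range(10)]
for generation in range(20):
    for m in population:
        m.train_and_evaluate()
    population.sort(key=lambda m: m.fitness)
    # The worst performer is replaced by a mutated copy of the best.
    population[0] = exploit_and_explore(population[-1])

best = max(population, key=lambda m: m.fitness)
print(f"best lr={best.hparams['lr']:.2e} momentum={best.hparams['momentum']:.3f}")
```

Because the progeny copies its parent's trained state, no run restarts from scratch; only the hyperparameters are perturbed between intervals.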

This approach circumvents the disadvantages associated with manual tuning and random search by adjusting hyperparameters throughout training and removing sub-par networks to free up memory. Furthermore, it obviates the need to restart training from scratch, since each progeny inherits its parent network's architecture and trained weights. Applied to Waymo's Region Proposal Network, PBT improved precision by reducing false positives by almost a quarter while retaining high recall rates; moreover, it halved training times and computational resource usage. A disadvantage inherent to PBT is its greedy, short-horizon optimisation behaviour: since under-performing networks are immediately removed, the process disregards long-term advantages, much as a loss of diversity hampers biological evolution. An obvious remedy is increasing the population size to artificially diversify the sample, but this is limited by computational resources. Consequently, a two-part strategy was devised: allowing networks to compete only within sub-populations (niches), and directly rewarding more unique networks with a competitive edge within each niche. The combination of these two measures promoted diversity, enabling a larger hyperparameter space to be explored [42].
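
The niching strategy can be bolted onto the previous sketch (reusing its Member and exploit_and_explore definitions). The niche assignment, distance measure and novelty bonus below are hypothetical illustrations of the strategy described in [42]:

```python
import math

def hparam_distance(a, b) -> float:
    """Crude distance between two members in hyperparameter space."""
    return (abs(math.log10(a.hparams["lr"]) - math.log10(b.hparams["lr"]))
            + abs(a.hparams["momentum"] - b.hparams["momentum"]))

def niched_step(population, num_niches: int = 2, novelty_weight: float = 0.1):
    """Competition happens only inside each niche; members far from their
    niche neighbours receive a fitness bonus for uniqueness."""
    niche_size = len(population) // num_niches
    for start in range(0, len(population), niche_size):
        niche = population[start:start + niche_size]
        for m in niche:
            novelty = sum(hparam_distance(m, o) for o in niche if o is not m)
            m.adjusted_fitness = m.fitness + novelty_weight * novelty
        niche.sort(key=lambda m: m.adjusted_fitness)
        niche[0] = exploit_and_explore(niche[-1])  # replace only within the niche
        population[start:start + niche_size] = niche
    return population
```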

4 Conclusion

From Waymo’s patents and published research alone, certain technical aspects still need to be addressed. Vibrations, lurching and large rotations of the car can misalign images and ranging data, so the network responsible for sensor fusion must maintain data consistency when the AV travels over humps or irregular road surfaces; possible solutions include an Inertial Measurement Unit (IMU) and an image heave-correction network. External sources can also corrupt return signals for ranging devices, especially when the interference is difficult to detect. For example, mirrors and optically transparent surfaces can reflect the emitted signal or let it pass through completely, producing spurious distances in LiDAR or RADAR data. A system is then required to recognise these surfaces and correct the data accordingly, for instance by using DNNs to recognise image or ranging data characteristic of such surfaces.
In environments where more than one AV operates, sensors that rely on signal emissions, such as LiDAR and RADAR, may suffer data corruption from other emitters or signal interference. Potential countermeasures can be implemented through fleet management, by varying the frequency or wavelength of each AV's emitted signals so that it can identify its own reflections and filter out the rest (a minimal sketch of this filtering idea appears at the end of this section). The problem is exacerbated when the AV operates alongside AVs deployed by other companies, whose signal characteristics may not be standardised; a regulatory framework may be necessary to allocate the signal characteristics that each company's AVs may use. The reliance on Global Positioning System (GPS) sensors for locating and navigating also introduces vulnerability to blocking, jamming and spoofing, which could be potentially dangerous. Alternative navigation and positioning systems, or countermeasures against GPS vulnerabilities, could help to resolve this issue [51].
The development of software that operates outside the physical bounds of the AV is also necessary for functional performance. Simulation software, such as Carcraft, allows Waymo's AV network to learn and train at all hours of the day. Waymo has also used Google Street View and Ground Truth (components of Google Maps) to its advantage, augmenting the AV's locating and navigating abilities [3]. Fleet management software may also be necessary for companies looking to provide ride-hailing services, with disengagement capabilities for remote control when the AV fails to perform adequately [52].
Technical aspects aside, companies must work with statutory bodies and stakeholders to develop operating frameworks for full-scale deployment; the challenges presented are not limited to sociocultural factors but extend to legal and moral implications as well. Due to the breadth of technologies required in creating an AV, this paper is limited in technical scope to the methods deployed by Waymo. Moving forward, research into technologies used by other leading developers and secondary suppliers, such as General Motors and Intel-Mobileye, can advise automakers deciding between independently developing autonomous technology or purchasing it from suppliers. An in-depth discussion of the political, economic, sociocultural, legal and environmental effects will help in understanding the future direction of AVs, and a discussion of the operational frameworks which enable the integration of AVs into society will also assist decision making for regulatory bodies and companies.
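
As a closing illustration, the fleet-level signal-filtering idea mentioned above can be sketched as follows. The per-vehicle carrier signature, data structure and tolerance are hypothetical, standing in for whatever signal characteristics a fleet operator or regulator might assign:

```python
from dataclasses import dataclass

@dataclass
class LidarReturn:
    range_m: float
    carrier_ghz: float  # measured carrier frequency of the returned pulse

MY_SIGNATURE_GHZ = 200.015   # this vehicle's assigned emission frequency
TOLERANCE_GHZ = 0.004        # accept small Doppler/measurement shifts

def filter_own_returns(returns: list[LidarReturn]) -> list[LidarReturn]:
    """Discard returns whose carrier does not match this AV's signature,
    rejecting pulses emitted by other vehicles in the fleet."""
    return [r for r in returns
            if abs(r.carrier_ghz - MY_SIGNATURE_GHZ) <= TOLERANCE_GHZ]

returns = [LidarReturn(12.4, 200.015),   # own pulse
           LidarReturn(8.1, 200.120),    # another AV's pulse: rejected
           LidarReturn(30.2, 200.013)]   # own pulse with slight shift
print(len(filter_own_returns(returns)))  # 2
```
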
Autonomous driving is a burgeoning industry with opportunities for applications in other sectors, such as last-mile delivery, shipping, freight and space exploration. This short paper cannot hope to encapsulate the full potential of AVs, but it aims to inspire further discussion.

References

[1] A. J. Hawkins, “Waymo and Jaguar will build up to 20,000 self-driving electric SUVs,” Mar 27 2018. [Online]. Available: https://www.theverge.com/2018/3/27/17165992/waymo-jaguar-i-pace-self-driving-ny-auto-show-2018

[2] N. B. Geddes, Magic motorways. New York: Random house, 1940.

[3] L. D. Burns and C. Shulgan, Autonomy: The Quest to Build the Driverless Car — And How It Will Reshape Our World. HarperCollins, Aug 28 2018.

[4] S. Abuelsamid and J. Gartner, “Navigant research leaderboard: Automated driving vehicles,” Navigant Research, Tech. Rep., Mar 19 2019. [Online]. Available: https://www.navigantresearch.com/reports/navigant-research-leaderboard-automated-driving-vehicles

[5] Waymo at IAA Frankfurt 2019. Waymo, Sep 12, 2019. [Online]. Available: https://medium.com/waymo/waymo-iaa-frankfurt-2019-b3cca36d8479

[6] J. Krafcik, “Introducing Waymo’s suite of custom-built, self-driving hardware,” Waymo, Tech. Rep., Feb 16, 2017. [Online]. Available: https://medium.com/waymo/introducing-waymos-suite-of-custom-built-self-driving-hardware-c47d1714563

[7] “Groupe Renault and Nissan sign exclusive alliance deal with Waymo to explore driverless mobility services,” Jun 19 2019. [Online]. Available: https://www.alliance-2022.com/news/groupe-renault-and-nissan-sign-exclusive-alliance-deal-with-waymo-to-explore-driverless-mobility-services/

[8] B. Lo, “Waymo’s self-driving experiment goes commercial,” Dec 6 2018. [Online]. Available: https://academic-mintel-com.iclibezp1.cc.ic.ac.uk/display/933384/?highlight

[9] “Infographic: The self-driving car companies going the distance.” [Online]. Available: https://www-statista-com.iclibezp1.cc.ic.ac.uk/chart/17144/test-miles-and-reportable-miles-per-disengagement/

[10] R. Waters and J. Burn-Murdoch, “Waymo builds big lead in self-driving car testing,” Feb 14 2019. [Online]. Available: https://www.ft.com/content/7c8e1d02-2ff2-11e9-8744-e7016697f225

[11] J. Miller, “Vw revs up efforts to take on driverless vehicle rivals,” Oct 28 2019. [Online]. Available: https://www.ft.com/content/f646649a-f8d0-11e9-a354-36acbbb0d9b6

[12] D. L. Rosenband, “Inside Waymo’s self-driving car: My favorite transistors,” in 2017 Symposium on VLSI Circuits. JSAP, Jun 8, 2017, pp. C20–C22. [Online]. Available: https://ieeexplore.ieee.org/document/8008500

[13] A. Wendel, C. K. Ludwick, and L. A. Feenstra, “Determining and responding to an internal status of a vehicle,” U.S. Patent US20 190 258 263A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190258263A1/en

[14] M. Tan, “Recreating the self-driving experience: the making of the Waymo 360 video,” Medium, Tech. Rep., Feb 28 2018. [Online]. Available: https://medium.com/waymo/recreating-the-self-driving-experience-the-making-of-the-waymo-360-video-37a80466af49

[15] P.-Y. Droz, S. Verghese, and B. Hermalyn, “Rotating lidar with co-aligned imager,” U.S. Patent US10 447 973B2, Oct 15, 2019. [Online]. Available: https://patents.google.com/patent/US10447973B2/en

[16] A. Wendel and B. Ingram, “Camera systems using filters and exposure times to detect flickering illuminated objects,” U.S. Patent US10 453 208B2, Oct 22, 2019. [Online]. Available: https://patents.google.com/patent/US10453208B2/en

[17] B. Chen, A. Brown, and J. Izadian, “Plated, injection molded, automotive radar waveguide antenna,” U.S. Patent US10 454 158B2, Oct 22, 2019. [Online]. Available: https://patents.google.com/patent/US10454158B2/en

[18] T. Campbell, “Radar based mapping and localization for autonomous vehicles,” U.S. Patent US10 386 480B1, Aug 20, 2019. [Online]. Available: https://patents.google.com/patent/US10386480B1/en

[19] A. Brown, “Adaptive algorithms for interrogating the viewable scene of an automotive radar,” U.S. Patent US10 222 462B2, Mar 5, 2019. [Online]. Available: https://patents.google.com/patent/US10222462B2/en

[20] J. Izadian, “Vehicle-mounted radar deflectors,” U.S. Patent US20 190 288 400A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190288400A1/en

[21] P.-Y. Droz, D. N. Hutchison, L. Wacheter, and A. McCauley, “Lidar optics alignment systems and methods,” U.S. Patent US10 094 916B1, Oct 9, 2018. [Online]. Available: https://patents.google.com/patent/US10094916B1/en

[22] B. Templeton, P.-Y. Droz, and J. Zhu, “Wide-view lidar with areas of special attention,” U.S. Patent US9 983 590B2, May 29, 2018. [Online]. Available: https://patents.google.com/patent/US9983590B2/en

[23] P.-Y. Droz, “Hybrid direct detection and coherent light detection and ranging system,” U.S. Patent US10 436 906B2, Oct 8, 2019. [Online]. Available: https://patents.google.com/patent/US10436906B2/en

[24] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, “Avalanche photodiodes and quenching circuits for single-photon detection,” Applied Optics, vol. 35, no. 12, pp. 1956–1976, 1996. [Online]. Available: http://ao.osa.org/abstract.cfm?URI=ao-35-12-1956

[25] B. Ingram, P.-Y. Droz, L. Wachter, S. McCloskey, B. Gassend, and G. Pennecot, “Variable beam spacing, timing, and power for vehicle sensors,” U.S. Patent US10 416 290B2, Sep 17, 2019. [Online]. Available: https://patents.google.com/patent/US10416290B2/en

[26] P.-Y. Droz, G. Pennecot, A. Levandowski, D. E. Ulrich, Z. Morriss, L. Wachter, D. I. Iordache, W. McCann, D. Gruver, B. Fidric, and S. W. Lenius, “Long range steerable lidar system,” U.S. Patent US9 880 263B2, Jan 30, 2018. [Online]. Available: https://patents.google.com/patent/US9880263B2/en

[27] G. Pennecot, Z. Morriss, S. Lenius, I. Iordache, D. Gruver, P.-Y. Droz, L. Wachter, D. Ulrich, W. McCann, R. Pardhan, B. Fidric, A. Levandowski, and P. Avram, “Vehicle with multiple light detection and ranging devices (lidars),” U.S. Patent US10 120 079B2, November 6, 2018. [Online]. Available: https://patents.google.com/patent/US10120079B2/en

[28] P.-Y. Droz, C. Onal, W. McCann, B. Fidric, V. Gutnik, L. Mattos, and R. Pardhan, “Light detection and ranging (lidar) device having multiple receivers,” U.S. Patent US10 379 540B2, Aug 13, 2019. [Online]. Available: https://patents.google.com/patent/US10379540B2/en

[29] D. Ulrich, P.-Y. Droz, and S. Lenius, “Light steering device with an array of oscillating reflective slats,” U.S. Patent US10 401 865B1, Sep 3, 2019. [Online]. Available: https://patents.google.com/patent/US10401865B1/en

[30] J. T. Simpson, “Optically transparent superhydrophobic thin film,” U.S. Patent US20 190 262 861A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190262861A1/en

[31] P. C. Lombrozo and J. Switkes, “Sensor condensation prevention,” U.S. Patent US20 190 283 533A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190283533A1/en

[32] B. Ingram, E. McCloskey, T. Campbell, and P.-Y. Droz, “Tailoring sensor emission power to map, vehicle state, and environment,” U.S. Patent US20 190 277 962A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190277962A1/en

[33] P. Lombrozo, E. Teller, and B. Templeton, “Inferring state of traffic signal and other aspects of a vehicle’s environment based on surrogate data,” U.S. Patent US20 190 271 983A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190271983A1/en

[34] S. Cheng and G. Bender, “Automl: Automating the design of machine learning models for autonomous driving,” Medium, Tech. Rep., Jan 15 2019. [Online]. Available: https://medium.com/waymo/automl-automating-the-design-of-machine-learning-models-for-autonomous-driving-141a5583ec2a

[35] “Machine learning: Data pre-processing,” pp. 111–130, 2018.

[36] M. Rick, J. Clemens, L. Sommer, A. Folkers, K. Schill, and C. Büskens, “Autonomous driving based on nonlinear model predictive control and multi-sensor fusion,” IFAC-PapersOnLine, pp. 182–187, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2405896319303994

[37] A. Ogale and A. Krizhevsky, “Neural networks for object detection and characterization,” U.S. Patent US20 190 279 005A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190279005A1/en

[38] W.-Y. Lo, A. Ogale, and Y. Gao, “Rare instance classifiers,” U.S. Patent US20 190 318 207A1, Oct 17, 2019. [Online]. Available: https://patents.google.com/patent/US20190318207A1/en

[39] P.-Y. Droz, “Detection of pedestrian using radio devices,” U.S. Patent US10 377 374B1, Aug 13, 2019. [Online]. Available: https://patents.google.com/patent/US10377374B1/en

[40] D. I. Ferguson and D. J. Burnette, “Mapping active and inactive construction zones for autonomous driving,” U.S. Patent US20 190 258 261A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190258261A1/en

[41] J. S. Russell and F. Da, “Semantic object clustering for autonomous vehicle decision making,” U.S. Patent US10 401 862B2, Sep 3, 2019. [Online]. Available: https://patents.google.com/patent/US10401862B2/en

[42] Y. Hsin Chen, “How evolutionary selection can train more capable self-driving cars,” Waymo, Tech. Rep., Jul 31, 2019. [Online]. Available: https://medium.com/waymo/how-evolutionary-selection-can-train-more-capable-self-driving-cars-a7191f771982

[43] J.-S. R. Gutmann, “Using wheel orientation to determine future heading,” U.S. Patent US20 190 315 352A1, application pending. [Online]. Available: https://patents.google.com/patent/US20190315352A1/en

[44] C. L. Robinson, “Detecting vehicle movement through wheel movement,” U.S. Patent US10 380 757B2, Aug 13, 2019. [Online]. Available: https://patents.google.com/patent/US10380757B2/en

[45] D. I. F. Ferguson, D. H. Silver, S. Ross, N. Fairfield, and I.-A. Sucan, “Predicting trajectories of objects based on contextual information,” U.S. Patent US10 421 453B1, Sep 24, 2019. [Online]. Available: https://patents.google.com/patent/US10421453B1/en

[46] D. I. Ferguson, “Modifying a vehicle state based on the presence of a special-purpose vehicle,” U.S. Patent US10 427 684B1, Oct 1, 2019. [Online]. Available: https://patents.google.com/patent/US10427684B1/en

[47] J.-S. R. Gutmann, A. Wendel, N. Fairfield, D. A. Dolgov, and D. J. Burnette, “Traffic signal response for autonomous vehicles,” U.S. Patent US10 377 378B2, Aug 13, 2019. [Online]. Available: https://patents.google.com/patent/US10377378B2/en

[48] J. Arden, A. K. Aula, and B. D. Cullinane, “Autonomous vehicle behavior when waiting for passengers,” U.S. Patent US10 379 537B1, Aug 13, 2019. [Online]. Available: https://patents.google.com/patent/US10379537B1/en

[49] Y. Adan, “What is the difference between AutoML and NAS (neural architecture search)?” Feb 14 2019. [Online]. Available: https://www.quora.com/What-is-the-difference-between-AutoML-and-NAS-neural-architecture-search

[50] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun 2018, pp. 8697–8710. [Online]. Available: https://ieeexplore.ieee.org/document/8579005

[51] J. S. Warner and R. G. Johnston, “GPS spoofing countermeasures,” Vulnerability Assessment Team, Los Alamos National Laboratory, Tech. Rep. [Online]. Available: http://lewisperdue.com/DieByWire/GPS-Vulnerability-LosAlamos.pdf

[52] N. Fairfield and J. S. Herbach, “Remote assistance for an autonomous vehicle in low confidence situations,” U.S. Patent US10 444 754B2, Oct 15, 2019. [Online]. Available: https://patents.google.com/patent/US10444754B2/en
