How to build a Holodeck - part 1

Credit: TV series Star Trek TNG

You may have seen the Holodeck in the TV series Star Trek: The Next Generation (TNG): a user walks in, issues verbal instructions, and an entirely realistic 3D object or environment appears instantly. This post explores the AI technology needed to support such a vision.

Here we use the Holodeck as an example, but in general terms this is in fact in line with what I am trying to achieve with the terraAI (a.k.a. TAI) project, in the sense that both require some kind of interactive and intelligent knowledge-based assistant for getting something useful done.

For ease of reference we shall call this system HAI (because it is in fact an application of the TAI platform for dealing with 3D objects).

We shall call the watered-down Holodeck described in this post the HAI Holodeck, to distinguish it from the well-known TNG Holodeck.

Warnings

This Holodeck thread is a serious investigation from the AI (artificial intelligence) perspective on how to realize a holodeck-like experience.

Many previous attempts at building a Holodeck can be found on the Internet, and they are almost invariably focused on the flashier areas of visualization, scanning, and sensing. This thread eschews most of those, and focuses instead on building a foundation for the less visual but nonetheless vital parts needed to support the Holodeck experience.

As stated in the TAI Manifesto, here we adopt a top-down design approach, which is common in large-scale commercial software design. As such you will find plenty of to-be-filled stubs as we go, since this is NOT a step-by-step how-to recipe.

You should stop reading here if you have received no training in Artificial Intelligence, since I will go into some heavy technical stuff that could be hard to follow for people without a background in AI.

Current state towards the Holodeck

Great progress has been made on many fronts:

  1. On the hardware side it would appear that soon we will have devices that are capable of rendering highly accurate and realistic AR/VR scenery.
  2. Techniques such as redirected walking allow a limited physical space to feel much larger. This gives a user the illusion of walking freely around a large neighborhood while inside a relatively small room wearing VR goggles.
  3. Gesture control, using devices such as Leap Motion, is maturing.
  4. Many haptic technologies are becoming available for recreating the sense of touch.

However, the virtual worlds currently shown in AR/VR demos are painstakingly handcrafted over long periods of time. The HAI Holodeck discussed here is intended to address this deficiency by making it exceedingly easy to build a wide variety of 3D objects and environments, following the inspiration of the TNG Holodeck.

Our approach

We want the HAI system to be so easy to use that almost anybody is able to use it. This decision actually alters the design approach in some fundamental ways, including how 3D objects are represented, how knowledge is acquired by HAI, how a human user interacts with HAI in order to get something built, etc.

For the purpose of this HAI Holodeck thread, we will make the following simplifying assumptions:

  1. What we will leave out:

    1. We will leave the hardware side to the likes of Kinect, Oculus Rift, Magic Leap and HoloLens, and dovetail with those later when we are ready.
    2. We don't do holograms here, so users still need to wear AR/VR goggles in order to see the 3D world that was built, unlike the Holodeck on TV.
    3. We are not concerned with middleware 3D graphics rendering engine technologies, such as the types of things that Euclideon deals with.
    4. We don't deal with Star Trek replicator or transporter technologies here (just so that there is no misunderstanding).
    5. We will ignore gesture control and haptic technology for now.
    6. We will ignore issues related to navigation in the virtual world for now.
    7. We are not concerned with 3D scanning technologies for the purpose of acquiring the 3D models needed for the target object or environment.
    8. We don't create seemingly sentient human-like characters or complex machinery. Rather, for now we will focus on the simpler tasks of creating static objects and environments as a start.
  2. What we will do:

    1. We will focus on constructing 3D objects or environments (referred to jointly as 3D models below) based on verbal commands from a user. In other words, our focus here is on how to allow complex 3D models to be built from simple user-system interaction, using a knowledge-driven approach.
    2. We will aim to make HAI Holodeck so easy to use that almost anyone is able to use it.
    3. We will start with a relatively simple and small target 3D world for now. We will design it for the long term so one day you can ask HAI to build something elaborate such as an entire 19th-century London neighborhood with all the details, but that's for later.
    4. We want to make it crowd-driven, in the Wikipedia spirit, so that domain experts in all areas eventually can help with building up an elaborate knowledge base for the HAI Holodeck, even if they don't know anything about AI or VR/AR technologies. Put another way, we aim for the Wikipedia model in terms of its richness and open contribution from the public.
  3. Ultimately we aim to create an open, rich, and ever-evolving 3D knowledge base.

More specifically, we adopt the following technical approach:

  1. HAI interacts with the user through a natural language (NL) interface, so that it is possible to figure out what the user wants interactively.
  2. Knowledge-guided. A large knowledge base is assumed to be in place, containing detailed information about the mapping between textual requests and the corresponding 3D objects (e.g., what a typical car looks like, what a typical sports car looks like, etc.), as well as all kinds of background knowledge. This allows HAI to offer a partial solution based on little information, and then to interact with the user, guided by prior knowledge, to converge on the target model that the user wants.
  3. Machine learning assisted. We use supervised and unsupervised machine learning methods to acquire the large knowledge base that HAI needs to do its work effectively.
  4. Overall we take a top-down design approach. Taking a page from large-scale commercial software design, we start by working out a skeletal architecture with its requisite components, as well as the requirements for each component. We may defer the detailed design within each component until later. This is also a way to solicit contributions from the research community: if someone comes up with a new algorithm that fits the requirements, we can quickly fit it into this grand HAI Holodeck design and have an instant upgrade. See here for more details.
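
To make the top-down idea concrete, here is a minimal Python sketch of what a skeletal architecture with swappable components might look like. Everything in it is an assumption made for illustration: the names `Candidate`, `LanguageInterface`, `KnowledgeBase`, `KeywordInterface`, `ListKnowledgeBase`, and `HAI` are hypothetical, and a set of labels stands in for a real 3D representation.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Candidate:
    """A candidate 3D model. `labels` is a placeholder for a real 3D representation."""
    name: str
    labels: frozenset = field(default_factory=frozenset)


class LanguageInterface(Protocol):
    """Requirement: turn an utterance into terms the knowledge base understands."""
    def parse(self, utterance: str) -> frozenset: ...


class KnowledgeBase(Protocol):
    """Requirement: return the candidate models that match the given terms."""
    def retrieve(self, terms: frozenset) -> list: ...


class KeywordInterface:
    """Stub NL component (hypothetical): lowercases and splits on whitespace."""
    def parse(self, utterance: str) -> frozenset:
        return frozenset(utterance.lower().split())


class ListKnowledgeBase:
    """Stub knowledge base (hypothetical): a plain list searched linearly."""
    def __init__(self, models):
        self.models = list(models)

    def retrieve(self, terms: frozenset) -> list:
        # Keep every model that shares at least one label with the request.
        return [m for m in self.models if m.labels & terms]


@dataclass
class HAI:
    """The skeleton only wires components together; each one can be swapped out."""
    language: LanguageInterface
    knowledge: KnowledgeBase

    def request(self, utterance: str) -> list:
        return self.knowledge.retrieve(self.language.parse(utterance))


hai = HAI(
    KeywordInterface(),
    ListKnowledgeBase([Candidate("sports car", frozenset({"car", "sports"}))]),
)
print([c.name for c in hai.request("give me a car")])
```

The point of the sketch is the shape rather than the stubs: any component that satisfies the stated requirement (here, a `Protocol`) could replace a stub without touching the rest of the design.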

Put another way, our approach here is to focus on building a huge knowledge base about our world, with the assistance of machine learning and the general public, and then use these to drive a HAI Holodeck that makes the creation of 3D objects or environments exceedingly easy.

These are further explained separately in the sections below, where we will also try to find ways to make further simplifications for the initial phase.

NL user interface

HAI interacts with the user to achieve the following goals:

  • G1: acquire the user's initial instruction, and respond with a list of candidate objects \(\{C_i\}\) (which are likely to be off, and thus require further refinement). This can be viewed as a matter of knowledge retrieval for objects that match the description.
  • G2: acquire the user's instructions for modifying the candidate objects. This can be viewed as a matter of retrieving and applying operational knowledge that satisfies the stated goal.
  • G3: acquire additional information about \(\{C_i\}\). This can be viewed as a form of supervised learning related to the 3D objects, assisted by the user.
  • G4: acquire the linguistic terms, expressions, and conventions that the user employs. This can be viewed as a form of language learning, assisted by the user.

Knowledge-guided interaction

The goals G1 and G2 above are guided by information stored in the knowledge base. What's special about this repository that we call the knowledge base is that most of its content can be acquired through a machine learning module.

How such knowledge is used to achieve the goals G1 and G2 is described in a separate upcoming post, Knowledge Management.
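
As a rough illustration of G1 only, the sketch below ranks knowledge base entries by how many labels they share with the request. This is a stand-in chosen for simplicity, not the retrieval method of HAI; the function `rank_candidates`, the dictionary `toy_kb`, and its contents are all hypothetical.

```python
def rank_candidates(request_terms, knowledge_base, top_k=3):
    """Return up to top_k entry names, best label overlap first."""
    scored = []
    for name, labels in knowledge_base.items():
        overlap = len(set(request_terms) & labels)
        if overlap > 0:
            scored.append((overlap, name))
    # Sort by overlap (descending), then by name for a predictable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical toy knowledge base: entry name -> set of labels.
toy_kb = {
    "typical car": {"car"},
    "typical sports car": {"car", "sports"},
    "porch stair": {"stair", "porch"},
}

print(rank_candidates(["sports", "car"], toy_kb))
# ['typical sports car', 'typical car']
```

Even this toy version shows why the candidates \(\{C_i\}\) are "likely off": the ranking only knows about labels, so the refinement in G2 is still needed to reach what the user actually wants.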

Knowledge acquisition

Machine learning plays a pivotal role in this HAI system. The 3D and knowledge representations are the underpinning of the entire system, and the immense amount of content for these must be acquired largely through a machine learning module.

As described above, achieving the goals G1 and G2 requires the support of a knowledge base (KB), and the content of the KB must be populated by a machine learning module, either all by itself (i.e., unsupervised) or with the assistance of human trainers (i.e., supervised).

Furthermore, this machine learning module is also pivotal for achieving goals G3 and G4.

These are discussed further in an upcoming post Knowledge Acquisition.

Crowd-driven knowledge acquisition

Building a knowledge base to support a HAI Holodeck requires a huge amount of resources, even for a relatively small target domain. Resources are required in several areas:

  1. Acquiring the probability distributions of the 3D objects in the target space.
  2. Acquiring all the minute details of the 3D objects in the target space.
  3. The unsupervised pre-training using video training examples.
  4. Supervised learning for understanding the numerous categorizations.
  5. What else?

While this part is not crucial for producing the first proof-of-concept HAI Holodeck, it is nonetheless vital to its long-term viability. It is helpful to think of the HAI system as supported by a legion of domain experts who are able to make in-depth and persistent contributions to it, similar to what we see in the crowd-contributed Wikipedia, even if such domain experts know nothing about the underlying technology.

This topic is discussed in greater detail in another upcoming post The Crowd-driven Holodeck.

Putting things together

So far we have depicted a grand design with many empty stubs. Next let's dissect the basic user interaction flow as follows:

  1. User issues a request for a certain object.
    Here HAI must search through its knowledge base (KB) to find a set of best candidates \(\{C_i\}\), possibly after making some alterations to what is in the KB, and present them to the user for selection.
  2. User selects one, C, and makes a suggestion (likely somewhat vague) on how to further modify C.
  3. HAI infers the user's intention based on background knowledge from its KB, produces an updated candidate, and presents it to the user.
  4. If user is satisfied then stop.
  5. Else user replies with an additional request to modify the candidate. Go to step 3 above.
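
The flow above can be sketched as a small loop. This is only an outline under stated assumptions: `interaction_loop` and `ScriptedUser` are hypothetical names, the `retrieve` and `modify` callables stand in for the knowledge-driven parts that are still stubs, and suggestions are passed as dictionaries because natural language understanding is out of scope here.

```python
def interaction_loop(request, retrieve, modify, user):
    """Run the request / select / modify cycle until the user is satisfied."""
    candidates = retrieve(request)             # step 1: search the KB
    current = user.select(candidates)          # step 2: user picks one...
    suggestion = user.suggest(current)         # ...and suggests a change
    while suggestion is not None:              # step 4: None means "satisfied"
        current = modify(current, suggestion)  # step 3: HAI updates the candidate
        suggestion = user.suggest(current)     # step 5: a further request, if any
    return current


class ScriptedUser:
    """Hypothetical stand-in for a human: replays a fixed list of suggestions."""

    def __init__(self, suggestions):
        self._suggestions = iter(suggestions)

    def select(self, candidates):
        return candidates[0]

    def suggest(self, current):
        # Returns None once the script runs out, i.e. the user is satisfied.
        return next(self._suggestions, None)


final = interaction_loop(
    "give me a red tower",
    retrieve=lambda request: [{"kind": "tower", "color": "red"}],
    modify=lambda model, suggestion: {**model, **suggestion},
    user=ScriptedUser([{"color": "blue"}, {"height": 3}]),
)
print(final)  # {'kind': 'tower', 'color': 'blue', 'height': 3}
```
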

Simple use case #1 (blocks world)

We will walk through a simplistic use case below, which should help to expose some additional problems that we need to deal with. For now let's limit our target world to something really basic, as follows:

  1. HAI has no problem understanding requests expressed in natural language. This is just so that we can focus on other areas for now.
  2. HAI acquires background knowledge about 3D objects from watching videos of blocks moving around and being stacked on top of each other, unsupervised. For details, see the discussion of the machine learning module above.
  3. HAI acquires knowledge about categorization labels (such as tower for a stack more than 2 blocks tall, row for 2 or more blocks placed side by side, color information such as red block, etc.) through supervised learning. This builds on what was learned in the unsupervised learning step above.
  4. When given a request, such as "give me a red tower" or "give me a row with blocks in all different colors", HAI needs to be able to offer something reasonable.
  5. User may issue an updated request, such as "make the blocks in the tower all different colors", and HAI needs to be able to execute it.

TO BE FILLED

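
To pin down what the categorization labels in the blocks world mean, here is a small Python sketch with hand-written predicates. In HAI these labels would be learned through supervised learning rather than coded by hand, so the sketch only illustrates the target concepts. The grid representation (`Block` with a horizontal position `x` and a height `level`) and the function names are assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical representation: x is the horizontal slot, level 0 is the ground.
Block = namedtuple("Block", "x level color")


def is_tower(blocks):
    """True if the blocks form one unbroken stack more than 2 blocks tall."""
    columns = {b.x for b in blocks}
    levels = sorted(b.level for b in blocks)
    return (len(columns) == 1 and len(blocks) > 2
            and levels == list(range(len(blocks))))


def is_row(blocks):
    """True if 2 or more blocks sit side by side on the same level."""
    xs = sorted(b.x for b in blocks)
    same_level = len({b.level for b in blocks}) == 1
    return (same_level and len(blocks) >= 2
            and xs == list(range(xs[0], xs[0] + len(xs))))


def all_different_colors(blocks):
    """True if no two blocks share a color."""
    colors = [b.color for b in blocks]
    return len(set(colors)) == len(colors)


red_tower = [Block(0, 0, "red"), Block(0, 1, "red"), Block(0, 2, "red")]
mixed_row = [Block(0, 0, "red"), Block(1, 0, "blue")]

print(is_tower(red_tower), is_row(mixed_row), all_different_colors(mixed_row))
# True True True
```
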
Simple use case #1

The goal of this use case is for HAI to help a handyman create the 3D model of a simple 2-step staircase.

The dialog between the two might go like this:

  1. Handyman: HAI, give me a staircase.
  2. HAI: (Showing the most common type, for lack of information) Is this what you wanted?
  3. Handyman: I want the type suitable for a porch.
  4. HAI: (Showing a typical 4-step porch stair) How's this?
  5. Handyman: I need only two steps.
  6. HAI: (Showing a typical 2-step porch stair) How's this?
  7. Handyman: Better. I want the total height to be 22 inches, and each step is a 1-inch plank.
  8. HAI: (Adjusts the spacing under each step to 10 inches) How's this?
  9. etc.

If we analyze this dialog in light of our approach earlier, we can see that:

  1. HAI's knowledge base needs to contain information such as:
    1. All kinds of stairs categorized by certain labels (e.g., straight stairs, winder stairs, stairs with an intermediate landing, etc.), as well as the typical types (e.g., most porch stairs have four steps), etc.
    2. Composition information, such as the fact that a stair has multiple steps and may have railings.
  2. HAI is able to manipulate sub-components individually.
  3. HAI has some capability for spatial reasoning, so that it can compute the dimensions of the spacing, etc., from the given information.
  4. Even though we are deferring further discussion about the knowledge acquisition process for now, it is clear that we definitely need to work out further details regarding knowledge representation.
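
The composition and spatial reasoning points can be illustrated with the numbers from the dialog. The sketch below assumes that all steps have an equal rise and that the total height is measured from the ground to the top of the highest plank; the classes `Step` and `Staircase` and the function `fit_to_height` are hypothetical names, not part of any existing HAI design.

```python
from dataclasses import dataclass


@dataclass
class Step:
    plank_thickness_in: float  # thickness of the tread plank, in inches
    spacing_below_in: float    # open space under the plank, in inches


@dataclass
class Staircase:
    steps: list                # composition: a stair has multiple steps
    has_railing: bool = False  # composition: a stair may have railings

    @property
    def total_height_in(self):
        return sum(s.plank_thickness_in + s.spacing_below_in for s in self.steps)


def fit_to_height(num_steps, total_height_in, plank_thickness_in):
    """Build a staircase with equal steps that reaches the requested height."""
    rise = total_height_in / num_steps
    spacing = rise - plank_thickness_in
    if spacing < 0:
        raise ValueError("planks are thicker than the rise per step")
    return Staircase([Step(plank_thickness_in, spacing) for _ in range(num_steps)])


porch_stair = fit_to_height(num_steps=2, total_height_in=22, plank_thickness_in=1)
print(porch_stair.steps[0].spacing_below_in)  # 10.0
```

With 2 steps and a total height of 22 inches, each step rises 11 inches; subtracting the 1-inch plank leaves the 10 inches of spacing that HAI reports in the dialog. Because each `Step` is its own object, a single step can also be adjusted without touching the others.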

These will be discussed further in separate posts.

TO BE FILLED

Summary

Here we have worked out a very rough skeleton for realizing this HAI Holodeck, and reduced the problem to a couple of core issues. As mentioned earlier, since we are taking a top-down design approach, our primary concern is in working out a suitable architecture, eventually down to the detail specifications for the major components, with the intention that some of the components may even be contributed by the research community.

There is still a great deal of important detail that needs to be worked out, in particular in the following areas:

  1. How to define a knowledge representation scheme for complex 3D objects and environments using CNN.
  2. How to represent 3D objects, in particular in a way that is conducive to machine reasoning and learning.
  3. How to acquire and accumulate knowledge.

To see further discussions on this How to build a Holodeck thread, read the following posts on this topic:

  1. (Upcoming) Knowledge representation
  2. (Upcoming) From CAPTCHA to Holodeck
  3. (Upcoming) User interface
  4. (Upcoming) Knowledge acquisition
  5. (Upcoming) Knowledge management
  6. (Upcoming) A Crowd-driven Holodeck
  7. (Upcoming) Putting everything together

References

  1. The terraAI Manifesto
  2. The terraAI Design Overview
  3. The terraAI Knowledge Management (upcoming)
  4. The Untold Story of Magic Leap, the World’s Most Secretive Startup
  5. You Can’t Walk in a Straight Line—And That’s Great for VR
  6. How objects are represented in human brain? Structural description models versus Image-based models
  7. Recurring Star Trek Holodeck Programs, Ranked (just in case you wish to watch the Star Trek Holodeck programs again).