COMS W4172: 3D User Interfaces—Spring 2014

Prof. Steven Feiner
Date out: March 11, 2014
Date due: April 1, 2014

Assignment 3: Keeping Track

At this point, you've learned about some of the many ways to select and manipulate objects in 3D user interfaces. Now, it's time to try your hand at this yourself, using Unity and Vuforia. For this assignment, you will write your first augmented reality app. You will construct an augmented environment in which a user will be able to create and edit instances of virtual objects that can be translated, rotated, and scaled.

The Vuforia Developer Resources site will be an indispensable resource for this assignment. You will first need to download the Vuforia Unity Extension by selecting "Download Unity Extension 2.8.7 for Android & iOS" at the link in the previous sentence. You should install the package and build a simple project by following the guide under the "Getting Started - Installing the Vuforia SDK" section on the same page. You should also ensure you watch the following three tutorials on the Tutorials page before proceeding with the assignment: "ARCamera prefab in Unity," "Image Targets in Unity," and "Vuforia Play Mode for Unity." You will be using Image Targets in this assignment; therefore, it will be essential for you to read the Image Targets guide to understand how they work. (Please be sure to familiarize yourself with all of the documentation mentioned in this paragraph!)

All interaction in this assignment should be accomplished based on the relative position and orientation of tracked physical objects (Image Targets and your Android/iOS device), which can be optionally accompanied by triggers or modifiers specified using one or more button or touchscreen interactions. That is, you should not use the touchscreen for selection or manipulation, except insofar as you use it (a) to specify a vector relative to the camera, (b) as a trigger (e.g., tapping the screen to select) or (c) as a modifier (e.g., touching the screen to modify some behavior). Furthermore, triggers or 2D UI components should not be used for specifying the magnitude of a transformation; for example, you should not use a physical or virtual button to scale up/down in discrete steps or a 2D slider to set the scale.

Using the touchscreen as a trigger is optional because you may decide to use other means to trigger selection (e.g., using Image Targets). However, you are welcome to use the touchscreen or 2D UI components in any way you would like for debugging purposes (i.e., not as part of your final UI).


Objects and Instances

Your app should allow the user to create, delete, select, and transform any number of instances of at least four different virtual objects. At least two of the objects should be models created by you or by others (properly acknowledged in your documentation), which you load in from files. The other objects, if any, can be created directly in your Unity code. While you will specify the initial position, scale, and orientation of each instance, the user will be able to change them.

Your instances should reside within a coordinate system determined by a printout of an Image Target (we will call it the ground target), which you should place in your environment. When the tracking system recognizes the physical printout of the ground target within the camera image, it determines the position and orientation of the camera relative to the ground target, and derives the geometric transformation that can be applied to instances that you wish to visually "attach" to that target. In addition, the camera image itself will need to be drawn as the background in the frame buffer to make this a video see-through AR app. (The ARCamera prefab in the Vuforia package will handle this for you. You can read more about this in the Compiling a Simple Project guide on the Vuforia Resources site.) Since you are doing this assignment with a device containing an integrated camera and display, your app will be an example of what is sometimes called "magic lens" AR.
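As a rough illustration of how an instance is "attached" to the ground target, the sketch below composes a tracker pose with an instance's local transform. It is written in plain Python rather than Unity code, and the names (camera_from_ground, ground_from_instance) and the numeric poses are hypothetical values chosen for the example, not anything reported by Vuforia.

```python
# Minimal 4x4 homogeneous-transform helpers (row-major nested lists).

def mat_mul(a, b):
    """Return the 4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Return a 4x4 matrix that translates by (x, y, z)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(m, p):
    """Apply the 4x4 matrix m to the 3D point p."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Hypothetical tracker output: the ground target is 0.5 units in front of the camera.
camera_from_ground = translation(0.0, 0.0, 0.5)
# Hypothetical instance placement: 0.1 units along the ground target's x axis.
ground_from_instance = translation(0.1, 0.0, 0.0)

# Composing the two gives the instance's pose as seen by the camera.
camera_from_instance = mat_mul(camera_from_ground, ground_from_instance)
instance_origin_in_camera = transform_point(camera_from_instance, (0.0, 0.0, 0.0))
```

Only camera_from_ground changes as the device moves; ground_from_instance stays fixed until the user edits the instance, which is why the instance appears to stay put on the printout.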


To select instances, you are welcome to implement any 3D selection technique discussed in class, or a variation on it. You are not required to support selecting more than one instance at a time. Your selection technique should use either (a) the 3D coordinate system of the device camera or (b) at least one additional Image Target (which we will call a "toolbar"), independent of the ground target, whose position and orientation are controlled by the user. For example, you could use a 3D vector from the center of projection through some point on the viewport (device screen). Or, you could hold (or wear) a toolbar Image Target mounted on a card or wand, whose position and orientation are tracked to accomplish pointing, or whose position alone is tracked to create a virtual hand. Or, if you wore or mounted your device, you might hold (or wear) toolbar targets in or on both hands; for example, to perform two-handed ray pointing or a two-handed image-plane technique. These are only examples; you're encouraged to try other alternatives and to speak with us to get feedback on your ideas. In all cases, your 3D UI (just like a good 2D UI) should provide feedback to make it clear when you have selected an instance and what you have selected.
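One of the options above, a ray from the center of projection through a point on the viewport, can be sketched as follows. This is a minimal illustration under stated assumptions, not a required design: the camera is assumed to look along +z with x to the right and y up, instances are approximated by bounding spheres already expressed in camera coordinates, and all function names and values are hypothetical.

```python
import math

def ray_through_viewport(u, v, fov_y_deg, aspect):
    """Unit direction (camera coordinates) of the ray from the center of
    projection through viewport point (u, v), each normalized to [-1, 1]."""
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    d = (u * half_h * aspect, v * half_h, 1.0)
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)

def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a unit-length ray to its first hit on a sphere, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None                    # the ray misses the sphere
    t = -b - math.sqrt(disc)
    if t < 0:
        t = -b + math.sqrt(disc)       # the ray starts inside the sphere
    return t if t >= 0 else None

def pick(origin, direction, instances):
    """instances maps a name to (center, radius); return the nearest hit or None."""
    best_name, best_t = None, math.inf
    for name, (center, radius) in instances.items():
        t = ray_sphere_distance(origin, direction, center, radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name

# Hypothetical scene: three bounding spheres in camera coordinates.
instances = {
    "near": ((0.0, 0.0, 1.0), 0.1),
    "far": ((0.0, 0.0, 2.0), 0.1),
    "side": ((1.0, 0.0, 1.0), 0.1),
}
center_ray = ray_through_viewport(0.0, 0.0, 60.0, 16.0 / 9.0)
selected = pick((0.0, 0.0, 0.0), center_ray, instances)
```

With a toolbar target instead of the camera, the same test applies once the ray origin and direction are taken from the toolbar's tracked pose. Whichever instance is returned should then be highlighted so the user can see what was selected.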


You should make it possible to move the selected instance to any desired location (i.e., with arbitrary x, y, and z coordinates) that is visible to the camera while the camera can view and track the ground target. You should make it possible for the user to translate an instance without affecting the instance's scale or orientation.


Your user should be able to scale the selected instance isotropically (i.e., uniformly scaling by the same user-specified factor along all three of the instance's principal axes). You are optionally invited to support anisotropic scaling (in which the user specifies scaling by different factors along one or more axes). You should make it possible for the user to scale an instance without affecting the instance's translation or orientation.
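One way to specify a continuous scale factor from tracked objects alone is to use the ratio of two distances, for example the distance between the camera and a toolbar target now versus when scaling began. The sketch below assumes that approach. The function names, the lower bound on scale, and the sample positions are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def isotropic_scale(start_scale, start_a, start_b, now_a, now_b, min_scale=0.01):
    """Return the new uniform scale for the selected instance.

    start_a and start_b are the positions of the two tracked objects when
    scaling began; now_a and now_b are their current positions. Only the
    scale is computed here, so the instance's position and orientation are
    left untouched.
    """
    d0 = distance(start_a, start_b)
    if d0 == 0.0:
        return start_scale             # avoid dividing by zero
    factor = distance(now_a, now_b) / d0
    return max(min_scale, start_scale * factor)

# Hypothetical example: the two tracked objects move from 0.2 to 0.4 units apart,
# so an instance that started at scale 2.0 ends up at twice that.
new_scale = isotropic_scale(2.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.2),
                            (0.0, 0.0, 0.0), (0.0, 0.0, 0.4))
```

Applying the single returned value to all three axes keeps the scaling isotropic.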


Your user should be able to rotate the selected instance to any desired orientation. (Depending on the approach you implement, you may need multiple interactions to rotate to an arbitrary orientation.) Please make sure that each instance's orientation is visually obvious, based on some combination of the instance's geometry and surface properties (e.g., texture). You should make it possible for the user to rotate an instance without affecting the instance's translation or scale.

Coordinated translation and rotation through seamless transition to another coordinate system

Your system should support some way in which an instance can be seamlessly transferred between the coordinate system defined by the ground target and a second coordinate system (and then back again). The second coordinate system can be defined by either (a) the device camera itself or (b) a toolbar Image Target. By seamless, we mean that the position and orientation of the instance as seen by the camera should not change during the transfer, except insofar as the physical ground target, camera (and toolbar, if you use one) move relative to each other. In other words, if the ground target, camera (and toolbar, if you use one) are stationary relative to each other during the transition, a seamless transition would have absolutely no visible effect until there is relative motion. If you do not use a toolbar, once a transfer has occurred from the ground target to the camera, the ground target should no longer need to be visible to the camera for you to see the instance. If you use a toolbar, once a transfer has occurred, either from ground target to toolbar, or from toolbar to ground target, only the Image Target to which the instance is currently attached (ground target or toolbar) should need to be visible to the camera for the instance to be displayed relative to that Image Target.

Note that when the instance is transferred back to the ground target, its position and orientation relative to the ground target will have changed unless the ground target coordinate system and the second coordinate system have remained completely motionless relative to each other during the entire time that the instance has been transferred to the second coordinate system. Thus, this approach allows your user to simultaneously translate and rotate the selected instance.
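The seamless transfer described above amounts to recomputing the instance's local transform so that its pose relative to the camera is identical immediately before and after the switch. The following sketch shows that computation in plain Python. The matrix helpers, the function reparent, and the example poses are all hypothetical and stand in for whatever your tracker and scene graph provide.

```python
import math

def mat_mul(a, b):
    """Return the 4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]

def rot_y(degrees):
    """Rotation about the y axis by the given angle."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(m):
    """Inverse of a rotation-plus-translation matrix: [R^T | -R^T t]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    inv_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [inv_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def reparent(camera_from_old, camera_from_new, old_from_instance):
    """Return new_from_instance such that the instance's pose relative to
    the camera is unchanged at the moment of transfer."""
    camera_from_instance = mat_mul(camera_from_old, old_from_instance)
    return mat_mul(rigid_inverse(camera_from_new), camera_from_instance)

# Hypothetical poses at the moment of transfer from ground target to toolbar.
camera_from_ground = mat_mul(translation(0.0, 0.0, 0.5), rot_y(30.0))
camera_from_toolbar = mat_mul(translation(0.1, 0.0, 0.3), rot_y(-45.0))
ground_from_instance = translation(0.05, 0.0, 0.02)

toolbar_from_instance = reparent(camera_from_ground, camera_from_toolbar,
                                 ground_from_instance)

# The pose seen by the camera before and after the transfer should agree.
before = mat_mul(camera_from_ground, ground_from_instance)
after = mat_mul(camera_from_toolbar, toolbar_from_instance)
```

If the second coordinate system is the camera itself, camera_from_new is simply the identity matrix. Only the parent pose is inverted, so the instance's own transform may include scale. In Unity, Transform.SetParent with its worldPositionStays argument set to true is intended to preserve the world pose in a comparable way, but verify that it behaves as you expect together with Vuforia's handling of lost targets.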

(Please reread the last two paragraphs: If your program isn't doing what they describe, you should talk to your TA.)


Sometimes an instance may be difficult to see and operate on in the ground coordinate system because of the instance's size or distance from the user, or because it is partially obscured by other instances. To address this, you should support a workspace, defined relative to a separate Image Target, distinct from the ground target. The user should be able to create a temporary copy of the currently selected instance within the workspace and manipulate (i.e., translate, scale, and rotate) it in isolation in the workspace, causing the original instance in the ground coordinate system to interactively mimic those manipulations as they are being performed.

To assist the user, you should also make it possible to scale, translate, and rotate the workspace coordinate system itself. For example, if the selected instance in the ground coordinate system is tiny, the user should be able to scale the workspace coordinate system, so that the copy of that instance in the workspace is larger and translating the copy in the workspace results in a proportionately smaller translation of the instance in the ground coordinate system.
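The proportional relationship in the example above can be sketched numerically. The snippet assumes the simplest case, in which the workspace coordinate system has only been scaled uniformly (not rotated) relative to the ground coordinate system. The function name and the numbers are hypothetical.

```python
def ground_delta_from_workspace_delta(delta, workspace_scale):
    """Map a translation applied to the copy in the workspace to the
    translation to apply to the original instance in the ground
    coordinate system.

    workspace_scale is how much larger the copy appears than the original;
    a larger value means finer control over the original.
    """
    if workspace_scale <= 0:
        raise ValueError("workspace_scale must be positive")
    return tuple(c / workspace_scale for c in delta)

# Hypothetical example: the workspace is scaled up by 10, so dragging the copy
# 0.05 units along x moves the original one tenth as far.
ground_delta = ground_delta_from_workspace_delta((0.05, 0.0, 0.0), 10.0)
```

If the workspace has also been rotated, the delta would additionally need to be rotated by the inverse of that rotation before being applied to the original.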

To simplify your system, you do not need to support direct interaction within the ground coordinate system when there is a copy of an instance in the workspace.


To help your user understand how your selection and manipulation techniques work, you should provide two kinds of documentation: a written description and a video demonstration.

Written description

Write a brief document that describes how to use your system, including how to select, position, scale, and rotate instances, and how an instance can be transferred between coordinate systems. This should also present the rationale behind your choice of techniques, discussed in the context of what we covered in class and in the readings. Please include screen shots in your description, integrating them into the document. There is no minimum length; however, your document should fully explain how your user interface works and why you designed it that way. Your description should be submitted as a PDF file, although Word files will also be accepted.

Video demonstration

Create a narrated video demonstration (at most four minutes in length) that shows your system in action. Since each of you will do things differently, your goal here is to make it as easy as possible for us to understand how your system works by seeing it work, to minimize the time that it takes us to learn how to use it ourselves. If you don't show us some of the required functionality in your video, you can't assume that we will figure out how to do it on our own.

You can capture a video of your system by using a camcorder, webcam, or another Android/iOS device. (Note that if you capture a video directly from the screen of the device running your app, this could put an added load on the device, slowing it down, and would also prevent you from showing both the augmented view through your camera and a third-person view that includes you and the device you are using.)

Free video editing programs include Windows Movie Maker 2.6. (Steer clear of Windows Live Movie Maker, which has been significantly dumbed down.)

Please keep it simple: exotic visual and sound effects are neither needed nor desirable! Please choose a video format and codec (e.g., mp4) that will enable your file to be played by QuickTime, VLC Player, or Windows Media Player without the need for any additional downloads. You can also use a link to a video-sharing website such as YouTube or Vimeo, as long as it is clear when the video was posted. (In that case, please provide the links, as well as any additional information, such as a password, needed to view your video.)

What to submit

Your submission should include your complete Unity project (but, no executables or Xcode project directories), your written description, and your video demonstration. It is your responsibility to make sure that any file you submit is virus-free. Note, again, that any screen captures should be integrated into your written description, and not included as separate images. Each file should include your name and UNI at the beginning.

Your submission should include:

  1. A .zip (or other archive) of your entire Unity project
  2. Graphics files (.png, .tif, or .pdf) corresponding to the Image Targets you used in the project. This will allow your TA to print copies of your Image Targets and run your app. (Please be sure that the files you provide are exactly those used when testing.)
  3. Your written description.
  4. Your video demonstration. We strongly recommend that you upload your video to a video-hosting site or a file-hosting site, or submit it as a separate upload to your CourseWorks Dropbox for 4172.
  5. A README file with the following information:
    1. Your name and UNI
    2. Date of submission
    3. Computer Platform
    4. Mobile Platform, OS Version, and Device name
    5. Project title
    6. Project directory overview
    7. Special Instructions, if any, for deploying your app
    8. Special Instructions, if any, for preparing your targets
    9. Location of video
    10. Instructions for using your app
    11. Missing features
    12. Bugs in your code and Unity
    13. Asset sources

How to submit

Please compress all files in your submission (with the exception of the video) into a second parent archive file named [YourUNI]_assignment3.[zip|tar|gz|rar] where YourUNI is your Columbia UNI. Remember to include all the items listed above. Remove any large extraneous files (e.g., raw video footage or screen captures) that will bloat your submission.

Please verify that you can run your executable code by first extracting a copy of this archive to a location on your computer outside the directory tree where you did your development. Then, Build from within Unity and attempt to deploy to your mobile device. Similarly, if you're uploading your video, then extract and play it, or if you're hosting your video on a video site, make sure it's playable through the URL you provide.

Submission will be done through CourseWorks. Here are the steps:

  1. Log into CourseWorks.
  2. Select Drop Box from the left hand navigation pane.
  3. Expand the COMSW4172_001_2014_1 Drop Box folder. You should see a folder with your own name (if you do not, please email the TA).
  4. Select Upload Files in the drop-down list to the right of your name.
  5. The Upload Files page will load. Choose your archive using the browse dialog window that appears after pressing "Choose File."
  6. After choosing your project, enter the display name.  It should have the same name as the project archive, following the convention described above.
  7. Press "Upload Files Now."

Please try to submit before the deadline since CourseWorks can sometimes become busy and slow. You may use up to the number of late days you have left on this assignment, but remember that there is one additional assignment still to come besides the final project.


Your grade for this project will be determined in part by the quality of the user interface that you implement, evaluated using the heuristics you applied in Assignment 2:

Be sure that after moving an instance to a new location, it will be possible to select it again using your selection technique!

Enabling Extended Tracking can make it possible to work in a much larger volume.

Depending upon the size of the Image Targets you are using in this assignment, the angle with which they are viewed, and the lighting, camera position and orientation computed by the tracking software can change noticeably from one frame to another, resulting in "jitter." Consequently, the greater the distance between a vertex of an instance "attached" to an Image Target and the Image Target itself, the greater will be the variation in that vertex's position from one frame to another. For example, if you are implementing pointing with a visible "ray" emanating from an Image Target, this will be more evident in the jittering of the distant tip of the ray than in the base of the ray. In general, your scene will be more stable if (a) there are more completely unobscured, clearly viewed features that are seen simultaneously by your camera, (b) the vertices in your scene are closer to the Image Targets that define their positions, (c) the Image Targets are closer to the camera, and (d) the Image Targets are not viewed from an extreme glancing angle.
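To see why the tip of a ray jitters more than its base, consider only the rotational part of the tracking noise: a small angular error in a target's estimated orientation displaces a vertex by an amount that grows with the vertex's distance from the target. The sketch below works through that arithmetic. The 0.5 degree noise figure and the distances are hypothetical values chosen for illustration, not measurements of Vuforia.

```python
import math

def jitter_displacement(distance_from_target, angular_noise_deg):
    """Approximate displacement of a vertex caused by an angular error in
    the estimated orientation of the target it is attached to."""
    return distance_from_target * math.tan(math.radians(angular_noise_deg))

# Hypothetical ray attached to a target: base 0.02 units away, tip 1.0 unit away.
base_jitter = jitter_displacement(0.02, 0.5)
tip_jitter = jitter_displacement(1.0, 0.5)
# The tip moves 50 times as far as the base for the same angular error.
ratio = tip_jitter / base_jitter
```

With these numbers the tip moves by roughly 0.0087 units per frame of noise, while the base moves by less than 0.0002 units, which is why keeping vertices close to the targets that define them makes a scene look steadier.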

Whatever you do, let the tracking software determine the position and orientation of the ground target and any other targets. That is, you should not require the physical camera to be at a hardwired position and orientation relative to any target.

Tracking with optical targets is sensitive to lighting and the quality and rigidity of the printed target. A dirty, crumpled, folded, or curled target will significantly decrease the quality with which you can track.

Avoid camera poses that view targets from oblique glancing angles that can cause the ink in the targets to appear shiny. Ensure that your Image Targets are perfectly flat. (You will get the best results if you glue them to cardboard or other firm material.) Please see the Image Targets guide on the Vuforia Resources site to learn how to create high-quality targets.

You'll get the best results working in a brightly lit environment.

If you're having trouble implementing your desired interactions with targets, consider temporarily implementing "backup" debug interactions that use the touchscreen or 2D GUI control panels to verify the non-optical-tracking aspects of your code (e.g., your scene graph or transformations). This will help isolate problems with general program logic, scene graph design, and 3D transformations that you might erroneously attribute to optical tracking.