At this point, you've learned about some of the many ways to select and manipulate objects in 3D user interfaces. Now, it's time to try your hand at this, using GoblinXNA and its interface to optical marker tracking on a Nokia Lumia 800 or 920 phone running Windows Phone 7.5 or 8. For this assignment, you will write your first augmented reality application. You will create an environment containing a set of virtual and real objects, and make it possible for a user to select any virtual object, move it to a location that they specify, and scale or rotate it.
Goblin XNA Tutorial 8 and Tutorial 16 demonstrate how to use marker-based tracking and will be indispensable resources for this assignment. You should ensure you can run and understand these tutorials (i.e., by printing your own paper markers, following the hints about tracking at the end of this assignment, and trying the tutorials) before proceeding with your own design.
All interaction in this assignment should be accomplished based on the relative position and orientation of tracked objects (fiducial markers and the smartphone), which can be optionally accompanied by triggers or modifiers specified using one or more button or touchscreen interactions. That is, you should not use the touchscreen for selection or manipulation, except insofar as you use it (a) to specify a vector relative to the camera, (b) as a trigger (e.g., tapping the screen to select) or (c) as a modifier (e.g., touching the screen to modify some behavior). Furthermore, triggers or 2D UI components should not be used for specifying the magnitude of a transformation; for example, you should not use a physical or virtual button to scale up/down in discrete steps or a 2D slider to set the scale.
Using the touchscreen as a trigger is optional because you may decide to use other means to trigger selection (e.g., using markers). However, you are welcome to use the touchscreen or 2D UI components in any way you would like for debugging purposes (i.e., not as part of your final UI).
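For instance, a tap can serve purely as a trigger: the tap fires the action, while the action itself is computed from tracked poses. Below is a minimal sketch, assuming the standard XNA TouchPanel gesture API on Windows Phone; the class and method names are only illustrative.

```csharp
using Microsoft.Xna.Framework.Input.Touch;

// Illustrative helper: reports when the user taps the screen, so the tap can act
// as a trigger (e.g., "select whatever my selection technique currently points at")
// without the 2D tap coordinates driving the manipulation itself.
public class TapTrigger
{
    public TapTrigger()
    {
        // Ask the touch panel to recognize simple taps.
        TouchPanel.EnabledGestures = GestureType.Tap;
    }

    // Call once per frame from Update(); returns true on any frame containing a tap.
    public bool Fired()
    {
        bool fired = false;
        while (TouchPanel.IsGestureAvailable)
        {
            if (TouchPanel.ReadGesture().GestureType == GestureType.Tap)
                fired = true;
        }
        return fired;
    }
}
```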
Your environment should contain at least four selectable virtual objects. At least two of the objects should be models created by you or by others (properly acknowledged in your documentation), which you load in from a file. The other objects, if any, can be created directly in your GoblinXNA code. While you will specify the initial position, scale, and orientation of each selectable object, the user will be able to change them.
Your objects should reside within a coordinate system determined by a printout of an array of markers (we will call it the ground array), which you should place in your environment. When the tracking system recognizes the physical printout of the ground array within the camera image, it determines the geometric transformation that can be applied to objects that you wish to visually "attach" to that array. In addition, the camera image itself should be drawn as the background in the frame buffer to make this a video see-through AR application. (Tutorial 8 shows how to do this.) Since you are doing this assignment with a smartphone containing an integrated camera and display, your application will be an example of what is sometimes called "magic lens" AR.
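As a concrete starting point, here is a minimal sketch of this structure, modeled on Tutorial 8. It assumes a Scene named scene has already been initialized as in that tutorial; the marker-configuration file name and box size are placeholders you will need to replace with whatever matches your tracker and printed ground array.

```csharp
// Draw the live camera image behind the 3D scene (video see-through).
scene.ShowCameraImage = true;

// A node whose transformation is updated by the tracker whenever the ground
// array is recognized in the camera image. (Constructor arguments here are
// placeholders; see Tutorial 8 for the exact form used with your tracker.)
MarkerNode groundArrayNode = new MarkerNode(scene.MarkerTracker, "GroundArrayConfig.xml");
scene.RootNode.AddChild(groundArrayNode);

// Each selectable object hangs off the ground array through its own
// TransformNode, so its pose is expressed in ground-array coordinates.
TransformNode boxTransform = new TransformNode();
boxTransform.Translation = new Vector3(0, 0, 10);

GeometryNode boxNode = new GeometryNode("Box");
boxNode.Model = new Box(Vector3.One * 8);   // a simple built-in shape; models loaded from file attach the same way

groundArrayNode.AddChild(boxTransform);
boxTransform.AddChild(boxNode);
```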
You are welcome to implement any 3D selection technique discussed in class, or a variation on it. You are not required to support selecting more than one object at a time. Your selection technique should use either (a) the 3D coordinate system of the smartphone camera or (b) at least one "toolbar" array of one or more fiducial markers (independent of the ground array) whose position and orientation are controlled by the user. For example, you could use a 3D vector from the center of projection through some point on the viewport (smartphone screen). Or, you could hold (or wear) a toolbar mounted on a small card or wand, whose position and orientation are tracked to accomplish pointing, or whose position alone is tracked to create a virtual hand. Or, if you wore or mounted the phone, you might hold (or wear) toolbars in or on both hands; for example, to perform two-handed ray pointing or a two-handed image-plane technique. These are only examples; you're encouraged to try other alternatives and to speak with us to get feedback on your ideas. In all cases, your 3D UI (just like a good 2D UI) should provide feedback to make it clear when you have selected an object and what you have selected.
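One way to realize option (a) is the classic ray cast from the center of projection through a screen point. The sketch below assumes only XNA's Viewport.Unproject, Ray, and BoundingSphere, plus that you can supply the current view and projection matrices and a world-space bounding sphere for each selectable object; how you obtain those from your scene graph is up to you.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

public static class Picking
{
    // Builds a world-space ray from the center of projection through a screen point.
    public static Ray ScreenPointToRay(Viewport viewport, Vector2 screenPoint,
                                       Matrix view, Matrix projection)
    {
        // Unproject the screen point at the near and far planes to get two
        // world-space points, then construct a ray through them.
        Vector3 nearPoint = viewport.Unproject(new Vector3(screenPoint, 0),
                                               projection, view, Matrix.Identity);
        Vector3 farPoint = viewport.Unproject(new Vector3(screenPoint, 1),
                                              projection, view, Matrix.Identity);
        Vector3 direction = Vector3.Normalize(farPoint - nearPoint);
        return new Ray(nearPoint, direction);
    }

    // Returns the index of the closest intersected bounding sphere, or -1 if none is hit.
    public static int PickClosest(Ray ray, BoundingSphere[] objectBounds)
    {
        int closest = -1;
        float closestDistance = float.MaxValue;
        for (int i = 0; i < objectBounds.Length; i++)
        {
            float? distance = ray.Intersects(objectBounds[i]);
            if (distance.HasValue && distance.Value < closestDistance)
            {
                closestDistance = distance.Value;
                closest = i;
            }
        }
        return closest;
    }
}
```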
You should make it possible to move the selected object to any desired location (i.e., with arbitrary x, y, and z coordinates) that is visible to the camera while the camera can view and track the ground array. You should be able to translate an object without affecting the object's scale or orientation.
Your user should be able to scale the selected object isotropically (i.e., uniformly scaling by the same user-specified factor along all three of the object's principal axes). You are optionally invited to support anisotropic scaling (in which the user specifies scaling by different factors along one or more axes). You should be able to scale an object without affecting the object's translation or orientation.
Your user should be able to rotate the selected object to any desired orientation. (Depending on the approach you implement, you may need multiple interactions to rotate to an arbitrary orientation.) Please make sure that each object's orientation is visually obvious, based on some combination of the object's geometry and surface properties. You should be able to rotate an object without affecting the object's translation or scale.
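A simple way to guarantee that each manipulation leaves the other two components untouched is to store translation, rotation, and uniform scale as separate state and recompose the object's transform from them every frame. The sketch below is one such arrangement; the class and method names are illustrative, and the composed matrix would be handed to whatever transform node represents the object in your scene graph.

```csharp
using Microsoft.Xna.Framework;

// Illustrative container for an object's manipulation state. Because the three
// components are stored separately, translating never disturbs scale or
// orientation, scaling never disturbs position or orientation, and so on.
public class ManipulationState
{
    public Vector3 Translation = Vector3.Zero;
    public Quaternion Rotation = Quaternion.Identity;
    public float Scale = 1.0f;                          // isotropic scale factor

    public void TranslateBy(Vector3 delta) { Translation += delta; }
    public void ScaleBy(float factor)      { Scale *= factor; }
    public void RotateBy(Quaternion delta) { Rotation = delta * Rotation; }

    // Scale, then rotate, then translate (XNA row-vector order).
    public Matrix Compose()
    {
        return Matrix.CreateScale(Scale)
             * Matrix.CreateFromQuaternion(Rotation)
             * Matrix.CreateTranslation(Translation);
    }
}
```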
Your system should support some way in which an object can be seamlessly transferred between the coordinate system defined by the ground array and a second coordinate system (and then back again). The second coordinate system can be defined by either (a) the smartphone camera itself or (b) a second "toolbar" array of one or more markers. By seamless, we mean that the position and orientation of the object as seen by the camera should not change during the transfer, except insofar as the physical ground array, camera (and toolbar, if you use one) move relative to each other, not counting precision problems in the matrices returned by the tracking software. In other words, if the ground array, camera (and toolbar, if you use one) are stationary relative to each other during the transition, a seamless transition would have absolutely no visible effect until there is relative motion. If you do not use a toolbar, once a transfer has occurred from the ground array to the camera, the ground array should no longer need to be visible to the camera for you to see the object. If you use a toolbar, once a transfer has occurred, either from ground array to toolbar, or from toolbar to ground array, only the marker array to which the object is currently attached (ground array or toolbar) should need to be visible to the camera for the object to be displayed relative to that array.
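The transfer itself is just a change of parent: compute the object's pose in the common (camera) frame from its current parent, then re-express that same pose relative to the new parent. The sketch below assumes XNA's row-vector matrix convention, that each parent's pose in the common frame is available (e.g., from its marker node), and that the camera's own frame is the common frame, so transferring to the camera uses Matrix.Identity as the new parent; the method name is illustrative.

```csharp
using Microsoft.Xna.Framework;

public static class Reparenting
{
    // oldLocal:       object's transform relative to its current parent (e.g., the ground array)
    // oldParentWorld: current parent's transform in the common (camera) frame
    // newParentWorld: new parent's transform in the common frame
    //                 (Matrix.Identity if the new parent is the camera frame itself)
    // Returns the object's transform relative to the new parent. At the instant of
    // the hand-off, the object's on-screen pose is unchanged, which is exactly the
    // "seamless" requirement.
    public static Matrix Transfer(Matrix oldLocal, Matrix oldParentWorld, Matrix newParentWorld)
    {
        Matrix poseInCommonFrame = oldLocal * oldParentWorld;      // what the camera sees now
        return poseInCommonFrame * Matrix.Invert(newParentWorld);  // same pose, new parent's frame
    }
}
```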
Note that when the object is transferred back to the ground array, its position and orientation relative to the ground array will have changed unless the ground array coordinate system and the second coordinate system have remained completely motionless relative to each other during the entire time that the object has been transferred to the second coordinate system.
(Please reread the last two paragraphs: if your program isn't doing what they say, you should talk to a TA.)
To help your user understand how your selection and manipulation techniques work, you should provide two kinds of documentation: a written description and a video demonstration.
Write a brief document that describes how to use your system, including how to select, position, scale, and rotate objects, and how an object can be transferred between coordinate systems. This should also present the rationale behind your choice of techniques, discussed in the context of what we covered in class and in the readings. Please include screen shots in your description, integrating them into the document. There is no minimum length; however, your document should fully explain how your user interface works and why you designed it that way. Your description should be submitted as a PDF file, although Word files will also be accepted.
Create a narrated video demonstration (at most four minutes in length) that shows your system in action. Since each of you will do things differently, your goal here is to make it as easy as possible for us to understand how your system works by seeing it work, to minimize the time that it takes us to learn how to use it ourselves. If you don't show us some of the required functionality in your video, you can't assume that we will figure out how to do it on our own.
You can capture a video of your system by using a camcorder, webcam, or another smartphone. (Note that the worst way to capture a video would be with software that captures directly from the screen of the smartphone running your application. This puts an added load on the smartphone, slowing it down, and also prevents you from showing both the augmented view through your camera and a third-person view that includes you and the device you are using.)
Free video editing programs include Windows Movie Maker 2.6. (Steer clear of Windows Live Movie Maker, which has been significantly dumbed down.)
Please keep it simple: exotic visual and sound effects are neither needed nor desirable! Please choose a video format and codec (e.g., mp4) that will enable your file to be played by QuickTime, VLC Player, or Windows Media Player without the need for any additional downloads. You can also use a link to a video-sharing website such as YouTube or Vimeo, as long as it is clear when the video was posted. (In that case, please provide the links, as well as any additional information, such as a password, needed to view your video.)
Your submission should include all of your code, your Visual Studio project files, your written description, and your video demonstration. It is your responsibility to make sure that any file you submit is virus-free. Note, again, that any screen captures should be integrated into your written description, and not included as separate images. Each file should include your name and UNI at the beginning.
Your submission should include:
- All of your source code and Visual Studio project files.
- Your compiled .xap file.
- Your written description.
- Your video demonstration (or a link to it, along with any information needed to view it).
Please compress all files in your submission into a single parent archive file named [UNI]_assignment3.zip (or .tar.gz or .rar), where UNI is your Columbia UNI. Remember to include all the items listed above. Remove any large extraneous files (e.g., raw video footage or screen captures) that will bloat your submission.
Please verify that you can run your executable code by first extracting a copy of this archive to a location on your computer outside the directory tree where you did your development. Then, deploy your .xap file to the phone and run it. Similarly, if you're including your video as a file, extract and play it; if you're hosting your video on a video site, make sure it's playable through the URL you provide.
Submission will be done through CourseWorks.
Please try to submit before the deadline since CourseWorks can sometimes become busy and slow. You may use up to the number of late days you have left on this assignment, but remember that there is one additional assignment still to come besides the final project.
Be sure that after moving an object to a new location, it will be possible to select it again using your selection technique!
Since the fiducials you are using in this assignment are relatively small and your camera has relatively low resolution, the matrices computed by the tracking software can change noticeably from one frame to another, especially in orientation, resulting in "jitter." Consequently, the greater the distance between a vertex of a virtual object "attached" to a fiducial and the fiducial itself, the greater will be the variation in that vertex's position from one frame to another. For example, if you are implementing pointing with a visible "ray" emanating from an array of fiducials, this will be more evident in the jittering of the distant tip of the ray than in the base of the ray. In general, your scene will be more stable if (a) more completely unobscured, clearly viewed fiducials are seen simultaneously by your camera, (b) the vertices in your scene are closer to the fiducials that define their positions, (c) the fiducials are closer to the camera, and (d) the fiducials are not viewed from an extreme glancing angle.
Whatever you do, let the tracking software determine the position and orientation of the ground array and any other marker arrays. That is, you should not require the physical camera to be at a hardwired position and orientation relative to any marker arrays.
Tracking with optical markers is sensitive to lighting and the quality and rigidity of the printed marker array. A crumpled, folded, or curled array will significantly decrease the quality with which you can track.
Avoid camera poses that view fiducials from oblique glancing angles that can cause the black ink in the fiducials to appear shiny. Ensure your fiducials have ample white borders (i.e., don't crop them to the edge of the bitmap pattern) and are perfectly flat. (You will get the best results if you glue them to cardboard or other firm material.)
You'll get the best results working in a brightly lit environment.
If you are having trouble implementing your desired interactions with markers, consider temporarily implementing "backup" debug interactions that use the touchscreen or 2D GUI control panels to verify the non-optical-tracking aspects of your code (e.g., your scene graph or transformations). This will help isolate problems with general program logic, scene graph design, and 3D transformations that you might erroneously attribute to marker tracking.
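For example, a single switch can substitute a hand-made pose for the tracker's output, so everything downstream of tracking can be exercised with no markers in view. The sketch below is one such arrangement; the class name and the fixed debug pose are arbitrary, and the tracked pose would come from wherever your code currently reads the ground array's transformation (e.g., the marker node's WorldTransformation, as in the tutorials).

```csharp
using Microsoft.Xna.Framework;

public static class DebugPose
{
    // trackedPose: whatever the tracker reports for the ground array this frame.
    // When useDebugPose is true, a fixed pose 40 units in front of the camera stands
    // in for it, so scene-graph and transformation bugs can be separated from
    // tracking problems.
    public static Matrix GroundPose(Matrix trackedPose, bool useDebugPose)
    {
        return useDebugPose ? Matrix.CreateTranslation(0, 0, -40) : trackedPose;
    }
}
```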