Computer Graphics Notes
What is Computer Graphics (CG)?
The generation of graphical images using a computer, as opposed to "image processing", which manipulates images that are already in the computer. Creating a frame of "Toy Story" or "Jurassic Park" is computer graphics; comparing an image of a face from an ATM camera against a database of known criminal mugshots is image processing. Note that the line between the two can sometimes be hazy, and a given task may require both sets of skills.

Mathematics + computer science + art = computer graphics: the rendering of images on a device.

Rendering - creating images from models.
Models - objects constructed from geometric primitives (points, lines, polygons) specified by their vertices.
- Models exist in an n-dimensional, 'mathematically pure' space: n is typically 2 or 3, but n can be greater than 3 with scientific data.
- The rendered version is typically created on physical 2D media (e.g. a video screen).
- The rendered version can be simple or complex (lighting, shadows, colours, texture).
- Rendering a single image can take anywhere from a small fraction of a second (say, a frame from 'Unreal') to hours or days (say, a frame from 'Toy Story'), depending on the complexity of the scene, the amount of processing power available, and the needs of the user.

Common Uses
- movies, such as Toy Story, Who Framed Roger Rabbit, The Hollow Man, Shrek, Monsters Inc, Jurassic Park, and The Perfect Storm
- advertisements
- football game annotations
- scientific/medical visualization
- CAD/CAM
- multimedia
- computer interfaces (Windows, X, Aqua)
- virtual reality
- special effects
- artistic expression
- way cool video games

Software
Many application programs are available to produce computer graphics, either as 2D images, 3D models, or animated sequences (Corel Draw, Photoshop, AutoCAD, Maya, SoftImage, etc.). We will deal with the lower level routines which do the work of converting models into a displayable form on the display device.

There are several 'common' graphics languages/libraries/APIs (Application Programming Interfaces):
- GKS
- DirectX
- X
- Postscript
- OpenGL

We will be using OpenGL in this course, on the linux machines in the CS Computer Graphics lab, to give a common grading platform. OpenGL is available for all the major platforms and is accelerated on almost all current graphics cards, but it is not necessarily available on all of the machines here in the university. If you want to work on your machine at home, you should be able to get OpenGL libraries for it for free. Otherwise there is Mesa. Mesa is virtually identical to OpenGL, is free, and runs on a wider variety of platforms. For more information on Mesa you can check out: http://www.mesa3d.org . The only thing that should need to change to compile your code here is the Makefile. Mesa, like OpenGL, is usually accessed through function calls from a C or C++ program (for example from Code::Blocks (cross platform) or VC++).

Evolution of Video Display Hardware

Text generated graphics
- a series of printed characters and spaces, for example a simple plot of a sine function
- some density images can be generated ( . versus # )
+ fast, simple, and (relatively) easy
+ requires no special hardware or libraries
- monochrome and crude
(A crude text plot of a sine curve appeared here, traced with '*' characters against an axis labelled 1.0, 0.5, and 0.0.)

Special graphics characters
- uses the extended ASCII character set, 128 to 255
- special symbols mapped to extended ASCII codes
+ fast, easy, and simple
+ works reasonably well for line boxes, smiley faces, etc.
+ allows for colour and reverse video

Vector (calligraphic) displays
- lines are drawn directly; there is no predefined grid
- commands tell the electron gun where to move and when to turn on/off
+ lines are smooth
+ close to the 'pure mathematics' of the model
- slower with more elements to be drawn; can start to flicker
- only lines are possible, no filled polygons or bitmaps
- monochrome for each electron gun

Raster displays
- the image is represented by a rectangular grid of pixels (picture elements)
- the image is stored in a frame buffer
- the electron gun(s) continually scan in a regular pattern (line by line across the entire screen)
+ constant time to redraw any number of elements
+ no flicker
- jagged edges from the conversion to pixels
- a discretized version of the model

You need to keep redrawing the image on the screen to keep it from fading away. Vector displays redraw as quickly as possible given the number of objects on the screen; CRT-based raster displays redraw the image (or refresh the screen) at a fixed rate (e.g. 60 times per second) no matter how complex the scene.

For those who spent their early youth in the arcades, vector games included: Asteroids, Battlezone, Lunar Lander, Star Trek, Star Wars, Tempest. Initially these games were monochrome (white, green, or some other single colour), as in Asteroids; then coloured filters were used to colour different parts of the screen using the same monochrome electron gun, as in Battlezone; and finally, when RGB electron guns were cheap enough, true multi-colour vector games were possible.

Buffers in Raster Displays
Pretty much all CG is done using raster displays. The screen is represented by a 2D array of elements.
Frame buffer - an array of computer memory used to store an image to be displayed. The user manipulates the values in the frame buffer. 60 times a second (or at some other fixed rate) the frame buffer is copied onto the display device. If the video screen is 512 pixels wide by 512 pixels tall, the frame buffer must be able to store 512 x 512 elements - one element for each pixel on the screen.

Monochrome display: 512 x 512 x 1 bit (each bit is either 0 = off or 1 = on). Each of the 512 x 512 pixels can be either on or off (white or black). 32,768 bytes total.

8-bit greyscale display: 512 x 512 x 8 bits (each pixel is 8 bits deep, so values 0-255 are possible). Each of the 512 x 512 pixels can be one of 256 shades of grey (from black to white). 262,144 bytes total.

24-bit colour display: 512 x 512 x 24 bits (each pixel has 8 bits for red, 8 bits for green, and 8 bits for blue). Each pixel can be black->bright red (0-255) combined with black->bright green (0-255) combined with black->bright blue (0-255). 786,432 bytes total. Each of the 512 x 512 pixels can be one of 16 million colours; note, however, that a 512 x 512 display has only 262,144 pixels, so only 262,144 colours can be displayed simultaneously. A 1280 x 1024 display (a common workstation screen size) has only 1,310,720 pixels, far fewer than the 16,000,000 possible colours (about 3.75 MB for this configuration).

8-bit colour display using a colour map: we want the benefits of 24-bit colour with only an 8-bit display. 512 x 512 x 8 bits (each pixel is 8 bits deep, so values 0-255 are possible). Each of the 512 x 512 pixels holds one of 256 index values into a video lookup table. The video lookup table has 256 24-bit RGB values, where each value has 8 bits for red, 8 bits for green, and 8 bits for blue. 16 million colours are possible, but only 256 of them can be displayed at the same time. Memory needs are 512 x 512 x 1 byte plus 256 x 3 bytes = 262,912 bytes, much less than the 786,432 needed without the colour map. (Two sample colour maps, each with 256 24-bit RGB values, were shown here as figures.)

The depth of the frame buffer (e.g. 8 bits) determines the number of simultaneous colours possible; the width of the colour map (e.g. 24 bits) determines the number of colours that can be chosen from. (A small sketch that checks these byte counts follows.)
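As a quick check on the byte counts above, here is a minimal C sketch (an illustration, not part of the original notes) that computes the storage needed for each configuration.

    #include <stdio.h>

    /* Bytes needed for a frame buffer of the given size and depth. */
    static long frameBufferBytes(long width, long height, long bitsPerPixel)
    {
        return width * height * bitsPerPixel / 8;
    }

    int main(void)
    {
        long w = 512, h = 512;

        printf("monochrome:    %ld bytes\n", frameBufferBytes(w, h, 1));
        printf("8-bit grey:    %ld bytes\n", frameBufferBytes(w, h, 8));
        printf("24-bit colour: %ld bytes\n", frameBufferBytes(w, h, 24));

        /* 8-bit colour-mapped: 8 bits per pixel plus a 256-entry lookup
           table of 24-bit (3-byte) RGB values. */
        printf("8-bit mapped:  %ld bytes\n",
               frameBufferBytes(w, h, 8) + 256 * 3);
        return 0;
    }

It prints 32768, 262144, 786432, and 262912 bytes, matching the figures quoted above.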
Size of various frame buffers:

  screen size    monochrome    8-bit    24-bit
  512 x 512      32K           256K     768K
  640 x 480      38K           300K     900K
  1280 x 1024    160K          1.3M     3.8M

Hardware Effects on Computer Animation
A program may take several minutes, hours, or days to render a single image. These images can then be stored and played back, or recorded, to create an animation. A more interesting situation is when the animation is live, with the computer-generated image changing while the user watches, or as the user interacts with the computer. To do this more interactive kind of animation you really need a SECOND frame buffer. This is similar to how a motion picture works. Movie theatre projectors show a new image 24 times per second in the following sequence, so the patron sees only the moving images, not the movement of the film through the projector:
1. a new frame is moved in front of the lens
2. the shutter is opened to display the frame
3. the shutter is closed
In fact, at the movies you actually spend more time looking at a black screen than you do looking at an image. Now that we are starting to see digital cinema, where there is no physical film to move, film is moving closer to the way computers display images.

There is a similar problem in computer graphics. In the case of film, each of the frames has already been generated and it is just a question of showing them; in interactive computer graphics the frames are being drawn one at a time. The frame buffer is regularly sent to the display device (e.g. 60 times per second) whether the frame buffer has been completely updated or not. We don't want the user to see the screen being cleared and the new image being drawn each time the display refreshes; we only want the user to see a succession of completely drawn images.

The solution is to have two frame buffers. The user draws into one buffer while the other is being displayed. When drawing is complete, the buffers are swapped. In OpenGL / GLUT this is called "double buffering". It is set up with the GLUT_DOUBLE flag when initializing the display, and the buffers are then swapped with glutSwapBuffers(). (A minimal double-buffered skeleton is sketched below.)
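Here is a minimal sketch of such a double-buffered GLUT program. It is an illustration only (not code from the notes), and assumes the application's drawing is done by some user-supplied routine where the comment indicates.

    #include <GL/glut.h>

    /* Redraw callback: draw into the back buffer, then swap buffers. */
    static void display(void)
    {
        glClear(GL_COLOR_BUFFER_BIT);

        /* ... draw the scene into the back buffer here ... */

        glutSwapBuffers();    /* show the buffer we just finished drawing  */
        glutPostRedisplay();  /* request another frame (continuous motion) */
    }

    int main(int argc, char **argv)
    {
        glutInit(&argc, argv);
        /* GLUT_DOUBLE requests two colour buffers so drawing is off-screen. */
        glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
        glutInitWindowSize(512, 512);
        glutCreateWindow("double buffering");
        glutDisplayFunc(display);
        glutMainLoop();
        return 0;
    }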
We can only swap buffers in between screen refreshes - when the electron gun of the CRT is off and moving from the bottom of the screen back to the top. Note that different CRT monitors refresh at different rates. Also note that as we move more towards LCD displays and away from CRT displays the hardware changes, but there are still timing constraints; in the case of LCDs the refresh rate is only important when the images are changing. For more information on the two types of displays, the following link gives a nice description: http://www.ncdot.org/network/developers/guide/display_tech.html

If you can clear the frame buffer and draw the scene in less than 1/60th of a second, you will get 60 frames per second. If it takes longer than 1/60th of a second to draw the image (e.g. 1/59th of a second), then by the time you are ready to swap frame buffers the electron gun is already redrawing the current frame buffer again, so you must wait for it to finish before swapping buffers (e.g. 1/30th of a second after the last swap if the frame takes 1/59th of a second).

If the screen refreshes 60 times a second, you can have a new image on the screen every:
- 1/60th of a second ... 60 frames per second if all frames take equal time
- 1/30th of a second ... 30 frames per second if all frames take equal time
- 1/20th of a second ... 20 frames per second if all frames take equal time
- 1/15th of a second ... 15 frames per second if all frames take equal time
- 1/12th of a second ... 12 frames per second if all frames take equal time
- 1/10th of a second ... 10 frames per second if all frames take equal time

One of the most obvious implications of this is that a very small change in the time it takes to draw a scene into the frame buffer can have a major impact on the speed of the application. Can your program run at 45 frames per second (fps)? Yes: if 30 frames take 1/60th of a second each and the next 15 frames take 1/30th each, you will display 45 frames in one second. For smooth motion you want at least 10 frames per second, and preferably more than 15. More is better.

Mathematics VS Engineering
We like to think about a scene as mathematical primitives in a world-space. This scene is then rendered into the frame buffer. This allows a logical separation of the world from the view of that world. Mathematically, points are infinitely small and line segments are infinitely thin; these mathematical elements need to be converted into discrete pixels. As usual, there is an obvious easy way of doing these conversions, and then there is the way it is actually done (for efficiency).

Scan Conversion (rasterization) of a line ( Foley 3.2 )
Take an analytic (continuous) function and convert (digitize) it so the appropriate pixels can be illuminated in the frame buffer. In a language such as OpenGL a programmer can generate a 2D line segment in world-space with code like the following:

    glBegin(GL_LINE_STRIP);
    glVertex2f(1.5, 3.0);
    glVertex2f(4.0, 4.0);
    glEnd();

Polygons are single closed loops of line segments, usually drawn with their interiors filled. In a language such as OpenGL polygons are very restricted, to improve speed:
- edges can not intersect (simple polygons)
- the polygon must be convex, not concave

To generate the outline of a triangular 2D polygon in world-space using OpenGL, a programmer can write code like the following:

    glBegin(GL_LINE_LOOP);
    glVertex2f(1.5, 3.0);
    glVertex2f(4.0, 4.0);
    glVertex2f(4.0, 1.0);
    glEnd();

To generate a filled triangular 2D polygon in world-space using OpenGL, a programmer can write code like the following:

    glBegin(GL_POLYGON);
    glVertex2f(1.5, 3.0);
    glVertex2f(4.0, 4.0);
    glVertex2f(4.0, 1.0);
    glEnd();

We will not limit ourselves to these 'easier' polygons. Note that large complex objects are often reduced down to a large number of triangles (i.e. triangulated) for a number of reasons (see the sketch after this list):
- triangles are guaranteed to be simple convex polygons
- all points in a triangle are guaranteed to be co-planar
- most importantly, modern computer graphics hardware is often optimized to generate large numbers of triangles very quickly, so it is actually faster to break a complex shape down and then deal with a kazillion triangles than to deal with a smaller number of more complicated shapes
(The degenerate case of three co-linear points is usually not a major problem.)
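As referenced above, here is a small illustrative sketch (not from the notes) of handing a convex polygon to OpenGL as a fan of triangles: GL_TRIANGLE_FAN builds one triangle from the first vertex and each subsequent pair of vertices, so n vertices produce n - 2 triangles.

    #include <GL/gl.h>

    /* Draw a convex polygon as a triangle fan.
       verts is an array of (x, y) pairs; n is the number of vertices. */
    void drawConvexPolygonAsFan(const float verts[][2], int n)
    {
        int i;
        glBegin(GL_TRIANGLE_FAN);
        for (i = 0; i < n; i++)
            glVertex2f(verts[i][0], verts[i][1]);
        glEnd();
    }

Called with the three vertices used above it draws the same filled triangle; with more vertices it fills larger convex shapes one triangle at a time.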
How are line segments and polygons in world-space converted into illuminated pixels on the screen? First the coordinates in world-space must be converted to coordinates in the viewport (i.e. pixel coordinates in the frame buffer). This may involve the conversion from a 2D world to a 2D frame buffer (which we will study in a couple of weeks), or the reduction from a 3D world to a 2D frame buffer (which we will study a couple of weeks later). Then these viewport coordinates must be used to draw lines and polygons made up of individual pixels (rasterization). This is the topic we will discuss now.

Most of the algorithms in computer graphics follow the same pattern. There is the simple (braindead) algorithm that works but is too slow. Then that algorithm is repeatedly refined, making it more complicated to understand, but much faster for the computer.

Braindead Algorithm
given a line segment from leftmost (Xo,Yo) to rightmost (X1,Y1):
  Y = mX + B
  m = deltaY / deltaX = (Y1 - Yo) / (X1 - Xo)
Assuming |m| <= 1, we start at the leftmost edge of the line and move right one pixel-column at a time, illuminating the appropriate pixel in that column:
  start = round(Xo)
  stop = round(X1)
  for (Xi = start; Xi <= stop; Xi++)
      illuminate Xi, round(m * Xi + B)
Why is this bad? Each iteration has:
- a comparison
- a fractional multiplication
- 2 additions
- a call to round()
Addition is OK, fractional multiplication is bad, and a function call is very bad, as this is done A LOT. So we need more complex algorithms which use simpler operations to increase the speed.

Why is the slope (m) important?
- if m = 1 then each row and each column have a pixel filled in
- if 0 <= m < 1 then each column has one pixel and each row has >= 1, so we increment X each iteration and compute Y
- if m > 1 then each row has one pixel and each column has >= 1, so we increment Y each iteration and compute X

Simple Incremental Algorithm ( Foley 3.2.1 )
The basic improvement of this algorithm over the purely braindead one is that instead of calculating Y for each X from the equation of the line (one multiplication and one addition), we calculate it from the previous Y by just adding a fixed constant (one addition only). This works because the change in X from one column to the next is known to be exactly 1.
given a line segment from leftmost (Xo,Yo) to rightmost (X1,Y1):
  Y = mX + B
  m = deltaY / deltaX = (Y1 - Yo) / (X1 - Xo)
  if deltaX is 1 -> deltaY is m, so if Xi+1 = Xi + 1 -> Yi+1 = Yi + m
Assuming |m| <= 1, we start at the leftmost edge of the line and move right one pixel-column at a time, illuminating the pixel either in the current row or in an adjacent row. Starting at the leftmost edge of the line:
  X = round(Xo)
  Y = Yo
  while (X <= X1) repeatedly
      illuminate X, round(Y)
      add 1 to X     (moving one pixel column to the right)
      add m to Y
This guarantees there is one pixel illuminated in each column of the line. If |m| > 1 then we must reverse the roles of X and Y, incrementing Y by 1 and incrementing X by 1/m in each iteration. Horizontal and vertical lines are subsets of the 2 cases given above. We need such a common, primitive function to be VERY fast.
Features:
+ incremental
- rounding Y is a time-consuming operation
- real variables have limited precision, which can cause a cumulative error in long line segments (e.g. m = 1/3)
- Y must be a floating point variable
(A C sketch of this incremental approach is given below.)
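Here is a minimal C sketch of the incremental approach for the |m| <= 1 case, assuming integer endpoints and a hypothetical setPixel(x, y) routine that writes one pixel into the frame buffer; it is an illustration, not the notes' own code.

    #include <math.h>

    /* Assumed to exist: writes one pixel into the frame buffer. */
    void setPixel(int x, int y);

    /* Incremental (DDA) scan conversion of a line with |slope| <= 1,
       drawn from the leftmost endpoint (x0,y0) to the rightmost (x1,y1). */
    void lineIncremental(int x0, int y0, int x1, int y1)
    {
        double m = (double)(y1 - y0) / (double)(x1 - x0);
        double y = y0;
        int    x;

        for (x = x0; x <= x1; x++) {
            setPixel(x, (int)round(y)); /* round Y to the nearest pixel row */
            y += m;                     /* one addition per column          */
        }
    }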
Midpoint Line Algorithm ( Foley 3.2.2 )
This algorithm follows from the previous one, and further takes advantage of the observation that for line slopes between 0 and 1, the change in Y from one column to the next will be either 0 or 1. This algorithm requires no rounding, no floating point numbers, and no multiplications.

given a line segment from leftmost (Xo,Yo) to rightmost (X1,Y1), assuming Xo, X1, Yo, Y1 are integers:
  Y = mX + B
  m = deltaY / deltaX = (Y1 - Yo) / (X1 - Xo)
Assuming 0 <= m <= 1, we start at the leftmost edge of the line and move right one pixel-column at a time, illuminating either the pixel in the current row (the pixel to the EAST) or the pixel in the next higher row (the pixel to the NORTHEAST).

We can rewrite the equation of the line in the form F(X,Y) = aX + bY + c = 0:
  Y = (deltaY / deltaX) * X + B
  0 = (deltaY / deltaX) * X - Y + B
  0 = deltaY * X - deltaX * Y + deltaX * B
  F(X,Y) = deltaY * X - deltaX * Y + deltaX * B
So for any point (Xi,Yi) we can plug Xi,Yi into the above equation and:
  F(Xi,Yi) = 0 -> (Xi,Yi) is on the line
  F(Xi,Yi) > 0 -> (Xi,Yi) is below the line
  F(Xi,Yi) < 0 -> (Xi,Yi) is above the line

Given that we have illuminated the pixel at (Xp,Yp), we will next illuminate either the pixel to the EAST (Xp+1, Yp) or the pixel to the NORTHEAST (Xp+1, Yp+1). To decide, we look at the midpoint between the EAST and NORTHEAST pixels and see which side of the midpoint the line falls on:
  line above the midpoint -> illuminate the NORTHEAST pixel
  line below the midpoint -> illuminate the EAST pixel
  line exactly on the midpoint -> CHOOSE TO illuminate the EAST pixel
We create a decision variable called d by plugging the midpoint into the F() for the line and seeing where the midpoint falls in relation to the line:
  d = F(Xp+1, Yp+0.5)
  d > 0 -> pick the NORTHEAST pixel
  d < 0 -> pick the EAST pixel
  d = 0 -> ***CHOOSE*** to pick the EAST pixel
That tells us which pixel to illuminate next. Now we need to compute d for the next iteration, using only the current d, without recomputing it from the new midpoint.

If we pick the EAST pixel, the midpoint is incremented by 1 in X and 0 in Y. Using F(X,Y) = deltaY * X - deltaX * Y + deltaX * B we can expand this out:
  dcurrent = F(Xp+1, Yp+0.5) = deltaY * (Xp+1) - deltaX * (Yp+0.5) + deltaX * B
  dnew     = F(Xp+2, Yp+0.5) = deltaY * (Xp+2) - deltaX * (Yp+0.5) + deltaX * B
When you simplify it, you end up with:
  dnew = dcurrent + deltaY
so we create a new variable called deltaE, where deltaE = deltaY.

If we pick the NORTHEAST pixel, the midpoint is incremented by 1 in X and 1 in Y. Expanding in the same way:
  dcurrent = F(Xp+1, Yp+0.5) = deltaY * (Xp+1) - deltaX * (Yp+0.5) + deltaX * B
  dnew     = F(Xp+2, Yp+1.5) = deltaY * (Xp+2) - deltaX * (Yp+1.5) + deltaX * B
When you simplify it, you end up with:
  dnew = dcurrent + deltaY - deltaX
so we create a new variable called deltaNE, where deltaNE = deltaY - deltaX.
The initial point (Xo,Yo) is known, so the initial midpoint is at (Xo+1, Yo+0.5), and the initial d = F(Xo+1, Yo+0.5). Using the function F(X,Y) = deltaY * X - deltaX * Y + deltaX * B we can expand this out:
  = deltaY * (Xo+1) - deltaX * (Yo+0.5) + deltaX * B
  = (deltaY * Xo - deltaX * Yo + deltaX * B) + deltaY - 0.5 * deltaX
  = F(Xo,Yo) + deltaY - 0.5 * deltaX
Since (Xo,Yo) is on the line, F(Xo,Yo) = 0, so the initial d = deltaY - deltaX / 2.
The division by 2 is still annoying, but we can remove it by being clever: we avoid the division by 2 by multiplying F() by 2. This also multiplies d, deltaE, and deltaNE by 2, but since d is only compared against 0 (= 0, < 0, or > 0), the multiplication does not affect the decisions.

So now we can finally show the Midpoint Line algorithm. Assuming integral endpoints for the line segment (if not, make them integral), and starting at the leftmost edge of the line:
  deltaX = X1 - Xo
  deltaY = Y1 - Yo
  d = deltaY * 2 - deltaX
  deltaE = deltaY * 2
  deltaNE = (deltaY - deltaX) * 2
  X = Xo
  Y = Yo
  illuminate X, Y
  while (X < X1) repeatedly
      if (d <= 0)
          add deltaE to d
          add 1 to X
      else
          add deltaNE to d
          add 1 to X
          add 1 to Y
      illuminate X, Y
(A C version of this routine is sketched below.)
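Here is a minimal C sketch of the routine above, again assuming a hypothetical setPixel(x, y) that writes one pixel into the frame buffer. It handles only the case discussed here: integer endpoints, slope between 0 and 1, drawn left to right, with integer arithmetic only.

    /* Assumed to exist: writes one pixel into the frame buffer. */
    void setPixel(int x, int y);

    void lineMidpoint(int x0, int y0, int x1, int y1)
    {
        int dx      = x1 - x0;
        int dy      = y1 - y0;
        int d       = 2 * dy - dx;      /* initial decision variable          */
        int deltaE  = 2 * dy;           /* change in d when EAST is chosen    */
        int deltaNE = 2 * (dy - dx);    /* change in d when NORTHEAST chosen  */
        int x       = x0;
        int y       = y0;

        setPixel(x, y);
        while (x < x1) {
            if (d <= 0) {               /* midpoint above the line: EAST      */
                d += deltaE;
            } else {                    /* midpoint below the line: NORTHEAST */
                d += deltaNE;
                y++;
            }
            x++;
            setPixel(x, y);
        }
    }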
Drawing Circles
The algorithm used to draw circles is very similar to the Midpoint Line algorithm.
8-way symmetry: for a circle centered at (0,0), given that the point (x,y) is on the circle, the following points are also on the circle:
  (-x, y)  ( x,-y)  (-x,-y)  ( y, x)  (-y, x)  ( y,-x)  (-y,-x)
So it is only necessary to compute the pixels for 1/8 of the circle and then simply illuminate the appropriate pixels in the other 7/8.

given a circle centered at (0,0) with radius R:
  R^2 = X^2 + Y^2
  F(X,Y) = X^2 + Y^2 - R^2
We choose to work in the 1/8 of the circle (45 degrees) from x = 0 (the y-axis) to x = y = R/sqrt(2) (45 degrees clockwise from the y-axis). So for any point (Xi,Yi) we can plug Xi,Yi into the above equation and:
  F(Xi,Yi) = 0 -> (Xi,Yi) is on the circle
  F(Xi,Yi) > 0 -> (Xi,Yi) is outside the circle
  F(Xi,Yi) < 0 -> (Xi,Yi) is inside the circle
Given that we have illuminated the pixel at (Xp,Yp), we will next illuminate either the pixel to the EAST (Xp+1, Yp) or the pixel to the SOUTHEAST (Xp+1, Yp-1). We again create a decision variable d, set equal to the function evaluated at the midpoint (Xp+1, Yp-0.5), and see where the midpoint falls in relation to the circle:
  d = F(Xp+1, Yp-0.5)
  d > 0 (midpoint is outside)       -> pick the SOUTHEAST pixel
  d < 0 (midpoint is inside)        -> pick the EAST pixel
  d = 0 (midpoint is on the circle) -> ***CHOOSE*** to pick the SOUTHEAST pixel

  dcurrent = F(Xp+1, Yp-0.5) = (Xp+1)^2 + (Yp-0.5)^2 - R^2
           = Xp^2 + 2Xp + 1 + Yp^2 - Yp + 0.25 - R^2
if the EAST pixel is chosen:
  dnew = F(Xp+2, Yp-0.5) = (Xp+2)^2 + (Yp-0.5)^2 - R^2
       = Xp^2 + 4Xp + 4 + Yp^2 - Yp + 0.25 - R^2
  dnew - dcurrent = deltaE = 2Xp + 3
if the SOUTHEAST pixel is chosen:
  dnew = F(Xp+2, Yp-1.5) = (Xp+2)^2 + (Yp-1.5)^2 - R^2
       = Xp^2 + 4Xp + 4 + Yp^2 - 3Yp + 2.25 - R^2
  dnew - dcurrent = deltaSE = 2Xp - 2Yp + 5
Unlike the algorithm for lines, deltaE and deltaSE are not constant.
The initial point (Xo,Yo) is known to be (0,R), so the initial midpoint is at (1, R-0.5), and
  initial d = F(1, R-0.5) = 1 + (R-0.5)^2 - R^2 = 1 + R^2 - R + 0.25 - R^2 = 1.25 - R
Unfortunately, while deltaSE and deltaE are integral, d is still a real value, not an integer, so let:
  h = d - 0.25
h is initialized as 1 - R (instead of 1.25 - R), and the comparison becomes h < 0.25 (instead of d < 0). But since h starts off as an integer (assuming an integral R) and is only incremented by integral amounts (deltaE and deltaSE), we can ignore the 0.25 and compare h < 0. The resulting algorithm:
  X = 0
  Y = radius
  d = 1 - radius
  draw8Points(X, Y)
  while (Y > X)
      if (d < 0)
          add 2 * X + 3 to d
          add 1 to X
      else
          add 2 * (X - Y) + 5 to d
          add 1 to X
          subtract 1 from Y
      draw8Points(X, Y)
The full algorithm was given (in C) in the red book (earlier edition) as program 3.4 on p.83; it is given (in C) in the white book as figure 3.16 on p.86. This is still somewhat bad in that there is a multiplication to compute the new value of the decision variable; the book shows a more complicated algorithm which does this multiplication only once.

Ellipses, F(X,Y) = b^2 X^2 + a^2 Y^2 - a^2 b^2 = 0, are handled in a similar manner, except that 1/4 of the ellipse must be dealt with at a time, and that 1/4 must be broken into 2 parts based on where the slope of the tangent to the ellipse is -1 (in the first quadrant).

Clipping
Since we have a separation between the models and the image created from those models, there can be parts of the model that do not appear in the current view when they are rendered. Pixels outside the clip rectangle are clipped and are not displayed.
- We can clip analytically: knowing where the clip rectangle is, clipping can be done before scan-converting a graphics primitive (point, line, polygon) by altering the graphics primitive so the new version lies entirely within the clip rectangle.
- We can clip by brute force (scissoring): scan convert the entire primitive but only display those pixels within the clip rectangle, by checking each pixel to see if it is visible.
Clipping against a rectangle gives:
- clipping a point -> nothing or a single point
- clipping a line -> nothing or a single line segment
- clipping a rectangle -> nothing or a single rectangle (assuming the rectangle is axis-aligned; otherwise treat it as a convex polygon)
- clipping a convex polygon -> nothing or a single convex polygon
- clipping a concave polygon -> nothing, or 1 or more concave polygons
As with scan conversion, this must be done as quickly as possible, as it is a very common operation.

Point Clipping
Given a point (X,Y) and a clipping rectangle with corners (Xmin,Ymin) and (Xmax,Ymax), the point is within the clip rectangle if:
  Xmin <= X <= Xmax  and  Ymin <= Y <= Ymax

Cohen-Sutherland Line Clipping ( Foley 3.12.3 )
given a line segment, repeatedly:
1. check for trivial acceptance: both endpoints within the clip rectangle
2. check for trivial rejection: both endpoints outside the clip rectangle IS NOT ENOUGH; both endpoints off the same side of the clip rectangle IS ENOUGH
3. divide the segment in two where one part can be trivially rejected
The clip rectangle is extended into a plane divided into 9 regions; each region is defined by a unique 4-bit string:
- left bit = 1: above the top edge (Y > Ymax); the left bit is the sign bit of (Ymax - Y)
- 2nd bit = 1: below the bottom edge (Y < Ymin); the 2nd bit is the sign bit of (Y - Ymin)
- 3rd bit = 1: right of the right edge (X > Xmax); the 3rd bit is the sign bit of (Xmax - X)
- right bit = 1: left of the left edge (X < Xmin); the right bit is the sign bit of (X - Xmin)
(The sign bit is the most significant bit in the binary representation of the value: '1' if the number is negative, '0' if the number is positive.) The frame buffer itself, in the center, has code 0000:

  1001 | 1000 | 1010
  -----+------+-----
  0001 | 0000 | 0010
  -----+------+-----
  0101 | 0100 | 0110

(A small sketch of this outcode computation follows.)
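Here is a minimal C sketch (an illustration, not the notes' code) of computing the 4-bit region code and using it for the trivial accept/reject tests; the bit layout follows the description above.

    #define CODE_ABOVE 0x8   /* left bit:  Y > Ymax */
    #define CODE_BELOW 0x4   /* 2nd bit:   Y < Ymin */
    #define CODE_RIGHT 0x2   /* 3rd bit:   X > Xmax */
    #define CODE_LEFT  0x1   /* right bit: X < Xmin */

    /* Compute the Cohen-Sutherland region code for the point (x, y). */
    int outcode(double x, double y,
                double xmin, double ymin, double xmax, double ymax)
    {
        int code = 0;
        if (y > ymax) code |= CODE_ABOVE;
        if (y < ymin) code |= CODE_BELOW;
        if (x > xmax) code |= CODE_RIGHT;
        if (x < xmin) code |= CODE_LEFT;
        return code;
    }

    /* Trivial tests for a segment whose endpoints have codes c0 and c1:
       accept if both codes are 0000, reject if their bitwise AND is not 0000. */
    int triviallyAccepted(int c0, int c1) { return (c0 | c1) == 0; }
    int triviallyRejected(int c0, int c1) { return (c0 & c1) != 0; }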
For each line segment:
1. each endpoint is given the 4-bit code of its region
2. repeat until acceptance or rejection:
   1. if both codes are 0000 -> trivial acceptance
   2. if the bitwise logical AND of the codes is not 0000 -> trivial rejection
   3. otherwise, divide the line into 2 segments using an edge of the clip rectangle:
      1. find an endpoint with code not equal to 0000
      2. move left to right across the code to find a 1 bit -> the crossed edge
      3. break the line segment into 2 line segments at the crossed edge
      4. forget about the new line segment lying completely outside the clip rectangle
The full algorithm was given (in C) in the red book (earlier edition) as program 3.7 on p.105; it is given (in C) in the white book as figure 3.41 on p.116.

Sutherland-Hodgman Polygon Clipping ( Foley 3.14.1 )
Unlike line clipping, where we selectively clipped against each edge, here we successively clip a polygon against all four edges of the clip rectangle.
given a polygon with vertices V1, V2, ... Vn and edges between vertices Vi and Vi+1, and from Vn to V1:
for each of the four clipping edges, repeatedly, for each vertex V = Vn, V1, V2, ... Vn:
given an edge from vertex s to vertex p, and assuming s has already been dealt with:
- if s and p are both inside the clip edge -> output p
- if s is inside and p is outside the clip edge -> output i (the intersection of edge sp with the clip edge)
- if s and p are both outside the clip edge -> output nothing
- if s is outside and p is inside the clip edge -> output i (the intersection of edge sp with the clip edge) and then p
The output edges become the new set of polygon edges. (A C sketch of one such clipping pass is given below.)
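Here is a minimal C sketch of one clipping pass against a single clip edge (here the left edge, x >= xmin); it is illustrative only, using a simple array of vertices rather than the edge-list formulation above. The same pass is then repeated for the other three edges of the clip rectangle.

    typedef struct { double x, y; } Point;

    /* Clip the polygon 'in' (n vertices) against the half-plane x >= xmin,
       writing the surviving/new vertices to 'out' and returning their count. */
    int clipAgainstLeftEdge(const Point *in, int n, double xmin, Point *out)
    {
        int i, count = 0;
        for (i = 0; i < n; i++) {
            Point s = in[(i + n - 1) % n];    /* previous vertex */
            Point p = in[i];                  /* current vertex  */
            int sInside = (s.x >= xmin);
            int pInside = (p.x >= xmin);

            if (sInside != pInside) {         /* edge sp crosses the clip edge */
                double t = (xmin - s.x) / (p.x - s.x);
                out[count].x = xmin;
                out[count].y = s.y + t * (p.y - s.y);
                count++;
            }
            if (pInside)                      /* keep vertices on the inside */
                out[count++] = p;
        }
        return count;
    }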
Coordinate System "Handedness"
In a 2D coordinate system the X axis generally points from left to right, and the Y axis generally points from bottom to top (although some windowing systems have their Y coordinates going from top to bottom). When we add the third coordinate, Z, we have a choice as to whether the Z-axis points into the screen or out of the screen.

Right Hand Coordinate System (RHS): Z is coming out of the page; counterclockwise rotations are positive:
- if we rotate about the X axis: the rotation Y->Z is positive
- if we rotate about the Y axis: the rotation Z->X is positive
- if we rotate about the Z axis: the rotation X->Y is positive
Left Hand Coordinate System (LHS): Z is going into the page; clockwise rotations are positive:
- if we rotate about the X axis: the rotation Y->Z is positive
- if we rotate about the Y axis: the rotation Z->X is positive
- if we rotate about the Z axis: the rotation X->Y is positive
So, in terms of axis-to-axis rotations, it is basically the same thing. The important thing is to note which coordinate system is being used by the package you are working with, both for the creation of models and for displaying them. Also note that if two packages use different coordinate systems, then the model(s) may need to be inverted in some fashion when they are loaded in for viewing. OpenGL generally uses a right-hand coordinate system.

Multiple Coordinate Systems in a Graphics Program
In a typical graphics program we may need to deal with a number of different coordinate systems, and a good part of the work (and the cause of many headaches) is the conversion of coordinates from one system to another. We'll learn about the conversion process a little later; in the meantime, here is a list of some of the coordinate systems you may encounter:
- World Coordinate System - Also known as the "universe" or sometimes "model" coordinate system. This is the base reference system for the overall model (generally in 3D), to which all other model coordinates relate.
- Object Coordinate System - When each object is created in a modelling program, the modeller must pick some point to be the origin of that particular object, and the orientation of the object to a set of model axes. For example, when modelling a desk the modeller might choose a point in the center of the desk top for the origin, or the point in the center of the desk at floor level, or the bottom of one of the legs of the desk. When this object is moved to a point in the world coordinate system, it is really the origin of the object (in its object coordinate system) that is moved to the new world coordinates, and all other points in the model are moved by an equal amount. Note that while the origin of the object model is usually somewhere on the model itself, it does not have to be: the origin of a doughnut or a tire might be in the vacant space in the middle.
- Hierarchical Coordinate Systems - Sometimes objects in a scene are arranged in a hierarchy, so that the "position" of one object in the hierarchy is relative to its parent in the hierarchy scheme, rather than to the world coordinate system. For example, a hand may be positioned relative to an arm, and the arm relative to the torso. When the arm moves, the hand moves with it, and when the torso moves, all three objects move together.
- Viewpoint Coordinate System - Also known as the "camera" coordinate system. This coordinate system is based upon the viewpoint of the observer, and changes as they change their view. Moving an object "forward" in this coordinate system moves it along the direction that the viewer happens to be looking at the time.
- Model Window Coordinate System - Not to be confused with desktop windowing systems (MS Windows or X Windows), this coordinate system refers to the subset of the overall model world that is to be displayed on the screen. Depending on the viewing parameters selected, the model window may be rectilinear or a distorted viewing frustum of some kind.
- Screen Coordinate System - This 2D coordinate system refers to the physical coordinates of the pixels on the computer screen, based on the current screen resolution (e.g. 1024x768).
- Viewport Coordinate System - This coordinate system refers to a subset of the screen space where the model window is to be displayed. Typically the viewport will occupy the entire screen window, or even the entire screen, but it is also possible to set up multiple smaller viewports within a single screen window.

Transformations in 2 Dimensions
One of the most common and important tasks in computer graphics is to transform the coordinates (position, orientation, and size) of either objects within the graphical scene or the camera that is viewing the scene. It is also frequently necessary to transform coordinates from one coordinate system to another (e.g. world coordinates to viewpoint coordinates to screen coordinates). All of these transformations can be efficiently and succinctly handled using some simple matrix representations, which, as we will see, are particularly useful for combining multiple transformations into a single composite transform matrix. We will look first at simple translation, scaling, and rotation in 2D, then extend our results to 3D, and finally see how multiple transformations can be easily combined into a composite transform.
Translation in 2D
point (X,Y) is to be translated by amounts Dx and Dy to a new location (X',Y'):
  X' = Dx + X
  Y' = Dy + Y
or P' = T + P, where

  P' = | X' |     T = | Dx |     P = | X |
       | Y' |         | Dy |         | Y |

Scaling in 2D
point (X,Y) is to be scaled by amounts Sx and Sy to location (X',Y'):
  X' = Sx * X
  Y' = Sy * Y
or P' = S * P, where

  P' = | X' |     S = | Sx  0 |     P = | X |
       | Y' |         |  0 Sy |         | Y |

Scaling is performed about the origin (0,0), not about the center of the line/polygon/whatever:
  scale > 1 enlarges the object and moves it away from the origin
  scale = 1 leaves the object alone
  scale < 1 shrinks the object and moves it towards the origin
Uniform scaling: Sx = Sy. Differential scaling: Sx != Sy, which alters proportions.

Rotation in 2D
point (X,Y) is to be rotated about the origin by angle theta to location (X',Y'):
  X' = X * cos(theta) - Y * sin(theta)
  Y' = X * sin(theta) + Y * cos(theta)
Note that this does involve sin and cos, which are much more costly than addition or multiplication.
or P' = R * P, where

  P' = | X' |     R = | cos(theta) -sin(theta) |     P = | X |
       | Y' |         | sin(theta)  cos(theta) |         | Y |

Rotation is performed about the origin (0,0), not about the center of the line/polygon/whatever.

Derivation of the 2D Rotation Equations
Where does this matrix come from? (X,Y) is located r away from (0,0) at a CCW angle of phi from the X axis. (X',Y') is located r away from (0,0) at a CCW angle of theta+phi from the X axis. Since rotation is about the origin, (X',Y') must be the same distance from the origin as (X,Y). From trigonometry we have:
  X = r * cos(phi)          Y = r * sin(phi)
  X' = r * cos(theta+phi)   Y' = r * sin(theta+phi)
Now making use of the following trigonometric identities:
  cos(a+b) = cos(a) * cos(b) - sin(a) * sin(b)
  sin(a+b) = sin(a) * cos(b) + cos(a) * sin(b)
and substituting into the equations for X' and Y', we get:
  X' = r * cos(theta) * cos(phi) - r * sin(theta) * sin(phi)
  Y' = r * sin(theta) * cos(phi) + r * cos(theta) * sin(phi)
Then we substitute in X and Y from their definitions above, and the final result simplifies to:
  X' = X * cos(theta) - Y * sin(theta)
  Y' = X * sin(theta) + Y * cos(theta)

Homogeneous Coordinates in 2 Dimensions
Scaling and rotation are both handled using matrix multiplication, which can be combined as we will see shortly. Translation causes a difficulty, however, since it uses addition instead of multiplication. We want to be able to treat all 3 transformations (translation, scaling, rotation) in the same way - as multiplications. The solution is to give each point a third coordinate (X, Y, W), which will allow translations to be handled as a multiplication also. (Note that we are not really moving into the third dimension yet; the third coordinate is added to the mathematics solely in order to combine the addition and multiplication of 2D coordinates.)
Two triples (X,Y,W) and (X',Y',W') represent the same point if they are multiples of each other, e.g. (1,2,3) and (2,4,6). At least one of the three coordinates must be nonzero. If W is 0 then the point is at infinity; this situation will rarely occur in practice in computer graphics. If W is nonzero we can divide the triple by W to get the cartesian coordinates of X and Y, which will be identical for triples representing the same point: (X/W, Y/W, 1). This step can be considered as mapping the point from 3D space onto the plane W = 1.
Conversely, if the 2D cartesian coordinates of a point are known as (X,Y), then its homogeneous coordinates can be given as (X,Y,1). So, how does this apply to translation, scaling, and rotation of 2D coordinates?

Translation of 2D Homogeneous Coordinates
point (X,Y) is to be translated by amounts Dx and Dy to location (X',Y'):
  X' = Dx + X
  Y' = Dy + Y
or P' = T * P, where

  P' = | X' |     T = | 1 0 Dx | = T(Dx,Dy)     P = | X |
       | Y' |         | 0 1 Dy |                    | Y |
       | 1  |         | 0 0 1  |                    | 1 |

Hey look! Translation is now a multiplication instead of an addition!

Scaling of 2D Homogeneous Coordinates
P' = S * P, where

  P' = | X' |     S = | Sx 0  0 | = S(Sx,Sy)     P = | X |
       | Y' |         | 0  Sy 0 |                    | Y |
       | 1  |         | 0  0  1 |                    | 1 |

Rotation of 2D Homogeneous Coordinates
P' = R * P, where

  P' = | X' |     R = | cos(theta) -sin(theta) 0 | = R(theta)     P = | X |
       | Y' |         | sin(theta)  cos(theta) 0 |                    | Y |
       | 1  |         | 0           0          1 |                    | 1 |

Composition of 2D Transformations
There are many situations in which the final transformation of a point is a combination of several (often many) individual transformations. For example, the position of the finger of a robot might be a function of the rotation of the robot's hand, arm, and torso, as well as the position of the robot on the railroad train and the position of the train in the world, and the rotation of the planet around the sun, and so on. Applying each transformation individually to all points in a model would take a lot of time. Instead of applying several transformation matrices to each point, we want to combine the transformations to produce one matrix which can be applied to each point.
In the simplest case we want to apply the same type of transformation (translation, rotation, scaling) more than once: translation is additive as expected, scaling is multiplicative as expected, and rotation is additive as expected.
But what if we want to combine different types of transformations? A very common reason for doing this is to rotate a polygon about an arbitrary point (e.g. the center of the polygon) rather than around the origin:
1. Translate so that P1 is at the origin: T(-Dx,-Dy)
2. Rotate: R(theta)
3. Translate so that the point at the origin is at P1: T(Dx,Dy)
Note that the order of operations here is right to left:
  P' = T(Dx,Dy) * R(theta) * T(-Dx,-Dy) * P
  i.e. P' = T(Dx,Dy) * { R(theta) * [ T(-Dx,-Dy) * P ] }
  i.e. P' = [ T(Dx,Dy) * R(theta) * T(-Dx,-Dy) ] * P
The matrix that results from these 3 steps can then be applied to all of the points in the polygon (a small sketch of building this composite follows the list below).
Another common reason is to scale a polygon about an arbitrary point (e.g. the center of the polygon) rather than around the origin: translate so that P1 is at the origin, scale, then translate so that the point at the origin is back at P1.
How do we determine the 'center' of the polygon?
- specifically define the center (e.g. the center of mass)
- average the location of all the vertices
- take the center of the bounding box of the polygon
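Here is a minimal C sketch (illustrative, not from the notes) of building the composite matrix T(Dx,Dy) * R(theta) * T(-Dx,-Dy) with 3x3 homogeneous matrices. The rightmost factor is the one applied to points first, so the resulting single matrix rotates about the point (Dx,Dy).

    #include <math.h>

    typedef struct { double m[3][3]; } Mat3;

    /* c = a * b for 3x3 matrices. */
    static Mat3 mat3Multiply(Mat3 a, Mat3 b)
    {
        Mat3 c;
        int i, j, k;
        for (i = 0; i < 3; i++)
            for (j = 0; j < 3; j++) {
                c.m[i][j] = 0.0;
                for (k = 0; k < 3; k++)
                    c.m[i][j] += a.m[i][k] * b.m[k][j];
            }
        return c;
    }

    static Mat3 translate(double dx, double dy)
    {
        Mat3 t = {{{1, 0, dx}, {0, 1, dy}, {0, 0, 1}}};
        return t;
    }

    static Mat3 rotate(double theta)
    {
        Mat3 r = {{{cos(theta), -sin(theta), 0},
                   {sin(theta),  cos(theta), 0},
                   {0,           0,          1}}};
        return r;
    }

    /* Composite that rotates by theta about the point (dx, dy). */
    Mat3 rotateAboutPoint(double theta, double dx, double dy)
    {
        return mat3Multiply(translate(dx, dy),
                            mat3Multiply(rotate(theta), translate(-dx, -dy)));
    }

The same three-step pattern (translate to the origin, transform, translate back) gives scaling about an arbitrary point if rotate() is replaced by a scale matrix.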
Window to Viewport
Generally, users prefer to work in world coordinates, where 1 unit can be 1 micron, 1 meter, 1 kilometer, or 1 mile. These coordinates must then be translated to screen coordinates to be displayed in a rectangular region of the screen called the viewport. The objects are in world coordinates (with n dimensions); the viewport is in screen coordinates (with n = 2). We want one matrix that can be applied to all points:
- rectangular area of the world from (Xmin,Ymin) to (Xmax,Ymax) - the world-coordinate window
- rectangular area of the screen from (Umin,Vmin) to (Umax,Vmax) - the viewport
We need to rescale the world-coordinate rectangle to the screen rectangle:
1. translate the world-coordinate window to the origin of the world coordinate system
2. rescale the window to the size and aspect ratio of the viewport
3. translate the viewport to its position on the screen in the screen coordinate system
  Pscreen = M * Pworld
  M = T(Umin,Vmin) * S(deltaU/deltaX, deltaV/deltaY) * T(-Xmin,-Ymin)

Transformations in 3D
Similar to 2D transformations, which used 3x3 matrices, 3D transformations use 4x4 matrices (X, Y, Z, W).

3D Translation: point (X,Y,Z) is to be translated by amounts Dx, Dy, and Dz to location (X',Y',Z'):
  X' = Dx + X
  Y' = Dy + Y
  Z' = Dz + Z
or P' = T * P, where

  P' = | X' |     T = | 1 0 0 Dx | = T(Dx,Dy,Dz)     P = | X |
       | Y' |         | 0 1 0 Dy |                       | Y |
       | Z' |         | 0 0 1 Dz |                       | Z |
       | 1  |         | 0 0 0 1  |                       | 1 |

3D Scaling: P' = S * P, where

  S = | Sx 0  0  0 | = S(Sx,Sy,Sz)
      | 0  Sy 0  0 |
      | 0  0  Sz 0 |
      | 0  0  0  1 |

3D Rotation: for 3D rotation we need to pick an axis to rotate about. The most common choices are the X-axis, the Y-axis, and the Z-axis:

  Rz = | cos(theta) -sin(theta) 0 0 | = Rz(theta)
       | sin(theta)  cos(theta) 0 0 |
       | 0           0          1 0 |
       | 0           0          0 1 |

  Rx = | 1 0           0           0 | = Rx(theta)
       | 0 cos(theta) -sin(theta)  0 |
       | 0 sin(theta)  cos(theta)  0 |
       | 0 0           0           1 |

  Ry = |  cos(theta) 0 sin(theta) 0 | = Ry(theta)
       |  0          1 0          0 |
       | -sin(theta) 0 cos(theta) 0 |
       |  0          0 0          1 |

Complex / Compound Rotations
Now what if rotations are to occur other than around one of the cartesian axes? There are two ways of looking at this:
1. A series of rotations about cartesian axes, which can be combined by multiplying the appropriate matrices together. Again, the rightmost matrix is the operation that occurs first. Note that with this approach there may be more than one combination of rotations that yields the same end result.
2. An alternate axis of rotation can be chosen, other than the cartesian axes, and the point rotated a given amount about this axis. For any given orientation change there exists a single unique axis and rotation angle (0 <= theta <= 180 degrees) that will yield the desired rotation. This alternative approach is the basis for "quaternions", which will not likely be discussed further in this course. (Quaternions are used heavily in the WorldToolKit package, which is no longer produced, and can be useful for interpolating rotations between two oblique angles.)
Composition is handled in a similar way to the 2D case, multiplying the transformation matrices from right to left.
OpenGL Transform Operations
In OpenGL, translation, rotation, and scaling are performed using commands such as:
  glTranslate{fd}(X, Y, Z)         e.g. glTranslatef(1.0, 2.5, 3.0)
  glRotate{fd}(Angle, X, Y, Z)     e.g. glRotatef(60.0, 0.0, 0.0, 1.0)
  glScale{fd}(X, Y, Z)             e.g. glScalef(1.0, 1.5, 2.0)
What these commands do in practice is generate the corresponding transformation matrix for the operation that was requested, multiply it by whatever matrix is currently on top of the currently active matrix stack, and replace the matrix on the top of the stack with the result. If you want to calculate and save the result of a complex series of transformations, one approach is to push an identity matrix onto the stack, perform each of the operations in turn, and then save a copy of the result from the top of the stack into a global or static variable (possibly popping it off the stack if it won't be needed there immediately). (A small sketch of this approach is given below.)
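Here is a minimal sketch of that save-a-composite idea, assuming the modelview stack is the active one. glGetFloatv(GL_MODELVIEW_MATRIX, ...) copies the current top-of-stack matrix (in column-major order) into an array; the particular translation and rotation used here are just placeholders.

    #include <GL/gl.h>

    static GLfloat savedMatrix[16];   /* column-major 4x4 */

    /* Build a composite on the matrix stack and save a copy of it. */
    void buildAndSaveComposite(void)
    {
        glMatrixMode(GL_MODELVIEW);
        glPushMatrix();                        /* preserve whatever was there    */
        glLoadIdentity();
        glTranslatef(5.0f, 5.0f, 0.0f);        /* applied second to points       */
        glRotatef(45.0f, 0.0f, 0.0f, 1.0f);    /* applied first to points        */
        glGetFloatv(GL_MODELVIEW_MATRIX, savedMatrix);
        glPopMatrix();                         /* leave the stack as we found it */
    }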
Implementation: Applying Geometric Transformations

Changing Coordinate Systems
Last time we talked about transforms in terms of an object (polygon, line, point) in the same coordinate system as the world; that is, (0,0) for the object is the same as (0,0) for the world. An alternative (and usually preferable) way is for each object to have its own local coordinate system, separate from all of the other objects and from the world coordinate system. This allows each object to be created separately and then added into the world, using transformations to place the object at the appropriate point. This is especially useful when you build larger objects out of reusable smaller objects.
For example, we could have a square with 2-unit-long sides and the center of the square at its origin. The vertices of the square are: (-1,-1), (1,-1), (1,1), (-1,1). This is nice because the center of the square is where we probably want the center of rotation/scaling to be. We can rotate/scale it where it is and then translate it to where it belongs in the world (e.g. T(5,5) to move the center to (5,5) in world-space).
When we design the object we place the center of rotation/scaling where we wish it to be. For a wheel the center of rotation would be where the axle is to be attached; for a hinged gate the center would be at the hinge. Each object must then be translated/rotated/scaled (from its local coordinate system) to place it at the proper location in the world (in the world coordinate system). Say we are drawing an automobile and we want to draw the 4 tires: we can either draw each of the 4 tires independently at the appropriate location, or draw the same tire centered at its origin 4 times, each time moving it to the appropriate location in the world.

OpenGL Example: Sun and Planet
Some OpenGL code to create a solar system with a single sun and a single planet in orbit about it. We will bend the laws of physics slightly and say that the planet orbits about the center of the sun rather than having the objects in orbit about their shared center of mass. We also assume a circular orbit and that the plane of the solar system matches the Y=0 plane. We place the sun at (0,0,0) as a sphere of radius 1. The planet has a radius of 0.2 and orbits the sun at a radius of 2; it rotates about its axis once per day and revolves around the sun once each year.

    glPushMatrix();                              // copy the top matrix on the stack and push it
    glLoadIdentity();                            // reset the matrix to the identity matrix
    drawSphere(1.0);                             // user defined function to draw the sun
    glRotatef(yearPercentage, 0.0, 1.0, 0.0);    // rotation about the sun
    glTranslatef(2.0, 0.0, 0.0);                 // translate to distance away from the sun
    glRotatef(dayPercentage, 0.0, 1.0, 0.0);     // rotate about its own axis
    drawSphere(0.2);                             // user defined function to draw the planet
    glPopMatrix();                               // return stack to original condition

Note: dayPercentage is the amount the planet rotates about ITS center; yearPercentage is the amount the planet rotates about the center of the sun.

If you think about each object having its own coordinate system, then the operations on the matrix are done in the SAME order as they are called:
- initially the transformation matrix is the identity matrix
- the sun is drawn as a sphere of radius 1 at (0,0,0)
- the planet is rotated about its Y-axis by the percentage of the year that has passed, turning its coordinate system in the process
- the planet is translated 2 units along its now-rotated X-axis to its position in orbit
- the planet is rotated about its Y-axis by the percentage of the day that has passed; since the planet is still at (0,0,0) in its own coordinate system, it rotates about its center
- the planet is drawn as a sphere of radius 0.2

If you think about the single (world) coordinate system, then the operations on the matrix are done in the REVERSE order from which they are called:
- initially the transformation matrix is the identity matrix
- the sun is drawn as a sphere of radius 1 at (0,0,0)
- the planet is drawn as a sphere of radius 0.2 at (0,0,0)
- the planet is rotated about the Y-axis by the percentage of the day that has passed; since the planet is still at the origin, this rotates the planet about its center
- the planet is translated 2 units along the X-axis, moving its center to (2,0,0)
- the planet is rotated about the Y-axis by the percentage of the year that has passed; since the planet is no longer at the origin, it rotates about the origin at a radius of 2
If the matrix operations are not taken in reverse order in this view, then the year and day rotation percentages get reversed. Either way of thinking about it is equivalent, and regardless of how you think about it, that is how the OpenGL function calls must be issued.

Say you have three polygonal drawing functions available to you:
- draw a square centered at the origin with sides of length 1
- draw a circle centered at the origin with diameter of length 1
- draw an equilateral triangle with the center of its base at the origin and sides of length 1
How can I draw the following scene? (One possible approach is sketched below.)
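One possible answer, as a hedged sketch only: assuming the scene in question is the house-and-tree drawing referred to below, and assuming hypothetical drawUnitSquare(), drawUnitCircle(), and drawUnitTriangle() routines corresponding to the three functions above, each part of the scene can be placed with its own push/transform/draw/pop. The specific positions and scales are made up for illustration.

    #include <GL/gl.h>

    /* Hypothetical primitives matching the three drawing functions above. */
    void drawUnitSquare(void);
    void drawUnitCircle(void);
    void drawUnitTriangle(void);

    /* A house (square body + triangular roof) and a tree
       (thin square trunk + circular top). */
    void drawScene(void)
    {
        glPushMatrix();                     /* house body: 4 wide, 3 tall */
        glTranslatef(0.0f, 1.5f, 0.0f);
        glScalef(4.0f, 3.0f, 1.0f);
        drawUnitSquare();
        glPopMatrix();

        glPushMatrix();                     /* roof sitting on the body   */
        glTranslatef(0.0f, 3.0f, 0.0f);
        glScalef(4.0f, 2.0f, 1.0f);
        drawUnitTriangle();
        glPopMatrix();

        glPushMatrix();                     /* tree trunk to the right    */
        glTranslatef(4.0f, 1.0f, 0.0f);
        glScalef(0.5f, 2.0f, 1.0f);
        drawUnitSquare();
        glPopMatrix();

        glPushMatrix();                     /* round tree top             */
        glTranslatef(4.0f, 3.0f, 0.0f);
        glScalef(2.0f, 2.0f, 1.0f);
        drawUnitCircle();
        glPopMatrix();
    }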
Object Hierarchies
Single polygons are generally too small to be of interest; it is hard to think of a single polygon as an 'object' unless you are writing Tetris(tm). Even in a 2D world it is more convenient to think of objects which are a collection of polygons forming a recognizable shape: a car, a house, or a laser-toting mutant monster from Nebula-9. This object can then be moved/rotated/scaled as a single entity, at least at the conceptual level. This is especially true in a 3D world, where you need more than one (planar) polygon to create a 3D object.
Creating an object polygon by polygon is very slow when you want to create a very large complex object. On the other hand it does give you much more control over the object than creating it from higher-level primitives (cube, cone, sphere). The following two examples are from Silicon Graphics' OpenInventor(tm), a library which sits on top of OpenGL and allows higher-level objects to be created. The first shows a tree constructed from a cube and a cone; the second shows the same tree but constructed from triangular polygons. (Figures: a pine tree built from objects; the same pine tree built from triangular polygons.)
Note: triangular polygons are often used instead of 4-sided ones because the 3 vertices in a triangle are guaranteed to form a plane, while the 4 vertices of a 4-sided polygon may not all fall in the same plane, which may cause problems later on.
Hierarchies are typically stored as Directed Acyclic Graphs; that is, they are trees where a node can have multiple parents, as long as no cycle is generated. Hierarchies store all the information necessary to draw an object: polygon information, material information, transformation information.
Hierarchies are useful when you want to be able to manipulate an object on multiple levels:
- With an arm you may want to rotate the entire arm about the shoulder, or just the lower arm about the elbow, or just the wrist, or just a finger. If you rotate the entire arm then you want the rest of the arm parts to follow along as though they were joined like a real arm - if you rotate the arm then the elbow should come along for the ride.
- With a car the wheels should rotate, but if the car body is moving then the wheels should also be moving the same amount.
An object hierarchy gives a high degree of encapsulation. An object hierarchy allows inheritance: attributes can be set once and then used by multiple sub-objects. For example, at the top of the hierarchy the object could be set to draw only as a wireframe, or with different lighting models, or different colours, or different texture maps; this is then inherited by the sub-objects and does not have to be explicitly set on each of them. Hierarchies increase modularity. Hierarchies decrease storage space: only one copy of each object is stored in the hierarchy, with pointers to that object. Hierarchies make it easy to propagate changes: for example, if you decide to change the style of the 4 tires on your car model, the tire is only stored once and referenced four times, so only one model needs to be updated instead of four. Hierarchies are common in 'real life', so it can be easier to model hierarchical things with a hierarchy.
How can I redraw the house and tree scene to make better use of objects?

Fonts
Text is handled in one of two ways: using a bitmap for each character in the font, or using lines/polygons for each character in the font.
Bitmaps: a rectangular array of 0s and 1s, for example:
  00000000
  00011000
  00100100
  01000010
  01111110
  01000010
  01000010
  01000010
- need a set of bitmaps for each size and style of font
- 2D only
- always aligned with the screen
- dealt with specially, while manipulating the frame buffer
- faster
Polygons:
- rescalable, so that the definition can generate a 'smooth' character of any size
- can be either 2D or 3D
- can be rotated
- treated like any other line/polygon to be displayed
- slower
OpenGL provides minimal font support - only bitmapped fonts (a small GLUT-based sketch is given below). Fortunately there are free 3D fonts available, such as the Hershey library.
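A minimal sketch of bitmapped text using GLUT's built-in bitmap fonts (an assumption on my part - the notes do not give code for this): glRasterPos2f sets the position where the bitmap will be drawn, and glutBitmapCharacter draws one character at the current raster position.

    #include <GL/glut.h>
    #include <string.h>

    /* Draw a string with a GLUT bitmap font at position (x, y). */
    void drawBitmapText(float x, float y, const char *text)
    {
        size_t i;
        glRasterPos2f(x, y);   /* bitmap characters are screen-aligned, not rotated */
        for (i = 0; i < strlen(text); i++)
            glutBitmapCharacter(GLUT_BITMAP_9_BY_15, text[i]);
    }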
General 3D Concepts
Taking 2D objects and mapping them onto a 2D screen is pretty straightforward: the window is in the same plane as the 2D world. Now we are taking 3D objects and mapping them onto a 2D screen, and here is where the advantage of separating the model world from its rendered image becomes more obvious.
The easiest way to think about converting a 3D world into a 2D image is the way we do it in real life - with a camera. Let's say we have an object in the real world (e.g. the Sears Tower). The tower sits there in its 3D-ness. You can move around the tower, on the ground, on the water, in the air, and take pictures of it, converting it to a 2D image. Depending on where you put the camera, the settings on the camera, and other factors such as light levels, you get different-looking images.
In the computer we have a synthetic camera taking still or moving pictures of a synthetic environment. While this synthetic camera gives you a much wider range of options than a real camera, you will find it is VERY easy to take a picture of nothing at all.

Projections
A projection is 'formed' on the view plane (planar geometric projection). Rays (projectors) projected from the center of projection pass through each point of the models and intersect the projection plane. Since everything is synthetic, the projection plane can be in front of the models, inside the models, or behind the models. There are 2 main types: perspective projection and parallel projection.
Parallel:
- the center of projection is infinitely far from the view plane
- the projectors are parallel to each other
- need to define the direction of projection (a vector)
- 2 sub-types: orthographic (direction of projection is normal to the view plane) and oblique (direction of projection is not normal to the view plane)
- better for drafting / CAD applications
Perspective:
- the center of projection is finitely far from the view plane
- the projectors are not parallel to each other
- need to define the location of the center of projection (a point)
- classified into 1, 2, or 3-point perspective
- more visually realistic - has perspective foreshortening (objects further away appear smaller)
Which type of projection is used depends on the needs of the user - whether the goal is a mathematically correct depiction of length and angles, or a realistic-looking image of the object.

Specifying a 3D View
Danger: watch out for acronyms being tossed out! We need to know the type of projection and the clipping volume. In OpenGL there are the following functions:
  glFrustum(left, right, bottom, top, near, far);
  glOrtho(left, right, bottom, top, near, far);
The View Plane is defined by:
- a point on the plane - the View Reference Point (VRP)
- the normal to the plane, pointing towards the center of projection - the View-Plane Normal (VPN)
The view plane can be anywhere in world-space. The center of projection represents the location of the viewer's eye or the camera's lens.
We need to define a 3D Viewing Reference Coordinate system (VRC), which has axes u, v, n:
- the origin of the VRC is the VRP
- the n axis of the VRC is the VPN
- the v axis of the VRC is called the View-Up Vector (VUP)
- the u axis of the VRC is defined to form a right-hand coordinate system with n and v
Since the View Plane (n=0) is infinite (as it is a plane), we need to declare a region of that plane to be our window:
- (Umin, Vmin) to (Umax, Vmax)
- the Center of the Window (CW) on the View Plane does not have to be the VRP
- the VRP may not even be within the window
The Projection Reference Point (PRP) defines the Center of Projection and the Direction of Projection (DOP):
- the PRP is given in the VRC coordinate system (that is, its position is given relative to the VRP)
- parallel projection: the DOP is from the PRP to the CW, and all projectors are parallel
perspective projection - Center of Projection is the PRP n-point perspective Perspective projections are categorized by the number of axes the view plane cuts (ie 1-point perspective, 2-point perspective or 3-point perspective) If the plane cuts the z axis only, then lines parallel to the z axis will meet at infinity; lines parallel to the x or y axis will not meet at infinity because they are parallel to the view plane. This is 1-point perspective. If the plane cuts the x and z axes, then lines parallel to the x axis or the z axis will meet at infinity; lines parallel to the y axis will not meet at infinity because they are parallel to the view plane. This is 2-point perspective. If the plane cuts the x, y, and z axis then lines parallel to the x, y, or z axis will meet at infinity. This is 3-point perspective. The n-point perspectives can work with any combination of the x, y, z axis. View Volumes The view volume in orthographic projection: The view volume in perspective projection: The front plane's location is given by the front distance F relative to the VRP The back plane's location is given by the back distance B relative to the VRP In both cases the positive direction is in the direction of the VPN Viewing volume has 6 clipping planes (left, right, top, bottom, near (hither), far (yon)) instead of the 4 clipping lines we had in the 2D case, so clipping is a bit more complicated perspective - viewing volume is a frustum of a 4-sided pyramid parallel - viewing volume is a rectangular parallelepiped (ie a box) Parallel Examples In the red version of the Foley vanDam book see P.211-212. In the white version of the Foley vanDam book see P.250-252. Perspective Examples Here are some examples using the same house as in the book (figure 6.18 in the red version of the Foley vanDam book, figure 6.24 in the white version of the Foley vanDam book, but using Andy Johnson's solution to the former HW3 as the viewing program: The object is a house with vertices: 0, 0,30 16, 0,30 16,10,30 8,16,30 0,10,30 0, 0,54 16, 0,54 16,10,54 8,16,54 0,10,54 VRP= VPN= VUP= PRP= 0 0 0 8 0 54 0 1 1 0 7 30 U: -1 to 17 V: -2 to 16 VRP= 16 0 54 or VRP= VPN= VUP= PRP= 8 0 0 0 7 54 0 1 1 0 0 30 U: -9 to 9 V: -9 to 9 VPN= 1 0 0 VUP= 0 1 0 PRP= 12 8 16 U: -1 to 25 V: -5 to 21 VRP= VPN= VUP= PRP= 16 1 0 6 0 54 0 1 1 0 8 10 U: -22 to 22 V: -2 to 18 For other examples, in the red version of the Foley vanDambook see P.206-211. In the white version of the Foley vanDam book see P.245-250. Example with a Finite View Volume In the red version of the Foley vanDam book see P.212-213. In the white version of the Foley vanDam book see P.253. Now, how do we actually IMPLEMENT this?!?!?!? with lots and lots of matrices. Method 1: 1. 2. 3. 4. Extend 3D coordinates to homogeneous coordinates Normalize the homogeneous coordinates Go back to 3D coordinates Clip 5. 6. 7. 8. Extend 3D coordinates to homogeneous coordinates Perform projection Translate and Scale into device coordinates Go to 2D coordinates Method 2: 1. 2. 3. 4. 5. Extend 3D coordinates to homogeneous coordinates Normalize the homogeneous coordinates Clip Translate and Scale into device coordinates Go to 2D coordinates The First Two Steps are Common to Both Methods 1. Extend 3D coordinates to homogeneous coordinates This is easy we just take (x, y, z) for every point and add a W=1 (x, y, z, 1) As we did previously, we are going to use homogeneous coordinates to make it easy to compose multiple matrices. 2. 
Normalizing the homogeneous coordinates It is hard to clip against any view volume the user can come up with, so first we normalize the homogeneous coordinates so we can clip against a known (easy) view volume (the canonical view volumes). That is, we are choosing a canonical view volume and we will manipulate the world so that the parts of the world that are in the existing view volume are in the new canonical view volume. This also allows easier projection into 2D. In the parallel projection case the volume is defined by these 6 planes: x = -1 x = 1 y = -1 y = 1 z = 0 z = -1 In the perspective projection case the volume is defined by these 6 planes: x = -z x = z y = -z y = z z = -zmin z = -1 We want to create Npar and Nper, matrices to perform this normalization. For Npar the steps involved are: 1. Translate VRP to the origin 2. Rotate VRC so n-axis (VPN) is z-axis, u-axis is x-axis, and v-axis is y-axis 3. Shear so direction of projection is parallel to z-axis (only needed for oblique parallel projections that is where the direction of projection is not normal to the view plane) 4. Translate and Scale into canonical view volume Step 2.1 Translate VRP to the origin is T(-VRP) Step 2.2 Rotate VRC so n-axis (VPN) is z-axis, u-axis is x-axis, and v-axis is y-axis uses unit vectors from the 3 axis: Rz = VPN / || VPN || (so Rz is a unit length vector in the direction of the VPN) Rx = VUP x Rz / || VUP x Rz || (so Rx is a unit length vector perpendicular to Rz and Vup) Ry = Rz x Rx (so Ry is a unit length vector perpendicular to the plane formed by Rz and Rx) giving matrix R: _ | | | | - _ r1x r1y r1z 0 r2x r2y r2z 0 r3x r3y r3z 0 0 0 0 1 | | | | - with rab being the ath element of Rb VPN rotated into Z axis, U rotated into X, V rotated into Y now the PRP is in world coordinates Step 2.3 Shear so direction of projection is parallel to z-axis (only needed for oblique parallel projections - that is where the direction of projection is not normal to the view plane) makes DOP coincident with the z axis. Since the PRP is now in world coordinates we can say that: DOP is now CW - PRP _ _ _ _ _ _ |DOPx| | (umax+umin)/2 | | PRPu | |DOPy| = | (vmax+vmin)/2 | - | PRPv | |DOPz| | 0 | | PRPn | | 0 | | 1 | | 1 | - We need DOP to be: _ _ | 0 | | 0 | |DOPz'| | 1 | - The shear is performed with the following matrix: _ | | | | - _ 1 0 0 0 0 1 0 0 shx shy 1 0 0 0 0 1 | | | | - where: shx = - DOPx / DOPz shy = - DOPy / DOPz note that if this is an orthographic projection (rather than an oblique one) DOPx = DOPy=0 so shx = shy = 0 and the shear matrix becomes the identity matrix. *** note that the red book has 2 misprints in this section. Equation 6.18 should have dopx as the first element in the DOP vector, and equation 6.22 should have dopx / dopz. The white book has the correct versions of the formulas Step 2.4 Translate and Scale the sheared volume into canonical view volume: Tpar = T( -(umax+umin)/2, -(vmax+vmin)/2, -F) Spar = S( 2/(umax-umin), 2/(vmax-vmin), 1/(F-B)) where F and B are the front and back distances for the view volume. So, finally we have the following procedure for computing Npar: Npar = Spar * TPar * SHpar * R * T(-VRP) For Nper the steps involved are: 1. 2. 3. 4. 5. 
Translate VRP to the origin Rotate VRC so n-axis (VPN) is z-axis, u-axis is x-axis, and v-axis is y-axis Translate so that the center of projection(PRP) is at the origin Shear so the center line of the view volume is the z-axis Scale into canonical view volume Step 2.1 Translate VRP to the origin is the same as step 2.1 for Npar: T(-VRP) Step 2.2 Rotate VRC so n-axis (VPN) is z-axis, u-axis is x-axis, and v-axis is y-axis is the same as step 2.2 for Npar: Step 2.3 Translate PRP to the origin which is T(-PRP) Step 2.4 is the same as step 2.3 for Npar. The PRP is now at the origin but the CW may not be on the Z axis. If it isn't then we need to shear to put the CW onto the Z axis. Step 2.5 scales into the canonical view volume up until step 2.3, the VRP was at the origin, afterwards it may not be. The new location of the VRP is: _ _ | 0 | VRP' = SHpar * T(-PRP) * | 0 | | 0 | | 1 | - - so Sper = ( 2VRP'z / [(umax-umin)(VRP'z+B)] , 2VRP'z / [(vmax-vmin)(VRP'z+B)], -1 / (VRP'z+B)) So, finally we have the following procedure for computing Nper: Nper = Sper * SHpar * T(-PRP) * R * T(-VRP) You should note that in both these cases, with Npar and Nper, the matrix depends only on the camera parameters, so if the camera parameters do not change, these matrices do not need to be recomputed. Conversely if there is constant change in the camera, these matrices will need to be constantly recreated. The Remaining Steps Differ for the Two Methods Now, here is where the 2 methods diverge with method one going back to 3D coordinates to clip while method 2 remains in homogeneous coordinates. The choice is based on whether W is ensured to be > 0. If so method 1 can be used, otherwise method 2 must be used. With what we have discussed in this class so far W will be > 0, and W should remain 1 thorought the normaliztion step. You get W < 0 when you do fancy stuff like bsplines. Method 2 also has the advantage of treating parallel and perspective cases with the same clipping algorithms, and is generally supported in modern hardware. We will deal with Method 1 first. 3. Divide by W to go back to 3D coordinates This is easy we just take (x, y, z, W) and divide all the terms by W to get (x/W, y/W, z/W, 1) and then we ignore the 1 to go back to 3D coordinates. We probably do not even need to divide by W as it should still be 1. 4. Clip in 3D against the appropriate view volume At this point we want to keep everything that is inside the canonical view volume, and clip away everything that is outside the canonical view volume. We can take the Cohen-Sutherland algorithm we used in 2D and extend it to 3D, except now there are 6 bits instead of four. For the parallel case the 6 bits are:  point is above view volume: y > 1      point is below view volume: y < -1 point is right of view volume: x > 1 point is left view volume: x < -1 point is behind view volume: Z < -1 point is in front of view volume: z > 0 For the perspective case the 6 bits are:       point is above view volume: y > -z point is below view volume: y < z point is right of view volume: x > -z point is left view volume: x < z point is behind view volume: z < -1 point is in front of view volume: z > zmin P273 in the white book shows the appropriate equations for doing this. 5. Back to homogeneous coordinates again This is easy we just take (x, y, z) and add a W=1 (x, y, z, 1) 6.Perform Projection - Collapse 3D onto 2D For the parallel case the projection plane is normal to the z-axis at z=0. 
For the perspective case the projection plane is normal to the z axis at z = d. In this case we set d = -1.

In the parallel case, since there is no forced perspective, Xp = X and Yp = Y, and Z is set to 0 to project onto the projection plane. Points that are further away in Z still retain the same X and Y values - those values do not change with distance. For the parallel case, Mort is:

_         _
| 1 0 0 0 |
| 0 1 0 0 |
| 0 0 0 0 |
| 0 0 0 1 |
-         -

Multiplying the Mort matrix and the vector (X, Y, Z, 1) holding a given point gives the resulting vector (X, Y, 0, 1).

In the perspective case, where there is forced perspective, the projected X and Y values do depend on the Z value. Objects that are further away should appear smaller than similar objects that are closer. For the perspective case, Mper is:

_           _
| 1 0  0  0 |
| 0 1  0  0 |
| 0 0  1  0 |
| 0 0 -1  0 |
-           -

Multiplying the Mper matrix and the vector (X, Y, Z, 1) holding a given point gives the resulting vector (X, Y, Z, -Z).

7. Translate and Scale into device coordinates

All of the points that were in the original view volume are still within the following range:

-1 <= X <= 1
-1 <= Y <= 1
-1 <= Z <= 0

In the parallel case all of the points are at their correct X and Y locations on the Z=0 plane. In the perspective case the points are scattered in the space, but each has a W value of -Z which will map each (X, Y, Z) point to the appropriate (X', Y') place on the projection plane.

Now we will map these points into the viewport by moving to device coordinates. This involves the following steps:

1. translate the view volume so its corner (-1, -1, -1) is at the origin
2. scale to match the size of the 3D viewport (which keeps the corner at the origin)
3. translate the origin to the lower left hand corner of the viewport

Mvv3dv = T(Xviewmin, Yviewmin, Zviewmin) * Svv3dv * T(1, 1, 1)

where Svv3dv = S( (Xviewmax-Xviewmin)/2, (Yviewmax-Yviewmin)/2, (Zviewmax-Zviewmin)/1 )

You should note here that these parameters depend only on the size and shape of the viewport - they are independent of the camera settings, so if the viewport is not resized, these matrices remain constant and reusable.

*** note that the red book has another misprint in this section. The discussion of the translation matrix before equation 6.44 has a spurious not-equal-to sign.

8. Divide by W to go from homogeneous to 2D coordinates

Again we just take (x, y, z, W) and divide all the terms by W to get (x/W, y/W, z/W, 1), and then we ignore the 1 to go back to 3D coordinates.

In the parallel case dropping the W takes us back to 3D coordinates with Z=0, which really means we now have 2D coordinates on the projection plane.

In the perspective projection case, dividing by W does affect the transformation of the points. Dividing by W (which is -Z) takes us back to the 3D coordinates of (-X/Z, -Y/Z, -1, 1). Dropping the W takes us to the 3D coordinates of (-X/Z, -Y/Z, -1), which positions all of the points onto the Z=-1 projection plane, which is what we wanted. Dropping the Z coordinate gives us the 2D location on the Z=-1 plane.
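To make steps 6 and 8 concrete, here is a small sketch (the helper names are ours, not from the book or OpenGL) that applies Mper to a homogeneous point and then performs the divide by W:

#include <stdio.h>

/* A homogeneous point (x, y, z, w). */
typedef struct { double x, y, z, w; } Point4;

/* Multiply a point by the perspective projection matrix Mper
   (projection plane at z = -1): x, y, z are unchanged, w becomes -z. */
Point4 applyMper(Point4 p)
{
    Point4 r = { p.x, p.y, p.z, -p.z };
    return r;
}

/* Step 8: divide by W to drop back to ordinary coordinates. */
Point4 homogeneousDivide(Point4 p)
{
    Point4 r = { p.x / p.w, p.y / p.w, p.z / p.w, 1.0 };
    return r;
}

int main(void)
{
    /* A point inside the canonical perspective view volume. */
    Point4 p = { 0.25, 0.10, -0.50, 1.0 };

    Point4 projected = homogeneousDivide(applyMper(p));

    /* Prints (0.5, 0.2, -1): the point lands on the z = -1 plane,
       and x and y have been scaled by -1/Z, i.e. perspective
       foreshortening - points twice as far away shrink twice as much. */
    printf("(%g, %g, %g)\n", projected.x, projected.y, projected.z);
    return 0;
}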
The Second Method

BUT ... there is still the second method, where we clip in homogeneous coordinates to be more general.

The normalization step (step 2) is slightly different here, as both the parallel and perspective projections need to be normalized into the canonical view volume of the parallel projection. Npar above does this for the parallel case. Nper' for the perspective case is M * Nper, where Nper is the normalization given in step 2. M performs the same perspective-to-parallel conversion that method one left until step 8, when we divided by W before converting to 2D coordinates.

M =
    1   0        0                0
    0   1        0                0
    0   0   1/(1+Zmin)   -Zmin/(1+Zmin)
    0   0       -1                0

Now in both the parallel and perspective cases the clipping routine is the same. Again we have 6 planes to clip against:

X = -W    X = W
Y = -W    Y = W
Z = -W    Z = 0

but since W can be positive or negative, the region defined by those planes is different depending on W.

2nd method step 4:

Then we know all of the points that were in the original view volume are now within the following range:

-1 <= X <= 1
-1 <= Y <= 1
-1 <= Z <= 0

Now we need to map these points into the viewport by moving to device coordinates. This involves the following steps:

1. translate the view volume so its corner (-1, -1, -1) is at the origin
2. scale to match the size of the 3D viewport (which keeps the corner at the origin)
3. translate the origin to the lower left hand corner of the viewport

Mvv3dv = T(Xviewmin, Yviewmin, Zviewmin) * Svv3dv * T(1, 1, 1)

where Svv3dv = S( (Xviewmax-Xviewmin)/2, (Yviewmax-Yviewmin)/2, (Zviewmax-Zviewmin)/1 )

The new x and new y give us the information we need for the 2D version, but we will also find the new z to be useful for determining occlusion - which we will discuss in a few weeks as z-buffer techniques.

2nd method step 5:

Then we divide by W to go from homogeneous to 2D coordinates. In the perspective projection case, dividing by W does affect the transformation of the points.

Lighting and Shading

What the human eye ( or virtual camera ) sees is a result of light coming off of an object or other light source and striking receptors in the eye. In order to understand and model this process, it is necessary to understand different light sources and the ways that different materials reflect those light sources.

Trying to recreate reality is difficult. Lighting calculations can take a VERY long time. The techniques described here are heuristics which produce appropriate results, but they do not work the way reality works - because that would take too long to compute, at least for interactive graphics.

Instead of just specifying a single colour for a polygon, we will instead specify the properties of the material that the polygon is supposed to be made out of ( i.e. how the material responds to different kinds of light ), and the properties of the light or lights shining onto that material.

Illumination Models

No Lighting ( Emissive Lighting )

- There are no lights in the scene
- Each polygon is self-luminous (it lights itself, but does not give off light)
  o This is also known as "emissive" lighting in some systems.
  o ( Truly "no" lighting would be a totally black screen and not very interesting. )
  o Emissive lighting can have some interesting effects, such as car headlights or neon signs.
- Each polygon has its own colour which is constant over its surface
- That colour is not affected by anything else in the world
- That colour is not affected by the position or orientation of the polygon in the world
- This is very fast, but not very realistic
- position of viewer is not important
- No 3-D information provided.
  o Generally used as a special effect in combination with other lighting, or for very simple graphics.

I = Ki

I: intensity
Ki: object's intrinsic intensity, 0.0 - 1.0 for each of R, G, and B

This scene from battalion has no lighting ( ?
) Ambient        Non-directional light source Simulates light that has been reflected so many times from so many surfaces it appears to come equally from all directions intensity is constant over polygon's surface intensity is not affected by anything else in the world intensity is not affected by the position or orientation of the polygon in the world position of viewer is not important No 3-D information provided. o Generally used as a base lighting in combination with other lights. I = IaKa I: intensity Ia: intensity of Ambient light Ka: object's ambient reflection coefficient, 0.0 - 1.0 for each of R, G, and B Types of Light Sources Which Can be Used to Light a Scene    Directional light - produced by a light source an infinite distance from the scene., All of the light rays emanating from the light strike the polygons in the scene from a single parallel direction, and with equal intensity everywhere. o Sunlight is for all intents and purposes a directional light. o Characterized by color, intensity, and direction. Point light - a light that gives off equal amounts of light in all directions. Polygons, and parts of polygons which are closer to the light appear brighter than those that are further away. The angle at which light from a point light source hits an object is a function of the positions of both the object and the light source. The intensity of the light source hitting the object is a function of the distance between them. Different graphics programs may ( or may not ) allow the programmer to adjust the falloff function in different ways. o A bare bulb hanging from a cord is essentially a point light. o Characterized by color, intensity, location, and falloff function. Spotlight - light that radiates light in a cone with more light in the center of the cone, gradually tapering off towards the sides of the cone. The simplest spotlight would just be a point light that is restricted to a certain angle around its primary axis of direction Think of something like a flashlight or car headlight as opposed to a bare bulb hanging on a wire. More advanced spotlights have a falloff function making the light more intense at the center of the cone and softer at the edges. o Characterized as a point light, an axis of direction, a radius about that axis, and possibly a radial falloff function. Here is the same object (Christina Vasilakis' SoftImage Owl) under different lighting conditions: bounding boxes of the components of the owl self-luminous owl directional light from the front of the owl point light slightly in front of the owl spotlight slightly in front of the owl aimed at the owl Diffuse Reflection (Lambertian Reflection) Using a point light:      comes from a specific direction reflects off of dull surfaces light reflected with equal intensity in all directions brightness depends on theta - angle between surface normal (N) and the direction to the light source (L) position of viewer is not important I = Ip Kd cos(theta) or I = Ip Kd(N' * L') I: intensity Ip: intensity of point light Kd: object's diffuse reflection reflection coefficient, 0.0 - 1.0 for each of R, G, and B N': normalized surface normal L': normalized direction to light source *: represents the dot product of the two vectors Using a directional light:   theta is constant L' is constant Directional lights are faster than point lights because L' does not need to be recomputed for each polygon. It is rare that we have an object in the real world illuminated only by a single light. 
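As a small numeric sketch of the ambient and diffuse terms defined above (one colour channel only; the function and variable names are ours, not OpenGL's):

#include <math.h>
#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

static Vec3 normalize(Vec3 v)
{
    double len = sqrt(dot(v, v));
    Vec3 r = { v.x/len, v.y/len, v.z/len };
    return r;
}

/* I = Ia*Ka + Ip*Kd*(N' . L'), for one colour channel.
   N is the surface normal, L points from the surface toward the light. */
double ambientPlusDiffuse(double Ia, double Ka, double Ip, double Kd,
                          Vec3 N, Vec3 L)
{
    double nDotL = dot(normalize(N), normalize(L));
    if (nDotL < 0.0) nDotL = 0.0;        /* light is behind the surface */
    return Ia * Ka + Ip * Kd * nDotL;
}

int main(void)
{
    Vec3 N = { 0, 0, 1 };                /* polygon facing +z               */
    Vec3 L = { 0, 1, 1 };                /* light up and in front, 45 degrees */
    /* cos(45 degrees) is about 0.707, so I is about 0.1*0.3 + 1.0*0.8*0.707,
       roughly 0.60                                                          */
    printf("I = %f\n", ambientPlusDiffuse(0.1, 0.3, 1.0, 0.8, N, L));
    return 0;
}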
Even on a dark night there is a some ambient light. To make sure all sides of an object get at least a little light we add some ambient light to the point or directional light: I = Ia Ka + Ip Kd(N' * L') Currently there is no distinction made between an object close to a point light and an object far away from that light. Only the angle has been used so far. It helps to introduce a term based on distance from the light. So we add in a light source attenuation factor: Fatt. I = Ia Ka + Fatt Ip Kd(N' * L') Coming up with an appropriate value for Fatt is rather tricky. It can take a fair amount of time to balance all the various types of lights in a scene to give the desired effect (just as it takes a fair amount of time in real life to set up proper lighting) Specular Reflection  reflection off of shiny surfaces - you see a highlight    shiny metal or plastic has high specular component chalk or carpet has very low specular component position of the viewer IS important in specular reflection I = Ip cos^n(a) W(theta) I: intensity Ip: intensity of point light n: specular-reflection exponent (higher is sharper falloff) W: gives specular component of non-specular materials So if we put all the lighting models depending on light together we add up their various components to get: I = Ia Ka + Ip Kd(N' * L') + Ip cos^n(a) W(theta) As shown in the following figures: + + = Ambient Specular + = Diffuse Result + In OpenGL a polygon can have the following material properties:       ambientColor (R, G, B) - Contribution due to ambient light diffuseColor (R, G, B) - Contribution due to directed, point, and spot lights specularColor (R, G, B) - Contribution due to specularity, usually white emissiveColor (R, G, B) - For objects that glow of their own accord, independent of light transparency 0.0 - 1.0 - Alpha channel, for combining with pixels behind the object. shininess 0.0 - 1.0 - Controls the contribution of the specular component These properties describe how light is reflected off the surface of the polygon. a polygon with diffuse color (1, 0, 0) reflects all of the red light it is hit with, and absorbs all of the blue and green. If this red polygon is hit with a white light it will appear red. If it with a blue light, or a green light, or an aqua light it will appear black (as those lights have no red component.) If it is hit with a yellow light or a purple light it will appear red (as the polygon will reflect the red component of the light.) The following pictures will help to illustrate this: ball light image white red red white red green purple blue yellow aqua One important thing to note about all of the above equations is that each object is dealt with separately. That is, one object does not block light from reaching another object. The creation of realistic shadows is quite expensive if done right, and is a currently active area of research in computer graphics. ( Consider, for example, a plant with many leaves, each of which could cast shadows on other leaves or on the other nearby objects, and then further consider the leaves fluttering in the breeze and lit by diffuse or unusual light sources. ) Multiple Lights With multiple lights the affect of all the lights are additive. Shading Models We often use polygons to simulate curved surfaces. If these cases we want the colours of the polygons to flow smoothly into each other. 
- flat shading
- Gouraud shading ( color interpolation shading )
- Phong shading ( normal interpolation shading )

Flat Shading

- each entire polygon is drawn with the same colour
- need to know one normal for the entire polygon
- fast
- lighting equation used once per polygon

Given a single normal to the plane, the lighting equations and the material properties are used to generate a single colour. The polygon is filled with that colour.

Here is another of the OpenGL samples with a flat shaded scene:

Gouraud Shading

- colours are interpolated across the polygon
- need to know a normal for each vertex of the polygon
- slower than flat shading
- lighting equation used at each vertex

Given a normal at each vertex of the polygon, the colour at each vertex is determined from the lighting equations and the material properties. Linear interpolation of the colour values at each vertex is used to generate colour values for each pixel on the edges. Linear interpolation across each scan line is then used to fill in the colour of the polygon.

Here is another of the OpenGL samples with a smooth shaded scene:

Phong Shading

- normals are interpolated across the polygon
- need to know a normal for each vertex of the polygon
- better at dealing with highlights than Gouraud shading
- slower than Gouraud shading
- lighting equation used at each pixel

Where Gouraud shading uses normals at the vertices and then interpolates the resulting colours across the polygon, Phong shading goes further and interpolates the normals. Linear interpolation of the normal values at each vertex is used to generate normal values for the pixels on the edges. Linear interpolation across each scan line is then used to generate normals at each pixel across the scan line.

Whether we are interpolating normals or colours, the procedure is the same: to find the intensity Ip, we need to know the intensities Ia and Ib. To find the intensity Ia we need to know the intensities I1 and I2. To find the intensity Ib we need to know the intensities I1 and I3.

Ia = (Ys - Y2) / (Y1 - Y2) * I1 + (Y1 - Ys) / (Y1 - Y2) * I2
Ib = (Ys - Y3) / (Y1 - Y3) * I1 + (Y1 - Ys) / (Y1 - Y3) * I3
Ip = (Xb - Xp) / (Xb - Xa) * Ia + (Xp - Xa) / (Xb - Xa) * Ib

Fog

We talked earlier about how atmospheric effects give us a sense of depth, as particles in the air make objects that are further away look less distinct than near objects. Fog, or atmospheric attenuation, allows us to simulate this effect.

Fog is implemented by blending the calculated color of a pixel with a given background color ( usually grey or black ), in a mixing ratio that is proportional in some way to the distance between the camera and the object. Objects that are farther away get a greater fraction of the background color relative to the object's color, and hence "fade away" into the background. In this sense, fog can ( sort of ) be thought of as a shading effect.

Fog is typically given a starting distance, an ending distance, and a colour. The fog begins at the starting distance and all the colours slowly transition to the fog colour towards the ending distance. At the ending distance all colours are the fog colour.
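A sketch of that blending for the linear fog case (our own function names; the graphics library does the equivalent internally when fog is enabled):

/* Linear fog: returns the fraction of the object's own colour that remains
   at distance z from the camera (1.0 = no fog, 0.0 = pure fog colour).    */
double fogFactor(double z, double fogStart, double fogEnd)
{
    if (z <= fogStart) return 1.0;      /* closer than the fog: untouched  */
    if (z >= fogEnd)   return 0.0;      /* past the end: all fog colour    */
    return (fogEnd - z) / (fogEnd - fogStart);
}

/* Blend one colour channel with the fog colour using that factor. */
double fogBlend(double objectColour, double fogColour, double f)
{
    return f * objectColour + (1.0 - f) * fogColour;
}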
Here are those o-so-everpresent computer graphics teapots from the OpenGL samples: To use fog in OpenGL you need to tell the computer a few things:     color of the fog as R, G, and B values function for how to map the intermediate distances (linear, exponential, exponential squared where the fog begins and where the fog ends if using linear mapping density of the fog if using one of the two exponentials mappings Here is a scene from battalion without fog. The monster sees a very sharp edge to the world Here is the same scene with fog. The monster sees a much softer horizon as objects further away tend towards the black colour of the sky AntiAliasing Lines and the edges of polygons still look jagged at this point. This is especially noticable when moving through a static scene looking at sharp edges. This is known as aliasing, and is caused by the conversion from the mathematical edge to a discrete set of pixels. We saw near the beginning of the course how to scan convert a line into the frame buffer, but at that point we only dealth with placing the pixel or not placing the pixel. Now we will deal with coverage. The mathematical line will likely not exactly cover pixel boundaries - some pixels will be mostly covered by the line (or edge), and others only slightly. Instead of making a yes/no decision we can assign a value to this coverage (from say 0 to 1) for each pixel and then use these values to blend the colour of the line (or edge) with the existing contents of the frame buffer. In OpenGL you give hints setting the hints for GL_POINT_SMOOTH_HINT, GL_LINE_SMOOTH_HINT, GL_POLYGON_SMOOTH_HINT to tell OpenGL to be GL_FASTEST or GL_NICEST to try and smooth things out using the alpha (transparency) values. You also need to enable or disable that smoothing glEnable(GL_POINT_SMOOTH); glEnable(GL_LINE_SMOOTH); glEnable(GL_POLYGON_SMOOTH); In the beginning of the semester we dealt with simple wireframe drawings of the models. The main reason for this is so that we did not have to deal with hidden surface removal. Now we want to deal with more sophisticated images so we need to deal with which parts of the model obscure other parts of the model. The following sets of images show a wireframe version, a wireframe version with hidden line removal, and a solid polygonal representation of the same object. If we do not have a way of determining which surfaces are visible then which surfaces are visible depends on the order in which they are drawn with surfaces being drawn later appearing in front of surfaces drawn previously as shown below: Here the fins on the back are visible because they are drawn after the body, the shadow is drawn on top of the monster because it is drawn last. Both legs are visible and the eyes just look really weird. General Principles We do not want to draw surfaces that are hidden. If we can quickly compute which surfaces are hidden, we can bypass them and draw only the surfaces that are visible. For example, if we have a solid 6 sided cube, at most 3 of the 6 sides are visible at any one time, so at least 3 of the sides do not even need to be drawn because they are the back sides. We also want to avoid having to draw the polygons in a particular order. We would like to tell the graphics routines to draw all the polygons in whatever order we choose and let the graphics routines determine which polygons are in front of which other polygons. 
With the same cube as above we do not want to have to compute for ourselves which order to draw the visible faces, and then tell the graphics routines to draw them in that order. The idea is to speed up the drawing, and give the programmer an easier time, by doing some computation before drawing. Unfortunately these computations can take a lot of time, so special purpose hardware is often used to speed up the process. Techniques Two types of approaches:   object space (object precision in our text) image space (image precision in our text) object space algorithms do their work on the objects themselves before they are converted to pixels in the frame buffer. The resolution of the display device is irrelevant here as this calculation is done at the mathematical level of the objects for each object a in the scene determine which parts of object a are visible (involves comparing the polyons in object a to other polygons in a and to polygons in every other object in the scene) image space algorithms do their work as the objects are being converted to pixels in the frame buffer. The resolution of the display device is important here as this is done on a pixel by pixel basis. for each pixel in the frame buffer determine which polygon is closest to the viewer at that pixel location colour the pixel with the colour of that polygon at that location As in our discussion of vector vs raster graphics earlier in the term the mathematical (object space) algorithms tended to be used with the vector hardware whereas the pixel based (image space) algorithms tended to be used with the raster hardware. When we talked about 3D transformations we reached a point near the end when we converted the 3D (or 4D with homogeneous coordinates) to 2D by ignoring the Z values. Now we will use those Z values to determine which parts of which polygons (or lines) are in front of which parts of other polygons. There are different levels of checking that can be done.    Object Polygon part of a Polygon There are also times when we may not want to cull out polygons that are behind other polygons. If the frontmost polygon is transparent then we want to be able to 'see through' it to the polygons that are behind it as shown below: Which objects are transparent in the above scene? Coherence We used the idea of coherence before in our line drawing algorithm. We want to exploit 'local similarity' to reduce the amount of computation needed (this is how compression algorithms work.)         Face - properties (such as colour, lighting) vary smoothly across a face (or polygon) Depth - adjacent areas on a surface have similar depths Frame - images at successive time intervals tend to be similar Scan Line - adjacent scan lines tend to have similar spans of objects Area - adjacent pixels tend to be covered by the same face Object - if objects are separate from each other (ie they do not overlap) then we only need to compare polygons of the same object, and not one object to another Edge - edges only disappear when they go behind another edge or face Implied Edge - line of intersection of 2 faces can be determined by the endpoints of the intersection Extents Rather than dealing with a complex object, it is often easier to deal with a simpler version of the object. in 2: a bounding box in 3d: a bounding volume (though we still call it a bounding box) We convert a complex object into a simpler outline, generally in the shape of a box and then we can work with the boxes. 
Every part of the object is guaranteed to fall within the bounding box. Checks can then be made on the bounding box to make quick decisions (ie does a ray pass through the box.) For more detail, checks would then be made on the object in the box. There are many ways to define the bounding box. The simplest way is to take the minimum and maximum X, Y, and Z values to create a box. You can also have bounding boxes that rotate with the object, bounding spheres, bounding cylinders, etc. Back-Face Culling Back-face culling (an object space algorithm) works on 'solid' objects which you are looking at from the outside. That is, the polygons of the surface of the object completely enclose the object. Every planar polygon has a surface normal, that is, a vector that is normal to the surface of the polygon. Actually every planar polygon has two normals. Given that this polygon is part of a 'solid' object we are interested in the normal that points OUT, rather than the normal that points in. OpenGL specifies that all polygons be drawn such that the vertices are given in counterclockwise order as you look at the visible side of polygon in order to generate the 'correct' normal. Any polygons whose normal points away from the viewer is a 'back-facing' polygon and does not need to be further investigated. To find back facing polygons the dot product of the surface normal of each polygon is taken with a vector from the center of projection to any point on the polygon. The dot product is then used to determine what direction the polygon is facing:    greater than 0 : back facing equal to 0 : polygon viewed on edge less than 0 : front facing Back-face culling can very quickly remove unnecessary polygons. Unfortunately there are often times when back-face culling can not be used. For example if you wish to make an open-topped box - the inside and the ouside of the box both need to be visible, so either two sets of polygons must be generated, one set facing out and another facing in, or back-face culling must be turned off to draw that object. in OpenGL back-face culling is turned on using: glCullFace(GL_BACK); glEnable(GL_CULL_FACE); Depth Buffer Early on we talked about the frame buffer which holds the colour for each pixel to be displayed. This buffer could contain a variable number of bytes for each pixel depending on whether it was a greyscale, RGB, or colour-indexed frame buffer. All of the elements of the frame buffer are initially set to be the background colour. As lines and polygons are drawn the colour is set to be the colour of the line or polygon at that point. T We now introduce another buffer which is the same size as the frame buffer but contains depth information instead of colour information. z-buffering (an image-space algorithm) is another buffer which maintains the depth for each pixel. All of the elements of the z-buffer are initially set to be 'very far away.' Whenever a pixel colour is to be changed the depth of this new colour is compared to the current depth in the zbuffer. If this colour is 'closer' than the previous colour the pixel is given the new colour, and the z-buffer entry for that pixrl is updated as well. Otherwise the pixel retains the old colour, and the z-buffer retains its old value. 
Here is a pseudo-code algorithm for each polygon for each pixel p in the polygon's projection { pz = polygon's normalized z-value at (x, y); //z ranges from -1 to 0 if (pz > zBuffer[x, y]) // closer to the camera { zBuffer[x, y] = pz; framebuffer[x, y] = colour of pixel p } } This is very nice since the order of drawing polygons does not matter, the algorithm will always display the colour of the closest point. The biggest problem with the z-buffer is its finite precision, which is why it is important to set the near and far clipping planes to be as close together as possible to increase the resolution of the z-buffer within that range. Otherwise even though one polygon may mathematically be 'in front' of another that difference may disappear due to roundoff error. These days with memory getting cheaper it is easy to implement a software z-buffer and hardware z-buffering is becoming more common. In OpenGL the z-buffer and frame buffer are cleared using: glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT); In OpenGL z-buffering is turned on using: glEnable(GL_DEPTH_TEST); The depth-buffer is especially useful when it is difficult to order the polygons in the scene based on their depth, such as in the case shown below: Warnock's Algorithm Warnock's algorithm is a recursive area-subdivision algorithm. Warnock's algorithm looks at an area of the image. If is is easy to determine which polygons are visible in the area they are drawn, else the area is subdivided into smaller parts and the algorithm recurses. Eventually an area will be represented by a single non-intersecting polygon. At each iteration the area of interest is subdivided into four equal areas. Each polygon is compared to each area and is put into one of four bins:     Disjoint polygons are completely outside the area Intersecting polygons intersect the area Surrounding polygons completely contain the area Contained polygons are completely contained in the area For a given area: 1. If all polygons are disjoint then the background colour fills the area 2. If there is a single contained polygon or intersecting polygon then the background colour is used to fill the area, then the part of the polygon contained in the area is filled with the colour of that polygon 3. If there is a single surrounding polygon and no intersecting or contained polygons then the area is filled with the colour of the surrounding polygon 4. If there is a surrounding polygon in front of any other surrounding, intersecting, or contained polygons then the area is filled with the colour of the front surrounding polygon Otherwise break the area into 4 equal parts and recurse. At worst log base 2 of the max(screen width, screen height) recursive steps will be needed. At that point the area being looked at is only a single pixel which can't be divided further. At that point the distance to each polygon intersecting,contained in, or surrounding the area is computed at the center of the polygon to determine the closest polygon and its colour, Below is an example scanned out of the text, where the numbers refer to the numbered steps listed above: Here is a place where the use of bounding boxes can speed up the process. Given that the bounding box is always at least as large as the polygon, or object checks for contained and disjoint polygons can be made using the bounding boxes, while checks for interstecting and surrounding can not. Depth Sort Algorithm, a.k.a. 
The Painter's Algorithm The idea here is to go back to front drawing all the objects into the frame buffer with nearer objects being drawn over top of objects that are further away. Simple algorithm: 1. Sort all polygons based on their farthest z coordinate 2. Resolve ambiguities 3. draw the polygons in order from back to front This algorithm would be very simple if the z coordinates of the polygons were guaranteed never to overlap. Unfortunately that is usually not the case, which means that step 2 can be somewhat complex. Any polygons whose z extents overlap must be tested against each other. We start with the furthest polygon and call it P. Polygon P must be compared with every polygon Q whose z extent overlaps P's z extent. 5 comparisons are made. If any comparison is true then P can be written before Q. If at least one comparison is true for each of the Qs then P is drawn and the next polygon from the back is chosen as the new P. 1. 2. 3. 4. 5. do P and Q's x-extents not overlap. do P and Q's y-extents not overlap. is P entirely on the opposite side of Q's plane from the viewport. is Q entirely on the same side of P's plane as the viewport. do the projections of P and Q onto the (x,y) plane not overlap. If all 5 tests fail we quickly check to see if switching P and Q will work. Tests 1, 2, and 5 do not differentiate between P and Q but 3 and 4 do. So we rewrite 3 and 4 3 - is Q entirely on the opposite side of P's plane from the viewport. 4 - is P entirely on the same side of Q's plane as the viewport. If either of these two tests succeed then Q and P are swapped and the new P (formerly Q) is tested against all the polygons whose z extent overlaps it's z extent. If these two tests still do not work then either P or Q is split into 2 polygons using the plane of the other. These 2 smaller polygons are then put into their proper places in the sorted list and the algorithm continues. beware of the dreaded infinite loop. BSP Trees Another popular way of dealing with these problems (especially in games) are Binary Space Partition trees. It is a depth sort algorithm with a large amount of preprocessing to create a data structure to hold the polygons. First generate a 3D BSP tree for all of the polygons in the scene Then display the polygons according to their order in the scene 1. polygons behind the current node 2. the current node 3. polygons in front of the current node Each node in the tree is a polygon. Extending that polygon generates a plane. That plane cuts space into 2 parts. We use the front-facing normal of the polygon to define the half of the space that is 'in front' of the polygon. Each node has two children: the front children (the polygons in front of this node) and the back children (the polgons behind this noce) In doing this we may need to split some polygons into two. Then when we are drawing the polygons we first see if the viewpoint is in front of or behind the root node. Based on this we know which child to deal with first - we first draw the subtree that is further from the viewpoint, then the root node, then the subtree that is in front of the root node, recursively, until we have drawn all the polygons. Compared to depth sort it takes more time to setup but less time to iterate through since there are no special cases. If the position or orientation of the polygons change then parts of the tree will need to be recomputed. here is an example originally by Nicolas Holzschuch showing the construction and use of a BSP tree for 6 polygons. 
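Here is a minimal sketch in C of the back-to-front BSP traversal just described. The BSPNode structure and the commented-out drawPolygon call are illustrative only; classifying the eye against a node uses the plane equation ax + by + cz + d = 0 of that node's polygon.

#include <stddef.h>

typedef struct BSPNode {
    struct BSPNode *back;      /* polygons behind this node's plane        */
    struct BSPNode *front;     /* polygons in front of this node's plane   */
    double a, b, c, d;         /* plane coefficients: ax + by + cz + d = 0 */
    /* ... polygon vertex / colour data ...                                */
} BSPNode;

/* Positive if the eye is on the front side of the node's plane. */
static double eyeSide(const BSPNode *n, double ex, double ey, double ez)
{
    return n->a * ex + n->b * ey + n->c * ez + n->d;
}

/* Painter's-style traversal: draw the far subtree, then this polygon,
   then the near subtree, so nearer polygons overwrite farther ones.   */
void drawBSP(const BSPNode *n, double ex, double ey, double ez)
{
    if (n == NULL) return;

    if (eyeSide(n, ex, ey, ez) > 0.0) {  /* eye in front of the plane */
        drawBSP(n->back,  ex, ey, ez);
        /* drawPolygon(n);  -- hypothetical drawing routine */
        drawBSP(n->front, ex, ey, ez);
    } else {                              /* eye behind the plane      */
        drawBSP(n->front, ex, ey, ez);
        /* drawPolygon(n); */
        drawBSP(n->back,  ex, ey, ez);
    }
}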
Scan Line Algorithm

This is an extension of the algorithm we dealt with earlier to fill polygons one scan line at a time. This time there will be multiple polygons being drawn simultaneously.

Again we create a global edge table for all non-horizontal edges, sorted on each edge's smaller y coordinate. Each entry in the table contains:

- the x coordinate corresponding to the smallest y coordinate
- the y coordinate of the other end of the edge
- deltaX = 1/m
- and a new entry: a way to identify which polygon this edge belongs to

In the 2D scan line algorithm we had a simple 0/1 variable to deal with being in or out of the polygon. Since there are multiple polygons here, we also have a Polygon Table. The Polygon Table contains:

- the coefficients of the equation of the plane of the polygon
- shading / colour information for this polygon
- an in/out flag, initialized to false

Again the edges are moved from the global edge table to the active edge table when the scan line corresponding to the bottom of the edge is reached. Moving across a scan line, the flag for a polygon is flipped when an edge of that polygon is crossed.

If no flags are true then nothing is drawn.
If one flag is true then the colour of that polygon is used.
If more than one flag is true then the frontmost polygon must be determined.

Below is an example from the textbook (figure red:13.11, white:15.34). Here there are two polygons, ABC and DEF:

Scan Line    AET contents     Comments
---------    ------------     --------
alpha        AB AC            one polygon
beta         AB AC FD FE      two separate polygons
gamma        AB DE CB FE      two overlapping polygons
gamma+1      AB DE CB FE      two overlapping polygons
gamma+2      AB CB DE FE      two separate polygons

Simple Ray Tracing

A simple ray-tracing algorithm can be used to find visible surfaces, as opposed to the more complicated algorithm that can be used to generate those oh-so-pretty images. Ray tracing is an image based algorithm. For every pixel in the image, a ray is cast from the center of projection through that pixel and into the scene. The colour of the pixel is set to the colour of the object that is first encountered.

Given a Center Of Projection
Given a window on the viewplane

for (each scan line)
    for (each pixel on the scan line)
    {
    compute ray from COP through pixel
    for (each object in scene)
        if (object is intersected by ray &&
            object is closer than previous intersection)
            record (intersection point, object)
    set pixel's colour to the colour of object at intersection point
    }

So, given a ray (vector) and an object, the key idea is computing whether, and if so where, the ray intersects the object.

The ray is represented by the vector from (Xo, Yo, Zo) at the COP to (X1, Y1, Z1) at the center of the pixel. We can parameterize this vector by introducing t:

X = Xo + t(X1 - Xo)
Y = Yo + t(Y1 - Yo)
Z = Zo + t(Z1 - Zo)

or

X = Xo + t(deltaX)
Y = Yo + t(deltaY)
Z = Zo + t(deltaZ)

t equal to 0 represents the COP, t equal to 1 represents the pixel
t < 0 represents points behind the COP
t > 1 represents points on the other side of the view plane from the COP

We want to find the value of t where the ray intersects the object. This way we can take the smallest value of t that is in front of the COP as defining the location of the nearest object along that vector.
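For example, intersecting the ray with an object whose surface is the plane ax + by + cz + d = 0 means substituting the parameterized X, Y, and Z into the plane equation and solving for t. A sketch (the function name is ours; a real polygon test would also check that the hit point lies inside the polygon):

#include <math.h>

/* Returns 1 and stores t if the ray (origin + t*delta) hits the plane
   ax + by + cz + d = 0 in front of the COP; returns 0 if the ray is
   parallel to the plane or the hit is behind the COP (t < 0).         */
int rayPlaneIntersect(double xo, double yo, double zo,   /* ray origin */
                      double dx, double dy, double dz,   /* ray delta  */
                      double a, double b, double c, double d,
                      double *t)
{
    double denom = a * dx + b * dy + c * dz;
    if (fabs(denom) < 1e-9)            /* ray parallel to the plane */
        return 0;

    *t = -(a * xo + b * yo + c * zo + d) / denom;
    return (*t >= 0.0);                /* keep only hits in front of the COP */
}

The nearest visible surface along the ray is then the object that returns the smallest such t.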
The problem is that this can take a lot of time, especially if there are lots of pixels and lots of objects. The rendered frames of 'Toy Story', for example, took at minimum 45 minutes and at most 20 hours each. So minimizing the number of comparisons is critical.

- Bounding boxes can be used to perform initial checks on complicated objects. Hierarchies of bounding boxes can be used, where a successful intersection with a bounding box then leads to tests with several smaller bounding boxes within the larger bounding box.

- The space of the scene can be partitioned. These partitions are then treated like buckets in a hash table, and objects within each partition are assigned to that partition. Checks can then be made against this constant number of partitions first, before going on to check the objects themselves. These partitions could be equal-sized volumes, or could contain equal numbers of polygons.

Full blown ray tracing takes this one step further by reflecting rays off of shiny objects, to see what the ray hits next, and then reflecting the ray off of that, and so on, until a limiting number of reflections have been encountered. For transparent or semi-transparent objects, rays are passed through the object, taking into account any deflection or filtering that may take place ( e.g. through a colored glass bottle or chess piece ), again proceeding until some limit is met. Then the contributions of all the reflections and transmissions are added together to determine the final color value for each pixel. The resulting images can be incredibly beautiful and realistic, and usually take a LONG time to compute.

Comparison

From table 13.1 in the red book (15.3 in the white book), the relative performance of the various algorithms, where smaller is better and the depth sort of a hundred polygons is set to be 1:

                 # of polygonal faces in the scene
Algorithm        100      250      60000
-----------------------------------------
Depth Sort         1       10        507
z-buffer          54       54         54
scan line          5       21        100
Warnock           11       64        307

This table is somewhat bogus, as z-buffer performance does degrade as the number of polygonal faces increases. To get a better sense of these numbers, here are the polygon counts of the following models:

250 triangular polygons:
550 triangular polygons:
6,000 triangular polygons: (parrot by Christina Vasilakis)
8,000 triangular polygons:
10,000 triangular polygons: (ARPA Integration testbed space by Jason Leigh, Andrew Johnson, Mike Kelley)

MultiMedia

Introduction

Multimedia has become an inevitable part of any presentation. It has found a variety of applications, from entertainment to education. The evolution of the Internet has also increased the demand for multimedia content.

Definition

Multimedia is media that uses multiple forms of information content and information processing (e.g. text, audio, graphics, animation, video, interactivity) to inform or entertain the user. Multimedia also refers to the use of electronic media to store and experience multimedia content. Multimedia is similar to traditional mixed media in fine art, but with a broader scope. The term "rich media" is synonymous with interactive multimedia.

Elements of Multimedia System

Multimedia means that computer information can be represented through audio, graphics, image, video and animation in addition to traditional media (text and graphics). Hypermedia can be considered one particular type of multimedia application.

Features of Multimedia

Multimedia presentations may be viewed in person on stage, projected, transmitted, or played locally with a media player. A broadcast may be a live or recorded multimedia presentation. Broadcasts and recordings can use either analog or digital electronic media technology. Digital online multimedia may be downloaded or streamed.
Streaming multimedia may be live or ondemand. Multimedia games and simulations may be used in a physical environment with special effects, with multiple users in an online network, or locally with an offline computer, game system, or simulator. Applications of Multimedia Multimedia finds its application in various areas including, but not limited to, advertisements, art, education, entertainment, engineering, medicine, mathematics, business, scientific research and spatial, temporal applications. A few application areas of multimedia are listed below: Creative industries Creative industries use multimedia for a variety of purposes ranging from fine arts, to entertainment, to commercial art, to journalism, to media and software services provided for any of the industries listed below. An individual multimedia designer may cover the spectrum throughout their career. Request for their skills range from technical, to analytical and to creative. Commercial Much of the electronic old and new media utilized by commercial artists is multimedia. Exciting presentations are used to grab and keep attention in advertising. Industrial, business to business, and interoffice communications are often developed by creative services firms for advanced multimedia presentations beyond simple slide shows to sell ideas or liven-up training. Commercial multimedia developers may be hired to design for governmental services and nonprofit services applications as well. Entertainment and Fine Arts In addition, multimedia is heavily used in the entertainment industry, especially to develop special effects in movies and animations. Multimedia games are a popular pastime and are software programs available either as CD-ROMs or online. Some video games also use multimedia features. Multimedia applications that allow users to actively participate instead of just sitting by as passive recipients of information are called Interactive Multimedia. Education In Education, multimedia is used to produce computer-based training courses (popularly called CBTs) and reference books like encyclopaedia and almanacs. A CBT lets the user go through a series of presentations, text about a particular topic, and associated illustrations in various information formats. Edutainment is an informal term used to describe combining education with entertainment, especially multimedia entertainment. Engineering Software engineers may use multimedia in Computer Simulations for anything from entertainment to training such as military or industrial training. Multimedia for software interfaces are often done as collaboration between creative professionals and software engineers. Industry In the Industrial sector, multimedia is used as a way to help present information to shareholders, superiors and coworkers. Multimedia is also helpful for providing employee training, advertising and selling products all over the world via virtually unlimited web-based technologies. Multimedia Building Blocks Any multimedia application consists any or all of the following components : Text : Text and symbols are very important for communication in any medium. With the recent explosion of the Internet and World Wide Web, text has become more the important than ever. Web is HTML (Hyper text Markup language) originally designed to display simple text documents on computer screens, with occasional graphic images thrown in as illustrations. Audio : Sound is perhaps the most element of multimedia. 
It can provide the listening pleasure of music, the startling accent of special effects or the ambience of a mood-setting background. 3. Images : Images whether represented analog or digital plays a vital role in a multimedia. It is expressed in the form of still picture, painting or a photograph taken through a digital camera. 4. Animation : Animation is the rapid display of a sequence of images of 2-D artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways. 5. Video : Digital video has supplanted analog video as the method of choice for making video for multimedia use. Video in multimedia are used to portray real time moving pictures in a multimedia project. Text in Multimedia Words and symbols in any form, spoken or written, are the most common system of communication. They deliver the most widely understood meaning to the greatest number of people. Most academic related text such as journals, e-magazines are available in the Web Browser readable form. Multimedia Sound Systems The multimedia application user can use sound right off the bat on both the Macintosh and on a multimedia PC running Windows because beeps and warning sounds are available as soon as the operating system is installed. On the Macintosh you can choose one of the several sounds for the system alert. In Windows system sounds are WAV files and they reside in the windows\Media subdirectory. There are still more choices of audio if Microsoft Office is installed. Windows makes use of WAV files as the default file format for audio and Macintosh systems use SND as default file format for audio. Digital Audio Digital audio is created when a sound wave is converted into numbers – a process referred to as digitizing. It is possible to digitize sound from a microphone, a synthesizer, existing tape recordings, live radio and television broadcasts, and popular CDs. You can digitize sounds from a natural source or prerecorded. Making MIDI Audio MIDI (Musical Instrument Digital Interface) is a communication standard developed for electronic musical instruments and computers. MIDI files allow music and sound synthesizers from different manufacturers to communicate with each other by sending messages along cables connected to the devices. Audio File Formats A file format determines the application that is to be used for opening a file. Following is the list of different file formats and the software that can be used for opening a specific file. 1. *.AIF, *.SDII in Macintosh Systems 2. *.SND for Macintosh Systems 3. *.WAV for Windows Systems 4. MIDI files – used by north Macintosh and Windows 5. *.WMA –windows media player 6. *.MP3 – MP3 audio 7. *.RA – Real Player 8. *.VOC – VOC Sound 9. AIFF sound format for Macintosh sound files 10. *.OGG – Ogg Vorbis Software used for Audio Software such as Toast and CD-Creator from Adaptec can translate the digital files of red book Audio format on consumer compact discs directly into a digital sound editing file, or decompress MP3 files into CD-Audio. There are several tools available for recording audio. Following is the list of different software that can be used for recording and editing audio ; Soundrecorder fromMicrosoft Apple’s QuickTime Player pro Sonic Foundry’s SoundForge for Windows Soundedit16 Digital Image A digital image is represented by a matrix of numeric values each representing a quantized intensity value. 
Digital Image
A digital image is represented by a matrix of numeric values, each representing a quantized intensity value. When I is a two-dimensional matrix, I(r,c) is the intensity value at the position corresponding to row r and column c of the matrix. The points at which an image is sampled are known as picture elements, commonly abbreviated as pixels. The pixel values of intensity images are called gray scale levels (we encode here the "color" of the image). The intensity at each pixel is represented by an integer and is determined from the continuous image by averaging over a small neighborhood around the pixel location. If there are just two intensity values, for example black and white, they are represented by the numbers 0 and 1; such images are called binary-valued images. If 8-bit integers are used to store each pixel value, the gray levels range from 0 (black) to 255 (white). A short sketch after the image file format list below makes this concrete.

Bitmaps
A bitmap is a simple information matrix describing the individual dots that are the smallest elements of resolution on a computer screen or other display or printing device. A one-dimensional matrix is required for monochrome (black and white); greater depth (more bits of information) is required to describe the more than 16 million colors the picture elements may have. The state of all the pixels on a computer screen makes up the image seen by the viewer, whether in combinations of black and white or colored pixels in a line of text, a photograph-like picture, or a simple background pattern.

Clip Art
A clip art collection may contain a random assortment of images, or it may contain a series of graphics, photographs, sound, and video related to a single topic. For example, Corel, Micrografx, and Fractal Design bundle extensive clip art collections with their image-editing software.

Image File Formats
There are many file formats used to store bitmaps and vector drawings. A few of them are listed below:

Format                      Extension
Microsoft Windows DIB       .bmp .dib .rle
Microsoft Palette           .pal
AutoCAD format 2D           .dxf
JPEG                        .jpg
Windows Metafile            .wmf
Portable Network Graphics   .png
CompuServe GIF              .gif
Apple Macintosh             .pict .pic .pct
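To make the "matrix of intensity values" idea concrete, here is a small sketch using a made-up 4 by 4 image. It stores 8-bit gray levels in a two-dimensional array and thresholds them into a binary (black-and-white) image; the threshold of 128 is an arbitrary choice for illustration.

/* A tiny 4x4 grayscale "image" as a matrix of 8-bit intensities
   (0 = black, 255 = white), thresholded into a binary image.     */
#include <stdio.h>

#define ROWS 4
#define COLS 4

int main(void)
{
    unsigned char I[ROWS][COLS] = {   /* I(r,c) = intensity at row r, column c */
        { 12,  80, 200, 255 },
        { 30, 140, 220, 240 },
        { 25, 100, 180, 230 },
        {  5,  60, 150, 210 }
    };

    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            int binary = (I[r][c] >= 128) ? 1 : 0;   /* 1 = white, 0 = black */
            printf("%d ", binary);
        }
        printf("\n");
    }
    return 0;
}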
Principles of Animation
Animation is the rapid display of a sequence of images of 2-D artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways. The most common method of presenting animation is as a motion picture or video program, although several other forms of presenting animation also exist.

Animation is possible because of a biological phenomenon known as persistence of vision and a psychological phenomenon called phi. An object seen by the human eye remains chemically mapped on the eye's retina for a brief time after viewing. Combined with the human mind's need to conceptually complete a perceived action, this makes it possible for a series of images that are changed very slightly and very rapidly, one after the other, to blend together into a visual illusion of movement. A few cells or frames of a rotating logo, for example, when progressively and rapidly changed, make the arrow of the compass appear to be spinning. Television video builds entire frames or pictures every second; the speed with which each frame is replaced by the next one makes the images appear to blend smoothly into movement. To make an object travel across the screen while it changes its shape, just change the shape and also move (translate) it a few pixels for each frame.

Animation File Formats
Some file formats are designed specifically to contain animations, and they can be ported among applications and platforms with the proper translators:

Director                *.dir, *.dcr
AnimationPro            *.fli, *.flc
3D Studio Max           *.max
SuperCard and Director  *.pics
CompuServe              *.gif
Flash                   *.fla, *.swf

Following is a list of software used for computerized animation:
3D Studio Max
Flash
AnimationPro

Video: Analog versus Digital
Digital video has supplanted analog video as the method of choice for making video for multimedia use. While broadcast stations and professional production and post-production houses remain heavily invested in analog video hardware (according to Sony, there are more than 350,000 Betacam SP devices in use today), digital video gear produces excellent finished products at a fraction of the cost of analog. A digital camcorder connected directly to a computer workstation eliminates the image-degrading analog-to-digital conversion step typically performed by expensive video capture cards, and brings the power of nonlinear video editing and production to everyday users.

Multimedia Authoring Tools
Multimedia authoring tools provide the framework you need for organizing and editing the elements of multimedia such as graphics, sound, animation, and video clips. Authoring tools are used for designing interactivity and the user interface, for presenting your project on screen, and for assembling multimedia elements into a single cohesive project.

Features of Authoring Tools
The main features of multimedia authoring tools are described below:
1. Editing features: The elements of multimedia - images, animation, text, digital audio, MIDI music, and video clips - need to be created, edited, and converted to standard file formats, and specialized applications provide these capabilities. Editing tools for these elements, particularly text and still images, are often included in your authoring system.
2. Organizing features: The organization, design, and production process for multimedia involves storyboarding and flowcharting. Some authoring tools provide a visual flowcharting system or overview facility for illustrating your project's structure at a macro level. Storyboards or navigation diagrams can also help organize a project. Because designing the interactivity and navigation flow of your project often requires a great deal of planning and programming effort, your storyboard should describe not just the graphics of each screen but the interactive elements as well. Features that help organize your material, such as those provided by Super Edit, Authorware, IconAuthor, and other authoring systems, are a plus.
3. Programming features: Authoring tools that offer a very high level language or interpreted scripting environment for navigation control and for enabling user inputs - such as Macromedia Director, Macromedia Flash, HyperCard, MetaCard, and ToolBook - are more powerful. The more commands and functions provided in the scripting language, the more powerful the authoring system. As with traditional programming tools, look for an authoring package with good debugging facilities, robust text editing, and an online syntax reference. Other scripting augmentation facilities are advantages as well. In complex projects you may need to program custom extensions of the scripting language for direct access to the computer's operating system.
Some authoring tools offer direct importing of preformatted text, indexing facilities, complex text search mechanisms, and hyperlinkage tools. These authoring systems are useful for developing CD-ROM information products, online documentation and help systems, and sophisticated multimedia-enhanced publications. With scripts you can perform computational tasks; sense and respond to user input; create character, icon, and motion animation; launch other applications; and control external multimedia devices.
4. Interactivity features: Interactivity empowers the end users of your project by letting them control the content and flow of information. Authoring tools should provide one or more levels of interactivity:
Simple branching, which offers the ability to go to another section of the multimedia production.
Conditional branching, which supports a go-to based on the result of an IF-THEN decision or event.
A structured language that supports complex programming logic, such as nested IF-THENs, subroutines, event tracking, and message passing among objects and elements.
5. Performance tuning features: Complex multimedia projects require exact synchronization of events. Accomplishing synchronization is difficult because performance varies widely among the different computers used for multimedia development and delivery. Some authoring tools allow you to lock a production's playback speed to a specified computer platform, but others provide no ability whatsoever to control performance on various systems.
6. Playback features: When you are developing a multimedia project, you will continually be assembling elements and testing to see how the assembly looks and performs. Your authoring system should let you build a segment or part of your project and then quickly test it as if the user were actually using it.
7. Delivery features: Delivering your project may require building a run-time version of the project using the multimedia authoring software. A run-time version allows your project to play back without requiring the full authoring software and all its tools and editors. Often the run-time version does not allow users to access or change the content, structure, and programming of the project. If you are going to distribute your project widely, you should distribute it in the run-time version.
8. Cross-platform features: It is also increasingly important to use tools that make transfer across platforms easy. For many developers, the Macintosh remains the multimedia authoring platform of choice, but 80% of that developer's target market may be Windows platforms. If you develop on a Macintosh, look for tools that provide a compatible authoring system for Windows or offer a run-time player for the other platform.
9. Internet playability: Because the Web has become a significant delivery medium for multimedia, authoring systems typically provide a means to convert their output so that it can be delivered within the context of HTML or DHTML, either with a special plug-in or by embedding Java, JavaScript, or other code structures in the HTML document.

JPEG (Joint Photographic Experts Group)
This section walks through the JPEG baseline compression algorithm, using a sample photograph as the running example. Some compression algorithms are lossless: they preserve all the original information. Others, such as the JPEG baseline algorithm, are lossy: some of the information is lost, but only information that is judged to be insignificant.
The JPEG compression algorithm
First, the image is divided into 8 by 8 blocks of pixels. Since each block is processed without reference to the others, we can concentrate on a single block. (In the original article, one particular block is highlighted and blown up so that the individual pixels are more apparent; notice that there is not tremendous variation over the 8 by 8 block, though other blocks may have more.)

Remember that the goal of data compression is to represent the data in a way that reveals some redundancy. We may think of the color of each pixel as represented by a three-dimensional vector (R,G,B) consisting of its red, green, and blue components. In a typical image, there is a significant amount of correlation between these components. For this reason, we use a color space transform to produce a new vector whose components represent luminance, Y, and blue and red chrominance, Cb and Cr. The luminance describes the brightness of the pixel, while the chrominance carries information about its hue. These three quantities are typically less correlated than the (R, G, B) components. Furthermore, psychovisual experiments demonstrate that the human eye is more sensitive to luminance than to chrominance, which means that we may neglect larger changes in the chrominance without affecting our perception of the image. Since this transformation is invertible, we will be able to recover the (R,G,B) vector from the (Y, Cb, Cr) vector, which is important when we wish to reconstruct the image. (To be precise, we usually add 128 to the chrominance components so that they are represented as numbers between 0 and 255.)

When we apply this transformation to each pixel in our block, we obtain three new blocks, one for each of Y, Cb, and Cr (shown in the original article as grayscale images where brighter pixels correspond to larger values). As is typical, the luminance shows more variation than the chrominance. For this reason, greater compression ratios are sometimes achieved by assuming the chrominance values are constant on 2 by 2 blocks, thereby recording fewer of these values. For instance, the image editing software Gimp offers a choice of chroma subsampling settings when saving an image as a JPEG file.
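The following is a minimal sketch (not code from the original article) of the color space transform just described, converting one (R, G, B) pixel into (Y, Cb, Cr) with 128 added to the chrominance components. The coefficients are the widely used JFIF / ITU-R BT.601 values.

/* Convert one 8-bit (R,G,B) pixel to (Y,Cb,Cr); 128 is added to the
   chrominance components so they fit in the range 0..255.           */
#include <stdio.h>

static void rgb_to_ycbcr(unsigned char r, unsigned char g, unsigned char b,
                         double *y, double *cb, double *cr)
{
    *y  =   0.299    * r + 0.587    * g + 0.114    * b;         /* luminance   */
    *cb = - 0.168736 * r - 0.331264 * g + 0.5      * b + 128;   /* blue chroma */
    *cr =   0.5      * r - 0.418688 * g - 0.081312 * b + 128;   /* red chroma  */
}

int main(void)
{
    double y, cb, cr;
    rgb_to_ycbcr(200, 120, 50, &y, &cb, &cr);   /* an arbitrary example pixel */
    printf("Y = %.1f  Cb = %.1f  Cr = %.1f\n", y, cb, cr);
    return 0;
}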
The Discrete Cosine Transform
Now we come to the heart of the compression algorithm. Our expectation is that, over an 8 by 8 block, the changes in the components of the (Y, Cb, Cr) vector are rather mild, as demonstrated by the example above. Instead of recording the individual values of the components, we could record, say, the average values and how much each pixel differs from this average value. In many cases, we would expect the differences from the average to be rather small and hence safely ignored. This is the essence of the Discrete Cosine Transform (DCT), which will now be explained.

We first focus on one of the three components in one row of our block and imagine that the eight values are represented by f0, f1, ..., f7. We would like to represent these values in a way that makes the variations more apparent. For this reason, we think of the values as given by a function fx, where x runs from 0 to 7, and write this function as a linear combination of cosine functions:

    fx = (1/2) * sum over w = 0..7 of  Cw * Fw * cos((2x+1) * w * pi / 16)

Don't worry about the factor of 1/2 in front or the constants Cw (Cw = 1 for all w except C0 = 1/sqrt(2)). What is important in this expression is that the function fx is being represented as a linear combination of cosine functions of varying frequencies w, with coefficients Fw. (The original article shows the graphs of four of these cosine functions, for w = 0, 1, 2, 3.) Of course, the cosine functions with higher frequencies demonstrate more rapid variations. Therefore, if the values fx change relatively slowly, the coefficients Fw for the larger frequencies should be relatively small, and we could choose not to record those coefficients in an effort to reduce the file size of our image. The DCT coefficients may be found using

    Fw = (1/2) * Cw * sum over x = 0..7 of  fx * cos((2x+1) * w * pi / 16)

Notice that this means the DCT is invertible: we begin with the fx and record the values Fw; when we wish to reconstruct the image, we have the coefficients Fw and recompute the fx.

Rather than applying the DCT to only the rows of our blocks, we exploit the two-dimensional nature of our image. The Discrete Cosine Transform is first applied to the rows of our block. If the image does not change too rapidly in the vertical direction, then the coefficients shouldn't either. For this reason, we may fix a value of w and apply the Discrete Cosine Transform to the collection of eight values of Fw we get from the eight rows. This results in coefficients Fw,u, where w is the horizontal frequency and u represents a vertical frequency. We store these coefficients in another 8 by 8 block. Notice that when we move down or to the right, we encounter coefficients corresponding to higher frequencies, which we expect to be less significant. The DCT coefficients may be efficiently computed through a Fast Discrete Cosine Transform, in the same spirit that the Fast Fourier Transform efficiently computes the Discrete Fourier Transform.

Quantization
Of course, the coefficients Fw,u are real numbers, which will be stored as integers. This means that we need to round the coefficients; as we'll see, we do this in a way that facilitates greater compression. Rather than simply rounding the coefficients Fw,u, we first divide by a quantizing factor and then record

    round(Fw,u / Qw,u)

This allows us to emphasize certain frequencies over others. More specifically, the human eye is not particularly sensitive to rapid variations in the image. This means we may de-emphasize the higher frequencies, without significantly affecting the visual quality of the image, by choosing a larger quantizing factor for higher frequencies.

Remember also that, when a JPEG file is created, the algorithm asks for a parameter to control the quality of the image and how much the image is compressed. This parameter, which we'll call q, is an integer from 1 to 100. You should think of q as a measure of the quality of the image: higher values of q correspond to higher quality images and larger file sizes. From q, a scaling factor alpha is created; a common convention (used in the reference JPEG implementation) is

    alpha = 50/q        if q < 50
    alpha = 2 - q/50    if q >= 50

so that higher values of q give lower values of alpha. We then round the weights as

    round(Fw,u / (alpha * Qw,u))

Naturally, information is lost through this rounding process. When either alpha or Qw,u is increased (remember that large values of alpha correspond to smaller values of the quality parameter q), more information is lost and the file size decreases.

Here are typical values for Qw,u recommended by the JPEG standard. First, for the luminance coefficients:

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

and for the chrominance coefficients:

    17  18  24  47  99  99  99  99
    18  21  26  66  99  99  99  99
    24  26  56  99  99  99  99  99
    47  66  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99
    99  99  99  99  99  99  99  99

These values are chosen to emphasize the lower frequencies.
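The following is a minimal sketch (my own illustration, not the article's code) of the forward 2-D DCT of one 8 by 8 block followed by quantization with the luminance table above and a quality setting q. The input block values are made up; real encoders use a fast DCT rather than this direct double sum.

/* Forward 8x8 DCT of one block, then quantize:
   quantized[u][w] = round( F[u][w] / (alpha * Q[u][w]) ).   Link with -lm. */
#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

static const int Q[8][8] = {                 /* JPEG luminance quantization table */
    { 16, 11, 10, 16, 24, 40, 51, 61 },
    { 12, 12, 14, 19, 26, 58, 60, 55 },
    { 14, 13, 16, 24, 40, 57, 69, 56 },
    { 14, 17, 22, 29, 51, 87, 80, 62 },
    { 18, 22, 37, 56, 68,109,103, 77 },
    { 24, 35, 55, 64, 81,104,113, 92 },
    { 49, 64, 78, 87,103,121,120,101 },
    { 72, 92, 95, 98,112,100,103, 99 }
};

/* Forward 2-D DCT of an 8x8 block (the two 1-D transforms folded together). */
static void dct8x8(const double f[8][8], double F[8][8])
{
    for (int u = 0; u < 8; u++) {
        for (int w = 0; w < 8; w++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double cw = (w == 0) ? 1.0 / sqrt(2.0) : 1.0;
            double sum = 0.0;
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                    sum += f[y][x] * cos((2 * y + 1) * u * PI / 16.0)
                                   * cos((2 * x + 1) * w * PI / 16.0);
            F[u][w] = 0.25 * cu * cw * sum;
        }
    }
}

int main(void)
{
    double f[8][8], F[8][8];
    int q = 50;                                          /* quality parameter  */
    double alpha = (q < 50) ? 50.0 / q : 2.0 - q / 50.0;

    for (int y = 0; y < 8; y++)                          /* a smooth made-up   */
        for (int x = 0; x < 8; x++)                      /* block, level-      */
            f[y][x] = 10.0 * x + 5.0 * y - 128.0;        /* shifted by 128     */

    dct8x8(f, F);

    for (int u = 0; u < 8; u++) {                        /* quantize and print */
        for (int w = 0; w < 8; w++)
            printf("%4ld ", lround(F[u][w] / (alpha * Q[u][w])));
        printf("\n");
    }
    return 0;
}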
Let's see how this works in our example. Remember that we have the three blocks of values for Y, Cb, and Cr. Quantizing with q = 50 gives three new blocks of integer coefficients (shown in the original article). The entry in the upper left corner essentially represents the average over the block; moving to the right increases the horizontal frequency, while moving down increases the vertical frequency. What is important here is that there are lots of zeroes. We now order the coefficients in a zigzag pattern so that the lower frequencies appear first. In particular, for the luminance coefficients we record

    20 -7 1 -1 0 -1 1 0 0 0 0 0 0 0 -2 1 1 0 0 0 0 ... 0

Instead of recording all the zeroes, we can simply say how many appear (notice that there are even more zeroes in the chrominance weights). In this way, the sequences of DCT coefficients are greatly shortened, which is the goal of the compression algorithm. In fact, the JPEG algorithm uses extremely efficient means to encode sequences like this.

Reconstructing the image from this information is rather straightforward. The quantization matrices are stored in the file so that approximate values of the DCT coefficients may be recomputed. From here, the (Y, Cb, Cr) vector is found through the Inverse Discrete Cosine Transform. Then the (R, G, B) vector is recovered by inverting the color space transform. The original article compares the original 8 by 8 block with its reconstruction for q = 50, and again for q = 10; as expected, the higher value of the parameter q gives a higher quality image.
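Here is a minimal sketch (my own, not from the article) of the reordering and zero-run shortening step just described: it walks an 8 by 8 block of quantized coefficients in the standard zigzag order and prints each nonzero value together with the number of zeros that preceded it. The block contents are made up.

/* Zigzag-scan an 8x8 block of quantized coefficients and run-length
   encode the zeros: print (zero_run, value) pairs for nonzero values. */
#include <stdio.h>

int main(void)
{
    int block[8][8] = {0};                     /* a sparse made-up block */
    block[0][0] = 20; block[0][1] = -7; block[1][0] = 1;
    block[2][0] = -1; block[0][3] = -2;

    int run = 0;
    /* Visit the 15 anti-diagonals; alternate direction to get the zigzag. */
    for (int d = 0; d < 15; d++) {
        for (int i = 0; i <= d; i++) {
            int r = (d % 2 == 0) ? d - i : i;  /* even diagonals go up-right */
            int c = d - r;
            if (r > 7 || c > 7) continue;      /* skip positions off the block */
            int v = block[r][c];
            if (v == 0) {
                run++;                         /* extend the current zero run */
            } else {
                printf("(%d zeros, %d) ", run, v);
                run = 0;
            }
        }
    }
    printf("end-of-block after %d trailing zeros\n", run);
    return 0;
}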
JPEG 2000
While the JPEG compression algorithm has been quite successful, several factors created the need for a new algorithm, two of which we will now describe. First, the JPEG algorithm's use of the DCT leads to discontinuities at the boundaries of the 8 by 8 blocks. For instance, the color of a pixel on the edge of a block can be influenced by that of a pixel anywhere in the block, but not by an adjacent pixel in another block. This leads to blocking artifacts, demonstrated by a version of the sample image created with the quality parameter q set to 5 (the size of that image file is only 1702 bytes), and explains why JPEG is not an ideal format for storing line art. In addition, the JPEG algorithm allows us to recover the image at only one resolution. In some instances, it is desirable to also recover the image at lower resolutions, allowing, for instance, the image to be displayed at progressively higher resolutions while the full image is being downloaded. To address these demands, among others, the JPEG 2000 standard was introduced in December 2000. While there are several differences between the two algorithms, we'll concentrate on the fact that JPEG 2000 uses a wavelet transform in place of the DCT.

Before we explain the wavelet transform used in JPEG 2000, we'll consider a simpler example of a wavelet transform. As before, we'll imagine that we are working with luminance-chrominance values for each pixel. The DCT worked by applying the transform to one row at a time, then transforming the columns. The wavelet transform will work in a similar way. To this end, we imagine that we have a sequence f0, f1, ..., fn describing the values of one of the three components in a row of pixels. As before, we wish to separate rapid changes in the sequence from slower changes. To this end, we create a sequence of wavelet coefficients; one consistent choice, matching the description that follows, is

    a[2n]   = (f[2n] + f[2n+1]) / 2      (the average of two successive values)
    a[2n+1] = (f[2n+1] - f[2n]) / 2      (half the difference of two successive values)

Notice that the even coefficients record the average of two successive values - we call this the low pass band, since information about high frequency changes is lost - while the odd coefficients record the difference of two successive values - we call this the high pass band, as high frequency information is passed on. The number of low pass coefficients is half the number of values in the original sequence (as is the number of high pass coefficients). It is important to note that we may recover the original f values from the wavelet coefficients, as we'll need to do when reconstructing the image:

    f[2n]   = a[2n] - a[2n+1]
    f[2n+1] = a[2n] + a[2n+1]

We reorder the wavelet coefficients by listing the low pass coefficients first, followed by the high pass coefficients. Just as with the 2-dimensional DCT, we may now apply the same operation to transform the wavelet coefficients vertically. This results in a 2-dimensional grid of wavelet coefficients divided into four blocks (LL, HL, LH, and HH) by the low and high pass bands. As before, we use the fact that the human eye is less sensitive to rapid variations to de-emphasize the rapid changes seen in the high pass coefficients, through a quantization process analogous to that seen in the JPEG algorithm. Notice that the LL region is obtained by averaging the values in a 2 by 2 block and so represents a lower resolution version of the image.

In practice, the image is broken into tiles, usually of size 64 by 64. The reason for choosing a power of 2 will be apparent soon. (The original article demonstrates this with a 128 by 128 tile so that it may be more easily seen on the page.) Notice that, if we transmit the coefficients in the LL region first, we can reconstruct the image at a lower resolution before all the coefficients have arrived - one of the aims of the JPEG 2000 algorithm. We may now perform the same operation on the lower resolution image in the LL region, thereby obtaining images of lower and lower resolution.

The wavelet coefficients may be computed through a lifting process like this:

    f0 <- (f0 + f1) / 2      (f0 now holds a0)
    f1 <- f1 - f0            (f1 now holds a1, computed from the updated f0)

The advantage is that the coefficients may be computed without using additional computer memory - a0 first replaces f0 and then a1 replaces f1. Also, in the wavelet transforms that are used in the JPEG 2000 algorithm, the lifting process enables faster computation of the coefficients.
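Here is a minimal sketch (my own, using the simple average/difference transform above rather than the exact JPEG 2000 filters) of one level of the wavelet transform applied to a row of eight values, followed by exact reconstruction of the original samples.

/* One level of a simple average/difference (Haar-type) wavelet transform
   on a row of 8 values: low pass first, then high pass, then reconstruction. */
#include <stdio.h>

#define N 8

int main(void)
{
    double f[N] = { 52, 54, 57, 58, 100, 102, 101, 99 };   /* a made-up row */
    double low[N / 2], high[N / 2], rec[N];

    for (int n = 0; n < N / 2; n++) {
        low[n]  = (f[2 * n] + f[2 * n + 1]) / 2.0;   /* average: low pass     */
        high[n] = (f[2 * n + 1] - f[2 * n]) / 2.0;   /* difference: high pass */
    }

    for (int n = 0; n < N / 2; n++) {                /* invert the transform  */
        rec[2 * n]     = low[n] - high[n];
        rec[2 * n + 1] = low[n] + high[n];
    }

    printf("low : "); for (int n = 0; n < N / 2; n++) printf("%6.1f", low[n]);
    printf("\nhigh: "); for (int n = 0; n < N / 2; n++) printf("%6.1f", high[n]);
    printf("\nrec : "); for (int n = 0; n < N; n++)     printf("%6.1f", rec[n]);
    printf("\n");
    return 0;
}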
MPEG
The most common form of the video signal in use today is still analog. This signal is obtained through a process known as scanning. This section first discusses the analog representation of the video signal and its disadvantages, and describes the need for a digital representation of the video signal. After describing the need for compression of the video signal, it describes the MPEG compression technique for video signals.

Analog Video Signal
An analog signal is obtained through a process known as scanning (Figure 2). Scanning records the intensity values of the spatio-temporal signal only in the horizontal (h) direction. This signal is coupled with the horizontal and vertical synchronization pulses to yield the complete video signal. Scanning can be either progressive or interlaced. Progressive scanning scans all the horizontal lines to form the complete frame. In interlaced scanning, the even and the odd horizontal lines of a picture are scanned separately, yielding the two fields of a picture. There are three main analog video standards:
Composite.
Component.
S-Video.
In the composite standard, the luminance and the two chrominance components are encoded together as a single signal. This is in contrast to the component standard, where the three components are coded as three distinct signals. S-Video consists of separate Y and C analog video signals. Today, technology is attempting to integrate the video, computer, and telecommunication industries on a single multimedia platform. The video signal is required to be scalable, platform independent, able to provide interactivity, and robust. Analog video unfortunately fails to meet these requirements. Moving to digital not only eliminates most of the above-mentioned problems but also opens the door to a whole range of digital video processing techniques which can make the picture sharper.

Digital Video Signal
To digitize the spatio-temporal signal x(h,v,t), the component form of the analog signal is usually sampled in all three directions. Each sample point in a frame is called a pixel. Sampling in the horizontal direction yields the pixels per line, which defines the horizontal resolution of the picture. Vertical resolution is controlled by sampling vertically. Temporal sampling determines the frame rate. Digital video too has its share of bottlenecks; the most important one is the huge bandwidth requirement. Despite being digital, the signal still needs to be compressed before it can practically be stored or transmitted. The logical solution to this problem is digital video compression.
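To see why the bandwidth requirement is described as huge, here is a small worked sketch using my own illustrative figures: an ITU-R 601-style frame of 720 x 576 pixels at 25 frames per second with 8-bit samples. It computes the raw, uncompressed bit rate for two of the chroma sampling formats described in the next section.

/* Raw (uncompressed) bit rate of a digital video signal. */
#include <stdio.h>

int main(void)
{
    const double width = 720, height = 576, fps = 25, bits_per_sample = 8;

    /* Samples per pixel: 3 for 4:4:4 (Y, Cb, Cr at full resolution),
       2 on average for 4:2:2 (chrominance halved horizontally).      */
    double rate_444 = width * height * fps * 3 * bits_per_sample;
    double rate_422 = width * height * fps * 2 * bits_per_sample;

    printf("4:4:4 : %.1f Mbit/s\n", rate_444 / 1e6);   /* about 249 Mbit/s */
    printf("4:2:2 : %.1f Mbit/s\n", rate_422 / 1e6);   /* about 166 Mbit/s */
    return 0;
}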
MPEG Compression Standard
Compression aims at lowering the total number of parameters required to represent the signal, while maintaining good quality. These parameters are then coded for transmission or storage. A result of compressing digital video is that it becomes available as computer data, ready to be transmitted over existing communication networks. There are several different redundancies present in video signal data:
Spatial.
Temporal.
Psychovisual.
Coding.
Spatial redundancy occurs because neighboring pixels in each individual frame of a video signal are related. The pixels in consecutive frames of the signal are also related, leading to temporal redundancy. The human visual system does not treat all visual information with equal sensitivity, leading to psychovisual redundancy. Finally, not all parameters occur with the same probability in an image; as a result, they do not require an equal number of bits to code them (Huffman coding).

There are several different compression standards in use today (for example, CCITT Recommendation H.261). MPEG, which stands for Moving Picture Experts Group, is a joint committee of the ISO and IEC. It has been responsible for the MPEG-1 (ISO/IEC 11172) and MPEG-2 (ISO/IEC 13818) standards in the past and is currently developing the MPEG-4 standard. MPEG standards are generic and universal. There are three main parts in the MPEG-1 and MPEG-2 specifications, namely Systems, Video and Audio. The Video part defines the syntax and semantics of the compressed video bitstream. The Audio part defines the same for the audio bitstream, while the Systems part specifies the combination of one or more elementary streams of video and audio, as well as other data, into single or multiple streams suitable for storage or transmission. The MPEG-2 standard contains a fourth part called DSM-CC, which defines a set of protocols for the retrieval and storage of MPEG data. We shall now examine the structure of a non-scalable video bitstream in some detail to understand the video compression.

The video bitstream consists of video sequences. Each video sequence consists of a variable number of groups of pictures (GOP). A GOP contains a variable number of pictures (Figure 3). Mathematically, each picture is a union of the pixel values of the luminance and the two chrominance components. The picture can also be subsampled at a lower resolution in the chrominance domain, because the human eye is less sensitive to high frequency color shifts (there are more rods than cones on the retina). There are three formats:
1. 4:4:4 - the chrominance and luminance planes are sampled at the same resolution.
2. 4:2:2 - the chrominance planes are subsampled at half resolution in the horizontal direction.
3. 4:2:0 - the chrominance information is subsampled at half the rate both vertically and horizontally.
A short sketch at the end of this section counts the samples each format keeps for a full frame.

Pictures can be divided into three main types based on their compression schemes:
I or Intra pictures.
P or Predicted pictures.
B or Bidirectional pictures.
The frames that can be predicted from previous frames are called P-frames. But what happens if transmission errors occur in a sequence of P-frames? To avoid the propagation of transmission errors and to allow periodic resynchronization, a complete frame which does not rely on information from other frames is transmitted approximately once every 12 frames. These stand-alone frames are "intra coded" and are called I-frames.

The coding technique for I pictures falls in the category of transform coding. Each picture is divided into non-overlapping 8x8 pixel blocks. Four of these blocks are additionally arranged into a bigger block of size 16x16, called a macroblock. The DCT is applied to each 8x8 block individually (Figure 4). This transform converts the data into a series of coefficients which represent the magnitudes of the cosine functions at increasing frequencies. The quantization process allows the high energy, low frequency coefficients to be coded with a greater number of bits, while using fewer or zero bits for the high frequency coefficients. Retaining only a subset of the coefficients reduces the total number of parameters needed for representation by a substantial amount. The quantization process also helps the encoder output bitstreams at a specified bit rate. The DCT coefficients are coded using a combination of two coding schemes: run-length and Huffman. The coefficients are scanned in a zigzag pattern to create a 1-D sequence (MPEG-2 provides an alternative scanning method). The resulting 1-D sequence usually contains a large number of zeros due to the DCT and the quantization process. Each non-zero coefficient is associated with a pair of pointers: first, its position in the block, indicated by the number of zeros between itself and the previous non-zero coefficient; second, its coefficient value. Based on these two pointers, it is given a variable length code from a lookup table. This is done in a manner such that a highly probable combination gets a code with fewer bits, while the unlikely ones get longer codes. However, since spatial redundancy is limited, the I pictures provide only moderate compression.
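Before moving on to P and B pictures, here is a small sketch of the picture structure just described (my own example, using a 720 x 576 frame): it counts the 8x8 blocks and 16x16 macroblocks in one frame, and the chrominance samples kept under each subsampling format.

/* Block, macroblock, and chrominance sample counts for one frame. */
#include <stdio.h>

int main(void)
{
    int width = 720, height = 576;                   /* an ITU-R 601-size frame */
    int luma = width * height;                       /* luminance samples       */

    int blocks      = (width / 8)  * (height / 8);   /* 8x8 DCT blocks          */
    int macroblocks = (width / 16) * (height / 16);  /* 16x16 macroblocks       */

    printf("8x8 blocks per frame  : %d\n", blocks);
    printf("macroblocks per frame : %d\n", macroblocks);

    /* Chrominance samples per frame (for each of Cb and Cr): */
    printf("4:4:4 chroma samples  : %d\n", luma);        /* full resolution     */
    printf("4:2:2 chroma samples  : %d\n", luma / 2);    /* half horizontally   */
    printf("4:2:0 chroma samples  : %d\n", luma / 4);    /* half in both ways   */
    return 0;
}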
The P and B pictures are where MPEG derives its maximum compression efficiency. This is done by a technique called motion compensation (MC) based prediction, which exploits the temporal redundancy. Since frames are closely related, it is assumed that a current picture can be modelled as a translation of the picture at the previous time. It is then possible to accurately "predict" the data of one frame based on the data of a previous frame. In P pictures, each 16x16 macroblock is predicted from a macroblock of a previously encoded I or P picture. Since frames are snapshots in time of a moving object, the macroblocks in the two frames may not correspond to the same spatial location. The encoder searches the previous frame (for P-frames; or the frames before and after, for B-frames) in half-pixel increments for other macroblock locations that are a close match to the information contained in the current macroblock. The displacements in the horizontal and vertical directions of the best-match macroblocks from the co-sited macroblock are called motion vectors (Figure 5). If no matching macroblock is found in the neighboring region, the macroblock is intra coded and the DCT coefficients are encoded. If a matching block is found in the search region, the coefficients are not transmitted, but a motion vector is used instead. The motion vectors can also be used for motion prediction in the case of corrupted data, and sophisticated decoder algorithms can use these vectors for error concealment.

For B pictures, MC prediction and interpolation are performed using reference frames present on either side of the picture. Compared to I and P pictures, B pictures provide the maximum compression. There are other advantages of B pictures:
Reduction of noise due to the averaging process.
Use of future pictures for coding.
B pictures are themselves never used for predictions and hence do not propagate errors.
MPEG-2 allows for both frame-based and field-based MC. Field-based MC is especially useful when the video signal includes fast motion.
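Finally, here is a minimal sketch (my own, not MPEG reference code) of the block-matching search behind motion compensation: it slides a 16x16 macroblock of the current frame over a small window of the previous frame, uses the sum of absolute differences (SAD) as the matching cost, and reports the best motion vector. Real encoders search at half-pixel precision and over larger windows; the frame size, search range, and synthetic frame contents below are arbitrary.

/* Full-search block matching: find the motion vector (dx, dy) whose
   macroblock in the previous frame best matches the current macroblock. */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define W 64            /* small frame size for the sketch               */
#define H 64
#define MB 16           /* macroblock size                               */
#define RANGE 8         /* search +/- RANGE pixels in each direction     */

static unsigned char prev_frame[H][W], curr_frame[H][W];

int main(void)
{
    /* Fill the previous frame with a gradient, and make the current frame
       a copy shifted 3 pixels right and 2 pixels down.                    */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            prev_frame[y][x] = (unsigned char)((x * 2 + y * 3) & 0xFF);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            curr_frame[y][x] = prev_frame[(y - 2 + H) % H][(x - 3 + W) % W];

    int mb_x = 24, mb_y = 24;              /* top-left corner of the macroblock */
    long best_sad = LONG_MAX;
    int best_dx = 0, best_dy = 0;

    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            long sad = 0;                  /* cost of this candidate position   */
            for (int y = 0; y < MB; y++)
                for (int x = 0; x < MB; x++)
                    sad += labs((long)curr_frame[mb_y + y][mb_x + x]
                              - (long)prev_frame[mb_y + y + dy][mb_x + x + dx]);
            if (sad < best_sad) {
                best_sad = sad;
                best_dx = dx;
                best_dy = dy;
            }
        }
    }
    /* For the synthetic frames above, the best match is at (-3, -2). */
    printf("motion vector: (%d, %d), SAD = %ld\n", best_dx, best_dy, best_sad);
    return 0;
}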