>> So, I'd like to welcome Professor Avideh Zakhor from UC Berkeley. She's been a professor there since 1998 and actually started working on the project she's going to show us today around the year 2000, on automatic 3D reconstruction of city models. This is something, obviously, we're very interested in, and this talk is going to be broadcast externally, so if you have any questions of a confidential nature, save those until after the talk. Okay, thank you.
>> Zakhor: Thank you very much. Can everybody hear me okay? Great. So I thought what we would do is just start with a little demo. What I'm going to talk about today is a method we've been developing at UC Berkeley over the last six years or so for automatically building three-dimensional models of cities.
And we've inserted some of those models inside Google Earth, which I know most of you are probably familiar with. So I'm asking Steve: why don't you navigate in and out a little bit so people can see the model within the bigger context--zoom out--so they can see the model within the bigger context of Google Earth. If you look, you see that the model is this 3D thing right here, and the rest of it is just the satellite imagery from Google Earth. Now we can zoom in to go to Berkeley. This is downtown Berkeley, this is Shattuck, that's the entrance of the BART station, and this is the Wells Fargo bank building. And just to make sure you get a sense of the 3D, why don't we turn the downtown Berkeley model off--this is what you would've gotten with plain Google Earth--and now turn it on again. As you can see, there's a real sense of three-dimensionality that gets added as it toggles on and off. Now navigate a little bit more around campus and maybe turn on the campus data. This model for downtown Berkeley, as you'll see in a minute, was generated using both ground-based and airborne data. We've also generated the campus data, which uses only airborne data, and that's down here. Okay. We'll talk about that a little bit; why don't you turn it on and off real quickly. Okay. So, I think I'll stop now with the Google Earth demo and get on with the main part of the talk once we exit all of that. At the end of my talk, I'll send you pointers; you can download these models and add them to your own
Google Earth and play with them. Why don't you [indistinct] up there?
>> Yeah.
>> Zakhor: Okay. Why don't you try my PowerPoint now? Okay, great. So what I'm going to talk about for the next half hour or so is a quick overview of how we generated some of these models. This was all done at my lab at Berkeley, which is called the VIP Lab--Video and Image Processing. Is this--how do we go down? [Indistinct]
>> Oh, here we go.
>> Zakhor: Which button do I push?
>> [Indistinct].
>> Zakhor: Okay. So before going into the details of that, I wanted to quickly give an overview of other projects that have been going on in my lab over the last few years.
Being at Google, I can't not talk about video similarity search. This is a thesis that was done by Samson Cheung in 2004, where we recognize nearly identical videos in a very, very large database of videos. There's quite a bit of work on multimedia networking and streaming, especially over wireless; some work on compression of VLSI data; and finally, 3D modeling of urban environments, which is what I'm going to talk about right now. A bit of acknowledgement: this work was sponsored by ARO, the Army Research Office, under a Multidisciplinary University Research Initiative (MURI) from 2000 to 2006. Google started supporting it in 2006, and we're very grateful for that. We've also received what's called a DURIP equipment grant from AFOSR in 2006, and a whole host of postdocs, graduate students, undergrads, and research staff have been working on this. Three of my students are here, and you're welcome to talk to them after the talk if you're interested. So the goal is to generate three-dimensional city models that are useful for virtual walk-throughs, drive-throughs, and fly-throughs, and we want them to be as photorealistic as possible. There's a whole host of applications, and for this crowd I don't really have to motivate the problem too much. Our objectives have been to do it in an automated way: fast, scalable, and photorealistic. Just as background, there's been a lot of other work on city modeling. For example, Seth Teller at MIT developed a system where you would park your apparatus at a particular location in front of a building in the city, scan for 20 or 30 minutes,
move to the next building, scan it, and so on. What differentiates this work from the existing work in the literature is that we actually acquire our data in a very fast, non-stop-and-go fashion, and that enables us to generate models very quickly; you'll see that in a little more detail. The approach we've taken for full-blown modeling is really to come up with two models. One model is generated by an acquisition vehicle that drives on the ground, like a truck; that results in what we call ground-based modeling, and it really models the façades of the buildings. Then we do what's called airborne modeling, where we have helicopters as well as airplanes collecting laser data as well as imagery from the top in order to build the 3D model of the rooftops. And we merge those two things to come up with the 3D city model. So a big chunk of our effort has to do with registering these different sources of data and fusing them together until they're all lined up and we get something nice. I'm going to talk very briefly about the ground-based modeling. This is our acquisition vehicle parked in front of Cory Hall at UC Berkeley. It consists of this board here that has three things: two SICK laser scanners--one of them vertical, the other horizontal--and a Sony camera. The data from all of these is connected through these wires to a PC that's sitting in the back of the truck, powered by the battery of the truck itself. And as I said, we call this drive-by scanning: we collect the data as we're driving
in normal traffic conditions on the roads; we don't stop and go. So here's the basic system. You have the vertical laser scanner that takes a swipe like this, vertically, as the truck is driving, and the idea is to stack these vertical laser scans next to each other in order to build the 3D profile of the façade of the building. But in order to know how far apart to stack these vertical scans, you need to know how much the truck moved; you essentially have to localize your truck. That's not a terribly easy problem, and the way we solve this localization problem is to use a horizontal laser scanner and successively match horizontal laser scans in order to deduce the movement of the truck, so that we can then stack the vertical scans at the appropriate distances from each other. And synchronized with all of these is a camera that acquires texture together with the laser scans as we move along. So this shows the process of pose estimation--in other words, localization of the truck--using horizontal laser scans. This is a visualization of a horizontal laser scan at, let's say, time t0; this is the next time step, t1; and what you want to do is come up with the translation and rotation parameters that make these two scans match each other. To do that, you solve an optimization problem, which I won't go into the details of. But once you do that, you can recover these little vectors delta u, delta v, and delta phi, and concatenate them in order to reconstruct the path.
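As a rough sketch of that per-step alignment and concatenation (not the speaker's actual code; it assumes point correspondences between the two scans are already known, whereas a real system would re-estimate them iteratively, ICP-style), the idea in Python looks something like this:

```python
import numpy as np

def align_scans(prev_pts, curr_pts):
    """Estimate the rigid motion (du, dv, dphi) that maps curr_pts onto prev_pts.

    prev_pts, curr_pts: (N, 2) arrays of matched 2-D scan points.
    Closed-form Procrustes alignment; a real system would wrap this in an
    ICP-style loop that re-estimates the correspondences each iteration.
    """
    mu_p, mu_c = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
    H = (curr_pts - mu_c).T @ (prev_pts - mu_p)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_p - R @ mu_c
    return t[0], t[1], np.arctan2(R[1, 0], R[0, 0])

def integrate_path(increments):
    """Concatenate per-step (du, dv, dphi) increments into a global 2-D path."""
    x = y = phi = 0.0
    path = [(x, y, phi)]
    for du, dv, dphi in increments:
        # rotate the body-frame translation into the world frame, then accumulate
        x += du * np.cos(phi) - dv * np.sin(phi)
        y += du * np.sin(phi) + dv * np.cos(phi)
        phi += dphi
        path.append((x, y, phi))
    return np.array(path)
```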
And this is the reconstructed path obtained by successively matching the horizontal laser scans. If you then use this reconstructed path to superimpose the horizontal scans, you get this blue line, and ideally you'd like this blue line to be as thin as possible, meaning the successive horizontal laser scans match each other well; if they weren't matching, this blue line would be very, very thick. So this method is okay, but it's not great. If you start at location 1 and drive all the way to location 2, this red line is the reconstructed path you get, and what's underneath is the digital surface map that we've obtained aerially--in other words, a height field of the area underneath. You can see these are the streets in a Manhattan structure, and as you can see, the red path goes all over the buildings; it doesn't really follow the streets very well. So something has gone wrong. What's gone wrong is that, especially at turns and in other situations, the method doesn't quite work, and errors also accumulate. To prevent errors from accumulating, you have to do what we call global correction, and that's exactly what we did next. The way you do that is you say, okay, I'm going to start with either an aerial picture or an aerial DSM, a digital surface map, which is just a height field obtained using laser scans. We're going to use this later on for our rooftops anyway, so we might as well also utilize it for localizing the truck. If you use these two things, detect the edges, and then match those edges with the horizontal laser scans, you have a shot at globally correcting it and making it work. So that's exactly what we did: we applied what's called Monte Carlo localization using particle filtering. This is explained in the paper listed here, and we match the horizontal laser scans essentially with airborne edge maps in order to localize the vehicle. I won't go into the details of Monte Carlo localization, but essentially it consists of a motion phase, where we increase uncertainty, and a perception phase, where we decrease uncertainty.
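A minimal particle-filter sketch of those two phases might look like the following; the edge-map scoring function is a placeholder, and the noise levels and resampling scheme are assumptions rather than the actual implementation:

```python
import numpy as np

def mcl_step(particles, odometry, scan, edge_map, motion_noise=(0.05, 0.05, 0.01)):
    """One Monte Carlo localization update over N candidate poses.

    particles: (N, 3) array of poses (x, y, heading).
    odometry:  (du, dv, dphi) from horizontal scan matching.
    scan:      current horizontal laser scan in the vehicle frame.
    edge_map:  aerial edge map used to score candidate poses (placeholder).
    """
    n = len(particles)
    du, dv, dphi = odometry

    # Motion phase: move every particle by the odometry plus noise,
    # which increases the uncertainty of the estimate.
    c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    particles[:, 0] += du * c - dv * s + np.random.normal(0, motion_noise[0], n)
    particles[:, 1] += du * s + dv * c + np.random.normal(0, motion_noise[1], n)
    particles[:, 2] += dphi + np.random.normal(0, motion_noise[2], n)

    # Perception phase: weight each particle by how well the scan, placed at
    # that pose, lines up with the aerial edge map (decreases uncertainty).
    # score_scan_against_edges is a stand-in for the actual matching score.
    weights = np.array([score_scan_against_edges(scan, p, edge_map) for p in particles])
    weights /= weights.sum()

    # Systematic resampling so particles concentrate where the posterior is high.
    positions = (np.arange(n) + np.random.uniform()) / n
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions), n - 1)
    return particles[idx]
```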
This video here shows the result of the particle filtering. What we see here is the DSM, the airborne depth map, and the blob here is the probability density of the location of the vehicle based on this technique. Red in the middle means that spot has the highest probability of being the vehicle's location, and yellow means the probability decreases. The good thing is that the blob moves along nicely and smoothly in the middle of the street even though we changed lanes, and I believe, if this is the video I have in mind, even when we make a right turn the blob stays within the streets. These are the edges; now we switch back to the DSM again. There are points at which it disintegrates a little bit but composes itself back again in the middle. And for those of you who are familiar with particle filtering, we used 10,000 particles to accomplish that.
>> There's no GPS?
>> Zakhor: There's absolutely no GPS. That's a very good point. We dwelled on this problem quite a bit and decided not to go with it, because GPS doesn't work in places where there are high buildings or...
>> Indoors.
>> Zakhor: Or indoors, that's correct. If you had GPS, you wouldn't have to do any of these tricks. Yeah?
>> So if we have a gross mismatch between the sample frames when we try to align them, [indistinct] wouldn't it somehow keep drifting? [Indistinct]
>> Zakhor: I think that's where the global edge matching from the top helps you correct those situations.
>> But even so [indistinct], because if just one frame changes the location, then the rest of the path [indistinct].
>> Zakhor: Yeah, but you're banking on the fact that your laser scanner runs at 75 hertz, and compared to the speed of the truck, that's pretty fast. So you're right, there will be occlusions, there will be some changes, the matching is not perfect. But in almost all the experiments we've done, with tens of minutes or hours of driving, the combination of the airborne data and the horizontal matching has resulted in very good localization. Yeah?
>> [Indistinct] I think the scanning works in just one direction, whereas you move to the left as well as to the right...
>> Zakhor: Yeah. There are some interesting architectures we're considering now with two laser scanners at 45 degrees, because that would also deal with the occlusion problem.
That's correct. So, with that Monte Carlo localization, not only do we calculate the orientation, but also z, the height of the truck, if you're going up the hills in Berkeley for example, and the slope of where you're going. So it gives you all the parameters in one shot. Whereas what we got before was localization that looked like this, what we get now, with this kind of localization, is perfectly aligned with the streets. Just to give you an idea, this was a 78-minute drive from two data acquisitions: 24.3 kilometers, 600,000 scans, 85 million scan points, and 19,000 camera images for this drive within Berkeley. Now that you've localized your truck, you can stack these vertical scans at their appropriate distances from each other and get a point cloud like the one shown here. For those of you who are familiar with Berkeley, this is the entrance to the BART station as you go down. And so the next thing that you do is--question, yeah?
>> So that's basically aligned, and you kind of get rid of cars and people and...
>> Zakhor: I'll talk about that in just one second.
>> Oh.
>> Zakhor: So the next thing you want to do is triangulate, or tessellate, this. In general, if you have a point cloud, triangulating it in the most general way is a very difficult problem. But we have the good fortune that these vertical scans are given to us in order, and that makes this triangulation problem extremely easy.
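A hypothetical illustration of why ordered scans make the meshing easy: neighboring scan columns can simply be stitched into triangle pairs, skipping overly long edges. The column layout and the threshold below are assumptions, not the actual pipeline:

```python
import numpy as np

def mesh_ordered_scans(columns, max_edge=1.0):
    """Triangulate a façade from ordered vertical scan columns.

    columns: list of (M, 3) point arrays, all of the same length M, where
    column i+1 was scanned immediately after column i and points within a
    column are ordered by scan angle.  Each quad between neighbouring columns
    is split into two triangles; quads with an edge longer than max_edge
    (e.g. across a window or a depth discontinuity) are skipped.
    """
    verts = np.vstack(columns)
    rows = len(columns[0])
    tris = []
    for col in range(len(columns) - 1):
        a0, b0 = col * rows, (col + 1) * rows
        for r in range(rows - 1):
            quad_idx = [a0 + r, a0 + r + 1, b0 + r, b0 + r + 1]
            quad = verts[quad_idx]
            edges = quad[[1, 2, 3, 2]] - quad[[0, 0, 1, 1]]
            if np.max(np.linalg.norm(edges, axis=1)) > max_edge:
                continue        # likely a hole (glass) or a depth jump
            a, b, c, d = quad_idx
            tris.append((a, c, b))
            tris.append((b, c, d))
    return verts, np.array(tris)
```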
After you triangulate a point cloud like this, you get something like that. At first look, it's pretty disappointing: these holes are there because the infrared laser goes right through the glass of the windows, there are cars here, there are trees that block the building. But after some processing--we do foreground removal, which I'll talk about in just a second, as it was brought up--we fill in some of these holes and get something like that. I'll talk about the step from here to here in just a second. Then we texture map it, and we get something that looks like this. So, reinforcing what I just said a second ago: if you just triangulate, you get something that looks semi-okay from the front view, but from the side view there's all this garbage kind of floating around. So how do we clean it up? We apply standard image processing techniques. We transform each path segment into what's called a depth image. So this is a tree in front of a building, cars in front of the trees, etc. Then, using histogram analysis over the vertical scans, we try to do what's called foreground-background separation. The idea is that the trees and cars are at a distance from the buildings, and therefore, by looking at this histogram, we can remove them. This works fairly well in downtown areas, where there's a fair distance between the trees and the buildings; for residential regions, as I'll talk about in a few minutes, it doesn't necessarily apply. So you start with this kind of image, and it's separated into foreground and background, as I was just saying. The background ends up being the buildings, and the foreground is the trees, the cars, and various other things. Then you apply a bunch of techniques that I won't go into in detail--to fill the holes in the background, which is the buildings, you apply some interpolation techniques--and at the end you end up with a clean background, which consists of the façades of your buildings.
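One way such a histogram-based split could look, as a sketch only (the per-column histogram and the thresholds are guesses, not the actual parameters):

```python
import numpy as np

def split_foreground_background(depth_image, bin_width=0.5):
    """Mark foreground pixels (trees, cars) in a façade depth image.

    depth_image: (H, W) array of depths along each vertical scan column,
                 with np.nan where the laser returned nothing.
    For every column we histogram the depths; the farthest well-populated
    bin is taken as the building façade for that column, and returns that
    are significantly closer are flagged as foreground.
    """
    fg = np.zeros(depth_image.shape, dtype=bool)
    for col in range(depth_image.shape[1]):
        d = depth_image[:, col]
        valid = ~np.isnan(d)
        if valid.sum() < 2:
            continue
        lo, hi = d[valid].min(), d[valid].max()
        bins = np.arange(lo, hi + bin_width, bin_width)
        if len(bins) < 3:
            continue                      # column is essentially flat
        hist, edges = np.histogram(d[valid], bins=bins)
        strong = np.nonzero(hist >= 0.2 * hist.max())[0]
        facade_depth = edges[strong[-1]]  # left edge of farthest strong bin
        fg[:, col] = valid & (d < facade_depth - bin_width)
    return fg                              # True = foreground to remove
```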
This video here shows the hole-filling process--and as I said, in real life we actually do the foreground removal before the hole filling. And the next one shows foreground removal.
>> [Indistinct]
>> Zakhor: It's pretty much inpainting, and actually, filling holes in the 3D data is a lot easier than texture filling; I'll talk quite a bit about texture filling. And while we're at it, why don't you show the texture mapping one as well. I'll explain how the texture mapping works in a little more detail in a second, but this is essentially University Avenue: McDonald's, the Taiwan restaurant, the futon place, etc. Okay.
>> Can you go into it a little more and tell us how you interpolate over those holes?
>> Zakhor: I didn't make any slides on that.
>> Yeah, but just a few words.
>> Zakhor: Basically, you look at the hole from the left, from the right, from the top and the bottom, and you do interpolation--interpolation while trying to preserve the edges, that kind of thing.
>> So you're not just fitting a plane over the hole or something [indistinct] to the...
>> Zakhor: I think we do a little bit of that. After we decide what's going on around it, we try to fill in something that fits the neighbors but is also as smooth as possible in the middle. If it can be a plane, then that would...
>> So you try to propagate the nearby features...
>> Zakhor: Yeah.
>> ...inside the hole.
>> Zakhor: Right, right. I could tell you this is not something that was terribly difficult. The texture hole filling was a lot more work--I mean, that required a whole master's thesis--whereas the 3D hole filling was, I would say, a month's worth of work. It wasn't...
>> [Indistinct] a surface passing through the lasers, it was getting information from [indistinct]
>> Zakhor: Right.
>> [Indistinct] interpolation.
>> Zakhor: That's true.
>> [Indistinct] and registered.
>> Zakhor: That's right. And the answer to that is that we're making up data anyway, both for texture and for 3D, and at the end, what the best method is is subject to a lot of debate. You're right, we could...
>> [Indistinct] your goal is...
>> Zakhor: Okay.
>> ...if your goal is to make a picture like this and you want to [indistinct] into SketchUp or...
>> Zakhor: Right.
>> [Indistinct]
>> Zakhor: And actually, if your goal is to satisfy the military guys, they don't want you to touch the data in any way. I mean, my goal was to generate something aesthetically beautiful--let's say Hollywood wants that, the game companies want that, architecture firms want it--but the military wants accuracy. They don't want you to mess around with the data and make things up. And that's fine.
>> [Indistinct] Can you make an underlying assumption that these effects are well modeled, or...?
>> Zakhor: The distribution of the...
>> The effects--can they be modeled?
>> Zakhor: You could make that assumption. Most of our holes were actually the windows, because the laser went right through the glass.
And so extending the surrounding area was kind of good enough. So this shows without processing and with, and there are a few other examples that I haven't included here. Next, we move on to texture mapping of these images; I showed the video already ahead of time, but here's the basic idea. We remove foreground objects like trees from in front of the buildings, so in the 3D model there's no 3D model for the tree anymore; that caused a hole in the building, and we filled the hole. But now we want to do texture mapping, and if you really want to do texture mapping, you should make sure the texture corresponding to the tree doesn't get mapped onto the building, because there is no more tree. So the question is, how do we identify the pixels in our images corresponding to the foreground objects that we just removed from our 3D data? Your first thought might be: oh, the laser scanner and the camera are synchronized--and they are--therefore we can back-project into the images and figure out the locations in the images where we removed the foreground objects. And you're quite right, we can do all of that. But the resolution of the laser scan is quite different from the resolution of the image, so while this method works a little bit, it needs to be refined. For example, if you apply this method, you can remove most of the tree, but a little bit of the tree residual still remains in your images, and you want to get rid of that. So the technique that we used--and this was the master's thesis of Siddharth Jain, who is a PhD student at Berkeley--was to use optical flow and region growing techniques in order to more finely delineate the foreground objects like the trees and the cars. And that worked pretty well. You can see that these dotted regions, the foreground objects, are detected and removed from in front of this PowerBar building; this is Ross, and we've detected these white tree points here and removed the tree entirely from in front of this building. So what are the steps for texture mapping? After you remove the pixels corresponding to foreground objects, you have a series of images like that. They overlap quite a bit with each other, and you're making a mosaic of them. And now, because you removed foreground objects, once again you have to invent data
for them, because now we're trying to figure out what's in the background and fill it in. Again, for downtown regions this foreground-background separation works very well because the objects are well separated; for residential areas and others, it might not be as appropriate. Anyway, we next apply texture synthesis techniques, which are essentially inpainting-type techniques, and I'll explain how to go from here to here.
>> [Indistinct] window blocked out, that's not [indistinct]
>> Zakhor: I'll explain that in just one second--it's the example that's on the next slide. So what we do is a copy-and-paste method. For some regions that are easy, you can do inpainting and interpolation, simple things. For other regions like this, you do the copy-and-paste method. So you're missing this part here. You take a window, you go around it, and you compare that with other parts of the image--these bricks are kind of a clue as to what you should ideally put here. So you do a search around this region, do copy and paste, and you gradually build in those missing parts.
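A toy version of that copy-and-paste idea is sketched below; it is deliberately brute force and slow, and the patch size, stride, and cost function are illustrative assumptions rather than the system's actual exemplar search:

```python
import numpy as np

def fill_holes_copy_paste(image, hole_mask, patch=15, stride=4):
    """Tiny exemplar-based ('copy and paste') texture hole filler.

    image:     (H, W, 3) float array with the foreground already removed.
    hole_mask: (H, W) bool array, True where texture must be invented.
    Grows inward from the hole boundary: for each boundary pixel, the known
    pixels of its surrounding patch are compared against candidate patches
    from the intact image, and the centre of the best match is copied in.
    The real pipeline searches coarse-to-fine over an image pyramid instead.
    """
    h, w = hole_mask.shape
    half = patch // 2
    out, mask = image.copy(), hole_mask.copy()

    def boundary(m):
        known = ~m
        nb = np.zeros_like(m)
        nb[1:, :] |= known[:-1, :]; nb[:-1, :] |= known[1:, :]
        nb[:, 1:] |= known[:, :-1]; nb[:, :-1] |= known[:, 1:]
        return m & nb                       # hole pixels touching known ones

    while mask.any():
        bd = boundary(mask)
        if not bd.any():
            break                            # nothing known to grow from
        for y, x in zip(*np.nonzero(bd)):
            if y < half or x < half or y >= h - half or x >= w - half:
                mask[y, x] = False           # too close to the border to fill
                continue
            win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
            known = ~mask[win]
            best, best_cost = None, np.inf
            for cy in range(half, h - half, stride):
                for cx in range(half, w - half, stride):
                    cwin = (slice(cy - half, cy + half + 1),
                            slice(cx - half, cx + half + 1))
                    if mask[cwin].any():
                        continue             # candidate overlaps the hole
                    cost = np.mean((out[win] - out[cwin])[known] ** 2)
                    if cost < best_cost:
                        best, best_cost = cwin, cost
            if best is not None:
                out[y, x] = out[best][half, half]
            mask[y, x] = False               # guarantees progress each pass
    return out
```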
>> Is that supervised?
>> Zakhor: No, it's not supervised. There's copy and paste and there's interpolation, and depending on how big the hole is, we choose between one or the other.
>> So it chooses automatically?
>> Zakhor: That's automatic, yeah.
>> [Indistinct]
>> Zakhor: This would be kind of a good example.
>> [Indistinct]
>> Zakhor: Yeah. Actually, a whole lot of our images could not be used because the camera was pointing right at the sun; those images got thrown away right away. I would say there's probably 30% overlap between successive images. The intensity camera we were running at 5 hertz, and the speed of the truck was about 25 miles an hour--that's the speed limit in Berkeley, and I doubt my students exceed the speed limit; unlike their advisor, they actually stick to it. By the way, that 25 miles an hour results in the vertical scans being approximately five centimeters apart if you're, I don't know, 10 meters away from the building or so. So the resolution of these façades is very high--you could even argue it's too high. Actually, the reason we had to bring a PC to show some of our models, and not a laptop, is that the model is so rich. And that's the reason we slowed down Google Earth so much when we put our 3D models in it: it's a lot of data, and you have to do a lot of simplification, which I'll talk about in just a second. So, this shows the copy-and-paste process. To speed things up, we use a pyramid to do the search for these regions: first do the search over smaller versions of the image, then a little bigger, and then the final image. And this is before and after. For this picture, the holes were large enough that we had to apply the copy-and-paste method. And this is another example.
By the way, I already showed the texture mapping video, and in the interest of not crashing PowerPoint again, we'll just skip this. Okay. So now--yeah, question?
>> Did you have cases where copy and paste just loses things [indistinct]
>> Zakhor: [Indistinct]
>> Just to make something [indistinct] on Berkeley [indistinct]
>> Zakhor: Not that I know of explicitly right now. But we've tried this whole thing on a 4x4 block of downtown Berkeley, so it's possible that that could happen. The copy-and-paste method is quite compute-intensive, actually, and if you skip that part and just interpolate, things would go a lot faster. But of course, you wouldn't be able to, for example, recover that arch in that building. Okay.
So, the rendering of these things--actually, the very first time we generated these models, we were about four days away from our government review in Washington, and I kept telling my students, "Okay, where is it? Put it on the laptop, I want to see it." And they said, "You just don't understand. It doesn't fit. There's no browser that can let us see this model." It's kind of like inventing something you can't even observe, like a phantom of your imagination. Anyway, the ground-based models--the per-path segments have roughly 270,000 triangles or 20 megabytes of texture, so many million triangles for the four downtown blocks and about 400 megabytes of texture. These are all difficult to render, so you have to build what are called levels of detail and generate scene graphs to do that. And sure enough, we applied techniques like QSlim to go from something that was high resolution, like this, to something that was lower resolution; here the geometry is 10% of the original and the texture is 25% of the original. One of the areas that I would like to work on in the future is to push QSlim further and further in order to decrease this number to maybe even 1% while preserving the approximate shape of the buildings. This is something Ming, who's sitting here, worked on a little bit as part of his class project at Berkeley last semester. So you can build these scene graphs with different levels of detail by making cuts along the segments shown here.
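For illustration, a scene-graph node that switches between such pre-simplified meshes by viewer distance could look like this; the distance thresholds and file names are made up, not taken from the actual browser:

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LODNode:
    """One façade segment with several pre-simplified versions of its mesh."""
    centre: Tuple[float, float, float]   # rough centre of the segment
    meshes: List[str]                    # mesh files, finest first
    switch_dist: List[float]             # max viewing distance for each level

    def pick(self, eye):
        d = math.dist(self.centre, eye)
        for mesh, limit in zip(self.meshes, self.switch_dist):
            if d <= limit:
                return mesh
        return self.meshes[-1]           # coarsest version beyond the last limit

# Hypothetical usage with three levels produced by QSlim-style simplification.
node = LODNode(centre=(120.0, 40.0, 8.0),
               meshes=["seg12_full.obj", "seg12_10pct.obj", "seg12_1pct.obj"],
               switch_dist=[50.0, 300.0])
print(node.pick(eye=(100.0, 35.0, 1.7)))  # close viewer -> seg12_full.obj
```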
>> [Indistinct] like the top of that one?
>> Zakhor: This part, you mean?
>> No, the outside of the model [indistinct]
>> Zakhor: Here, you mean?
>> No, on both. So the [indistinct] included is actually texture [indistinct]
>> Zakhor: Oh, you mean behind this thing?
>> Yes.
>> Zakhor: Nothing. I didn't put that in. And actually, another good point that I didn't mention is that this top part is blank because our camera wasn't pointing high enough. I could get into all the details of why the camera wasn't pointing high enough, but basically the next revision we'll do to the system is going to have multiple cameras. If the camera had been pointing too high, the sun would have blinded it 90% of the time and you'd have useless pictures. So you have to come up with a scheme where you get useful pictures while at the same time covering the tops of the buildings.
>> [Indistinct]
>> Zakhor: Correct. You'll see that in just a second. And unfortunately the aerial data has lower resolution, so when you [indistinct] to the higher-resolution ground-based data, there are visible lines that show the difference.
>> So I would think the calibration [indistinct]
>> Zakhor: I'll show it in just one second. The resolution mismatch is horrendous--to put it mildly, I think. Okay. So, after all is said and done, you have this interactive rendering with a web-based VRML browser, which Steve will show in just a second. This is the 12-block set of façade models of downtown Berkeley: this is one façade, this is the street behind it, et cetera, et cetera. When we put it all together it looks something like this, which looks a little bit like a ghost town, but if you put the rooftops on, it will look better, so let's move on to that. Just to give you an idea, the acquisition time for this was 25 minutes and the processing time was four hours and 45 minutes on some [indistinct] machine, and it's fully automated; we didn't hand-tune any part of it. By the way, this number does not include the copy-and-paste part--here we just did interpolation throughout, so that's an important point to emphasize. So, for airborne modeling, we hired a company from Southern California to fly their airplanes over Berkeley to collect laser data. They have the laser equipment right on the plane, and they set everything up.
Then we hired a helicopter pilot, together with one of the students, to take pictures at a separate time, at different locations, et cetera. These pictures are oblique airborne images that will hopefully paint the tops of the buildings and the upper parts of the façades that the ground-based camera did not capture.
>> Is there [indistinct]
>> Zakhor: For this one?
>> Yes.
>> Zakhor: Yes, and Ming, who is sitting here, is refining the scheme that I'm going to talk about in just one minute. That is a hard problem. So, this is the flow diagram of our airborne processing: you start with the airborne scans, you do DSM generation, DSM post-processing, triangulation, and then image registration, texture selection, and texturing. I'll go over each of these in just a second. The airborne laser scans look something like this. We have to re-grid them onto a square grid in order to get them into a format you can work with--figuring out the optimum grid parameter is not that difficult--and after you re-grid, some grid points have multiple data points and some of them have none, so you have to do some interpolation again to get a height field. After you do that, you get something like this, which is called the DSM, the digital surface model.
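A small sketch of that re-gridding step, assuming SciPy is available (the grid spacing, the choice of keeping the highest return per cell, and the nearest-neighbour hole fill are all assumptions):

```python
import numpy as np

def points_to_dsm(points, cell=0.5):
    """Re-grid scattered airborne LiDAR returns into a height field (DSM).

    points: (N, 3) array of x, y, z returns.
    cell:   grid spacing in metres (a guessed value).
    Cells hit by several returns keep the highest one (roof rather than
    wall); empty cells are filled by nearest-neighbour interpolation.
    """
    x0, y0 = points[:, 0].min(), points[:, 1].min()
    ix = ((points[:, 0] - x0) / cell).astype(int)
    iy = ((points[:, 1] - y0) / cell).astype(int)
    dsm = np.full((iy.max() + 1, ix.max() + 1), -np.inf)
    np.maximum.at(dsm, (iy, ix), points[:, 2])     # multiple hits -> max z

    empty = np.isinf(dsm)
    if empty.any():                                # fill holes from neighbours
        from scipy.ndimage import distance_transform_edt
        _, (ny, nx) = distance_transform_edt(empty, return_indices=True)
        dsm[empty] = dsm[ny[empty], nx[empty]]
    return dsm
```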
If you just connect everything up and triangulate it, it looks very noisy, and even after QSlim simplification it still looks very noisy. The reason is that, if you zoom in, you see the rooftops look very bumpy, the edges are very jittery, and so on. So you go through this set of processing steps, which unfortunately I don't have much time to go over, but the highlight of it is this RANSAC polygonization technique that segments the rooftops into planar regions. Planes can be either horizontal or at an angle like this, but we're not fitting quadratic surfaces or anything like that. At the end you end up with this kind of model, and now your DSM after triangulation looks neat and clean like this.
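In the spirit of that step, a greedy RANSAC plane-peeling sketch might look like this; the thresholds and minimum support are invented for illustration, and the polygon extraction itself is omitted:

```python
import numpy as np

def ransac_plane(points, thresh=0.15, iters=500, rng=np.random.default_rng(0)):
    """Fit one plane to a rooftop point set with RANSAC.

    Returns (normal, d, inlier_mask) for the plane n.x + d = 0 supported
    by the most points within `thresh` metres.
    """
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                        # degenerate (collinear) sample
        n /= norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model[0], best_model[1], best_inliers

def segment_roof_planes(points, min_support=200):
    """Greedy polygonization: peel off planes until too few points remain."""
    planes, remaining = [], points.copy()
    while len(remaining) > min_support:
        n, d, inl = ransac_plane(remaining)
        if inl.sum() < min_support:
            break
        planes.append((n, d, remaining[inl]))
        remaining = remaining[~inl]
    return planes
```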
Next, texture mapping from aerial imagery. This is the helicopter: we did a 20-minute ride with a 5-megapixel digital camera and took 17 images, covering both rooftops and façades. So now the question this gentleman asked comes up: how do we determine the pose of those airborne images? Before I even get to the approach we took: you can just use Lowe's algorithm, have a human being click on seven points on the image and seven points on the model--those are the correspondences--hit the go button, and in 30 seconds or a minute or so you get the match. That is very easy, but the problem is it doesn't scale if you have hundreds or thousands of images. So for the models that I've shown you, we actually did that human processing part. However, after we computed the answer using the human-correspondence approach, we perturbed it by five degrees in this direction, ten degrees there, and so many meters in the z direction, and asked ourselves: if we knew the pose only approximately--if we had an INS on the helicopter or something like that--could we arrive at the same poses that the human operator got? So we perturb the true answer and try to arrive at it again. This is the technique we used, and as you'll see in just a second, this method, assuming certain amounts of uncertainty in your GPS and INS, results in a processing step of 24 hours per image, which is very large. That's exactly why Ming, who's sitting here, is working on the next-generation scheme, which I'll talk about very briefly, in order to speed this process up. So what's the step that takes 24 hours? Well, you compute 2D lines in your images and try to match them with 3D lines in your model. You form the cost function here, and essentially you do an exhaustive search over your six-dimensional pose space at very fine increments in order to find the particular pose that results in the best possible match between these 2D lines and 3D lines. Here's an example where the match is excellent, and here's an example where it's not, where the green lines don't line up with the edges of the building. So this technique is okay, but the search space is extremely non-smooth: if the step size in your exhaustive search is a little too large, you can easily miss the peak in this optimization problem. So steepest descent is essentially inapplicable, and that's why you have to do an exhaustive search.
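A crude sketch of that exhaustive search is shown below; the line-matching cost here is a simple endpoint-distance stand-in, not the paper's actual cost function, and the pose grid is assumed to come from whatever GPS/INS prior is available:

```python
import itertools
import numpy as np

def project(K, R, t, pts3d):
    """Pinhole projection of (N, 3) world points into pixel coordinates."""
    cam = (R @ pts3d.T).T + t
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def pose_cost(K, R, t, model_lines, image_lines):
    """Sum over 3-D model lines of the distance from their projected
    endpoints to the nearest detected 2-D image line (a crude stand-in
    for the actual line-matching cost)."""
    cost = 0.0
    for p, q in model_lines:                     # each model line = two 3-D endpoints
        pp, qq = project(K, R, t, np.vstack([p, q]))
        cost += min(np.linalg.norm(pp - a) + np.linalg.norm(qq - b)
                    for a, b in image_lines)
    return cost

def exhaustive_pose_search(K, model_lines, image_lines,
                           yaws, pitches, rolls, translations):
    """Brute-force search over a discretised pose grid; this is the part
    that is slow unless GPS/INS priors shrink the candidate ranges."""
    def rot(yaw, pitch, roll):
        cy, sy = np.cos(yaw), np.sin(yaw)
        cp, sp = np.cos(pitch), np.sin(pitch)
        cr, sr = np.cos(roll), np.sin(roll)
        return (np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]]) @
                np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]]) @
                np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]]))
    best = (np.inf, None)
    for yaw, pitch, roll, t in itertools.product(yaws, pitches, rolls, translations):
        c = pose_cost(K, rot(yaw, pitch, roll), np.asarray(t, float),
                      model_lines, image_lines)
        if c < best[0]:
            best = (c, (yaw, pitch, roll, t))
    return best
```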
This slide shows what I said earlier. If you have absolutely no idea where your helicopter is--in this column here, 360 degrees of yaw, 180 degrees of [indistinct], and a thousand meters of uncertainty in your GPS--this is the number of poses you have to go through, and it would take you 3.4 million years. If you have low-cost GPS, it takes you about 25 hours, which is the number I threw around a minute ago. And if you have a better GPS with a somewhat more expensive INS, et cetera, you can get it under 40 seconds. What Ming is working on--and I'll talk about it in just a second--is a method using vanishing points and other techniques in order to achieve something like this 40 seconds, or a minute or two, but using low-cost equipment. I'll talk a little bit about that. Sure.
>> [Indistinct] discussed the manual [indistinct] these models, and, you know, exactly [indistinct] after your model [indistinct] a second or something. What's [indistinct]
>> Zakhor: Because he was doing an image-to-image match, not image to 3D.
>> No, it's a 3D model passing through the internet.
>> Zakhor: You mean he had a 3D model, he had images of it, and he was matching the...
>> We have an image of a person using the [indistinct]
>> Zakhor: Uh-huh.
>> [Indistinct] matches that [indistinct] frame model [indistinct]
>> Zakhor: What...
>> [Indistinct] models and [indistinct]
>> Zakhor: I'd have to look at the details of that. But, yeah, actually, I'd be very interested...
>> [Indistinct]
>> Zakhor: So now I get into fusing texture from multiple images. Each of the triangles in our airborne model has been imaged by more than one picture, and so the question is: which texture in this series of images do we use to paint it? We use a series of heuristics, listed here. You want to pick an image that has the highest resolution for painting that triangle; you want to use visibility considerations and normal-vector considerations--the triangle will look a lot better if the picture was taken head-on rather than, for example, from the side; and also neighborhood consistency: ideally you'd like triangles that are next to each other to receive their texture from the same images, so that you don't have so much jitter across the triangles. This picture here shows, in the downtown 3D airborne model, that these are the regions that were painted by this red image, and these are the regions, using those criteria, painted by the blue and gray images. Finally, because there's a lot of overlap between these images and you only use a small piece of each one, you build what we call an atlas image, which physically doesn't have any meaning but has texture composited from all the different images that are actually being used. This is what your graphics card would use at the end to render the airborne model: instead of having to store the 225-megabyte textures, you're going to have 272 megabytes.
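A sketch of how such per-triangle image selection could be scored (the camera record fields and the consistency bonus are assumptions, not the actual heuristics' weights):

```python
import numpy as np

def pick_texture_image(triangle_normal, triangle_centre, cameras,
                       neighbour_choice=None, consistency_bonus=0.2):
    """Choose which aerial image should texture one triangle.

    cameras: list of dicts with (assumed) keys 'centre', 'pixels_per_metre',
             and 'visible' (a callable occlusion test for a 3-D point).
    The score mixes the heuristics from the talk: resolution on the triangle,
    visibility, how head-on the view is, and a small bonus for matching the
    image used by neighbouring triangles.
    """
    best_idx, best_score = None, -np.inf
    for i, cam in enumerate(cameras):
        if not cam['visible'](triangle_centre):
            continue                                   # occluded in this image
        to_cam = cam['centre'] - triangle_centre
        to_cam /= np.linalg.norm(to_cam)
        facing = float(np.dot(triangle_normal, to_cam))  # 1.0 = head-on view
        if facing <= 0:
            continue                                   # seen from behind
        score = cam['pixels_per_metre'] * facing
        if neighbour_choice == i:
            score *= (1.0 + consistency_bonus)         # neighborhood consistency
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```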
So, here's an airborne-only model with all the different things going on, and an airborne-only model from a different view. And finally, model fusion--I'll go over this very fast just so that we can look at some more videos. If you look at the airborne-only models down at the bottom, these models don't look very good at street level, because the resolution of the images is low and because it's basically a rectilinear model: we had the rooftop segmentation and we just extruded it down, so the detail of the 3D model at street level is not very good. So you divide your model into segments, remove the part of the airborne model that's touching the street, which is shown in this gray area here, and insert the ground-based model, which is shown here. This part of the ground-based model doesn't have texture, but it is still ground-based; you can see that. And as you can see, there are some holes between the ground-based model and the rooftop model, so we apply what's called a blend mesh to connect the two together. Finally, we use the airborne picture to paint the upper part of the building, which didn't have texture. And you can see there's quite a bit of resolution difference and color adjustment that needs to be done. Yeah?
>> Potentially, no [indistinct] because it's very [indistinct] while you're copying pieces [indistinct]
>> Zakhor: Exactly, that's very true--especially because buildings are highly repetitive. That's a very good point. So, this is it. And this shows other examples, both positive and negative. The airborne image is very nicely aligned with the ground-based image, showing that the truck localization works very well and the registration problem has been solved nicely, but the downside is this resolution mismatch. That can be solved by either collecting better data at the ground level or, just as this gentleman said, extrapolating this data--making up this data. Same thing here, same thing here, and the same thing here. That's downtown Berkeley, and another fly-through of the model. So at this point, before I get into future work, I'm going to have Steve show us a VRML demo of this downtown model. So, this is the entire fused model for the four-by-four blocks of Berkeley. Is it going to drive through it all? Did you hit the drive button?
>> Yeah, no.
>> Zakhor: Why didn't you do that?
>> You have to click it?
>> Zakhor: Okay, go ahead and do that. So, this is the walk-through--and this was the model that we inserted inside Google Earth at the very beginning. This model has--why didn't you hit the button?
>> Okay, sorry.
>> Zakhor: Okay. So, this model has three levels of detail, whereas what we inserted inside Google Earth had just the medium level. One of the reasons was that if we tried to put the high-resolution version of the model inside Google Earth, it's too much texture; it would just not be smooth on any of the machines we have for interactive rendering. Okay, thank you. So, a few words on future work. I actually have 50 more slides on future work, but I won't talk that long--it is very hard to compete with lunch--so I'll try to wrap up everything in eight minutes. Scaling to very large regions is a big goal of ours, and I'll talk about some of our thoughts on that. Dealing with trees and vegetation is still a foreground-background issue; extending to indoor modeling; integrating the 3D models with sensor networks, which is an area the government is funding us in; streaming these 3D textured models to, maybe, handheld devices; and also model update: if you've already collected a bunch of data, and now some of the buildings are gone and some new ones have appeared, how do you fuse these two models without having to start everything from scratch?
So, on scaling to large regions: so far we've used laser data and camera imagery, both airborne and ground-based, and I can't stop wondering whether we could have simplified our lives if we just didn't have the ground-based laser scans, for example--what would our models look like if we just had airborne laser and camera? That would scale very nicely, because when you go up in the air and take one picture, it covers a huge area; or you can even take a video camera up in the helicopter and cover a huge area. So if your goal is just to have fly-throughs and you don't really want to land on the ground and see what's going on at street level, then airborne-only models can look very nice. However, for airborne-only models, as was pointed out during this talk, the pose estimation for the helicopter images is still a big problem, and that's the 24-hour thing I was talking about: your choices are 3.4 million years, 24 hours, or 40 seconds, depending on what kind of equipment you have on the helicopter. So what Ming is working on is developing low-complexity pose estimation techniques for airborne images. The line of thinking we have is to use an electronic compass, hook it up to a camera, take pictures, apply vanishing-point kinds of techniques to recover some of the camera parameters, and then do feature matching from the 3D model to the images. This is all work in progress, so there are no results; it merely outlines the kinds of things we think we're going to be doing in the next five to six months. We'll apply a RANSAC-type algorithm to do this correspondence in order to derive the camera parameters, and we're hoping that this approach of using vanishing points will be less time-consuming than the exhaustive search that took 24 hours per image.
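For reference, the standard textbook relation between orthogonal vanishing points and camera rotation, which is the kind of constraint such an approach could exploit; this is a generic sketch, not the system under development:

```python
import numpy as np

def rotation_from_vanishing_points(K, v_x, v_y):
    """Recover camera rotation from two orthogonal vanishing points.

    v_x, v_y: pixel coordinates (u, v) of the vanishing points of two
    orthogonal horizontal building directions; K is the 3x3 intrinsic matrix.
    Each column of R is the unit vector along K^-1 * [u, v, 1]^T, and the
    third column is their cross product (sign ambiguities ignored here).
    """
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.array([v_x[0], v_x[1], 1.0])
    r2 = Kinv @ np.array([v_y[0], v_y[1], 1.0])
    r1 /= np.linalg.norm(r1)
    r2 -= (r1 @ r2) * r1          # enforce orthogonality numerically
    r2 /= np.linalg.norm(r2)
    r3 = np.cross(r1, r2)
    return np.column_stack([r1, r2, r3])
```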
The other problem is trees. What do you do with trees in residential areas? What do you do with trees in downtown-type areas? If you don't remove the trees, sometimes your model can become very cluttered; on the other hand, if you remove them, you have to make up the texture behind the trees, and that's not always easy. John Secord was a student who worked on this and got his master's thesis in 2006. He came up with a tree detection algorithm using registered aerial LiDAR and imagery, based on segmentation and classification. We used region growing for segmentation, together with training and classification using support vector machines. These are the features we use: height variation, normal vectors, hue, saturation, value, et cetera, and these are the weights of the different features. As it turns out, if you apply a normalized-cut technique, you find that height variation is just about the most important feature in that segmentation process. Then you do the support vector machine classification.
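A hypothetical sketch of that segment-level SVM classification using scikit-learn; the exact feature definitions and SVM settings are guesses, not the thesis's:

```python
import numpy as np
from sklearn.svm import SVC

def segment_features(points, colors):
    """Per-segment features in the spirit of those listed in the talk
    (height variation, normal direction, hue/saturation/value)."""
    heights = points[:, 2]
    # plane-fit residual as a cheap proxy for normal-vector roughness
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    coeff, *_ = np.linalg.lstsq(A, heights, rcond=None)
    residual = np.std(heights - A @ coeff)
    hsv_mean = colors.mean(axis=0)          # assume colors are already in HSV
    return np.array([heights.std(), residual, *hsv_mean])

def train_tree_classifier(train_segments, labels):
    """train_segments: list of (points, colors) per region-grown segment;
    labels: 1 = tree, 0 = non-tree."""
    X = np.array([segment_features(p, c) for p, c in train_segments])
    clf = SVC(kernel='rbf', C=10.0, gamma='scale')
    clf.fit(X, np.asarray(labels))
    return clf
```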
It starts with the LiDAR and airborne data texture-mapped; that was for a residential region and that's for campus. This is a result of the segmentation, and this is the final result of tree detection. The green shows trees that our algorithm actually detected as trees. The dark blue ones right here were not trees but were incorrectly classified as trees, and the purple ones were incorrectly classified as non-tree, but they really were trees. So you can see the algorithm mostly works. This work is going to be published in IEEE Geoscience Letters in a few months. This is the same thing for the campus data: the green is the trees, and we've pretty much detected most of them. Ideally you'd like to have very little purple and blue and detect all the greens properly.
>> [Indistinct]
>> Zakhor: Pardon?
>> [Indistinct]
>> Zakhor: How could they be detected as non-tree? Is that what you're asking--why it does that?
>> So you're saying those things are trees?
>> Zakhor: The green is the trees that we detected as trees.
>> Right. And purple are trees that weren't correctly identified as trees?
>> [Indistinct]
>> So you've got non-trees which are actually trees, right?
>> Zakhor: That's correct, and you're right--why is that? Maybe these two need to be flipped.
>> Right, because I believe it would [indistinct]
>> Zakhor: Yeah, it depends on the meaning of the label I put down here. Yeah.
>> Yeah. So you can actually detect [indistinct] directly by ratio [indistinct] if you put into it, you know, [indistinct] directly be [indistinct] you have to [indistinct] what to do with the [indistinct] laser scans.
>> Zakhor: Right.
>> And so if you have one important picture provided by your optical...
>> Zakhor: Right, that could easily--very interesting, thank you.
>> [Indistinct]
>> Zakhor: Let me go back to this one. Yeah, actually, that's a good point; it was never brought up, even though this talk has been given before. I'll look into that and see why the purple is on top of the building. This is the false positive rate versus the true positive rate. The purple line is the segmentation-plus-classification technique, and the green is roughly the same approach if you skip the segmentation and do it point-wise. This is for residential data and this is for campus data, and you can see that segmentation actually does help you accomplish this task. Portable modeling, possibly indoors: there's nothing to stop you from applying these things indoors. Because we didn't use GPS outdoors, we're now uniquely qualified, I guess, to apply the same kinds of techniques indoors. I don't think we want to carry both a horizontal and a vertical laser scanner--that's too heavy. I think Steve would know the numbers, but it's about 10 pounds each, isn't it? Yeah. So 20 pounds, I think, is excessive on anybody's back. But you still need to recover the six degrees of freedom, and together with some INS and camera imagery, you can recover those.
There are a lot of nice things about indoors. You're walking a lot slower than the truck was--you're not doing 25 miles an hour. As you enter a room you can click a button, and as you leave it you can click a button, so you can close loops; that's one nice trick you can play. You can have additional sensors like gravity and altitude, et cetera, and an update is easier indoors than outdoors. So you can use the same general kind of equipment--maybe just one vertical laser scanner, some cameras, and some low-cost INS and IMU units. A quick cost calculation puts it at something like 7 to 14K to build the equipment; the power requirement is 31.2 watts, and if you have a pack of 12-volt batteries that you can strap around yourself, you can make it a portable unit. The weight of the whole thing, total, would be about 24 pounds with just one laser scanner. Again, this is something that's in progress; we don't have any results to show. I'm going to skip the sensor placement stuff, but you can place different sensors around the city, then visualize these things nicely and catch the bad guys as needed. I'm also going to mostly skip dynamic scene modeling. You can extend some of these techniques so that if there's an object moving--with hands and arms--and you have scanning equipment, you can generate the three-dimensional model as a function of time, 30 times a second for example. We have actually built this system and published about it. There's a rotating mirror, there's a laser hitting it, and this causes
a laser line on your object, and as the mirror rotates, this horizontal line goes up and down. Also, with a halogen lamp and an IR filter, you project vertical lines, and we can compute the depth along these lines and build up the depth as a function of time.
>> [Indistinct]
>> Zakhor: Pardon?
>> "Roast"?
>> Zakhor: It's a screen--a screen with a bunch of vertical strips on it.
>> Watching [indistinct] or...
>> Zakhor: Pardon?
>> I've never seen the term "roast" before.
>> Zakhor: It's probably a foreign student using that term, but it's a screen with a bunch of strips on it to project the vertical lines--actually, it's right here. You paste on a bunch of stripes, and that's what projects the vertical lines onto the object. Okay. And here we have cameras with IR filters and also a visible-light camera; the IR-filtered one catches the signal coming from the IR halogen lamp. There's the control PC, the sync generator, and all the rest of it, and that's the polygonal mirror with the horizontal laser. In the end you have a sparse depth representation of the object as it's moving, and you can interpolate that to get dense depth. Then--let's see if this video works, probably not. No, it does. So then you can reconstruct the 3D depth of this person as you go around the person to the right and to the left. We only have one of these stations; if you duplicate multiple stations around it, you can capture the back side as well, merge those, and get a true sense of the three-dimensional depth. Real quickly, other areas are streaming and also model update; these are areas where we haven't done any work yet, but they will be quite interesting, and we hope that in the coming year we'll be able to get to them. I'm going to stop right here, and these are the various places where you can download papers and demos and models and various other things. And shall we end? Do we give all the demos, or is there some more...
>> The [indistinct]
>> Zakhor: Okay.
>> Show them real quick.
>> Zakhor: You mean the VRML model, our new model.
>> Yeah.
>> Zakhor: Okay. So, I'm just going to stop
and show you the new model, which is airborne only.
>> Can you [indistinct] these things, isn't that--it's just [indistinct] to street level.
>> Zakhor: Fly-through or drive-through?
>> Like drive-through.
>> Zakhor: Oh, it would look terrible.
>> You need the direction?
>> Zakhor: We can do it, just so you can see, but no, it won't look good, because it's all airborne only.
>> The other thing I was thinking--why are you trying to remove trees from [indistinct]? Because if you want to have a truly immersive experience, I think you really need them. I think [indistinct] hopefully stay locally.
>> Zakhor: The problem is that at the ground level, when you scan the trees, you don't do a good job of making a model out of them, because the resolution of the laser isn't very good. But you're quite correct for residential areas. I personally am believing more and more that we shouldn't be removing trees, because it's so difficult to replace them--I mean, the idea was: let's remove the trees and then replace them with artificial trees, so that we can have a better model of each tree. That was the thinking. But as time goes on, I'm beginning to believe that process is more and more difficult and not such a good idea.
>> And probably less immersive as well.
>> Zakhor: That's true, that's true. I think that's just about--yeah, question?
>> Yeah, I have a question regarding the way you're building your geometry model [indistinct] at the time.
>> Zakhor: Right.
>> Basically you have to work on [indistinct]
>> Zakhor: Right.
>> And then what else [indistinct]
>> Zakhor: Right.
>> But isn't that going to basically [indistinct] the model, in the suburbs for instance [indistinct] or...
>> Zakhor: Right.
>> ...small pillars, and you barely have two samples on those features.
>> Zakhor: Because we removed the trees.
>> Yeah. And so you don't have very good geometry for those features [indistinct], and that would add unnecessary clutter, in my opinion.
>> Zakhor: That's true.
>> It would just make sense to do some kind of processing on the points before you actually create a model, and one way to do that would be [indistinct] some kind of model [indistinct]. Is that something you've considered?
>> Zakhor: Yeah, it's possible. There are people who do procedural model fitting--for example, Ulrich Neumann at USC does something along those lines--though his methods are semi-interactive, so there's a user who comes in, clicks on things, and says, you know, put a parabolic surface here, put a plane here, put another plane there, et cetera. But I think your point is quite well taken. Philosophically, you're also bringing up a very good point: why remove things and then make up data? Why not try to fix what you have and do the best with what you've got, without extracting things? Because the model then also looks a little bit less realistic.
>> I was just saying removing some features from the geometry [indistinct]
>> Zakhor: They said that there's another talk [indistinct]; let's discuss this offline. Thank you.