
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, we're all ready to go. This is Discrete Stochastic Processes, as you all know.

It is-- I want to get it where I can read it, too. We're going to try to deal with a bunch of different topics today, some of which are a little bit philosophical, asking: what is probability, really? You are supposed to have taken a course in probability, but unfortunately courses in probability are almost always courses in how to solve well-posed problems. The big problem in probability theory, and particularly in stochastic processes, is not so much how you solve well-posed problems. Anybody can do that. Or anybody who has a little bit of background can do it. The hard problem is finding the right models for a real-world problem. I will call these real-world problems. I hate people who call things real-world, because they sound like they dislike theory or something.

It's the only word I can think of, though, because "physical world" is no longer adequate: so much of the application of probability is to all sorts of different things that don't have much to do with the physical world, but have to do with things like the business world, or the economic world, or the biological world. So real-world is just a code word we'll use to distinguish theory from anything real that you might have to deal with.

Theory is very nice because in theory, everything is specified. There's a right answer to everything. There is usually a wrong answer, too, but there's at least one right answer. And most of us like those things because they're very specific. People go into engineering and science very often because they don't like the uncertainty of a lot of other fields. The problem is, as soon as you go into probability theory, you're moving away from that safe region where everything is specified, and you're moving into a region where things, in fact, are not very well-specified, and you have to be careful about it.

OK, so first we're going to talk about probability in the real world and probability as a branch of mathematics. Then we're going to say what discrete stochastic processes are. Then we're going to talk just a very, very little bit about the processes we're going to study. If you want to see more of that, you have two chapters of the notes. You can look at them. You can look at the table of contents. And more than that, if you look at my website, you will see the notes for all the other chapters, if you want to read ahead or if you want to really find out what kinds of things we're going to talk about and what kinds of things we're not going to talk about. Then we're going to talk about when, where, and how is this useful. The short answer to that is: it's useful everywhere. But we'll have to see why that is. Then we're going to talk about the axioms of probability theory.

You cannot take any elementary course in probability, or even in statistics, without seeing the axioms of probability. And in almost all of those cases, and in almost all of the graduate courses I've seen, you see them, they disappear, and suddenly you're solving problems in whatever way you can, and the axioms have nothing to do with it anymore. So we're going to see that, in fact, the axioms do have something to do with this.

Those of you who want to be real engineers and not mathematicians will find this a little uncomfortable at times. We are going to be proving things. And you will have to get used to that. And I'll try to convince you of why it's important to be able to prove things. Then we're going on to a review of probability, independent events, experiments, and random variables. So that's what we'll do today. Incidentally, this course started-- it must've been 25 years ago or so. I started it because we had a huge number of students at MIT who had been interested in communication and control, and who were suddenly starting to get interested in networks. And there were all sorts of queuing problems that they had to deal with every day.

And they started to read about queuing theory. And it was the most disjointed, crazy theory in the world, where there were 1,000 different kinds of queues and each one of them had to be treated in its own way. And we realized that stochastic processes was the right way to tie all of that together, so we started this course. And we made it mostly discrete so it would deal primarily with network-type applications. As things have grown, it now deals with a whole lot more applications. And we'll see how that works later on.

OK, how did probability get started in the real world? Well, there were games of chance that everybody was interested in. People really like to gamble. I don't know why. I don't like to gamble that much. I would rather be certain about things. But most people love to gamble. And most people have an intuitive sense of what probability is about. I mean, eight-year-old kids, when they start to learn to play games of chance-- and there are all sorts of board games that involve chance-- these kids, if they're bright-- and I'm sure you people fall into that category-- immediately start to figure out what the odds are. I mean, how many of you have never thought about what the odds are in some gambling game? OK, that makes my point. So all of you understand this at an intuitive level. But what makes games of chance easier to deal with than all the other issues where we have uncertainty in life?

Well, games of chance are inherently repeatable. You play a game of chance and you play many, many hands, or many, many throws, or many, many trials. And after many, many trials of what's essentially the same experiment, you start to get a sense of what relative frequencies are. You start to get a sense of what the odds are because of doing this repeatedly. So games of chance are easy to use probability on because they are repeatable. You have essentially the same thing going on each time, but each time there's a different answer. You flip a coin and sometimes it comes up heads and sometimes it comes up tails. So in fact, we have to figure out how to deal with the fact that there is uncertainty there. I'll talk about that in just another minute. But anyway, most of life's decisions involve uncertainty.
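The repeatability of games of chance is easy to see in a quick simulation (my own sketch, not part of the lecture): flip a fair coin many times and watch the relative frequency of heads settle near one half as the number of trials grows.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(heads) for a fair coin as the relative frequency of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

# The estimate wanders for small trial counts and stabilizes for large ones.
for n in (10, 1000, 100000):
    print(n, relative_frequency(n))
```

With a handful of flips the fraction of heads can be far from 1/2; with a hundred thousand flips it is reliably close, which is exactly the intuition repeated play gives a gambler.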

I mean, for all of you, when you go into a PhD program, you have two problems. Am I going to enjoy this? And you don't know whether you're going to enjoy it, because not until you really get into it do you have a sense of whether this set of problems you're dealing with is something that you like to deal with. And the only way you can do that is to make guesses. You come up with likelihoods.

There's some likelihood. There's a risk cost-benefit that you deal with. And in life, risk cost-benefits are always based on some sense of what the likelihood of something is. Now, what is a likelihood? A likelihood is just a probability. It's a synonym for probability. When you get into the mathematics of probability, likelihood has a special meaning to it.

But in the real world, likelihood is just a word you use when you don't want to let people know that you're really talking about probabilities. OK, so that's where we are. But on the last slide, you saw the word "essentially, essentially, essentially." If you read the notes-- and I hope you read the notes, because I spent the last three years doing virtually nothing but trying to make these notes clear--

I would appreciate it if any of you, with whatever background you have, when you read these notes-- if you read them twice and you still don't understand something, tell me you don't understand it. If you know why you don't understand it, I'd appreciate knowing that. But just saying "help" is enough to let me know that I still haven't made something as clear as it should be-- at least as clear as it should be for some of the people who I think should be taking this course.

One of the problems we have at MIT now, and at every graduate school in the world, I think, is that human knowledge has changed and grown so much in the last 50 years. So when you study something general, like probability, there's just an enormous mass of stuff you have to deal with. And because of that, when you try to write notes for a course, you don't know what anybody's background is anymore. I mean, it used to be that when you saw a graduate course at MIT, people would know what limits were. People would know what basic mathematics is all about. They would know what continuity means. They would know some linear algebra. They would know all sorts of different things. Many people still do know those things. Many other people have studied all sorts of other fascinating and very interesting things. They're just as smart, but they have a very different background. So if your background is different, it's not your fault that you don't have the kind of background that makes probability easy. Just yell. Or yell in class. Please ask questions.

The fact that we're videotaping this makes it far more interesting for anybody who's using OpenCourseWare to see some kinds of questions going on, so I very much encourage that. I'm fairly old at this point, and my memory is getting shot. So if you ask a question and I don't remember what it's all about, just be patient with it. I will come back the next time, or I'll send you an email straightening it out.

But I will often get confused doing something, and that's just because of my age. It's what we call "senior moments." It's not that I don't understand the subject. I think I understand it, I just don't remember it anymore. An important point about probability: think about flipping a coin. I'm going to talk about flipping coins a great deal this term. It's an absolutely trivial topic, but it's important, because when you understand deep things about a large subject, the best way to understand them is to understand them in terms of the most trivial examples you can think of. Now, when you flip a coin, the outcome-- heads or tails-- really depends on the initial velocity, the orientation of the person flipping it, or the machine flipping it, the coin surfaces, the ground surface. And after you put all of those things into a careful equation, you will know whether the coin is going to come up heads or tails. I don't think quantum theory has anything to do with something as big as a coin. I might be wrong. I've never looked into it. And frankly, I don't care.

Because the point that I'm trying to make is that flipping a coin, and many of the things that you view as random, when you look at them in a slightly different way, are not random. There's a big field in communication called data compression. And data compression is based on random models for the data which is going to be compressed. Now, what I'm saying here today is by no means random.

Or maybe it is partly random. Maybe it's coming out of a random mind, I don't know. But all of the data we try to compress-- to the people who have created that data, it's not random at all. If you study the data carefully enough-- I mean, code breakers and people like that are extremely good at sorting out what the meaning is in something, which cannot be done by data compression techniques at all. So the point is, when you're doing data compression, you model the data as being random and having certain characteristics. But it really isn't. So the model is no good.

When you get to more important questions-- well, data compression is an important question. When you ask, what's the probability of another catastrophic oil spill in the next year? Or you ask the question, what's the probability that Google stock will double in five years? That's less important, but it's still important to many people. How do you model that? Understanding probability theory, understanding all the mathematics of this, is not going to help you model this. Now, why do I make such a big deal about this? Well, there have been a number of times in the last 10 or 15 years when the whole financial system of the world has almost been destroyed by very, very bright PhDs, many of them coming from electrical engineering, most of whom are really superb at understanding probability theory. And they have used their probability theory to analyze risk and other things in investments. And what has happened? They do very well for a while. Suddenly they do so well that they think they can borrow all sorts of money and risk other people's money as well as their own. In fact, they try to do that right from the beginning. And then suddenly, the whole thing collapses. Because their models are no damn good. There's nothing wrong with their mathematics; it's that their models are no good. So please, especially if you're going to do something important in your lives--

if you're just going to write papers in engineering journals, maybe it's all right. But if you're going to make decisions about things, please spend some time thinking about the probability models that you come up with, because this is vitally important. OK, what's probability? It's a branch of mathematics. Now we're into something that's more familiar, something that's simpler, something we can deal with.

You might be uncomfortable with what probability really means. And all probability books, all stochastic process books, are uncomfortable with this. Feller is the best book in probability that's ever been written. Any question you have, he probably has the answer to it. When you look at what he says about real-world probability, the modeling issues-- he's an extraordinarily bright guy, and he spent some time thinking about this. But you read it and you realize that it's pure nonsense. So please, take my word for it. Don't assume that real-world probability is something you're going to learn about from other people, because you can't trust what any of them say. It's something you have to think through for yourselves, and we'll talk more about this as we go.

But now, when we get into mathematics, that's fine. We just create models. And once we have the model, we just use it. We have standard models for all sorts of different standard problems. When you talk about coin tossing, what almost everyone means is not this crazy thing I was just talking about, where you have an initial angular momentum when you flip a coin and all of that stuff. It's a purely mathematical model where a coin is flipped, and with probability one half it comes up heads and with probability one half it comes up tails. OK, students are given a well-specified model, and they calculate various things. This is mathematical probability. Heads and tails are equiprobable in that system. Subsequent tosses are independent. Here's a little bit of cynicism.

I apologize for insulting you people with it. I apologize to any faculty member who later reads this. And I particularly apologize to businessmen and government people who might read it. Students compute, professors write papers, and business and government leaders obtain questionable models and data on which they can blame failures. I'm most cynical towards business leaders, because business leaders often hire consultants-- not so much to learn what to do, but so they have excuses when what they do doesn't work out right. When I say the students compute, what I mean is this: in almost all the courses you've taken up until now-- and in this course also-- what you're going to be doing is solving well-posed problems. You solve well-posed exercises because that's a good way to understand what the mathematics of the subject is about. Don't think that that's the only part of it. If that's the only thing you're doing, you might as well not waste your time. You might as well do something else. You might as well go out and shovel snow today instead of trying to learn about probability theory. It's more pleasant to learn about probability theory.

OK, the use of probability models has two major problems with it. The first problem is, how do you make a model for a real-world problem? And a partial answer is: learn about estimation and decisions in the context of standard models. In other words, decisions and estimation inside a completely mathematical framework. Then you learn a great deal about the real-world problem itself.

Not about the mathematics of it, but about how you actually understand what's going on. If you talk to somebody who is a superb architect in any field-- networks, computer systems, control systems, anything-- what are you going to find? You're not going to find huge, involved sets of equations that they're going to use to explain something to you. If they're any good, they're going to take this big problem, and they're going to take your issue with this big problem, and they're going to find the one or two really important things that tell you something that you have to know. And that's what you want to get out of this course. You want to get the ability to take all of the chat, put it all together, and be able to say the one or two important things which are really necessary. That's where you're going.

Before you get there, you'll take low-level jobs in various companies and you'll compute a lot of things. You'll simulate a lot of things. You'll deal with a lot of detail. Eventually, you're going to get to the point where you've got to make major decisions. And you want to be ready for it. OK, that's enough philosophy. I will try to give no more philosophy today, except when I get pushed into it.

OK, one of the problems in finding a good model is that no model is perfect. Namely, what happens is you keep finding more and more complicated models, which deal with more and more of the issues. And as you deal with them, things get more complicated. You're more down in the level of details and you're finding out less.

So you want to find some sort of match between a model that tells you something and a model which is complicated enough to deal with the issues. There's a beautiful quote by Alfred North Whitehead. I don't know whether you've ever heard of Whitehead. You've probably heard of Bertrand Russell, who was both a great logician and a great philosopher, and had a lot to do with the origins of set theory. Whitehead and Russell together wrote this massive book around the turn of the twentieth century called Principia Mathematica, where they tried to resolve all of the paradoxes which were coming up in mathematics. And Whitehead's general philosophical comment was, "Seek simplicity and distrust it." Now, every time I look at that, I say, why in hell didn't he say, seek simplicity and question it? I mean, you all hear about questioning authority, of course, and that's important to do. Why, when you find a simple model for something, should you distrust it? Well, the reason is psychological. If you find a simple model for something and you question it, you have an enormous psychological bias towards not giving up the simple model. You want to keep that simple model. And therefore, it takes an enormous amount of evidence before you're going to give it up. Whitehead said something more than that. He said, "Seek simplicity and distrust it."

Now, why do I talk about the philosophy of science when we're trying to learn about probability theory? Well, probability theory is a mathematical theory. It's the basis for a great deal of science. And it's the place where modeling is most difficult. For scientific questions in most areas, if there's no probability or uncertainty involved, you just do an experiment that tells you the answer. You might not do it carefully enough, and then 10 other people do it. And finally, everybody agrees: this is the answer to that problem. In probability, it ain't that simple. And that's why one has to focus on this a little more than usual.

The second problem is, how do you make a probability model that has no hidden paradoxes in it? In other words, when you make a mathematical model, how do you make sure that it really is well-posed? How do you make sure that when you solve a problem in that mathematical model, you don't come to something that doesn't make any sense? Well, everyone's answer to that is: you use Kolmogorov's axioms of probability.

Because back in 1933, Kolmogorov published this little thin book. Those of you who are interested in the history of science probably ought to read it. You will find you only understand the first five pages the first time you look at it. But it's worthwhile doing that, because here was one of the truly great minds of the early 20th century. And he took everything he knew about probability, which was a whole lot more than I know, certainly, and a whole lot more than anybody else at the time knew, and he collapsed it into these very simple axioms. And he said, if you obey these axioms in a model that you use in probability, those axioms will keep you out of any paradoxes at all. And then he showed why that was, and he showed how the axioms could be used, and so forth. So we're going to spend a little bit of time talking about them today.

OK, quickly, what is a discrete stochastic process? Well, a stochastic process-- we've been talking about probability, and you might be getting the idea that I'm just using the name "stochastic processes" as a foil for talking about what I really love, which is the probability. And there's a certain amount of truth to that. But stochastic processes are special types of probability models where the sample points represent functions in time. In other words, when we're dealing with a probability model, the basis of a probability model is a sample space. It's the set of possible things that might happen. And you can reduce that to the sample points, which are the indivisible, little, tiny crumbs of what happens when you do an experiment. A sample point is the thing which specifies everything that can be specified in that model of that experiment.

OK, when you get to a stochastic process, what you're doing is you're looking at a situation in which these sample points, the outcomes of what happens, are in fact a whole sequence of random variables in time. And what that means is, instead of looking at just a vector of random variables, you're going to be looking at a whole sequence of random variables. Now, what is different about a vector of a very large number of random variables and an infinite sequence of random variables? Well, from an engineering standpoint, not very much. I mean, there's nothing you can do to actually look at an infinite sequence of random variables. If you start out at the big bang and you carry it on to what you might imagine is the time when the sun explodes or something, that's a finite amount of time. And if you imagine how fast you can observe things, there's a finite number of random variables you might observe. All these models we're going to be dealing with get outside of that realm, and they deal with something that starts infinitely far in the past and goes infinitely far in the future. It doesn't make much sense, does it? But then look at the alternative. You've built a device which you're going to sell to people, and which they're going to use. And you know they're only going to use it three or four years until something better comes along. But do you want to build into everything you're doing the idea that it's going to be obsolete in three years? No. You want to design this thing so, in fact, it will work for an essentially arbitrary amount of time. And therefore, you make a mathematical model of it.

You look at what happens over an infinite span of time. So whenever we get into mathematics, we always go to an infinite number of things rather than a finite number of things. Now, discrete stochastic processes are those where the random variables are discrete: namely, there is a finite number of possible outcomes from each of them, or the set of possible sample values is discrete.

What does that mean? It doesn't mean a whole lot when you really start asking detailed questions about this. What it means is, I want to talk about a particular kind of stochastic process. And it's a class of processes which will be more than we can deal with in one term. And I want to exclude certain processes, like noise processes, because we don't have time to do both of them.
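As a toy illustration (my own sketch, not from the lecture), the simplest discrete stochastic process is the Bernoulli process: an infinite sequence of independent coin tosses. In practice we can only ever generate a finite prefix of any sample path, which is the engineering point made above.

```python
import random

def bernoulli_path(n, p=0.5, seed=1):
    """Generate the first n values of a sample path of a Bernoulli process.

    Each X_i is 1 (heads) with probability p and 0 (tails) otherwise,
    independently of all the others.
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

print(bernoulli_path(20))
```

Each call with a different seed gives a different sample path, i.e., a different sample point of the process.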

So don't worry too much about exactly what a discrete stochastic process is. It's whatever we want to call it when we deal with it. Oops. Oh, where am I? Oh, I wanted to talk about the different processes we're going to study. The first kind of process is what we call a counting process.

The sample points in the process-- remember, a sample point specifies everything about an experiment. It tells you every little detail. And the sample points here, in counting processes, are sequences of arrivals. This is a very useful idea in dealing with queuing systems, because queuing systems have arrivals. They have departures. They have rules for how the arrivals get processed before they get spit out. And a big part of that is studying first the arrival process; then we study the departure process. We study how to put them together. And when we get to chapter 2 of the notes, we're going to be studying Poisson processes, which are, in a sense, the perfect discrete stochastic process. It's like coin tossing in probability. Everything that might be true of a Poisson process is true. The only things that aren't true are the things that obviously can't be true. And we'll find out why that is and how that works a little later. We're then going to study renewal processes in chapter 4. We're going to put Markov chains in between. And you'll see why when we do it.
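A minimal sketch of a counting process (my own illustration, using the standard construction of a Poisson process from exponential interarrival times, which chapter 2 develops properly):

```python
import random

def poisson_arrivals(rate, t_end, seed=2):
    """Simulate the arrival epochs of a Poisson process of the given rate on (0, t_end].

    Interarrival times of a Poisson process are i.i.d. exponential(rate),
    so we accumulate exponential gaps until we pass t_end.
    """
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_end:
            return arrivals
        arrivals.append(t)

epochs = poisson_arrivals(rate=3.0, t_end=10.0)
print(len(epochs), "arrivals; the count N(10) has mean rate * t = 30")
```

The list of arrival epochs is one sample point of the counting process: it specifies everything about that run of the experiment.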

And renewal processes are a more complicated kind of thing than Poisson processes. And there's no point confusing you at this point saying what the difference is, so I won't. Markov processes are processes-- in other words, sequences in time of things-- where what happens in the future depends on what happened in the past only through the state at the present. In other words, if you can specify the state in the present, you can forget about everything in the past. If you had those kinds of processes around, you wouldn't have to study history at all, which would be very nice. But unfortunately, not all processes behave that way. When you do the modeling to try to find out what the state is, which is what you have to know at the present, you find out there's a lot of history involved. OK, finally, we're going to talk about random walks and martingales.

I'm not going to even say what a random walk or a martingale is. We will find out about that soon enough, but I want to tell you that's what's in chapter 7 of the notes. That's the last topic we will deal with. We'll study all sorts of mixtures of these-- things which involve a little bit of each. We'll start out working on one thing, and we'll find out that another one of these other topics is the right way to look at it. If you want to know more about that, please go look at the notes, and you'll find out as much as you want. But it's not appropriate to talk about it right now. OK, when, where, and how is this useful? You see, I'm almost at the point where we'll start actually talking about real stuff. And when I say real stuff, I mean mathematical stuff, which is not real stuff.

Broad answer: probability and stochastic processes are an important adjunct to rational thought about all human and scientific endeavor. That's a very strong statement. I happen to believe it. You might not believe it. And you're welcome to not believe it. It won't be on a quiz or anything, believe me.

But almost anything you have to deal with involves dealing with something in the future. I mean, you have to plan for things which are going to happen in the future. When you look at the future, there's a lot of uncertainty involved with it. So one of the issues is dealing with uncertainty. And probably the only scientific way of dealing with uncertainty is through the mechanism of probability models. So for anything you want to deal with which is important, you're probably better off knowing something about probability than not. A narrow answer is: probability and stochastic processes are essential components of the following areas. Now, I must confess I made up this list in about 10 minutes without thinking about it very seriously. And these things are related to each other.

Some of them are parts of others. Let me read them. Communication systems and networks-- that's where I got involved in this question, and it's very important there. Computer systems-- I also got involved in it because of computer systems. Queuing in all areas-- well, I got involved in queuing because of being interested in networks. Risk management in all areas-- I got interested in that because I started to get disturbed about civilization destroying itself, because people who have a great deal of power don't know anything about taking risks. OK, catastrophe management. How do you prevent oil spills and things like that? How do you prevent nuclear plants from going off? How do you prevent nuclear weapons from falling into the hands of the wrong people? These again are probability issues. These are important probability issues because most people don't regard them as probability issues. If you say there is one chance in a billion that something will happen, 3/4 of the population will say: that's not acceptable, I don't want any risk.

And these people are fools. But unfortunately, these fools outnumber those of us who have studied these issues. So we have to deal with it. We have to understand it, if nothing else. OK: failures in all types of systems, operations research, biology, medicine, optical systems, and control systems. Name your own favorite thing.

You can put it all in. Probability gets used everywhere. OK, let's go to the axioms. Probability models have three components to them. There's a sample space. Now, here we're in mathematics again. The sample space is just a set of things. You don't have to be at all specific about what those things are.

I mean, at this point we're right into set theory, which is the most basic part of mathematics. And a set contains elements. And that's what we're talking about here. So there's a sample space. There are the elements in that sample space. There's also a collection of things called events. Now, the events are subsets of the sample space. And if you're dealing with a finite set of things, there's no reason why the events should not be all subsets of that countable collection of things. If you have a deck of cards, there are 52 factorial ways of arranging the cards in that deck of cards-- a very large number. But when you talk about subsets of that, you might as well talk about all combinations of those configurations of the deck. You can talk about: what's the probability that the first five cards in that deck happen to contain 4 aces? That's an easy thing to compute. I'm sure you've all computed it at some point or other. Those who like to play poker, of course, do this. It's fun. But it's a straightforward problem. When you have these countable sets of things, there's no reason at all for not having the set of events consist of all possible subsets.
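That card probability is indeed easy to check (my own worked example, not from the lecture): the five cards contain all four aces exactly when the one non-ace among them is any of the other 48 cards, so the probability is 48 divided by the number of 5-card hands.

```python
from math import comb

# Number of 5-card hands containing all four aces: the fifth card
# can be any of the remaining 48 non-ace cards.
favorable = 48
total = comb(52, 5)      # 2,598,960 possible 5-card hands
p = favorable / total
print(p)                 # about 1.85e-05, i.e., 1 in 54,145
```

Whether you think of the five cards as ordered (top of the deck) or unordered (a hand), the answer is the same, since every ordering is equally likely.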

Well, people believed that for a long time. One of the things that forced Kolmogorov to start dealing with these axioms was the realization that when you had much more complicated sets-- where in fact you had the set of real numbers as possible outcomes, or sequences of things which go from 0 to infinity, and all of these sets which are uncountable-- you really can't make sense out of probability models where all subsets of sample points are called events.

So in terms of measure theory, you're forced to restrict the set of things you call events. Now, we're not going to deal with measure theory in this subject. But every once in a while, we will have to mention it, because the reason why a lot of things are the way they are is because of measure theory. So you'll have to be at least conscious of it. If you really want to be serious about your study of mathematical probability theory, you really have to take a course in measure theory at some point. But you don't have to do it now. In fact, I would almost urge most of you not to do it now. Because once you get all the way into measure theory, you're so far into measure theory that you can't come back and think about real problems anymore. You're suddenly stuck in the world of mathematics, which happens to lots of people.

So anyway, some of you should learn about all this; some of you shouldn't. Some of you should learn about it later. So you can do whatever you want. OK, the axiom about events is that if you have a set of events -- in other words, a set of subsets -- and it's a countable set, then the union of all of those -- the union from n equals 1 to infinity of A sub n -- is also an event.

I've gone for 50 minutes and nobody has asked a question yet. Who has a question? Who thinks that all of this is nonsense? How many of you? I do. OK, I'll come back in another 10 minutes. And if nobody has a question by then, I'm just going to stop and wait.

OK, so anyway. If you look at a union of events -- now, remember, that an event is a subset of points. We're just talking about set theory now. So the union of this union here -- excuse me. This union here is A1, all the points in A1, and all the points in A2, and all the points in A3, all the way up

to infinity. That's what we're talking about here. And one of the axioms of probability theory is that if each of these things is an event, then this union is also an event. That's just an axiom. You can't define events if that's not true. And if you try to define events where this isn't true, you eventually come into the most god-awful problems you might imagine. And suddenly, nothing makes sense anymore.

Most of the time when we define a set of events in a probability model, each singleton event -- namely, each single point, as the set which contains only that element -- is taken as an event. There's no real reason to not do that. If you don't do that, you might as well just put those points together and not regard them as separate points. We will see an example in a little bit where, in fact, you might want to do that. But let's hold that off for a little bit.

OK, not all subsets need to be events. Usually, each sample point is taken to be a singleton event. And then non-events are truly weird things. I mean, as soon as you take all sample points to be events, all countable unions of sample points are events. And then intersections of events are events, and so

forth, and so forth, and so forth. So most things are events. And just because of measure theory, you can't make all things events. And I'm not going to give you any example of that, because the examples are horrendous.

OK, the empty set has to be an event. Why does the empty set have to be an event? If we're going to believe these axioms -- I'm in a real bind here, because every one of you people has seen these axioms before. And you've all gone on and said, I can get an A in any probability class in the world without having any idea of what these axioms are all about. And therefore, it's unimportant. So you see something that says, the empty set is an event. And you say, well, of course that has nothing to do with anything. Why should I worry about whether the empty set is an event or not? The empty set can't happen, so how can it be an event?

Well, because of these axioms, it has to be an event. The axioms say that if A is an event, then the complement has to be an event also -- and the whole sample space is an event. So that says that the empty set has to be an event. And that just follows from the axioms. If all sample points are singleton events, then all finite and countable sets are events.

And finally, De Morgan's law. Is there anyone who isn't familiar with De Morgan's law? Anyone who hasn't seen even that small amount of set theory? If not, look it up on -- what's the name of the computer --

Wikipedia. Most of you will think that things on Wikipedia are not reliable. Strangely enough, in terms of probability theory and a lot of mathematics, Wikipedia does things a whole lot better than most textbooks do. So any time you're unfamiliar with what a word means or something, you can look it up in your old probability textbook. If you've used [inaudible] and [inaudible], you will probably find the right answer there. Other textbooks, maybe the right answer. Wikipedia's more reliable than most of them. And it's also clearer than most of them. So I highly recommend using Wikipedia whenever you get totally confused by something.

OK, so a probability measure and events satisfy the following axioms. We've said what things are events. The only things that have probabilities are events. So the entire set has a probability. When you do the experiment, something has to happen. So one of the sample points occurs. That's the whole idea of probability. And therefore, omega has probability 1. Capital omega.

If A is an event, then the probability of A has to be greater than or equal to 0. You can probably see without too much trouble why it has to be less than or equal to 1 also. But that's not one of the axioms. You see, when you state a set of axioms for something, you'd like to use the minimum set of axioms you can, so that you don't have to verify too many things before you say, yes, this satisfies all the axioms. So the second one is that the probability of A has to be greater than or equal to 0.

The third one says that if you have a sequence of disjoint events -- incidentally, when I say a sequence, I will almost always mean a countably infinite sequence, A1, A2, A3, all the way up. If I'm talking about what most of you would call a finite sequence -- and I like the word "finite sequence," but I like to be able to talk about sequences -- if I'm talking about a finite sequence, I will usually call it an n-tuple of random variables or an n-tuple of things. So sequence really means you go the whole way out.

OK, if A1, A2, all the way up, are disjoint events -- disjoint. Disjoint means if omega is in one, it can't be in any of the others. Then the probability of this countable union is going to be equal to the sum of the probabilities of the individual events. Anyone who has ever done a probability problem knows all of these things. The only thing you don't know, and you probably haven't thought about, is why everything else follows from this. But this is the whole mathematical theory. Why should we study it anymore?
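For a finite toy model you can check that additivity axiom directly -- here a fair die, with the 1/6-per-face probabilities being my assumed measure, not something from the lecture:

```python
from fractions import Fraction

# a fair die: probability measure on the six singleton events.
p = {i: Fraction(1, 6) for i in range(1, 7)}
prob = lambda event: sum(p[w] for w in event)

# disjoint events: additivity holds with equality.
a1, a2, a3 = {1, 2}, {3}, {5, 6}
print(prob(a1 | a2 | a3))              # 5/6
print(prob(a1) + prob(a2) + prob(a3))  # 5/6
```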

We're done. We have the axioms. Everything else follows; it's just a matter of computation. Just sit down and do it. Not quite that simple. Anyway, a few consequences. The probability of the empty set is 0, which says when you do an experiment, something's going to happen. And therefore, the probability that nothing happens is 0, because that's what the model says.

The probability of the complement of an event is 1 minus the probability of that event. Which, in fact, is what says that all events have to have probabilities less than or equal to 1. And if the event A is contained in the event B -- remember, when we talk about events, we're talking about two different things, both simultaneously. One of them is this beautiful idea with measure theory worked into it and everything else. And the other is just a simple set-theoretic idea. And all we need to be familiar with is the set-theoretic idea.

Within that set-theoretic idea, A contained in B means that every sample point that's in A is also in B. It means that when you do an experiment and the event A occurs, the event B has to occur, because one of the things that compose A has to occur. And that thing has to be in B, because A is contained in B. So the probability of A has to be less than or equal to the probability of B. That has to be less than or equal to 1. These are things you all know.

Another consequence is the union bound. Many of you have probably seen the union bound. We will use it probably almost every day in this course. So it's good to have that as one of the things you remember at the highest level.

If you have a set of events -- A1, A2, and so forth -- the probability of that union -- namely, the event that consists of all of them -- is less than or equal to the sum of the individual event probabilities. I give a little proof here for just two events, A1 and A2, so you see why this is true. I hope you can extend this to 3 and 4.
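Here's a small numerical sketch of the union bound with two overlapping die events -- the toy example is mine, not one from the lecture:

```python
from fractions import Fraction

# fair-die measure again: each face has probability 1/6.
p = {i: Fraction(1, 6) for i in range(1, 7)}
prob = lambda event: sum(p[w] for w in event)

# overlapping events: the union bound is an inequality, not an equality.
a1, a2 = {1, 2, 3}, {3, 4}
print(prob(a1 | a2))        # 2/3
print(prob(a1) + prob(a2))  # 5/6 -- an upper bound, loose by P(a1 a2)
```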

I can't draw a picture of it very easily for 3 and 4 and so forth. But here's the event A1. Here's the event A2. Visualize this as a set of sample points, which are just in the two-dimensional space here. So all these points here are in A1. All these points are in A2. This set of points here are the points that are in A1 and A2. I will use just writing things next to each other to mean intersection. And sometimes I'll use a big cap to mean intersection. So all of these things are both in A1 and A2. This is A2, but not A1.

So the probability of this whole event, A1 union A2, is the probability of this thing and this thing together. So it's the probability of this plus the probability of this. The probability of this is less than the probability of A2, because this is contained in that whole rectangle. And therefore, the probability of the union of A1 and A2 is less than or equal to the probability of A1 plus the probability of A2. Now, the classy way to extend this to a countably infinite set is to use induction. And I leave that as something that you can all play with

some time when it's not between 9:30 and 11:00 in the morning and you're struggling to stay awake. And if you don't want to do that on your own, you can look at it in the notes. OK, these axioms look ho-hum to you. And you've always ignored them before, and you think you're going to be able to ignore them now. Partly you can, but partly you can't, because every once in a while we'll start doing things where you really need to understand what the axioms say.

OK, one other thing which you might not have noticed. When you studied elementary probability, wherever you studied it, what did you spend most of your time doing? You spent most of your time talking about random variables and talking about expectations. The axioms don't have random variables in them. They don't have expectations in them. All they have in them is events and

probabilities of events. So these axioms say that the really important things in probability are the events and the probabilities of events. And the random variables and the expectations are derived quantities, which we'll now start to talk about.

OK, so we're now down to independent events and experiments. Two events, A1 and A2, are independent if the probability of the two of them together is equal to the product of their probabilities. You've all seen this. I'm sure you've all seen it. If you haven't at least seen it, you probably shouldn't be in this class, because even though the text does everything in detail that has to be done, you need to have a little bit of insight from having dealt with these subjects before. If you don't have that, you're just going to get lost very, very quickly.

So the probability of the intersection of the events A1 and A2 is the product of the two. Now, in other words, you have a red die and a white die. You flip the dice. What's the probability that you get a 1 for the red die and a 1 for the white die? Well, the probability you get a 1 for the red die is 1/6, just by symmetry; there are only 6 possible things that can happen. Probability the white die comes up as a 1.

Probability is 1/6 for that. And the probability of the two things -- they're independent. There's a sense of real-world independence and probability-theory independence. Real-world independence says the two things are isolated, they don't interfere with each other. Probability theory says, just by definition -- well, the real-world idea of them not interfering with each other should say -- and I'm waving my hands here, because this is so elementary, you all know it. And I would bore you if I talked about it more. But I probably should talk about it, but I'm not going to. Anyway, this is the definition of independence. If you don't have any idea of how this corresponds to being unconnected in the real world, then go to Wikipedia. Read the notes.

Well, you should read the notes anyway. I hope you will read the notes, because I'm not going to say everything in class that needs to be said. And you will get a better feeling for it. Now, here's something important. Given two probability models, a combined model can be defined in which, first, the sample space, omega, is the cartesian product of omega 1 and omega 2. Namely, it's the cartesian product of

the two sample spaces. Think of rolling the red die and the white die. Rolling a red die is an experiment. There are 6 possible outcomes, a 1 to a 6. Roll a white die, there are 6 possible outcomes, a 1 to a 6. You roll the two dice together, and you really need to have some way of putting these two experiments together.

How do you put them together? You talk about the outcome for the two dice, a number for one and a number for the other. The cartesian product simply means you have the set made up of 1 to 6, cartesian product with 1 to 6. So you have 36 possibilities. It's an interesting thing, which comes from Kolmogorov's axioms: that, in fact, you can take any two probability models for two different experiments, you can take this cartesian product of sample points, and you can assume that what happens here is independent of what happens here. And when you do this, you will, in fact, get something for the two experiments put together which satisfies the axioms. That is neither trivial nor very hard to prove. I mean, for the case of two dice, you can see it almost immediately.

I mean, you see what the sample space is. It's this cartesian product. And you see what the probabilities have to be. Because the probability of, say, a 1 or a 2 for the red die and a 1 or a 2 for the white is 2/6 times 2/6. So with probability 1/9, you're going to get a 1 or a 2 combined with a 1 or a 2. I'm going to talk a little bit later about something that you all know.
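That 1/9 figure can be checked by brute force over the product sample space -- a sketch, assuming the 1/36-per-pair measure just described:

```python
from fractions import Fraction
from itertools import product

# cartesian-product model for a red die and a white die, with
# independence assumed: P((r, w)) = (1/6)(1/6) = 1/36 for each pair.
omega = list(product(range(1, 7), repeat=2))
p = {rw: Fraction(1, 36) for rw in omega}

# the event "red shows a 1 or a 2, and white shows a 1 or a 2"
event = [(r, w) for (r, w) in omega if r <= 2 and w <= 2]
print(sum(p[rw] for rw in event))  # 1/9
```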

What happens if you roll two white dice? This is something you all ought to think about a little bit, because it really isn't as simple as it sounds. If you roll two dice, what's the probability that you'll get a 1 and a 2? And how can you justify that? First, what's the sample space when you roll two white dice? Well, if you look at the possible things that might happen, you can get 1, 1; 2, 2; 3, 3; 4, 4; 5, 5; 6, 6. You can also get 1, 2 or 2, 1. But you can't tell them apart, so there's one sample point, you might say, which is 1, 2 and 2, 1. Another sample point which is 2, 3 and 3, 2, and so forth. If you count them up, there are 21 separate sample points that you might have.

And when you look at what the probabilities ought to be, the probabilities of the pairs are 1/36 each. And the probabilities of the i, j, where i is unequal to j, are 1/18 each. That's awful. So what do you do? When you're rolling two dice, you do the same thing that everybody else does. You say, well, even though it's two white dice, I'm going to think of it as if it's a white die and a red die. I'm going to think of it as if the two are distinguishable. My sample space is going to be these 36 different things.
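That maneuver -- deriving the 21-point indistinguishable model from the 36-point distinguishable one -- can be sketched directly (the code framing is mine):

```python
from fractions import Fraction
from itertools import product

# build the 21-point unordered model from the 36-point ordered one,
# by lumping each (i, j) together with (j, i).
unordered = {}
for i, j in product(range(1, 7), repeat=2):
    key = (min(i, j), max(i, j))
    unordered[key] = unordered.get(key, Fraction(0)) + Fraction(1, 36)

print(len(unordered))                        # 21 sample points
print(unordered[(4, 4)], unordered[(1, 2)])  # 1/36 1/18
```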

I will never be able to distinguish a 1, 2 from a 2, 1. But I don't care, because I now know the probability of each of them. What I'm trying to say by this is: this is a very, very trivial example of where you really have to think through the question of what kind of mathematical model you want, for about the most simple situation you can think of. When you combine two different experiments together and you lose distinguishability, then what do you do? Well, the sensible thing to do is assume that the distinguishability is still there, but it's not observable. But that makes it hard to make a correspondence between the real world and the probability world. So we'll come back to that later.

But for the most part, you don't have to worry about it, because this is something you've dealt with all of your lives. I mean, you've done probability problems with dice. You've done probability problems with all sorts of other things where things are indistinguishable from each other. And after doing a few of these problems, you are used to being schizophrenic about it. And on one hand, thinking that these things are

distinguishable, to figure out what all the probabilities are. And then you go back to saying, well, they aren't really distinguishable, and you find the right answer. So you don't have to worry about it. All I'm trying to say here is that you should understand it. Because when you get to the complicated situations, this is one of the main things which will cause confusion. It's one of the main things where people write papers and other people say that paper is wrong, because they're both thinking of different models for it.

The important thing is that if you satisfy Kolmogorov's axioms in each of a set of models -- and the most important case is where each of these models is exactly the same -- and you then make them each independent of each other, Kolmogorov's axioms are going to be satisfied for the combination, as well as for each individual model. Why do I care about that? Because we're studying stochastic processes. We're studying an infinite sequence of random variables. And I don't want to generate a complete probability model for an infinite set of random variables every time I talk about an infinite set of random variables. If I'm talking about an infinite sequence of flipping a coin, I want to do what you do, which is say, for each coin, the coin is equiprobably a head or a tail. And the coin tosses are independent of each other.

And I want to know that I can go from that to thinking about this sequence. Strange things will happen in these sequences when we go to the limit. But still, we don't want to have to worry about a model for the whole infinite sequence. So that's one of the things we should deal with.

Finally, random variables. Definition. Three years ago, I taught this course and I asked people to write down the definition of what a random variable was. And almost no one really had any idea of what it was. They said it was something that had a probability density, or something that had a probability mass function, or something that had a distribution function, or something like that. What it is, if you want to get a definition which fits in with the axioms -- the only thing we know from the axioms is there's a sample space, there are events, and there are probabilities. So a random variable, what it really is, is a function from the set of sample points to the set of real values. And as you get into this, you will realize that the set of real values does not include minus infinity or plus infinity. It says that every sample point gets mapped into a finite value.

This happens, of course, when you flip a coin. Well, flipping a coin, the outcome is not a random variable. But you'd like to make it a random variable, so you say, OK, I'm going to model tails as 0 and heads as 1, or vice versa. And then what happens? Your model for coin tossing, a sequence of coin tosses, becomes the same as your model for data.

So that what you know about coin tossing, you can apply to data compression. You see, when you think about these things mathematically, then you can make all sorts of connections you couldn't make otherwise.

So random variables have to satisfy the constraint that -- they have to satisfy the constraint that the set of sample points such that X of omega, which is a real number, is less than or equal to some given real number -- that this set has to be an event. Because those are the only things that have probabilities. So if we want to be able to talk about the probabilities of these random variables lying in certain ranges, or things like this, or having PMFs, or anything that you like to do, you need this constraint on it. It's an event for all a in the set of real numbers.

Also, if this set of things here are each random variables -- in other words, if each of them are functions from the sample space to the real line -- then the set of omega such that X1 of omega is less than or equal to a1, up to Xn of omega is less than or equal to a n, is an event also. You might recognize this as the distribution function, the joint distribution function for n random variables. You might recognize this as the distribution function evaluated at a for a single random variable. So you define a random variable.
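As a toy illustration of that definition -- the two-point sample space and the tails/heads labels here are my own framing, following the coin example above:

```python
# a random variable is a function from sample points to the reals.
omega = ['tails', 'heads']
X = {'tails': 0, 'heads': 1}  # the mapping: tails -> 0, heads -> 1

# the defining requirement: for every real a, the set
# {omega : X(omega) <= a} must be an event.
below = lambda a: sorted(w for w in omega if X[w] <= a)
print(below(-1), below(0.5), below(1))  # [] ['tails'] ['heads', 'tails']
```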

And what we're doing here is -- it's kind of funny, because we already have these axioms. But now, when we want to define things in the context of these axioms, we need extra things in the definitions. This is a distribution function. A distribution function of the random variable X is the probability that the random variable X is less than or equal to x, which means that X is a mapping from omega into real numbers. It says that with this mapping here, you're mapping this whole sample space into the real line. Some omegas get mapped into things less than or equal to a real number x. Some of them get mapped into things greater than the real number x. And the set that gets mapped into something less than or equal to x, according to the definition of a random variable, has to be an event. Therefore, it has to have a probability.
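Here's a minimal sketch of such a distribution function, for a fair-die random variable (my example, not the lecture's); it also shows the jump behavior that comes up next:

```python
from fractions import Fraction

# distribution function F(x) = P(X <= x) for a fair die.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}
F = lambda x: sum(p for v, p in pmf.items() if v <= x)

print(F(0))          # 0 -- starts at 0
print(F(2.9), F(3))  # 1/3 1/2 -- at a jump, "<=" picks up the upper value
print(F(6))          # 1 -- climbs to 1
```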

And these probabilities increase as we go. It is totally immaterial for all purposes whether we have a less than or equal to here or a less than here. And everyone follows the convention of using a less than or equal to here rather than a less than here. The importance of that is that when you look at a distribution function, the distribution function often has jumps. And the distribution will have a jump whenever there's a nonzero probability that the random variable takes on a particular value x. It takes on this particular value with some nonzero probability here.

If you have a probability density for a random variable, this curve just moves up continuously. And the derivative of this curve is the probability density. If you have a probability mass function, this is a staircase type of function. Because of the fact that we define the distribution function with a less than or equal to rather than a less than, in every one of these jumps, the value here is the upper value of the jump. The value here is the upper value of the jump, and so forth.

Now, I'm going to -- I've already said half of this. If X maps into only a finite or countable set of values, it's discrete. And it has a probability mass function -- this notation. If the derivative exists, then you say that the random variable is continuous and it has a density. And most problems that you do in probability theory, you're dealing with random variables. And they either have a probability mass function if they're discrete, or they have a density if they're

continuous. And this is just saying some things are one way, some things are the other way. And some things are neither. And we'll see lots of things that are neither. And you need the distribution function to talk about things that are neither. We will find that the distribution function, which you've hardly ever used in the past, is extraordinarily important, both for theoretical purposes and for other purposes. You really need that as a way of solving problems, as well as keeping yourself out of trouble.

For every random variable, the distribution function exists. Why? Anybody know why this has to exist for every random variable? Yeah.

Audience: Because the [inaudible].

Professor: Yes. Because we insisted that it did. Namely, we insisted that this event actually was an event for all little x. That's part of the definition. So, in fact, when you do these things a little more carefully than you might be used to, the definition implies that the distribution function always exists.

As a more real-world kind of argument, we now have a way of dealing with things that are discrete, and continuous, and mixed continuous and discrete, and anything else that you might think of, because the definition restricts it. Now, one other thing. How do I know that this starts at 0? That's a more complicated thing. And I'm not even going to do it in detail. But since every omega maps into a finite number, you can't have a jump down here at minus infinity. And you can't have a jump here at plus infinity. Because omegas don't map into plus infinity or minus infinity. So you have to start down here at 0. You have to climb up here to 1. You might never reach 1. You might reach it only as a limit, but you have to reach it as a limit.

Yes?

Audience: In the first paragraph, [inaudible].

Professor: If we have a sequence of [inaudible], yeah.

Audience: It's [inaudible].

Professor: You are probably right. Yes. Well, I don't know, I didn't think about that one. I don't think you're right, but we can argue about it. But anyway, this has to start at 0.

It has to go up to 1. OK, we did this. Now, I'm going to go through a theoretical nitpick for the last five minutes of the class. Anyone who doesn't like theoretical nitpicks, you're welcome to either go to sleep for five minutes, or you're welcome to go out and get a cup of coffee, or whatever you want to do. I will do this to you occasionally. And I realize it's almost torture for some of you, because I want to get you used to thinking about how relatively obvious things actually get proven. I want to increase your ability to prove things.

The general statement about proving things, or at least the way I prove things, is not the way most mathematicians prove things. Most mathematicians prove things by starting out with axioms and going step by step until they get to what they're trying to prove. Now, every time I talk to a good mathematician, I find out that's what they write down when they prove something, but that's not the way they think about it at all. All of us -- engineers, businesspeople, everyone -- think about problems in a different way. If we're trying to prove something, we first give a really half-assed proof of it. And after we do that, we look at it and we say, well, I

don't see why this is true, and I don't see why that's true. And then you go back and you patch these things up. And then after you patch things up, it starts to look ugly. So you go back and do it a nicer way. And you go back and forth, and back and forth, and back and forth, using both contradiction and implication. You use both of them.

Now, when you're proving things in this class, I don't care whether you make it look like you're a formal mathematician or not. I would just as soon you didn't pretend you were a formal mathematician. I would like to see you prove things in such a way that it is at least difficult to poke a hole in your argument. In other words, I would like you to give an argument which you've thought about enough that there aren't obvious counterexamples to it. And if you learn to do that, you're well on your way to learning to use this theory in a way where you can actually come up with correct answers.

And in fact, I'm not going to go through this proof at all. And I don't think I really wanted to. I just did it because -- well, I think it's something you ought to read. It is important to learn to prove things, because when you get to complicated systems, you cannot see your way

through them intuitively. And if you can't see your way through it intuitively, you need to understand something about how to prove things, and you need to put all the techniques of proving things that you learned together with all the techniques that you've learned for doing things intuitively. And you need to know how to put them together. If you're stuck dealing only with things that are intuitive, or things that you learned in high school like calculus, then you really can't deal with complicated systems very well.

OK, I'm going to end at that point. You can read this theoretical nitpick if you want, and play with it. And we'll go on next time.
