the world is full of data. every app that you use is full of data. on khan academy, we store data about users and badges and progress. on facebook, they store data about who you are, who your friends are, and what they're posting. on bank of america, they store data about how much money you have and what accountsthat's in.
how do these apps store data? well they use a database, which is a program that helps store data, and provides functionality for adding, modifying and querying that data, and doing that all fast. databases come in many forms, but a really popular type of database is called a relational database. it stores each kind of data in a table,
which is kind of like storing data in a spreadsheet. a row represents an item, and a column represents properties about that item. for example, to store data about khan academyusers, we'd have a users table with a row for eachuser, and columns for properties like their nicknameand location. relational databases make it particularlyeasy to form relationships between tables. for example, in order to store khan academyusers
and their badges, we might have a users table and a badges table, and then a user badgestable to remember which users earned which badges, just by mapping user ids to badge ids. that's a more efficient form of storage than having to repeat everything about theuser and everything about the badge in the userbadges table. most databases come with a query language to interact with the database.
sql is a language designed entirely for accessing databases, and is the most popularof them. with sql, we can create tables, change data, get back to data that we're interested in, like we'd want to find which users joinedin the last week, or which users have a particular badge. that's what we're going to teach here, and you'll actually get to try out sql here in the browser, using sqlite,
a particular implementation of it. you won't be able to write the whole app here, but when you're done learning sql, you'll have a much better understanding of how data is stored in the apps that youuse, and be able to use sql, if you ever buildan app. welcome to my database. it has no data in it yet because i want to build it up with you.
what sort of data should we store in our first table? let's start with a grocery list, which you've probably used in real life. i'm pasting an example list which has three delicious items and how much we want to buy of each of them. our first bit of sequel will be the command
to make the table to store this list. alright. create table in all caps and then the name of the table "groceries." and then ( ); we see an error pop up because the sequel interpreter expects to see the column names inside these parenthesis. what column should we have
in order to describe each item on our list? well first we need a name for the item, which i'll call "name" and we need to follow that with a data type. we have a few options. let's go for text. if we look on the right hand side, we can see our new table is listed with onecolumn. but we also need to specify
how many of each thing to buy, like our four bananas. so let's add and quantity column as well. and this will always have a whole number so let's use an integer for that data type. and now we can see that new column listed in our table. that looks pretty good if we're thinking about what data we have
in this grocery list. but we're missing something that we need in databases. a unique identifier for each row. we almost always need unique ids for each row in a database because we need a way to identify rows later when we're updating or deleting them and not be dependent on other columns
because those could change. we typically specify this id column first. so i'm moving my cursor before a name. i'll call this column "id," which that's standard. and then for the data type, i'll have to write this phrase, "integer primary key" which signals to the database that it should treat this as the row identifier
and that each row must have a unique value for this column. okay. now we have our groceries table with three columns in it. it's empty though so let's put some data init. write "insert into" and then the table name "groceries" then "values" and then (
and here we start listing the column values in the order that we declared the columns. the first column was id so i'll put "1" since we haven't used that id yet. the second column is name so i'll write "bananas" and third column was quantity. so i'll write "4." do you see how the schema update on the right?
it now says that there's one row in the groceriestable. so our instruction worked. let's add the next two items and just doing the same thing. "insert into groceries" "values (2, "peanut butter" and just one of those cause if i say too much and "insert into groceries values"
and id "3, dark chocolate bars," have to be dark. milk chocolate is not even chocolate. and we'll get two of those. okay. so it says three rows but to really see that the database actually contains data you can click the table name on the right. this will insert a select statement in yourcode. and don't worry too much about this now
because we'll get to it in the next video. if you want, you can pause the talk through and try clicking the table name yourself to see what happens. that's it, that's all we needed to create our first table and add data toit. it's pretty cool, huh? in the next section, we'll see how to get the data back out of the table
in more interesting ways. we're back with our groceries table but we've expanded on it a bit. it now has a column for which aisle number we can find the item at the supermarket plus we've added a few more items. you can really see the power of sql in thedifferent ways that you can retrieve your data from yourdatabase. this is also where it can get a bit tricky.
to start off simple, how would we retrieve all the rows from our table? to form any query we write select, and then which columns we're interested in,like the name, and then from and the table name that we'reselecting from. we can see a list of the groceries on the right under that results section. what if we want all the column names? we can just replace name with star
and that was also the query that got inserted when you clicked the table name which you might remember from the first talkthrough. this list is out of order, though. if we went from top to bottom at the storewith this list we'd have to keep changing aisles. we'd rather have it ordered by aisle so that we can be more efficient at the store. to do that we can just add an order by clauseto our query,
specifying which column we want to order by. that's better, now we can get our ingredientsfaster. to be even more efficient, my brother andi like to shop together and split the store so that i shop in half the store, he shops in the other half and we meet atthe checkout. there are twelve aisles at this grocery store so for my list i just want to know which items are in aisles 6 through 12.
any time we want to filter results out we can use a where clause for this and specify the column name and then what we want to compare it to. in this case i used a greater than operator but there's lots of different comparison operators depending on what it is you were trying tofilter by. great, so now i know exactly what items i'mgoing to get, i'm going to be really efficent
and you've learned a few ways to use sql toquery. stay tuned to find out even more ways youcan select. - we're back with our grocery list. i can see that it has six rows in it, but i can also see that we need to buy more than one of each item, like our bananas. so i'm not sure offhand how many items we'll end up buying total. and i'd like to know that total number
so that when we're cashing out, we can just do a quick check to see if we have the right number of items in ourcart. well to do that in sql we can use what's called an aggregate function. an aggregate function is useful for things like getting the maximum, minimum, sum and average of values in our database.
in this example, to get the total number ofitems, we'll start with select. then the name of the aggregate function we'll be using, sum. then the name of the column we want. and then from, the name of the table we're selecting from. tah-dah! so you can see on the right that the sum is15
so we should have 15 items in our cart if we got everything correctly. and notice if i go up here and change the number of bananas because, you know, i'm gonna invite my monkeyfriends over for a jungle party, we can see the sum increase in real time and get bigger and bigger. so we know that we're now gonna need
67 items in our cart. now we can easily try out other aggregatefunctions here. cause sum's not the only one. so if we wanted to know what is the most that we'll be buying of any one item, then we could use max. see, 56, those are obviously our bananas. but this information isn't as useful for thiscase. let's go back to sum. okay.
now what if we wanted to make sure we had the right number of items after eachaisle? well we can do that in sql using the groupby clause. we add it to the end of a query, specifying the column name to group by "aisle." okay, so now we can see that in one aislewe have nine, in another we have one, but i don't actuallyknow which aisle we're getting each of those in. so what we can do there is just add "aisle"
to the beginning of this select. so there you can see it, okay. so we're gonna get nine items in aisle two, one item in aisle four, 56 in aisle seven, and one in aisle twelve. awesome. now how did that actually work behind thescenes? the sql engine first did the grouping of therows based on aisle.
so first did this group by. then it summed up the quantity in each ofthose groups. and then, finally, it selected the first aislevalue that it saw in each group. and, you know, the aisle value was the samefor all of them so we got the aisles back out. but we could've also said "name" here and we do see a name for each of the rows. but it's a bit misleading because for someof these aisles
there are actually multiple items in thatgroup. but the sql engine just picked the first itemout of it. so you really shouldn't be using somethingdifferent from what you're grouping by cause you might not get a sensible result. so let's say "aisle" and now that's an accurate presentation of our data. okay, so great. you've seen aggregate functions.
you've seen grouping it by and now you can officially gather useful statistics on your data. another tool for your sql toolbox. i've spent the last few talk-throughs makinga grocery list. well now that i've realized how much eatingi do, i realize what else i should be tracking ina database: how much exercise i'm doing. [laugh] so, i'll start by pasting in some statementsthat create a table
and insert a few rows of data. let's see what i have in `exercise_logs`. we have an `id`, of course, a `type`, which is a string like biking ordancing, `minutes`, for how many i spent, `calories`, for how many i burned, and `heart_rate` for how high it went. now, i'm inserting the data using a slightlydifferent syntax than before. notice how i'm specifying the column namesin parentheses after the table name.
when i do that, it means that i don't haveto specify every column value here, i only have to specify the ones that i givein this list of column names. now, do you see which one i didn't specifyhere? the missing column is `id`. and that's very much on purpose. `id`, here we go. the `id` column is the primary key in thetable, and it's set to auto-increment. that means that the database
will automatically fill it in for us whenwe tell it to, by picking an id that's different from theother rows in the table, usually a number that's one bigger than thebiggest number so far. now i'd rather have the database figure thatout than have me figure it out, so i usually like to do my `insert`s thisway, leave out the id, let the database do that work for me. okay, so there we go, my database table isset up, and it looks like i have even done a bit ofexercise. [laugh] go me!
let's find out where i'm burning the mostcalories, with a simple query. i'll just modify this one, so, `select * from exercise_logs where calories> 50`, and we'll just `order by calories` as well. okay, so i've done two activities where i'veburned more than 50 calories, and it looks like dancing burned the most. now, i want to find out which activities i'vedone that have burned more than 50 calories, butalso taken less than 30 minutes. because i want to exercise more efficientlyin the future.
it's pretty obvious from looking at the currentresults, there's only two results, it's easy to look at that, but just imaginethat there are thousands of rows, maybe this is data fromthousands of users. i would need a way to filter that down inthe sql query itself. to do that, i can use the `and` operator, to combine those multiple conditions, liketo only return the rows where calories are greater than 50, and minutesare less than 30. ha-ha! so now i only see one result, dancing,which means i should just spend my whole life dancing.
which sounds pretty good to me. okay, so in a similar way, i could use the`or` operator to return rows that meet any of some conditions. like, let's say rows where `calories` aregreater than 50, `or heart_rate` is above 100. and that helps me find the most vigorous exercises. now we can see that dancing and biking areboth good, and actually, i do like biking, it's justa bit harder. now, you can have as many `or` and `and` operators
in your query as you'd like. the `and` operator has precedence over the`or`, if you've got both of them in the same samequery, but you can always use parentheses to changethe order of evaluation, just like with math expressions. now i want you to try these out in the nextchallenge, and stay tuned for the next talk-through,where i'm going to show how to do some interesting things using this`type` column here. see you then!
i'm back with my table of exercise logs, andi've added a few more rows. now, if i wanted to fliter these logs to justshow my biking logs, i could just add `where type = "biking";`. simple. it's getting sunny outside, so actually whati want to find is all of the outdoor activities, not justbiking. so to do that, i can use the `or` operatorthat we just learned, checking each of the different outdoor types. `type = "hiking" or type = "tree-climbing"or type="rowing";`
unfortunately, i don't yet have a mountain,or a tree, or a lake in my home, so all of these are outdoor activities. all right, so that worked, but there is actuallya simpler way to do this query, and that's using the `in` operator. the `in` operator will check to see if a particularvalue is in a list of values. let me show you. so we'll take this query, and i'll paste it, and then i'm going to replace this equalssign with `in`, and then put a parenthesis,
and then i'm just going to separate each ofthese strings with a comma, instead of that whole long `or type = "blah";`. ta-da! same results, like we expected, butthis query is easier to read and it's a bit shorter, too. we could easily also do the inverse query, if we just want to see the indoor activities. we just write`not in`, ha. so that's just dancing, and i'll save thatfor the wintertime. let's change it back to `in`, and our outdooractivities,
and i'm going to show you something a littlemore interesting that we can do with `in`. first, i'm going to need another table, ofdoctor recommended activities. i'll paste in the sql for that-- let's see,where do we have it? here. so, here we've created a table of doctor recommendedactivities which just has a `type`, which matches ourtype from up here, and a `reason`, which is why the doctor recommendedit. now, what if i wanted to see all of my exerciselogs
that correspond to doctor recommended activities? well first, i might want to see, what arethose doctor recommended activities? so, `select type from drs_favorites;`. okay, biking and hiking. so, i could just take this query, and modify it, and just shorten it to bikingand hiking. however, if the `drs_favorites` tables changed-- like, there was a new row added, or a rowdeleted-- then this query would be out of date.
what i want is for this query to always beup to date with what's in this table here. so what i can do is use the `in` operatordirectly with the results of a sql query-- a `select` query. and it's actually very simple to do. i'll replace what's in the parentheses withthe query that we did before. ♪ ta-ta! ♪ cool! so, we call this inner query a subquery.
it's the query that we're doing inside theouter query. and now, this query will always pull basedon whatever's in the doctor's table at the time,it will stay up to date. and this is a pretty simple subquery, butit could actually get really complex. it could be as complex as any of the queriesthat we've learned so far. let me show you an example. what if we only want to select the favorites that the doctor recommended for cardiovascularreasons? so we could say, `where reason = `
and then we could paste this whole long reasonin here, and we see hiking. but what if for some reason we didn't havethe period in here? well now we actually see nothing for thatquery. because it's trying to do an exact match,and it can't exactly match that. so there are times like this when we wantto do an inexact match. and we can do that with the `like` operator, which is a pretty neat operator. and so to do that, we're then just going totake this
and i'm going to replace the equals sign with`like` and get rid of-- because all we care about is that very importantword, cardiovascular, that is what we're looking for. so we say `"cardiovascular"`, and we alsoput a percentage sign on either side, which acts as a wildcard. so this should match any rows which contains the word cardiovascular anywhere--and it did, yay! okay, so that's pretty cool. whew! we covered a lot, `in`, subqueries,`like`.
hopefully, you're pumped to try them out inthe next challenge, or, take a break, go climb a tree, and comeback. either way i hope you have fun with it. let's continue with more queries of my exerciselogs. let's say that i want to see how many caloriesi've burned for each type of activity. we can do that with an aggregate query. say `select type,` and then `sum(calories)from exercise_logs`, and then we're going to `group by type;`.
now i can see each type with how many caloriesi've burned total for that type of exercise. now, do you notice how the column shows upas `sum(calories)` in the results? if we want, we can actually tell sql to givethat column a new name, just by writing `as` and then giving it thenew name, `total_calories`. that can make our results easier to read andit also has another benefit, which we'll use later. so next, i want to filter the results, to only show activities where i burned morethan 150 calories total,
across all the times i've done that exercise. it turns out that 150 calories is the amountin one ounce of dark chocolate, so it seems like a good amount to shoot for. now, my first instinct might be using `where`to filter. like if i took this query here, and then said`where calories > 150`. then we only see dancing, and we don't seebiking, which we'd expect to see. the reason we only see dancing is becauseit filters each individual row that comes in.
and dancing is the only exercise where i burnedmore than 150 calories in a single log. but what i want to know is which exercisetype or types have i burned more than 150 calories across all of the logsfor that exercise type. we actually have to use something new forthat, which is the `having` clause. to use that, we'll remove this `where`, and then after the `group by` we'll write`having` and then `total_calories > 150`. ta-da! now we see biking.
because now it's actually looking at the totalcalories, and i've indeed earned more than 150 overtime in my logs. so as you can see, those two queries are actuallydifferent, but they are easy to confuse. when we use `having`, we're applying the conditions to the grouped values, not the individualvalues in the individual rows. we could use any aggregate function on a groupedcolumn that makes sense, `sum`, `min`, `max`, `avg`, whatever we wantto check. let's do another example.
so if we want to see the average caloriesfor all the exercises where we burned, let's say, more than 50 averagetotal-- oh, that's all of them, so let's up it. a hundred, in the middle, still doing good,seventy-- cool. so you can see we can play around with thatquery there, get different results with that `having`. now let me do something a little different. let's say we want to see all the exercisewhere we logged at least two activities for that type of exercise.
for that we're going to use a `count` function. so, i'll `select type from exercise_logs`, `group by type`, and then `having count(*)`, doesn't matter what column, so we can justdo star, greater than or equal to two. okay, so now we can see biking, dancing, treeclimbing. i've done all of those at least two times. makes sense, because those are my three favoriteactivities in life-- besides coding, of course!
okay, so now you've seen `having`. remember that it's easy to confuse `having`with `where` when using these aggregate functions. so think through your results, and make surethey make sense. and now you get to try it yourself. let's try a few more advanced sql features. i've been logging heart rate for each of myexercises, and i haven't done anything interesting withthis data yet. so now it's time.
i did a little research on the internet, andi found that the maximum heart rate is 220 minus your age. so one thing i can do is query my logs tosee if my heart rate ever goes above max, since that would be prettyscary, or tell me that there's something wrong withmy heart rate checking device. okay, to do that, i'm going to do a `count(*)`. so `count(*) from exercise_logs` and then `where heart_rate` greater than 220minus my age, 30. woo hoo, there's no results, and that makesmy heart happy.
now, notice i'm using the subtraction operator? and in fact, i can use most math operatorsin sql, like to add, subtract, multiply, or divide. and if i need to, just like in math, i canthrow in parentheses and change the order of evaluation. all right, so, we've confirmed that my heartdoesn't go above max, that's good. now let's look to see if my heart gets intothe target heart rate zone. and that is 50 to 90 percent of max. so we're going to start with our `count(*)`,
and we just need to change our `where`. so this time we're going to say heart rateis greater than-- and let's look at 50 percent of max-- that's220 minus 30. okay, so you see i'm combining a few mathoperators here. and i'm going to also throw in the `round`function, just to show you that we can use that function, it's prettynifty. and then we're going to say `and`, and i'lljust copy-paste this, and change it to less than 90. let's go ahead and throw an equals in there.
all right, so now, we're checking to see 50to 90 percent of max and we see there are four logs that fall in that targetzone. but what about my other logs? what zones are they in? we don't know, all we know is that four ofthem are in that target zone, the rest could be-- maybe they're 90 to 100percent, maybe they're less than 50 percent. what i would really like to see is a summaryof all my logs and how many were in each of the heart ratezones.
now that sounds like a situation where i woulduse a `group by`, like `group by` this column that says whichheart rate zone this row is in. but i don't actually have a column like thatto group by. well, i can effectively create a column, usingthe `case` statement. now, i'll admit that i find this the trickieststatement to use in sql. so, i struggle with it a bit, and don't worry if you find it a bit odd aswell. it is similar to an `if` or a `switch` fromprogramming languages, if you're familiar with those.
otherwise, don't worry about it. all right, so, we're going to use `case`,but i'm just going to start with outputting what we know, which is `type` and `heart rate` `from exercise_logs`, and see what that looks like. okay, now for each of these rows, i want toadd a new column that says which zone the heart rate is in. to do that, i go into the `select` part ofthe query. and i'm going to write `case`, to begin the`case` statement,
then i type `when`, and the first condition which is checking if it's above max, so thatwas `heart_rate > 220 - 30`. then i write `then`, [laughs] and an expression,which is just a string. so we'll say above max. okay, so this is my first case. now i'm going to keep going with my otherconditions. so, `when heart_rate > 90`, `then` we'll say that it's above target. and then-- so if greater than 90-- and thenwe'll say, at that point,
if it's greater than 50, we can say it's withintarget. and everything else would be below target,so i could do another condition for that. but i can also just be lazy, or smart, howeveryou want to think of it, and say `else "below target"`. okay? now my conditions are all done, but i haven't told sql what to name this newcolumn. for that, i write `end as "hr_zone"`. so now, for each of the rows, i can see thenew column
with a nice description of what zone they'rein. now that we've done that finally, i can makea query that summarizes how many of my logs are in each of the zones. and that is a whole lot easier now that i'veset up this `case`. i'm just going to take this query here, copy it, and i'm going to `group by` the newcolumn we created. and then add `count(*)`, get rid of the heartrate, see what we have here-- aha! and now we see that we've got four below targetand four within target.
that's pretty neat, huh? that `case` statement, it is a tricky one, but it sure is handy once you get the hangof it. you may not use it all the time, but it'sgood to know-- just in case. i've set up two tables in this database-- a `students` table, with detailed informationabout each student, like their name and email. and a `student_grades` table,
which has their student id, test name, andgrade. let's see what happens if we just select everythingfrom `student_grades`. ta-da. okay, so we see student ids, test names, andgrades. i'd actually rather be able to see studentnames, emails, that sort of thing. now, what you might notice is that the studentid in `student_grades` is related to the id in `students`. so this 1 here is actually referring to peterrabbit. and this 2 here is actually referring to allisonwonderland.
now, i want to form a query that will output the student name and email next toeach test grade. and that information is in two different tables. so that means i need to join the tables. and there are a few ways that we can do thatin sql, and we'll try them all out actually. the simplest is called a cross join. and i can make that happen just by puttingboth table names after the `from`. ta-da!
so as you can see, lots of rows, and basicallywhat this did is that for each row in the first table, it created a row for each of these rows here. that means we end up with eight rows, becauseit creates four rows for each of these two rows. this is the simplest join, but it's also theleast useful. we don't want every row matched with everyother row. we only want them matched if the student idmatches. now we can do that using an inner join,
which isn't actually much more work at all, and is way more useful. so, there's actually a few ways of doing aninner join. we'll start with just building off the lastquery, and we can add a `where`. the `where` will check to see that the studentid matches students to id. all right, so now we actually have what wewanted-- is test grades next to names. because it's done the the cross join and thenjust limited to the rows
where those columns were the same. great! this syntax here is called an implicit innerjoin, and it works, but it's not actually the bestpractice. the better way is if we do something calledan explicit inner join. and that is using the `join` keyword. so to do that, we're going to start with justdoing from one table. and actually, i'll do from `students`, andthen `join` on `student_grades`. and then, instead of `where`, we do `on`.
and that specifies what columns are beingmatched. so `on students.id = students_grades.student_id;` all right, so now we have the same resultsas before-- switched, but really the same. now that i have these tables joined, i'm justgoing to whittle down the columns i'm outputting to the names, theemail, the test name, and the grade. okay, beautiful. now the cool thing is, once we've done joins, we can still use `where` and `group by`, andall those fancy things.
like, let's say that we want to filter downto results to just grades greater than 90. woo hoo! let's see-- oh, nope. uh, grade, grade-- there we go. a-ha, lots of ones above 90, now i can emailall those students with a high-five. okay, but what would happen if my tables bothcontained columns with the same column name but different meanings?
like right now, our student grades table hasa grade column, but what if my students table also had a gradecolumn for their overall class grade? that would mean down here, when we selectedgrade, it wouldn't know which table to pull it from, because there'd be a grade column in bothof our tables. so actually, to be super safe, we should prefixour columns with the table name that they're from. so we say `students.first_name, students.last_name,students.email,`
`student_grades.test, student_grades.grade`. okay, now we know that we're going to getthe columns from the table we expect them to be in. great. now that is our basic explicit inner join. and it is what you'll use for most of yourjoins across related tables. but if you stay tuned, i'll show you a fewother types of joins as well. i'm back with my related tables about students-- with the students, and the student grades.
and i added a new table to keep track of whatprojects students are working on. and once again this one has a `student_id`to relate it to the `students` table, just like `student_grades` had. and it also has a `title` for each project. now what i want is a list of student namesand the projects they're working on. so that means i need to join `student_projects`with `students`. we can do that, as you can probably guess, using an `inner join` that we just learned. so what do we want?
so we want `students.first_name`, we want`students.last_name`, we want the title of the projects from thestudent_projects table, and then we'll say `from students join studentprojects` and `on students.id = student_projects.student_id`. all right, so i see one project, peter, and his very promising carrotapault experiment. but where is my other student, alice? well we're missing alice because an `innerjoin` only creates rows if there are matching records in the two tables.
there's no row for alice, because there'sno row in `student_projects` that has alice's studentid in it. and this makes sense, and often times withjoins, we do only want rows where the records matched. but in this case, we want a comprehensivelist of every student and their project, and we want every student to be on that list, even if they don't have a project yet. and that is where an `outer join` is superuseful. and it's really easy to use.
we're just going to go right here and type`left outer`. now we see alice, and there's a big old `null`for the project title. okay, so how this works is that the `left`tells sql that it should make sure to retain every rowfrom the left table, which is the one after the `from`, `students`. and the `outer` tells it that it should retainthe rows even if there's no match in the right table, which is `student_projects`. and that's our outer join, and there's a lotof cases where you might find
you want an outer join. and just keep in mind the behavior of innerversus outer joins. there are also some other outer joins. there's a `right outer join`, and it basicallydoes the opposite, makes sure that it keeps everything from theright and then joins with the left. we don't actually support right outer joinshere, but if you want that, you can just switchthe table order and it's the same thing. so you don't actually need to have a `rightouter join`,
you can always use `left outer join`. and there's also something called a `fullouter join`, and that matches rows, if it can, on boththe left and the right side, and fills in `null`s when it can't on eitherside. and that's pretty interesting, but also isnot supported in our environment here. and that is one of the interesting thingsabout learning sql-- i'm showing you lots of things that work here, and work in other sql environments, but every environment is a little different,
so you'll constantly be tweaking the toolsin your sql toolbox for each new environment. and that's true about a lot of things withcomputers honestly. you'll learn the general techniques, and thenlearn how to adapt it for a specific language, or app, or environment. and you basically just get really good atlearning. so-- yay! [laugh] let's say that we want to create tables tostore the data for a diary app. a basic setup would be a users table,
and a diary logs table. now when a user wrote up their diary on thewebsite, we would have some code to execute an `insert`statement like what i'm pasting in here. so this inserts two rows for two diary logs,one day after the other-- first where somebody has a horrible fight, and buries their woes in three pounds of darkchocolate, and the next day, of course, they make up and celebrate with ice cream,as you do.
so, we can check to see what our `diary_logs`table looks like now. and we can see those two posts in there. now, imagine that the user comes back andwants to modify their logs. maybe because they don't want to admit, even to themselves, that they have eaten thatmuch dark chocolate.
now this is telling it to update that tableand set the content, but it's not telling which row to update. we need to tell the database which row toupdate, because we don't want it updating every rowwith that content, just one row. so, how do we find the log to update? well, we can use the id, like that. if we didn't know the id, we could also filter by columns that we knew were unique to thelog, like `user_id`, and the date, which was 2015-04-01.
but really, if you know the id, it's saferbecause you're not relying on having to filter on columns which could actuallybe the same across multiple rows if you might have a diaryapp where users are allowed to write multiple logs per day. it really depends on how the app and the tablesare designed. i'll go back to the id, since i know the id. now we can do another `select` to see thatour table changed. and it did. no more knowledge of those three pounds ofchocolate.
okay, so the user made that change, but thenthe user decides that they actually want to delete the entirelog entry. like, maybe because they want to forget thatthey even had that horrible fight. so in that case, we would need to have codeto execute a `delete`. so we can say `delete from diary_logs where`, and once again, i'm going to use the id becausei know it. and let's select again to see what it lookslike. ha-ha, so we only have one row left, whichmeans that our `delete` worked. now, you should be very careful whenever you'redoing `update` or `delete`,
because you want to make sure you're updating the actual rows that you intend toupdate. you don't want to update the wrong rows, or delete the wrong rows and lose that data. in fact, some apps never issue `delete`s,they never really delete rows. instead, they add, like, a `deleted` columnto their database, and then they'll do something like set `deleted`to `true` when the user wants to delete. and then they filter based on `deleted = false`in the `select` queries.
something like that. so that means that you never actually deletedata, which can be a little bit safer, but it's a little more complicated to do thatas well. so anyway, that's up to how you design theapp. so now, with `select` and `insert` and `update`and `delete`, you have all the commands you need to handlewhat a user would want out of your diary app. and now hopefully you can better imagine whichcommands are happening
behind the scenes when you use your favoriteapps every day. we're back with the database for our diaryapp. now, when we first created these tables, likethe `diary_logs` table, we decided to store user id, date, and content. and now i want you to imagine that a few months after we launch this diaryapp, a user designer suggests adding a how areyou feeling drop-down on each diary entry. and the user would say something like happy,or sad, or confused.
in order to store that, we need to add anemotion column to the table. now we could just add it like this. but if this was a database for an app in areal production environment, then we couldn't just edit the `create` statement because that's something that we actuallyissued months ago, and it's already happened. and if we re-run it, it'll actually removethe current data. and we don't want to lose all the data thatwe've gotten over these many months. so, at this point, after we've done `create`,
and after we've got data, we're going to need to use the `alter table`statement. and to use that, we write `alter table`, andthen the name of the table, and then `add`, and then the name of the columnname we're adding, and then the data type of that new column. great! so now, the next time that the usermakes an entry, we'll need to specify that column-- or wecan specify that column. so let's say they went to disneyland, and to specify that new column, we're goingto put `, emotion`,
and then here we'll just say happy-- mostpeople are happy to go there. okay, so, let's see what it looks like nowif we `select` it all. so can we see the latest row has happy inthe emotion column, and the row before it has that big old `null`in it. and that is what databases usually do, they fill in `null` for unspecified values. if we want, we can actually get it to fillin some other value for rows where it wasn't set, and that's what we think of as the defaultvalue for a column.
so to do that, if we had wanted to do that, we would have put `default` and then the value in that `alter table` statement when we ranit. so now you can see, instead of `null` we seeunknown. so that's an option for you when you're addinga column, and just something to keep in mind, depends what you're adding to the database. okay, so now let me show you something thatyou rarely do-- deleting an entire table.
maybe i would do this if i migrated all ofthe data out of the this table into some new table, or if i just got mad and decided to be a horribleperson, and take a rampage all over a database. that's bad. but anyway, we can do this using the `droptable` statement. and now, if i try to `select`, what happens? i actually get an error, because my tableno longer exists. so it can't select from it.
so that was a bad idea, and i'll just deleteit. you should think really carefully about droppingtables. do not just drop them willy-nilly. you should also be careful with `alter table`, especially if you're running it on a tablewith a lot of rows, because you could have performance issuesthere. really, i don't want to scare you, but youshould be careful whenever you're making any changes to yourdatabase. with great power comes great responsibility--
and now you have both. i'm back with just the table about studentsthis time. i've added one more column to the table, `buddy_id`. see, i decided that every student should havea buddy-- somebody that they can pair up with, do assignments with, study with, all of that. now the buddy id is what it sounds like, it's an id that relates to the id of the studentthat they're buddied up with. so for example, here, peter is buddied upwith 2,
and 2 is alice. let me just do a `select` that should makethat more that clear. so, `select id, first_name, last_name,buddy_id from students`. the interesting thing here is that this `buddy_id` is a column in a tablethat's related to another column in the same table. that's a bit different from before, where we'd have columns related across differenttables. so, what if we wanted to show the names ofeach student
next to the email of the buddy, not just theid, the email. to do that, we need to actually join the `students`table with itself, because the two bits of related data thatwe need both happen to be in the same table. we can do that by-- let's see, so we wantthe first name, last name, email-- so to join `students` with itself, we're goingto write `from students` and then `join students;`. hm, okay? all right, okay, so we get an error-- `first_name`is an ambiguous column.
and that's true, we're telling it to select`first_name`, but it's joined across two tables, and both of these tables have the same columnnames. and before, we would just prefix these withthe table names, but we can't do that here, because actually, they're both named students. so we need to give it a way to distinguishbetween this one and this one, even if they're kind of thesame thing. and we can do that by giving one of them analias.
so, we're going to write the alias right afterhere. the alias is like a nickname. so here's the alias, `buddies`. and then we just have to specify that theemail is coming from `buddies.email`. ha-ha, there we go. now, we're seeing too many rows, way too manyrows, and that's because we don't have an `on` clauseyet. so it's just joining every possible row withevery other one. now our `on` should check that the buddy idfrom the `students` table,
that first one, matches the id from the secondtable, `buddies`, or what we called `buddies`. ha-ha! now we can see peter with alice's email address, alice with peter's, et cetera, et cetera. to make this results table even a little moreclear, i'm going to rename `buddiesemail as buddyemail`. okay, so now it's a lot more clear that we're looking at peter rabbit and hisbuddy's email.
cool. now this type of join is called a self join. and that's because it's a table being joinedto itself. you can try this out in the next challenge, and i'll show you one more way to use selfjoins in the talk-through after. i hope you'll join me, if not, just join yourself.[laugh] now i want to show you a combination of joinsand self joins. so, this time i'm back with multiple tables tracking the students and their projects.
now, since i have students working on greatprojects, i want them to review each other's work inpairs. for example, hyena habitats should reviewcarpet physics. that's why i've created this new table, `project_pairs`. and all it stores is two project ids thatare paired with each other. now let's see, what do i see if i `select* from project_pairs`? ugh, okay, it's just a bunch of numbers, a bunch of ids, all the ids that are mappedwith each other. it is not the most human readable output atall.
what i'd like instead, is rows of projecttitles paired with each other. the project titles are in the `student_projects`table. so that gives me a clue that i need to somehow join `project_pairs`with `student_projects`. let's see what happens if i do that. [typing] a-ha, so now we see project ids and projecttitles in each row. what i eventually want, in the results table,
is to see two project titles in each row,not just one. so that's another clue that we need to doanother join, a self join of `student_projects` with itself, so that we can get those same column valuestwice. now it may not be obvious how to add a selfjoin at this point, because we already have a join. if we want to add another one, we can actuallyuse the join keyword again. so `join student_projects` and we need tomake sure we give them different aliases, so that they'renot ambiguous.
so `a` and `b`. all right, so now we've got a lot of stuffin this results table, but if you'll scroll, you'll see that theproject titles show up twice in each row. so maybe let's make it easier to see, `a.title,b.title`. but we still have way too many rows, because we're matching every row with everyother row, and we haven't added the `on` clause, thevery important `on` clause. so we're actually going to add two `on` clauses,
one for each of the joins. so, `on project_pairs.project1.id = a.id`, and then `on project_pairs.project2.id = b.id`. a-ha, there we go! so thanks to this combination of joins andself joins, we can see a nice list of project titles pairedwith their project title. now you can use as many joins in your queryas you need to get the data that you want. but generally, more joins means slower queries.
if you're interested in how to figure out if your queries are slow, and making themmore efficient, we'll talk about that soon. so, stay tuned, and keep at it!
0 komentar:
Posting Komentar