Good methods or tutorials for understanding JOINs in SQL [closed]

As we know that there are different types of JOINs in any RDBMS, for eg:- Left join, outer join, inner join, left outer join etc. Invariably we use JOINs for lots of our business logics in our projects. However very few people have complete understanding or mastery over these JOINs. Usually people with half known knowledge on JOINs tend to get confused with complex queries. My area of interest would be mostly related to SQL Server since we are using that in almost all our projects.

I have gone through some nice articles on JOINs but most of them tend to confuse rather than convince. So I wanted to know if there are any good ways/methods or any good tutorials for getting a good grasp on JOINs in SQL. Tutorials should be simple enough as well as help in getting our basics on JOINs stronger.

Thanks in advance.

I find this link visually displaying the different joins in Venn diagrams is pretty succinct.

...Left join, outer join, inner join, left outer join...

LEFT JOIN and LEFT OUTER JOIN are the same; most databases just allow you to omit the OUTER keyword.

here is a generic script you can use in sql server. just amend the start and end dates:

if exists (select * from information_schema.tables where table_name = 'calendar' and table_type = 'base table')
begin
drop table [calendar]
end

create table [calendar]
(
    [calendardate] datetime
)

declare @startdate datetime
declare @enddate datetime
set @startdate = getdate()
set @enddate = dateadd(d, 365, @startdate)

while @startdate <= @enddate
      begin
             insert into [calendar]
             (
                   calendardate
             )
             select
                   @startdate

             set @startdate = dateadd(dd, 1, @startdate)
      end

if you want a more advanced calendar here is one i found on the net a while ago:

create schema auxiliary
-- we put our auxiliary tables and stuff in a separate schema
-- one of the great new things in sql server 2005
go

create function auxiliary.computus
-- computus (latin for computation) is the calculation of the date of
-- easter in the christian calendar
-- http://en.wikipedia.org/wiki/computus
-- i'm using the meeus/jones/butcher gregorian algorithm
(
    @y int -- the year we are calculating easter sunday for
)
returns datetime
as
begin
declare
@a int,
@b int,
@c int,
@d int,
@e int,
@f int,
@g int,
@h int,
@i int,
@k int,
@l int,
@m int

set @a = @y % 19
set @b = @y / 100
set @c = @y % 100
set @d = @b / 4
set @e = @b % 4
set @f = (@b + 8) / 25
set @g = (@b - @f + 1) / 3
set @h = (19 * @a + @b - @d - @g + 15) % 30
set @i = @c / 4
set @k = @c % 4
set @l = (32 + 2 * @e + 2 * @i - @h - @k) % 7
set @m = (@a + 11 * @h + 22 * @l) / 451
return(dateadd(month, ((@h + @l - 7 * @m + 114) / 31)-1, cast(cast(@y as varchar) as datetime)) + ((@h + @l - 7 * @m + 114) % 31))
end
go


create table [auxiliary].[calendar] (
-- this is the calendar table
  [date] datetime not null,
  [year] int not null,
  [quarter] int not null,
  [month] int not null,
  [week] int not null,
  [day] int not null,
  [dayofyear] int not null,
  [weekday] int not null,
  [fiscal_year] int not null,
  [fiscal_quarter] int not null,
  [fiscal_month] int not null,
  [kindofday] varchar(10) not null,
  [description] varchar(50) null,
  primary key clustered ([date])
)
go

alter table [auxiliary].[calendar]
-- in celkoish style i'm manic about constraints (never use em ;-))
-- http://www.celko.com/

add constraint [calendar_ck] check (  ([year] > 1900)
and ([quarter] between 1 and 4)
and ([month] between 1 and 12)
and ([week]  between 1 and 53)
and ([day] between 1 and 31)
and ([dayofyear] between 1 and 366)
and ([weekday] between 1 and 7)
and ([fiscal_year] > 1900)
and ([fiscal_quarter] between 1 and 4)
and ([fiscal_month] between 1 and 12)
and ([kindofday] in ('holiday', 'saturday', 'sunday', 'bankday')))
go




set datefirst 1;
-- i want my table to contain datedata acording to iso 8601
-- http://en.wikipedia.org/wiki/iso_8601
-- thus first day of a week is monday
with dates(date)
-- a recursive cte that produce all dates between 1999 and 2020-12-31
as
(
select cast('1999' as datetime) date -- sql server supports the iso 8601 format so this is an unambigious shortcut for 1999-01-01
union all                            -- http://msdn2.microsoft.com/en-us/library/ms190977.aspx
select (date + 1) as date
from dates
where
date < cast('2021' as datetime) -1
),

datesandthursdayinweek(date, thursday)
-- the weeks can be found by counting the thursdays in a year so we find
-- the thursday in the week for a particular date
as
(
select
date,
case datepart(weekday,date)
when 1 then date + 3
when 2 then date + 2
when 3 then date + 1
when 4 then date
when 5 then date - 1
when 6 then date - 2
when 7 then date - 3
end as thursday
from dates
),

weeks(week, thursday)
-- now we produce the weeknumers for the thursdays
-- row_number is new to sql server 2005
as
(
select row_number() over(partition by year(date) order by date) week, thursday
from datesandthursdayinweek
where datepart(weekday,date) = 4
)
insert into auxiliary.calendar
select
d.date,
year(d.date) as year,
datepart(quarter, d.date) as quarter,
month(d.date) as month,
w.week,
day(d.date) as day,
datepart(dayofyear, d.date) as dayofyear,
datepart(weekday, d.date) as weekday,

-- fiscal year may be different to the actual year in norway the are the same
-- http://en.wikipedia.org/wiki/fiscal_year
year(d.date) as fiscal_year,
datepart(quarter, d.date) as fiscal_quarter,
month(d.date) as fiscal_month,

case
-- holidays in norway
-- for other countries and states: wikipedia - list of holidays by country
-- http://en.wikipedia.org/wiki/list_of_holidays_by_country
    when (datepart(dayofyear, d.date) = 1)          -- new year's day
    or (d.date = auxiliary.computus(year(date))-7)  -- palm sunday
    or (d.date = auxiliary.computus(year(date))-3)  -- maundy thursday
    or (d.date = auxiliary.computus(year(date))-2)  -- good friday
    or (d.date = auxiliary.computus(year(date)))    -- easter sunday
    or (d.date = auxiliary.computus(year(date))+39) -- ascension day
    or (d.date = auxiliary.computus(year(date))+49) -- pentecost
    or (d.date = auxiliary.computus(year(date))+50) -- whitmonday
    or (month(d.date) = 5 and day(d.date) = 1)      -- labour day
    or (month(d.date) = 5 and day(d.date) = 17)     -- constitution day
    or (month(d.date) = 12 and day(d.date) = 25)    -- cristmas day
    or (month(d.date) = 12 and day(d.date) = 26)    -- boxing day
    then 'holiday'
    when datepart(weekday, d.date) = 6 then 'saturday'
    when datepart(weekday, d.date) = 7 then 'sunday'
    else 'bankday'
end kindofday,
case
-- description of holidays in norway
    when (datepart(dayofyear, d.date) = 1)            then 'new year''s day'
    when (d.date = auxiliary.computus(year(date))-7)  then 'palm sunday'
    when (d.date = auxiliary.computus(year(date))-3)  then 'maundy thursday'
    when (d.date = auxiliary.computus(year(date))-2)  then 'good friday'
    when (d.date = auxiliary.computus(year(date)))    then 'easter sunday'
    when (d.date = auxiliary.computus(year(date))+39) then 'ascension day'
    when (d.date = auxiliary.computus(year(date))+49) then 'pentecost'
    when (d.date = auxiliary.computus(year(date))+50) then 'whitmonday'
    when (month(d.date) = 5 and day(d.date) = 1)      then 'labour day'
    when (month(d.date) = 5 and day(d.date) = 17)     then 'constitution day'
    when (month(d.date) = 12 and day(d.date) = 25)    then 'cristmas day'
    when (month(d.date) = 12 and day(d.date) = 26)    then 'boxing day'
end description

from datesandthursdayinweek d
-- this join is for getting the week into the result set
     inner join weeks w
     on d.thursday = w.thursday

option(maxrecursion 0)
go

create function auxiliary.numbers
(
@afrom int,
@ato int,
@aincrement int
)
returns @retnumbers table
(
[number] int primary key not null
)
as
begin
with numbers(n)
as
(
select @afrom as n
union all
select (n + @aincrement) as n
from numbers
where
n < @ato
)
insert @retnumbers
select n from numbers
option(maxrecursion 0)
return;
end
go

create function auxiliary.inumbers
(
@afrom int,
@ato int,
@aincrement int
)
returns table
as
return(
with numbers(n)
as
(
select @afrom as n
union all
select (n + @aincrement) as n
from numbers
where
n < @ato
)
select n as number from numbers
)
go

yes, i do use all three of those joins, although i tend to stick to using just left (outer) joins instead of inter-mixing left and right joins. i also use full outer joins and cross joins.

in summary, an inner join restricts the resultset only to those records satisfied by the join condition. consider the following tables

edit: i've renamed the table names and prefix them with @ so that table variables can be used for anyone reading this answer and wanting to experiment.

if you'd also like to experiment with this in the browser, i've set this all up on sql fiddle too;

@table1

id | name
---------
1  | one
2  | two
3  | three
4  | four

@table2

id | name
---------
1  | partridge
2  | turtle doves
3  | french hens
5  | gold rings

sql code

declare @table1 table (id int primary key clustered, [name] varchar(25))

insert into @table1 values(1, 'one');
insert into @table1 values(2, 'two');
insert into @table1 values(3, 'three');
insert into @table1 values(4, 'four');

declare @table2 table (id int primary key clustered, [name] varchar(25))

insert into @table2 values(1, 'partridge');
insert into @table2 values(2, 'turtle doves');
insert into @table2 values(3, 'french hens');
insert into @table2 values(5, 'gold rings');

an inner join sql statement, joined on the id field

select 
    t1.id,
    t1.name,
    t2.name
from
    @table1 t1
inner join
    @table2 t2
    on 
        t1.id = t2.id

results in

id | name | name
----------------
1  | one  | partridge
2  | two  | turtle doves
3  | three| french hens

a left join will return a resultset with all records from the table on the left hand side of the join (if you were to write out the statement as a one liner, the table that appears first) and fields from the table on the right side of the join that match the join expression and are included in the select clause. missing details will be populated with null

select 
    t1.id,
    t1.name,
    t2.name
from
    @table1 t1
left join
    @table2 t2
    on 
        t1.id = t2.id

results in

id | name | name
----------------
1  | one  | partridge
2  | two  | turtle doves
3  | three| french hens
4  | four | null

a right join is the same logic as a left join but will return all records from the right-hand side of the join and fields from the left side that match the join expression and are included in the select clause.

select 
    t1.id,
    t1.name,
    t2.name
from
    @table1 t1
right join
    @table2 t2
    on 
        t1.id = t2.id

results in

id | name | name
----------------
1  | one  | partridge
2  | two  | turtle doves
3  | three| french hens
null| null| gold rings

of course, there is also the full outer join, which includes records from both joined tables and populates any missing details with null.

select 
    t1.id,
    t1.name,
    t2.name
from
    @table1 t1
full outer join
    @table2 t2
    on 
        t1.id = t2.id

results in

id | name | name
----------------
1  | one  | partridge
2  | two  | turtle doves
3  | three| french hens
4  | four | null
null| null| gold rings

and a cross join (also known as a cartesian product), which is simply the product of cross applying fields in the select statement from one table with the fields in the select statement from the other table. notice that there is no join expression in a cross join

select 
    t1.id,
    t1.name,
    t2.name
from
    @table1 t1
cross join
    @table2 t2

results in

id | name  | name
------------------
1  | one   | partridge
2  | two   | partridge
3  | three | partridge
4  | four  | partridge
1  | one   | turtle doves
2  | two   | turtle doves
3  | three | turtle doves
4  | four  | turtle doves
1  | one   | french hens
2  | two   | french hens
3  | three | french hens
4  | four  | french hens
1  | one   | gold rings
2  | two   | gold rings
3  | three | gold rings
4  | four  | gold rings

edit:

imagine there is now a table3

@table3

id | name
---------
2  | prime 1
3  | prime 2
5  | prime 3

the sql code

declare @table3 table (id int primary key clustered, [name] varchar(25))

insert into @table3 values(2, 'prime 1');
insert into @table3 values(3, 'prime 2');
insert into @table3 values(5, 'prime 3');

now all three tables joined with inner joins

select 
    t1.id,
    t1.name,
    t2.name,
    t3.name
from
    @table1 t1
inner join
    @table2 t2
    on 
        t1.id = t2.id
inner join
    @table3 t3
    on 
        t1.id = t3.id

results in

id | name | name         | name
-------------------------------
2  | two  | turtle doves | prime 1
3  | three| french hens  | prime 2

it might help to understand this result by thinking that records with id 2 and 3 are the only ones common to all 3 tables and are also the field we are joining each table on.

now all three with left joins

select 
    t1.id,
    t1.name,
    t2.name,
    t3.name
from
    @table1 t1
left join
    @table2 t2
    on 
        t1.id = t2.id
left join
    @table3 t3
    on 
        t1.id = t3.id

results in

id | name | name         | name
-------------------------------
1  | one  | partridge    | null
2  | two  | turtle doves | prime 1
3  | three| french hens  | prime 2
4  | four | null         | null

joel's answer is a good explanation for explaining this resultset (table1 is the base/origin table).

now with a inner join and a left join

select 
    t1.id,
    t1.name,
    t2.name,
    t3.name
from
    @table1 t1
inner join
    @table2 t2
    on 
        t1.id = t2.id
left join
    @table3 t3
    on 
        t1.id = t3.id

results in

id | name | name         | name
-------------------------------
1  | one  | partridge    | null
2  | two  | turtle doves | prime 1
3  | three| french hens  | prime 2

although we do not know the order in which the query optimiser will perform the operations, we will look at this query from top to bottom to understand the resultset. the inner join on ids between table1 and table2 will restrict the resultset to only those records satisfied by the join condition i.e. the three rows that we saw in the very first example. this temporary resultset will then be left joined to table3 on ids between table1 and tables; there are records in table3 with id 2 and 3, but not id 1, so t3.name field will have details in for 2 and 3 but not 1.

i figured it out thanks to group by clause is not working with image all i had to do was add alter the image column wording by casting it to varbinary like this:

select 
    "am-martin".dbo.cpcore_site.number,"am-martin".dbo.cpcore_site.latitude,"am-martin".dbo.cpcore_site.longitude,"am-martin".dbo.cpsm_face.retiredon,**cast("am-martin_bin".dbo.cpcore_image.bytes as varbinary)**, "am-martin".dbo.cpcore_site.name, "am-martin".dbo.cpcore_site.zipcode
from
    "am-martin".dbo.cpcore_site

inner join "am-martin".dbo.cpsm_face on "am-martin".dbo.cpsm_face.siteid = "am-martin".dbo.cpcore_site.oid
inner join "am-martin_bin".dbo.cpcore_image on "am-martin".dbo.cpsm_face.oid = "am-martin_bin".dbo.cpcore_image.ownerid

where
    "am-martin".dbo.cpsm_face.retiredon like '%9999%'

     and "am-martin".dbo.cpcore_site.number like n'%la%' or "am-martin".dbo.cpcore_site.number like n'%lc%' or "am-martin".dbo.cpcore_site.number like n'%bh%'


     and "am-martin".dbo.cpcore_site.latitude > 0.0

group by "am-martin".dbo.cpcore_site.number,"am-martin".dbo.cpcore_site.latitude,"am-martin".dbo.cpcore_site.longitude,"am-martin".dbo.cpsm_face.retiredon,**cast("am-martin_bin".dbo.cpcore_image.bytes as varbinary)**, "am-martin".dbo.cpcore_site.name, "am-martin".dbo.cpcore_site.zipcode;

i also have to give @myonara credit for getting me on the right track with the group by suggestion.

you can use checksum_agg. it only takes a single argument, so you could do checksum_agg(checksum(*)) - but this doesn't work for your xml datatype, so you'll have to resort to dynamic sql.

you could generate dynamically the column list from information_schema.columns and then insert int into a template:

declare @schema_name nvarchar(max) = 'myschemaname';
declare @table_name nvarchar(max) = 'mytablename';
declare @column_list nvarchar(max);

select @column_list = coalesce(@column_list + ', ', '')
        + /* put your casting here from xml, text, etc columns */ quotename(column_name)
from    information_schema.columns
where   table_name = @table_name
    and table_schema = @schema_name

declare @template as varchar(max)
set @template = 'select checksum_agg(checksum({@column_list})) from {@schema_name}.{@table_name}'

declare @sql as varchar(max)
set @sql = replace(replace(replace(@template,
    '{@column_list}', @column_list),
    '{@schema_name}', @schema_name),
    '{@table_name}', @table_name)

exec ( @sql )

Tags: Sql Sql Server Database Join