PERSONAL BLOG
Category: Programming

SQL window functions. Basic

Feb. 22, 2024

Pretty often SQL is used to calculate different metrics or to aggregate values and measurements. Alongside with aggregating functions there are windows functions, which are used widely. 

As an example we're going to use a small table with sudents' subjects & marks. This is how the table looks like in the DB

create table student_grades 
(
    name varchar,
    subject varchar,
    grade int
);

insert into student_grades
values
    ('Peter', 'english', 4),
    ('Peter', 'physics', 5),
    ('Peter', 'history', 4),
    ('Mary', 'math', 4),
    ('Mary', 'english', 4),
    ('Mary', 'physics', 5),
    ('Mary', 'history', 3);

select * from student_grades;
name subject grade
Peter english 4
Peter physics 5
Peter history 4
Mary math 4
Mary english 4
Mary physics 5
Mary history 3

 

Window function in SQL - is a function, which works with a set of rows (aka window, partition) and makes calculations for the set in a separate column.

Partitions (windows) - is a set of rows specified for a window function with a column(s) of the table. Paritions for every window function in a query can be divided into different table columns.

name subject grade
Peter english 4
Peter physics 5
Peter history 4
Mary math 4
Mary english 4
Mary physics 5
Mary history 3

Partitions by the name field

What is the main difference between window functions and aggregate grouping functions?

When using the aggregate functions the statement GROUP BY reduces the number of records in result by grouping them.

name subject grade
Peter english 4
Peter physics 5
Peter history 4
Mary math 4
Mary english 4
Mary physics 5
Mary history 3

name average grade
Peter 4.3
Mary 4

 

Using the window functions doesn't reduce the amount of records in comparison to the initial table.

 

name subject grade average grade
Peter english 4 4.3
Peter physics 5 4.3
Peter history 4 4.3
Mary math 4 4
Mary english 4 4
Mary physics 5 4
Mary history 3 4

 

The order of calculating window functions in an SQL query.

 

SELECT list of columns, window functions
FROM table, joint tables, subquery
WHERE filtering clause
GROUP  list of columns
HAVING aggregation filtering clause
ORDER BY list of columns / window functions
  1. First of all, the command for selecting tables, joining them, and possible subqueries under the FROM command is executed.
  2. Then goes WHERE filtration, grouping with GROUP BY and possible HAVING filter
  3. After that column selection command is applied alongside with the calculating of window functions
  4. At the end ORDER BY sorting is executed - here you can specify a window function column as well

It's important to remember that partitions of window functions are created after the table is grouped with GROUP BY command (in case if the last one participate in a query). 

 

Syntax of window functions

Syntax of window functions consists of identical commands (regardless of the function type). 

Function name  >  FUNCTION_NAME(column_name)
Filter expression (optional)  > [FILTER (WHERE filter_clause)]
Keyword definition  > OVER
    (
Declaring a partition by columns  >     PARTITION BY (column_names),
Sorting Window Function сalculation  >     ORDER BY (column_names)
Frame for the partition  >     [frame_clause]
    )

Window functions can be declared either inside the SELECT command or with a special keyword WINDOW, where the window gets an alias, which we can point to from the SELECT section.

SELECT name, subject, grade,
    row_number() over (partition by name order by grade desc),
    rank() over (partition by name order by grade desc),
    dense_rank() over (partition by name order by grade desc)
FROM student_grades;

-- is equal to

SELECT name, subject, grade,
    row_number() over name_grade,
    rank() over name_grade,
    dense_rank() over name_grade
FROM student_grades
WINDOW name_grade as (partition by name order by grade desc);

 

Categories of window functions

Aggregate Ranking Value
AVG() DENSE_RANK() FIRST_VALUE()
COUNT() NTILE() LAST_VALUE()
MAX() RANK() LAG()
MIN() ROW_NUMBER() LEAD()
SUM() CUME_DIST() NTH_VALUE()

Aggregate functions

SELETE name, subject, grade,
    sum(grade) over (partition by name) as sum_grade,
    avg(grade) over (partition by name) as avg_grade,
    count(grade) over (partition by name) as count_grade,
    min(grade) over (partition by name) as min_grade,
    max(grade) over (partition by name) as max_grade
FROM student_grades;

 

name subject grade sum_grade avg_grade count_grade min_grade max_grade
Peter english 4 13 4.3 3 4 5
Peter physics 5 13 4.3 3 4 5
Peter history 4 13 4.3 3 4 5
Mary math 4 16 4 4 3 5
Mary english 4 16 4 4 3 5
Mary physics 5 16 4 4 3 5
Mary history 3 16 4 4 3 5

 

Ranking functions

In ranking functions under the OVER keyword the condition ORDER BY is required.

  • ROW_NUMBER() - the function calculates a number of rows inside the partition regardless of duplicate values.
  • RANK() - the function calculates a rank of every row inside the partition. If there are duplicates, the function returns the same rank for every duplicate skipping the next rank number. 
  • DENSE_RANK() - the same as RANK, but doesn't skip the next rank number.

 

SELECT name, subject, grade,
    row_number() over (partition by name order by grade desc),
    rank() over (partition by name order by grade desc),
    dense_rank() over (partition by name order by grade desc)
FROM student_grades;

 

name subject grade row_number rank dense_rank
Peter physics 5 1 1 1
Peter english 4 2 2 2
Peter history 4 3 2 2
Mary physics 5 1 1 1
Mary math 4 2 2 2
Mary english 4 3 2 2
Mary history 3 4 4 3

 

NULL values are combined into the same partition

 

Value functions

Those are the functions, which go through the partition and returns border values of the partition or previous/next values.

  • LAG() - a function, which returns the previous value of the column according to the sort order.
  • LEAD() - a function, which returns the next value of the column according to the sort order.

 

A small example of how to get current grade and grades from next/previous semesters 

create table grades_quartal (
    name varchar,
    semester varchar,
    subject varchar,
    grade int
);

insert into grades_quartal (
values 
    ('Peter', '1 semester', 'physics', 4),
    ('Peter', '2 semester', 'physics', 3),
    ('Peter', '3 semester', 'physics', 4),
    ('Peter', '4 semester', 'physics', 5)
);

select * from grades_quartal;
name semester subject grade
Peter 1 semester physics 4
Peter 2 semester physics 3
Peter 3 semester physics 4
Peter 4 semester physics 5

 

SELECT name, quartal, subject, grade, 
    lag(grade) over (order by semester) as previous_grade,
    lead(grade) over (order by semester) as next_grade
FROM grades_quartal;

 

name semester subject grade previous_grade next_grade
Peter 1 semester physics 4 NULL 3
Peter 2 semester physics 3 4 4
Peter 3 semester physics 4 3 5
Peter 4 semester physics 5 4 NULL


FIRST_VALUE()/LAST_VALUE() - functions, which returns first or last values of the column of the specified partition. ORDER BY condition is required to specify next to OVER