Lec 1 Relational Model

Terminology

  • A database management system (DBMS) is software that allows applications to store and analyze information in a database, and it should support the definition. creation, querying, update, and administration of databases in accordance with some data model.’

  • A data model is a collection of concepts for describing the data in a database. (e.g. relationship or documentation)

  • A schema is a description of a particular collection of data, using a given data model. (e.g. table structure design)

  • A relation is an unordered set that contain the relationship of attributes that represent entities. (e.g. table or container)

  • A tuple is a set of attribute values (aka its domain) in the relation. (e.g. data entry in the table)

DBMS can auto-generation unique primary keys via an identity column:

  • IDENTITY (SQL Standard)

  • SEQUENCE (PostgreSQL / Oracle) Globally, others are bind with column

  • AUTO_INCREMENT (MySQL)

Relational Algebra

image-20260115225646566

Lec 2 Modern SQL

SQL is based on bags (duplicates, like multisite) not sets (no duplicates).

Terminologies:

  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL)

Execution order:

1
2
3
4
5
6
7
8
FROM / JOIN
→ WHERE
→ GROUP BY
→ 聚合函数(AVG / COUNT / …)
→ HAVING
→ SELECT
→ ORDER BY
→ LIMIT

Aggregates

Functions that return a single value from a bag of tuples:

  • AVG(col)→ Return the average col value.
  • MIN(col)→ Return minimum col value.
  • MAX(col)→ Return maximum col value.
  • SUM(col)→ Return sum of values in col.
  • COUNT(col)→ Return # of values for col.

COUNT(*) counts rows, COUNT(1) counts a constant value per row (which is never NULL), so they are equivalent; only COUNT(column) can differ because it ignores NULLs.

image-20260117180637261

In the above picture, the AVG(s.gpa) is only a single value, but e.cid may have multiple values, so the DB don’t know how to choose a value for the column, we should use ANY_VALUE() to explicitly tell DB how to choose.

Group By

Project tuples into subsets and calculate aggregates against each subset

Group by happlens after FROM/JOIN/WHERE but before the aggregate function.

Grouping Sets

Specify multiple groupings in a single query instead of using UNION ALL to combine the results of several individual GROUP BY queries.

Filter

Qualify results pre aggregation computation. Aggregation group membership qualifier.

image-20260117182334070

Having

Filters results post-aggregation computation.

image-20260117182730554

This indicats that having happens before the select, so the name avg_gpa is not generated yet.

String Operations

  • LIKE provides string matching with special match operators:
    • ‘%’ Matches any substring (including empty strings)
    • ’_’ Match any one character
  • SIMILAR TO allows for regular expression matching
    • In the SQL standard but not all systems support it
    • Other systems also support POSIX-style regular expressions
  • SUBSTRING
  • UPPER
  • || : concatenate two or more strings(this could also be + or CONCAT() function)

Date/Time Operations

Date/time operations manipulate temporal values and are DBMS-specific in syntax and behavior.

Output Control

Output control clauses determine the order and the number of rows returned by a query.

  • ORDER BY: ORDER BY column [ASC | DESC] Sort tuples by the values in one or more of their columns
  • FETCH: FETCH {FIRST|NEXT} <#> ROWS OFFSET <#> ROWS
    • Limit # of tuples returned in output
    • Can set an offset to return a “range”

image-20260117184442310

Output Redirection

Store query results in another table:

  • Table must not already be defined
  • Table will have the same # of columns with the same types as the input

image-20260117184804759

Nested Queries

Invoke a query inside of another query to compose more complex computations:

Innter queries can appear almost anywhere in query

  • ALL: The expression must be true for all rows in the sub-query.
  • ANY: The expression must be true for at least one row in the sub-query.
  • IN: Equivalent to ‘=ANY()’ .
  • EXISTS: At least one row is returned without comparing it to an attribute in outer query

Lateral Joins

The lateral operator allows a nested query to reference attributes in other nested queries that preced it (according to postition in the query)

image-20260117211328677

Common Table Expression

Specify a temporary result set that can then be referenced by another part of that query.

image-20260117211619436

Window Functions

Performs a calculation across a set of tuples that are related to current tuple, without collapsing them into a single output tuple, to support running totals, ranks, and moving averages.

Special window functions:

  • ROW_NUMBER() # of the current row
  • RANK() Order position of the current

The OVER keyword specifies how to group together tuples when computing the window function.

Use PARTITION BY to specify group