Lec 1 Relational Model

Terminology

A database management system (DBMS) is software that allows applications to store and analyze information in a database, and it should support the definition. creation, querying, update, and administration of databases in accordance with some data model.’
A data model is a collection of concepts for describing the data in a database. (e.g. relationship or documentation)
A schema is a description of a particular collection of data, using a given data model. (e.g. table structure design)
A relation is an unordered set that contain the relationship of attributes that represent entities. (e.g. table or container)
A tuple is a set of attribute values (aka its domain) in the relation. (e.g. data entry in the table)

DBMS can auto-generation unique primary keys via an identity column:

IDENTITY (SQL Standard)

SEQUENCE (PostgreSQL / Oracle) Globally, others are bind with column

AUTO_INCREMENT (MySQL)

Relational Algebra

Lec 2 Modern SQL

SQL is based on bags (duplicates, like multisite) not sets (no duplicates).

Terminologies:

Data Manipulation Language (DML)

Data Definition Language (DDL)

Data Control Language (DCL)

Execution order:

FROM / JOIN
→ WHERE
→ GROUP BY
→ 聚合函数（AVG / COUNT / …）
→ HAVING
→ SELECT
→ ORDER BY
→ LIMIT

Aggregates

Functions that return a single value from a bag of tuples:

AVG(col)→ Return the average col value.
MIN(col)→ Return minimum col value.
MAX(col)→ Return maximum col value.
SUM(col)→ Return sum of values in col.
COUNT(col)→ Return # of values for col.

COUNT(*) counts rows, COUNT(1) counts a constant value per row (which is never NULL), so they are equivalent; only COUNT(column) can differ because it ignores NULLs.

In the above picture, the AVG(s.gpa) is only a single value, but e.cid may have multiple values, so the DB don’t know how to choose a value for the column, we should use ANY_VALUE() to explicitly tell DB how to choose.

Group By

Project tuples into subsets and calculate aggregates against each subset

Group by happlens after FROM/JOIN/WHERE but before the aggregate function.

Grouping Sets

Specify multiple groupings in a single query instead of using UNION ALL to combine the results of several individual GROUP BY queries.

Filter

Qualify results pre aggregation computation. Aggregation group membership qualifier.

Having

Filters results post-aggregation computation.

This indicats that having happens before the select, so the name avg_gpa is not generated yet.

String Operations

LIKE provides string matching with special match operators:
- ‘%’ Matches any substring (including empty strings)
- ’_’ Match any one character
SIMILAR TO allows for regular expression matching
- In the SQL standard but not all systems support it
- Other systems also support POSIX-style regular expressions
SUBSTRING
UPPER
|| : concatenate two or more strings(this could also be + or CONCAT() function)

Date/Time Operations

Date/time operations manipulate temporal values and are DBMS-specific in syntax and behavior.

Output Control

Output control clauses determine the order and the number of rows returned by a query.

ORDER BY: ORDER BY column [ASC | DESC] Sort tuples by the values in one or more of their columns
FETCH: FETCH {FIRST|NEXT} <#> ROWS OFFSET <#> ROWS
- Limit # of tuples returned in output
- Can set an offset to return a “range”

Output Redirection

Store query results in another table:

Table must not already be defined
Table will have the same # of columns with the same types as the input

Nested Queries

Invoke a query inside of another query to compose more complex computations:

Innter queries can appear almost anywhere in query

ALL: The expression must be true for all rows in the sub-query.
ANY: The expression must be true for at least one row in the sub-query.
IN: Equivalent to ‘=ANY()’ .
EXISTS: At least one row is returned without comparing it to an attribute in outer query

Lateral Joins

The lateral operator allows a nested query to reference attributes in other nested queries that preced it (according to postition in the query)

Common Table Expression

Specify a temporary result set that can then be referenced by another part of that query.

Window Functions

Performs a calculation across a set of tuples that are related to current tuple, without collapsing them into a single output tuple, to support running totals, ranks, and moving averages.

Special window functions:

ROW_NUMBER() # of the current row
RANK() Order position of the current

The OVER keyword specifies how to group together tuples when computing the window function.

Use PARTITION BY to specify group