CS3321 Database System Technology
This is the note for CS3321 Database System Technology (数据库技术), a course offered to third-year Computer Science undergraduates at SJTU. It focuses on the mathematical principles of relational normalization and database design theory rather than the practical use of databases. If you are more interested in the practical side, such as writing SQL queries and building database applications, you may consider NIS3351 Database Principle and Security (unavailable) instead. Ironically, NIS3351 has nothing to do with database security.
Some pictures and content in this note are taken from the slides, which can be found at the course homepage or the homepage of Qiang Yin, the lecturer.
Introduction
Database, DBMS and Database System
A database is an organized collection of inter-related data that models some aspects of the real world.
A Database Management System (DBMS) is software that facilitates the creation and use of databases.
A database system is the combination of a database and a DBMS.
DBMS is Complex
Example 1.1 How to find the common elements in two different lists? What if one of them is too large to fit in memory? What if both are?
Solution 1.1 Use a hash function to partition the records of both lists into groups small enough to fit in memory, then intersect the matching groups.
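The partitioning idea of Solution 1.1 can be sketched in Python. This is a toy in-memory illustration: the lists, the bucket count and the `common_elements` helper are ours, not from the course, and in a real DBMS the buckets would live on disk.

```python
def common_elements(xs, ys, num_partitions=8):
    """Find values present in both lists by hash-partitioning.

    Both lists are split into num_partitions buckets by the SAME hash
    function, so matching values always land in buckets with the same
    index.  Each pair of buckets is then small enough to fit in memory
    and can be intersected independently.
    """
    def partition(values):
        buckets = [set() for _ in range(num_partitions)]
        for v in values:
            buckets[hash(v) % num_partitions].add(v)
        return buckets

    bx, by = partition(xs), partition(ys)
    result = set()
    for i in range(num_partitions):
        result |= bx[i] & by[i]   # only bucket i of xs can match bucket i of ys
    return result

print(sorted(common_elements([1, 2, 3, 4, 5], [4, 5, 6, 7])))  # [4, 5]
```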
A DBMS must make no assumptions about its data.
- Numerous corner cases exist.
- A list can contain duplicates.
- A single duplicated value may exceed the size of main memory.
- There are multiple ways to process a query, but none of them efficiently covers all cases.
Relational Data Model
Key features for a relational data model: data integrity, implementation and durability.
A data model is a collection of concepts/tools for describing the data in a database.
- A database is a collection of relations, and each relation is an unordered set of tuples (or rows).
- Each relation has a set of attributes (or columns).
- Each attribute has a name and a domain, and each tuple has a value for each attribute of the relation.
- Values are atomic/scalar.
Schema and Instance
An unordered set of attributes forms a relational schema, which specifies the logical structure of the data.
An instance is concrete table content matching a certain schema.
The relationship between the two concepts is like the relationship between class and instance in C++.
Keys
Suppose $K$ is a set of attributes of relation schema $R$. $K$ is a superkey of $R$ if the values of $K$ suffice to uniquely identify a tuple of $R$.
A superkey $K$ is a candidate key if no proper subset of $K$ is also a superkey, i.e., $K$ is minimal.
A primary key is a designated candidate key of a relation.
A foreign key specifies that a tuple in one relation must map to a tuple in another relation.
The referenced attributes must form a primary key, and no dangling references from the foreign-key attributes are allowed.
Relational Algebra
Relational Algebra is a language for querying relational data based on fundamental relational operations. Each operation takes one or more relations as input and outputs a new relation.
Operations can be composed to make complex queries.
Selection
The selection operation selects the tuples that satisfy a given predicate.
The operation is denoted by $\sigma_p(R)$, where $p$ is the predicate and $R$ the input relation.
Comparison operators and logical connectives ($\wedge$, $\vee$, $\neg$) are allowed in the predicate.
Projection
The projection operation produces, from an input relation, an output relation that contains only a specified subset of the attributes.
The operation is denoted as $\Pi_{A_1,\dots,A_k}(R)$.
Some simple calculations can be done in a projection, for instance, $\Pi_{ID,\,salary/12}(instructor)$.
In this operation, duplicated output tuples are removed, since the input and output are both sets.
Removing duplicates is optional in specific variants of relational algebra; some variants (and SQL) use multisets (bags) instead of sets.
Cartesian product
The Cartesian product of two relations $R$ and $S$, denoted $R \times S$, pairs every tuple of $R$ with every tuple of $S$.
We include the relation name as a prefix in the attribute names of the product relation to avoid ambiguity, such as $R.A$ and $S.A$.
Join
The theta join operation combines a selection operation and a Cartesian-product operation into a single operation.
Theta join is denoted as $R \bowtie_\theta S = \sigma_\theta(R \times S)$.
The natural join of $R$ and $S$ is denoted as $R \bowtie S$.
The operation enforces equality between attributes that share the same name and retains one copy of each pair of identically named attributes.
If $R$ and $S$ have no attribute names in common, the natural join degenerates to the Cartesian product.
Union
The union of $R$ and $S$, denoted $R \cup S$, contains every tuple that appears in $R$, in $S$, or in both.
The two relations must have identical schemas.
Difference
The difference of $R$ and $S$, denoted $R - S$, contains the tuples that appear in $R$ but not in $S$; again, the two relations must have identical schemas.
Renaming
The renaming operation $\rho_x(R)$ gives relation $R$ the new name $x$.
Users should focus on describing the desired relations instead of optimizing query execution; relational algebra here serves as a descriptive language.
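To make the operators concrete, here is a toy Python sketch of selection, projection and natural join over relations represented as lists of dicts. The helper names and the `instructor`/`department` sample data are invented for illustration, following set semantics as above.

```python
def select(rel, pred):            # sigma_pred(rel)
    return [t for t in rel if pred(t)]

def project(rel, attrs):          # Pi_attrs(rel), duplicates removed (set semantics)
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

def natural_join(r, s):           # r |><| s: equality on identically named attributes
    shared = set(r[0]) & set(s[0]) if r and s else set()
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in shared)]

instructor = [{"ID": "10101", "name": "Srinivasan", "dept": "CS"},
              {"ID": "12121", "name": "Wu", "dept": "Finance"}]
department = [{"dept": "CS", "building": "Taylor"},
              {"dept": "Finance", "building": "Painter"}]

# Pi_{name,building}(sigma_{dept='CS'}(instructor |><| department))
cs = select(natural_join(instructor, department), lambda t: t["dept"] == "CS")
print(project(cs, ["name", "building"]))
# [{'name': 'Srinivasan', 'building': 'Taylor'}]
```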
SQL: Structured Query Language
SQL: Structured Query Language, pronounced as “S-Q-L” or “sequel”.
DDL: Data Definition Language
DML: Data Manipulation Language
- SQL is insensitive to case and white space;
- Everything from `--` to the end of the line is a comment and is ignored.
SQL DDL
Built-in Data Types
| SQL Data Types | Usage |
|---|---|
| `char(n)` | Fixed-length string of length $n$ |
| `varchar(n)` | Variable-length string with max length $n$ |
| `int`, `smallint` | Integer and small integer |
| `numeric(p,d)` | Fixed-point number (similar to `decimal`) with $p$ digits in total, $d$ of them after the decimal point |
| `real`, `double precision` | Single- and double-precision floating point numbers |
| `float(n)` | Floating point number with precision of at least $n$ bits |
int, smallint, real and double precision are machine-dependent types: their sizes depend on the machine.
Integrity Constraints
primary key (...): The attributes listed in the parentheses form the primary key of the relation.
foreign key (...) references ...: Values of the attributes listed in the parentheses must correspond to the primary key of the referenced table.
Referenced attributes must be a primary key and referencing attributes form a foreign key. No dangling pointers allowed.
SQL maintains referential integrity in 3 ways:
- Reject: SQL's default option for handling updates; any update that violates referential integrity is rejected.
- Set NULL: set the referencing attributes of all child records to NULL if the referenced record is deleted.
- CASCADE: propagate the change to all referring rows.

not null: NULL values are not allowed for the specified attributes.
Primary keys are not nullable.
```sql
CREATE TABLE instructor(
    ID          char(5),
    name        varchar(20) NOT NULL,
    dept_name   varchar(20),
    salary      numeric(8,2),
    PRIMARY KEY (ID),
    FOREIGN KEY (dept_name) REFERENCES department
);
```
UNIQUE Constraint
A UNIQUE (...) clause declares that the attributes listed in the parentheses form a superkey.
CHECK
A CHECK (...) clause specifies a predicate condition that must be satisfied by every tuple in a relation.
Basic Database Modification
Insertion: insert a tuple into a table.
```sql
INSERT INTO instructor(ID, name) VALUES('10222', 'Root');
```
Deletion: purge tuples satisfying a given condition from certain table.
```sql
DELETE FROM instructor WHERE name = 'Turing';
```
DBMS will prevent any update to the database that would violate an integrity constraint.
Basic SQL Queries
A basic SQL query can be expressed by a SELECT-FROM-WHERE statement.
```sql
SELECT ID, name FROM instructor WHERE dept_name = 'SCS';
```
The WHERE clause is optional, and logical connectives such as AND, OR and NOT are allowed in the clause.
SELECT clause can contain expressions.
```sql
SELECT ID, name, salary/12 FROM instructor;
```
A relation name can be used as prefix to distinguish attributes with the same name.
```sql
SELECT student.name, instructor.name
FROM student, advisor, instructor
WHERE student.ID = advisor.s_ID AND advisor.i_ID = instructor.ID;
```
SQL uses bag semantics by default: duplicates are allowed in query results. The keyword DISTINCT should be used to eliminate duplicates explicitly.
NULL
NULL is a special value for unknown or inapplicable values.
NULL is usually ignored in aggregations. For example, if a table has 4 rows but attribute A holds a NULL in one of them, COUNT(A) returns 3 while COUNT(*) returns 4.
Special Rules
All arithmetic operations involving NULL result in NULL, and comparisons with NULL result in UNKNOWN.
Aggregation functions ignore NULL, except COUNT(*), which simply counts rows.
Evaluating an aggregation function other than COUNT on an empty bag returns NULL.
However, NULL cannot be explicitly used as an operand: for instance, NULL + 3 is invalid.
In SQL, a special three-valued logic is used: TRUE is 1, FALSE is 0 and UNKNOWN is 0.5.
`x AND y` returns the minimum of x and y, `x OR y` returns the maximum, and `NOT x` returns 1 − x.
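The min/max encoding of three-valued logic is easy to check directly; this is a quick sketch (the function names are ours):

```python
# Three-valued logic encoded numerically: TRUE=1, FALSE=0, UNKNOWN=0.5.
TRUE, UNKNOWN, FALSE = 1.0, 0.5, 0.0

def and3(x, y): return min(x, y)   # AND is the minimum
def or3(x, y):  return max(x, y)   # OR is the maximum
def not3(x):    return 1 - x       # NOT is 1 - x

print(and3(TRUE, UNKNOWN))   # 0.5: TRUE AND UNKNOWN is UNKNOWN
print(or3(FALSE, UNKNOWN))   # 0.5: FALSE OR UNKNOWN is UNKNOWN
print(not3(UNKNOWN))         # 0.5: NOT UNKNOWN is still UNKNOWN
```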
Pitfalls of NULL
NULL will break many equivalences.
```sql
SELECT * FROM instructor;
-- not equivalent to the query below when salary contains NULLs,
-- since NULL < 100 and NULL >= 100 both evaluate to UNKNOWN:
SELECT * FROM instructor WHERE salary < 100 OR salary >= 100;
```
Joins
A join expression is typically used in a FROM clause.
Theta Join and Natural Join
```sql
R JOIN S ON <join_condition>  -- Theta Join
R NATURAL JOIN S              -- Natural Join
```
For example:
```sql
-- student(ID, name, dept_name, tot_cred)
-- takes(ID, course_id, sec_id, semester, year, grade)
SELECT name, course_id FROM student NATURAL JOIN takes;
```
You can join more than two relations using natural join.
```sql
SELECT A_1, ..., A_n
FROM R_1 NATURAL JOIN R_2 NATURAL JOIN ... NATURAL JOIN R_k
WHERE P;
```
USING
Attributes with the same name sometimes get equated unexpectedly in a natural join.
One solution is to use WHERE with a product to avoid joining on unrelated attributes; the other is to use the USING keyword to specify which attribute(s) should be joined on.
```sql
SELECT name, title
FROM (student NATURAL JOIN takes) JOIN course USING (course_id);
```
Outer Join Motivation
A left outer join, denoted ⟕, keeps every tuple of the left relation; left tuples without a matching right tuple are padded with NULLs.
The definitions of right outer join, denoted ⟖, and full outer join, denoted ⟗, follow analogously.
```sql
SELECT * FROM course NATURAL RIGHT OUTER JOIN prereq;
```
The difference between an ON and a WHERE condition should be noted: in an outer join, records failing the ON condition are still output, padded with NULLs, while records failing the WHERE condition are filtered out.
Hence ON is not a redundant keyword in SQL.
Subqueries
Nested Subqueries
A subquery is a SELECT-FROM-WHERE expression nested in another query.
A subquery can be nested in FROM, WHERE and WITH clauses, e.g., inside EXISTS and IN predicates.
```sql
SELECT DISTINCT course_id
FROM section
WHERE semester = 'Fall' AND year = 2017 AND
      course_id IN (SELECT course_id
                    FROM section
                    WHERE semester = 'Spring' AND year = 2018);
```
The expression EXISTS(subquery) tests whether the result of the subquery is non-empty.
An attribute refers to the most closely nested relation with that attribute.
ALL means the condition must hold for all records in the subquery result, and SOME means it must hold for at least one.
A scalar subquery can be used as a value in WHERE, SELECT and HAVING clauses. A runtime error is raised if the subquery returns more than one row; NULL is used if it returns no rows.
Common Table Expression
```sql
WITH R1(A_1, A_2, ...) AS (subquery_1),
     R2(B_1, B_2, ...) AS (subquery_2)
SELECT ... FROM R1, R2, ...;
```
This defines temporary relations that can be used by the other relations defined in the same WITH clause and by the actual query.
The temporary relations are only valid within the single statement.
CTE with Recursion
A CTE becomes recursive with the RECURSIVE keyword: a base case and a recursive case are combined with UNION. For example, computing the transitive closure of an Edge(src, dst) relation:
```sql
WITH RECURSIVE ReachableVertices(src, dst) AS (
    SELECT src, dst FROM Edge                 -- base case
    UNION
    SELECT r.src, e.dst                       -- recursive case
    FROM ReachableVertices r JOIN Edge e ON r.dst = e.src
)
SELECT * FROM ReachableVertices;
```
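SQLite supports recursive CTEs, so a reachability query of this shape can be tried directly from Python. The `Edge` table and its contents below are made up for the demo:

```python
import sqlite3

# In-memory database with a hypothetical Edge(src, dst) table: a -> b -> c -> d.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edge(src TEXT, dst TEXT)")
conn.executemany("INSERT INTO Edge VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])

# Transitive closure via a recursive CTE: base case = edges,
# recursive case = extend a known path by one more edge.
rows = conn.execute("""
    WITH RECURSIVE ReachableVertices(src, dst) AS (
        SELECT src, dst FROM Edge
        UNION
        SELECT r.src, e.dst
        FROM ReachableVertices r JOIN Edge e ON r.dst = e.src
    )
    SELECT src, dst FROM ReachableVertices ORDER BY src, dst
""").fetchall()
print(rows)
# a reaches b, c, d; b reaches c, d; c reaches d
```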
Update
```sql
UPDATE instructor SET salary = salary * 1.03 WHERE salary > 100000;
```
Or with CASE clause:
```sql
UPDATE instructor SET salary = CASE
    WHEN salary > 100000 THEN salary * 1.03
    ELSE salary * 1.05
END;
```
Relational Database Design Theory
Functional Dependency Theory
A functional dependency (FD) is of the form $\alpha \to \beta$, where $\alpha$ and $\beta$ are sets of attributes of $R$. It holds on $R$ if, whenever two tuples agree on $\alpha$, they also agree on $\beta$.
Candidate Keys
A set of attributes $K$ is a candidate key of relation schema $R$ if:
- $K \to R$ (i.e., $K$ functionally determines all attributes in $R$);
- No proper subset of $K$ functionally determines all attributes in $R$ (i.e., $K$ is minimal).

A set $K$ satisfying only the first condition is a superkey.
Closure of attributes
The closure of a set of attributes $\alpha$, denoted $\alpha^+_F$, is the set of all attributes functionally determined by $\alpha$ under $F$.
The subscript $F$ is omitted when the FD set is clear from context.
The closure of attributes can be computed by repeatedly applying the functional dependencies in $F$ whose left-hand sides are already contained in the result.
The time complexity is normally quadratic in the size of $F$, but can be reduced to linear with proper data structures.
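The fixed-point computation translates directly into Python. The FD representation and the example set $F$ (the classic $R(A,B,C,G,H,I)$ exercise) are our own choices:

```python
def closure(attrs, fds):
    """Compute the attribute closure attrs+ under a set of FDs.

    fds is a list of (lhs, rhs) pairs of attribute sets, e.g.
    ({'A'}, {'B'}) for A -> B.  Repeatedly apply every FD whose
    left-hand side is already contained in the result.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# R(A,B,C,G,H,I) with F = {A->B, A->C, CG->H, CG->I, B->H}
F = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C", "G"}, {"H"}),
     ({"C", "G"}, {"I"}), ({"B"}, {"H"})]
print(sorted(closure({"A", "G"}, F)))
# ['A', 'B', 'C', 'G', 'H', 'I'] -> AG is a superkey of R
```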
Closure of functional dependencies
The closure of $F$, denoted $F^+$, is the set of all functional dependencies logically implied by $F$.
Armstrong’s Axioms
- Reflexivity: If $\beta \subseteq \alpha$, then $\alpha \to \beta$.
- Augmentation: If $\alpha \to \beta$, then $\gamma\alpha \to \gamma\beta$ for any attribute set $\gamma$.
- Transitivity: If $\alpha \to \beta$ and $\beta \to \gamma$, then $\alpha \to \gamma$.
Armstrong’s axioms are sound and complete.
Extraneous Attributes
An attribute $A$ is considered extraneous in a functional dependency $\alpha \to \beta$ of $F$ if removing it does not change $F^+$:
- $A \in \alpha$ is extraneous in $\alpha$ if $F$ logically implies $(\alpha - \{A\}) \to \beta$;
- $A \in \beta$ is extraneous in $\beta$ if $F' = (F - \{\alpha \to \beta\}) \cup \{\alpha \to (\beta - \{A\})\}$ logically implies $\alpha \to \beta$.
Canonical Cover
A canonical cover for a set of functional dependencies $F$ is a minimal equivalent set $F_c$ with $F_c^+ = F^+$.
In a canonical cover, no FD contains extraneous attributes, and the left-hand side of each FD is unique.
To compute a canonical cover, we can use the following steps:
- Remove Extraneous Attributes: For each functional dependency in $F$, remove any extraneous attributes from the left-hand side and the right-hand side.
- Remove Redundant Dependencies: For each functional dependency in $F$, check if it can be derived from the other dependencies. If it can, remove it from the set.
- Resulting Set: The resulting set of functional dependencies is the canonical cover.
Canonical cover is not unique, but the number of functional dependencies in any canonical cover for a given set of functional dependencies is the same.
Algorithm to compute canonical cover
```
Input:  A set F of FDs
Output: A canonical cover C for F

C = F
repeat
    Use the union rule to replace any FDs α → β1 and α → β2 in C
        with α → β1 β2
    Find an FD α → β in C with an extraneous attribute
        and delete that attribute from α → β
until C does not change
```
Normalization Theory
Decomposition Criteria
Lossless Decomposition
Original relation schema should be able to be reconstructed by joining the decomposed relation schemas.
Suppose $R$ is decomposed into $R_1$ and $R_2$. The decomposition is lossless if $R_1 \cap R_2 \to R_1$ or $R_1 \cap R_2 \to R_2$ holds.
Proof. Let $r$ be an instance of schema $R$; we prove $\Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) = r$. We only need to show $\Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) \subseteq r$, since the opposite direction is trivial. Without loss of generality, assume $R_1 \cap R_2 \to R_1$.
Suppose $t$ is a tuple in $\Pi_{R_1}(r) \bowtie \Pi_{R_2}(r)$; then there exist $t_1, t_2 \in r$ such that $t[R_1] = t_1[R_1]$ and $t[R_2] = t_2[R_2]$. Since $t_1$ and $t_2$ both agree with $t$ on $R_1 \cap R_2$, and $R_1 \cap R_2 \to R_1$, $t_1$ and $t_2$ are identical on $R_1$. Therefore $t_2$ agrees with $t$ on both $R_1$ and $R_2$, i.e., $t = t_2$, so $t$ must be in $r$. Q.E.D.
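Combining the attribute-closure routine with the criterion above gives a small lossless-decomposition test for binary decompositions. This is a sketch; the function names and the $R(A,B,C)$ example are ours:

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: lossless iff
    (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

# R(A,B,C) with A -> B: splitting into (A,B) and (A,C) is lossless,
# while (A,B) and (B,C) is not, since B determines neither side.
F = [({"A"}, {"B"})]
print(is_lossless({"A", "B"}, {"A", "C"}, F))  # True  (shared A determines AB)
print(is_lossless({"A", "B"}, {"B", "C"}, F))  # False
```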
Redundancy and Anomalies Avoidance
Unnecessary redundancy and anomalies should be avoided in a decomposition.
Dependency preservation
Minimize the cost to check the integrity constraints defined in terms of functional dependencies.
Let $F_i$ denote the restriction of $F^+$ to $R_i$, i.e., the FDs in $F^+$ involving only attributes of $R_i$.
A decomposition is dependency preserving if $(F_1 \cup F_2 \cup \cdots \cup F_n)^+ = F^+$.
BCNF decomposition is not always dependency preserving. If you care more about dependency preservation than about redundancy, refer to Third Normal Form.
Boyce-Codd Normal Form (BCNF)
A relation schema $R$ is in BCNF with respect to a set $F$ of functional dependencies if, for every FD $\alpha \to \beta$ in $F^+$ with $\alpha, \beta \subseteq R$, either $\alpha \to \beta$ is trivial ($\beta \subseteq \alpha$) or $\alpha$ is a superkey of $R$.
A database schema is in BCNF if every relation schema in the database schema is in BCNF.
BCNF Decomposition Algorithm
```
Input:  A schema R and a set F of FDs
Output: A lossless BCNF decomposition of R

result = {R}
while some R_i in result is not in BCNF:
    pick a nontrivial FD α → β on R_i such that
        α is not a superkey of R_i and α ∩ β = ∅
    result = (result − {R_i}) ∪ {(α ∪ β), (R_i − β)}
```
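A simplified Python sketch of this decomposition, which checks violations only against the given FDs rather than the full $F^+$ (sufficient for simple inputs; the helper names and the $R(A,B,C)$ example are ours):

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(schema, fds):
    """Repeatedly split a schema on a BCNF-violating FD."""
    result = [set(schema)]
    changed = True
    while changed:
        changed = False
        for r in result:
            for lhs, _ in fds:
                if lhs <= r:
                    cl = closure(lhs, fds)
                    beta = (cl & r) - lhs        # everything in r determined by lhs
                    if beta and not r <= cl:     # nontrivial and lhs not a superkey
                        result.remove(r)
                        result.append(lhs | beta)
                        result.append(r - beta)
                        changed = True
                        break
            if changed:
                break
    return result

# R(A,B,C) with A -> B: A is not a superkey, so split into (A,B) and (A,C).
parts = bcnf_decompose({"A", "B", "C"}, [({"A"}, {"B"})])
print([sorted(p) for p in parts])  # [['A', 'B'], ['A', 'C']]
```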
Third Normal Form (3NF)
A relation schema $R$ is in 3NF with respect to $F$ if, for every FD $\alpha \to \beta$ in $F^+$ with $\alpha, \beta \subseteq R$, at least one of the following holds:
- $\alpha \to \beta$ is trivial;
- $\alpha$ is a superkey of $R$;
- each attribute in $\beta - \alpha$ is part of a candidate key for $R$.
Third Normal Form (3NF) Decomposition Algorithm
```
Input:  A schema R and a set F of FDs
Output: A lossless, dependency-preserving 3NF decomposition of R

Let Fc be a canonical cover for F
result = ∅
for each FD α → β in Fc:
    if no schema in result contains α ∪ β:
        add schema (α ∪ β) to result
if no schema in result contains a candidate key of R:
    add any candidate key of R as a schema
return result
```
For other normal forms such as 1NF, 2NF and 4NF, refer to notes of NIS3351: Database Principle and Security(unavailable).
Storage
Why does the DBMS manage the buffer pool itself instead of letting the OS do it?
Volatile Storage: loses its contents when power is switched off, such as DRAM and CPU caches. It can be randomly accessed, and bytes are individually addressable.
Non-volatile Storage: retains its contents when power is switched off, such as magnetic disks and flash memory. It is accessed sequentially and only in blocks.
Disk-oriented DBMSs are all about reducing disk I/O.
- Sequential I/O is much cheaper than random I/O.
- Cache blocks from non-volatile storage to volatile storage.

Storage Structures
In a DBMS, data is stored in files, each of which consists of a collection of pages. Each page holds a collection of tuples.
- Heap file: Tuples are placed arbitrarily in pages.
- Sorted file: Tuples are sorted on some attribute(s).
- Index file: A data structure that improves the speed of data retrieval operations on a database table, such as B+ tree and hash index.
Format of Page
A database page is a fixed-size block of data identified by a unique page number. Each page has a header that contains the number of slots/tuples, the amount of free space, a data checksum and transaction visibility information.
A heap file is an unordered collection of pages where tuples are placed in arbitrary order. It requires meta-data to track existing pages and find free space.

- Linked List: A header page at start stores the HEAD of the data page list and free page list.
- Page Directory: A special directory page is used to track data pages. Each directory entry records the number of free slots in its corresponding data page.

Page Structure: How are tuples organized within a page?

The slotted page structure is used in most DBMSs. In a slotted page, the header keeps a slot array that maps slot IDs to the offsets of the tuples; the slot array grows from the beginning of the page while the tuples grow from the end. This structure allows for efficient insertion and deletion of tuples, as the DBMS can easily find free slots and reuse them.
The header tracks the number of slots, the number of used slots, the free space and the array of slot offsets.
When deleting a tuple, the corresponding slot is marked as free, but the space is not reclaimed until a compaction operation is performed.
Record Formats
In fixed-length format, each attribute has a fixed size and the size of each tuple is the sum of the sizes of its attributes. This format allows for efficient access to individual attributes, as their offsets can be easily calculated.
System catalog is a table that stores metadata about other tables. In fixed-length format, the system catalog stores the size of each attribute for each table. NULL values are handled by using a bit map at the beginning of the tuple.
In variable-length format, we move all variable-length attributes to the end of the tuple and use an array of offsets to locate them.
Buffer Pool Management
A buffer pool is a memory region organized as an array of fixed-size pages. Each array entry is called a frame which can hold one page.
mmap is another way to manage the buffer pool: it maps disk pages directly into the process's address space. However, it is less flexible and controllable than traditional buffer pool management.
Buffer Pool Metadata
The page table tracks which pages are currently in the buffer pool and maps page numbers to frame numbers.
Each frame has a pin or reference counter used for eviction decisions and a dirty bit which indicates whether the page has been modified.
Page Replacement Policies
When a new page is requested and the buffer pool is full, a page replacement policy is used to decide which page to evict. The final goal is to minimize cache misses.
- LRU (Least Recently Used): Evict the page that has not been used for the longest time.
- CLOCK: A more efficient approximation of LRU that uses a circular list and a reference bit.
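A minimal sketch of the CLOCK policy follows; the class and its bookkeeping are illustrative, not a real buffer pool (eviction here just overwrites the frame's page number):

```python
class ClockCache:
    """CLOCK page replacement: a circular scan over frames with a
    reference bit per frame.  The hand evicts the first frame whose
    bit is 0, clearing bits (giving a 'second chance') as it passes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []        # page numbers held in frames
        self.ref = []           # reference bits, one per frame
        self.hand = 0

    def access(self, page):
        if page in self.frames:                 # hit: set the reference bit
            self.ref[self.frames.index(page)] = 1
            return "hit"
        if len(self.frames) < self.capacity:    # a free frame is available
            self.frames.append(page)
            self.ref.append(1)
            return "miss"
        while self.ref[self.hand] == 1:         # second chance: clear and move on
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % self.capacity
        self.frames[self.hand] = page           # evict the victim frame
        self.ref[self.hand] = 1
        self.hand = (self.hand + 1) % self.capacity
        return "miss"

cache = ClockCache(2)
print([cache.access(p) for p in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'hit']  (page 1 is evicted by page 3)
```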
Page replacement policies are usually specialized according to different access patterns.
Storage Layout
For a logical layout, there are several physical layouts such as row-oriented, column-oriented and hybrid layouts. Each layout has its own advantages and disadvantages depending on the access patterns of the workload.

Different Workload Patterns
- OLTP (Online Transaction Processing): Workloads that involve a large number of short online transactions such as insert, update and delete. These workloads typically require low latency and high concurrency. As a result, row-oriented layout is often preferred for OLTP workloads.
- OLAP (Online Analytical Processing): Workloads that involve complex queries for data analysis. These workloads typically require high throughput and can tolerate higher latency. As a result, column-oriented layout is often preferred for OLAP workloads.
- HTAP (Hybrid Transactional/Analytical Processing): Workloads that combine both OLTP and OLAP characteristics. These workloads require a balance between low latency for transactions and high throughput for analytical queries.
ETL, standing for Extract, Transform, Load, is a process to transform data from OLTP systems to OLAP systems for analysis.
Cache Behavior Analysis
If the size of a cache line is smaller than the size of a tuple, then each tuple access may cause multiple cache misses. This can significantly impact performance, especially for workloads with large tuples or high access rates.
PAX: Partition Attributes Across
PAX, proposed in 2002, maximizes inter-tuple spatial locality (values of the same attribute are stored contiguously) at the cost of intra-tuple spatial locality.

Values of the same attribute are grouped in the same minipage within a page. Fixed-length attribute values are stored in F-minipages, while variable-length attribute values are stored in V-minipages.
When a query accesses only some attributes, only the relevant minipages need to be loaded into the cache, reducing cache misses and improving performance.
Also, PAX supports SIMD for faster analytical queries.
Columnar Storage
Column-oriented storage stores each attribute in a separate column file. It is optimized for OLAP for better cache performance and fewer I/O when accessing a subset of attributes while also enabling high compression ratios and reducing storage space.
As a page holds values of a single column, each page can be compressed/encoded independently. Dictionary encoding and run-length encoding (RLE) are commonly used compression techniques.

Indexing
Sequential scan and index scan are two common methods for accessing data in a database.
Index Basics
Index Data Structure
An index consists of a search key and a pointer to the actual data record.
An index file consists of records called index entries and is usually much smaller than the actual data file.
This lecture focuses on ordered indexes.
Dense Index: A dense index contains an index entry for every search key value in the data file. This allows for faster searches but requires more storage space.
It is possible that one index entry points to multiple records in the data file if there are duplicate search key values.
Sparse Index: A sparse index contains an index entry for only some of the search key values in the data file. This reduces storage space but can result in slower searches.
A sparse index is only applicable to sorted data files.
Clustering Index: A clustering index is an index that determines the physical order of data records in a table.
Non-Clustered Index: A non-clustered index is an index that does not affect the physical order of data records in a table. Instead, it creates a separate data structure that points to the actual data records.
A non-clustered index is also known as a secondary index which is always dense.
Index in SQL
```sql
CREATE INDEX index_name ON table_name (attribute1, attribute2, ...);
```
A B+-tree index is created automatically for PRIMARY KEY and UNIQUE keys.
Concurrency Control in B+-Trees
For the concept of the B+-tree, refer to CS0501 Data Structure.
Locking the root node of the tree for every access is too time-consuming. To improve the performance of concurrent operations on B+-trees, latches are used.
Latch
Latch is a lightweight, short-term lock for in-memory structures which protects pages or nodes in buffer and index managers, and ensures physical (not transactional) consistency. It is implemented using spinlocks or mutexes.
A latch only allows multiple shared holders or a single exclusive holder.
The Latch Crabbing Protocol enables concurrent access and modification to B+-trees:
- Latch the parent node.
- Latch the child node.
- If the child is safe, release the latches of its ancestors.
A node is considered safe if an update will not trigger a split or merge. In insertion it means that the node has enough free space, and in deletion it means that the node stays above the minimum occupancy after deletion.
Instead of locking the entire tree during an operation, latch crabbing allows threads to acquire and release latches on individual nodes as they traverse the tree. This allows multiple threads to work on different parts of the tree simultaneously, improving concurrency and performance.
Query Processing

Parsing and Translation
DBMS checks the syntax and semantics of the SQL statement and builds a parse tree representing its structure. The parse tree is analyzed and then translated into a logical plan, a tree of relational algebra operators that captures the query’s semantics.
Query Optimization
Then, the DBMS transforms the initial logical plan into an optimized physical plan; this consists of two main phases:
- Logical Optimization: Rewrites the logical plan into an equivalent but more efficient logical plan using RA rules.
- Physical Optimization: Translates the optimized logical plan into a physical plan that specifies the actual algorithms and data structures to be used for query execution.
Query Execution
The DBMS executes the optimized physical plan to produce the final query result.
Common query execution models:
- Materialization Model: Each operator fully computes and stores its result in temporary tables before passing it to the next operator.
- Iterator (Volcano) Model: Operators are implemented as iterators that produce one tuple at a time, allowing for pipelined execution and reduced memory usage. Each operator implements three methods: `open()`, `next()` and `close()`.

The materialization model suits OLTP workloads, which touch only a few records at a time, but not OLAP workloads with large intermediate results.
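Python generators give a compact sketch of the iterator model: each operator pulls tuples from its child one at a time, so nothing is materialized between operators. The operator names and the `instructor` data are invented:

```python
# Each operator is a generator: calling next() on the root pulls one
# tuple through the whole pipeline (the Volcano pull model).
def scan(table):                       # leaf operator: sequential scan
    yield from table

def select_op(child, pred):            # sigma: filter tuples one at a time
    for t in child:
        if pred(t):
            yield t

def project_op(child, attrs):          # Pi: keep a subset of attributes (bag semantics)
    for t in child:
        yield {a: t[a] for a in attrs}

instructor = [{"name": "Wu", "salary": 90000},
              {"name": "Mozart", "salary": 40000},
              {"name": "Einstein", "salary": 95000}]

# SELECT name FROM instructor WHERE salary > 80000;
plan = project_op(select_op(scan(instructor),
                            lambda t: t["salary"] > 80000), ["name"])
print(list(plan))  # [{'name': 'Wu'}, {'name': 'Einstein'}]
```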
Algorithms for Relational Operations
Notations
- Number of pages of a table $R$: $N_R$; number of tuples: $m_R$
- Number of available buffer pool pages: $B$
- Cost metric: number of disk I/Os
Sorting
External sorting is required when data cannot fit in memory.
A divide-and-conquer algorithm called external merge sort is commonly used for external sorting. Recall that we have $B$ buffer pages and the table occupies $N$ pages.
Read $B$ pages at a time, sort them in memory, and write the sorted run back to disk. This sort phase produces $\lceil N/B \rceil$ sorted runs of $B$ pages each.
In the merge phase, we repeatedly merge $B-1$ runs into one longer run, using the remaining buffer page for output, until a single sorted run remains.
Cost Analysis: every pass reads and writes all $N$ pages, and there are $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$ passes, so the total cost of external merge sort is $2N \left(1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil\right)$ I/Os.
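A toy simulation of the two phases, treating small Python lists as disk pages (the page contents are invented; run length and merge fan-in follow the analysis above):

```python
import heapq

def external_merge_sort(pages, B):
    """Toy external merge sort: `pages` is a list of disk pages (lists
    of keys) and B >= 3 is the number of buffer pages."""
    # Sort phase: read B pages at a time, sort in memory, write one run.
    runs = []
    for i in range(0, len(pages), B):
        runs.append(sorted(x for page in pages[i:i + B] for x in page))
    # Merge phase: repeatedly merge B-1 runs until one run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
    return runs[0] if runs else []

pages = [[9, 3], [7, 1], [8, 2], [6, 4], [5, 0]]
print(external_merge_sort(pages, B=3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With $N = 5$ pages and $B = 3$, the sort phase produces $\lceil 5/3 \rceil = 2$ runs and a single merge pass finishes the job, matching the formula's $1 + \lceil \log_2 2 \rceil = 2$ passes.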
With the same idea, we can implement sort-based duplicate elimination. The cost is the same as external merge sort.
Since union, intersection and difference in SQL eliminate duplicates by default, they can be implemented with sort-based algorithms at a cost similar to external merge sort.
The cost of sort-based aggregation is also the same as external merge sort.
Join
The cost of Naive Nested Loop Join is $N_R + m_R \cdot N_S$: scan $R$ once, and scan $S$ once per tuple of $R$.
The cost of Index Nested Loop Join is $N_R + m_R \cdot C$, where $C$ is the cost of a single index lookup on $S$.
Sort-Merge Join requires both relations to be sorted on the join attributes; if they are not, we need to sort them first. In most cases, the cost is $\mathrm{sort}(R) + \mathrm{sort}(S) + N_R + N_S$.
Hash Join is applicable for equality joins and natural joins. (Build Phase) Select a table and build a corresponding hash table in memory on the join attribute(s). (Probe Phase) For each tuple in the other table, use the hash table to find matching tuples.
The cost of this kind of join is $N_R + N_S$, provided the hash table on the smaller relation fits in memory.
If neither relation fits in memory, we can use Grace Hash Join. Both relations are partitioned with the same hash function on the join attributes, so matching tuples always fall into partitions with the same index and each pair of partitions can be joined independently (partitioning recursively if a partition still does not fit). The cost is $3(N_R + N_S)$: one read and one write for partitioning, plus one read for probing.
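The two phases of Grace hash join can be sketched as follows. In-memory lists stand in for disk partitions, the inputs are lists of `(key, payload)` tuples, and the names are ours:

```python
from collections import defaultdict

def grace_hash_join(r, s, k=4):
    """Sketch of Grace hash join on lists of (key, payload) tuples.
    Phase 1 hash-partitions both inputs with the same function; phase 2
    builds an in-memory hash table per partition of r and probes it
    with the matching partition of s."""
    parts_r, parts_s = defaultdict(list), defaultdict(list)
    for t in r:                                 # partition phase
        parts_r[hash(t[0]) % k].append(t)
    for t in s:
        parts_s[hash(t[0]) % k].append(t)

    out = []
    for i in range(k):                          # build & probe per partition pair
        table = defaultdict(list)
        for key, pr in parts_r[i]:
            table[key].append(pr)
        for key, ps in parts_s[i]:
            for pr in table[key]:
                out.append((key, pr, ps))
    return out

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (4, "z")]
print(sorted(grace_hash_join(R, S)))  # [(2, 'b', 'x'), (3, 'c', 'y')]
```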
Hash-based Algorithms: Duplication elimination, union, intersection, difference, group by aggregation.
Query Optimization
SQL is declarative. Users specify what tuples they want and query optimizer searches and picks the best query plan. Cost difference between query plans for a query can be huge. This section takes System R’s optimizer as an example.
Rule-based Query Rewriting (RBQ)
Push down selection: $\sigma_p(R \bowtie S) = \sigma_p(R) \bowtie S$, if predicate $p$ involves only attributes of $R$.
Push down projection: $\Pi_L(R \bowtie S) = \Pi_L(\Pi_{L_R}(R) \bowtie \Pi_{L_S}(S))$, where $L_R$ and $L_S$ keep the attributes of $L$ plus the join attributes from $R$ and $S$ respectively.
Viewed on the query plan tree, these optimizations push the selection and projection operators down the tree and discard unnecessary attributes early.
Cost-based Query Optimization (CBQ)
RBQ does not take cost into account. CBQ estimates the cost of different query plans and picks the cheapest one.
Cost Estimation
The cost of a query plan is estimated by summing up the costs of all operators in the plan, and each operator's cost can be modeled as a linear function of its input size.
DBMS stores statistics information about relations in system catalog to help estimate the size of intermediate results. Catalogs are updated periodically.
Selection with equality predicates assumes a uniform distribution of values. For $\sigma_{A=v}(R)$, the estimated output size is $m_R / V(A, R)$, where $V(A, R)$ is the number of distinct values of $A$ in $R$.
For conjunctive predicates such as $\sigma_{\theta_1 \wedge \theta_2}(R)$, selectivities are multiplied assuming independence: the estimate is $m_R \cdot \frac{s_1}{m_R} \cdot \frac{s_2}{m_R}$, where $s_i$ is the estimated size of $\sigma_{\theta_i}(R)$.
For negative predicates such as $\sigma_{\neg\theta}(R)$, the estimated size is $m_R - s$.
For disjunctive predicates such as $\sigma_{\theta_1 \vee \theta_2}(R)$, the estimated size is $m_R \cdot \left(1 - (1 - \frac{s_1}{m_R})(1 - \frac{s_2}{m_R})\right)$.
For range predicates such as $\sigma_{A \le v}(R)$, the estimated size is $m_R \cdot \frac{v - \min(A, R)}{\max(A, R) - \min(A, R)}$.
For join size estimation, e.g. $R \bowtie S$ on a common attribute $A$, the estimated size is $\frac{m_R \cdot m_S}{\max(V(A, R), V(A, S))}$.
This estimation is a very rough one and may be very inaccurate in practice, and skewness is one of the major reasons. The assumption of independence may also not hold in many cases.
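The estimation formulas translate directly into arithmetic; the function names and the numbers below are invented for illustration:

```python
# Textbook selectivity estimates under uniformity and independence
# assumptions; m is the table's tuple count, V a distinct-value count,
# and each s_i the estimated size of one predicate's selection.
def eq_size(m, V):                 # size of sigma_{A=v}(R)
    return m / V

def conj_size(m, sizes):           # theta_1 AND theta_2 AND ...: multiply selectivities
    sel = 1.0
    for s in sizes:
        sel *= s / m
    return m * sel

def disj_size(m, sizes):           # theta_1 OR theta_2 OR ...: via complements
    keep = 1.0
    for s in sizes:
        keep *= 1 - s / m
    return m * (1 - keep)

def join_size(m_r, m_s, V_r, V_s): # R join S on a shared attribute A
    return m_r * m_s / max(V_r, V_s)

m = 10000
print(eq_size(m, 50))                   # 200.0
print(conj_size(m, [200, 5000]))        # ~100.0
print(disj_size(m, [200, 5000]))        # ~5100.0
print(join_size(10000, 2000, 50, 100))  # 200000.0
```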
Histograms in the catalog can improve selectivity estimation by dividing values into buckets, each holding a roughly equal number of tuples.
Cost-based Plan Search
In principle, we enumerate all possible physical plans and pick the cheapest one; the practical goal, though, is to avoid bad plans rather than to find the optimal one. Take searching for optimal join orders as an example.
For $n$ relations, there are $\frac{(2(n-1))!}{(n-1)!}$ different join orders, far too many to enumerate for large $n$.
System R reduces the search space by considering only left-deep trees, which also enables the generation of fully pipelined plans: the right operand of every join is a base relation, so intermediate results can flow directly into the next join under the iterator model.
A dynamic programming algorithm is used to find the optimal left-deep join tree in a bottom-up manner. First, we find the optimal access path for each single relation. Then, we build optimal plans for joining two relations based on the optimal single-relation plans. This process continues until we build the optimal plan for joining all $n$ relations.
Thereby, the time complexity can be reduced to roughly $O(n \cdot 2^n)$: there are $2^n$ subsets of relations, and the best plan for each subset is extended by one relation at a time.
However, this algorithm does not directly handle the interesting orders problem: a sub-plan's output order may benefit later operators even if the sub-plan is not locally cheapest. This is solved by enlarging the state space to keep track of interesting orders.
Worst-Case Optimal Join Algorithms
This chapter builds on the triangle query.

We consider the triangle query $Q_\triangle = R(A, B) \bowtie S(B, C) \bowtie T(A, C)$ and assume $|R| = |S| = |T| = N$.
Delay the Computation (DC)
The key idea of DC is to delay the computation of intermediate results as much as possible and to process tuples only when they can still contribute to the final output.
```
for each a in π_A(R) do
    for each b in π_B(σ_{A=a}(R)) do
        for each c in π_C(σ_{B=b}(S)) do
            if (a, c) ∈ T then
                output (a, b, c)
```
The time complexity of DC is lower than that of binary-join plans on this query, since the quadratic-size intermediate result $R \bowtie S$ is never materialized.
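A Python sketch of the DC loop for the triangle query, with dict-based indexes standing in for the selections (the helper names and the tiny $R$, $S$, $T$ instances are invented):

```python
from collections import defaultdict

def triangles(R, S, T):
    """Delayed-computation triangle enumeration over R(A,B), S(B,C),
    T(A,C) given as sets of pairs: a partial tuple (a, b) is extended
    only if it survived R, and T is checked only per surviving (a, c)."""
    R_by_a = defaultdict(set)      # index for sigma_{A=a}(R)
    S_by_b = defaultdict(set)      # index for sigma_{B=b}(S)
    for a, b in R:
        R_by_a[a].add(b)
    for b, c in S:
        S_by_b[b].add(c)
    Tset = set(T)

    out = []
    for a in R_by_a:                     # a in pi_A(R)
        for b in R_by_a[a]:              # b in pi_B(sigma_{A=a}(R))
            for c in S_by_b.get(b, ()):  # c in pi_C(sigma_{B=b}(S))
                if (a, c) in Tset:
                    out.append((a, b, c))
    return out

R = {(1, 2), (1, 3)}
S = {(2, 4), (3, 4)}
T = {(1, 4)}
print(sorted(triangles(R, S, T)))  # [(1, 2, 4), (1, 3, 4)]
```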
AGM Bound and WCOJ Algorithms
Consider a framework of general join queries $Q = R_1 \bowtie R_2 \bowtie \cdots \bowtie R_m$.
The hypergraph $H(Q) = (V, E)$ of $Q$ consists of:
- The set of vertices $V$: the attributes appearing in $Q$.
- The set of hyperedges $E$: the attribute sets of the relations $R_1, \dots, R_m$.
A fractional edge cover of a hypergraph $H = (V, E)$ is an assignment $x : E \to [0, +\infty)$ such that for every vertex $v \in V$, $\sum_{e \ni v} x_e \ge 1$.
The minimum fractional edge cover number of $H$, denoted $\rho^*(H)$, is the minimum of $\sum_{e \in E} x_e$ over all fractional edge covers.
The AGM bound states that for any fractional edge cover $x$, the size of the join result satisfies $|Q| \le \prod_{e \in E} |R_e|^{x_e}$. For the triangle query, taking $x_R = x_S = x_T = 1/2$ gives $\rho^* = 3/2$ and the bound $N^{3/2}$; a worst-case optimal join (WCOJ) algorithm matches this bound.
Generic Join Algorithm
```
function Generic-Join(Attrs, Relations, t)   -- t is a partial tuple
    if Attrs = ∅ then output t and return
    pick an attribute A ∈ Attrs
    candidates = ⋂ { π_A(σ_t(R)) : R ∈ Relations with A ∈ attrs(R) }
    for each a ∈ candidates do
        Generic-Join(Attrs − {A}, Relations, t ∪ {A = a})
```