SQL Primer for COMP423
Learning Objectives
After completing this reading, you will be able to:
- Explain the relational model in practical software engineering terms.
- Differentiate between a primary key and a unique constraint.
- Describe what an index does and why lookups by column are faster with one.
- Predict what happens when a uniqueness constraint is violated.
- Explain what a foreign key constraint enforces (and what happens when it's violated).
- Write SQL queries for filtering, sorting, updating, and aggregating (including
GROUP BY). - Write a multi-statement SQL transaction with
BEGIN/COMMITand describe ACID at a high level.
1. What is a Relational Database?
A relational database stores data in tables. Each table has rows (records) and columns (fields). Relational databases are useful because they let us connect related data, enforce rules, and support many users safely at the same time.
Unlike in-memory Python data structures, relational databases persist data to disk. That means data remains even after the program or server stops. This makes them essential for long-term, real-world applications like banking, e-commerce, content management systems, and the apps you build in this course.
A Brief History
The concept of relational databases was introduced by Edgar F. Codd in 1970 while working at IBM. He proposed the relational model, which organizes data into structured tables and uses mathematical relational algebra for querying and manipulating data.
Before relational databases, many systems used hierarchical or network databases, which were rigid and harder to scale. Codd's model made data easier to organize and query.
Structured Query Language (SQL) was developed in the 1970s at IBM as a way to interact with relational databases. It became the standard language for querying and managing relational data and is used by every major database system today.
Popular Relational Databases
| Database | Notes |
|---|---|
| PostgreSQL | Open-source, feature-rich, widely used for modern applications |
| MySQL | Common in web applications, especially the LAMP stack |
| Microsoft SQL Server | Enterprise-level, common in .NET environments |
| Oracle Database | Widely used in large corporations |
In this course, we use PostgreSQL both for hands-on SQL practice and as the database for our Python web app. Running PostgreSQL as a separate process is realistic: your app is one process, the database is another, and they communicate over a network connection.
2. Setting Up PostgreSQL with Docker
To follow along, run PostgreSQL in Docker and keep psql open as you read.
Pull and Run PostgreSQL
From your host machine's terminal:
docker run \
--name postgres \
--env POSTGRES_PASSWORD=secret \
--publish 5432:5432 \
--detach \
postgres:latest
| Flag | Purpose |
|---|---|
--name postgres |
Names the container postgres |
--env POSTGRES_PASSWORD=... |
Sets the default password |
--publish 5432:5432 |
Maps port 5432 from container to host |
--detach |
Runs in the background |
postgres:latest |
Uses the latest stable PostgreSQL image |
If docker run ... --name postgres fails because the container already exists, try simply running the start command as follows:
docker start postgres
Connect to PostgreSQL
docker exec \
--interactive \
--tty \
--user postgres \
postgres \
psql
This runs the psql command-line client inside the existing container.
| Flag | Purpose |
|---|---|
--interactive, -i |
Keep STDIN open so you can interact with the process inside the container |
--tty, -t |
Allocate a pseudo-TTY (enables interactive terminal features) |
--user postgres |
Run the command as the specified user inside the container (postgres) |
<container> (e.g., postgres) |
The name or ID of the container to execute the command in |
psql |
The command executed inside the container (PostgreSQL interactive client) |
psql prompt & shell
After running psql you will see a database prompt such as postgres=#. This indicates you are in the PostgreSQL interactive SQL shell (a client connected to the server) rather than the Bash shell. Commands entered here are SQL statements which are terminated with a semicolon ;. To quit the SQL shell, type \q or press Ctrl-D.
Where is the Data Stored?
When running Docker without explicit volume mapping, PostgreSQL stores data inside the container's filesystem. Stopping and restarting the container preserves the data, but removing the container (docker rm postgres) deletes everything.
In production you would mount a volume, which is persistent storage managed by Docker that lives outside a container’s writable layer (typically on the host or via a volume driver). Volumes persist across container restarts and removals, can be shared between containers, and are the standard way to store durable data.
Useful Container Commands
docker stop postgres # Stop the container (data preserved)
docker start postgres # Restart (data still there)
docker rm postgres # Remove container (data lost!)
3. SQL: A Domain-Specific Language
SQL (Structured Query Language) is a language designed for relational databases. Unlike general-purpose languages like Python, SQL is domain-specific: it is focused on storing, retrieving, and changing structured data.
You can think of SQL in two big categories:
DDL (Data Definition Language)
DDL manages the structure (schema) of the database.
| Command | Purpose |
|---|---|
CREATE TABLE |
Define a new table |
ALTER TABLE |
Modify an existing table |
DROP TABLE |
Delete a table and all its data |
DML (Data Manipulation Language)
DML operates on the data stored in tables.
| Command | Purpose |
|---|---|
INSERT INTO |
Add new rows |
SELECT |
Retrieve rows |
UPDATE |
Modify existing rows |
DELETE |
Remove rows |
4. Key Abstractions: Tables, Columns, and Rows
Now that you have access to a database via psql, let's try it out!
Tables
A table is the basic structure in an RDBMS. It represents a collection of related data, similar to a spreadsheet, but with more rigid and well defined structure.
Let's create our first table using SQL's DDL (CREATE TABLE). In your psql prompt, run:
CREATE TABLE accounts (
id SERIAL PRIMARY KEY,
owner TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
balance INTEGER NOT NULL DEFAULT 0
);
You should see a success message like CREATE TABLE.
Columns (Fields)
A column defines one attribute and its allowed data type.
PostgreSQL has many data types you can learn about in the official documentation. For a quick overview, here are some of the common PostgreSQL data types:
| Data Type | Description |
|---|---|
INTEGER |
Whole numbers (e.g., 1, 42, -7) |
SERIAL |
Auto-incrementing integer (commonly used for primary keys) |
TEXT |
Variable-length string (unlimited size) |
VARCHAR(n) |
String with a maximum length of n characters |
BOOLEAN |
True or False values |
DATE |
Stores dates (YYYY-MM-DD) |
TIMESTAMP |
Stores date and time information |
DECIMAL(p, s) |
Precise fixed-point decimal numbers |
REAL / DOUBLE PRECISION |
Floating-point numbers |
Columns can also carry constraints that enforce rules on the data:
| Constraint | Meaning |
|---|---|
NOT NULL |
The column cannot be empty. |
UNIQUE |
All values in the column must be distinct. |
CHECK |
Enforces a Boolean condition (e.g., CHECK (balance >= 0)). |
DEFAULT |
Assigns a value automatically if none is provided. |
Some columns have fixed maximum widths. For example, VARCHAR(255) prevents values longer than 255 characters.
Verifying Your Table with SQL
After creating a table, it is good practice to verify it exists and that its columns look the way you expect. We will learn more about the structure of these SQL queries soon, for now you can scan them, copy, paste, and run in psql.
In many relational databases, including PostgreSQL, you can query the database to see its tables and information about columns in a table.
To list tables in the current database (restricting to the public schema in PostgreSQL):
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_type = 'BASE TABLE'
ORDER BY table_name;
To list columns of the accounts table:
SELECT
ordinal_position,
column_name,
data_type,
is_nullable,
column_default
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'accounts'
ORDER BY ordinal_position;
Other RDBMSs organize schemas/catalogs slightly differently, but the general idea is the same: query the database's catalog tables to confirm the schema you just created.
Rows (Records)
A row represents a single entry in a table — one specific combination of values across all the columns.
An analogy to Object-Oriented Programming is helpful here:
- A table is like a class definition.
- Columns define the attributes of the class.
- Rows are like instances (objects) created from the class.
Like all analogies, it is not a perfect one, which is why we will learn about mappings from OOP to SQL soon, but these are close conceptual relationships.
Let's insert some data using SQL's DML (INSERT INTO). Run the following in psql:
INSERT INTO accounts (owner, email, balance) VALUES ('SpongeBob', 'spongebob@unc.edu', 500);
INSERT INTO accounts (owner, email, balance) VALUES ('Squidward', 'squidward@unc.edu', 250);
INSERT INTO accounts (owner, email, balance) VALUES ('Patrick', 'patrick@unc.edu', 1000);
Notice we do not supply id — the SERIAL type auto-generates it for each row.
The table looks like this:
| id | owner | balance | |
|---|---|---|---|
| 1 | SpongeBob | spongebob@unc.edu | 500 |
| 2 | Squidward | squidward@unc.edu | 250 |
| 3 | Patrick | patrick@unc.edu | 1000 |
Each row is similar to a Python object, with some data type and constraint enforcement differences between Python and SQL:
class Account:
def __init__(self, id: int, owner: str, email: str, balance: int):
self.id = id
self.owner = owner
self.email = email
self.balance = balance
5. Primary Keys vs. Unique Constraints
These two concepts are related but serve different purposes.
Primary Key
A primary key uniquely identifies each row in a table. A table should have exactly one primary key. It is:
- Unique — no two rows share the same primary key value.
- Not null — a primary key column can never be empty.
- Immutable by convention — you generally never change a primary key after a row is created.
In our accounts table, id SERIAL PRIMARY KEY serves this role. The SERIAL type auto-increments, so the database assigns 1, 2, 3, ... automatically.
For many applications, SERIAL integers are common. At very large scale, teams often use UUIDs because they can be generated across different systems without collisions.
Unique Constraint
A unique constraint ensures that no two rows share the same value in a given column, but it is not the row's identity. A table can have many unique constraints but only one primary key.
In our example, email TEXT UNIQUE NOT NULL is a unique constraint. It prevents two accounts from sharing the same email address, but the identity of each row is still the id.
What Happens When You Violate a Constraint?
Try inserting a duplicate email in your psql console:
INSERT INTO accounts (owner, email, balance)
VALUES ('Eve', 'spongebob@unc.edu', 0);
PostgreSQL responds with an error:
ERROR: duplicate key value violates unique constraint "accounts_email_key"
DETAIL: Key (email)=(spongebob@unc.edu) already exists.
The INSERT fails entirely — no partial row is created. This is the database enforcing data integrity for you, which is far safer than checking for duplicates in application code.
6. Indexes — Making Lookups Fast
When you run a query like:
SELECT * FROM accounts WHERE email = 'squidward@unc.edu';
the database needs to find the matching row. Without any special data structure, it must scan every row in the table — a sequential scan. For 3 rows this is trivial; for 3 million rows it is painfully slow.
An index is a separate data structure (often a B-tree) that the database maintains alongside the table. It maps column values to row locations, like the index at the back of a textbook maps topics to page numbers.
When Are Indexes Created Automatically?
- A primary key always gets an index automatically.
- A unique constraint always gets an index automatically (the database needs it to efficiently enforce uniqueness).
So in our accounts table, both id and email are automatically indexed.
When Do You Need to Create an Index Manually?
If you frequently filter or sort on a column that is not a primary key or unique, you should create an index:
CREATE INDEX idx_accounts_owner ON accounts (owner);
Now SELECT * FROM accounts WHERE owner = 'SpongeBob'; can use the index instead of scanning every row.
The Trade-off
Indexes speed up reads but add some write overhead, because the index must be updated on inserts, updates, and deletes. In most applications, the read-speed gain is worth this cost.
7. Next Steps: The Activity Table
Before we move on to querying, let's establish another table for our banking system. The activity table records every deposit and withdrawal:
CREATE TABLE activity (
id SERIAL PRIMARY KEY,
account_id INTEGER NOT NULL REFERENCES accounts(id),
activity_type TEXT NOT NULL CHECK (activity_type IN ('WD', 'DEP')),
amount INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
This introduces a foreign key constraint: activity.account_id must refer to an existing row in accounts(id).
- If you try to insert an activity entry for a nonexistent account, PostgreSQL will reject it.
- By default, PostgreSQL will also prevent you from deleting an account row if activity rows still reference it.
activity_type is a simple enumeration: DEP for deposit and WD for withdrawal.
Insert Some Activity Entries
INSERT INTO activity (account_id, activity_type, amount) VALUES (1, 'DEP', 500);
INSERT INTO activity (account_id, activity_type, amount) VALUES (2, 'DEP', 250);
INSERT INTO activity (account_id, activity_type, amount) VALUES (3, 'DEP', 1000);
INSERT INTO activity (account_id, activity_type, amount) VALUES (1, 'WD', 100);
INSERT INTO activity (account_id, activity_type, amount) VALUES (1, 'DEP', 50);
INSERT INTO activity (account_id, activity_type, amount) VALUES (2, 'WD', 30);
Convention
amount is stored as a positive integer. The activity_type indicates whether the activity adds to the balance (DEP) or subtracts from it (WD).
8. Querying: SELECT, WHERE, ORDER BY, LIMIT
In this section, you will see several SELECT statements. A SELECT query asks the database to return a result table (a set of rows and columns) based on rules you specify.
How to Read a Basic SELECT Query
Here is the basic shape you will see repeatedly:
SELECT <columns>
FROM <table>
WHERE <row_filter>
ORDER BY <sort_key>
LIMIT <max_rows>;
Not every query uses every clause, but the idea stays the same.
The pieces (syntax + meaning)
SELECT… chooses which columns to return.*is a wildcard meaning "all columns".- You can list columns separated by commas, like
owner, balance.
FROM… names the table you are reading from.WHERE… filters down to only the rows that match boolean conditions (predicates).- Example operators you will see:
=equality (exact match)>greater thanLIKEpattern matching (described below)
- Example operators you will see:
ORDER BY… sorts the returned rows.ASCmeans ascending (smallest → largest, A → Z).DESCmeans descending (largest → smallest, Z → A).- Without
ORDER BY, the database is free to return rows in any order.
LIMIT... restricts how many rows are returned.;ends the statement inpsql, like a statement in C-family programming languages
A note on literals
- String values use single quotes, like
'spongebob@unc.edu'. LIKE 'B%'means "starts with B" because%is a wildcard meaning "any sequence of characters".
A note on evaluation order (mental model)
When reading a SELECT query, it often helps to think:
FROM(pick the table)WHERE(filter rows)SELECT(choose columns)ORDER BY(sort)LIMIT(take the first N)
This is not a full description of how the database executes queries internally, but it is a useful way to understand what the clauses mean.
Now let's interactively explore this syntax with examples that build up to more complex select statements. Follow along with each of the examples below:
Retrieve All Rows
SELECT * FROM accounts;
| id | owner | balance | |
|---|---|---|---|
| 1 | SpongeBob | spongebob@unc.edu | 500 |
| 2 | Squidward | squidward@unc.edu | 250 |
| 3 | Patrick | patrick@unc.edu | 1000 |
Select Specific Columns
SELECT owner, balance FROM accounts;
Filter with WHERE
Find a specific account by email:
SELECT * FROM accounts WHERE email = 'spongebob@unc.edu';
Find accounts with a balance above 300:
SELECT * FROM accounts WHERE balance > 300;
Find accounts whose owner name starts with "S":
SELECT * FROM accounts WHERE owner LIKE 'S%';
Sorting with ORDER BY
Sort by balance, with the highest balances first:
SELECT * FROM accounts ORDER BY balance DESC;
Sort alphabetically by owner name:
SELECT * FROM accounts ORDER BY owner ASC;
Limiting Results
Get the single largest account:
SELECT * FROM accounts ORDER BY balance DESC LIMIT 1;
9. Updating Data
The UPDATE statement modifies existing rows. In a banking system, updates are the bread and butter — every deposit and withdrawal changes a balance.
How to Read an UPDATE Statement
Here is the common shape:
UPDATE <table>
SET <column> = <expression>
WHERE <row_filter>;
UPDATE <table>chooses which table you want to change.SETdescribes what to change. The right-hand side can be an expression.balance = balance + 100means "take the oldbalancevalue and add 100".- You can update multiple columns by separating assignments with commas.
WHEREchooses which rows to modify.WHERE id = 1targets the row whose primary key is 1.- If no rows match, then nothing is updated.
Always double-check WHERE
If you omit the WHERE clause, the update applies to every row in the table.
Deposit $100 into SpongeBob's Account
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
After this, SpongeBob's balance is 600.
Withdraw $50 from Squidward's Account
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
Squidward's balance is now 200.
Always Use a WHERE Clause!
Running UPDATE accounts SET balance = 0; without a WHERE clause would set every account's balance to zero. Always double-check your WHERE before executing an UPDATE.
Recording Activity + Updating Balances as One Atomic Transaction
In a real system, you rarely want to update a balance without also recording why it changed. We can group multiple statements into a single transaction so they succeed or fail together.
Here is a simple example: transfer $200 from SpongeBob (account 1) to Squidward (account 2). This updates balances and inserts matching activity entries.
BEGIN;
-- Record the intent (activity log)
INSERT INTO activity (account_id, activity_type, amount) VALUES (1, 'WD', 200);
INSERT INTO activity (account_id, activity_type, amount) VALUES (2, 'DEP', 200);
-- Apply the balance changes
UPDATE accounts SET balance = balance - 200 WHERE id = 1;
UPDATE accounts SET balance = balance + 200 WHERE id = 2;
COMMIT;
If any statement in the transaction fails (for example, the foreign key constraint on activity.account_id or a check constraint), the whole transaction is rolled back and none of the changes take effect.
ACID: The Four Guarantees of a Transaction
The properties that make transactions trustworthy are summarized by ACID.
| Property | One-Sentence Definition | Banking Example |
|---|---|---|
| Atomicity | All operations in a transaction succeed together, or none of them take effect. | A transfer debits SpongeBob and credits Squidward, or neither balance changes. |
| Consistency | A transaction moves the database from one valid state to another — no constraint is violated. | An activity insert with a nonexistent account_id is rejected because of the foreign key. |
| Isolation | Concurrent transactions do not see each other's uncommitted work. | While Request A is transferring money, Request B reading balances sees the old values until A commits. |
| Durability | Once a transaction commits, the data is permanently saved — it survives crashes, power failures, and restarts. | After COMMIT; returns, the new balances and activity rows persist even if the server crashes one millisecond later. |
Isolation levels
In practice, databases offer different isolation levels (Read Committed, Repeatable Read, Serializable) that trade strictness for performance. PostgreSQL defaults to Read Committed. We won't dive into isolation levels in this course, but you should know the concept exists.
10. Aggregating Data
SQL provides aggregate functions that compute one result from many rows.
How to Read Aggregate Queries
Aggregates like COUNT(*) and SUM(amount) combine many rows into a single value.
For example:
SELECT COUNT(*) FROM accounts;
means "count how many rows are in accounts".
Aliases with AS
You will often see AS name to give a computed column a readable name:
... AS net_activitymeans "label this computed column asnet_activity" in the results.
Conditional logic with CASE
Sometimes you need conditional logic inside an expression. SQL uses a CASE expression:
CASE WHEN <condition> THEN <value_if_true> ELSE <value_if_false> END
In our banking example, we store amount as a positive integer and use activity_type to decide whether to add or subtract it.
| Function | Purpose |
|---|---|
COUNT(*) |
Number of rows |
SUM(column) |
Total of all values in a column |
AVG(column) |
Average value |
MIN(column) |
Smallest value |
MAX(column) |
Largest value |
How Many Accounts Exist?
SELECT COUNT(*) FROM accounts;
Result: 3
What is the Total Balance Across All Accounts?
SELECT SUM(balance) FROM accounts;
What is the Net Activity for Each Account in the Activity Table?
If you want an aggregate per group (per account), you use GROUP BY.
GROUP BY in One Sentence
GROUP BY account_id means: "Partition the rows into one bucket per account_id, then compute the aggregates once per bucket."
Here's the pattern:
SELECT <group_columns>, <aggregate_expressions>
FROM <table>
GROUP BY <group_columns>;
Now let's compute the signed net activity and number of entries per account:
SELECT
account_id,
SUM(CASE WHEN activity_type = 'DEP' THEN amount ELSE -amount END) AS net_activity,
COUNT(*) AS num_entries
FROM activity
GROUP BY account_id
ORDER BY account_id;
Read this as: "Group activity rows by account_id. For each account, compute the signed net total (deposits add, withdrawals subtract) and count how many activity rows exist."
Filtering Rows vs Filtering Groups
WHERE ...filters rows before grouping.GROUP BY ...forms groups.- Aggregate functions like
SUM(...)andCOUNT(*)compute results per group.
Sometimes you want to filter based on an aggregate result (for example, "only show accounts whose net activity is positive"). That requires the HAVING clause, which is beyond the scope of this reading.
- PostgreSQL docs for
HAVING: https://www.postgresql.org/docs/current/sql-select.html#SQL-HAVING
11. Deleting Data
How to Read DELETE Statements
DELETE FROM <table> WHERE <row_filter>;
DELETEremoves rows.- The table still exists afterward.
- If you omit the
WHEREclause, it deletes all rows.
To remove rows from a table, use DELETE:
DELETE FROM activity WHERE id = 6;
A safe DELETE habit
Before running a DELETE ... WHERE ..., run a SELECT ... WHERE ... first to confirm you are targeting the rows you intend.
To delete all rows (but keep the table structure):
DELETE FROM activity;
To remove a table in its entirety:
DROP TABLE <table>;
DROP TABLEdeletes the table itself (its schema and all of its data).- After dropping a table, you cannot query or insert into it until it is recreated.
DROP TABLE activity;
Warning
DELETE and DROP TABLE are generally irreversible. Use them carefully, especially without a WHERE clause.
Conclusion
In this reading, you practiced the foundations of relational databases and SQL through a simple banking system:
- Tables, columns, rows — the building blocks of the relational model.
- Primary keys identify rows; unique constraints prevent duplicates on other columns.
- Indexes make queries fast — they are created automatically for primary keys and unique constraints.
- DDL (
CREATE TABLE,ALTER TABLE,DROP TABLE) defines structure; DML (INSERT,SELECT,UPDATE,DELETE) manipulates data. - Filtering (
WHERE), sorting (ORDER BY), limiting (LIMIT), and aggregating (COUNT,SUM,AVG) withGROUP BYare the workhorses of SQL queries.
In the next reading, you will see how to express these same SQL queries in Python using SQLModel — the ORM library that bridges your Python application code and the database.