Databases

Many people use file systems to manage their data. Alternatively, database systems (DBS) can be used. Database systems consist of a database management system (DBMS) and a database (DB) in the background. The database management system is a program running either on our computer or on a remote server. Das DBMS ist ein Programm, das auf dem eigenen Computer oder einem Server läuft. There are different types of database systems. The most popular type of database system is the relational database system (RDBS). Relational database systems store their data in data tables. Some widely known RDBS are Microsoft SQL Server and MySQL.

An external application communicates with the database management system communicates with the DBMS by sending queries and receiving result sets as answers. In this process, the application cannot see the database directly; all database access happens via the database management system as the intermediary (encapsulation, information hiding). In relational database systems, queries are formulated using the Structured Query Language (SQL) and passed to the database management system as Strings. To ensure that potential attackers cannot view, copy, or manipulate the data stored in the database (SQL injection), user input should be passed to the database management system as parameters (rather than as parts of a pre-formulated input String). This way, the database management system can ensure proper escaping of potentially harmful input characters.

In general, the SQL language does not depend on the particular database system in use. However, most database systems only support a limited subset of all possible SQL statements. In SQL, we can define the schema of our database by creating tables, specifying the names and data types of the table columns, and - optionally - specifying the relationships between multiple tables. SQL also contains the so-called CRUD statements, which allow us to create, read, update, and delete data.

When there are internal dependencies between data values in different columns of a table, we might encounter anomalies when creating, updating, or deleting data (create, read, and update anomalies).

Example: When we store a client's address with the ZIP code and the city in different columns of the same table, we might forget to update the city when updating the ZIP code. This would constitute an update anomaly, making our data inconsistent.

To avoid such anomalies in relational databases, the data are distributed over multiple connected tables in the data modeling process. This process is called normalization, and the way our data is stored might conform to one or more normal forms. To connect the different tables in a database, the table columns need headings (attributes) that can serve as unique identifiers of individual data rows. These headings are called keys or key candidates. In every table, at least one column heading must be designated as the primary key. If a key is formed by multiple column headings taken together, it is called a composite key.

Most database systems allow us to combine several changes to our data into transactions. A transaction may be executed completely or not at all - if it is aborted, all transaction steps that might have been executed must be undone (rollback). It is only once all steps have been executed successfully that the changes effected are written permanently to the database (commit). Much like normalization, transactions help ensure the consistency of the stored data. A program must tell the database system which operations should form a transaction (e.g., for a bank transfer, one transaction might contain modifications to two bank accounts).

SQLite is a program library that lets us integrate a full database management system in our own applications without forcing us to install additional software.


SQLite in Python

A comprehensive explanation with many examples can be found in the documentation of the Python module. It is easier to read and digest than the official SQLite documentation.

Regular Usage

# Simple workflow

import sqlite3
# Loading the sqlite3 module

con = sqlite3.connect('database.db')
# Connect to a database file
# The database file is created if it does not exist
# The connect method returns a connection object

cur = con.cursor()   
# Returns a cursor pointing to the result set
# The cursor allows us to execute SQL statements

cur.execute('''
    CREATE TABLE IF NOT EXISTS mytable (
        column1 TEXT PRIMARY KEY,
        column2 TEXT);''')
# Executes a SQL statement specified as a String

con.commit()
# Commits the changes made to the database
# Other database connections can see our changes only when they have been commited

# Reset via con.rollback()

con.close()
# Closes the connection
# Preferable syntax for handling transactions
# - similar to the construct used when working with files
con = sqlite3.connect('database.db')

with con:
    cur = con.cursor()
    # ...
# The 'with' block defines a transaction
# with automatic commit or rollback
# Proper escaping (to avoid SQL injection)

mydict = {"one" : "value1", "two" : "value2"}
cur.execute("INSERT INTO mytable VALUES (:one, :two)", mydict)

mytuple = ("value1", "value2")
cur.execute("INSERT INTO mytable VALUES (?, ?)", mytuple)

Cell Magic and Line Magic in Jupyter Notebook

Loading the SQL extension and setting up a database connection:

%load_ext sql
%sql sqlite:///database.db

SQL Statements with the Cell Magic (Multiline Support)

%%sql
SELECT column1, column2 FROM mytable

SQL Statements with the Line Magic (No Multiline Support but Variable Assignment Possible)

var = %sql SELECT column1, column2 FROM mytable
print(var)

SQL Examples

Creating (CREATE) and Updating (ALTER) Tables:

%%sql
CREATE TABLE mytable (
    column1 TEXT,
    column2 TEXT
);
ALTER TABLE mytable ADD COLUMN column3 TEXT;

Inserting Data: INSERT

%sql INSERT INTO mytable (column1, column2) VALUES ("value1", "value2")
%sql INSERT INTO mytable (column2, column1) VALUES ("value2", "value1")
%sql INSERT INTO mytable (column1) VALUES ("value1")
%sql INSERT INTO mytable VALUES ("value1", "value2")

Updating Data: UPDATE

%sql UPDATE mytable SET column1 = "test"
# Caution: changes ALL data in mytable!
%sql UPDATE mytable SET column1 = "newvalue" WHERE value1 = "value1"
# Changes all columns in mytable that fulfill the WHERE condition

Querying Data: SELECT

%sql DELETE FROM mytable
# Caution: deletes ALL data from mytable!
%sql DELETE FROM mytable WHERE value1 = "newvalue"
# Deletes all data from mytable that fulfill the WHERE condition

Accessing Query Results in Python

cur.execute("SELECT * FROM mytable")

resultlist = cur.fetchall()
# returns a list of all rows as tuples
cur.execute("SELECT * FROM mytable")

result = cur.fetchone()
# returns the row that the cursor is currently pointing at
# after execution, the cursor points at the next row
cur.execute("SELECT * FROM mytable")
for row in cur:     # the cursor is iterable
    # do something
    print(row)

results matching ""

    No results matching ""