SELECT * FROM exciting_knowledge;
First off
Software Engineer on Sabbatical
but I want to
some of what
https://kyletolle.github.io/dbs_and_devs_talk
Let’s start at
&
cat students.csv
Id, Name, Grade
1, Ted, B
2, Stan, A-
3, Fred, C++
4, Ned, 5%
Some things don’t need to be complicated
can be used for
Then, things grow
Fortunately for us,
some smart folks
Popular DBs according to Devs
https://insights.stackoverflow.com/survey/2017#technology-databases
Many common DBs use the
*
*: By User:AutumnSnow - Own work, CC BY-SA 3.0, Link
In other words, we have a
with
for managing data in a
Remember our student data?
Id, Name, Grade
1, Ted, B
2, Stan, A-
3, Fred, C++
4, Ned, 5%
students
id | name | grade |
---|---|---|
1 | Ted | B |
2 | Stan | A- |
3 | Fred | C++ |
4 | Ned | 5% |
The table
has a schema
.
Each row
is a record
.
Each column
is an attribute
.
Let’s drop into
brew install postgres
brew services start postgresql
createdb dbs_and_devs_talk
psql dbs_and_devs_talk
CREATE TABLE students(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
grade TEXT NOT NULL
);
INSERT INTO students (name, grade) values ('Ted', 'B');
INSERT INTO students (name, grade) values ('Stan', 'A-');
INSERT INTO students (name, grade) values ('Fred', 'C++');
INSERT INTO students (name, grade) values ('Ned', '5%');
SELECT * FROM students;
id | name | grade
====+======+=======
1 | Ted | B
2 | Stan | A-
3 | Fred | C++
4 | Ned | 5%
(4 rows)
CREATE TABLE backpacks(
id SERIAL PRIMARY KEY,
color TEXT NOT NULL,
student_id INT references students(id)
);
INSERT INTO backpacks (color, student_id) values('blue', 1);
INSERT INTO backpacks (color, student_id) values('red', 2);
INSERT INTO backpacks (color, student_id) values('grey', 3);
INSERT INTO backpacks (color, student_id) values('green', 4);
INSERT INTO backpacks (color, student_id) values('clear', 1);
SELECT * FROM backpacks;
id | color | student_id
====+=======+============
1 | blue | 1
2 | red | 2
3 | grey | 3
4 | green | 4
5 | clear | 1
(5 rows)
SELECT color FROM backpacks WHERE student_id = 1;
color
=======
blue
clear
(2 rows)
SELECT s.name, b.color FROM students as s
JOIN backpacks as b ON s.id = b.student_id
ORDER BY s.name;
name | color
======+=======
Fred | grey
Ned | green
Stan | red
Ted | blue
Ted | clear
(5 rows)
\q
Do stuff
for a while
Plus, learn to
Object-Relational Mapping
Maps
to
follows the
brew install rbenv
rbenv install 2.4.2
rbenv global 2.4.2
sqlite3 --version
gem install rails
rails --version
We expect to see Rails 5.1.4
rails new blog
cd blog
rails generate scaffold Post title:string body:text
class CreatePosts < ActiveRecord::Migration[5.1]
def change
create_table :posts do |t|
t.string :title
t.text :body
t.timestamps
end
end
end
rails db:migrate
rails console
We can modify SQL data with it
Post.create title: 'First!', body: 'Viral to the max.'
The console shows us
for various
(0.1ms) begin transaction
SQL (0.4ms) INSERT INTO "posts"
("title", "body", "created_at", "updated_at")
VALUES (?, ?, ?, ?)
[
["title", "First!"],
["body", "Viral to the max."],
["created_at", "2017-11-29 01:37:50.231847"],
["updated_at", "2017-11-29 01:37:50.231847"]
]
(0.7ms) commit transaction
=> #<Post id: 1, title: "First!",
body: "Viral to the max.",
created_at: "2017-11-29 01:37:50",
updated_at: "2017-11-29 01:37:50">
(0.1ms) begin transaction
...
(0.7ms) commit transaction
Multiple queries either
or
An error in a query causes a
Provides protection from
causing
Important when
in
SQL (0.4ms) INSERT INTO "posts"
("title", "body", "created_at", "updated_at")
VALUES (?, ?, ?, ?)
[
["title", "First!"],
["body", "Viral to the max."],
["created_at", "2017-11-29 01:37:50.231847"],
["updated_at", "2017-11-29 01:37:50.231847"]
]
ActiveRecord analyzes the table and
generates the correct query
SQL (0.4ms) INSERT INTO "posts" ...
ActiveRecord measures
each query takes to run
Can help
and find slow ones
Table name (posts
) is pluralized
Class name (Post
) is singular
Handles
like
people
& Person
Notice the
updated_at
& created_at
columns?
ActiveRecord automatically tracks
times of each record
post = Post.last
SELECT "posts".* FROM "posts"
ORDER BY "posts"."id" DESC LIMIT ? [["LIMIT", 1]]
=> #<Post id: 1,
title: "First!",
body: "Viral to the max.",
created_at: "2017-11-29 01:37:50",
updated_at: "2017-11-29 01:37:50">
post.title
=> "First!"
post.updated_at
=> Wed, 29 Nov 2017 02:31:07 UTC +00:00
Post.count
SELECT COUNT(*) FROM "posts"
=> 1
Post.destroy_all
SELECT "posts".* FROM "posts"
DELETE FROM "posts" WHERE "posts"."id" = ? [["id", 1]]
This is not an exhaustive list
exit
rails g migration AddCurrentSongToPosts current_song:string
class AddCurrentSongToPosts < ActiveRecord::Migration[5.1]
def change
add_column :posts, :current_song, :string
end
end
Migrations support
rails db:rollback
rails g model like post:references
class CreateLikes < ActiveRecord::Migration[5.1]
def change
create_table :likes do |t|
t.references :post, foreign_key: true
t.timestamps
end
end
end
post_id
An author can publish many posts, and
a post can have many authors
rails g model author name:string email:string
class CreateAuthors < ActiveRecord::Migration[5.1]
def change
create_table :authors do |t|
t.string :name
t.string :email
t.timestamps
end
end
end
rails g migration CreateJoinTableAuthorsPosts author post
class CreateJoinTableAuthorsPosts < ActiveRecord::Migration[5.1]
def change
create_join_table :authors, :posts do |t|
# t.index [:author_id, :post_id]
# t.index [:post_id, :author_id]
end
end
end
Sometimes the join tables are more complex
Publications
primary_author
boolean attributeroyalty_earned
money attributeQuick detour into
A single DB can provide
In the event of a network failure,
a distributed DB must choose between
Some distributed DBs prioritize availability.
Eventually, all the data will be consistent.
support multiple databases through
generate queries on the fly, so
are slower than hard-coded queries
inflate objects with
data from the DB, which
like sequel for Ruby
ORMS may lazily fetch data, resulting in
1,000+1 queries for
1,000 posts by 1 author
Key (like id
) that uniquely identifies this record
No two records have the same primary key
Key (like post_id
) that uniquely identifies
some other row in a different table or in this table
To allow parent-child relationships
Can speed up lookups, but
require extra space on disk and
slow down writes
SQL injection can cause your DB to be hacked
Some ORMs may sanitize some data by default
before entering your system
Some users might only need read-only access
Review DB permissions to reduce risk of accidental
Removing duplication across tables
Changing the database schema as your needs change
To keep the db online and rename a column:
old_name
to one called new_name
new_name
as well as old_name
old_name
to new_name
old_name
Application-level integrity vs Database-level integrity
Adding a non-null column to a large table
can lock the table
and prevent reads or writes
To prevent a long table lock:
If one node goes down,
another can still serve requests
Make regular backups of databases
Have a restore process
to make sure it works!
dropdb dbs_and_devs_talk