DynamoDB
A key-value and document NoSQL database that can guarantee consistent reads and writes at any scale
Features
- Fully managed
- Multi-region
- Multi-master
- Durable database
- Built-in security
- Backup and restore
- In-memory caching
Provides
- Eventually Consistent Reads (default)
- Strongly Consistent Reads
Specify your read and write capacity per second, and it just works at whatever capacity you need without you tweaking anything.
All data is stored on SSD storage and is spread across 3 different AZs
Anatomy of DynamoDB
Tables : contain rows and columns
Items : rows of data
Attributes : The columns of data
Keys : Identifying names of your data
Values : The actual data itself
Read Consistency
When data needs to be updated it has to write updates to all copies
It is possible for data to be inconsistent when reading from a copy which has yet to be updated.
You have the ability to choose the read consistency in DynamoDB to meet your needs
Eventually Consistent Reads (DEFAULT)
- When copies are being updated it is possible for you to read and be returned an inconsistent copy
- Reads are fast but there is no guarantee of consistency
- All copies of data eventually become consistent, generally within a second
Strongly Consistent Reads
- When copies are being updated and you attempt to read, it will not return a result until all copies are consistent.
- You have a guarantee of consistency but the trade-off is higher latency (slower reads).
- All copies of data will be consistent within a second.
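The switch between the two modes is the ConsistentRead flag. A minimal sketch, assuming boto3's low-level client shapes and a hypothetical Users table (the build step is shown so the example runs without AWS credentials):

```python
# Sketch: toggling read consistency on a GetItem request.
# "Users" and the key shape are hypothetical; your table will differ.

def build_get_item(table_name, key, strong=False):
    """Build GetItem request parameters for boto3's low-level DynamoDB client."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) = eventually consistent, True = strongly consistent
        "ConsistentRead": strong,
    }

params = build_get_item("Users", {"UserId": {"S": "u-123"}}, strong=True)
# import boto3; boto3.client("dynamodb").get_item(**params)  # needs AWS credentials
```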
DynamoDB - Partitions
What is a partition? : A partition is an allocation of storage for a table, backed by SSDs and automatically replicated across AZs within an AWS Region.
A partition is when you slice your table up into smaller chunks of data (a partition). It speeds up reads for very large tables by logically grouping similar data together.
DynamoDB automatically creates partitions for you as your data grows
DynamoDB starts off with a single partition
There are ✌🏾 two cases where DynamoDB will create new partitions:
- for every 10 GB of data
- when you exceed the RCUs (Read capacity units) or WCUs (Write capacity units) for a single partition
Each partition has a maximum of 3000 RCUs and 1000 WCUs
DynamoDB evenly splits the RCUs and WCUs across partitions
Primary Keys
When you create a table you have to define a Primary Key. The primary key determines where and how your data will be stored in partitions.
⚠ The primary key cannot be changed afterward
Determines which partition data should be written to
Determines how data should be sorted on a partition
- Using only a Partition Key is called a Simple Primary Key
- Using both a Partition and Sort Key is called a Composite Primary Key
Simple Primary Keys
How a Primary Key with only a Partition Key chooses a partition. With a Simple Primary Key, the Partition Key has to be unique.
DynamoDB's internal hash function: it's a secret; we have no idea how the algorithm decides which partition to write to
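The real function is not public, but the general idea can be sketched: hash the partition key value and use the result to pick a partition. This is purely illustrative and is not DynamoDB's actual algorithm:

```python
# Purely illustrative: DynamoDB's real hash function is not public.
# This only demonstrates the concept of hashing a key to pick a partition.
import hashlib

def pick_partition(partition_key: str, num_partitions: int) -> int:
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

# Items with the same partition key always land on the same partition.
assert pick_partition("user#42", 4) == pick_partition("user#42", 4)
```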
Composite Primary Key
How a Primary Key with both a Partition and Sort Key chooses a partition.
The combination of partition and sort key has to be unique
When using a sort key, records that have the same Partition Key will be kept together and sorted A-Z
DynamoDB's internal hash function: it's a secret; we have no idea how the algorithm decides which partition to write to
Primary Key Design
Simple Primary Keys
Only a Partition Key. No two items can have the same Partition Key
Composite Primary Keys
Partition + Sort Key. Two items can have the same partition key, but the partition and sort key combined must be unique
✌🏾 Two things when designing your Primary Key:
- Distinct - The key should be as distinct (unique) as possible
- Uniform - The key should evenly divide data
Query and Scan
Query
- Allows you to find items in a table based on primary key values
- Query any table or secondary index that has a composite primary key (partition and sort key)
- By default reads are Eventually Consistent (if you want Strongly Consistent, set ConsistentRead to True)
- By default returns all attributes for items
- You can return specific attributes by using ProjectionExpression
- By default results are sorted ascending (set ScanIndexForward to False to reverse to descending)
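A sketch of Query request parameters combining these options, using boto3's low-level client shapes (the table MusicCollection and attributes Artist/SongTitle are hypothetical):

```python
# Sketch: a Query request with non-default consistency, projection, and sort order.
# Names here are hypothetical examples, not a real schema.

def build_query(table_name, pk_name, pk_value):
    return {
        "TableName": table_name,
        "KeyConditionExpression": f"{pk_name} = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk_value}},
        "ConsistentRead": True,                       # default is False (eventually consistent)
        "ProjectionExpression": "Artist, SongTitle",  # return only these attributes
        "ScanIndexForward": False,                    # default True (ascending); False = descending
    }

params = build_query("MusicCollection", "Artist", "Acme Band")
# import boto3; boto3.client("dynamodb").query(**params)  # needs AWS credentials
```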
Scan
- Scan through all items and then return one or more items through filters
- By default returns all attributes for items
- Scans can be performed on tables and secondary indexes
- Can return specific attributes by using ProjectionExpression
- Scan operations are sequential; you can speed up a scan with parallel scans using Segment and TotalSegments
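A parallel scan splits the table into segments that workers scan concurrently. A sketch of the per-worker request parameters (the Orders table is hypothetical):

```python
# Sketch: request parameters for a parallel Scan split into N segments.
# Each worker runs one segment; together they cover the whole table.

def parallel_scan_params(table_name, total_segments):
    return [
        {"TableName": table_name, "Segment": i, "TotalSegments": total_segments}
        for i in range(total_segments)
    ]

segments = parallel_scan_params("Orders", 4)  # "Orders" is a hypothetical table
# Each entry would be passed to its own worker, e.g. client.scan(**segments[i])
```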
⚠ Avoid Scans When Possible
- Much less efficient than running a Query
- As a table grows, scans take longer to complete
- A large table can use all your provisioned throughput in a single scan
Provisioned Capacity
Provisioned Throughput Capacity : is the maximum amount of capacity your application is allowed to read or write per second from the table or index
Throughput is measured in capacity units:
- RCUs Read Capacity Units
- WCUs Write Capacity Units
You can configure DynamoDB Auto Scaling to scale provisioned capacity up and down based on utilization
If you go beyond your capacity you will get the error ProvisionedThroughputExceededException
This is known as Throttling. Throttled requests are rejected; if your application does not retry them, the data can be lost
On-Demand Capacity
On-Demand Capacity is pay per request. So you only pay for what you use.
On-demand mode is good for:
- New tables with unknown workloads
- Unpredictable application traffic
- The ease of paying for only what you use
The throughput is limited by default upper limits for a table:
- 40,000 RCUs
- 40,000 WCUs
Since there is no hard limit imposed by the user, On-Demand could become very expensive under unexpected traffic spikes
Calculating Reads
Read Capacity Unit (RCU)
A read capacity unit represents:
- one strongly consistent read per second
- or two eventually consistent reads per second
- for an item of up to 4 KB in size
Strong read calculation
- Round data up to the nearest 4
- Divide data by 4
- Times by number of reads
50 reads at 40 KB per item: (40/4) x 50 = 500 RCUs
10 reads at 6 KB per item: (8/4) x 10 = 20 RCUs
33 reads at 17 KB per item: (20/4) x 33 = 165 RCUs
Eventual read calculation
- Round data up to the nearest 4
- Divide data by 4
- Times by number of reads
- Divide by 2
- Round up to whole number
50 reads at 40 KB per item: ((40/4) x 50) / 2 = 250 RCUs
11 reads at 9 KB per item: ((12/4) x 11) / 2 = 17 RCUs
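The steps above translate directly into a small helper (a sketch of the exam-style arithmetic, not an AWS API):

```python
# RCU calculator: round item size up to the nearest 4 KB unit, multiply by
# reads per second, and halve (rounding up) for eventually consistent reads.
import math

def rcus(reads_per_second, item_size_kb, strongly_consistent=True):
    units = math.ceil(item_size_kb / 4) * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)
    return units

assert rcus(50, 40) == 500                              # strong
assert rcus(33, 17) == 165                              # strong, 17 KB rounds to 20
assert rcus(11, 9, strongly_consistent=False) == 17     # eventual, 16.5 rounds up
```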
Calculating Writes
Write Capacity Unit (WCU)
A write capacity unit represents:
- one write per second
- for an item up to 1 KB
50 writes at 40 KB per item: 40 x 50 = 2000 WCUs
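The write calculation is the same idea with a 1 KB unit (again a sketch of the arithmetic, not an AWS API):

```python
# WCU calculator: round item size up to the nearest 1 KB unit,
# then multiply by writes per second.
import math

def wcus(writes_per_second, item_size_kb):
    return math.ceil(item_size_kb) * writes_per_second

assert wcus(50, 40) == 2000
assert wcus(12, 1.5) == 24   # 1.5 KB rounds up to 2 KB units
```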
Global Tables
Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database, without having to build and maintain your own replication solution
To create a global table you must first:
- Use KMS CMK
- Enable Streams
- Stream Type of New and Old Image
Transactions
What is a transaction? : Represents a change that will occur to the database. If any dependent condition fails, the transaction will roll back as if the database changes never occurred
DynamoDB offers the ability to perform transactions at no additional cost using the TransactWriteItems and TransactGetItems API operations
DynamoDB transactions allow for all-or-nothing changes to multiple items both within and across tables. DynamoDB performs ✌🏾 two underlying reads or writes of every item in the transaction:
- one to prepare the transaction
- one to commit the transaction
These two underlying read/write operations are visible in your Amazon CloudWatch metrics.
You can use ConditionCheck with DynamoDB transaction to do a pre-conditional check.
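A sketch of a TransactWriteItems payload mixing a ConditionCheck with other actions, using boto3's low-level shapes. The tables (Customers, Products, Orders) and attributes are hypothetical; if any condition fails, none of the writes happen:

```python
# Sketch: all-or-nothing order placement across three hypothetical tables.

def build_order(customer_id, product_id, order_id, qty):
    return {
        "TransactItems": [
            {   # pre-condition on an item the transaction does not modify
                "ConditionCheck": {
                    "TableName": "Customers",
                    "Key": {"CustomerId": {"S": customer_id}},
                    "ConditionExpression": "attribute_exists(CustomerId)",
                }
            },
            {   # decrement stock; the whole transaction fails if stock is short
                "Update": {
                    "TableName": "Products",
                    "Key": {"ProductId": {"S": product_id}},
                    "UpdateExpression": "SET Stock = Stock - :q",
                    "ConditionExpression": "Stock >= :q",
                    "ExpressionAttributeValues": {":q": {"N": str(qty)}},
                }
            },
            {   # record the order itself
                "Put": {
                    "TableName": "Orders",
                    "Item": {"OrderId": {"S": order_id}, "Qty": {"N": str(qty)}},
                }
            },
        ]
    }

tx = build_order("c-1", "p-9", "o-100", 2)
# import boto3; boto3.client("dynamodb").transact_write_items(**tx)  # needs AWS credentials
```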
Time To Live (TTL)
Time To Live (TTL) lets you have items in DynamoDB expire (be deleted) at a given time
TTL is great for keeping databases small and manageable and is suited for temporary continuous data e.g. session data, event logs, usage patterns
Specify the attribute name which will hold the timestamp that determines when the record will be deleted.
DynamoDB does not have a DateTime datatype! To use TTL the attribute must be a Number containing the time in epoch (Unix time) format (a datetime represented as a number of seconds)
Jan 05, 1993 10:00 PM GMT → 726271200
A smaller database results in money saved!
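Computing the epoch value is a one-liner with the standard library; a sketch (the 24-hour session expiry is an illustrative choice, not a DynamoDB requirement):

```python
# TTL attribute values are epoch (Unix) timestamps in seconds, stored as Numbers.
from datetime import datetime, timezone, timedelta

expire_at = datetime(1993, 1, 5, 22, 0, tzinfo=timezone.utc)  # Jan 05, 1993 10:00 PM GMT
ttl_value = int(expire_at.timestamp())
print(ttl_value)  # 726271200

# A more typical use: expire a session item 24 hours from now.
session_ttl = int((datetime.now(timezone.utc) + timedelta(hours=24)).timestamp())
```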
Streams
When you enable a stream on a table, DynamoDB captures every modification to data items so you can react to those changes
When an Insert, Update, or Delete occurs, the change can be captured and sent to a Lambda function
- Changes are sent in batches to your custom Lambda
- Changes are sent to your custom Lambda in near real-time
- Each stream record appears exactly once in the Stream
- For each item that is modified, the stream records appear in the same sequence as the actual modifications
Ex. Every time a purchase is recorded, send an internal email.
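A sketch of the Lambda side: the handler receives a batch of stream records and reacts to inserts. The send_email hook and the event shape details here are illustrative (real stream events carry more fields):

```python
# Sketch: a Lambda handler reacting to DynamoDB Stream records.
# "send_email" is a hypothetical notification hook.

def handler(event, context=None):
    inserts = []
    for record in event.get("Records", []):
        if record.get("eventName") == "INSERT":       # e.g. a new purchase
            inserts.append(record["dynamodb"]["NewImage"])
            # send_email(inserts[-1])                 # hypothetical hook
    return {"processed": len(event.get("Records", [])), "inserts": len(inserts)}

# Exercise the handler with a minimal fake batch (no AWS needed).
fake_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"OrderId": {"S": "o-1"}}}},
    {"eventName": "MODIFY", "dynamodb": {}},
]}
result = handler(fake_event)
```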
Errors
ThrottlingException
: Rate of requests exceeds the allowed throughput. This exception might be returned if you perform any of the following operations too rapidly: CreateTable, UpdateTable, DeleteTable
ProvisionedThroughputExceededException
: You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes.
The AWS SDK will **automatically retry** with **Exponential Back-off** when an error occurs: it will attempt the request again after 50ms, 100ms, 200ms, and so on, up to a minute before stopping
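The retry schedule described above can be sketched as a doubling delay with a cap (an illustration of the pattern, not the SDK's exact internal code):

```python
# Sketch: exponential back-off starting at 50 ms, doubling each
# attempt, capped at one minute (60,000 ms).

def backoff_delays_ms(attempts, base_ms=50, cap_ms=60_000):
    return [min(base_ms * 2 ** i, cap_ms) for i in range(attempts)]

print(backoff_delays_ms(4))  # [50, 100, 200, 400]
```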
Indexes
What is an index? : A database index is a copy of selected columns of data in a database which is used to speed up queries and sorting
DynamoDB has ✌🏾 two types of indexes:
1. LSI - Local Secondary Index (can only be created with the initial table)
2. GSI - Global Secondary Index
You should generally use Global over Local
Strong Consistency IS A Deciding Factor
An LSI can provide strong consistency. A GSI cannot provide strong consistency
Local Secondary Indexes
What is an LSI? : It's "local" in that every partition of an LSI is scoped to a base table partition that has the same partition key value
The total size of indexed items for any one partition key value cannot exceed 10 GB
Shares provisioned throughput settings for read and write activity with the table it is indexing.
Limited to 5 per table (default)
- LSIs are created with the initial table
- LSIs are immutable
- An LSI needs both a partition and sort key
- The partition key must be the same as the base table
- The sort key must be different from the base table
Global Secondary Indexes
What is a Global Secondary Index (GSI)? : It is considered "global" because queries on the index can span all of the data in the base table, across partitions
The indexes have no size restrictions
They have their own provisioned throughput settings. They consume capacity but not from the base table
Limited to 20 per table (default)
- A GSI can be added, modified, or deleted at any time
- The partition key can be different from the base table
- The sort key is optional
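As a sketch of "can be added at any time", the UpdateTable parameters for creating a GSI might look like this (boto3 low-level shapes; the Orders table, CustomerIndex name, and capacity numbers are hypothetical):

```python
# Sketch: UpdateTable parameters that add a GSI to an existing table.

def build_add_gsi(table_name, index_name, pk_attr):
    return {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": pk_attr, "AttributeType": "S"}],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index_name,
                "KeySchema": [{"AttributeName": pk_attr, "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                # GSIs have their own capacity, separate from the base table
                "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
            }
        }],
    }

params = build_add_gsi("Orders", "CustomerIndex", "CustomerId")
# import boto3; boto3.client("dynamodb").update_table(**params)  # needs AWS credentials
```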
LSI vs GSI
| Characteristics | Local Secondary Index | Global Secondary Index |
|---|---|---|
| Key Schema | Composite | ⭐ Simple or Composite |
| Key Attributes | Partition key must be the same as base table | ⭐ Partition and Sort key can be any attribute |
| Size Restriction Per Partition Key Value | All indexed items must be 10 GB or less | ⭐ Unlimited |
| Online Index Operations | create index on table creation | ⭐ Add, modify or delete index at anytime |
| Queries and Partition | query over a single partition, as specified by the partition key value in the query | ⭐ query over the entire table, across all Partitions |
| Read Consistency | ⭐ Strongly or eventually consistent | Only eventually consistent |
| Provisioned Throughput Consumption | Share capacity units with base table | ⭐ Has own capacity |
| Projected Attributes | ⭐ Can request attributes that are not projected into the index | Can only request the attributes that are projected into the index |
Accelerator (DAX)
DAX is a fully managed in-memory cache for DynamoDB that runs in a cluster.
DynamoDB response times can be in the single-digit milliseconds; DAX can reduce response times to microseconds
- A DAX cluster consists of one or more nodes.
- Each node runs its own instance of the DAX caching software.
- One of the nodes serves as the primary node for the cluster.
- Additional nodes (if present) serve as read replicas.
- Your app can access DAX by specifying the endpoint for the DAX cluster.
- The DAX client software works with the cluster endpoint to perform intelligent load balancing and routing.
- Incoming requests are evenly distributed across all of the nodes in the cluster.
| Dax is Ideal for: | Dax is not Ideal for: |
|---|---|
| Apps that require the fastest possible response times for reads e.g. real-time bidding, social gaming, and trading applications | Apps that require strongly consistent reads |
| Apps that read a small number of items more frequently than others | Apps that do not require microsecond response times for reads, or that do not need to offload repeated read activity from underlying Tables |
| Apps that are read-intensive, but are also cost-sensitive | Apps that are write-intensive, or that do not perform much read activity |
| Apps that require repeated reads against a large set of data | Apps that are already using a different caching solution with DynamoDB, or are using their own client-side logic for working with that caching solution |