DynamoDB

A key-value and document NoSQL database which can guarantee consistent reads and writes at any scale

Features

  • Fully managed
  • Multi-region
  • Multi-master
  • Durable database
  • Built-in security
  • Backup and restore
  • In-memory caching

Provides

  • Eventually Consistent Reads (default)
  • Strongly Consistent Reads

Specify your read and write capacity per second, and DynamoDB works at whatever capacity you need without you tweaking anything.

All data is stored on SSD storage and is spread across 3 different AZs

Anatomy of DynamoDB

Tables : contain rows and columns

Items : rows of data

Attributes : The columns of data

Keys : Identifying names of your data

Values : The actual data itself

Read Consistency

When data needs to be updated, the change has to be written to all copies

It is possible for data to be inconsistent when reading from a copy which has yet to be updated.

You have the ability to choose the read consistency in DynamoDB to meet your needs

Eventually Consistent Reads (DEFAULT)

  • When copies are being updated it is possible for you to read and be returned an inconsistent copy
  • Reads are fast, but there is no guarantee of consistency
  • All copies of data eventually become consistent, generally within a second

Strongly Consistent Reads

  • When copies are being updated and you attempt to read, it will not return a result until all copies are consistent.
  • You have a guarantee of consistency, but the trade-off is higher latency (slower reads).
  • All copies of data will be consistent within a second.
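As a minimal sketch, a strongly consistent read is just an ordinary GetItem with the ConsistentRead flag set (table key name "user_id" is an illustrative assumption; the dict would be passed to a boto3 Table's get_item(**params)):

```python
# Build GetItem parameters; ConsistentRead=True requests a strongly
# consistent read (the default, eventually consistent, costs half an RCU
# per 4 KB instead of a full one).
def get_item_params(key: dict, consistent: bool = False) -> dict:
    params = {"Key": key}
    if consistent:
        params["ConsistentRead"] = True
    return params

params = get_item_params({"user_id": "u-123"}, consistent=True)
```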

DynamoDB - Partitions

What is a partition? : A partition is an allocation of storage for a table, backed by SSDs and automatically replicated across AZs within an AWS Region.

A partition is when you slice your table up into smaller chunks of data (a partition). It speeds up reads for very large tables by logically grouping similar data together.

DynamoDB automatically creates partitions for you as your data grows

DynamoDB starts off with a single partition

There are ✌🏾 two cases where DynamoDB will create new partitions:

  1. For every 10 GB of data
  2. When you exceed the RCUs (Read Capacity Units) or WCUs (Write Capacity Units) for a single partition

Each partition has a maximum of 3000 RCUs and 1000 WCUs

DynamoDB evenly splits the RCUs and WCUs across partitions

Primary Keys

When you create a table you have to define a Primary Key. The primary key determines where and how your data will be stored in partitions.

⚠ The primary key cannot be changed afterward

Determines which partition data should be written to

Determines how data should be sorted on a partition

  • Using only a Partition Key is called a Simple Primary Key
  • Using both a Partition and Sort Key is called a Composite Primary Key

Simple Primary Keys

With a Simple Primary Key, the Partition Key alone determines which partition an item is written to, so the Partition Key has to be unique.

DynamoDB's Internal Hash Function : It's a secret; we have no idea how the algorithm decides which partition to write to
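Since the real hash function is undocumented, here is only a conceptual sketch of the general idea: hash the partition key and map it onto one of N partitions, so the same key always lands on the same partition.

```python
import hashlib

# Conceptual sketch only -- NOT DynamoDB's actual algorithm, which is
# internal and secret. Maps a partition key to one of num_partitions.
def choose_partition(partition_key: str, num_partitions: int) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# The same key is always routed to the same partition:
p = choose_partition("user#42", 3)
```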

Composite Primary Key

How a Primary Key with a Partition and Sort Key chooses which partition.

The combination of partition key and sort key has to be unique

When using a sort key, records with the same Partition Key will be kept together and sorted A-Z

DynamoDB's Internal Hash Function : It's a secret; we have no idea how the algorithm decides which partition to write to

Primary Key Design

Simple Primary Keys

Only a Partition Key. No two items can have the same Partition Key.

Composite Primary Keys

Partition + Sort Key. Two items can have the same Partition Key, but the Partition and Sort Key combined must be unique.

✌🏾 Two things when designing your Primary Key:

  • Distinct - The key should be as distinct (unique) as possible
  • Uniform - The key should evenly divide data

Query and Scan

Query

  • Allows you to find items in a table based on primary key values
  • Query any table or secondary index that has a composite primary key (partition and sort key)
  • By default reads are Eventually Consistent (if you want Strongly Consistent, set ConsistentRead to true)
  • By default returns all attributes for items
  • You can return specific attributes by using ProjectionExpression
  • By default results are sorted ascending (set ScanIndexForward to false to reverse to descending)
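Those defaults can be sketched as low-level Query parameters (the table and attribute names are illustrative assumptions; the dict would be passed to a boto3 DynamoDB client's query(**params)):

```python
# Query parameters overriding the defaults listed above.
params = {
    "TableName": "Orders",                        # assumed table name
    "KeyConditionExpression": "customer_id = :cid",
    "ExpressionAttributeValues": {":cid": {"S": "c-001"}},
    "ConsistentRead": True,                       # default is eventually consistent
    "ProjectionExpression": "order_id, order_total",  # return only these attributes
    "ScanIndexForward": False,                    # sort key descending (default is ascending)
}
```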

Scan

  • Scans through all items and then returns one or more items through filters
  • By default returns all attributes for items
  • Scans can be performed on tables and secondary indexes
  • Can return specific attributes by using ProjectionExpression
  • Scan operations are sequential. You can speed up a scan through parallel scans using Segment and TotalSegments
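A parallel scan, sketched: split the table into TotalSegments pieces and give each worker one Segment (the table name is an assumption; each dict would be passed to a separate client.scan(**kwargs) call, typically from its own thread or process):

```python
# Build one set of Scan parameters per worker; DynamoDB returns a
# disjoint slice of the table for each Segment value.
def parallel_scan_kwargs(table: str, total_segments: int) -> list:
    return [
        {"TableName": table, "Segment": seg, "TotalSegments": total_segments}
        for seg in range(total_segments)
    ]

workers = parallel_scan_kwargs("Events", 4)
```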

Avoid Scans When Possible

  • Much less efficient than running a Query
  • As a table grows, scans take longer to complete
  • A large table can use all your provisioned throughput in a single scan

Provision Capacity

Provisioned Throughput Capacity : is the maximum amount of capacity your application is allowed to read or write per second from the table or index

Throughput is measured in capacity units:

  • RCUs Read Capacity Units
  • WCUs Write Capacity Units

You can set DynamoDB provisioned capacity to scale up and down based on utilization

If you go beyond your capacity you will get the error ProvisionedThroughputExceededException

This is known as Throttling. Throttled requests are rejected; if your application does not retry them, the data can be lost

On-Demand Capacity

On-Demand Capacity is pay per request. So you only pay for what you use.

On-demand mode is good for:

  • New tables with unknown workloads
  • Unpredictable application traffic
  • The ease of paying for only what you use

The throughput is limited by default upper limits for a table:

  • 40,000 RCUs
  • 40,000 WCUs

Since the user imposes no hard limit, On-Demand could become very expensive if traffic spikes unexpectedly

Calculating Reads

Read Capacity Unit (RCU)

A read capacity unit represents:

  • one strongly consistent read per second
  • or two eventually consistent reads per second
  • for an item up to 4 KB in size

Strong read calculation

  1. Round the item size up to the nearest 4 KB
  2. Divide by 4
  3. Multiply by the number of reads

50 reads at 40 KB per item: (40/4) x 50 = 500 RCUs

10 reads at 6 KB per item: (8/4) x 10 = 20 RCUs

33 reads at 17 KB per item: (20/4) x 33 = 165 RCUs

Eventual read calculation

  1. Round the item size up to the nearest 4 KB
  2. Divide by 4
  3. Multiply by the number of reads
  4. Divide by 2
  5. Round up to a whole number

50 reads at 40 KB per item: ((40/4) x 50) / 2 = 250 RCUs

11 reads at 9 KB per item: ((12/4) x 11) / 2 = 17 RCUs
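The steps above can be sketched as a small calculator, reproducing the worked examples:

```python
import math

# Round item size up to the nearest 4 KB, divide by 4, multiply by
# reads per second; halve (rounding up) for eventually consistent reads.
def rcus(reads_per_sec: int, item_kb: float, strong: bool = True) -> int:
    units_per_read = math.ceil(item_kb / 4)
    total = reads_per_sec * units_per_read
    return total if strong else math.ceil(total / 2)

print(rcus(50, 40))                # 500
print(rcus(33, 17))                # 165
print(rcus(50, 40, strong=False))  # 250
print(rcus(11, 9, strong=False))   # 17
```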

Calculating Writes

Write Capacity Unit (WCU)

A write capacity unit represents:

  • one write per second
  • for an item up to 1 KB

50 writes at 40 KB per item: 40 x 50 = 2000 WCUs
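The write calculation as code: round the item size up to the nearest 1 KB and multiply by writes per second.

```python
import math

# One WCU covers one write per second for an item up to 1 KB.
def wcus(writes_per_sec: int, item_kb: float) -> int:
    return writes_per_sec * math.ceil(item_kb)

print(wcus(50, 40))   # 2000
print(wcus(10, 1.5))  # 20  (1.5 KB rounds up to 2 KB)
```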

Global Tables

Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database, without having to build and maintain your own replication solution

To create a global table you must first:

  1. Use KMS CMK
  2. Enable Streams
  3. Stream Type of New and Old Image

Transactions

What is a transaction? : Represents a change that will occur to the database. If any dependent condition fails, the transaction rolls back as if the database changes never occurred

DynamoDB offers the ability to perform transactions at no additional cost using TransactWriteItems and TransactGetItems

DynamoDB transactions allow for all-or-nothing changes to multiple items, both within and across tables.

DynamoDB performs ✌🏾 two underlying reads or writes of every item in the transaction:

  1. one to prepare the transaction
  2. one to commit the transactions

These two underlying read/write operations are visible in your Amazon CloudWatch metrics.

You can use ConditionCheck with a DynamoDB transaction to do a pre-conditional check.
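As a sketch, a TransactWriteItems request pairing a ConditionCheck with a Put (all table, key, and attribute names are illustrative assumptions; the dict would be passed to a boto3 client's transact_write_items(**req)):

```python
# All-or-nothing: if the ConditionCheck fails (account item missing),
# the Put never happens and the whole transaction rolls back.
req = {
    "TransactItems": [
        {
            "ConditionCheck": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "a-1"}},
                "ConditionExpression": "attribute_exists(account_id)",
            }
        },
        {
            "Put": {
                "TableName": "Orders",
                "Item": {"order_id": {"S": "o-9"}, "account_id": {"S": "a-1"}},
            }
        },
    ]
}
```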

Time To Live (TTL)

Time To Live (TTL) lets you have items in DynamoDB expire (be deleted) at a given time

TTL is great for keeping databases small and manageable, and is suited for temporary continuous data e.g. session data, event logs, usage patterns

Specify the attribute name whose value will determine when the record will be deleted.

DynamoDB does not have a DateTime datatype! To use TTL, the attribute must be a Number in epoch format (a datetime represented as seconds since 1970)

Jan 05, 1993 10:00PM GMT 726270962
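A quick sketch of writing a TTL value, assuming a hypothetical "expires_at" TTL attribute on a session item: compute an epoch timestamp one hour in the future and store it as a Number.

```python
from datetime import datetime, timedelta, timezone

# TTL attributes hold epoch seconds as a DynamoDB Number.
expires_at = int((datetime.now(timezone.utc) + timedelta(hours=1)).timestamp())

# Low-level item representation -- the N type holds the number as a string.
item = {"session_id": {"S": "s-1"}, "expires_at": {"N": str(expires_at)}}
```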

A smaller database results in money saved!

Streams

When you enable a stream on a table, DynamoDB captures every modification to data items so you can react to those changes

When an Insert, Update, or Delete occurs, the change will be captured and can be sent to a Lambda function

  • Changes are sent in batches to your custom Lambda
  • Changes are sent to your custom Lambda in near real-time
  • Each stream record appears exactly once in the stream
  • For each item that is modified, the stream records appear in the same sequence as the actual modifications

Ex. Every time a purchase is recorded, send an internal email.
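That example can be sketched as a Lambda handler processing a stream batch. The event shape (Records / eventName / dynamodb.NewImage) follows the stream record format; the email helper is a hypothetical placeholder.

```python
# React only to INSERT records (new purchases) in the stream batch.
def handler(event, context=None):
    notified = []
    for record in event.get("Records", []):
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            # send_internal_email(new_image)  # hypothetical notifier
            notified.append(new_image["order_id"]["S"])
    return notified

# Minimal sample batch with one insert and one modify:
sample_event = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"order_id": {"S": "o-1"}}}},
    {"eventName": "MODIFY",
     "dynamodb": {"NewImage": {"order_id": {"S": "o-2"}}}},
]}
```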

Errors

ThrottlingException : Rate of requests exceeds the allowed throughput. This exception might be returned if you perform any of the following operations too rapidly: CreateTable, UpdateTable, DeleteTable

ProvisionedThroughputExceededException : You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes.

The AWS SDK will automatically retry with Exponential Back-off when an error occurs: it will attempt the request again after 50ms, 100ms, 200ms, and so on, for up to a minute before stopping
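The back-off schedule can be sketched as doubling delays from a 50ms base (real SDKs also add random jitter, which is omitted here):

```python
# Exponential back-off: 50 ms, 100 ms, 200 ms, ... doubling per retry.
def backoff_delays_ms(attempts: int, base_ms: int = 50) -> list:
    return [base_ms * (2 ** i) for i in range(attempts)]

print(backoff_delays_ms(4))  # [50, 100, 200, 400]
```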

Indexes

What is an index? : A database index is a copy of selected columns of data in a database which is used to speed up sorting and querying

DynamoDB has ✌🏾 two types of indexes:

  1. LSI - Local Secondary Index (can only be created with the initial table)
  2. GSI - Global Secondary Index

You should generally use Global over Local

Strong Consistency IS A Deciding Factor

An LSI can provide strong consistency. A GSI cannot provide strong consistency.

Local Secondary Indexes

What is an LSI? : It’s “local” in that every partition of an LSI is scoped to a base table partition that has the same partition key value

The total size of indexed items for any one partition key value cannot exceed 10 GB

Shared provisioned throughput settings for read and write activity with the table it is indexing.

Limited to 5 per table (default)

  • LSIs are created with the initial table
  • LSIs are immutable
  • An LSI needs both a Partition and Sort Key
  • The partition key must be the same as the base table
  • The sort key should be different from the base table
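Those constraints can be sketched as a CreateTable request with an LSI (all table, index, and attribute names are illustrative assumptions; the dict would be passed to a boto3 client's create_table(**table_def)):

```python
# LSI rules: defined at table creation, same partition key as the base
# table ("player_id"), different sort key ("score" instead of "game_id").
table_def = {
    "TableName": "GameScores",
    "KeySchema": [
        {"AttributeName": "player_id", "KeyType": "HASH"},
        {"AttributeName": "game_id", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "player_id", "AttributeType": "S"},
        {"AttributeName": "game_id", "AttributeType": "S"},
        {"AttributeName": "score", "AttributeType": "N"},
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "ScoreIndex",
            "KeySchema": [
                {"AttributeName": "player_id", "KeyType": "HASH"},  # same as base
                {"AttributeName": "score", "KeyType": "RANGE"},     # different sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
```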

Global Secondary Indexes

What is a Global Secondary Index (GSI)? : It is considered “global” because queries on the index can span all of the data in the base table, across partitions

The indexes have no size restrictions

They have their own provisioned throughput settings. They consume capacity but not from the base table

Limited to 20 per table (default)

  • GSIs can be added, modified, or deleted at any time
  • The partition key can be different from the base table
  • The sort key is optional
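Unlike an LSI, a GSI can be added to an existing table; a sketch of an UpdateTable request doing that (names are illustrative assumptions; the dict would be passed to a boto3 client's update_table(**update)):

```python
# Add a GSI after table creation; its partition key ("game_id") differs
# from the base table's, and no sort key is defined.
update = {
    "TableName": "GameScores",
    "AttributeDefinitions": [
        {"AttributeName": "game_id", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [
        {
            "Create": {
                "IndexName": "GameIndex",
                "KeySchema": [
                    {"AttributeName": "game_id", "KeyType": "HASH"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }
    ],
}
```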

LSI vs GSI

| Characteristics | Local Secondary Index | Global Secondary Index |
|---|---|---|
| Key Schema | Composite | ⭐ Simple or Composite |
| Key Attributes | Partition key must be the same as base table | ⭐ Partition and Sort key can be any attribute |
| Size Restriction Per Partition Key Value | All indexed items must be 10 GB or less | ⭐ Unlimited |
| Online Index Operations | Create index on table creation | ⭐ Add, modify, or delete index at any time |
| Queries and Partitions | Query over a single partition, as specified by the partition key value in the query | ⭐ Query over the entire table, across all partitions |
| Read Consistency | ⭐ Strongly or eventually consistent | Only eventually consistent |
| Provisioned Throughput Consumption | Shares capacity units with base table | ⭐ Has its own capacity |
| Projected Attributes | ⭐ Can request attributes that are not projected into the index | Can only request the attributes that are projected into the index |

Accelerator (DAX)

DAX is a fully managed in-memory cache for DynamoDB that runs in a cluster.

DynamoDB response times can be in the single-digit milliseconds; DAX can reduce response times to microseconds

  • A DAX cluster consists of one or more nodes.
  • Each node runs its own instance of the DAX caching software.
  • One of the nodes serves as the primary node for the cluster.
  • Additional nodes (if present) serve as read replicas.
  • Your app can access DAX by specifying the endpoint for the DAX cluster.
  • The DAX client software works with the cluster endpoint to perform intelligent load balancing and routing.
  • Incoming requests are evenly distributed across all of the nodes in the cluster.
| DAX is ideal for: | DAX is not ideal for: |
|---|---|
| Apps that require the fastest possible response time for reads, e.g. real-time bidding, social gaming, and trading applications | Apps that require strongly consistent reads |
| Apps that read a small number of items more frequently than others | Apps that do not require microsecond response times for reads, or that do not need to offload repeated read activity from underlying tables |
| Apps that are read-intensive, but are also cost-sensitive | Apps that are write-intensive, or that do not perform much read activity |
| Apps that require repeated reads against a large set of data | Apps that are already using a different caching solution with DynamoDB, or are using their own client-side logic for working with that caching solution |