Redis Data Types

Data Types and Application Scenarios#

Redis provides a rich set of data types, with five common ones: String, Hash, List, Set, and Zset (sorted set). With the updates to Redis versions, four more data types have been supported: BitMap (added in version 2.2), HyperLogLog (added in version 2.8), GEO (added in version 3.2), and Stream (added in version 5.0).

String#

String is the most basic key-value structure, where the key is a unique identifier and the value is the actual data. The value can be not only a string but also a number (integer or floating point), and a single value can hold at most 512 MB of data.

Internal Implementation#

The underlying data structure implementation of the String type mainly consists of int and SDS (Simple Dynamic Strings).

Compared to C's native strings:

  • SDS can store not only text data but also binary data. This is because SDS uses the value of the len attribute instead of a null character to determine if the string has ended, and all APIs of SDS will handle the data stored in the buf[] array in a binary manner. Therefore, SDS can store not only text data but also binary data such as images, audio, video, and compressed files.
  • The time complexity for getting the length of a string in SDS is O(1). C language strings do not record their own length, so getting the length has a complexity of O(n); whereas in the SDS structure, the length is recorded using the len attribute, so the complexity is O(1).
  • Redis's SDS API is safe, and concatenating strings will not cause buffer overflow. This is because SDS checks whether the SDS space meets the requirements before concatenating strings, and if the space is insufficient, it will automatically expand, thus avoiding buffer overflow issues.
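The three properties above can be sketched in plain Python (a toy model of the SDS idea, not Redis's actual C code; all names are illustrative):

```python
class SDS:
    """Toy model of Redis's Simple Dynamic String: length is tracked
    explicitly, so reading it is O(1), and the buffer may hold any bytes,
    including NUL bytes that would terminate a C string."""

    def __init__(self, data: bytes = b""):
        self.len = len(data)          # used length, not a NUL terminator
        self.alloc = max(self.len, 1) # allocated capacity
        self.buf = bytearray(data)

    def length(self) -> int:
        return self.len               # O(1): just read the attribute

    def append(self, data: bytes) -> None:
        needed = self.len + len(data)
        if needed > self.alloc:       # pre-grow, so no buffer overflow
            self.alloc = needed * 2   # simple doubling policy (illustrative)
        self.buf.extend(data)
        self.len = needed

s = SDS(b"hello\x00world")   # binary-safe: the embedded NUL is fine
s.append(b"!")
print(s.length())            # 12
```

Because the length lives in `len` rather than being discovered by scanning for a terminator, binary payloads and O(1) length both fall out of the same design choice.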

The internal encoding of string objects has three types: int, raw, and embstr.

If a string object stores an integer value and this integer can be represented using a long type, then the integer value will be stored in the ptr attribute of the string object structure (converting void* to long), and the encoding of the string object will be set to int.

If the string object stores a string and the length of this string is less than or equal to 32 bytes (Redis version 2.+), then the string object will use a simple dynamic string (SDS) to store this string and set the object's encoding to embstr, which is an optimized encoding method specifically for storing short strings.


If the string object stores a string and the length of this string is greater than 32 bytes (Redis version 2.+), then the string object will use a simple dynamic string (SDS) to store this string and set the object's encoding to raw.


Note that the boundary between embstr encoding and raw encoding varies across different Redis versions:

  • Redis 2.+ is 32 bytes
  • Redis 3.0-4.0 is 39 bytes
  • Redis 5.0 is 44 bytes

It can be seen that both embstr and raw encodings use SDS to store values, but the difference is that embstr allocates a contiguous block of memory to store redisObject and SDS through a single memory allocation function, while raw encoding allocates two separate spaces to store redisObject and SDS by calling two memory allocation functions. This approach has many benefits:

  • The number of memory allocations required to create a string object with embstr encoding is reduced from two to one compared to raw encoding;
  • Releasing a string object with embstr encoding also only requires calling one memory release function;
  • Since all data of the string object with embstr encoding is stored in a contiguous block of memory, it can better utilize CPU cache to improve performance.

However, embstr also has its drawbacks:

  • If the length of the string increases and requires reallocation of memory, the entire redisObject and SDS need to be reallocated, so string objects with embstr encoding are effectively read-only. Redis has not written any corresponding modification programs for string objects with embstr encoding. When we perform any modification commands (e.g., append) on string objects with embstr encoding, the program will first convert the object's encoding from embstr to raw before executing the modification command.

Common Commands#

# Set key-value type value
> SET name lin
OK
# Get the corresponding value by key
> GET name
"lin"
# Check if a certain key exists
> EXISTS name
(integer) 1
# Return the length of the string value stored by the key
> STRLEN name
(integer) 3
# Delete the value corresponding to a certain key
> DEL name
(integer) 1

# Batch set key-value type values
> MSET key1 value1 key2 value2 
OK
# Batch get multiple values corresponding to keys
> MGET key1 key2 
1) "value1"
2) "value2"

# Set key-value type value
> SET number 0
OK
# Increment the numeric value stored in the key by one
> INCR number
(integer) 1
# Add 10 to the numeric value stored in the key
> INCRBY number 10
(integer) 11
# Decrement the numeric value stored in the key by one
> DECR number
(integer) 10
# Decrement the numeric value stored in the key by 10
> DECRBY number 10
(integer) 0

# Set the key to expire after 60 seconds (sets an expiration time on an existing key)
> EXPIRE name 60
(integer) 1
# Check how long until the key expires
> TTL name
(integer) 51

# Set a key-value pair and set its expiration time to 60 seconds
> SET key value EX 60
OK
> SETEX key 60 value
OK

# Insert only if the key does not exist
> SETNX key value
(integer) 1

Application Scenarios#

Caching Objects#
  • Directly cache the entire object's JSON, command example: SET user:1 '{"name":"xiaolin", "age":18}'.
  • Split the object into attribute-level keys of the form user:ID:attribute, use MSET to store and MGET to retrieve the attribute values, command example: MSET user:1:name xiaolin user:1:age 18 user:2:name xiaomei user:2:age 20.
General Counting#

Since Redis processes commands in a single-threaded manner, the execution of commands is atomic. Therefore, the String data type is suitable for counting scenarios, such as counting access times, likes, shares, inventory quantities, etc.
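The counting semantics are easy to model in plain Python (key names are illustrative; in real Redis the atomicity comes from single-threaded command execution, which this sketch does not show):

```python
# Toy key-value store mimicking INCR/INCRBY/DECR semantics:
# a missing key is treated as 0, and the new value is returned.
store = {}

def incrby(key: str, n: int = 1) -> int:
    store[key] = int(store.get(key, 0)) + n
    return store[key]

print(incrby("page:home:views"))      # 1  (key auto-created from 0)
print(incrby("page:home:views", 10))  # 11
print(incrby("page:home:views", -1))  # 10 (DECR is INCRBY with -1)
```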

Distributed Locks#

The SET command has an NX parameter that can implement "insert only if the key does not exist," which can be used to implement distributed locks:

  • If the key does not exist, it shows that the insertion was successful, which can indicate that the lock was acquired successfully;
  • If the key exists, it will show that the insertion failed, which can indicate that the lock acquisition failed.

Generally, an expiration time is also added to the distributed lock, and the command for the distributed lock is as follows:

SET lock_key unique_value NX PX 10000

The unlocking process is to delete the lock_key key, but it cannot be deleted randomly; it must ensure that the client executing the operation is the one that acquired the lock. Therefore, when unlocking, we first need to check whether the lock's unique_value belongs to the locking client, and only then will we delete the lock_key key.
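The acquire/release flow can be sketched in Python with a dict standing in for Redis (a sketch only; in real Redis the compare-and-delete on unlock must run as a single Lua script to be atomic, and names like `lock:order` are illustrative):

```python
import time
import uuid

store = {}  # key -> (value, expires_at); stands in for Redis

def acquire(lock_key, ttl):
    """Models SET lock_key unique_value NX PX ttl."""
    now = time.monotonic()
    entry = store.get(lock_key)
    if entry is not None and entry[1] > now:
        return None                       # key exists: lock held by someone else
    token = str(uuid.uuid4())             # unique_value identifying this client
    store[lock_key] = (token, now + ttl)
    return token

def release(lock_key, token):
    """Delete only if we still own the lock (check unique_value first)."""
    entry = store.get(lock_key)
    if entry is not None and entry[0] == token:
        del store[lock_key]
        return True
    return False

t = acquire("lock:order", ttl=10.0)
assert t is not None                           # first client gets the lock
assert acquire("lock:order", ttl=10.0) is None # second client fails (NX)
assert release("lock:order", t)                # owner releases successfully
```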

Shared Session Information#

Typically, when developing a backend management system, we use sessions to save user session (login) states. These session information will be stored on the server side, but this only applies to single-system applications. If it is a distributed system, this model will no longer be applicable.

Therefore, we need to leverage Redis to uniformly store and manage these session information, so that regardless of which server the request is sent to, the server will go to the same Redis to retrieve the relevant session information, thus solving the problem of session storage in distributed systems.
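A centralized session store with expiry can be modeled like this (a sketch with a dict standing in for the shared Redis instance; function names are illustrative):

```python
import time
import secrets

sessions = {}  # session_id -> (user_data, expires_at); the shared store

def create_session(user: str, ttl: float = 1800.0) -> str:
    sid = secrets.token_hex(16)
    sessions[sid] = ({"user": user}, time.monotonic() + ttl)
    return sid

def get_session(sid: str):
    entry = sessions.get(sid)
    if entry is None or entry[1] <= time.monotonic():
        sessions.pop(sid, None)   # expired: behaves like a Redis TTL
        return None
    return entry[0]

# Any "server" process that can reach the shared store sees the session:
sid = create_session("alice")
print(get_session(sid))   # {'user': 'alice'}
```

Because every application server reads the same store, a request can land on any node and still find the login state.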

List#

List is a simple list of strings, sorted in the order of insertion, and elements can be added to the List from either the head or the tail. The maximum length of a list is 2^32 - 1, which means each list supports over 4 billion elements.

Internal Implementation#

The underlying data structure of the List type is implemented by a doubly linked list or a zip list:

  • If the number of elements in the list is less than 128 (default value, configurable by list-max-ziplist-entries) and each element's value is less than 64 bytes (default value, configurable by list-max-ziplist-value), Redis uses a zip list as the underlying data structure for the List type;
  • If the list does not meet the above conditions, Redis uses a doubly linked list as the underlying data structure for the List type.

However, since Redis version 3.2, the underlying data structure of the List type has been implemented solely by quicklist, which replaced both the doubly linked list and the zip list.

Common Commands#

# Insert one or more values into the head (left) of the key list, with the last value at the front
LPUSH key value [value ...] 
# Insert one or more values into the tail (right) of the key list
RPUSH key value [value ...]
# Remove and return the head element of the key list
LPOP key     
# Remove and return the tail element of the key list
RPOP key 

# Return the elements in the specified range of the list key, with the range specified by the start and stop offsets, starting from 0
LRANGE key start stop

# Pop an element from the head of the key list, blocking for timeout seconds if none exist; if timeout=0, it blocks indefinitely
BLPOP key [key ...] timeout
# Pop an element from the tail of the key list, blocking for timeout seconds if none exist; if timeout=0, it blocks indefinitely
BRPOP key [key ...] timeout

Application Scenarios#

Message Queue#

A message queue must meet three requirements when storing and retrieving messages: message ordering, handling duplicate messages, and ensuring message reliability. Redis's List and Stream data types can meet these three requirements.

What are the drawbacks of using List as a message queue?

List does not support multiple consumers consuming the same message, because once a consumer pulls a message, that message is deleted from the List and cannot be consumed again by other consumers.

To allow a message to be consumed by multiple consumers, multiple consumers must be grouped into a consumer group, allowing multiple consumers to consume the same message, but the List type does not support the implementation of consumer groups.

Hash#

Hash is a collection of key-value pairs (key - value), where the value is in the form: value=[{field1, value1}, ... {fieldN, valueN}]. Hash is particularly suitable for storing objects.

Internal Implementation#

The underlying data structure of the Hash type is implemented by zip lists or hash tables:

  • If the number of elements in the hash is less than 128 (default value, configurable by hash-max-ziplist-entries) and all values are less than 64 bytes (default value, configurable by hash-max-ziplist-value), Redis will use a zip list as the underlying data structure for the Hash type;
  • If the elements of the hash type do not meet the above conditions, Redis will use a hash table as the underlying data structure for the Hash type.

In Redis 7.0, the zip list data structure has been deprecated and replaced by the listpack data structure.

Common Commands#

# Store a key-value pair in a hash table
HSET key field value   
# Get the value corresponding to the field key in the hash table
HGET key field

# Store multiple key-value pairs in a hash table
HMSET key field value [field value...] 
# Batch get multiple field values from the hash table
HMGET key field [field ...]       
# Delete field values from the hash table
HDEL key field [field ...]    

# Return the number of fields in the hash table
HLEN key       
# Return all key-value pairs in the hash table
HGETALL key 

# Increment the value of field in the hash table by n
HINCRBY key field n                         

Application Scenarios#

Caching Objects#

The structure of Hash type (key, field, value) is similar to that of objects (object id, attribute, value), and can also be used to store objects.

# Store key-value pairs in hash table uid:1
> HMSET uid:1 name Tom age 15
OK
# Store key-value pairs in hash table uid:2
> HMSET uid:2 name Jerry age 13
OK
# Get all key-value pairs for user id 1 in the hash table
> HGETALL uid:1
1) "name"
2) "Tom"
3) "age"
4) "15"
Shopping Cart#

Using user id as the key, product id as the field, and product quantity as the value, it perfectly constitutes the three elements of a shopping cart, as shown below.
The commands involved are as follows:

  • Add product: HSET cart:{user id} {product id} 1
  • Add quantity: HINCRBY cart:{user id} {product id} 1
  • Total number of products: HLEN cart:{user id}
  • Delete product: HDEL cart:{user id} {product id}
  • Get all products in the shopping cart: HGETALL cart:{user id}

Initially, only the product ID is stored in Redis, and when displaying specific product information, it is still necessary to query the database once with the product id to obtain the complete product information.
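The cart operations above map naturally onto hash commands, which can be modeled in Python (a sketch; keys like `cart:1001` and `prod:88` are made-up examples):

```python
carts = {}  # key -> {field: value}, mimicking a Redis hash per user

def hset(key, field, value):
    carts.setdefault(key, {})[field] = value

def hincrby(key, field, n):
    h = carts.setdefault(key, {})
    h[field] = int(h.get(field, 0)) + n
    return h[field]

def hlen(key):
    return len(carts.get(key, {}))

def hdel(key, field):
    carts.get(key, {}).pop(field, None)

def hgetall(key):
    return dict(carts.get(key, {}))

hset("cart:1001", "prod:88", 1)       # add product
hincrby("cart:1001", "prod:88", 1)    # increase quantity
hset("cart:1001", "prod:99", 1)       # add a second product
print(hlen("cart:1001"))              # 2 distinct products
print(hgetall("cart:1001"))           # {'prod:88': 2, 'prod:99': 1}
hdel("cart:1001", "prod:99")          # remove a product
```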

Set#

The Set type is an unordered collection of unique string elements, and its storage order does not follow the insertion order.

A set can store up to 2^32-1 elements. The concept is similar to sets in mathematics, allowing for intersection, union, difference, etc. Therefore, the Set type supports not only CRUD operations within the set but also supports intersection, union, and difference operations among multiple sets.

Internal Implementation#

The underlying data structure of the Set type is implemented by hash tables or integer sets:

  • If all elements in the set are integers and the number of elements is less than 512 (default value, configurable by set-max-intset-entries), Redis will use an integer set as the underlying data structure for the Set type;
  • If the elements in the set do not meet the above conditions, Redis will use a hash table as the underlying data structure for the Set type.

Common Commands#

# Add elements to the set key, ignoring if the element already exists; if the key does not exist, a new set is created
SADD key member [member ...]
# Remove elements from the set key
SREM key member [member ...] 
# Get all elements in the set key
SMEMBERS key
# Get the number of elements in the set key
SCARD key

# Check if member exists in the set key
SISMEMBER key member

# Randomly select count elements from the set key, elements are not removed from the key
SRANDMEMBER key [count]
# Randomly select count elements from the set key, elements are removed from the key
SPOP key [count]

# Intersection operation
SINTER key [key ...]
# Store the intersection result in a new set destination
SINTERSTORE destination key [key ...]

# Union operation
SUNION key [key ...]
# Store the union result in a new set destination
SUNIONSTORE destination key [key ...]

# Difference operation
SDIFF key [key ...]
# Store the difference result in a new set destination
SDIFFSTORE destination key [key ...]

Application Scenarios#

The Set type is suitable for data deduplication and ensuring data uniqueness, and can also be used to count intersections, differences, and unions of multiple sets. However, there is a potential risk. The computation complexity of the difference, union, and intersection of sets is relatively high, and in the case of large data volumes, directly executing these calculations may lead to Redis instance blocking.

Likes#

The Set type can ensure that a user can only like once, here is an example where the key is the article id and the value is the user id.

# User uid:1 likes article article:1
> SADD article:1 uid:1
(integer) 1
# User uid:2 likes article article:1
> SADD article:1 uid:2
(integer) 1
# User uid:3 likes article article:1
> SADD article:1 uid:3
(integer) 1
User uid:1 cancels the like for article:1.

> SREM article:1 uid:1
(integer) 1
Get all users who liked article:1:

> SMEMBERS article:1
1) "uid:3"
2) "uid:2"
Get the number of users who liked article:1:

> SCARD article:1
(integer) 2
Check if user uid:1 liked article article:1:

> SISMEMBER article:1 uid:1
(integer) 0  # Returns 0 indicating no like, returns 1 indicating a like
Mutual Follows#

The Set type supports intersection operations, so it can be used to calculate mutual friends, public accounts, etc.

The key can be the user id, and the value can be the id of the public accounts followed.

# User uid:1 follows public accounts with ids 5, 6, 7, 8, 9
> SADD uid:1 5 6 7 8 9
(integer) 5
# User uid:2 follows public accounts with ids 7, 8, 9, 10, 11
> SADD uid:2 7 8 9 10 11
(integer) 5

Mutual public accounts followed by uid:1 and uid:2:

# Get mutual follows
> SINTER uid:1 uid:2
1) "7"
2) "8"
3) "9"

Recommend public accounts followed by uid:1 to uid:2:

> SDIFF uid:1 uid:2
1) "5"
2) "6"
Check whether a certain public account (e.g., id 5) is followed by uid:1 and by uid:2:

> SISMEMBER uid:1 5
(integer) 1 # Returns 1, indicating followed
> SISMEMBER uid:2 5
(integer) 0 # Returns 0, indicating not followed
Lottery Activities#

Store the usernames of winners in a certain activity. The Set type, due to its deduplication feature, can ensure that the same user does not win twice.

The key is the lottery activity name, and the value is the employee name. Put all employee names into the lottery box:

> SADD lucky Tom Jerry John Sean Marry Lindy Sary Mark
(integer) 8
If duplicate wins are allowed, the SRANDMEMBER command can be used.

# Draw 1 first prize:
> SRANDMEMBER lucky 1
1) "Tom"
# Draw 2 second prizes:
> SRANDMEMBER lucky 2
1) "Mark"
2) "Jerry"
# Draw 3 third prizes:
> SRANDMEMBER lucky 3
1) "Sary"
2) "Tom"
3) "Jerry"
If duplicate wins are not allowed, the SPOP command can be used.

# Draw 1 first prize
> SPOP lucky 1
1) "Sary"
# Draw 2 second prizes
> SPOP lucky 2
1) "Jerry"
2) "Mark"
# Draw 3 third prizes
> SPOP lucky 3
1) "John"
2) "Sean"
3) "Lindy"

Zset#

The Zset type (sorted set) adds a sorting attribute, the score, compared to the Set type. Each element stored in a sorted set consists of two values: the member value and its score, which determines the ordering.

Sorted sets retain the property that sets cannot have duplicate members (scores can be duplicated), but the difference is that the elements in sorted sets can be sorted.

Internal Implementation#

The underlying data structure of the Zset type is implemented by zip lists or skip lists:

  • If the number of elements in the sorted set is less than 128 (default value, configurable by zset-max-ziplist-entries) and each element's value is less than 64 bytes (default value, configurable by zset-max-ziplist-value), Redis uses a zip list as the underlying data structure for the Zset type;
  • If the sorted set does not meet the above conditions, Redis uses a skip list as the underlying data structure for the Zset type.

In Redis 7.0, the zip list data structure has been deprecated and replaced by the listpack data structure.

Common Commands#

Common operations for Zset:

# Add elements with scores to the sorted set key
ZADD key score member [[score member]...]   
# Remove elements from the sorted set key
ZREM key member [member...]                 
# Return the score of the element member in the sorted set key
ZSCORE key member
# Return the number of elements in the sorted set key
ZCARD key 

# Increment the score of the element member in the sorted set key by increment
ZINCRBY key increment member 

# Get elements from the sorted set key in ascending order from start index to stop index
ZRANGE key start stop [WITHSCORES]
# Get elements from the sorted set key in descending order from start index to stop index
ZREVRANGE key start stop [WITHSCORES]

# Return members in the specified score range of the sorted set, sorted from low to high.
ZRANGEBYSCORE key min max [WITHSCORES] [LIMIT offset count]

# Return members in the specified member range, sorted in lexicographical order, scores must be the same.
ZRANGEBYLEX key min max [LIMIT offset count]
# Return members in the specified member range, sorted in reverse lexicographical order, scores must be the same
ZREVRANGEBYLEX key max min [LIMIT offset count]
Zset operation (compared to Set type, ZSet type does not support difference operation):

# Union (scores of identical members are summed by default); numkeys is the number of input keys, and WEIGHTS sets a score multiplier per key
ZUNIONSTORE destkey numkeys key [key ...] [WEIGHTS weight [weight ...]]
# Intersection (scores of identical members are summed by default); numkeys is the number of input keys, and WEIGHTS sets a score multiplier per key
ZINTERSTORE destkey numkeys key [key ...] [WEIGHTS weight [weight ...]]

Application Scenarios#

Leaderboard#
# article:1 received 200 likes
> ZADD user:xiaolin:ranking 200 article:1
(integer) 1
# article:2 received 40 likes
> ZADD user:xiaolin:ranking 40 article:2
(integer) 1
# article:3 received 100 likes
> ZADD user:xiaolin:ranking 100 article:3
(integer) 1
# article:4 received 50 likes
> ZADD user:xiaolin:ranking 50 article:4
(integer) 1
# article:5 received 150 likes
> ZADD user:xiaolin:ranking 150 article:5
(integer) 1
To add a like to article article:4, you can use the ZINCRBY command (to increment the score of the element member in the sorted set key):

> ZINCRBY user:xiaolin:ranking 1 article:4
"51"
To check the number of likes for a specific article, you can use the ZSCORE command (returns the score of the element in the sorted set key):

> ZSCORE user:xiaolin:ranking article:4
"50"
To get the top 3 articles with the most likes from Xiaolin, you can use the ZREVRANGE command (get elements from the sorted set key in descending order from start index to stop index):

# WITHSCORES indicates that the score should also be displayed
> ZREVRANGE user:xiaolin:ranking 0 2 WITHSCORES
1) "article:1"
2) "200"
3) "article:5"
4) "150"
5) "article:3"
6) "100"
To get articles with likes between 100 and 200, you can use the ZRANGEBYSCORE command (returns members in the specified score range of the sorted set, sorted from low to high):

> ZRANGEBYSCORE user:xiaolin:ranking 100 200 WITHSCORES
1) "article:3"
2) "100"
3) "article:5"
4) "150"
5) "article:1"
6) "200"
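The leaderboard logic above can be modeled in Python, with a dict mapping member to score and sorted views standing in for ZREVRANGE and ZRANGEBYSCORE (a sketch; function and key names are illustrative):

```python
ranking = {}  # member -> score, a stand-in for one sorted set

def zadd(member, score):
    ranking[member] = score

def zincrby(member, n):
    ranking[member] = ranking.get(member, 0) + n
    return ranking[member]

def zrevrange(start, stop):
    """Descending by score, like ZREVRANGE key start stop WITHSCORES."""
    ordered = sorted(ranking.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[start:stop + 1]

def zrangebyscore(lo, hi):
    """Ascending members whose score falls in [lo, hi]."""
    return sorted(((m, s) for m, s in ranking.items() if lo <= s <= hi),
                  key=lambda kv: kv[1])

for member, score in [("article:1", 200), ("article:2", 40),
                      ("article:3", 100), ("article:4", 50),
                      ("article:5", 150)]:
    zadd(member, score)

print(zincrby("article:4", 1))   # 51: one more like for article:4
print(zrevrange(0, 2))           # top 3: article:1, article:5, article:3
```

The real sorted set keeps the ordering incrementally via a skip list instead of re-sorting on every query, which is what makes these range reads cheap at scale.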

BitMap#

Bitmap, or bit map, is a contiguous array of binary digits (0s and 1s) in which an element is located by its offset. A bitmap uses the smallest unit, a single bit, set to 0 or 1 to represent the value or state of an element, and single-bit operations have O(1) time complexity.

Since a bit is the smallest unit in computing, using it for storage will save a lot of space, making it particularly suitable for scenarios with large data volumes and binary statistics.

Internal Implementation#

Bitmap is not a standalone data structure; it is implemented on top of the String type, which serves as its underlying storage.

The String type is stored as a binary-safe byte array, so Redis uses each bit of the byte array to represent the binary state of one element; you can think of a Bitmap as a bit array.

Basic Bitmap Operations#

# Set value, where value can only be 0 or 1
SETBIT key offset value

# Get value
GETBIT key offset

# Get the count of values equal to 1 in the specified range
# start and end are in bytes
BITCOUNT key start end
Bitmap operation:

# Operations between bitmaps
# operation is a bitwise operator, one of:
#   AND  bitwise AND (&)
#   OR   bitwise OR (|)
#   XOR  bitwise XOR (^)
#   NOT  bitwise NOT (~)
# destkey receives the computed result
# key ... are the keys participating in the operation; there can be several, separated by spaces, but NOT takes exactly one key
# When BITOP processes strings of different lengths, the missing part of the shorter string is treated as 0. The return value is the length of the string saved to destkey (in bytes), equal to the length of the longest input key.
BITOP operation destkey key [key ...]

# Return the position of the first bit set to the specified value (0/1) in the key
BITPOS key bit [start [end]]

Application Scenarios#

Sign-in Statistics#
Assuming we want to count the sign-in status of user ID 100 for June 2022, we can perform the following steps.

First, execute the command below to record that the user signed in on June 3rd (offsets start at 0, so bit 2 represents the 3rd day).

SETBIT uid:sign:100:202206 2 1
Next, check whether the user signed in on June 3rd.

GETBIT uid:sign:100:202206 2 
Third, count the number of sign-ins for the user in June.

BITCOUNT uid:sign:100:202206
This way, we know the user's sign-in status for June.

We can execute this command to get the first sign-in date for userID = 100 in June 2022:

BITPOS uid:sign:100:202206 1
Count of Users with Consecutive Sign-ins#

Assuming we want to count the number of users who have signed in for 3 consecutive days, we can perform an AND operation on three bitmaps and save the result to destmap, then execute BITCOUNT on destmap, as follows:

# AND operation
BITOP AND destmap bitmap:01 bitmap:02 bitmap:03
# Count the number of bits = 1
BITCOUNT destmap

Even if 100 million users sign in on a single day, the Bitmap occupies little memory, approximately 12 MB (10^8 / 8 / 1024 / 1024). The memory overhead for 7 days of Bitmaps is about 84 MB. It is also best to set an expiration time on the Bitmap so Redis deletes expired sign-in data and saves memory.
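The whole sign-in flow can be modeled in plain Python using an arbitrarily large int as the bit array (a sketch; the day numbering follows the example above):

```python
def setbit(bm: int, offset: int, value: int) -> int:
    """SETBIT-style: return the bitmap with the given bit set to 0 or 1."""
    mask = 1 << offset
    return (bm | mask) if value else (bm & ~mask)

def getbit(bm: int, offset: int) -> int:
    return (bm >> offset) & 1

def bitcount(bm: int) -> int:
    return bin(bm).count("1")

# User's June sign-ins: bit k stands for day k+1
june = 0
june = setbit(june, 2, 1)   # signed in on June 3rd
june = setbit(june, 4, 1)   # signed in on June 5th
print(getbit(june, 2))      # 1: did sign in on the 3rd
print(bitcount(june))       # 2 sign-ins so far this month

# "Signed in 3 days in a row" = AND of three daily bitmaps (like BITOP AND)
day1, day2, day3 = 0b1011, 0b0011, 0b1010
print(bitcount(day1 & day2 & day3))  # 1: only user at bit 1 hit all 3 days
```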

HyperLogLog#

HyperLogLog provides approximate deduplication counting. In Redis, each HyperLogLog key only requires 12 KB of memory to count the cardinality of nearly 2^64 different elements, making HyperLogLog very space-efficient compared to Set and Hash types, which consume more memory as the number of elements increases.

Internal Implementation#

HyperLogLog is a probabilistic data structure: it hashes each added element into a set of registers and estimates the cardinality from those registers rather than storing the elements themselves, which is why its memory use stays constant. Redis's implementation keeps a standard error of about 0.81%.
Common Commands#

# Add specified elements to HyperLogLog
PFADD key element [element ...]

# Return the estimated cardinality of the given HyperLogLog.
PFCOUNT key [key ...]

# Merge multiple HyperLogLogs into one HyperLogLog
PFMERGE destkey sourcekey [sourcekey ...]

Application Scenarios#

Counting UVs in the Millions#
When counting UVs, you can use the PFADD command (used to add new elements to HyperLogLog) to add each user visiting the page to HyperLogLog.

PFADD page1:uv user1 user2 user3 user4 user5
Next, you can directly obtain the UV value of page1 using the PFCOUNT command, which returns the statistical result of HyperLogLog.

PFCOUNT page1:uv
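The estimation idea can be sketched in under 30 lines of Python (a simplified HyperLogLog omitting Redis's sparse encoding and range-specific bias corrections; constants and names are illustrative):

```python
import hashlib

P = 10                       # 2^10 = 1024 registers (Redis uses 2^14)
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)   # standard bias-correction constant

def _hash64(item: str) -> int:
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")

def pfadd(registers, item):
    h = _hash64(item)
    idx = h >> (64 - P)                      # top P bits choose a register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 54 bits
    rank = (64 - P) - rest.bit_length() + 1  # 1 + number of leading zeros
    registers[idx] = max(registers[idx], rank)

def pfcount(registers) -> float:
    # harmonic mean of the registers, scaled by the correction constant
    return ALPHA * M * M / sum(2.0 ** -r for r in registers)

regs = [0] * M
for i in range(100_000):
    pfadd(regs, f"user{i}")
print(round(pfcount(regs)))   # close to 100000 (a few percent error)
```

Memory stays fixed at M registers no matter how many elements are added, which is exactly the trade HyperLogLog makes against exact Set-based counting.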

GEO#

Primarily used to store geographic location information and perform operations on the stored information. In daily life, we increasingly rely on searching for "nearby restaurants" and calling a ride through taxi apps, all of which depend on location-based services (LBS). The data accessed by LBS applications is a set of latitude and longitude information associated with people or objects, and it must be able to query adjacent latitude and longitude ranges, making GEO very suitable for LBS service scenarios.

Internal Implementation#

GEO does not design a new underlying data structure but directly uses the Sorted Set collection type.

The GEO type uses the GeoHash encoding method to convert latitude and longitude into the weight score of elements in the Sorted Set. The two key mechanisms involved are "dividing the two-dimensional map into intervals" and "encoding the intervals." When a set of latitude and longitude falls within a certain interval, the encoding value of the interval is used to represent it, and this encoding value is treated as the weight score of the Sorted Set element.

This allows us to store latitude and longitude in a Sorted Set and utilize the "ordered range lookup by weight" feature provided by the Sorted Set to meet the frequent "search nearby" needs in LBS services.
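The "divide into intervals and encode the intervals" step can be sketched as GeoHash-style bit interleaving (a simplified illustration of the scoring idea, not Redis's exact code; Redis itself uses 26 bits per coordinate, giving a 52-bit score):

```python
def geo_score(longitude: float, latitude: float, bits: int = 26) -> int:
    """Quantize each coordinate to `bits` bits, then interleave the bits,
    GeoHash-style, into one integer usable as a sorted-set score."""
    lon = int((longitude + 180.0) / 360.0 * (1 << bits))
    lat = int((latitude + 90.0) / 180.0 * (1 << bits))
    score = 0
    for i in range(bits - 1, -1, -1):        # MSB first: lon bit, then lat bit
        score = (score << 1) | ((lon >> i) & 1)
        score = (score << 1) | ((lat >> i) & 1)
    return score

# At coarse precision, nearby points fall into the same cell,
# so they get the same score and sort next to each other:
a = geo_score(116.034579, 39.030452, bits=10)
b = geo_score(116.034580, 39.030452, bits=10)
print(a == b)   # True
```

Because nearby coordinates share high-order bits of the score, a "search nearby" query becomes an ordered range lookup over the sorted set.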

Common Commands#

# Store specified geographic locations, adding one or more longitude, latitude, and location name (member) to the specified key.
GEOADD key longitude latitude member [longitude latitude member ...]

# Return the positions (longitude and latitude) of all specified names (members) from the given key, returning nil if not found.
GEOPOS key member [member ...]

# Return the distance between two given locations.
GEODIST key member1 member2 [m|km|ft|mi]

# Get the geographic location set within a specified range based on user-provided latitude and longitude coordinates.
GEORADIUS key longitude latitude radius m|km|ft|mi [WITHCOORD] [WITHDIST] [WITHHASH] [COUNT count] [ASC|DESC] [STORE key] [STOREDIST key]

Application Scenarios#

Execute the following command to store the current latitude and longitude of the vehicle with ID number 33 in the GEO collection:
> GEOADD cars:locations 116.034579 39.030452 33

When a user wants to find nearby ride-hailing cars, the LBS application can use the GEORADIUS command.
For example, when the LBS application executes the following command, Redis will look for vehicle information within 5 kilometers centered on the user's latitude and longitude (116.054579, 39.030452) and return it to the LBS application.

> GEORADIUS cars:locations 116.054579 39.030452 5 km ASC COUNT 10

Stream#

Redis Stream is a new data type added in Redis version 5.0, specifically designed for message queues. It perfectly implements message queues, supporting message persistence, automatic generation of globally unique IDs, ack message confirmation mode, consumer group mode, etc., making message queues more stable and reliable.

Common Commands#

Stream message queue operation commands:

XADD: insert a message, preserving order, with an automatically generated globally unique ID;
XLEN: query the message length;
XREAD: read messages, optionally starting from a given ID;
XDEL: delete a message by its ID;
DEL: delete the entire Stream;
XRANGE: read a range of messages;
XREADGROUP: read messages as part of a consumer group;
XPENDING and XACK:
XPENDING queries the messages "read but not yet acknowledged" by each consumer in a consumer group;
XACK confirms that a message has been processed.

Application Scenarios#

The producer inserts a message using the XADD command:

# * indicates that Redis will automatically generate a globally unique ID for the inserted data
# Insert a message into the message queue named mymq, with the key being name and the value being xiaolin
> XADD mymq * name xiaolin
"1654254953808-0"

When the consumer reads messages from the message queue using the XREAD command, it can specify a message ID and start reading from the next message after this message ID (note that it starts reading from the next message after the input message ID, not querying the input ID message).

# Start reading all subsequent messages from the message with ID 1654254953807-0 (in this example, there is a total of 1 message).
> XREAD STREAMS mymq 1654254953807-0
1) 1) "mymq"
   2) 1) 1) "1654254953808-0"
         2) 1) "name"
            2) "xiaolin"

If you want to implement blocking reads (blocking when there is no data), you can call XREAD with the BLOCK configuration option to achieve a blocking read operation similar to BRPOP.

For example, the following command sets the BLOCK 10000 configuration option, where 10000 is in milliseconds, indicating that XREAD will block for 10000 milliseconds (i.e., 10 seconds) when reading the latest messages if no messages arrive, and then return.

# The "$" symbol at the end of the command indicates reading the latest message
> XREAD BLOCK 10000 STREAMS mymq $
(nil)
(10.00s)

Streams can create consumer groups using XGROUP. After creating a consumer group, Streams can use the XREADGROUP command to allow consumers within the group to read messages.

Create two consumer groups, both consuming from the mymq message queue, both specified to start reading from the first message:

# Create a consumer group named group1, 0-0 indicates starting from the first message.
> XGROUP CREATE mymq group1 0-0
OK
# Create a consumer group named group2, 0-0 indicates starting from the first message.
> XGROUP CREATE mymq group2 0-0
OK

Consumers in the group1 consumer group can read all messages from the mymq message queue as follows:

# The last parameter ">" indicates starting to read from the first unconsumed message.
> XREADGROUP GROUP group1 consumer1 STREAMS mymq >
1) 1) "mymq"
   2) 1) 1) "1654254953808-0"
         2) 1) "name"
            2) "xiaolin"

Once a message in the message queue is read by one consumer in the consumer group, it cannot be read by other consumers in that group; that is, consumers in the same consumer group cannot consume the same message. However, consumers from different consumer groups can consume the same message (provided that when creating the message group, different consumer groups specified the same starting position to read messages).

The purpose of using consumer groups is to allow multiple consumers within the group to share the load of reading messages, so we usually let each consumer read part of the messages, thus achieving a balanced distribution of message reading load among multiple consumers.

How to ensure that consumers can still read unprocessed messages after a failure or crash when implementing a message queue based on Stream?

Streams will automatically use an internal queue (also known as the PENDING List) to retain messages read by each consumer in the consumer group until the consumer uses the XACK command to notify Streams that "the message has been processed."

That concludes the implementation of the message queue based on Stream. To summarize:

  • Message ordering: XADD/XREAD
  • Blocking read: XREAD block
  • Duplicate message processing: Stream automatically generates a globally unique ID when using the XADD command;
  • Message reliability: Internally uses the PENDING List to automatically save messages, uses the XPENDING command to view messages that have been read by the consumer group but not confirmed, and the consumer uses the XACK command to confirm messages;
  • Supports consumer group form for consuming data.

What are the differences between Redis's Stream message queue and professional message queues?

A professional message queue must achieve two main objectives:

  • Messages must not be lost.
  • Messages can accumulate.

  1. Can Redis Stream messages be lost?

A message queue spans three parts: the producer, the queue middleware, and the consumer. Can the Redis Stream message queue guarantee that data is not lost at each of these three stages?

  • Will the Redis producer lose messages? That depends on how well the producer handles errors: as long as it receives the ack confirmation response from the MQ middleware after producing and submitting a message, the send succeeded. So, provided return values and exceptions are handled properly, this stage does not lose messages.
  • Will the Redis consumer lose messages? No, because Stream (MQ middleware) will automatically use the internal queue (also known as the PENDING List) to retain messages read by each consumer in the consumer group but not confirmed. After the consumer restarts, it can use the XPENDING command to check the messages that have been read but not confirmed. Once the consumer completes the business logic, it can send the consumption confirmation XACK command, which also ensures that messages are not lost.
  • Will the Redis message middleware lose messages? Yes, Redis can lose data in the following two scenarios:
    • AOF persistence is configured to write to disk every second, but this writing process is asynchronous, so there is a possibility of data loss when Redis crashes.
    • Master-slave replication is also asynchronous, and there is a possibility of data loss during master-slave switching.

  2. Can Redis Stream messages accumulate?

Since Redis stores all data in memory, message accumulation causes Redis's memory usage to grow continuously; if it exceeds the machine's memory limit, Redis risks being killed by the OOM mechanism.
Therefore, Redis Stream allows a maximum length to be specified for the queue to avoid this situation.
When a maximum length is set and the queue exceeds it, old messages are deleted and only a fixed number of new messages is retained. In other words, even with a maximum length set, Stream can still lose messages once they accumulate.
However, professional message queues like Kafka and RabbitMQ store their data on disk, so when messages accumulate, they merely occupy more disk space.

Therefore, using Redis as a queue comes with two potential issues:

  • Redis itself may lose data;
  • Under message backlog pressure, memory resources can become tight;

Thus, whether Redis can be used as a message queue depends on your business scenario:

  • If your business scenario is simple enough, is not sensitive to data loss, and has a low probability of message accumulation, using Redis as a queue is completely feasible.
  • If your business has a massive number of messages, a high probability of message accumulation, and cannot tolerate data loss, it is better to use professional message queue middleware.

Redis Data Structures#

Key-Value Pairs#

In Redis, the key in the key-value pair is a string object, while the value can be a string object or a collection data type object, such as List, Hash, Set, and Zset objects.

How are these key-value pairs stored in Redis?

Redis uses a "hash table" to store all key-value pairs, and the biggest advantage of a hash table is that it allows us to quickly find key-value pairs in O(1) time complexity. A hash table is essentially an array, and the elements in the array are called hash buckets.

How does Redis store key-value data in hash buckets?

Hash buckets store pointers (dictEntry*) pointing to key-value data, allowing access to key-value data through pointers. Since the value of key-value pairs can store string objects and collection data type objects, the data structure of key-value pairs does not directly store the value itself but instead stores pointers void * key and void * value, pointing to the actual key object and value object, respectively. This way, even if the value is a collection data type, it can still be accessed through the void * value pointer.

SDS#

Each member variable in the data structure is introduced as follows:

  • len records the length of the string. This way, when obtaining the string length, it only needs to return the value of this member variable, with a time complexity of O(1).
  • alloc is the length of space allocated for the character array. This way, when modifying the string, it can calculate the remaining space size through alloc - len to determine whether the space meets the modification requirements. If it does not meet, the space of SDS will automatically expand to the size required for the modification before executing the actual modification operation. Therefore, using SDS does not require manual modification of SDS's space size and will not encounter the previously mentioned buffer overflow issues.
  • flags indicates the SDS type. Five types are defined: sdshdr5, sdshdr8, sdshdr16, sdshdr32, and sdshdr64, which will be explained later.
  • buf[] is the character array used to store actual data. It can store not only strings but also binary data.

Compared to C's native strings:

  • SDS can store not only text data but also binary data. This is because SDS uses the value of the len attribute instead of a null character to determine if the string has ended, and all APIs of SDS will handle the data stored in the buf[] array in a binary manner. Therefore, SDS can store not only text data but also binary data such as images, audio, video, and compressed files.
  • The time complexity for getting the length of a string in SDS is O(1). C language strings do not record their own length, so getting the length has a complexity of O(n); whereas in the SDS structure, the length is recorded using the len attribute, so the complexity is O(1).
  • Redis's SDS API is safe, and concatenating strings will not cause buffer overflow. This is because SDS checks whether the SDS space meets the requirements before concatenating strings, and if the space is insufficient, it will automatically expand, thus avoiding buffer overflow issues.
  • Saves memory space. The SDS structure has a flags member variable that indicates the type of SDS.
    Redis has designed five types: sdshdr5, sdshdr8, sdshdr16, sdshdr32, and sdshdr64. The main difference among these five types lies in the data types of the len and alloc member variables in their data structures.
    The sdshdr16 type, for example, declares both len and alloc as uint16_t, so the string length and the allocated space size cannot exceed 2^16 - 1 bytes.
    The reason for designing different types of structures for SDS is to flexibly store strings of different sizes, thereby effectively saving memory space. For example, when storing small strings, the structure header occupies less space.
    In addition to designing different types of structures, Redis also uses specialized compilation optimizations to save memory space, specifically in the struct declaration with __attribute__ ((packed)), which tells the compiler to cancel the optimization alignment of the structure during the compilation process and align according to the actual occupied byte size.

Linked List#

The list structure provides a head pointer, a tail pointer, a node count len, and customizable dup, free, and match function pointers.
Since the memory of linked lists is not contiguous, the CPU cache cannot be used effectively, and every node requires its own node-structure header, which increases memory overhead. Therefore, in Redis 3.0, the List object uses a "zip list" as the underlying data structure when the data volume is small; a zip list is a compact in-memory structure that saves memory.
In Redis 5.0, a new data structure called listpack was designed, inheriting the compact memory layout of zip lists. Ultimately, in the latest Redis version, the zip list, which was one of the underlying data structure implementations of Hash objects and Zset objects, was replaced by listpack.

Zip List#

The zip list was developed by Redis to save memory; it is a sequential data structure composed of contiguous memory blocks, somewhat similar to an array.
However, zip lists also have their drawbacks:

  • They cannot store too many elements; otherwise, query efficiency will decrease.
  • When adding or modifying an element, the memory space occupied by the zip list needs to be reallocated, which may even trigger a chain update issue.

Therefore, Redis objects (List objects, Hash objects, Zset objects) will only use zip lists as the underlying data structure when the number of elements is relatively small or the element values are not large.

Structural Design#

The zip list begins with three header fields and ends with a one-byte marker:

  • zlbytes records the total number of bytes occupied by the entire zip list in memory;
  • zltail records the offset of the "tail" node from the starting address, indicating how many bytes away the tail is;
  • zllen records the number of nodes contained in the zip list;
  • zlend marks the endpoint of the zip list, with a fixed value of 0xFF (decimal 255).

In the zip list, locating the first element is O(1) because the header has a fixed size, and locating the last element is O(1) via the zltail offset. Finding any other element, however, requires scanning entry by entry, which is O(N). Therefore, zip lists are not suitable for storing many elements.

Additionally, the structure of a zip list node (entry) is as follows:

  • prevlen records the length of the "previous node," allowing for backward traversal;
  • encoding records the "type and length" of the actual data in the current node, with two main types: string and integer.
  • data records the actual data of the current node, with the type and length determined by encoding.

This design philosophy of allocating different space sizes based on data size and type is what Redis adopts to save memory.

Specifically, prevlen and encoding are allocated different space sizes based on the size and type of data.

Each node in the zip list records the length of the previous node in the prevlen attribute, and the space size of the prevlen attribute depends on the length value of the previous node. For example:

  • If the length of the previous node is less than 254 bytes, the prevlen attribute requires 1 byte of space to store this length value.
  • If the length of the previous node is greater than or equal to 254 bytes, the prevlen attribute requires 5 bytes of space to store this length value.

The space size of the encoding attribute depends on whether the data is a string or an integer, as well as the length of the string, as shown in the figure below (the content in the figure represents the actual data, i.e., the data field of this article):

  • If the data of the current node is an integer, the encoding will use 1 byte of space for encoding, meaning the length of encoding is 1 byte. By confirming the integer type through encoding, the actual size of the integer data can be determined. For example, if encoding confirms that the data is an int16 integer, then the length of data will be the size of int16.
  • If the data of the current node is a string, the encoding will use 1 byte/2 bytes/5 bytes of space for encoding based on the length of the string. The first two bits of the encoding indicate the data type, while the remaining bits indicate the actual length of the string data, i.e., the length of data.

Chain Update#

When adding or modifying an element in a zip list, if the space is insufficient, the memory space occupied by the zip list needs to be reallocated. When a newly inserted element is relatively large, it may cause the lengths of subsequent elements' prevlen attributes to change, leading to a "chain update" issue, causing every element's space to be reallocated and degrading the performance of accessing the zip list.
Therefore, zip lists will only be used in scenarios where the number of nodes is small enough; even if a chain update occurs, it is acceptable.

Hash Table#

A hash table is an array (dictEntry table), where each element in the array is a pointer to a "hash table node (dictEntry)."
The dictEntry structure not only contains pointers to the key and value but also contains a pointer to the next hash table node, which can link multiple key-value pairs with the same hash value together, thus solving the problem of hash collisions. This is known as chained hashing.

Hash Collision#

Chained Hashing#

The implementation method is that each hash table node has a next pointer to point to the next hash table node. Therefore, multiple hash table nodes can be linked together using this next pointer, forming a singly linked list for multiple nodes allocated to the same hash bucket, thus solving the hash collision problem.

However, the limitation of chained hashing is evident; as the length of the linked list increases, the time taken to query data at that position will also increase, as the time complexity for querying a linked list is O(n).

To solve this problem, rehashing is required, which means expanding the size of the hash table.

Rehash#

When using hash tables, Redis defines a dict structure that contains two hash tables (ht[2]); two are needed because rehashing migrates data from one table into the other.
During normal service request phases, all inserted data will be written to "hash table 1," while "hash table 2" has not yet been allocated space.

As the data gradually increases, triggering a rehash operation, this process is divided into three steps:

  • Allocate space for "hash table 2," which is generally twice the size of "hash table 1";
  • Migrate the data from "hash table 1" to "hash table 2";
  • After migration is complete, the space of "hash table 1" will be released, and "hash table 2" will be set as "hash table 1," then a new empty hash table will be created for "hash table 2" in preparation for the next rehash.

The second step has a problem: if "hash table 1" holds a very large amount of data, migrating it all to "hash table 2" involves copying a great deal of data, which may block Redis and prevent it from serving other requests.

Trigger Conditions:
The conditions for triggering a rehash operation mainly include two:

  • When the load factor is greater than or equal to 1, and Redis is not executing bgsave or bgrewriteaof commands, meaning it is not performing RDB snapshots or AOF rewrites, a rehash operation will occur.
  • When the load factor is greater than or equal to 5, indicating that hash collisions are very severe, a rehash operation will be forced regardless of whether RDB snapshots or AOF rewrites are being executed.

Incremental Rehash#

The migration of data is no longer done in one go but is divided into multiple migrations.
The steps for incremental rehash are as follows:

  • Allocate space for "hash table 2";
  • During the rehash period, each time a hash table element is added, deleted, queried, or updated, Redis performs the corresponding operation and additionally migrates all key-value pairs at one index position (bucket) of "hash table 1" to "hash table 2";
  • As the number of client requests for hash table operations increases, at some point in time, all key-value pairs from "hash table 1" will be migrated to "hash table 2," thus completing the rehash operation.

Integer Set#

The integer set is one of the underlying implementations of the Set object. When a Set object contains only integer value elements and the number of elements is not large, it will use the integer set data structure as the underlying implementation.

The integer set is essentially a block of contiguous memory space, and its structure is defined as follows:

typedef struct intset {
    // Encoding method
    uint32_t encoding;
    // Number of elements in the set
    uint32_t length;
    // Array to store elements
    int8_t contents[];
} intset;

As seen, the container for storing elements is an array called contents. Although contents is declared as an array of type int8_t, it does not actually store any int8_t type elements; the actual type of the contents array depends on the value of the encoding attribute in the intset structure.

Upgrade Operation of Integer Set#

The integer set has an upgrade rule: when a new element is added and its type (say int32_t) is wider than that of every existing element (say int16_t), the integer set must first be upgraded. That is, the contents array is expanded according to the new element's type (int32_t) before the new element is added, while keeping the set in order.

The upgrade does not allocate a brand-new array of the wider type; instead, it expands the original array in place and re-lays out every element at the new width. (Before the upgrade, if the encoding attribute is INTSET_ENC_INT16, each element occupies 16 bits.)

This approach saves memory. Note that the integer set does not support downgrade operations: once upgraded, the encoding stays at the wider type.

Skip List#

Redis uses skip lists as the underlying implementation for Zset objects, and the advantage of skip lists is that they support average O(logN) complexity for node lookups.

typedef struct zset {
    dict *dict;
    zskiplist *zsl;
} zset;

During data insertion or update operations for Zset objects, the corresponding data will be inserted or updated in both the skip list and the hash table, ensuring that the information recorded in both the skip list and the hash table is consistent.
The Zset object supports range queries (such as ZRANGEBYSCORE) because its data structure uses a skip list; it can also obtain an element's weight in constant time (such as ZSCORE) because it additionally indexes elements with a hash table.
The hash table in the struct zset is only used to obtain element weights in constant complexity; most operations are implemented by the skip list.

Skip List Structure Design#

A skip list is an improved version of a linked list, implementing a "multi-level" ordered linked list, which allows for faster data positioning.

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;
typedef struct zskiplistNode {
    // Element value of the Zset object
    sds ele;
    // Element weight value
    double score;
    // Backward pointer
    struct zskiplistNode *backward;
  
    // Level array of the node, storing forward pointers and spans for each level
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

The Zset object must simultaneously store "elements" and "element weights," corresponding to the variables sds ele and double score in the skip list node structure. Each skip list node has a backward pointer (struct zskiplistNode *backward) pointing to the previous node, facilitating access to nodes starting from the tail node of the skip list, making reverse lookups convenient.

A skip list is a linked list with a hierarchical relationship, and each level can contain multiple nodes, with each node connected by pointers. This feature is achieved through the level array of the zskiplistNode structure.

Each element in the level array represents a level of the skip list, represented by the zskiplistLevel structure. For example, level[0] represents the first level, and level[1] represents the second level. The zskiplistLevel structure defines "pointers to the next skip list node" and "spans," where spans are used to record the distance between two nodes, which is actually used to calculate the ranking of this node in the skip list.

Skip List Node Query Process#

The process of looking up a node in a skip list starts from the highest level of the header node, traversing each level in order. When traversing a node in a certain level of the skip list, two conditions are used to judge based on the SDS type data and the weight of the element:

  • If the weight of the current node is "less than" the weight to be searched, the skip list will access the next node on that level.
  • If the weight of the current node is "equal to" the weight to be searched, and the SDS type data of the current node is "less than" the data to be searched, the skip list will access the next node on that level.
  • If neither of the above two conditions is met, or if the next node is null, the skip list will use the next level pointer in the level array of the currently traversed node to continue searching along the next level, effectively jumping to the next level to continue searching.

Setting the Number of Levels for Skip List Nodes#

The ratio of the number of nodes in adjacent levels of a skip list affects the query performance of the skip list.
The ideal ratio of the number of nodes in adjacent levels of a skip list is 2:1, which can reduce the lookup complexity to O(logN).

Redis adopts a clever method: when creating nodes, the skip list randomly generates the number of levels for each node, without strictly maintaining the ratio of the number of nodes in adjacent levels to be 2:1.

Specifically, when creating a node, a random number in the range [0, 1) is generated. If the number is less than 0.25 (a 25% probability), the level count increases by 1, and another random number is generated; this repeats until a result greater than or equal to 0.25 appears, at which point the node's level count is determined.

This approach means that the probability of increasing the number of levels does not exceed 25%, and the higher the level, the lower the probability, with a maximum limit of 64 levels.

Why Use Skip Lists Instead of Balanced Trees?#

  • In terms of memory usage, skip lists are more flexible than balanced trees. Each node in a balanced tree contains 2 pointers (pointing to the left and right subtrees), while each node in a skip list contains an average of 1/(1-p) pointers, depending on the size of parameter p. If, as in Redis's implementation, p=1/4, then each node contains an average of 1.33 pointers, which is advantageous compared to balanced trees.
  • When performing range searches, operations on skip lists are simpler than on balanced trees. In a balanced tree, once the specified range's minimum value is found, it still requires in-order traversal to find other nodes not exceeding the maximum value. Without certain modifications to the balanced tree, this in-order traversal is not easy to implement. In contrast, performing range searches on a skip list is straightforward; once the minimum value is found, it can simply traverse the first-level linked list for several steps to achieve the goal.
  • From the perspective of algorithm implementation difficulty, skip lists are much simpler than balanced trees. The insertion and deletion operations in balanced trees may trigger adjustments in subtrees, making the logic complex, while in skip lists, insertion and deletion only require modifying the pointers of adjacent nodes, making the operations simple and fast.

Quicklist#

In Redis 3.2, the underlying implementation of List objects was changed to the quicklist data structure. Essentially, quicklist is a combination of "doubly linked list + zip list," as a quicklist is a linked list where each element is a zip list.

Quicklist addresses the issues of zip lists by controlling the size or number of elements in the zip list within each linked list node, thereby avoiding the problem of chain updates. Since the fewer or smaller the elements in the zip list, the smaller the impact of chain updates, it provides better access performance.

Structural Design#

typedef struct quicklist {
    // Head of the quicklist
    quicklistNode *head;
    // Tail of the quicklist
    quicklistNode *tail;
    // Total number of elements in all zip lists
    unsigned long count;
    // Number of quicklistNodes
    unsigned long len;
    ...
} quicklist;

typedef struct quicklistNode {
    // Previous quicklistNode
    struct quicklistNode *prev;
    // Next quicklistNode
    struct quicklistNode *next;
    // Pointer to the zip list of this quicklistNode
    unsigned char *zl;
    // Size of the zip list in bytes
    unsigned int sz;
    // Number of elements in the zip list
    unsigned int count : 16;
    ...
} quicklistNode;

The elements of the linked list nodes no longer simply store element values but instead store a zip list, so the quicklistNode structure contains a pointer to the zip list *zl.
When adding an element to the quicklist, it does not create a new linked list node like a normal linked list. Instead, it checks whether the zip list at the insertion position can accommodate the element. If it can, it directly saves it to the zip list in the quicklistNode structure; if it cannot, it creates a new quicklistNode structure.

Quicklist controls the size or number of elements in the zip list within the quicklistNode structure to avoid potential risks of chain updates, but this does not completely solve the issue of chain updates.
