Buffering SQL Writes with Redis
It's no secret that one of Sentry's core technologies is SQL, specifically
PostgreSQL. We're huge advocates of simplicity, and Postgres is one of those tools that's not only quick to get started
with, but can also grow with you. While at our scale very few things are simple, we've
still managed to keep complexity to a minimum.
Postgres at Sentry
To give a bit of background, it's important to understand how we use Postgres
today. We operate two clusters (separate databases): one stores event
metadata, and the other stores the remainder of the Sentry platform data.
These clusters operate with cold-standby replicas, which are only used in
disaster or maintenance scenarios.
When data comes into Sentry, it primarily falls into three categories:
the event blob, which is immutable and written into a Riak cluster
key/value pairs representing tags (such as the device name or operating system)
various attributes like "last time seen" on an aggregate
Most data is stored in SQL, with the exception being the large event blobs. We rely on
Riak for its operational scalability, but utilize it in only the most primitive way
-- as a key/value store. Inside SQL we maintain references to the Riak keys, keeping
SQL as our source of truth.
Mutation
We commonly see a pattern of heavily-duplicated, high-frequency events due to the nature
of errors. Internally we take an error and de-duplicate it into an aggregate. This gives
us two scenarios:
A lot of the same errors bubble up to the same aggregate issue.
Many unique errors create new aggregates (or update existing ones).
For the data within SQL, there are two types of attributes which see heavy mutation:
Cardinality counters -- frequency of events or the number of values seen for a tag.
"Latest Value" attributes -- such as the last time an event was seen in an aggregate.
Counter Deltas
In the first situation we want to increment a basic counter. This would be
equivalent to the following SQL statement:
UPDATE table SET counter = counter + 1 WHERE key = %s;
Denormalized Attributes
Our second case is a denormalization of the data that is stored with
the event. In the aggregate we store a few attributes of the latest event,
including the timestamp and title.
In Sentry we're content with a last-write-wins scenario, so we simply overwrite
the existing value with the most recently observed values:
UPDATE table SET last_seen = %s WHERE key = %s;
Locking
While the way Postgres does locking is too complex for this post, it's an important factor
in some of the decisions we've made. Whenever an entity needs to be written to
Postgres, a lock is taken out on that row. In Sentry this can be a serious issue, as
many errors will aggregate up into the same entity, which then triggers many updates
to the same row. When these updates all try to hit the same entity (row),
throughput grinds to a halt while each one waits to acquire the write lock.
There are different approaches to improving write performance in this scenario. For
instance, when dealing with counter data, the data could be split across multiple
rows. That is, instead of having a single (key, counter) row, we could have
(key, counter, partition) rows and split up the writes:
UPDATE table SET counter = counter + 1 WHERE key = %s AND partition = FLOOR(RANDOM() * 100);
This means we could create multiple rows per entity -- in the example above we have
100 partitions. When a counter needs to be incremented, we pick a random row to change,
which means the locking can be spread out across more unique rows in the database. This
would allow much higher write throughput, but we would need to aggregate the counters at
some point later. While this approach might ease the counter scenario, it won't help with
the other types of attributes we need to store.
Our core issue here is the choice to use Postgres for this type of data. Postgres must
provide strong consistency and therefore spends considerable time on locks. We don't
need that level of consistency, and we absolutely don't want the cost of those locks.
By choosing to sacrifice consistency we are able to greatly increase our throughput. We
do this by buffering writes.
Buffering
To solve our locking issue we take a buffered writes approach. This means that
we aggregate writes over a period of time, and flush them after an interval. Our
constraints primarily revolve around one question: how much data are we willing
to lose? That question controls the flush interval of the buffers. In our case
it's 10 seconds.
In Sentry this means that while we write the event data immediately when it's
processed -- to Riak, as it's unique and immutable -- we don't apply counters,
search indexes, or many other things for up to 10 seconds. This creates two
scenarios that we need to be aware of:
The UI will not atomically update -- some items will be more up to date than others.
There is a possibility for data loss if we lose a buffer partition due to
sustained network failure or a number of other less common scenarios.
Utilizing Redis
We rely on Redis a lot at Sentry, in systems ranging from simple data caches all the
way to persistent storage for time-series data. Our solution to buffering is no different.
The schema of the buffers is fairly simple. We have a set of attributes per entity, with
their current value (or delta), and then a set listing those entities' keys. In
Redis we store these in two structures:
A hash per entity
A set of the hashes which need to be flushed
The simplicity of the schema also makes the operations easy to reason
about. Whenever a pending write comes in, we step through the following:
Write the changes to the entity's hash key.
With counters we use HINCRBY.
With other values -- such as the "last seen" timestamp -- we simply HSET the value.
ZADD the key to a 'pending' set using the current timestamp.
Now every tick -- in Sentry's case this is 10 seconds -- we're going to flush the
pending writes. This happens via something acting like a cron:
Get all keys using ZRANGE.
Fire off a job into our queue with each pending hash key.
ZREM the given keys.
When a worker receives a job, it does a bit of work and then applies the write:
In a pipeline:
ZREM the key from 'pending' -- if multiple jobs were queued before this was executed, this ensures the excess can be noop'd.
HGETALL values from the entity's hash key.
DEL the entity's hash key.
Convert the pending values to a SQL update, just as we would have without buffering:
Counters do a delta update -- SET counter = counter + %d.
All other values set the new value -- SET value = %s.
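As a sketch of the worker side (COUNTER_COLUMNS is a hypothetical stand-in for knowing which buffered fields are deltas versus plain values, and the final database call is left to whatever SQL client you use):

import redis

client = redis.Redis()

# Hypothetical: which buffered fields are counters (deltas) rather than plain values.
COUNTER_COLUMNS = {"times_seen"}


def process_buffer_key(entity_key):
    with client.pipeline() as pipe:
        # Drop the key from 'pending' so duplicate jobs become no-ops.
        pipe.zrem("buffer:pending", entity_key)
        # Read everything buffered for this entity, then delete the hash.
        pipe.hgetall(entity_key)
        pipe.delete(entity_key)
        _, pending, _ = pipe.execute()

    if not pending:
        # Another job already flushed this entity.
        return

    # Convert the buffered values into a single UPDATE, just as we would
    # have written it without buffering.
    assignments, params = [], []
    for column, value in pending.items():
        column = column.decode()  # redis-py returns bytes by default
        if column in COUNTER_COLUMNS:
            assignments.append(f"{column} = {column} + %s")
        else:
            assignments.append(f"{column} = %s")
        params.append(value)
    sql = "UPDATE table SET " + ", ".join(assignments) + " WHERE key = %s"
    # cursor.execute(sql, params + [entity_key]) with your database client.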
A few notes about the process:
We use a sorted set for the case where we would want to only pop off a set
amount (e.g. we want to process the 100 oldest), as shown below.
The system scales linearly with the addition of Redis nodes by putting a
'pending' key on each node.
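For example, grabbing only the 100 oldest pending entities is just a bounded range query on the same sorted set (same hypothetical key name as in the earlier sketches):

# The set is scored by timestamp, so the oldest entries come first.
oldest = client.zrange("buffer:pending", 0, 99)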
With this model we mostly guarantee that only a single row in SQL is being
updated at once, which alleviates most of the locking contention we'd otherwise see.
Improvements
As with every system, things change, or better ideas emerge. That holds true
with our buffers as well. There are a few improvements and problems that we've
become aware of and have yet to tackle:
LIFO vs FIFO
The current implementation is a LIFO structure. This works OK as long as
we're able to keep up with the pending queue. If we weren't, however, it'd mean that
the most frequent events would have the highest priority. That may seem like a good
idea, but often you'd want to prioritize things that are happening less frequently.
We could resolve that by using the NX parameter of ZADD. This would ensure
that the timestamp doesn't get updated if the key already exists in the set. Unfortunately
this requires a newer version of Redis, and since Sentry is shipped to a variety
of environments, we don't feel it's worth the headache.
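With a current redis-py (and a Redis server new enough to support ZADD's NX flag), that would look roughly like this, reusing the names from the earlier sketches:

# NX: only add the member if it isn't already present, so a hot entity
# keeps its original (oldest) timestamp instead of being pushed back.
client.zadd("buffer:pending", {entity_key: time.time()}, nx=True)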
Stampede
In the event that the pending-write jobs don't process in a timely manner, the
queue can easily get backlogged. This would result in large numbers of tasks
being created. These tasks are generally no-ops when they do get executed, but
they're still far from free.
One way to solve this would be to move to a pull model rather than a push model. We
could have workers that are responsible for continuously popping off pending updates,
which means we'd never execute duplicate work.
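A sketch of what that pull model could look like, sticking to older Redis primitives and assuming a single consumer per 'pending' set (duplicate claims would be harmless anyway, since the worker no-ops on an empty hash):

import time

import redis

client = redis.Redis()


def consume_pending(batch_size=100):
    # One long-running consumer per 'pending' set.
    while True:
        # Claim the oldest batch of dirty entities.
        keys = client.zrange("buffer:pending", 0, batch_size - 1)
        if not keys:
            time.sleep(1)
            continue
        client.zrem("buffer:pending", *keys)
        for key in keys:
            # Apply the buffered writes inline (process_buffer_key is the
            # worker sketched earlier) rather than enqueueing a task per key.
            process_buffer_key(key)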
This adds complexity in that we'd need to add a new set of background services and
scale those out in addition to our existing queue workers. We've avoided this so
far for the same reasons we don't use newer Redis features.
Loss of Writes
It's possible for a worker to clear a hash and fail midway through the process
so that the data does not get written. For example, if a process were to be killed
by the OOM killer it would have already removed the data from Redis, but not yet
committed changes to Postgres.
This could be improved by adding a secondary "in progress" set and duplicating the
hash. Given that the chance of this failure is so low, we didn't feel it was worth the
extra cost, as this is an extremely hot path in our application.
Future
The buffer implementation has been around in Sentry for four years now, and has only
had minor changes along the way. Only time will tell how much longer until we face
new constraints, but we're confident that choosing Redis early on was the right choice.
What's the coolest thing you've used Redis for?
p.s. Whether you want to debug Ruby, do Node error tracking, or handle an obscure Java exception, we'll be working hard to provide the best possible experience for you and your team!