MongoDB: Implement a read / write lock (mutex)
I need to implement some locking mechanism with MongoDB, in order to prevent inconsistent data, but allow dirty reads.
The conditions:
Acquiring a WRITE lock is only possible if there's no READ lock and no WRITE lock.
Acquiring a READ lock is only possible if there's no WRITE lock.
There can be many parallel READ locks on a single document.
There must be some kind of timeout mechanism: If (for whatever reason) some process does not release its lock, the application must be able to recover.
Dirty reads are possible by simply ignoring all locks within the query.
(Starvation of WRITE processes is not part of this topic.)
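The compatibility rules above can be sketched as a small in-memory check. This is only an illustration with no MongoDB and no timeouts involved; the function name canAcquire and the lock shape { readers, writer } are assumptions made for this example.

```javascript
// In-memory model of the lock rules (illustration only).
// `lock` has the hypothetical shape { readers: <number>, writer: <boolean> }.
function canAcquire(lock, kind) {
  if (kind === "WRITE") {
    // A WRITE lock needs the document to be completely unlocked.
    return lock.readers === 0 && !lock.writer;
  }
  if (kind === "READ") {
    // A READ lock only conflicts with a WRITE lock.
    return !lock.writer;
  }
  throw new Error("unknown lock kind: " + kind);
}

console.log(canAcquire({ readers: 2, writer: false }, "READ"));  // true
console.log(canAcquire({ readers: 2, writer: false }, "WRITE")); // false
console.log(canAcquire({ readers: 0, writer: true }, "READ"));   // false
```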
Why READ and WRITE locks / why not only use a WRITE lock:
Let's assume we have 2 collections: contacts and categories. It's an n:m relationship, where each contact has an array of category IDs.
READ lock: When adding a category to a contact, we must make sure that this category isn't being deleted at the moment (which requires a WRITE lock, see below). And because there can be many READ locks on the same document, multiple processes can add this single category to multiple contacts.
WRITE lock: When deleting a category, we must first remove the category ID from all contacts. While this operation is running, we must make sure that it's not possible to add this category to any contact (that operation needs a READ lock). Afterwards we can safely remove the category document.
This way, the data is always in a consistent state.
The timeout:
That's the hardest part. I already tried to implement it twice, but always ran into issues that seemed too hard to solve.
The basic idea: Every acquired lock comes with a timestamp until which the lock is valid. If this timestamp is in the past, we can ignore that lock. When a process has finished its task, it should remove its lock.
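A minimal sketch of the timestamp idea in plain JavaScript. The helper names lockExpiration and isExpired and the 5000 ms default lifetime are assumptions made for this example, not part of any MongoDB API.

```javascript
// Hypothetical helpers; the 5000 ms default lifetime is an assumption.
function lockExpiration(now, ttlMs = 5000) {
  return new Date(now.getTime() + ttlMs);
}

// A lock whose expiration lies in the past can be ignored.
function isExpired(expiration, now) {
  return expiration.getTime() < now.getTime();
}

const acquiredAt = new Date("2015-06-26T11:59:55Z");
const expiresAt = lockExpiration(acquiredAt);
console.log(expiresAt.toISOString());                                // 2015-06-26T12:00:00.000Z
console.log(isExpired(expiresAt, new Date("2015-06-26T11:59:59Z"))); // false
console.log(isExpired(expiresAt, new Date("2015-06-26T12:00:01Z"))); // true
```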
The big challenge was to have multiple READ locks, where each READ lock has its own timeout, but multiple READ locks can have the same timeout value. And when releasing a READ lock, it must only release itself; all other READ locks must be preserved.
My last implementation:
{
    _id: 1234,
    lock: {
        read: [
            ISODate("2015-06-26T12:00:00Z")
        ],
        write: null
    }
}
Either lock.read can contain elements or lock.write can be set. It must never be possible to have both set!
The queries:
The queries for this are okay, some could be a bit easier (especially "release read lock"). But the main reason for showing them to you is that I'm still not sure if I haven't missed something.
Preface:
ISODate("now") is the current time. It's used to ignore all locks that are expired, and also to remove all expired read locks. ISODate("lock expiration") indicates when this lock will expire and can be ignored / removed (e.g. now + 5 seconds).
Acquire READ lock:
If there's no valid write lock, then insert a read lock.
update(
    {
        _id: 1234,
        $or: [
            { 'lock.write': null },
            { 'lock.write': { $lt: ISODate("now") } }
        ]
    },
    {
        $set: { 'lock.write': null },
        $push: { 'lock.read': ISODate("lock expiration") }
    }
)
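To make the intended behaviour of this update easier to check, here is an in-memory stand-in written in plain JavaScript. It is only a sketch: tryAcquireRead is a hypothetical name, it operates on a plain object instead of a collection, and it assumes that lock.read is an array and that lock.write is either null or a Date.

```javascript
// In-memory stand-in for the "Acquire READ lock" update (illustration only).
// Returns true if the lock was taken, mirroring an update that modified one document.
function tryAcquireRead(doc, now, ttlMs) {
  const write = doc.lock.write;
  // Filter: no write lock, or a write lock that has already expired.
  const matches = write === null || write < now;
  if (!matches) return false;
  // Update: clear the stale write lock and push our own expiration.
  doc.lock.write = null;
  doc.lock.read.push(new Date(now.getTime() + ttlMs));
  return true;
}

const now = new Date("2015-06-26T12:00:00Z");
const free = { _id: 1234, lock: { read: [], write: null } };
const busy = { _id: 1235, lock: { read: [], write: new Date("2015-06-26T12:00:03Z") } };
console.log(tryAcquireRead(free, now, 5000)); // true
console.log(tryAcquireRead(busy, now, 5000)); // false
```

With a real collection, the caller would presumably look at the update result instead: the lock was acquired only if exactly one document was modified.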
Acquire WRITE lock:
If there's no valid read lock and no valid write lock, then set the write lock.
update(
    {
        _id: 1234,
        $and: [
            {
                $or: [
                    { 'lock.read': { $size: 0 } },
                    { 'lock.read': { $not: { $gte: ISODate("now") } } }
                ]
            },
            {
                $or: [
                    { 'lock.write': null },
                    { 'lock.write': { $lt: ISODate("now") } }
                ]
            }
        ]
    },
    {
        $set: {
            'lock.read': [],
            'lock.write': ISODate("lock expiration")
        }
    }
)
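The same kind of in-memory stand-in can be written for the write lock. Again this is only a sketch under assumptions: tryAcquireWrite is a hypothetical name, lock.read is assumed to be an array of Date values, and lock.write is assumed to be null or a Date.

```javascript
// In-memory stand-in for the "Acquire WRITE lock" update (illustration only).
function tryAcquireWrite(doc, now, ttlMs) {
  const { read, write } = doc.lock;
  // No valid read lock: the array is empty or holds only expired timestamps.
  const noValidRead = read.every((t) => t < now);
  // No valid write lock: unset or already expired.
  const noValidWrite = write === null || write < now;
  if (!(noValidRead && noValidWrite)) return false;
  // Update: drop the expired read locks and set our own write lock.
  doc.lock.read = [];
  doc.lock.write = new Date(now.getTime() + ttlMs);
  return true;
}

const now = new Date("2015-06-26T12:00:00Z");
const readLocked = { _id: 1, lock: { read: [new Date("2015-06-26T12:00:02Z")], write: null } };
const onlyStale = { _id: 2, lock: { read: [new Date("2015-06-26T11:59:00Z")], write: null } };
console.log(tryAcquireWrite(readLocked, now, 5000)); // false
console.log(tryAcquireWrite(onlyStale, now, 5000));  // true
```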
Release READ lock:
Remove the acquired read lock by using its expiration timestamp.
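// 1. Replace one entry that equals our own timestamp with null
update(
    {
        _id: 1234,
        'lock.read': ISODate("lock expiration")
    },
    {
        $unset: { 'lock.read.$': null }
    }
)
// 2. Remove all expired read locks
update(
    {
        _id: 1234
    },
    {
        $pull: { 'lock.read': { $lt: ISODate("now") } }
    }
)
// 3. Remove the null placeholder left behind by step 1
update(
    {
        _id: 1234
    },
    {
        $pull: { 'lock.read': null }
    }
)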
(It's possible that the lock.read array contains multiple identical timestamps if multiple processes acquired a READ lock. We must therefore remove only one timestamp, which won't work with $pull, but works using the positional operator $. I also remove all expired locks with an additional update. I tried some things, but wasn't able to reduce it to 2 or even 1 update.)
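The effect of the three updates on the lock.read array can be traced on a plain JavaScript array. This is an in-memory sketch only; releaseReadThreeSteps is a hypothetical name and the array is assumed to hold Date values (or null placeholders).

```javascript
// Mimics the three release updates on a copy of the lock.read array.
function releaseReadThreeSteps(readLocks, myExpiration, now) {
  const locks = readLocks.slice();
  // Step 1: the positional $unset nulls out ONE entry equal to our timestamp.
  const i = locks.findIndex((d) => d !== null && d.getTime() === myExpiration.getTime());
  if (i !== -1) locks[i] = null;
  // Step 2: $pull with $lt removes every expired timestamp (null is kept here).
  const live = locks.filter((d) => d === null || d >= now);
  // Step 3: $pull null removes the placeholder left by step 1.
  return live.filter((d) => d !== null);
}

const mine = new Date("2015-06-26T12:00:05Z");
const sameTimeout = new Date("2015-06-26T12:00:05Z"); // another process, identical value
const stale = new Date("2015-06-26T11:59:00Z");
const now = new Date("2015-06-26T12:00:00Z");
const remaining = releaseReadThreeSteps([mine, sameTimeout, stale], mine, now);
console.log(remaining.length); // 1
```

The lock of the other process with the identical timeout survives, while our own lock and the expired one are gone.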
Release WRITE lock:
Remove the write lock. There should be nothing to check here.
update(
    {
        _id: 1234
    },
    {
        $set: { 'lock.write': null }
    }
)
EDIT 1: Simplified Acquire READ and WRITE queries
{ $not: { $gte: ISODate("now") } } will match only if the field does not contain anything $gte: ISODate("now"). It therefore also matches null and non-existing fields as well as an empty array.
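The matching behaviour described here can be approximated in plain JavaScript for the value shapes used in this question (missing, null, a single Date, or an array of Dates). The function matchesNotGte is a hypothetical helper that only imitates the query semantics for these cases; it is not a general reimplementation of MongoDB's matching rules.

```javascript
// Approximates { field: { $not: { $gte: now } } } for Date-based lock fields.
function matchesNotGte(value, now) {
  // Missing and null fields never satisfy $gte, so $not matches them.
  if (value === undefined || value === null) return true;
  // For arrays, $gte matches if ANY element matches; $not inverts that.
  if (Array.isArray(value)) {
    return !value.some((v) => v instanceof Date && v >= now);
  }
  return !(value instanceof Date && value >= now);
}

const now = new Date("2015-06-26T12:00:00Z");
const past = new Date("2015-06-26T11:59:00Z");
const future = new Date("2015-06-26T12:00:05Z");
console.log(matchesNotGte(undefined, now));      // true  (field does not exist)
console.log(matchesNotGte(null, now));           // true
console.log(matchesNotGte([], now));             // true  (empty array)
console.log(matchesNotGte([past], now));         // true  (only expired locks)
console.log(matchesNotGte([past, future], now)); // false (one valid lock)
console.log(matchesNotGte(future, now));         // false
```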
Acquire READ lock:
update(
    {
        _id: 1234,
        'lock.write': { $not: { $gte: ISODate("now") } }
    },
    {
        $set: { 'lock.write': null },
        $push: { 'lock.read': ISODate("lock expiration") }
    }
)
Acquire WRITE lock:
update(
    {
        _id: 1234,
        'lock.write': { $not: { $gte: ISODate("now") } },
        'lock.read': { $not: { $gte: ISODate("now") } }
    },
    {
        $set: {
            'lock.read': [],
            'lock.write': ISODate("lock expiration")
        }
    }
)
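In application code, ISODate("now") and ISODate("lock expiration") have to become real Date values. A possible way to build the two simplified queries is sketched below; the function names acquireReadLock and acquireWriteLock, the ttlMs parameter and the collection name in the trailing comment are assumptions made for this example.

```javascript
// Builds the filter and update documents for the simplified queries.
function acquireReadLock(id, now, ttlMs) {
  return {
    filter: { _id: id, "lock.write": { $not: { $gte: now } } },
    update: {
      $set: { "lock.write": null },
      $push: { "lock.read": new Date(now.getTime() + ttlMs) },
    },
  };
}

function acquireWriteLock(id, now, ttlMs) {
  return {
    filter: {
      _id: id,
      "lock.write": { $not: { $gte: now } },
      "lock.read": { $not: { $gte: now } },
    },
    update: {
      $set: { "lock.read": [], "lock.write": new Date(now.getTime() + ttlMs) },
    },
  };
}

const q = acquireWriteLock(1234, new Date("2015-06-26T12:00:00Z"), 5000);
console.log(q.update.$set["lock.write"].toISOString()); // 2015-06-26T12:00:05.000Z

// Possible usage with a hypothetical collection `db.items`:
//   const res = db.items.updateOne(q.filter, q.update);
//   const acquired = res.modifiedCount === 1;
```

Keeping the returned expiration around matters for the read lock, because the release query needs exactly that value.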
But I still have no idea regarding the "Release READ lock" query...
I thought about some kind of tuple holding the timeout timestamp and the count of locks. But then the problem moves to the Acquire READ lock query.
EDIT 2: Different data structure for easier release READ lock
{
    _id: 1234,
    lock: {
        read: [
            { timeout: ISODate("2015-06-26T12:00:00Z"), process: ObjectId("...") }
        ],
        write: null
    }
}
This works because an ObjectId consists of a timestamp, machine id, process id and a counter. This way it's not possible to create multiple equal ObjectIds.
Long story short: When acquiring a READ lock, we insert a document that consists of the timeout timestamp and a unique ObjectId. And when releasing it, we use this combination to remove it from the array. So the only interesting queries are:
Acquire WRITE lock:
update(
    {
        _id: 1234,
        'lock.write': { $not: { $gte: ISODate("now") } },
        'lock.read.timeout': { $not: { $gte: ISODate("now") } }
    },
    {
        $set: {
            'lock.read': [],
            'lock.write': ISODate("lock expiration")
        }
    }
)
Release READ lock:
update(
    {
        _id: 1234
    },
    {
        $pull: {
            'lock.read': {
                $or: [
                    { 'timeout': ISODate("lock expiration"), process: ObjectId("...") },
                    { 'timeout': { $lt: ISODate("now") } }
                ]
            }
        }
    }
)
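What this single $pull is meant to do can be traced on a plain array. The sketch below is an in-memory imitation only: releaseReadLock is a hypothetical name, and the process values are short strings standing in for real ObjectIds.

```javascript
// Removes our own lock (same timeout AND same process) plus all expired locks.
function releaseReadLock(readLocks, mine, now) {
  return readLocks.filter((l) => {
    const isMine =
      l.timeout.getTime() === mine.timeout.getTime() && l.process === mine.process;
    const isExpired = l.timeout < now;
    return !(isMine || isExpired);
  });
}

const now = new Date("2015-06-26T12:00:00Z");
const timeout = new Date("2015-06-26T12:00:05Z");
const locks = [
  { timeout, process: "p1" },                                   // our lock
  { timeout, process: "p2" },                                   // same timeout, other process
  { timeout: new Date("2015-06-26T11:59:00Z"), process: "p3" }, // expired
];
const left = releaseReadLock(locks, { timeout, process: "p1" }, now);
console.log(left.map((l) => l.process)); // [ 'p2' ]
```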
As you can see, we now only need a single query to remove our lock and clean up all timed-out locks.
The unique process identifier is quite important, because without it, the $pull operation could remove the lock of another process if it acquired a lock with the very same timeout value.
The next step would be to get rid of the process field and only use an ObjectId, which should be able to hold the timeout part itself (e.g. "Mongodb: Perform a Date range query from the ObjectId in the mongo shell").
Questions:
Is this a valid and bulletproof implementation using MongoDB?
If "yes": Can I somehow improve it? (at least the "Release READ lock" part)
If "no": What's wrong with it? What have I missed?
Thanks in advance for your help!