MongoDB: Implement a read / write lock (mutex)

2018-06-14 16:53:36

I need to implement some locking mechanism with MongoDB, in order to prevent inconsistent data, but allow dirty reads.

The conditions:

Acquiring a WRITE lock is only possible if there is no READ lock and no WRITE lock.

Acquiring a READ lock is only possible if there is no WRITE lock.

There can be many parallel READ locks on a single document.

There must be some kind of timeout mechanism: If (for whatever reason) some process does not release its lock, the application must be able to recover.

Dirty reads are possible by simply ignoring all locks within the query.

(Starvation of WRITE processes is not part of this topic)
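To make the conditions above concrete, here is a minimal in-memory sketch of the compatibility rules. It is not a MongoDB query; the names `isValid` and `canAcquire` and the shape of `state` are hypothetical and only serve as illustration.

```javascript
// Hypothetical in-memory model of the lock rules (no database involved).
// state.read:  array of expiration Dates, one per READ lock
// state.write: expiration Date of the WRITE lock, or null

// A lock counts as held only while its expiration is not in the past.
function isValid(expiration, now) {
  return expiration !== null && expiration.getTime() >= now.getTime();
}

function canAcquire(kind, state, now) {
  const writeHeld = isValid(state.write, now);
  const readHeld = state.read.some((exp) => isValid(exp, now));
  if (kind === 'read') return !writeHeld;               // readers are blocked only by a writer
  if (kind === 'write') return !writeHeld && !readHeld; // a writer needs exclusive access
  throw new Error('unknown lock kind: ' + kind);
}
```

Expired locks are treated as if they were not there, which is what the timeout requirement asks for.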

Why `READ` and `WRITE` locks / why not only using a `WRITE` lock:

Let's assume we have 2 collections: contacts and categories. It's an n:m relationship, where each contact has an array of category IDs.

READ lock: When adding a category to a contact, we must make sure that this category isn't getting deleted at the moment (which requires a WRITE lock, see below). And because there can be many READ locks on the same document, it's possible for multiple processes to add this single category to multiple contacts.

WRITE lock: When deleting a category, we must first remove the category ID from all contacts. While this operation is running, we must make sure, that it's not possible to add this category to any contact (this operation needs a READ lock). Afterwards we can safely remove the category document.

This way, there will always be a consistent state.

The timeout:

That's the hardest part. I already tried to implement it twice, but always found some issues, that seemed to be too hard to solve.

The basic idea: Every acquired lock comes with a timestamp until which the lock is valid. If this timestamp is in the past, we can ignore that lock. When a process finishes its task, it should remove its lock.

The big challenge was to support multiple READ locks, where each READ lock has its own timeout, but several READ locks can share the same timeout value. When releasing a READ lock, it must only release itself; all other READ locks must be preserved.
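The difficulty with identical timeout values can be shown on a plain array. The two helpers below are hypothetical in-memory stand-ins, not MongoDB operators: one drops every matching timestamp, the other drops only the first match, which is what a single release needs.

```javascript
// Hypothetical helpers operating on a plain array of expiration Dates.
function sameTime(a, b) {
  return a.getTime() === b.getTime();
}

// Drops every element equal to the given expiration.
// Two processes sharing a timestamp would both lose their lock.
function pullEqual(locks, expiration) {
  return locks.filter((t) => !sameTime(t, expiration));
}

// Drops only the first matching element and leaves the rest untouched.
function removeOne(locks, expiration) {
  const i = locks.findIndex((t) => sameTime(t, expiration));
  return i === -1 ? locks.slice() : [...locks.slice(0, i), ...locks.slice(i + 1)];
}
```

Both helpers return a new array and leave the input unchanged.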

My last implementation:

{
  _id: 1234,
  lock: {
    read: [
      ISODate("2015-06-26T12:00:00Z")
    ],
    write: null
  }
}

Either lock.read can contain elements or lock.write can be set. It must never be possible to have both set!

The queries:

The queries for this are okay, some could be a bit easier (especially "release read lock"). But the main reason for showing them to you is that I'm still not sure if I haven't missed something.

Preface:

ISODate("now") is the current time. It's used to ignore all locks that are expired. And it's also used to remove all expired read locks.

ISODate("lock expiration") is used to indicate when this lock will expire and can be ignored / removed (e.g. now + 5 seconds).

This is used when acquiring a new lock.

And it's also used when releasing a read lock.
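The two placeholders could be computed on the application side roughly like this. It is only a sketch: `LOCK_TTL_MS` and `lockTimes` are hypothetical names, and the 5 seconds are taken from the example above.

```javascript
// Assumption: a lock lives for 5 seconds, as in the "now + 5 seconds" example.
const LOCK_TTL_MS = 5000;

// Returns the values that stand in for ISODate("now") and ISODate("lock expiration").
function lockTimes(now = new Date(), ttlMs = LOCK_TTL_MS) {
  return {
    now: now,
    expiration: new Date(now.getTime() + ttlMs),
  };
}
```

The process has to remember its `expiration` value, because the release query needs it again later.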

Acquire READ lock:

If there's no valid write lock, then insert a read lock.

update(
  {
    _id: 1234,
    $or: [
      { 'lock.write': null },
      { 'lock.write': { $lt: ISODate("now") } }
    ]
  },
  {
    $set: { 'lock.write': null },
    $push: { 'lock.read': ISODate("lock expiration") }
  }
)

Acquire WRITE lock:

If there's no valid read lock and no valid write lock, then set the write lock.

update(
  {
    _id: 1234,
    $and: [
      {
        $or: [
          { 'lock.read': { $size: 0 } },
          { 'lock.read': { $not: { $gte: ISODate("now") } } }
        ]
      },
      {
        $or: [
          { 'lock.write': null },
          { 'lock.write': { $lt: ISODate("now") } }
        ]
      }
    ]
  },
  {
    $set: {
      'lock.read': [],
      'lock.write': ISODate("lock expiration")
    }
  }
)

Release READ lock:

Remove the acquired read lock by using its expiration timestamp.

update(
  {
    _id: 1234,
    'lock.read': ISODate("lock expiration")
  },
  {
    $unset: { 'lock.read.$': null }
  }
)

update(
  {
    _id: 1234,
  },
  {
    $pull: { 'lock.read': { $lt: ISODate("now") } }
  }
)

update(
  {
    _id: 1234
  },
  {
    $pull: { 'lock.read': null }
  }
)

(It's possible that the lock.read array contains multiple identical timestamps if multiple processes acquired a READ lock. We only need to remove one timestamp, though, and this won't work with $pull, but it does work using the positional operator $. I also remove all expired locks with an additional update. I tried some things, but wasn't able to reduce it to 2 or even 1 update.)

Release WRITE lock:

Remove the write lock. There should be nothing to check here.

update(
  {
    _id: 1234
  },
  {
    $set: { 'lock.write': null }
  }
)

EDIT 1: Simplified Acquire `READ` and `WRITE` queries

{ $not: { $gte: ISODate("now") } } will only match if the field does not contain anything that is $gte: ISODate("now"). It will also match null and non-existing fields as well as an empty array.
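The matching behaviour described here can be written down as a plain predicate. This is only a model of the cases listed above (null, missing field, empty array, single date, array of dates), not a reimplementation of MongoDB's comparison semantics; `notGteNow` is a hypothetical name.

```javascript
// Hypothetical model of { $not: { $gte: now } } for the cases used in this post.
// value may be undefined (missing field), null, a Date, or an array of Dates.
function notGteNow(value, now) {
  if (value === undefined || value === null) return true; // missing or null: nothing is >= now
  const items = Array.isArray(value) ? value : [value];
  // Match only if no element is a date at or after `now`; an empty array matches.
  return !items.some((d) => d instanceof Date && d.getTime() >= now.getTime());
}
```

An array matches only when every timestamp in it has expired, which is why the same condition can be used for both lock.write and lock.read.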

Acquire READ lock:

update(
  {
    _id: 1234,
    'lock.write': { $not: { $gte: ISODate("now") } }
  },
  {
    $set: { 'lock.write': null },
    $push: { 'lock.read': ISODate("lock expiration") }
  }
)

Acquire WRITE lock:

update(
  {
    _id: 1234,
    'lock.write': { $not: { $gte: ISODate("now") } },
    'lock.read': { $not: { $gte: ISODate("now") } }
  },
  {
    $set: {
      'lock.read': [],
      'lock.write': ISODate("lock expiration")
    }
  }
)

But still no idea regarding the "Release READ lock" query...

I thought about some kind of tuple having the timeout timestamp and the count of locks. But then the problem comes with the Acquire READ lock query.

EDIT 2: Different data structure for easier release `READ` lock

{
  _id: 1234,
  lock: {
    read: [
      { timeout: ISODate("2015-06-26T12:00:00Z"), process: ObjectId("...") }
    ],
    write: null
  }
}

This works because an ObjectId consists of a timestamp, machine id, process id and a counter. This way it's not possible to create multiple equal ObjectIds. Long story short:

When acquiring a READ lock, we insert a document that consists of the timeout timestamp and a unique ObjectId . And when releasing it, we use this combination to remove it from the array. So the only interesting queries are:

Acquire WRITE lock:

update(
  {
    _id: 1234,
    'lock.write': { $not: { $gte: ISODate("now") } },
    'lock.read.timeout': { $not: { $gte: ISODate("now") } }
  },
  {
    $set: {
      'lock.read': [],
      'lock.write': ISODate("lock expiration")
    }
  }
)

Release READ lock:

update(
  {
    _id: 1234,
  },
  {
    $pull: {
      'lock.read': {
        $or: [
          { 'timeout': ISODate("lock expiration"), process: ObjectId("...") },
          { 'timeout': { $lt: ISODate("now") } }
        ]
      }
    }
  }
)

As you can see, we now only need a single query to remove our lock and clean up all timed-out locks.

The unique process identifier is quite important, because without it, the $pull operation could remove the lock of another process if it acquired a lock with the very same timeout value.
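The effect of that release can be simulated on a plain array. In this sketch `releaseReadLock` is a hypothetical helper, and the process identifiers are plain strings standing in for ObjectIds.

```javascript
// Hypothetical in-memory equivalent of the release: remove our own entry
// (same timeout AND same process) plus every entry that has already expired.
function releaseReadLock(readLocks, own, now) {
  return readLocks.filter((lock) => {
    const isOwn =
      lock.timeout.getTime() === own.timeout.getTime() && lock.process === own.process;
    const isExpired = lock.timeout.getTime() < now.getTime();
    return !(isOwn || isExpired);
  });
}
```

With two entries sharing the same timeout, only the one whose process matches is removed; dropping the process comparison would remove both.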

The next step would be to get rid of the process field and only use an ObjectId, which should be able to hold the timeout part within it (e.g. Mongodb: Perform a Date range query from the ObjectId in the mongo shell).
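A rough sketch of that idea, working on the 24-character hex form of an ObjectId, whose first 4 bytes are a timestamp in seconds since the Unix epoch. The helper names are hypothetical, and the remaining 8 bytes are filled with Math.random purely for illustration; a real implementation would let the driver generate that part.

```javascript
// Hypothetical: build an ObjectId-like hex string whose timestamp part is the lock timeout.
function objectIdHexWithTimeout(timeout) {
  const seconds = Math.floor(timeout.getTime() / 1000); // ObjectId timestamps have second resolution
  const tsHex = seconds.toString(16).padStart(8, '0');  // 4 bytes = 8 hex characters
  let rest = '';
  for (let i = 0; i < 16; i++) {                         // 8 more bytes = 16 hex characters
    rest += Math.floor(Math.random() * 16).toString(16); // illustration only, not collision-safe
  }
  return tsHex + rest;
}

// Reads the timeout back out of the first 8 hex characters.
function timeoutFromObjectIdHex(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}
```

Because the timestamp part only has second resolution, the timeout is rounded down to the full second, so a lock encoded this way can expire up to one second earlier than requested.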

Questions:

Is this a valid and bulletproof implementation using MongoDB?

If "yes": Can I somehow improve it? (at least the "Release READ lock" part)

If "no": What's wrong with it? What have I missed?

Thanks in advance for your help!
