Need Help Understanding Bounds

We've recently decided to revisit some of our MongoDB indexes and came across a peculiar result when using a compound index which contains a multikey part.

It's important to note that we're using v2.4.5

TLDR : When using a compound index with multikey part, the bounds of a non-multikey field used for range restriction are dropped.

I'll explain the problem with an example:

Create some data

db.demo.insert(
[{ "foo" : 1, "attr" : [  {  "name" : "a" },  {  "name" : "b" },  {  "name" : "c" } ]},
 { "foo" : 2, "attr" : [  {  "name" : "b" },  {  "name" : "c" },  {  "name" : "d" } ]},
 { "foo" : 3, "attr" : [  {  "name" : "c" },  {  "name" : "d" },  {  "name" : "e" } ]},
 { "foo" : 4, "attr" : [  {  "name" : "d" },  {  "name" : "e" },  {  "name" : "f" } ]}])

Index

db.demo.ensureIndex({'attr.name': 1, 'foo': 1})

Query & Explain

Query on 'attr.name' but constrain the range of the non-multikey field 'foo':

db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                -1.7976931348623157e+308,
                3
            ]
        ]
    }
}

As you can see, the range of 'foo' is not as defined in the query, one end is completely ignored which results in nscanned being larger than it should.

Changing the order of the range operands will alter the dropped end:

db.demo.find({foo: {$gt: 1, $lt:3}, 'attr.name': 'c'}).hint('attr.name_1_foo_1').explain()
{
    "cursor" : "BtreeCursor attr.name_1_foo_1",
    "isMultiKey" : true,
    "n" : 1,
    "nscannedObjects" : 2,
    "nscanned" : 2,
    "nscannedObjectsAllPlans" : 2,
    "nscannedAllPlans" : 2,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "attr.name" : [
            [
                "c",
                "c"
            ]
        ],
        "foo" : [
            [
                1,
                1.7976931348623157e+308
            ]
        ]
    }
}

We're either missing out on some multikey index basics, or we're facing a bug.

We've gone through similar topics, including:

  • https://groups.google.com/forum/#!searchin/mongodb-user/multikey$20bounds/mongodb-user/RKrsyzRwHrE/_i0SxdJV5qcJ
  • Order of $lt and $gt in MongoDB range query
  • Unfortunately these posts address a different use-case where a range is set on the multikeyed value.

    Other things we've tried to do:

  • Change the compound index ordering, starting with the non-multikey field.

  • Put the 'foo' value inside each of the subdocuments in the 'attr' array, index by ('attr.name', 'attr.foo') and do an $elemMatch on 'attr' with a range constraint on 'foo'.

  • Use an $and operator when defining the range:

    db.demo.find({'attr.name': 'c', $and: [{num: {$lt: 3}}, {num: {$gt: 1}}]})
    
  • Use MongoDB v2.5.4

  • None of the above had any effect (v2.5.4 made things worse by dumping both ends of the range completely).

    Any kind of help would be highly appreciated!

    Many Thanks,

    Roi


    With compound indexes where one of the indexed fields is an array, MongoDB will only use either a lower or upper bound for the range query to ensure correct matches are returned. See SERVER-958 for an example where constraining to both upper and lower index bounds would not find the expected document.

    If your range query is on the array field you can potentially use the $elemMatch operator to optimise your query within the expected index bounds. As at MongoDB 2.4, the $elemMatch operator does not work on non-array fields so unfortunately this doesn't help your use case. You can watch/upvote SERVER-6050: Consider allowing $elemMatch applied to non arrays in the MongoDB issue tracker.

    There is also an open issue SERVER-7959: Potentially unexpected scans with compound indexes when some fields are multikey describing this behaviour.


    The $min and $max operators may help to work around this issue, by allowing you to explicitly specify the index bounds.

    Example:

    db.demo.find({foo: {$lt:3, $gt: 1}, 'attr.name': 'c'}).
     hint('attr.name_1_foo_1').
     min({'attr.name': 'c', foo: 1.000001}).
     max({'attr.name': 'c', foo: 3}).explain()
    

    Result:

    {
        "cursor" : "BtreeCursor attr.name_1_foo_1",
        "isMultiKey" : true,
        "n" : 1,
        "nscannedObjects" : 1,
        "nscanned" : 1,
        "nscannedObjectsAllPlans" : 1,
        "nscannedAllPlans" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
            "start" : {
                "attr.name" : "c",
                "foo" : 1.000001
            },
            "end" : {
                "attr.name" : "c",
                "foo" : 3
            }
        }
    }
    

    There are some important caveats though:

  • $min is always inclusive (like $gte), and $max is always exclusive (like $lt). You may have to adjust your values to get the effect of $gt or $lte.
  • The fields in $min and $max must exactly match the fields in the index.
  • You can only have one set of index bounds per query. There is no equivalent for an $in or $or query.
  • While the operators are documented, it does not seem to be recommended for normal use cases.
  • Point 3 is a blocker for me (I need to do an $in on the array field), so I'm still looking for another solution.

    Source: https://groups.google.com/forum/#!msg/mongodb-user/oxL8wuVdITA/uWJHVbMd_-8J

    链接地址: http://www.djcxy.com/p/57912.html

    上一篇: 构建索引将涵盖查询

    下一篇: 需要帮助了解界限