Basic mongodb theory

This will no doubt be a stupid question, and we can all laugh about how stupid it is when the answer points out the glaring simplicity of it all, but being firmly indoctrinated in the art of relational databases I can't seem to get my head entirely around mongodb - no matter how many articles I read or videos I watch.

Here's my situation. I have a project which potentially will have millions of users. Core features:

  • Have a list of users of 4 different types
  • One type of these users can create events
  • Other user types can apply to perform at the event (a request system then lets the applying party and the organising party agree on terms)
  • Other user types can attend the event
  • All user types can follow the event
  • Every user can upload an unlimited amount of images to their "gallery"

Now I would instantly know how to go about normalising a MySQL database and joining queries to get the data I require, but what about mongodb?

    Since all of this information relates to the users, do I just create a single collection for users? Do I create a document for each user? Does this document store all details of events, requests and images relevant to that user, or just some sort of id for these things that I then cross reference? If it stores the full details, wouldn't this replicate a lot of data, i.e. would I have to replicate all event data for every user following/attending/performing at that event and put it in that user's document? (I'm sure this isn't the case, but without joins how do I "join" the user and all event data if events are stored in another collection?) What about images? A user's document can be at most 16MB, but if I allow unlimited images and everything related to the user is stored in a single document, couldn't the images alone outgrow a single document?

    I'm sure I'm missing something pretty vital to understanding mongodb - enlighten me!

    Thanks.


    You can design your app around two separate collections, one for users and one for events. Something like this:

    UserDocument Collection
        -Type
        -Details    
    
    EventDocument Collection
        -Created By
        -EventDetail
        -AppliedUsers
             -"User A", "User B"
        -AttendingUsers
             -"User C", "User D"
        -FollowingUsers
             -"User E", "User F"
    

    The event document holds the ids of all applied, attending and following users as DBRefs.
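
    A minimal sketch of that layout in Python, using plain dicts in place of real collections. The field names (created_by, applied_user_ids and so on) and the integer ids are assumptions made for illustration; a real DBRef would also carry the name of the collection it points into.

```python
# Stand-ins for two MongoDB collections, keyed by _id.
users = {
    1: {"_id": 1, "type": "organiser", "details": {"name": "User A"}},
    2: {"_id": 2, "type": "performer", "details": {"name": "User B"}},
    3: {"_id": 3, "type": "attendee", "details": {"name": "User C"}},
}

events = {
    100: {
        "_id": 100,
        "created_by": 1,
        "event_detail": {"title": "Open mic"},
        # Only ids are stored on the event, never copies of the users.
        "applied_user_ids": [2],
        "attending_user_ids": [3],
        "following_user_ids": [2, 3],
    },
}


def users_for(event_id, field):
    """Resolve the ids stored on an event into full user documents."""
    return [users[user_id] for user_id in events[event_id][field]]


attending = users_for(100, "attending_user_ids")
```

    Each user exists exactly once, so nothing is duplicated; the cost is a second lookup whenever the user details are needed.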

    Another approach is to store the frequently accessed user fields alongside each DBRef. This avoids extra round trips to the database for common reads, without copying the complete user document into the event. Something like:

     EventDocument Collection
        -Created By
        -EventDetail
        -AppliedUsers
             -"User"
                 - Name
                 - XXX
                 - DbRef to User A
        -AttendingUsers
             -"User"
                 - Name
                 - XXX
                 - DbRef to User B
    
        -FollowingUsers
             -"User"
                 - Name
                 - XXX
                 - DbRef to User C
    
             -"User"
                 - Name
                 - XXX
                 - DbRef to User D
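
    The trade-off in this second layout can be sketched in plain Python. The document shape and the helper names attendee_names and rename_user are hypothetical; the point is only that reads need a single document while a change to a copied field has to be pushed to every copy.

```python
# Hypothetical event document: each entry keeps a small copy of the user
# fields shown most often, plus the id needed to load the full user.
event = {
    "_id": 100,
    "created_by": 1,
    "event_detail": {"title": "Open mic"},
    "attending_users": [
        {"name": "User B", "user_id": 2},
        {"name": "User C", "user_id": 3},
    ],
}


def attendee_names(event_doc):
    """List attendee names from the embedded copies, with no user lookup."""
    return [entry["name"] for entry in event_doc["attending_users"]]


def rename_user(event_docs, user_id, new_name):
    """Push a name change to every embedded copy; return how many changed."""
    updated = 0
    for doc in event_docs:
        for entry in doc["attending_users"]:
            if entry["user_id"] == user_id:
                entry["name"] = new_name
                updated += 1
    return updated
```

    Copying only fields that rarely change (a display name, say) keeps the number of such fan-out updates small.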
    

    For images you can use GridFS, which splits large files into smaller chunks.
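
    To illustrate the idea (this is not the GridFS implementation, just the same splitting scheme in plain Python): a file is cut into fixed-size chunks, and each chunk becomes its own small document pointing back to the file. The helper names and the chunk size used here are assumptions; the files_id and n field names mirror the ones GridFS uses in its chunk documents.

```python
def split_into_chunks(file_id, data, chunk_size=255 * 1024):
    """Split raw bytes into chunk documents that reference one file id."""
    return [
        {"files_id": file_id, "n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


image = bytes(600 * 1024)  # a 600 KiB dummy "image"
chunks = split_into_chunks("img-1", image)

# The user document only needs to keep the file id, not the bytes,
# so the gallery can grow without the user document growing much.
user = {"_id": 1, "gallery_file_ids": ["img-1"]}
```

    Because no single chunk document comes near the document size limit, the size of an individual image stops being a concern.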


    Initially I suggest creating only a UserDocument collection and embedding all event-related data inside the user. If you later find that events make the document too big (more than the MongoDB document size limit, which is 16MB in current versions and was 4MB in older ones), you can move them into a separate collection. As for images, look at MongoDB's GridFS feature, which lets you store files of any size. In the user document you would store only a collection of file ids.

    When you start designing a document database schema, always start by embedding everything; later you will see what needs to move into a separate collection. In your case, if you need to show a list of all events, for example, you can't do it easily, because you have to load each user and read its embedded collection of events. In that situation you need to move events into a separate collection.
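
    A small Python sketch of why listing all events is awkward when they are embedded. Plain lists stand in for collections, and the field names are assumptions for illustration.

```python
# Everything embedded: events live inside the user that created them.
users_embedded = [
    {"_id": 1, "events": [{"title": "Open mic"}, {"title": "Jam night"}]},
    {"_id": 2, "events": []},
    {"_id": 3, "events": [{"title": "Poetry slam"}]},
]


def all_events_embedded(user_docs):
    """Walk every user document just to collect the events."""
    return [event for user in user_docs for event in user["events"]]


# Events in their own collection: listing them is a single scan,
# and each event points back to its creator.
events_collection = [
    {"_id": 100, "created_by": 1, "title": "Open mic"},
    {"_id": 101, "created_by": 1, "title": "Jam night"},
    {"_id": 102, "created_by": 3, "title": "Poetry slam"},
]

titles_embedded = [e["title"] for e in all_events_embedded(users_embedded)]
titles_separate = [e["title"] for e in events_collection]
```

    Both give the same titles, but the embedded version has to touch every user, including those with no events at all.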

    Update:

    Because you need to reference an event from any user document, you need to move events into a separate collection; referencing embedded collections is always a bad idea.

    So after some discussion with myself, it seems to me that the following scheme should fit your needs:

    UserDocument Collection
        -UserId
        -Type
        -Details    
        -Events(EventId)
        -AppliedEvents
        -AttendingEvents
        -Files(not the actual files, just references to GridFS files)
    
    EventDocument Collection
        -EventId
        -EventDetail   
        -FollowingUsers
    

    I've moved almost everything into UserDocument, because User is a 'strong' entity and you will work with users more than with events (or so it seems to me).


    You should follow what @Bugai13 and @Ramesh Vel suggested regarding the design of your DB, images and DBRefs. I just wanted to clarify a couple of things.

    "If not wouldn't this replicate a lot of data - ie if I had to replicate all event data for every user following/attending/performing at that event and put it in that user's document"

    People came up with normalisation in relational databases at a time when storage was expensive, hence splitting data across multiple tables and reconstructing it with joins. Now that storage is relatively cheap, repeating data is not frowned upon at all if you need performance. It does depend on the application, though: your query pattern, the amount of data you're storing and the read/write speed you're after. But, you'll say, won't more writes (since there is no normalisation) lead to worse performance? Not necessarily; it depends on the app. If you're worried about this, look at sharding (for MongoDB: http://www.mongodb.org/display/DOCS/Sharding+Introduction).

    "but without joins how do I 'join' the user and all event data if events are stored in another collection?"

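    Also note that, as far as I understand (happy to be corrected on this), there isn't a 'join' operation in MongoDB. (Newer releases, 3.2 onwards, add a $lookup aggregation stage that performs a left outer join, but DBRefs are still not resolved by the server.) Automatic dereferencing of DBRefs happens only in some drivers. As the docs say:

    "DBRef's have the advantage of allowing optional automatic client-side dereferencing with some drivers"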

    Notice that dereferencing happens client-side only, and it happens only for 'some' drivers. As far as I gather, PHP does but the Java driver doesn't - you would have to handle the join at the application level by fetching the two result sets from the separate collections and join them by hand, despite the DBRef.
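
    Joining by hand at the application level might look roughly like this in Python. Plain dicts stand in for the two result sets, and the function and field names are hypothetical.

```python
def join_event_with_users(event, users_by_id):
    """Return a copy of the event with attendee ids replaced by user documents."""
    joined = dict(event)
    joined["attending_users"] = [
        users_by_id[user_id]
        for user_id in event["attending_user_ids"]
        if user_id in users_by_id  # skip references to users that no longer exist
    ]
    del joined["attending_user_ids"]
    return joined


# Result set 1: the event, as it would come back from the events collection.
event = {"_id": 100, "title": "Open mic", "attending_user_ids": [2, 3, 9]}

# Result set 2: the users whose ids appear on the event, fetched in one query
# (for example with an $in filter on _id), then indexed by id.
fetched_users = [{"_id": 2, "name": "User B"}, {"_id": 3, "name": "User C"}]
users_by_id = {user["_id"]: user for user in fetched_users}

joined = join_event_with_users(event, users_by_id)
```

    Fetching all referenced users in one query and indexing them by id keeps this at two round trips per event, rather than one per referenced user.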
