What are Aggregates and PODs and how/why are they special?

This FAQ is about Aggregates and PODs and covers the following material:

  • What are aggregates?
  • What are PODs (Plain Old Data)?
  • How are they related?
  • How and why are they special?
  • What changes for C++11?

    How to read

    This article is rather long. If you want to know about both aggregates and PODs (Plain Old Data), take the time to read it. If you are interested only in aggregates, read only the first part. If you are interested only in PODs, you must first read the definition, implications, and examples of aggregates and then you may jump to PODs, but I would still recommend reading the first part in its entirety. The notion of aggregates is essential for defining PODs. If you find any errors (even minor ones, including grammar, stylistics, formatting, syntax, etc.), please leave a comment and I'll edit.

    What are aggregates and why are they special

    Formal definition from the C++ standard (C++03 8.5.1 §1):

    An aggregate is an array or a class (clause 9) with no user-declared constructors (12.1), no private or protected non-static data members (clause 11), no base classes (clause 10), and no virtual functions (10.3).

    So, OK, let's parse this definition. First of all, any array is an aggregate. A class can also be an aggregate if… wait! Nothing is said about structs or unions; can't they be aggregates? Yes, they can. In C++, the term class refers to all classes, structs, and unions. So, a class (or struct, or union) is an aggregate if and only if it satisfies the criteria from the above definition. What do these criteria imply?

  • No user-declared constructors. This does not mean an aggregate class cannot have constructors; in fact, it can have a default constructor and/or a copy constructor as long as they are implicitly declared by the compiler and not explicitly by the user.

  • No private or protected non-static data members. You can have as many private and protected member functions (but not constructors), and as many private or protected static data members, as you like without violating the rules for aggregate classes.

  • An aggregate class can have a user-declared/user-defined copy-assignment operator and/or destructor.

  • An array is an aggregate even if it is an array of non-aggregate class type.

    Now let's look at some examples:

    class NotAggregate1
    {
      virtual void f() {} //remember? no virtual functions
    };
    
    class NotAggregate2
    {
      int x; //x is private by default and non-static 
    };
    
    class NotAggregate3
    {
    public:
      NotAggregate3(int) {} //oops, user-defined constructor
    };
    
    class Aggregate1
    {
    public:
      NotAggregate1 member1;   //ok, public member
      Aggregate1& operator=(Aggregate1 const & rhs) { return *this; } //ok, copy-assignment
    private:
      void f() {} // ok, just a private function
    };
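    The bullet points above can be checked mechanically with std::is_aggregate from <type_traits>. This is a sketch that assumes a C++17 or later compiler (is_aggregate was added in C++17), and the class names are made up for illustration. Base classes are deliberately avoided, because C++17 relaxed that particular rule compared with the C++03 definition quoted here.

```cpp
#include <string>
#include <type_traits>

// Private static data members and private member functions do not affect
// aggregate status; only non-static data members must be public.
struct WithPrivateExtras {
    int value;                  // public non-static data member
private:
    static int counter;         // private static data member: allowed
    void helper() {}            // private member function: allowed
};

// A user-declared copy-assignment operator and destructor are allowed too.
struct WithAssignAndDtor {
    int value;
    WithAssignAndDtor& operator=(const WithAssignAndDtor&) { return *this; }
    ~WithAssignAndDtor() {}
};

// An array is an aggregate even if its element type (std::string) is not.
using StringArray = std::string[3];

// A user-declared constructor rules a class out.
struct HasCtor {
    explicit HasCtor(int) {}
};

static_assert(std::is_aggregate<WithPrivateExtras>::value, "private extras are fine");
static_assert(std::is_aggregate<WithAssignAndDtor>::value, "operator= and destructor are fine");
static_assert(std::is_aggregate<StringArray>::value, "arrays are always aggregates");
static_assert(!std::is_aggregate<std::string>::value, "std::string has constructors");
static_assert(!std::is_aggregate<HasCtor>::value, "user-declared constructor");
```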
    

    You get the idea. Now let's see how aggregates are special. Unlike non-aggregate classes, they can be initialized with curly braces {}. This initialization syntax is commonly known for arrays, and we just learnt that these are aggregates. So, let's start with them.

    Type array_name[n] = {a1, a2, …, am};

    if (m == n)
        the i-th element of the array is initialized with ai
    else if (m < n)
        the first m elements of the array are initialized with a1, a2, …, am and the other n - m elements are, if possible, value-initialized (see below for the explanation of the term)
    else if (m > n)
        the compiler will issue an error
    else (this is the case when n isn't specified at all, as in int a[] = {1, 2, 3};)
        the size of the array (n) is assumed to be equal to m, so int a[] = {1, 2, 3}; is equivalent to int a[3] = {1, 2, 3};

    When an object of scalar type (bool, int, char, double, pointers, etc.) is value-initialized, it is initialized with 0 for that type (false for bool, 0.0 for double, a null pointer for pointers, etc.). When an object of class type with a user-declared default constructor is value-initialized, its default constructor is called. If the default constructor is implicitly defined, then all non-static members are recursively value-initialized. This definition is imprecise and a bit incorrect, but it should give you the basic idea. A reference cannot be value-initialized. Value-initialization for a non-aggregate class can fail if, for example, the class has no appropriate default constructor.
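    These rules can be observed at compile time. The sketch below assumes a C++11 or later compiler (for constexpr, nullptr and using aliases); Plain, Counted and IntPtr are made-up names. T() is the value-initialization syntax in each case.

```cpp
// Scalars: value-initialization yields zero of the appropriate type.
using IntPtr = int*;
static_assert(int() == 0, "int is value-initialized to 0");
static_assert(double() == 0.0, "double is value-initialized to 0.0");
static_assert(bool() == false, "bool is value-initialized to false");
static_assert(IntPtr() == nullptr, "pointers are value-initialized to null");

// Class without a user-declared default constructor:
// its non-static members are (recursively) value-initialized.
struct Plain {
    int i;
    double d;
};
static_assert(Plain().i == 0 && Plain().d == 0.0, "members are zeroed");

// Class with a user-provided default constructor: that constructor runs.
struct Counted {
    int n;
    constexpr Counted() : n(42) {}
};
static_assert(Counted().n == 42, "the default constructor was called");
```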

    Examples of array initialization:

    class A
    {
    public:
      A(int) {} //no default constructor
    };
    class B
    {
    public:
      B() {} //default constructor available
    };
    int main()
    {
      A a1[3] = {A(2), A(1), A(14)}; //OK n == m
      //A a2[3] = {A(2)}; //ERROR (commented out so the rest compiles): A has no default constructor, so a2[1] and a2[2] cannot be value-initialized
      B b1[3] = {B()}; //OK b1[1] and b1[2] are value initialized, in this case with the default-ctor
      int Array1[1000] = {0}; //All elements are initialized with 0;
      int Array2[1000] = {1}; //Attention: only the first element is 1, the rest are 0;
      bool Array3[1000] = {}; //the braces can be empty too. All elements initialized with false
      int Array4[1000]; //no initializer. This is different from an empty {} initializer in that
      //the elements in this case are not value-initialized, but have indeterminate values 
      //(unless, of course, Array4 is a global array)
      //int array[2] = {1, 2, 3, 4}; //ERROR (commented out so the rest compiles): too many initializers
    }
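    The m == n, m < n and "size omitted" cases listed earlier can be verified at compile time with constexpr arrays. This is a sketch assuming C++11 or later; the array names are illustrative.

```cpp
// m < n: the trailing elements are value-initialized (zero for int).
constexpr int partial[5] = {7, 8};
static_assert(partial[0] == 7 && partial[1] == 8, "explicit initializers");
static_assert(partial[2] == 0 && partial[4] == 0, "the rest are value-initialized");

// Size omitted: n is deduced from the number of initializers (m).
constexpr int deduced[] = {1, 2, 3};
static_assert(sizeof(deduced) / sizeof(deduced[0]) == 3, "n == m");

// Empty braces value-initialize every element.
constexpr bool flags[4] = {};
static_assert(!flags[0] && !flags[3], "all elements are false");

// m > n is ill-formed, so this line would not compile:
// constexpr int tooMany[2] = {1, 2, 3};
```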
    

    Now let's see how aggregate classes can be initialized with braces. Pretty much the same way. Instead of the array elements we will initialize the non-static data members in the order of their appearance in the class definition (they are all public by definition). If there are fewer initializers than members, the rest are value-initialized. If it is impossible to value-initialize one of the members which were not explicitly initialized, we get a compile-time error. If there are more initializers than necessary, we get a compile-time error as well.

    struct X
    {
      int i1;
      int i2;
    };
    struct Y
    {
      char c;
      X x;
      int i[2];
      float f; 
    protected:
      static double d;
    private:
      void g(){}      
    }; 
    
    Y y = {'a', {10, 20}, {20, 30}};
    

    In the above example, y.c is initialized with 'a', y.x.i1 with 10, y.x.i2 with 20, y.i[0] with 20, y.i[1] with 30, and y.f is value-initialized, that is, initialized with 0.0. The protected static member d is not initialized at all, because it is static.
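    Making the object constexpr lets the compiler confirm the member-by-member assignment just described. This sketch assumes C++11 or later and uses a trimmed copy of X and Y: the static member d and the private function g are left out, since neither takes part in brace initialization.

```cpp
struct X {
    int i1;
    int i2;
};

struct Y {
    char c;
    X x;
    int i[2];
    float f;
};

// Members are initialized in declaration order; the nested braces go to
// the nested aggregate X and to the array member i.
constexpr Y y = {'a', {10, 20}, {20, 30}};

static_assert(y.c == 'a', "first initializer");
static_assert(y.x.i1 == 10 && y.x.i2 == 20, "nested aggregate");
static_assert(y.i[0] == 20 && y.i[1] == 30, "array member");
static_assert(y.f == 0.0f, "f had no initializer, so it was value-initialized");
```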

    Aggregate unions are different in that you may initialize only their first member with braces. I think that if you are advanced enough in C++ to even consider using unions (their use may be very dangerous and must be thought of carefully), you could look up the rules for unions in the standard yourself :).
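    For completeness, a minimal illustration of the union rule, assuming C++11 or later; IntOrFloat is a made-up name. A plain brace list initializes the first member, which becomes the active member.

```cpp
union IntOrFloat {
    int i;     // first member: the one a plain brace list initializes
    float f;
};

// Initializes u.i; u.f is not the active member and must not be read here.
constexpr IntOrFloat u = {42};
static_assert(u.i == 42, "the first member was initialized");
```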

    Now that we know what's special about aggregates, let's try to understand the restrictions on classes; that is, why they are there. We should understand that memberwise initialization with braces implies that the class is nothing more than the sum of its members. If a user-defined constructor is present, it means that the user needs to do some extra work to initialize the members; therefore, brace initialization would be incorrect. If virtual functions are present, it means that the objects of this class have (on most implementations) a pointer to the so-called vtable of the class, which is set in the constructor, so brace initialization would be insufficient. You could figure out the rest of the restrictions in a similar manner as an exercise :).

    So enough about aggregates. Now we can define a stricter set of types, to wit, PODs.

    What are PODs and why are they special

    Formal definition from the C++ standard (C++03 9 §4):

    A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. A POD class is a class that is either a POD-struct or a POD-union.

    Wow, this one's tougher to parse, isn't it? :) Let's leave unions out (on the same grounds as above) and rephrase in a bit clearer way:

    An aggregate class is called a POD if it has no user-defined copy-assignment operator and destructor and none of its nonstatic members is a non-POD class, array of non-POD, or a reference.

    What does this definition imply? (Did I mention POD stands for Plain Old Data ?)

  • All POD classes are aggregates, or, to put it the other way around, if a class is not an aggregate, then it is certainly not a POD.
  • Classes, just like structs, can be PODs, even though the standard term is POD-struct for both cases.
  • Just like in the case of aggregates, it doesn't matter what static members the class has.

    Examples:

    #include <vector>
    
    struct POD
    {
      int x;
      char y;
      void f() {} //no harm if there's a function
      static std::vector<char> v; //static members do not matter
    };
    
    struct AggregateButNotPOD1
    {
      int x;
      ~AggregateButNotPOD1() {} //user-defined destructor
    };
    
    struct AggregateButNotPOD2
    {
      AggregateButNotPOD1 arrOfNonPod[3]; //array of non-POD class
    };
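    With a C++11 or later compiler, the three examples above can be classified using the standard traits discussed later in this answer. The helper is_pod_like is hypothetical: it simply combines std::is_trivial and std::is_standard_layout, which matches the C++11 definition of POD, instead of relying on std::is_pod (deprecated since C++20).

```cpp
#include <type_traits>
#include <vector>

struct POD {
    int x;
    char y;
    void f() {}                   // member functions do not matter
    static std::vector<char> v;   // static members do not matter
};

struct AggregateButNotPOD1 {
    int x;
    ~AggregateButNotPOD1() {}     // user-defined destructor
};

struct AggregateButNotPOD2 {
    AggregateButNotPOD1 arrOfNonPod[3];  // array of non-POD class
};

// Hypothetical helper: "POD" in the C++11 sense is trivial + standard-layout.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                       std::is_standard_layout<T>::value> {};

static_assert(is_pod_like<POD>::value, "plain data plus functions and statics");
static_assert(!is_pod_like<AggregateButNotPOD1>::value, "user-defined destructor");
static_assert(!is_pod_like<AggregateButNotPOD2>::value, "non-POD array member");
```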
    

    POD-classes, POD-unions, scalar types, and arrays of such types are collectively called POD-types.
    PODs are special in many ways. I'll provide just some examples.

  • POD-classes are the closest to C structs. Unlike C structs, PODs can have member functions and arbitrary static members, but neither of these changes the memory layout of the object. So if you want to write a more or less portable dynamic library that can be used from C and even .NET, you should try to make all your exported functions take and return only parameters of POD-types.

  • The lifetime of objects of non-POD class type begins when the constructor has finished and ends when the destructor has finished. For POD classes, the lifetime begins when storage for the object is occupied and finishes when that storage is released or reused.

  • For objects of POD types it is guaranteed by the standard that when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value. Do note that there is no such guarantee for objects of non-POD types. Also, you can safely copy POD objects with memcpy . The following example assumes T is a POD-type:

    #define N sizeof(T)
    char buf[N];
    T obj; // obj initialized to its original value
    memcpy(buf, &obj, N); // between these two calls to memcpy,
    // obj might be modified
    memcpy(&obj, buf, N); // at this point, each subobject of obj of scalar type
    // holds its original value
    
  • goto statement. As you may know, it is illegal (the compiler should issue an error) to jump via goto from a point where some variable is not yet in scope to a point where it is already in scope. This restriction applies only if the variable is of non-POD type. In the following example f() is ill-formed whereas g() is well-formed. Note that Microsoft's compiler is too liberal with this rule: it just issues a warning in both cases.

    int f()
    {
      struct NonPOD {NonPOD() {}};
      goto label;
      NonPOD x;
    label:
      return 0;
    }
    
    int g()
    {
      struct POD {int i; char c;};
      goto label;
      POD x;
    label:
      return 0;
    }
    
  • It is guaranteed that there will be no padding in the beginning of a POD object. In other words, if a POD-class A's first member is of type T, you can safely reinterpret_cast from A* to T* and get the pointer to the first member and vice versa.

  • The list goes on and on…
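    Two of the guarantees from the list above, the memcpy round trip and the absence of padding at the beginning of the object, can be sketched as follows, assuming C++11 or later. Point and the two functions are made-up names; offsetof is the standard macro from <cstddef>.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Point {
    int x;  // first member
    int y;
};

// No padding may precede the first member of a POD (standard-layout) class.
static_assert(offsetof(Point, x) == 0, "first member is at offset 0");
static_assert(std::is_trivially_copyable<Point>::value, "memcpy is allowed");

// Copies the bytes out, clobbers the object, then copies the bytes back.
bool roundtrip_preserves_value() {
    Point obj = {3, 4};
    unsigned char buf[sizeof(Point)];
    std::memcpy(buf, &obj, sizeof obj);
    obj.x = obj.y = 0;                  // obj might be modified in between
    std::memcpy(&obj, buf, sizeof obj);
    return obj.x == 3 && obj.y == 4;    // the original value is restored
}

// A pointer to a POD object converts to a pointer to its first member
// (and back) with reinterpret_cast.
bool first_member_cast_works() {
    Point p = {5, 6};
    int* first = reinterpret_cast<int*>(&p);
    Point* back = reinterpret_cast<Point*>(first);
    return first == &p.x && back == &p;
}
```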

    Conclusion

    It is important to understand what exactly a POD is because many language features, as you see, behave differently for them.


    What changes for C++11?

    Aggregates

    The standard definition of an aggregate has changed slightly, but it's still pretty much the same:

    An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no brace-or-equal-initializers for non-static data members (9.2), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).

    Ok, what changed?

  • Previously, an aggregate could have no user-declared constructors, but now it can't have user-provided constructors. Is there a difference? Yes, there is, because now you can declare constructors and default them:

    struct Aggregate {
        Aggregate() = default; // asks the compiler to generate the default implementation
    };
    

    This is still an aggregate because a constructor (or any special member function) that is defaulted on the first declaration is not user-provided.

  • Now an aggregate cannot have any brace-or-equal-initializers for non-static data members. What does this mean? Well, this is just because with this new standard, we can initialize members directly in the class like this:

    #include <vector>
    
    struct NotAggregate {
        int x = 5; // valid in C++11
        std::vector<int> s{1,2,3}; // also valid
    };
    

    Using this feature makes the class no longer an aggregate because it's basically equivalent to providing your own default constructor.

    So, the definition of an aggregate didn't change much at all. It's still the same basic idea, adapted to the new features.
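    The claim that an in-class member initializer is basically equivalent to providing your own default constructor can be observed through std::is_trivially_default_constructible. This sketch checks triviality rather than aggregate status, because the aggregate rules for exactly these cases differ between standards (C++14 allows member initializers in aggregates, as discussed below, and C++20 excludes classes with any user-declared constructor). The struct names are made up; C++11 or later is assumed.

```cpp
#include <type_traits>

struct DefaultedCtor {
    DefaultedCtor() = default;  // defaulted on its first declaration: not user-provided
    int x;
};

struct UserProvidedCtor {
    UserProvidedCtor() {}       // user-provided, even though the body is empty
    int x;
};

struct MemberInitializer {
    int x = 5;                  // brace-or-equal-initializer: the implicit default
                                // constructor now has real work to do
};

static_assert(std::is_trivially_default_constructible<DefaultedCtor>::value,
              "a defaulted constructor stays trivial");
static_assert(!std::is_trivially_default_constructible<UserProvidedCtor>::value,
              "a user-provided constructor is never trivial");
static_assert(!std::is_trivially_default_constructible<MemberInitializer>::value,
              "a member initializer behaves like a hand-written constructor");
```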

    What about PODs?

    PODs went through a lot of changes. Lots of previous rules about PODs were relaxed in this new standard, and the way the definition is provided in the standard was radically changed.

    The idea of a POD is to capture basically two distinct properties:

  • It supports static initialization, and
  • Compiling a POD in C++ gives you the same memory layout as a struct compiled in C.
    Because of this, the definition has been split into two distinct concepts, trivial classes and standard-layout classes, which are individually more useful than POD. The standard now rarely uses the term POD, preferring the more specific trivial and standard-layout concepts.

    The new definition basically says that a POD is a class that is both trivial and has standard-layout, and this property must hold recursively for all non-static data members:

    A POD struct is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Similarly, a POD union is a union that is both a trivial class and a standard layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). A POD class is a class that is either a POD struct or a POD union.

    Let's go over each of these two properties in detail separately.

    Trivial classes

    Trivial is the first property mentioned above: trivial classes support static initialization. If a class is trivially copyable (a superset of trivial classes), it is ok to copy its representation over the place with things like memcpy and expect the result to be the same.

    The standard defines a trivial class as follows:

    A trivially copyable class is a class that:

    — has no non-trivial copy constructors (12.8),

    — has no non-trivial move constructors (12.8),

    — has no non-trivial copy assignment operators (13.5.3, 12.8),

    — has no non-trivial move assignment operators (13.5.3, 12.8), and

    — has a trivial destructor (12.4).

    A trivial class is a class that has a trivial default constructor (12.1) and is trivially copyable.

    [ Note: In particular, a trivially copyable or trivial class does not have virtual functions or virtual base classes.—end note ]

    So, what are all those trivial and non-trivial things?

    A copy/move constructor for class X is trivial if it is not user-provided and if

    — class X has no virtual functions (10.3) and no virtual base classes (10.1), and

    — the constructor selected to copy/move each direct base class subobject is trivial, and

    — for each non-static data member of X that is of class type (or array thereof), the constructor selected to copy/move that member is trivial;

    otherwise the copy/move constructor is non-trivial.

    Basically this means that a copy or move constructor is trivial if it is not user-provided, the class has nothing virtual in it, and this property holds recursively for all the members of the class and for the base class.

    The definition of a trivial copy/move assignment operator is very similar, simply replacing the word "constructor" with "assignment operator".

    A trivial destructor also has a similar definition, with the added constraint that it can't be virtual.

    And yet another similar rule exists for trivial default constructors, with the addition that a default constructor is non-trivial if the class has non-static data members with brace-or-equal-initializers, which we've seen above.

    Here are some examples to clear everything up:

    // empty classes are trivial
    struct Trivial1 {};
    
    // all special members are implicit
    struct Trivial2 {
        int x;
    };
    
    struct Trivial3 : Trivial2 { // base class is trivial
        Trivial3() = default; // not a user-provided ctor
        int y;
    };
    
    struct Trivial4 {
    public:
        int a;
    private: // no restrictions on access modifiers
        int b;
    };
    
    struct Trivial5 {
        Trivial1 a;
        Trivial2 b;
        Trivial3 c;
        Trivial4 d;
    };
    
    struct Trivial6 {
        Trivial2 a[23];
    };
    
    struct Trivial7 {
        Trivial6 c;
        void f(); // it's okay to have non-virtual functions
    };
    
    struct NonTrivial1; // forward declaration (defined below) so Trivial8 can name it
    
    struct Trivial8 {
         int x;
         static NonTrivial1 y; // no restrictions on static members
    };
    
    struct Trivial9 {
         Trivial9() = default; // not user-provided
          // a regular constructor is okay because we still have default ctor
         Trivial9(int x) : x(x) {};
         int x;
    };
    
    struct NonTrivial1 : Trivial3 {
        virtual void f(); // virtual members make non-trivial ctors
    };
    
    struct NonTrivial2 {
        NonTrivial2() : z(42) {} // user-provided ctor
        int z;
    };
    
    struct NonTrivial3 {
        NonTrivial3(); // user-provided ctor
        int w;
    };
    NonTrivial3::NonTrivial3() = default; // defaulted but not on first declaration
                                          // still counts as user-provided
    struct NonTrivial5 {
        virtual ~NonTrivial5(); // virtual destructors are not trivial
    };
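    A subset of the examples above can be checked with std::is_trivial and std::is_trivially_copyable, assuming C++11 or later (Trivial9's constructor parameter is renamed to avoid shadowing). The last two assertions also show that trivially copyable is the weaker property: NonTrivial2 loses triviality only because of its default constructor, so it can still be copied with memcpy.

```cpp
#include <type_traits>

struct Trivial1 {};               // empty classes are trivial

struct Trivial2 {                 // all special members are implicit
    int x;
};

struct Trivial3 : Trivial2 {      // trivial base, defaulted default constructor
    Trivial3() = default;
    int y;
};

struct Trivial9 {
    Trivial9() = default;         // keeps a trivial default constructor
    Trivial9(int value) : x(value) {}
    int x;
};

struct NonTrivial2 {
    NonTrivial2() : z(42) {}      // user-provided default constructor
    int z;
};

struct NonTrivial5 {
    virtual ~NonTrivial5();       // virtual destructor: never trivial
};

static_assert(std::is_trivial<Trivial1>::value, "");
static_assert(std::is_trivial<Trivial2>::value, "");
static_assert(std::is_trivial<Trivial3>::value, "");
static_assert(std::is_trivial<Trivial9>::value, "");
static_assert(!std::is_trivial<NonTrivial2>::value, "");
static_assert(!std::is_trivial<NonTrivial5>::value, "");

// Trivially copyable is a superset of trivial.
static_assert(std::is_trivially_copyable<NonTrivial2>::value, "");
static_assert(!std::is_trivially_copyable<NonTrivial5>::value, "");
```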
    

    Standard-layout

    Standard-layout is the second property. The standard mentions that these are useful for communicating with other languages, and that's because a standard-layout class has the same memory layout as the equivalent C struct or union.

    This is another property that must hold recursively for members and all base classes. And as usual, no virtual functions or virtual base classes are allowed. That would make the layout incompatible with C.

    A relaxed rule here is that standard-layout classes must have all non-static data members with the same access control. Previously these had to be all public, but now you can make them private or protected, as long as they are all private or all protected.

    When using inheritance, only one class in the whole inheritance tree can have non-static data members, and the first non-static data member cannot be of a base class type (this could break aliasing rules); otherwise, it's not a standard-layout class.

    This is how the definition goes in the standard text:

    A standard-layout class is a class that:

    — has no non-static data members of type non-standard-layout class (or array of such types) or reference,

    — has no virtual functions (10.3) and no virtual base classes (10.1),

    — has the same access control (Clause 11) for all non-static data members,

    — has no non-standard-layout base classes,

    — either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and

    — has no base classes of the same type as the first non-static data member.

    A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.

    A standard-layout union is a standard-layout class defined with the class-key union.

    [ Note: Standard-layout classes are useful for communicating with code written in other programming languages. Their layout is specified in 9.2.—end note ]

    And let's see a few examples.

    // empty classes have standard-layout
    struct StandardLayout1 {};
    
    struct StandardLayout2 {
        int x;
    };
    
    struct StandardLayout3 {
    private: // both are private, so it's ok
        int x;
        int y;
    };
    
    struct StandardLayout4 : StandardLayout1 {
        int x;
        int y;
    
        void f(); // perfectly fine to have non-virtual functions
    };
    
    struct StandardLayout5 : StandardLayout1 {
        int x;
        StandardLayout1 y; // can have members of base type if they're not the first
    };
    
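    struct StandardLayout6 : StandardLayout1, StandardLayout2 {
        // can use multiple inheritance as long as only
        // one class in the hierarchy has non-static data members
        // (also deriving from StandardLayout5 would give StandardLayout1 twice as a base,
        // which current compilers do not treat as standard-layout)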
    };
    
    struct StandardLayout7 {
        int x;
        int y;
        StandardLayout7(int x, int y) : x(x), y(y) {} // user-provided ctors are ok
    };
    
    struct StandardLayout8 {
    public:
        StandardLayout8(int x) : x(x) {} // user-provided ctors are ok
    // ok to have non-static data members and other members with different access
    private:
        int x;
    };
    
    struct NonStandardLayout1; // forward declaration (defined below) so StandardLayout9 can name it
    
    struct StandardLayout9 {
        int x;
        static NonStandardLayout1 y; // no restrictions on static members
    };
    
    struct NonStandardLayout1 {
        virtual void f(); // cannot have virtual functions
    };
    
    struct NonStandardLayout2 {
        NonStandardLayout1 X; // has non-standard-layout member
    };
    
    struct NonStandardLayout3 : StandardLayout1 {
        StandardLayout1 x; // first member cannot be of the same type as base
    };
    
    struct NonStandardLayout4 : StandardLayout3 {
        int z; // more than one class has non-static data members
    };
    
    struct NonStandardLayout5 : NonStandardLayout3 {}; // has a non-standard-layout base class
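    A subset of these examples checked with std::is_standard_layout, assuming C++11 or later (StandardLayout8's constructor parameter is renamed). Trivial4 from the trivial-class examples is included as well, to show that the two properties are independent: it is trivial but, because its data members have different access control, not standard-layout.

```cpp
#include <type_traits>

struct StandardLayout1 {};

struct StandardLayout3 {
private:                        // all data members share one access level
    int x;
    int y;
};

struct StandardLayout5 : StandardLayout1 {
    int x;
    StandardLayout1 y;          // a base-typed member is fine when not first
};

struct StandardLayout8 {
public:
    StandardLayout8(int v) : x(v) {}
private:
    int x;
};

struct NonStandardLayout1 {
    virtual void f();           // virtual function
};

struct NonStandardLayout3 : StandardLayout1 {
    StandardLayout1 x;          // first member has the same type as the base
};

struct NonStandardLayout4 : StandardLayout3 {
    int z;                      // a second class in the hierarchy with data members
};

struct Trivial4 {               // copied from the trivial-class examples
public:
    int a;
private:
    int b;
};

static_assert(std::is_standard_layout<StandardLayout1>::value, "");
static_assert(std::is_standard_layout<StandardLayout3>::value, "");
static_assert(std::is_standard_layout<StandardLayout5>::value, "");
static_assert(std::is_standard_layout<StandardLayout8>::value, "");
static_assert(!std::is_standard_layout<NonStandardLayout1>::value, "");
static_assert(!std::is_standard_layout<NonStandardLayout3>::value, "");
static_assert(!std::is_standard_layout<NonStandardLayout4>::value, "");

// Trivial but not standard-layout: the two properties are independent.
static_assert(std::is_trivial<Trivial4>::value, "");
static_assert(!std::is_standard_layout<Trivial4>::value, "");
```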
    

    Conclusion

    With these new rules a lot more types can be PODs now. And even if a type is not POD, we can take advantage of some of the POD properties separately (if it is only one of trivial or standard-layout).

    The standard library has traits to test these properties in the header <type_traits>:

    namespace std {
      template <typename T> struct is_pod;                  // deprecated since C++20
      template <typename T> struct is_trivial;
      template <typename T> struct is_trivially_copyable;
      template <typename T> struct is_standard_layout;
    }
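    One way to take advantage of just one of the properties, as the conclusion suggests, is to require only what an operation actually needs. The sketch below uses a hypothetical copy_raw helper (not a standard function) that relies on memcpy and therefore requires only std::is_trivially_copyable. Tagged, also a made-up name, is standard-layout and trivially copyable but not trivial, so it is not a POD and is still accepted. C++11 or later is assumed.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical helper (not part of the standard library): copies n objects
// with memcpy, which is only valid for trivially copyable types.
template <typename T>
void copy_raw(T* dst, const T* src, std::size_t n) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "copy_raw requires a trivially copyable type");
    std::memcpy(dst, src, n * sizeof(T));
}

// Standard-layout and trivially copyable, but not trivial (so not a POD)
// because of its user-provided default constructor.
struct Tagged {
    Tagged() : id(0) {}
    int id;
};

static_assert(std::is_standard_layout<Tagged>::value, "");
static_assert(std::is_trivially_copyable<Tagged>::value, "");
static_assert(!std::is_trivial<Tagged>::value, "");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "copy_raw<std::string> would be rejected at compile time");

void copy_tagged_example() {
    Tagged src[2];
    Tagged dst[2];
    src[1].id = 7;
    copy_raw(dst, src, 2);  // dst[1].id is now 7
}
```

    Passing std::string objects to copy_raw would fail at the static_assert instead of silently copying the strings' internal pointers byte by byte.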
    

    What has changed for C++14

    We can refer to the draft C++14 standard.

    Aggregates

    This is covered in section 8.5.1 Aggregates which gives us the following definition:

    An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).

    The only change is that adding in-class member initializers no longer makes a class a non-aggregate. So the following example, from "C++11 aggregate initialization for classes with member in-place initializers":

    struct A
    {
      int a = 3;
      int b = 3;
    };
    

    was not an aggregate in C++11 but it is in C++14. This change is covered in N3605: Member initializers and aggregates, which has the following abstract:

    Bjarne Stroustrup and Richard Smith raised an issue about aggregate initialization and member-initializers not working together. This paper proposes to fix the issue by adopting Smith's proposed wording that removes a restriction that aggregates can't have member-initializers.
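    With a C++14 or later compiler, the struct A shown above can be brace-initialized; members without an explicit initializer fall back to their default member initializer. A minimal sketch (the variable names are made up):

```cpp
struct A {
    int a = 3;
    int b = 3;
};

// a gets the explicit initializer; b uses its default member initializer.
constexpr A partly = {1};
static_assert(partly.a == 1 && partly.b == 3, "");

// With no initializers at all, both members use their defaults.
constexpr A defaults = {};
static_assert(defaults.a == 3 && defaults.b == 3, "");
```

    Under C++11 the first initialization is ill-formed, because A is not an aggregate there and has no constructor taking an int.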

    POD stays the same

    The definition for POD (plain old data) struct is covered in section 9 Classes which says:

    A POD struct is a non-union class that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). Similarly, a POD union is a union that is both a trivial class and a standard-layout class, and has no non-static data members of type non-POD struct, non-POD union (or array of such types). A POD class is a class that is either a POD struct or a POD union.

    which is the same wording as C++11.
