
Classic Book Annotations: Accustoming Yourself to C++

Published 2011-07-05 18:08 | Source: CSDN | Author: CSDN

Introduction: Scott Meyers's Effective C++: 55 Specific Ways to Improve Your Programs and Designs is a classic C++ book, widely regarded as required reading for C++ programmers. The Publishing House of Electronics Industry invited senior Chinese experts to add commentary and annotations to the original English text, so that readers can benefit from pointers and shortcuts drawn from the annotators' own study and practice. This article is excerpted from Chapter 1: Accustoming Yourself to C++.

 

Regardless of your programming background, C++ is likely to take a little getting used to. It’s a powerful language with an enormous range of features, but before you can harness that power and make effective use of those features, you have to accustom yourself to C++’s way of doing things. This entire book is about how to do that, but some things are more fundamental than others, and this chapter is about some of the most fundamental things of all.

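Annotation: Every language has its own way of solving problems. All of them can reach the same result in the end, but the approach you choose makes a large difference to program performance. Some approaches get the most out of the language; others are simply convention. Programmers work with other people, whether by writing libraries for others or by using libraries others have written, and following established conventions is what keeps the cost of communication down. The reasons behind these conventions run deep and cannot be explained in a sentence. Unfortunately, the C++ community is so large that not every question has a settled answer.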

Item 1: View C++ as a federation of languages. 

In the beginning, C++ was just C with some object-oriented features tacked on. Even C++’s original name, “C with Classes,” reflected this simple heritage.

As the language matured, it grew bolder and more adventurous, adopting ideas, features, and programming strategies different from those of C with Classes. Exceptions required different approaches to structuring functions (see Item 29). Templates gave rise to new ways of thinking about design (see Item 41), and the STL defined an approach to extensibility unlike any most people had ever seen.

Today’s C++ is a multiparadigm programming language, one supporting a combination of procedural, object-oriented, functional, generic, and metaprogramming features. This power and flexibility make C++ a tool without equal, but can also cause some confusion. All the “proper usage” rules seem to have exceptions. How are we to make sense of such a language?

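Annotation: No one can master every corner of C++. That is not an exaggeration, nor a slight on the reader's intelligence: C++ itself keeps evolving and keeps taking on new things.

Annotation: Many years ago, in the first C++ compiler I learned with (Turbo C++ 1.0), template was only a reserved keyword with no functionality behind it. Yet some years after C++ was born, it became the cornerstone of the STL. Even the creator of C++ did not at first realize how much power this unassuming little feature held.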

The easiest way is to view C++ not as a single language but as a federation of related languages. Within a particular sublanguage, the rules tend to be simple, straightforward, and easy to remember. When you move from one sublanguage to another, however, the rules may change. To make sense of C++, you have to recognize its primary sublanguages. Fortunately, there are only four:

C. Way down deep, C++ is still based on C. Blocks, statements, the preprocessor, built-in data types, arrays, pointers, etc., all come from C. In many cases, C++ offers approaches to problems that are superior to their C counterparts (e.g., see Items 2 (alternatives to the preprocessor) and 13 (using objects to manage resources)), but when you find yourself working with the C part of C++, the rules for effective programming reflect C’s more limited scope: no templates, no exceptions, no overloading, etc.

Object-Oriented C++. This part of C++ is what C with Classes was all about: classes (including constructors and destructors), encapsulation, inheritance, polymorphism, virtual functions (dynamic binding), etc. This is the part of C++ to which the classic rules for object-oriented design most directly apply.

Template C++. This is the generic programming part of C++, the one that most programmers have the least experience with. Template considerations pervade C++, and it’s not uncommon for rules of good programming to include special template-only clauses (e.g., see Item 46 on facilitating type conversions in calls to template functions). In fact, templates are so powerful, they give rise to a completely new programming paradigm, template metaprogramming (TMP). Item 48 provides an overview of TMP, but unless you’re a hard-core template junkie, you need not worry about it. The rules for TMP rarely interact with mainstream C++ programming.

The STL. The STL is a template library, of course, but it’s a very special template library. Its conventions regarding containers, iterators, algorithms, and function objects mesh beautifully, but templates and libraries can be built around other ideas, too. The STL has particular ways of doing things, and when you’re working with the STL, you need to be sure to follow its conventions.

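Annotation: C++ is not C++ in any absolute sense. The second edition of this book had no such Item 1; in this edition the material is placed first, which shows that the author himself only gradually came to see how important the issue is. I agree completely. This Item is the heart of the whole book, and anyone reading the book should digest it carefully first. If you have read the second edition, compare the writing styles and you will find they differ greatly. The author no longer insists on how things must be done in C++; there is a faint note of resignation in the text, and this Item is the best evidence of it.

Annotation: In my view, the differences among the various parts of C++ far outweigh what they have in common. C++ evolved into its present form over several decades, and it is very hard to blend so many programming styles into one language and have them coexist harmoniously. Because it had to satisfy, as far as possible, the differing needs of all kinds of projects and users at different times, C++ was not something defined up front after careful deliberation. That the language has come this far and can still keep developing is commendable. So it is easy to understand why, more than ten years after the 1998 standard, the new C++ standard has still not been fully finalized.

Annotation: Some C++ textbooks (including the second edition of this book) stress repeatedly that you should not use C++ as if it were C, and in a sense they are right. But using only part of C++, just the C part, and relying on C++'s improvements only to make up for some of C's shortcomings, is also a sound approach in engineering practice. How best to use C++ depends only on how your development team defines the C++ it uses, and on whether everyone agrees. Google does this well: the C++ coding guidelines Google has published circulate widely online, and I recommend reading them. Having guidelines that everyone follows matters far more than what exactly the guidelines say.

Annotation: Between 2005 and 2006 I spent some time promoting a C-like subset of C++ for development within my team. It was completely different in style from the C++ programs I had written in earlier years, and it worked well. That experience, though, led me to rethink object orientation and template techniques a great deal, and eventually to switch entirely to development in pure C.

Annotation: Personally, I think you should try many different things rather than dogmatically treat any one technique as the only truth. You can love object orientation, and you can also give templates a try. But be wary: C++ lets you mix all these styles of programming together, supports each of them with high performance, and lets you take the best of every school. It gives you the feeling of holding the world in your hands and a recurring thrill of invention. What is easy to miss is that the conflicts and complexity this produces can easily exceed what one person can control. It is especially dangerous for clever C++ programmers. This is hard to appreciate firsthand from studying the language alone, without years of accumulated experience.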

Keep these four sublanguages in mind, and don’t be surprised when you encounter situations where effective programming requires that you change strategy when you switch from one sublanguage to another. For example, pass-by-value is generally more efficient than pass-by-reference for built-in (i.e., C-like) types, but when you move from the C part of C++ to Object-Oriented C++, the existence of user-defined constructors and destructors means that pass-by-reference-to-const is usually better. This is especially the case when working in Template C++, because there, you don’t even know the type of object you’re dealing with. When you cross into the STL, however, you know that iterators and function objects are modeled on pointers in C, so for iterators and function objects in the STL, the old C pass-by-value rule applies again. (For all the details on choosing among parameter-passing options, see Item 20.)
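The change of strategy between sublanguages can be observed with a small experiment. The following is only an illustrative sketch, not code from the book: `Tracked`, `twice`, `sizeByValue`, `sizeByRef`, and the two `copiesMade...` helpers are hypothetical names, and the copy counter exists purely to make the cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical user-defined type that counts how often it is copied.
struct Tracked {
    static int copies;
    std::string payload;
    explicit Tracked(const std::string& s) : payload(s) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
};
int Tracked::copies = 0;

// C part of C++: a built-in type is cheap to copy, so pass it by value.
int twice(int x) { return 2 * x; }

// Object-Oriented C++: pass-by-value runs the copy constructor...
std::size_t sizeByValue(Tracked t) { return t.payload.size(); }
// ...while pass-by-reference-to-const does not.
std::size_t sizeByRef(const Tracked& t) { return t.payload.size(); }

int copiesMadeByValue() {
    Tracked t("hello");
    Tracked::copies = 0;
    sizeByValue(t);              // t is copied into the parameter
    return Tracked::copies;
}

int copiesMadeByRef() {
    Tracked t("hello");
    Tracked::copies = 0;
    sizeByRef(t);                // only a reference is passed
    return Tracked::copies;
}

void demo() {
    assert(twice(21) == 42);
    assert(copiesMadeByValue() == 1);
    assert(copiesMadeByRef() == 0);
}
```

Calling `demo()` from a `main` function runs the checks; the single copy in the by-value case is the cost that pass-by-reference-to-const avoids.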

C++, then, isn’t a unified language with a single set of rules; it’s a federation of four sublanguages, each with its own conventions. Keep these sublanguages in mind, and you’ll find that C++ is a lot easier to understand.

Things to Remember

Rules for effective C++ programming vary, depending on the part of C++ you are using.

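Annotation: Defining how you intend to use C++ is very important; it determines whether your project can keep going until release. Even if you work on a project alone, you will use other people's code (the standard library at the very least) or provide extension interfaces for others to write extensions against. Either way you are dealing with code you did not write. And even if everything is in your own hands, you cannot simply use whichever C++ features look coolest as the mood takes you, because you will always find something even more interesting in C++ to dig into. That mindset is dangerous: the project gradually drifts from its original goal, and you end up writing C++ for the sake of writing C++ rather than to solve the problem.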

Item 2: Prefer consts, enums, and inlines to #defines.

This Item might better be called “prefer the compiler to the preprocessor,” because #define may be treated as if it’s not part of the language per se. That’s one of its problems. When you do something like this,

    #define ASPECT_RATIO 1.653

the symbolic name ASPECT_RATIO may never be seen by compilers; it may be removed by the preprocessor before the source code ever gets to a compiler. As a result, the name ASPECT_RATIO may not get entered into the symbol table. This can be confusing if you get an error during compilation involving the use of the constant, because the error message may refer to 1.653, not ASPECT_RATIO. If ASPECT_RATIO were defined in a header file you didn’t write, you’d have no idea where that 1.653 came from, and you’d waste time tracking it down. This problem can also crop up in a symbolic debugger, because, again, the name you’re programming with may not be in the symbol table.

The solution is to replace the macro with a constant:

    const double AspectRatio = 1.653;  // uppercase names are usually for
                                       // macros, hence the name change

As a language constant, AspectRatio is definitely seen by compilers and is certainly entered into their symbol tables. In addition, in the case of a floating point constant (such as in this example), use of the constant may yield smaller code than using a #define. That’s because the preprocessor’s blind substitution of the macro name ASPECT_RATIO with 1.653 could result in multiple copies of 1.653 in your object code, while the use of the constant AspectRatio should never result in more than one copy.

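Annotation: Macros are not a good thing for C++. C++ idiom goes to great lengths to avoid macro definitions. Why do macros, so common in C, fall out of favor in C++? Saying that C simply has no good substitute is not enough of an explanation.

Annotation: C++ emphasizes strong typing, which helps programmers escape from a bewildering tangle of syntactic-sugar traps and helps compilers catch programmers' mistakes automatically. C's philosophy, by contrast, is explicitness: programs are encouraged to be exactly what they appear to be. C is more weakly typed, but it exposes as much of the actual work as it can. In most cases, macros serve to make programs more readable and configurable, not to change how the language looks or to provide a DSL (domain-specific language).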

When replacing #defines with constants, two special cases are worth mentioning. The first is defining constant pointers. Because constant definitions are typically put in header files (where many different source files will include them), it’s important that the pointer be declared const, usually in addition to what the pointer points to. To define a constant char*-based string in a header file, for example, you have to write const twice:

    const char * const authorName = "Scott Meyers";

For a complete discussion of the meanings and uses of const, especially in conjunction with pointers, see Item 3. However, it’s worth reminding you here that string objects are generally preferable to their char*-based progenitors, so authorName is often better defined this way:

const std::string authorName("Scott Meyers");

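Annotation: This shows that built-in types are out of favor in C++. Unless you intend to write C-style C++, std::string is always a better choice than const char *.

Annotation: The void ** type common in C interfaces is likewise rare in C++-style programs.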

The second special case concerns class-specific constants. To limit the scope of a constant to a class, you must make it a member, and to ensure there’s at most one copy of the constant, you must make it a static member:

    class GamePlayer {
    private:
        static const int NumTurns = 5;  // constant declaration
        int scores[NumTurns];           // use of constant
        ...
    };

What you see above is a declaration for NumTurns, not a definition. Usually, C++ requires that you provide a definition for anything you use, but class-specific constants that are static and of integral type (e.g., integers, chars, bools) are an exception. As long as you don’t take their address, you can declare them and use them without providing a definition. If you do take the address of a class constant, or if your compiler incorrectly insists on a definition even if you don’t take the address, you provide a separate definition like this:

    const int GamePlayer::NumTurns;  // definition of NumTurns; see
                                     // below for why no value is given

You put this in an implementation file, not a header file. Because the initial value of class constants is provided where the constant is declared (e.g., NumTurns is initialized to 5 when it is declared), no initial value is permitted at the point of definition.

Note, by the way, that there’s no way to create a class-specific constant using a #define, because #defines don’t respect scope. Once a macro is defined, it’s in force for the rest of the compilation (unless it’s #undefed somewhere along the line). Which means that not only can’t #defines be used for class-specific constants, they also can’t be used to provide any kind of encapsulation, i.e., there is no such thing as a “private” #define. Of course, const data members can be encapsulated; NumTurns is.

Older compilers may not accept the syntax above, because it used to be illegal to provide an initial value for a static class member at its point of declaration. Furthermore, in-class initialization is allowed only for integral types and only for constants. In cases where the above syntax can’t be used, you put the initial value at the point of definition:

    class CostEstimate {
    private:
        static const double FudgeFactor;  // declaration of static class
        ...                               // constant; goes in header file
    };

    const double                          // definition of static class
    CostEstimate::FudgeFactor = 1.35;     // constant; goes in impl. file

This is all you need almost all the time. The only exception is when you need the value of a class constant during compilation of the class, such as in the declaration of the array GamePlayer::scores above (where compilers insist on knowing the size of the array during compilation). Then the accepted way to compensate for compilers that (incorrectly) forbid the in-class specification of initial values for static integral class constants is to use what is affectionately (and non-pejoratively) known as “the enum hack.” This technique takes advantage of the fact that the values of an enumerated type can be used where ints are expected, so GamePlayer could just as well be defined like this:

    class GamePlayer {
    private:
        enum { NumTurns = 5 };  // "the enum hack": makes
                                // NumTurns a symbolic name for 5
        int scores[NumTurns];   // fine
        ...
    };

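Annotation on enum { NumTurns = 5 };

Early C++ compilers did not accept numeric constants declared this way inside a class, and C++ does not allow an array to be sized by a variable. Using an enum is the common way to avoid a macro.

Annotation on int scores[NumTurns];

STL purists may prefer std::vector or boost::array. But undeniably, almost no one manages to avoid built-in arrays entirely.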

The enum hack is worth knowing about for several reasons. First, the enum hack behaves in some ways more like a #define than a const does, and sometimes that’s what you want. For example, it’s legal to take the address of a const, but it’s not legal to take the address of an enum, and it’s typically not legal to take the address of a #define, either. If you don’t want to let people get a pointer or reference to one of your integral constants, an enum is a good way to enforce that constraint. (For more on enforcing design constraints through coding decisions, consult Item 18.) Also, though good compilers won’t set aside storage for const objects of integral types (unless you create a pointer or reference to the object), sloppy compilers may, and you may not be willing to set aside memory for such objects. Like #defines, enums never result in that kind of unnecessary memory allocation.
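As a rough illustration of these points, the sketch below uses an enumerator as a compile-time array size and notes, in a comment, the address-of expression that a compiler would reject. The `turnSlots` member and the `demo` function are hypothetical additions made only so the behavior can be checked.

```cpp
#include <cassert>
#include <cstddef>

class GamePlayer {
public:
    enum { NumTurns = 5 };  // the enum hack: a named compile-time constant

    // Hypothetical helper: reports how many elements scores holds.
    std::size_t turnSlots() const { return sizeof(scores) / sizeof(scores[0]); }

private:
    int scores[NumTurns];   // the enumerator is usable as an array bound
};

// const int* p = &GamePlayer::NumTurns;  // would not compile:
//                                        // an enumerator has no address

void demo() {
    GamePlayer player = GamePlayer();
    assert(player.turnSlots() == 5);
    assert(GamePlayer::NumTurns == 5);
}
```

Because no object named `NumTurns` exists, nobody can form a pointer or reference to it, which is the constraint the paragraph above describes.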

A second reason to know about the enum hack is purely pragmatic. Lots of code employs it, so you need to recognize it when you see it. In fact, the enum hack is a fundamental technique of template metaprogramming (see Item 48).

Getting back to the preprocessor, another common (mis)use of the #define directive is using it to implement macros that look like functions but that don’t incur the overhead of a function call. Here’s a macro that calls some function f with the greater of the macro’s arguments:

    // call f with the maximum of a and b
    #define CALL_WITH_MAX(a, b) f((a) > (b) ? (a) : (b))

Macros like this have so many drawbacks, just thinking about them is painful.

Whenever you write this kind of macro, you have to remember to parenthesize all the arguments in the macro body. Otherwise you can run into trouble when somebody calls the macro with an expression. But even if you get that right, look at the weird things that can happen:

    int a = 5, b = 0;
    CALL_WITH_MAX(++a, b);     // a is incremented twice
    CALL_WITH_MAX(++a, b+10);  // a is incremented once

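Annotation: max is the perennial cautionary example for macros. Sadly, the template solution shown here is not perfect either. Interested readers can search Google for an article titled "min, max, and more". It was written by this book's author, Scott Meyers, and you will be amazed at how hard it is to get something this simple completely right.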

Here, the number of times that a is incremented before calling f depends on what it is being compared with!
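The double evaluation can be reproduced directly. In this minimal sketch, `f` is a stand-in that merely records its argument in `lastArg`, and `aAfterCall` is a hypothetical helper that returns the value of `a` after the macro has run; none of these bodies come from the book.

```cpp
#include <cassert>

static int lastArg = 0;          // what f most recently received
void f(int x) { lastArg = x; }   // stand-in for "some function f"

#define CALL_WITH_MAX(a, b) f((a) > (b) ? (a) : (b))

// Starts with a == 5, invokes the macro with ++a, and returns a afterwards.
int aAfterCall(int other) {
    int a = 5;
    CALL_WITH_MAX(++a, other);
    return a;
}

void demo() {
    assert(aAfterCall(0) == 7);   // ++a wins the comparison, so it runs again
    assert(lastArg == 7);
    assert(aAfterCall(10) == 6);  // other wins, so ++a runs only once
    assert(lastArg == 10);
}
```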

Fortunately, you don’t need to put up with this nonsense. You can get all the efficiency of a macro plus all the predictable behavior and type safety of a regular function by using a template for an inline function (see Item 30):

    template<typename T>                             // because we don't
    inline void callWithMax(const T& a, const T& b)  // know what T is, we
    {                                                // pass by reference-to-
        f(a > b ? a : b);                            // const; see Item 20
    }

This template generates a whole family of functions, each of which takes two objects of the same type and calls f with the greater of the two objects. There’s no need to parenthesize parameters inside the function body, no need to worry about evaluating parameters multiple times, etc. Furthermore, because callWithMax is a real function, it obeys scope and access rules. For example, it makes perfect sense to talk about an inline function that is private to a class. In general, there’s just no way to do that with a macro.
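For comparison with the macro, here is a small sketch of the template in use. As before, `f`, `lastArg`, and `aAfterCall` are hypothetical scaffolding added for illustration; only `callWithMax` follows the text.

```cpp
#include <cassert>

static int lastArg = 0;          // what f most recently received
void f(int x) { lastArg = x; }   // stand-in for "some function f"

template<typename T>
inline void callWithMax(const T& a, const T& b)
{
    f(a > b ? a : b);
}

// Starts with a == 5, calls the template with ++a, and returns a afterwards.
int aAfterCall(int other) {
    int a = 5;
    callWithMax(++a, other);     // ++a is evaluated exactly once, before the call
    return a;
}

void demo() {
    assert(aAfterCall(0) == 6);
    assert(lastArg == 6);
    assert(aAfterCall(10) == 6);
    assert(lastArg == 10);
}
```

Whatever the second argument is, `a` ends up as 6, because function arguments are evaluated once before the function body runs.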

Given the availability of consts, enums, and inlines, your need for the preprocessor (especially #define) is reduced, but it’s not eliminated. #include remains essential, and #ifdef/#ifndef continue to play important roles in controlling compilation. It’s not yet time to retire the preprocessor, but you should definitely give it long and frequent vacations.

Things to Remember

 For simple constants, prefer const objects or enums to #defines.

 For function-like macros, prefer inline functions to #defines.


Item 3: Use const whenever possible.

The wonderful thing about const is that it allows you to specify a semantic constraint — a particular object should not be modified — and compilers will enforce that constraint. It allows you to communicate to both compilers and other programmers that a value should remain invariant. Whenever that is true, you should be sure to say so, because that way you enlist your compilers’ aid in making sure the constraint isn’t violated.

The const keyword is remarkably versatile. Outside of classes, you can use it for constants at global or namespace scope (see Item 2), as well as for objects declared static at file, function, or block scope. Inside classes, you can use it for both static and non-static data members. For pointers, you can specify whether the pointer itself is const, the data it points to is const, both, or neither:

    char greeting[] = "Hello";
    char *p = greeting;              // non-const pointer,
                                     // non-const data
    const char *p = greeting;        // non-const pointer,
                                     // const data
    char * const p = greeting;       // const pointer,
                                     // non-const data
    const char * const p = greeting; // const pointer,
                                     // const data

This syntax isn’t as capricious as it may seem. If the word const appears to the left of the asterisk, what’s pointed to is constant; if the word const appears to the right of the asterisk, the pointer itself is constant; if const appears on both sides, both are constant.
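The left-of-asterisk and right-of-asterisk rule can be tried out in a few lines. This is only a sketch; the function name `firstAfterEdits` and the variable names are made up for the example, and the commented-out lines are the ones a compiler would reject.

```cpp
#include <cassert>

char firstAfterEdits() {
    char greeting[] = "Hello";
    char other[] = "World";

    const char* toConst = greeting;   // const left of *: the data is const
    toConst = other;                  // fine: the pointer may be reseated
    // *toConst = 'J';                // error: data is read-only through toConst

    char* const constPtr = greeting;  // const right of *: the pointer is const
    *constPtr = 'J';                  // fine: the data may be modified
    // constPtr = other;              // error: the pointer cannot be reseated

    assert(*toConst == 'W');          // toConst now refers to "World"
    return greeting[0];               // 'J', written through constPtr
}
```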

When what’s pointed to is constant, some programmers list const before the type. Others list it after the type but before the asterisk. There is no difference in meaning, so the following functions take the same parameter type:

    void f1(const Widget *pw);  // f1 takes a pointer to a
                                // constant Widget object
    void f2(Widget const *pw);  // so does f2

Because both forms exist in real code, you should accustom yourself to both of them.

STL iterators are modeled on pointers, so an iterator acts much like a T* pointer. Declaring an iterator const is like declaring a pointer const (i.e., declaring a T* const pointer): the iterator isn’t allowed to point to something different, but the thing it points to may be modified. If you want an iterator that points to something that can’t be modified (i.e., the STL analogue of a const T* pointer), you want a const_iterator:

    std::vector<int> vec;
    ...
    const std::vector<int>::iterator iter =     // iter acts like a T* const
        vec.begin();
    *iter = 10;                                 // OK, changes what iter points to
    ++iter;                                     // error! iter is const
    std::vector<int>::const_iterator cIter =    // cIter acts like a const T*
        vec.begin();
    *cIter = 10;                                // error! *cIter is const
    ++cIter;                                    // fine, changes cIter
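A compilable variant of the iterator example might look like the following sketch. The vector contents and the function name `iteratorDemo` are assumptions chosen for illustration, and the two erroneous statements are left as comments so that the rest compiles.

```cpp
#include <cassert>
#include <vector>

int iteratorDemo() {
    std::vector<int> vec(3, 1);   // three elements, each equal to 1

    const std::vector<int>::iterator iter = vec.begin();   // like T* const
    *iter = 10;                   // OK: changes what iter points to
    // ++iter;                    // error: iter itself is const

    std::vector<int>::const_iterator cIter = vec.begin();  // like const T*
    ++cIter;                      // fine: cIter may move
    // *cIter = 10;               // error: *cIter is const

    assert(*cIter == 1);          // second element is untouched
    return vec[0];                // first element was set to 10
}
```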

Some of the most powerful uses of const stem from its application to function declarations. Within a function declaration, const can refer to the function’s return value, to individual parameters, and, for member functions, to the function as a whole.

Having a function return a constant value often makes it possible to reduce the incidence of client errors without giving up safety or efficiency. For example, consider the declaration of the operator* function for rational numbers that is explored in Item 24:

class Rational { ... };

const Rational operator*(const Rational& lhs, const Rational& rhs);

Many programmers squint when they first see this. Why should the result of operator* be a const object? Because if it weren’t, clients would be able to commit atrocities like this:

    Rational a, b, c;
    ...
    (a * b) = c;  // invoke operator= on the
                  // result of a*b!

I don’t know why any programmer would want to make an assignment to the product of two numbers, but I do know that many programmers have tried to do it without wanting to. All it takes is a simple typo (and a type that can be implicitly converted to bool):

if (a * b = c) ... // oops, meant to do a comparison!

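Annotation: Modern compilers will all warn you that "==" rather than "=" may have been intended here. If you are sure you want "=" and not "==", you can add another pair of parentheses to keep the compiler from questioning you.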

Such code would be flat-out illegal if a and b were of a built-in type. One of the hallmarks of good user-defined types is that they avoid gratuitous incompatibilities with the built-ins (see also Item 18), and allowing assignments to the product of two numbers seems pretty gratuitous to me. Declaring operator*’s return value const prevents it, and that’s why it’s The Right Thing To Do in this case.
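The effect of the const return type can be checked at compile time. The sketch below assumes C++11 or later (for `static_assert`, `std::declval`, and `std::is_assignable`), and its `Rational` is a deliberately minimal stand-in with hypothetical members, not the class developed in Item 24.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Minimal stand-in for a rational-number class; it does not reduce fractions.
class Rational {
public:
    Rational(int n = 0, int d = 1) : num(n), den(d) {}
    int numerator() const { return num; }
    int denominator() const { return den; }
private:
    int num, den;
};

const Rational operator*(const Rational& lhs, const Rational& rhs) {
    return Rational(lhs.numerator() * rhs.numerator(),
                    lhs.denominator() * rhs.denominator());
}

// (a * b) = c would have to assign to a const Rational, so it is rejected.
static_assert(
    !std::is_assignable<decltype(std::declval<Rational>() * std::declval<Rational>()),
                        Rational>::value,
    "the product of two Rationals should not be assignable");

void demo() {
    Rational a(1, 2), b(3, 4);
    const Rational p = a * b;     // ordinary use is unaffected
    assert(p.numerator() == 3 && p.denominator() == 8);
}
```

If the `const` were dropped from the return type, the `static_assert` would fire, because the implicitly generated assignment operator could then be invoked on the temporary.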

There’s nothing particularly new about const parameters: they act just like local const objects, and you should use both whenever you can. Unless you need to be able to modify a parameter or local object, be sure to declare it const. It costs you only the effort to type six characters, and it can save you from annoying errors such as the “I meant to type ‘==’ but I accidentally typed ‘=’” mistake we just saw.

const Member Functions

The purpose of const on member functions is to identify which member functions may be invoked on const objects. Such member functions are important for two reasons. First, they make the interface of a class easier to understand. It’s important to know which functions may modify an object and which may not. Second, they make it possible to work with const objects. That’s a critical aspect of writing efficient code, because, as Item 20 explains, one of the fundamental ways to improve a C++ program’s performance is to pass objects by reference-to-const. That technique is viable only if there are const member functions with which to manipulate the resulting const-qualified objects.

Many people overlook the fact that member functions differing only in their constness can be overloaded, but this is an important feature of C++. Consider a class for representing a block of text:

    class TextBlock {
    public:
        ...
        const char& operator[](std::size_t position) const  // operator[] for
        { return text[position]; }                          // const objects
        char& operator[](std::size_t position)              // operator[] for
        { return text[position]; }                          // non-const objects
    private:
        std::string text;
    };

TextBlock’s operator[]s can be used like this:

    TextBlock tb("Hello");
    std::cout << tb[0];            // calls non-const
                                   // TextBlock::operator[]
    const TextBlock ctb("World");
    std::cout << ctb[0];           // calls const TextBlock::operator[]

Incidentally, const objects most often arise in real programs as a result of being passed by pointer- or reference-to-const. The example of ctb above is artificial. This is more realistic:

    void print(const TextBlock& ctb)  // in this function, ctb is const
    {
        std::cout << ctb[0];          // calls const TextBlock::operator[]
        ...
    }

By overloading operator[] and giving the different versions different return types, you can have const and non-const TextBlocks handled differently:

    std::cout << tb[0];   // fine: reading a
                          // non-const TextBlock
    tb[0] = 'x';          // fine: writing a
                          // non-const TextBlock
    std::cout << ctb[0];  // fine: reading a
                          // const TextBlock
    ctb[0] = 'x';         // error! writing a
                          // const TextBlock

Note that the error here has only to do with the return type of the operator[] that is called; the calls to operator[] themselves are all fine. The error arises out of an attempt to make an assignment to a const char&, because that’s the return type from the const version of operator[].

Also note that the return type of the non-const operator[] is a reference to a char — a char itself would not do. If operator[] did return a simple char, statements like this wouldn’t compile:

tb[0] = 'x';

That’s because it’s never legal to modify the return value of a function that returns a built-in type. Even if it were legal, the fact that C++ returns objects by value (see Item 20) would mean that a copy of tb.text[0] would be modified, not tb.text[0] itself, and that’s not the behavior you want.
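To see overloading on constness at work in a complete program, consider this sketch. The constructor and the helper functions `firstOf` and `capitalizeJ` are assumptions added for the example; the two `operator[]` overloads follow the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class TextBlock {
public:
    explicit TextBlock(const std::string& s) : text(s) {}  // assumed constructor

    const char& operator[](std::size_t position) const     // for const objects
    { return text[position]; }
    char& operator[](std::size_t position)                 // for non-const objects
    { return text[position]; }

private:
    std::string text;
};

// ctb is const here, so the const overload is selected;
// writing ctb[0] = 'x' inside this function would not compile.
char firstOf(const TextBlock& ctb) { return ctb[0]; }

// tb is non-const here, so the overload returning char& is selected
// and the assignment modifies the stored text itself.
void capitalizeJ(TextBlock& tb) { tb[0] = 'J'; }

void demo() {
    TextBlock tb("Hello");
    capitalizeJ(tb);
    assert(firstOf(tb) == 'J');
}
```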

Let’s take a brief time-out for philosophy. What does it mean for a member function to be const? There are two prevailing notions: bitwise constness (also known as physical constness) and logical constness.

The bitwise const camp believes that a member function is const if and only if it doesn’t modify any of the object’s data members (excluding those that are static), i.e., if it doesn’t modify any of the bits inside the object. The nice thing about bitwise constness is that it’s easy to detect violations: compilers just look for assignments to data members. In fact, bitwise constness is C++’s definition of constness, and a const member function isn’t allowed to modify any of the non-static data members of the object on which it is invoked.

Unfortunately, many member functions that don’t act very const pass the bitwise test. In particular, a member function that modifies what a pointer points to frequently doesn’t act const. But if only the pointer is in the object, the function is bitwise const, and compilers won’t complain. That can lead to counterintuitive behavior. For example, suppose we have a TextBlock-like class that stores its data as a char* instead of a string, because it needs to communicate through a C API that doesn’t understand string objects.

    class CTextBlock {
    public:
        ...
        char& operator[](std::size_t position) const  // inappropriate (but bitwise
        { return pText[position]; }                   // const) declaration of
                                                      // operator[]
    private:
        char *pText;
    };

This class (inappropriately) declares operator[] as a const member function, even though that function returns a reference to the object’s internal data (a topic treated in depth in Item 28). Set that aside and note that operator[]’s implementation doesn’t modify pText in any way. As a result, compilers will happily generate code for operator[]; it is, after all, bitwise const, and that’s all compilers check for. But look what it allows to happen:

    const CTextBlock cctb("Hello");  // declare constant object
    char *pc = &cctb[0];             // call the const operator[] to get a
                                     // pointer to cctb’s data
    *pc = 'J';                       // cctb now has the value "Jello"

Surely there is something wrong when you create a constant object with a particular value and you invoke only const member functions on it, yet you still change its value!

This leads to the notion of logical constness. Adherents to this philosophy argue that a const member function might modify some of the bits in the object on which it’s invoked, but only in ways that clients cannot detect. For example, your CTextBlock class might want to cache the length of the textblock whenever it’s requested:

    class CTextBlock {
    public:
        ...
        std::size_t length() const;
    private:
        char *pText;
        std::size_t textLength;  // last calculated length of textblock
        bool lengthIsValid;      // whether length is currently valid
    };

    std::size_t CTextBlock::length() const
    {
        if (!lengthIsValid) {
            textLength = std::strlen(pText);  // error! can’t assign to textLength
            lengthIsValid = true;             // and lengthIsValid in a const
        }                                     // member function
        return textLength;
    }

This implementation of length is certainly not bitwise const — both textLength and lengthIsValid may be modified — yet it seems as though it should be valid for const CTextBlock objects. Compilers disagree. They insist on bitwise constness. What to do?

The solution is simple: take advantage of C++’s const-related wiggle room known as mutable. mutable frees non-static data members from the constraints of bitwise constness:

    class CTextBlock {
    public:
        ...
        std::size_t length() const;
    private:
        char *pText;
        mutable std::size_t textLength;  // these data members may
        mutable bool lengthIsValid;      // always be modified, even in
    };                                   // const member functions

    std::size_t CTextBlock::length() const
    {
        if (!lengthIsValid) {
            textLength = std::strlen(pText);  // now fine
            lengthIsValid = true;             // also fine
        }
        return textLength;
    }
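A self-contained version of the caching idea might look like the sketch below. The constructor, the `computations` counter, and `timesComputed` are hypothetical additions (the counter exists only to make the caching observable), and the class is assumed not to own the buffer it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class CTextBlock {
public:
    // Assumption: the caller keeps the buffer alive; no ownership is taken.
    explicit CTextBlock(char* s)
        : pText(s), textLength(0), lengthIsValid(false), computations(0) {}

    std::size_t length() const {
        if (!lengthIsValid) {
            textLength = std::strlen(pText);  // allowed: member is mutable
            lengthIsValid = true;             // allowed: member is mutable
            ++computations;
        }
        return textLength;
    }

    int timesComputed() const { return computations; }

private:
    char* pText;
    mutable std::size_t textLength;  // cached length
    mutable bool lengthIsValid;      // whether the cache is current
    mutable int computations;        // hypothetical counter for the demo
};

void demo() {
    char buffer[] = "Hello";
    const CTextBlock cctb(buffer);       // a const object
    assert(cctb.length() == 5);          // first call computes the length
    assert(cctb.length() == 5);          // second call uses the cache
    assert(cctb.timesComputed() == 1);
}
```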

Avoiding Duplication in const and Non-const Member Functions

mutable is a nice solution to the bitwise-constness-is-not-what-I-had-in-mind problem, but it doesn’t solve all const-related difficulties. For example, suppose that operator[] in TextBlock (and CTextBlock) not only returned a reference to the appropriate character, it also performed bounds checking, logged access information, maybe even did data integrity validation. Putting all this in both the const and the non-const operator[] functions (and not fretting that we now have implicitly inline functions of nontrivial length; see Item 30) yields this kind of monstrosity:

    class TextBlock {
    public:
        ...
        const char& operator[](std::size_t position) const
        {
            ...  // do bounds checking
            ...  // log access data
            ...  // verify data integrity
            return text[position];
        }
        char& operator[](std::size_t position)
        {
            ...  // do bounds checking
            ...  // log access data
            ...  // verify data integrity
            return text[position];
        }
    private:
        std::string text;
    };

Ouch! Can you say code duplication, along with its attendant compilation time, maintenance, and code-bloat headaches? Sure, it’s possible to move all the code for bounds checking, etc. into a separate member function (private, naturally) that both versions of operator[] call, but you’ve still got the duplicated calls to that function and you’ve still got the duplicated return statement code.

What you really want to do is implement operator[] functionality once and use it twice. That is, you want to have one version of operator[] call the other one. And that brings us to casting away constness.

As a general rule, casting is such a bad idea, I’ve devoted an entire Item to telling you not to do it (Item 27), but code duplication is no picnic, either. In this case, the const version of operator[] does exactly what the non-const version does, it just has a const-qualified return type. Casting away the const on the return value is safe, in this case, because whoever called the non-const operator[] must have had a non-const object in the first place. Otherwise they couldn’t have called a non-const function. So having the non-const operator[] call the const version is a safe way to avoid code duplication, even though it requires a cast. Here’s the code, but it may be clearer after you read the explanation that follows:

    class TextBlock {
    public:
        ...
        const char& operator[](std::size_t position) const   // same as before
        {
            ...
            ...
            ...
            return text[position];
        }

        char& operator[](std::size_t position)                // now just calls const op[]
        {
            return
                const_cast<char&>(                            // cast away const on
                                                              // op[]’s return type;
                    static_cast<const TextBlock&>(*this)      // add const to *this’s type;
                        [position]                            // call const version of op[]
                );
        }
        ...
    };

As you can see, the code has two casts, not one. We want the non-const operator[] to call the const one, but if, inside the non-const operator[], we just call operator[], we’ll recursively call ourselves. That’s only entertaining the first million or so times. To avoid infinite recursion, we have to specify that we want to call the const operator[], but there’s no direct way to do that. Instead, we cast *this from its native type of TextBlock& to const TextBlock&. Yes, we use a cast to add const! So we have two casts: one to add const to *this (so that our call to operator[] will call the const version), the second to remove the const from the const operator[]’s return value.

The cast that adds const is just forcing a safe conversion (from a non-const object to a const one), so we use a static_cast for that. The one that removes const can be accomplished only via a const_cast, so we don’t really have a choice there. (Technically, we do. A C-style cast would also work, but, as I explain in Item 27, such casts are rarely the right choice. If you’re unfamiliar with static_cast or const_cast, Item 27 contains an overview.)

On top of everything else, we’re calling an operator in this example, so the syntax is a little strange. The result may not win any beauty contests, but it has the desired effect of avoiding code duplication by implementing the non-const version of operator[] in terms of the const version. Whether achieving that goal is worth the ungainly syntax is something only you can determine, but the technique of implementing a non-const member function in terms of its const twin is definitely worth knowing.
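The technique can be tried out with a minimal, compilable sketch. The constructor, the `assert` standing in for the bounds checking, and the `demoTextBlock` helper are assumptions added for illustration; they are not part of the original class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for the TextBlock class discussed above.
class TextBlock {
public:
    explicit TextBlock(const std::string& s) : text(s) {}

    // The const version holds the one real implementation.
    const char& operator[](std::size_t position) const
    {
        assert(position < text.size());   // stand-in for the bounds checking
        return text[position];
    }

    // The non-const version forwards to the const one.
    char& operator[](std::size_t position)
    {
        return const_cast<char&>(                          // remove const from the result
            static_cast<const TextBlock&>(*this)[position] // add const to *this, call const op[]
        );
    }

private:
    std::string text;
};

// Writes through the non-const overload, then reads through the const one.
inline std::string demoTextBlock()
{
    TextBlock tb("Hello");
    tb[0] = 'J';                     // non-const op[] forwards to const op[]
    const TextBlock& view = tb;
    return std::string(1, view[0]) + view[1];
}
```

Calling `demoTextBlock()` should yield the string "Je": the write goes through the forwarding non-const overload, and both reads use the const overload directly.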

Even more worth knowing is that trying to do things the other way around — avoiding duplication by having the const version call the non-const version — is not something you want to do. Remember, a const member function promises never to change the logical state of its object, but a non-const member function makes no such promise. If you were to call a non-const function from a const one, you’d run the risk that the object you’d promised not to modify would be changed. That’s why having a const member function call a non-const one is wrong: the object could be changed. In fact, to get the code to compile, you’d have to use a const_cast to get rid of the const on *this, a clear sign of trouble. The reverse calling sequence — the one we used above — is safe: the non-const member function can do whatever it wants with an object, so calling a const member function imposes no risk. That’s why a static_cast works on *this in that case: there’s no const-related danger.

As I noted at the beginning of this Item, const is a wonderful thing. On pointers and iterators; on the objects referred to by pointers, iterators, and references; on function parameters and return types; on local variables; and on member functions, const is a powerful ally. Use it whenever you can. You’ll be glad you did.

Things to Remember

Declaring something const helps compilers detect usage errors. const can be applied to objects at any scope, to function parameters and return types, and to member functions as a whole.

Compilers enforce bitwise constness, but you should program using conceptual constness.

When const and non-const member functions have essentially identical implementations, code duplication can be avoided by having the non-const version call the const version.


Item 4: Make sure that objects are initialized before they’re used.

C++ can seem rather fickle about initializing the values of objects. For example, if you say this,

int x;

in some contexts, x is guaranteed to be initialized (to zero), but in others, it’s not. If you say this,

class Point {

int x, y;

};

...

Point p;

p’s data members are sometimes guaranteed to be initialized (to zero), but sometimes they’re not. If you’re coming from a language where uninitialized objects can’t exist, pay attention, because this is important.

Reading uninitialized values yields undefined behavior. On some platforms, the mere act of reading an uninitialized value can halt your program. More typically, the result of the read will be semi-random bits, which will then pollute the object you read the bits into, eventually leading to inscrutable program behavior and a lot of unpleasant debugging.

Now, there are rules that describe when object initialization is guaranteed to take place and when it isn’t. Unfortunately, the rules are complicated, too complicated to be worth memorizing, in my opinion. In general, if you’re in the C part of C++ (see Item 1) and initialization would probably incur a runtime cost, it’s not guaranteed to take place. If you cross into the non-C parts of C++, things sometimes change. This explains why an array (from the C part of C++) isn’t necessarily guaranteed to have its contents initialized, but a vector (from the STL part of C++) is.
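A few of the contexts in which initialization is guaranteed can be shown in a small sketch. The names `globalCounter`, `globalPoint`, and `demoGuaranteedInitialization` are hypothetical, and `Point` is written as a struct here so its members can be inspected. The sketch deliberately never reads an uninitialized local, since that would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int globalCounter;            // static storage duration: zero-initialized

struct Point { int x, y; };
Point globalPoint;            // static storage as well: x and y are zero

inline bool demoGuaranteedInitialization()
{
    // A plain local "int n;" or "int a[3];" declared here would hold
    // indeterminate values, so this sketch avoids declaring one.
    int zeroed[3] = {};       // explicit initializer: every element is zero
    std::vector<int> v(3);    // vector value-initializes its three elements
    Point p = Point();        // value-initialization: members are zero

    bool ok = (globalCounter == 0) && (globalPoint.x == 0) && (globalPoint.y == 0);
    for (std::size_t i = 0; i < 3; ++i)
        ok = ok && (zeroed[i] == 0) && (v[i] == 0);
    return ok && (p.x == 0) && (p.y == 0);
}
```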

The best way to deal with this seemingly indeterminate state of affairs is to always initialize your objects before you use them. For non-member objects of built-in types, you’ll need to do this manually. For example:

    int x = 0;                                // manual initialization of an int
    const char * text = "A C-style string";   // manual initialization of a
                                              // pointer (see also Item 3)
    double d;                                 // “initialization” by reading from
    std::cin >> d;                            // an input stream

For almost everything else, the responsibility for initialization falls on constructors. The rule there is simple: make sure that all constructors initialize everything in the object.

The rule is easy to follow, but it’s important not to confuse assignment with initialization. Consider a constructor for a class representing entries in an address book:

    class PhoneNumber { ... };

    class ABEntry {                           // ABEntry = “Address Book Entry”
    public:
        ABEntry(const std::string& name, const std::string& address,
                const std::list<PhoneNumber>& phones);
    private:
        std::string theName;
        std::string theAddress;
        std::list<PhoneNumber> thePhones;
        int numTimesConsulted;
    };

    ABEntry::ABEntry(const std::string& name, const std::string& address,
                     const std::list<PhoneNumber>& phones)
    {
        theName = name;                       // these are all assignments,
        theAddress = address;                 // not initializations
        thePhones = phones;
        numTimesConsulted = 0;
    }

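Annotator's note: In most cases we no longer need to weigh the performance of code instruction by instruction and clock cycle by clock cycle. Even if you are obsessive enough to insist on it, the way many modern CPUs work (especially CISC CPUs) makes it impossible to say exactly how long each instruction takes to complete. So even where initializing a variable at its declaration looks unnecessary, it is best to make initialization a habit.

When you turn on compiler warnings (taking gcc, the compiler I use most, as an example, and options such as -Wall), the compiler will usually catch variables that may be used before they are initialized. You can also force the compiler to stop compiling whenever it issues a warning (with gcc, -Werror), which helps build the habit of initializing variables.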

This will yield ABEntry objects with the values you expect, but it’s still not the best approach. The rules of C++ stipulate that data members of an object are initialized before the body of a constructor is entered. Inside the ABEntry constructor, theName, theAddress, and thePhones aren’t being initialized, they’re being assigned. Initialization took place earlier — when their default constructors were automatically called prior to entering the body of the ABEntry constructor. This isn’t true for numTimesConsulted, because it’s a built-in type. For it, there’s no guarantee it was initialized at all prior to its assignment.

A better way to write the ABEntry constructor is to use the member initialization list instead of assignments:

    ABEntry::ABEntry(const std::string& name, const std::string& address,
                     const std::list<PhoneNumber>& phones)
    : theName(name),
      theAddress(address),                    // these are now all initializations
      thePhones(phones),
      numTimesConsulted(0)
    {}                                        // the ctor body is now empty

This constructor yields the same end result as the one above, but it will often be more efficient. The assignment-based version first called default constructors to initialize theName, theAddress, and thePhones, then promptly assigned new values on top of the default-constructed ones. All the work performed in those default constructions was therefore wasted. The member initialization list approach avoids that problem, because the arguments in the initialization list are used as constructor arguments for the various data members. In this case, theName is copy-constructed from name, theAddress is copy-constructed from address, and thePhones is copy-constructed from phones. For most types, a single call to a copy constructor is more efficient — sometimes much more efficient — than a call to the default constructor followed by a call to the copy assignment operator.
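The difference in work can be made visible with an instrumented member type. This is only a sketch: `Traced`, `ByAssignment`, `ByInitList`, and the helper functions are hypothetical names that do not appear in the ABEntry example.

```cpp
#include <cassert>
#include <string>

// Hypothetical instrumented type that counts which special member
// functions run.
struct Traced {
    static int defaultCtors, copyCtors, copyAssigns;
    std::string value;

    Traced() { ++defaultCtors; }
    Traced(const Traced& rhs) : value(rhs.value) { ++copyCtors; }
    Traced& operator=(const Traced& rhs)
    {
        value = rhs.value;
        ++copyAssigns;
        return *this;
    }
};
int Traced::defaultCtors = 0;
int Traced::copyCtors = 0;
int Traced::copyAssigns = 0;

struct ByAssignment {
    Traced member;
    explicit ByAssignment(const Traced& t) { member = t; }   // default ctor, then assignment
};

struct ByInitList {
    Traced member;
    explicit ByInitList(const Traced& t) : member(t) {}      // a single copy construction
};

inline void resetCounts()
{
    Traced::defaultCtors = Traced::copyCtors = Traced::copyAssigns = 0;
}

inline void demoInitializationCost()
{
    Traced source;

    resetCounts();
    ByAssignment a(source);
    (void)a;
    assert(Traced::defaultCtors == 1 && Traced::copyCtors == 0 && Traced::copyAssigns == 1);

    resetCounts();
    ByInitList b(source);
    (void)b;
    assert(Traced::defaultCtors == 0 && Traced::copyCtors == 1 && Traced::copyAssigns == 0);
}
```

The assignment-based constructor performs two operations on the member where the initialization-list version performs one. How much that matters depends on how expensive the member's default constructor and assignment operator are.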

For objects of built-in type like numTimesConsulted, there is no difference in cost between initialization and assignment, but for consistency, it’s often best to initialize everything via member initialization. Similarly, you can use the member initialization list even when you want to default-construct a data member; just specify nothing as an initialization argument. For example, if ABEntry had a constructor taking no parameters, it could be implemented like this:

    ABEntry::ABEntry()
    : theName(),                              // call theName’s default ctor;
      theAddress(),                           // do the same for theAddress;
      thePhones(),                            // and for thePhones;
      numTimesConsulted(0)                    // but explicitly initialize
    {}                                        // numTimesConsulted to zero

Because compilers will automatically call default constructors for data members of user-defined types when those data members have no initializers on the member initialization list, some programmers consider the above approach overkill. That’s understandable, but having a policy of always listing every data member on the initialization list avoids having to remember which data members may go uninitialized if they are omitted. Because numTimesConsulted is of a built-in type, for example, leaving it off a member initialization list could open the door to undefined behavior.

Sometimes the initialization list must be used, even for built-in types. For example, data members that are const or are references must be initialized; they can’t be assigned (see also Item 5). To avoid having to memorize when data members must be initialized in the member initialization list and when it’s optional, the easiest choice is to always use the initialization list. It’s sometimes required, and it’s often more efficient than assignments.
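As a brief illustration of members that can only be initialized, here is a sketch with one const member and one reference member. `NamedCounter` and `demoNamedCounter` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>

// A const member and a reference member must both be set in the member
// initialization list; neither can be assigned in the constructor body.
class NamedCounter {
public:
    NamedCounter(const std::string& label, int& target)
        : name(label),        // const member: must be initialized here
          counter(target)     // reference member: must be bound here
    {
        // name = label;      // would not compile: assignment to a const member
    }

    void increment() { ++counter; }
    const std::string& label() const { return name; }

private:
    const std::string name;
    int& counter;
};

inline int demoNamedCounter()
{
    int hits = 0;
    NamedCounter c("hits", hits);
    c.increment();
    c.increment();
    assert(c.label() == "hits");
    return hits;   // counter refers to hits, so both increments landed here
}
```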

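Annotator's note: If you use multiple inheritance, you will find that using the initialization list is sometimes the only way to avoid confusing the compiler.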

Many classes have multiple constructors, and each constructor has its own member initialization list. If there are many data members and/or base classes, the existence of multiple initialization lists introduces undesirable repetition (in the lists) and boredom (in the programmers). In such cases, it’s not unreasonable to omit entries in the lists for data members where assignment works as well as true initialization, moving the assignments to a single (typically private) function that all the constructors call. This approach can be especially helpful if the true initial values for the data members are to be read from a file or looked up in a database. In general, however, true member initialization (via an initialization list) is preferable to pseudo-initialization via assignment.
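A sketch of that compromise might look like the following. `Account`, its members, and the private `init` function are hypothetical, and the example assumes the built-in members really can be set equally well by assignment.

```cpp
#include <cassert>
#include <string>

// Several constructors share one private function for the members where
// assignment works as well as true initialization.
class Account {
public:
    explicit Account(const std::string& ownerName)
        : owner(ownerName) { init(); }

    Account(const std::string& ownerName, const std::string& branchName)
        : owner(ownerName), branch(branchName) { init(); }

    const std::string& ownerName() const { return owner; }
    const std::string& branchName() const { return branch; }
    int balance() const { return cents; }
    int transactions() const { return count; }

private:
    void init()           // pseudo-initialization via assignment
    {
        cents = 0;
        count = 0;
    }

    std::string owner;    // class types stay on the initialization lists
    std::string branch;
    int cents;            // built-in types: assignment costs the same
    int count;
};

inline void demoSharedInit()
{
    Account a("Ann");
    Account b("Bob", "Main Street");
    assert(a.balance() == 0 && a.transactions() == 0);
    assert(b.balance() == 0 && b.transactions() == 0);
    assert(a.branchName().empty() && b.branchName() == "Main Street");
}
```

The cost of this design is that forgetting to call `init()` in a newly added constructor silently leaves the built-in members uninitialized, which is one reason true member initialization remains preferable.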

One aspect of C++ that isn’t fickle is the order in which an object’s data is initialized. This order is always the same: base classes are initialized before derived classes (see also Item 12), and within a class, data members are initialized in the order in which they are declared. In ABEntry, for example, theName will always be initialized first, theAddress second, thePhones third, and numTimesConsulted last. This is true even if they are listed in a different order on the member initialization list (something that’s unfortunately legal). To avoid reader confusion, as well as the possibility of some truly obscure behavioral bugs, always list members in the initialization list in the same order as they’re declared in the class.
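The ordering rule can be observed with members that record when they are constructed. In this sketch `constructionLog`, `Step`, and `Widget` are hypothetical names, and the initialization list is written out of order on purpose; compilers typically warn about such a list (GCC does so under -Wreorder).

```cpp
#include <cassert>
#include <string>

// Accumulates one character per constructed member.
inline std::string& constructionLog()
{
    static std::string log;
    return log;
}

struct Step {
    explicit Step(char tag) { constructionLog() += tag; }
};

struct Widget {
    Step first;    // declared first, so constructed first
    Step second;
    Step third;

    // Deliberately listed out of order to show that the list order is ignored.
    Widget() : third('3'), first('1'), second('2') {}
};

inline std::string demoInitializationOrder()
{
    constructionLog().clear();
    Widget w;
    (void)w;
    return constructionLog();   // members were built in declaration order
}
```

`demoInitializationOrder()` should return "123" rather than "312", which is why a member initializer that depends on another member must follow the declaration order, not the order of the list.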

Once you’ve taken care of explicitly initializing non-member objects of built-in types and you’ve ensured that your constructors initialize their base classes and data members using the member initialization list, there’s only one more thing to worry about. That thing is — take a deep breath — the order of initialization of non-local static objects defined in different translation units.

Let’s pick that phrase apart bit by bit.

A static object is one that exists from the time it’s constructed until the end of the program. Stack and heap-based objects are thus excluded. Included are global objects, objects defined at namespace scope, objects declared static inside classes, objects declared static inside functions, and objects declared static at file scope. Static objects inside functions are known as local static objects (because they’re local to a function), and the other kinds of static objects are known as non-local static objects. Static objects are automatically destroyed when the program exits, i.e., their destructors are automatically called when main finishes executing.

A translation unit is the source code giving rise to a single object file. It’s basically a single source file, plus all of its #include files.

The problem we’re concerned with, then, involves at least two separately compiled source files, each of which contains at least one non-local static object (i.e., an object that’s global, at namespace scope, or static in a class or at file scope). And the actual problem is this: if initialization of a non-local static object in one translation unit uses a non-local static object in a different translation unit, the object it uses could be uninitialized, because the relative order of initialization of non-local static objects defined in different translation units is undefined.

An example will help. Suppose you have a FileSystem class that makes files on the Internet look like they’re local. Since your class makes the world look like a single file system, you might create a special object at global or namespace scope representing the single file system:

    class FileSystem {                        // from your library’s header file
    public:
        ...
        std::size_t numDisks() const;         // one of many member functions
        ...
    };

    extern FileSystem tfs;                    // declare object for clients to use
                                              // (“tfs” = “the file system”); definition
                                              // is in some .cpp file in your library

A FileSystem object is decidedly non-trivial, so use of the tfs object before it has been constructed would be disastrous.

Now suppose some client creates a class for directories in a file system. Naturally, their class uses the tfs object:

    class Directory {                         // created by library client
    public:
        Directory( params );
        ...
    };

    Directory::Directory( params )
    {
        ...
        std::size_t disks = tfs.numDisks();   // use the tfs object
        ...
    }

Further suppose this client decides to create a single Directory object for temporary files:

Directory tempDir( params ); // directory for temporary files

Now the importance of initialization order becomes apparent: unless tfs is initialized before tempDir, tempDir’s constructor will attempt to use tfs before it’s been initialized. But tfs and tempDir were created by different people at different times in different source files — they’re non-local static objects defined in different translation units. How can you be sure that tfs will be initialized before tempDir?

You can’t. Again, the relative order of initialization of non-local static objects defined in different translation units is undefined. There is a reason for this. Determining the “proper” order in which to initialize non-local static objects is hard. Very hard. Unsolvably hard. In its most general form — with multiple translation units and non-local static objects generated through implicit template instantiations (which may themselves arise via implicit template instantiations) — it’s not only impossible to determine the right order of initialization, it’s typically not even worth looking for special cases where it is possible to determine the right order.

Fortunately, a small design change eliminates the problem entirely. All that has to be done is to move each non-local static object into its own function, where it’s declared static. These functions return references to the objects they contain. Clients then call the functions instead of referring to the objects. In other words, non-local static objects are replaced with local static objects. (Aficionados of design patterns will recognize this as a common implementation of the Singleton pattern.)

This approach is founded on C++’s guarantee that local static objects are initialized when the object’s definition is first encountered during a call to that function. So if you replace direct accesses to non-local static objects with calls to functions that return references to local static objects, you’re guaranteed that the references you get back will refer to initialized objects. As a bonus, if you never call a function emulating a non-local static object, you never incur the cost of constructing and destructing the object, something that can’t be said for true non-local static objects.

Here’s the technique applied to both tfs and tempDir:

    class FileSystem { ... };                 // as before

    FileSystem& tfs()                         // this replaces the tfs object; it could be
    {                                         // static in the FileSystem class
        static FileSystem fs;                 // define and initialize a local static object
        return fs;                            // return a reference to it
    }

    class Directory { ... };                  // as before

    Directory::Directory( params )            // as before, except references to tfs are
    {                                         // now to tfs()
        ...
        std::size_t disks = tfs().numDisks();
        ...
    }

    Directory& tempDir()                      // this replaces the tempDir object; it
    {                                         // could be static in the Directory class
        static Directory td;                  // define/initialize local static object
        return td;                            // return reference to it
    }

Clients of this modified system program exactly as they used to, except they now refer to tfs() and tempDir() instead of tfs and tempDir. That is, they use functions returning references to objects instead of using the objects themselves.
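For readers who want to watch the guarantee at work, here is a compilable reduction of the same idea. The construction counter, the fixed disk count, the parameterless `Directory` constructor, and `demoLocalStatics` are assumptions added so the behavior can be observed.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for the FileSystem and Directory classes above.
class FileSystem {
public:
    FileSystem() { ++constructed; }
    std::size_t numDisks() const { return 2; }
    static int constructed;    // how many FileSystem objects have been built
};
int FileSystem::constructed = 0;

inline FileSystem& tfs()
{
    static FileSystem fs;      // constructed the first time control reaches here
    return fs;
}

class Directory {
public:
    Directory() : disks(tfs().numDisks()) {}   // tfs() always yields an initialized object
    std::size_t diskCount() const { return disks; }
private:
    std::size_t disks;
};

inline Directory& tempDir()
{
    static Directory td;
    return td;
}

// Intended to be the first code that touches tfs() or tempDir().
inline void demoLocalStatics()
{
    assert(FileSystem::constructed == 0);   // nothing is built until first use
    assert(tempDir().diskCount() == 2);     // building td builds fs on demand
    assert(FileSystem::constructed == 1);
    assert(&tfs() == &tfs());               // every call returns the same object
    assert(FileSystem::constructed == 1);
}
```

Because `tempDir()` constructs its `Directory` through `tfs()`, the `FileSystem` object is necessarily built first, whichever translation unit each function lives in.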

The reference-returning functions dictated by this scheme are always simple: define and initialize a local static object on line 1, return it on line 2. This simplicity makes them excellent candidates for inlining, especially if they’re called frequently (see Item 30). On the other hand, the fact that these functions contain static objects makes them problematic in multithreaded systems. Then again, any kind of non-const static object, local or non-local, is trouble waiting to happen in the presence of multiple threads. One way to deal with such trouble is to manually invoke all the reference-returning functions during the single-threaded startup portion of the program. This eliminates initialization-related race conditions.
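The startup workaround might be sketched as below, with hypothetical names (`Settings`, `Cache`, `initializeStatics`, `constructedCount`). Since C++11 the language itself guarantees that the initialization of a local static object is thread-safe, so the warm-up call matters mainly for older compilers. Unsynchronized modification of the objects after initialization remains a problem either way.

```cpp
#include <cassert>

// Counts constructions so the effect of the warm-up call is observable.
inline int& constructedCount()
{
    static int count = 0;
    return count;
}

struct Settings { Settings() { ++constructedCount(); } };
struct Cache    { Cache()    { ++constructedCount(); } };

inline Settings& settings() { static Settings s; return s; }
inline Cache&    cache()    { static Cache c;    return c; }

// Meant to be called once at the top of main, before any thread is started.
inline void initializeStatics()
{
    settings();
    cache();
}

inline void demoWarmUp()
{
    initializeStatics();
    assert(constructedCount() == 2);   // both objects now exist
    settings();
    cache();
    assert(constructedCount() == 2);   // later calls construct nothing
}
```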

Of course, the idea of using reference-returning functions to prevent initialization order problems is dependent on there being a reasonable initialization order for your objects in the first place. If you have a system where object A must be initialized before object B, but A’s initialization is dependent on B’s having already been initialized, you are going to have problems, and frankly, you deserve them. If you steer clear of such pathological scenarios, however, the approach described here should serve you nicely, at least in single-threaded applications.

To avoid using objects before they’re initialized, then, you need to do only three things. First, manually initialize non-member objects of built-in types. Second, use member initialization lists to initialize all parts of an object. Finally, design around the initialization order uncertainty that afflicts non-local static objects defined in separate translation units.

Things to Remember

Manually initialize objects of built-in type, because C++ only sometimes initializes them itself.

In a constructor, prefer use of the member initialization list to assignment inside the body of the constructor. List data members in the initialization list in the same order they’re declared in the class.

Avoid initialization order problems across translation units by replacing non-local static objects with local static objects.

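Annotator's note: The construction-order problem of non-local static objects is easy to understand in theory, but running into it in practice always makes programmers suffer. The only gap between a novice and a C++ programmer who understands these subtleties is experience. Only with enough experience can you quickly locate a failure caused by this problem and infer its root cause. The only way to avoid this class of problems completely is to write a rule into your coding standard that forbids non-local static objects outright.

In my own programming history I have rarely tripped over the same problem twice, but this one is an exception.

In a project many years ago, I declared an object of a class as a static object, and that class referred to a static vector object. As it turned out, the vector was constructed later than the static object.

While the system was running, several member functions of the static object were called before the vector's constructor ran, so the vector's resize method was invoked before the vector had been constructed. The frightening part is that although calling a member function before the constructor has undefined results, in practice it produced a valid-looking result, and the program carried on running normally. When the vector's constructor was finally called, the program's internal state was corrupted, but the corruption did not show itself immediately. By the time the program crashed and the bug became visible, tracing it back to its root cause was very hard, and the bug was even hard to reproduce.

Looking for the root cause, apart from the bad habit in writing C++ code (using non-local static objects), the problem also arises because C++ leaves the behavior of operations on uninitialized data undefined, and that undefined behavior masked the problem: on the surface the program ran well. For example, the memory that many operating systems hand out is zero-filled, and zero is very often the default value of a C++ object, so a C++ object that was never properly initialized can appear to work for a while.

It is hard to believe, but ten years after I ran into and tracked down this bug, I created exactly the same bug in a new project. The only difference was that the second time I located the offending code in half an hour.

This problem also shows where built-in arrays and the STL's vector differ. Although std::vector tries hard to offer the same characteristics as an array, its more complex implementation means that it is not quite the same thing. Judged on run-time performance alone, though, built-in arrays have no obvious advantage.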

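You are welcome to join the review campaign "Read the masters' books, say it in your own words" for the annotated classics series (《传世经典书丛评注版》).

Link: http://blog.csdn.net/broadview2006/article/details/6587459

For more details, see: http://download.csdn.net/source/3419905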
