-
Notifications
You must be signed in to change notification settings - Fork 42
General C Coding Style
Michel Machado edited this page Apr 6, 2018
·
11 revisions
- Comments explaining statements should be full sentences, so they begin with a capital letter and finish with proper punctuation. If a comment is just a title, it doesn't need a closing period.
- Comments explaining members of structs or other similar small descriptions don't need to be a full sentence.
- The exact formatting of comments depends on the convention of the codebase. For example, some codebases prefer multi-line comments to begin with a line containing just /*, where others prefer the comment itself to start on that line: /* This piece of code.... Check the style guide of the codebase if you're unsure what the expectations are.
- Each commit should only provide one piece of functionality. If the commit does multiple things, it should probably be reorganized or split into more commits.
- All of the commits should have a tag that specifies what part of the tree is being changed. If it's at the top of the repository it could be a single file (e.g., README), or a change that affects an entire directory or subsystem (e.g., gk), or even a change that affects many parts of the project as a whole (e.g., gatekeeper). For example, in Gatekeeper, one may would like to add some functionality to GK block using the tag "gk: add xxx functionality".
- All missing code must be tagged. Especially in the first patches, there will be chunks of missing code. All these chunks must have a description of what is missing preceded by one of the following tags: TODO or XXX. The tag TODO must be used when the missing code is needed for a functional build, whereas the tag XXX are for possible new features or improvements, and important suggestions.
- Only add code needed for the current patch. Needs of future patches must be added for future patches. Everything in a pull request must be justified.
- If some lines of code will be removed in the following patch, then they just shouldn't be in the first patch to start with.
- Each commit must move the repository to a state in which the whole code compiles. The compiler is our best friend, do not reject its code review.
- DO NOT BREAK the binary search of git (1). This mechanism to squash bugs is very useful. In order to avoid breaking it, any code in a commit must be called. If there is a function that is going to be called by a future commit, the future commit should add that function.
- Code that can be potentially used by multiple functional blocks should be maintained in a shared lib sub-folder.
- For shared data structures among multiple instances/functional blocks, one can simply use reference counting to allow the last instance to destroy it.
- One should only include dependent files when the current file needs it. For example, assume we have files X.h and X.c, and X.c needs a header file Y.h, one should put Y.h in X.c, since X.h does not directly need it.
- If the constants are just examples, one should add a comment saying that they should be analyzed or tested further to find effective values. Moreover, the expression that defines the constant should be between parentheses.
- If we need some constant number, consider using a macro variable instead of the number directly for better readability.
- Declare all local variables at the opening of their blocks. Although modern C allows one to declare variables virtually anywhere, this rule makes it easy to find all local variables of a given block.
- Declare a local variable at the beginning of the block that gives it the smallest scope. This improves readability limiting the number of variables on which one must pay attention. If a local variable is used in two or more blocks, declare it in the innermost block that includes the blocks that use the variable.
- Avoid global variables as much as possible. Global variables can lead to bugs that are really hard to debug because they enlarge the amount of state that one needs to keep in mind while reasoning about the code.
- Global variables are guaranteed to be automatically initialized to zeros, so one should only have initializers to them if the values are not zeros. Moreover, having zero initializers does grow the footprint of the final binary. Notice that this only applies to global variables, as the values of local variables are undefined when a function enters.
- Use spaces for alignment in structs, except for very long entries.
- Whenever a cast is unnecessary, drop it. Having implicit casts, that is, casts that the compiler does on its own, are best because whenever a type changes the compiler will check if the uses of that type are still correct. However, coders are the ones responsible for the correctness of explicit casts, which is harder to maintain.
- Don't break error strings even if they go above the line limit. The reason for this is to enable "grepping" for error messages.
- Begin blocks (if, while, for, struct declarations) with the opening brace on the same line, not on its own line.
- Use goto/labels to make a function fail gracefully. A tip to avoid forgetting label cleanups: whenever you have a function that can return an error, it must have a cleanup. Another case is functions that don't return errors, but change some state. Failing gracefully means that the function reverses all of its side effects before returning.
- When there's nothing to clean up on an error return in a function, just returning where the error is identified produces a code that is clearer: no need for variable ret, no goto and label, and no need to scroll to the end of the function to find out that there's nothing to clean up.
- Add a break at the end of a switch block to guarantee that if someone ever needs to move the default case, the break is there. Also, this last break is free because the compiler automatically removes it with any optimization parameter.
- When an if statement finishes with a return, the else case is not needed. This makes the code just a sequence of if statements, which is easier to read. For example, when the code needs to deal with IPv4 and IPv6 protocols separately and other protocols should have a return value -1, the code could be written as follows:
- When a test is done in a critical zone (called often or in the data path) the code can use the likely() and unlikely() macros to indicate the expected or preferred fast path.
- Functions with only a few lines of code can be made inline. Moreover, static functions defined in a header file must be declared as static inline in order to prevent compiler warnings about the function being unused.
- When a for loop spans multiple lines, we recommend to put braces around it.
- More than one indent is used when you need to continue to the next line in a control block's condition, or when you want to align consecutive lines to an appropriate spot.
- Whenever possible, use an existing error code (e.g., -ENOENT) instead of an arbitrary number as a return value. Notice that a function should not mix arbitrary numbers (i.e. -1) and error codes (i.e. -ENOENT), because it's possible that someone later uses an error code that overlaps with an arbitrary number.
- When a function is too large (say, it has 300 lines of code), the function should be broken down into smaller functions. This not only improves readability, but also reduces the complexity and improves the robustness of the function.
- When a small block of code has a meaningful functionality, it could become a small function to make the code more readable.
switch (val) { case x: ...... break; case y: ...... break; default: ...... break; }
if (proto == ETHER_TYPE_IPv4) { ...... return ret; } if (likely(proto == ETHER_TYPE_IPv6)) { ...... return ret; } return -1;
- Although not required, setting the pointer to a dynamically allocated structure to NULL after freeing the allocation it held would help to catch bugs rather quickly. For example, consider mistakenly calling the deallocation function twice.
- For sizeof(), use the variable name as the parameter instead of the type of that variable, in case its type changes in the future.
- For void *memcpy(void *dest, const void *src, size_t n), it's always safer to put the destination field in the sizeof(). The reason being this guarantees that one can't ever go over buffer.
- Code dealing with packets, in practice, CANNOT assume that packets are always well formed. While it's okay to assume that a packet always has an Ethernet header, anything above that could be missing. Unless you check the length, you may be accessing invalid memory positions.
- As long as a function can return an error message, one must have ways to deal with the situation where the error happens.
- Where appropriate, make compile-time parameters into runtime parameters.
- Whenever it's possible to use a public library, please use it instead of writing it by yourself. This way, the code will be more robust and stable. For example, one can use the linked list implementation from the Linux kernel.
- Put the definitions of data structures and functions only in the file that needs them. This way, the code can be cleaner and simpler. Also, it limits the scope of the code, and it's easier for debugging when there is an error in that code.
- For calculating the size of bool arrays, as funny as it may sound, the C standard doesn't state that bool is one-byte long; Therefore, use sizeof() instead of the number of elements in the bool array to obtain its size.
...... ptr = malloc(); ...... free(ptr); ptr = NULL;
All grants that have generously supported the development of Linux XIA are listed on our Funding page.