Finding code of interest is a vital skill but our tools are often too simple. Martin Moene tries something better.
Writing software isn’t what it used to be. ‘But I enjoy it more than ever’, you say. Yes, indeed. I mean, we write differently, our writing style has changed. And sometimes we may like to rewrite history a little and change some old-fashioned code to something contemporary.
Figure 1: ADA source code excerpt that led to the rapid unscheduled disassembly of Ariane 5, Flight 501 [Ariane-5]. |
Take for example software I’m working on and that contains code not unlike the Ariane 5 code in Figure 1 in several places. (The definition of
uiMinValue
and
uiMaxValue
is not shown here.)
void foo( const int uiValue ) { int uiTrueValue = uiValue; if ( uiTrueValue < uiMinValue ) { uiTrueValue = uiMinValue; } else if ( value > uiMaxValue ) { uiTrueValue = uiMaxValue; } //use uiTrueValue... }
Use a more intent-revealing name and change the formatting and it’s easier to see what the code is about, as similarities and differences in the code fragment now stand out.
void foo( const int value ) { int clampedValue = value; if ( clampedValue < minValue ) clampedValue = minValue; else if ( clampedValue > maxValue ) clampedValue = maxValue; // use clampedValue... }
Although I like this kind of code formatting, it touches on the
unsustainable
[
Henney11
]. It’s also still rather long winded and
clampedValue
is mutable while there’s no reason why it should be. So, let’s change the code further to use a single assignment.
void foo( const int value ) { using namespace std; const int clampedValue = min( max( value, minValue ), maxValue ); // use clampedValue ... }
A nice C++ idiom. Or so I thought. Until I looked at such code with a domain expert. I had to explain that this code clamps the given value between two extremes. And I agree: the code tells what it’s intended for in a rather obscure way. Why not just say
clamp
or
limit
?
void foo( const int value ) { const int clampedValue = clamp( value, minValue, maxValue ); // use clampedValue ... }
It turns out that
clamp
is one of the algorithms in library
Boost.Algoritms
[
Boost
]. Microsoft’s C++ AMP library for parallelism also provides a
clamp
function [
AMP
]. Several interesting suggestions for such a function were made on mailing list accu-general [
Alternative
].
CppCheck rules
Knowing what we have and what we want to change it to, the next question is: how do we find all occurrences we want to change?
Of course we can try and use grep or the search function of our IDE, but it may prove difficult to specify a search that allows for sufficient variation and that may span multiple lines.
Now I had been using CppCheck occasionally to assess code from the CodeBlocks IDE [ CppCheck ]. When I noticed a new version was available I glanced over the manual and noticed CppCheck allows you to specify simple rules via an XML format specification [ Manual ].
<?xml version="1.0"?> <rule> <tokenlist>LIST</tokenlist> <pattern>PATTERN</pattern> <message> <id>ID</id> <severity>SEVERITY</severity> <summary>SUMMARY</summary> </message> </rule>
With such a rule, CppCheck can search for a pattern in the code that may span multiple lines. PATTERN, a Perl-compatible regular expression [
PCRE
] controls which code patterns will match the search. For our purpose,
LIST
is specified as simple and can be omitted. Examples for
ID
,
SEVERITY
and
SUMMARY
of an existing message are:
variableScope
,
style
and
"The scope of the variable 'var' can be reduced"
.
Please don’t also fall into the trap of thinking your PCRE knowledge suffices to proceed and specify a search pattern. Instead start with reading Writing Cppcheck rules [ Rules ].
CppCheck search pattern
As a first step to compose a regular expression for the pattern you are interested in, run CppCheck with rule
.+
on a code excerpt that contains the pattern.
prompt> cppcheck.exe --rule=".+" file-with-code-pattern.cpp
For the first example above, this gives:
[example1.cpp:1]: (style) found \ ' int foo ( const int uiValue ) { \ int uiTrueValue ; uiTrueValue = uiValue ; \ if ( uiValue < uiMinValue ) { uiTrueValue = uiMinValue ; } \ else { if ( uiMaxValue < uiTrueValue ) { uiTrueValue = uiMaxValue ; } } \ return uiTrueValue ; }'
The multiple-line source code is parsed, simplified and presented as a single-line token stream. Note that the second comparison is changed from
>
to
<
and has its left- and right-hand sides swapped. If CppCheck discovers that
uiTrueValue
isn’t used further on, it removes the assignments from the
if
-
else
block. For more information on the transformations CppCheck applies, see [
Simplification
].
To find the code we’re looking for, we may search for the following token stream.
if ( ... < ... ) { ... = ... ; } else { if ( ... < ... ) { ... = ... ; }
Here we choose to match the
if
-
else
-
if
language construct and comparison operator. We accept any operand as indicated by the ellipses, and require the presence of an assignment in the
if
blocks of the program.
This token stream can be matched by the following regular expression.
if \( \w+ \x3c \w+ \) { \w+ = \w+ ; } else { if \( \w+ \x3c \w+ \) { \w+ = \w+ ; } }
We use
\x3c
for the comparison, as specifying
<
or
\<
does not work, due to the embedding in XML. Change
\x3c
to
\x3c=?
if you want to match both less-than and less-equal. If you want to match no-matter-what code instead of the assignment, you can use
{ .*? }
or
{ [^}]*? }
for the block. Be careful though to not specify a pattern that is too greedy. Apparently leaving part of the pattern unused when all input has been consumed does not prevent declaring victory. You’ll get a single match at best.
To allow both
if...if
and
if...else if
, specify
(else_{_)?if
for the second
if
in the pattern and omit the terminating
_}
from the pattern. Take note of the space following the opening brace in the
(else_{_)?
part. (The underscore denotes a space.)
Now let’s apply CppCheck with a rule for this pattern to a file that contains several variations of
if
-
else
constructs (available from [
GitHub
]).
prompt> cppcheck --rule-file="my-rule.xml" sample.cpp
This gives:
Checking sample.cpp... [sample.cpp:118]: (error) Address of an auto-variable returned. [sample.cpp:5]: (style) Consider replacing if-else-if with clamp(value, minval, maxval). ... 6 more similar messages
Without further parameters, CppCheck only shows error messages and matches to our own rules. You can use option
--enable=...
to enable more kinds of checks from CppCheck itself, such as
style
and
portability
warnings [
Wiki
]. With option
--template=...
you can get various output formats, e.g. Visual Studio compatible output (
vs
) and GNUC compatible output (
gcc
). Run
cppcheck -h
to see the tool’s complete command line usage.
Results
In the circa 400k line codebase at hand, CppCheck found 15 occurrences with the pattern that matches both
if-if
and
if-else-if
and both less-than and less-equal comparisons. Of these, 7 represent code we are interested in and 8 are false positives. Six false positives are due to two long
if-else
chains in a single file and the other 2 are due to the same code of a duplicated file.
Running a check with a looser search pattern for the code in the
if
blocks (
{ .*? }
) gave 31 occurrences but didn’t reveal any missed opportunities.
To get rid of the false positives due to longer
if-else-if
chains, we may extend our rule with look-ahead and look-behind assertions [
PCRE-man
]. A positive look-ahead assertion is specified with
(?=...)
, a negative assertion as
(?!...)
. Look-behind assertions are
(?<=...)
for positive and
(?<!...)
for negative assertions.
To not match
if
when preceded by
else_{_
we can specify a negative look-behind assertion like:
(?<!else { )if
Unfortunately this pattern contains a
<
which cannot be replaced with
\x3
. To protect it, we enclose the
complete
pattern within
<![CDATA[...]]>
[
XML
].
<pattern><![CDATA[(?<!else { )if \( \w+ <=? \w+ \) { [^}]*? } else { if \( \w+ <=? \w+ \) { [^}]*? } }]]></pattern>
Note that we also specified the less-than character in the
if
statements as plain
<
.
Searching the complete Boost 1.53 source tree with the rule in
my-rule.xml
didn’t turn up a single occurrence. This may indicate that a relatively general pattern doesn’t necessarily lead to many false positives, or perhaps more likely, that Boost’s programming style involves none of these old-fashioned
if-else-if
constructs.
Wrapup
This article showed how we can use CppCheck to easily find occurrences of code fragments with a specific structure that may span multiple lines.
To close, a quote from the CppCheck webpage:
Using a battery of tools is better than using 1 tool. Therefore we recommend that you also use other tools.
For inspiration, see Wikipedia’s ‘List of tools for static code analysis’ [ Wikipedia ].
Acknowledgements
Many thanks to Ric Parkin and the Overload team for their care and feedback. Also thanks to all who made suggestions on the subject of clamping via accu-general [ Alternative ].
Notes and references
Code for this article is available on [ GitHub ].
[Alternative] On mailinglist accu-general several people made suggestions about a clamp function.
Phil Nash suggests the following approaches:
-
clamp( factor ).between( minFactor,maxFactor );
-
clampBetween( factor, minFactor, maxFactor );
-
[Limits clamp: factor between: minFactor and:maxFactor]
(Objective-C)
Gennaro Prota suggests names
clamp_to_range
,
constrain_to_range
,
restrain_to_range
,
limit_to_range
,
bring_to_range
and
to_nearest_in_range
. He also notes that to make such function
constexpr
, it shouldn’t be implemented in terms of
std:min()
and
std::max()
. In turn a reviewer noted that
std:min()
and
std::max()
are likely to become
constexpr
in C++17 (LWG issue 2350, voted into the working paper at the last meeting, in Issaquah).
Initially I used
limit
as this word also appears in
std::numeric_limits
in the C++ standard library. However Jonathan Wakely argues ...
if you propose it [for the standard] I suggest you call it
clamp
,
I expect that’s the most well-known name
.
[AMP] Microsoft. C++ Accelerated Massive Parallelism library. http://msdn.microsoft.com/en-us/library/hh265137.aspx
[Ariane-5] Image from presentation ‘A Question of Craftsmanship’ by Kevlin Henney. InfoQ. 9 March 2014 http://www.infoq.com/presentations/craftsmanship-view . See also ‘Cluster (spacecraft)’ Wikipedia. http://en.wikipedia.org/wiki/Ariane_5_Flight_501 Accessed on 12 March 2014.
[Boost] The Boost Algoritm library contains a version of
clamp
as shown here plus a version that also takes a comparison predicate.
http://www.boost.org/libs/algorithm/doc/html/algorithm/Misc.html#the_boost_algorithm_library.Misc.clamp
[CppCheck] CppCheck homepage: http://cppcheck.sourceforge.net/ Cppcheck is a static analysis tool for C/C++ code. Unlike C/C++ compilers and many other analysis tools it does not detect syntax errors in the code. Cppcheck primarily detects the types of bugs that the compilers normally do not detect. The goal is to detect only real errors in the code (i.e. have zero false positives). Cppcheck is supposed to work on any platform that has sufficient cpu and memory and can be used from a GUI, from the command line, or via a plugin.
[GitHub] Code for Search with CppCheck. Martin Moene. 11 March 2014. https://github.com/martinmoene/martin-moene.blogspot.com/tree/master/Search%20with%20CppCheck
[Henney11] Kevlin Henney. ‘Sustainable space’ CVu , 22(6):3, January 2011. Kevlin Henney shares a code layout pattern.
[Manual] CppCheck Manual. http://cppcheck.sourceforge.net/manual.html (HTML format) and http://cppcheck.sourceforge.net/manual.pdf (PDF format)
[PCRE] PCRE – Perl Compatible Regular Expressions. http://www.pcre.org/
[PCRE-man] PCRE man page, section Lookbehind assertions. http://www.pcre.org/pcre.txt
[Rules] Daniel Marjamäki. Writing Cppcheck rules. Part 1 – Getting started (PDF). 2010. http://sourceforge.net/projects/cppcheck/files/Articles/writing-rules-1.pdf/download
[Simplification] Daniel Marjamäki. Writing Cppcheck rules. Part 2 – The Cppcheck data representation (PDF). 2010. http://sourceforge.net/projects/cppcheck/files/Articles/writing-rules-2.pdf/download
[Wiki] CppCheck Wiki ( http://sourceforge.net/apps/mediawiki/cppcheck/ ) Describes checks performed by CppCheck.
[Wikipedia] List of tools for static code analysis: http://en.wikipedia.org/w/index.php?title=List_of_tools_for_static_code_analysis Accessed 17 February 2014.
[XML] It took me some time to discover to use the CDATA construct. Once I found out, I couldn’t easily find a suggestion of this approach in relation to CppCheck.
I filed this suggestion ( https://sourceforge.net/apps/trac/cppcheck/ticket/5551 ) in the CppCheck issue tracker.