Last time, we saw how to provide formatting for a simple user-defined class. Spencer Collyer builds on this, showing how to write a formatter for more complicated types.
In the previous article in this series [Collyer24], I showed how to write a formatter class for user-defined types using the std::format library. In this article I will describe how this can be extended to container classes, or to any other class that holds objects whose type is specified by the user of your class.
A note on the code listings: the code listings in this article have lines labelled with comments like // 1. Where these lines are referred to in the text, it will be as ‘line 1’, for instance, rather than ‘the line labelled // 1’.
Nested formatter objects
The objects created from the formatter template structs are just ordinary C++ objects – there is nothing special about them¹. In particular, there is nothing to stop you including an object of a formatter template type inside one of your user-defined formatter structs.
You might wonder why you would want to do that. One simple case is a templated container class for which you want a formatter that outputs the whole container in one go, rather than having to write code that iterates over the container and outputs each value in turn. A nested formatter for the contained value type makes this possible, and also allows the values to be formatted differently from the default, as the following examples will show. You will no doubt think of other uses for your own classes.
A formatter for std::vector
The first example we will look at is a simple formatter for std::vector. The code is given in Listing 1, and sample output is in Listing 2.
```cpp
#include <format>
#include <iostream>
#include <vector>

using namespace std;

template<typename T>
struct std::formatter<vector<T>>
{
    constexpr auto parse(format_parse_context& parse_ctx)
    {
        auto iter = parse_ctx.begin();
        auto get_char = [&]() {
            return iter != parse_ctx.end() ? *iter : 0;
        };
        char c = get_char();
        if (c == 0 || c == '}') // 1
        {
            m_val_fmt.parse(parse_ctx); // 2
            return iter;
        }
        auto get_next_char = [&]() { // 3
            ++iter;
            char vc = get_char();
            if (vc == 0)
            {
                throw format_error(
                    "Invalid vector format specification");
            }
            return vc;
        };
        if (c == 'w') // 4
        {
            m_lc = get_next_char();
            m_rc = get_next_char();
            ++iter;
        }
        if ((c = get_char()) == 's') // 5
        {
            m_sep = get_next_char();
            ++iter;
        }
        if ((c = get_char()) == '/' || c == '}') // 6
        {
            if (c == '/') // 7
            {
                ++iter;
            }
            parse_ctx.advance_to(iter); // 8
            iter = m_val_fmt.parse(parse_ctx); // 9
        }
        if ((c = get_char()) != 0 && c != '}') // 10
        {
            throw format_error(
                "Invalid vector format specification");
        }
        return iter;
    }

    auto format(const vector<T>& vec,
                format_context& format_ctx) const
    {
        auto pos = format_ctx.out(); // 11
        bool need_sep = false;
        for (const auto& val : vec)
        {
            if (need_sep) // 12
            {
                *pos++ = m_sep;
                if (m_sep != ' ')
                {
                    *pos++ = ' ';
                }
            }
            if (m_lc != '\0') // 13
            {
                *pos++ = m_lc;
            }
            format_ctx.advance_to(pos); // 14
            pos = m_val_fmt.format(val, format_ctx); // 15
            if (m_rc != '\0') // 16
            {
                *pos++ = m_rc;
            }
            need_sep = true;
        }
        return pos;
    }

private:
    char m_lc = '\0';
    char m_rc = '\0';
    char m_sep = ' ';
    formatter<T> m_val_fmt; // 17
};

int main()
{
    vector<int> vec{1, 2, 4, 8, 16, 32};
    cout << format("{}\n", vec); // a
    cout << format("{:w[]}\n", vec); // b
    cout << format("{:s,}\n", vec); // c
    cout << format("{:w[]s,}\n", vec); // d
    cout << format("{:w[]/3}\n", vec); // e
    cout << format("{:s;/+0{}}\n", vec, 5); // f

    vector<vector<int>> vec2{
        {1, 2, 3}, {40, 50, 60}, {700, 800, 900}
    };
    cout << format("{}\n", vec2); // g
    cout << format("{:w[]}\n", vec2); // h
    cout << format("{:s,}\n", vec2); // i
    cout << format("{:w[]s,}\n", vec2); // j
    cout << format("{:w[]/s,}\n", vec2); // k
    cout << format("{:s;/s,/03}\n", vec2); // l
}
```
Listing 1
```
a: 1 2 4 8 16 32
b: [1] [2] [4] [8] [16] [32]
c: 1, 2, 4, 8, 16, 32
d: [1], [2], [4], [8], [16], [32]
e: [  1] [  2] [  4] [  8] [ 16] [ 32]
f: +0001; +0002; +0004; +0008; +0016; +0032
g: 1 2 3 40 50 60 700 800 900
h: [1 2 3] [40 50 60] [700 800 900]
i: 1 2 3, 40 50 60, 700 800 900
j: [1 2 3], [40 50 60], [700 800 900]
k: [1, 2, 3] [40, 50, 60] [700, 800, 900]
l: 001, 002, 003; 040, 050, 060; 700, 800, 900
```
Listing 2
The format specification we will use has the following form:
[ 'w' lc rc ] [ 's' sep ] [ '/' [ value-fmt-spec ] ]
The element starting with w allows the user to specify characters to wrap each vector value in the output. The w must be followed by exactly two characters. The first character, lc, is written before each value, and the second, rc, is written after it. If the element is not given, no wrapper characters are output.
The element starting with s allows the user to specify a single character to act as a separator between the individual vector element values. If given, the s must be followed by exactly one character, which is used as the separator. If not given, the separator defaults to the space character. Any separator other than a space is followed by a space in the output.
The / delimits the start of the format-spec for the vector’s value type. This is read by the member variable m_val_fmt, defined in line 17, to set up the formatting for the vector values. If not given, the default formatting for the value type is used. It is allowable – although not really useful – to give a / character with no following format-spec.
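To make the grammar concrete, here is a sketch of a hypothetical standalone function, parse_vector_spec, that splits a spec into its parts. Unlike the real parse function described below, it works on a plain std::string_view without the closing }, reports errors with std::invalid_argument rather than format_error, and simply returns the value-fmt-spec as text instead of handing it to a nested formatter:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical result type: the pieces of a vector format-spec.
struct VectorSpec
{
    char lc = '\0';               // left wrapper, '\0' if none
    char rc = '\0';               // right wrapper, '\0' if none
    char sep = ' ';               // separator, defaults to a space
    std::string_view value_spec;  // text that would go to the value formatter
};

// Splits [ 'w' lc rc ] [ 's' sep ] [ '/' [ value-fmt-spec ] ]
VectorSpec parse_vector_spec(std::string_view spec)
{
    VectorSpec result;
    std::size_t i = 0;
    if (i < spec.size() && spec[i] == 'w')
    {
        if (spec.size() - i < 3)
            throw std::invalid_argument("w needs two characters");
        result.lc = spec[i + 1];
        result.rc = spec[i + 2];
        i += 3;
    }
    if (i < spec.size() && spec[i] == 's')
    {
        if (spec.size() - i < 2)
            throw std::invalid_argument("s needs one character");
        result.sep = spec[i + 1];
        i += 2;
    }
    if (i < spec.size() && spec[i] == '/')
    {
        result.value_spec = spec.substr(i + 1);  // everything after the '/'
        i = spec.size();
    }
    if (i != spec.size())
        throw std::invalid_argument("Invalid vector format specification");
    return result;
}
```

For example, the spec w[]s,/3 yields [ and ] as the wrappers, a comma as the separator, and 3 as the value-fmt-spec.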
The parse function
The first few lines of the parse function, up to line 1, are the same as the ones for the Point class described in my previous article.
The first notable change is line 2. This calls the parse function on the nested m_val_fmt object, which is the formatter for the vector’s value type. Doing this allows the m_val_fmt object to set up its formatting for the default case where no format-spec is given.
The get_next_char function defined starting at line 3 reads the next character from the format-spec. It throws an exception if there are no more characters to read, which is indicated by get_char returning 0. As with get_char, when this function returns it leaves the iter variable pointing at the character read.
The if-statement starting at line 4 processes any w element, reading the two wrapper characters into m_lc and m_rc and then moving iter past the second one. Similarly, the code starting at line 5 processes any s element, reading the separator character into m_sep.
The if-statement starting at line 6 holds the code to initialise the m_val_fmt object when we don’t have an empty format-spec. Its condition has to check both for the / character that indicates the value type has a format-spec, and for the } character that marks the end of the format-spec, i.e. the case where there is no specific format-spec for the value type.
Line 7 checks for the / character and, if present, increments iter. This is because the / character is not part of the value type’s format-spec, so seeing it would confuse the m_val_fmt.parse function.
Line 8 is important because, by calling the advance_to function on parse_ctx, it resets parse_ctx’s idea of where the start of the format-spec is located. When line 9 then calls m_val_fmt.parse, processing starts at the correct position, i.e. the start of the value type’s embedded format-spec rather than the vector’s format-spec.
When the m_val_fmt.parse function returns, it should have processed everything up to the } that terminates the format-spec. Note that in this case the } is doing double duty, as it terminates both the vector format-spec and the embedded value type format-spec. Line 10 carries out our normal check for correct termination of the format-spec.
The format function
Line 11 puts the current output iterator from format_ctx into the pos variable. This indicates where the next data is to be written in the output.
The majority of the function is just a loop over the vector’s values. The interesting parts are described below.
Line 12 checks whether we need to output a separator. The first time through the loop need_sep is false, but on subsequent iterations it is true. The body of the if-statement outputs the separator character and then, if it is not a space, outputs a space as well. As we are only outputting single characters, we can use the *pos++ = c form to write them to the output.
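The separator logic is small enough to pull out into a hypothetical helper, write_separator, which works with any output iterator and returns the advanced iterator, using the same *pos++ writes as the listing:

```cpp
#include <iterator>
#include <string>

// Hypothetical helper mirroring the body of the if-statement at line 12:
// write the separator, plus a space unless the separator is itself a space.
template<typename OutputIt>
OutputIt write_separator(OutputIt pos, char sep)
{
    *pos++ = sep;
    if (sep != ' ')
    {
        *pos++ = ' ';
    }
    return pos;
}
```

Writing into a std::string through std::back_inserter, a comma separator produces ", " while a space separator produces a single " ".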
Lines 13 and 16 write the wrapper characters, if they are defined.
Line 14 sets up the format_ctx variable correctly for the output in the next line. Calling advance_to on format_ctx sets its output iterator to match the position we have reached so far in the function.
Line 15 outputs the current value by calling the format function on the m_val_fmt object. Because we updated the output iterator in format_ctx on the previous line, the value is written to the correct position in the output. The format function returns the new value of the output iterator.
Test cases
The first set of test cases in the main function uses a simple vector-of-ints as the value to output.
Test case a checks that the default formatting works for the vector and its contained values.
Test cases b, c, and d check that the various parts of the vector format-spec work, but with no value format-spec, so the values use the default output.
Test case e checks that using a format-spec for the values works correctly. The wrapper characters let us check that the values are indeed output in fields three characters wide.
Test case f shows that you can use nested replacement fields in the value format-spec, in this case picking up the width from the argument list.
The second set of test cases use a vector-of-vectors-of-ints as the value to output.
Test case g checks that the default formatting works.
Note that in the output for case g there is no way to tell where one nested vector ends and the next one starts. Test cases h, i, and j use the various parts of the vector format-spec to delimit the nested vectors in different ways.
Test case k checks that the nested vectors are output using the value format-spec, as can be seen from the values in each nested vector being separated by the comma that the format-spec specifies.
Test case l checks that the nested vectors’ format-spec can itself include a format-spec for their values, in this case a three-character-wide, zero-padded field.
A formatter for std::map
The next example we will look at is a formatter for std::map. This is more complicated because we want to allow format-specs for both the key type and the value type of the map. The code is given in Listing 3, and sample output is in Listing 4.
```cpp
#include <format>
#include <iostream>
#include <map>
#include <string>

using namespace std;

template<typename K, typename V>
struct std::formatter<map<K,V>>
{
    constexpr auto parse(format_parse_context& parse_ctx)
    {
        auto iter = parse_ctx.begin();
        auto get_char = [&]() {
            return iter != parse_ctx.end() ? *iter : 0;
        };
        char c = get_char();
        if (c == 0 || c == '}')
        {
            m_key_fmt.parse(parse_ctx); // 1
            m_val_fmt.parse(parse_ctx); // 2
            return iter;
        }
        auto get_next_char = [&]() {
            ++iter;
            char vc = get_char();
            if (vc == 0)
            {
                throw format_error(
                    "Invalid map format specification");
            }
            return vc;
        };
        if (c == 'w') // 3
        {
            m_lc = get_next_char();
            m_rc = get_next_char();
            ++iter;
        }
        if ((c = get_char()) == 'c') // 4
        {
            m_con = get_next_char();
            ++iter;
        }
        if ((c = get_char()) == 's') // 5
        {
            m_sep = get_next_char();
            ++iter;
        }
        if ((c = get_char()) == '/') // 6
        {
            // Next char must be '{' at start of key
            // format spec
            if ((c = get_next_char()) != '{') // 7
            {
                throw format_error(
                    "Invalid map format specification");
            }
            parse_ctx.advance_to(++iter); // 8
            iter = m_key_fmt.parse(parse_ctx); // 9
            // Iter should point to '}' at end of key
            // format spec
            if ((c = get_char()) != '}') // 10
            {
                throw format_error(
                    "Invalid map format specification");
            }
            // Next char must be '{' at start of value
            // format spec
            if ((c = get_next_char()) != '{') // 11
            {
                throw format_error(
                    "Invalid map format specification");
            }
            parse_ctx.advance_to(++iter);
            iter = m_val_fmt.parse(parse_ctx);
            // Iter should point to '}' at end of
            // value format spec
            if ((c = get_char()) != '}')
            {
                throw format_error(
                    "Invalid map format specification");
            }
            // Advance past the '}' at end of value
            // format spec
            ++iter;
        }
        else if (c == '}') // 12
        {
            parse_ctx.advance_to(iter);
            m_key_fmt.parse(parse_ctx);
            m_val_fmt.parse(parse_ctx);
        }
        if ((c = get_char()) != 0 && c != '}') // 13
        {
            throw format_error(
                "Invalid map format specification");
        }
        return iter;
    }

    auto format(const map<K,V>& vals,
                format_context& format_ctx) const
    {
        auto pos = format_ctx.out(); // 14
        bool need_sep = false;
        for (const auto& val : vals)
        {
            if (need_sep) // 15
            {
                *pos++ = m_sep;
                if (m_sep != ' ')
                {
                    *pos++ = ' ';
                }
            }
            if (m_lc != '\0') // 16
            {
                *pos++ = m_lc;
            }
            format_ctx.advance_to(pos); // 17
            pos = m_key_fmt.format(val.first, format_ctx);
            *pos++ = m_con; // 18
            format_ctx.advance_to(pos); // 19
            pos = m_val_fmt.format(val.second, format_ctx);
            if (m_rc != '\0') // 20
            {
                *pos++ = m_rc;
            }
            need_sep = true;
        }
        return pos;
    }

private:
    char m_lc = '\0';
    char m_rc = '\0';
    char m_sep = ' ';
    char m_con = '=';
    formatter<K> m_key_fmt;
    formatter<V> m_val_fmt;
};

int main()
{
    map<int, string> map1{
        {1, "a"}, {2, "bc"}, {3, "def"}
    };
    cout << format("{}\n", map1); // a
    cout << format("{:w[]}\n", map1); // b
    cout << format("{:s,}\n", map1); // c
    cout << format("{:c:}\n", map1); // d
    cout << format("{:w[]c:s,}\n", map1); // e
    cout << format("{:w[]/{}{5}}\n", map1); // f
    cout << format("{:s;/{3}{5}}\n", map1); // g
    cout << format("{:s;/{3}{}}\n", map1); // h
}
```
Listing 3
```
a: 1=a 2=bc 3=def
b: [1=a] [2=bc] [3=def]
c: 1=a, 2=bc, 3=def
d: 1:a 2:bc 3:def
e: [1:a], [2:bc], [3:def]
f: [1=a    ] [2=bc   ] [3=def  ]
g:   1=a    ;   2=bc   ;   3=def  
h:   1=a;   2=bc;   3=def
```
Listing 4
The format specification we will use has the following form:
[ 'w' lc rc ] [ 'c' conn ] [ 's' sep ] [ '/' '{' key-fmt-spec '}' '{' value-fmt-spec '}' ]
The elements starting with w and s have the same purposes and defaults as those we used for std::vector.
The element starting with c lets you specify the connecting character that is output between the key and the value. The c must be followed by exactly one character. If not specified, the default value is =.
The / character introduces the format-specs for the key and value types of the map. Unlike the case for std::vector, these format-specs are mandatory if you have a / character. Unsurprisingly, key-fmt-spec is the one for the key type, and value-fmt-spec is the one for the value type. You can use an empty {} for either of these if you don’t want to change that particular item’s format.
Note that these two nested format-specs are surrounded by { and } characters. This breaks one of the guidelines I gave in the previous article for format specification mini-languages (see the appendix ‘Simple Mini-Language Guidelines’ in that article). The reason is as follows. The parse functions in formatters need to see a } character terminating the format-spec they are processing. This means that when processing the key-fmt-spec, we need a } character at the end of the key-fmt-spec, before the value-fmt-spec starts. This could be confusing, as it might look like the } that terminates the std::map’s format-spec. Putting a { at the start of the key-fmt-spec helps make it clear that it is a single unit. The value-fmt-spec could use the } at the end of the std::map format-spec as its terminator, just as we do for std::vector above, but for consistency between the two format-specs it made more sense to surround it with { and } characters too.
The parse function
Much of the parse function is similar to the one for std::vector shown previously. Lines 1 and 2 handle the case where we have a default format-spec, calling the respective parse functions on the nested formatters for the key and value types. Note that we assume here that the m_key_fmt.parse function doesn’t alter the parse_ctx value passed to it. If you are concerned that it might, you cannot simply take a copy of parse_ctx, because std::basic_format_parse_context is not copyable. Instead, you can call parse_ctx.advance_to(iter) before calling the m_val_fmt.parse function, which restores the start point saved in iter.
The if-statements starting at lines 3 and 5 read the w and s elements, just as the corresponding lines do for std::vector. The if-statement starting at line 4 reads the c element, which must have a single character following it.
The if-statement starting at line 6 handles any nested format-specs. As mentioned previously, they are mandatory if the / character is present.
Line 7 checks for the { that indicates the start of the key-fmt-spec and, if it is not present, throws a format_error. We just report a generic error text here, but a more descriptive text would obviously help the user find the error more quickly.
Line 8 uses the advance_to function to set up the iterator in parse_ctx. Note that we increment the value passed in, as we need to skip the { detected in the previous line, which is not part of the key-fmt-spec. Line 9 then calls m_key_fmt.parse so the formatter for the key type can parse the key-fmt-spec. Finally, line 10 checks that the key-fmt-spec is correctly terminated with a } character.
The code starting at line 11 then does the same work for the value type, using the m_val_fmt member variable, and finally advances iter past the } that closes the value-fmt-spec.
If the condition in line 6 is false, we don’t have format-specs for the key or value types. Line 12 checks whether we have reached the end of the format-spec for the map, and if so the controlled lines call the parse functions on m_key_fmt and m_val_fmt to set them to their defaults.
Finally, line 13 does the usual check to make sure we have reached the end of the format-spec.
The format function
The format function for std::map is similar to the one for std::vector given previously.
Line 14 picks up the current output iterator from format_ctx. The function then enters a loop over all the key-value pairs in the map.
Line 15 checks whether we need to output a separator, and if so the controlled block does that work. Line 16 then does the same for the left-hand wrapper character.
Line 17 then sets the output iterator in format_ctx to the current position, and the following line uses m_key_fmt.format to output the key, returning the new value of the output iterator. Line 18 then outputs the connector character.
Line 19 updates the format_ctx output iterator again so the following line can output the value using m_val_fmt.format.
Line 20 then outputs the right-hand wrapper character, if required.
Test cases
Test case a checks that the default formatting works for the map and its contained key-value pairs.
Test cases b, c, d, and e check that the various parts of the map’s format-spec work correctly, singly and in combination.
Test cases f, g, and h check that format-specs for the key and value parts work, including that empty {} format-specs are allowed.
Summary
In this article we have shown how you can write a formatter for a container type, or any other class where the types of some elements are unknown to you when writing the formatter because they are specified by the user of the class.
In the next and final article of this series I will show you how to create format wrappers, special purpose classes that allow you to apply specific formatting to existing classes.
References
[Collyer24] Spencer Collyer, ‘User-Defined Formatting in std::format: Part 1’, Overload 180, April 2024, available at https://accu.org/journals/overload/32/180/collyer/
Footnote
Spencer has been programming for more years than he cares to remember, mostly in the financial sector, although in his younger years he worked on projects as diverse as monitoring water treatment works on the one hand, and television programme scheduling on the other.