String Processing
Cpp
#include <algorithm>
#include <cctype>
#include <string>
int main() {
    std::string msg = " \t\n\t Hello, World! \t\n\t ";
    // Erase everything before the first non-whitespace character
    auto ltrim = [](std::string &in) {
        in.erase(in.begin(), std::find_if(in.begin(), in.end(),
            // unsigned char avoids undefined behavior for negative char values
            [](unsigned char ch) {
                return !std::isspace(ch);
            }
        ));
    };
    // Erase everything after the last non-whitespace character
    auto rtrim = [](std::string &in) {
        in.erase(std::find_if(in.rbegin(), in.rend(), [](unsigned char ch) {
            return !std::isspace(ch);
        }).base(), in.end());
    };
    auto trim = [&](std::string &in) {
        ltrim(in);
        rtrim(in);
    };
    trim(msg);
    // msg is now "Hello, World!"
}
JavaScript
// String with mixed-character whitespace at beginning and end
const msg = " \t\n\t Hello, World! \t\n\t ";
msg.trim();
// returns "Hello, World!"
What This Code Does
This code simply trims leading and trailing whitespace from a string.
What's the Same
Almost nothing.
What's Different
Almost everything.
The most obvious difference is that the C++ version writes its own implementation of trim rather than using something from the standard library. Why is that?
Well... C++ can, at times, be very minimal in its stdlib. That is to say that no trim function exists directly. However, using the ability to erase sub-sections of a std::string along with some utilities from the <algorithm> header (std::find_if), a simple and efficient version can be created.
In practice, most organizations will either develop their own internal utility libraries that would likely include these functions or adopt an open-source implementation such as Boost. As an example, if we were using Boost, we would simply write:
#include <boost/algorithm/string.hpp>
boost::trim(msg);
Obviously there is a bit of code missing (the main function and the declaration of msg), but you can see that the important code is now nearly equivalent to the JS version.
The other difference worth pointing out is that the C++ version modifies the string in place. If we examine msg in the JavaScript version after calling trim, we'll notice that nothing was changed:
var msg = " \t\n\t Hello, World! \t\n\t ";
msg.trim(); // value returned is "Hello, World!"
console.log(msg); // prints " \t\n\t Hello, World! \t\n\t "
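If we want the JavaScript behavior in C++, where the original string is left alone and a trimmed copy is returned, one option is to search for the first and last non-whitespace characters and copy only that range. The following is a minimal sketch rather than part of the original example: the name trimmed is hypothetical, and it assumes that "whitespace" means only the ASCII characters listed in the code, which is a narrower set than the one JavaScript's trim strips.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a trimmed copy and leaves the input untouched.
// Assumption: "whitespace" means only the ASCII characters listed below.
std::string trimmed(std::string_view in) {
    constexpr std::string_view whitespace = " \t\n\r\f\v";
    const auto first = in.find_first_not_of(whitespace);
    if (first == std::string_view::npos) {
        return "";  // input is empty or consists only of whitespace
    }
    const auto last = in.find_last_not_of(whitespace);
    return std::string(in.substr(first, last - first + 1));
}
```

Calling trimmed(msg) returns the new string, and msg itself still holds the surrounding whitespace afterwards, just like in the JS snippet.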
More Processing
Cpp
#include <string>
// to_lower, to_upper and replace_all live in the general string algorithm header
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <boost/regex.hpp>
int main() {
    std::string msg = "Total Price -> [1.23]";
    boost::to_lower(msg);
    // msg is now "total price -> [1.23]"
    msg = "Total Price -> [1.23]";
    boost::to_upper(msg);
    // msg is now "TOTAL PRICE -> [1.23]"
    msg = "Total Price -> [1.23]";
    boost::replace_all(msg, "[", "(");
    boost::replace_all(msg, "]", ")");
    // msg is now "Total Price -> (1.23)"
    msg = "Total Price -> [1.23]";
    boost::replace_all_regex(msg, boost::regex("[\\[\\]\\->]"),
                             std::string(""));
    // msg is now "Total Price  1.23" (the two spaces around "->" both remain)
}
JavaScript
// Each call below returns a new string; msg itself is never modified
const msg = "Total Price -> [1.23]";
msg.toLowerCase();
// returns "total price -> [1.23]"
msg.toUpperCase();
// returns "TOTAL PRICE -> [1.23]"
msg.replace('[', '(').replace(']', ')');
// returns "Total Price -> (1.23)"
msg.replace(/[\[\]\->]/g, '');
// returns "Total Price  1.23" (the g flag is needed to replace every match, not just the first)
What This Code Does
This code modifies the string cases (to upper and lower case) and modifies string contents using both string-literals and regular expressions as lookup values.
What's the Same
Almost everything.
What's Different
Almost nothing. Well, we did cheat this time around and use the Boost library. Using a freely available utility library is common in many C++ projects and allows us to fill the gaps that exist in the standard library. With this library added, our code is now very comparable, aside from syntactic differences, to the JS version.
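For comparison, here is roughly what filling those gaps by hand could look like using only the standard library. This is a sketch under stated assumptions, not a drop-in replacement for Boost: the helper names to_lower_ascii, replace_every and strip_symbols are hypothetical, the case conversion is only meant for ASCII text, and std::regex stands in for boost::regex.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: lower-cases a string in place (intended for ASCII text).
void to_lower_ascii(std::string &s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
        return static_cast<char>(std::tolower(ch));
    });
}

// Hypothetical helper: replaces every occurrence of `from` with `to` in place.
void replace_every(std::string &s, const std::string &from,
                   const std::string &to) {
    if (from.empty()) {
        return;  // nothing sensible to search for
    }
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
}

// Hypothetical helper: returns a copy with every '[', ']', '-' and '>' removed.
std::string strip_symbols(const std::string &s) {
    static const std::regex symbols(R"([\[\]\->])");
    return std::regex_replace(s, symbols, "");
}
```

Note that the first two helpers mutate their argument, as the Boost functions do, while strip_symbols returns a new string because std::regex_replace works that way.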
Sub-Strings & Concatenation
Cpp
#include <string>
#include <string_view>
int main() {
    std::string greeting = "Hello";
    std::string to = "World!";
    std::string msg = greeting + ", " + to;
    // msg is a new string of "Hello, World!"
    std::string_view view(msg);
    // create a "view" into the string
    view.substr(0, 5);
    // returns new string_view of "Hello"
    view.substr(7);
    // returns new string_view of "World!"
}
JavaScript
const greeting = "Hello";
const to = "World!";
const msg = greeting + ", " + to;
// msg is a new string of "Hello, World!"
msg.slice(0, 5);
// returns new string "Hello"
msg.slice(7);
// returns new string "World!"
What This Code Does
This code demonstrates concatenation (combining) of strings as well as taking sub-sections of strings. Such code is commonly found across application domains.
What's the Same
Both implementations take advantage of built-in string operations to concatenate strings and to look at sub-sections of a string.
What's Different
C++ makes use of something called std::string_view, whereas the JS examples simply return new strings. A std::string_view behaves like a read-only string, but doesn't perform any copies when looking at sub-sections of a string.
std::string_view is also useful when you want to manipulate the "view" of a string without actually changing (mutating) the underlying data. This is particularly useful when writing functions that accept data immutably, but can still "simulate" changes locally without any unseen "effects" (if you come from a functional programming background, the idea of avoiding effects should be very familiar).
It's worth noting that strings in JS are immutable. Thus, any method that returns a portion of a string, such as slice, could, under the covers, return a view into the same data.
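To make the idea of "simulating" changes on a view concrete, here is a small sketch. The function name after_comma is hypothetical and not part of the original example. It receives a std::string_view by value, shrinks its own copy of the view with remove_prefix, and returns the result, all without copying or modifying the characters of the caller's string.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns a view of the text after the first ", ".
// `text` is a copy of the view (a pointer and a length), so shrinking it
// here never touches the caller's string.
std::string_view after_comma(std::string_view text) {
    const auto pos = text.find(", ");
    if (pos == std::string_view::npos) {
        return {};  // no separator found: return an empty view
    }
    text.remove_prefix(pos + 2);  // "erase" the front, but only in our view
    return text;
}
```

One caveat: the returned view points into the original string's storage, so it is only valid for as long as that string is alive and unmodified.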
String Formatting
Cpp
#include <array>
#include <iostream>
#include <iomanip>
#include <string>
int main() {
    std::string name = "FOOBAR";
    auto max_gain = 0.15f;
    auto max_loss = 0.76f;
    std::cout << "Weekly Changes: " << name
              << " +" << (max_gain * 100) << "%"
              << " -" << (max_loss * 100) << "%"
              << "\n";
    // prints "Weekly Changes: FOOBAR +15% -76%"
    // Print list as a right-aligned list of hex values
    std::array some_numbers { 123456, 90346873, 28 };
    // Save the default formatting state so it can be restored later
    std::ios state(nullptr);
    state.copyfmt(std::cout);
    for (auto num: some_numbers) {
        std::cout << "\t0x";
        std::cout << std::hex << std::uppercase << std::setw(8)
                  << std::setfill('0') << num;
        std::cout.copyfmt(state);
        std::cout << "\n";
    }
    /* Prints:
        0x0001E240
        0x05629579
        0x0000001C
    */
}
JavaScript
const name = "FOOBAR";
const max_gain = 0.15;
const max_loss = 0.76;
console.log(`Weekly Changes: ${name} +${max_gain * 100}%` +
` -${max_loss * 100}%`);
// prints "Weekly Changes: FOOBAR +15% -76%"
// Print list as a right-aligned list of hex values
const someNumbers = [123456, 90346873, 28];
// for...of with const avoids creating an implicit global loop variable
for (const num of someNumbers) {
  const hex = num.toString(16);
  let padding = "";
  if (hex.length < 8) {
    padding = "0".repeat(8 - hex.length);
  }
  console.log(`\t0x${padding}${hex}`);
}
/* Prints:
0x0001e240
0x05629579
0x0000001c
*/
What This Code Does
This code demonstrates how string formatting is done with two very simple examples. The first simply prints out a stock symbol and its maximum change (positive and negative), representing a floating-point number as a percentage. The second example prints a list of base-10 integers as their hex values. All hex values have a prefix of 0x and are eight characters wide (left-padded with zeros).
What's the Same
In the first example, both versions have to format the percentage values manually: each multiplies the fraction by 100 and appends the % sign itself.
What's Different
While the JS version relies on string interpolation to provide formatting, C++ relies on an iostream (Input/Output Stream). An iostream is more than just a simple mechanism for printing, as it allows us to specify transformers that apply to values sent through the stream. Note in particular this bit of code:
std::cout << std::hex << std::uppercase << std::setw(8)
<< std::setfill('0') << num;
These modifiers inform the stream that we want to print hex values, in uppercase, with a width of 8 characters, and a "fill" of 0 (our left-padding). Then we print the value num, which passes through all of these filters we've set on our stream. Most of these settings stay in effect until they are changed; std::setw is the exception and applies only to the next value written. While this can be an extremely powerful abstraction, we can see that it is also a little cumbersome when data is interleaved with other output to which we don't want to apply the same formatting rules. We use the state object to reset our stream state each time before printing other values.
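One way to sidestep the save-and-restore step is to do the formatting on a short-lived std::ostringstream, so the modifiers never reach std::cout in the first place. This is a sketch of that alternative, not what the example above does, and the function name to_hex is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a value as "0x" plus eight upper-case hex digits.
std::string to_hex(unsigned int value) {
    std::ostringstream out;  // local stream: its flags never leak into std::cout
    out << "0x" << std::hex << std::uppercase << std::setw(8)
        << std::setfill('0') << value;
    return out.str();
}
```

The loop body could then be written as std::cout << "\t" << to_hex(num) << "\n"; with no state object needed, at the cost of building a temporary string for each value.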
Because of C++'s close relationship with C, we can also use the printf/sprintf family of functions (from the <cstdio> header) for formatting, in which case our hex-formatting code would look like:
printf("\t0x%08X\n", num);
This is also often the recommended approach when formatting in a performance-critical portion of code.
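When the formatted text is needed as a string rather than printed directly, the same format specifier works with std::snprintf, which writes into a caller-supplied buffer and never overruns it. The sketch below assumes a hypothetical helper named format_hex and a 32-bit unsigned int, so the result always fits in the buffer.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats a value as "0x" plus eight upper-case hex digits.
// Assumption: unsigned int is 32 bits, so the output is at most 10 characters.
std::string format_hex(unsigned int value) {
    char buffer[16];  // "0x" + 8 digits + terminating '\0', with room to spare
    const int written = std::snprintf(buffer, sizeof buffer, "0x%08X", value);
    if (written < 0) {
        return "";  // encoding error reported by snprintf
    }
    return std::string(buffer);
}
```

Taking the value as unsigned int also matches what the %X conversion expects.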