String Processing

Cpp

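#include <algorithm>
#include <cctype>
#include <string>

int main() {
    std::string msg = " \t\n\t Hello, World! \t\n\t ";

    // Erase everything before the first non-whitespace character.
    // unsigned char keeps std::isspace well-defined for negative char values.
    auto ltrim = [](std::string &in) {
        in.erase(in.begin(), std::find_if(in.begin(), in.end(),
            [](unsigned char ch) {
                return !std::isspace(ch);
            }
        ));
    };

    // Search backwards for the last non-whitespace character; base()
    // converts the reverse iterator into a forward iterator one past it.
    auto rtrim = [](std::string &in) {
        in.erase(std::find_if(in.rbegin(), in.rend(), [](unsigned char ch) {
            return !std::isspace(ch);
        }).base(), in.end());
    };

    auto trim = [&](std::string &in) {
        ltrim(in);
        rtrim(in);
    };

    trim(msg);
    // msg is now "Hello, World!"
}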

JavaScript

// String with mixed-character whitespace at beginning and end
const msg = " \t\n\t Hello, World! \t\n\t  ";

msg.trim();
// returns "Hello, World!"

What This Code Does

This code simply trims leading and trailing whitespace from a string.

What's the Same

Almost nothing.

What's Different

Almost everything.

The most obvious difference is that the C++ version implements trim itself rather than calling something from the standard library. Why is that?

Well... the C++ standard library can, at times, be very minimal. There is simply no trim function for std::string. However, by combining std::string's ability to erase a sub-range of characters with std::find_if from the <algorithm> header, we can write a simple and efficient version ourselves.

In practice, many organizations either develop their own internal utility libraries, which would likely include functions like these, or adopt an open-source implementation such as Boost. As an example, if we were using Boost, we would simply write:

#include <boost/algorithm/string.hpp>

boost::trim(msg);

A bit of code is omitted here (the main function and the declaration of msg), but you can see that the important code is now equivalent to the JS version.

The other difference worth pointing out is that the C++ version modifies the string in place. If we examine msg in the JavaScript version after calling trim, we'll see that nothing has changed: trim returns a new string and leaves the original alone.

var msg = " \t\n\t Hello, World! \t\n\t ";
msg.trim();       // value returned is "Hello, World!"
console.log(msg); // prints " \t\n\t Hello, World! \t\n\t "

More Processing

Cpp

#include <string>
#include <boost/algorithm/string.hpp>        // to_lower, to_upper, replace_all
#include <boost/algorithm/string_regex.hpp>  // replace_all_regex
#include <boost/regex.hpp>

int main() {
    std::string msg = "Total Price -> [1.23]";

    boost::to_lower(msg);
    // msg is now "total price -> [1.23]"

    msg = "Total Price -> [1.23]";
    boost::to_upper(msg);
    // msg is now "TOTAL PRICE -> [1.23]"

    msg = "Total Price -> [1.23]";
    boost::replace_all(msg, "[", "(");
    boost::replace_all(msg, "]", ")");
    // msg is now "Total Price -> (1.23)"

    msg = "Total Price -> [1.23]";
    // Remove every '[', ']', '-' and '>' character
    boost::replace_all_regex(msg, boost::regex("[\\[\\]\\->]"),
        std::string(""));
    // msg is now "Total Price  1.23"
}

JavaScript

// String with mixed case, brackets, and an arrow
const msg = "Total Price -> [1.23]";

msg.toLowerCase();
// returns "total price -> [1.23]"

msg.toUpperCase();
// returns "TOTAL PRICE -> [1.23]"

msg.replaceAll('[', '(').replaceAll(']', ')');
// returns "Total Price -> (1.23)"

// The g (global) flag replaces every match, not just the first one
msg.replace(/[\[\]\->]/g, '');
// returns "Total Price  1.23"

What This Code Does

This code changes the case of a string (to upper and lower case) and replaces parts of its contents, using both string literals and regular expressions to find the text to replace.

What's the Same

Almost everything.

What's Different

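Almost nothing. Well, we did cheat this time around and use the Boost library. Using a freely available utility library is common in many C++ projects and fills gaps that exist in the standard library. With this library added, our code is very comparable to the JS version, aside from syntactic differences. The one behavioral difference is the same as before: the Boost functions modify msg in place (which is why the C++ example resets msg before each operation), while the JS methods return new strings and leave msg unchanged.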

Sub-Strings & Concatenation

Cpp

#include <string>
#include <string_view>

int main() {
    std::string greeting = "Hello";
    std::string to       = "World!";

    std::string msg = greeting + ", " + to;
    // msg is a new string of "Hello, World!"
    
    std::string_view view(msg);
    // create a "view" into the string

    view.substr(0, 5);
    // returns new string_view of "Hello"

    view.substr(7);
    // returns new string_view of "World!"
}

JavaScript

const greeting = "Hello";
const to       = "World!";

const msg = greeting + ", " + to;
// msg is a new string of "Hello, World!"


msg.slice(0, 5);
// returns new string "Hello"

msg.slice(7);
// returns new string "World!"

What This Code Does

This code demonstrates concatenation (combining) of strings as well as taking sub-sections of strings. Such code is commonly found across application domains.

What's the Same

Both implementations use built-in string operations to concatenate strings and to look at sub-sections of a string.

What's Different

C++ makes use of something called std::string_view, whereas the JS examples simply return new strings. A std::string_view behaves like a read-only string, but it doesn't copy any characters when looking at sub-sections of a string; it only refers to data owned by another string, so it must not outlive that string.

std::string_view is also useful when you want to manipulate the "view" of a string without actually changing (mutating) the underlying data. This is particularly useful when writing methods that accept data immutably, but can still "simulate" changes locally without any unseen "effects" (if you come from a functional programming background, the idea of avoiding effects should be very familiar).

It's worth noting that strings in JS are immutable. Thus, any method that returns a portion of a string, such as slice, could under the covers return a view into the same data rather than a copy.

String Formatting

Cpp

#include <array>
#include <iostream>
#include <iomanip>
#include <string>

int main() {
    std::string name = "FOOBAR";
    auto max_gain = 0.15f;
    auto max_loss = 0.76f;

    std::cout << "Weekly Changes: " << name
        << " +" << (max_gain * 100) << "%"
        << " -" << (max_loss * 100) << "%"
        << "\n";
    // prints "Weekly Changes: FOOBAR +15% -76%"

    // Print list as a right-aligned list of hex values
    std::array some_numbers { 123456, 90346873, 28 };

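    // Save std::cout's current formatting so it can be restored later
    std::ios state(nullptr);
    state.copyfmt(std::cout);

    for (auto num: some_numbers) {
        std::cout << "\t0x";
        std::cout << std::hex << std::uppercase << std::setw(8)
            << std::setfill('0') << num;
        // Restore the saved formatting so later output is unaffected
        std::cout.copyfmt(state);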
        std::cout << "\n";
    }
    /* Prints:
        0x0001E240
        0x05629579
        0x0000001C
    */
}

JavaScript

const name = "FOOBAR";
const max_gain = 0.15;
const max_loss = 0.76;

console.log(`Weekly Changes: ${name} +${max_gain * 100}%` +
    ` -${max_loss * 100}%`);
// prints "Weekly Changes: FOOBAR +15% -76%"

// Print list as a right-aligned list of hex values
const someNumbers = [123456, 90346873, 28];
for (const num of someNumbers) {
    // padStart left-pads the hex digits with zeros to 8 characters
    const hex = num.toString(16).padStart(8, "0");
    console.log(`\t0x${hex}`);
}
/* Prints:
    0x0001e240
    0x05629579
    0x0000001c
*/

What This Code Does

This code demonstrates how string formatting is done with two very simple examples. The first prints a stock symbol and its maximum change (positive and negative), representing each floating-point number as a percentage. The second prints a list of base-10 integers as hexadecimal values. Every hex value has a 0x prefix and is eight digits wide (left-padded with zeros).

What's the Same

In both languages, the percentage values in the first example have to be formatted manually, by multiplying by 100 and appending a % sign.

What's Different

While the JS version relies on string interpolation for formatting, C++ relies on an iostream (Input/Output Stream). An iostream is more than a simple mechanism for printing: it lets us insert manipulators that change how subsequent values sent through the stream are formatted. Note in particular this bit of code:

std::cout << std::hex << std::uppercase << std::setw(8)
    << std::setfill('0') << num;

These manipulators tell the stream that we want to print hex values, in uppercase, with a width of 8 characters and a "fill" character of 0 (our left-padding). Then we print the value num, which is formatted according to all of the settings we've applied to the stream. std::hex, std::uppercase, and std::setfill are "sticky": they stay in effect until changed, whereas std::setw applies only to the next value printed. While this can be an extremely powerful abstraction, it is also a little cumbersome when formatted data is interleaved with other output that shouldn't follow the same rules. That is why we save the stream's original formatting in the state object and restore it with copyfmt after printing each number.

Because of C++'s close relationship with C, we can also use the printf/sprintf family of functions for formatting, in which case our hex-formatting code would look like this:

printf("\t0x%08X\n", num);

printf-style formatting is also often recommended for performance-critical portions of code.
