UTL

Collection of self-contained header-only libraries for C++17

utl::stre

The utl::stre (aka string expansions) header contains implementations of the most commonly used string utils.

Motivation: Despite the seeming triviality of the topic, many implementations found online are either horribly inefficient or contain outright bugs in some edge cases. The goal here is to “get it right” once, so that no time is spent reinventing the wheel in the future.

Definitions

// Trimming
template <class T> std::string trim_left( T&& str, char trimmed_char = ' ');
template <class T> std::string trim_right(T&& str, char trimmed_char = ' ');
template <class T> std::string trim(      T&& str, char trimmed_char = ' ');

// Padding
std::string pad_left( std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad_right(std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad(      std::string_view str, std::size_t length, char padding_char = ' ');

std::string pad_with_leading_zeroes(unsigned int number, std::size_t length = 10);

// Case conversions
template <class T> std::string to_lower(T&& str);
template <class T> std::string to_upper(T&& str);

// Substring checks
bool starts_with(std::string_view str, std::string_view substr);
bool ends_with(  std::string_view str, std::string_view substr);
bool contains(   std::string_view str, std::string_view substr);

// Token manipulation
template <class T> std::string replace_all_occurrences(T&& str, std::string_view from, std::string_view to);

std::vector<std::string> split_by_delimiter(std::string_view str, std::string_view delimiter, bool keep_empty_tokens = false);

// Other utils
std::string repeat_char(              char  ch, std::size_t repeats);
std::string repeat_string(std::string_view str, std::size_t repeats);

std::string escape_control_chars(std::string_view str);

std::size_t index_of_difference(std::string_view str_1, std::string_view str_2);

Note: Functions that can reuse a mutable input string for a more efficient implementation are declared with template <class T> and use perfect forwarding. R-value arguments are therefore reused automatically, while l-values are copied.
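To illustrate the forwarding pattern described in the note, here is a minimal sketch. The function `underscored_sketch` is hypothetical and not part of utl::stre; it only shows how a `T&&` parameter can reuse an r-value buffer while copying l-values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical function, NOT part of utl::stre: replaces spaces with underscores.
template <class T>
std::string underscored_sketch(T&& str) {
    // Direct-initialization moves from an r-value std::string (reusing its buffer)
    // and copies from l-values, string literals and std::string_view.
    std::string res(std::forward<T>(str));
    std::replace(res.begin(), res.end(), ' ', '_');
    return res;
}

inline void underscored_sketch_demo() {
    std::string text = "lorem ipsum";
    assert(underscored_sketch(text) == "lorem_ipsum");           // l-value: copied
    assert(text == "lorem ipsum");                               // original is untouched
    assert(underscored_sketch(std::string("a b c")) == "a_b_c"); // r-value: moved
}
```

Taking `std::string` by value would give similar behavior for `std::string` arguments; the forwarding reference additionally accepts literals and views without an intermediate conversion at the call site.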

Methods

Trimming

template <class T> std::string trim_left( T&& str, char trimmed_char = ' ');
template <class T> std::string trim_right(T&& str, char trimmed_char = ' ');
template <class T> std::string trim(      T&& str, char trimmed_char = ' ');

Trims characters equal to trimmed_char from the left / right / both sides of the string str.
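One possible way to implement this kind of trimming is sketched below. `trim_sketch` is a hypothetical name and takes a `std::string_view` for brevity, so it is not the library's actual implementation; it mainly shows the edge case where the whole string consists of `trimmed_char`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, NOT the utl::stre implementation.
std::string trim_sketch(std::string_view str, char trimmed_char = ' ') {
    const std::size_t first = str.find_first_not_of(trimmed_char);
    // Empty input or nothing but trimmed characters: the result is empty.
    if (first == std::string_view::npos) return {};
    // A last kept character must exist because a first one does.
    const std::size_t last = str.find_last_not_of(trimmed_char);
    return std::string(str.substr(first, last - first + 1));
}

inline void trim_sketch_demo() {
    assert(trim_sketch("   lorem ipsum   ") == "lorem ipsum");
    assert(trim_sketch("__ASSERT_MACRO__", '_') == "ASSERT_MACRO");
    assert(trim_sketch("     ").empty());
}
```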

Padding

std::string pad_left( std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad_right(std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad(      std::string_view str, std::size_t length, char padding_char = ' ');

Pads string str with characters padding_char from left / right / both sides until it reaches size length.

Note: If str.size() >= length, the string is returned unchanged.
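The padding rules can be sketched as follows. The `_sketch` functions are hypothetical, and the handling of an odd amount of two-sided padding (the extra character goes to the right) is an assumption, since the text above does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketches, NOT the utl::stre implementation.
std::string pad_left_sketch(std::string_view str, std::size_t length, char padding_char = ' ') {
    if (str.size() >= length) return std::string(str); // already long enough
    std::string res(length - str.size(), padding_char);
    res.append(str);
    return res;
}

std::string pad_sketch(std::string_view str, std::size_t length, char padding_char = ' ') {
    if (str.size() >= length) return std::string(str);
    const std::size_t total = length - str.size();
    const std::size_t left  = total / 2;
    const std::size_t right = total - left; // assumption: odd remainder goes to the right
    std::string res;
    res.reserve(length);
    res.append(left, padding_char);
    res.append(str);
    res.append(right, padding_char);
    return res;
}

inline void pad_sketch_demo() {
    assert(pad_left_sketch("value", 9) == "    value");
    assert(pad_sketch(" label ", 15, '-') == "---- label ----");
    assert(pad_sketch("too long", 3) == "too long"); // unchanged
}
```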

std::string pad_with_leading_zeroes(unsigned int number, std::size_t length = 10);

Pads the given integer number with leading zeroes until the resulting string reaches length characters. Useful for numbering files/data entries so they can be lexicographically sorted.

Note: If number has more than length digits, the resulting string is the same as std::to_string(number).
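A minimal sketch of this behavior, using a hypothetical function name and assuming nothing beyond `std::to_string`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, NOT the utl::stre implementation.
std::string pad_with_leading_zeroes_sketch(unsigned int number, std::size_t length = 10) {
    const std::string digits = std::to_string(number);
    if (digits.size() >= length) return digits; // nothing to pad
    return std::string(length - digits.size(), '0') + digits;
}

inline void pad_with_leading_zeroes_sketch_demo() {
    assert(pad_with_leading_zeroes_sketch(17) == "0000000017");
    assert(pad_with_leading_zeroes_sketch(7, 3) == "007");
    assert(pad_with_leading_zeroes_sketch(12345, 3) == "12345"); // more digits than length
}
```

With equal widths, lexicographic order matches numeric order: "007" sorts before "012", whereas "7" would sort after "12".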

Case conversions

template <class T> std::string to_lower(T&& str);

Replaces all uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ in the string str with corresponding lowercase letters abcdefghijklmnopqrstuvwxyz.

template <class T> std::string to_upper(T&& str);

Replaces all lowercase letters abcdefghijklmnopqrstuvwxyz in the string str with corresponding uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ.
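Because only the 26 Latin letters are affected, the conversion can be written without locale-dependent functions. The sketch below uses hypothetical names, takes the string by value for brevity, and assumes an ASCII-compatible encoding in which the letters are contiguous.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketches, NOT the utl::stre implementation.
// Assumption: ASCII-compatible encoding ('A'..'Z' and 'a'..'z' are contiguous).
std::string to_lower_sketch(std::string str) {
    for (char& ch : str)
        if (ch >= 'A' && ch <= 'Z') ch = static_cast<char>(ch - 'A' + 'a');
    return str;
}

std::string to_upper_sketch(std::string str) {
    for (char& ch : str)
        if (ch >= 'a' && ch <= 'z') ch = static_cast<char>(ch - 'a' + 'A');
    return str;
}

inline void case_sketch_demo() {
    assert(to_lower_sketch("Lorem Ipsum 42!") == "lorem ipsum 42!"); // non-letters untouched
    assert(to_upper_sketch("lorem ipsum") == "LOREM IPSUM");
}
```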

Substring checks

bool starts_with(std::string_view str, std::string_view substr);
bool ends_with(  std::string_view str, std::string_view substr);
bool contains(   std::string_view str, std::string_view substr);

Returns true if string str starts with / ends with / contains the substring substr.
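In C++17 these checks can be expressed with `std::string_view::compare` and `find`, as in the hypothetical sketch below (C++20 later added `starts_with` / `ends_with` members to the standard string types). The length guard is the part that naive implementations tend to miss.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical sketches, NOT the utl::stre implementation.
bool starts_with_sketch(std::string_view str, std::string_view substr) {
    // Guarding on size keeps compare() from looking at a shorter string.
    return str.size() >= substr.size() && str.compare(0, substr.size(), substr) == 0;
}

bool ends_with_sketch(std::string_view str, std::string_view substr) {
    return str.size() >= substr.size() &&
           str.compare(str.size() - substr.size(), substr.size(), substr) == 0;
}

bool contains_sketch(std::string_view str, std::string_view substr) {
    return str.find(substr) != std::string_view::npos;
}

inline void substring_checks_sketch_demo() {
    assert(starts_with_sketch("lorem ipsum", "lorem"));
    assert(ends_with_sketch("lorem ipsum", "ipsum"));
    assert(contains_sketch("lorem ipsum", "em ip"));
    assert(!ends_with_sketch("um", "ipsum")); // substr longer than str
}
```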

Token manipulation

template <class T> std::string replace_all_occurrences(T&& str, std::string_view from, std::string_view to);

Scans through the string str and replaces all occurrences of the substring from with the string to.
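A common way to write such a replacement loop is sketched below under a hypothetical name. Two details are assumptions rather than documented behavior: an empty `from` is treated as a no-op, and the scan resumes after the inserted text so that replacements are never rescanned.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch, NOT the utl::stre implementation.
std::string replace_all_sketch(std::string str, std::string_view from, std::string_view to) {
    if (from.empty()) return str; // assumption: an empty pattern matches nothing
    std::size_t pos = 0;
    while ((pos = str.find(from, pos)) != std::string::npos) {
        str.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text to avoid an endless loop
    }
    return str;
}

inline void replace_all_sketch_demo() {
    assert(replace_all_sketch("xxxAAxxxAAxxx", "AA", "BBB") == "xxxBBBxxxBBBxxx");
    assert(replace_all_sketch("aaa", "a", "aa") == "aaaaaa"); // 'to' contains 'from'
}
```

The second assertion is the case that loops forever when the search restarts at `pos` instead of `pos + to.size()`.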

std::vector<std::string> split_by_delimiter(std::string_view str, std::string_view delimiter, bool keep_empty_tokens = false);

Splits string str into a vector of std::string tokens based on delimiter.

By default keep_empty_tokens is false and "" is not considered to be a valid token — in case of leading / trailing / repeated delimiters, only non-empty tokens are going to be inserted into the resulting vector. Setting keep_empty_tokens to true overrides this behavior and keeps all the empty tokens intact.
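The effect of keep_empty_tokens can be seen in the hypothetical sketch below. It is not the library's code, and its treatment of an empty delimiter (the input is returned as a single token) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch, NOT the utl::stre implementation.
std::vector<std::string> split_sketch(std::string_view str, std::string_view delimiter,
                                      bool keep_empty_tokens = false) {
    // Assumption: an empty delimiter never matches, so the input is one token.
    if (delimiter.empty()) return {std::string(str)};

    std::vector<std::string> tokens;
    std::size_t begin = 0;
    while (true) {
        const std::size_t end = str.find(delimiter, begin);
        // substr() clamps the count, so with end == npos this takes the rest of str.
        const std::string_view token = str.substr(begin, end - begin);
        if (keep_empty_tokens || !token.empty()) tokens.emplace_back(token);
        if (end == std::string_view::npos) break;
        begin = end + delimiter.size();
    }
    return tokens;
}

inline void split_sketch_demo() {
    assert(split_sketch("aaa,bbb,ccc,", ",").size() == 3);       // trailing "" dropped
    assert(split_sketch("aaa,bbb,ccc,", ",", true).size() == 4); // trailing "" kept
    assert(split_sketch(",,a,,", ",") == std::vector<std::string>{"a"});
}
```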

Other utils

std::string repeat_char(              char  ch, std::size_t repeats);
std::string repeat_string(std::string_view str, std::size_t repeats);

Repeats the given character or string repeats times and returns the result as a string.
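Both functions are small; a hypothetical sketch (not the library's code) might reserve the final size up front so that the string is allocated only once:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketches, NOT the utl::stre implementation.
std::string repeat_char_sketch(char ch, std::size_t repeats) {
    return std::string(repeats, ch); // fill constructor: 'repeats' copies of 'ch'
}

std::string repeat_string_sketch(std::string_view str, std::size_t repeats) {
    std::string res;
    res.reserve(str.size() * repeats); // single allocation for the whole result
    for (std::size_t i = 0; i < repeats; ++i) res.append(str);
    return res;
}

inline void repeat_sketch_demo() {
    assert(repeat_char_sketch('h', 7) == "hhhhhhh");
    assert(repeat_string_sketch("xo-", 5) == "xo-xo-xo-xo-xo-");
    assert(repeat_string_sketch("ab", 0).empty());
}
```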

std::string escape_control_chars(std::string_view str);

Escapes all control & non-printable characters in the string str using standard C++ notation (see the corresponding example below).

Useful when printing strings to the terminal during logging & debugging.
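A rough sketch of such escaping is shown below under a hypothetical name. The named escapes follow standard C++ notation; rendering every other non-printable byte as a two-digit `\xNN` hex escape is an assumption made for illustration and may differ from what the library prints.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical sketch, NOT the utl::stre implementation.
std::string escape_control_chars_sketch(std::string_view str) {
    std::string res;
    res.reserve(str.size());
    for (const char ch : str) {
        switch (ch) {
        case '\0': res += "\\0"; break;
        case '\a': res += "\\a"; break;
        case '\b': res += "\\b"; break;
        case '\f': res += "\\f"; break;
        case '\n': res += "\\n"; break;
        case '\r': res += "\\r"; break;
        case '\t': res += "\\t"; break;
        case '\v': res += "\\v"; break;
        default:
            if (std::isprint(static_cast<unsigned char>(ch))) {
                res += ch; // printable characters pass through unchanged
            } else {
                // Assumption: other non-printable bytes become \xNN hex escapes.
                constexpr char hex[] = "0123456789abcdef";
                const auto byte = static_cast<unsigned char>(ch);
                res += "\\x";
                res += hex[byte >> 4];
                res += hex[byte & 0x0F];
            }
        }
    }
    return res;
}

inline void escape_control_chars_sketch_demo() {
    assert(escape_control_chars_sketch("this text\r will") == "this text\\r will");
    assert(escape_control_chars_sketch("a\tb\n") == "a\\tb\\n");
}
```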

std::size_t index_of_difference(std::string_view str_1, std::string_view str_2);

Returns the index of the first position at which strings str_1 and str_2 differ.

When both strings are the same, returns str_1.size().

Throws std::logic_error if str_1.size() != str_2.size().
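The three rules above (first differing index, size for equal strings, exception on a size mismatch) can be sketched with `std::mismatch`. The function name is hypothetical, and the sketch assumes the exception meant is the standard `std::logic_error` from `<stdexcept>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical sketch, NOT the utl::stre implementation.
std::size_t index_of_difference_sketch(std::string_view str_1, std::string_view str_2) {
    if (str_1.size() != str_2.size())
        throw std::logic_error("index_of_difference_sketch: strings differ in size");
    // mismatch() stops at the first differing pair, or at end() if there is none.
    const auto first_diff = std::mismatch(str_1.begin(), str_1.end(), str_2.begin()).first;
    return static_cast<std::size_t>(first_diff - str_1.begin());
}

inline void index_of_difference_sketch_demo() {
    assert(index_of_difference_sketch("xxxAxx", "xxxxxx") == 3);
    assert(index_of_difference_sketch("same", "same") == 4); // equal: returns size()
    bool threw = false;
    try {
        index_of_difference_sketch("ab", "abc");
    } catch (const std::logic_error&) {
        threw = true;
    }
    assert(threw);
}
```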

Examples

Trimming strings

using namespace utl;

assert(stre::trim_left( "   lorem ipsum   ") ==    "lorem ipsum   ");
assert(stre::trim_right("   lorem ipsum   ") == "   lorem ipsum"   );
assert(stre::trim(      "   lorem ipsum   ") ==    "lorem ipsum"   );

assert(stre::trim("__ASSERT_MACRO__", '_') == "ASSERT_MACRO");

Padding strings

using namespace utl;

assert(stre::pad_left( "value", 9) == "    value");
assert(stre::pad_right("value", 9) == "value    ");
assert(stre::pad(      "value", 9) == "  value  ");

assert(stre::pad(" label ", 15, '-') == "---- label ----");

assert(stre::pad_with_leading_zeroes(17) == "0000000017");

Converting string case

using namespace utl;

assert(stre::to_lower("Lorem Ipsum") == "lorem ipsum");
assert(stre::to_upper("lorem ipsum") == "LOREM IPSUM");

Using substring checks

using namespace utl;

assert(stre::starts_with("lorem ipsum", "lorem"));
assert(stre::ends_with(  "lorem ipsum", "ipsum"));
assert(stre::contains(   "lorem ipsum", "em ip"));

Performing token manipulations

using namespace utl;

// Replacing tokens
assert(stre::replace_all_occurrences("xxxAAxxxAAxxx",  "AA",  "BBB") == "xxxBBBxxxBBBxxx" );

// Splitting by delimiter
auto tokens = stre::split_by_delimiter("aaa,bbb,ccc,", ",");
assert(tokens.size() == 3);
assert(tokens[0] == "aaa");
assert(tokens[1] == "bbb");
assert(tokens[2] == "ccc");

// Splitting by complex delimiter while keeping the empty tokens
tokens = stre::split_by_delimiter("(---)lorem(---)ipsum", "(---)", true);
assert(tokens.size() == 3);
assert(tokens[0] == "");
assert(tokens[1] == "lorem");
assert(tokens[2] == "ipsum");

Using other utilities

using namespace utl;

// Repeating chars/strings
assert(stre::repeat_char(    'h', 7) == "hhhhhhh"        );
assert(stre::repeat_string("xo-", 5) == "xo-xo-xo-xo-xo-");

// Escaping control chars in a string   
const std::string text = "this text\r will get messed up due to\r carriage returns.";
std::cout
    << "Original string prints like this:\n" <<                            text  << "\n\n"
    << "Escaped  string prints like this:\n" << stre::escape_control_chars(text) << "\n\n";

// Getting index of difference
assert(stre::index_of_difference("xxxAxx", "xxxxxx") == 3);

Output:

Original string prints like this:
 carriage returns.p due to

Escaped  string prints like this:
this text\r will get messed up due to\r carriage returns.