utl::stre
The utl::stre (aka string expansions) header contains implementations of the most commonly used string utilities.
Motivation: Despite the seeming triviality of the topic, a lot of implementations found online are either horribly inefficient or contain straight-up bugs in some edge cases. Here, the goal is to “get it right” so that no time is spent reinventing the wheel in the future.
Definitions
// Trimming
template <class T> std::string trim_left( T&& str, char trimmed_char = ' ');
template <class T> std::string trim_right(T&& str, char trimmed_char = ' ');
template <class T> std::string trim( T&& str, char trimmed_char = ' ');
// Padding
std::string pad_left( std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad_right(std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad( std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad_with_leading_zeroes(unsigned int number, std::size_t length = 10);
// Case conversions
template <class T> std::string to_lower(T&& str);
template <class T> std::string to_upper(T&& str);
// Substring checks
bool starts_with(std::string_view str, std::string_view substr);
bool ends_with( std::string_view str, std::string_view substr);
bool contains( std::string_view str, std::string_view substr);
// Token manipulation
template<class T> std::string replace_all_occurrences(T&& str, std::string_view from, std::string_view to);
std::vector<std::string> split_by_delimiter(std::string_view str, std::string_view delimiter, bool keep_empty_tokens = false);
// Other utils
std::string repeat_char( char ch, std::size_t repeats);
std::string repeat_string(std::string_view str, std::size_t repeats);
std::string escape_control_chars(std::string_view str);
std::size_t index_of_difference(std::string_view str_1, std::string_view str_2);
Note: Functions that can use a mutable input string for a more efficient implementation are declared with template <class T> and use perfect forwarding. This means that r-value arguments are automatically reused, while l-values are copied.
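The forwarding behavior described in the note can be illustrated with a minimal sketch. The helper add_bang_sketch is hypothetical and not part of utl::stre; it only shows how a T&& parameter lets a function reuse an r-value buffer while leaving l-values untouched.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (NOT part of utl::stre): appends '!' to the input.
// T&& is a forwarding reference, so the local string is move-constructed
// from r-value std::string arguments and copy-constructed from l-values.
template <class T>
std::string add_bang_sketch(T&& str) {
    std::string res(std::forward<T>(str)); // move for r-values, copy for l-values
    res += '!';
    return res;
}
```

Passing a named std::string leaves the caller's variable intact, while passing a temporary (or std::move(...)) avoids the extra copy. String literals also work, since they convert to std::string.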
Methods
Trimming
template <class T> std::string trim_left( T&& str, char trimmed_char = ' ');
template <class T> std::string trim_right(T&& str, char trimmed_char = ' ');
template <class T> std::string trim( T&& str, char trimmed_char = ' ');
Trims characters equal to trimmed_char from the left / right / both sides of the string str.
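For illustration, trimming with these semantics could be written roughly as follows. This is a sketch under the assumption that a single erase() call per side is enough; the _sketch names are hypothetical and this is not necessarily how utl::stre implements it.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <class T>
std::string trim_left_sketch(T&& str, char trimmed_char = ' ') {
    std::string res(std::forward<T>(str));
    // find_first_not_of() returns npos when every char matches,
    // in which case erase(0, npos) clears the whole string
    res.erase(0, res.find_first_not_of(trimmed_char));
    return res;
}

template <class T>
std::string trim_right_sketch(T&& str, char trimmed_char = ' ') {
    std::string res(std::forward<T>(str));
    // find_last_not_of() returns npos when every char matches,
    // npos + 1 wraps around to 0 and erase(0) clears the whole string
    res.erase(res.find_last_not_of(trimmed_char) + 1);
    return res;
}

template <class T>
std::string trim_sketch(T&& str, char trimmed_char = ' ') {
    // the inner call returns a temporary, so the outer call reuses its buffer
    return trim_right_sketch(trim_left_sketch(std::forward<T>(str), trimmed_char), trimmed_char);
}
```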
Padding
std::string pad_left( std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad_right(std::string_view str, std::size_t length, char padding_char = ' ');
std::string pad( std::string_view str, std::size_t length, char padding_char = ' ');
Pads string str with characters padding_char from the left / right / both sides until it reaches size length.
Note: If str.size() >= length, the string is left unchanged.
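A minimal sketch of the padding logic is shown below. The _sketch names are hypothetical, and how two-sided padding splits an odd number of padding characters is an assumption of this sketch (the extra character goes to the right); the description above does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::string pad_left_sketch(std::string_view str, std::size_t length, char padding_char = ' ') {
    if (str.size() >= length) return std::string(str); // nothing to pad
    std::string res(length - str.size(), padding_char);
    res.append(str);
    return res;
}

std::string pad_right_sketch(std::string_view str, std::size_t length, char padding_char = ' ') {
    std::string res(str);
    if (res.size() < length) res.append(length - res.size(), padding_char);
    return res;
}

std::string pad_sketch(std::string_view str, std::size_t length, char padding_char = ' ') {
    if (str.size() >= length) return std::string(str); // nothing to pad
    const std::size_t total = length - str.size();
    const std::size_t left  = total / 2; // ASSUMPTION: odd leftover goes to the right side
    std::string res;
    res.reserve(length);
    res.append(left, padding_char);
    res.append(str);
    res.append(total - left, padding_char);
    return res;
}
```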
std::string pad_with_leading_zeroes(unsigned int number, std::size_t length = 10);
Pads the given integer with leading zeroes until its length reaches length. Useful for numbering files/data entries so they can be lexicographically sorted.
Note: If number has more than length digits, the resulting string is the same as std::to_string(number).
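The behavior described above can be approximated on top of std::to_string. This is a sketch with a hypothetical name, not the library's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string pad_with_leading_zeroes_sketch(unsigned int number, std::size_t length = 10) {
    const std::string digits = std::to_string(number);
    if (digits.size() >= length) return digits; // already long enough, no zeroes added
    return std::string(length - digits.size(), '0') + digits;
}
```

With a fixed width, names such as "0002" and "0010" sort in numeric order even under plain lexicographic comparison, which is not the case for "2" and "10".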
Case conversions
template <class T> std::string to_lower(T&& str);
Replaces all uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ in the string str with the corresponding lowercase letters abcdefghijklmnopqrstuvwxyz.
template <class T> std::string to_upper(T&& str);
Replaces all lowercase letters abcdefghijklmnopqrstuvwxyz in the string str with the corresponding uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ.
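Since only the 26 Latin letters are affected, a locale-independent conversion can be sketched with plain range checks. The _sketch names are hypothetical, and the code assumes an ASCII-compatible character encoding; the library itself may do this differently.

```cpp
#include <cassert>
#include <string>
#include <utility>

template <class T>
std::string to_lower_sketch(T&& str) {
    std::string res(std::forward<T>(str));
    for (char& ch : res)
        if (ch >= 'A' && ch <= 'Z') ch = static_cast<char>(ch - 'A' + 'a'); // ASCII assumed
    return res;
}

template <class T>
std::string to_upper_sketch(T&& str) {
    std::string res(std::forward<T>(str));
    for (char& ch : res)
        if (ch >= 'a' && ch <= 'z') ch = static_cast<char>(ch - 'a' + 'A'); // ASCII assumed
    return res;
}
```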
Substring checks
bool starts_with(std::string_view str, std::string_view substr);
bool ends_with( std::string_view str, std::string_view substr);
bool contains( std::string_view str, std::string_view substr);
Returns true if string str starts with / ends with / contains the substring substr.
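In C++17 these checks can be expressed through std::string_view::compare() and find(). The sketch below uses hypothetical _sketch names and is only one possible way to write them; note that an empty substr matches any string here.

```cpp
#include <cassert>
#include <string_view>

bool starts_with_sketch(std::string_view str, std::string_view substr) {
    // the size check keeps the compared range inside 'str'
    return str.size() >= substr.size() && str.compare(0, substr.size(), substr) == 0;
}

bool ends_with_sketch(std::string_view str, std::string_view substr) {
    return str.size() >= substr.size() &&
           str.compare(str.size() - substr.size(), substr.size(), substr) == 0;
}

bool contains_sketch(std::string_view str, std::string_view substr) {
    return str.find(substr) != std::string_view::npos;
}
```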
Token manipulation
template<class T> std::string replace_all_occurrences(T&& str, std::string_view from, std::string_view to);
Scans through the string str and replaces all occurrences of the substring from with the string to.
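A common pitfall with this operation is rescanning freshly inserted text, which can loop forever when to contains from. The following sketch (hypothetical name, not the library's code) avoids that by advancing past each replacement; treating an empty from as "nothing to replace" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

template <class T>
std::string replace_all_occurrences_sketch(T&& str, std::string_view from, std::string_view to) {
    std::string res(std::forward<T>(str));
    if (from.empty()) return res; // ASSUMPTION: an empty pattern matches nothing
    std::size_t pos = 0;
    while ((pos = res.find(from, pos)) != std::string::npos) {
        res.replace(pos, from.size(), to);
        pos += to.size(); // skip the inserted text so it is never rescanned
    }
    return res;
}
```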
std::vector<std::string> split_by_delimiter(std::string_view str, std::string_view delimiter, bool keep_empty_tokens = false);
Splits string str into a vector of std::string tokens based on delimiter.
By default keep_empty_tokens is false and "" is not considered to be a valid token: in case of leading / trailing / repeated delimiters, only non-empty tokens are inserted into the resulting vector. Setting keep_empty_tokens to true overrides this behavior and keeps all the empty tokens intact.
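The splitting rules above can be sketched as a single find() loop. The name split_by_delimiter_sketch is hypothetical, and two details are assumptions of this sketch rather than documented behavior: an empty delimiter returns the whole input as one token, and with keep_empty_tokens set a trailing delimiter produces a trailing empty token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

std::vector<std::string> split_by_delimiter_sketch(std::string_view str, std::string_view delimiter,
                                                   bool keep_empty_tokens = false) {
    if (delimiter.empty()) return {std::string(str)}; // ASSUMPTION: nothing to split by

    std::vector<std::string> tokens;
    std::size_t begin = 0;
    while (true) {
        const std::size_t end = str.find(delimiter, begin);
        // when 'end' is npos, substr() clamps the count to the rest of the string
        const std::string_view token = str.substr(begin, end - begin);
        if (keep_empty_tokens || !token.empty()) tokens.emplace_back(token);
        if (end == std::string_view::npos) break;
        begin = end + delimiter.size(); // continue right after the delimiter
    }
    return tokens;
}
```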
Other utils
std::string repeat_char( char ch, std::size_t repeats);
std::string repeat_string(std::string_view str, std::size_t repeats);
Repeats the given character or string the given number of times and returns the result as a string.
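As a rough sketch (hypothetical names, not the library's code), repeating a character maps directly onto a std::string constructor, while repeating a string benefits from reserving the final size up front so the buffer is allocated only once.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

std::string repeat_char_sketch(char ch, std::size_t repeats) {
    return std::string(repeats, ch); // fill constructor: 'repeats' copies of 'ch'
}

std::string repeat_string_sketch(std::string_view str, std::size_t repeats) {
    std::string res;
    res.reserve(str.size() * repeats); // single allocation for the whole result
    for (std::size_t i = 0; i < repeats; ++i) res.append(str);
    return res;
}
```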
std::string escape_control_chars(std::string_view str);
Escapes all control & non-printable characters in the string str using standard C++ notation (see the corresponding example for a better idea). Useful when printing strings to the terminal during logging & debugging.
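One possible way to perform such escaping is sketched below under the name escape_control_chars_sketch (hypothetical). Characters with a named C++ escape sequence are written using that sequence. Rendering the remaining non-printable characters as two-digit \xNN codes, and leaving the backslash itself untouched, are assumptions of this sketch; the library may handle those cases differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

std::string escape_control_chars_sketch(std::string_view str) {
    static const char hex_digits[] = "0123456789abcdef";
    std::string res;
    res.reserve(str.size());
    for (const char ch : str) {
        switch (ch) {
        case '\0': res += "\\0"; break;
        case '\a': res += "\\a"; break;
        case '\b': res += "\\b"; break;
        case '\t': res += "\\t"; break;
        case '\n': res += "\\n"; break;
        case '\v': res += "\\v"; break;
        case '\f': res += "\\f"; break;
        case '\r': res += "\\r"; break;
        default: {
            const auto uch = static_cast<unsigned char>(ch); // avoid UB in std::isprint
            if (std::isprint(uch)) {
                res += ch;
            } else {
                // ASSUMPTION: other non-printable chars become two-digit hex codes
                res += "\\x";
                res += hex_digits[uch >> 4];
                res += hex_digits[uch & 0xF];
            }
        }
        }
    }
    return res;
}
```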
std::size_t index_of_difference(std::string_view str_1, std::string_view str_2);
Returns the index of the first character that differs between strings str_1 and str_2.
When both strings are the same, returns str_1.size().
Throws std::logic_error if str_1.size() != str_2.size().
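The contract above maps naturally onto std::mismatch. The following is a sketch with a hypothetical name and an invented exception message, assuming the standard std::logic_error type is what gets thrown on a size mismatch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

std::size_t index_of_difference_sketch(std::string_view str_1, std::string_view str_2) {
    if (str_1.size() != str_2.size())
        throw std::logic_error("index_of_difference_sketch: strings must have equal sizes");
    // std::mismatch returns iterators to the first differing pair,
    // or the end iterators when both ranges are equal
    const auto mismatch = std::mismatch(str_1.begin(), str_1.end(), str_2.begin());
    return static_cast<std::size_t>(mismatch.first - str_1.begin());
}
```

For equal strings the returned iterator is str_1.end(), so the result is str_1.size(), matching the description above.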
Examples
Trimming strings
using namespace utl;
assert(stre::trim_left( " lorem ipsum ") == "lorem ipsum ");
assert(stre::trim_right(" lorem ipsum ") == " lorem ipsum" );
assert(stre::trim( " lorem ipsum ") == "lorem ipsum" );
assert(stre::trim("__ASSERT_MACRO__", '_') == "ASSERT_MACRO");
Padding strings
using namespace utl;
assert(stre::pad_left( "value", 9) == "    value");
assert(stre::pad_right("value", 9) == "value    ");
assert(stre::pad(      "value", 9) == "  value  ");
assert(stre::pad(" label ", 15, '-') == "---- label ----");
assert(stre::pad_with_leading_zeroes(17) == "0000000017");
Converting string case
using namespace utl;
assert(stre::to_lower("Lorem Ipsum") == "lorem ipsum");
assert(stre::to_upper("lorem ipsum") == "LOREM IPSUM");
Using substring checks
using namespace utl;
assert(stre::starts_with("lorem ipsum", "lorem"));
assert(stre::ends_with( "lorem ipsum", "ipsum"));
assert(stre::contains( "lorem ipsum", "em ip"));
Performing token manipulations
using namespace utl;
// Replacing tokens
assert(stre::replace_all_occurrences("xxxAAxxxAAxxx", "AA", "BBB") == "xxxBBBxxxBBBxxx" );
// Splitting by delimiter
auto tokens = stre::split_by_delimiter("aaa,bbb,ccc,", ",");
assert(tokens.size() == 3);
assert(tokens[0] == "aaa");
assert(tokens[1] == "bbb");
assert(tokens[2] == "ccc");
// Splitting by complex delimiter while keeping the empty tokens
tokens = stre::split_by_delimiter("(---)lorem(---)ipsum", "(---)", true);
assert(tokens.size() == 3);
assert(tokens[0] == "");
assert(tokens[1] == "lorem");
assert(tokens[2] == "ipsum");
Using other utilities
using namespace utl;
// Repeating chars/strings
assert(stre::repeat_char( 'h', 7) == "hhhhhhh" );
assert(stre::repeat_string("xo-", 5) == "xo-xo-xo-xo-xo-");
// Escaping control chars in a string
const std::string text = "this text\r will get messed up due to\r carriage returns.";
std::cout
<< "Original string prints like this:\n" << text << "\n\n"
<< "Escaped string prints like this:\n" << stre::escape_control_chars(text) << "\n\n";
// Getting index of difference
assert(stre::index_of_difference("xxxAxx", "xxxxxx") == 3);
Output:
Original string prints like this:
carriage returns.p due to
Escaped string prints like this:
this text\r will get messed up due to\r carriage returns.