
Everything You Need to Know About Text String Manipulation

by Rutkat, July 23rd, 2021

Too Long; Didn't Read

For those new to coding or even experienced coders, this guide details how to manipulate text strings, just like the pros.


Whether you are new to coding or already experienced, this guide details how to manipulate text strings just like the pros, using JavaScript. It is useful if you haven't worked with strings or user-facing web applications. You will quickly go from beginner to expert using JavaScript's built-in methods and powerful regular expressions. If you're a JavaScript beginner, "built-in methods" simply means functions that are native to the JavaScript language.

Have you wondered how words get censored on the internet? Perhaps you want to know why your username on apps has to conform to specific rules? This is done through string manipulation using code such as JavaScript. A string is simply the name for a piece of data that contains text; it can consist of letters, numbers, and symbols.

Why is it important? Virtually every software application with a presentation layer (such as a web app) applies some form of string manipulation, and it underpins many algorithms. Think about how it applies to business ideas as well. Grammarly is an excellent example of a business that is all about string manipulation.

Text And Strings

The first thing to consider is how to engage text manipulation from a visual perspective. For example, if you're a non-coder or just a human being, you know you can write text on paper, on your smartphone, computer, and even rice. Okay, maybe not rice. The writing can occur from left-to-right, top-to-bottom, right-handed, left-handed, etc. Afterward, you can manipulate what you wrote with an eraser, scratching it out, or tapping the backspace key.

From a coder's perspective, it doesn't work the same way, except when writing the actual code. The code instructions for manipulating strings have restrictions and specific methods. You will learn these methods here but let's start with a visual approach to envision how code will do the magical transformations.

Direction ↔️

Like writing, strings can be manipulated from left-to-right and right-to-left. A string can be as short as a single space (or even empty) or as long as pages of text, but most commonly in code, a string will not be longer than a sentence. A string can be a username, a phone number, a snippet of code, a poem, etc. When working with a specific coding language, there are built-in methods to use, or you can create your own custom method. A combination of these methods can manipulate text to do virtually whatever you want. You can become a string master with the force of practice.

Besides being processed from left-to-right or right-to-left, a string can be broken down into individual characters, each addressed by a number representing its position. This is known as the index value of the character. For example, the string "Hello!" contains 6 characters, so your code can directly access any of them by indicating the corresponding index number.

"Hello!"
123456 (number represents position in string)

Traversing

Several methods process the string in this ascending numerical order. However, since computers count from zero, the first position is always 0. To be more accurate, I should say that the computer is traversing, not processing, the string. "Processing" implies that an effect happens, whereas "traversing" means passing across something. When writing code, you should be conscious of the computing resources used: you may not need to process every character in a string, only traverse to the individual character you need to change.

For example, say your objective is to remove punctuation. You have several approaches to remove the "!" from "Hello!". You can find the position of "!" or you can access the last character of the string. The options include getting the length of the string, getting the index of "!", or traversing the string in reverse. If you use the length property, remember to subtract 1, since counting starts at zero. Also, spaces count as part of the string and have an index position, so they increase the length of the string.

The index number represents the position of a character in a string.

"Hello!"
 012345 (character positions of the string "Hello!")

"Hello!".length
(length is a property of strings that holds the character count, here 6)

Here are methods to get the position of a character in a string:

"Hello!".indexOf("!")
(Find the first position of a character in the order of left-to-right)

"Hello!".lastIndexOf("!")
(Find the last position of a character in the order of right-to-left)

"Hello!".length - 1
(Find the character count minus 1 which equals the last position of the string)

(All give 5 as the result. You can do the opposite with the charAt() method, which returns the character at the specified index position)

"Hello!".charAt(5)
(Result is "!")
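To tie these lookups together, here is a minimal sketch of removing the "!" from "Hello!" in two of the ways described above. The helper name removeLastChar is made up for illustration, and slice() is a built-in method covered later in this article.

```javascript
// Hypothetical helper (not a built-in): drops the final character of a string.
function removeLastChar(text) {
  // length - 1 is the index of the last character; slice stops just before it.
  return text.slice(0, text.length - 1);
}

const greeting = "Hello!";
const bangIndex = greeting.indexOf("!"); // 5

// indexOf returns -1 when nothing is found, so check before cutting.
const withoutBang = bangIndex === -1 ? greeting : greeting.slice(0, bangIndex);

console.log(removeLastChar(greeting));             // "Hello"
console.log(withoutBang);                          // "Hello"
console.log(greeting.charAt(greeting.length - 1)); // "!"
```

The first approach never searches the string at all, which is the "traverse only to the character you need" idea from earlier.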

One Character

Now you know the basics of traversing a string one character at a time, which are from the left, from the right, and from the end using index numbers. However, not all methods return the position of the character you seek. You may prefer a result as a boolean data type instead. Meaning your search is a test that returns true or false.

Boolean test methods:

includes, startsWith, endsWith.

"Hello!".includes("!")
(Returns true)

"Hello!".startsWith("!")
(Returns false)

"Hello!".endsWith("!")
(Returns true)
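As a rough illustration of where true/false tests fit, here is a sketch of a username rule check like the one mentioned in the introduction. The rule itself (must start with "@", no spaces) and the name isValidHandle are assumptions invented for this example.

```javascript
// Hypothetical rule for illustration: a handle starts with "@" and has no spaces.
function isValidHandle(handle) {
  return handle.startsWith("@") && !handle.includes(" ");
}

console.log(isValidHandle("@tom"));         // true
console.log(isValidHandle("tom"));          // false (missing "@")
console.log(isValidHandle("@tom smith"));   // false (contains a space)
console.log("report.pdf".endsWith(".pdf")); // true
```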


These boolean character tests are less useful than finding the position of a character when your goal is to modify the string, because a true/false answer does not tell you where the match is. Besides, there are more powerful methods for true/false checks, which will be described later.

Up to this point we have learned to traverse a string left-to-right and right-to-left so what's the next step? Modification!

We can use several built-in methods or create our own for changing the text in a string. Let's start with the methods which don't require indicating a search query or index position. Since humans care more about uppercase and lowercase letters than computers, we can instantly transform an entire string using these two methods:

"Hello!".toUpperCase()
(Result becomes "HELLO!")

"Hello!".toLowerCase()
(Result becomes "hello!")

If you have seen a camel, then you know they have humps. In programming, when code LooksLikeThis, it is called camel case, because it has humps and no spaces. Eventually, you will have to traverse camel case strings in your coding career. We change letter case to make text easier for humans to read, because who likes to read "a sEnTEnCe liKE ThiS!?" DOES THIS UPPERCASE TEXT REALLY MEAN I'M YELLING? LOL

A practical use for the lowercase method is on blogs, which take an article title and create a URL segment known as a slug.

Example
Article name: "Mastering String Manipulation"
Slug url: "/mastering-string-manipulation/"
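Here is one possible way to build that slug. The function name toSlug is hypothetical, and the replace() call uses a regular expression, both of which are explained later in this article. It only handles spaces, so treat it as a sketch and not a complete slug generator.

```javascript
// Hypothetical helper: turns an article title into a URL slug.
function toSlug(title) {
  // Trim the edges, lowercase everything, then swap each run of whitespace for "-".
  const words = title.trim().toLowerCase().replace(/\s+/g, "-");
  return "/" + words + "/";
}

console.log(toSlug("Mastering String Manipulation"));
// "/mastering-string-manipulation/"
```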

Since there are multiple methods to get the same result, let's begin with a simple example of combining strings. This is known as concatenation. You can use the + symbol or the concat() method in JavaScript. Please note that since JavaScript does not automatically enforce data types, you should ensure that the values are strings, as opposed to arrays or booleans, when using +. This topic deserves an entire article of its own.

Without data type enforcement, erroneous output can occur as a result of type coercion. In other words, the + sign can silently convert a number to a string.

"Hello" + "World"
(Result "HelloWorld")

"Hello".concat(" World")
(Result "Hello World")

"12" + 12
(Result "1212", not 24 because of type coercion)
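One way to avoid the surprise is to convert explicitly before using +. The sketch below uses the built-in Number() and String() functions; the variable names are just for illustration.

```javascript
const quantityText = "12"; // e.g. a value read from a form field (always a string)
const quantity = 12;

const coerced = quantityText + quantity;         // "1212": the number is coerced to a string
const summed = Number(quantityText) + quantity;  // 24: convert first, then add
const joined = String(quantity) + quantityText;  // "1212": explicit, intentional concatenation

console.log(coerced, typeof coerced); // "1212" "string"
console.log(summed, typeof summed);   // 24 "number"
console.log(joined);                  // "1212"
```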

The modern way to concatenate strings is with template literals. They are wrapped in the back-tick symbol ` and each variable is placed inside curly braces {} after the $ symbol. All three symbols are required: the back-ticks around the whole string and ${} around every variable you insert.

Template literals show up in emails and websites that personalize their output using the user's information.

var myString = "Hello"
var string2 = "World"
console.log(`${myString} ${string2}`)

(Result "Hello World")
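For the personalization use case, a template literal might look like the sketch below. The user object and its property names are invented for the example.

```javascript
// Hypothetical user record for illustration.
const user = { firstName: "Ada", unreadCount: 3 };

// Each ${} placeholder is replaced by the value of the expression inside it.
const message = `Hello ${user.firstName}, you have ${user.unreadCount} unread messages.`;

// Any expression works inside ${}, including method calls.
const shout = `${user.firstName.toUpperCase()}!`;

console.log(message); // "Hello Ada, you have 3 unread messages."
console.log(shout);   // "ADA!"
```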

Previously I stated that empty spaces count towards the length of a string. In other words, they occupy a position in a string and can be manipulated as well. Since we want to be efficient in saving data as well as making text easy to read, we want to prevent unnecessary blank space, and this can be done with the trim() method.

The method removes empty spaces at the beginning and end of a string but not in the middle. If you want to remove empty space in the middle of a string, you have to utilize a more powerful method known as a "regular expression," which will be described later in this article.

"   Hello World.   ".trim()
(Result "Hello World.")
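The sketch below compares trim() with its one-sided siblings trimStart() and trimEnd(), and previews how a regular expression could tidy up spaces in the middle. The sample string is made up.

```javascript
const raw = "  Hello  World.  "; // two spaces at each edge and two in the middle

const trimmed = raw.trim();        // "Hello  World."
const leftOnly = raw.trimStart();  // "Hello  World.  "
const rightOnly = raw.trimEnd();   // "  Hello  World."

// trim() leaves the middle alone, so collapse inner runs of whitespace with a regex.
const tidy = raw.trim().replace(/\s+/g, " "); // "Hello World."

console.log(trimmed.length, leftOnly.length, rightOnly.length); // 13 15 15
console.log(tidy);
```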

To do the opposite (add characters), there is a method for that too. You can pad a string at the beginning or end with any character until it reaches a target length. Let's say your web app deals with sensitive information like credit cards, or you have ID numbers that have to conform to a specific length. You can use the padStart() and padEnd() methods for this. The first argument is the total length of the result, not the number of characters to add. For example, a credit card number is saved in the app, but you only want to show the last four digits prefixed with the * symbol.

"4444".padStart(8, "*")
(Result "****4444")

"1234".padStart(8, "0")
(Result "00001234")
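A minimal sketch of the credit card scenario follows. The helper name maskCardNumber and the sample number are assumptions for illustration, and slice() is explained in the next section. The padEnd() line shows the same idea on the other side of the string.

```javascript
// Hypothetical helper: shows only the last four digits of a card number.
function maskCardNumber(cardNumber) {
  const lastFour = cardNumber.slice(-4);
  // Pad back up to the original length so the mask hides the right number of digits.
  return lastFour.padStart(cardNumber.length, "*");
}

const masked = maskCardNumber("4111111111111111"); // 12 stars followed by "1111"
const receiptLine = "Total".padEnd(10, ".") + "$5"; // "Total.....$5"
const agentId = "7".padStart(3, "0");               // "007"

console.log(masked);
console.log(receiptLine);
console.log(agentId);
```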

Besides concatenating strings, you can also repeat them with a multiplier. It's uncommon to repeat text, so the method will be more useful for symbols such as periods. For example, when you need to truncate a string and indicate to the reader that the string continues, you can use ellipses like this... It could also be useful for songs where lyrics are repeated. I think it's rare to see this method used in daily programming but not impossible.

"Hello-".repeat(3)
(Result "Hello-Hello-Hello-")
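The truncation idea mentioned above could be sketched like this. The function name truncate and the length limit are invented for the example.

```javascript
// Hypothetical helper: cuts text to maxLength characters and appends an ellipsis.
function truncate(text, maxLength) {
  if (text.length <= maxLength) return text; // short enough, leave it alone
  return text.slice(0, maxLength) + ".".repeat(3);
}

const shortTitle = truncate("Everything You Need to Know", 10); // "Everything..."
const divider = "-".repeat(20); // a line of 20 dashes

console.log(shortTitle);
console.log(divider);
```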

Pizza Slice 🍕

Let's expand our character searches!

Using the previous search methods, we are only able to retrieve one character at a time from a string. What if we want to select a word or a section of a string using a range? We certainly can do that by slicing a pizza and eating the slice we want. Almost!

The string method is called slice, so a pizza slice is a good metaphor. You pass in the start position and, optionally, the end position; the character at the end position itself is not included. The start position can be a negative number, which counts from the end of the string. You may think, wouldn't it be easier to just match a word inside a string? Well, yes, but in many cases coders cannot predict what strings they will encounter, or the string will have a predetermined length.

"Hello World".slice(6)
(Result "World")

"Hello World".slice(6, 8)
(Result "Wo")

"Hello World".slice(-3)
(Result "rld")
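Slicing by position works well when the string has a predetermined layout. The sketch below assumes a date written as YYYY-MM-DD; the variable names are illustrative.

```javascript
const published = "2021-07-23"; // fixed-length layout: YYYY-MM-DD

const year = published.slice(0, 4);      // "2021" (index 4 itself is excluded)
const month = published.slice(5, 7);     // "07"
const day = published.slice(-2);         // "23" (last two characters)
const monthFromEnd = published.slice(-5, -3); // "07" (both positions counted from the end)

console.log(year, month, day, monthFromEnd);
console.log(published); // slice returns a new string; the original is unchanged
```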

Up to this point, you have learned to traverse strings from the left and from the right, get character positions, do boolean tests, transform character cases, concatenate strings, remove empty space, pad, repeat strings, and extract substrings. How about we learn how to modify our strings with the replace() method?

Practical scenarios for this can be removing explicit words, swapping the first name with the last name, swapping "-" for empty space " " or vice-versa.

The difference between replace() and the previous methods is that replace() accepts both strings and regular expressions as search queries. It also accepts a function as a second parameter, but we won't go into custom functions at this time. With replace, you don't need to rely on index positions, but you do need to be familiar with regular expressions (regex for short), because a regex is a common way to replace multiple instances of the search query. Note the usage of a regular expression below, with forward slashes surrounding the search term and the g after the second slash to represent global replacement in the entire string.

"Very bad word".replace("bad", "good")
(Result "Very good word")

"Very bad bad word".replace("bad", "good")
(Result "Very good bad word")

"Very bad bad word".replace(/bad/, "good")
(Result "Very good bad word")

"Very bad bad word".replace(/bad/g, "good")
(Result "Very good good word")
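Here is a rough sketch of two of the scenarios listed above: masking an explicit word and swapping a first and last name. The helper name censor is hypothetical. The name swap relies on parentheses and \w, which are explained later in this article, plus the $1 and $2 replacement placeholders that refer back to the parenthesized groups.

```javascript
// Hypothetical helper: masks every occurrence of a banned word, ignoring case.
function censor(text, bannedWord) {
  // A pattern built from a variable needs the RegExp constructor.
  // Assumption: bannedWord contains only letters (no regex symbols to escape).
  const pattern = new RegExp(bannedWord, "gi");
  return text.replace(pattern, "*".repeat(bannedWord.length));
}

const cleaned = censor("Very bad BAD word", "bad");              // "Very *** *** word"
const swapped = "Ada Lovelace".replace(/(\w+) (\w+)/, "$2 $1");   // "Lovelace Ada"
const spaced = "mastering-string-manipulation".replace(/-/g, " "); // dashes to spaces

console.log(cleaned);
console.log(swapped);
console.log(spaced);
```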

Cryptic Patterns

Are you beginning to feel the power of string manipulation? Say yes 🙂, it's a good affirmation as part of becoming an expert. A regexp is denoted by a forward slash / on each side of the search word, and the letter g after the second slash indicates a global search, which will replace multiple instances of the word inside the string. When you are searching for just one instance of a word, it's generally better to use indexOf() and replace() with a plain string, since that is simpler and typically executes faster. That's why multiple methods exist.

To understand regular expressions, it's recommended to memorize the symbols used. There are many of them, and letter case matters. In fact, there's nothing regular about "regular expressions". They should be called "cryptic patterns" because no human being can read them without looking up the meaning of the symbols used. To put it in plain human language, you can also think of them as string-searching algorithms.

Magic Wand 🪄

Before I show you some of the characters used, I would like to paint 🎨 you a picture of the traversing that happens using regexp. First, imagine a magic wand in your hand. Waving the magic wand releases magical stars ✨ onto the string which modify it to the desired string you want. Each star represents a symbol in the regular expression, and that is what you have to come up with as a search pattern.

Regular expressions are truly powerful search techniques. You can find a needle in a haystack instantly. Many input forms on the web use regular expressions to validate or convert text into specific formats such as zip codes, phone numbers, domain names, currency values, and the list can go on. Note that there are different regular expression engines depending on the programming language. The following is specific to JavaScript:

/term/
A regexp written this way always has to be contained inside two forward slashes. "A/B/C" is not a regexp. Between the slashes, certain characters and symbols represent something other than themselves.

/abc/
Plain alphabetical characters without symbols match themselves, so this is equivalent to a regular search for the consecutive string "abc".

/\$/
An explicit search for a symbol has to be prefixed with a backslash \. In this case, it's the dollar symbol. This is called escaping, even though none of them will run away. The symbols still need to escape from the wrath of your cryptic search desires.

/^abc/ and /abc$/
These symbols don't have to be escaped. They are the caret ^ and dollar sign $. Their purpose is to restrict the search to the beginning and end of a string, respectively. This is also known as anchoring, so they can be called anchors. In this case, it means that if "abc" is in the middle of "xyzabczyx", it will be ignored. ^ means the string must start with "abc" and $ means that the string must end with "abc". You can apply one or both.

What if you don't want to search for an alphabetical character or a symbol, but for formatting in the string? Just as an empty space has meaning in code, so does a tab, a new line, and a carriage return. These can be searched using a combination of the backslash and one letter. For brevity, we've excluded the surrounding slashes.

\n find a new line
\t find a tab
\r find a carriage return

This is mind-blowing, right? You can manipulate empty space and look for invisible meta-characters which control formatting using regexp. Let's try a regexp example based on what we know so far. We want a specific dollar amount at the beginning of a string: $10.xx, with any cent amount.

/^\$10\.\d\d/

Meanings of each symbol:
^ must start with
\$ a literal dollar sign
\. a literal period
\d any digit 0-9

Reading the pattern from left to right:
- ^ matches the start of the string
- a backslash \ escapes the dollar sign $
- the number 10 is followed by an escaped period \.
- \d represents any digit 0-9, so we use it twice for the cents
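To try this pattern out, you could use the built-in test() method of a regexp, which returns true or false, and the string method match(), which returns what was found. Neither has been introduced in this article yet, and the sample strings are made up.

```javascript
const tenDollarsAndCents = /^\$10\.\d\d/;

const atStart = tenDollarsAndCents.test("$10.99 per month"); // true
const notAtStart = tenDollarsAndCents.test("Only $10.99");   // false: ^ requires the start
const oneDigit = tenDollarsAndCents.test("$10.9");           // false: two digits are required

// match() returns an array whose first item is the matched text.
const found = "$10.99 per month".match(tenDollarsAndCents)[0]; // "$10.99"

console.log(atStart, notAtStart, oneDigit, found);
```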

As previously mentioned, adding a backslash to certain letters changes the search pattern. Here are some search patterns with the backslash and letter combination.

\w matches any word character (a letter, digit, or underscore)
\d matches any digit
\s matches any whitespace character (space, tab, new line)

In addition to that, you can match the opposite with the capital letter equivalents.

\W matches any character that is not a word character
\D matches any character that is not a digit
\S matches any character that is not whitespace
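A quick sketch of these letter combinations in action, using a made-up sample string and the g flag (explained in the next section) so that every match is affected:

```javascript
const sample = "Call 555-0199 now";

const digitsOnly = sample.replace(/\D/g, "");   // remove every non-digit: "5550199"
const underscored = sample.replace(/\s/g, "_"); // "Call_555-0199_now"
const chunks = sample.match(/\w+/g);            // ["Call", "555", "0199", "now"]

console.log(digitsOnly);
console.log(underscored);
console.log(chunks);
```

Notice that the dash splits "555-0199" into two chunks, because "-" is not a word character.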

Globally Insensitive 🌎

Now that you are getting more comfortable with the possibilities of regular expressions, you need to be aware of the letters "g" and "i" at the end of the regexp, right after the second forward slash. These are known as flags, and they modify your search. The "g" means global, so it will return more than one match if available, while the "i" means case-insensitive. Uppercase or lowercase will not matter when using this flag.

/term/g Finds multiple instances, not just the first one
/term/i Find uppercase and lowercase characters
/term/gi You can combine both flags g and i
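The difference between the flags is easiest to see side by side. This sketch uses the string method match() on an invented sentence.

```javascript
const text = "Spam, spam, and more SPAM";

const first = text.match(/spam/)[0];     // "spam": no flags, first exact-case match only
const firstAnyCase = text.match(/spam/i)[0]; // "Spam": case no longer matters
const allExact = text.match(/spam/g);    // ["spam"]: global, but still case-sensitive
const allAnyCase = text.match(/spam/gi); // ["Spam", "spam", "SPAM"]

const replaced = text.replace(/spam/gi, "eggs"); // "eggs, eggs, and more eggs"

console.log(first, firstAnyCase, allExact, allAnyCase);
console.log(replaced);
```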

To expand on your searches, here's the next addition of complexity. You may want to find a combination of letters, numbers, or symbols. You can do this by grouping inside parentheses () and brackets []. The brackets are specific to sets and ranges of characters, such as 0-9, A-Z uppercase, or a-z lowercase.

You can use multiple dashes for multiple ranges inside a single set of brackets. The parentheses are not useful alone, only when you have additional search terms in one regexp. To throw in a monkey wrench, the caret ^ symbol at the start of a bracket set will negate the search.

/[abc]/ Matches any of the letters a, b, or c
/[0-7]/ Matches numbers 0-7 anywhere in the string
/[^0-7]/ Matches any character that is not a number 0-7, anywhere in the string

[0-9] is identical to \d for digits, while \w is equivalent to [A-Za-z0-9_] for word characters.

Using parentheses () is useful when you want to search for more than one pattern, such as international phone numbers, while brackets [] are for searching sets. When using parentheses in your search, you may also include the pipe symbol | as an OR operator. This means your result can be the search pattern on either side of the pipe. This is known as alternation. Here are examples:

/[abc](123)/ matches a, b, or c followed by 123
/gr[ae]y/ matches gray or grey
/(gray|grey)/ matches gray or grey, with each alternative spelled out in full
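Here is a small sketch of sets and alternation at work. The file extension check is an invented example, and the pattern name imageExtension is hypothetical.

```javascript
// A bracket set: the third letter may be "a" or "e".
const shades = "gray grey groy".match(/gr[ae]y/g); // ["gray", "grey"]

// Alternation inside parentheses: any one of three extensions at the end of the string.
const imageExtension = /\.(png|jpg|gif)$/i;

const isImage = imageExtension.test("photo.JPG");    // true (the i flag ignores case)
const isNotImage = imageExtension.test("notes.txt"); // false

// The parentheses also remember what they matched, available at position 1.
const extension = "photo.JPG".match(imageExtension)[1]; // "JPG"

console.log(shades, isImage, isNotImage, extension);
```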

Quantity to Match

Do you want to match a specific amount of letters or numbers? Perhaps 0 or 1, 1 or many, only 4. It's all possible with regular expression quantifiers. Here are the quantifier symbols and how you can use them. We will use the letter "a" as part of the example.

/a*/ Matches 0 or more instances of the letter a
/a+/ Matches 1 or more instances of the letter a
/a?/ Matches 0 or 1 instances of the letter a
/a{4}/ Matches exactly 4 consecutive instances of letter a
/a{2,3}/ Matches between 2-3 instances of the letter a
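Quantifiers become more useful once they are combined with the earlier symbols. The sketch below assumes a US-style zip code format (five digits, optionally followed by a dash and four more digits); that format and the name zipCode are assumptions for the example.

```javascript
// Assumed format: 5 digits, then an optional group of "-" plus 4 digits.
const zipCode = /^\d{5}(-\d{4})?$/;

const shortZip = zipCode.test("90210");      // true
const longZip = zipCode.test("90210-1234");  // true
const tooShort = zipCode.test("9021");       // false: {5} needs exactly five digits

const run = "caaandy".match(/a{2,3}/)[0];            // "aaa": takes as many as allowed
const spellings = "color colour".match(/colou?r/g);  // ["color", "colour"]: the u is optional

console.log(shortZip, longZip, tooShort, run, spellings);
```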

The possibilities don't stop here. This is why algorithms utilize regular expressions regularly, so becoming an expert in them is going to take you a long way. In total, there are 11 metacharacters available for regular expressions, counting each pair of brackets as one.
They are:

\ ^ $ . | ? * + () [] {}

Each one has a purpose.

Another practical example is to find html tags because they are the foundation of websites. Let's think this through before typing out the expression. We need at least one letter because all tags start with a letter, and while it should be lowercase, we may encounter legacy html that is capitalized. Next, we shall expect more letters or a number, such as in h1 tags. While the * will get zero or more characters, we can limit the amount using {} instead. The following will capture html tags without attributes:

/<[A-Za-z][A-Za-z0-9]*>/g 
(Matches opening html tags: a letter followed by any number of letters or digits)
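Running the pattern against a small, made-up snippet of html shows what it does and does not capture. The second pattern is one possible way to apply the {} limit mentioned above; the limit of 9 extra characters is an arbitrary assumption.

```javascript
const html = '<h1>Title</h1><p>Text</p><a href="#">link</a>';

const openingTag = /<[A-Za-z][A-Za-z0-9]*>/g;
const tags = html.match(openingTag); // ["<h1>", "<p>"]

// Same idea, but at most 9 characters after the first letter (arbitrary limit).
const boundedTag = /<[A-Za-z][A-Za-z0-9]{0,9}>/g;
const boundedTags = html.match(boundedTag); // ["<h1>", "<p>"]

console.log(tags);
console.log(boundedTags);
```

Closing tags such as </h1> are skipped because "/" is not a letter, and the a tag is skipped because it has an attribute.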

Finally, there is another advanced concept, if regular expressions weren't advanced enough. It is called the lookahead. There's a positive and a negative lookahead. It must be placed inside parentheses and begin with a question mark ?. Essentially, a lookahead checks the text that follows the search pattern but does not capture it. A positive lookahead matches something only when it is followed by something else, and a negative lookahead matches something only when it is not followed by something else. This is useful when making a combined search pattern by grouping. To demonstrate, let's search for a dollar value in a string that is followed by "USD", but we don't want to capture the "USD". We will use the positive lookahead using (?= and the negative lookahead using (?!.

/\$30(?=USD)/
(Matches $30 from the string "The product costs $30USD")

/\$30(?!USD)/
(Matches $30 from the string "The USD value is $30")
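The sketch below runs both lookaheads against invented price strings. Note that the negative lookahead example keeps the fixed amount $30: with a flexible pattern such as \d+, the engine could back up and match just "$3", since "$3" is followed by "0USD" and not by "USD".

```javascript
const prices = "Plans: $30USD, $30CAD, $45USD";

// Positive lookahead: dollar amounts followed by "USD", without capturing "USD".
const usdAmounts = prices.match(/\$\d+(?=USD)/g); // ["$30", "$45"]

// Negative lookahead: "$30" only when it is NOT followed by "USD".
const notUsd = /\$30(?!USD)/;
const inUsdString = notUsd.test("The product costs $30USD"); // false
const inPlainString = notUsd.test("The USD value is $30");   // true

console.log(usdAmounts, inUsdString, inPlainString);
```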

Begin Your Journey

Now you have gone through the fundamentals of querying, matching, and modifying the data primitives of JavaScript known as strings. Just reading this won't give you the ability to use these methods. You must put them into practice in code editors and internet browsers. You can test the examples provided in this article for yourself, and you should retype them instead of copying and pasting. So go forth and build up your skills in coding with JavaScript.

Article Photo credit https://unsplash.com/@agni11
