String Manipulation

String Manipulation: Introduction to stringr and regular expressions.

A primer on why regex is useful



Case Manipulation
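The original examples for this section are not shown. As a minimal sketch, assuming the stringr case helpers are what is meant here, the sample sentence below is made up for illustration:

```r
library(stringr)

# Hypothetical sample sentence, not taken from the original material
sentence <- "the quick Brown fox"

upper <- str_to_upper(sentence)  # "THE QUICK BROWN FOX"
lower <- str_to_lower(sentence)  # "the quick brown fox"
title <- str_to_title(sentence)  # "The Quick Brown Fox"
```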





It can be useful to measure the sentence by how many letters each of its words contains


The sentence is now a character vector with one word per element, so the str_length function works element by element and returns one length per word
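A minimal sketch of this step, assuming the sentence is split on spaces with str_split. The sentence itself is a made-up example:

```r
library(stringr)

# Hypothetical sample sentence
sentence <- "The quick brown fox jumps over the lazy dog"

# str_split returns a list with one element per input string,
# so [[1]] pulls out the character vector of words.
words <- str_split(sentence, " ")[[1]]

# str_length is vectorized: one length per word.
word_lengths <- str_length(words)
word_lengths  # 3 5 5 3 5 4 3 4 3
```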


String Manipulation






Regular Expression Glossary:

Looking for numbers

\\d and [0-9]

Example:
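A short sketch with made-up strings. For plain ASCII text like this, \\d and [0-9] should give the same result:

```r
library(stringr)

# Hypothetical inputs
x <- c("abc", "a1c", "Room 101")

has_digit_d     <- str_detect(x, "\\d")     # FALSE TRUE TRUE
has_digit_range <- str_detect(x, "[0-9]")   # FALSE TRUE TRUE

# Adding + pulls out whole numbers rather than single digits.
numbers <- str_extract_all("Order 66 shipped in 2019", "\\d+")[[1]]
numbers  # "66" "2019"
```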

Looking for word boundaries

\\b

Example:
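A minimal sketch with made-up strings, showing how \\b keeps a match from firing inside a longer word:

```r
library(stringr)

# "cat" counts only when it starts at a word boundary.
starts_word <- str_detect(c("cat", "concat"), "\\bcat")  # TRUE FALSE

# Boundaries on both sides match "cat" as a whole word only.
swapped <- str_replace_all("cat concat cathode", "\\bcat\\b", "dog")
swapped  # "dog concat cathode"
```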


Looking for word characters

\\w

Example:
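A small sketch with a made-up string. \\w matches letters and digits, so punctuation and spaces split the matches:

```r
library(stringr)

# Hypothetical input
tokens <- str_extract_all("Hello, world! 42", "\\w+")[[1]]
tokens  # "Hello" "world" "42"
```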

Look for characters in the range of a-z (case-sensitive)

[a-z]

Example:
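A short sketch with a made-up string. Because the range is case-sensitive, the capital letters are left out of the matches:

```r
library(stringr)

# Hypothetical input
lower_runs <- str_extract_all("Hello World", "[a-z]+")[[1]]
lower_runs  # "ello" "orld"
```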



Look for characters in the range of A-Z (case-sensitive)

[A-Z]

Look for characters in the range of A-Z and a-z (either case)

[A-Za-z]
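A sketch with made-up strings, covering the uppercase range and letters of either case. It writes the two ranges side by side as [A-Za-z], because a single range that runs from an uppercase to a lowercase letter (such as A-z) also picks up the punctuation that sits between Z and a in the character table, for example the underscore:

```r
library(stringr)

# Hypothetical input
x <- "R2-D2 and C_3PO"

upper_only <- str_extract_all(x, "[A-Z]+")[[1]]
upper_only  # "R" "D" "C" "PO"

any_letter <- str_extract_all(x, "[A-Za-z]+")[[1]]
any_letter  # "R" "D" "and" "C" "PO"

# The pitfall: "_" is not a letter, but it falls inside A-z.
underscore_in_wide_range <- str_detect("_", "[A-z]")     # TRUE
underscore_in_letters    <- str_detect("_", "[A-Za-z]")  # FALSE
```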

Match your pattern exactly n times

{n}

Match your pattern at least n times

{n,}

Match your pattern between n and k times

{n,k}

Example:
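A minimal sketch of the three quantifier forms on made-up strings of digits. The ^ and $ anchors are added so that the whole string has to satisfy the count:

```r
library(stringr)

# Hypothetical inputs with 1 to 4 digits
x <- c("1", "12", "123", "1234")

exactly_two  <- str_detect(x, "^\\d{2}$")    # FALSE TRUE FALSE FALSE
two_or_more  <- str_detect(x, "^\\d{2,}$")   # FALSE TRUE TRUE  TRUE
two_to_three <- str_detect(x, "^\\d{2,3}$")  # FALSE TRUE TRUE  FALSE
```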



Match one or more times: keep matching for as long as the pattern continues

+
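A sketch of the + quantifier on made-up strings. It matches the preceding element one or more times and stops when a different character appears:

```r
library(stringr)

run_of_a <- str_extract("aaabbb", "a+")
run_of_a  # "aaa"

letter_runs <- str_extract_all("a1, bb22, ccc333", "[a-z]+")[[1]]
letter_runs  # "a" "bb" "ccc"
```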




Match any single character except a line break. Combined with a quantifier such as + or *, it is useful when you don’t know how many characters are in the pattern

.
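A short sketch with made-up strings. On its own the dot stands for exactly one character; paired with + it takes in the rest of the line:

```r
library(stringr)

# One character, whatever it is, between "c" and "t".
one_char <- str_detect(c("cat", "cot", "ct"), "c.t")  # TRUE TRUE FALSE

# Unknown number of characters after "user=".
rest <- str_extract("id=42;user=ada", "user=.+")
rest  # "user=ada"
```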



Match zero or more times

*
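A minimal sketch with made-up strings. Unlike +, the * quantifier also matches when the preceding element is missing altogether:

```r
library(stringr)

# "u" may appear zero or more times.
spelling <- str_detect(c("color", "colour", "colr"), "colou*r")  # TRUE TRUE FALSE

# "b" may appear zero or more times between "a" and "c".
matches <- str_extract(c("ac", "abc", "abbbc"), "ab*c")
matches  # "ac" "abc" "abbbc"
```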


Match start of a string

^

Example:
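The original example is not shown, so the sketch below uses made-up strings. One common reason a ^ pattern seems not to work is where the ^ sits: it anchors to the start of the string only when it comes at the front of the pattern, and inside square brackets it means "not" instead:

```r
library(stringr)

# Hypothetical inputs
x <- c("apple pie", "pineapple")

# ^ at the front of the pattern anchors to the start of the string.
starts_with_apple <- str_detect(x, "^apple")  # TRUE FALSE

# Inside square brackets, ^ negates the class instead of anchoring:
# this returns the first character that is not "a".
first_non_a <- str_extract("apple", "[^a]")
first_non_a  # "p"
```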

Match end of a string

$

Example:
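A short sketch with made-up file names. The $ goes at the end of the pattern:

```r
library(stringr)

# Hypothetical file names
files <- c("report.csv", "csv_notes.txt")

ends_with_csv <- str_detect(files, "csv$")  # TRUE FALSE
```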


Regular Expressions

Replace a word with something else. We will first return the string to sentence format
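A minimal sketch, assuming the words are still held as a character vector and are joined back together with str_c before the replacement. The words are made up for illustration:

```r
library(stringr)

# Hypothetical vector of words
words <- c("The", "quick", "brown", "fox")

# Collapse the vector back into a single sentence string.
sentence <- str_c(words, collapse = " ")
sentence  # "The quick brown fox"

# str_replace swaps the first match only.
replaced <- str_replace(sentence, "fox", "cat")
replaced  # "The quick brown cat"
```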


Replace any three-letter word with “cake”
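One way to sketch this, using a made-up sentence, is to combine \\b, \\w and {3} from the glossary so that only whole words of exactly three characters match:

```r
library(stringr)

# Hypothetical sample sentence
sentence <- "The quick brown fox jumps"

# \\b...\\b limits the match to whole words; \\w{3} means exactly
# three word characters.
caked <- str_replace_all(sentence, "\\b\\w{3}\\b", "cake")
caked  # "cake quick brown cake jumps"
```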



Remove a pattern
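A short sketch, assuming stringr's str_remove and str_remove_all are the functions meant here. The inputs are made up:

```r
library(stringr)

# str_remove drops the first match only.
first_only <- str_remove("a1b2c3", "\\d")
first_only  # "ab2c3"

# str_remove_all drops every match.
all_gone <- str_remove_all("a1b2c3", "\\d")
all_gone  # "abc"
```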



Detect

Does anything in your string match this pattern?
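A minimal sketch with made-up strings. str_detect returns one TRUE or FALSE per element, which makes it handy for filtering:

```r
library(stringr)

# Hypothetical inputs
fruit <- c("apple", "banana", "cherry")

has_an <- str_detect(fruit, "an")  # FALSE TRUE FALSE

# Use the logical vector to keep only the matching elements.
matching <- fruit[has_an]
matching  # "banana"
```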



Using stringi to generate passwords

stri_rand_strings accepts the following arguments:

  • n: The number of strings you want to make

  • length: The length of the string you want

  • pattern: The character class to draw random characters from (the default is "[A-Za-z0-9]")
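A sketch of the call with all three arguments. The count, length and seed are arbitrary choices for illustration, and the strings differ from run to run unless a seed is set:

```r
library(stringi)

# Arbitrary seed so the sketch can be repeated
set.seed(123)

# Three random strings, 12 characters each, drawn from letters and digits
passwords <- stri_rand_strings(n = 3, length = 12, pattern = "[A-Za-z0-9]")
passwords
```

Note that these strings come from R's ordinary random number generator, so treat this as a demonstration rather than a way to create passwords for real accounts.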


String Manipulation: Use Cases - String Extraction

Imagine that you were given the following dataset:


Your task is to extract just the numbers. You could do it one of two ways:



Both methods return the same values, but the second one needs fewer regular expressions to get there
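The original dataset and the two methods are not reproduced here. As a rough sketch of the idea, with made-up records and two approaches that may differ from the original ones, the first way removes everything that is not a number and the second extracts the digits directly:

```r
library(stringr)

# Hypothetical data standing in for the original dataset
records <- c("Invoice: 1024", "Invoice: 2048", "Invoice: 4096")

# Way 1: strip out what is not wanted, one pattern at a time.
way1 <- str_remove_all(records, "[A-Za-z]+")  # drop the letters
way1 <- str_remove_all(way1, ":\\s")          # drop the colon and the space

# Way 2: extract the run of digits with a single pattern.
way2 <- str_extract(records, "\\d+")

identical(way1, way2)  # TRUE
```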