Python — Data types and operations — Strings
vvrubel
- Quotes and multi-line strings
- Escape sequences
- String formatting
- Basic string methods
- Split and join
- Search in a string
Quotes and multi-line strings
You are already familiar with strings, which are extremely common and useful in programming. Let's take a look at some features of Python strings related to quotes and multi-line strings.
§1. Quotes
As you know, a string literal is surrounded by a pair of single or double quotes. There is basically no difference between the two, but there are some common conventions concerning their use:
- use double quotes if your string contains single quotes, for example, "You're doing great!";
- use single quotes if your string contains double quotes, for example, 'Have you read "Hamlet"?';
- do NOT mix two styles in one literal: something like "string!' is NOT correct;
- most importantly, be consistent in your use!
There is a way to include any quotes in your string, regardless of the style of the outer quotes, and that is to use the backslash symbol (\) before the quotes inside of the string. The backslash will basically tell Python that the quote symbol that follows it is a part of the string rather than its end or beginning. It is called escaping, and you'll learn about it in detail in the next topics.
So in the examples below both ways of writing the strings are correct and will produce the same result:
# example 1
print("You're doing great!")
print('You\'re doing great!')
# example 2
print("Have you read \"Hamlet\"?")
print('Have you read "Hamlet"?')
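Escaping only changes how the literal is written, not the value it produces. The two spellings in each pair therefore compare equal, and the backslash is not stored in the string. A quick check:

```python
# Each pair of literals below produces exactly the same string value.
print("You're doing great!" == 'You\'re doing great!')          # True
print("Have you read \"Hamlet\"?" == 'Have you read "Hamlet"?')  # True
# The escaping backslash is not part of the resulting string.
print(len('You\'re'))  # 6
```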
§2. Multiline strings
Strings can represent a long text, a single character or even zero characters (an empty string). But so far our strings have been on a single line, no matter how long they were. You can also write multi-line strings in Python. To do that, use triple quotes on each side of the string literal. Again, whether you choose single or double quotes is up to you, and both work fine in Python.
- Multi-line string in double quotes:
print("""This
is
a
multi-line
string""")
- Multi-line string in single quotes:
print('''This
is
a
multi-line
string''')
Both examples print the same result:
This
is
a
multi-line
string
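A triple-quoted literal stores the line breaks you type as ordinary newline characters (\n, an escape sequence covered later in this topic). Comparing it with a one-line literal shows this:

```python
text = """This
is
a
multi-line
string"""

# The line breaks typed inside the triple quotes become newline characters.
print(text == 'This\nis\na\nmulti-line\nstring')  # True
print(text.count('\n'))  # 4
```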
Well, these are just the basics. Strings in Python are much more interesting, and there are many things you can do with them!
Escape sequences
Let's recall the problems that you may encounter using the print() function.
Nobody likes to write long forms of words, so here comes the apostrophe. Be careful when using an apostrophe in strings in Python, though, because you might get an error message. Let's take a look at the example below:
# Bad example
warning = 'That's my car'
print(warning)
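The sentence looks fine, but Python will show you the error message «EOL while scanning string literal». Python 3.10 and later word it as «unterminated string literal». Why did that error occur? The abbreviation "EOL" stands for "End-of-line". It means that Python reached the end of the line without finding the end of a string. Python treats the apostrophe as the closing quote, so the sentence is divided into two parts. The first part is the string "That". The second part, s my car', is not inside that string, and its final quote opens a new string that is never closed.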
§1. What is an escape sequence?
To avoid the described problem you should use escape sequences.
All escape sequences start with a backslash \, which is interpreted as an escape character. A backslash indicates that one or more characters must be handled in a special way or that the next character after it has a different meaning.
Add an escape character to our example and see that the quote is now interpreted as the literal symbol of a single quote without any errors.
# Better example
warning = 'That\'s my car'
print(warning)  # That's my car
Don't forget that wrapping a string that contains an apostrophe in single quotes is bad style! PEP 8 recommends using the other kind of quotes, double quotes in this case, to avoid backslashes in the string.
A backslash can be used in combination with another symbol to do some particular things. For example, the combination \b means a backspace character. If you use \b in the middle of a string and then print it, most terminals won't show the character before \b:
print('deleted\b sign') # delete sign
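The backspace is still stored in the string. The terminal just uses it to move the cursor back one position, and the next character overwrites the one before it. Checking the length and the repr() of the string shows this:

```python
s = 'deleted\b sign'
print(len(s))   # 13: "deleted" (7) + backspace (1) + " sign" (5)
print(repr(s))  # 'deleted\x08 sign'
```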
Take a look at some other escape sequences:
- \n – newline
- \t – horizontal tabulation
- \r – moves all characters after \r to the beginning of the line, overwriting as many characters as moved.
All escape sequences are used the same way: you simply put them inside the string literal. We will give more examples in the next section.
So, what if you need to print the backslash itself?
print('\') # SyntaxError: EOL while scanning string literal
The error happens because Python reads \' as an escaped quote, so the string literal is never closed. In this case, we must use a double backslash \\:
print('\\') # \
You add an extra \ to tell Python that the next \ should not be interpreted as the start of an escape sequence.
Now, if you want to print text that contains \, you can double it. This is useful, for example, when you need to print a literal \n, because print('\n') will only output a blank line. A double backslash will help you in such situations!
print('\\n') # \n
And you can also write:
metal = '\m/'
print(metal)  # \m/
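Why is everything correct in this case? The string with \ is printed correctly because \m is not a valid escape sequence, so Python keeps the backslash as it is. Keep in mind, though, that such unrecognized escape sequences are deprecated. Python 3.12 and later emit a SyntaxWarning for them, so writing '\\m/' is the safer choice. One more example: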
face = '\^_^/'
print(face)  # \^_^/
The function repr() returns a printable representation of the string, so the backslash is displayed as the escape sequence \\:
print(repr(face)) # '\\^_^/'
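You can check the difference between '\n' and '\\n' with len() and repr(). The first is a single newline character. The second is two characters: a backslash and the letter n.

```python
newline = '\n'   # one character: a line break
literal = '\\n'  # two characters: a backslash and the letter "n"
print(len(newline), len(literal))  # 1 2
print(repr(literal))               # '\\n'
print(literal == '\\' + 'n')       # True
```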
§2. Other examples
Let's consider an example with the escape sequence \n:
# The new line
print('Hello \nWorld!')
The \n combination starts a new line, so you will see the following output:
Hello
World!
The next example shows the escape sequence \t. As mentioned above, \t is used for tabulation. If you put it in the middle of a string, the two parts of the string will be separated by a blank space called a tab. It is quite useful when you work with text.
# The tabulation
print('Hello\tWorld!') # Hello World!
Another escape sequence that can be useful while you are working with text is \r. It is commonly called a carriage return. When the string is printed, the characters after \r go back to the beginning of the line and overwrite exactly as many old characters as they contain. If the part before \r is longer, only the required number of characters is overwritten and the rest stays visible.
# The characters removal
print("Hello, dear \rWorld!") # World! dear
Please note that the string itself still contains all of its characters, including \r, so its length doesn't shrink:
print(len("Hello, dear \rWorld!")) # 19
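You can approximate what the terminal shows in code. The function simulate_carriage_return() below is a hypothetical helper written for this illustration and is not part of Python. It assumes a single line of text and a terminal that overwrites characters in place. The helper replays each \r-separated part over the start of the line:

```python
def simulate_carriage_return(text):
    """Hypothetical helper: approximates how a terminal displays one line
    of text that contains carriage return characters."""
    line = ''
    for part in text.split('\r'):
        # each part is written from the start of the line, overwriting
        # as many old characters as it has and leaving the rest visible
        line = part + line[len(part):]
    return line


print(repr(simulate_carriage_return("Hello, dear \rWorld!")))  # 'World! dear '
```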
Escape sequences are simple to use, aren't they? Let's talk more about the length of strings. For example:
# Comparing the lengths
greeting = 'Hello, John'
nice_greeting = 'Hello, \nJohn'
print(greeting)  # Hello, John
print(nice_greeting)
# Hello,
# John
print(len(greeting))  # 11
print(len(nice_greeting))  # 12
After calling the len() function, we can see that the string with an escape sequence (in this case \n) is longer, because \n counts as one character.
Be careful while working with strings: the print() function shows only the effect of escape sequences, not the sequences themselves.
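When you need to see what a string really contains, repr() is the handy tool, because it shows the escape sequences that print() turns into their effects:

```python
nice_greeting = 'Hello, \nJohn'
print(repr(nice_greeting))  # 'Hello, \nJohn'
print(len('\n'))            # 1: the whole escape sequence is one character
```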
Thus, we have introduced basic escape sequences so that you can work with them in Python strings!
String formatting
There are certain situations when you want to make your strings kind of "dynamic", i.e. make them change depending on the value of a variable or expression. For example, you want to prompt the user for their name and print the greeting to them with the name they entered. But how can you embed a variable into a string? If you do it in the most intuitive way, you'll be disappointed:
a = input()
print('Hello, a')
The output will be:
Hello, a
Luckily, Python offers a large variety of ways to format the output the way you want, and we'll concentrate on the two main ones:
- Formatted string literals
- The str.format() method
Earlier, the % operator was used for this. This built-in operator was inherited from the C language. The value to the right of % is inserted into the string to the left of it, in place of a conversion specifier such as %f. For example, if we divide 11 by 3, the / operator returns a float with a lot of decimal places:
print(11 / 3) # 3.6666666666666665
With the % operator, we could control the number of decimal places, for example, reduce it to 3 or 2:
print('%.3f' % (11/3)) # 3.667
print('%.2f' % (11/3)) # 3.67
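When a format string contains several specifiers, the values on the right of % are given as a tuple, in the same order as the specifiers. Here is a short sketch, where the variable names and values are made up for the example:

```python
name = 'Alice'   # example values chosen for illustration
age = 30
height = 1.6789
# %s inserts a string, %d an integer, %.2f a float rounded to 2 decimal places
print('%s is %d years old and %.2f m tall' % (name, age, height))
# Alice is 30 years old and 1.68 m tall
```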
For every operation, you had to know plenty of specifiers, length modifiers and conversion types, and this variety led to some common errors. That's why more modern and easier-to-use tools were introduced. Progress never stops, you know!
Formatting your strings also makes your code look more readable and easily editable.
§1. The str.format() method
The method's name already describes what it does. In the string, we put curly braces as placeholders for the values listed in the parentheses of format():
print('Mix {}, {} and a {} to make an ideal omelet.'.format('2 eggs', '30 g of milk', 'pinch of salt'))
The values replace the braces in the order they are listed:
Mix 2 eggs, 30 g of milk and a pinch of salt to make an ideal omelet.
Furthermore, you can refer to the arguments by their positions in the curly braces (as usual, numbering starts from zero). This also lets you use the same argument more than once in one string. Attention: the order you choose can be very important. The following two snippets:
print('{0} in the {1} by Frank Sinatra'.format('Strangers', 'Night'))
and
print('{1} in the {0} by Frank Sinatra'.format('Strangers', 'Night'))
will have different outputs:
Strangers in the Night by Frank Sinatra
Night in the Strangers by Frank Sinatra
The second output sounds really weird, doesn't it?
If you pass more arguments to format() than the string uses, the extra ones are simply ignored.
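The following call shows a reused positional index and an extra argument. Index 0 appears twice, and the third argument is never referenced, so it is silently ignored:

```python
greeting = '{0}, {0}, {1}!'.format('Ho', 'Merry Christmas', 'unused')
print(greeting)  # Ho, Ho, Merry Christmas!
```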
We can also use keywords to make such strings more readable. Don't forget that you can break long lines inside the parentheses! For example:
print('The {film} at {theatre} was {adjective}!'.format(film='Lord of the Rings',
                                                        adjective='incredible',
                                                        theatre='BFI IMAX'))
Note that when you use keywords, the arguments can be listed in any order. Here's the formatted string:
The Lord of the Rings at BFI IMAX was incredible!
Also, you can combine both positional and keyword arguments:
print('The {0} was {adjective}!'.format('Lord of the Rings', adjective='incredible'))
# The Lord of the Rings was incredible!
Keep tabs on the order of your arguments, though:
print('The {0} was {adjective}!'.format(adjective='incredible', 'Lord of the Rings'))
# SyntaxError: positional argument follows keyword argument
The last code snippet resulted in a SyntaxError, since positional arguments must come first.
Remember as a Python rule that keyword arguments are always written after positional, or non-keyword, arguments.
§2. Formatted string literals
Formatted string literals (or, simply, f-strings) are used to embed the values of expressions inside string literals. This way is supposed to be the easiest one: you only need to put f before the string and put the variables you want to embed in curly braces. They are also the newest of the string formatting methods in Python, introduced in Python 3.6.
name = 'Elizabeth II'
title = 'Queen of the United Kingdom and the other Commonwealth realms'
reign = 'the longest-lived and longest-reigning British monarch'
f'{name}, the {title}, is {reign}.'
If you print this short string, you'll see an output that is several times longer than its representation in code:
Elizabeth II, the Queen of the United Kingdom and the other Commonwealth realms, is the longest-lived and longest-reigning British monarch.
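Inside the braces of an f-string you can put any expression, not just a variable name. For comparison, the same line is also built below with str.format() and with the % operator:

```python
name = 'Elizabeth II'
# An expression such as len(name) is evaluated and its value is inserted.
print(f'{name} has {len(name)} characters')            # Elizabeth II has 12 characters
print('{} has {} characters'.format(name, len(name)))  # same output
print('%s has %d characters' % (name, len(name)))      # same output
```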
You can also use different formatting specifications with f-strings. For example, rounding decimals would look like this:
hundred_percent_number = 1823
needed_percent = 16
needed_percent_number = hundred_percent_number * needed_percent / 100
print(f'{needed_percent}% from {hundred_percent_number} is {needed_percent_number}')
# 16% from 1823 is 291.68
print(f'Rounding {needed_percent_number} to 1 decimal place is {needed_percent_number:.1f}')
# Rounding 291.68 to 1 decimal place is 291.7
You can read more about the Format Specification Mini-Language in the official Python documentation.
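Here are a few more specifications from that mini-language, shown with f-strings. The values are arbitrary:

```python
number = 1823
share = 0.16
print(f'{number:,}')     # 1,823: comma as a thousands separator
print(f'{share:.0%}')    # 16%: multiplied by 100, shown with 0 decimals and a % sign
print(f'[{number:>8}]')  # [    1823]: right-aligned in a field 8 characters wide
```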
You may think these methods are unimportant or overrated, but they let you make your code neat and readable.
Basic string methods
As you already know, the string is one of the most important data types in Python. To make working with strings easier, Python has many special built-in string methods. We are about to learn some of them.
An important thing to remember, however, is that the string is an immutable data type! It means that you cannot just change the string in-place, so most string methods return a copy of the string (with several exceptions). To save the changes made to the string for later use you need to create a new variable for the copy that you made or assign the same name to the copy. So, what to do with the output of the methods depends on whether you are going to use the original string or its copy later.
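Immutability is easy to observe. Trying to replace a single character raises a TypeError, while rebinding the name to a new string works. In the sketch below, indexing with [0] is used only to trigger the error:

```python
word = 'hello'
try:
    word[0] = 'H'  # strings cannot be modified in place
except TypeError as error:
    print(error)  # 'str' object does not support item assignment

word = word.capitalize()  # the name now refers to a new string
print(word)  # Hello
```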
§1. "Changing" a string
The first group of string methods consists of the ones that "change" the string in a specific way, that is, they return a copy with some changes made.
The syntax for calling a method is as follows: a string is given first (or the name of a variable that holds a string), then comes a period followed by the method name and parentheses in which arguments are listed.
Here’s a list of common string methods of that kind:
- str.replace(old, new[, count]) replaces all occurrences of the old string with the new one. The count parameter is optional, and if specified, only the first count occurrences are replaced in the given string.
- str.upper() converts all characters of the string to upper case.
- str.lower() converts all characters of the string to lower case.
- str.title() converts the first character of each word to upper case and the remaining characters to lower case.
- str.swapcase() converts upper case to lower case and vice versa.
- str.capitalize() changes the first character of the string to title case and the rest to lower case.
And here's an example of how these methods are used (note that we don't save the result of every method):
message = "bonjour and welcome to Paris!"
print(message.upper()) # BONJOUR AND WELCOME TO PARIS!
# `message` is not changed
print(message) # bonjour and welcome to Paris!
title_message = message.title()
# `title_message` contains a new string with all words capitalized
print(title_message) # Bonjour And Welcome To Paris!
print(message.replace("Paris", "Lyon")) # bonjour and welcome to Lyon!
replaced_message = message.replace("o", "!", 2)
print(replaced_message) # b!nj!ur and welcome to Paris!
# again, the source string is unchanged, only its copy is modified
print(message) # bonjour and welcome to Paris!
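The remaining methods from the list behave the same way, returning a new string each time:

```python
message = "bonjour and welcome to Paris!"
print(message.capitalize())  # Bonjour and welcome to paris!
print(message.swapcase())    # BONJOUR AND WELCOME TO pARIS!
print("HeLLo".lower())       # hello
print(message)               # bonjour and welcome to Paris! (unchanged)
```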
§2. "Editing" a string
Often, when you read a string from somewhere (a file or the input), you need to edit it so that it contains only the information you need. For instance, the input string can contain a lot of unnecessary whitespace or some trailing combinations of characters. The "editing" methods that can help with that are strip(), rstrip() and lstrip().
- str.lstrip([chars]) removes the leading characters (i.e. characters from the left side). If the argument chars isn't specified, leading whitespace is removed.
- str.rstrip([chars]) removes the trailing characters (i.e. characters from the right side). The default for the argument chars is also whitespace.
- str.strip([chars]) removes both the leading and the trailing characters. The default is whitespace.
The chars argument, when specified, is a string of characters that are meant to be removed from the very end or beginning of the word (depending on the method you're using). See how it works:
whitespace_string = " hey "
normal_string = "incomprehensibilities"
# delete spaces from the left side
whitespace_string.lstrip() # "hey "
# delete all "i" and "s" from the left side
normal_string.lstrip("is") # "ncomprehensibilities"
# delete spaces from the right side
whitespace_string.rstrip() # " hey"
# delete all "i" and "s" from the right side
normal_string.rstrip("is") # "incomprehensibilitie"
# no spaces from both sides
whitespace_string.strip() # "hey"
# delete all leading and trailing "i" and "s"
normal_string.strip("is") # "ncomprehensibilitie"
Keep in mind that the methods strip(), lstrip() and rstrip() get rid of all possible combinations of specified characters:
word = "Mississippi"
# starting from the right side, all "i", "p", and "s" are removed:
print(word.rstrip("ips")) # "M"
# the word starts with "M" rather than "i", "p", or "s", so no chars are removed from the left side:
print(word.lstrip("ips")) # "Mississippi"
# "M", "i", "p", and "s" are removed from both sides, so nothing is left:
print(word.strip("Mips")) # ""
Use them carefully, or you may end up with an empty string.
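The argument is a set of characters rather than a substring, so the order of the characters in it does not matter. The most common use, though, is simply cleaning up a line read from a file or the input:

```python
word = "Mississippi"
# "ips", "spi" and "ssiipp" describe the same set of characters
print(word.rstrip("ips") == word.rstrip("spi") == word.rstrip("ssiipp"))  # True

raw_line = "   42\n"  # example line with extra spaces and a trailing newline
print(repr(raw_line.strip()))  # '42'
```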
§3. Conclusions
Thus, we have considered the main methods for strings. Here is a brief recap:
- While working with strings, you have to remember that strings are immutable, so all the methods that "change" them only return a copy of the string with the necessary changes.
- If you want to save the result of the method call for later use, you need to assign this result to a variable (either the same or the one with a different name).
- If you want to use this result only once, for example, in comparisons or just to print the formatted string, you are free to use the result on the spot, as we did within print().
Split and join
In Python, strings and lists are quite similar. Firstly, both are sequences, although strings are limited to characters while lists can store data of different types. In addition, you can iterate over both strings and lists. However, sometimes you need to turn a string into a list or vice versa, and Python has tools for that: the methods split(), join() and splitlines().
§1. Split a string
The split() method divides a string into substrings by a separator. If the separator isn't given, whitespace is used as a default. The method returns a list of all the substrings and, notably, the separator itself is not included in any of the substrings.
# split example
definition = input() # 'Coin of the realm is the legal money of the country'
definition.split()
# ['Coin', 'of', 'the', 'realm', 'is', 'the', 'legal', 'money', 'of', 'the', 'country']
definition.split("legal")
# ['Coin of the realm is the ', ' money of the country']
You can also specify how many times the split is going to be done with the maxsplit argument that comes after the separator. The resulting list will have at most maxsplit + 1 elements, and exactly that many if the separator occurs often enough.
If the argument isn't specified, all possible splits are made.
# maxsplit example
definition = input() # 'Coin of the realm is the legal money of the country'
definition.split("of", 1)
# ['Coin ', ' the realm is the legal money of the country']
definition.split("of")
# ['Coin ', ' the realm is the legal money ', ' the country']
If the separator doesn't occur in the string, then the result of the method is a list with the original string as its only element:
definition = input() # 'Coin of the realm is the legal money of the country'
definition.split("hi!") # wrong separator
# ['Coin of the realm is the legal money of the country']
Thus, in all cases split() allows us to convert a string into a list.
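Two details are worth checking yourself. Without an argument, split() treats any run of whitespace as one separator and ignores leading and trailing whitespace, while an explicit ' ' separator does not. Also, maxsplit can be passed by keyword:

```python
text = '  Coin   of the realm  '
print(text.split())     # ['Coin', 'of', 'the', 'realm']
print(text.split(' '))  # ['', '', 'Coin', '', '', 'of', 'the', 'realm', '', '']
print('a,b,c,d'.split(',', maxsplit=2))  # ['a', 'b', 'c,d']
```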
It may also be useful to read input directly into several variables with split():
name, surname = input().split()  # Forrest Gump
print(name)  # Forrest
print(surname)  # Gump
It's pretty efficient when you know the exact number of input values. In case you don't, it's likely to result in ValueError with a message telling you either that there are too many values to unpack or not enough of them. So keep that in mind!
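If the number of words may vary, you can catch the error. Alternatively, you can limit the split with maxsplit so that everything after the first separator stays in the last variable, as long as the line has at least two words. Here is a sketch with a made-up input line:

```python
line = 'Forrest Gump Jr.'  # example input with three words instead of two

try:
    name, surname = line.split()
except ValueError as error:
    print('Could not unpack:', error)  # reports too many values to unpack

# With maxsplit=1 the result has at most two parts.
name, surname = line.split(maxsplit=1)
print(name)     # Forrest
print(surname)  # Gump Jr.
```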
§2. Join a list
The join() method is used to create a string out of a collection of strings. However, its use has a number of limitations. First, the argument of the method must be an iterable object with strings as its elements. Second, the method must be applied to a separator: a string that will separate the elements in the resulting string object. See the examples below:
word_list = ["dog", "cat", "rabbit", "parrot"]
" ".join(word_list)  # "dog cat rabbit parrot"
"".join(word_list)  # "dogcatrabbitparrot"
"_".join(word_list)  # "dog_cat_rabbit_parrot"
" and ".join(word_list)  # "dog and cat and rabbit and parrot"
Note that this method only works if the elements in the iterable object are strings. If, for example, you want to create a string of integers, it will not work. In this case, you need to convert the integers into strings explicitly or just work with strings right from the outset.
int_list = [1, 2, 3]
" ".join(int_list)  # TypeError!
str_list = ["1", "2", "3"]
" ".join(str_list)  # "1 2 3"
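A common way to join numbers is to convert each of them with str() first, for example with a generator expression or with map(). Note also that a string is itself an iterable of strings, so join() can separate its characters:

```python
int_list = [1, 2, 3]
print(" ".join(str(number) for number in int_list))  # 1 2 3
print("-".join(map(str, int_list)))                   # 1-2-3
print(", ".join("abc"))                               # a, b, c
```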
§3. Split multiple lines
The splitlines() method is similar to split(), but it is used specifically to split the string by the line boundaries. There are many escape sequences that signify the end of the line, but the split() method can only take one separator. So this is where the splitlines() method comes in handy:
# splitlines example
long_text = 'first line\nsecond line\rthird line\r\nfourth line'
long_text.splitlines()  # ['first line', 'second line', 'third line', 'fourth line']
The method has an optional argument keepends that takes a True or False value. If keepends=True, line breaks are included in the resulting list:
# keepends
long_text = 'first line\nsecond line\rthird line\r\nfourth line'
long_text.splitlines(keepends=True)
# ['first line\n', 'second line\r', 'third line\r\n', 'fourth line']
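Compare this with split('\n'), which recognizes only \n. With split('\n'), a trailing line break also produces an extra empty string, which splitlines() does not:

```python
long_text = 'first line\nsecond line\rthird line\r\nfourth line'
print(long_text.split('\n'))
# ['first line', 'second line\rthird line\r', 'fourth line']
print('a\nb\n'.split('\n'))   # ['a', 'b', '']
print('a\nb\n'.splitlines())  # ['a', 'b']
```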
You can also use several string methods at once. This is called chaining. It works because each method returns a new object, and you can immediately call the next method on that object:
# chaining example
sent = input()  # "Mary had a little lamb"
new_sent = sent.lower().split()  # ["mary", "had", "a", "little", "lamb"]
But do not get carried away, because the length of a line should be no more than 79 characters, and we definitely do not want to break PEP 8!
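Here is a typical chain that stays short enough. Lowercasing a sentence, splitting it into words and joining them back with single spaces normalizes both case and spacing. The input here is made up:

```python
sent = "  Mary   had a LITTLE lamb "
normalized = " ".join(sent.lower().split())
print(normalized)  # mary had a little lamb
```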
§4. Conclusion
We have learned how to convert strings to lists via the split() and splitlines() methods, and how to get strings back from lists via the join() method. As a recap, consider the following:
- Splitting and joining methods do not change the original string.
- If you need to use the "changed" string several times, you need to assign the result of the respective method to a variable.
- If you need to use this result only once, you can work with it on the spot, for example, print() it.
- There are a lot of parameters in string methods. You can check the documentation if you need to fine-tune your program.
Search in a string
One of the essential skills when working with data is to be able to search it and locate specific bits of information. Working with textual data in Python, you may need to get some information about its content: whether it includes a specific substring (i.e. part of the string), where this substring is, or how many times it occurs in the text. In this topic, we will learn how to do it.
§1. Membership testing
We'll start with the first question: how can we determine whether there's a specific pattern in our string? One way to do it is called membership testing, and it is implemented with the operators in and not in. When we write pattern in string, the left operand must be a string as well, and the membership test checks whether string contains pattern as a substring.
If the membership test returns True, there exists a position in string starting from which you can read pattern.
print("apple" in "pineapple") # True
print("milk" in "yogurt") # False
Interestingly, an empty string is considered to be a substring of any string.
print('' in '') # True
print('' not in "lemon") # False
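There are two more things to keep in mind. The check is case-sensitive. Also, when the right operand is a string, the left one must be a string too, otherwise Python raises a TypeError:

```python
print("Apple" in "pineapple")      # False: "A" and "a" are different characters
print("apple" not in "pineapple")  # False
try:
    print(1 in "123")
except TypeError as error:
    print(error)  # 'in <string>' requires string as left operand, not int
print(str(1) in "123")  # True
```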
§2. Boolean search in a string
Apart from knowing that a substring occurs somewhere in the string, we can check whether the string starts or ends with a specific pattern. The methods startswith() and endswith() return True if the pattern is found and False otherwise.
email = "email_address@something.com"
print(email.startswith("www.")) # False
print(email.endswith("@something.com")) # True
Optional values for start and end that bound the search area can be added: string.startswith(pattern, start, end). When we specify only one additional element, it's automatically considered as start.
email = "my_email@something.com"
print(email.startswith("email", 2)) # False
print(email.startswith("email", 3)) # True
In the example above, when we specified the start argument as 2, we limited the search to the substring "_email@something.com", which actually doesn't start with "email". Then we fixed this off-by-one mistake by setting start to 3.
Note that the substring bound by the start and end indexes does include the character with the start index but does not include the element with the end index.
email = "my_email@something.com"
print(email.endswith("@", 5, 8)) # False
print(email.endswith("@", 5, 9)) # True
The substring defined for the search in the first case is "ail", while in the second one it's "ail@".
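These methods have two related features. The pattern may also be a tuple of strings, and then the method returns True if any of them matches. The start and end bounds give the same result as checking a slice of the string. Here is a sketch with a hypothetical file name:

```python
filename = "report.pdf"  # hypothetical file name
print(filename.endswith((".pdf", ".docx")))     # True
print(filename.startswith(("draft_", "tmp_")))  # False

email = "my_email@something.com"
# Bounding the search with 5 and 9 is equivalent to checking email[5:9]
print(email.endswith("@", 5, 9) == email[5:9].endswith("@"))  # True
```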
§3. Element position
Now, as we know how to check if a string contains a substring, starts or ends with it, let's learn how to define the exact position of the substring. We can use the methods find() or index() to do so:
best = "friend"
print(best.find("i")) # 2
print(best.index("i")) # 2
They work the same way, except that find() returns -1 if it can't find the given element, while index() raises a ValueError.
print(best.find("u")) # -1
print(best.index("u")) # ValueError
So, all the examples with find() below will work with index() as well.
We can search both for single characters and for longer substrings. In the latter case, the index of the first character of the substring is returned.
print(best.find("end")) # 3
In the string friend, the substring end occupies positions from 3 to 5, and the start index is returned. Keep in mind that both methods return only the index of the first occurrence of the element we search for.
magic = "abracadabra"
print(magic.find("ra")) # 2
However, we can additionally specify an interval for searching, just as with the boolean search: string.find(pattern, start, end).
print(magic.find("ra", 5)) # 9
print(magic.find("ra", 5, 10)) # -1
Once again, the end index is not included in the search area.
Alternatively, we can use the methods rfind() and rindex() to search backward from the end of the string, that is, to find the last occurrence.
print(magic.rfind("ra")) # 9
print(magic.rindex("a")) # 10
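Since find() accepts a start position, you can call it repeatedly to collect every occurrence of a pattern. The helper find_all() below is a hypothetical function written for this illustration and is not a built-in method:

```python
def find_all(text, pattern):
    """Hypothetical helper: returns the start index of every occurrence of
    pattern in text, using find() with a moving start position."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # continue searching one character after the previous match
        start = text.find(pattern, start + 1)
    return positions


print(find_all("abracadabra", "ra"))  # [2, 9]
print(find_all("abracadabra", "a"))   # [0, 3, 5, 7, 10]
```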
§4. Element number
Finally, it's often useful to count how many times an element (a char or a substring) occurs in the string, and for this, we can use the method count().
magic = "abracadabra"
print(magic.count("abra")) # 2
print(magic.count("a")) # 5
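Note that count() counts non-overlapping occurrences. Like find(), it also accepts optional start and end bounds:

```python
print("aaaa".count("aa"))  # 2: the matches "aa" + "aa" do not overlap
magic = "abracadabra"
print(magic.count("a", 1, 7))  # 2: only "bracad" is searched
```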
§5. Conclusions
In this topic, we have examined different aspects of searching through a string and learned how to locate specific patterns. Now you are able to:
- test for membership in a text,
- check that the string starts or ends with a specific pattern,
- find the exact position of a substring,
- count how many times a pattern occurs in the text.
This knowledge will be helpful in real-world tasks, so let's practice it!