Input and Output
To serve their purpose, many programs need to interact with their environment. Two important parts of the environment are the user and the file system.
User Interaction
Whenever a program is executed, there exist three data streams that are constantly ready to process data:
the standard input stream, the standard output stream, and the standard error stream.
When the program is executed from the command line, all three streams are normally associated with the active command line window.
In this case, outputs (e.g., the side effect of print('An output')) are shown in this window, and inputs are received via the window.
In Jupyter Notebooks, the standard data streams are connected to the cell in which the code is executed.
The standard data streams are available in the sys module as stdin, stdout, and stderr. E.g., if we assign a different stream object to sys.stdout (such as an opened file), future outputs are passed to that stream and no longer printed directly to our console (or below our active Jupyter Notebook cell).
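The following is a minimal sketch of such a redirection. It uses an in-memory io.StringIO object in place of an opened file (an assumption made so that the example leaves no file behind), and the variable names buffer, original_stdout, and captured are chosen for illustration only.

```python
import io
import sys

buffer = io.StringIO()          # in-memory text stream standing in for an opened file
original_stdout = sys.stdout    # remember the current stream so it can be restored

sys.stdout = buffer
try:
    print('An output')          # is written to buffer, not to the console
finally:
    sys.stdout = original_stdout    # restore the previous stream in any case

captured = buffer.getvalue()
print(repr(captured))           # printed to the console again
```

For temporary redirections, the standard library also offers contextlib.redirect_stdout, which restores the previous stream automatically.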
# Input
value = input('Please enter a value: ') # Value is treated as a String
value = float(input('Please enter a value: ')) # Value is converted to float
# Evaluation of an input such as ['OneValue', 'AnotherValue']
# Caution: Allows input of potentially malicious code!
value = eval(input('Please enter a list of values: ')) # Input [1, 2] is evaluated
# to a list object [1, 2]
# Output
print('One') # One
print('One', 'Two') # One Two
print('One', 'Two', sep=' | ') # One | Two
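If only literals such as lists, numbers, or Strings are expected as input, ast.literal_eval from the standard library may be a safer alternative to eval, because it rejects function calls and other executable expressions. The sketch below uses fixed Strings instead of input() so that it runs without user interaction; the helper parse_list is a hypothetical name introduced for illustration.

```python
import ast

def parse_list(text):
    """Parse text such as "[1, 2]" into a list, accepting only Python literals."""
    value = ast.literal_eval(text)    # raises ValueError or SyntaxError otherwise
    if not isinstance(value, list):
        raise ValueError('Expected a list literal.')
    return value

values = parse_list("['OneValue', 'AnotherValue']")
print(values)

rejected = False
try:
    parse_list("__import__('os').getcwd()")    # a function call, not a literal
except (ValueError, SyntaxError):
    rejected = True
    print('Input rejected.')
```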
Working with the File System
Character Encodings
Internally, the content of a file is nothing but a sequence of memory cells, so basically a chain of zeroes and ones (called bits). Thanks to character encodings, we can work with comprehensible text instead. A character encoding specifies, inter alia, which characters are available (charset), how many bits represent one character, and how a sequence of bits is mapped to a certain character. The transformation of text into sequences of bits is called encoding; the transformation of bit sequences into text is called decoding. For a long time, the 7-bit character encoding ASCII (American Standard Code for Information Interchange) with a charset of 128 characters was sufficiently expressive. Today, encodings allowing for many special characters are needed to support internationalization. Therefore, encodings like UTF-8 (for Unicode) or ISO 8859-1 (Latin-1) are frequently used. By enabling us to work with human-readable characters, character encodings offer an abstraction from the internal data representation of a computer.
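Encoding and decoding can be tried out directly on Strings. The following sketch encodes the same character with two encodings and decodes it again; the variable names are chosen for illustration.

```python
text = 'ä'

utf8_bytes = text.encode('utf-8')        # encoding: text -> bytes
latin1_bytes = text.encode('latin-1')

print(utf8_bytes)      # b'\xc3\xa4' (two bytes in UTF-8)
print(latin1_bytes)    # b'\xe4'     (one byte in Latin-1)

decoded = utf8_bytes.decode('utf-8')     # decoding: bytes -> text
print(decoded == text)                   # True

ascii_bits = format(ord('A'), '07b')     # the 7 bits that ASCII uses for 'A'
print(ascii_bits)                        # 1000001
```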
To interpret a file correctly, a program must know the file encoding, which can be specified when the file (i.e., the sequence of bits) is opened (e.g., f = open(filename, encoding='utf-8')).
If we don't specify the encoding explicitly, the (system-dependent) default encoding is used.
Python 3 assumes that the source code of Python programs has been encoded in UTF-8, and you might want to make sure that your text editor is configured accordingly.
If the encoding of a file is unknown, finding the 'correct' encoding is hard for a computer since the machine cannot know which decoded content it should find in the file.
Therefore, it is a good habit to work with one standard encoding and specify that encoding explicitly (for most use cases, the best encoding choice is UTF-8).
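What happens when a file is decoded with the wrong encoding can be seen in the following sketch. It writes a short text as UTF-8 to a temporary file (the file name example.txt and the temporary folder are assumptions made for this example) and reads it back once with the matching and once with a different encoding.

```python
import os
import tempfile

text = 'äöü'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')

    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    with open(path, 'r', encoding='utf-8') as f:
        right = f.read()        # decoded with the encoding used for writing

    with open(path, 'r', encoding='latin-1') as f:
        wrong = f.read()        # decoded with a different encoding

print(right)    # äöü
print(wrong)    # Ã¤Ã¶Ã¼
```

No error is raised in the second case, because every byte is a valid Latin-1 character; the content is simply wrong.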
In Python 3, Strings are sequences of Unicode characters.
Therefore, special characters can be used without problems (e.g., value = 'äöüÄÖÜßéèê').
Some combinations of characters, introduced by a backslash, have a special meaning, e.g., \n signals a line break.
Thus, to include a literal backslash (\) in a String, we need to write two backslashes (\\).
This can be avoided by prepending r (for raw String) to the String in question (e.g., r'C:\TestDirectory\n').
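The difference between escaped and raw Strings can be checked with a short sketch; the variable names are chosen for illustration.

```python
escaped = 'C:\\TestDirectory\\n'    # doubled backslashes, no line break
raw = r'C:\TestDirectory\n'         # raw String: backslashes are taken literally

print(escaped == raw)               # True

with_break = 'a\nb'                 # a, line break, b
without_break = r'a\nb'             # a, backslash, n, b

print(len(with_break))              # 3
print(len(without_break))           # 4
```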
Using the function eval('Some String'), Strings can be interpreted as program code.
If we prepend an f (for formatted String) to a String, those parts of the String that are included in curly braces will be evaluated as expressions before the result is included in the String (e.g., f'2 + 8 is {2 + 8}.').
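A few more formatted Strings, as a small sketch with variable names chosen for illustration. It also shows a format specification after a colon and doubled curly braces, which are not evaluated.

```python
name = 'World'
count = 3

greeting = f'Hello, {name}!'
summary = f'{count} items cost {count * 2.5:.2f} in total.'    # two decimal places
literal = f'{{not evaluated}}'      # doubled braces yield literal braces

print(greeting)    # Hello, World!
print(summary)     # 3 items cost 7.50 in total.
print(literal)     # {not evaluated}
```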
Apart from Strings (which are character sequences), Python also knows byte sequences, i.e., sequences of values between 0 and 255.
A byte consists of eight bits, which means that there are 2^8 = 256 possible values. These values are usually represented as hexadecimal numbers, which is indicated by a \x before the number (e.g., \xAB\x0F).
Byte sequences are specified like Strings - but we prepend a b (for byte sequence) instead of an r or an f, e.g., b'\x00\x00'.
We can also (try to) decode byte sequences to Strings (e.g., b'\x00\x00'.decode('utf-8')).
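The following sketch inspects a byte sequence and shows why decoding is only an attempt: not every byte sequence is valid in every encoding. The variable names are chosen for illustration.

```python
data = b'\xAB\x0F'

print(len(data))     # 2
print(list(data))    # [171, 15]
print(data.hex())    # ab0f

zeros = b'\x00\x00'.decode('utf-8')    # works: two NUL characters
print(len(zeros))                      # 2

decoding_failed = False
try:
    data.decode('utf-8')               # 0xAB cannot start a UTF-8 character
except UnicodeDecodeError:
    decoding_failed = True
    print('Not valid UTF-8.')
```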
Opening and Closing Files
Strings in Python consist of Unicode characters. Files, in contrast, are sequences of bits. Thus, Python needs a way to convert bits to Unicode characters. If we specify the correct encoding, Python can do the decoding for us.
Files without text (e.g., image files or applications) cannot be decoded to Unicode characters.
Python can open these files in binary mode if we pass 'rb' for read binary - or 'wb'/'ab' for write binary/append binary - as an argument to the open function.
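A minimal sketch of the binary modes, assuming a temporary file named example.bin that is created only for this example. In binary mode, no encoding is given, and reading returns a byte sequence instead of a String.

```python
import os
import tempfile

payload = bytes([0, 255, 16, 32])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.bin')

    with open(path, 'wb') as f:     # write binary
        f.write(payload)

    with open(path, 'ab') as f:     # append binary
        f.write(b'\x01')

    with open(path, 'rb') as f:     # read binary
        data = f.read()             # a bytes object, not a String

print(data)
print(len(data))    # 5
```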
f = open('file.txt', 'r', encoding='utf-8')
# Opens the file file.txt, returning a stream object.
# Instead of a file name, we can also pass an (absolute or relative) file path,
# e.g., folder\file.txt or C:/folder/file.txt.
# With 'r' as the second argument, the file is opened in read mode
# (the argument may be omitted because 'r' is the default of the mode parameter).
# The encoding parameter specifies that UTF-8 should be used for decoding.
f.close() # Closes the file (important!).
Instead of closing files manually, we will normally use a with open construct.
This ensures that the file is closed even if an error occurs inside the block.
with open('file.txt', 'r', encoding='utf-8') as f:
    content = f.read() # Saves the content of the file in the variable content;
                       # the file is automatically closed at the end
                       # of the code block (signaled by the end of the indentation).
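The following sketch compares manual closing with the with construct. It works on a temporary file (example.txt is an assumed name used only here) and simulates an error inside the with block to show that the file is closed nevertheless.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Test text')

    # Manual variant: try/finally guarantees the call to close().
    f = open(path, 'r', encoding='utf-8')
    try:
        content_manual = f.read()
    finally:
        f.close()

    # with variant: the file is closed although an error is raised in the block.
    try:
        with open(path, 'r', encoding='utf-8') as g:
            content_with = g.read()
            raise RuntimeError('Simulated problem')
    except RuntimeError:
        pass

print(f.closed, g.closed)    # True True
```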
Reading Files
try:
    with open('file.txt', 'r', encoding='latin-1') as f:
        textAsList = f.readlines()
        # returns a list of lines, where the lines are Strings
        f.seek(0)
        # goes back to the beginning, since readlines() has consumed the whole file
        textAsString = f.read()
        # returns the content of the file as a String
        textAsString2 = f.read()
        # now returns an empty String because the end of the file has been reached
        f.seek(0)
        # goes to byte position 0 (i.e., to the beginning of the file)
        f.read(10)
        # reads up to 10 characters, starting from the current position
    with open('file.txt', 'r', encoding='cp1252') as f:
        for line in f: # the stream object is an iterator over lines
            print(line)
    with open('file.txt', 'r') as f: # no encoding given: the system default is used
        print(f.readline()) # returns the first line
        print(f.readline()) # returns the second line
except FileNotFoundError:
    print('File not found.')
The keywords try and except are used to handle errors that the programmer can foresee - in this case, the situation that the file to be opened cannot be found. Read more on the programmatic treatment of exceptions here.
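The reading methods can be tried out without preparing a file by hand. The sketch below first creates a temporary file with two lines (the names example.txt and missing.txt are assumptions for this example) and then applies the methods shown above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    with open(path, 'r', encoding='utf-8') as f:
        lines = f.readlines()    # every line keeps its trailing line break
        rest = f.read()          # empty String: the end of the file is reached
        f.seek(0)                # back to the beginning
        start = f.read(5)        # the first five characters

    with open(path, 'r', encoding='utf-8') as f:
        stripped = [line.rstrip('\n') for line in f]    # iterate over the lines

    found = True
    try:
        open(os.path.join(folder, 'missing.txt'), 'r', encoding='utf-8')
    except FileNotFoundError:
        found = False

print(lines)       # ['first line\n', 'second line\n']
print(start)       # first
print(stripped)    # ['first line', 'second line']
```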
Writing Files
with open('file.txt', 'w', encoding='utf-8') as f:
    # Returns a stream object.
    # File is created automatically if it does not exist.
    # Mode 'w' (write) opens the file for writing and overwrites any previous content.
    # Mode 'a' (append) adds data at the end of the file.
    f.write('Test text') # Var. 1: writing as a method of the object f (no line break is added)
    print('Test text', file=f) # Var. 2: print function (adds a line break)
    # If we pass a stream object as the file parameter,
    # the output is written to the specified file.
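The difference between the modes 'w' and 'a', and between write and print, becomes visible when the file is read back. The following sketch uses a temporary file (example.txt is an assumed name used only here).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')

    with open(path, 'w', encoding='utf-8') as f:
        f.write('old content')

    with open(path, 'w', encoding='utf-8') as f:    # 'w' discards 'old content'
        f.write('Test text')                        # no line break is added
        print('Test text', file=f)                  # print adds a line break

    with open(path, 'a', encoding='utf-8') as f:    # 'a' keeps the existing content
        f.write('appended')

    with open(path, 'r', encoding='utf-8') as f:
        result = f.read()

print(repr(result))    # 'Test textTest text\nappended'
```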