Title | “Mastering the Basics of Python” |
Topic | developing mastery of basic Python syntax and functionality |
Date | Fri May 6 |
Time | 10am~11am PST |
Keywords | Python, the built-in data types, the built-in functions, the standard library, advanced syntax |
These sessions are designed for a broad audience of non-software engineers and software programmers of all backgrounds and skill-levels.
Our expected audience should comprise attendees with a…
During this session, we will endeavor to guide our audience to developing mastery of Python builtins (types and functions).
In previous seminars, we have used all manner of Python syntax and core functionality to demonstrate points about data analysis and software development. We have not called attention to some of the precise choices made in our code samples, preferring instead to discuss the use-case or theoretical topic at hand.
In this seminar, we will dive into some of the exacting, precise choices that we regularly make when writing even very simple pieces of Python code. While many of the topics we will discuss could be classified as “introductory” Python, we will approach them from the perspective of someone who has already written a good deal of code in Python, someone who is looking to revisit and solidify decisions they may subconsciously make every day in their code.
Sample Agenda:

- builtin types:
    - What is the difference between a list and a tuple (and why is mutability/immutability the least interesting aspect)?
    - What is a dict, what is a collections.defaultdict, what is a collections.Counter, and what is a pandas.Series; how are they similar, how do they differ, and how do they solve subtly different problems?
    - What is a numpy.ndarray, and how does it conceptually differ from a list?
    - What are the set and frozenset types?
- builtin functions:
    - What is the key= argument, how do we use it, and why is it preferable to the Decorate-Sort-Undecorate/“Schwartzian Transform” formulation? (and what is the model for doing sorts, mins, and maxes in pandas/NumPy with a custom predicate?)
    - What is the id function, and how can we use it to better understand whether we are working with a view or a copy? (what is the difference between early/late-binding? what is the difference between live/snapshot views? how do we understand these questions in the context of numpy or pandas?)
    - What are the iter and next functions? what is the difference between an iterator and an iterable? why does this matter, and how can we use this knowledge effectively?
    - What are the map and filter functions? what is the itertools module? when might we use these instead of comprehension syntax?
    - What is lambda syntax, why is it useful, and what does it tell people who are reading our code?
    - What is the difference between from module import f, import module, import module as mod, and from module import f as func?

Did you enjoy this seminar? Did you learn something new that will help you as you write larger Python scripts and analyses and write libraries to empower your colleagues’ work?
In a future seminar, we can go deeper into new syntax added to Python ≥3.6, and new approaches to writing Python that have evolved in the past five years.
We can discuss…
Why does dict raise KeyError?
print("Let's take a look!")
A dict represents a one-way mapping; it’s a way to relate two data-sets, where each unique key maps to exactly one value (though many keys may share a value).
hosts = {
'abc.corp.net': ...,
'def.corp.net': ...,
'xyz.corp.net': ...,
}
The dict
facilitates fast lookup, fast membership checking.
from IPython import get_ipython; run_line_magic = get_ipython().run_line_magic
from random import randrange
print(' list '.center(80, '\N{box drawings light horizontal}'))
for sz in [100, 1_000, 10_000, 100_000, 1_000_000]:
data = [randrange(100) for _ in range(sz)]
run_line_magic('time', '-1 in data')
print(' dict '.center(80, '\N{box drawings light horizontal}'))
for sz in [100, 1_000, 10_000, 100_000, 1_000_000]:
    data = {randrange(100): None for _ in range(sz)}  # a dict this time, to match the label above
run_line_magic('time', '-1 in data')
You use d[k]
syntax to retrieve (“getitem
”) and set (“setitem
”) entries.
en_fr = {
'one': 'un',
'two': 'deux',
'three': 'trois',
}
en_fr['four'] = 'quatre'
en_word = 'five'
fr_word = en_fr[en_word]  # KeyError!
print(f'To say {en_word!r} in French, you say {fr_word}')
print(f'To say {en_word!r} loudly in French, you say {fr_word.upper()}!')
But! if you look up an entry that doesn’t exist, you get a KeyError
!
The dict
has a .get
method which allows you to supply a default if the
entry is not found.
devices = {
'storage': 4,
'compute': 3,
}
# print(f"{devices['ai/ml'] = }")
print(f"{devices.get('ai/ml', 0) = }")
There are subtypes of the dict
class in the collections
module—e.g., collections.defaultdict
from collections import defaultdict
devices = defaultdict(int, {
'storage': 4,
'compute': 3,
})
print(f"{devices['ai/ml'] = }")
print(f"{devices['networking'] = }")
A collections.Counter
is a subtype of dict
with special behaviour and
methods for counting things (i.e., mapping entities to non-zero integer
values—“counts.”)
from collections import Counter
devices = Counter({
'storage': 4,
'compute': 3,
})
print(f"{devices['ai/ml'] = }")
print(f"{devices.most_common(1) = }")
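A Counter can also tally occurrences directly from an iterable, and counters support arithmetic with one another. A small sketch (the vendor names here are illustrative):

```python
from collections import Counter

# tally occurrences directly from an iterable
vendors = Counter(['cisco', 'cisco', 'infinera'])
print(f'{vendors["cisco"] = }')

# counters support arithmetic, e.g. folding in newly delivered devices
delivered = Counter({'cisco': 1, 'juniper': 2})
total = vendors + delivered
print(f'{total["cisco"] = }')
print(f'{total["juniper"] = }')
```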
When do we actually want the KeyError, and when might we prefer dict-based datatypes like collections.defaultdict or collections.Counter?
en_fr = {
'one': 'un',
'two': 'deux',
'three': 'trois',
}
en_fr['four'] = 'quatre'
en_word = 'two'
fr_word = en_fr[en_word]
print(f'To say {en_word!r} in French, you say {fr_word}')
print(f'To say {en_word!r} loudly in French, you say {fr_word.upper()}!')
devices = {
'storage': 4,
'compute': 3,
}
print(f"{devices['ai/ml'] = }")  # KeyError!
class mydict(dict):
def __missing__(self, key):
return ...
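To see the __missing__ hook in action, here is a sketch where the fallback value is purely illustrative:

```python
class defaulting_dict(dict):
    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ (if defined) when the key is absent,
        # instead of raising KeyError
        return f'<no entry for {key!r}>'

devices = defaulting_dict({'storage': 4, 'compute': 3})
print(f"{devices['storage'] = }")
print(f"{devices['ai/ml'] = }")
```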
What is tuple good for?
print("Let's take a look!")
The list
type represents a collection of items. We often loop over these
items with a for
loop.
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
for h in hosts:
print(f'{h = }')
It has a “human ordering.”
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
for h in sorted(hosts, reverse=True):
print(f'{h = }')
It can be mutated—i.e., changed in place—using xs[idx]
syntax or methods like
xs.append
:
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
hosts.append('ghi.corp.net')
hosts.insert(0, 'jkl.corp.net')
hosts[0] = hosts[0].replace('jkl.', 'klm.')
for h in sorted(hosts, reverse=True):
print(f'{h = }')
But the tuple
type also exists, and seems to operate very similarly to the
list
, except it’s immutable. Why is this even useful?
hosts = (
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
)
hosts[0] = hosts[0].replace('abc.', 'bca.') # TypeError!
In addition to looping syntax, we have unpacking syntax in Python, which works
with both tuple
and list
, except it requires an exact match for unpacking
to work.
t = 1, 2
a, b, c = t  # ValueError! (expected 3 values, got 2)
print(f'{a = }')
print(f'{b = }')
print(f'{c = }')
In addition to list
and tuple
, we also have set
, which represents a
mathematical set
(unique elements with typical set operations.) The set
type is mutable but does not have a “human ordering.”
all_hosts = {
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
}
active_hosts = {
'xyz.corp.net',
}
new_hosts = {
'ghi.corp.net'
}
print(f'{all_hosts - active_hosts = }')
print(f'{all_hosts | new_hosts = }')
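Relatedly, the frozenset type is the immutable, hashable counterpart of set, so (unlike a plain set) it can serve as a dict key. The grouping below is illustrative:

```python
# frozenset is hashable, so it can key a dict; a plain set cannot
maintenance_windows = {
    frozenset({'abc.corp.net', 'def.corp.net'}): 'window A',
    frozenset({'xyz.corp.net'}): 'window B',
}
for group, window in maintenance_windows.items():
    print(f'{sorted(group)} -> {window}')
```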
The contents of a dict
are typically “homogeneous.”
d = {
'a': 1,
'b': 2,
'c': 3,
}
...
...
...
...
k = 'c'
v = d[k]
v + 1 # XXX: how do I know this will work?
The contents of a list
are typically “homogeneous” as well…
xs = [1, 2, 3, 4.0, 5+3j]
...
...
...
xs.append(6)
xs.clear()
...
...
for x in xs:
print(f'{x + 1 = }')
But the contents of a tuple
are typically “heterogeneous”… and I typically
use it with unpacking syntax.
from pandas import to_datetime
t = 'abc.corp.net', 16, to_datetime('2020-01-01')
...
...
...
...
...
...
host, ports, installed = t
A tuple
is a record (e.g., a row in a database) and a list
is a collection
(e.g., a table in a database.)
from pandas import to_datetime
devices = [
('abc.corp.net', 16, to_datetime('2020-01-01')),
('def.corp.net', 32, to_datetime('2020-02-06')),
('xyz.corp.net', 16, to_datetime('2020-01-08')),
]
for host, ports, installed in devices:
print(f'{host} was installed {to_datetime("2020-12-31") - installed} ago')
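If we want the record’s fields to carry names, collections.namedtuple builds a tuple subtype. A sketch (the field names are illustrative, and a stdlib date stands in for the pandas timestamp):

```python
from collections import namedtuple
from datetime import date

Device = namedtuple('Device', ['host', 'ports', 'installed'])
d = Device('abc.corp.net', 16, date(2020, 1, 1))

# fields are accessible by name…
print(f'{d.host = }')

# …but a Device still unpacks like any other tuple
host, ports, installed = d
print(f'{ports = }')
```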
When should I use a tuple?
- list: collection of similar entities
- dict: mapping from (similar) unique entities to similar entities
- tuple: one entity with multiple fields
- set: grouping of unique entities
(What about ordering: dict vs OrderedDict?)
from pandas import to_datetime
devices = [
('abc.corp.net', 16, to_datetime('2020-01-01'), 'cisco'),
('def.corp.net', 32, to_datetime('2020-02-06'), 'cisco'),
('xyz.corp.net', 16, to_datetime('2020-01-08'), 'infinera'),
]
vendors_by_ports = {}
for host, ports, installed, vendor in devices:
if ports not in vendors_by_ports:
vendors_by_ports[ports] = set()
vendors_by_ports[ports].add(vendor)
for ports, vendors in vendors_by_ports.items():
print(f'Ports: {ports} Vendors: {", ".join(vendors)}')
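The membership-test-then-insert dance above is exactly what collections.defaultdict automates. An equivalent rewrite (dates dropped for brevity):

```python
from collections import defaultdict

devices = [
    ('abc.corp.net', 16, 'cisco'),
    ('def.corp.net', 32, 'cisco'),
    ('xyz.corp.net', 16, 'infinera'),
]
vendors_by_ports = defaultdict(set)  # a missing key springs into existence as an empty set
for host, ports, vendor in devices:
    vendors_by_ports[ports].add(vendor)
for ports, vendors in sorted(vendors_by_ports.items()):
    print(f'Ports: {ports} Vendors: {", ".join(sorted(vendors))}')
```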
What are extended unpacking syntax and the additional unpacking generalisations?
Extended unpacking syntax was added in Python 3.0 with PEP 3132 (Extended Iterable Unpacking).
Additional unpacking generalisations were added in Python 3.5 with PEP 448 (Additional Unpacking Generalizations).
print("Let's take a look!")
Python has unpacking syntax for “destructuring” elements in an Iterable
—i.e.,
decomposing a collection into individual elements, binding each element to a
separate variable name.
t = 1, 2, 3
a, b, c = t
There are some idioms associated with this syntax, like swapping two values without a temporary…
x = 1
y = 20
print(' Before '.center(20, '\N{box drawings light horizontal}'))
print(f'{x = }')
print(f'{y = }')
x, y = y, x
print(' After '.center(20, '\N{box drawings light horizontal}'))
print(f'{x = }')
print(f'{y = }')
… or performing multiple variable assignments on one line.
x, y = 123, 456
print(f'{x = }')
print(f'{y = }')
Unpacking requires that you have exactly the same number of elements in the Iterable as variables you specify.
t = 1, 2, 3
a, b, c = t
A *
in unpacking syntax packs any additional items into a list
.
t = 1, 2, 3
a, b, c, *rest = t
print(f'{a = }')
print(f'{b = }')
print(f'{c = }')
print(f'{rest = }')
A *
in a list
literal unpacks elements from an Iterable
into a list
:
xs = [1, 2, 3]
ys = [4, 5, 6, 7]
ws = [xs, ys]
zs = [*xs, *ys]
print(f'{ws = }')
print(f'{zs = }')
A *
in a set
or tuple
literal does similar, but unpacks into a set
or
tuple
respectively:
xs = [1, 2, 3, 4]
ys = [4, 5, 6, 7]
ws = *xs, *ys
zs = {*xs, *ys}
print(f'{ws = }')
print(f'{zs = }')
A **
in a dict
does similar, but unpacks elements from a Mapping
into a
dict
, performing a merge.
d1 = {'a': 1, 'b': 2, 'c': 3 }
d2 = { 'b': 20, 'c': 30, 'd': 40}
d3 = {**d1, **d2}
print(f'{d3 = }')
We have a number of different ways to do merges in Python.
We can use a collections.ChainMap
…
from collections import ChainMap
d1 = {'a': 1, 'b': 2, 'c': 3 }
d2 = { 'b': 20, 'c': 30, 'd': 40}
d3 = ChainMap(d2, d1)
for k, v in d3.items():
print(f'{k = }: {v = }')
del d2['b']
del d2['c']
for k, v in d3.items():
print(f'{k = }: {v = }')
print(f'{d3 = }')
We can use itertools.chain with .items(), or an explicit .copy() followed by .update()…
from itertools import chain
d1 = {'a': 1, 'b': 2, 'c': 3 }
d2 = { 'b': 20, 'c': 30, 'd': 40}
d3 = dict(chain(d1.items(), d2.items()))
d3 = d1.copy()
d3.update(d2)
print(f'{d3 = }')
We can use the **
unpacking syntax or (in Python ≥3.9) we can use the |
operator.
d1 = {'a': 1, 'b': 2, 'c': 3 }
d2 = { 'b': 20, 'c': 30, 'd': 40}
d3 = {**d1, **d2}
d4 = d1 | d2
print(f'{d3 = }')
print(f'{d4 = }')
entries = [123, ..., ..., ..., ..., ..., 456]
if len(entries) < 2:
raise ValueError('...')
head = entries[0]
tail = entries[-1]
diff = head - tail
entries = [..., ..., ..., ..., ...]
head, *_, tail = entries
def process(in_use, in_maintenance):
for dev in in_use:
...
for dev in in_maintenance:
...
def process(in_use, in_maintenance):
for dev in in_use + in_maintenance: # possible TypeError!
...
process([..., ...], [..., ...])
process([..., ...], {..., ...})
from itertools import chain
def process(in_use, in_maintenance):
for dev in chain(in_use, in_maintenance):
...
list(chain(in_use, in_maintenance))
def process(in_use, in_maintenance):
return [*in_use, *in_maintenance]
def process(in_use, in_maintenance):
return {*in_use, *in_maintenance}
Why does comprehension syntax exist?
Comprehension syntax was added in Python 2.0 with PEP 202 (List Comprehensions).
It was further extended to dict and set in Python 2.7 and 3.0 with PEP 274 (Dict Comprehensions).
It was even further extended with async syntax in Python 3.6 with PEP 530 (Asynchronous Comprehensions).
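As a taste of that Python ≥3.6 syntax, an async comprehension consumes an asynchronous iterable; the async generator below is a stand-in for e.g. values awaited from a network call:

```python
from asyncio import run

async def numbers():
    # an async generator: each value could, in real code, come from an await
    for i in range(5):
        yield i

async def main():
    squares = [x**2 async for x in numbers()]  # async comprehension (PEP 530)
    print(f'{squares = }')
    return squares

result = run(main())
```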
print("Let's take a look!")
In Python, we have a for
-loop which we typically use as a “for-each” loop.
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
for h in hosts:
print(f'{h = }')
We are discouraged from using it as a C-style for
-loop:
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
for idx in range(len(hosts)):
print(f'{hosts[idx] = }')
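When we genuinely need the index, the enumerate builtin preserves the “for-each” shape:

```python
hosts = [
    'abc.corp.net',
    'xyz.corp.net',
    'def.corp.net',
]
# enumerate yields (index, element) pairs, unpacked directly in the loop header
for idx, h in enumerate(hosts):
    print(f'{idx}: {h}')
```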
We have iteration helpers to allow us to use this as a “for-each” loop in many situations (and we are encouraged to write our own iteration helpers.)
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
datacenters = [
'ghi1',
'ghi2',
'klm1',
]
# for idx in range(len(hosts)):
# h, d = hosts[idx], datacenters[idx]
# print(f'{h} in {d}')
for h, d in zip(hosts, datacenters, strict=True):
print(f'{h} in {d}')
We also have comprehension syntax, but it is more limited than for-loop syntax:
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
names = []
domains = set()
for h in hosts:
n, d = h.split('.', 1)
names.append(n)
domains.add(d)
print(f'{names = }')
print(f'{domains = }')
… which can be rewritten as…
hosts = [
'abc.corp.net',
'xyz.corp.net',
'def.corp.net',
]
names = [h.split('.', 1)[0] for h in hosts]
domains = {h.split('.', 1)[-1] for h in hosts}
print(f'{names = }')
print(f'{domains = }')
We have a list
, set
, and dict
comprehension (but no tuple
comprehension.)
xs = [-3, -2, -1, 0, 1, 2, 3]
all_squares = [x**2 for x in xs]
uniq_squares = {x**2 for x in xs}
squared_xs = {x: x**2 for x in xs}
print(f'{all_squares = }')
print(f'{uniq_squares = }')
print(f'{squared_xs = }')
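There is a reason the tuple case is absent: parentheses around a comprehension produce a generator expression, not a “tuple comprehension”:

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
gen = (x**2 for x in xs)   # a lazy generator expression, not a tuple
squares = tuple(gen)       # materialise explicitly when a tuple is wanted
print(f'{squares = }')
```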
However, we cannot do the following with comprehension syntax…
from collections import defaultdict
xs = [-3, -2, -1, 0, 1, 2, 3]
squared_xs = defaultdict(set)
for x in xs:
squared_xs[x**2].add(x)
print(f'{squared_xs = }')
Comprehensions can have filters and multiple levels, but…
xss = [[-3, -2, -1], [0], [1, 2, 3]]
ys = []
for xs in xss:
for x in xs:
ys.append(x**2)
zs = [x**2 for xs in xss for x in xs]
print(f'{ys = }')
print(f'{zs = }')
xss = [[-3, -2, -1], [0], [1, 2, 3]]
ys = []
for xs in xss:
if len(xs) > 1:
for x in xs:
if x % 2 == 0:
ys.append(x**2)
zs = [x**2 for xs in xss if len(xs) > 1 for x in xs if x % 2 == 0]
print(f'{ys = }')
print(f'{zs = }')
… comprehensions cannot:
xs = [1, 2, 3, 4]
# good!
for x in xs:
print(f'{x = }')
# misleading! — why did you create a list you didn't care about?
[print(f'{x = }') for x in xs]
# can you “skim” this?
for x in xs:
for y in x:
if ...:
continue
...
if ...:
break
...
...
...
...
...
xs = [... for ... in ... if ...]
...
...
...
ys = [f(x) for x in xs if cond(x)]
...
...
...
Why do I need context managers?
Context managers were added to Python 2.5 with PEP 343 (the “with” statement).
print("Let's take a look!")
We often want to work with some resource, like a file or a database connection. This resource requires some set-up and tear-down.
f = open(__file__)
...
...
...
0 / 0
...
...
f.close()
We cannot rely on ourselves to do the tear-down manually, because we could forget or because an error could occur which would cause our tear-down code not to run.
f = open(__file__)
try:
...
...
...
...
...
...
...
...
...
...
...
...
...
...
finally:
f.close()
We cannot rely on the garbage collector to do this work for us, because the lifetime of our resources isn’t guaranteed to be tightly scoped to a block.
Therefore, we need special syntax to ensure we can sequence two operations:
with open(__file__) as f:
print(f'{not f.closed = }')
...
print(f'{not f.closed = }')
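The tear-down is guaranteed even when the body raises; a quick check:

```python
f_ref = None
try:
    with open(__file__) as f:
        f_ref = f
        raise RuntimeError('simulated failure inside the block')
except RuntimeError:
    pass
# __exit__ ran despite the exception, so the file is closed
print(f'{f_ref.closed = }')
```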
This pattern is so general that we want to extend it to a variety of situations.
For example, database connections…
from sqlite3 import connect
with connect(':memory:') as conn:
pass
… temporary directories…
from tempfile import TemporaryDirectory
with TemporaryDirectory() as d:
pass
… or even configuration and settings.
from decimal import localcontext
with localcontext() as ctx:
ctx.prec = 20
...
As with any “extension” that implements the Python vocabulary, we use special
methods __enter__
and __exit__
.
class T:
def __enter__(self):
print('before')
def __exit__(self, exc_type, exc_value, traceback):
print('after')
with T():
print('inside')
Context managers are a sequencing mechanism, but so are generators.
def g():
print('before')
yield
print('after')
gi = g()
next(gi)
print('inside')
next(gi, None)
We can adapt one mechanism to the other using the decorator contextlib.contextmanager
.
from contextlib import contextmanager
@contextmanager
def g():
print('before')
yield
print('after')
with g():
print('inside')
Use the contextlib.contextmanager decorator: write them often!
with test_db() as db:
with test_data(db): # baseline
...
with test_data(db): # scenario #1
...
with test_data(db): # alternate baseline
...
with test_data(db): # alternate baseline
...
from contextlib import contextmanager
from sqlite3 import connect
from tempfile import TemporaryDirectory
from pathlib import Path
from random import choice, randrange
from string import ascii_lowercase
@contextmanager
def test_db():
create = '''
create table test (
name text
, value number
);
'''
drop = 'drop table test'
with TemporaryDirectory() as d:
d = Path(d)
with connect(d / 'test.db') as db:
try:
db.execute(create)
yield db
finally:
db.execute(drop)
@contextmanager
def test_data(db):
data = [
(''.join(choice(ascii_lowercase) for _ in range(2)), randrange(100))
for _ in range(10)
]
try:
db.executemany('insert into test values (?, ?)', data)
yield
finally:
db.executemany('delete from test where name=? and value=?', data)
with test_db() as db:
with test_data(db):
...
with test_data(db):
cur = db.execute('select name, sum(value) from test group by name limit 3')
for row in cur: print(f'{row = }')
with test_data(db):
...
with test_data(db):
...
Why do I need asyncio?
Special syntax for asyncio was added to Python 3.5 with PEP 492 (Coroutines with async and await syntax).
print("Let's take a look!")
If I want to work concurrently, I have the following choices:
- threading
- multiprocessing
- asyncio
In threading
, I have one process:
from threading import Thread
from queue import Queue
from dataclasses import dataclass
from string import ascii_lowercase
from random import choice
from time import sleep
@dataclass
class Job:
name : str
@classmethod
def from_random(cls):
name = ''.join(choice(ascii_lowercase) for _ in range(4))
return cls(name)
def producer(q):
while True:
for _ in range(choice([1, 2])):
j = Job.from_random()
print(f'Enqueueing job {j = }')
q.put(j)
sleep(1)
def consumer(name, q):
while True:
j = q.get()
print(f'Servicing job {j = } @ {name = }')
sleep(.1)
def main():
q = Queue()
pool = [
Thread(target=producer, kwargs={'q': q}),
Thread(target=consumer, kwargs={'q': q, 'name': 'consumer#1'}),
Thread(target=consumer, kwargs={'q': q, 'name': 'consumer#2'}),
]
for x in pool: x.start()
main()
In multiprocessing, I have multiple processes:
from multiprocessing import Process, Queue
from dataclasses import dataclass
from string import ascii_lowercase
from random import choice
from time import sleep
@dataclass
class Job:
name : str
@classmethod
def from_random(cls):
name = ''.join(choice(ascii_lowercase) for _ in range(4))
return cls(name)
def producer(q):
while True:
for _ in range(choice([1, 2])):
j = Job.from_random()
print(f'Enqueueing job {j = }')
q.put(j)
sleep(1)
def consumer(name, q):
while True:
j = q.get()
print(f'Servicing job {j = } @ {name = }')
sleep(.1)
def main():
q = Queue()
pool = [
Process(target=producer, kwargs={'q': q}),
Process(target=consumer, kwargs={'q': q, 'name': 'consumer#1'}),
Process(target=consumer, kwargs={'q': q, 'name': 'consumer#2'}),
]
for x in pool: x.start()
main()
Therefore:
- threading: one process
- multiprocessing: multiple processes
- multiprocessing.shared_memory can reduce the penalty for sharing data (but will not eliminate the runtime boundary)

So: threading or multiprocessing?
As a third option, we have asyncio
. On its face, it looks similar to threading:
In asyncio
, I have one process:
Note the special async
and await
syntax:
from asyncio import gather, run, sleep as aio_sleep
from asyncio.queues import Queue
from dataclasses import dataclass
from string import ascii_lowercase
from random import choice
@dataclass
class Job:
name : str
@classmethod
def from_random(cls):
name = ''.join(choice(ascii_lowercase) for _ in range(4))
return cls(name)
async def producer(q):
while True:
for _ in range(choice([1, 2])):
j = Job.from_random()
print(f'Enqueueing job {j = }')
await q.put(j)
await aio_sleep(1)
async def consumer(name, q):
while True:
j = await q.get()
print(f'Servicing job {j = } @ {name = }')
await aio_sleep(.1)
async def main():
q = Queue()
tasks = [
producer(q=q),
consumer(q=q, name='consumer#1'),
        consumer(q=q, name='consumer#2'),
]
await gather(*tasks)
run(main())
So when do I reach for multiprocessing, for threading, or for asyncio?
asyncio’s coöperative scheduling means everyone needs to coöperate. Consider asyncio on day one of your project.