Python

From miki
Revision as of 09:19, 10 October 2024 by Mip (talk | contribs) (→‎Tutorials)

References

Books

  • O'Reilly's Python in a Nutshell
  • The Python language reference

Links

Python 3
Including Language Reference.
Python 2.7
Other versions of Python are available [1]
Variants and distributions
  • ipython
  • Jupyter — The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
  • Anaconda
PEP
Coding style
  • References:
Miscellaneous
Nice example of generating / testing regex in Python (with nice / small test framework)
Some tips on debugging in Python, mostly focussing on using a logger (instead of printf). See also HN page for many interesting comments.
A summary of changes in each version of Python
Libraries
  • seaborn is a powerful python toolkit to visualize statistical data.
  • plumbum, a library that mimics bash-like commands (including pipes), to ease rewriting bash scripts in Python.
  • tqdm, a library to easily add a progress bar to loops and other iterables.
  • pwntools, a CTF framework and exploit development library.
  • In particular, check pwntools tubes, a library for talking to sockets, processes, ssh connections. Useful for automation (see this CTF writeup for an example).
Profiler
# As simple as
py-spy top --pid 12345                     # Display activity of given pid in real-time!
Formatter

Tutorials

Very clear, terse, covering many topics (strings, regex)

Shell

In a command shell, use pydoc to get help:

pydoc repr               # Get help on 'repr' command

Same can be achieved in python interpreter:

help()                 # Interactive help
help('repr')           # Same as typing 'repr' in interactive help
help(repr)             # Help on repr builtin

Testing

Very powerful and easy to use.
  • Hypothesis - Hypothesis is an advanced testing library for Python.
  • Quickcheck a testing library.
  • doctest, very useful to test docstrings (can also produce unittest test suites).
  • unittest, a unit test framework module.
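As a minimal sketch of the unittest style listed above (the function clamp and the test names are made up for illustration, they are not part of any library):

```python
import unittest

def clamp(value, low, high):
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))

class ClampTest(unittest.TestCase):
    def test_inside_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_outside_range(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)

# Build and run the suite programmatically (unittest.main() would
# call sys.exit() when done, which is inconvenient in a session)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

In a real project the test class would usually live in its own test_*.py file and be run with python3 -m unittest.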

Examples

Packages

To get a simple list of all available wheels for one package visit https://pypi.org/simple/<PACKAGE-NAME>/, eg. https://pypi.org/simple/pycryptodome/.

Packages can be downloaded easily with pip3.

# Download a package into dist/
pip3 download -d dist --prefer-binary bitstring

# Download a package for a specific platform, only taking binary [https://stackoverflow.com/questions/24097168/how-to-download-cross-platform-wheels-via-pip]
pip3 download -d dist --platform=manylinux2010_x86_64 --only-binary=:all: pycryptodome
pip3 download -d dist --platform=manylinux1_x86_64 --only-binary=:all: pycryptodome

Install

Virtual Environments

A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them.

References
Virtualenv
# Install
sudo apt install virtualenv python3-virtualenv

# Create a new environment
virtualenv -p python3 venv         # To use python3. venv is recommended default to add to .gitignore, etc.
source venv/bin/activate           # Enter environment. From now on, packages will only be installed locally
# Do stuff - pip3 install ...
deactivate                         # Exit environment
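From inside Python, one way to check whether the interpreter runs in a virtual environment is to compare sys.prefix with the base prefix. This is only a sketch: the helper name in_virtualenv is ours, and old virtualenv releases (which set sys.real_prefix instead of sys.base_prefix) are handled through the getattr fallback.

```python
import sys

def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; venv and recent virtualenv
    # keep the original installation in sys.base_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix

print(sys.prefix)          # Root of the active environment
print(in_virtualenv())
```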

Update Python

 ❗  It is not recommended to update the system Python

Some links:

Install pip and setuptools

To install setuptools, the easiest is to use pip, which comes pre-installed in later versions of Python:

pip install -U setuptools

To bootstrap setuptools on a bare installation:

cd /path/to/your/python
wget https://bootstrap.pypa.io/ez_setup.py -O - | ./python
wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo ./python       # System-wide
wget https://bootstrap.pypa.io/ez_setup.py -O - | ./python - --user   # User-local path

See Install pip setuptools and wheels for more information.

Install module online

Python comes with a wide range of libraries, called modules. There are several ways to install these modules.

Using the distribution
  • For instance, in Debian:
apt-cache search --names-only python-       # View available modules
sudo apt-get install python-pyscard         # Install the pyscard module
Using pip

pip is the new way to install modules. It uses the wheel format.

sudo pip install Pygments

This is equivalent to:

sudo python -m pip install Pygments

This last form can be used to state explicitly which python runtime must be used:

sudo /path/to/your/python -m pip install Pygments

Use --user to install for user only:

pip install --user Pygments

Use --target SITE to specify manually the target SITE:

pip install --target SITE Pygments

See tip below on how to obtain the default site.
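The install locations can also be queried from Python itself with the standard site and sysconfig modules. A short sketch (the variable names are ours):

```python
import site
import sysconfig

# Where 'pip install' puts pure-Python modules for this interpreter
global_site = sysconfig.get_paths()["purelib"]
# Where 'pip install --user' puts them
user_site = site.getusersitepackages()

print(global_site)
print(user_site)
```

Run it with the same interpreter as the one used for pip (for instance /path/to/your/python), since each runtime has its own paths.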

Using easy_install

easy_install is the old way to install modules. It uses the egg format.

sudo easy_install Pygments
Using the source

Download and unpack the package

wget http://sourceforge.net/projects/pyscard/files/pyscard/pyscard%201.6.12/pyscard-1.6.12.tar.gz#md5=908d2530972ea91eb4bb66987e0e1e98
tar -xvzf pyscard-1.6.12.tar.gz
cd pyscard-1.6.12

To install globally (in /usr/local/lib/python2.7/dist-packages or similar):

sudo ./setup.py install

To install locally (in ~/.local/lib/python2.7/site-packages), use --user, without sudo:

./setup.py install --user

One can also use pip to install from source:

sudo pip install .       # Global install
pip install --user .     # Local install

Install modules offline

To install a Python module on a machine that has no connection to the Internet [3]:

  • On a machine with internet connection
# For instance, to install package neovim
mkdir tmp && cd tmp
pip download neovim
  • On the offline machine, which has access to tmp/:
# For instance, to install package neovim
cd tmp
pip install --no-index --find-links ./ neovim

If you don't have pip on the offline machine, and you can't use an OS package, install directly from source:

python setup.py install
Advanced usage

Binary packages are available for several platforms. For instance, visiting https://pypi.org/simple/pycryptodome/:

   pycryptodome-3.10.4-cp35-abi3-macosx_10_9_x86_64.whl
   pycryptodome-3.10.4-cp35-abi3-manylinux1_i686.whl
   pycryptodome-3.10.4-cp35-abi3-manylinux1_x86_64.whl
   pycryptodome-3.10.4-cp35-abi3-manylinux2010_i686.whl
   pycryptodome-3.10.4-cp35-abi3-manylinux2010_x86_64.whl
   pycryptodome-3.10.4-cp35-abi3-manylinux2014_aarch64.whl
   pycryptodome-3.10.4-cp35-abi3-win32.whl
   pycryptodome-3.10.4-cp35-abi3-win_amd64.whl

To download pycryptodome for a given platform:

pip3 download -d dist --platform=manylinux2010_x86_64 --only-binary=:all: pycryptodome
pip3 download -d dist --platform=manylinux1_x86_64 --only-binary=:all: pycryptodome
pip3 download -d dist --platform=win_amd64 --only-binary=:all: pycryptodome
pip3 download -d dist --platform=win32 --only-binary=:all: pycryptodome
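The value given to --platform is the last field of the wheel file name. As a sketch, the file names listed above can be split into their fields as follows. This assumes the common five-field form (name-version-python-abi-platform) without the optional build tag; the helper name split_wheel_name is ours.

```python
def split_wheel_name(filename):
    """Split 'name-version-pytag-abitag-platform.whl' into a dict of fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    # Assumption: exactly five dash-separated fields (no build tag)
    name, version, pytag, abitag, platform = filename[:-4].split("-")
    return {"name": name, "version": version, "python": pytag,
            "abi": abitag, "platform": platform}

info = split_wheel_name("pycryptodome-3.10.4-cp35-abi3-manylinux2010_x86_64.whl")
print(info["platform"])    # manylinux2010_x86_64
print(info["python"])      # cp35
```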

Install python2 pip on Debian Bullseye

  • Download python-pip and python-pip-whl from Buster.
  • Install both packages, this will uninstall python-pip-whl 20.1 and python3-pip
sudo apt install ./python-pip_18.1-5_all.deb ./python-pip-whl_18.1-5_all.deb
  • Upgrade pip:
sudo python2 -m pip install -U pip
  • Reinstall the Bullseye python3-pip:
sudo apt install python-pip-whl python3-pip
  • Confirm some packages that were installed with python2 pip:
sudo apt install libpython-all-dev python-all python-all-dev python-pkg-resources python-setuptools

Install local version of python with pyenv

sudo apt-get install -y git
sudo apt-get install -y build-essential libbz2-dev libssl-dev libreadline-dev \
                        libffi-dev libsqlite3-dev tk-dev

# optional scientific package headers (for Numpy, Matplotlib, SciPy, etc.)
sudo apt-get install -y libpng-dev libfreetype6-dev   

# Install pyenv
curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash

This is optional, we can enable pyenv in .profile or .bashrc

export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"

Install new version of python:

pyenv install 3.6.0

Use the version in virtualenv:

virtualenv -p ~/.pyenv/versions/3.6.0/bin/python3.6 venv
source venv/bin/activate
python3 --version

Interactive mode

Python can be run interactively, which is a very powerful way to develop new applications.

Python

To import an existing module, use import as usual:

import mymod             # Import module in current session
from mymod import *      # Idem, but remove mymod. prefix to symbols

iPython / Jupyter

To import an existing module, use import as above or the run command:

import mymod       # import file 'mymod.py'
run mymod

Reloading modules (automatically)

When working on a module, iPython can reload that module automatically [4]:

%load_ext autoreload
%autoreload 2           # Module will be reloaded at each carriage return
import mymod
# ...or...
load_ext autoreload
autoreload 2           # Module will be reloaded at each carriage return
import mymod

%autoreload?           # for help

Modules can also be reloaded manually (this works in the standard Python interpreter):

import importlib
importlib.reload(mymod)        # Python 3; in Python 2, use the builtin reload(mymod)

To load and configure the extension when launching ipython3 (eg. in a bash script):

# Need to pass 'autoreload' through -c because it can't appear in a python module
ipython3 --ext='autoreload' -c "autoreload 2" -i   # Explicitly say we want interactive

# OR...
ipython3 --config=ip3.py -c "autoreload 2" -i
# Then create a file ip3.py:
#
#     c.InteractiveShellApp.extensions = [
#             'autoreload'
#     ]
#

More complete example with also module loading:

ipython3 --ext='autoreload' -c "autoreload 2" -m dev -i

# File dev.py:
#
#     import logging
#     logging.basicConfig(level=logging.INFO)
#     import my_module as my

Python variants

iPy

Use iPy (ipython) to get an interactive shell with auto-completion, instant help...

%magic                    # Get help on %magic commands (%run,...)
?run                      # Get help on %run magic
%run script.py            # Run given script
%run -i script.py         # ... with inspect mode on
%run -i -e script.py      # ... ... and ignore sys.exit() call
!cmd                      # Run shell command 'cmd', for instance ...
!ls                       # ... List file in current directory

Pypy

PyPy is a fast, compliant alternative implementation of the Python language, which usually runs python programs faster thanks to its Just-in-Time compiler.

Install
On Lucid 64-bit, the easiest is to download the dedicated tarball:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.2.1-linux64.tar.bz2
tar -xvjf pypy-2.2.1-linux64.tar.bz2
Install virtualenv, then install pypy as virtual environment my-pypy-env
sudo apt-get install python-virtualenv
virtualenv -p pypy-2.2.1-linux64/bin/pypy my-pypy-env
Modules must be installed separately for this virtual environment. For instance
./my-pypy-env/bin/pip install libnum
Run
Run python programs using python or pypy
./my-pypy-env/bin/pypy

Python 3 Reference

Source: Python reference, w3schools python tutorial and O'Reilly Python in a Nutshell

Keywords

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield

In addition, the following have special meaning:

  • _* names are not imported by from module import *. Also _ holds the last evaluation result in interactive mode.
  • __*__ system-defined names.
  • __* class-private names (rewritten as mangled form by the compiler).

Assert takes a condition, and optional message:

assert a>10
assert a>10, f"a {a} is not greater than 10"

Literals

See Literals in Python reference and Python in a Nutshell.

42         # Integer literal
3.14       # Floating-point literal
3.14e-10   # Floating-point literal
1.0j       # Imaginary literal

[42, 3.14, 'hello']    # List
[]                     # Empty list
100, 200, 300          # Tuple
()                     # Empty tuple
{'x':42, 'y':3.14}     # Dictionary
{}                     # Empty dictionary
{1, 2, 4, 8, 'string'} # Set
# There is no literal to denote an empty set; use set() instead
string literals (str objects)
"hello"
'hello'
"""Good
night"""               # Triple-quoted string literal
r"\b\x"                # raw -- ignore escape sequences
R"\b\x"                # raw -- ignore escape sequences
f"name is {name!r}"    # formatted string literals
multiline string literals (str objects)
"multiline\nstring"    # simple quote with embedded \n
"""multi-line
string"""              # triple quote, preserve newlines, but not indent friendly
("multi-line\n"
"string")              # Using bracket, recommended by PEP. indent friendly.
"multi-line\n" \
"string"               # Using backslash. indent friendly.
bytes literals (bytes objects)
b"abc\x81\x82"
B"abc\x81\x82"
rb"abc\x81\x82"        # raw -- ignore escape sequences
RB"abc\x81\x82"        # raw -- ignore escape sequences
formatted string literals (3.6)
name="Fred"
f'His name is {name!r}'     # !r conversion, applies repr()
f'His name is {repr(name)}' # equivalent
                            # !s does str(), !a does ascii()
f'length is {len(name)}'    # expression
width=8; prec=3
f'{3.14159:{width}.{prec}}' # nested fields for width and precision: '    3.14'
n = 1024
f'{n:x}'                    # '400'
f'{n:4x}'                   # ' 400'
f'{n:04x}'                  # '0400'
f'{n:#x}'                   # '0x400'
f'{n:#6x}'                  # ' 0x400'
f'{n:#06x}'                 # '0x0400'
from datetime import datetime
today = datetime(year=2017, month=1, day=27)
f'{today:%B %d, %Y}'        # date format specifier
f'{n} vs {{n}} vs {{{n}}}'  # '1024 vs {n} vs {1024}'
Raw string literals
r'^foo\.bar$'               # Useful for regex mainly (fix invalid escape sequence)
bar="BAR"
fr'^foo\.{bar}$'            # raw AND formatted string

Operators

+       -       *       **      /       //      %      @
<<      >>      &       |       ^       ~       :=
<       >       <=      >=      ==      !=

Operators and their evaluation order, from highest to lowest:

(...) [...] {...}                     # Parenthesized expr., list, dict & set creation (Python 2 also had `...` for string conv.)
s[i] s[i:j] s.attr f(...)             # indexing & slicing; attributes, function calls
x**y                                  # Power (binds tighter than a unary operator on its left: -x**2 is -(x**2))
+x, -x, ~x                            # Unary operators
x*y x@y x/y x//y x%y                  # mult, matrix mult, division, floor division (integer division), modulo
x+y x-y                               # addition, subtraction
x<<y   x>>y                           # Bit shifting
x&y                                   # Bitwise "and"; also intersection of sets
x^y                                   # Bitwise exclusive or
x|y                                   # Bitwise "or"; also union of sets
x<y  x<=y  x>y  x>=y  x==y x!=y       # Comparison (Python 2 also had x<>y)
x is y   x is not y                   # identity (same precedence as comparisons)
x in s   x not in s                   # membership (same precedence as comparisons)
not x                                 # boolean negation
x and y                               # boolean and
x or y                                # boolean or
lambda args: expr                     # anonymous function
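A few checks that illustrate the evaluation order (a sketch, all values computed with plain integers and booleans):

```python
# ** binds tighter than a unary minus on its left ...
assert -2 ** 2 == -4            # parsed as -(2 ** 2)
# ... but a unary operator on its right is applied first
assert 2 ** -1 == 0.5
# ** groups from right to left
assert 2 ** 3 ** 2 == 512       # 2 ** (3 ** 2)
# Shifts bind looser than addition
assert 1 << 2 + 1 == 8          # 1 << (2 + 1)
# & binds tighter than ^, which binds tighter than |
assert 1 | 2 ^ 3 & 4 == 3       # 1 | (2 ^ (3 & 4))
# 'not' binds looser than comparisons
assert (not 1 == 2) is True     # not (1 == 2)
# 'and' binds tighter than 'or'
assert (True or False and False) is True    # True or (False and False)
print("all precedence checks passed")
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.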
Arithmetic operators
1//2                                  # Floor division (PEP-238)
ternary operator
x_sign = 'positive' if (x>=0) else 'negative'
Notes
  • Use is for testing None. A plain not also matches 0, '' and empty containers
if (p.poll() is None):         # Use 'is' for testing None
    print("None")
if not p.poll():               # ... or 'not', but this is also true when poll() returns 0
    print("None or 0")
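The two tests are not equivalent, which matters for values like an exit code of 0. A small sketch with plain values (the helper name describe is ours):

```python
def describe(value):
    """Return how 'is None' and 'not' classify the same value."""
    return (value is None, not value)

print(describe(None))   # (True, True)
print(describe(0))      # (False, True)   <- 0 is falsy but is not None
print(describe(1))      # (False, False)
print(describe([]))     # (False, True)
```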

Delimiters

(       )       [       ]       {       }
,       :       .       ;       @       =       ->
+=      -=      *=      /=      //=     %=      @=
&=      |=      ^=      >>=     <<=     **=

Characters with special meanings as part of other tokens:

' " # \

Data types

isinstance('foo',str)         # True, if class or any subclass
isinstance(num,(int,float))   # ... note that several types can be tested
issubclass(type('foo'),str)   # True, same but with type
type('foo')                   # str
type('foo') is str            # True, if class (but not subclass)

In python 2:

# isinstance(o,str)           # Don't do this in Python 2
isinstance(o, basestring)     # Do this instead
isinstance(o, (str,unicode))  # ... or this

Boolean

True            # constant for true
False           # constant for false
bool(x)         # To convert to bool built-in type

Avoid unnecessary call to bool(x).

if x:                     # GOOD
if bool(x):               # BAD
if x is True:             # BAD
if x == True:             # BAD
if bool(x) == True:       # BAD

A valid use:

def count_trues(seq): return sum(bool(x) for x in seq)   # Ensure each item is counted either as 0 or 1

Comparison operators can be chained as in mathematical notation, which is very useful in assert for instance:

assert 0 <= x < 10

Strings

Strings in Python are immutable objects. There are many differences between Python2 and Python3.

Python 2

There are two types of strings:

  • str (like 'foo'), a bytestring, ie. an array of bytes.
type('foo')
# <type 'str'>
  • unicode (like u'foo'), a textual string (Unicode).
type(u'foo')
# <type 'unicode'>

s.decode()
Converts str (bytes) to unicode.
s.encode()
Converts unicode to str (bytes).

Python 3

There are two types of strings:

  • str (like 'foo'), a textual string (Unicode).
type('foo')
# <class 'str'>
  • bytes (like b'foo'), a bytestring, ie. an array of bytes.
type(b'foo')
# <class 'bytes'>

s.decode()
bytes only. Converts bytes to str (unicode).
s.encode()
str only. Converts str (unicode) to bytes.

So Python3's 'foo' is Python2's u'foo', and Python2's 'bar' is Python3's b'bar'.
b'hello' == 'hello'.encode()          # str to bytes
'hello' == b'hello'.decode()          # bytes to str
"def" in "abcdefgh"                   # substring
s.upper()                             # Change 'uppercase' to 'UPPERCASE'
', '.join(set_3)                      # Join a sequence
list(map(ord, hex_data))              # [0xDE, 0xAD, 0xBE, 0xEF] (hex_data is a str; for bytes, list(hex_data) is enough)

# Strings function
s="Hello, World"
s.endswith('World')                   # True
s.startswith('Hello')                 # True

list, bytes, int conversion

In python3:

# int <-> bytes
i=1234
i.to_bytes(4,'big')          # Convert int i into 4-byte bytes array (big endian)
i.to_bytes(4,'little')       # Convert int i into 4-byte bytes array (little endian)
                             # Use (x.bit_length()+7)//8 as length to size automatically
s=b'\x80\x00'
int.from_bytes(s,'big')      # Convert bytes into a int (big endian)
int.from_bytes(s,'little')   # Convert bytes into a int (little endian)

# bytes <-> list
l=[1,2,3,4]
b=bytes(l)                   # b'\x01\x02\x03\x04'
list(b)                      # [1, 2, 3, 4]

See also Hex for example of conversion into hexadecimal strings.
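For quick reference, a sketch of the bytes / hexadecimal string conversions available on the bytes type itself in Python 3 (variable names are ours):

```python
data = bytes([0xDE, 0xAD, 0xBE, 0xEF])
hexstr = data.hex()                          # 'deadbeef'
assert bytes.fromhex(hexstr) == data         # back to bytes
assert int(hexstr, 16) == int.from_bytes(data, "big")

# Size the byte array automatically from the integer value
i = 1234
raw = i.to_bytes((i.bit_length() + 7) // 8, "big")   # b'\x04\xd2'
print(hexstr, raw)
```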

Bitstring

See Bitstring module.

List

Nice tutorial: http://effbot.org/zone/python-list.htm

a=[0,3,6]
print(a[1])                    # 3

a=[0] * 1000                   # Array with 1000 elements
len(a)                         # Number of elements

b=a                            # This only copies the REFERENCE
b[0]+=1                        # ... this also changes a[0]
b=a[:]                         # This makes a NEW (shallow) COPY
b=a.copy()                     # Idem, PYTHON >= 3.3

import copy
a=[[1,2],[3,4]]
b=copy.deepcopy(a)             # Deep copy - MUST for dimension >= 2

a[:]=a[::-1]                   # Reassign element in the list (here in reverse order)
a=a[::-1]                      # Idem, but create a new object

a=[];
a.append(12);                  # Create object before appending
a[len(a):] = [13];             # Same as appending

a=[1,2,3]
a.extend([4,5,6])              # [1,2,3,4,5,6] -- extend with an iterable

l=[1,2,3]
l.pop()                        # 3 - pop last element
l.pop(0)                       # 1 - pop first element - consider deque and popleft() for better perf
del l[0]                       # Delete first element

list("abc")                    # ['a', 'b', 'c']

line = '1234567890'
n = 2
[line[i:i+n] for i in range(0, len(line), n)]   # ['12', '34', '56', '78', '90']

def shiftRow(word, n):
    return word[n:]+word[0:n]
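state[i::4] = shiftRow(state[i::4],i)      # Apply shiftRow to every 4th byte, starting at index i

alist = [sbox[b] for b in alist]           # Substitute each byte through sbox (in Python 3, map() would return an iterator, not a list)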

state[:] = [ a ^ b for a,b in zip(state,roundKey) ]    # Ex-oring 2 lists of integers

# Multi-dimensional list
matrix = [[0 for _ in range(5)] for _ in range(5)]     # Initialize bi-dimensional array
matrix = [[0]*5 for _ in range(5)]                     # faster way
# matrix = 5*[5*[0]]                                   # WRONG - 5 references to the same inner list

# Compare - simply use ==
[1,2,3] == [1,2,3]                                     # True
[1,2,3] == [1,2,3,4]                                   # False
[1,2,3] == ['a','b']                                   # False

# ... to remove order and duplicates, use set()
set([1,2,3]) == set([2,1,3,3])                         # True

# Reverse
L = reversed(range(8))                                 # [7,6,5,4,3,2,1,0], but as a range iterator

# Sort
a.sort()

# Sum
a=[8,19,3,17,12,2]
sum(x <= 10 for x in a)
sum(1 for x in a if x <= 10)                          # Generator expression

# logical and, or, not
a=[True, False, True]
all(a)                                                # False - logical and
any(a)                                                # True  - logical or
[not x for x in a]                                    # [False, True, False] - logical not

def count(iterable):
    return sum(1 for _ in iterable)
sub10Count = count(x for x in a if x <= 10)           # Cheap (doesn't create useless list) and readable

# Adding (https://stackoverflow.com/questions/18713321/element-wise-addition-of-2-lists)
# Note: izip exists in Python 2 only. In Python 3, zip() is already lazy,
# and map() returns an iterator (wrap it in list() to get a list).
                            [sum(x) for x in zip(list1, list2)]     # 177ms
from itertools import izip; [sum(x) for x in izip(list1, list2)]    # 139ms
                            [a + b for a, b in zip(list1, list2)]   # 112ms, most pythonic
from itertools import izip; [a + b for a, b in izip(list1, list2)]  #  71ms, pythonic
from operator import add;   map(add, list1, list2)                  #  44ms

from itertools import product;                       # Generate all possible combinations of a list
[list(x) for x in product([0,1],range(4))]           # [[0, 0], [0, 1], [0, 2], [0, 3], [1, 0], [1, 1], [1, 2], [1, 3]]

import numpy as np
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
sum_vector = vector1 + vector2                                      # 25x faster

# Find *first* matching item
["foo", "bar", "baz"].index("bar")                                  # 1  !!! Throws ValueError if item not found
try:
    idx = L.index(obj)                                              # Fastest method when obj is usually present
except ValueError:
    idx = -1                                                        # index() never returns -1 itself, it raises
if obj in L:
    idx = L.index(obj)                                              # Faster if obj not found

# Find all items
[i for i, e in enumerate([1, 2, 1]) if e == 1]                      # [0, 2]
g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
next(g)                                                             # 0
next(g)                                                             # 2

# Check all items (works on any iterable type)
s = [10,12,14]
all(x >= 10 for x in s)                                             # True

# Flatten a nested list
a = [(1,2),(3,4),(5,6)]
[x for sub in a for x in sub]                                       # [1,2,3,4,5,6]

# Deduplicate a list
a = [1,2,3,1,2]
a = sorted(set(a))                                                  # [1,2,3]

# Count matching elements
[1,2,2,3,1,2,2].count(2)                                            # 4
# Sort based on object attribute
ut.sort(key=lambda x: x.count, reverse=True)   # To sort the list in place...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)  # To return a new list, use the sorted() built-in function...
(From stackoverflow [5])
key.extend(sha256.digest())    # Python 3: iterating over bytes yields ints, no ord() needed
                               # (Python 2: key.extend(ord(c) for c in sha256.digest()))
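The copy pitfalls noted in the list examples above (shared inner lists, shallow versus deep copy) can be seen in this small sketch. Variable names are ours:

```python
import copy

wrong = 3 * [2 * [0]]                 # 3 references to the SAME inner list
wrong[0][0] = 1
print(wrong)                          # [[1, 0], [1, 0], [1, 0]]

right = [[0] * 2 for _ in range(3)]   # 3 distinct inner lists
right[0][0] = 1
print(right)                          # [[1, 0], [0, 0], [0, 0]]

a = [[1, 2], [3, 4]]
shallow = a[:]                        # new outer list, inner lists are shared
deep = copy.deepcopy(a)               # everything is duplicated
a[0][0] = 99
print(shallow[0][0], deep[0][0])      # 99 1
```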

Dictionary

D = { 'x':42, 'y':3.14, 'z':7 }
D['x']                              # 42
del D[k]                            # Removes from dictionary D the item whose key is k
# Sparse matrix
Matrix = {}
Matrix[1,2] = 15                    # This works because 1,2 -- a tuple -- is used as a key

for key in d:                       # Loop over keys in dictionary d
    pass
for key, value in d.items():        # Loop over keys and values in dictionary d
    pass
D.keys()                            # Keys
D.values()                          # Values
D.items()                           # Keys and Values
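Building on the sparse matrix idea above, a sketch showing how missing cells can be read as 0 with get(), and how to walk the stored cells in order. The sample values are ours:

```python
matrix = {}
matrix[1, 2] = 15                 # keys are (row, column) tuples
matrix[0, 0] = 7

print(matrix.get((1, 2), 0))      # 15
print(matrix.get((5, 5), 0))      # 0 - missing cell, no KeyError

# Walk the stored cells in (row, column) order
cells = sorted(matrix.items())
for (row, col), value in cells:
    print(row, col, value)
```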

Set

S = set()                           # Empty set
S = {1,2,3}                         # Set with some values
S.add(4)                            # Add an element
S.update([2,5])                     # Add a list of elements
S.update({2,5})                     # ... any iterable

(elem,) = S                         # get only elem -- fail if S not singleton
elem = next(iter(S))                # get any elem -- work if S singleton or not
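Sets also support the usual algebra through operators (see the bitwise operators in the precedence list). A quick sketch with sample values of ours:

```python
A = {1, 2, 3}
B = {3, 4}

print(A | B)               # union: {1, 2, 3, 4}
print(A & B)               # intersection: {3}
print(A - B)               # difference: {1, 2}
print(A ^ B)               # symmetric difference: {1, 2, 4}
print(A <= {1, 2, 3, 4})   # subset test: True
print(3 in A)              # membership: True
```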

Control flow statements

if

if x < 0: print('x is negative')
elif x % 2: print('x is positive and odd')
else: print('x is even and non-negative')

# Better style (PEP 8):
if x < 0:
    print('x is negative')
elif x % 2:
    print('x is positive and odd')
else:
    print('x is even and non-negative')

while

count = 0
while x > 0:
    x //= 2              # truncating division
    count += 1
print('The approximate log2 is', count)

for

for letter in 'ciao':
    print('give me a', letter, '...')

# target can be a tuple
for key, value in d.items():
    if key and value:        # print only true keys and values
        print(key, value)

# ... or something else (LHS expression)
prototype = [1, 'placemarker', 3]
for prototype[1] in 'xyz': print(prototype)
# prints [1, 'x', 3], then [1, 'y', 3], then [1, 'z', 3]

# Using range():
for i in range(10):
    pass
for i in range(5,10):
    pass
for i in reversed(range(10)): # to go backwards
    pass

# for ... break... else
for i in range(10):
    print(i)
else:
    print("done.")  # Executed only if no 'break' in the loop

List comprehensions are often a nice alternative to for loops:

#Using list comprehension:
result1 = [x+1 for x in some_sequence]
#... same as:
result2 = []
for x in some_sequence:
    result2.append(x+1)
# A list comprehension may have 'if', or nested for
result3 = [x+1 for x in some_sequence if x>23]
result5 = [x for sublist in listoflists for x in sublist]

# Dict comprehension
d = {n:n//2 for n in range(5)}
print(d) # prints: {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

break

while True:               # this loop can never terminate naturally
    x = get_next()
    y = preprocess(x)
    if not keep_looping(x, y): break
    process(x, y)

continue

for x in some_container:
    if not seems_ok(x): continue   # skip the rest of the body for this item
    process(x)                     # only reached for items that seem ok

for-else and while-else

for x in some_container:
    if is_ok(x): break # item x is satisfactory, terminate loop
else:
    print('Beware: no satisfactory item was found in container')
    x = None
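A runnable variant of the same pattern, as a sketch (the function name first_even and the sample data are ours):

```python
def first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            break                 # found: the else clause is skipped
    else:
        n = None                  # loop ran to completion without break
    return n

print(first_even([3, 5, 8, 9]))   # 8
print(first_even([3, 5, 7]))      # None
print(first_even([]))             # None
```

Note that the else clause also runs when the iterable is empty, since no break occurred.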

pass

if condition1(x):
    process1(x)
elif x>23 or condition2(x) and x<5:
    pass                  # nothing to be done in this case
elif condition3(x):
    process3(x)
else:
    process_default(x)

try-except-finally-else / raise

# Catch any exception
try:
    print(x)
except:
    print("An exception occurred")

# Several except clauses, most specific first
try:
    print(x)
except NameError:
    print("Variable x is not defined")
except:
    print("Something else went wrong")

# else: executed if the try block raised no error (and was not left by break/return)
try:
    print(x)
except:
    print("Something went wrong")
else:
    print("try block finished")

# finally: executed no matter what
try:
    print(x)
except:
    print("Something went wrong")
finally:
    print("the 'try except' is finished")

# Raise an exception
raise Exception("Sorry, that was wrong")

# Name the exception instance with 'as'; a bare 'raise' re-raises the current exception
import sys
try:
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise

with

The with statement is the Python embodiment of the well-known C++ idiom "resource acquisition is initialization" (RAII).

with expression [as varname]:
    statement(s)
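To illustrate, here is a sketch of a custom context manager built with contextlib.contextmanager. The resource name and the events list are made up for the example; the point is that the release step runs even when the block raises.

```python
from contextlib import contextmanager

events = []

@contextmanager
def resource(name):
    events.append("acquire " + name)       # runs when entering the with block
    try:
        yield name.upper()                 # value bound by 'as'
    finally:
        events.append("release " + name)   # always runs, even on exception

try:
    with resource("db") as handle:
        events.append("using " + handle)
        raise RuntimeError("boom")
except RuntimeError:
    pass

print(events)   # ['acquire db', 'using DB', 'release db']
```

Built-in objects such as files work the same way: with open(path) as f: closes the file when the block is left.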

yield

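The yield statement is used to create generators, ie. functions that produce a sequence of values lazily, one at a time (much like range produces its numbers on demand).

It's also very handy to build co-routines, ie. functions that resume their execution where they last returned.

  • Use the yield keyword to suspend the generator and hand a value back to the caller; execution resumes after the yield on the next request.
  • Either call next(...) on the generator object, or use any python construct that accepts an iterable (for loop, list(), sum()...).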
def foo(x):
    yield x + 1
    yield x + 2
    yield x + 3

gen = foo(0)
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3
# print(next(gen))  # Exception: StopIteration

for x in foo(10):
    print(x)  # 11, 12, 13
  • Note that calling the generator DOES NOTHING. It simply returns a generator object that provides the `__next__` interface.
foo(0)  # Does nothing
  • Python knows that a function is a generator because it contains the keyword yield.
  • Even if the yield statement is NOT executed, the function will still behave as a generator.
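  • That means mixing yield and return (with a value) is likely WRONG: the returned value is not yielded, it only ends up attached to the StopIteration exception.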
def foo(x):
    if x >= 10:
        yield x + 1
        yield x + 2
        yield x + 3
        return x + 4  # WRONG - value ignored
    else:
        return x + 100  # WRONG

for x in foo(10):
    print(x)  # 11, 12, 13

print(foo(0))  # <generator object foo at 0x...>
print(next(foo(0)))  # Exception: StopIteration
  • Any function can be a generator, including class methods.
class Foo(object):
   def __init__(self,x):
      self.x = x
   def foo(self):
        yield self.x + 1
        yield self.x + 2
        yield self.x + 3

for x in Foo(10).foo():
    print(x)  # 11, 12, 13
  • If the generator calls other functions where yield occurs, the calling generator must use the yield from syntax.
  • Remember that calling a generator (ie. a function that contains the yield keyword) does NOTHING.
def foo(x):
    yield from bar(x)  # Execution will stop here
    x = baz(x)  # A normal function
    yield from bar(x)  # Execution will stop here

def bar(y):
    yield y + 1
    yield y + 2

def baz(z):
    return z + 10

for x in foo(10):
    print(x)  # 11, 12, 21, 22
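As a sketch of the co-routine usage mentioned at the top of this section, a generator can also receive values through send(). The function name running_average is ours:

```python
def running_average():
    """Coroutine: receives numbers through send() and yields the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average     # suspend here; resume when send() is called
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                         # prime the coroutine: run up to the first yield
results = [avg.send(10), avg.send(20), avg.send(30)]
print(results)                    # [10.0, 15.0, 20.0]
avg.close()                       # stop the coroutine explicitly
```

The first next() is required because a value can only be sent to a generator that is already suspended at a yield.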

Functions

a = 'global'
def afunction():
    global a                         # Use 'global' to change scope of a variable
    a = 'still using global'
    b = 'local'

Example typed function:

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Handy functions

# all - Check all items in an iterable
import string
s = 'deadbeef'
all([c in string.hexdigits for c in s])                             # True
all(c in string.hexdigits for c in s)                               # True - shorter
a = [10,12,14]
all(x >= 10 for x in a)                                             # True

Docstrings

def toh(cls,s):
    """ Convert a string into a hexadecimal string.
    >>> mc.toh('ABCD')
    '41424344'
    >>> mc.toh('mycrypto')
    '6d7963727970746f'
    """
    return s.encode().hex()      # Python 3; in Python 2 this was s.encode('hex')

Docstrings can also be defined at module level. The docstring line must appear before imports:

#! /usr/bin/python3

"""Use int() to convert either binary or hex string to an integer
>>> int('11110000',2)
240
"""

import binascii

Use module doctest to test examples in docstrings:

# Check docstring examples on exec (not on import)
if __name__ == "__main__":
    import doctest
    doctest.testmod()

See Doctest section for more information.
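Docstring examples can also be checked for a single object and the outcome inspected from code. A sketch using doctest.DocTestFinder and doctest.DocTestRunner; the function to_int is a made-up example:

```python
import doctest

def to_int(text, base=10):
    """Convert a string to an integer (hypothetical example function).

    >>> to_int('11110000', 2)
    240
    >>> to_int('ff', 16)
    255
    """
    return int(text, base)

# Collect and run only the examples found in to_int's docstring
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(to_int, "to_int", globs={"to_int": to_int}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)   # 0 2
```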

Classes

An empty class:

class Empty(object):
    pass

e=Empty()

A class with constructor and data members:

class Basic(object):
    __param = None                           # __* denotes a class-private member

    def __init__(self, param):
        self.__param = param
        print("Basic is born with param %s" % param)

b1=Basic('foo')
b2=Basic(param='bar')

A class that inherits:

# BETTER: Use super() to call base class implementation
class Child(Parent):
    __param = None

    def __init__(self, param):
        super().__init__()                   # Must call EXPLICITLY parent constructor
        self.__param = param

# AVOID: Name directly the parent class
class Child(Parent):
    __param = None

    def __init__(self, param):
        Parent.__init__(self)                # Must call EXPLICITLY parent constructor
        self.__param = param

Class members can be defined as properties:

import math

class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    @property
    def area(self):
        '''area of the rectangle'''
        return self.width * self.height
    @area.setter
    def area(self, value):
        scale = math.sqrt(value/self.area)
        self.width *= scale
        self.height *= scale
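
A short usage sketch of the property above (the class is repeated so that the snippet runs on its own; the values are arbitrary). Reading area looks like a plain attribute access, and assigning to it goes through the setter.

```python
import math

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        return self.width * self.height

    @area.setter
    def area(self, value):
        scale = math.sqrt(value / self.area)
        self.width *= scale
        self.height *= scale

r = Rectangle(2, 3)
area_before = r.area      # Read like a plain attribute, no parentheses
r.area = 24               # Calls the setter: both sides are scaled by 2
```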

Classes may have static methods and class methods [6]:

class Rectangle(object):
    max_area = 10       # A class variable shared by all instances

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @staticmethod
    def give_height(area,width):
        return area / width

    @classmethod
    def get_max_height(cls, width):
        return cls.max_area / width          # cls gives access to class variables

There is no concept of a private method, but methods that are not meant to be called from outside the class can be prefixed with two underscores. Python then mangles the name with the class name (__area becomes _Rectangle__area), which avoids collisions with any subclass method.

class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    def __area(self, width, height):
        return width*height
    def area(self):
        return self.__area(self.width,self.height)

R=Rectangle(5,10)
print(R.area())                  # To call the public function:
print(R._Rectangle__area(2,3))   # To call the private function
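
A small sketch of why mangling avoids collisions, assuming two hypothetical classes Base and Derived: each __helper is stored under a different mangled name, so the subclass does not override the base one by accident.

```python
class Base:
    def __helper(self):            # Stored as _Base__helper
        return "base helper"

    def run(self):
        return self.__helper()     # Always resolves to _Base__helper

class Derived(Base):
    def __helper(self):            # Stored as _Derived__helper: no override
        return "derived helper"

d = Derived()
result = d.run()
mangled = sorted(n for n in dir(d) if n.endswith("__helper"))
```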
Multiple inheritance
  • It is essential to use super() so that no base class constructor is called more than once.
  • When constructors take different sets of parameters, one must use the mixin pattern, passing parameters through *args and **kwargs.
  • Here is an example (not using *args, which is often discouraged in the mixin pattern) (see SO and ChatGPT):
class BaseA:
    def __init__(self, name="A", **kwargs):
        print(f"BaseA __init__(name={name})")
        super().__init__(**kwargs)    # Useful for bug detection
                                      # Call to object will fail if there are extra parameters not absorbed by children

class BaseB(BaseA):
    def __init__(self, name="B", barg=2, **kwargs):
        print(f"BaseB __init__(name={name},barg={barg})")
        super().__init__(name=name, **kwargs)  # Param absorbed locally must be reinjected if needed


class BaseC(BaseA):
    def __init__(self, carg=1, name="C", **kwargs):
        print(f"BaseC __init__(carg={carg}, name={name})")
        super().__init__(name=name, **kwargs)


class BaseD(BaseC, BaseB):
    def __init__(self, name="D", **kwargs):
        print(f"BaseD __init__(name={name})")
        super().__init__(name=name, **kwargs)


class BaseE(BaseD):
    def __init__(self, name="E"):
        print(f"BaseE __init__(name={name})")
        BaseD.__init__(self, name=name, carg=3)  # We can pass `carg` even though it's not in BaseD


# Create an instance of BaseE
e = BaseE()     # Output:                        # MRO (C3 linearization): BaseE, BaseD, BaseC, BaseB, BaseA, object
                # BaseE __init__(name=E)
                # BaseD __init__(name=E)
                # BaseC __init__(carg=3, name=E)
                # BaseB __init__(name=E,barg=2)
                # BaseA __init__(name=E)
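
The order in which super() walks the classes can be inspected through the __mro__ attribute. A minimal sketch, assuming a small diamond hierarchy with placeholder classes:

```python
class A: pass
class B(A): pass
class C(A): pass
class D(C, B): pass          # Same shape as BaseD(BaseC, BaseB) above

# Each class appears once, and always before its own parents
mro_names = [cls.__name__ for cls in D.__mro__]
```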

Multiple inheritance initialization may become very tricky in a complex hierarchy:

  • A typical error is TypeError "missing 1 required positional argument".
  • This happens when a base class calls __init__ of another class in the tree, but doesn't pass the necessary parameters.
  • Here is an example:
#! /usr/bin/python3

# 1st version:
# * C() ok
# * D() fails because missing param for B.__init__
class A(object):
    def __init__(self, id: int, **kwargs):
        super().__init__(**kwargs)
        self.id = id

# 2nd version:
# * C() fails because extra param for object.__init__
# * D() ok
class A(object):
    def __init__(self, id: int, **kwargs):
        super().__init__(id, **kwargs)
        self.id = id

class B(object):
    def __init__(self, id: int, **kwargs):
        super().__init__(**kwargs)
        self.id = id

class C(A):
    def __init__(self):
        super().__init__(id=123)

class D(A, B):
    def __init__(self):
        super().__init__(id=123)

C()
D()
  • Here, A.__init__ automatically consumes the id parameter, so id is no longer in **kwargs.
  • So when A.__init__ calls super().__init__ on a D instance, it actually calls B.__init__ (the next class in the MRO of D), and this constructor needs id again.

There are several possible fixes:

  • A possible fix is to use different names for the parameter id. However this may not be appropriate if this field is indeed meant to be common to the whole hierarchy.
  • Another fix is to create a parent base class that would be the only one initializing the common member id.
  • Finally, an alternative is to use class mixin (see SO).
The idea is to have classes without constructors (or with constructors that take no parameters), and to rely on the classes that extend them to perform the necessary initialization:
class B(object):
    id: int

    # No constructor

    def use_id(self):
        return self.id
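
A minimal sketch of this mixin approach, assuming hypothetical names (IdMixin, NameMixin, Item): the mixins only declare and use their fields, and the concrete class is the single place where they are initialized.

```python
class IdMixin:
    id: int                      # Declared here, but set by the concrete class

    def describe(self):
        return f"object #{self.id}"

class NameMixin:
    name: str

    def greet(self):
        return f"hello {self.name}"

class Item(IdMixin, NameMixin):
    def __init__(self, id: int, name: str):
        self.id = id             # The concrete class performs all initialization
        self.name = name

item = Item(id=123, name="foo")
```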

Likewise, we may want to override methods in a multiple-inheritance hierarchy. In that case, it is better to force the use of named parameters, so that overridden methods cannot be called in an incompatible way:

class A(object):
    def bind(self, *args, a: int | None = None):
        assert len(args) == 0  # Ensure no unnamed parameters are passed
        if a is not None:
            self.a = a

class B(object):
    def bind(self, *args, b: int | None = None):
        assert len(args) == 0  # Ensure no unnamed parameters are passed
        if b is not None:
            self.b = b

class C(B, A):
    def bind(self, *args, a: int | None = None, b: int | None = None, c: int | None = None):
        assert len(args) == 0  # Ensure no unnamed parameters are passed
        A.bind(self, *args, a=a)
        B.bind(self, *args, b=b)
        if c is not None:
            self.c = c

MyC = C()
MyC.bind(a=1, b=2, c=3)
print(MyC.a, MyC.b, MyC.c)

An alternative method is to use only the simple signature def bind(self,**kwargs), but then we lose the type of parameters in the function signature, which is often useful when using smart editors:

class A(object):
    def bind(self, **kwargs):
        a = kwargs.get('a')
        if a is not None:
            self.a = a
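
Another option, sketched here as an assumption about what fits this use case: a bare * in the signature makes the following parameters keyword-only, so Python itself rejects unnamed arguments and the typed signature is kept.

```python
from typing import Optional

class A:
    def bind(self, *, a: Optional[int] = None):   # Bare '*': a is keyword-only
        if a is not None:
            self.a = a

obj = A()
obj.bind(a=1)                 # OK
try:
    obj.bind(1)               # Positional argument is rejected with a TypeError
    rejected = False
except TypeError:
    rejected = True
```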
Tips
# Show content of an object
O = SomeObject()
O.__dict__

Classes - advanced

  • On slots:
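
A minimal sketch of __slots__, using a hypothetical Point class: instances get no __dict__, and only the listed attribute names can be assigned.

```python
class Point:
    __slots__ = ("x", "y")       # Only these attributes may exist on instances

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3                      # Not listed in __slots__
    allowed = True
except AttributeError:
    allowed = False
has_dict = hasattr(p, "__dict__")
```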

Modules

References:

Overview

Assume we have a module named module.py:

import module                # Import module; its content is accessed as module.name
from module import *         # Import everything into the current namespace

Use built-in __import__, or better yet importlib.import_module, to import a module whose name is in a string (http://effbot.org/zone/import-string.htm)

mymodule = __import__('mymodule')  # Import module from string - see http://effbot.org/zone/import-string.htm

import importlib
mymodule = importlib.import_module('mymodule')   # Returns the module object

Note that modules can be imported anywhere, not just at the start of the file. This allows for loading a module only when necessary.
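
A small sketch of such a deferred import, with json standing in for any heavier module and to_json being a hypothetical helper:

```python
import sys

def to_json(data):
    import json                  # Loaded on the first call only, then cached in sys.modules
    return json.dumps(data)

text = to_json([1, 2])
cached = "json" in sys.modules   # Later imports of json reuse the cached module
```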

Import path

Use sys.path.append to add a path to import module from.

import sys
sys.path.append('some/custom/path')
import module                # Import module from a custom path

Say we are developing a new version of a module mymodule in src/mymodule/__init__.py, and we already have an old version installed.

To force importing the local version, the easiest:

PYTHONPATH=src ipython3
import mymodule
Keywords
  • __name__, the name (i.e. a string) of the current module.
We can use this to get a reference to the current module (e.g. for DocTest) [7]:
current_module = __import__(__name__)             # Note that there is no import. Python imports each module only once

import sys
current_module = sys.modules[__name__]            # Requires importing sys
  • __module__ in a class or function is the module name of the class / function.
def some_fct():
    current_module = __import__(some_fct.__module__)

class some_class:
    def fct(self):
        current_module = __import__(some_class.__module__)  # self.__module__ works too
  • __package__ returns the name of the current package (__package__ and __name__ are the same if from a (top) __init__.py) (see PEP366)
Top-level script vs module
  • There are two ways to load a python file: as the top-level script, or as a module.
  • File is loaded as top-level script when it is executed directly (eg. python myfile.py). The __name__ of the top-level script is always __main__.
  • A module is a file imported with import mymodule. The __name__ is mymodule.
  • A module can be part of a package (say package.mysubpackage.mymodule). Module in package can do relative import (from .. import blah).
Relative imports
  • This usually works, at least for a top-level script (see also post above, and section below Get path of current file)
# To import ../../some/package/mymodule.py, relative to the current file
import os
import sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)) + '/../../some/package')
import mymodule

Lambda

f=lambda x: x+2
f(1)              # 3

# Lambda can use variable in scope:
i=1
f=lambda x: x+i
f(1)              # 2
i=2
f(1)              # 3

# To FREEZE the context, pass it through DEFAULT param value:
i=1
f=lambda x,i=i: x+i   # Passing i (global) as DEFAULT value to param i (local)
f(1)              # 2
i=2
f(1)              # 2
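
The same effect shows up when creating lambdas in a loop. A minimal sketch comparing both forms:

```python
late = [lambda x: x + i for i in range(3)]         # All lambdas share the same i
frozen = [lambda x, i=i: x + i for i in range(3)]  # Each lambda keeps its own i

late_results = [f(10) for f in late]      # i is looked up at call time (i is 2 by then)
frozen_results = [f(10) for f in frozen]  # i was captured at definition time
```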

Python 2 reference

Print

for i in range(10):
    print i

# Add a comma to remove carriage return
for i in range(10):
    print i,                     # 0 1 2 3 4 5 6 7 8 9

To enable Python 3 print function:

from __future__ import print_function        # Enable v3 print in Python 2.x

Basic I/O in Python

Source: O'Reilly Python in a Nutshell.

String formatting with format or formatted-string literals

Source:

str.format is available since Python 2.6 / 3.0; formatted string literals (f-strings) since Python 3.6.

formatted-string
i, f, s = 1234, 1/3, 'foo'

# {} can contain any python expression, even complex ones
f"length of '{s}' is {len(s)}"     # "length of 'foo' is 3"
f"length of '{f'length of {s} is {len(s)}'}' is {len(f'length of {s} is {len(s)}')}"
                                   # "length of 'length of foo is 3' is 18"

# Use 'x:width.prec' to set (minimum) width / precision
f"{i:10}"    # '      1234'
f"{s:10}"    # 'foo       '
f"{f:10.6}"  # '  0.333333'
f"{f:10.6f}" # '  0.333333'

# Use 'd', 'b', 'x', 'o' for decimal / binary / hexadecimal / octal output
# Can also prefix with '0' by prefixing width with '0'
f'  {i:010d}' # '  0000001234'
f'0x{i:010x}' # '0x00000004d2'
f'0o{i:010o}' # '0o0000002322'
f'0b{i:010b}' # '0b10011010010'

# Use '<', '>' or '^' to change alignment
# Prefix it with any character to change padding (default ' ')
f"{i:<10}"   # '1234      '
f"{i:^10}"   # '   1234   '
f"{i:>10}"   # '      1234'
f"{i:>010}"  # '0000001234'
f"{s:<10}"   # 'foo       '
f"{s:^10}"   # '   foo    '
f"{s:>10}"   # '       foo'
f"{s:_^10}"  # '___foo____'

# Use '{{' or '}}' to include left or right curly braces
f"{{{i}}}"   # '{1234}'
format
# v3 - String formatting
# '{[selector][conversion]:[format_specifier]}'.format(value)
'First: {} second: {}'.format(1, 'two')
'Second: {1} first: {0}'.format(1, 'two')                        # Give positional for all 
'a: {a}, 1st: {}, 2nd: {}, a again: {a}'.format(1, 'two', a=3)   # Give name for some
'a: {a} first:{0} second: {1} first: {0}'.format(1, 'two', a=3)  # Can mix name and positional

# Using sequences and composites:
'p0[1]: {[1]} p1[0]: {[0]}'.format(('zero', 'one'), ('two', 'three'))
'p1[0]: {1[0]} p0[1]: {0[1]}'.format(('zero', 'one'), ('two', 'three'))
'{} {} {a[2]}'.format(1, 2, a=(5, 4, 3))
'First r: {.real} Second i: {a.imag}'.format(1+2j, a=3+4j)

# Field width
'{:^12s}'.format(s)
'{:.>12s}'.format(s)
print('{:,}'.format(12345678))

# Precision specification
'as f: {:.4f}'.format(x)
'as g: {:.4g}'.format(x)
'as s: {:.6s}'.format(s)

String formatting with %

Available in Python 2 and 3.

# format % values
'result = %d' % x               # %d - decimal
'answers: %d %f' % (x, y)       # %f - float; several values must be given as a tuple
'%x' % hexval                   # Print hex
'File not found %r' % filename  # !!! USE %r to log possibly erroneous strings !!!
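
A short sketch of why %r helps in logs, assuming a hypothetical filename polluted with trailing whitespace: %r prints the repr, so quotes and escape sequences make the problem visible.

```python
filename = "report.txt \n"                      # Hypothetical value with trailing junk
with_s = "File not found %s" % filename         # Trailing space and newline are invisible
with_r = "File not found %r" % filename         # Quoted, with the newline shown as \n
```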

Input parsing

See also modules parse and re.

# Using built-ins
print(int('2'))
print(float('3.14'))

# Using ast.literal_eval()
import ast
print(ast.literal_eval('23'))     # 23
print(ast.literal_eval('[2,3]'))  # [2, 3]
print(ast.literal_eval('2+3'))    # raises ValueError
print(ast.literal_eval('2+'))     # raises SyntaxError

# Using split()
a='abc\ndef\n123\n'
a.split('\n')                     # ['abc', 'def', '123', '']
a.strip().split('\n')             # ['abc', 'def', '123']
a='12,34,56'
a.split(',')                      # ['12','34','56']
[int(x) for x in a.split(',')]    # [12,34,56]

Text output

print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

for i in range(10):
    print(i,"",end="")             # 0 1 2 3 4 5 6 7 8 9 
for i in range(10):
    print(f"{i} ",end="")          # 0 1 2 3 4 5 6 7 8 9 

import sys
sys.stdout                         # Standard output
sys.stderr                         # Standard error

# Output to a file
print('...', file=f)               # Keyword arguments must come after positional ones
f.write('...')
sys.stdout.write(...)              # Using write with stdout

# Output to stderr
sys.stderr.write(...)              # Using write
print('...', file=sys.stderr)      # Using print
def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)
eprint('...')                      # Using custom fct

Text input

See also #io module.

Standard input
import sys
sys.stdin                          # Standard input

# Input (from stdin only)
input(prompt='')                   # v3: same as v2 raw_input; v2: same as eval(raw_input(prompt))
raw_input(prompt='')               # v2 only
File input with context manager (recommended)
# Context manager - f is closed automatically
# Note: mode "U" was removed in Python 3.11; text mode already handles universal line endings
with open("test.txt") as f:
    for line in f:
        print(line.rstrip('\n'))   # Or rstrip() to right strip all blanks

# Even more compact
for line in open('test.txt'):      # File is only closed when the object is garbage-collected
    print(line.rstrip())
Open / close
# read a file
f = open("demofile3.txt", "r")     # "r" is optional (default mode)
print(f.read())
f.close()
fileinput
import fileinput

# Iterate over all files in sys.argv or stdin
for line in fileinput.input():
    print(line.rstrip())            # Right-strip all blanks (CR,LF,SPC)

# Can override list of files -- here explicit use as context manager
with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
    for line in f:
        print(line.rstrip())

Binary

See also this SO post for various methods and benchmarks.

with open("filename","rb") as f:
    byte = f.read(1)
    while byte:
        print(byte)
        byte = f.read(1)

# Python 3.8 - Using walrus operator
with open("filename","rb") as f:
    while (byte := f.read(1)):     # The read happens in the condition; no extra read in the body
        print(byte)

# Much better perf - read the whole at once
with open("filename","rb") as f:
    data = f.read()
for byte in data:
    print(byte)

# Almost identical perf, read chunked
with open("filename", "rb") as f:
    data = f.read(CHUNKSIZE)
    while data:
        for byte in data:
            print(byte)
        data = f.read(CHUNKSIZE)

Some ways to extract data from bytes:

# Extract an int from bytes
data = b'\x01\x02\x03\x04\x05\x06\x07\x08'
i = int.from_bytes(data[:4], byteorder='little', signed=False)

# Extract several int
# http://docs.python.org/library/struct.html#struct.unpack
import struct
(a,b) = struct.unpack('ii',data)
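
Format 'ii' uses the native byte order and size. A minimal sketch with an explicit byte order prefix ('<' little-endian, '>' big-endian), reusing the same data:

```python
import struct

data = b'\x01\x02\x03\x04\x05\x06\x07\x08'
a, b = struct.unpack('<II', data)        # Two little-endian unsigned 32-bit ints
c, d = struct.unpack('>HH', data[:4])    # Two big-endian unsigned 16-bit ints
i = int.from_bytes(data[:4], byteorder='little')   # Same value as a
```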

Standard Library

sys module

Arguments

sys.argv, len(sys.argv)          # Argument list, number of arguments ([0] -> exec name)
if ("-h" in sys.argv) or ("--help" in sys.argv):
    printUsage()
for a in range(len(sys.argv)):
    if sys.argv[a] == "-e":
        pass                     # Handle option -e here

Exit

sys.exit()

io module

The io module (docs.python.org).

To open a file:

# - mode can be 'r', 'w', 'a', 'r+', 'w+', 'a+', ...
#   Default is text 't', add 'b' for binary ('U' was removed in Python 3.11, universal newlines are the default in text mode)
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

with io.open(...) as f:            # PYTHONIC way, open is a manager
    # ...
with open(...) as f:               # SAME... no need for io
    # ...

for line in open(...):             # Compact way to read line by line, but file is only closed on garbage collection
    # ...

f = open(...)                      # BAD. No guarantee that f gets closed
data = open(...).read()            # ALSO BAD. handle may survive until GC, or because exception thrown.
                                   # https://stackoverflow.com/questions/2404430/does-filehandle-get-closed-automatically-in-python-after-it-goes-out-of-scope

File operations:

f.close()
f.flush()
str = f.read(size=-1)              # bytestring in binary mode, text string otherwise.
str = f.readline(size=-1)
lst = f.readlines(size=-1)
lst = [l.strip() for l in open(...)] # To get rid of '\n', trailing spaces...
with open(filename) as f:
    mylist = f.read().splitlines() # To get rid of '\n' only
f.write(s)
f.writelines(lst)                  # Same as: for line in lst: f.write(line)

Iterations:

for line in f:
    # ...                          # !!! 'break' and 'next(t)' interferes with file's position
                                   # f.readline() is ok.

Binary file:

f.seek(10)                         # Go to ofs 10, from start of file
f.seek(10,os.SEEK_SET)             # ... same
f.seek(-10,os.SEEK_CUR)            # Move pos 10 bytes backward
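
A runnable sketch of seek and tell, using an in-memory io.BytesIO as a stand-in for a file opened in "rb" mode:

```python
import io
import os

f = io.BytesIO(b"0123456789abcdef")   # In-memory stand-in for a binary file
f.seek(10)                            # Absolute offset 10
at_10 = f.read(2)                     # Reads 2 bytes, position is now 12
f.seek(-10, os.SEEK_CUR)              # 10 bytes backward from position 12
pos = f.tell()                        # Current position
f.seek(-4, os.SEEK_END)               # 4 bytes before the end
tail = f.read()                       # Reads up to the end
```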

os and filesystem operations

import os
os.remove(path)                 # Remove a file
os.unlink(path)                 # ... idem
os.rmdir(path)                  # Remove an (empty) directory
os.path.dirname(path)
os.path.basename(path)

import shutil
shutil.rmtree(path, ignore_errors=False, onerror=None)   # Remove a directory and all its content
import os.path
os.path.isfile(fname)            # True if fname exists and is a file

if not os.path.exists(directory):
    os.makedirs(directory)       # Create directory if it does not exist

try:                             # Avoid race condition if directory is created by another process
    os.makedirs(path)            # But we could fix solution above as well
except OSError:                  # Raised whenever the directory already exists
    if not os.path.isdir(path):  
        raise
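
Since Python 3.2, os.makedirs accepts exist_ok=True, which covers the same race without the try/except. A minimal sketch in a temporary directory (the a/b path is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a", "b")
    os.makedirs(path, exist_ok=True)     # Creates a/ and a/b/
    os.makedirs(path, exist_ok=True)     # Second call is a no-op, no exception
    created = os.path.isdir(path)
```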

Scanning a directory

import glob
tests = glob.glob('tests/tests_*.py')
for t in tests:
    print("tests %s" % t)

# https://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered
import os
sorted(glob.glob('*.png'))                         # Sort by name
sorted(glob.glob('*.png'), key=os.path.getmtime)   # Sort by modification time
sorted(glob.glob('*.png'), key=os.path.getsize)    # Sort by size

Executing a command in subshell:

os.system(f"diff -rq {dir1} {dir2} >/dev/null 2>/dev/null")   # On Unix, the exit code is multiplied by 256

It is however recommended to use subprocess.call rather than os.system.

argparse module

See excellent argparse tutorial.

import argparse

# Parse command line
parser = argparse.ArgumentParser()
parser.add_argument("-p", "--port", type=int, default=PORT, help="server port number")
group = parser.add_mutually_exclusive_group()
group.add_argument("-a", "--attach", action="store_true", help="don't start a new server but attach to a running one")
group.add_argument("-t", "--target", default=TARGET, help="path to server executable")
parser.add_argument("test", nargs='+', help="path to python module containing tests to run")
parser.add_argument("-v", "--verbose", action="count", help="increase output verbosity")
args = parser.parse_args()

bench = TestBench(target=args.target, port=args.port, attach=args.attach)

for s in args.test:
    print("test %s" % s)

argparse can take custom types [8]:

import argparse
import datetime

def argument_date(str_date):
    # Not the most efficient to roundtrip like this, but
    # fits well with your existing code
    now = datetime.datetime.now(datetime.timezone.utc).date()
    if str_date == "yesterday":
        str_date = str(now - datetime.timedelta(1))
    elif str_date == "today":
        str_date = str(now)

    try:
        return datetime.datetime.strptime(str_date, "%Y-%m-%d").replace(tzinfo=datetime.timezone.utc)
    except ValueError as e:
        raise argparse.ArgumentTypeError(e)

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('start', type=argument_date, help='Start date (YYYY-MM-DD, yesterday, today)')
parser.add_argument('end', type=argument_date, nargs='?', help='End date (YYYY-MM-DD, yesterday, today)')

Help / description text is reformatted automatically. To avoid that [9]:

from argparse import ArgumentParser, RawTextHelpFormatter
parser = ArgumentParser(description='test', formatter_class=RawTextHelpFormatter)

When using sub-parsers, we can display the usage string in the epilog [10]:

parser.epilog = f"commands usage:\n  {command_list.format_usage()}  {command_ensure.format_usage()}"

Or even simply in the usage, stripping the first occurrence of "usage: ":

parser.usage = f"{parser.format_usage()[7:]}{command_list.format_usage()}{command_ensure.format_usage()}"

Positional parameters are required by default. Use nargs='?' together with default to make one optional. Using nargs='?' also surrounds the parameter with [ ... ] in the help text.

parser = argparse.ArgumentParser(prog='frobble')
parser.add_argument('bar', nargs='?', type=int, default=42,
                    help='the bar to %(prog)s (default: %(default)s)')
parser.print_help()
# usage: frobble [-h] [bar]
# 
# positional arguments:
#  bar     the bar to frobble (default: 42)
# 
# options:
#  -h, --help  show this help message and exit

The example above also shows the use of the %(default)s placeholder in the help string (more are available, see the documentation).
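
To check the behaviour without a real command line, parse_args can be given an explicit list of arguments. A minimal sketch reusing the parser above:

```python
import argparse

parser = argparse.ArgumentParser(prog='frobble')
parser.add_argument('bar', nargs='?', type=int, default=42)

without_bar = parser.parse_args([])       # Explicit list instead of sys.argv
with_bar = parser.parse_args(['7'])       # The string is converted through type=int
```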

random module

import random
IV = []
for i in range(16):
    IV.append(random.randint(0, 255))
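
Note that the random module is not meant for cryptographic use. If the IV above is intended for a cipher, a sketch with the standard secrets module (Python 3.6+) could look like this:

```python
import secrets

iv = secrets.token_bytes(16)     # 16 bytes from the OS cryptographic random source
iv_as_list = list(iv)            # Same shape as IV above: 16 ints in 0..255
```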

datetime module

From datetime module (docs.python.org):

import datetime
print(datetime.datetime.today())
print(datetime.datetime.now())    # similar, but possibly more accurate
print(datetime.date.today())      # date only

# Compute an epoch, eg since 1899-12-31:
# https://stackoverflow.com/questions/151199/how-to-calculate-number-of-days-between-two-given-dates
delta = datetime.datetime.now() - datetime.datetime.strptime('1899-12-31','%Y-%m-%d')
delta.days

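To get a datetime object with timezone information (e.g. UTC) [11]:

from datetime import datetime, timezone

dt = datetime.now(timezone.utc)          # Current time as an aware UTC datetime
# Note: replace(tzinfo=timezone.utc) only labels a datetime as UTC, without converting it.
# Use it only on naive datetime objects that are already known to be in UTC.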

print(dt.isoformat())
# '2017-01-12T22:11:31+00:00'

To convert a JSON serialized datetime string [12] from ECMAScript format / ISO8601 / RFC 3339 format (eg 1985-04-12T23:20:50.52Z, where Z is the time zone for UTC) [13]

# In javascript:
#     var d = new Date("2011-05-25T13:34:05.787000");
#     d.toJSON()
#     # '2011-05-25T20:34:05.787Z'
from datetime import datetime, timezone
dt = datetime.strptime('2011-05-25T20:34:05.787Z', '%Y-%m-%dT%H:%M:%S.%fZ')
# datetime.datetime(2011, 5, 25, 20, 34, 5, 787000)

# Set the UTC timezone:
dt = dt.replace(tzinfo=timezone.utc)
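
An alternative sketch using datetime.fromisoformat (Python 3.7+), which returns an aware datetime directly. Only Python 3.11+ accepts the trailing Z, so it is rewritten as +00:00 here to stay compatible with older versions:

```python
from datetime import datetime, timedelta, timezone

text = '2011-05-25T20:34:05.787Z'
# Replace the 'Z' suffix by an explicit UTC offset before parsing
dt = datetime.fromisoformat(text.replace('Z', '+00:00'))
```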

Extract some usual information from datetime:

# Source: ChatGPT
import datetime
def extract_date_info(timestamp):
    # Parse the timestamp string to a datetime object
    dt = datetime.datetime.strptime(timestamp, "%Y%m%dT%H%M%S.%fZ")

    # Extract the year
    year = dt.year

    # Extract the week number (ISO week number, where Monday is considered the first day of the week)
    week_number = dt.isocalendar()[1]

    # Extract the weekday (0 is Monday, 6 is Sunday)
    weekday = dt.weekday()

    return year, week_number, weekday

# Your JSON timestamp
json_timestamp = "20230518T132843.000Z"
print(extract_date_info(json_timestamp))    # (2023, 20, 3)

Do some datetime arithmetics:

# Source: ChatGPT
def subtract_time(timestamp, hours=0, minutes=0):
    # Parse the timestamp string to a datetime object
    dt = datetime.datetime.strptime(timestamp, "%Y%m%dT%H%M%S.%fZ")

    # Create a timedelta object for the time difference
    time_difference = datetime.timedelta(hours=hours, minutes=minutes)

    # Subtract the time difference from the original datetime object
    new_dt = dt - time_difference

    # Convert the new datetime object back to a string in the original format
    new_timestamp = new_dt.strftime("%Y%m%dT%H%M%S.%fZ")

    return new_timestamp

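# Your JSON timestamp
json_timestamp = "20230518T132843.000Z"
print(subtract_time(json_timestamp, hours=2))   # 20230518T112843.000000Z (%f always prints 6 digits)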

bitstring module

from bitstring import *
s  = Bits('0x8081828384858687')
s  = Bits(hex='8081828384858687')
s  = Bits(bytes=b'\x80\x81\x82\x83\x84\x85\x86\x87')
sa = BitArray('0x8081828384858687')    # same as Bits, but mutable

s << 8                           # Logical shift
s[8:] + '0x00'                   # ... same as above
s <<= 8                          # ... (with mutation)
sa.rol(8)                        # Cyclic shift (with mutation)
s[8:] + s[:8]                    # ... same as above (without mutation)

See bitstring [14] (manual)

Named Tuple / Data Classes

Source: SO

Using named tuples from collections (docs.python.org):

# Python 2 and 3
from collections import namedtuple
MyStruct = namedtuple("MyStruct", "field1 field2 field3")

m = MyStruct("foo", "bar", "baz")
m = MyStruct(field1="foo", field2="bar", field3="baz")

Since Python 3.6, improved NamedTuple (docs.python.org):

# Python 3.6
from typing import NamedTuple

class User(NamedTuple):
    name: str

class MyStruct(NamedTuple):
    foo: str
    bar: int
    baz: list
    qux: User

my_item = MyStruct('foo', 0, ['baz'], User('peter'))
# or
my_item = MyStruct(foo='foo', 
                bar=0, 
                baz=['baz'], 
                qux=User('peter'))

# NamedTuples are immutable. Use _replace to change some fields
my_item = my_item._replace(foo='foz',bar=1)

Since Python 3.7, Data Classes:

# Python 3.7
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0

p = Point(1.5, 2.5)

print(p)  # Point(x=1.5, y=2.5, z=0.0)
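
A few extras from the dataclasses module, sketched on the same Point class: frozen=True gives immutability comparable to a NamedTuple, replace() copies an instance with some fields changed, and asdict() converts it to a dict.

```python
from dataclasses import dataclass, asdict, replace

@dataclass(frozen=True)          # frozen=True makes instances immutable
class Point:
    x: float
    y: float
    z: float = 0.0

p = Point(1.5, 2.5)
q = replace(p, z=1.0)            # Copy with some fields changed, p is untouched
as_dict = asdict(q)
```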

parse module

Source: https://pypi.org/project/parse/, https://stackoverflow.com/questions/2175080/sscanf-in-python

Parse is the opposite of format.

from parse import parse

parsed=parse('{} fish',"blue fish")
print(parsed[0])                       # 'blue'

min,max,letter,pwd=parse('{:d}-{:d} {}: {}',"4-8 n: noon")
print(f"{min}, {max}, {letter}, {pwd}")

regex module

Source: w3schools

See also parse module for more advanced parsing

# It's a good habit to put re string in raw r'...' strings!

import re
re.search(r'^The.*Spain$', "The rain in Spain")    # re.Match object
re.match(r'The.*Spain$', "The rain in Spain")      # match always start from beg.

re.findall(r'.ai', "The rain in Spain")            # ['rai', 'pai']
re.split(r'\s', "The rain in Spain")               # ['The', 'rain', 'in', 'Spain']
re.sub(r'\s', "_", "The rain in Spain")            # The_rain_in_Spain
re.sub(r'\s', "_", "The rain in Spain", count=2)   # The_rain_in Spain

# re.Match object
re.search(r'.ai', "The rain in Spain").group()     # rai
re.search(r'.ai', "The rain in Spain").span()      # (4, 7)
re.search(r'.ai', "The rain in Spain").string      # The rain in Spain

# re.sub
x = re.sub(r'\s', "_", "The rain in Spain")        # The_rain_in_Spain

# re.compile
r = re.compile(r'.ai')
r.search("The rain in Spain").group()              # rai

# flags
a="foo\nbar"
re.match(r'foo\nbar',a) is not None           # True
re.match(r'foo\nbar$',a) is not None          # True
re.match(r'foo$\nbar',a) is not None          # False - ^ and $ only match first/last
re.match(r'fo.*ar$',a) is not None            # False - . doesn't match \n
re.match(r'fo.*ar$',a,re.S) is not None       # True - re.S: . match also \n
re.match(r'foo$\nbar$',a,re.M) is not None    # True - re.M: ^ and $ match any \n
re.match(r'f.*^bar$',a,re.S|re.M) is not None # True
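
Groups make it possible to extract fields from a match. A minimal sketch with named groups, on the same input string as in the parse section (the group names are arbitrary):

```python
import re

m = re.match(r'(?P<lo>\d+)-(?P<hi>\d+) (?P<letter>\w): (?P<pwd>\w+)', "4-8 n: noon")
lo, hi = int(m.group('lo')), int(m.group('hi'))   # Groups are always returned as str
fields = m.groupdict()                            # All named groups as a dict
```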

Subprocess

Execute a command without going through a shell (better than os.system):

import subprocess
code = subprocess.call(["diff", "-rq", dir1, dir2], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

Here, no specific handling is necessary if dir1 or dir2 contains special characters, because the arguments are passed as a list and are not interpreted by a shell.
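
When the output of the command is needed, subprocess.run (Python 3.7+ for capture_output and text) is a common choice. A minimal sketch; the command is the Python interpreter itself, only so that the example does not depend on an external tool:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello world')"],
    capture_output=True,      # Collect stdout / stderr instead of printing them
    text=True,                # Decode output as str
)
output = result.stdout.strip()
```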

Cryptography

# Python 3 version (requires pycryptodome)
from Crypto.Cipher import AES
def toh(b):                              # bytes -> hex string
    return b.hex()
def tos(h):                              # hex string (spaces allowed) -> bytes
    return bytes.fromhex(h.replace(' ',''))
def aes(k,p):
    a=AES.new(tos(k), AES.MODE_ECB)      # Raw single-block encryption
    return toh(a.encrypt(tos(p)))
def aesinv(k,c):
    a=AES.new(tos(k), AES.MODE_ECB)
    return toh(a.decrypt(tos(c)))
def sxor(h1,h2):
    return toh(bytes(a ^ b for a,b in zip(tos(h1),tos(h2))))

Example of use:

ipython
run mycrypto                    # Assuming script in current dir and named 'mycrypto.py'
key='00112233 44556677 8899aabb ccddeeff'
p0='00000100 80000000 00000000 00000000'
c0=aes(key,p0)
p1='aaaaaaaa bbbbbbbb cccccccc dddddddd'
c1=aes(key,sxor(c0,p1))

Using Hash:

from Crypto.Hash import SHA256

h = SHA256.new()
h.update(b'Hello')
print(h.hexdigest())
Modular inverse [15]
# Using gmpy2 - FASTEST                       # Old Python: use gmpy
import gmpy2
gmpy2.invert(1234567, p)                      # 1000000 loops, best of 3: 737 ns per loop (p 1024-bit)
gmpy2.divm(1, 1234567, p)                     # 1000000 loops, best of 3: 933 ns per loop (p 1024-bit)

# Using egcd function - NO DEPS, BUT SLOWER
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m
timeit modinv(1234567,p)                     # 100000 loops, best of 3: 13.6 us per loop (p 1024-bit)

# Using pow() - SIMPLEST BUT SLOWEST
timeit pow(1234567,p-2,p)                    # 100 loops, best of 3: 4.22 ms per loop
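
Since Python 3.8, the built-in pow also accepts a negative exponent with a modulus, which gives the modular inverse without any dependency. A minimal sketch, with the prime 2**255 - 19 chosen only as an example modulus:

```python
p = 2**255 - 19                      # Example prime modulus
a = 1234567

inv = pow(a, -1, p)                  # Python 3.8+: modular inverse
inv_fermat = pow(a, p - 2, p)        # Fermat's little theorem: valid only when p is prime
```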
modular exponentiation
from gmpy2 import mpz                # Old Python: from gmpy import mpz
def power_mod(a, b, n):
    return int(pow(mpz(a),b,n))      # Python 2 was: long(...)

# or built-in:
pow(a,b,n)
# Example from https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html
import hashlib

filename = input("Enter the input file name: ")
sha256_hash = hashlib.sha256()
with open(filename,"rb") as f:
    # Read and update hash string value in blocks of 4K
    for byte_block in iter(lambda: f.read(4096),b""):
        sha256_hash.update(byte_block)
    print(sha256_hash.hexdigest())
  • Package pycryptodome or pycryptodomex
  • Same as pycrypto, but more algo, like AES-GCM.
  • Also support SHA, SHA-256...
  • Note pycryptodome must be imported as import Crypto, pycryptodomex must be imported as import Cryptodome.
Ed25519, Curve25519, X25519
  • Basically nothing.
  • Ed25519 — See ed25519.py (cr.yp.to).
  • x25519 — RFC7748 is actually the best we can find!!!. It provides a basic implementation in python. Also some copies found in other blogs (x25519.py best although seems buggy (very long pub key), [16], [17]).
  • curve25519-donna, this might be interesting, but no docs. There are tests in the module we could use to understand the package (see in /usr/local/lib/python3.7/dist-packages/curve25519).
  • pure25519 — very clean lib for ed25519, but doesn't implement curve25519. So signing only. Also python implementation, so slow.
  • pynacl / python-ed25519 - ed25519 only, and again some high-level sh*t, like these guys must feel to be on a mission. Importing an existing private key seems a NP-complete problem, or they don't like hexadecimal with their salt.
  • cryptography.io "hazardous" material (https://cryptography.io/en/latest/hazmat/primitives/asymmetric/x25519/), but dying is better option IMO.

Mathematics

Python has built-in support for unlimited precision integer arithmetic.

11//2                   # integer division
2**255 - 19             # exponentiation
pow(2,5,11)             # modexp, faster than 2**5 % 11
25 % 4                  # modulo

For floating-point, there is the mpmath library.

import mpmath

# mpmath uses gmpy if available
print(mpmath.libmp.BACKEND)
# 'gmpy'

# Set decimal precision
mpmath.mp.dps = 50

# Create float with mpf
# ... pay attention that Python float are not accurate!
mpmath.mpf(2.1)                # mpf('2.100000000000000088817841970012523233890533447265625')
mpmath.mpf(21)/10              # mpf('2.1000000000000000000000000000000000000000000000000011')

# output - can't use python {x:10.6} on mpf object, need to convert with nstr
x = mpmath.mpf(21)/10
print(f"{mpmath.nstr(x,20)}")                   # 2.1
print(f"{mpmath.nstr(x,20,strip_zeros=False)}") # 2.1000000000000000000

# Miscellaneous functions
mpmath.power(1+1/2,40)   # mpf('11057332.3209400121422731899656355381011962890625')
mpmath.factorial(40)     # mpf('815915283247897734345611269596115894272000000000.0')
mpmath.binomial(160,80)  # mpf('92045125813734238026462263037378063990076729140.0')
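
The float inaccuracy noted above can also be seen with the standard library alone. A small sketch using decimal and fractions (not mpmath), assuming 50 significant digits as in the mpmath example:

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50              # significant digits for Decimal arithmetic

from_float = Decimal(2.1)           # exact value of the binary double nearest to 2.1
from_text = Decimal("2.1")          # exactly 2.1
ratio = Decimal(21) / Decimal(10)   # exact as well, no float involved

print(from_float)
print(from_text == ratio)
print(Fraction(21, 10) == Fraction("2.1"))
```

As with mpf, building the value from an integer ratio or a string avoids inheriting the rounding error of the Python float.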

Logging

See logging module.

To use logging in a module:

import logging

logger = logging.getLogger(__name__)
logger.debug('Debug message')
logger.info('Info message')
logger.warning('Warning message')
logger.error('Error message')
# To see logging msg printed by a module
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

# To see timestamps
logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')
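
Note that basicConfig only configures the root logger, and does nothing if the root logger already has handlers. As an alternative, a handler can be attached to one logger directly. A minimal sketch; the logger name demo.module is hypothetical, and a StringIO stands in for a file or stderr:

```python
import io
import logging

stream = io.StringIO()                       # stands in for a file or sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('demo.module')    # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False                     # keep the root logger out of this demo

logger.debug('hidden, below INFO')
logger.info('visible')

print(stream.getvalue())
```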

JSON

From json module (docs.python.org):

import json
from io import StringIO

# Encode python object to JSON
json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
# '["foo", {"bar": ["baz", null, 1.0, 2]}]'
print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
# {"a": 0, "b": 0, "c": 0}
# ... to a file
io = StringIO()
json.dump(['streaming API'], io)
io.getvalue()
# '["streaming API"]'
# ... compact encoding
json.dumps([1, 2, 3, {'4': 5, '6': 7}], separators=(',', ':'))
# '[1,2,3,{"4":5,"6":7}]'
# ... pretty print
print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))
# {
#     "4": 5,
#     "6": 7
# }

# Decode JSON
json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]')
# ['foo', {'bar': ['baz', None, 1.0, 2]}]
# ... from a file
json.load(open("file.json"))
json.load(StringIO('["streaming API"]'))
# ['streaming API']

Using JSON sample file from JSON page, we can easily query specific fields:

import json

def extract_window_lines(json_data,title):
    lines = []
    for tab in json_data["tabs"]:
        for window in tab["windows"]:
            if window["title"] == title:
                lines.append(window["lines"])
    return lines

with open("my_file.json", "r") as file:
    json_data = json.load(file)

lines = extract_window_lines(json_data, "My window")   # "My window" is an example title

Pickle

Use Pickle to serialize python objects.
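
A minimal round-trip sketch with the standard pickle module; the record content is made up for illustration:

```python
import pickle

record = {'name': 'demo', 'values': [1, 2, 3], 'tags': {'a', 'b'}}   # example data

blob = pickle.dumps(record)          # bytes; pickle.dump(obj, file) writes the same to a file
restored = pickle.loads(blob)        # only unpickle data you trust

print(restored == record)
print(restored is record)
```

The restored object is an equal but distinct copy. Unpickling can execute arbitrary code, so it should not be used on untrusted input.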

Requests

From requests module (docs.python-requests.org, see also quickstart):

#https://gist.github.com/tetafro/7e1eb8549c324835cf23a283d9e60aed
import requests

BASE_URL = 'http://example.com'
AUTH_URL = BASE_URL + '/login'
CREDENTIALS = {'username': 'user', 'password': 'qwerty'}

session = requests.Session()
session.post(AUTH_URL, data=CREDENTIALS)
print(session.cookies)

file_url = BASE_URL + '/files/name.txt'
resp = session.get(file_url, stream=True)

if resp.status_code == 200:
    filename ='/tmp/myfile.txt'
    with open(filename, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

To import with some cookies and URL parameters:

url='https://royaleapi.com/data/replay'
params = {
    'tag':'02P9YP0VQUY9',
    'team_tags':'2L0VJG02',
    'opponent_tags':'89UCYQ0C0',
    'team_crowns':'2',
    'opponent_crowns':'1',
    'referrer_path':'https://royaleapi.com/decks/winner/gc'
}
headers = {
    'Cookie': '__royaleapi_session=************************************',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Accept': '*/*',
    'Accept-Encoding': 'identity',
    'Connection': 'Keep-Alive'
}
r = requests.get(url, params, headers=headers)
if r.status_code == 200:
    print(r.text)
    print(r.json().keys())
    print(r.json()['success'])

html.parser

From html.parser module (docs.python.org):

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data  :", data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
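
The same handler mechanism can collect data instead of printing it. A small sketch that gathers the href of every link; the class name LinkCollector and the sample markup are illustrative:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (illustrative helper)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

collector = LinkCollector()
collector.feed('<p><a href="/one">1</a> and <a href="/two">2</a></p>')
print(collector.links)
```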

multiprocessing

Example of a producer / consumers:

import multiprocessing

def consumer(q_in,q_out):
    cnt = 0
    while True:
        elt = q_in.get()

        if elt is None:
            break

        print(elt)
        cnt += 1

        q_out.put(cnt)

    # sentinel to notify main process we are done
    q_out.put(None)

    return 0

q_in = multiprocessing.Queue()       # Use the maxsize param to keep the queue from growing too big
q_out = multiprocessing.Queue()
num_processes = multiprocessing.cpu_count()
print(f"Using {num_processes} processes.")
processes = []
for i in range(num_processes):
    process = multiprocessing.Process(target=consumer, args=(q_in,q_out))
    processes.append(process)
    process.start()

# Feed some work
for i in range(1000):
    q_in.put(i)

# Send the consumer stop sentinel
for process in processes:
    q_in.put(None)

# Print result until we received all sentinels
while num_processes > 0:
    elt = q_out.get()
    if elt is None:
        num_processes -= 1
    else:
        print(elt)

# Wait for the process to exit
for process in processes:
    process.join()

Notes:

# Drain a result queue (here resultQueue) until all processes have exited
import queue

results = []
while True:
    try:
        result = resultQueue.get(timeout=0.01)    # wait at most 10 ms
        results.append(result)
    except queue.Empty:
        pass
    allExited = True
    for t in processes:
        if t.exitcode is None:
            allExited = False
            break
    if allExited and resultQueue.empty():
        break

Libraries

Box

Box provides Python dictionaries with advanced dot notation access.

from box import Box

movie_box = Box({
    "Robin Hood: Men in Tights": {
        "imdb_stars": 6.7,
        "length": 104,
        "stars": [ {"name": "Cary Elwes", "imdb": "nm0000144", "role": "Robin Hood"},
                   {"name": "Richard Lewis", "imdb": "nm0507659", "role": "Prince John"} ]
    }
})

movie_box.Robin_Hood_Men_in_Tights.imdb_stars
# 6.7

movie_box.Robin_Hood_Men_in_Tights.stars[0].name
# 'Cary Elwes'

NumPy

NumPy is a package for scientific computing in Python.

MSYS2

  • Install numpy on MSYS2 using MinGW64 package:
pacman -S mingw64/mingw-w64-x86_64-python3
pacman -S mingw64/mingw-w64-x86_64-python3-numpy

Random

Random bytes, float, int, permutation, shuffle, choice...

import numpy as np

np.random.randint(0, 10, 5)      # array([7, 2, 6, 1, 8])
np.random.bytes(5)               # b'=\xd6;\\G'
np.random.permutation(5)         # array([3, 4, 2, 0, 1])
np.random.permutation(range(0,10,2)) # array([4, 2, 8, 0, 6])
a=[1,2,3,4]
np.random.shuffle(a)
a                                # [4, 3, 2, 1]
np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])  # array([3, 3, 0])

Arrays

Manipulating arrays is very easy and efficient in NumPy.

# Standard operation element-wise
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b                            # array([2.,  4.,  6.])

# BROADCASTING - Automatic extension of arrays
# ... Also work for multi-dimensional arrays
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b                            # array([2.,  4.,  6.])

# Compare element-wise
a = np.random.randint(0, 10, 5)  # 5 random numbers
b = np.random.randint(0, 10, 5)
a != b                           # array([ True, True, True, True, False])

# Sum on elements
np.sum(a != b)                   # 4

# Logical operation (logical_or, logical_and, ...)
# https://numpy.org/doc/stable/reference/routines.logic.html
b = np.random.randint(0,10,5)
a = np.random.randint(0,10,5)
a                                # array([7, 2, 6, 1, 8])
b                                # array([3, 9, 8, 7, 6])
np.logical_or(a<5,b<5)           # array([ True,  True, False,  True, False])

Testing

pytest

See pytest.

Doctest

References
Overview

The module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown. These can be used as basic documentation and working examples.

Here is an example script:

# file sxor.py
import binascii

def sxor(s1,s2):
    """Xor two strings together.
    >>> sxor('abcd','1234')
    'b9f9'
    """
    s1=binascii.unhexlify(s1)
    s2=binascii.unhexlify(s2)
    return binascii.hexlify(bytes(a ^ b for a,b in zip(s1,s2))).decode()

# Footer to trigger doctest automatically when script is run.
# Alternatively, trigger it with:
# 
#     python -m doctest sxor.py
#
if __name__ == "__main__":
    import doctest
    doctest.testmod()

Now, we can run the tests with:

python3 sxor.py

No output means there were no errors. Use -v to get more output:

python3 sxor.py -v
# Trying:
#     sxor('abcd','1234')
# Expecting:
#     'b9f9'
# ok
# 1 items had no tests:
#     __main__
# 1 items passed all tests:
#    1 tests in __main__.sxor
# 1 tests in 2 items.
# 1 passed and 0 failed.
# Test passed.

Instead of using the footer code, one may call doctest from the command line (since Python 2.6):

python3 -m doctest sxor.py
Running Doctest on a module

Say we are developing a module sample containing docstrings, and we want to run the tests. Here are several methods (some refs: [18]).

  • Using footer trick — DOES NOT ALWAYS WORK
Using the footer trick above will work only if running the module script directly:
python3 sample.py
python3 sample/__init__.py     # If module as a directory
python3 sample                 # FAIL! can't find '__main__' module in sample
Note that this may additionally fail if the module does some relative imports (which are only available in packages).
  • Using a test() function and external test runner script.
The trick is to pass the module to test as argument m to testmod (note that using argument name=__name__ only changes the name displayed during testing [19]).
Several solutions:
Each solution below has a comment, a runner file run_all_tests.py, and a module file sample.py.
Using a test function in the module itself. The idea is to get the module __name__ from that function, and pass a reference to that module to testmod.
File run_all_tests.py:
#! /bin/python3

for file in ['sample']:
    temp_module = __import__(file)
    temp_module.test()
File sample.py (keep only one of the three test() variants):
# A simple test
def hello():
    """
    >>> hello()
    'Hello'
    """
    return "Hello"

# A test with import
def world():
    """
    >>> import sample
    >>> sample.world()
    'World!'
    """
    return "World!"

# Simplest - no need to duplicate module name
def test():
    import doctest
    # Note: don't confuse args 'm' with 'name'. 'm=' optional, but 'verbose=' mandatory
    doctest.testmod(m=__import__(__name__), verbose=True)  

# Another method to avoid duplicating module name
def test():
    import doctest, sys
    # Note: don't confuse args 'm' with 'name'. 'm=' optional, but 'verbose=' mandatory
    doctest.testmod(m=sys.modules[__name__], verbose=True)  

# Maybe simpler to read, but must duplicate module name
def test():
    import doctest, sample
    # Note: don't confuse args 'm' with 'name'. 'm=' optional, but 'verbose=' mandatory
    doctest.testmod(m=sample, verbose=True)
Alternatively, we can move the doctest call into the runner.
File run_all_tests.py:
import doctest, sample
doctest.testmod(sample)
File sample.py:
def hello():
    """
    >>> hello()
    'Hello'
    """
    return "Hello"
A more advanced method that transforms the doctests into a unittest suite, which we can then run. The advantage is that it doesn't pollute the module with test-only code, and the tests can run in a richer framework.
File run_all_tests.py:
#! /bin/python3

for file in ['sample']:
    import doctest, unittest
    temp_module = __import__(file)
    test_suite = doctest.DocTestSuite(temp_module)
    unittest.TextTestRunner().run(test_suite)
File sample.py:
def hello():
    """
    >>> hello()
    'Hello'
    """
    return "Hello"
Tips
  • To write good module docstrings, "think about somebody doing help(yourmodule) at the interactive interpreter's prompt — what do they want to know?" [20]. See pep-0257 for more recommendations

Debug with icecream ic()

See icecream - never debug with print() again

Reproducibility

  • Tackle random:
  • Seed random sources (os.urandom, random).
  • Replace libc getrandom() with own implementation (see HN comment)
cat getrandom.c 

# #include <string.h>
# #include <sys/types.h>
# 
# ssize_t getrandom(void *buf, size_t buflen, unsigned int flags) {
#   memset(buf, 0, buflen);
#   return buflen;
# }

cc getrandom.c -shared -o getrandom.so

LD_PRELOAD=./getrandom.so python3 -c 'import os; print(os.urandom(8))'
# b'\x00\x00\x00\x00\x00\x00\x00\x00'
  • Tackle other non-deterministic source (pid, time, scheduling)
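
For the random module, seeding is enough to get a repeatable sequence; os.urandom reads from the OS and has no seed, hence the LD_PRELOAD trick above. A minimal sketch; the helper name sample_run and the seed value are arbitrary:

```python
import random

def sample_run(seed):
    """Return a few pseudo-random values from a generator seeded with `seed`."""
    rng = random.Random(seed)        # private generator, independent of the global one
    return [rng.randint(0, 99) for _ in range(5)], rng.random()

first = sample_run(1234)
second = sample_run(1234)

print(first == second)   # same seed, same sequence
```

String hashing is another source of run-to-run variation; it can be pinned with the PYTHONHASHSEED environment variable.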

Packaging

Poetry

# Create the pyproject.toml
poetry new my-project   # Create brand new project
poetry init             # Create from an existing project

# Install dependencies
poetry install

# Run
poetry run pytest
poetry run black

# Enable virtualenv
poetry shell
deactivate
  • To create virtualenv in a local ./.venv/ folder:
poetry config virtualenvs.in-project true --local
  • Push the following files to git:
pyproject.toml
poetry.toml
poetry.lock

Tips

Simple HTTP Server

It's very easy to setup an ad-hoc HTTP server with Python. Just open a shell in a folder with some contents to share, and type:

python -m SimpleHTTPServer        # Python 2
python3 -m http.server            # Python 3

More available at http://docs.python.org/2/library/internet.html (see BaseHTTPServer and CGIHTTPServer). In Python 3, these live in the http.server module.
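
The same server can be started from a Python 3 script. A minimal sketch, assuming Python 3.7+ and that serving the current directory on a loopback address is acceptable; port 0 lets the OS pick a free port:

```python
import functools
import http.server
import threading

# Serve the current directory on an ephemeral port (port 0 = let the OS choose).
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory='.')
server = http.server.ThreadingHTTPServer(('127.0.0.1', 0), handler)
port = server.server_address[1]

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
print(f'Serving on http://127.0.0.1:{port}/')

# ... browse or curl the URL here ...

server.shutdown()                    # stop serve_forever()
server.server_close()
thread.join()
```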

Detect interactive mode

References: [21], [22]

First method:
import __main__ as main
print(hasattr(main, '__file__'))

Second method:
def in_ipython():
    try:
        __IPYTHON__
    except NameError:
        return False
    return True

Third method:
import sys
print(hasattr(sys, 'ps1'))

Fourth method:
import sys
print(bool(sys.flags.interactive))

Started with                       First  Second  Third  Fourth
python mymod.py                    True   -       -      -
python -i mymod.py                 True   -       -      True
python then import mymod           -      -       True   -
ipython mymod.py                   True   True    -      -
ipython -i mymod.py                True   True    -      -
ipython then run mymod.py          True   True    -      -
ipython then run -i mymod.py       True   True    -      -
ipython then import mymod          -      True    -      -
ipython -i then import mymod       -      True    -      -

Find duplicates in list

From stackoverflow [23]

import collections

def fastest(l):                        # 134 us - Fastest
    seen = set()
    seen_add = seen.add                                            # To avoid looking up 'add' every time an item is inserted
    seen_twice = set( x for x in l if x in seen or seen_add(x) )   # adds all elements it doesn't know yet to seen and all others to seen_twice
    return list( seen_twice )                                      # turn the set into a list (as requested)

def compact(l):                        # 415 us
    return [x for x, y in collections.Counter(l).items() if y > 1]

def slowest(l):                        # 19.2 ms
    return list(set([x for x in l if l.count(x) > 1]))

Start post-mortem debugger on exception

From stackoverflow [24]

>>> import pdb
>>> pdb.pm()

Miscellaneous

Detect whether a variable is defined

Note it is bad practice to define a variable conditionally [25]. An interesting use case is to run code and define variable conditionally based on interactive status.

# Using try ... except
try: myvar
except NameError: print("variable 'myvar' is NOT defined")

# Using vars() / globals()
'myvar' in vars() or 'myvar' in globals()
# ...pedantic...
'myvar' in vars(__builtins__)

Analyse memory usage

Dowser
  • See [26] — seems better suited to find memory leaks, not to analyse usage for memory hungry applications
memory_profiler
sudo pip install -U memory_profiler
sudo pip install psutil
  • Add @profile decorator
@profile
def primes(n): 
    ...
  • Run the profiler
 python -m memory_profiler primes.py
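
Without installing anything, the standard tracemalloc module (a different tool from memory_profiler) gives a rough picture of allocations. A minimal sketch; build_table is a made-up stand-in for a memory-hungry function:

```python
import tracemalloc

def build_table(n):
    """Allocate a list of n small lists (stand-in for a memory-hungry function)."""
    return [[i] * 10 for i in range(n)]

tracemalloc.start()
table = build_table(10_000)
current, peak = tracemalloc.get_traced_memory()   # bytes currently allocated, and peak
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f'current={current} B, peak={peak} B')
for stat in snapshot.statistics('lineno')[:3]:    # top 3 allocation sites
    print(stat)
```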

The Pythonic way

Type import this in a Python interpreter, you get this:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Detect Python 2 or Python 3 dependency

For instance, does gdb use Python 2 or 3?

ldd $(which gdb)|grep python
# libpython3.5m.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0 (0x00007f442a960000)

Find character in a string

The fastest and simplest is to use in operator, like

if '.' in name:
    # ...

To detect more characters, we must use a regex [28]:

>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False

Note:

  • search is faster than using match.
  • If using match, the leading ^ is implicit, but $ (or re.fullmatch) is still needed to force a full match.
  • Regex should use raw string r'...'.
  • If using the regex multiple times, compile it once and reuse later!
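
The notes above can be illustrated with re.fullmatch (Python 3.4+), which checks the whole string against an allow-list pattern. A sketch; the helper name only_allowed is hypothetical:

```python
import re

ALLOWED = re.compile(r'[a-z0-9.]*')      # compiled once, reused below

def only_allowed(text):
    """True if text consists solely of a-z, 0-9 and '.' (illustrative helper)."""
    return ALLOWED.fullmatch(text) is not None

print(only_allowed('az09.'))
print(only_allowed('az09.\n'))
print(bool(re.match(r'[a-z0-9.]*$', 'az09.\n')))   # '$' also matches before a final newline
```

The last line shows why fullmatch is the safer choice: with match and $, a trailing newline slips through.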

Detect Python version, location...

From pwndbg [29]:

# Find the Python version
PYVER=$(python -c 'import platform; print(".".join(platform.python_version_tuple()[:2]))')
PYTHON=$(python -c 'import sys; print(sys.executable)')
PYTHON="${PYTHON}${PYVER}"

# Find the Python site-packages that we need to use
SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
# or to get user site
SITE_PACKAGES=$(python -c 'import site; print(site.getusersitepackages())')

Using script above, one can install a module using pip for the given python/site installation.

# Install Python dependencies using pip
sudo ${PYTHON} -m pip install --target ${SITE_PACKAGES} -Ur requirements.txt

Display random distribution with seaborn

seaborn is a powerful python toolkit to visualize statistical data.

Assume a data file like

head -n 5 samples
# 19.2
# 6.6
# 7.9
# 5.5
# 3.6
# ...

To visualize into seaborn:

# First setup seaborn - https://seaborn.pydata.org/tutorial/distributions.html
%matplotlib gtk

import numpy as np
import pandas as pd
from scipy import stats, integrate
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "distributions")))

# Then load our file - https://stackoverflow.com/questions/36343646/reading-a-text-file-and-converting-string-to-float
y = []
file_in = open('../samples','r')
for z in file_in.read().split('\n'):
    if z: y.append(float(z))
file_in.close()

# Then tell seaborn to show the distribution.
sns.distplot(y)

# Normally the graph should pop up automatically. If not:
# plt.show()
# sns.plt.show();

Convert bytes to str and vice-versa

Python v2 and v3 have different types of strings.

  • In v2, the type str is a sequence of bytes, while unicode is for Unicode text strings.
  • In v3, the type str is for Unicode text strings, and bytes is a sequence of bytes, also known as bytestring or byte string.
# Python v3
isinstance(s,str)          # True if s is a unicode text string
isinstance('abc',str)      # True
isinstance(b,bytes)        # True if b is a bytestring
isinstance(b'abc',bytes)   # True
s.encode()                 # Convert a text string (str) to bytes
b.decode()                 # Convert a bytestring (bytes) to str
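
A short illustration of why the two types differ in length, assuming UTF-8 as the encoding (the default for encode() and decode() in Python 3):

```python
text = 'caf\u00e9'                   # 'cafe' with an acute accent: 4 characters
data = text.encode('utf-8')          # 5 bytes: the accented letter needs two bytes

print(len(text), len(data))
print(data)
print(data.decode('utf-8') == text)

# Decoding with the wrong codec either fails or gives the wrong text.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('not ASCII:', exc.reason)
```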

XOR strings together

In Python 2.x [30]:

def sxor(s1,s2):
    return ''.join(chr(ord(a) ^ ord(b)) for a,b in zip(s1,s2))

In Python 3.x:

def bytes_xor(a, b) :
    return bytes(x ^ y for x, y in zip(a, b))
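
A usage sketch for the Python 3 version, with hex strings as input (the values are only examples). Note that zip() stops at the shorter input:

```python
def bytes_xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

left = bytes.fromhex('abcd')
right = bytes.fromhex('1234')
print(bytes_xor(left, right).hex())

# zip() stops at the shorter input, so unequal lengths are silently truncated
print(bytes_xor(b'\xff\xff\xff', b'\x0f').hex())
```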

Various conversion

Binary 00110101
# Or use bin to convert an integer into binary literal string ('0b' prefix)
>>> bin(173)
'0b10101101'
# Binary literals are regular integers
>>> 0b101111
47
# Use int(..., 2) to convert a binary string into integer
>>> int('01010101111',2)
687
>>> int('11111111',2)
255
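
A few more conversions in Python 3, as a sketch: format() gives zero-padded output without the '0b' prefix, and int.to_bytes / int.from_bytes convert to and from raw bytes. The value 53 is just an example:

```python
n = 53

as_binary = format(n, '08b')            # zero-padded to 8 digits, no '0b' prefix
as_hex = format(n, '02x')
as_bytes = n.to_bytes(2, 'big')         # fixed-width big-endian encoding

print(as_binary, as_hex, as_bytes)
print(int(as_binary, 2), int(as_hex, 16), int.from_bytes(as_bytes, 'big'))
```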

Reverse a string

>>> 'hello world'[::-1]
'dlrow olleh'

Reload a module in interactive python

There is reload command:

  • Python3 >= 3.4: importlib.reload(some_module)
  • Python3 < 3.4: imp.reload(some_module)
  • Python2: reload(some_module) built-in

For instance

import importlib
import some_module

# hack hack...

importlib.reload(some_module)           # Reload module

However

  • reload does not reload dependencies.
  • It does not work when module is loaded like from some_module import *.

Usually it's simpler to do:

python3 -i -c 'from some_module import *'
# >>> hack hack...
# >>> <CTRL-D>
python3 -i -c 'from some_module import *'
# >>> ....

Benchmark an algorithm

From the shell, using the timeit module:

python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
# 10000 loops, best of 3: 143 usec per loop
python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
# 1000 loops, best of 3: 969 usec per loop
python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
# 1000 loops, best of 3: 1.1 msec per loop

Or directly in Python, using timeit.Timer:

>>> import timeit
>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

Flatten a list of lists (of lists...)

from SO:

import functools, itertools, operator

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

# Fastest - using iconcat
functools.reduce(operator.iconcat, l, [])

# Fastest - using itertools
list(itertools.chain(*l))
list(itertools.chain.from_iterable(l))  # Since Python 2.6, no unpacking needed

# Using list comprehension - very fast
flat_list = [item for sublist in l for item in sublist]

# Using sum and monoid - fastest for small list, very compact
sum(l, [])

# Using lambda, slowest
functools.reduce(lambda x,y: x+y,l)

See also this blogspot, for a non-recursive solution that can process even deeply nested lists.
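
One possible non-recursive approach (a sketch, not necessarily the one from the linked post) keeps an explicit stack of iterators, so nesting depth is not limited by the recursion limit. The helper name flatten_deep is hypothetical:

```python
def flatten_deep(nested):
    """Flatten arbitrarily nested lists/tuples without recursion (illustrative helper)."""
    flat = []
    stack = [iter(nested)]               # explicit stack of iterators replaces the call stack
    while stack:
        for item in stack[-1]:
            if isinstance(item, (list, tuple)):
                stack.append(iter(item)) # descend; resume the parent iterator later
                break
            flat.append(item)
        else:
            stack.pop()                  # current level exhausted
    return flat

print(flatten_deep([1, [2, [3, [4, 5]], 6], (7, [8])]))
```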

Detect last element in a for loop

From SO:

def lookahead(iterable):
    """Pass through all values from the given iterable, augmented by the
    information if there are more values to come after the current one
    (False), or if it is the last value (True).
    """
    # Get an iterator and pull the first value.
    it = iter(iterable)
    last = next(it)
    # Run the iterator to exhaustion (starting from the second value).
    for val in it:
        # Report the *previous* value (more to come).
        yield last, False
        last = val
    # Report the last value.
    yield last, True

for i, is_last in lookahead(range(3)):
    print(i, is_last)

Swap two variables

The pythonic way [31]:

a,b = b,a

Print to stderr

# For Python 2, also add:
# from __future__ import print_function
import sys

def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

Note that stderr is not buffered, so no need to flush [32].

Get product of all elements in a list

import numpy
L=[10,20,30]
int(numpy.prod(L))      # 6000
# Python 3.8+, without numpy: math.prod(L)

Check that a variable is an integer

isinstance(1, int)      # True
isinstance(1.1, int)    # False
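
One caveat with this check: bool is a subclass of int, so True and False pass it too. A small sketch; the helper name is_plain_int is hypothetical:

```python
import numbers

def is_plain_int(value):
    """True for int instances but not for bool (illustrative helper)."""
    return isinstance(value, int) and not isinstance(value, bool)

print(isinstance(True, int))               # bool is a subclass of int
print(is_plain_int(True), is_plain_int(7))
print(isinstance(7, numbers.Integral))     # abstract check, also accepts other integral types
```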

Get path to current script

From StackOverflow:

Python 3
# Directory of the script being run
import pathlib
pathlib.Path(__file__).parent.absolute()

# Current working directory
import pathlib
pathlib.Path().absolute()
Python 2 and 3
# Directory of the script being run
import os
os.path.dirname(os.path.abspath(__file__))
os.path.dirname(__file__)                      # BAD!!! is empty if __file__ has no dir component

# Current working directory
import os
os.path.abspath(os.getcwd())

Otherwise, a convoluted solution using inspect, when we cannot use __file__:

import os
import inspect

def dummy_func():
    pass

# We can not use __file__ to get the local file path, use another method that uses 'inspect' module
filepath = os.path.dirname(os.path.abspath(inspect.getsourcefile(dummy_func)))
filepath = filepath.replace( '\\', '/')

View methods / attributes of any object

# https://stackoverflow.com/questions/34439/finding-what-methods-a-python-object-has

dir(object)
help(object)

Reload modules automatically in iPython

See iPython.

Prettyprint JSON file

Quick [33]:

print(json.dumps(json.loads(your_json_string), indent=4))

Command line:

echo '{"foo": "bar", "baz": [1, 2, 3]}' | python -m json.tool

Install packages from a list in file

Usually the file is named requirements.txt:

google-api-python-client==1.7.9
google-auth-httplib2==0.0.3
google-auth-oauthlib==0.4.0

To install:

virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt

Convert CSV file to array

Use module csv [34]:

import csv

with open("in.csv", newline='') as file:
    reader = csv.reader(file)
    array = list(reader)

print(array)

with open('out.csv', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(array)
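
When the file has a header row, csv.DictReader gives one dict per line instead of a list. A minimal sketch; the column names and the in-memory text are made up, with a StringIO standing in for the file:

```python
import csv
import io

text = 'name,score\nalice,10\nbob,7\n'           # stands in for a file's content

rows = list(csv.DictReader(io.StringIO(text)))   # one dict per line, keyed by header
total = sum(int(row['score']) for row in rows)   # values are read as strings

print(rows[0]['name'], total)
```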

Check that a string is a valid hexadecimal string

From SO:

# Using all() string.hexdigits
import string
s = 'deadbeef'
all(c in string.hexdigits for c in s)            # True -- note: no need for [ ... ]

# Faster - using a set
H = set(string.hexdigits)
all(c in H for c in s)                           # True

# Using int(..., 16) - but this accepts '0xbeef' and '-beef'
int('beef',16)
int('0xBEEF',16)
int('-beef',16)
int('x',16)       # ValueError

Benchmarks

Deep copying list of sets

Fastest:

  1. manual copy
  2. pickle.
  3. marshal.
  4. deepcopy
# Time taken for deep copying: 1.0053 seconds
# Time taken for pickle      : 0.0947 seconds
# Time taken for marshal     : 0.4481 seconds
# Time taken for copy        : 0.0187 seconds

import timeit
import copy
import random
import pickle
import marshal

# Function to generate a list of sets
def generate_list_of_sets(num_sets, set_size):
    return [set(random.sample(range(1, 1001), set_size)) for _ in range(num_sets)]

def benchmark_deep_copy(l):
    return copy.deepcopy(l)

def benchmark_marshal(l):
    return marshal.loads(marshal.dumps(l))

def benchmark_pickle(l):
    return pickle.loads(pickle.dumps(l))

def benchmark_copy(l):
    return [x.copy() for x in l]

# Define the number of sets and the set size
num_sets = 200
set_size = 10

# Generate the list of sets
original_list = generate_list_of_sets(num_sets, set_size)

# Run the benchmark and print the time taken
time_taken = timeit.timeit(lambda: benchmark_deep_copy(original_list), number=1000)
print("Time taken for deep copying:", time_taken, "seconds")
time_taken = timeit.timeit(lambda: benchmark_marshal(original_list), number=1000)
print("Time taken for marshal     :", time_taken, "seconds")
time_taken = timeit.timeit(lambda: benchmark_pickle(original_list), number=1000)
print("Time taken for pickle      :", time_taken, "seconds")
time_taken = timeit.timeit(lambda: benchmark_copy(original_list), number=1000)
print("Time taken for copy        :", time_taken, "seconds")

Modifying lists

We want to modify some elements in an array. Which method is fastest?

  • FASTEST: for loops.
  • SLOWEST: list comprehension (because of new list creation)

Although list comprehensions are usually quite fast, here they are slow because they create a new list for each row.

# Benchmark from ChatGPT 4o
# Result:
#     Method 2: 0.053109 seconds - For Loop
#     Method 6: 0.056372 seconds - List Slicing and Assignment
#     Method 5: 0.067764 seconds - Custom Function
#     Method 3: 0.080177 seconds - Numpy Arrays
#     Method 1: 0.128225 seconds - List Comprehension
#     Method 4: 0.132993 seconds - Map and lambda
import timeit
import numpy as np

# Method 2: Using a For Loop
def method_2():
    data = [[0] * 34 for _ in range(32)]
    for row in data:
        row[-2:] = [255, 255]
    return data

# Method 6: Using List Slicing and Assignment
def method_6():
    data = [[0] * 34 for _ in range(32)]
    for i in range(len(data)):
        data[i][-2:] = [255, 255]
    return data

# Method 5: Using a Custom Function
def method_5():
    data = [[0] * 34 for _ in range(32)]
    def replace_last_two_bytes(row):
        row[-2:] = [255, 255]
        return row
    data = [replace_last_two_bytes(row) for row in data]
    return data

# Method 3: Using Numpy Arrays
def method_3():
    data = np.zeros((32, 34), dtype=int)
    data[:, -2:] = 255
    return data.tolist()

# Method 1: Using List Comprehension
def method_1():
    data = [[0] * 34 for _ in range(32)]
    data = [row[:-2] + [255, 255] for row in data]
    return data

# Method 4: Using Map and Lambda
def method_4():
    data = [[0] * 34 for _ in range(32)]
    data = list(map(lambda row: row[:-2] + [255, 255], data))
    return data

# Benchmarking each method
methods = [method_1, method_2, method_3, method_4, method_5, method_6]
method_names = ["Method 1", "Method 2", "Method 3", "Method 4", "Method 5", "Method 6"]

for method, name in zip(methods, method_names):
    time_taken = timeit.timeit(method, number=10000)
    print(f"{name}: {time_taken:.6f} seconds")

Comparing lists

  • Fastest: for loop
  • Slowest: list comprehension
# List Comprehension and all() method: 0.061402 seconds
# For loop method: 0.036475 seconds

import timeit

class Signal:
    def __init__(self, Q):
        self.Q = Q

# Create a list of Signal objects
signals = [Signal(True), Signal(False), Signal(True), Signal(True), Signal(False),
           Signal(True), Signal(False), Signal(True), Signal(False), Signal(True)]

# Create a reference boolean vector
reference_vector = [True, False, True, True, False, True, False, True, False, True]

# Method 1: Using list comprehension and all()
def compare_signals_list_comprehension(signals, reference_vector):
    # return all(signal.Q == ref for signal, ref in zip(signals, reference_vector))
    return all(signals[i].Q == reference_vector[i] for i in range(10))

# Method 2: Using a for loop
def compare_signals_for_loop(signals, reference_vector):
    for signal, ref in zip(signals, reference_vector):
        if signal.Q != ref:
            return False
    return True

# Benchmarking
setup_code = '''
from __main__ import Signal, signals, reference_vector, compare_signals_list_comprehension, compare_signals_for_loop
'''
list_comprehension_code = '''
compare_signals_list_comprehension(signals, reference_vector)
'''
for_loop_code = '''
compare_signals_for_loop(signals, reference_vector)
'''

# Time the methods
list_comprehension_time = timeit.timeit(stmt=list_comprehension_code, setup=setup_code, number=100000)
for_loop_time = timeit.timeit(stmt=for_loop_code, setup=setup_code, number=100000)

print(f"List Comprehension and all() method: {list_comprehension_time:.6f} seconds")
print(f"For loop method: {for_loop_time:.6f} seconds")

Oneliners

Read a list of integers
arr=[int(line) for line in open('input.txt')]
Count how many times sums over moving 3-windows are increasing
sum([arr[i] > arr[i-3] for i in range(3,len(arr))])

Do's and don't's

foo = 'abcdef'
l = list(foo)                     # DO
foo = 'abcdef'
l = [c for c in foo]              # don't
foo = list(...)
g = map(blah,foo)                 # DO
foo = list(...)
g = [blah(i) for i in foo]        # don't
A = [[0]*5 for _ in range(5)]     # DO
A = [[0]*5]*5                     # don't
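
Why the last "don't" matters: the outer multiplication copies references to the same inner list, so all rows are aliases of one another. A minimal sketch (the names shared and independent are only illustrative):

```python
# The outer * repeats a reference to ONE inner list.
shared = [[0] * 3] * 3
shared[0][0] = 1
print(shared)        # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# The comprehension builds a fresh inner list for every row.
independent = [[0] * 3 for _ in range(3)]
independent[0][0] = 1
print(independent)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```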

Traps

Frequent mistakes. Beware the snake can bite you!

Confuse a method and a property in a test

if A.isdummy():            # This will fail if isdummy is a property
if A.isdummy:              # Always True if isdummy is a method

Note that a property should only be used to extend the behaviour of a class variable. Properties are designed to make it safe to publish variables in a class interface, and to get rid of useless mutators/accessors (see Python in a Nutshell, Why properties are important). Don't use a property as a replacement for a method when designing a new class.

Stick to a convention, such as always defining predicates like isxyz() or hasabc() as methods. Note that defining them as properties would raise an exception if they are called as functions, and hence might be safer.
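
A minimal sketch of both failure modes, using two hypothetical classes (WithMethod and WithProperty are not from any library):

```python
class WithMethod:
    def isdummy(self):
        return False


class WithProperty:
    @property
    def isdummy(self):
        return False


m = WithMethod()
p = WithProperty()

# Forgetting the parentheses: a bound method object is always truthy.
method_truthy = bool(m.isdummy)    # True, although isdummy() returns False
method_result = m.isdummy()        # False

# Calling a property: p.isdummy is already False, so this calls a bool.
try:
    p.isdummy()
    call_error = None
except TypeError as exc:
    call_error = exc

print(method_truthy, method_result, type(call_error).__name__)
# True False TypeError
```

The silent case (method used without parentheses) is the dangerous one; the property case at least fails loudly.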

Mix 0 with None in a sequence

Testing whether an element is defined is more difficult.
a = [0,None,None,None]
bool(a[0])           # --> False
bool(a[1])           # --> False !!! How can we tell them apart?
a[1] is None         # --> True      This works

Mixing property and normal getter

SOLUTION: prefix all getter methods with get, like getvalue()
b = a.prop           # Using a property, OR
b = a.getprop()      # Using a getter

Forget that, in a Python function, arguments are always passed by value (the value being a reference to the object)

def f(x, y):
    x = 23
    y.append(42)
a = 77
b = [99]
f(a, b)
print(a, b)                # prints: 77 [99, 42]

To reassign a list in a function, use the a[:] construct, like:

def f(a):
    a[:] = a[::-1]         # Does NOT rebind a to a new list; replaces the elements of the original list in place
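
To make the difference visible, here is a small sketch contrasting plain rebinding with slice assignment (the function names rebind and reverse_in_place are only illustrative):

```python
def rebind(lst):
    lst = lst[::-1]          # rebinds the local name only; the caller sees nothing


def reverse_in_place(lst):
    lst[:] = lst[::-1]       # slice assignment mutates the caller's list object


data = [1, 2, 3]

rebind(data)
after_rebind = list(data)    # snapshot: still [1, 2, 3]

reverse_in_place(data)
print(after_rebind, data)    # [1, 2, 3] [3, 2, 1]
```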

Use bytes, not string of characters

Characters can be unicode and take more than one byte.

b'abc'
'abc'.encode('utf-8')      # bytes('abc') raises TypeError in v3 (an encoding is required)
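
A short illustration of why character count and byte count differ, assuming UTF-8 as the encoding:

```python
text = "héllo"
data = text.encode("utf-8")

print(len(text))    # 5 characters
print(len(data))    # 6 bytes: 'é' is encoded on two bytes in UTF-8

# Round trip back to str
roundtrip = data.decode("utf-8")
```

Any code that computes offsets or lengths for a wire protocol or a file format should therefore work on the bytes object, not on the str.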

Mixing string and bytestring (v3)

buf = b'abc\n'
if buf.find(b'\n') != -1:  # MUST use BYTESTRING here; find() returns -1 when not found
    # ....
s = 'abc\n'
if s.find('\n') != -1:     # MUST use STRING here
    # ....

Forget self. when using class members

class MyClass(object):
    buf = b''

    def UpdateBuf(self,new_buf):
        buf = new_buf                 # WRONG!
        self.buf = new_buf            # CORRECT!

Relying on Queue.qsize / Queue.empty

multiprocessing.Queue provides two methods to check the queue state, empty() and qsize(), that are just plain UNRELIABLE. NEVER use them in the code to check the state of the queue:

  • empty() may return True even though:
      • the queue is NOT empty,
      • qsize() is NOT 0 at the same time,
      • other processes continue to get() elements from the queue.
  • empty() actually checks the current state of the queue, but there is a separate thread that is feeding the queue.

See:

Other mechanisms must be used to synchronize processes:

  • Sentinel on the input queue (to notify the consumer processes that the main process has stopped feeding new inputs).
  • Sentinel on the output queue (to notify the main process that the consumers have stopped feeding new results).
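
A minimal sketch of the two-sentinel scheme. The names SENTINEL, worker and run are hypothetical, and squaring the items merely stands in for real work; the optional ctx parameter is an assumption added here so that the start method can be chosen by the caller.

```python
import multiprocessing as mp

SENTINEL = None   # hypothetical marker; must never be a real work item


def worker(inq, outq):
    """Square items from inq until the sentinel arrives, then forward it."""
    for item in iter(inq.get, SENTINEL):
        outq.put(item * item)
    outq.put(SENTINEL)                 # tell the main process this worker is done


def run(items, n_workers=2, ctx=None):
    ctx = ctx or mp.get_context()      # default start method unless one is given
    inq, outq = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(inq, outq))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for item in items:
        inq.put(item)
    for _ in procs:
        inq.put(SENTINEL)              # one input sentinel per consumer
    results, finished = [], 0
    while finished < n_workers:        # count sentinels, never empty()/qsize()
        value = outq.get()
        if value is SENTINEL:
            finished += 1
        else:
            results.append(value)
    for p in procs:
        p.join()
    return sorted(results)
```

On platforms that use the spawn or forkserver start method, the call to run() should sit under an if __name__ == "__main__": guard. The output queue is drained before join(), since joining a process that still has queued items can block.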

Examples

Read a file line by line

Sources: [35]

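Shortest version. Universal newline handling is the default in Python 3 text mode (the old "U" mode flag was removed in Python 3.11). The file is only closed when the file object is garbage-collected:

for line in open("path/to/file.txt"):
    print(line.strip())                   # or strip('\r')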

Slightly longer version with with:

with open("path/to/file.txt") as f:       # assume read-text mode "rt"
    for line in f:
        print(line.strip())               # or strip('\r')

Counting line numbers:

with open('path/to/file.txt') as f:
    for cnt, line in enumerate(f, start=1):   # start=1: number lines from 1, as in the version below
        print(f"Line {cnt}: {line.strip()}")

The long old way:

f = open("path/to/file.txt")              # open before try, so that f is defined in finally
try:
    line = f.readline()
    cnt = 1
    while line:
        print(f"Line {cnt}: {line.strip()}")
        line = f.readline()
        cnt += 1
finally:
    f.close()

Read a list of integers from a file

# https://stackoverflow.com/questions/6583573/how-to-read-numbers-from-file-in-python

# Using for loop
a=[]
with open('input.txt') as f:
    for line in f:
        a.append(int(line))

# Using list comprehension
a=[int(line) for line in open('input.txt')]

Simple TCP server

import socket
import socketserver
import sys

SERV_ADDR = "0.0.0.0"
SERV_PORT = 2222

class Handler(socketserver.BaseRequestHandler):
    messages = b""
    def handle(self):
        token = 0
        client = self.request
        client.setblocking(True)

        try:
            while True:
                buf = client.recv(1)
                if not buf:                 # empty read: the peer closed the connection
                    break
                # buf = client.recv(len)
                # client.send(buf)
        except socket.error as msg:
            pass

        client.close()
        return

port = SERV_PORT
if len(sys.argv) > 1:
    port = int(sys.argv[1])
server = socketserver.TCPServer((SERV_ADDR, port), Handler)
server.serve_forever()

Decompress a ZIP

import zlib

DATA_FILE_PATH = ...

with open(DATA_FILE_PATH, 'rb') as f:
  content_bytes = f.read()

# Try all offsets to see if we find zlib-compressed data.
# Note: zlib.decompress handles a zlib stream; use the zipfile module for a real .zip archive.
for offset in range(len(content_bytes)):
  print(f'Trying {offset}...')
  try:
    content_decompressed = zlib.decompress(content_bytes[offset:])
    print('Found ZIP!')
    break
  except zlib.error:  # No valid stream at this offset -> skip a byte.
    pass
else:
  print('No compressed stream found')
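
The offset scan can be tried on synthetic data. This is only a sketch: find_zlib_stream is a hypothetical helper name, and the blob is built in memory from made-up junk bytes followed by a zlib stream.

```python
import zlib


def find_zlib_stream(blob):
    """Return (offset, decompressed_bytes) for the first offset that decompresses, else None."""
    for offset in range(len(blob)):
        try:
            return offset, zlib.decompress(blob[offset:])
        except zlib.error:
            continue           # no valid stream here, try the next byte
    return None


payload = b"hello hello hello"
blob = b"\x00\x01junk" + zlib.compress(payload)   # 6 junk bytes, then the stream

found = find_zlib_stream(blob)
print(found)                   # (6, b'hello hello hello')
```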

Create new packages / modules

Links

Recommend using flit for packaging, tox for linters and tests, etc. Very nice writeups. See more on HN.
Recommended on HN.
  • pip can install packages from GitHub:
pip install git+https://myg.it/repo.git

Standard method

  1. Create project template as in reference above
  2. Add the files in src/.
  3. Install build
python3 -m pip install --upgrade build
  4. Build the package
python3 -m build

Libraries

Big numbers
  • gmpy based on GMP
  • libnum a lighter bignum library, but compatible with pypy.

Unicode

Set source file encoding

Add any of these lines [36]:

# -*- coding: utf-8 -*-
# vim: set fileencoding=utf-8 :
Write the BOM

See [37]

import codecs

file = codecs.open("lol", "w", "utf-8")
file.write(u'\ufeff')                          # or use unicode name: u'\N{ZERO WIDTH NO-BREAK SPACE}'
file.close()

# Using https://docs.python.org/2/library/codecs.html#module-encodings.utf_8_sig
with codecs.open("test_output", "w", "utf-8-sig") as temp:
    temp.write("hi mom\n")
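
In Python 3 the same can be done with the built-in open() and the utf-8-sig codec, without codecs.open. A minimal sketch, writing to a scratch file in a temporary directory (the file name bom_demo.txt is arbitrary):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")   # arbitrary scratch file

# Writing with utf-8-sig prepends the BOM automatically.
with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
    f.write("hi mom\n")

with open(path, "rb") as f:
    raw = f.read()
has_bom = raw.startswith(codecs.BOM_UTF8)   # True

# Reading with utf-8-sig strips the BOM again.
with open(path, encoding="utf-8-sig") as f:
    text = f.read()
```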
Handling unicode

Some recommend always processing unicode internally, decoding on input and encoding on output [38] (Python 2 syntax):

line = line.decode('utf-8')
# ...treat line as unicode...
print line.encode('utf-8')

But this is error-prone. Another proposed solution is to redefine sys.stdout:

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)

A hackish way (not recommended):

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"
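
The snippets above use Python 2 syntax. In Python 3, str is already unicode, so the same "decode on input, encode on output" rule can be sketched as follows; an in-memory BytesIO stands in for the real binary output stream, which is an assumption made to keep the example self-contained:

```python
import io

raw_input_bytes = "åäö\n".encode("utf-8")   # bytes as read from a file or socket

line = raw_input_bytes.decode("utf-8")       # decode on input
processed = line.strip().upper()             # work on str internally

sink = io.BytesIO()                          # stands in for a binary output stream
out = io.TextIOWrapper(sink, encoding="utf-8", newline="\n")
out.write(processed + "\n")                  # the wrapper encodes on output
out.flush()
encoded = sink.getvalue()
```

For the real standard output, Python 3.7+ offers sys.stdout.reconfigure(encoding="utf-8") as the counterpart of the codecs.getwriter trick.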

Python 2 to Python 3

Use python v3 print in v2
from __future__ import print_function

This way print() with no argument prints an empty line in v2, instead of printing the empty tuple ().

Coding style

From PEP 8, Coding Style.

  • Use pycodestyle to check code conformance:
pip install pycodestyle
pycodestyle optparse.py
  • Use autopep8 to format existing code:
pip install autopep8
autopep8 --in-place optparse.py
  • Use black (sudo apt install black):
augroup equalprg
   autocmd FileType python      setlocal equalprg=/usr/bin/black\ -l\ 110\ -q\ -
augroup END
Use # fmt: off and # fmt: on to prevent black from formatting some parts of the code (e.g. a long list)
  • Use yapf (sudo apt install yapf3) (from Google, based on clang-format):
augroup equalprg
   autocmd FileType python      setlocal equalprg=/usr/bin/yapf3\ --style\ .style.yapf
augroup END
Create a file .style.yapf:
[style]
based_on_style = pep8
column_limit = 110
Naming conventions
lower_case_variable = None

def lower_case_func():
    # ...

class ClassNameAreCapsWord:
    # ...
Some good/bad practices
# BAD - superfluous 'pass'
class InvalidAttribute(AttributeError):
    """Used to indicate attributes that could never be valid"""
    pass
# GOOD
class InvalidAttribute(AttributeError):
    """Used to indicate attributes that could never be valid"""


# BAD
f = open('file.txt')
a = f.read()
print(a)
f.close()
# GOOD
with open('file.txt') as f:
    for line in f:
        print(line)

# BAD
my_very_big_string = """For a long time I used to go to bed early. Sometimes, \
    when I had put out my candle, my eyes would close so quickly that I had not even \
    time to say “I’m going to sleep.”"""

from some.deep.module.inside.a.module import a_nice_function, another_nice_function, \
    yet_another_nice_function
# GOOD
my_very_big_string = (
    "For a long time I used to go to bed early. Sometimes, "
    "when I had put out my candle, my eyes would close so quickly "
    "that I had not even time to say “I’m going to sleep.”"
)

from some.deep.module.inside.a.module import (
    a_nice_function, another_nice_function, yet_another_nice_function)

Troubleshooting

Troubleshooting a missing library

  • Use python -v -c "import mylibrary" to troubleshoot a module.
  • Look at the log for the loaded libraries.
  • Some extension modules are linked against shared libraries that might be missing. Use ldd to list the linked libraries and spot the missing ones.
ldd /path/to/your/_hashlib.so
# linux-gate.so.1 =>  (0xf77c3000)
# libssl.so.6 => not found
# libcrypto.so.6 => not found
# libpython2.7.so.1.0 => not found
# libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf776a000)
# libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf75b3000)
# /lib/ld-linux.so.2 (0x5659b000)