Python
References
Books
- O'Reilly's Python in a Nutshell
- The Python language reference
Links
- Python 3
- Including Language Reference.
- Python 2.7
- Python 2.7.6 docs
- ==> The Python 2 Standard Library <==
- Python Quick Reference 2.7 — Extremely complete
- Other versions of Python are available [1]
- Variants and distributions
- ipython
- Jupyter — The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
- Anaconda
- PEP
- Coding style
- References:
- Miscellaneous
- Nice example of generating / testing regex in Python (with nice / small test framework)
- Some tips on debugging in Python, mostly focussing on using a logger (instead of printf). See also HN page for many interesting comments.
- Libraries
- seaborn is a powerful python toolkit to visualize statistical data.
- plumbum, a library that mimics bash-like commands (including pipes), to ease rewriting bash scripts in Python.
- tqdm, a library to easily add a progress bar to loops and iterables.
- pwntools, a CTF framework and exploit development library.
- In particular, check pwntools tubes, a library for talking to sockets, processes, ssh connections. Useful for automation (see this CTF writeup for an example).
- Profiler
# As simple as
py-spy top --pid 12345 # Display activity of given pid in real-time! (older releases: py-spy --pid 12345)
Tutorials
- Wikibook on Python I/O
- Packaging projects — How to create a package / module that can be installed with pip.
Shell
In a command shell, use pydoc to get help:
pydoc repr # Get help on the 'repr' built-in
The same can be achieved in the Python interpreter:
help() # Interactive help
help('repr') # Same as typing 'repr' in interactive help
help(repr) # Help on repr builtin
Testing
- pytest: helps you write better programs (see this tutorial).
- Hypothesis - Hypothesis is an advanced testing library for Python.
- Quickcheck a testing library.
Examples
- ProgramCreek, search into existing code for example of use
Install
Virtual Environments
A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them.
- References
- Guide to Python — Virtual Environments
- Is it possible to install another version of Python to Virtualenv? (stackoverflow.com)
- Virtualenv
# Install
sudo apt install virtualenv python3-virtualenv
# Create a new environment
virtualenv -p python3 venv # To use python3. venv is recommended default to add to .gitignore, etc.
source venv/bin/activate # Enter environment. From now on, packages will only be installed locally
# Do stuff - pip3 install ...
deactivate # Exit environment
Update Python
❗ It is not recommended to update the system Python.
Some links:
- https://unix.stackexchange.com/questions/332641/how-to-install-python-3-6
- https://github.com/chriskuehl/python3.6-debian-stretch
Install pip and setuptools
To install setuptools, the easiest is to use pip, which comes pre-installed in later versions of Python:
pip install -U setuptools
To bootstrap setuptools on a bare installation:
cd /path/to/your/python
wget https://bootstrap.pypa.io/ez_setup.py -O - | ./python
wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo ./python # System-wide
wget https://bootstrap.pypa.io/ez_setup.py -O - | ./python - --user # User-local path
See Install pip setuptools and wheels for more information.
Install module online
Python comes with a wide range of libraries, called modules. There are several ways to install these modules.
- Using the distribution
- For instance, in Debian:
apt-cache search --names-only python- # View available modules
sudo apt-get install python-pyscard # Install the pyscard module
- Using pip
pip is the new way to install modules. It uses the wheel format.
sudo pip install Pygments
This is equivalent to:
sudo python -m pip install Pygments
This last form makes explicit which Python runtime is used:
sudo /path/to/your/python -m pip install Pygments
Use --user to install for the current user only:
pip install --user Pygments
Use --target SITE to specify the target SITE manually:
pip install --target SITE Pygments
See tip below on how to obtain the default site.
- Using easy_install
easy_install is the old way to install modules. It uses the egg format.
sudo easy_install Pygments
- Using the source
Download and unpack the package
wget http://sourceforge.net/projects/pyscard/files/pyscard/pyscard%201.6.12/pyscard-1.6.12.tar.gz#md5=908d2530972ea91eb4bb66987e0e1e98
tar -xvzf pyscard-1.6.12.tar.gz
cd pyscard-1.6.12
To install globally (in /usr/local/lib/python2.7/dist-packages or similar):
sudo ./setup.py install
To install locally (in ~/.local/lib/python2.7/site-packages), use --user, without sudo (otherwise the package lands in root's home directory):
./setup.py install --user
One can also use pip to install from source:
sudo pip install . # Global install
pip install --user . # Local install
Install modules offline
To install a Python module on a machine that has no connection to Internet [3]:
- On a machine with internet connection
# For instance, to install package neovim
mkdir tmp && cd tmp
pip download neovim
- On the offline machine, which has access to tmp/:
# For instance, to install package neovim
cd tmp
pip install --no-index --find-links ./ neovim
If you don't have pip on the offline machine, and you can't use an OS package, install directly from source:
- Download pip archive from https://pypi.org/project/pip/#files
- Copy to offline machine, then
python setup.py install
Install python2 pip on Debian Bullseye
- Download python-pip and python-pip-whl from Buster.
- Install both packages, this will uninstall python-pip-whl 20.1 and python3-pip
sudo apt install ./python-pip_18.1-5_all.deb ./python-pip-whl_18.1-5_all.deb
- Upgrade pip:
sudo python2 -m pip install -U pip
- Reinstall the Bullseye packages python-pip-whl and python3-pip:
sudo apt install python-pip-whl python3-pip
- Confirm some packages that were installed with python2 pip:
sudo apt install libpython-all-dev python-all python-all-dev python-pkg-resources python-setuptools
Interactive mode
Python can be run interactively, which is a very powerful way to develop new applications.
Python
To import an existing module, use import as usual:
import mymod # Import module in current session
from mymod import * # Same, but symbols are usable without the mymod. prefix
iPython / Jupyter
To import an existing module, use import as above, or the run command:
import mymod # import file 'mymod.py'
run mymod
Reloading modules automatically
When working on a module, iPython can reload that module automatically [4]:
%load_ext autoreload
%autoreload 2 # Module will be reloaded at each carriage return
import mymod
# ...or...
load_ext autoreload
autoreload 2 # Module will be reloaded at each carriage return
import mymod
%autoreload? # for help
Python variants
iPy
Use iPy (ipython) to get an interactive shell with auto-completion, instant help...
%magic # Get help on %magic commands (%run,...)
?run # Get help on %run magic
%run script.py # Run given script
%run -i script.py # ... in the interactive namespace (instead of an empty one)
%run -i -e script.py # ... ... and ignore sys.exit() call
!cmd # Run shell command 'cmd', for instance ...
!ls # ... List file in current directory
Pypy
PyPy is a fast, compliant alternative implementation of the Python language, which usually runs python programs faster thanks to its Just-in-Time compiler.
- Install
- On Lucid 64-bit, the easiest is to download the dedicated tarball:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.2.1-linux64.tar.bz2
tar -xvjf pypy-2.2.1-linux64.tar.bz2
- Install virtualenv, then install pypy as virtual environment my-pypy-env
sudo apt-get install python-virtualenv
virtualenv -p pypy-2.2.1-linux64/bin/pypy my-pypy-env
- Modules must be installed separately for this virtual environment. For instance
./my-pypy-env/bin/pip install libnum
- Run
- Run Python programs using python or pypy
./my-pypy-env/bin/pypy
Python 3 Reference
Source: Python reference, w3schools python tutorial and O'Reilly Python in a Nutshell
Keywords
False await else import pass
None break except in raise
True class finally is return
and continue for lambda try
as def from nonlocal while
assert del global not with
async elif if or yield
In addition, the following have special meaning:
- _* : not imported by from module import *. Also, _ is the last evaluation result in interactive mode.
- __*__ : system-defined names.
- __* : class-private names (rewritten as mangled form by the compiler).
Literals
See Literals in Python reference and Python in a Nutshell.
42 # Integer literal
3.14 # Floating-point literal
3.14e-10 # Floating-point literal
1.0j # Imaginary literal
[42, 3.14, 'hello'] # List
[] # Empty list
100, 200, 300 # Tuple
() # Empty tuple
{'x':42, 'y':3.14} # Dictionary
{} # Empty dictionary
{1, 2, 4, 8, 'string'} # Set
# There is no literal to denote an empty set; use set() instead
- string literals (
str
objects)
"hello"
'hello'
"""Good
night""" # Triple-quoted string literal
r"\b\x" # raw -- ignore escape sequences
R"\b\x" # raw -- ignore escape sequences
f"name is {name!r}" # formatted string literals
- multiline string literals (
str
objects)
"multiline\nstring" # simple quote with embedded \n
"""multi-line
string""" # triple quote, preserve newlines, but not indent friendly
("multi-line\n"
"string") # Using bracket, recommended by PEP. indent friendly.
"multi-line\n" \
"string" # Using backslash. indent friendly.
- bytes literals (
bytes
objects)
b"abc\x81\x82"
B"abc\x81\x82"
rb"abc\x81\x82" # raw -- ignore escape sequences
RB"abc\x81\x82" # raw -- ignore escape sequences
- formatted string literals (3.6)
name="Fred"
f'His name is {name!r}' # !r conversion, applies repr()
f'His name is {repr(name)}' # equivalent
# !s does str(), !a does ascii()
f'length is {len(name)}' # expression
width=8; prec=3;
f'{3.14159:{width}.{prec}}' # nested fields (width, precision): '    3.14'
n = 1024
f'{n:x}' # '400'
f'{n:4x}' # ' 400'
f'{n:04x}' # '0400'
f'{n:#x}' # '0x400'
f'{n:#6x}' # ' 0x400'
f'{n:#06x}' # '0x0400'
from datetime import datetime
today = datetime(year=2017, month=1, day=27)
f'{today:%B %d, %Y}' # date format specifier
f'{n} vs {{n}} vs {{{n}}}' # '1024 vs {n} vs {1024}'
- Raw string literals
r'^foo\.bar$' # Useful for regex mainly (fix invalid escape sequence)
bar="BAR"
fr'^foo\.{bar}$' # raw AND formatted string
Operators
+ - * ** / // % @
<< >> & | ^ ~ :=
< > <= >= == !=
Operator precedence, from highest (binds tightest) to lowest:
(...) [...] {...} # Tuple, list & dict. creation (`...` string conversion is Python 2 only)
s[i] s[i:j] s.attr f(...) # indexing & slicing; attributes, function calls
x**y # Power (binds tighter than a unary operator on its left: -x**2 is -(x**2))
+x, -x, ~x # Unary operators
x*y x/y x//y x%y x@y # mult, division, floor division (integer division), modulo, matrix mult.
x+y x-y # addition, subtraction
x<<y x>>y # Bit shifting
x&y # Bitwise "and"; also intersection of sets
x^y # Bitwise exclusive or
x|y # Bitwise "or"; also union of sets
x<y x<=y x>y x>=y x==y x!=y # Comparison (x<>y is Python 2 only)
x is y x is not y # identity (same precedence as comparisons)
x in s x not in s # membership (same precedence as comparisons)
not x # boolean negation
x and y # boolean and
x or y # boolean or
lambda args: expr # anonymous function
- Arithmetic operators
1//2 # Floor division (PEP-238)
- ternary operator
x_sign = 'positive' if (x>=0) else 'negative'
- Notes
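- Use is to test for None. not also works, but it is true for any falsy value (0, '', [], ...), not only None
if p.poll() is None: # Use 'is' for testing None
    print("None")
if not p.poll(): # ... 'not' is also true when poll() returns 0
    print("None or 0")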
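The two tests are not interchangeable when a value can be falsy without being None. A minimal sketch, using a hypothetical helper describe_status (not part of these notes) whose argument stands in for the value returned by p.poll():

```python
def describe_status(returncode):
    """Classify a value shaped like the result of Popen.poll()."""
    if returncode is None:        # only None means "still running"
        return "running"
    if not returncode:            # 0 is falsy, but it is not None
        return "exited successfully"
    return "exited with an error"

print(describe_status(None))  # running
print(describe_status(0))     # exited successfully
print(describe_status(2))     # exited with an error
```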
Delimiters
( ) [ ] { }
, : . ; @ = ->
+= -= *= /= //= %= @=
&= |= ^= >>= <<= **=
Characters with special meanings as part of other tokens:
' " # \
Data types
Boolean
True # constant for true
False # constant for false
bool(x) # To convert to bool built-in type
Avoid unnecessary calls to bool(x).
if x: # GOOD
if bool(x): # BAD
if x is True: # BAD
if x == True: # BAD
if bool(x) == True: # BAD
A valid use:
def count_trues(seq): return sum(bool(x) for x in seq) # Ensure each item is counted either as 0 or 1
Strings
Strings in Python are immutable objects. There are many differences between Python 2 and Python 3.
Python 2: there are two types of strings:
type('foo') # <type 'str'>
type(u'foo') # <type 'unicode'>
Python 3: there are two types of strings:
type('foo') # <class 'str'>
type(b'foo') # <class 'bytes'>
So Python 3's str corresponds to Python 2's unicode, and Python 3's bytes to Python 2's str.
b'hello' == 'hello'.encode() # str to bytes
'hello' == b'hello'.decode() # bytes to str
"def" in "abcdefgh" # substring
s.upper() # Change 'uppercase' to 'UPPERCASE'
', '.join(set_3) # Join a sequence
list(map(ord, hex_data)) # [0xDE, 0xAD, 0xBE, 0xEF] (map() returns an iterator in Python 3)
# Strings function
s="Hello, World"
s.endswith('World') # True
s.startswith('Hello') # True
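The str / bytes conversions above matter as soon as the text is not pure ASCII. A small sketch, assuming UTF-8 (the default of encode() and decode() in Python 3); the sample word is arbitrary:

```python
text = "caf\u00e9"                 # 4 characters, the last one is non-ASCII
data = text.encode("utf-8")        # str -> bytes
print(data)                        # b'caf\xc3\xa9'
print(len(text), len(data))        # 4 5
round_trip = data.decode("utf-8")  # bytes -> str
print(round_trip == text)          # True
```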
Bitstring
See Bitstring module.
List
Nice tutorial: http://effbot.org/zone/python-list.htm
a=[0,3,6]
print(a[1]) # 3
a=[0] * 1000 # Array with 1000 elements
len(a) # Number of elements
b=a # This only copy the REFERENCE
b[0]+=1 # ... this also changes a[0]
b=a[:] # This makes a NEW COPY
b=a.copy() # Python >= 3.3
import copy
a=[[1,2],[3,4]]
b=copy.deepcopy(a) # Deep copy - MUST for dimension >= 2
a[:]=a[::-1] # Reassign element in the list (here in reverse order)
a=a[::-1] # Idem, but create a new object
a=[];
a.append(12); # Create object before appending
a[len(a):] = [13]; # Same as appending
a=[1,2,3]
a.extend([4,5,6]) # [1,2,3,4,5,6] -- extend with an iterable
l=[1,2,3]
l.pop() # 3 - pop last element
l.pop(0) # 1 - pop first element - consider deque and popleft() for better perf
del l[0] # Delete first element
list("abc") # ['a', 'b', 'c']
line = '1234567890'
n = 2
[line[i:i+n] for i in range(0, len(line), n)] # ['12', '34', '56', '78', '90']
def shiftRow(word, n):
    return word[n:]+word[0:n]
state[i::4] = shiftRow(state[i::4],i) # Apply shiftRow on 4 bytes distant of 4 each
alist = [sbox[b] for b in alist] # Substitute each item (map() would return an iterator in Python 3)
state[:] = [ a ^ b for a,b in zip(state,roundKey) ] # Ex-oring 2 lists of integers
# Multi-dimensional list
matrix = [[0 for _ in range(5)] for _ in range(5)] # Initialize bi-dimensional array
matrix = [[0]*5 for _ in range(5)] # faster way
# matrix = 5*[5*[0]] # WRONG - 5 references to the same inner list
# Compare - simply use ==
[1,2,3] == [1,2,3] # True
[1,2,3] == [1,2,3,4] # False
[1,2,3] == ['a','b'] # False
# ... to remove order and duplicates, use set()
set([1,2,3]) == set([2,1,3,3]) # True
# Sort
a.sort()
# Sum
a=[8,19,3,17,12,2]
sum(x <= 10 for x in a)
sum(1 for x in a if x <= 10) # Generator expression
def count(iterable):
    return sum(1 for _ in iterable)
sub10Count = count(x for x in a if x <= 10) # Cheap (doesn't create useless list) and readable
# Adding (https://stackoverflow.com/questions/18713321/element-wise-addition-of-2-lists)
[sum(x) for x in zip(list1, list2)] # 177ms
from itertools import izip; [sum(x) for x in izip(list1, list2)] # 139ms (Python 2 only; Python 3's zip is already lazy)
[a + b for a, b in zip(list1, list2)] # 112ms, most pythonic
from itertools import izip; [a + b for a, b in izip(list1, list2)] # 71ms, pythonic (Python 2 only)
from operator import add; list(map(add, list1, list2)) # 44ms (timed in Python 2, where map returns a list)
import numpy as np
vector1 = np.array([1, 2, 3])
vector2 = np.array([4, 5, 6])
sum_vector = vector1 + vector2 # 25x faster
# Find *first* matching item
["foo", "bar", "baz"].index("bar") # 1 !!! Throws ValueError if item not found
try:
    return L.index(obj) # Fastest method when obj is usually found - note: raises ValueError, never returns -1
except ValueError:
    pass # ...
if obj in L:
    return L.index(obj) # Faster if obj is often not found
# Find all items
[i for i, e in enumerate([1, 2, 1]) if e == 1] # [0, 2]
g = (i for i, e in enumerate([1, 2, 1]) if e == 1)
next(g) # 0
next(g) # 2
# Sort based on object attribute
ut.sort(key=lambda x: x.count, reverse=True) # To sort the list in place...
newlist = sorted(ut, key=lambda x: x.count, reverse=True) # To return a new list, use the sorted() built-in function...
- (From stackoverflow [5])
for c in list(sha256.digest()):
    key.append(ord(c)) # Python 2; in Python 3, digest() yields ints: key.extend(sha256.digest())
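The copy rules noted above (reference vs. copy vs. deep copy, and the 5*[5*[0]] trap) can be checked with a short sketch; the variable names are only illustrative:

```python
import copy

a = [[1, 2], [3, 4]]
shallow = a[:]              # new outer list, same inner lists
deep = copy.deepcopy(a)     # inner lists are copied too

a[0][0] = 99
print(shallow[0][0])  # 99 - the shallow copy shares the inner list
print(deep[0][0])     # 1  - the deep copy is independent

rows = 3 * [2 * [0]]        # three references to ONE inner list
rows[0][0] = 7
print(rows)                 # [[7, 0], [7, 0], [7, 0]]

grid = [[0] * 2 for _ in range(3)]   # three distinct inner lists
grid[0][0] = 7
print(grid)                 # [[7, 0], [0, 0], [0, 0]]
```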
Dictionary
D = { 'x':42, 'y':3.14, 'z':7 }
D['x'] # 42
del D[k] # Removes from dictionary D the item whose key is k
# Sparse matrix
Matrix = {}
Matrix[1,2] = 15 # This works because 1,2 -- a tuple -- is used as a key
for key in d: # Loop over keys in dictionary d
for key, value in d.items(): # Loop over keys and values in dictionary d (d.iteritems() in Python 2)
Set
S = set() # Empty set
S = {1,2,3} # Set with some values
S.add(4) # Add an element
S.update([2,5]) # Add a list of elements
S.update({2,5}) # ... any iterable
(elem,) = S # get only elem -- fail if S not singleton
elem = next(iter(S)) # get any elem -- work if S singleton or not
Control flow statements
if
if x < 0: print('x is negative')
elif x % 2: print('x is positive and odd')
else: print('x is even and non-negative')
# Better style (PEP 8):
if x < 0:
    print('x is negative')
elif x % 2:
    print('x is positive and odd')
else:
    print('x is even and non-negative')
while
count = 0
while x > 0:
    x //= 2 # truncating division
    count += 1
print('The approximate log2 is', count)
for
for letter in 'ciao':
    print('give me a', letter, '...')
# target can be a tuple
for key, value in d.items():
    if key and value: # print only true keys and values
        print(key, value)
# ... or something else (LHS expression)
prototype = [1, 'placemarker', 3]
for prototype[1] in 'xyz': print(prototype)
# prints [1, 'x', 3], then [1, 'y', 3], then [1, 'z', 3]
# Using range():
for i in range(10):
    statement(s)
for i in range(5,10):
    statement(s)
# Using list comprehension:
result1 = [x+1 for x in some_sequence]
#... same as:
result2 = []
for x in some_sequence:
    result2.append(x+1)
# A list comprehension may have 'if', or nested for
result3 = [x+1 for x in some_sequence if x>23]
result5 = [x for sublist in listoflists for x in sublist]
# Dict comprehension
d = {n:n//2 for n in range(5)}
print(d) # prints: {0: 0, 1: 0, 2: 1, 3: 1, 4: 2} (insertion order is kept since Python 3.7)
break
while True: # this loop can never terminate naturally
    x = get_next()
    y = preprocess(x)
    if not keep_looping(x, y): break
    process(x, y)
continue
for x in some_container:
    if not seems_ok(x): continue
for-else and while-else
for x in some_container:
    if is_ok(x): break # item x is satisfactory, terminate loop
else:
    print('Beware: no satisfactory item was found in container')
    x = None
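The else clause of a loop runs only when the loop ends without hitting break. A self-contained sketch of the pattern above; first_even is a hypothetical helper chosen for illustration:

```python
def first_even(numbers):
    """Return the first even item, or None when there is none."""
    for n in numbers:
        if n % 2 == 0:
            break            # found: the else clause is skipped
    else:
        n = None             # loop exhausted (or empty) without break
    return n

print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3]))        # None
print(first_even([]))            # None
```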
pass
if condition1(x):
    process1(x)
elif x>23 or condition2(x) and x<5:
    pass # nothing to be done in this case
elif condition3(x):
    process3(x)
else:
    process_default(x)
try-except-finally-else / raise
try:
    print(x)
except:
    print("An exception occurred")

try:
    print(x)
except NameError: # Can give many except
    print("Variable x is not defined")
except:
    print("Something else went wrong")

try:
    print(x)
except:
    print("Something went wrong")
else: # exec'ed if no error and NO BREAK
    print("try block finished")

try:
    print(x)
except:
    print("Something went wrong")
finally: # exec'ed no matter what
    print("the 'try except is finished'")

raise Exception("Sorry, that was wrong")

import sys
try:
    i = int(s.strip())
except OSError as err:
    print("OS error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
with
The with statement is the Python embodiment of the well-known C++ idiom "resource acquisition is initialization" (RAII).
with expression [as varname]:
    statement(s)
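To make the acquire / release behaviour concrete, here is a minimal sketch of a custom context manager built with contextlib.contextmanager. The names tracked and events are hypothetical; the point is that the release step runs even when the body raises:

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(name):
    events.append("enter " + name)      # "acquire" step
    try:
        yield name                      # value bound by 'as'
    finally:
        events.append("exit " + name)   # "release" step, always executed

try:
    with tracked("demo") as label:
        events.append("body " + label)
        raise ValueError("boom")
except ValueError:
    pass

print(events)  # ['enter demo', 'body demo', 'exit demo']
```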
Functions
a = 'global'
def afunction():
    global a # Use 'global' to change scope of a variable
    a = 'still using global'
    b = 'local'
Docstrings
def toh(cls,s):
    """ Convert a (binary) string into a hexadecimal string.
    >>> mc.toh(b'ABCD')
    '41424344'
    >>> mc.toh(b'mycrypto')
    '6d7963727970746f'
    """
    return s.hex() # Python 3 (bytes); in Python 2: s.encode('hex')
Docstrings can also be defined at module level. The docstring line must appear before imports:
#! /usr/bin/python3
"""Use int() to convert either binary or hex string to an integer
>>> int('11110000',2)
240
"""
import binascii
Use module doctest to test examples in docstrings:
# Check docstring examples on exec (not on import)
if __name__ == "__main__":
    import doctest
    doctest.testmod()
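A self-contained sketch of the doctest workflow. doctest.testmod() checks every docstring of the current module; to keep the example independent of where it is pasted, the same machinery (DocTestFinder and DocTestRunner) is run here on a single hypothetical function, to_hex:

```python
import doctest

def to_hex(data):
    """Return the hexadecimal representation of a bytes object.

    >>> to_hex(b'ABCD')
    '41424344'
    >>> to_hex(b'')
    ''
    """
    return data.hex()

# Collect the examples found in to_hex's docstring and run them.
runner = doctest.DocTestRunner(verbose=False)
for test in doctest.DocTestFinder().find(to_hex, "to_hex", globs={"to_hex": to_hex}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)   # 0 2
```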
Classes
- Reference: Python docs.
An empty class:
class Empty(object):
    pass
e=Empty()
A class with constructor and data members:
class Basic(object):
    __param = None # __* denotes a class-private member
    def __init__(self, param):
        self.__param = param
        print("Basic is born with param %s" % param)
b1=Basic('foo')
b2=Basic(param='bar')
A class that inherits:
class Child(Parent):
    __param = None
    def __init__(self, param):
        Parent.__init__(self) # Must call EXPLICITLY parent constructor (or super().__init__() in Python 3)
        self.__param = param
Class members can be defined as properties:
import math
class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
    @property
    def area(self):
        '''area of the rectangle'''
        return self.width * self.height
    @area.setter
    def area(self, value):
        scale = math.sqrt(value/self.area)
        self.width *= scale
        self.height *= scale
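A usage sketch for the property: reading area calls the getter, assigning to it calls the setter, which rescales both sides. The class is repeated so that the block runs on its own; the sample dimensions are arbitrary:

```python
import math

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        """Area of the rectangle."""
        return self.width * self.height

    @area.setter
    def area(self, value):
        scale = math.sqrt(value / self.area)   # same factor on both sides
        self.width *= scale
        self.height *= scale

r = Rectangle(2, 3)
print(r.area)              # 6
r.area = 24                # scales both sides by sqrt(24 / 6) == 2
print(r.width, r.height)   # 4.0 6.0
```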
Classes may have static methods and class methods [6]:
class Rectangle(object):
    max_area = 10 # A class variable shared by all instances
    def __init__(self, width, height):
        self.width = width
        self.height = height
    @staticmethod
    def give_height(area,width):
        return area / width
    @classmethod
    def get_max_height(cls,max_area):
        return cls.max_area
Using modules
References:
- importlib, the default implementation of Python's import statement
- Relative imports for the billionth time, easy explanation of relative import issues.
Assume we have a module named module.py:
import module; # Import everything in module.* namespace
from module import *; # Import everything in current namespace
import sys
sys.path.append('some/custom/path')
import module; # Import module from a custom path
Use built-in __import__, or better yet importlib.import_module, to import a module whose name is in a string (http://effbot.org/zone/import-string.htm)
mymodule = __import__('mymodule') # Import module from string - see http://effbot.org/zone/import-string.htm
import importlib
importlib.import_module('mymodule')
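A typical use of importing from a string is resolving a dotted path read from a configuration file. A minimal sketch; load_attribute is a hypothetical helper, and math.sqrt is used only because it is always available:

```python
import importlib

def load_attribute(dotted_path):
    """Resolve 'package.module.name' to the object it designates."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

sqrt = load_attribute("math.sqrt")
print(sqrt(16))   # 4.0
```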
Note that modules can be imported anywhere, not just at the start of the file. This allows for loading a module only when necessary.
- Scripts vs module
- There are two ways to load a Python file: as the top-level script, or as a module.
- A file is loaded as the top-level script when it is executed directly (e.g. python myfile.py). The __name__ of the top-level script is always __main__.
- A module is a file imported with import mymodule. Its __name__ is mymodule.
- A module can be part of a package (say package.mysubpackage.mymodule). A module in a package can do relative imports (from .. import blah).
- Relative imports
- This usually works, at least for a top-level script (see also post above, and section below Get path of current file)
# To import ../../some/package/mymodule.py, relatively to current file
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)) + '/../../some/package')
import mymodule
Python 2 reference
for i in range(10):
    print i
# Add a comma to remove carriage return
for i in range(10):
    print i, # 0 1 2 3 4 5 6 7 8 9
To enable the Python 3 print function:
from __future__ import print_function # Enable v3 print in Python 2.x
Basic I/O in Python
Source: O'Reilly Python in a Nutshell.
String formatting with format or formatted-string literals
Source:
- https://docs.python.org/3/tutorial/inputoutput.html
- See Python in Nutshell, chapter 8 for more information.
str.format is available since Python 2.6 and 3.0; formatted-string literals since Python 3.6.
# v3 - String formatting
# '{[selector][conversion]:[format_specifier]}'.format(value)
'First: {} second: {}'.format(1, 'two')
'Second: {1} first: {0}'.format(1, 'two') # Give positional for all
'a: {a}, 1st: {}, 2nd: {}, a again: {a}'.format(1, 'two', a=3) # Give name for some
'a: {a} first:{0} second: {1} first: {0}'.format(1, 'two', a=3) # Can mix name and positional
# Using sequences and composites:
'p0[1]: {[1]} p1[0]: {[0]}'.format(('zero', 'one'), ('two', 'three'))
'p1[0]: {1[0]} p0[1]: {0[1]}'.format(('zero', 'one'), ('two', 'three'))
'{} {} {a[2]}'.format(1, 2, a=(5, 4, 3))
'First r: {.real} Second i: {a.imag}'.format(1+2j, a=3+4j)
# Field width
'{:^12s}'.format(s)
'{:.>12s}'.format(s)
print('{:,}'.format(12345678))
# Precision specification
'as f: {:.4f}'.format(x)
'as g: {:.4g}'.format(x)
'as s: {:.6s}'.format(s)
# Formatted-string literals
# ... Use {{ or }} to insert a literal left or right curly brace
table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
for name, phone in table.items():
    print(f'{name:10} ==> {phone:10d}')
String formatting with %
Available in Python 2 and 3.
# format % values
'result = %d' % x # %d - decimal
'answers: %d %f' % (x, y) # %f - float; several values must be given as a tuple
'%x' % hexval # Print hex
'File not found %r' % filename # !!! USE %r to log possibly erroneous strings !!!
Input parsing
See also modules parse and re.
# Using built-ins
print(int('2'))
print(float('3.14'))
# Using ast.literal_eval()
import ast
print(ast.literal_eval('23')) # 23
print(ast.literal_eval('[2,3]')) # [2, 3]
print(ast.literal_eval('2+3')) # raises ValueError
print(ast.literal_eval('2+')) # raises SyntaxError
# Using split()
a='abc\ndef\n123\n'
a.split('\n') # ['abc', 'def', '123', '']
a.strip().split('\n') # ['abc', 'def', '123']
a='12,34,56'
a.split(',') # ['12','34','56']
[int(x) for x in a.split(',')] # [12,34,56]
Text output
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
for i in range(10):
    print(i,"",end="") # 0 1 2 3 4 5 6 7 8 9
for i in range(10):
    print(f"{i} ",end="") # 0 1 2 3 4 5 6 7 8 9
import sys;
sys.stdout # Standard output
sys.stderr # Standard error
# Output to a file
print('...', file=f) # Keyword arguments must come after positional ones
f.write('...')
sys.stdout.write(...) # Using write with stdout
# Output to stderr
sys.stderr.write(...) # Using write
print('...', file=sys.stderr) # Using print
def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)
eprint('...') # Using custom fct
Text input
See also #io module.
- Standard input
import sys;
sys.stdin # Standard input
# Input (from stdin only)
input(prompt='') # v3: same as v2 raw_input; v2: same as eval(raw_input(prompt))
raw_input(prompt='') # v2 only
- File input with context manager (recommended) - file input
# Context manager - f is closed automatically
with open("test.txt") as f: # Universal newlines is the default in Python 3 text mode (mode "U" was removed in 3.11)
    for line in f:
        print(line.rstrip('\n')) # Or rstrip() to right strip all blanks
# Even more compact
for line in open('test.txt'): # file is closed when the object is garbage-collected
    print(line.rstrip()) # Right strip all blanks
- Open / close
# read a file
f = open("demofile3.txt", "r") # "r" optional (default mode); "U" is Python 2 only (removed in Python 3.11)
print(f.read())
f.close()
- fileinput
import fileinput
# Iterate over all files in sys.argv or stdin
for line in fileinput.input():
    print(line.rstrip()) # Right-strip all blanks (CR,LF,SPC)
# Can override list of files -- here explicit use as context manager
with fileinput.input(files=('spam.txt', 'eggs.txt')) as f: # mode="U" is no longer accepted in Python 3.11+
    for line in f:
        print(line.rstrip())
Standard Library
sys module
Arguments
sys.argv, len(sys.argv) # Argument list, number of arguments ([0] -> exec name)
if ("-h" in sys.argv) or ("--help" in sys.argv):
    printUsage()
for a in range(len(sys.argv)):
    if sys.argv[a] == "-e":
        pass # handler
Exit
sys.exit()
io module
To open a file:
# - mode can be 'r', 'w', 'a', 'r+', 'w+', 'a+', ...
# Default is text 't', add 'b' for binary ('U' for universal line ending is Python 2 only, removed in Python 3.11)
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
with io.open(...) as f: # PYTHONIC way, the file object is a context manager
    # ...
for line in io.open(...): # PYTHONIC way to read line by line, file closed when garbage-collected
    # ...
f = io.open(...) # BAD. No guarantee that f gets closed
File operations:
f.close()
f.flush()
str = f.read(size=-1) # bytestring in binary mode, text string otherwise.
str = f.readline(size=-1)
lst = f.readlines(size=-1)
lst = [l.strip() for l in open(...)] # To get rid of '\n', trailing spaces...
with open(filename) as f:
    mylist = f.read().splitlines() # To get rid of '\n' only
f.write(s)
f.writelines(lst) # Same as: for line in lst: f.write(line)
Iterations:
for line in f:
    # ... # !!! 'break' and 'next(t)' interfere with file's position
    # f.readline() is ok.
os and filesystem operations
import os
os.remove(path) # Remove a file
os.unlink(path) # ... idem
os.rmdir(path) # Remove an (empty) directory
os.path.dirname(path)
os.path.basename(path)
import shutil
shutil.rmtree(path, ignore_errors=False, onerror=None) # Remove a directory and all its content
import os.path
os.path.isfile(fname) # True if fname exists and is a file
if not os.path.exists(directory):
    os.makedirs(directory) # Create directory if it does not exist
try: # Avoid race condition if directory created by another process
    os.makedirs(path) # But we could fix solution above as well
except OSError: # This one always trigger an exception in nominal case
    if not os.path.isdir(path):
        raise
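Since Python 3.2, os.makedirs accepts exist_ok=True, which covers the check-then-create pattern above in a single call. A sketch, using a temporary directory so that it can be run anywhere; the path components "a" and "b" are arbitrary:

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the demo
target = os.path.join(base, "a", "b")

os.makedirs(target, exist_ok=True)        # creates intermediate directories
os.makedirs(target, exist_ok=True)        # second call: no exception raised
print(os.path.isdir(target))              # True
```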
Scanning a directory
import glob
tests = glob.glob('tests/tests_*.py')
for t in tests:
    print("tests %s" % t)
# https://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered
import os
sorted(glob.glob('*.png')) # Sort by name
sorted(glob.glob('*.png'), key=os.path.getmtime) # Sort by modification time
sorted(glob.glob('*.png'), key=os.path.getsize) # Sort by size
Executing a command in subshell:
os.system(f"diff -rq {dir1} {dir2} >/dev/null 2>/dev/null") # On Unix, the return value is the exit code multiplied by 256
It is however recommended to use subprocess.call rather than os.system.
argparse module
See excellent argparse tutorial.
# Parse command line
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-p", "--port", type=int, default=PORT, help="server port number")
group = parser.add_mutually_exclusive_group()
group.add_argument("-a", "--attach", action="store_true", help="don't start a new server but attach to a running one")
group.add_argument("-t", "--target", default=TARGET, help="path to server executable")
parser.add_argument("test", nargs='+', help="path to python module containing tests to run")
args = parser.parse_args()
bench = TestBench(target=args.target, port=args.port, attach=args.attach)
for s in args.test:
    print("test %s" % s)
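The snippet above depends on names defined elsewhere (PORT, TARGET, TestBench). A self-contained variant can be tried by passing the argument list explicitly to parse_args; the default values 8080 and ./server below are placeholders, not the original ones:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="argparse demo")
    parser.add_argument("-p", "--port", type=int, default=8080, help="server port number")
    group = parser.add_mutually_exclusive_group()   # -a and -t cannot be combined
    group.add_argument("-a", "--attach", action="store_true", help="attach to a running server")
    group.add_argument("-t", "--target", default="./server", help="path to server executable")
    parser.add_argument("test", nargs="+", help="test modules to run")
    return parser

# An explicit list replaces sys.argv[1:], which is handy for experiments and tests.
args = build_parser().parse_args(["-p", "9000", "-a", "t1.py", "t2.py"])
print(args.port, args.attach, args.target, args.test)
# 9000 True ./server ['t1.py', 't2.py']
```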
random module
import random
IV = []
for i in range(16):
    IV.append(random.randint(0, 255))
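If the IV above is meant for real cryptographic use, note that the random module documentation warns against using it for security purposes. A sketch of the same 16-byte value drawn from the standard secrets module (Python 3.6+) instead:

```python
import secrets

iv = secrets.token_bytes(16)      # 16 bytes from the OS random source
iv_as_list = list(iv)             # list of ints in range(256), same shape as IV above
print(len(iv_as_list))            # 16
```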
datetime module
import datetime
print(datetime.datetime.today())
print(datetime.datetime.now()) # similar, but possibly more accurate
print(datetime.date.today()) # date only (datetime.date has no now() method)
bitstring module
from bitstring import *
s = Bits('0x8081828384858687')
s = Bits(hex='8081828384858687')
s = Bits(bytes=b'\x80\x81\x82\x83\x84\x85\x86\x87')
sa = BitArray('0x8081828384858687') # same as Bits, but mutable
s << 8 # Logical shift
s[8:] + '0x00' # ... same as above
sa <<= 8 # ... (with mutation, needs a mutable BitArray)
sa.rol(8) # Cyclic shift (with mutation)
sa[8:] + sa[:8] # ... same rotation, as a new object
Named Tuple
Source: SO
Using named tuples from collections:
# Python 2 and 3
from collections import namedtuple
MyStruct = namedtuple("MyStruct", "field1 field2 field3")
m = MyStruct("foo", "bar", "baz")
m = MyStruct(field1="foo", field2="bar", field3="baz")
Since Python 3.6, improved NamedTuple:
# Python 3.6
from typing import NamedTuple
class User(NamedTuple):
    name: str
class MyStruct(NamedTuple):
    foo: str
    bar: int
    baz: list
    qux: User
my_item = MyStruct('foo', 0, ['baz'], User('peter'))
Since Python 3.7, Data Classes:
# Python 3.7
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float
    z: float = 0.0
p = Point(1.5, 2.5)
print(p) # Point(x=1.5, y=2.5, z=0.0)
parse module
Source: https://pypi.org/project/parse/, https://stackoverflow.com/questions/2175080/sscanf-in-python
Parse is the opposite of format.
from parse import parse
parsed=parse('{} fish',"blue fish")
print(parsed[0]) # 'blue'
min,max,letter,pwd=parse('{}-{} {}: {}',"4-8 n: noon")
print(f"{min}, {max}, {letter}, {pwd}")
regex module
Source: w3schools
See also parse module for more advanced parsing
# It's a good habit to put re string in raw r'...' strings!
import re
re.search(r'^The.*Spain$', "The rain in Spain") # re.Match object
re.match(r'The.*Spain$', "The rain in Spain") # match always start from beg.
re.findall(r'.ai', "The rain in Spain") # ['rai', 'pai']
re.split(r'\s', "The rain in Spain") # ['The', 'rain', 'in', 'Spain']
re.sub(r'\s', "_", "The rain in Spain") # The_rain_in_Spain
re.sub(r'\s', "_", "The rain in Spain", 2) # The_rain_in Spain
# re.Match object
re.search(r'.ai', "The rain in Spain").group() # rai
re.search(r'.ai', "The rain in Spain").span() # (4, 7)
re.search(r'.ai', "The rain in Spain").string # The rain in Spain
# re.sub
x = re.sub(r'\s', "_", "The rain in Spain") # The_rain_in_Spain
# re.compile
r = re.compile(r'.ai')
r.search("The rain in Spain").group() # rai
# flags
a="foo\nbar"
re.match(r'foo\nbar',a) is not None # True
re.match(r'foo\nbar$',a) is not None # True
re.match(r'foo$\nbar',a) is not None # False - ^ and $ only match first/last
re.match(r'fo.*ar$',a) is not None # False - . doesn't match \n
re.match(r'fo.*ar$',a,re.S) is not None # True - re.S: . match also \n
re.match(r'foo$\nbar$',a,re.M) is not None # True - re.M: ^ and $ match any \n
re.match(r'f.*^bar$',a,re.S|re.M) is not None # True
Subprocess
Execute a command in a subprocess (better than os.system):
import subprocess
code = subprocess.call(["diff", "-rq", dir1, dir2], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
Here, no specific handling is necessary if dir1 or dir2 contains special characters, because the arguments are passed as a list and are never parsed by a shell.
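A small sketch of why no quoting is needed when the arguments are given as a list. It uses subprocess.run (Python 3.7+ for capture_output) and launches the current interpreter as the child process; the helper name run_quiet and the sample argument are made up for illustration.

```python
import subprocess
import sys

def run_quiet(args):
    """Run a command given as an argument list; return (exit code, stdout text)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout

# The argument contains spaces, quotes and shell metacharacters; it reaches
# the child process unchanged because no shell is involved.
tricky = "a b; echo 'oops' $HOME"
code, out = run_quiet([sys.executable, "-c", "import sys; print(sys.argv[1])", tricky])
```

The child simply prints its first argument back, so out holds the tricky string followed by a newline.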
Cryptography
- Package pycrypto
from Crypto.Cipher import AES
# Python 2 code: str.encode('hex') and str.decode('hex') do not exist in Python 3
def toh(s):
    return s.encode('hex')
def tos(h):
    return h.replace(' ','').decode('hex')
def aes(k,p):
    a=AES.new(tos(k))
    return toh(a.encrypt(tos(p)))
def aesinv(k,c):
    a=AES.new(tos(k))
    return toh(a.decrypt(tos(c)))
def sxor(h1,h2):
    return toh(''.join(chr(ord(a) ^ ord(b)) for a,b in zip(tos(h1),tos(h2))))
Example of use:
ipython
run mycrypto # Assuming script in current dir and named 'mycrypto.py'
key='00112233 44556677 8899aabb ccddeeff'
p0='00000100 80000000 00000000 00000000'
c0=aes(key,p0)
p1='aaaaaaaa bbbbbbbb cccccccc dddddddd'
c1=aes(key,sxor(c0,p1))
- Modular inverse [8]
# Using gmpy - FASTEST
import gmpy
gmpy.invert(1234567, p) # 1000000 loops, best of 3: 737 ns per loop (p 1024-bit)
gmpy.divm(1, 1234567, p) # 1000000 loops, best of 3: 933 ns per loop (p 1024-bit)
# Using egcd function - NO DEPS, BUT SLOWER
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)
def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m
timeit modinv(1234567,p) # 100000 loops, best of 3: 13.6 us per loop (p 1024-bit)
# Using pow() - SIMPLEST BUT SLOWEST (Fermat's little theorem: only valid if p is prime)
timeit pow(1234567,p-2,p) # 100 loops, best of 3: 4.22 ms per loop
- modular exponentiation
from gmpy import mpz
def power_mod(a, b, n):
return long(pow(mpz(a),b,n))
- Package hashlib
# Example from https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html
import hashlib
filename = input("Enter the input file name: ")
sha256_hash = hashlib.sha256()
with open(filename,"rb") as f:
    # Read and update hash string value in blocks of 4K
    for byte_block in iter(lambda: f.read(4096),b""):
        sha256_hash.update(byte_block)
    print(sha256_hash.hexdigest())
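For the modular inverse and modular exponentiation items above, a dependency-free sketch using only the built-in pow. The negative exponent form needs Python 3.8 or later; the modulus below is just a sample prime chosen for illustration, not the 1024-bit p used in the timings.

```python
def modinv(a, m):
    """Return the inverse of a modulo m; raises ValueError if none exists."""
    return pow(a, -1, m)      # negative exponent: Python 3.8+

def power_mod(a, b, n):
    """Modular exponentiation a**b mod n with plain Python integers."""
    return pow(a, b, n)

p = 2**127 - 1                # a Mersenne prime, used only as a sample modulus
inv = modinv(1234567, p)
```

Unlike the pow(a, p-2, p) trick, pow(a, -1, m) does not require m to be prime, only that a and m are coprime.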
Doctest
The doctest module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown.
See example below.
# file dc.py (Python 2, because of str.encode('hex'))
def toh(s):
    """ Convert a (binary) string into an hexadecimal string.
    >>> toh('DOH!')
    '444f4821'
    """
    return s.encode('hex')
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Run the tests with:
python dc.py
Testing
pytest
See pytest.
Tips
Simple HTTP Server
It's very easy to set up an ad-hoc HTTP server with Python. Just open a shell in a folder with some contents to share, and type:
python -m SimpleHTTPServer # Python 2
python3 -m http.server # Python 3
More available at http://docs.python.org/2/library/internet.html (see BaseHTTPServer and CGIHTTPServer).
Detect interactive mode
The four detection methods:
# First method
import __main__ as main
print(hasattr(main, '__file__'))
# Second method
def in_ipython():
    try:
        __IPYTHON__
    except NameError:
        return False
    return True
# Third method
import sys
print(hasattr(sys, 'ps1'))
# Fourth method
import sys
print(bool(sys.flags.interactive))
Results, depending on how the code was started:
Started with | First method | Second method | Third method | Fourth method |
---|---|---|---|---|
python mymod.py | True | - | - | - |
python -i mymod.py | True | - | - | True |
python then import mymod | - | - | True | - |
ipython mymod.py | True | True | - | - |
ipython -i mymod.py | True | True | - | - |
ipython then run mymod.py | True | True | - | - |
ipython then run -i mymod.py | True | True | - | - |
ipython then import mymod | - | True | - | - |
ipython -i then import mymod | - | True | - | - |
Find duplicates in list
From stackoverflow [11]
import collections
# 'l' is the input list (a global in these snippets)
def fastest(): # 134 us - Fastest
    seen = set()
    seen_add = seen.add # To avoid looking up 'add' every time an item is inserted
    # Adds every element not yet known to seen, and all others to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    return list( seen_twice ) # turn the set into a list (as requested)
def compact(): # 415 us
    return [x for x, y in collections.Counter(l).items() if y > 1]
def slowest(): # 19.2 ms
    return list(set([x for x in l if l.count(x) > 1]))
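A self-contained variant of the compact version above, taking the list as a parameter instead of reading a global. The function name find_duplicates and the sample data are illustrative; the result order relies on Counter keeping insertion order (Python 3.7+).

```python
from collections import Counter

def find_duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [value for value, n in counts.items() if n > 1]

dups = find_duplicates([1, 2, 3, 2, 1, 5, 6, 5, 5, 5])
```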
Start post-mortem debugger on exception
From stackoverflow [12]
>>> import pdb
>>> pdb.pm()
Miscellaneous
- Detect whether a variable is defined
Note it is bad practice to define a variable conditionally [13]. An interesting use case is to run code and define variable conditionally based on interactive status.
# Using try ... except
try: myvar
except NameError: print("variable 'myvar' is NOT defined")
# Using vars() / globals()
'myvar' in vars() or 'myvar' in globals()
# ...pedantic...
'myvar' in vars(__builtins__)
Analyse memory usage
- Dowser
- See [14] — seems better suited to find memory leaks, not to analyse usage for memory hungry applications
- memory_profiler
- See [15]
- Install
sudo pip install -U memory_profiler
sudo pip install psutil
- Add the @profile decorator
@profile
def primes(n):
...
- Run the profiler
python -m memory_profiler primes.py
The Pythonic way
Type import this in a Python interpreter, and you get this:
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Detect Python 2 or Python 3 dependency
For instance, does gdb use Python 2 or 3?
ldd $(which gdb)|grep python
# libpython3.5m.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0 (0x00007f442a960000)
Find character in a string
The fastest and simplest way is to use the in operator, like
if '.' in name:
    # ...
To detect more characters, we must use a regex [16]:
>>> import re
>>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
...     return not bool(search(strg))
>>> special_match("az09.")
True
>>> special_match("az09.\n")
False
Note:
- search is faster than using match.
- If using match, there is no need for a leading ^ because match is anchored at the start of the string; use fullmatch (Python 3.4+) to force a full match.
- Regex should use raw string r'...'.
- If using the regex multiple times, compile it once and reuse later!
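A sketch of the same check written with fullmatch (Python 3.4+), which requires the whole string to match. This matters for the trailing newline case: a pattern ending in $ still accepts "az09.\n", because $ also matches just before a final newline. The names ALLOWED and only_allowed are illustrative.

```python
import re

# Compiled once and reused; the character class mirrors the one above.
ALLOWED = re.compile(r'[a-z0-9.]*')

def only_allowed(text):
    """True if text consists solely of a-z, 0-9 and '.' (empty string included)."""
    return ALLOWED.fullmatch(text) is not None

# For comparison: match() with a trailing $ lets the final newline through.
dollar_accepts_newline = re.match(r'[a-z0-9.]*$', "az09.\n") is not None
```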
Detect Python version, location...
From pwndbg [17]:
# Find the Python version
PYVER=$(python -c 'import platform; print(".".join(platform.python_version_tuple()[:2]))')
PYTHON=$(python -c 'import sys; print(sys.executable)')
PYTHON="${PYTHON}${PYVER}"
# Find the Python site-packages that we need to use
SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
# or to get user site
SITE_PACKAGES=$(python -c 'import site; print(site.getusersitepackages())')
Using script above, one can install a module using pip for the given python/site installation.
# Install Python dependencies using pip
sudo ${PYTHON} -m pip install --target ${SITE_PACKAGES} -Ur requirements.txt
Display random distribution with seaborn
seaborn is a powerful python toolkit to visualize statistical data.
Assume a data file like
head -n 5 samples
# 19.2
# 6.6
# 7.9
# 5.5
# 3.6
# ...
To visualize into seaborn:
# First setup seaborn - https://seaborn.pydata.org/tutorial/distributions.html
%matplotlib gtk
import numpy as np
import pandas as pd
from scipy import stats, integrate
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(color_codes=True)
np.random.seed(sum(map(ord, "distributions")))
# Then load our file - https://stackoverflow.com/questions/36343646/reading-a-text-file-and-converting-string-to-float
y = []
file_in = open('../samples','r')
for z in file_in.read().split('\n'):
    if z: y.append(float(z))
file_in.close()
# Then tell seaborn to show the distribution.
# Note: distplot is deprecated since seaborn 0.11; use sns.histplot(y, kde=True) there.
sns.distplot(y)
# Normally the graph should pop up automatically. If not:
# plt.show()
# sns.plt.show();
Convert bytes to str and vice-versa
Python v2 and v3 have different types of strings.
- In v2, the type str is a sequence of bytes, while unicode is for Unicode text strings.
- In v3, the type str is for Unicode text strings, and bytes is a sequence of bytes, also known as bytestring or byte string.
# Python v3
isinstance(s,str) # True if s is a unicode text string
isinstance('abc',str) # True
isinstance(b,bytes) # True if b is a bytestring
isinstance(b'abc',bytes) # True
s.encode() # Convert a text string (str) to bytes
b.decode() # Convert a bytestring (bytes) to str
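A short round trip in Python 3, with the encoding spelled out (encode() and decode() default to UTF-8). The sample text is arbitrary; it was picked because it contains a character that takes two bytes in UTF-8.

```python
text = "héllo"                  # str: 5 Unicode code points
raw = text.encode("utf-8")      # bytes: 'é' takes two bytes in UTF-8
back = raw.decode("utf-8")      # str again, identical to text

# Decoding with the wrong codec does not fail here; it silently yields mojibake.
wrong = raw.decode("latin-1")
```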
XOR strings together
In Python 2.x [18]:
def sxor(s1,s2):
    return ''.join(chr(ord(a) ^ ord(b)) for a,b in zip(s1,s2))
In Python 3.x:
def bytes_xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))
Various conversion
- Binary 00110101
# Use bin to convert an integer into binary literal string ('0b' prefix)
>>> bin(173)
'0b10101101'
# Binary literals are regular integers
>>> 0b101111
47
# Use int(..., 2) to convert a binary string into integer
>>> print(int('01010101111',2))
687
>>> print(int('11111111',2))
255
Reverse a string
>>> 'hello world'[::-1]
'dlrow olleh'
Reload a module in interactive python
There is a reload command:
- Python3 >= 3.4: importlib.reload(some_module)
- Python3 < 3.4: imp.reload(some_module)
- Python2: reload(some_module) built-in
For instance
import importlib
import some_module
# hack hack...
importlib.reload(some_module) # Reload module
However:
- reload does not reload dependencies.
- It does not work when the module is loaded like from some_module import *.
Usually it's simpler to restart the interpreter (-i keeps it interactive after running the command):
python3 -i -c 'from some_module import *'
# >>> hack hack...
# >>> <CTRL-D>
python3 -i -c 'from some_module import *'
# >>> ....
Benchmark an algorithm
From the shell, using the timeit module:
python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
# 10000 loops, best of 3: 143 usec per loop
python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
# 1000 loops, best of 3: 969 usec per loop
python -mtimeit -s'from functools import reduce; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
# 1000 loops, best of 3: 1.1 msec per loop
Or directly in Python, using timeit.Timer:
>>> import timeit
>>> timeit.Timer(
...     '[item for sublist in l for item in sublist]',
...     'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
... ).timeit(100)
2.0440959930419922
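To compare several candidates in one go, a sketch using timeit.repeat and keeping the best run, similar to what the command line tool reports. The loop counts and the dictionary layout are arbitrary choices for illustration.

```python
import timeit

setup = "l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99"
candidates = {
    "comprehension": "[item for sublist in l for item in sublist]",
    "sum": "sum(l, [])",
}

# Best of 3 runs of 200 loops each, expressed in seconds per loop.
results = {
    name: min(timeit.repeat(stmt, setup=setup, repeat=3, number=200)) / 200
    for name, stmt in candidates.items()
}
```

Taking the minimum rather than the mean reduces the influence of other activity on the machine.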
Flatten a list of lists (of lists...)
from SO:
import functools, itertools, operator
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
# Fastest - using iconcat
functools.reduce(operator.iconcat, l, [])
# Fastest - using itertools
list(itertools.chain(*l))
list(itertools.chain.from_iterable(l)) # Since Python 2.6, no unpacking needed
# Using list comprehension - very fast
flat_list = [item for sublist in l for item in sublist]
# Using sum and monoid - fastest for small list, very compact
sum(l, [])
# Using lambda, slowest
functools.reduce(lambda x,y: x+y,l)
See also this blogspot, for a non-recursive solution that can process even deeply nested lists.
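One possible non-recursive approach for deeply nested lists, sketched with an explicit stack of iterators. This is not the code from the linked post; the function name flatten_deep is made up, and only list and tuple are treated as nested containers.

```python
def flatten_deep(nested):
    """Flatten lists/tuples nested to any depth, without recursion."""
    flat = []
    stack = [iter(nested)]              # one iterator per nesting level
    while stack:
        for item in stack[-1]:
            if isinstance(item, (list, tuple)):
                stack.append(iter(item))    # descend into the sub-list
                break
            flat.append(item)
        else:
            stack.pop()                 # current level exhausted, go back up
    return flat

flat = flatten_deep([1, [2, [3, [4]], 5], (6, 7), []])
```

Because the pending levels live in a list rather than on the call stack, the nesting depth is not bounded by the recursion limit.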
Detect last element in a for loop
From SO:
def lookahead(iterable):
    """Pass through all values from the given iterable, augmented by the
    information if there are more values to come after the current one
    (False), or if it is the last value (True).
    """
    # Get an iterator and pull the first value.
    it = iter(iterable)
    try:
        last = next(it)
    except StopIteration:
        return # Empty iterable: nothing to yield.
    # Run the iterator to exhaustion (starting from the second value).
    for val in it:
        # Report the *previous* value (more to come).
        yield last, False
        last = val
    # Report the last value.
    yield last, True
for i, is_last in lookahead(range(3)):
    print(i, is_last)
Swap two variables
The pythonic way [19]:
a,b = b,a
Print to stderr
# For Python 2:
# from __future__ import print_function
import sys
def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)
Note that stderr is line-buffered (always so since Python 3.9), so there is usually no need to flush [20].
Get product of all elements in a list
import numpy
L=[10,20,30] # a list, not a set: numpy.prod does not iterate over a set
int(numpy.prod(L)) # 6000
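Without numpy, the standard library can do the same. A minimal sketch; math.prod needs Python 3.8 or later, while the reduce form also works on older versions.

```python
import math
from functools import reduce
from operator import mul

values = [10, 20, 30]
product = math.prod(values)                # Python 3.8+
product_reduce = reduce(mul, values, 1)    # works on older versions too
```

Both return exact Python integers, so there is no overflow for large products.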
Check that a variable is an integer
isinstance(1, int) # True
isinstance(1.1, int) # False
Get path to current script
From StackOverflow:
- Python 3
# Directory of the script being run
import pathlib
pathlib.Path(__file__).parent.absolute()
# Current working directory
import pathlib
pathlib.Path().absolute()
- Python 2 and 3
# Directory of the script being run
import os
os.path.dirname(os.path.abspath(__file__))
os.path.dirname(__file__) # BAD!!! is empty if __file__ has no dir component
# Current working directory
import os
os.path.abspath(os.getcwd())
Otherwise, a convoluted solution using inspect, when we cannot use __file__:
import os
import inspect
def dummy_func():
    pass
# We can not use __file__ to get the local file path, use another method that uses 'inspect' module
filepath = os.path.dirname(os.path.abspath(inspect.getsourcefile(dummy_func)))
filepath = filepath.replace( '\\', '/')
View methods / attributes of any object
# https://stackoverflow.com/questions/34439/finding-what-methods-a-python-object-has
dir(object)
help(object)
Do's and don'ts
With foo = 'abcdef' in the first row and foo = list(...) in the second:
DO | don't |
---|---|
l = list(foo) | l = [c for c in foo] |
g = map(blah, foo) | g = [blah(i) for i in foo] |
A = [[0]*5 for _ in range(5)] | A = [[0]*5]*5 |
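Why the [[0]*5]*5 form is a trap: the outer multiplication copies references to one single row, so changing one row changes them all. A small demonstration with 3x3 grids (the variable names are illustrative):

```python
good = [[0] * 3 for _ in range(3)]   # three independent rows
bad = [[0] * 3] * 3                  # the same row object, referenced three times

good[0][0] = 1
bad[0][0] = 1                        # shows up in every "row" of bad
```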
Traps
Frequent mistakes. Beware the snake can bite you!
Confuse a method and a property in a test
if A.isdummy(): # This will fail if isdummy is a property
if A.isdummy: # Always True if isdummy is a method
Note that property should only be used to extend the behaviour of a class variable. Properties are designed to make it safe to publish variables in class interface, and get rid of useless mutator/accessor (see Python in a Nutshell, Why properties are important). Don't use property as replacement of a method when designing a new class.
Stick to a convention. Like always define methods like isxyyz() or hasabc() as methods. Note that defining them as property would raise an exception if used as a function, and hence might be safer.
Mix 0 with None in a sequence
- Testing whether an element is defined is more difficult.
a = [0,None,None,None]
bool(a[0]) # --> False
bool(a[1]) # --> False !!! How can we tell them apart?
a[1] is None # --> True This works
Mixing property and normal getter
- SOLUTION: prefix all getter method with get, like
getvalue()
b = a.prop # Using a property, OR
b = a.getprop() # Using a getter
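Forget that, in a python function, arguments are always passed by assignment (the reference is copied, not the object)
def f(x, y):
    x = 23 # rebinds the local name only; the caller's variable is unchanged
    y.append(42) # mutates the object shared with the caller
a = 77
b = [99]
f(a, b)
print(a, b) # prints: 77 [99, 42]
To reassign a list in a function, use the a[:] construct, like:
def f(a):
    a[:]=a[::-1] # This will NOT rebind a to a new list, but reassign elements in the original list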
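The difference between rebinding and slice assignment, side by side. The function names and sample data are illustrative.

```python
def reverse_rebind(items):
    items = items[::-1]       # rebinds the local name; the caller sees no change

def reverse_in_place(items):
    items[:] = items[::-1]    # slice assignment mutates the caller's list

data = [1, 2, 3]
alias = data                  # a second name for the same list object
reverse_rebind(data)
after_rebind = list(data)     # snapshot: still [1, 2, 3]
reverse_in_place(data)        # data and alias are now both reversed
```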
Use bytes, not string of characters
Characters can be unicode and take more than one byte.
b'abc'
bytes('abc', 'utf-8') # In v3 the encoding is mandatory when converting a str
Mixing string and bytestring (v3)
buf = b'abc\n'
if buf.find(b'\n') != -1: # MUST use BYTESTRING here; find returns -1 when not found
    # ....
s = 'abc\n'
if s.find('\n') != -1: # MUST use STRING here
    # ....
Forget self. when using class members
class MyClass(object):
    buf = b''
    def UpdateBuf(self,new_buf):
        buf = new_buf # WRONG! (only sets a local variable)
        self.buf = new_buf # CORRECT!
Examples
Read a file line by line
Sources: [21]
Shortest version with autoclose (universal line endings are the default in Python 3; the "U" mode is for Python 2 and was removed in Python 3.11):
for line in open("path/to/file.txt"):
    print(line.strip()) # or strip('\r')
Slightly longer version with with:
with open("path/to/file.txt") as f: # assume read-text mode "rt"
    for line in f:
        print(line.strip()) # or strip('\r')
Counting line number:
with open('path/to/file.txt') as f:
    for cnt, line in enumerate(f, 1): # start counting at 1
        print(f"Line {cnt}: {line.strip()}")
The long old way:
f = open("path/to/file.txt") # outside try: if open fails there is nothing to close
try:
    line = f.readline()
    cnt = 1
    while line:
        print(f"Line {cnt}: {line.strip()}")
        line = f.readline()
        cnt += 1
finally:
    f.close()
Read a list of integers from a file
# https://stackoverflow.com/questions/6583573/how-to-read-numbers-from-file-in-python
# Using for loop
a=[]
with open('input.txt') as f:
    for line in f:
        a.append(int(line))
# Using list comprehension
a=[int(line) for line in open('input.txt')]
Simple TCP server
import socket, socketserver
import sys
SERV_ADDR="0.0.0.0"
SERV_PORT=2222
class Handler(socketserver.BaseRequestHandler):
    messages = b""
    def handle(self):
        client = self.request
        client.setblocking(True)
        try:
            while True:
                buf = client.recv(1)
                if not buf: # empty read: the peer closed the connection
                    break
                # buf = client.recv(len)
                # client.send(buf)
        except socket.error as msg:
            pass
        client.close()
port = SERV_PORT
if len(sys.argv) > 1:
    port = int(sys.argv[1])
server = socketserver.TCPServer((SERV_ADDR, port), Handler)
server.serve_forever()
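To try such a server without a separate client, here is a sketch of an echo variant exercised from the same process. It binds to port 0 on the loopback interface so the OS picks a free port; EchoHandler and echo_once are illustrative names, not part of the server above.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Echo every chunk back until the client closes its side.
        while True:
            chunk = self.request.recv(1024)
            if not chunk:
                break
            self.request.sendall(chunk)

def echo_once(payload):
    """Start a server on a free loopback port, send payload, return the echo."""
    with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
        port = server.server_address[1]
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
                conn.sendall(payload)
                conn.shutdown(socket.SHUT_WR)   # signal end of data
                received = b""
                while True:
                    chunk = conn.recv(1024)
                    if not chunk:
                        break
                    received += chunk
        finally:
            server.shutdown()                   # stop serve_forever()
            thread.join()
    return received

reply = echo_once(b"hello")
```

serve_forever() runs in a background thread so that the client code can run in the main thread; shutdown() then stops the loop cleanly.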
Docstrings and Doctest
Specifications: pep-0257
- To write good module docstrings, "think about somebody doing help(yourmodule) at the interactive interpreter's prompt — what do they want to know?" [22].
- See pep-0257 for more recommendations
- Using doctest
You can include tests, in the form of examples, in your Python modules' docstrings [23].
For instance, here is the file sxor.py. It contains:
- A function with a docstring, and example of use with some test values.
- A footer code that will call the doctest.testmod() function if the module is loaded as main file.
import binascii
def sxor(s1,s2):
    """Xor two hexadecimal strings together.
    >>> sxor('abcd','1234')
    'b9f9'
    """
    s1=binascii.unhexlify(s1)
    s2=binascii.unhexlify(s2)
    return binascii.hexlify(bytes(a ^ b for a,b in zip(s1,s2))).decode()
# Footer to trigger doctest automatically.
# Alternatively, trigger it with:
#
# python -m doctest sxor.py
#
if __name__ == "__main__":
    import doctest
    doctest.testmod()
Now, we can run the tests with:
python3 sxor.py
No output means there were no errors. Use -v to get more output:
python3 sxor.py -v
# Trying:
# sxor('abcd','1234')
# Expecting:
# 'b9f9'
# ok
# 1 items had no tests:
# __main__
# 1 items passed all tests:
# 1 tests in __main__.sxor
# 1 tests in 2 items.
# 1 passed and 0 failed.
# Test passed.
Instead of using the footer code, one may call doctest from the command line (since Python 2.6):
python3 -m doctest sxor.py
Create new packages / modules
Links
- Recommend using flit for packaging, tox for linters and tests, etc. Very nice writeups. See more on HN.
- Recommended on HN.
- pip can install packages from GitHub:
pip install git+https://myg.it/repo.git
Standard method
- Reference: https://packaging.python.org/tutorials/packaging-projects/
- setup.cfg: https://setuptools.readthedocs.io/en/latest/userguide/declarative_config.html
- Create project template as in reference above
- Add the files in src/.
- Install build
python3 -m pip install --upgrade build
- Build the package
python3 -m build
Libraries
- Big numbers
- gmpy based on GMP
- libnum a lighter bignum library, but compatible with pypy.
Unicode
- Set source file encoding
Add any of these lines [24]:
# -*- coding: utf-8 -*-
# vim: set fileencoding=utf-8 :
- Write the BOM
See [25]
import codecs
file = codecs.open("lol", "w", "utf-8")
file.write(u'\ufeff') # or use unicode name: u'\N{ZERO WIDTH NO-BREAK SPACE}'
file.close()
# Using https://docs.python.org/2/library/codecs.html#module-encodings.utf_8_sig
with codecs.open("test_output", "w", "utf-8-sig") as temp:
temp.write("hi mom\n")
- Handling unicode
Some recommend always processing unicode internally, decoding on input and encoding on output [26] (Python 2 code):
line = line.decode('utf-8')
# ...treat line as unicode...
print line.encode('utf-8')
But this is error prone. So another solution proposed is to redefine sys.stdout:
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
A hackish way (not recommended):
# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"
Python 2 to Python 3
- Use python v3 print in v2
from __future__ import print_function
This way print() will not print () in v2.
Coding style
From PEP 8, Coding Style.
- Use pycodestyle to check code conformance:
pip install pycodestyle
pycodestyle optparse.py
- Use autopep8 to format existing code:
pip install autopep8
autopep8 --in-place optparse.py
- Naming conventions
lower_case_variable = None
def lower_case_func():
    # ...
class ClassNameAreCapsWord:
    # ...
- Some good/bad practices
# BAD - superfluous 'pass'
class InvalidAttribute(AttributeError):
    """Used to indicate attributes that could never be valid"""
    pass
# GOOD
class InvalidAttribute(AttributeError):
    """Used to indicate attributes that could never be valid"""
# BAD
f = open('file.txt')
a = f.read()
print(a)
f.close()
# GOOD
with open('file.txt') as f:
    for line in f:
        print(line)
# BAD
my_very_big_string = """For a long time I used to go to bed early. Sometimes, \
when I had put out my candle, my eyes would close so quickly that I had not even \
time to say “I’m going to sleep.”"""
from some.deep.module.inside.a.module import a_nice_function, another_nice_function, \
yet_another_nice_function
# GOOD
my_very_big_string = (
"For a long time I used to go to bed early. Sometimes, "
"when I had put out my candle, my eyes would close so quickly "
"that I had not even time to say “I’m going to sleep.”"
)
from some.deep.module.inside.a.module import (
a_nice_function, another_nice_function, yet_another_nice_function)
Troubleshooting
Troubleshooting a missing library
- Use python -v -c "import mylibrary" to troubleshoot a module.
- Look at the log for the loaded libraries.
- Some extension modules are dynamically linked against shared libraries that might be missing. Use ldd to see the linked libraries, and report missing ones.
ldd /path/to/your/_hashlib.so
# linux-gate.so.1 => (0xf77c3000)
# libssl.so.6 => not found
# libcrypto.so.6 => not found
# libpython2.7.so.1.0 => not found
# libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf776a000)
# libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf75b3000)
# /lib/ld-linux.so.2 (0x5659b000)