Python

From miki
Revision as of 10:17, 23 November 2016 by Mip (talk | contribs) (→‎Online install)

Links

Other versions of Python are available [1]
Variants and distributions
  • ipython
  • Jupyter — The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
  • Anaconda
Python 3
PEP
Tools
  • autopep8 — A tool that automatically formats Python code to conform to the PEP 8 style guide
sudo pip install --upgrade autopep8
Miscellaneous
  • Nice example of generating / testing regex in Python (with nice / small test framework) [2]

Python modules

Online install

Python comes with a wide range of libraries, called modules. There are several ways to install these modules.

Using the distribution
  • For instance, in Debian:
apt-cache search --names-only python-       # View available modules
sudo apt-get install python-pyscard         # Install the pyscard module
Using pip

pip is the new way to install modules. It uses the wheel format.

sudo pip install Pygments
Using easy_install

easy_install is the old way to install modules. It uses the egg format.

sudo easy_install Pygments


From source

Install globally (dist-package):

wget http://sourceforge.net/projects/pyscard/files/pyscard/pyscard%201.6.12/pyscard-1.6.12.tar.gz#md5=908d2530972ea91eb4bb66987e0e1e98
tar -xvzf pyscard-1.6.12.tar.gz
cd pyscard-1.6.12
sudo ./setup.py install

To install for the current user only (user site-packages), use --user. sudo is not needed in that case:

./setup.py install --user

Offline install

To install a Python module on a machine that has no Internet connection [3]:

  • On a machine with internet connection
# For instance, to install package neovim
mkdir tmp && cd tmp
pip download neovim
  • On the offline machine, which has access to tmp/:
# For instance, to install package neovim
cd tmp
pip install --no-index --find-links ./ neovim

Interactive mode

Python can be run interactively, which is a very powerful way to develop new applications.

Python

To import an existing module, use import as usual:

import mymod             # Import module in current session
from mymod import *      # Same, but symbols can be used without the mymod. prefix

iPython / Jupyter

To import an existing module, use import as above or the run command:

run mymod

Python variants

iPy

Use iPy (ipython) to get an interactive shell with auto-completion, instant help...

%magic                    # Get help on %magic commands (%run,...)
?run                      # Get help on %run magic
%run script.py            # Run given script
%run -i script.py         # ... in the interactive namespace (the script sees current variables)
%run -i -e script.py      # ... ... and ignore sys.exit() calls
!cmd                      # Run shell command 'cmd', for instance ...
!ls                       # ... List files in current directory

Pypy

PyPy is a fast, compliant alternative implementation of the Python language, which usually runs python programs faster thanks to its Just-in-Time compiler.

Install
On Lucid 64-bit, the easiest is to download the dedicated tarball:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.2.1-linux64.tar.bz2
tar -xvjf pypy-2.2.1-linux64.tar.bz2
Install virtualenv, then install pypy as virtual environment my-pypy-env
sudo apt-get install python-virtualenv
virtualenv -p pypy-2.2.1-linux64/bin/pypy my-pypy-env
Modules must be installed separately for this virtual environment. For instance:
./my-pypy-env/bin/pip install libnum
Run
Run python programs using python or pypy
./my-pypy-env/bin/pypy

Reference

Basic

Statements
try:
    statement(s)
except [expression [,target]]:
    statement(s)
[else:
    statement(s)]
try:
    statement(s)
finally:
    statement(s)
try:
    statement(s)
except [expression [,target]]:
    statement(s)
finally:
    statement(s)
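expression is a class or a tuple of classes. target is the variable that will store the exception object. The else clause is executed only if the try block terminates normally, i.e. not when an exception is raised, nor when the block is left through break, continue or return. The combined try-except-finally form requires Python 2.5 or later.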
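The order in which the clauses run can be checked with a small sketch like the one below. The function name clause_order and the event list are illustrative only, and the "except ... as" syntax assumes Python 2.6 or later (it also runs under Python 3).

```python
def clause_order(fail):
    """Return the list of clauses executed, in order."""
    events = []
    try:
        events.append('try')
        if fail:
            raise ValueError('boom')
    except ValueError as exc:      # 'exc' plays the role of target
        events.append('except: %s' % exc)
    else:                          # only when the try block raised nothing
        events.append('else')
    finally:                       # always executed
        events.append('finally')
    return events
```

Calling clause_order(False) goes through try, else and finally, whereas clause_order(True) goes through try, except and finally.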

Basic - Examples

for i in range(10):
    print i                      # newline after each value

for i in range(10):
    print i,                     # trailing comma: no newline, values separated by a space
for key in d:                       # Loop over keys in dictionary d
    print key
for key, value in d.iteritems():    # Loop over keys and values in dictionary d
    print key, value

a = 'global'
def afunction():
    global a
    a = 'still using global'
    b = 'local'
import os.path
os.path.isfile(fname)            # True if fname exists and is a file

if not os.path.exists(directory):
    os.makedirs(directory)       # Create directory if it does not exist

try:                             # Avoid the race condition where another process creates
    os.makedirs(path)            # the directory between the exists() test and makedirs()
except OSError:                  # Raised, among other cases, when the directory already exists
    if not os.path.isdir(path):  # Re-raise unless the directory is really there
        raise
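As a possible refinement of the try/except variant above, the sketch below also checks the error code, so that only the "already exists" case is silenced. The helper name ensure_dir and the temporary directory are assumptions made for the example.

```python
import errno
import os
import tempfile

def ensure_dir(path):
    """Create path (and its parents) unless it already exists as a directory."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Silence only "already exists as a directory"; re-raise anything else
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise

base = tempfile.mkdtemp()               # scratch location for the demonstration
target = os.path.join(base, 'a', 'b')
ensure_dir(target)                      # creates the directory
ensure_dir(target)                      # second call does nothing
```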
s.upper()                             # string s to uppercase
', '.join(set_3)                      # Join a sequence
hex_data = "deadbeef".decode("hex")   # "\xde\xad\xbe\xef"
map(ord, hex_data)                    # [0xDE, 0xAD, 0xBE, 0xEF]
import sys
sys.argv, len(sys.argv)          # Argument list, number of arguments ([0] -> script name)
if ("-h" in sys.argv) or ("--help" in sys.argv):
    printUsage()                 # printUsage() is assumed to be defined elsewhere
for a in range(len(sys.argv)):
    if sys.argv[a] == "-e":
        pass                     # handle option -e here
# Sort based on object attribute
ut.sort(key=lambda x: x.count, reverse=True)   # To sort the list in place...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)  # To return a new list, use the sorted() built-in function...
(From stackoverflow [4])
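A self-contained sketch of the same sort, using operator.attrgetter instead of a lambda. The Item class and its sample values are made up for the example.

```python
from operator import attrgetter

class Item(object):
    """Hypothetical record with a 'count' attribute to sort on."""
    def __init__(self, name, count):
        self.name = name
        self.count = count

ut = [Item('a', 2), Item('b', 5), Item('c', 1)]

# sorted() returns a new list and leaves ut untouched
by_count = sorted(ut, key=attrgetter('count'), reverse=True)
names = [item.name for item in by_count]
```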
for c in list(sha256.digest()):
    key.append(ord(c))
List
a[:]=a[::-1]                   # Reassign element in the list (here in reverse order)
a=a[::-1]                      # Idem, but create a new object

def shiftRow(word, n):
    return word[n:]+word[0:n]
state[i::4] = shiftRow(state[i::4],i)      # Apply shiftRow on 4 bytes distant of 4 each

alist = map(lambda b: sbox[b],alist)

state[:] = [ a ^ b for a,b in zip(state,roundKey) ]    # Ex-oring 2 lists of integers

# Multi-dimensional list
matrix = [[0 for x in range(5)] for x in range(5)]     # Initialize bi-dimensional array
matrix = [[0]*5 for i in range(5)]                     # faster way
# matrix = 5*[5*[0]]                                   # DO NOT DO THIS - the 5 rows are references to the same list
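The aliasing problem with list multiplication can be observed directly. This is a minimal sketch with a 3x3 matrix; the names good and bad are illustrative.

```python
good = [[0] * 3 for _ in range(3)]   # three distinct rows
bad = 3 * [3 * [0]]                  # three references to ONE row

good[0][0] = 1                       # changes the first row only
bad[0][0] = 1                        # changes "all" rows, since they are the same object

same_row = bad[0] is bad[1]          # the rows of bad are one and the same list
```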
Dictionary
D = { 'x':42, 'y':3.14, 'z':7 }
D['x']                                                 # 42
del D[k]                                               # Removes from dictionary D the item whose key is k
# Sparse matrix
Matrix = {}
Matrix[1,2] = 15                                       # This works because 1,2 -- a tuple -- is used as a key
Random
from random import randint
IV = []
for i in range(16):
    IV.append(randint(0, 255))
Miscellaneous conversion
print list("abc")               # ['a', 'b', 'c']
Format operator %
print '%x' % variable            # Print hex
math
print 1//2                       # floor division (PEP-238)
System
sys.exit()

Modules

import datetime
print datetime.datetime.today()  
print datetime.datetime.now()    # similar, but possibly more precise
print datetime.date.today()      # date only

Advanced

mymodule = __import__('mymodule')          # Import module from string - see http://effbot.org/zone/import-string.htm


Modular inverse [5]
# Using gmpy - FASTEST
import gmpy
gmpy.invert(1234567, p)                      # 1000000 loops, best of 3: 737 ns per loop (p 1024-bit)
gmpy.divm(1, 1234567, p)                     # 1000000 loops, best of 3: 933 ns per loop (p 1024-bit)

# Using egcd function - NO DEPS, BUT SLOWER
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m
timeit modinv(1234567,p)                     # 100000 loops, best of 3: 13.6 us per loop (p 1024-bit)

# Using pow() - SIMPLEST BUT SLOWEST (relies on Fermat's little theorem, so p must be prime)
timeit pow(1234567,p-2,p)                    # 100 loops, best of 3: 4.22 ms per loop
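The egcd and pow() approaches can be cross-checked against each other. The sketch below repeats the egcd-based functions so that it is self-contained, and assumes a prime modulus (here the Mersenne prime 2**127 - 1, chosen only for the example) so that the pow() variant is valid.

```python
def egcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if a == 0:
        return (b, 0, 1)
    g, y, x = egcd(b % a, a)
    return (g, x - (b // a) * y, y)

def modinv(a, m):
    """Return the inverse of a modulo m, or raise ValueError if none exists."""
    g, x, _ = egcd(a, m)
    if g != 1:
        raise ValueError('modular inverse does not exist')
    return x % m

p = 2**127 - 1                  # example prime modulus (assumption of this sketch)
a = 1234567
inv = modinv(a, p)              # extended Euclid
inv_fermat = pow(a, p - 2, p)   # Fermat's little theorem, valid because p is prime
```

Both values satisfy (a * inv) % p == 1. For a composite modulus only the egcd variant remains correct.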
modular exponentiation
from gmpy import mpz
def power_mod(a, b, n):
    return long(pow(mpz(a),b,n))
Python list
Bitstring [7] (manual)
from bitstring import *
s  = Bits('0x8081828384858687')
s  = Bits(hex='8081828384858687')
s  = Bits(bytes=b'\x80\x81\x82\x83\x84\x85\x86\x87')
sa = BitArray('0x8081828384858687')    # same as Bits, but mutable

s << 8                           # Logical shift (returns a new object)
s[8:] + '0x00'                   # ... same as above
sa <<= 8                         # ... (with mutation, BitArray only)
sa.rol(8)                        # Cyclic shift (with mutation)
s[8:] + s[:8]                    # ... same as above, without mutation

Cryptography

from Crypto.Cipher import AES
def toh(s):
    return s.encode('hex')
def tos(h):
    return h.replace(' ','').decode('hex')
def aes(k,p):
    a=AES.new(tos(k))
    return toh(a.encrypt(tos(p)))
def aesinv(k,c):
    a=AES.new(tos(k))
    return toh(a.decrypt(tos(c)))
def sxor(h1,h2):
    return toh(''.join(chr(ord(a) ^ ord(b)) for a,b in zip(tos(h1),tos(h2))))

Example of use:

ipython
run mycrypto                    # Assuming script in current dir and named 'mycrypto.py'
key='00112233 44556677 8899aabb ccddeeff'
p0='00000100 80000000 00000000 00000000'
c0=aes(key,p0)
p1='aaaaaaaa bbbbbbbb cccccccc dddddddd'
c1=aes(key,sxor(c0,p1))

os and filesystem operations

# Using os module
import os
os.remove(path)                 # Remove a file
os.unlink(path)                 # ... idem
os.rmdir(path)                  # Remove an empty directory

# Using shutil module
import shutil
shutil.rmtree(path, ignore_errors=False, onerror=None)
                                # Remove a directory and all its content

Doctest

The doctest module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown.

See example below.

# file dc.py

def toh(s):
    """ Convert a (binary) string into an hexadecimal string.
    >>> toh('DOH!')
    '444f4821'
    """
    return s.encode('hex')

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Run the tests with:

python dc.py
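The example above relies on str.encode('hex'), which only exists in Python 2. A possible Python 3 counterpart is sketched below, assuming binascii.hexlify as a replacement. The helper run_doctests is hypothetical and only shows how the examples can be run programmatically and the result inspected.

```python
import binascii
import doctest

def toh(data):
    """Convert a byte string into a hexadecimal string.

    >>> toh(b'DOH!')
    '444f4821'
    """
    return binascii.hexlify(data).decode('ascii')

def run_doctests():
    """Run the examples found in toh's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(toh, name='toh', globs={'toh': toh}):
        runner.run(test)
    return runner.summarize(verbose=False)

results = run_doctests()
```

For a whole module, calling doctest.testmod() as in dc.py remains the simplest option.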


Tips

Simple HTTP Server

It's very easy to set up an ad-hoc HTTP server with Python. Just open a shell in a folder with some contents to share, and type:

python -m SimpleHTTPServer

More available at http://docs.python.org/2/library/internet.html (see BaseHTTPServer and CGIHTTPServer).

Detect interactive mode

References: [8], [9]

The four methods compared in the table:

# First method
import __main__ as main
print hasattr(main, '__file__')

# Second method
def in_ipython():
    try:
        __IPYTHON__
    except NameError:
        return False
    return True

# Third method
import sys
print hasattr(sys, 'ps1')

# Fourth method
import sys
print bool(sys.flags.interactive)

Started with                        First   Second  Third   Fourth
python mymod.py                     True    -       -       -
python -i mymod.py                  True    -       -       True
python then import mymod            -       -       True    -
ipython mymod.py                    True    True    -       -
ipython -i mymod.py                 True    True    -       -
ipython then run mymod.py           True    True    -       -
ipython then run -i mymod.py        True    True    -       -
ipython then import mymod           -       True    -       -
ipython -i then import mymod        -       True    -       -
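The four checks can be gathered in one helper, which makes it easier to compare them under different launch modes. This is only a sketch; the function name interactive_status and the dictionary keys are made up for the example.

```python
import sys
import __main__

def interactive_status():
    """Return the result of the four detection methods as a dictionary."""
    try:
        __IPYTHON__                   # name injected by IPython only
        in_ipython = True
    except NameError:
        in_ipython = False
    return {
        'main_has_file': hasattr(__main__, '__file__'),      # first method
        'in_ipython': in_ipython,                            # second method
        'has_ps1': hasattr(sys, 'ps1'),                      # third method
        'flag_interactive': bool(sys.flags.interactive),     # fourth method
    }

status = interactive_status()
```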

Find duplicates in list

From stackoverflow [10]

import collections

def fastest():                         # 134 us - Fastest
    seen = set()
    seen_add = seen.add                                            # To avoid looking up 'add' every time an item is inserted
    seen_twice = set( x for x in l if x in seen or seen_add(x) )   # adds unknown elements to seen, and all others to seen_twice
    return list( seen_twice )                                      # turn the set into a list (as requested)

def compact():                         # 415 us
    return [x for x, y in collections.Counter(l).items() if y > 1]

def slowest():                         # 19.2 ms
    return list(set([x for x in l if l.count(x) > 1]))
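The functions above read a global list l. A variant taking the list as a parameter might look like the sketch below. The names duplicates and duplicates_counter are illustrative, and sets are returned so that the results compare equal regardless of order.

```python
import collections

def duplicates(items):
    """Return the set of elements appearing more than once (single pass)."""
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None (falsy), so x is kept only when already seen
    return set(x for x in items if x in seen or seen_add(x))

def duplicates_counter(items):
    """Same result, using collections.Counter."""
    return set(x for x, n in collections.Counter(items).items() if n > 1)

sample = [1, 2, 3, 2, 1, 5, 6, 5, 5, 5]
dups = duplicates(sample)
```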

Start post-mortem debugger on exception

From stackoverflow [11]

>>> import pdb
>>> pdb.pm()

Miscellaneous

Detect whether a variable is defined

Note it is bad practice to define a variable conditionally [12]. An interesting use case is to run code and define variable conditionally based on interactive status.

# Using try ... except
try: myvar
except NameError: print "variable 'myvar' is NOT defined"

# Using vars() / globals()
'myvar' in vars() or 'myvar' in globals()
# ...pedantic...
'myvar' in vars(__builtins__)

Analyse memory usage

Dowser
  • See [13] — seems better suited to find memory leaks, not to analyse usage for memory hungry applications
memory_profiler
sudo pip install -U memory_profiler
sudo pip install psutil
  • Add @profile decorator
@profile
def primes(n): 
    ...
  • Run the profiler
python -m memory_profiler primes.py

The Pythonic way

Type import this in a Python interpreter, you get this:

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Detect Python 2 or Python 3 dependency

For instance, does gdb use Python 2 or Python 3?

ldd $(which gdb)|grep python
# libpython3.5m.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0 (0x00007f442a960000)

Dos and don'ts

foo = 'abcdef'
l = list(foo)                     # DO
foo = 'abcdef'
l = [c for c in foo]              # don't
foo = list(...)
g = map(blah, foo)                # DO
foo = list(...)
g = [blah(i) for i in foo]        # don't

Traps

Frequent mistakes. Beware the snake can bite you!

Confuse a method and a property in a test
SOLUTION: Stick to a convention. For instance, always define predicates such as isxyz() or hasabc() as methods. Note that defining them as properties would raise an exception when they are called like a function, and hence might be safer.
if A.isdummy():            # This will fail if isdummy is a property
if A.isdummy:              # Always True if isdummy is a method
Mix 0 with None in a sequence
Testing whether an element is defined is more difficult.
a = [0,None,None,None]
bool(a[0])           # --> False
bool(a[1])           # --> False !!! How can we tell them apart?
a[1] is None         # --> True      Identity test, the idiomatic way to tell None from 0
a[1] == None         # --> True      This works, but is unusual and likely bad practice
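A short sketch of how None and 0 can be told apart with an identity test. The variable names are illustrative.

```python
a = [0, None, None, None]

# Truth testing cannot distinguish 0 from None ...
truthy = [bool(x) for x in a]

# ... but an identity test against None can
defined = [x is not None for x in a]

# First element that is really defined (0 counts as defined)
first_defined = next(x for x in a if x is not None)
```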
Mixing property and normal getter
SOLUTION: prefix all getter method with get, like getvalue()
b = a.prop           # Using a property, OR
b = a.getprop()      # Using a getter
Forget that, in a Python function, arguments are passed by object reference: rebinding a parameter has no effect on the caller, but mutating a mutable argument does
def f(x, y):
    x = 23
    y.append(42)
a = 77
b = [99]
f(a, b)
print a, b                 # prints: 77 [99, 42]

To reassign a list in a function, use the a[:] construct, like:

def f(a):
   a[:]=a[::-1]             # This will NOT create a new list, but reassign elements in the original list
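The difference between rebinding the parameter and assigning to a slice can be seen side by side. This is a minimal sketch; the function names rebind and reverse_in_place are made up for the example.

```python
def rebind(lst):
    lst = lst[::-1]        # binds the local name to a new list; the caller is unaffected

def reverse_in_place(lst):
    lst[:] = lst[::-1]     # slice assignment changes the caller's list object

a = [1, 2, 3]
alias = a                  # second reference to the same list

rebind(a)
after_rebind = list(a)     # snapshot: still in the original order

reverse_in_place(a)        # now a (and alias) are reversed
```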
Use bytes, not string of characters

Characters can be unicode and take more than one byte.

b'abc'
bytes('abc')
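The difference between counting characters and counting bytes shows up as soon as non-ASCII text is involved. The sketch below assumes UTF-8 as the encoding and uses escape sequences so that it does not depend on the source file encoding.

```python
text = u'\xe5\xe4\xf6'            # the three characters å, ä, ö
data = text.encode('utf-8')       # the corresponding byte string

n_chars = len(text)               # number of characters
n_bytes = len(data)               # number of bytes: each character takes two in UTF-8
```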

Docstrings

Specifications: pep-0257

  • To write good module docstrings, "think about somebody doing help(yourmodule) at the interactive interpreter's prompt — what do they want to know?" [15].
  • See pep-0257 for more recommendations
Using doctest

You can include tests, in the form of examples, in your Python modules' docstrings. Properly written, these tests can be executed and verified by the doctest module. [16]

Libraries

Big numbers
  • gmpy based on GMP
  • libnum a lighter bignum library, but compatible with pypy.

Unicode

Set source file encoding

Add any of these lines [17]:

# -*- coding: utf-8 -*-
# vim: set fileencoding=utf-8 :
Write the BOM

See [18]

import codecs

file = codecs.open("lol", "w", "utf-8")
file.write(u'\ufeff')                          # or use unicode name: u'\N{ZERO WIDTH NO-BREAK SPACE}'
file.close()

# Using https://docs.python.org/2/library/codecs.html#module-encodings.utf_8_sig
with codecs.open("test_output", "w", "utf-8-sig") as temp:
    temp.write("hi mom\n")
Handling unicode

Some recommend always processing unicode internally, decoding on input and encoding on output [19]:

line = line.decode('utf-8')
# ...treat line as unicode...
print line.encode('utf-8')
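The decode, process, encode pattern can be wrapped in a function, as in the sketch below. The function name shout and the choice of upper() as the processing step are assumptions made for the example.

```python
def shout(raw_line, encoding='utf-8'):
    """Decode a byte string, process it as unicode, and encode the result."""
    text = raw_line.decode(encoding)      # decode on input
    text = text.upper()                   # process as unicode
    return text.encode(encoding)          # encode on output

raw = u'\xe5\xe4\xf6'.encode('utf-8')     # bytes for the characters å, ä, ö
result = shout(raw)
```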

But this is error-prone. Another proposed solution is to redefine sys.stdout:

import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)

A hackish way (not recommended):

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
print u"åäö"