AES encryption and decryption discussion:
The Advanced Encryption Standard (AES) is, as of 2016, the symmetric encryption method of choice. Several Python implementations are available. The most common and easiest to use seems to be the one in the PyCrypto package by Litzenberger (refs 1 and 2). PyCrypto provides many cryptographic functions, including a C implementation of AES (ref 3).
When using AES, there are several parameters to consider: the mode, the initialization vector (IV), the block size, and the key.
AES Mode:
Without going into the details, and concentrating on implementation, Cipher Block Chaining (CBC) is the mode used here and a common choice. But it is not perfect. CBC encryption is a sequential operation and can be slow (decryption, by contrast, can be parallelized). Because the algorithm works on a batch of bits (a block), the message must be padded to a multiple of the cipher block size. Most interesting, however, is that because CBC operates sequentially on the blocks of the full plaintext, changing any bit of the plaintext changes the ciphertext of that block and of every block after it. Starting the chain with an initial block, the Initialization Vector (IV), causes the same ripple effect through the encryption of the remaining plaintext. Further, any error in calculating any middle part of the encryption fouls the rest of the encryption, so a single error while encrypting a long message corrupts everything after that point. Damage to the encrypted message behaves a little differently: an altered ciphertext bit garbles its own 16-byte block on decryption and flips the matching bit of the next block, while a dropped or inserted byte shifts the block boundaries so that everything after it decrypts to garbage. This is good for encryption but bad for tolerance of transmission, writing, or recording errors.
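The chaining and error behaviour described above can be checked with a small sketch. AES itself is not in the Python standard library, so this sketch swaps in a hypothetical toy block function (`toy_encrypt_block`, a 4-round Feistel permutation built from SHA-256, NOT secure) purely so it runs without PyCrypto; the CBC chaining wrapped around it is the standard construction. All function names here are illustrative assumptions.

```python
import hashlib

BS = 16  # AES block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # toy Feistel round function: 8 bytes taken from a SHA-256 digest
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network: an invertible 16-byte permutation standing
    # in for AES. NOT secure; it only lets the sketch run without PyCrypto.
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    # undo the Feistel rounds in reverse order
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, padded: bytes) -> bytes:
    # serial: each plaintext block is XORed with the previous ciphertext block
    prev, out = iv, b""
    for i in range(0, len(padded), BS):
        prev = toy_encrypt_block(key, xor(padded[i:i + BS], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # each plaintext block depends only on C_i and C_{i-1}
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BS):
        block = ciphertext[i:i + BS]
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out


key, iv = b"demo key", bytes(BS)
plain = b"A" * 64  # four blocks, already aligned (no padding needed)
ct = cbc_encrypt(key, iv, plain)

# 1) change one plaintext byte in block 1: every ciphertext block changes
ct2 = cbc_encrypt(key, iv, b"B" + plain[1:])
print([ct[i:i + BS] != ct2[i:i + BS] for i in range(0, 64, BS)])  # [True, True, True, True]

# 2) flip one bit in ciphertext block 2, then decrypt
damaged = bytearray(ct)
damaged[20] ^= 0x01  # byte 4 of block 2
recovered = cbc_decrypt(key, iv, bytes(damaged))
print([recovered[i:i + BS] == plain[i:i + BS] for i in range(0, 64, BS)])  # [True, False, False, True]
# block 2 is garbled; block 3 differs from the original in exactly that one bit
```

Because the toy block function, like AES, is a permutation, both results follow from the chaining itself rather than from the particular cipher.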
AES Block Size:
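AES always encrypts a “block” of 16 bytes (128 bits); the key length can vary, but the block size cannot. This means that the encryption process operates on 16-byte chunks, so the input to CBC encryption, and therefore the ciphertext, must be a multiple of 16 bytes long. The implementation must therefore pad the plaintext message so that its length becomes a multiple of 16 bytes. This padding is done by appending enough bytes to the end of the plaintext. The scheme used in the code below (PKCS#7) appends N bytes, each with the value N, where N is between 1 and 16; a plaintext that is already a multiple of 16 bytes gets a full extra block of padding, so the pad can always be removed unambiguously by reading the last byte.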
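A minimal Python 3 sketch of PKCS#7 padding on bytes, doing the same thing as the `pad` step in the encrypt function further down. The helper names are hypothetical, and the validity check in `pkcs7_unpad` goes beyond the original `unpad` lambda, which simply trusts the last byte.

```python
BS = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, bs: int = BS) -> bytes:
    # append n bytes of value n, 1 <= n <= bs; aligned input gets a full block
    n = bs - len(data) % bs
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, bs: int = BS) -> bytes:
    # read the pad length from the last byte and check every pad byte
    if not data or len(data) % bs:
        raise ValueError("length is not a positive multiple of the block size")
    n = data[-1]
    if not 1 <= n <= bs or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]


msg = b"Hello how are you my friend"  # 27 bytes
padded = pkcs7_pad(msg)
print(len(padded), padded[-5:])       # 32 b'\x05\x05\x05\x05\x05'
print(len(pkcs7_pad(b"A" * 16)))      # 32: a full block of b'\x10' is added
assert pkcs7_unpad(padded) == msg
```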
AES Initialization Vector:
CBC mode is serial: encrypt the first 16 bytes, then encrypt the second 16 bytes after combining them (by XOR) with the ciphertext of the first 16 bytes, then encrypt the third 16 bytes after combining them with the ciphertext of the second 16 bytes, and so on.
But if every encryption starts the chain from the same point (a fixed or absent IV), encrypting the same message twice produces the same result, and similar messages produce partly identical results. For example, chain-encrypting “Hello how are you my friend” and “Hello how are you my friend and dog” this way gives these hex results:
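In the usual notation, with P_i the i-th 16-byte plaintext block, C_i the i-th ciphertext block, E_K and D_K AES encryption and decryption under key K, and C_0 the initialization vector discussed in this section, the chain reads:

```latex
\begin{aligned}
C_0 &= IV \\
C_i &= E_K(P_i \oplus C_{i-1}), && i = 1, \dots, n \quad \text{(encryption)} \\
P_i &= D_K(C_i) \oplus C_{i-1}, && i = 1, \dots, n \quad \text{(decryption)}
\end{aligned}
```

Encryption must wait for C_{i-1} before it can compute C_i, which is why it is serial; decrypting block i needs only C_i and C_{i-1}, so decryption can run in parallel.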
d084000120f0fb3ac46127d78d29d4198344c83421ddb4e7eaae83bb3674f45d
d084000120f0fb3ac46127d78d29d419894d401d2b2dd2c3f0bf62fd00d34e1847efec70c4f1cd3190c598f022b1e3d0
While the second result is longer, its first 32 hex characters (the first 16-byte block) are identical to the first result, because both messages begin with the same 16 bytes, “Hello how are yo”. An eavesdropper can therefore tell that the two messages start the same way: the ciphertext “leaks” information about the message. To remove this leak, start the CBC chain with an IV. Using a random IV that changes for each encryption forces even the same message to produce a vastly different ciphertext; it essentially starts the chained cipher process at a unique point. Strictly speaking, the IV is not encrypted along with the plaintext: CBC XORs the IV into the first plaintext block before encrypting it, exactly as each later block is XORed with the previous ciphertext block. Conceptually, though, the IV acts like an extra 16-byte block (the AES block size) placed in front of the plaintext, i.e. IV || plaintext. An example of this conceptual layout is:
RandomIV-exampleHello how are you my friend
In actual use, rather than readable text as in this illustration, the IV is a random 16-byte sequence (128 bits, or 32 hex characters).
To decrypt, one needs both the original IV and the resulting ciphertext. The IV does not need to be kept secret, so many (perhaps most) implementations output the IV first, followed by the encrypted message; the receiver then has everything needed to compute the plaintext except the key. This is an example of output ciphertext:
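A small sketch of why the IV removes the leak, using only the standard library: CBC hands AES the XOR of the IV and the first plaintext block, and AES is deterministic, so equal inputs give equal first ciphertext blocks. AES itself is left out, and the helper name `first_block_input` is hypothetical.

```python
import os

BS = 16  # AES block size in bytes
msg1 = b"Hello how are you my friend"
msg2 = b"Hello how are you my friend and dog"


def first_block_input(iv: bytes, block: bytes) -> bytes:
    # CBC feeds AES the XOR of the IV and the first plaintext block
    return bytes(a ^ b for a, b in zip(iv, block))


# the same IV reused for every message: identical input to AES -> identical first block
fixed_iv = bytes(BS)
leak_fixed = first_block_input(fixed_iv, msg1[:BS]) == first_block_input(fixed_iv, msg2[:BS])

# a fresh random IV per encryption: the inputs (and so the outputs) differ
leak_random = (first_block_input(os.urandom(BS), msg1[:BS])
               == first_block_input(os.urandom(BS), msg2[:BS]))

print(leak_fixed, leak_random)  # True False (a random-IV collision has probability 2**-128)
```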
aa8ddcc4679600905162dfa9930e2fcbd084000120f0fb3ac46127d78d29d4198344c83421ddb4e7eaae83bb3674f45d
The first 32 hex characters (16 bytes) are the randomly generated IV:
aa8ddcc4679600905162dfa9930e2fcb
The remainder is the encrypted plaintext:
d084000120f0fb3ac46127d78d29d4198344c83421ddb4e7eaae83bb3674f45d
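Splitting such an output back into its two parts is plain slicing, the same split `AES_decrypt` performs further down. This sketch uses the example hex above and Python 3's built-in `bytes.fromhex`:

```python
hex_iv_ciphertext = ("aa8ddcc4679600905162dfa9930e2fcb"
                     "d084000120f0fb3ac46127d78d29d419"
                     "8344c83421ddb4e7eaae83bb3674f45d")

raw = bytes.fromhex(hex_iv_ciphertext)  # 48 bytes
iv, ciphertext = raw[:16], raw[16:]     # IV first, then the encrypted blocks
print(iv.hex())         # aa8ddcc4679600905162dfa9930e2fcb
print(len(ciphertext))  # 32
```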
To illustrate the effect of different IVs, consider the same plaintext message, “Hello how are you my friend”, encrypted twice:
f15142e0e284ea10b7dc26e4b0ba041c66b226e4a1f1bef5c8f7da1fe251ec9f43a38c17b0138591ed3f07e06a6c0f81
a115e6c97ed9d9ad3aec6a525ab6303a59766a979e6427c896c38ab85011574b0fc8d7cc532e9acfa1f76b06a00dfe25
In these two examples the IVs (the first 32 hex characters) differ, and this forces the encryptions of the same plaintext to be vastly different. Apart from the message length, no information leaks: an eavesdropper cannot tell that these two messages are identical!
In summary: use AES with CBC. The block size is 16 bytes. A fresh random 16-byte IV is generated for each encryption, mixed into the first plaintext block, and placed in front of the ciphertext. The plaintext is extended to a multiple of 16 bytes by adding padding bytes to its end. The last thing to consider is the encryption key.
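The one thing the IV does not hide is length: the output is the 16-byte IV plus the plaintext padded up to the next whole block. This small arithmetic sketch (a hypothetical helper, assuming the PKCS#7-style padding used in the code below, which always adds 1 to 16 bytes) reproduces the hex lengths of the examples in this section:

```python
BS = 16  # AES block size in bytes


def hex_output_length(plaintext_len: int, with_iv: bool = True) -> int:
    # padding always adds 1..16 bytes, rounding up to the next whole block
    padded = (plaintext_len // BS + 1) * BS
    total = padded + (BS if with_iv else 0)  # optional 16-byte IV prefix
    return 2 * total                         # two hex characters per byte


print(hex_output_length(27))                 # 96: the two random-IV examples
print(hex_output_length(27, with_iv=False))  # 64: the first fixed-IV example
print(hex_output_length(35, with_iv=False))  # 96: the longer fixed-IV example
```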
AES Key:
The encryption key must be 16, 24 or 32 bytes long (AES-128, AES-192 or AES-256); it cannot be simply any length. In many cases, the user’s input key is some words, numbers, a passphrase or another typeable/printable sequence. As discussed previously in the post about hashes, such input does not use all the possible bit combinations of a byte, and there is no guarantee that it is exactly 32 (or 16 or 24) bytes long. To construct a general-purpose interface for AES encryption, one must decide on a key length and make sure the key actually used has that length. I chose 32-byte keys (AES-256), and I always apply a SHA-256 hash to the incoming key, which guarantees a 32-byte key. I chose 32 bytes because, well, why not; and, as I have shown before, SHA-256 is not much slower than shorter hashes.
Depending on the particular implementation, one might consider adding additional information (such as a salt) to the user’s input key, or even using a key-stretching algorithm, i.e. a Key Derivation Function (KDF), to achieve other “good things” (see the previous post on hashes). I chose to build a generalized AES function that applies a SHA-256 hash to the incoming key, so it can accept a key of any length.
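A minimal sketch of the SHA-256 approach described above, assuming the passphrase is text encoded as UTF-8. The mapping from key length to AES variant is standard; the helper and table names are hypothetical.

```python
import hashlib

AES_KEY_SIZES = {16: "AES-128", 24: "AES-192", 32: "AES-256"}


def key_from_passphrase(passphrase: str) -> bytes:
    # any-length text in, exactly 32 bytes out
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


key = key_from_passphrase("correct horse")
print(len(key), AES_KEY_SIZES[len(key)])            # 32 AES-256
print(key == key_from_passphrase("correct horse"))  # True: same passphrase, same key
```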
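As one example of the key-stretching alternative, Python’s standard `hashlib.pbkdf2_hmac` implements PBKDF2. This sketch is not what the functions below use (they apply a single SHA-256); the helper name and the iteration count are illustrative assumptions, and the random salt would have to be stored alongside the ciphertext so the same key can be re-derived for decryption.

```python
import hashlib
import os


def derive_key(passphrase: str, salt: bytes, iterations: int = 100_000) -> bytes:
    # PBKDF2-HMAC-SHA256: salted and deliberately slow; 32-byte output for AES-256
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                               iterations, dklen=32)


salt = os.urandom(16)  # random per key; keep it next to the ciphertext
k1 = derive_key("my passphrase", salt)
k2 = derive_key("my passphrase", salt)
k3 = derive_key("my passphrase", os.urandom(16))
print(len(k1), k1 == k2, k1 == k3)  # 32 True False
```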
An Example Python AES Encrypt function:
This example implements much of the discussion above. One important note, however: I chose to pass ciphertext only as hex-encoded text, both into and out of the functions. Settling on this easily viewable/printable form helps reduce coding and usage errors when passing the ciphertext around and using it (YMMV), and it is easy to unhexlify when raw bytes are needed.
I also return a dictionary. With the debug option off, the dictionary contains the error indicators and the resulting ciphertext; with it on, the dictionary also holds the intermediate values, which I find useful when debugging. I prefer to write functions once, debug them, and re-use them…
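For reference, the hex round trip in Python 3 looks like this. Note that `binascii.hexlify` returns bytes there, so a `.decode('ascii')` is needed to get printable text; the built-in `bytes.hex()` and `bytes.fromhex()` do the same job.

```python
import binascii

raw = bytes([0x00, 0x0A, 0xFF, 0x10])             # non-printable bytes, e.g. part of an IV
hex_text = binascii.hexlify(raw).decode("ascii")  # hexlify returns bytes in Python 3
print(hex_text)                                   # 000aff10
assert binascii.unhexlify(hex_text) == raw        # unhexlify accepts an ASCII str
assert raw.hex() == hex_text and bytes.fromhex(hex_text) == raw
```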
# PyCrypto, or its maintained drop-in fork PyCryptodome (same "Crypto" package)
from Crypto.Cipher import AES
from Crypto import Random
import hashlib
import binascii
import sys


def AES_encrypt(plaintext, str_key_in, debug=False, build_iv=True):
    # str_key_in is any string, hex, etc. and any length;
    # it is hashed with SHA-256 to make a 32-byte (AES-256) key.
    # if you want a salt or some other sort of random stuff, simply add it in
    # BEFORE the key is passed to here.
    # if for some odd reason you wish to create and build your own iv,
    # pass it as build_iv. It MUST be 16 bytes.
    result_dic = {}
    # Python 3: hashlib and AES work on bytes, so encode any str input
    if isinstance(str_key_in, str):
        str_key_in = str_key_in.encode('utf-8')
    if isinstance(plaintext, str):
        plaintext = plaintext.encode('utf-8')
    # this is a binary key....not restricted to printable characters!!!
    AES_key_used = hashlib.sha256(str_key_in).digest()
    if debug:
        result_dic['AES_str_key'] = str_key_in
        result_dic['hashed_AES_key_used'] = AES_key_used
        result_dic['hex_hashed_AES_key_used'] = binascii.hexlify(AES_key_used).decode('ascii')
        result_dic['plaintext'] = plaintext
    try:
        # will implement only the most common mode = CBC
        # for CBC, the input must be a multiple of 16 bytes long;
        # PKCS#7 padding: append n bytes of value n (1 <= n <= 16)
        BS = 16
        pad_len = BS - len(plaintext) % BS
        plaintext_with_pad = plaintext + bytes([pad_len]) * pad_len
        if build_iv is True:
            # initialization vector from the random function
            iv = Random.new().read(BS)
        else:
            iv = build_iv
        encryptor = AES.new(AES_key_used, AES.MODE_CBC, iv)
        ciphertext = encryptor.encrypt(plaintext_with_pad)
        # put the iv in front of the ciphertext
        # this is bit/byte data....not printable
        iv_ciphertext = iv + ciphertext
        # set up the resulting dictionary
        result_dic['error'] = 'none'
        result_dic['result'] = True
        # will pass the ciphertext as hex text.
        # this allows printing and passing without weird things happening
        # if you want the bit/byte result, simply unhexlify!
        result_dic['hex_iv_ciphertext'] = binascii.hexlify(iv_ciphertext).decode('ascii')
    except Exception:
        error_info = sys.exc_info()[0]
        result_dic['error'] = 'error ' + str(error_info)
        result_dic['result'] = False
        result_dic['hex_iv_ciphertext'] = result_dic['error']
        if debug:
            print('----------------- AES error ------------------')
            for i in result_dic:
                print(i, ' ... ', result_dic[i])
        # return the error dictionary instead of exiting the program
        return result_dic
    if debug:
        result_dic['build_iv'] = build_iv
        result_dic['hex_iv'] = binascii.hexlify(iv).decode('ascii')
        result_dic['hex_ciphertext'] = binascii.hexlify(ciphertext).decode('ascii')
        result_dic['iv_ciphertext'] = iv_ciphertext
        result_dic['iv'] = iv
        result_dic['ciphertext'] = ciphertext
        print('..........AES encrypt debug output.................')
        for i in result_dic:
            print(i, ' ... ', result_dic[i])
        print('...................................................')
    return result_dic
AES decrypt example:
AES decrypt is pretty self-explanatory. Note that this function is the mate of the one above and likewise deals only in hex-encoded data.
# mate to AES_encrypt
# RETURNS A DICTIONARY
def AES_decrypt(hex_iv_ciphertext, str_key_in, debug=False):
    # THE CIPHERTEXT IS IN HEX FORMAT
    # hex is used because it is printable and usually passes around
    # without screwups or weird things happening
    # hex_iv_ciphertext is the 'hex_iv_ciphertext' output of AES_encrypt
    # str_key_in is the key used for AES_encrypt...i.e. before SHA-256
    result_dic = {}
    result_dic['error'] = 'none'
    result_dic['result'] = True
    # Python 3: hashlib works on bytes, so encode a str key
    if isinstance(str_key_in, str):
        str_key_in = str_key_in.encode('utf-8')
    # make the key from the string key input
    # this is a 32-byte, non-printable key
    AES_key_used = hashlib.sha256(str_key_in).digest()
    if debug:
        # if debug, add info to result_dic
        result_dic['hex_iv_ciphertext'] = hex_iv_ciphertext
        result_dic['AES_key_in'] = str_key_in
        result_dic['hashed_AES_key_used'] = AES_key_used
        result_dic['hex_hashed_AES_key_used'] = binascii.hexlify(AES_key_used).decode('ascii')
    try:
        # make the input hex into bit/byte format
        # (inside the try, so malformed hex is reported instead of crashing)
        iv_ciphertext = binascii.unhexlify(hex_iv_ciphertext)
        if debug:
            result_dic['iv_ciphertext'] = iv_ciphertext
        # the IV + ciphertext must be whole 16-byte blocks:
        # at least one block for the IV and one for the encrypted text
        keymod = len(iv_ciphertext) % 16
        if keymod != 0 or len(iv_ciphertext) < 32:
            result_dic['result'] = False
            result_dic['error'] = ('error = cipher octet length ' + str(len(iv_ciphertext))
                                   + ' is not a multiple of 16 that is at least 32')
            for i in result_dic:
                print(i, ' ... ', result_dic[i])
            return result_dic
        # divide the ciphertext into the iv and the text parts
        # these are binary/bit/octet coded things
        iv_octet = iv_ciphertext[:16]
        cipher_octet = iv_ciphertext[16:]
        cipher = AES.new(AES_key_used, AES.MODE_CBC, iv_octet)
        plaintext_with_pad = cipher.decrypt(cipher_octet)
        # remember the plaintext was padded to a multiple of BS....
        # the last byte is the pad length: check the pad, then remove it
        pad_len = plaintext_with_pad[-1]
        if not 1 <= pad_len <= 16 or plaintext_with_pad[-pad_len:] != bytes([pad_len]) * pad_len:
            raise ValueError('bad padding: wrong key or damaged ciphertext?')
        # success; the plaintext is bytes (use .decode('utf-8') for text)
        result_dic['result'] = True
        result_dic['plaintext'] = plaintext_with_pad[:-pad_len]
    except Exception:
        result_dic['result'] = False
        error_info = sys.exc_info()[0]
        result_dic['error'] = 'error = ' + str(error_info)
        result_dic['plaintext'] = 'error...did not decrypt properly ' + str(error_info)
        for i in result_dic:
            print(i, ' ... ', result_dic[i])
    return result_dic
References:
1. https://pypi.python.org/pypi/pycrypto
2. http://pythonhosted.org/pycrypto/
3. https://www.dlitz.net/software/pycrypto/api/current/Crypto.Cipher.AES-module.html