I would like to add AES encryption to a software product, but am concerned by increasing the size of the data. I am guessing that the data does increase in size, and then I'll have to add a compression algorithm to compensate.
byte—8 bytes selected by a
SecureRandom makes a good salt—which doesn't need to be kept secret) with the recipient out-of-band. Then to derive a good key from this information:
/* Derive the key, given password and salt. */ SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256"); KeySpec spec = new PBEKeySpec(password, salt, 65536, 256); SecretKey tmp = factory.generateSecret(spec); SecretKey secret = new SecretKeySpec(tmp.getEncoded(), "AES");
The magic numbers (which could be defined as constants somewhere) 65536 and 256 are the key derivation iteration count and the key size, respectively.
The key derivation function is iterated to require significant computational effort, and that prevents attackers from quickly trying many different passwords. The iteration count can be changed depending on the computing resources available.
The key size can be reduced to 128 bits, which is still considered "strong" encryption, but it doesn't give much of a safety margin if attacks are discovered that weaken AES.
Used with a proper block-chaining mode, the same derived key can be used to encrypt many messages. In Cipher Block Chaining (CBC), a random initialization vector (IV) is generated for each message, yielding different cipher text even if the plain text is identical. CBC may not be the most secure mode available to you (see AEAD below); there are many other modes with different security properties, but they all use a similar random input. In any case, the outputs of each encryption operation are the cipher text and the initialization vector:
/* Encrypt the message. */ Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding"); cipher.init(Cipher.ENCRYPT_MODE, secret); AlgorithmParameters params = cipher.getParameters(); byte iv = params.getParameterSpec(IvParameterSpec.class).getIV(); byte ciphertext = cipher.doFinal("Hello, World!".getBytes(StandardCharsets.UTF_8));
ciphertext and the
iv. On decryption, the
SecretKey is regenerated in exactly the same way, using using the password with the same salt and iteration parameters. Initialize the cipher with this key and the initialization vector stored with the message:
/* Decrypt the message, given derived key and initialization vector. */ Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding"); cipher.init(Cipher.DECRYPT_MODE, secret, new IvParameterSpec(iv)); String plaintext = new String(cipher.doFinal(ciphertext), StandardCharsets.UTF_8); System.out.println(plaintext);
Java 7 included API support for AEAD cipher modes, and the "SunJCE" provider included with OpenJDK and Oracle distributions implements these beginning with Java 8. One of these modes is strongly recommended in place of CBC; it will protect the integrity of the data as well as their privacy.
java.security.InvalidKeyException with the message "Illegal key size or default parameters" means that the cryptography strength is limited; the unlimited strength jurisdiction policy files are not in the correct location. In a JDK, they should be placed under
Based on the problem description, it sounds like the policy files are not correctly installed. Systems can easily have multiple Java runtimes; double-check to make sure that the correct location is being used.
Please consider long and hard if you can't get around implementing your own cryptography
The ugly truth of the matter is that if you are asking this question you will probably not be able to design and implement a secure system.
Let me illustrate my point: Imagine you are building a web application and you need to store some session data. You could assign each user a session ID and store the session data on the server in a hash map mapping session ID to session data. But then you have to deal with this pesky state on the server and if at some point you need more than one server things will get messy. So instead you have the idea to store the session data in a cookie on the client side. You will encrypt it of course so the user cannot read and manipulate the data. So what mode should you use? Coming here you read the top answer (sorry for singling you out myforwik). The first one covered - ECB - is not for you, you want to encrypt more than one block, the next one - CBC - sounds good and you don't need the parallelism of CTR, you don't need random access, so no XTS and patents are a PITA, so no OCB. Using your crypto library you realize that you need some padding because you can only encrypt multiples of the block size. You choose PKCS7 because it was defined in some serious cryptography standards. After reading somewhere that CBC is provably secure if used with a random IV and a secure block cipher, you rest at ease even though you are storing your sensitive data on the client side.
Years later after your service has indeed grown to significant size, an IT security specialist contacts you in a responsible disclosure. She's telling you that she can decrypt all your cookies using a padding oracle attack, because your code produces an error page if the padding is somehow broken.
This is not a hypothetical scenario: Microsoft had this exact flaw in ASP.NET until a few years ago.
The problem is there are a lot of pitfalls regarding cryptography and it is extremely easy to build a system that looks secure for the layman but is trivial to break for a knowledgeable attacker.
What to do if you need to encrypt data
For live connections use TLS (be sure to check the hostname of the certificate and the issuer chain). If you can't use TLS, look for the highest level API your system has to offer for your task and be sure you understand the guarantees it offers and more important what it does not guarantee. For the example above a framework like Play offers client side storage facilities, it does not invalidate the stored data after some time, though, and if you changed the client side state, an attacker can restore a previous state without you noticing.
If there is no high level abstraction available use a high level crypto library. A prominent example is NaCl and a portable implementation with many language bindings is Sodium. Using such a library you do not have to care about encryption modes etc. but you have to be even more careful about the usage details than with a higher level abstraction, like never using a nonce twice. For custom protocol building (say you want something like TLS, but not over TCP or UDP) there are frameworks like Noise and associated implementations that do most of the heavy lifting for you, but their flexibility also means there is a lot of room for error, if you don't understand in depth what all the components do.
If for some reason you cannot use a high level crypto library, for example because you need to interact with existing system in a specific way, there is no way around educating yourself thoroughly. I recommend reading Cryptography Engineering by Ferguson, Kohno and Schneier. Please don't fool yourself into believing you can build a secure system without the necessary background. Cryptography is extremely subtle and it's nigh impossible to test the security of a system.
Comparison of the modes
- Modes that require padding:
Like in the example, padding can generally be dangerous because it opens up the possibility of padding oracle attacks. The easiest defense is to authenticate every message before decryption. See below.
- ECB encrypts each block of data independently and the same plaintext block will result in the same ciphertext block. Take a look at the ECB encrypted Tux image on the ECB Wikipedia page to see why this is a serious problem. I don't know of any use case where ECB would be acceptable.
- CBC has an IV and thus needs randomness every time a message is encrypted, changing a part of the message requires re-encrypting everything after the change, transmission errors in one ciphertext block completely destroy the plaintext and change the decryption of the next block, decryption can be parallelized / encryption can't, the plaintext is malleable to a certain degree - this can be a problem.
- Stream cipher modes: These modes generate a pseudo random stream of data that may or may not depend the plaintext. Similarly to stream ciphers generally, the generated pseudo random stream is XORed with the plaintext to generate the ciphertext. As you can use as many bits of the random stream as you like you don't need padding at all. Disadvantage of this simplicity is that the encryption is completely malleable, meaning that the decryption can be changed by an attacker in any way he likes as for a plaintext p1, a ciphertext c1 and a pseudo random stream r and attacker can choose a difference d such that the decryption of a ciphertext c2=c1⊕d is p2 = p1⊕d, as p2 = c2⊕r = (c1 ⊕ d) ⊕ r = d ⊕ (c1 ⊕ r). Also the same pseudo random stream must never be used twice as for two ciphertexts c1=p1⊕r and c2=p2⊕r, an attacker can compute the xor of the two plaintexts as c1⊕c2=p1⊕r⊕p2⊕r=p1⊕p2. That also means that changing the message requires complete reencryption, if the original message could have been obtained by an attacker. All of the following steam cipher modes only need the encryption operation of the block cipher, so depending on the cipher this might save some (silicon or machine code) space in extremely constricted environments.
- CTR is simple, it creates a pseudo random stream that is independent of the plaintext, different pseudo random streams are obtained by counting up from different nonces/IVs which are multiplied by a maximum message length so that overlap is prevented, using nonces message encryption is possible without per message randomness, decryption and encryption are completed parallelizable, transmission errors only effect the wrong bits and nothing more
- OFB also creates a pseudo random stream independent of the plaintext, different pseudo random streams are obtained by starting with a different nonce or random IV for every message, neither encryption nor decryption is parallelizable, as with CTR using nonces message encryption is possible without per message randomness, as with CTR transmission errors only effect the wrong bits and nothing more
- CFB's pseudo random stream depends on the plaintext, a different nonce or random IV is needed for every message, like with CTR and OFB using nonces message encryption is possible without per message randomness, decryption is parallelizable / encryption is not, transmission errors completely destroy the following block, but only effect the wrong bits in the current block
- Disk encryption modes: These modes are specialized to encrypt data below the file system abstraction. For efficiency reasons changing some data on the disc must only require the rewrite of at most one disc block (512 bytes or 4kib). They are out of scope of this answer as they have vastly different usage scenarios than the other. Don't use them for anything except block level disc encryption. Some members: XEX, XTS, LRW.
To prevent padding oracle attacks and changes to the ciphertext, one can compute a message authentication code (MAC) on the ciphertext and only decrypt it if it has not been tampered with. This is called encrypt-then-mac and should be preferred to any other order. Except for very few use cases authenticity is as important as confidentiality (the latter of which is the aim of encryption). Authenticated encryption schemes (with associated data (AEAD)) combine the two part process of encryption and authentication into one block cipher mode that also produces an authentication tag in the process. In most cases this results in speed improvement.
- CCM is a simple combination of CTR mode and a CBC-MAC. Using two block cipher encryptions per block it is very slow.
- OCB is faster but encumbered by patents. For free (as in freedom) or non-military software the patent holder has granted a free license, though.
- GCM is a very fast but arguably complex combination of CTR mode and GHASH, a MAC over the Galois field with 2^128 elements. Its wide use in important network standards like TLS 1.2 is reflected by a special instruction Intel has introduced to speed up the calculation of GHASH.
Considering the importance of authentication I would recommend the following two block cipher modes for most use cases (except for disk encryption purposes): If the data is authenticated by an asymmetric signature use CBC, otherwise use GCM.
- Objective-c – AES Encryption for an NSString on the iPhone
- Iphone – Does the application “contain encryption”
- Java – Size of data after AES/CBC and AES/ECB encryption
- Fundamental difference between Hashing and Encryption algorithms
- Java – Encryption using AES-128 in Android and IPhone (Different result)