July 03, 2015

Encrypting Data With Ecto

Author’s Note: This post has been substantially updated since it was first posted. A much stronger crypto implementation has been used and the code has been reworked to be cleaner and more efficient.

I’ve also released an open-source Hex package that implements the approach to encryption I describe in this post. Read the announcement post here.

In the future, as privacy becomes more and more of an issue, we’re going to be encrypting a lot more of the data we store on the web. With that in mind, I thought it would be a good idea to figure out a good way to integrate data encryption with Elixir’s database library, Ecto.

Requirements

In Rails, we have a gem called attr_encrypted which makes it easy to have encrypted attributes on ActiveRecord models. The important features are:

  • Transparent encryption/decryption of fields
  • Custom encryptors, allowing for customizable security
  • Automatic query support for encrypted fields

Let’s take a look at how to replicate this in Ecto.

Building Our Encryptor

Before we can encrypt anything, we’re going to need to create a module to handle encryption and decryption. Erlang comes with a good crypto module, which will serve as our base.

For this use case, I’ve chosen to use AES encryption in CTR mode, but you could just as easily use any other type of encryption supported by crypto.

defmodule MyApp.AES do
  # Encrypts each plaintext with a different, random IV. This is much more
  # secure than reusing the same IV, and is highly recommended.
  def encrypt(plaintext) do
    iv    = :crypto.strong_rand_bytes(16) # Random IVs for each encryption
    state = :crypto.stream_init(:aes_ctr, key, iv)

    {_state, ciphertext} = :crypto.stream_encrypt(state, to_string(plaintext))
    iv <> ciphertext # Prepend IV to ciphertext, for easier decryption
  end

  def decrypt(ciphertext) do
    # Split the IV that was used off the front of the binary. It's the first
    # 16 bytes.
    <<iv::binary-16, ciphertext::binary>> = ciphertext
    state = :crypto.stream_init(:aes_ctr, key, iv)

    {_state, plaintext} = :crypto.stream_decrypt(state, ciphertext)
    plaintext
  end

  # Convenience function to get the application's configuration key.
  defp key do
    Application.get_env(:encryption, MyApp.AES)[:key]
  end
end

The module can then be used pretty simply:

MyApp.AES.encrypt("hello world!")
|> MyApp.AES.decrypt
# => "hello world!"

You can configure the key to use in the config.exs for your app:

config :my_app, MyApp.AES,
       key: :base64.decode("..."), # assuming your key is in base64

Now that we have an encryptor, we can look at integrating it with Ecto.

Ecto.Type

To implement transparent encryption and decryption of fields, we need to add a layer of code in Ecto’s row insertion and loading logic, so that we can encrypt the fields on save, and decrypt them when they are read. Fortunately, Ecto has exactly what we need in Ecto.Type.

Ecto.Type lets you define custom field types for Ecto’s schema, allowing you to modify the value of a field when it is loaded or saved. Here’s a custom EncryptedField type:

defmodule MyApp.EncryptedField do
  import MyApp.AES

  # Assert that this module behaves like an Ecto.Type so that the compiler can
  # warn us if we forget to implement the 4 callback functions below.
  @behaviour Ecto.Type

  # This defines the base type of this kind of field in the database.
  def type, do: :binary

  # This is called on a value in queries if it is not a string.
  def cast(value) do
    {:ok, to_string(value)}
  end

  # This is called when the field value is about to be written to the database
  def dump(value) do
    ciphertext = value |> to_string |> encrypt
    {:ok, ciphertext}
  end

  # This is called when the field is loaded from the database
  def load(value) do
    {:ok, decrypt(value)}
  end
end

We’re almost done! Now, since the encryptor we wrote operates directly on binary, the fields we encrypt should be :binary fields in the database. Suppose we have an Ecto.Model with a binary name attribute like this:

defmodule MyApp.User do
  use Ecto.Model

  schema "users" do
    field :name, :binary
  end
end

To encrypt the name field, you just need to specify the MyApp.EncryptedField type instead of :binary:

defmodule MyApp.User do
  use Ecto.Model

  schema "users" do
    field :name, MyApp.EncryptedField
  end
end

That’s it! The field will be transparently encrypted and decrypted as the model struct is saved to or loaded from the database.

Querying

Querying encrypted fields is difficult, because our encryptor is designed specifically not to produce the same ciphertext twice for security reasons. To understand the difficulty, consider the following example.

  • Suppose you have a user with the email address test@example.com. This value is encrypted in a binary field in the database.

  • Now, someone comes along and tries to log in as test@example.com, and you want to query the database for a user with that email address.

  • You can’t search for test@example.com, because that value doesn’t exist in the database. Neither can you run the email address through the encryptor, because that will produce a different value than what is in the database.

So, how can you query for the email address? The answer is to add another field to the database called :email_hash, which will contain a hash of the :email field’s contents. This is both secure and convenient: secure because the contents can’t be reconstructed from a good hash; convenient, because hash algorithms produce the same result for the same plaintext every time.

Let’s implement another Ecto.Type, which will automatically hash the value of a field using the recommended SHA256 algorithm:

defmodule MyApp.HashField do
  @behaviour Ecto.Type

  def type, do: :binary

  def cast(value) do
    {:ok, to_string(value)}
  end

  def dump(value) do
    {:ok, hash(value)}
  end

  def load(value) do
    {:ok, value}
  end

  def hash(value) do
    :crypto.hash(:sha256, value)
  end
end

Then, we add this field to the schema:

defmodule MyApp.User do
  use Ecto.Model

  schema "users" do
    field :name, MyApp.EncryptedField
    field :email, MyApp.EncryptedField
    field :email_hash, MyApp.HashField
  end

  # We must ensure that the email_hash field is always a hash of the same value
  # held in the email field, or queries will be inaccurate.
  before_insert :set_hashed_fields
  before_update :set_hashed_fields

  defp set_hashed_fields(changeset) do
    changeset
    |> put_change(:email_hash, get_field(changeset, :email))
  end
end

And now, we can easily query on the :email_hash field, because Ecto will automatically use our HashField module to convert our search term to a hash:

email = "test@example.com"

MyApp.Repo.get_by(MyApp.User, email_hash: email)
MyApp.Repo.one(from u in MyApp.User, where: u.email_hash == ^email)

# Both produce this query:
#
# SELECT u0."id", u0."name", u0."email", u0."email_hash", u0."inserted_at",
# u0."updated_at" FROM "users" AS u0 WHERE (u0."email_hash" = $1) [<<151, 61,
# 254, 70, 62, 200, 87, 133, 245, 249, 90, 245, 186, 57, 6, 238, 219, 45, 147,
# 28, 36, 230, 152, 36, 168, 158, 166, 93, 186, 78, 129, 59>>]

And, we’re done!

Get the Code

All this code and more is over on a Phoenix sample project I put up on Github. Here’s the relevant commit. It includes tests and more examples, including how to validate the uniqueness of an encrypted field.

Conclusion

We implemented the main features of attr_encrypted, a somewhat complicated Ruby gem, in about 60 lines of code, without any monkey patching! This implementation is also substantially more secure than attr_encrypted’s default settings, which seem to rely on reusing IVs and using AES in CBC mode.

Even better, this solution is very easy to understand and is very extensible. For example, if you wanted to use a physical Hardware Security Module to do the encryption and decryption, you could just write a custom encryptor and use it instead in your EncryptedField module.

I wasn’t sure how easy this would be to implement, and I’m very happy with the result. My confidence in Elixir as a tool continues to rise.

READ THIS NEXT: Changing Your Ecto Encryption Key


Credits

Credit to @victorluft for suggesting much better crypto, and @josevalim for suggesting that I not make the encryptor a GenServer, since this could be a performance bottleneck.

Security Note

Adding a hashed version of a field is a slight security risk, because it reveals rows which have the same value in that field. Their hashes will match. Depending on your threat model, this may mean you’ll need a different solution for querying on encrypted fields.

Did You Enjoy This Post?