Saturday, February 21, 2015

Adventures with Pangram

I was recently given a fun little puzzle. A Pangram. I'm not going to pretend that I knew what a Pangram was before it was explained to me. Basically it is a sentence that has every letter of the alphabet. For instance 'The quick brown fox jumped over the lazy dog'. Wow, I've used Pangrams without even knowing it!

First Concept

Not so great. My first thought is to have an array of the alphabet chars, iterate through the input and check to see if the current char is used and store the result somewhere. Meh, it works but it's not that exciting really. It leaves a lot to be desired as far as elegant algorithm is concerned much less performance. There is a lot of checking you have to do and I wanted to find a better solution (without googling the answer of course). I'm not so concerned with readability as I am with performance (as this is a puzzle and not real production code). 

Second Concept

I walked away and when I got into my car it immediately dawned on me another possible solution. 26 letters in the alphabet could be represented by a single 32 bit integer. If all the letters in the alphabet were represented by bit positions (a=0, b=1, c=2...) and all the letters in the alphabet were present then all the bits in the integer would be 1s instead of 0s. You would already know that the answer is:

0011 1111 1111 1111 1111 1111 1111
or
67,108,863
This is an intriguing idea as the operation to check if the integer value of the bits your flipping >= 67,108,863 is very fast and very simple. Sounds good right? Well as it turns out... it works great! Here is the end result:

If you go to the 'Gist' snippet at https://gist.github.com/hobeau/5b941124719a10ca2ce6 you can see the original implementation along with the revisions.

First Iteration

My first iteration was to use BitVector32 which is a special .net collection that is very good and VERY fast at bit operations with a 32 bit integer. It is a bit difficult to grasp at first as it requires a few things such as a 'mask' array to point your code to each bit based on a zero based array of bits. To see more check out the source code here. Basically it uses a private uint field to keep track of the current value of the 32 bit integer and creates a zero-based indexer with a set operation that performs an OR or AND bit operation based on the boolean value that is passed from the calling code. The CreateMask function literally just takes in an integer value and bit shifts it by 1 to return an integer result that is the bit value of the current mask. All that to say, it is a helper method to build an array that looks like:

[1,2,4,8,16...]
This allows you to flip bits by the internal |= operation. So if you are rusty on your binary math, here's a quick refresher. If you 4 |= 8 you are essentially saying:
00100000 |
00010000

= 00110000 => 12
(note I'm only doing 8 bit bytes instead of 32 bit integers for simplicity and going left to right)

Second Iteration

My second iteration was to realize that setting up the mask and using BitVector32 was great but it also had more overhead than I really needed. Why not just do the bit operations manually? This increased performance greatly.

Next Iterations...

The next iterations included simple things such as: 
  • Creating a char[] at the beginning instead of dealing with the string objects indexer
  • Changing the loop into an unrolled loop. (the CLR only has to do bound-checking for half the iterations instead of every one)
  • Checking to see if the iterator ('i') was greater than 25 (no need to check the value of the integer if the iterator wasn't passed the possible 26 chars in the alphabet
  • Performing bit operations by hand vs using BitVector32

Performance

After I finished I figured I would check to see how this algorithm performs compared to some other algorithms out there. I found http://rosettacode.org/wiki/Pangram_checker#C.23 which was just what I needed. The test I run is using System.StopWatch and ran 100 rounds of 100,000 iterations of my code vs the examples I found on rosettacode. The first example uses Linq and performance isn't really in mind. Ellapsed milliseconds is about 500 on average. The second example runs at about 56-58 milliseconds which is much better. The algorithm I wrote runs on average of 21-23 milliseconds so even faster with all of the tweaks/modifications.

Conclusion

This was a really fun little project. Working with low level bits tends to be very efficient. There is a lot more to talk about with BitVector32 and I'll talk about it more in another post. However, in this case I found it to be faster and more efficient to do the bit operations by hand vs using BitVector32. It tends to be a bit more confusing code but again this is really more of an algorithm (a toy algorithm at that) so performance is all I'm going for here and not readability.

Addendum

Note that as this is really just a toy algorithm, this assumes an ASCII string.

Friday, February 6, 2015

ProtectMemory vs ProtectData

Main Reference

Code Observations

There are two major ways of internally protecting data in your applications. 'ProtectData' and 'ProtectedMemory'. The purpose of ProtectData is to protect the data within the scope of:
  • The current user who is logged in and using the application that is doing the protecting
  • The local machine on which the application is running on
If the data is protected within the CurrentUser scope, only that user account is able to decrypt that data and can usually only decrypt the data on that machine unless the user account is a 'roaming profile'. If the data is protected within the CurrentMachine scope, all the users who are authenticated on that machine can decrypt the data.

The purpose of ProtectedMemory is to protect data within the scope of:
  • The current user account (the application must be run by the same user account)
  • The same process (only the exact same process can decrypt)
  • Cross Process (different processes can decrypt but only within 
    One of the big differences between ProtectedData and ProtectedMemory is that ProtectedData can be stored somewhere such as a file and be decrypted even after your computer restarts. If you encrypt data with ProtectedMemory and store it somewhere it will not be decryptable after your computer restarts so this is a consideration. ProtectedMemory is typically useful when you want to protect something temporarily and limit the scope within a particular process.

    Another difference is that ProtectedData also adds a MAC (Message Authentication Code) to verify that the data has not been tampered with. This is useful especially when the data is not just temporary data but stored somewhere. Ultimately this increases the size of the output as the MAC is appended to the end of the cipher text. ProtectedMemory, on the other hand, does not use a MAC but instead has the same size output as the input. One caveat is that ProtectMemory also requires an input that is a multiple of 16 bytes (128 bits). This is interesting as Triple DES is the encryption algorithm on older versions of windows and it encrypts in 64bit (8 byte) blocks. There is an interesting code comment:
The Rtl functions accept data in 8 byte increments, but we don't want applications to be able to make use of this, or else they're liable to break when the user upgrades.
I can't find much about this but I can only assume that this means that the developers were thinking ahead to a future change to the underlying algorithm and moving towards a more secure encryption algorithm that requires 128bit (16 byte) block sizes such as AES which is used in Windows7-8.

This article isn't meant to go into great detail of cryptography or the internals of the DPAPI but something to take note of is the underlying algorithms used in the different versions of windows (thanks to the folks at passcape):

OS Encryption Algorithm Hash Algorithm Number of iterations in PKCS#5
Windows2000 RC4 SHA1 1
WindowsXP 3DES SHA1 4000
WindowsVista 3DES SHA1 24000
Windows7 AES256 SHA512 5600

Protected Memory

ProtectedMemory is essentially a wrapper for the RtlEncryptMemory function also known as 'SystemFunction040' imported from Advapi32.dll. You can check out the source code online but I'll put it up here just for quick reference purposes:
[SecuritySafeCritical]
        public static void Protect (byte[] userData, 
                                    MemoryProtectionScope scope) {
            if (userData == null)
                throw new ArgumentNullException("userData");
            if (Environment.OSVersion.Platform == PlatformID.Win32Windows)
                throw new NotSupportedException("NotSupported_PlatformRequiresNT");
 
            VerifyScope(scope);
 
            // The RtlEncryptMemory and RtlDecryptMemory functions are available on WinXP and publicly published 
            // in the ntsecapi.h header file as of Windows Server 2003.
            // The Rtl functions accept data in 8 byte increments, but we don't want applications to be able to make use of this, 
            // or else they're liable to break when the user upgrades.
            if ((userData.Length == 0) || (userData.Length % CAPI.CRYPTPROTECTMEMORY_BLOCK_SIZE != 0))
                throw new CryptographicException("Cryptography_DpApi_InvalidMemoryLength");
 
            uint dwFlags = (uint) scope;
            try {
                // RtlEncryptMemory return an NTSTATUS
                int status = CAPI.SystemFunction040(userData,
                                                    (uint) userData.Length,
                                                    dwFlags);
                if (status < 0) // non-negative numbers indicate success
                    throw new CryptographicException(CAPI.CAPISafe.LsaNtStatusToWinError(status));
            }
            catch (EntryPointNotFoundException) {
                throw new NotSupportedException("NotSupported_PlatformRequiresNT");
            }
        }
The first couple things to notice is the parameters. The userData param is the input that you want to protect. 'scope' allows you to assert what kind of scope you want to protect your data in (User/Same Process/Cross Process). The first couple lines of code validate the data (make sure it's not null) and the environment. Advapi32.dll is only available as of WindowsXP (I say 'only' lol) and if you are running this on an older environment (good lord I hope not) then this will throw an exception. The next step is to verify the scope that your code asserted. It must be at least one of the three possible scopes or this will throw an exception. The next step is to verify the length. Must be greater than zero and also must be divisible by 16 (as noted earlier). The next step is to call into a very large class called 'CAPI' which defines constants and imports functions for much of System.Security.Cryptography. It's way too much code to display here so you'll just have to look it up. Essentially it takes the data, data length and converted scope (into a uint) and passes it to the external 'black box' of RtlEncryptMemory. Simple enough? The Unprotect function of course simply does the reverse so I won't waste time on it. Suffice it to say this is a very lightweight 'close to the vest' function to protect memory temporarily and have a minimal scope.

Protected Data

Protected data (on the other hand) is a bit more complicated. ProtectedMemory is very lightweight but ProtectedData is a bit on the heavier side, but with good reason.
[SecuritySafeCritical]
        public static byte[] Protect(byte[] userData,
            byte[] optionalEntropy,
            DataProtectionScope scope)
        {
            if (userData == null)
                throw new ArgumentNullException("userData");
            if (Environment.OSVersion.Platform == PlatformID.Win32Windows)
                throw new NotSupportedException("NotSupported_PlatformRequiresNT");

            GCHandle pbDataIn = new GCHandle();
            GCHandle pOptionalEntropy = new GCHandle();
            CAPI.CRYPTOAPI_BLOB blob = new CAPI.CRYPTOAPI_BLOB();

            RuntimeHelpers.PrepareConstrainedRegions();
            try
            {
                pbDataIn = GCHandle.Alloc(userData, GCHandleType.Pinned);
                CAPI.CRYPTOAPI_BLOB dataIn = new CAPI.CRYPTOAPI_BLOB();
                dataIn.cbData = (uint) userData.Length;
                dataIn.pbData = pbDataIn.AddrOfPinnedObject();
                CAPI.CRYPTOAPI_BLOB entropy = new CAPI.CRYPTOAPI_BLOB();
                if (optionalEntropy != null)
                {
                    pOptionalEntropy = GCHandle.Alloc(optionalEntropy, GCHandleType.Pinned);
                    entropy.cbData = (uint) optionalEntropy.Length;
                    entropy.pbData = pOptionalEntropy.AddrOfPinnedObject();
                }
                uint dwFlags = CAPI.CRYPTPROTECT_UI_FORBIDDEN;
                if (scope == DataProtectionScope.LocalMachine)
                    dwFlags |= CAPI.CRYPTPROTECT_LOCAL_MACHINE;
                unsafe
                {
                    if (!CAPI.CryptProtectData(new IntPtr(&dataIn),
                        String.Empty,
                        new IntPtr(&entropy),
                        IntPtr.Zero,
                        IntPtr.Zero,
                        dwFlags,
                        new IntPtr(&blob)))
                    {
                        int lastWin32Error = Marshal.GetLastWin32Error();

                        // One of the most common reasons that DPAPI operations fail is that the user
                        // profile is not loaded (for instance in the case of impersonation or running in a
                        // service.  In those cases, throw an exception that provides more specific details
                        // about what happened.
                        if (CAPI.ErrorMayBeCausedByUnloadedProfile(lastWin32Error))
                        {
                            throw new CryptographicException(
                                "Cryptography_DpApi_ProfileMayNotBeLoaded");
                        }
                        else
                        {
                            throw new CryptographicException(lastWin32Error);
                        }
                    }
                }

                // In some cases, the API would fail due to OOM but simply return a null pointer.
                if (blob.pbData == IntPtr.Zero)
                    throw new OutOfMemoryException();

                byte[] encryptedData = new byte[(int) blob.cbData];
                Marshal.Copy(blob.pbData, encryptedData, 0, encryptedData.Length);

                return encryptedData;
            }
            catch (EntryPointNotFoundException)
            {
                throw new NotSupportedException("NotSupported_PlatformRequiresNT");
            }
            finally
            {
                if (pbDataIn.IsAllocated)
                    pbDataIn.Free();
                if (pOptionalEntropy.IsAllocated)
                    pOptionalEntropy.Free();
                if (blob.pbData != IntPtr.Zero)
                {
                    CAPI.CAPISafe.ZeroMemory(blob.pbData, blob.cbData);
                    CAPI.CAPISafe.LocalFree(blob.pbData);
                }
            }
        }

First Steps:
First couple things to get out of the way is is the platform and input validation (of course). Then we have a method called PrepareConstrainedRegions. When this method is called before a try/catch it creates a Constrained Execution Region which is designed to avoid 'out of band' exceptions such as a stack overflow. It does this by calling ProbeForSufficientStack which probes 48KB of the stack to make sure there is sufficient space.

Pin The Data:
The next step is to call GCHandle.Alloc with the parameter 'Pinned'. What this means is that the code tells the CLR Garbage Collector not to move the memory address around. This is good in the sense that the Garbage Collector can relocate objects and leave several copies of them in memory. This is good for efficiency but bad for security. Another possibility is that the object may end up in the Page File which is typically unencrypted (but it can be encrypted) so it is possible that small amounts of data could be retrieved. So you can see why, if you are trying to protect data, it's important to pin that data in memory as soon as possible while it's plaintext.

Prepare The Data:
The next step is to gather the pieces together that the Win32 CryptProtectData function requires. The first thing we have to put together is the CRYPT_INTEGER_BLOB which is basically just a variable that has the plaintext to be secured and the length of that plaintext. On the C# side there is a struct created in CAPI.cs called CRYPTOAPI_BLOB that matches the CRYPT_INTEGER_BLOB requirements with a uint to set specify the length of the plaintext and an IntPtr as a pointer to the plaintext. So we set the cbData to the userData length and the Alloc function we called earlier gives us the pointer so, yay, we now have our blob that we can pass to the CryptProtectData api function. We then have some similar code for the 'entropy' such as a password. Basically this is just extra data but the thing to remember is that whatever is used to protect this data, must also be used to un-protect it. The last thing we have to do before calling our CryptProtectData function is to set our scope flags.

Protect The Data:
Yay! Now we can finally protect the data. The params we pass to CAPI.CryptProtectData are

  • The pointer to our plaintext blob (still pinned in memory)
  • An empty string to the underlying 'data description' (kinda weird)
  • The pointer to the entropy (if no entropy then it passes in a null pointer)
  • A null pointer to a reserved parameter. This is kinda weird too, it's pretty much just reserved for possible future upgrades to the win32 function.
  • A null pointer to the optional prompt (not necessary for most applications)
  • The scope flags
  • A pointer to the empty return blob which will have the ciphertext
Errors:
They happen! There is an interesting code comment that notes:
One of the most common reasons that DPAPI operations fail is that the user profile is not loaded (for instance in the case of impersonation or running in a service.  In those cases, throw an exception that provides more specific details about what happened. 
 A good example of this is running ASP.net code in IIS when the IIS setting 'Load User Profile' is set to false. If this is the case and you try to use the ProtectedData API then it will throw this exception.

Get The Ciphertext:
Assuming no exceptions have been thrown, the data now needs to be marshaled from the un-managed environment to the managed CLR environment. The developers do that by calling Marshal.Copy to copy the ciphertext to the 'encryptedData' variable and return it.

Cleanup:
Last but not least is the final cleanup. Free up the garbage collection 'pin' on pbDataIn (the plaintext) and then call a win32 function called ZeroMemory which is designed to very quickly replace the plaintext memory with zeros. This can work better/faster than setting it in code as some compilers, optimizers or managed code may not do what you would expect. Last step is to call LocalFree external function (Kernel32.dll) which zeroizes any memory locks on that memory location. If a process tries to access that memory an access violation will be thrown.

So that's about it. There is no such thing as perfect security and there have been some interesting demonstrations of weaknesses in DPAPI but it's certainly better than nothing and it's quite interesting to understand how it works under the surface. This, of course doesn't go into the deep internals of the win32