Say no to source-controlled credentials: .NET + RSA encryption

Let’s start with an axiom: you should never store any credentials in your source code in plain text, especially if the codebase is source-controlled. If you are doing it, there’s something seriously wrong with your architecture.1

Here’s one possible solution to this problem. Presumably you already have an OpenSSH pair of public/private keys. It will serve us nicely in this case:

1. Generate a PEM version of your OpenSSH public key:

$openssl rsa -in ~/.ssh/id_rsa -pubout > ~/.ssh/id_rsa.pub.pem Enter pass phrase for ~/.ssh/id_rsa: writing RSA key  2. Write your password in a plaintext file (say, admin_pass), and encrypt it (say, into admin_pass.enc):2 cat admin_pass | openssl rsautl -encrypt -pubin -inkey ~/.ssh/id_rsa.pub.pem > admin_pass.enc  3. Delete admin_pass from your hard drive.3 The resulting admin_pass.enc can be safely checked into the repository: one needs access to your private key in order to decrypt it: cat admin_pass.enc | openssl rsautl -decrypt -inkey ~/.ssh/id_rsa  However, you probably don’t want to do even that. The best policy is to decrypt the credentials directly in your app, securely and in-memory. Presumably your favorite programming language/platform has some API for securely-stored memory objects (usually backed by the OS/hardware cryptography system). I am going to show an implementation for .NET below. First, in order to use the OpenSSH keys in .NET infrastructure, we need to convert them both to PEM, and also generate the corresponding certificate and the PFX file. Here’s the process using my website and email as an example for the generated certificate (the exact values are not really important for our intended purpose): $ openssl rsa -in ~/.ssh/id_rsa > ~/.ssh/id_rsa.pem
Enter pass phrase for ~/.ssh/id_rsa:
writing RSA key

$openssl req -nodes -x509 -days 3650 -subj '/CN=www.alexpolozov.com/emailAddress=polozov@cs.washington.edu' -new -key ~/.ssh/id_rsa.pem -out ~/.ssh/certificate.crt$ openssl pkcs12 -export -out ~/.ssh/certificate.pfx -inkey ~/.ssh/id_rsa.pem -in ~/.ssh/certificate.crt


Now let’s use this certificate to decrypt the password in C#:

public static SecureString DecryptCredentials(string certificateFile, SecureString pkPassphrase,
string encryptedFile) {
string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
string sshDir = Path.Combine(home, ".ssh");
var cert = new X509Certificate2(Path.Combine(sshDir, certificateFile), pkPassphrase,
X509KeyStorageFlags.Exportable);
using (var rsa = cert.PrivateKey as RSACryptoServiceProvider)
using (var credentialStream = new FileStream(encryptedFile, FileMode.Open))
using (var encrypted = new MemoryStream()) {
credentialStream.CopyTo(encrypted);
// Unfortunately, RSACryptoServiceProvider does not have a Decrypt() overload that
// returns a SecureString (no idea why). Because of that, we have to store the
// decrypted data briefly in a garbage-collected memory, and manually convert it
// into a secure representation as soon as possible.
byte[] decrypted = rsa.Decrypt(encrypted.ToArray(), false);
var result = new SecureString();
foreach (char c in Encoding.ASCII.GetChars(decrypted))
result.AppendChar(c);
GC.Collect();  // get rid of the decrypted bytes
return result;
}
}

DecryptCredentials("certificate.pfx", Utils.SecurePrompt(), "admin_pass.enc");

The code uses a simple SecurePrompt function, which prompts the user for the SSH passphrase, encrypting one character at a time in a SecureString:

public static SecureString SecurePrompt(string prompt = "> ") {
Console.Write(prompt);
while (true) {
switch (input.Key) {
case ConsoleKey.Enter:
Console.WriteLine();
case ConsoleKey.Backspace:
Console.Write("\b \b");
}
break;
default:
Console.Write("*");
break;
}
}
}

2. rsautl has a maximum size limit on the data being encrypted, since it applies the RSA algorithm directly (as opposed to, say, AES). The limit is roughly equal to “[your key size] - [padding]”, which for common key sizes gives 200-400 bytes. Hopefully your password is pretty long, but it probably is not that long ☺
3. Alternatively, you could avoid creating it in the first place, and simply cat the password from stdin.

Substitution mnemonic

(I overheard this at UW, from a graduate student who told me he doesn’t know the original author of this mnemonic either. Please do not attribute it to either of us).

In PL and formal logic, there exists a common notation, which I personally consider the worst possible notation choice of all times. Let’s say you have an expression $E$, in which you want to replace every occurrence of a variable $x$ (or any other fixed subexpression, for that matter) with another variable $y$ (again, or any other expression). This operation is called substitution 1. In formal logic, $\lambda$-calculus, etc. there exists a bewildering soup of notations representing this simple transformation:

• $E[x \to y]$ (“$x$ is transformed into $y$”)
• $E[x \gets y]$ (“$x$ is replaced with $y$”)
• $E[x := y]$ (“assign $y$ to every occurrence of $x$)
• $E[y/x]$ (WTF?)

The first two (and their mirrored alternatives) are confusing on its own – you never know which substitution direction the arrow represents (is it “replace $x$ with $y$” or “replace $y$ with $x$”?). But the last one is notoriously ridiculous in that regard. It is too symmetric. How am I supposed to remember which variable substitutes the other one if the only thing denoting this relationship is a meaningless slash character in between? Some say this notation is the easiest to typeset – which is not true, because the third notation also doesn’t use any characters beyond standard ASCII, and it is unambiguous. Anyway, I usually tried to avoid the slash notation as much as possible whenever I could.

However, now I know a beautiful mnemonic, which breaks the “almost-symmetrical” relationship between $x$ and $y$:

“Think of it as $y$ falling over and squishing the $x$.”

1. The actual formal definition of substitution is trickier, because you have to take into account which variables are free in $E$, and which are bound. The details are outside the scope of this post.

On performance in .NET

Formerly, when someone asked me “How to make my C# program faster?”, my go-to advice was:

2. Switch to a native language.

Of these suggestions, step #1 should take care of performance in most situations, if done properly. However, sometimes it is not enough. For example, you are writing a user-facing real-time performance-critical application, such as the C# compiler, and you need to look not only at the average performance, but at the 99th percentile. This is where step #2 used to take over… but not anymore. Currently, my go-to advice looks like this:

2. Read “Writing High-Performance .NET Code”, Roslyn team blog, and the rest of the links provided below.
3. Switch to a native language.

Measurement tools

“Writing High-Performance .NET Code” should be your desk book for any performance-critical .NET application. I found the following knowledge it provides most important:

• What profiling tools exist in the .NET world (PerfView, ETW, CLR Profiler, dotTrace, dotMemory, Windbg…), how to use each one of them, how to answer specific questions with a combination of these tools, which metrics should you look at and why.
• How exactly .NET GC works and why you should care. How to help .NET GC by managing your allocations and object lifecycle carefully.
• JIT mechanics, NGEN, when to use each of them, startup rates.

In other words, when you have a specific problem with your .NET application, this book will show how to use the tools to locate who exactly is responsible for the bottleneck, how often it occurs, and so on. Often you will be surprised, because (assuming you’ve already gone through step #1) your greatest bottleneck is going to be either GC, allocations, JIT, boxing, or incorrect usage of .NET API. All of these subtle engineering details are quite unobvious if you don’t train yourself to keep them in mind.

Performance-minded coding

When you know the consequence of the problem (e.g. “my app spends 20% of the time in GC”, which arises because of “there’s this pinned handle that points to a long retention path that is being walked on every gen-2 collection”), the next step is to eliminate it. The series of links below (along with the accompanying chapters in “Writing High-Performance .NET Code”) explains the techniques for doing it. If you know that this particular path in your code is going to be performance-critical up to the 99th percentile (never optimize prematurely!), you should really think of these techniques as guiding principles to keep in mind every second during your coding process.

Here are some must-read articles for any .NET programmer that works with performance-critical code:

I’m going to list some of the mentioned techniques below, briefly. Again, the book and the sources above have already done it much better than me.

Avoid allocations

For every allocation you make, the .NET GC will spend a little more time on every future collection. If this instance is relatively long-lived, it’s going to eventually move to gen-1 and gen-2, and greatly affect the future GC times. But even if it is short-lived, the performance of gen-0 collection is going to matter in your higher percentiles anyway.

Some of the common sources of hidden allocations are:

• LINQ queries: they allocate enumerators and lambda-representing objects that capture references in a closure. Also, eventually you need to convert a query to a list/array, and it’s going to expand itself several times in the process. Instead you should allocate a list/array of a known capacity beforehand, and do all the required work directly, without allocating hidden LINQ enumerators.
• Lambdas: if they capture any references from the outer scope, they will be rewritten to compiler-generated objects. If your lambda operates in a generic context this object will not even be cached in place, and will be re-created every time. Consider refactoring your logic in such a way that lambda becomes non-capturing, or get rid of the lambda altogether.

UPD: This behavior will change in the upcomping C# 6. In Roslyn, lambda functions now always compile down to instance calls, which causes up to 35% performance gain in common scenarios.

• Strings: because they are immutable, every string modification allocates a new string object. This is most subtle in calls like string.Trim(' '), which actually returns a new string. Instead use index-based arithmetic on hot paths. For case-insensitive comparison, use the corresponding string.Compare overload instead of converting all your strings to lowercase with string.ToLower. Also, string concatenation for simple cases is much faster than string formatting.
• Invoking a method with params always allocates an array, possibly of zero length. Consider creating specialized overloads for most common numbers of parameters.

Use object pooling

If you use some objects often but temporarily, consider allocating a pool of objects of this type once, and reuse them where necessary. For example, the interface that Roslyn uses for StringBuilders look somewhat like this:

internal static class StringBuilderPool {
public static StringBuilder Allocate() {
return SharedPools.Default<StringBuilder>().AllocateAndClear();
}

public static void Free(StringBuilder builder) {
SharedPools.Default<StringBuilder>().ClearAndFree(builder);
}

public static string ReturnAndFree(StringBuilder builder) {
SharedPools.Default<StringBuilder>().ForgetTrackedObject(builder);
return builder.ToString();
}
}

Which is then later used in the following manner:

public override string GetTextBetween(SyntaxToken token1, SyntaxToken token2) {
var builder = StringBuilderPool.Allocate();
CommonFormattingHelpers.AppendTextBetween(token1, token2, builder);
return StringBuilderPool.ReturnAndFree(builder);
}

Compile your reflection calls into delegates

Method calling through Reflection is horrendously slow. If you know the signature of your method beforehand (and for some reason cannot agree on a common interface with the third party who implements the external method), you can use LINQ Expression Trees to compile your MethodInfo to a strongly-typed delegate. This needs to be done only once, after that you can keep a reference to the delegate, and call it whenever necessary with standard C# syntax.

The technique is described at Jon Skeet’s blog. Here’s how I used it in one of my recent projects:

public static TDelegate ToDelegate<TDelegate>(this MethodInfo method) {
var parameterTypes = method.GetParameters().Select(p => p.ParameterType).ToArray();
MethodInfo invokeMethod = typeof (TDelegate).GetMethod("Invoke");
var @params = invokeMethod.GetParameters()
.Select((p, i) => Expression.Parameter(p.ParameterType, "arg" + i))
.ToArray();
MethodCallExpression methodCall = Expression.Call(method,
@params.ZipWith(parameterTypes).Select2(Expression.Convert));
return Expression.Lambda<TDelegate>(Expression.Convert(methodCall, invokeMethod.ReturnType),
@params).Compile();
}

Avoid boxing

Apart from obvious cases where people unnecessarily convert value types into objects (such as using now deprecated System.Collections namespace), there are some subtler ones:

• String.Format boxes its arguments
• Structs are boxed when used through their implemented interface. In particular, this happens when you use LINQ: List<T>.Enumerator is a struct, but LINQ methods treat it as IEnumerator<T>
• Object.Equals(object other) is the source of all evil
• Do not ever concatenate a string with anything other than a string
• Calling GetHashCode on an enum causes boxing. Convert the enum instance to int first.

Use specialized collections for smaller sizes

An array outperforms a dictionary up to a certain number of elements, even though its lookup is $\mathcal{O}(n)$. The reason is all the boilerplate overhead that accompanies a dictionary (hashtable, bucket exploration, etc.). The exact tripping point depends on your particular code and has to be measured experimentally.

Know when to use structs and when not to

Structs give better performance than classes when they are small, consist of primitive types, and when they are not copied around often. The overhead of a class vtable is noticeable during garbage collection. However, classes are copied faster, because they are represented with just a pointer. On another hand, an array of classes will cause you a nightmare of cache misses (see reference locality). An array of structs neatly lies in memory as a single block, but would be a nightmare to copy as a function result. Bottom line, know your usecase and measure everything.

Note: you should always re-implement Equals, GetHashCode, and IEquatable<T>.Equals for your structs, because the default implementations use reflection to enumerate over the fields.

If you are going to reuse the same regular expression often, build it with a RegexOptions.Compiled flag. Do not ever use static methods of the Regex class, they are horrendously slow.

If your regular expressions do not require non-regular features such as lookaround or backreferences, consider using re2 instead of a default regex engine. It might yield you up to a 10x speedup. There is a .NET wrapper available, called Re2.Net.

I will stop here because I cannot possibly outline all of the techniques. Again, for the complete picture read the book, watch the talk, read the articles referenced above, know your tools, and measure everything carefully. Also, feel free to comment here if you want to contribute more advice!

Record types and pattern matching in C#

For those of you who do not follow the Roslyn project, an active discussion is currently going on regarding record classes (read: ADT) and pattern matching in the future version of C#. Neal Gafter wrote a draft spec and a prototype implementation, and both of them are rapidly evolving at this very moment. The prototype is not yet publicly available, but Neal promises that it will be shortly.
UPD: The prototype is now published!

What I like most of all in this undertaking is that they didn't just inject some pieces of F#/Scala/Haskell into C#. It would feel foreign, since the languages have different styles and design principles. Instead they actually designed a feature in such a way that it feels C#, rather than a foreign addition. Kudos to the language design team.

If you have any ideas and/or suggestions, feel free to join the discussion!

Roslyn and my favorite persistent data structure

This post is not a propaganda about perks and benefits of immutable persistent data structures. People familiar with the concept already know when it can tremendously help your code and when it cannot. And for those who aren't familiar I cannot give a better introduction than those three links:

In this post I want to talk about my favorite persistent data structure. Or, if you wish, the one I consider most beautiful.
For a long time the answer to this question was Zipper¹. However, two years ago this question popped up on StackOverflow:

Are Roslyn SyntaxNodes reused?

The team who built Roslyn, the new compiler infrastructure for C# and VB.NET languages, had to come up with an AST data structure that would satisfy a set of mutually incompatible requirements. The tree had to (a) have a cheap access to the parent field; (b) maintain an up-to-date character span in the document for each node; (c) persistently reuse most of the nodes on every update. As Eric Lippert describes, they solved this problem by maintaining two trees, one of which is immutable and persistent, and another has parent references and is re-built on-demand for each query, but only partially. The latter tree forms a facade around the former one, and together they solve all of the requirements at once.

The post is a great read, I encourage everyone to look at Eric's description. Because Roslyn is now open-source, the entire implementation is now available on CodePlex (hint: search for SyntaxNode/RedNode/GreenNode). For a long time I was also surprised that the team never published anything that formally describes their idea. However, today I found out that they at least filed a public patent. Its text contains many more interesting explanations and details on how this combination of two parse trees works as one unified persistent parse tree.

What is your favorite persistent data structure?

1. By the way, did you know that a zipper is a derivative of a surrounding container in a semifield of algebraic data types? This is a beautiful connection that also relates to generating functions, Taylor series, linear logic, and information theory. See, for example, [1][2][3][4][5].

Random thoughts on Swift

Based on my quick skimming session through the language guide.

Kind of expected from a modern language, would be surprised if it wasn't included:

• Closures
• Type inference
• First-class collections and tuples
• Generics with all kinds of boundary restrictions
• Interoperability with existing libraries (in this case with C and Obj-C).

Surprisingly included, and it's nice:

• Bret Victor-style interactive IDE (let me make a sincere bow to Apple folks at this point)
• Opt-in mutability
• Lots of small inspirations from D, C#, Scala, etc.

However...

• Reference counting instead of garbage collection. Cannot detect reference cycles, ergo manual weakref management.
• I'm not quite sure how to combine "protocols" (read: interfaces) and "extensions" (read: mixins) into full-fledged traits and typeclasses. For example, how do I define a default method implementation for an interface a protocol?
(Also, why the heck do you need so many new names for existing concepts?!)
• Optional types are not quite Maybe, you can force the "unwrapping" with a ! operator (naturally, with a possible runtime exception). I foresee every single iOS developer just blindly doing it with every external optional they get.
• Extremely closed infrastructure: iOS/MacOS only, proprietary compiler, etc. Hell, you need an iTunes account to even read language documentation.
• Absolutely no first-class concurrency treatment. Moreover, there's even no library treatment for concurrency (even though it is not enough).
• No stream processing sugar (read: comprehensions/LINQ). Well, at least they have map and reduce in the standard library, but that's about it.
• No private members.

Not sure how I feel about:

• No exceptions. If you gave us some other monadic alternative, I might've put it in the "nice" section.
• Some magic in interoperability. Apparently, they wrote "external parameters names" for many standard library functions and forgot to show them to the legacy Objective-C programmers.
• The fact that their compiler cannot resolve recursive ADT definitions, even though it supposedly should.

Interesting. Let's see how it flies.