My article on wikiHow, How to Build a Supercomputer, must have struck the right chord: it has been read over 50,000 times as of this writing. The hardware part is easy, if you have the cash. The software part, however, requires much more know-how and experience. Many of the tuning tricks, such as turning off IGMP snooping to reduce overhead in the switches, or using an 8x8 GB memory configuration (as opposed to 4x16 GB) to ease the latency/bandwidth bottleneck of Piledriver CPUs, are crucial to making the investment worth the money.
What's often overlooked is the fact that, if your only concern is solving your problem as soon as possible, then spending your time and money on a new machine may not be the best option. If it takes X dollars and Y months of building a new machine to get a 20% increase in performance, yet the same money and time spent parallelizing the code yields a 40% increase, then the choice should be obvious. In practice, though, it's not that simple. Let's say you spend 6 months rewriting your code with OpenMP and bring the run time down to 1 week. Even if the unmodified code takes as long as 6 months to execute, the rewrite wouldn't have been worth it: 6 months of development plus a week of computation is still longer than simply letting the original code run.
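To make the OpenMP scenario concrete, here is a minimal sketch of what such a rewrite often amounts to in the best case: a single pragma that spreads a hot loop across cores. The workload here (a toy numerical integration estimating pi) is hypothetical and stands in for whatever the real code computes; compile with something like gcc -fopenmp -O2 sketch.c -lm.

```c
/* Minimal OpenMP sketch: parallelizing one hot loop with a pragma.
   The integrand is a made-up stand-in for real scientific work. */
#include <math.h>
#include <stdio.h>

int main(void) {
    const long n = 100000000;   /* number of sample points (arbitrary) */
    double sum = 0.0;

    /* The one-line change: distribute iterations across threads and
       combine the per-thread partial sums with a reduction. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = (double)i / n;
        sum += sqrt(1.0 - x * x);   /* integral of sqrt(1-x^2) on [0,1] is pi/4 */
    }

    printf("estimate of pi: %f\n", 4.0 * sum / n);
    return 0;
}
```

When the parallelism really is this embarrassingly simple, the rewrite costs an afternoon, not 6 months; the 6-month case arises when data dependencies and shared state force a deeper restructuring, which is exactly the cost that has to be weighed against just letting the serial code run.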
The answer may lie in new high-performance computing interfaces. A new layer of abstraction between the researchers and the machines would allow researchers to worry about their research, supercomputer centers to worry about their supercomputers, and computer scientists to worry about the interface between the two. Today, with the ubiquitous use of smartphones and web apps that do everything for you, fewer people are learning the hard way how to make a computer do what they want it to do. Just as the graphical user interface made using a computer easier for the layman, a new HPC interface will make parallel computing easier for scientists and cheaper for advanced computing centers.