Tuesday, August 18, 2009

CULA (LAPACK)

CULA is a GPU-accelerated LAPACK library from EM Photonics. It comes in three flavors: Basic, Premium, and Commercial.

CULA Basic is loaded with the most popular LAPACK routines including LU Decomposition, QR Factorization, Singular Value Decomposition, and Least Squares. CULA Premium is for power users who need additional functionality, such as double precision and additional LAPACK routines not found in CULA Basic. CULA Commercial is for independent software vendors who want to include CULA functions in their own applications.
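For readers unfamiliar with the routines listed above, here is a minimal pure-Python sketch of what an LU decomposition solver computes. It is not CULA's API and makes no attempt at GPU execution; the function names `lu_decompose` and `lu_solve` are hypothetical, chosen only for this illustration.

```python
def lu_decompose(a):
    """Factor a square matrix as P*A = L*U using partial pivoting.

    Returns (lu, perm): lu holds L below the diagonal (unit diagonal implied)
    and U on and above it; perm[i] is the original index of the row now at i.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy of the input
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0:
            raise ValueError("matrix is singular")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate column k below the diagonal, storing the multipliers.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A*x = b given the factorization from lu_decompose."""
    n = len(lu)
    # Forward substitution: L*y = P*b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U*x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


# Usage: solve a small 3x3 system; x comes out close to [1, 1, 2].
A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
b = [5, -2, 9]
lu, perm = lu_decompose(A)
x = lu_solve(lu, perm, b)
```

The factorization is done once and can be reused for many right-hand sides, which is the main reason LAPACK-style libraries expose it as a separate step.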

For more information http://www.culatools.com/

Tuesday, August 11, 2009

SuperMicro 1U

The SS6016GT GPU Supercomputing Servers establish Supermicro as the true global IT hardware leader in server architecture, performance, and Green computing. Generating massively parallel processing power and unrivaled networking flexibility with two double-width GPUs or up to 5 expansion slots in a 1U form factor, the SS6016GT is performance and quality optimized for the most computationally-intensive applications. Supermicro's unique server designs with Gold Level power supplies, energy-saving motherboards and enterprise class server management optimize cooling for even the most demanding applications, providing the perfect technology platform for these impressive GPU Supercomputing Servers.

Delivering more performance than a small computer cluster (2-4 teraflops) in 1U and 4U form factors, these GPU-optimized solutions are ideal for scientific computing, EDA, oil and gas exploration, military, and other computationally intensive applications.

For more information http://www.supermicro.com/products/nfo/gpu.cfm

GPU Clusters

Supercharge your cluster with Tesla S1070 systems from NVIDIA or Tesla M1060 processors integrated into servers from leading OEMs. Experience the performance of a large cluster with just a small cluster of Tesla solutions. Tesla-based clusters deliver up to 30 times the performance of CPU-only clusters with lower power and less space. Featuring the revolutionary NVIDIA CUDA™ parallel computing architecture and powered by 240 parallel processing cores in each Tesla processor, the Tesla pre-configured solutions shatter your performance per watt expectations to help you solve the toughest computing problems faster.

You can build your own GPU cluster from NVIDIA S1070 1U units connected to rack-mounted servers. You will still need some way to distribute your workload, such as Condor or grid software from a third-party provider. Alternatively, you can buy preconfigured GPU clusters from companies like Cray (yes, the inventors of the supercomputer).
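To make the "distribute your workload" step concrete, here is a toy sketch of the simplest possible policy: dealing independent jobs out to GPUs in round-robin order. This is not how Condor works internally, and the function name `partition_round_robin` is hypothetical; a real scheduler would also handle queuing, failures, and uneven job sizes.

```python
def partition_round_robin(tasks, num_devices):
    """Deal tasks out to devices in turn, like dealing cards."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    queues = [[] for _ in range(num_devices)]
    for index, task in enumerate(tasks):
        queues[index % num_devices].append(task)
    return queues


# Usage: 10 independent jobs spread over 4 GPUs (for example, one S1070).
queues = partition_round_robin(list(range(10)), 4)
# queues == [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Static partitioning like this only balances well when the jobs take similar time; otherwise a shared work queue is usually the better choice.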

For more information http://www.nvidia.com/object/preconfigured_clusters.html

Tesla S1070

With the world’s first teraflop many-core processor, the NVIDIA® Tesla™ S1070 computing system speeds the transition to energy-efficient parallel computing. With 960 processor cores and a standard C compiler that simplifies application development, Tesla S1070 scales to solve the world’s most important computing challenges—more quickly and accurately.

Under the hood the S1070 has the same computational capacity as 4 C1060 cards. The S1070 does not actually run an operating system. It only houses and powers the GPUs. The device needs to be connected to a host system via two PCIe 2.0 connector cables.

For more information http://www.nvidia.com/object/product_tesla_s1070_us.html

Tesla C1060

The NVIDIA® Tesla™ C1060 transforms a workstation into a high-performance computer that outperforms a small cluster. This gives technical professionals a dedicated computing resource at their desk-side that is much faster and more energy-efficient than a shared cluster in the data center. The Tesla C1060 is based on the massively parallel, many-core Tesla processor, which is coupled with the standard CUDA C programming environment to simplify many-core programming.

The C1060 has a two-slot form factor and plugs into a PCIe 2.0 slot. The device houses one GPU containing 30 multiprocessors. Each multiprocessor contains 8 streaming processors, for a total of 240 processing cores. The device has 4 GB of global memory.

While the device supports double-precision computation, it does so at a much lower rate (about 78 GFLOPS peak) than single precision (roughly 1 teraflop peak). NVIDIA's next-generation GPUs should provide 1 teraflop of double-precision computational capacity.
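The single-precision figure of roughly 1 teraflop can be reproduced with back-of-the-envelope arithmetic. The sketch below takes the 240 cores from the text; the 1.296 GHz shader clock and the 3 single-precision operations per core per cycle are assumptions not stated in this post, and the S1070 total assumes its four GPUs run at the same clock as the C1060.

```python
C1060_CORES = 240                 # from the text above
SHADER_CLOCK_GHZ = 1.296          # assumption: C1060 shader clock
SP_FLOPS_PER_CORE_PER_CYCLE = 3   # assumption: multiply-add plus multiply
GPUS_PER_S1070 = 4                # the S1070 matches four C1060 cards


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core issues its maximum every cycle."""
    return cores * clock_ghz * flops_per_cycle


c1060_sp = peak_gflops(C1060_CORES, SHADER_CLOCK_GHZ, SP_FLOPS_PER_CORE_PER_CYCLE)
s1070_cores = GPUS_PER_S1070 * C1060_CORES
s1070_sp = GPUS_PER_S1070 * c1060_sp

print(f"C1060: {C1060_CORES} cores, about {c1060_sp:.0f} GFLOPS single precision")
print(f"S1070: {s1070_cores} cores, about {s1070_sp / 1000:.2f} TFLOPS single precision")
```

These are theoretical peaks; real applications reach only a fraction of them, depending on memory bandwidth and how well the work maps to the hardware.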

For more information http://www.nvidia.com/object/product_tesla_c1060_us.html

Monday, August 10, 2009

RapidMind

Multi-core processors offer tremendous performance gains, but few applications take full advantage of this new technology because of the significant complexity of parallelizing across the multiple cores.

Applications that are not multi-core enabled are at a performance disadvantage: they run on only a single core and do not scale as the number of cores increases.

While multi-threading an application can take advantage of multiple cores, such projects are ambitious, time-consuming and error-prone. Multi-threaded applications are harder to develop and test, and they require a level of development expertise that is difficult to find. Software organizations are all too aware of the real fear of releasing an unstable solution that quickly fails in the field.

Traditional approaches force software organizations to choose between either decreased performance or longer, more expensive development cycles. In most cases, these organizations are limiting themselves to single-core processing and leaving incredible business benefits on the table.

Multi-core processing presents an opportunity for software organizations to gain a competitive advantage. The award-winning RapidMind Multi-core Development Platform simplifies the development of parallel applications, reducing the cost and timelines of software development when compared to multi-threaded projects, and greatly improves the likelihood of project success.

Processors Supported By RapidMind:
AMD and Intel multi-core x86 CPUs
NVIDIA GeForce® 6000, 7000 or 8000 series cards
AMD® FireStream™ 9170
ATI™ x1X00, 2x00 and HD Radeon™ 3870 families of cards
IBM® QS21/22 Blade server with Cell BE processor
Cell BE on Sony PlayStation®3 using Yellow Dog™ Linux

For more information http://www.rapidmind.com

Advanced Derivatives Solutions (Q-GPU)

Q-GPU (Quantara-GPU) is a high-performance options analytics package for pricing and risk-managing exotic structures. Q-GPU is built on NVIDIA CUDA high-performance computing technology and prices a wide range of interest rate structures using state-of-the-art stochastic volatility and multi-factor models.

Running on a GPU, these analytics are 40 to 100 times faster than when running on a CPU. Q-GPU is currently being extended to include equity, foreign exchange and credit derivatives.

For more information http://www.aderivatives.com/index.html