30 December 2011

CUDA at last

CUDA installed, on a recent attempt

- it does work, there's a nice Mandelbrot demo...
I'm interested in Big Integers. It's not clear how much CUDA speeds these up.
NVIDIA GPUs have a bunch of floating-point multipliers, which are not a natural fit for integer MUL with carry
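
To make "integer MUL with carry" concrete, here is a minimal Python sketch of schoolbook multiplication on 32-bit limbs. The limb size and the helper names (to_limbs, from_limbs, mul_limbs) are assumptions chosen for illustration, not anything taken from CUDA or GMP. Each step of the inner loop needs the carry from the step before it, which is the part that does not split neatly across a lot of independent multipliers.

```python
MASK = 0xFFFFFFFF  # one limb = 32 bits (an assumption for this sketch)


def to_limbs(n):
    """Split a non-negative int into 32-bit limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n & MASK)
        n >>= 32
    return limbs or [0]


def from_limbs(limbs):
    """Rebuild an int from little-endian 32-bit limbs."""
    n = 0
    for limb in reversed(limbs):
        n = (n << 32) | limb
    return n


def mul_limbs(a, b):
    """Schoolbook multiply of two limb arrays with explicit carry."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # product + existing digit + carry still fits in 64 bits
            t = ai * bj + out[i + j] + carry
            out[i + j] = t & MASK   # low 32 bits stay in this position
            carry = t >> 32         # high 32 bits move to the next limb
        out[i + len(b)] = carry     # this slot is still untouched at this point
    return out


if __name__ == "__main__":
    x, y = 2**200 + 12345, 3**100
    print(from_limbs(mul_limbs(to_limbs(x), to_limbs(y))) == x * y)
```

Python's own ints are already arbitrary precision, so this is only to show the carry chain, not something you would use for speed.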

Reports vary from "2X" to "order of magnitude" faster on factoring big integers
nvidia
NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\bin\win64\Release\bandwidthTest.exe Starting...
Running on...
Device 0: GeForce GT 525M
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2330.7
...
NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\bin\win64\Release\deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Found 1 CUDA Capable device(s)
Device 0: "GeForce GT 525M"
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 962 MBytes (1008402432 bytes)
( 2) Multiprocessors x (48) CUDA Cores/MP: 96 CUDA Cores
GPU Clock Speed: 1.20 GHz
...
Obviously, with just a laptop and a GT 525M, I can't be serious about doing big-integer stuff at a competitive level
- but I'm just curious...

Way back, I wrote Miller-Rabin prime tests and a bunch of other functions in 386 assembler
(Kuttaka.exe, it must be out there)
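
For anyone who has not met it, a Miller-Rabin test can be sketched in a few lines of Python. This is a generic textbook version, not the Kuttaka code; the function name is_probable_prime, the default of 20 rounds and the small-prime pre-check are all assumptions made for this example.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, rounds=20):
    """Miller-Rabin: False means composite, True means probably prime."""
    if n < 2:
        return False
    # Cheap trial division first; also handles every n below 41.
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)   # random base
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                     # this base says "maybe prime"
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                 # a is a witness: n is composite
    return True


if __name__ == "__main__":
    print(is_probable_prime(2**61 - 1))
    print(is_probable_prime((2**61 - 1) * (2**31 - 1)))
```

Each round lets a composite slip through with probability at most 1/4, so 20 rounds is plenty for tinkering.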

Occasionally I get a hankering to do big Integers once more

Computers are a million times faster now, they say

GMP is the famous big-integer library now, but it uses gcc and is anti-Windows,
a bitch. Even if you go Linux dual boot, you just know that gcc is gonna throw a bunch of errors


a factoring program: YAFU, from bbuhrow
- This apparently uses GMP, but the exe runs ootb on Windows.
- nice functions like "nextprime"
- factored an 80-digit decimal in 4 minutes (Quadratic sieve?)

I want to use my GPU, so I downloaded
msieve, from gilchrist (this is the "Jeff" who people on http://www.mersenneforum.org/ are often thanking)

the CUDA version
the win32 version worked (factored the 80-digit number in 4 minutes)

18/06/2011 02:07 p.m. 870,912 msieve.exe

but the NVIDIA utility reported GPU activity 'none'

tried the big win64 version but got
"vcomp100.dll is missing" - some CUDA 64/32-bit mismatch?
- not fixed by reloading DirectX, nor by scattering vcomp100.dll or cudart.dll about

17/06/2011  09:23 p.m.         1,171,968 msieve.exe   CUDA confusion?

I may post a complaint on
mersenneforum
but as of now I don't have any idea if my GPU is active or effective

there is a Number Field Sieve implementation, GGNFS (from Jeff)
- requires Python 2.6, which is worth having anyway.
I overwrote with the win32 version of msieve and tried the 80-digit number using factMsieve.py
- it chugged away for half an hour(?), then crashed

At the moment I am trying GGNFS on their 100-digit example - hasn't crashed, is onto the sieve
- from the forum it seems it's all about tweaking parameters

mpir
MPIR is a fork of GMP
MPIR may be non-anti-Windows and may even be faster
I haven't looked to see if it has a library for VC++

do I still have the level of obsession required?...

It was oddly nostalgic to look up an article and realise I had tinkered with its algorithms
back in the 90s

MATHEMATICS OF COMPUTATION
VOLUME 53, NUMBER 187
JULY 1989, PAGES 411-414
A New Method for Producing Large Carmichael Numbers
By H. Dubner
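
As a small taste of the subject (not Dubner's method, which is not reproduced here), the classical Chernick construction says that if 6k+1, 12k+1 and 18k+1 are all prime, their product is a Carmichael number. A minimal Python sketch, with function names chernick and korselt_ok made up for this example and plain trial division standing in for a real primality test:

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def chernick(k):
    """Return (6k+1)(12k+1)(18k+1) if all three factors are prime, else None."""
    factors = (6 * k + 1, 12 * k + 1, 18 * k + 1)
    if all(is_prime(p) for p in factors):
        return factors[0] * factors[1] * factors[2]
    return None


def korselt_ok(n, factors):
    """Korselt's criterion for a product of distinct primes: (p - 1) divides (n - 1)."""
    return all((n - 1) % (p - 1) == 0 for p in factors)


if __name__ == "__main__":
    for k in range(1, 10):
        n = chernick(k)
        if n is not None:
            print(k, n, korselt_ok(n, (6 * k + 1, 12 * k + 1, 18 * k + 1)))
```

k = 1 gives 7 x 13 x 19 = 1729, the smallest number of this form.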

2 comments:

  1. GGNFS factored the 100-digit decimal in 3.17 hours,
    which is phenomenal by 1990s standards.


    MPIR docs say:
    project files for MSVC are provided... For Visual Studio 2010 see the readme.txt file in the build.vc10 directory. The MSVC projects provides full assembler support and for ‘x86_64’ CPU’s this will produce far superior
    results. They don't mention "sandy bridge"...

  2. I guess the 80 digit number was somehow too small (?)
