Optimized app ?
Author | Message |
---|---|
Joined: 5 Jul 15 Posts: 20 Credit: 6,490,932 RAC: 0 |
AVX2 version. My laptop trashed all the units using this one, I reverted to the old one and it is fine again. |
Joined: 22 Oct 15 Posts: 3 Credit: 719,262 RAC: 0 |
^ Same result. On W7 i5 2400 |
Joined: 8 Jun 15 Posts: 1 Credit: 8,103,612 RAC: 0 |
Hi, they go fine on 5930 + 5960s. Ross* |
Joined: 22 Apr 15 Posts: 4 Credit: 17,166,398 RAC: 0 |
Quote: "AVX2 version. ^ Same result. On W7 i5 2400" AVX2 is available only on 4th-generation or newer i3/i5/i7 and Xeon processors, so only i3/i5/i7 4xxx+ or Xeon v3+. |
Joined: 9 Apr 15 Posts: 11 Credit: 3,149,460 RAC: 0 |
AVX2 version. Is an AVX "1" version worth it at all? I have a 3rd generation CPU in my main laptop, which only supports AVX, not AVX2. I'll try this app on my 4th generation laptop though. |
Joined: 28 Apr 15 Posts: 29 Credit: 1,426,883 RAC: 0 |
The AVX2 version looks very good to me. For the 600 series, I am getting about 2 minutes 32 seconds running on four cores of an i7-4771 (another core supports a GPU on Folding, and the other three cores are largely free). This is on Win7 64-bit. The core temps are a little higher, which is usual for AVX2 work, averaging about 70 C now. |
Joined: 10 May 15 Posts: 1 Credit: 8,898,939 RAC: 0 |
Is there a 32-bit Windows app available? I have 3 machines that are unable to run 64-bit Windows. |
Joined: 9 Apr 15 Posts: 11 Credit: 3,149,460 RAC: 0 |
|
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
I'm not convinced by the AVX2 application ... on my i7 4710HQ, it's actually slower than the previous application :( The only case where it's faster is if I run it on only 4 cores instead of 8 threads. But in that case, the speed improvement is not enough to compensate for the loss of the 4 other processes. Doing 4 WUs every 9min30 ends up producing less than doing 8 WUs in 12 minutes with the previous application (with the new one, 8 WUs take between 12 and 15 minutes). |
Joined: 16 Apr 15 Posts: 20 Credit: 5,195,178 RAC: 0 |
Quote: "I'm not convinced by the AVX2 application ... on my i7 4710HQ, it's actually slower than the previous application :(" It's normal behaviour. |
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
So this kind of optimization shouldn't be used on HT processors? It might be complicated to include such logic in the assignment process or in the code itself :( |
Joined: 18 Oct 15 Posts: 3 Credit: 210,007 RAC: 0 |
Yes, on the Asteroids@Home message boards they briefly explain why AVX isn't recommended for CPUs with Hyper-Threading, see here. Moreover, on the PrimeGrid message boards it is pointed out that simulating double the number of cores causes the chip to produce more heat. I've noticed that AVX2 is significantly slower than SSE4.1 on my HT Intel (Haswell) CPU too. EDIT: URL added |
Joined: 28 Apr 15 Posts: 29 Credit: 1,426,883 RAC: 0 |
My tests show the following on an i7-4771 CPU, Win7 64-bit. Comparison of Sesef's DENIS optimizations on the "3XP 1800" series work units. With Sesef 1.6.1 AVX2 optimization: DENIS running on 8 virtual cores: 9 minutes 42 seconds (CPU temp 63 C average); DENIS running on 4 virtual cores (other 4 cores free): 7 minutes 7 seconds (CPU temp 55 C average). With Sesef 1.5.5 SSE3 optimization: DENIS running on 8 virtual cores: 10 minutes 43 seconds (CPU temp 60 C average); DENIS running on 4 virtual cores (other 4 cores free): 8 minutes 9 seconds (CPU temp 54 C average). So in each case, the AVX2 optimization is faster than the SSE3 optimization. I doubt that Sesef would have released it otherwise. However, the temps can build up, especially if you have a GPU card. That could cause throttling of the CPU in some cases, thus lowering its speed. |
Joined: 18 Oct 15 Posts: 3 Credit: 210,007 RAC: 0 |
I assume you don't know about Crunch3r's SSE4.1 app version. This is the one I'm referring to (not Sesef's SSE3 one): it's two times faster than AVX2 on my CPU when running one WU at a time without any other application (distributed computing or non-DC), so no throttling at all. I know it's a really simple scenario but it allows you to understand things easily. Sure enough, on other CPUs it will perform differently than on my Haswell CPU with Hyper-Threading and a factory power limit (which I removed through Intel Extreme Tuning Utility, though), but I wrote clearly that I was referring to my particular case. EDIT: Corrected typos |
Joined: 9 Apr 15 Posts: 11 Credit: 116,882,448 RAC: 2 |
I would like to test the AVX2 application but it seems to be Windows only. Is it possible to make a Linux version? |
Joined: 28 Apr 15 Posts: 29 Credit: 1,426,883 RAC: 0 |
Quote: "I assume you don't know about Crunch3r's SSE4.1 app version. This is the one I'm referring to (not Sesef's SSE3 one): it's two times faster than AVX2 on my CPU when running one WU at a time without any other application (distributed computing or non-DC), so no throttling at all. I know it's a really simple scenario but it allows you to understand things easily." I tried Crunch3r's app when it first came out; it was slightly faster than Sesef's, but as far as I recall not better than the AVX2 version. I hope that is simple enough. |
Joined: 18 Oct 15 Posts: 3 Credit: 210,007 RAC: 0 |
It varies from CPU to CPU: on my system (a laptop PC) it works as I said. On your system (which I assume is a desktop PC) it works as you said. If you don't believe me, I don't care, but a quick search of the web shows that some people see better results with SSEx than AVXx in some projects (not only DENIS). The only way to be sure is to try it ourselves. |
Joined: 12 Jul 15 Posts: 7 Credit: 43,028,399 RAC: 0 |
On an i7-4770K the AVX2 version is clearly faster: AVX2 : http://i826.photobucket.com/albums/zz182/mm_67/sse41_zpsqpu1afzg.jpg SSE4.1 : http://i826.photobucket.com/albums/zz182/mm_67/avx2_zpsi4aydh50.jpg |
Joined: 3 Nov 15 Posts: 23 Credit: 2,254,547 RAC: 0 |
Denis performance seems to be strongly related to the compiler version and the way the compiler handles the math library. I built Denis in an Ubuntu 14.04 VirtualBox VM with two compilers: gcc v4.8.4 and Intel icc v16. I ran the current Denis 64-bit application and the recompiled gcc and icc versions. The user times were 8.872s, 7.888s and 2.996s respectively on the VM running on my i7-5930K CPU. The perf tool reports show that the gcc build spends the majority of its time in the power and exponential functions. The icc version seems to be about 2x to 3x faster than the gcc versions when using the same (I think) standard libm libraries. The gcc version seems to spend about 75% of its execution time in the power and exponential libm functions. The icc version seems to be able to eliminate most of that time.<br><br>`time ./CRLP2011EPI_105_x86_64-pc-linux-gnu in`<br>`real 0m 10.958s`<br>`user 0m 8.872s`<br>`sys 0m 0.016s`<br><br>`rjs@rjs-VirtualBox:~/boinc/denis/denis-boinc-baseapp$ time ./denis.icc in`<br>`real 0m 5.583s`<br>`user 0m 2.996s`<br>`sys 0m 0.028s`<br><br>`rjs@rjs-VirtualBox:~/boinc/denis/denis-boinc-baseapp$ time ./denis.g++ in`<br>`real 0m 10.465s`<br>`user 0m 7.888s`<br>`sys 0m 0.020s`<br><br>`export BDIR=/home/rjs/boinc/source/boinc`<br>`export CC=icc`<br>`export CC=g++`<br>`OPT=" -g -O3 "`<br>`$CC $OPT app.cpp -I$BDIR -I$BDIR/api -I$BDIR/lib -lboinc -lboinc_api -o denis.$CC`<br><br>`rjs@rjs-VirtualBox:~/boinc/denis/denis-boinc-baseapp$ icc -v`<br>`icc version 16.0.0 (gcc version 4.8.0 compatibility)`<br><br>`rjs@rjs-VirtualBox:~/boinc/denis/denis-boinc-baseapp$ g++ -v`<br>`Using built-in specs.`<br>`COLLECT_GCC=g++`<br>`COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper`<br>`Target: x86_64-linux-gnu`<br>`Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04'`<br>`gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)`<br><br>Perf from the icc run "denis.icc in":<br>`41.38% denis.icc denis.icc [.] _Z12computeRatesdPdS_S_S_`<br>`37.36% denis.icc denis.icc [.] __libm_exp_e7`<br>`9.07% denis.icc denis.icc [.] __libm_pow_e7`<br>`4.91% denis.icc denis.icc [.] main`<br>`3.71% denis.icc denis.icc [.] __libm_log_e7`<br>`1.99% denis.icc denis.icc [.] exp`<br>`0.28% denis.icc denis.icc [.] boinc_time_to_checkpoint@plt`<br><br>Perf from the g++ run "denis.g++ in":<br>`31.43% denis.g++ libm-2.19.so [.] __ieee754_pow_sse2`<br>`30.57% denis.g++ libm-2.19.so [.] __ieee754_exp_sse2`<br>`15.96% denis.g++ libm-2.19.so [.] __exp1`<br>`9.97% denis.g++ denis.g++ [.] _Z12computeRatesdPdS_S_S_`<br>`3.62% denis.g++ libm-2.19.so [.] __ieee754_log_sse2`<br>`3.21% denis.g++ libm-2.19.so [.] __GI___exp`<br>`2.18% denis.g++ libm-2.19.so [.] __pow`<br>`1.34% denis.g++ denis.g++ [.] _Z10solveModeliPdS_S_S_6CONFIGRS_i`<br>`0.35% denis.g++ libm-2.19.so [.] @plt`<br><br>`ldd denis.icc`<br>`linux-vdso.so.1 => (0x00007fff7ffd6000)`<br>`libboinc.so.7 => /usr/lib/libboinc.so.7 (0x00007f4c7a288000)`<br>`libboinc_api.so.7 => /usr/lib/libboinc_api.so.7 (0x00007f4c7a068000)`<br>`libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4c79d62000)`<br>`libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4c79a5e000)`<br>`libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4c79848000)`<br>`libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4c79483000)`<br>`libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4c7927f000)`<br>`libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4c79061000)`<br>`/lib64/ld-linux-x86-64.so.2 (0x00007f4c7a4fd000)`<br><br>`ldd denis.g++`<br>`linux-vdso.so.1 => (0x00007ffc6f3bd000)`<br>`libboinc.so.7 => /usr/lib/libboinc.so.7 (0x00007fd8b6b09000)`<br>`libboinc_api.so.7 => /usr/lib/libboinc_api.so.7 (0x00007fd8b68e9000)`<br>`libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd8b65e5000)`<br>`libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd8b62df000)`<br>`libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd8b60c9000)`<br>`libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd8b5d04000)`<br>`libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd8b5ae6000)`<br>`/lib64/ld-linux-x86-64.so.2 (0x00007fd8b6d7e000)` |
Joined: 9 Apr 15 Posts: 172 Credit: 1,552,856 RAC: 0 |
Quote: "The perf tool reports show that GCC spends a majority of time in the power and exponential functions. The icc version seems to be about 2x to 3x faster than the gcc versions when using the same (I think) standard libm libraries." Some questions: 1) Is Intel icc free? Can the project use this compiler? 2) Do you plan to release your app version, like Sesef/Crunch3r? 3) Is it an SSE3, SSE4.1 or AVX app? 4) Do you get the same results on Windows? 5) Is this app compatible with AMD CPUs? |
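Several posts in this thread turn on whether a given CPU supports AVX2 at all (for example, the i5 2400 does not). On Linux, one way to check is to read the `flags` line of `/proc/cpuinfo`. The sketch below is only an illustration: `cpu_flags` and `best_build` are hypothetical helper names, and the mapping from flags to builds (AVX2, SSE4.1, SSE3) simply mirrors the app versions mentioned in the thread, not any official selection logic of the project. `pni` is the name the Linux kernel uses for SSE3.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. non-x86 system)


def best_build(flags):
    """Pick the most advanced build the CPU can run (hypothetical mapping)."""
    for flag, build in (("avx2", "AVX2"), ("sse4_1", "SSE4.1"), ("pni", "SSE3")):
        if flag in flags:
            return build
    return "stock"


# Only runs the live check where /proc/cpuinfo exists (Linux).
cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    print("Suggested build:", best_build(cpu_flags(cpuinfo.read_text())))
```

A CPU that reports `avx` but not `avx2`, such as a 2nd or 3rd generation Core, would fall through to the SSE4.1 build under this mapping.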
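The argument about Hyper-Threading earlier in the thread is about throughput rather than the time of a single work unit: fewer concurrent WUs finish sooner, but fewer finish per hour. A minimal sketch of that arithmetic follows, using only the timings quoted in the posts for the i7 4710HQ and the i7-4771. The function names and labels are made up for illustration, and the sketch assumes all concurrent WUs in a batch finish together.

```python
def wus_per_hour(concurrent_wus, minutes_per_batch):
    """Work units completed per hour when a batch of WUs finishes together."""
    return concurrent_wus * 60.0 / minutes_per_batch


def minutes(m, s=0):
    """Convert minutes and seconds to fractional minutes."""
    return m + s / 60.0


# Timings as reported in the thread (label, concurrent WUs, batch time).
runs = [
    ("i7 4710HQ, old app, 8 threads", 8, minutes(12)),
    ("i7 4710HQ, AVX2, 8 threads (worst case)", 8, minutes(15)),
    ("i7 4710HQ, AVX2, 4 threads", 4, minutes(9, 30)),
    ("i7-4771, AVX2, 8 threads", 8, minutes(9, 42)),
    ("i7-4771, AVX2, 4 threads", 4, minutes(7, 7)),
    ("i7-4771, SSE3, 8 threads", 8, minutes(10, 43)),
    ("i7-4771, SSE3, 4 threads", 4, minutes(8, 9)),
]

rates = {label: wus_per_hour(n, t) for label, n, t in runs}
for label, rate in rates.items():
    print(f"{label:42s} {rate:6.1f} WU/h")
```

On these figures, running 8 threads gives more WUs per hour than running 4 on both machines, even though each individual WU takes longer.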
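The perf listing for the g++ build quoted above can be totalled per shared object to see how much of the run is spent inside libm. The following is a rough sketch, not part of the original post: `share_by_dso` is a hypothetical helper, the sample lines are copied from the g++ perf report in the thread, and the parser assumes each line has the form `percent command shared-object [.] symbol`.

```python
from collections import defaultdict

# Lines copied from the g++ perf report posted in the thread.
PERF_GPP = """\
31.43% denis.g++ libm-2.19.so [.] __ieee754_pow_sse2
30.57% denis.g++ libm-2.19.so [.] __ieee754_exp_sse2
15.96% denis.g++ libm-2.19.so [.] __exp1
9.97% denis.g++ denis.g++ [.] _Z12computeRatesdPdS_S_S_
3.62% denis.g++ libm-2.19.so [.] __ieee754_log_sse2
3.21% denis.g++ libm-2.19.so [.] __GI___exp
2.18% denis.g++ libm-2.19.so [.] __pow
1.34% denis.g++ denis.g++ [.] _Z10solveModeliPdS_S_S_6CONFIGRS_i
0.35% denis.g++ libm-2.19.so [.] @plt
"""


def share_by_dso(report):
    """Sum the overhead percentages per shared object (third column)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        # Skip anything that does not look like a perf report row.
        if len(fields) < 3 or not fields[0].endswith("%"):
            continue
        totals[fields[2]] += float(fields[0].rstrip("%"))
    return dict(totals)


totals = share_by_dso(PERF_GPP)
for dso, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{dso:15s} {pct:6.2f}%")
```

With the lines as posted, the libm entries add up to about 87% of samples and the application's own code to about 11%, which is in line with the poster's observation that the g++ build is dominated by libm calls.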