Optimized app ?
Author | Message |
---|---|
Joined: 13 Apr 15 Posts: 2 Credit: 1,990,224 RAC: 0 |
Not all AMD CPUs: https://software.intel.com/en-us/articles/optimization-notice#opt-en Consider the LLVM/Clang compiler instead of ICC :) |
Joined: 3 Nov 15 Posts: 23 Credit: 2,254,547 RAC: 0 |
The perf tool reports show that the GCC build spends the majority of its time in the power and exponential functions. The icc version seems to be about 2x to 3x faster than the gcc versions when using the same (I think) standard libm libraries. 1. I think the school-sponsored project could use the icc compiler as an educational institution. 2. I am amazed that I was able to build something that will run. I will walk anyone who wants through the process of duplicating it. I did not try the cruncher version. I would probably forward comments to Sesef/Cruncher rather than propagate yet another version. 3. I compiled with -O3. I think the icc default is sse2. I am not sure the full applications will see the same performance improvement. 4. I have only built the one Linux version. 5. It should have no problems on AMD CPUs and, like another comment suggested, I will look at Clang. |
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
On one machine, from time to time (17 WUs in 5 days), I'm getting this : <core_client_version>7.6.9</core_client_version> And the task fails with the following error : 194 (0xc2) EXIT_ABORTED_BY_CLIENT On another machine, I also got a couple of these : <core_client_version>7.6.9</core_client_version> And the task fails with the following error : 1 (0x1) Unknown error number Does anyone know what's wrong ? |
Joined: 12 Jul 15 Posts: 1 Credit: 40,508 RAC: 0 |
Excuse me, greetings. Where can "libstdc++" for Mac OS be downloaded, and how is it installed? Thanks |
Joined: 3 Nov 15 Posts: 23 Credit: 2,254,547 RAC: 0 |
If the AMD or Intel CPUID bits indicate that the CPU silicon supports a feature, the icc compiler enables and optimizes for that feature. I think the various Intel and AMD CPUs that have a particular feature bit set will likely perform differently and show a different percentage improvement when the optimization is turned on or off. I looked at the clang compiler and generated a Linux64 binary. DENIS is a pretty small application and benefits somewhat from "whole program" optimizations. clang does not support whole-program optimization that I can find. It appears clang would generate slower Linux code than gcc or icc with their whole-program optimizations ... but I am still playing with changes. I do have a statically linked 64-bit Linux DENIS that should run faster on any 64-bit Linux, but it is my first build, so there might be problems that I don't expect. I have not shared anything other than by email, so I will have to figure out how to distribute it if there is any interest. Test machine: Intel i7-4790K CPU @ 4.00GHz, Fedora 21, kernel 4.1.8-100.fc21.x86_64. Average of 20 results for each of three binaries on the same machine (run time / CPU time / credit): Default binary: 4,638.82 / 4,417.49 / 56.07; denis_1.05_x86_64-pc-linux-gnu__sse2: 1,214.26 / 1,156.89 / 46.93; my binary: 1,088.34 / 1,071.15 / 52.40. I am not sure why the project allocated more credit for a shorter run of my binary than for the sse2 one, but I will deal with it. Interesting. |
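The averages quoted in the post above can be turned into speedup ratios and a rough credit rate with a few lines of Python. This is only a sketch: it assumes the run and CPU times are in seconds (the post does not state the unit), and the names `results` and `summarize` are made up for the illustration.

```python
# Averages copied from the post above: (run time, CPU time, credit) per task.
# Assumption: both times are in seconds; the post does not say.
results = {
    "Default binary": (4638.82, 4417.49, 56.07),
    "denis_1.05_x86_64-pc-linux-gnu__sse2": (1214.26, 1156.89, 46.93),
    "my binary": (1088.34, 1071.15, 52.40),
}


def summarize(results, baseline="Default binary"):
    """Return run-time speedup over the baseline and credit per hour of run time."""
    base_run = results[baseline][0]
    summary = {}
    for name, (run_s, _cpu_s, credit) in results.items():
        summary[name] = {
            "speedup": base_run / run_s,
            "credit_per_hour": credit / (run_s / 3600.0),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(results).items():
        print(f"{name}: {row['speedup']:.2f}x, "
              f"{row['credit_per_hour']:.1f} credits/hour")
```

With these numbers the sse2 binary comes out at roughly 3.8x the default binary and the custom build at roughly 4.3x, measured on run time.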
Joined: 13 Apr 15 Posts: 2 Credit: 1,990,224 RAC: 0 |
Well, you can read this article: http://clang.llvm.org/comparison.html and http://www.phoronix.com/scan.php?page=article&item=gcc49_compiler_llvm35&num (old Phoronix benchmark tests of GCC vs. Clang) :) I think that the GCC compiler is for "general purpose" use ;) PS: I hate the "anti-spam Akismet." :( |
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
On one machine, from time to time (17 WUs in 5 days), I'm getting this : No one ? I'm still getting some "finish file present too long" errors sometimes ... |
Joined: 9 Apr 15 Posts: 35 Credit: 5,972,800 RAC: 0 |
On one machine, from time to time (17 WUs in 5 days), I'm getting this : G'Day toTOW, I don't really know what the issue might be, but I did notice that the computer you are having the most trouble with (it has had 94 errors) is ID: 48314, an i7 with 8 CPUs and 4 GB RAM. Could the amount of RAM be the issue, as it has the least memory (your i3 has 12 GB and the other i7s have up to 16 GB)? If the computer runs out of memory and sits there waiting until it can get some, could that be generating the "finish file present too long" errors? Just a thought. Conan |
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
I don't know ... the DENIS application doesn't need a lot of memory, so it shouldn't be an issue. Unless this occurs when the DENIS app is allocating memory at the same time as other programs ? |
Joined: 3 Nov 15 Posts: 23 Credit: 2,254,547 RAC: 0 |
DENIS only takes about 4 MB per task. The BOINC message "finish file present too long" does not mean that the file is too long; it seems to mean that the DENIS finish file was written but the DENIS task has not yet completed and exited. You can use the Task Manager to see whether you are exhausting memory and possibly paging to disk, which would widen the "race" window. Example: https://boinc.berkeley.edu/dev/forum_thread.php?id=10354 Look at the Task Manager CPU usage or the process view; if you are oversubscribing the CPU, you might set the BOINC Manager to reduce the number of BOINC tasks. BOINC will allow up to 1 job per CPU, BUT if you start too many, normal tasks will fight for the CPU and the BOINC jobs might actually run slower. I adjust so that CPU usage is between 90% and 95%. I usually run 1 task fewer than the number of CPUs. |
Joined: 7 Jul 15 Posts: 28 Credit: 31,154,473 RAC: 0 |
The AVX2 optimization runs faster on my 5930K setup than any other version. The SSE41 version runs faster on the other setups, but only slightly so: a matter of 15 to 28 seconds depending on the CPU. I have several Xeons, and the SLOWEST ones are the dual X5680s with HT ON. But, with 24 threads, it isn't bad at all. However, the 12-thread 5930K with the AVX2 version runs the tasks nearly twice as fast as the X5680s. (Now I want a DUAL 2011v3 setup! LOL!) In any case, I would like to thank ALL the folks again, especially Sesef and Cruncher, for the tremendous help to the project! As an aside, I haven't really experienced any "finish file too long" errors that I know of. 8-) PS: Yes, I am addicted to BOINC... :-P |
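Which optimized variant a given Linux host can run depends on the feature flags its CPU reports. The sketch below reads those flags from `/proc/cpuinfo` text and picks the most advanced variant the CPU supports. The `sse2`, `sse3` and `sse41` suffixes follow the binary names quoted elsewhere in this thread; the `avx2` label is an assumption, since no AVX2 file name appears here, and the function names are hypothetical. Supporting a feature does not guarantee that variant is the fastest on that machine, as the post above shows.

```python
# Variant suffix -> Linux /proc/cpuinfo flag, ordered from newest to oldest.
# "avx2" is a hypothetical label; the other suffixes appear in this thread.
VARIANTS = [
    ("avx2", "avx2"),
    ("sse41", "sse4_1"),
    ("sse3", "pni"),  # Linux reports SSE3 under the name "pni"
    ("sse2", "sse2"),
]


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_variant(flags):
    """Return the first (most advanced) variant whose flag is present, else None."""
    for suffix, flag in VARIANTS:
        if flag in flags:
            return suffix
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(best_variant(cpu_flags(f.read())))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```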
Joined: 16 Nov 15 Posts: 1 Credit: 1,019,375 RAC: 0 |
Very nice, those optimized apps! :D However, as these apps are not officially supported by the project (team), what about the results: are they officially accepted as valid? I ask because I have seen it happen in the past at some other projects that valid results were declared invalid because they were calculated by unsupported (optimized) apps, and points/credits were deducted. Can the project team give a decisive answer on this? |
Joined: 5 Jul 15 Posts: 18 Credit: 6,490,932 RAC: 1 |
One problem could be that he has hyper-threading enabled and could be trying to run 8 units at the same time. BTW, I also set my PCs to use no more than 99% of the available CPUs. |
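The "one task fewer than the number of CPUs" and "99% of the CPUs" suggestions can be related with a little arithmetic. The sketch below assumes that BOINC turns the CPU percentage preference into a task count by rounding `ncpus * pct / 100` down, with a minimum of 1. That rounding rule is an assumption here, not something stated in this thread, and both function names are made up.

```python
def usable_cpus(ncpus, max_pct):
    """CPUs available to BOINC, assuming it rounds ncpus * pct / 100 down (min 1)."""
    return max(1, (ncpus * max_pct) // 100)


def pct_for_tasks(ncpus, tasks):
    """Smallest whole percentage that gives `tasks` CPUs under the same assumption."""
    return -(-tasks * 100 // ncpus)  # ceiling division on integers


if __name__ == "__main__":
    ncpus = 8  # e.g. a 4-core i7 with hyper-threading enabled
    print(usable_cpus(ncpus, 99))
    print(pct_for_tasks(ncpus, ncpus - 1))
```

Under that assumption, a 99% setting on an 8-thread machine leaves one thread free, which matches the habit of running one task fewer than the CPU count.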
Joined: 3 Nov 15 Posts: 23 Credit: 2,254,547 RAC: 0 |
"Very nice those optimized apps! :D" I ran an offline test with a long DENIS input file that I randomly selected. I ran the standard DENIS binary and the 5 crunchr3 optimized Linux binaries (2 32-bit and 3 64-bit versions) on Red Hat RHEL 7.1. The "output" results file matched exactly in all 6 cases ... which rather surprised me. Floating point results rarely match (for me) in the least significant digits. IMO, if the binaries get exactly the same answer, there is not much risk of credit revocation. The standard binary took about 56 min. The 32-bit binaries took about 7 to 8.5 min. The 64-bit binaries took about 6 min. The DENIS input file I randomly selected and used: 1800000 0.002 50 1799000 6 35 0.8 37 1 40 1.2 42 0.9 48 0.8 57 1.1 1 0 Timing on the same hardware (4-core i5 2.7GHz Skylake), as real / user / sys: CRLP2011EPI_105_x86_64-pc-linux-gnu: 56m2.867s / 56m4.681s / 0m0.145s; denis_1.05_x86_32-pc-linux-gnu__sse2: 8m24.371s / 8m22.952s / 0m0.020s; denis_1.05_x86_32-pc-linux-gnu__sse3: 6m55.814s / 6m54.291s / 0m0.021s; denis_1.05_x86_64-pc-linux-gnu__sse2: 6m0.354s / 5m58.778s / 0m0.009s; denis_1.05_x86_64-pc-linux-gnu__sse3: 5m56.852s / 5m55.222s / 0m0.046s; denis_1.05_x86_64-pc-linux-gnu__sse41: 5m53.159s / 5m51.564s / 0m0.019s |
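For anyone who wants to compare the timings above numerically, the `time`-style values can be converted to seconds and divided. This is a minimal sketch using only the "real" values quoted in the post; `REAL_TIMES`, `to_seconds` and `speedups` are names invented for the example.

```python
import re

# Wall-clock ("real") times copied from the post above.
REAL_TIMES = {
    "CRLP2011EPI_105_x86_64-pc-linux-gnu": "56m2.867s",
    "denis_1.05_x86_32-pc-linux-gnu__sse2": "8m24.371s",
    "denis_1.05_x86_32-pc-linux-gnu__sse3": "6m55.814s",
    "denis_1.05_x86_64-pc-linux-gnu__sse2": "6m0.354s",
    "denis_1.05_x86_64-pc-linux-gnu__sse3": "5m56.852s",
    "denis_1.05_x86_64-pc-linux-gnu__sse41": "5m53.159s",
}


def to_seconds(stamp):
    """Convert a shell `time` value such as '56m2.867s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedups(times, baseline):
    """Return baseline wall-clock time divided by each binary's wall-clock time."""
    base = to_seconds(times[baseline])
    return {name: base / to_seconds(value) for name, value in times.items()}


if __name__ == "__main__":
    ratios = speedups(REAL_TIMES, "CRLP2011EPI_105_x86_64-pc-linux-gnu")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```

On these figures the 64-bit sse41 binary is roughly 9.5x faster than the standard binary, and the 32-bit sse2 binary roughly 6.7x.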
Joined: 23 Feb 16 Posts: 1 Credit: 9,177 RAC: 0 |
Is the optimized app usable for the new beta WUs? |
Joined: 11 Apr 15 Posts: 24 Credit: 4,366,045 RAC: 0 |
No. |