Posts by rjs5

1) Message boards : Number crunching : AvX2+ benefits ? (Message 2441)
Posted 3 Jul 2024 by rjs5
Post:
I just looked at the task run times on my computers, and my i3-12100F is twice as fast as my i7-8700k

i3: https://denis.usj.es/denisathome/show_host_detail.php?hostid=238733
i7: https://denis.usj.es/denisathome/show_host_detail.php?hostid=238736

I know the i7 is only 8th gen and the i3 is 12th gen, but still, that's a big difference. Is that the AVX2+ benefit?



The biggest benefit of AVX2 code is in the parallelism of memory moves and math operations. The compiler can take advantage of those benefits ONLY if it can determine that it is safe to do so. If your CPU supports:
387 (x87) = 80-bits = 1 floating point operation
SSE2 = 128-bits = 2 64-bit floating point operations simultaneously
AVX = 256-bits = 4 64-bit floating point operations simultaneously
AVX2 = 256-bits = 4 64-bit floating point operations simultaneously
AVX512 = 512-bits = 8 64-bit floating point operations simultaneously

If the code can be structured so multiple independent FP calculations can be performed together in the same cycle, then you can ALIGN the data so the compiler knows that it can generate code to perform multiple 64-bit operations simultaneously. You have to be careful how you define structures and arrange the data. You can tell the compiler to generate AVX2 code, but if the DATA cannot be (or is not) DEFINED properly, it generates non-parallel code.
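
As a rough sketch of the arithmetic behind the list above: the number of 64-bit values one instruction can process is the register width divided by 64. The register names (xmm, ymm, zmm) are the standard x86 ones; the helper name `lanes` is made up for illustration.

```python
# Number of 64-bit (double precision) lanes per x86 vector register.
# Widths are the architectural register sizes; `lanes` is an
# illustrative helper, not part of any library.
REGISTER_BITS = {"xmm": 128, "ymm": 256, "zmm": 512}

def lanes(register: str, element_bits: int = 64) -> int:
    """Return how many elements of element_bits fit in the register."""
    return REGISTER_BITS[register] // element_bits

for name in REGISTER_BITS:
    print(f"{name}: {lanes(name)} doubles per instruction")
```

A scalar instruction uses one lane no matter how wide the register is, which is why scalar code sees none of this benefit.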

The Denis binary currently does not take advantage of any CPU parallelism. Denis executes a SINGLE operation at a time. Its execution speed is closely related to CPU speed and CACHE sizes.

Crunchers typically run multiple WU, but optimizers usually work on optimizing a single image. Tuning for a fast single image often yields a version that runs slower when you run many WU.

I started multiple Denis WU on my i9-9980XE CPU (which supports AVX512) running Fedora Linux. I used "perf top" to quickly look into the operation of 6 Denis WU executing simultaneously.

perf top

38.37% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] __ieee754_exp_avx
17.42% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] ORd_SS_biphasic::computeRates(double, double*, double*, double*)
16.53% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] __ieee754_pow_sse2
13.86% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] __exp1
4.49% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] __exp
2.38% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] __ieee754_log_avx
0.90% HuVeMOp_0.02_x86_64-pc-linux-gnu [.] 0x0000000000000550

Denis spends 38.37% of its time in the function __ieee754_exp_avx, which is the AVX version of EXP.


Digging down into __ieee754_exp_avx, you see that the 38% of the time is mainly spent in 4 SCALAR double-precision operations. Look at instructions like "vaddsd": the "v" only means it uses the AVX (VEX) encoding, and the "s" means scalar, so only 1 64-bit operation is happening. The binary is only using half of each 128-bit register.

0.00 │ vmovapd %xmm5,%xmm3
1.61 │ vaddsd %xmm1,%xmm3,%xmm3
5.33 │ vaddsd %xmm2,%xmm3,%xmm3
0.01 │ vmovapd %xmm3,%xmm1
0.00 │ vmovapd %xmm7,%xmm3
3.47 │ vaddsd %xmm1,%xmm3,%xmm3
6.67 │ vsubsd %xmm3,%xmm7,%xmm7
6.26 │ vaddsd %xmm7,%xmm1,%xmm1
7.23 │ vmulsd t256+0x8,%xmm1,%xmm1
10.64 │ vaddsd %xmm3,%xmm1,%xmm1

2.47 │ vucomisd %xmm1,%xmm3
1.66 │ jp 3b0
1.77 │ jne 3b0
0.37 │ vmovsd 0x8(%rsp),%xmm0
0.14 │ vmulsd %xmm3,%xmm0,%xmm0
0.00 │1a5: test %bpl,%bpl
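
To put a number on how much of that listing is scalar work, here is a small sketch that totals the sample percentages by instruction suffix. The percentages and mnemonics are transcribed from the listing above; the suffix rule ("sd" = scalar double, "pd" = packed double) is a simplification that holds for these mnemonics but not for every x86 instruction, and the helper names are made up.

```python
# Percentages and mnemonics transcribed from the annotate listing above.
SAMPLES = [
    (0.00, "vmovapd"), (1.61, "vaddsd"), (5.33, "vaddsd"),
    (0.01, "vmovapd"), (0.00, "vmovapd"), (3.47, "vaddsd"),
    (6.67, "vsubsd"), (6.26, "vaddsd"), (7.23, "vmulsd"),
    (10.64, "vaddsd"), (2.47, "vucomisd"), (1.66, "jp"),
    (1.77, "jne"), (0.37, "vmovsd"), (0.14, "vmulsd"),
    (0.00, "test"),
]

def kind(mnemonic: str) -> str:
    """Classify by suffix. Only valid for the mnemonics listed above."""
    if mnemonic.endswith("sd"):
        return "scalar double"
    if mnemonic.endswith("pd"):
        return "packed double"
    return "other"

def totals(samples):
    """Sum the sample percentages for each instruction kind."""
    out = {}
    for pct, mnemonic in samples:
        out[kind(mnemonic)] = out.get(kind(mnemonic), 0.0) + pct
    return out

for name, pct in sorted(totals(SAMPLES).items()):
    print(f"{name:14s} {pct:6.2f}%")
```

The only packed instructions in this excerpt are register-to-register copies (vmovapd), so essentially all of the arithmetic shown is one double at a time.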

I have not looked at the Denis algorithms, structure or source code, but the Linux "perf top" and associated tools are very good for doing most of the optimization work that code needs. If you wire in any special code for optimization .... COMMENT it ... that code will likely go unchanged and may become a bottleneck later.
2) Message boards : Number crunching : Myocyte v0.15 Beta (Message 1806)
Posted 2 Aug 2022 by rjs5
Post:
Hi!
We are experiencing problems with the checkpoint on Windows hosts. We tried adding a temporary file for the checkpoint to avoid corrupted checkpoints, but it fails when it tries to rename it. Because the checkpoint fails, it tries again on every iteration, and that is why the application runs so slowly. I will upload a new version that fixes this as soon as possible.

Many thanks for the comments. Checking your tasks makes it easier to find the problem.

Best,
Jesús.



Myocyte v0.16 Beta runs as expected on both Windows 11 and Linux Fedora in expected time on my machines.

Checkpointing seems to be happening every 2 minutes which seems to be a little too frequent.
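
For reference, the "temporary file plus rename" checkpoint scheme mentioned in the quoted message usually looks something like the sketch below. This is not the project's code (which I have not seen); the function name and file layout are assumptions. The point it illustrates is that the temporary file has to be closed before the rename, because on Windows a file that is still open generally cannot be renamed or replaced.

```python
import os
import tempfile

def write_checkpoint(path: str, data: bytes) -> None:
    """Write data to path via a temporary file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The temp file is closed here, before the rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern a reader sees either the old checkpoint or the new one, never a half-written file. If the rename itself fails, the old checkpoint is still intact, so retrying on every iteration is not necessary.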
3) Message boards : Number crunching : Myocyte v0.15 Beta (Message 1803)
Posted 31 Jul 2022 by rjs5
Post:
They are running OK for me under Ubuntu 20.04.4.
They complete in the usual 51 minutes.
https://denis.usj.es/denisathome/results.php?hostid=215226&offset=0&show_names=0&state=4&appid=


Fedora 36 is working OK too. The two Windows machines are both failing. I have the Boinc Data directory exempted from Norton antivirus so I am pretty sure there is no antivirus involvement.

I ran the free version of Intel Vtune on my system running multiple Denis WU and did not see anything obvious. I will try that again tonight and look again. There has to be something that is different between Windows and Linux. The thing that comes up for me is the difference in the file systems. Linux will allow multiple opens on a file where Windows will not.
4) Message boards : Number crunching : Myocyte v0.15 Beta (Message 1789)
Posted 29 Jul 2022 by rjs5
Post:
v0.15 Beta WU have been running on my Windows system 214476 (clx10980xe-rtx3090) for 4 hours each and indicate they are going to take another 3 days.

Seems like there might be a problem with the long run times.
5) Message boards : News : Results of the second phase of application testing // Resultados de la segunda fase de pruebas de la aplicación (Message 1653)
Posted 9 Jun 2022 by rjs5
Post:
The Denis application uses scalar floating point. It is not even using the 2-way to 8-way double precision parallel capabilities of the CPUs. If you are careful about data placement, the compilers will automatically parallelize the code.
6) Message boards : Number crunching : Very long wus (Message 943)
Posted 18 Jul 2016 by rjs5
Post:
Hi!

We have uploaded a new version of CRLP2011 that has that bug fixed.

Best regards, Joel


Joel,
It would be helpful for you to give information with these "new application" messages.

Which version did you "upload"? I know what version I am running.
Which bug?
What should crunchers do with the JOBS in PROGRESS?
7) Message boards : News : Simulations are back! (Message 927)
Posted 17 Jul 2016 by rjs5
Post:
Did you even enable automatic optimization (SSE x.x or AVX) when compiling the application, to embed all code paths in the same executable?


They compiled with -mtune=generic and -march=x86-64, so the compiler assumes that it can use either x87 instructions or SSE2. There is lots of x87 code.

They use the GNU 4.8.1 compiler so they would have to build special CPU specific binaries to support AVX.

The execution profile indicates that the code does not generate VECTOR instructions so AVX2 would probably generate slower code because of the larger data footprint.
8) Message boards : Number crunching : Very long wus (Message 922)
Posted 16 Jul 2016 by rjs5
Post:
Okay, mine are averaging 16 hours each at this point and haven't finished. There's another thread showing somebody getting 'CPU time exceeded' with 24 hours of work.

Admins, do I cancel or what?

At this point, 183 CPU hours of work are up for grabs.



I am running on a (computer # 52542 ) Haswell i7-5930K CPU @ 3.50GHz and I am seeing the results fall into 3 general time buckets so far: 5,500, 11,000 and 24,000 seconds. I installed Windows updates and rebooted while they were running and the application "restart" times looked rather funny.



State: All (50) · In progress (27) · Validation pending (9) · Validation inconclusive (7) · Valid (4) · Invalid (3) · Error (0)
http://denis.usj.es/denisathome/results.php?hostid=52542
9) Message boards : Number crunching : Very long wus (Message 919)
Posted 16 Jul 2016 by rjs5
Post:
I doubt that DENIS will make the source code available this time.


? Why?


Just my opinion ...

Having multiple copies of the applications floating around introduces additional workload on the project team that they do not have. They are running very thin on resources already.

When an "optimized" application generates a result, how do the project members using the answer ... know if the results are "correct" or not?

If the computed "answer" is what they expected ... is it really correct?
If the computed "answer" is not what they expected ... is it really wrong?
10) Message boards : News : Telegram Channel (Message 918)
Posted 16 Jul 2016 by rjs5
Post:
Dear all,
We have created a Telegram Channel to send you information: https://telegram.me/denisproject. If you join this channel you will receive a Telegram message when we post something in the News Forum and when new simulations are uploaded to the CRLP2011EPI application.

We hope you find it useful.

Best regards,
Jesús


When DENIS POSTS something to the DENIS NEWS FORUM, doesn't it automatically go to the BOINC MANAGER "NOTICES" and alerts users? I see the DENIS NEWS messages there.

Seems like you tried to invent something that already exists ... or did I misunderstand what you were trying to do?
11) Message boards : Number crunching : Very long wus (Message 913)
Posted 16 Jul 2016 by rjs5
Post:
GD_jcarro_20160714202312000000_ThirdSimulations_SteadyState8000Schmidt98_conf_1513.xml_0 is at 4.55% after 1 hour 30 minutes.

This laptop has also crunched through 4 other WUs today, and averaged about 2 hours 30 minutes for each. Those were all "SteadyState6XX" units.


Same here with 8000 version



8-)

Once they are convinced that the algorithm is stable, they should be able to double or triple the performance.

The Windows code I looked at is using a ton of x87 FP math (ugh). They should be using at least the explicitly enabled SSE2 compiler options.

The jobs are spending 60% of the execution time in the "exp" function and the bulk of that time is being spent on the "F2XM1" instruction, which converts the contents of "st0" into 2^st0 - 1 ... microcode cycles like crazy.
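
For anyone wondering why "exp" would lean on an instruction that computes 2^st0 - 1 (the x87 mnemonic is F2XM1): the classic x87 recipe rewrites e^x as 2^(x * log2(e)), splits the exponent into an integer and a fraction, and uses F2XM1 only on the fraction. The sketch below mimics that sequence in Python. It is an illustration of the general technique, not a disassembly of the DENIS binary, and the function names are made up.

```python
import math

def f2xm1(x: float) -> float:
    """Stand-in for the x87 F2XM1 instruction: 2**x - 1 for -1 <= x <= 1."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("F2XM1 is only defined for -1 <= x <= 1")
    return 2.0 ** x - 1.0

def exp_x87_style(x: float) -> float:
    """Compute e**x as 2**(x * log2(e)), split into integer and fraction."""
    y = x * math.log2(math.e)   # like FLDL2E followed by FMUL
    n = round(y)                # like FRNDINT: nearest integer
    f = y - n                   # fraction, |f| <= 0.5
    # F2XM1 on the fraction, add 1 back, then scale by 2**n (like FSCALE).
    return math.ldexp(f2xm1(f) + 1.0, n)

print(exp_x87_style(1.0), math.exp(1.0))
```

Every call goes through several dependent, microcoded steps, which fits with "exp" dominating the profile when the code is built for x87.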

I doubt that DENIS will make the source code available this time.
12) Message boards : Number crunching : Computers running out of work - no work available?? (Message 857)
Posted 17 Apr 2016 by rjs5
Post:
Just checked my computers and one of them is down to its last 5 tasks. I then checked the server status and it seems that there is no more available work?
Can someone please let me know how long this will take to fix?


How do you check the server status? DENIS@home does not seem to offer a way to do it in the places I've seen on other BOINC projects.


They have hidden everything under drop down menus on the HOME page.

Go to the HOME page.
Drop down menu DENIS@HOME at the top right of the page.
"MOUSE OVER" the DENIS@HOME page and select the first entry "PROJECT STATUS".

I think this URL will get you there.

http://denis.usj.es/denishome/project-status/
13) Message boards : Number crunching : WU ready to send but none received (Message 784)
Posted 26 Jan 2016 by rjs5
Post:
Isn't the beta app (Beta Testing Version of D.E.N.I.S Application) already considered a test WU? It seemed redundant to have to choose the test option.

That checkbox got me 10 tasks.


OK, Thanks.


Suggestion to DENIS admins:
Send a BOINC NOTICE to DENIS crunchers that you have BETA workloads and to PARTICIPATE ... CHECK the new PREFERENCE option.

That is really what the BOINC NOTICE is designed for .... to communicate with your crunchers.


"Run only the selected applications Beta Testing Version of D.E.N.I.S Application"
14) Message boards : Number crunching : Optimized app ? (Message 703)
Posted 20 Nov 2015 by rjs5
Post:
Very nice those optimized apps! :D

However... as these apps are not officially supported by the project (team), how about the results, are they officially accepted as being valid?

Meaning, I have seen it happen in the past at some other projects that valid results were declared invalid as they were calculated by unsupported (optimized) apps, and points/credits were deducted.

Can the project team give a decisive answer on this?



I ran an offline test with a long DENIS input file that I randomly selected. I ran the standard DENIS binary and the 5 crunchr3 optimized Linux binaries using Redhat RHEL7.1 OS. I ran the 2 32-bit and 3 64-bit binary versions. The "output" results file matched exactly in all 6 cases ... which rather surprised me. Floating point results rarely match (for me) in the least significant digits.

IMO, if the binaries get the same exact answer, there is not much risk of credit revocation.

The standard binary took about 56 min.
The 32-bit binaries took about 7 min.
The 64-bit binaries took about 6 min.


The DENIS input file I randomly selected and used:

1800000
0.002
50
1799000
6
35 0.8
37 1
40 1.2
42 0.9
48 0.8
57 1.1
1
0

TIMING ON THE SAME HW (4-core i5 2.7GHz SkyLake):

Binary                                   real        user        sys
CRLP2011EPI_105_x86_64-pc-linux-gnu      56m2.867s   56m4.681s   0m0.145s

denis_1.05_x86_32-pc-linux-gnu__sse2     8m24.371s   8m22.952s   0m0.020s
denis_1.05_x86_32-pc-linux-gnu__sse3     6m55.814s   6m54.291s   0m0.021s

denis_1.05_x86_64-pc-linux-gnu__sse2     6m0.354s    5m58.778s   0m0.009s
denis_1.05_x86_64-pc-linux-gnu__sse3     5m56.852s   5m55.222s   0m0.046s
denis_1.05_x86_64-pc-linux-gnu__sse41    5m53.159s   5m51.564s   0m0.019s
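
A quick sketch for turning the timings above into speedup factors. The "real" values are copied from the timings above; the shortened labels and the helper names are made up for readability.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '56m2.867s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# "real" times copied from the timings above (labels shortened).
BASELINE = "CRLP2011EPI_105_x86_64 (standard)"
REAL_TIMES = {
    BASELINE: "56m2.867s",
    "denis_1.05_x86_32 sse2": "8m24.371s",
    "denis_1.05_x86_32 sse3": "6m55.814s",
    "denis_1.05_x86_64 sse2": "6m0.354s",
    "denis_1.05_x86_64 sse3": "5m56.852s",
    "denis_1.05_x86_64 sse41": "5m53.159s",
}

def speedups(times, baseline):
    """Return baseline time divided by each binary's time."""
    base = to_seconds(times[baseline])
    return {name: base / to_seconds(t) for name, t in times.items()}

for name, factor in speedups(REAL_TIMES, BASELINE).items():
    print(f"{name:36s} {factor:5.2f}x")
```

On this one input file the optimized binaries come out somewhere between roughly 6.5x and 9.5x faster than the standard one; other inputs and other hardware may differ.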
15) Message boards : Number crunching : Run or abort? (Message 696)
Posted 18 Nov 2015 by rjs5
Post:
hey rjs5,


ok first: you are right, im using 32 bit linux. i have done that test case and it was solved in about 30 seconds.
(my second denis task is about 29 hours now, 99.98 percent)

results (i think you are more interested in the time):
# time /var/lib/boinc-client/projects/denis.usj.es_denisathome/CRLP2011EPI_105_i686-pc-linux-gnu

real 0m31.036s
user 0m28.904s
sys 0m0.048s


Finishing the "in" file is good. That means that DENIS should be OK. DENIS is just taking a long time running the standard APP at the 500MHz or so that your AMD A8-7100 is clocked at. Denis only requires several MB of memory so it will not strain your capacity. If you run several different apps, I would periodically check for swapping. Swapping could kill you.

The standard DENIS app was taking 10 seconds and the optimized app was taking about 1.5 seconds on a VM that I was testing. I would probably let the 29-hour app finish but would install one of the optimized 32-bit apps which should execute 5 to 10 times faster.

You should be able to successfully run the crunchr optimized SSE3 32-bit linux app and it should run many times faster than the standard DENIS app.


Linux 32bit SSE3(INTEL/AMD compatible):
http://www.boincunited.org/opt_apps/denis/denis_1.05_x86_32-pc-linux-gnu__sse3_v3.tar.bz2

Linux 32bit SSE2(INTEL/AMD compatible):
http://www.boincunited.org/opt_apps/denis/denis_1.05_x86_32-pc-linux-gnu__sse2_v3.tar.bz2


From BritishBob .... install instructions ...
My standard check list when swapping the optimized apps:
-Disable new tasks (in client)
-Let the current tasks finish.
-Close BOINC Client.
-Exit BOINC client in bottom right icon menu thing (name escapes me atm)
-Copy APP to the proper location detailed somewhere above (I have it on a shortcut)
-Open up BOINC
-Allow new tasks
16) Message boards : Number crunching : Run or abort? (Message 693)
Posted 15 Nov 2015 by rjs5
Post:
thx a lot joel, that calms me down...
my second task was estimated around 3 hours and is now at approx. 18 hours, 98,5 percent..
i know, the credit i get for that is zero..
but lets see, what your magic is doing on this calculation-bug ^^



wol,
It looks like the common element on the systems with the long run time is that they are running 32-bit Linux. You might just run a short test on your setup to see if it will complete a DENIS computation. The DENIS source build tree has a 13-line text input test case that should run in less than a minute. It might help isolate the problem.

You can go to the boinc project directory ( probably something like /var/lib/boinc/projects/denis.usj.es_denisathome/ ) and create a file named "in" with the contents of the test case (below). You will see the other configuration input files in that directory named <task>.conf and can look at their contents.

You should find the Denis application executable binary there. It will have a name something like "CRLP2011EPI_105_i686-pc-linux-gnu"

You should be able to execute that application with no parameters and it will look for the "in" file by default and it should create an output file "out" by default.

time ./CRLP2011EPI_105_i686-pc-linux-gnu

TEST CASE lines for "in":
3000
0.002
50
2900
5
26 0.002
36 0.00
41 12.29
47 0.0673
56 25.0
2
0
1

the test case is from:
https://github.com/DENISproject/denis-boinc-baseapp/blob/master/in
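
If you would rather script the steps above, here is a minimal sketch. It writes the 13-line test case to a file named "in" and times one run of the application. The path to the binary is NOT filled in because it depends on your install; pass it in yourself as an absolute path. The helper names are made up.

```python
import os
import subprocess
import time

# The 13 lines of the test case quoted above.
TEST_CASE = """3000
0.002
50
2900
5
26 0.002
36 0.00
41 12.29
47 0.0673
56 25.0
2
0
1
"""

def write_input(directory: str) -> str:
    """Create the default input file "in" in directory; return its path."""
    path = os.path.join(directory, "in")
    with open(path, "w", newline="\n") as handle:
        handle.write(TEST_CASE)
    return path

def time_run(binary: str, directory: str) -> float:
    """Run binary (absolute path) with no arguments inside directory.

    Returns the wall-clock time in seconds. The application is expected
    to read "in" and write "out" in that directory by default.
    """
    start = time.perf_counter()
    subprocess.run([binary], cwd=directory, check=True)
    return time.perf_counter() - start
```

Running it in a scratch directory rather than the live project directory avoids touching anything BOINC is using.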
17) Message boards : Number crunching : Optimized app ? (Message 689)
Posted 13 Nov 2015 by rjs5
Post:


And the task fails with the following error : 1 (0x1) Unknown error number

Does anyone know what's wrong ?

No one ?

I'm still getting some "finish file present too long" errors sometimes ...


G'Day toTOW,

I don't really know what might be at issue but I did notice that the computer you are having the most issues with (it has had 94 errors) is ID: 48314, an i7 with 8 CPUs and 4GB RAM.
Could the amount of RAM be the issue as it has the least memory (your i3 has 12GB and other i7's have up to 16Gb), if the computer runs out of memory and sits there waiting till it can get some it could be generating the long finish files?

Just a thought.

Conan



DENIS only takes about 4 MEGA BYTES per task. The BOINC message "finish file present too long" does not mean that the file is too long but seems to mean that a DENIS FINISH file was written but the DENIS task has not completed and exited yet. You can use the TASK MANAGER to see if you are exhausting memory and possibly paging to disk which would increase the size of a "race" window.

EXAMPLE: https://boinc.berkeley.edu/dev/forum_thread.php?id=10354


Look at the TASK MANAGER CPU USAGE or the PROCESS view and if you are oversubscribing the CPU, you might set the BOINC MANAGER to reduce the number of BOINC tasks.

BOINC will allow up to 1 job per CPU BUT!!! if you start too many, normal tasks will fight for the CPU and the BOINC jobs might actually run slower. I adjust so that CPU usage is between 90% and 95%. I usually run 1 task fewer than the number of CPUs.
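
The "one task fewer than the number of CPUs" rule is easy to turn into the percentage that the BOINC computing preference ("use at most N% of the CPUs") asks for. A small sketch, with a made-up helper name; the reserve of 1 CPU is just the rule of thumb from this post, not a BOINC default.

```python
import os

def boinc_cpu_plan(total_cpus: int, reserve: int = 1):
    """Return (tasks to run, matching 'use at most N% of CPUs' value)."""
    if total_cpus < 1:
        raise ValueError("total_cpus must be at least 1")
    tasks = max(1, total_cpus - reserve)
    percent = 100.0 * tasks / total_cpus
    return tasks, percent

cpus = os.cpu_count() or 1   # os.cpu_count() can return None
tasks, percent = boinc_cpu_plan(cpus)
print(f"{cpus} CPUs: run {tasks} tasks (about {percent:.1f}% of the CPUs)")
```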
18) Message boards : Number crunching : Fatal error: Allowed memory size of 209715200 bytes exhausted (Message 667)
Posted 9 Nov 2015 by rjs5
Post:
Hi!

As you said, we are still learning and you are great crunchers. We started months ago, and the increase in computing power makes us change the limits of the server every.... 2 months, more or less.

This error happens when you crunch so much that the system can't retrieve all those results in a single PHP response. This limit was increased a few months ago, but maybe it could be increased again.

Best regards, Joel.


Hi Joel,
You are wearing many hats and doing a pretty good job. No complaints from me or anyone else that I can detect. I would be curious/interested in a thread describing your growing pains/problems, but that would burn some more of your time.

If you can, I would also be interested in your server host configurations.

thanks again
19) Message boards : Number crunching : Fatal error: Allowed memory size of 209715200 bytes exhausted (Message 663)
Posted 9 Nov 2015 by rjs5
Post:
Perhaps I should slow down and give them time to do their housekeeping !

Probably about 80,000 tasks I think, even so, that's a lot of memory per task.


IMO. Nope ... crunch away. The school is very, very young. San Jorge was established in 2005 and I suspect they are still early in the growth phase.

It would not surprise me to learn that your crunchers are 10x the size of their "server farm".
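
For scale, the number in the error message works out as follows. The 80,000 task count is only the rough guess quoted above, not a measured value.

```python
LIMIT_BYTES = 209_715_200   # from the PHP error message
TASKS = 80_000              # the poster's rough estimate, not measured

limit_mib = LIMIT_BYTES / (1024 * 1024)
bytes_per_task = LIMIT_BYTES / TASKS

print(f"limit: {limit_mib:.0f} MiB")
print(f"budget per task: {bytes_per_task:.0f} bytes")
```

So the limit is 200 MiB, and if the guess is close, the page runs out at roughly 2.6 KB per task.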
20) Message boards : Number crunching : Fatal error: Allowed memory size of 209715200 bytes exhausted (Message 661)
Posted 8 Nov 2015 by rjs5
Post:
Hi,

When logged in to my account I'm getting this error when I click to view Tasks...

Fatal error: Allowed memory size of 209715200 bytes exhausted (tried to allocate 32 bytes) in /home/boincadm/projects/denisathome/html/inc/db_conn.inc on line 119

If I click to view all computers on the account that works and I can see all the tasks for an individual computer. Just can't see them all in one go !


I think you crunched too many tasks for their system ... if so ... congrats!!!

Denis has seen the problem before and I suspect they are probably familiar with a fix. Time ... for tasks to age out of the visibility of the account should fix it if they don't change the max buffer size.
http://denis.usj.es/denisathome/forum_thread.php?id=46&sort=5

