New version with checkpoints // Nueva versión con checkpoints

Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 318
Credit: 3,753,399
RAC: 582
Message 3110 - Posted: 26 May 2025, 10:08:44 UTC

Dear Volunteers,

Over the past few weeks, we have been testing the server and the new application developed by Iván. After verifying that the application is running well and having a general idea of the server's limits for the application, we have just launched a new version. This new version includes checkpoints so that we can run longer simulations without issues arising from restarts. Debugging checkpoints has always been one of the most challenging aspects when developing new versions, so we will be monitoring how things progress gradually. It will remain in beta mode, so please be patient if this leads to tasks that are not validated or other types of problems. One of the risks at this stage is that if the simulation does not restart correctly at the checkpoint, it becomes corrupted and the results will not match those submitted by other volunteers.

Thank you very much to everyone who is running our beta version. Without this validation, it would be unthinkable to be able to advance towards creating better simulations.

Sincerely,
Jesús.

===============================================================

Dear volunteers: Over the last few weeks we have been putting the server and the new application developed by Iván to the test. Having seen that the application works well and that we have a reasonably clear idea of the server's limits for the application, we have just launched a new version. This new version includes checkpoints so that we can launch longer simulations without restarts being a problem. Debugging checkpoints has always been one of the hardest parts of developing new versions, so we will see how it goes little by little. It will stay in beta mode, so please be patient if this produces tasks that do not validate or other kinds of problems; that is one of the risks at this point. If the simulation does not restart properly from the checkpoint, it is left corrupted and the results will not match those sent by other volunteers.

Many thanks to all of you who are running our beta version; without this validation it would be unthinkable to move forward towards better simulations.

Sincerely,
Jesús.
Jesús Carro
Universidad San Jorge
@InSilicoHeart
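
To illustrate the risk described in the announcement, here is a minimal sketch of a checkpoint-and-restart loop. It is not the DENIS application's code: the toy `step` function, the JSON checkpoint format, and all names are assumptions made for illustration. It shows the property the beta test is checking, namely that a run interrupted and resumed from its checkpoint ends in exactly the same state as an uninterrupted run.

```python
import json
import os
import tempfile


def step(state):
    """Advance a toy 'simulation' by one step (a stand-in for a real solver)."""
    state["t"] += 1
    state["v"] = (state["v"] * 31 + state["t"]) % 1_000_003
    return state


def run(total_steps, ckpt_path, stop_after=None):
    """Run until total_steps, resuming from ckpt_path if it already exists.

    stop_after simulates an interruption after that many steps in this session.
    Returns the final state, or None if the session was interrupted.
    """
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)          # resume from the saved state
    else:
        state = {"t": 0, "v": 1}          # fresh start

    done_this_session = 0
    while state["t"] < total_steps:
        if stop_after is not None and done_this_session >= stop_after:
            return None                   # interrupted; progress is on disk
        state = step(state)
        done_this_session += 1
        # Write to a temporary file, then rename, so an interruption
        # mid-write cannot leave a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state


def demo():
    """Compare an uninterrupted run with an interrupted-then-resumed run."""
    with tempfile.TemporaryDirectory() as d:
        straight = run(1000, os.path.join(d, "a.json"))
        path = os.path.join(d, "b.json")
        interrupted = run(1000, path, stop_after=400)
        resumed = run(1000, path)
    return straight, interrupted, resumed
```

In this sketch the checkpoint holds the complete state (`t` and `v`). If any variable the solver depends on were left out of the checkpoint, the resumed run would diverge from the uninterrupted one, and its result would no longer match a wingman's.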

rilian
Joined: 21 May 15
Posts: 31
Credit: 1,023,632
RAC: 4,460
Message 3113 - Posted: 26 May 2025, 17:30:24 UTC - in response to Message 3110.  

Thank you for the update, and good luck with the test!
--
I crunch for Ukraine


Craig S. Weinstein
Joined: 12 Nov 23
Posts: 6
Credit: 1,031,649
RAC: 260
Message 3117 - Posted: 27 May 2025, 1:14:15 UTC

Wait, where can we download the beta version to run that instead?

Keith Myers
Joined: 19 Jun 24
Posts: 5
Credit: 186,233
RAC: 0
Message 3118 - Posted: 27 May 2025, 4:27:07 UTC - in response to Message 3117.  

Quoting Message 3117: "Wait, where can we download the beta version to run that instead?"

Change your project preferences to enable the Denis-Fiber beta app, and also turn on the "Test applications" option.
https://denis.usj.es/denisathome/prefs.php?subset=project&updated=1

Craig S. Weinstein
Joined: 12 Nov 23
Posts: 6
Credit: 1,031,649
RAC: 260
Message 3124 - Posted: 27 May 2025, 11:51:21 UTC - in response to Message 3118.  

Great! Thank you so much!

Warped
Joined: 10 Aug 22
Posts: 6
Credit: 531,913
RAC: 64
Message 3125 - Posted: 27 May 2025, 11:54:50 UTC

Thanks for the update.
I note that the tasks are very long - in excess of 14 hours on my laptop.
Checkpoints are very regular - every few seconds.

Are we required to do anything special to assist with the beta test?
I have suspended and restarted tasks without any noticeable issues.
I have also closed BOINC and restarted the computer without any issues.
So it seems to be going well so far.

Jean-David Beyer
Joined: 6 Mar 23
Posts: 75
Credit: 2,443,839
RAC: 173
Message 3126 - Posted: 27 May 2025, 13:56:09 UTC - in response to Message 3125.  

Quoting Message 3125: "I note that the tasks are very long - in excess of 14 hours on my laptop. Checkpoints are very regular - every few seconds."

These 0.04 tasks are a lot longer than the 0.03 tasks. IIRC, those were around 20 minutes each, and these seem to be around 7 hours on my main (Linux) machine. So maybe my machine is twice as fast as yours.
The current checkpoint was about 13 minutes ago and another one was about 25 minutes ago. I would not call those "every few seconds."

Application Beta of DENIS-fiber 0.04 
Name DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_2-conf_626
State Running
Received Tue 27 May 2025 05:06:22 AM EDT
Report deadline Wed 18 Jun 2025 06:42:21 AM EDT
Estimated computation size 158,446 GFLOPs
CPU time 03:13:18
CPU time since checkpoint 00:12:55
Elapsed time 03:14:00
Estimated time remaining 03:38:07
Fraction done 47.073%
Virtual memory size 5.29 MB
Working set size 3.86 MB

Progress rate 14.400% per hour
Executable denis-fiber_0.04_x86_64-pc-linux-gnu
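
As a rough cross-check of the "around 7 hours" figure, the numbers in the task listing above can be extrapolated linearly. This is only a sketch: the helper names are made up for illustration, and it assumes progress is linear in elapsed time, which may not be how the BOINC client actually computes its estimate.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds):
    """Convert seconds to 'HH:MM:SS', truncating any fractional second."""
    seconds = int(seconds)
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def estimate_total(elapsed_s, fraction_done):
    """Linear extrapolation: total runtime = elapsed / fraction done."""
    return elapsed_s / fraction_done


def estimate_remaining(elapsed_s, fraction_done):
    """Remaining runtime under the same linear assumption."""
    return estimate_total(elapsed_s, fraction_done) - elapsed_s


# Values copied from the task listing: elapsed 03:14:00, 47.073% done.
elapsed = to_seconds("03:14:00")
total = estimate_total(elapsed, 0.47073)
remaining = estimate_remaining(elapsed, 0.47073)
print("estimated total:    ", to_hms(total))
print("estimated remaining:", to_hms(remaining))
```

With these inputs the extrapolated total is a little under 7 hours, and the extrapolated remaining time comes out within a couple of seconds of the 03:38:07 shown in the listing.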


Crystal Pellet
Joined: 16 Jul 15
Posts: 15
Credit: 6,502,377
RAC: 1,382
Message 3127 - Posted: 27 May 2025, 15:40:45 UTC - in response to Message 3126.  

Quoting Message 3126: "The current checkpoint was about 13 minutes ago and another one was about 25 minutes ago. I would not call those 'every few seconds.'"

The checkpoint interval also depends on your computing preferences:

'Request tasks to checkpoint at most every ...... seconds'
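
The effect of that preference can be sketched as follows. This is a simplified model, not the DENIS application's logic: it assumes the application is able to checkpoint after every step and writes one only when at least the preferred number of seconds has passed since the last one (in a real BOINC application this decision is typically delegated to the API's `boinc_time_to_checkpoint()` call). The function name and the 5-second step are assumptions for illustration.

```python
def checkpoints_written(step_seconds, total_seconds, min_interval):
    """Return the times (in seconds) at which checkpoints would be written.

    The application could checkpoint after every step, but only does so
    when at least min_interval seconds have passed since the last one.
    """
    times = []
    last_checkpoint = 0
    t = 0
    while t + step_seconds <= total_seconds:
        t += step_seconds                     # one simulation step finishes
        if t - last_checkpoint >= min_interval:
            times.append(t)                   # checkpoint written here
            last_checkpoint = t
    return times


# One hour of work in 5-second steps, under two different preferences.
print(len(checkpoints_written(5, 3600, 60)))    # preference of 60 seconds
print(checkpoints_written(5, 3600, 1801))       # preference of 1801 seconds
```

Under this model a low preference value gives frequent checkpoints, while a value of 1801 seconds gives at most one checkpoint in the first hour, which may be part of why volunteers report such different checkpoint frequencies.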

denismac
Joined: 7 Mar 23
Posts: 6
Credit: 213,861
RAC: 110
Message 3129 - Posted: 27 May 2025, 15:57:57 UTC

New version work is presently available for download from the DENIS server, but I'm not getting any for my Raspberry Pi 4. The previous beta version worked fine on this Pi. Do I need to reconfigure any settings? If so, what changes are needed? Thanks.

Jean-David Beyer
Joined: 6 Mar 23
Posts: 75
Credit: 2,443,839
RAC: 173
Message 3130 - Posted: 27 May 2025, 17:21:21 UTC - in response to Message 3127.  

Quoting Message 3127: "The checkpoint interval also depends on your computing preference:"

Oops: I forgot about that. Mine is set to 1801 seconds (about every half hour). But so many projects seem to ignore this setting that I had forgotten about it.

rsNeutrino
Joined: 12 Mar 23
Posts: 3
Credit: 4,983,948
RAC: 1,142
Message 3131 - Posted: 27 May 2025, 17:54:28 UTC
Last modified: 27 May 2025, 18:10:15 UTC

[Image: plotted result of InitialTest_k_0-Test_0]

It looks like the test WUs all have the same input parameters, so they should produce the same output, provided the checkpointing is working correctly...

Input parameter files:
md5sum
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_614.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_5-conf_73.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_5-conf_867.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_166.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_425.xml
280ea3d133fb61739e7673ca2b0172bc  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_657.xml

Result files:
md5sum
6b77f91bf95eea017891fb1e167a99f0  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110_1_r1508481016_0
6b77f91bf95eea017891fb1e167a99f0  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560_0_r33409334_0
6b77f91bf95eea017891fb1e167a99f0  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561_0_r403643408_0
bcb751d6c140566a52fccd7da3d5452e  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110_1_r1508481016_1
bcb751d6c140566a52fccd7da3d5452e  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560_0_r33409334_1
bcb751d6c140566a52fccd7da3d5452e  DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561_0_r403643408_1
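
For anyone who wants to repeat this comparison on their own files, a minimal sketch follows. It uses only Python's standard `hashlib`; the function names are made up for illustration, and the file paths you pass in are whatever input or result files you have collected locally.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(paths):
    """Map each MD5 digest to the names of the files that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of(path)].append(Path(path).name)
    return dict(groups)


def report(paths):
    """Print one line per digest, followed by the files that have it."""
    for digest, names in group_by_md5(paths).items():
        print(digest)
        for name in sorted(names):
            print("   ", name)
```

If every copy of the same output file lands in a single group, as in the listing above, the results are byte-for-byte identical across hosts. A file in a group of its own would point to a run that diverged, for example after a bad restart from a checkpoint.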

Warped
Joined: 10 Aug 22
Posts: 6
Credit: 531,913
RAC: 64
Message 3132 - Posted: 27 May 2025, 18:29:31 UTC

My first completed task took 6.5 hrs of CPU time, with a big increase followed by a rapid decrease in the estimated remaining time while it was crunching.
I suppose it's a BOINC quirk, which is no big deal as long as the checkpointing is both regular and successful.
Now I wait for the "wingman".

Sgt.Joe
Joined: 5 Aug 22
Posts: 11
Credit: 2,422,137
RAC: 162
Message 3133 - Posted: 28 May 2025, 2:46:04 UTC - in response to Message 3132.  
Last modified: 28 May 2025, 2:47:47 UTC

These look to run about 15 hours or so on an i7-3770 with Windows 7.

Cheers

[AF>Le_Pommier] Jerome_C2005
Joined: 31 May 15
Posts: 27
Credit: 1,487,475
RAC: 184
Message 3137 - Posted: 28 May 2025, 16:51:07 UTC

Mine are still not finished on my iMac (Intel i9) after 20 hours, at 80%, but an AF colleague's tasks finished after 32 hours (I don't have the specs of his machine) with less than 600 credits granted. I hope the credits will be reevaluated for the final "non beta" app :)

The good news is that they seem to be working fine and not crashing, which is already a good thing.

Lanius collurio
Joined: 5 Apr 25
Posts: 61
Credit: 641,224
RAC: 8,495
Message 3139 - Posted: 28 May 2025, 17:16:47 UTC - in response to Message 3137.  

Credits have increased proportionally with the runtime. The nasty part is when your wingman pulls down the final score because of bad CPU benchmarking.

Sgt.Joe
Joined: 5 Aug 22
Posts: 11
Credit: 2,422,137
RAC: 162
Message 3140 - Posted: 28 May 2025, 19:51:15 UTC - in response to Message 3133.  
Last modified: 28 May 2025, 19:52:09 UTC

Quoting Message 3133: "These look to run about 15 hours or so on an i7-3770 with Windows 7. Cheers"

After running for about 19 hours, it looks more like 22 to 23 hours per work unit on the Windows machine. On an Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz running Linux, they take almost exactly 24 hours, very consistently around that mark.

No crashes on either machine.

Cheers

Vester
Joined: 26 Mar 22
Posts: 1
Credit: 513,147
RAC: 1,635
Message 3141 - Posted: 28 May 2025, 20:06:11 UTC

My tasks are taking about 13 hours on Windows 11 with an Intel i9-10850K at 5GHz. I have had no problems with checkpoints.

Jean-David Beyer
Joined: 6 Mar 23
Posts: 75
Credit: 2,443,839
RAC: 173
Message 3142 - Posted: 28 May 2025, 21:37:44 UTC - in response to Message 3133.  

Quoting Message 3133: "These look to run about 15 hours or so on an i7-3770 with Windows 7."


On my machine, they take a little over 7 hours each.

Estimated computation size 158,446 GFLOPs
CPU time 01:31:53
CPU time since checkpoint 00:02:02
Elapsed time 01:32:43
Estimated time remaining 05:44:03
Fraction done 23.681%
Virtual memory size 5.29 MB
Working set size 3.81 MB
Progress rate 15.480% per hour


CPU type  GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.54.1.el8_10.x86_64|libc 2.28]
BOINC version 	7.20.2
Memory 	125.08 GB
Cache 	        16896 KB
Swap space 	15.62 GB
Total disk space 	488.04 GB
Free Disk Space 	478.53 GB
Measured floating point speed 	5.86 billion ops/sec
Measured integer speed 	       21.28 billion ops/sec


Craig S. Weinstein
Joined: 12 Nov 23
Posts: 6
Credit: 1,031,649
RAC: 260
Message 3148 - Posted: 30 May 2025, 0:48:28 UTC

I tried it on my 2 different laptops (my 2019 MacBook as well as this refurbished gaming laptop running Windows 11), and for either computer, it took roughly 24 hours, spread out over the better part of a week.

I remember the previous set of tasks from a year or so ago, which went by REALLY quick, comparatively speaking; are we ever going to go back to tasks that small and fast, or not really?

Lanius collurio
Joined: 5 Apr 25
Posts: 61
Credit: 641,224
RAC: 8,495
Message 3150 - Posted: 30 May 2025, 5:25:39 UTC - in response to Message 3148.  

I think the longer tasks are more efficient to manage for the research team, possibly decreasing the load on the validator and other parts of the server. Psychologically it's more pleasing to see your PC rip through hundreds of tasks a day instead of 8-12-16 but if it hinders the research...