New version with checkpoints
Author | Message |
---|---|
Joined: 18 Mar 15 Posts: 318 Credit: 3,753,399 RAC: 582 |
Dear Volunteers,
Over the past few weeks, we have been testing the server and the new application developed by Iván. Now that we have verified that the application runs well and have a rough idea of the server's limits for it, we have just launched a new version. This version includes checkpoints, so that we can run longer simulations without restarts causing problems.
Debugging checkpoints has always been one of the hardest parts of developing new versions, so we will monitor how things go step by step. The application will remain in beta, so please be patient if this leads to tasks that do not validate or to other kinds of problems. One of the risks at this stage is that if a simulation does not restart correctly from its checkpoint, it becomes corrupted and its results will not match those submitted by other volunteers.
Thank you very much to everyone who is running our beta version. Without this validation, it would be unthinkable to move on to better simulations.
Sincerely, Jesús.
Jesús Carro, Universidad San Jorge, @InSilicoHeart |
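The restart risk described in the announcement can be illustrated with a toy example. The following is a minimal sketch, not the DENIS application: the `step` function, the JSON checkpoint format and the file names are all assumptions made for illustration. It shows the property that validation relies on: a run that is interrupted and resumed from its checkpoint should end in exactly the same state as an uninterrupted run.

```python
import json
import os
import tempfile


def step(state):
    """One deterministic update of a toy 'simulation' (a stand-in, not the DENIS model)."""
    return (state * 1103515245 + 12345) % 2**31


def save_checkpoint(path, iteration, state):
    """Write to a temporary file first, then rename it into place in one step."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump({"iteration": iteration, "state": state}, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (iteration, state) from the checkpoint, or the initial values if none exists."""
    if not os.path.exists(path):
        return 0, 1
    with open(path) as handle:
        data = json.load(handle)
    return data["iteration"], data["state"]


def run(path, total_iterations, stop_after=None):
    """Run from the last checkpoint; optionally stop early to imitate an interruption."""
    iteration, state = load_checkpoint(path)
    while iteration < total_iterations:
        state = step(state)
        iteration += 1
        save_checkpoint(path, iteration, state)
        if stop_after is not None and iteration >= stop_after:
            break
    return iteration, state


with tempfile.TemporaryDirectory() as workdir:
    # Uninterrupted reference run.
    _, reference = run(os.path.join(workdir, "a.json"), 1000)
    # Interrupted run: stop at iteration 400, then resume from the checkpoint.
    interrupted_path = os.path.join(workdir, "b.json")
    run(interrupted_path, 1000, stop_after=400)
    _, resumed = run(interrupted_path, 1000)

print(resumed == reference)
```

If the checkpoint missed any part of the state (here, either `iteration` or `state`), the resumed run would diverge from the reference, which is the kind of mismatch between volunteers' results that the announcement warns about.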
Joined: 21 May 15 Posts: 31 Credit: 1,023,632 RAC: 4,460 |
|
Joined: 12 Nov 23 Posts: 6 Credit: 1,031,649 RAC: 260 |
Wait, where can we download the beta version to run that instead? |
Joined: 19 Jun 24 Posts: 5 Credit: 186,233 RAC: 0 |
"Wait, where can we download the beta version to run that instead?" Change your project preferences to enable the Denis-Fiber beta app, and also turn on the "Test applications" option: https://denis.usj.es/denisathome/prefs.php?subset=project&updated=1 |
Joined: 12 Nov 23 Posts: 6 Credit: 1,031,649 RAC: 260 |
Great! Thank you so much! |
Joined: 10 Aug 22 Posts: 6 Credit: 531,913 RAC: 64 |
Thanks for the update. I note that the tasks are very long - in excess of 14 hours on my laptop. Checkpoints are very regular - every few seconds. Are we required to do anything special to assist with the beta test? I have suspended and restarted tasks without any noticeable issues. I have also closed BOINC and restarted the computer without any issues. So it seems to be going well so far. |
Joined: 6 Mar 23 Posts: 75 Credit: 2,443,839 RAC: 173 |
"I note that the tasks are very long - in excess of 14 hours on my laptop." These 0.04 tasks are a lot longer than the 0.03 tasks. IIRC, those took around 20 minutes each, and these seem to take around 7 hours on my main (Linux) machine. So maybe my machine is twice as fast as yours. The current checkpoint was about 13 minutes ago and another one was about 25 minutes ago. I would not call those "every few seconds."
Application: Beta of DENIS-fiber 0.04
Name: DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_2-conf_626
State: Running
Received: Tue 27 May 2025 05:06:22 AM EDT
Report deadline: Wed 18 Jun 2025 06:42:21 AM EDT
Estimated computation size: 158,446 GFLOPs
CPU time: 03:13:18
CPU time since checkpoint: 00:12:55
Elapsed time: 03:14:00
Estimated time remaining: 03:38:07
Fraction done: 47.073%
Virtual memory size: 5.29 MB
Working set size: 3.86 MB
Progress rate: 14.400% per hour
Executable: denis-fiber_0.04_x86_64-pc-linux-gnu |
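As a rough cross-check of the task properties quoted in the post above, a naive linear extrapolation from elapsed time and fraction done gives a similar remaining time. This is only a sketch: the function names are made up for illustration, and BOINC's own estimator may combine other inputs, so agreement here does not show how the client actually computes its figure.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(seconds):
    """Format a number of seconds as 'HH:MM:SS', truncating fractions of a second."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def linear_remaining(elapsed_seconds, fraction_done):
    """Assume the rest of the task proceeds at the average rate seen so far."""
    return elapsed_seconds * (1.0 - fraction_done) / fraction_done


# Values copied from the task properties in the post above.
elapsed = to_seconds("03:14:00")
fraction_done = 0.47073

remaining = linear_remaining(elapsed, fraction_done)
total_runtime = elapsed / fraction_done

print("estimated remaining:", to_hms(remaining))
print("estimated total:    ", to_hms(total_runtime))
```

With these values the extrapolation gives roughly 3 h 38 min remaining and a total of a little under 7 hours, in line with the "around 7 hours" mentioned in the post.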
Joined: 16 Jul 15 Posts: 15 Credit: 6,502,377 RAC: 1,382 |
"The current checkpoint was about 13 minutes ago and another one was about 25 minutes ago. I would not call those 'every few seconds.'" The checkpoint interval also depends on your computing preference: "Request tasks to checkpoint at most every ...... seconds" |
Joined: 7 Mar 23 Posts: 6 Credit: 213,861 RAC: 110 |
New version work is presently available for download from the DENIS server, but I'm not getting any for my Raspberry Pi 4. The previous beta version worked fine on this Pi. Do I need to reconfigure any settings? If so, what changes are needed? Thanks. |
Joined: 6 Mar 23 Posts: 75 Credit: 2,443,839 RAC: 173 |
"The checkpoint interval also depends on your computing preference:" Oops: I forgot about that. Mine is set to 1801 seconds (about every half hour). But so many projects seem to ignore this that I forgot about it. |
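How a "checkpoint at most every N seconds" preference limits checkpoint frequency can be sketched as follows. This is a hypothetical illustration with a simulated clock; the class and function names are invented, and nothing here is taken from the DENIS application. In the BOINC C/C++ API an application asks the client whether a checkpoint is due with `boinc_time_to_checkpoint()` and reports it with `boinc_checkpoint_completed()`; whether and how a given project honours the preference is up to that project.

```python
class CheckpointThrottle:
    """Allow a checkpoint only if at least `min_interval` seconds passed since the last one."""

    def __init__(self, min_interval, now):
        self.min_interval = min_interval
        self.last_checkpoint = now

    def due(self, now):
        return now - self.last_checkpoint >= self.min_interval

    def mark_done(self, now):
        self.last_checkpoint = now


def simulate(step_seconds, total_seconds, min_interval):
    """Return the simulated times (in seconds) at which checkpoints are written."""
    throttle = CheckpointThrottle(min_interval, now=0)
    checkpoint_times = []
    now = 0
    while now < total_seconds:
        now += step_seconds  # one chunk of simulated work
        if throttle.due(now):
            checkpoint_times.append(now)
            throttle.mark_done(now)
    return checkpoint_times


# One simulated hour of work in 5-second chunks, with two different preferences.
times_60 = simulate(step_seconds=5, total_seconds=3600, min_interval=60)
times_1801 = simulate(step_seconds=5, total_seconds=3600, min_interval=1801)

print(len(times_60), "checkpoints with a 60 s interval")
print(len(times_1801), "checkpoint(s) with a 1801 s interval, at", times_1801)
```

In this sketch the work chunk is the smallest unit at which a checkpoint can happen, so the actual spacing is the preference rounded up to the next chunk boundary.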
Joined: 12 Mar 23 Posts: 3 Credit: 4,983,948 RAC: 1,142 |
Plotted result of InitialTest_k_0-Test_0: [plot]
It looks like the test WUs all have the same input parameters and should therefore produce the same output, provided the checkpointing works correctly...
Input parameter files (md5sum):
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_614.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_5-conf_73.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_5-conf_867.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_166.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_425.xml
280ea3d133fb61739e7673ca2b0172bc DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_6-conf_657.xml
Result files (md5sum):
6b77f91bf95eea017891fb1e167a99f0 DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110_1_r1508481016_0
6b77f91bf95eea017891fb1e167a99f0 DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560_0_r33409334_0
6b77f91bf95eea017891fb1e167a99f0 DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561_0_r403643408_0
bcb751d6c140566a52fccd7da3d5452e DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_110_1_r1508481016_1
bcb751d6c140566a52fccd7da3d5452e DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_560_0_r33409334_1
bcb751d6c140566a52fccd7da3d5452e DENIS_Fiber_Beta_20250527101314143177_InitialTest_k_0-Test_0-conf_561_0_r403643408_1 |
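The md5sum comparison in the post above can be automated by grouping outputs by digest and flagging any that stand alone. This is a minimal sketch using in-memory byte strings as hypothetical stand-ins for result files; the names and contents below are invented for illustration. For real files, the bytes would come from `open(path, "rb").read()`.

```python
import hashlib
from collections import defaultdict


def md5_of_bytes(data):
    """Hex MD5 digest of a byte string, the same value md5sum prints for a file."""
    return hashlib.md5(data).hexdigest()


def group_by_digest(named_blobs):
    """Map each digest to the list of names whose content hashes to it."""
    groups = defaultdict(list)
    for name, data in named_blobs.items():
        groups[md5_of_bytes(data)].append(name)
    return dict(groups)


# Hypothetical stand-ins for result files returned by different hosts.
results = {
    "conf_110_result_0": b"voltage trace A",
    "conf_560_result_0": b"voltage trace A",
    "conf_561_result_0": b"voltage trace A",
    "conf_999_result_0": b"voltage trace B",  # e.g. a run that resumed incorrectly
}

groups = group_by_digest(results)
outliers = [names for names in groups.values() if len(names) == 1]

for digest, names in groups.items():
    print(digest, names)
print("outliers:", outliers)
```

Identical digests across work units with identical inputs are what one would expect if checkpointing is deterministic; a lone digest points at a result worth inspecting.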
Joined: 10 Aug 22 Posts: 6 Credit: 531,913 RAC: 64 |
My first completed task took 6.5 hours of CPU time, with a big increase followed by a rapid decrease in the estimated remaining time while it was crunching. I suppose that's a BOINC quirk, which is no big deal as long as the checkpointing is both regular and successful. Now I wait for the "wingman". |
Joined: 5 Aug 22 Posts: 11 Credit: 2,422,137 RAC: 162 |
These look to run about 15 hours or so on an i7-3770 with Windows 7. Cheers |
Joined: 31 May 15 Posts: 27 Credit: 1,487,475 RAC: 184 |
Mine are still not finished on my iMac (Intel i9) after 20 hours, at 80%, but an AF colleague's tasks finished after 32 hours (I don't have the specs of his machine), with fewer than 600 credits granted. I hope the credits will be reevaluated for the final, non-beta app :) The good news is that the tasks seem to be working fine and not crashing, which is already a good thing. |
Joined: 5 Apr 25 Posts: 61 Credit: 641,224 RAC: 8,495 |
Credits have increased proportionally with the runtime; the nasty part is when your wingman pulls down the final score because of bad CPU benchmarking. |
Joined: 5 Aug 22 Posts: 11 Credit: 2,422,137 RAC: 162 |
"These look to run about 15 hours or so on an i7-3770 with Windows 7." After running for about 19 hours, it looks more like 22 to 23 hours per work unit on the Windows machine. On an Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz running Linux, they take almost exactly 24 hours, very consistently around that mark. No crashes on either machine. Cheers |
Joined: 26 Mar 22 Posts: 1 Credit: 513,147 RAC: 1,635 |
My tasks are taking about 13 hours on Windows 11 with an Intel i9-10850K at 5GHz. I have had no problems with checkpoints. |
Joined: 6 Mar 23 Posts: 75 Credit: 2,443,839 RAC: 173 |
"These look to run about 15 hours or so on an i7-3770 with Windows 7." On my machine, they take a little over 7 hours each.
Estimated computation size: 158,446 GFLOPs
CPU time: 01:31:53
CPU time since checkpoint: 00:02:02
Elapsed time: 01:32:43
Estimated time remaining: 05:44:03
Fraction done: 23.681%
Virtual memory size: 5.29 MB
Working set size: 3.81 MB
Progress rate: 15.480% per hour
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Operating System: Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.54.1.el8_10.x86_64|libc 2.28]
BOINC version: 7.20.2
Memory: 125.08 GB
Cache: 16896 KB
Swap space: 15.62 GB
Total disk space: 488.04 GB
Free disk space: 478.53 GB
Measured floating point speed: 5.86 billion ops/sec
Measured integer speed: 21.28 billion ops/sec |
Joined: 12 Nov 23 Posts: 6 Credit: 1,031,649 RAC: 260 |
I tried it on my two laptops (my 2019 MacBook and a refurbished gaming laptop running Windows 11), and on each of them it took roughly 24 hours, spread out over the better part of a week. I remember the previous set of tasks from a year or so ago, which went by REALLY quickly by comparison. Are we ever going to go back to tasks that small and fast, or not really? |
Joined: 5 Apr 25 Posts: 61 Credit: 641,224 RAC: 8,495 |
I think the longer tasks are more efficient for the research team to manage, possibly decreasing the load on the validator and other parts of the server. Psychologically, it's more pleasing to see your PC rip through hundreds of tasks a day instead of 8, 12 or 16, but if it hinders the research... |