Thought on Invalid Work Units.

Message boards : Number crunching : Thought on Invalid Work Units.

cuphi
Joined: 9 Aug 22
Posts: 10
Credit: 2,219,354
RAC: 0
Message 2379 - Posted: 28 Apr 2024, 1:48:17 UTC
Last modified: 28 Apr 2024, 1:49:29 UTC

When I can, I crunch numbers for DENIS@home using my Ryzen 9 3950X and Ryzen 7 3700X. In the BIOS of these two systems I have turned off the Performance Boost option so that they never go above their base clock speeds (3.6 GHz for the 3700X and 3.5 GHz for the 3950X). 24 hours in a day times 60 minutes in an hour gives us 1440 minutes in a day. My average time to complete a WU is about 72 minutes (this is also very close to the global average completion time of all volunteers donating time to the project), so 1440/72 = 20 WU's per thread per day, and with 48 threads running 24/7 I complete very close to 960 WU's a day.
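For anyone who wants to check the arithmetic above, here is a minimal Python sketch. The function name `wus_per_day` is made up for illustration; the inputs (72 minutes per WU, 48 threads) are the figures quoted in the post.

```python
# Daily throughput estimate using the figures quoted in the post.
MINUTES_PER_DAY = 24 * 60      # 1440 minutes in a day
AVG_MINUTES_PER_WU = 72        # average completion time quoted in the post
THREADS = 48                   # total threads across both machines


def wus_per_day(threads: int, minutes_per_wu: float) -> float:
    """Work units finished per day if every thread runs 24/7."""
    return threads * MINUTES_PER_DAY / minutes_per_wu


print(wus_per_day(1, AVG_MINUTES_PER_WU))        # 20.0 per thread
print(wus_per_day(THREADS, AVG_MINUTES_PER_WU))  # 960.0 in total
```

This assumes every thread is busy around the clock with no idle time between tasks, so it is an upper bound rather than a measured figure.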

Now then, very close to 1% of my WU's come back as being invalid, and although I have no idea how many WU's are completed in total every day, I can still extrapolate some data by using an arbitrary number, say 100,000. If the total number of WU's done in a day is 100,000 and the invalid rate holds steady at around 1%, that means about 1,000 invalid tasks are generated in a day, which is more than I can crunch in a day. Assuming all invalid tasks are spread evenly across all volunteers running DENIS@home, we are collectively wasting enough electricity to power 1.85 teraflops of calculations (see the totals at the bottom of this chart: https://denis.usj.es/denisathome/cpu_list.php). That's a lot of CO2 being pumped into the atmosphere, especially when you add in the power needed to keep all the systems cool.
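The extrapolation can be sketched the same way. The 100,000 total is the arbitrary figure from the post, not a measured project statistic, and `invalid_per_day` is a hypothetical helper name.

```python
# Rough estimate of wasted work, using the post's arbitrary daily total.
ASSUMED_TOTAL_WUS = 100_000   # arbitrary assumption from the post
INVALID_RATE_PERCENT = 1      # roughly 1% of results come back invalid
MY_DAILY_OUTPUT = 960         # WU's per day from the 48-thread estimate
AVG_MINUTES_PER_WU = 72       # average completion time quoted in the post


def invalid_per_day(total_wus: int, rate_percent: float) -> float:
    """Number of invalid work units per day at the given rate."""
    return total_wus * rate_percent / 100


wasted_wus = invalid_per_day(ASSUMED_TOTAL_WUS, INVALID_RATE_PERCENT)
wasted_thread_hours = wasted_wus * AVG_MINUTES_PER_WU / 60

print(wasted_wus)                    # 1000.0
print(wasted_wus > MY_DAILY_OUTPUT)  # True
print(wasted_thread_hours)           # 1200.0
```

Under these assumptions the invalid results add up to 1,200 thread-hours of work per day, or 50 threads running non-stop.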

With the Northern Summer fast approaching, I would like to urge the computer scientists behind DENIS@home to find methods that reduce the number of invalid results. It might not be easy. It might not even be possible. However, I think it's worth the effort.
ID: 2379 · Rating: 0
Jean-David Beyer

Joined: 6 Mar 23
Posts: 36
Credit: 2,078,354
RAC: 0
Message 2380 - Posted: 28 Apr 2024, 2:12:27 UTC - in response to Message 2379.  

very close to 1% of my WU's come back as being invalid


I do not get failures anywhere near what you get:

All tasks for computer 224473

State: All (459) · In progress (186) · Validation pending (6) · Validation inconclusive (1) · Valid (265) · Invalid (1) · Error (0)
Application: All (459) · Beta of DENIS-myocyte (0) · Human ventricular cell models optimization (459) · New human ventricular cell model (0) 


Not obvious to me why it failed:

Task 43654207
Name 	HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1
Workunit 	20718422
Created 	22 Apr 2024, 16:21:34 UTC
Sent 	23 Apr 2024, 2:39:52 UTC
Report deadline 	26 Apr 2024, 2:39:52 UTC
Received 	25 Apr 2024, 3:33:46 UTC
Server state 	Over
Outcome 	Success
Client state 	Done
Exit status 	0 (0x00000000)
Computer ID 	224473
Run time 	1 hours 0 min 45 sec
CPU time 	1 hours 0 min 29 sec
Validate state 	Invalid
Credit 	0.00
Device peak FLOPS 	5.92 GFLOPS
Application version 	Human ventricular cell models optimization v0.02
x86_64-pc-linux-gnu
Peak working set size 	2.81 MB
Peak swap size 	3.88 MB
Peak disk usage 	0.32 MB
Stderr output

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<stderr_txt>


===============================================================

                --- DENIS Myocyte Simulator ---                
                      Version 0.22 - Beta                      

---------------------------------------------------------------
                                          https://denis.usj.es 
                                         Universidad San Jorge 
===============================================================

Boinc file names: 
 - INPUT_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978.xml
 - OUTPUT_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1_r297516337_0
 - MARKERS_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1_r297516337_1
 - CHECKPOINT_FILE: denis_checkpoint
 - CHECKPOINT_FILE_2: denis_checkpoint_2


Configuration:
 - Model name: ORd_SS_biphasic
 - Solver: FE
 - Simulation time: 3000000.000000
 - dt: 0.002000
 - Constants to change: 14
     * GNa in INa: 105.611888
     * GNaL_b in INaL: 0.014499
     * Gto_b in Ito: 0.028115
     * PCa_b in ICaL: 0.000089
     * GKr_b in IKr: 0.041525
     * GKs_b in IKs: 0.002227
     * GK1_b in IK1: 0.168013
     * Gncx_b in INaCa_i: 0.000885
     * Pnak_b in INaK: 44.782199
     * GKb_b in IKb: 0.003256
     * PNab in INab: 0.000000
     * PCab in ICab: 0.000000
     * GpCa in IpCa: 0.053179
     * celltype in environment: 2.000000
Starting output configuration...
Output starts at iteration 1499500000.
Num states to save: 1.
Num rates to save: 0.
Num algebraics to save: 0.
Saving checkpoint... 
Iteration: 744561358, states[0] = 24.350441, states[40]= 0.000000
Saving checkpoint... 
Iteration: 1488768646, states[0] = -86.950091, states[40]= 0.000001
Total time: 3000000.000000
State Variables:
 - v in component membrane (millivolt): -87.123730
 - CaMKt in component CaMK (millimolar): 0.024610
 - cass in component intracellular_ions (millimolar): 0.000094
 - nai in component intracellular_ions (millimolar): 8.619925
 - nass in component intracellular_ions (millimolar): 8.620069
 - ki in component intracellular_ions (millimolar): 142.840449
 - kss in component intracellular_ions (millimolar): 142.840413
 - cansr in component intracellular_ions (millimolar): 2.029411
 - cajsr in component intracellular_ions (millimolar): 1.961385
 - cai in component intracellular_ions (millimolar): 0.000097
 - m in component INa (dimensionless): 0.007923
 - hf in component INa (dimensionless): 0.671373
 - hs in component INa (dimensionless): 0.671347
 - j in component INa (dimensionless): 0.671199
 - hsp in component INa (dimensionless): 0.424407
 - jp in component INa (dimensionless): 0.671094
 - mL in component INaL (dimensionless): 0.000217
 - hL in component INaL (dimensionless): 0.468505
 - hLp in component INaL (dimensionless): 0.233324
 - a in component Ito (dimensionless): 0.001053
 - iF in component Ito (dimensionless): 0.999491
 - iS in component Ito (dimensionless): 0.543418
 - ap in component Ito (dimensionless): 0.000537
 - iFp in component Ito (dimensionless): 0.999491
 - iSp in component Ito (dimensionless): 0.588244
 - d in component ICaL (dimensionless): 0.000000
 - ff in component ICaL (dimensionless): 1.000000
 - fs in component ICaL (dimensionless): 0.874226
 - fcaf in component ICaL (dimensionless): 1.000000
 - fcas in component ICaL (dimensionless): 0.999586
 - jca in component ICaL (dimensionless): 0.999945
 - ffp in component ICaL (dimensionless): 1.000000
 - fcafp in component ICaL (dimensionless): 1.000000
 - nca in component ICaL (dimensionless): 0.004026
 - xrf in component IKr (dimensionless): 0.000009
 - xrs in component IKr (dimensionless): 0.502252
 - xs1 in component IKs (dimensionless): 0.333262
 - xs2 in component IKs (dimensionless): 0.000210
 - xk1 in component IK1 (dimensionless): 0.996952
 - Jrelnp in component ryr (millimolar_per_millisecond): 0.000002
 - Jrelp in component ryr (millimolar_per_millisecond): 0.000002
23:33:32 (2183041): called boinc_finish(0)

</stderr_txt>
]]>



ID: 2380 · Rating: 0
MJH333

Joined: 5 Aug 22
Posts: 5
Credit: 1,998,618
RAC: 0
Message 2381 - Posted: 28 Apr 2024, 8:21:06 UTC - in response to Message 2379.  

I would like to urge the computer scientists behind DENIS@home to find methods that reduce the number of invalid results. It might not be easy. It might not even be possible. However I think it's worth the effort.

The scientists are aware of this and have said that they are trying to improve it. See
https://denis.usj.es/denisathome/forum_thread.php?id=266.

In that thread, entity points out that most of their invalids occur where the tasks are done by machines using different operating systems. This is also my experience. Entity suggested that the project consider using Homogeneous Redundancy, but this hasn’t been implemented to date.
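To illustrate the idea behind Homogeneous Redundancy, here is a minimal Python sketch. It is not BOINC's actual implementation: the names `hr_class` and `can_send` and the platform mapping are hypothetical. The point is only that once a work unit has been sent to one platform class, further copies go to hosts of the same class.

```python
from typing import Optional


def hr_class(host_os: str) -> str:
    """Map an OS description to a coarse platform class (hypothetical mapping)."""
    name = host_os.lower()
    if "windows" in name:
        return "windows"
    if "linux" in name:
        return "linux"
    if "darwin" in name or "mac" in name:
        return "mac"
    return "other"


def can_send(workunit_class: Optional[str], host_os: str) -> bool:
    """A work unit with no class yet can go anywhere; otherwise classes must match."""
    return workunit_class is None or workunit_class == hr_class(host_os)


print(can_send(None, "Microsoft Windows 11"))      # True: first copy, any host
print(can_send("linux", "x86_64-pc-linux-gnu"))    # True: same class
print(can_send("linux", "Microsoft Windows 11"))   # False: different class
```

The trade-off is that a work unit can only be completed by hosts of one class, which restricts scheduling even for the majority of tasks that would have validated across platforms anyway.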
ID: 2381 · Rating: 0
MJH333

Joined: 5 Aug 22
Posts: 5
Credit: 1,998,618
RAC: 0
Message 2382 - Posted: 28 Apr 2024, 8:37:08 UTC - in response to Message 2380.  

I do not get failures anywhere near what you get:

All tasks for computer 224473

State: All (459) · In progress (186) · Validation pending (6) · Validation inconclusive (1) · Valid (265) · Invalid (1) · Error (0)
Application: All (459) · Beta of DENIS-myocyte (0) · Human ventricular cell models optimization (459) · New human ventricular cell model (0) 
If your Validation inconclusive task is found to be Invalid, your invalid percentage will be 0.75%, which is pretty close (on a small sample) to that calculated by cuphi.
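The 0.75% figure can be reproduced with a short Python sketch. It assumes the percentage is taken over tasks that already have a final verdict (Valid plus Invalid) and ignores tasks that are in progress or pending; `invalid_pct` is a hypothetical helper name.

```python
def invalid_pct(valid: int, invalid: int) -> float:
    """Invalid share, in percent, among tasks with a final verdict."""
    return 100 * invalid / (valid + invalid)


# Counts from the task list above: 265 valid, 1 invalid.
print(round(invalid_pct(265, 1), 2))  # 0.38

# If the one inconclusive task also ends up invalid: 265 valid, 2 invalid.
print(round(invalid_pct(265, 2), 2))  # 0.75
```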

Not obvious to me why it failed:
Your Linux result was initially compared against a Windows result, which gave rise to an inconclusive validation. Another task was issued, this time to a Windows machine, and the two Windows results validated, leaving your result invalid.
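The sequence described above can be sketched in Python as a simple quorum check. This is an illustration only, not the project's real validator: the `Result` type, the tolerance, and the numeric values are made up, and a single float stands in for the real output file.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host_os: str
    value: float  # stand-in for the real output file


def agree(a: Result, b: Result, tol: float = 1e-6) -> bool:
    """Two results match if their values differ by no more than tol."""
    return abs(a.value - b.value) <= tol


def validate(results, quorum: int = 2, tol: float = 1e-6):
    """Return {index: verdict} once `quorum` results agree, else None (inconclusive)."""
    for candidate in results:
        matching = [j for j, r in enumerate(results) if agree(candidate, r, tol)]
        if len(matching) >= quorum:
            return {j: "valid" if j in matching else "invalid"
                    for j in range(len(results))}
    return None


# Hypothetical values: the Linux result differs slightly from the Windows ones.
linux = Result("linux", 1.0000)
win1 = Result("windows", 1.0010)
win2 = Result("windows", 1.0010)

print(validate([linux, win1]))        # None: inconclusive, another task is issued
print(validate([linux, win1, win2]))  # {0: 'invalid', 1: 'valid', 2: 'valid'}
```

With only two disagreeing results nobody can be declared right, so a third copy is sent out; whichever platform it lands on effectively decides which side reaches the quorum.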
ID: 2382 · Rating: 0
Dr Who Fan
Joined: 8 Apr 15
Posts: 34
Credit: 389,238
RAC: 0
Message 2446 - Posted: 7 Jul 2024, 21:31:12 UTC

I see there is STILL A PROBLEM WITH MATCHING RESULTS on Linux to Windows and vice versa.

Most recent:
https://denis.usj.es/denisathome/workunit.php?wuid=26830656

Any update from the project manager on whether this will ever be fixed?

It's a shame that completed work is not being cross-platform validated correctly and not receiving credit for work done.

ID: 2446 · Rating: 0
Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 284
Credit: 2,748,608
RAC: 0
Message 2447 - Posted: 8 Jul 2024, 14:11:26 UTC - in response to Message 2446.  

Hi,
I have to apologize, because we had hoped to resolve this sooner. We had not activated the homogeneous redundancy option because, although it would solve the problem, in many cases it would force an unnecessary homogeneity (not all tasks that come from different OSes are rejected, only a small percentage).
However, we have temporarily activated it until we have solved it.

All the best,
Jesus.
Jesús Carro
Universidad San Jorge
@InSilicoHeart
ID: 2447 · Rating: 0
Dr Who Fan
Joined: 8 Apr 15
Posts: 34
Credit: 389,238
RAC: 0
Message 2448 - Posted: 8 Jul 2024, 16:01:06 UTC - in response to Message 2447.  

... However, we have temporarily activated it until we have solved it.

Jesus,
Thank you for making a temporary workaround!

Hopefully, one day you can find the problem and have a fix for it.
ID: 2448 · Rating: 0
