Thought on Invalid Work Units.
Author | Message |
---|---|
Joined: 9 Aug 22 Posts: 10 Credit: 2,219,354 RAC: 0 |
When I can, I crunch numbers for DENIS@home using my Ryzen 9 3950X and Ryzen 7 3700X. In the BIOS of these two systems I have turned off the Performance Boost option so that they never go above their base CPU speeds (3.6 GHz for the 3700X and 3.5 GHz for the 3950X). 24 hours in a day times 60 minutes in an hour gives us 1440 minutes in a day. My average time to complete a WU is about 72 minutes (this is also very close to the global average completion time of all volunteers donating time to the project), so 1440/72 = 20 WU's per thread per day, and with 48 threads running 24/7 I complete very close to 960 WU's a day. Now then, very close to 1% of my WU's come back as being invalid, and although I have no idea how many WU's are completed in total every day, I can still extrapolate some data by using an arbitrary number, say 100,000. If the total number of all WU's done in a day is 100,000 and the invalid rate holds steady at around 1%, that means about 1,000 invalid tasks are generated in a day, which is more than I can crunch in a day. Assuming all invalid tasks are spread evenly across all volunteers running DENIS@home, we are collectively wasting enough electricity to power 1.85 teraflops of calculations (see the totals at the bottom of this chart: https://denis.usj.es/denisathome/cpu_list.php). That's a lot of CO2 being pumped into the atmosphere, especially when you add in the power needed to keep all the systems cool. With the Northern Summer fast approaching, I would like to urge the computer scientists behind DENIS@home to find methods that reduce the number of invalid results. It might not be easy. It might not even be possible. However, I think it's worth the effort. |
Joined: 6 Mar 23 Posts: 36 Credit: 2,078,354 RAC: 0 |
"very close to 1% of my WU's come back as being invalid" I do not get failures anywhere near what you get: All tasks for computer 224473 State: All (459) · In progress (186) · Validation pending (6) · Validation inconclusive (1) · Valid (265) · Invalid (1) · Error (0) Application: All (459) · Beta of DENIS-myocyte (0) · Human ventricular cell models optimization (459) · New human ventricular cell model (0) Not obvious to me why it failed: Task 43654207 Name HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1 Workunit 20718422 Created 22 Apr 2024, 16:21:34 UTC Sent 23 Apr 2024, 2:39:52 UTC Report deadline 26 Apr 2024, 2:39:52 UTC Received 25 Apr 2024, 3:33:46 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x00000000) Computer ID 224473 Run time 1 hours 0 min 45 sec CPU time 1 hours 0 min 29 sec Validate state Invalid Credit 0.00 Device peak FLOPS 5.92 GFLOPS Application version Human ventricular cell models optimization v0.02 x86_64-pc-linux-gnu Peak working set size 2.81 MB Peak swap size 3.88 MB Peak disk usage 0.32 MB Stderr output <core_client_version>7.20.2</core_client_version> <![CDATA[ <stderr_txt> =============================================================== --- DENIS Myocyte Simulator --- Version 0.22 - Beta --------------------------------------------------------------- https://denis.usj.es Universidad San Jorge =============================================================== Boinc file names: - INPUT_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978.xml - OUTPUT_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1_r297516337_0 - MARKERS_FILENAME: ../../projects/denis.usj.es_denisathome/HuVeMOp_20240422174246836079_ORdGradInc_k_11-BASE_MID-conf_978_1_r297516337_1 - CHECKPOINT_FILE: denis_checkpoint - CHECKPOINT_FILE_2: denis_checkpoint_2 Configuration: - Model name: ORd_SS_biphasic - 
Solver: FE - Simulation time: 3000000.000000 - dt: 0.002000 - Constants to change: 14 * GNa in INa: 105.611888 * GNaL_b in INaL: 0.014499 * Gto_b in Ito: 0.028115 * PCa_b in ICaL: 0.000089 * GKr_b in IKr: 0.041525 * GKs_b in IKs: 0.002227 * GK1_b in IK1: 0.168013 * Gncx_b in INaCa_i: 0.000885 * Pnak_b in INaK: 44.782199 * GKb_b in IKb: 0.003256 * PNab in INab: 0.000000 * PCab in ICab: 0.000000 * GpCa in IpCa: 0.053179 * celltype in environment: 2.000000 Starting output configuration... Output starts at iteration 1499500000. Num states to save: 1. Num rates to save: 0. Num algebraics to save: 0. Saving checkpoint... Iteration: 744561358, states[0] = 24.350441, states[40]= 0.000000 Saving checkpoint... Iteration: 1488768646, states[0] = -86.950091, states[40]= 0.000001 Total time: 3000000.000000 State Variables: - v in component membrane (millivolt): -87.123730 - CaMKt in component CaMK (millimolar): 0.024610 - cass in component intracellular_ions (millimolar): 0.000094 - nai in component intracellular_ions (millimolar): 8.619925 - nass in component intracellular_ions (millimolar): 8.620069 - ki in component intracellular_ions (millimolar): 142.840449 - kss in component intracellular_ions (millimolar): 142.840413 - cansr in component intracellular_ions (millimolar): 2.029411 - cajsr in component intracellular_ions (millimolar): 1.961385 - cai in component intracellular_ions (millimolar): 0.000097 - m in component INa (dimensionless): 0.007923 - hf in component INa (dimensionless): 0.671373 - hs in component INa (dimensionless): 0.671347 - j in component INa (dimensionless): 0.671199 - hsp in component INa (dimensionless): 0.424407 - jp in component INa (dimensionless): 0.671094 - mL in component INaL (dimensionless): 0.000217 - hL in component INaL (dimensionless): 0.468505 - hLp in component INaL (dimensionless): 0.233324 - a in component Ito (dimensionless): 0.001053 - iF in component Ito (dimensionless): 0.999491 - iS in component Ito (dimensionless): 0.543418 - 
ap in component Ito (dimensionless): 0.000537 - iFp in component Ito (dimensionless): 0.999491 - iSp in component Ito (dimensionless): 0.588244 - d in component ICaL (dimensionless): 0.000000 - ff in component ICaL (dimensionless): 1.000000 - fs in component ICaL (dimensionless): 0.874226 - fcaf in component ICaL (dimensionless): 1.000000 - fcas in component ICaL (dimensionless): 0.999586 - jca in component ICaL (dimensionless): 0.999945 - ffp in component ICaL (dimensionless): 1.000000 - fcafp in component ICaL (dimensionless): 1.000000 - nca in component ICaL (dimensionless): 0.004026 - xrf in component IKr (dimensionless): 0.000009 - xrs in component IKr (dimensionless): 0.502252 - xs1 in component IKs (dimensionless): 0.333262 - xs2 in component IKs (dimensionless): 0.000210 - xk1 in component IK1 (dimensionless): 0.996952 - Jrelnp in component ryr (millimolar_per_millisecond): 0.000002 - Jrelp in component ryr (millimolar_per_millisecond): 0.000002 23:33:32 (2183041): called boinc_finish(0) </stderr_txt> ]]> |
Joined: 5 Aug 22 Posts: 5 Credit: 1,998,618 RAC: 0 |
"I would like to urge the computer scientists behind DENIS@home to find methods that reduce the number of invalid results. It might not be easy. It might not even be possible. However I think it's worth the effort." The scientists are aware of this and have said that they are trying to improve it. See https://denis.usj.es/denisathome/forum_thread.php?id=266. In that thread, Entity points out that most of their invalids occur where the tasks are done by machines using different operating systems. This is also my experience. Entity suggested that the project consider using Homogeneous Redundancy, but this hasn’t been implemented to date. |
Joined: 5 Aug 22 Posts: 5 Credit: 1,998,618 RAC: 0 |
"I do not get failures anywhere near what you get:" If your Validation inconclusive task is found to be Invalid, your invalid percentage will be 0.75%, which is pretty close (on a small sample) to that calculated by cuphi. "Not obvious to me why it failed:" Your Linux result was initially compared against a Windows result, which gave rise to an inconclusive validation. Another task was issued, this time to a Windows machine, and the two Windows results validated, leaving your result invalid. |
Joined: 8 Apr 15 Posts: 34 Credit: 389,238 RAC: 0 |
I see there is STILL A PROBLEM WITH MATCHING RESULTS on Linux to Windows and vice versa. Most recent: https://denis.usj.es/denisathome/workunit.php?wuid=26830656 Any update from the project manager on whether this will ever be fixed? It's a shame that completed work is not being validated correctly across platforms and is not receiving credit for work done. |
Joined: 18 Mar 15 Posts: 284 Credit: 2,748,608 RAC: 0 |
Hi, I have to apologize because we had hoped to resolve this in less time. We had not activated the homogeneous redundancy option because, although it would solve the problem, in many cases it would force an unnecessary homogeneity (not all tasks that come from different OSes are rejected, only a small percentage). However, we have now temporarily activated it until we have solved the problem. All the best, Jesus. Jesús Carro Universidad San Jorge @InSilicoHeart |
Joined: 8 Apr 15 Posts: 34 Credit: 389,238 RAC: 0 |
"... However, we have temporarily activated it until we have solved it." Jesus, thank you for making a temporary workaround! Hopefully, one day you can find the problem and have a fix for it. |
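
The arithmetic quoted in this thread (daily throughput per host, the extrapolated number of invalid tasks, and the 0.75% invalid percentage) can be checked with a short Python sketch. The function names are made up for illustration, and the project-wide total of 100,000 WU's per day is the arbitrary figure from the opening post, not a measured value.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def wus_per_day(minutes_per_wu, threads):
    """Work units finished per day when every thread runs 24/7."""
    return MINUTES_PER_DAY / minutes_per_wu * threads


def invalid_percent(invalid, valid):
    """Share of validated tasks that ended up invalid, in percent."""
    return 100.0 * invalid / (invalid + valid)


# Opening post: 72 minutes per WU on 48 threads.
my_daily = wus_per_day(72, 48)

# ASSUMPTION: 100,000 WU's per day project-wide is an arbitrary number
# taken from the opening post; the real total is unknown.
assumed_project_total = 100_000
assumed_invalid_rate_percent = 1
project_invalid = assumed_project_total * assumed_invalid_rate_percent / 100

# Reply about computer 224473: 265 valid, 1 invalid, plus 1 inconclusive
# task counted as invalid in the worst case.
worst_case = invalid_percent(invalid=2, valid=265)

print(f"WU's per day on 48 threads: {my_daily:.0f}")
print(f"Invalid tasks per day at 1% of the assumed total: {project_invalid:.0f}")
print(f"Worst-case invalid percentage for host 224473: {worst_case:.2f}%")
```

Under these assumptions the extrapolated daily invalid count is slightly larger than what the two hosts complete in a day, which is the comparison the opening post makes.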