Lots of WUs "Error While Computing"

Message boards : Number crunching : Lots of WUs "Error While Computing"

Steve Dodd
Joined: 8 Mar 16
Posts: 2
Credit: 5,523,829
RAC: 0
Message 895 - Posted: 9 Jul 2016, 2:01:00 UTC

I hope it's just my computers (though I wouldn't know why), but nearly half of the WUs I received have returned "Error While Computing". The ones that do validate run for only a second. Could someone please tell me what's going on?
ID: 895 · Rating: 0
Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 284
Credit: 2,748,608
RAC: 0
Message 898 - Posted: 11 Jul 2016, 7:35:33 UTC

Hi Steve!
Thank you for your feedback and sorry for the problems. We will analyze what is happening.

Best,
Jesús
Jesús Carro
Universidad San Jorge
@InSilicoHeart
ID: 898 · Rating: 0
Col323
Joined: 5 Oct 15
Posts: 17
Credit: 1,335,501
RAC: 0
Message 899 - Posted: 11 Jul 2016, 15:48:23 UTC

One of my computers successfully completed a couple of units that took a couple of hours each, as expected. Their stderr.txt output is over 2400 lines long. However, this computer also successfully completed units that took only seconds, and their stderr.txt is only about 50 lines long. Here is an example:

Name GD_jcarro_20160708110503000000_SecondSimulations_SteadyState4000_conf_917.xml_0
Workunit 18800550
Created 8 Jul 2016, 9:05:04 UTC
Sent 8 Jul 2016, 11:33:53 UTC
Report deadline 22 Jul 2016, 11:33:53 UTC
Received 8 Jul 2016, 11:34:25 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 62844
Run time 2 sec
CPU time
Validate state Valid
Credit 0.01
Device peak FLOPS 2.60 GFLOPS
Application version Carro-Rodriguez-Laguna-Pueyo Epicardial Model (Carro et al. 2011) for human ventricular cells v1.06
Stderr output

<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
MName:CRLP2011_EPI
MID:0
OpT:12000000.000000
DT:0.002000
OutFreq:50
InT:11996000.000000
NumConstToChange:15
NumStatesToPrint:1
NumAlgToPrint:0
CC ID:16 NAME: G_Na in component Fast_Na_Current VALUE:13.5582
CC ID:17 NAME: G_Na_B in component Background_Na_Current VALUE:0.000690863
CC ID:23 NAME: G_Kr in component Rapidly_Activating_K_Current VALUE:0.0269904
CC ID:24 NAME: G_Ks in component Slowly_Activating_K_Current VALUE:0.00349788
CC ID:25 NAME: G_Kp in component Plateau_K_Current VALUE:0.00157695
CC ID:26 NAME: G_to in component Transient_Outward_K_Current VALUE:0.124306
CC ID:28 NAME: G_K1 in component Inward_Rectifier_K_Current VALUE:0.609042
CC ID:29 NAME: G_ClCa in component Ca_Activated_Cl_Current VALUE:0.0414283
CC ID:31 NAME: G_Cl_B in component Background_Cl_Current VALUE:0.0064337
CC ID:34 NAME: G_Ca in component L_Type_Calcium_Current VALUE:0.000175819
CC ID:47 NAME: G_Ca_B in component Background_Ca_Current VALUE:0.000676662
CC ID:19 NAME: Ibar_NaK in component Na_K_Pump_Current VALUE:0.907297
CC ID:44 NAME: Ibar_NCX in component Na_Ca_Exchanger_Current VALUE:5.70955
CC ID:46 NAME: Ibar_PMCA in component Sarcolemmal_Ca_Pump_Current VALUE:0.0681847
CC ID:7 NAME: I_Stim_CL in component membrane VALUE:4000
STP ID:0 - V in component membrane
CONFIG END
SolveModel 388
SolveModel 406 NUMC2CHANGE: 15
SolveModel 410 ITER: 0 , 16 --- 1.355820e+001
SolveModel 410 ITER: 1 , 17 --- 6.908630e-004
SolveModel 410 ITER: 2 , 23 --- 2.699040e-002
SolveModel 410 ITER: 3 , 24 --- 3.497880e-003
SolveModel 410 ITER: 4 , 25 --- 1.576950e-003
SolveModel 410 ITER: 5 , 26 --- 1.243060e-001
SolveModel 410 ITER: 6 , 28 --- 6.090420e-001
SolveModel 410 ITER: 7 , 29 --- 4.142830e-002
SolveModel 410 ITER: 8 , 31 --- 6.433700e-003
SolveModel 410 ITER: 9 , 34 --- 1.758190e-004
SolveModel 410 ITER: 10 , 47 --- 6.766620e-004
SolveModel 410 ITER: 11 , 19 --- 9.072970e-001
SolveModel 410 ITER: 12 , 44 --- 5.709550e+000
SolveModel 410 ITER: 13 , 46 --- 6.818470e-002
SolveModel 410 ITER: 14 , 7 --- 4.000000e+003
SolveModel 413
PRINTABLE_STATE ID:0
07:33:49 (5300): called boinc_finish(0)

</stderr_txt>
]]>
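
For anyone comparing these parameter dumps across tasks, here is a minimal sketch (not part of the project's tooling) of pulling the CC lines out of a pasted stderr.txt. It assumes every constant line follows the "CC ID:<n> NAME: <name> in component <component> VALUE:<number>" layout seen above; `CC_LINE` and `parse_constants` are hypothetical names chosen for illustration.

```python
import re

# Matches lines such as:
#   CC ID:16 NAME: G_Na in component Fast_Na_Current VALUE:13.5582
# The layout is inferred from the log above, not from any project documentation.
CC_LINE = re.compile(
    r"^CC ID:(?P<id>\d+) NAME: (?P<name>\S+) "
    r"in component (?P<component>\S+) VALUE:(?P<value>\S+)$"
)

def parse_constants(stderr_text):
    """Return {name: (id, component, value)} for every CC line found."""
    constants = {}
    for line in stderr_text.splitlines():
        match = CC_LINE.match(line.strip())
        if match:
            constants[match["name"]] = (
                int(match["id"]),
                match["component"],
                float(match["value"]),
            )
    return constants

sample = """NumConstToChange:15
CC ID:16 NAME: G_Na in component Fast_Na_Current VALUE:13.5582
CC ID:7 NAME: I_Stim_CL in component membrane VALUE:4000
STP ID:0 - V in component membrane
CONFIG END"""

constants = parse_constants(sample)
print(constants["I_Stim_CL"])
```

Lines that are not CC entries (the STP lines, CONFIG END, the SolveModel trace) are simply skipped, so the whole stderr paste can be passed in unchanged.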
ID: 899 · Rating: 0
Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 284
Credit: 2,748,608
RAC: 0
Message 900 - Posted: 11 Jul 2016, 20:01:13 UTC - in response to Message 899.  

Yes, we have detected a problem with the SteadyState2000 and SteadyState4000 files. We are working to find out what is happening. When we run them locally, they work properly.

Best,
Jesús.
Jesús Carro
Universidad San Jorge
@InSilicoHeart
ID: 900 · Rating: 0
Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 284
Credit: 2,748,608
RAC: 0
Message 903 - Posted: 15 Jul 2016, 7:05:14 UTC

Hi!
We have uploaded a new version in which the bug is fixed. If you see it again, please report it here.

Thank you very much!

Best,
Jesús.
Jesús Carro
Universidad San Jorge
@InSilicoHeart
ID: 903 · Rating: 0
Matthias Lehmkuhl
Joined: 1 Jul 15
Posts: 2
Credit: 243,560
RAC: 0
Message 916 - Posted: 16 Jul 2016, 13:57:25 UTC

I also got one result with "Error while computing":
GD_jcarro_20160714201100000000_ThirdSimulations_SteadyState1000Schmidt98_conf_474.xml_0
CPU time: 23 hours
<message>
exceeded elapsed time limit 92043.99 (368000.00G/3.15G)
</message>
Matthias
ID: 916 · Rating: 0
No.15
Joined: 11 Jul 16
Posts: 1
Credit: 78,376
RAC: 0
Message 933 - Posted: 17 Jul 2016, 12:56:55 UTC

I am having a lot of very long WUs end with "Error while computing". If you need any info, let me know.

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
MName:CRLP2011_EPI
MID:0
OpT:24000000.000000
DT:0.002000
OutFreq:50
InT:23992000.000000
NumConstToChange:21
NumStatesToPrint:2
NumAlgToPrint:0
CC ID:16 NAME: G_Na in component Fast_Na_Current VALUE:19.49890136
CC ID:17 NAME: G_Na_B in component Background_Na_Current VALUE:0.000473911734
CC ID:23 NAME: G_Kr in component Rapidly_Activating_K_Current VALUE:0.0333725
CC ID:24 NAME: G_Ks in component Slowly_Activating_K_Current VALUE:0.003911159
CC ID:25 NAME: G_Kp in component Plateau_K_Current VALUE:0.002040128
CC ID:26 NAME: G_to in component Transient_Outward_K_Current VALUE:0.13697164
CC ID:28 NAME: G_K1 in component Inward_Rectifier_K_Current VALUE:0.7089200967
CC ID:29 NAME: G_ClCa in component Ca_Activated_Cl_Current VALUE:0.069126099438
CC ID:31 NAME: G_Cl_B in component Background_Cl_Current VALUE:0.01069641
CC ID:34 NAME: G_Ca in component L_Type_Calcium_Current VALUE:0.0001851241056
CC ID:47 NAME: G_Ca_B in component Background_Ca_Current VALUE:0.0006057563114
CC ID:19 NAME: Ibar_NaK in component Na_K_Pump_Current VALUE:1.05791202
CC ID:44 NAME: Ibar_NCX in component Na_Ca_Exchanger_Current VALUE:4.823514
CC ID:46 NAME: Ibar_PMCA in component Sarcolemmal_Ca_Pump_Current VALUE:0.0557848354
CC ID:12 NAME: J_Ca_juncsl in component membrane VALUE:8.0706556422e-13
CC ID:13 NAME: J_Ca_slmyo in component membrane VALUE:4.0001812468e-12
CC ID:60 NAME: k_SR_leak in component SR_Fluxes VALUE:6.513019016e-06
CC ID:55 NAME: ks in component SR_Fluxes VALUE:28.42225
CC ID:58 NAME: V_max_SR_CaP in component SR_Fluxes VALUE:0.0050332844732
CC ID:7 NAME: I_Stim_CL in component membrane VALUE:8000
CC ID:33 NAME: Ca_o in component Calcium_Concentrations VALUE:2.5
STP ID:0 - V in component membrane
STP ID:23 - Ca_i in component Calcium_Concentrations
CONFIG END


Snipped a lot of CP


Doing CP It:6132953918.000000
Doing CP It:6138905580.000000
Doing CP It:6144935804.000000
Doing CP It:6151107187.000000
Doing CP It:6157221981.000000SIGSEGV: segmentation violation
Stack trace (8 frames):
../../projects/denis.usj.es_denisathome/CRLP2011EPI_107_x86_64-pc-linux-gnu(boinc_catch_signal+0x57)[0x485117]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7ff571a65330]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x2d)[0x7ff5716d9c6d]
/lib/x86_64-linux-gnu/libc.so.6(_IO_fprintf+0x87)[0x7ff5716e4337]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_107_x86_64-pc-linux-gnu[0x40ff61]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_107_x86_64-pc-linux-gnu[0x41d309]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ff5716b1f45]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_107_x86_64-pc-linux-gnu[0x4048b9]

Exiting...

</stderr_txt>
]]>
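
The message at the top of this log gives the same exit code three ways: 193 in decimal, 0xc1 in hex, and -63. A minimal sketch of how those relate, assuming the third number is simply the low byte of the status read as a signed 8-bit value (`as_signed_byte` is a hypothetical helper, not a BOINC function):

```python
def as_signed_byte(status):
    """Interpret the low 8 bits of an exit status as a signed 8-bit value."""
    low = status & 0xFF
    # Values 128..255 wrap around to -128..-1 in two's complement.
    return low - 256 if low >= 128 else low

status = 193
print(hex(status), as_signed_byte(status))  # 0xc1 -63
```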
ID: 933 · Rating: 0
jcastro
Joined: 16 Mar 15
Posts: 219
Credit: 14,859
RAC: 0
Message 942 - Posted: 18 Jul 2016, 10:43:21 UTC - in response to Message 933.  

Hi!

This bug should be fixed in version 1.08 of the CRLP2011 applications. Please refresh your jobs.

Best regards, Joel.
ID: 942 · Rating: 0
Viking69
Joined: 21 May 15
Posts: 5
Credit: 1,874,902
RAC: 0
Message 955 - Posted: 20 Jul 2016, 4:37:01 UTC

A lot of wasted hours of CPU time... I'm not sure I should trust this project. My work was listed as an error, but 2 Linux users completed it in a much shorter time frame. There should be a way to keep the OSes separate.
ID: 955 · Rating: 0
Col323
Joined: 5 Oct 15
Posts: 17
Credit: 1,335,501
RAC: 0
Message 962 - Posted: 20 Jul 2016, 17:22:56 UTC

I don't think it's quite fixed. On a Linux box:

GD_jcarro_20160714201445000000_ThirdSimulations_SteadyState2000Schmidt98_conf_93.xml_2

Outcome Computation error
Client state Compute error
Exit status 193 (0xc1) EXIT_SIGNAL
Computer ID 62747
Run time 1 days 1 hours 5 min 53 sec
CPU time 1 days 0 hours 20 min 58 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 0.77 GFLOPS
Application version Carro-Rodriguez-Laguna-Pueyo Epicardial Model (Carro et al. 2011) for human ventricular cells v1.08
Peak working set size 4.36 MB
Peak swap size 14.12 MB
Peak disk usage 0.05 MB

Doing CP It:2185491915.000000
Doing CP It:2187622491.000000
Doing CP It:2189763663.000000
Doing CP It:2191921198.000000
Doing CP It:2194083377.000000
Doing CP It:2196235201.000000SIGSEGV: segmentation violation
Stack trace (8 frames):
../../projects/denis.usj.es_denisathome/CRLP2011EPI_108_x86_64-pc-linux-gnu(boinc_catch_signal+0x57)[0x4ca917]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10)[0x7f8fe7dadd10]
/lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x24)[0x7f8fe7a1cc44]
/lib/x86_64-linux-gnu/libc.so.6(_IO_fprintf+0x87)[0x7f8fe7a27b97]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_108_x86_64-pc-linux-gnu[0x41b225]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_108_x86_64-pc-linux-gnu[0x462b0f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f8fe79f3a40]
../../projects/denis.usj.es_denisathome/CRLP2011EPI_108_x86_64-pc-linux-gnu[0x4048b9]

Exiting...

</stderr_txt>
]]>

There are 4 related units, 2 are In Progress, and 2 also errored. One of the errors was also on version 1.08.
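
To triage pasted logs like the one above, a rough sketch that reports the last checkpoint line reached and whether a SIGSEGV was logged. It assumes "Doing CP It:" marks a checkpoint with an iteration counter, which the logs suggest but the thread does not confirm; `summarize` is a hypothetical name. Note that the final checkpoint line can be fused with the SIGSEGV message, so the pattern does not require a line ending.

```python
import re

# Integer part of the counter in lines such as "Doing CP It:2196235201.000000".
CHECKPOINT = re.compile(r"Doing CP It:(\d+)\.\d+")

def summarize(stderr_text):
    """Return (last checkpoint iteration or None, whether SIGSEGV appears)."""
    iterations = [int(m.group(1)) for m in CHECKPOINT.finditer(stderr_text)]
    last = iterations[-1] if iterations else None
    return last, "SIGSEGV" in stderr_text

sample = (
    "Doing CP It:2194083377.000000\n"
    "Doing CP It:2196235201.000000SIGSEGV: segmentation violation\n"
    "Stack trace (8 frames):\n"
)
print(summarize(sample))
```

A log with no checkpoint lines and no SIGSEGV, like the 2-second tasks reported earlier in the thread, comes back as `(None, False)`.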
ID: 962 · Rating: 0
jcastro
Joined: 16 Mar 15
Posts: 219
Credit: 14,859
RAC: 0
Message 964 - Posted: 20 Jul 2016, 17:58:48 UTC - in response to Message 962.  

Hi, we will take it into account. It seems to be related to something in the calculations inside the simulation. We will update our app as soon as possible.

Best regards, Joel.
ID: 964 · Rating: 0
Viking69
Joined: 21 May 15
Posts: 5
Credit: 1,874,902
RAC: 0
Message 988 - Posted: 24 Jul 2016, 16:56:02 UTC
Last modified: 24 Jul 2016, 16:59:51 UTC

http://denis.usj.es/denisathome/result.php?resultid=38323222

Why do all the results lately seem to fail after a couple of days on a Windows PC?

If Windows is going to fail, do not provide WUs for it. That is on your end.

I am using 7.6.22 and I have updated my PC OS and Video drivers, so I am current.

I am letting one of the 1.08 tasks continue to see if it completes as expected, but it still says 1 day and 7 hours to go after 8 hours of work. Not at all like the Linux boxes.
ID: 988 · Rating: 0
skgiven
Joined: 12 Apr 15
Posts: 1
Credit: 178,729
RAC: 0
Message 991 - Posted: 25 Jul 2016, 5:46:41 UTC - in response to Message 988.  

Task 38376354 failed on a Linux system, app v1.08:

Compute error for a SteadyState3000 simulation.
Exit status 193 (0xc1) EXIT_SIGNAL
Run time 1 days 0 hours 32 min 10 sec
CPU time 17 hours 57 min
SIGSEGV: segmentation violation


Task 38355439 also failed after 4 days 21 hours, albeit on app v1.07.

That system still performs reasonably well: 36 Valid, 3 Invalid, and only 2 finished with an error.
ID: 991 · Rating: 0
