Posts by Pettra

1) Message boards : Number crunching : WU ready to send but none received (Message 788)
Posted 1 Feb 2016 by Pettra
Post:
Nice to have work again! Thank you :)
2) Message boards : Number crunching : version 1.04 checkpoint experiments (windows) and now 1.05 experiments... (Message 308)
Posted 17 Jun 2015 by Pettra
Post:
Recovery from a checkpoint usually costs some time, and we are now working to decrease it.

Great! :) Thanks Joel. However, recovery time doesn't seem to be significant when suspended tasks are kept in memory (experiment #2 below), so good news! :)

First though, I have a little more to add to my last post:

EXPERIMENT #1 (cont)
Tasks suspended without "leave applications in memory" checked...
AF201506161800_300_2172: http://denis.usj.es/denisathome/result.php?resultid=4901641
suspended with elapsed time of 0:30:18 (checkpointed at 0:29:00 according to properties dialog box)
Task finished with runtime of: 1:01:29

Task AF201506161800_300_2129: http://denis.usj.es/denisathome/result.php?resultid=4901554
suspended with elapsed time of 0:30:19 (checkpointed at 0:29:13 according to properties dialog box)
suspended again at 1:00:13 (checkpointed at 0:59:29 according to properties dialog box)
Task finished with runtime of: 1:32:13

AF201506161800_300_2160: http://denis.usj.es/denisathome/result.php?resultid=4901617
suspended with elapsed time of 0:20:00 (checkpointed at 0:19:08 according to properties dialog box)
Task finished with runtime of 1:00:23, so it could have been a longer WU

AF201506161800_300_2155 http://denis.usj.es/denisathome/result.php?resultid=4901606
suspended with elapsed time of 0:10:47 (checkpointed at 0:09:37 according to properties dialog box)
Task finished with runtime of 0:49:56

THEY ALL VALIDATED instead of erroring out!!! YAY!!! :)


EXPERIMENT #2: Suspended WITH "leave applications in memory" checked...

Task AF201506161800_300_2112 http://denis.usj.es/denisathome/result.php?resultid=4901520
suspended with elapsed time of 0:28:34 (checkpointed at 0:28:06 according to properties dialog box)
Task finished with runtime of 0:31:18

Task AF201506161800_300_2125 http://denis.usj.es/denisathome/result.php?resultid=4901547
suspended with elapsed time of 0:30:22 (checkpointed at 0:29:11 according to properties dialog box)
Task finished with runtime of 0:32:17

Runtimes effectively match my average task times :)

Not only that... when I did this with version 1.04, BOINC crashed but the task validated...

This time BOINC DIDN'T crash and the tasks validated!! :)
So YAY again :)
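
For anyone who wants to check the arithmetic, here is a minimal sketch that works out how much runtime each task accumulated after its last suspend, using only the figures quoted above. The helper names (`parse_hms`, `time_after_suspend`) are made up for illustration and are not part of BOINC or the project application.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def time_after_suspend(last_suspend, final_runtime):
    """Runtime accumulated between the last suspend and task completion."""
    return parse_hms(final_runtime) - parse_hms(last_suspend)


# (task suffix, elapsed time at last suspend, final runtime), copied from the post.
# Task 2129 was suspended twice; only its second suspend (1:00:13) is used here.
WITHOUT_LEAVE_IN_MEMORY = [
    ("2172", "0:30:18", "1:01:29"),
    ("2129", "1:00:13", "1:32:13"),
    ("2160", "0:20:00", "1:00:23"),
    ("2155", "0:10:47", "0:49:56"),
]
WITH_LEAVE_IN_MEMORY = [
    ("2112", "0:28:34", "0:31:18"),
    ("2125", "0:30:22", "0:32:17"),
]

for label, rows in (
    ("Experiment #1 (not left in memory)", WITHOUT_LEAVE_IN_MEMORY),
    ("Experiment #2 (left in memory)", WITH_LEAVE_IN_MEMORY),
):
    print(label)
    for task, suspended, finished in rows:
        extra = time_after_suspend(suspended, finished)
        print(f"  ..._{task}: {extra} of runtime after the last suspend")
```

This only subtracts the quoted times from each other. It cannot tell a slow recovery apart from a work unit that was simply longer to begin with.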
3) Message boards : Number crunching : version 1.04 checkpoint experiments (windows) and now 1.05 experiments... (Message 300)
Posted 17 Jun 2015 by Pettra
Post:
Sorry. Slight delay since my last message :( power outage. Only had chance to do this so far.

Suspending without "leave applications in memory" checked

Task resumed normally :) and didn't error out :) and validated :)

"Properties dialogue box" appeared to show checkpointing correctly, as did the event log (with checkpoint debug enabled)

HOWEVER :/ runtime seems to extend by about the same amount of time that the task had reached when it was suspended.

For example: with my current task runtimes averaging around 33 minutes, I suspended this one after thirty minutes:

AF201506161800_300_2166 http://denis.usj.es/denisathome/result.php?resultid=4901628 It took an extra 32 minutes to complete (runtime total of 1:02:48) but did validate.

Might just be a longer WU of course, and so a coincidence, so I will repeat with a couple more and will also suspend one more than once :) just to see what happens... :)

Will experiment with "leave applications in memory" checked sometime tomorrow, hopefully.
4) Message boards : Number crunching : version 1.04 checkpoint experiments (windows) and now 1.05 experiments... (Message 292)
Posted 15 Jun 2015 by Pettra
Post:
As requested here :) I will try the same experiment I did with version 1.04, but with the 1.05 version, although it might not be till tomorrow now. Hope that's ok.

edit: to alter thread title :)
5) Message boards : Number crunching : version 1.04 checkpoint experiments (windows) and now 1.05 experiments... (Message 269)
Posted 4 Jun 2015 by Pettra
Post:
If this is useful.

Experiment #1:
Suspended task http://denis.usj.es/denisathome/result.php?resultid=4610080 without "leave applications in memory while suspended".
Ended in computation error.

Experiment #2:
Suspended task http://denis.usj.es/denisathome/result.php?resultid=4610161 with "leave applications in memory while suspended". Event log reported task checkpointing approximately every 1 minute 6 seconds.

Reached 100% and BOINC Manager immediately crashed.

Restarted BOINC and all tasks (including other project tasks) had disappeared. Had to reboot computer. After doing so, all remaining Denis tasks (8 of them) errored out immediately. Examples: one that had been in progress at the time of the crash http://denis.usj.es/denisathome/result.php?resultid=4610069 and one that had not http://denis.usj.es/denisathome/result.php?resultid=4610172

Now waiting to see if the task I suspended validates against wingman. http://denis.usj.es/denisathome/workunit.php?wuid=2254606
Stderr output looks normal.

Might just be a problem my end.

edit: suspending Denis tasks is not something I have ever actually NEEDED to do, so it is not likely to be a problem for me. I was just being curious.
6) Message boards : Cafe : The inevitable ATA thread (Message 167)
Posted 5 May 2015 by Pettra
Post:
Hello. Nice project. Be good to help.