Posts by Robert Miles

1) Message boards : Number crunching : Error while downloading (Message 1384)
Posted 23 Mar 2018 by Robert Miles
Post:
I'm often getting error while downloading. Could you check if the lists of files for these tasks were set up correctly?

http://denis.usj.es/denisathome/result.php?resultid=204060
<file_name>GD_jcarro_20180321180916000000_Test_1-CLVPET_Block_Kr_70-conf_723.xml</file_name>

http://denis.usj.es/denisathome/result.php?resultid=204967
<file_name>GD_jcarro_20180321180959000000_Test_1-Coppini_2000-conf_714.xml</file_name>

http://denis.usj.es/denisathome/result.php?resultid=204945
<file_name>GD_jcarro_20180321180956000000_Test_1-TESZT_Block_Kr_15-conf_712.xml</file_name>

http://denis.usj.es/denisathome/result.php?resultid=204865
<file_name>GD_jcarro_20180321180954000000_Test_1-CLVPET-conf_712.xml</file_name>
2) Message boards : Number crunching : DENIS_BETA v1.05 checkpoints are SOMETIMES working (Message 1070)
Posted 19 Aug 2016 by Robert Miles
Post:
Task already finished now.
3) Message boards : Number crunching : DENIS_BETA v1.05 checkpoints are SOMETIMES working (Message 1066)
Posted 19 Aug 2016 by Robert Miles
Post:
http://denis.usj.es/denisathome/result.php?resultid=38399420
Beta Testing Version of D.E.N.I.S Application v1.05
now past its deadline

Tried to restart from checkpoints three times. It worked the first and third times, but not the second.
This shows why most BOINC projects use TWO checkpoint files, so that if restarting from the latest one fails, it can try again with the earlier one instead of just restarting from the beginning.
Windows 10 needs restarts due to updates often enough that you will lose lots of CPU time if you don't use two checkpoint files.

Excerpts from stderr:

Doing CP It:3363671586
Doing CP It:3372124842
Doing CP It:3380582533MName:CRLP2011_EPI
MID:6
LENGTH: ALG : 118 ,RAT: 40 ,CONS: 111

STP ID:0 - V in component membrane
STP ID:23 - Ca_i in component Calcium_Concentrations
CONFIG END
LoadFromCP: it = 3380582533
Doing CP It:3388523891
Doing CP It:3396352746


Doing CP It:11690476630
Doing CP It:11698689469
Doing CP It:11706941073MName:CRLP2011_EPI
MID:6
LENGTH: ALG : 118 ,RAT: 40 ,CONS: 111

STP ID:0 - V in component membrane
STP ID:23 - Ca_i in component Calcium_Concentrations
CONFIG END
LoadFromCP: it = 0
Doing CP It:8288844
Doing CP It:16599039


Doing CP It:8789054880
Doing CP It:8796515591
Doing CP It:8804155383MName:CRLP2011_EPI
MID:6
LENGTH: ALG : 118 ,RAT: 40 ,CONS: 111

STP ID:0 - V in component membrane
STP ID:23 - Ca_i in component Calcium_Concentrations
CONFIG END
LoadFromCP: it = 8804155383
Doing CP It:8812633165
Doing CP It:8821108370

Should I abort it so you can analyze the output files?
4) Message boards : Number crunching : Very long wus (Message 1039)
Posted 30 Jul 2016 by Robert Miles
Post:
Another restart for beta 1.03 under Windows 10.

Doing CP It:3986402711.00000016:47:06 (6160): called boinc_finish(0)
MName:CRLP2011_EPI
MID:6

I'm about to abort this task.
5) Message boards : Number crunching : Very long wus (Message 1033)
Posted 30 Jul 2016 by Robert Miles
Post:
Writing to disk is one of the most "expensive" operations in the execution of the simulation, which is why we tried to simplify this part at the beginning. But with those ultra-long WUs it could be necessary.


I will buy an SSD!


I already have an SSD in each of my desktops. However, the one for my Windows Vista computer does not have a Windows Vista driver available. You may want to check for a similar problem before you buy any.
6) Message boards : Number crunching : Very long wus (Message 1031)
Posted 29 Jul 2016 by Robert Miles
Post:
The beta 1.03 task under Windows 10 also started the infinite reruns.

Doing CP It:3980371292.000000
Doing CP It:3988241479.00000010:58:46 (8512): called boinc_finish(0)
MName:CRLP2011_EPI
MID:6

Even though the temp file was previously present and not empty, it could not find it after this restart.

Your next beta version appears to need to distinguish four cases when it looks for the checkpoint file: not present, present but empty, present and not empty but with unusable contents, and some other error while reading it.
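A minimal sketch of that kind of check, in Python rather than whatever language the application uses. The file format (a single iteration count stored as text) and the names CheckpointState and classify_checkpoint are assumptions made for illustration; the real checkpoint format is not known from these posts.

```python
import os
from enum import Enum


class CheckpointState(Enum):
    MISSING = "not present"
    EMPTY = "present but empty"
    UNUSABLE = "present and not empty, but contents not usable"
    READ_ERROR = "some other error while reading"
    OK = "usable"


def classify_checkpoint(path):
    """Return (state, iteration); iteration is None unless state is OK."""
    if not os.path.exists(path):
        return CheckpointState.MISSING, None
    try:
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
    except OSError:
        # Permission problems, sharing violations, I/O errors, and so on.
        return CheckpointState.READ_ERROR, None
    except UnicodeDecodeError:
        # Bytes that cannot be text in the assumed format.
        return CheckpointState.UNUSABLE, None
    if text == "":
        return CheckpointState.EMPTY, None
    # Assumed format: one non-negative integer iteration count.
    try:
        iteration = int(text.strip())
    except ValueError:
        return CheckpointState.UNUSABLE, None
    if iteration < 0:
        return CheckpointState.UNUSABLE, None
    return CheckpointState.OK, iteration
```

Printing the returned state to stderr at startup would show which of the cases occurred on a given restart.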

The beta 1.03 task under Windows Vista started another rerun. When should I shut this down by aborting it? The temp file is still present but only 0 bytes.

Do you have a way to make your next batch of Windows tasks use a much smaller number of iterations, so they can be aimed at testing whether the Windows version of the application shuts down properly? Also, they should send any remaining temp file back to you, so you can check if it's even in the right format.

I believe I've read that some compilers allow creating files in such a way that they will automatically be deleted when the application that created them ends. You may need to inspect the Windows application to see if it does this.

Also, does the Windows application close the checkpoint file after it finishes writing a checkpoint? If not, I'd expect the checkpoint file to be more likely to be lost, since it might be smaller than the hard drive block size, and therefore still in cache but not yet written to the hard drive.
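For illustration, here is one common way to make sure a small checkpoint actually reaches the drive: write to a second file, flush and sync it, close it, and only then rename it over the old checkpoint. This is a Python sketch, not the project's code; the function name, the ".new" suffix, and the one-number file format are all assumptions.

```python
import os


def write_checkpoint(path, iteration):
    """Write the iteration count so that `path` is never left half-written."""
    tmp_path = path + ".new"  # hypothetical name for the in-progress file
    with open(tmp_path, "w", encoding="ascii") as f:
        f.write(f"{iteration}\n")
        f.flush()              # push Python's buffer to the operating system
        os.fsync(f.fileno())   # ask the operating system to write it to the drive
    # The file is closed here. os.replace swaps it in as a single step, so a
    # reader sees either the old checkpoint or the new one, not a partial file.
    os.replace(tmp_path, path)
```

If the task is interrupted before the rename, the previous checkpoint is still intact under its original name.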
7) Message boards : Number crunching : Very long wus (Message 1026)
Posted 29 Jul 2016 by Robert Miles
Post:
On my other computer with a beta 1.03 task in progress, there is a temp file, but it's empty - 0 bytes.

47.338% progress, so it should have created many checkpoints by now.

Task 38385601

Microsoft does not appear to have forced any Tuesday night Windows 10 updates requiring a Windows restart this week, so it may still complete without trying to resume from a checkpoint.

A suggested feature for the next beta: For the first few checkpoints, report whether the checkpoint file even exists before trying to write the checkpoint, and if so, how many bytes it contains. You could also have the first few checkpoints report whether they were successful in writing to the checkpoint file.
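A rough Python sketch of that suggestion follows. The function names, the limit of five reported checkpoints, and the one-number file format are assumptions chosen for the example, not anything taken from the application.

```python
import os
import sys


def describe_checkpoint_file(path):
    """One-line description of the checkpoint file as it is right now."""
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return f"checkpoint file {path!r}: does not exist"
    except OSError as err:
        return f"checkpoint file {path!r}: cannot be examined ({err})"
    return f"checkpoint file {path!r}: exists, {size} bytes"


def checkpoint_with_report(path, iteration, checkpoint_number,
                           report_first=5, log=sys.stderr):
    """Write a checkpoint; for the first few, report file state and outcome."""
    verbose = checkpoint_number <= report_first
    if verbose:
        # State of the file BEFORE this checkpoint is written.
        print(describe_checkpoint_file(path), file=log)
    try:
        with open(path, "w", encoding="ascii") as f:
            f.write(f"{iteration}\n")
        outcome = "written"
    except OSError as err:
        outcome = f"FAILED ({err})"
    if verbose:
        print(f"checkpoint {checkpoint_number} at iteration {iteration}: {outcome}",
              file=log)
    return outcome == "written"
```

With output like this in stderr, a 0-byte temp file would show up in the first few minutes of a task instead of after a failed restart.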
8) Message boards : Number crunching : Very long wus (Message 1019)
Posted 28 Jul 2016 by Robert Miles
Post:
Last night, I decided to check whether beta 1.03 could recover from checkpoints, so I shut down BOINC and then did a Windows restart on one of my computers. The task started up again, could not find the checkpoint file, and restarted from the beginning. I do not expect it to finish by the deadline.


It looks like beta 1.03 does NOT fix the endless restarts problem.

A few lines from stderr.txt showing this:

Doing CP It:3396376837.000000
Doing CP It:3402929064.00000000:35:28 (2660): called boinc_finish(0)
MName:CRLP2011_EPI
MID:6

At least this time it created a file named temp, AFTER it restarted.

Task 38385075
9) Message boards : Number crunching : Very long wus (Message 1005)
Posted 26 Jul 2016 by Robert Miles
Post:
Some things I've seen on other BOINC projects:

1. Two separate checkpoint files per task, so that if the task is interrupted in the middle of writing a checkpoint, it can resume from the other one instead. This needs a method for determining which of the files has the more recent checkpoint, in case both contain valid checkpoints.

2. An identifier for the checkpoint, written at the beginning of the file, which must match a copy at the end to show that the checkpoint is complete. The identifier should not match that of any other checkpoint from the same task.

3. Requests from the project for users to deliberately interrupt BOINC during beta tests, to check whether the checkpoints are working properly.

Windows 10 has at least one update most Tuesday nights, and delaying them is difficult, so you may soon see many more checkpoint resume failures from Windows 10 users.
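Items 1 and 2 can be combined. The Python sketch below is only an illustration under assumptions: the BEGIN/END line format, the ".0" and ".1" file suffixes, and the function names are invented here, and an increasing sequence number serves both as the matching identifier and as the way to tell which file is more recent.

```python
import os


def write_slot(base_path, seq, payload):
    """Write checkpoint number `seq` to one of two alternating files."""
    path = f"{base_path}.{seq % 2}"
    with open(path, "w", encoding="ascii") as f:
        # The same sequence number opens and closes the checkpoint.
        f.write(f"BEGIN {seq}\n{payload}\nEND {seq}\n")
        f.flush()
        os.fsync(f.fileno())
    return path


def read_slot(path):
    """Return (seq, payload) if the file holds a complete checkpoint, else None."""
    try:
        with open(path, "r", encoding="ascii") as f:
            lines = f.read().splitlines()
    except (OSError, UnicodeDecodeError):
        return None
    if len(lines) < 2:
        return None
    head, tail = lines[0].split(), lines[-1].split()
    if len(head) != 2 or len(tail) != 2:
        return None
    if head[0] != "BEGIN" or tail[0] != "END" or head[1] != tail[1]:
        return None  # interrupted write: the markers do not match
    if not head[1].isdigit():
        return None
    return int(head[1]), "\n".join(lines[1:-1])


def load_latest(base_path):
    """Return the complete checkpoint with the highest sequence number, or None."""
    slots = (read_slot(f"{base_path}.{i}") for i in (0, 1))
    complete = [slot for slot in slots if slot is not None]
    return max(complete, key=lambda slot: slot[0], default=None)
```

If a restart interrupts the write of checkpoint 3, the file holding it fails the marker check and load_latest falls back to checkpoint 2 in the other file.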
10) Message boards : Number crunching : Very long wus (Message 1000)
Posted 26 Jul 2016 by Robert Miles
Post:
Last night, I decided to check whether beta 1.03 could recover from checkpoints, so I shut down BOINC and then did a Windows restart on one of my computers. The task started up again, could not find the checkpoint file, and restarted from the beginning. I do not expect it to finish by the deadline.
11) Message boards : Number crunching : Very long wus (Message 985)
Posted 23 Jul 2016 by Robert Miles
Post:
:) That one with the large negative iteration numbers ... looks like a v1.07 task, where they had problems where the iteration number was being stored in a variable that was too small (hence the large negative numbers).

You need to cancel your v1.07 tasks, and get some v1.08 tasks.


Both were 1.07; both now aborted.
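To illustrate the explanation quoted above: a counter that outgrows a fixed-size signed integer wraps around to a negative value. The sketch below assumes a 32-bit signed integer, which is only a guess about the variable that was used, and it does not by itself reproduce the very long values printed in stderr.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold


def as_int32(value):
    """Reinterpret an integer as a 32-bit signed value (two's complement)."""
    return (value + 2**31) % 2**32 - 2**31


# An iteration count of the size that appears in later stderr excerpts.
iteration = 3380582533

print(iteration > INT32_MAX)  # True: it does not fit in 32 signed bits
print(as_int32(iteration))    # -914384763
```

A 64-bit counter holds values up to about 9.2e18, far more than the iteration counts shown in these posts.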
12) Message boards : Number crunching : Very long wus (Message 980)
Posted 23 Jul 2016 by Robert Miles
Post:
Robert:

Take a peek at the stderr.txt files in the slots folders for each of the tasks. Does it look like it started over on its own yet -- Called boinc_finish(0), then restarted all over, from iteration: 0 ?

If so, then the task already looped on you.


The one that's run the longest (1d 03:36:13 so far) has started from iteration 0 three times, with boinc_finish(0) only for the second time. It appears to have written many checkpoints, but never even tried to restart from any of them.

The other one (08:00:33 so far) had a LoadFromCP: line and many Doing CP It: lines (neither appeared for the other task); it might have started over once with this line:

Doing CP It:1141721694.000000MName:CRLP2011_EPI

The LoadFromCP: line and many Doing CP It: lines appeared only after this line. For example:

CONFIG END
LoadFromCP: it = -61108926450336822563372527118520459954869824153477921108861775942136659934702181719857337284340330335870334963566997252564405702640166564316983857457378844645439069586582998790074838639993323531108678700927868872073358569423293926430060195765021812562838859450354659788764728319443543953744766973355744401009139690042089794374295624386777749770714289398535345414755664712678492880463041987347541668164899656326894573868202976584286750990908534129114056367713914500312506741151236126195573371302493363829376204094012654639576890223915156503899311615033451102374977530949397316493556599647142572678041690981098707287230407648613369922802765808986522957804374255006325707294334423775666314119504340962225474113864588111231918121117143329838396238782059885395750629099224708554121600849788192418829027182617165303882639006072411526342327594615695173430489834120703394266214703107141679310975965168902714464170986440160346608225740042875941626912218973489427210475016641501622287708682557061003680340407964354288286390388074237727984029360654232634834114476488979415416278100123860445074726466111002084059948449630761025313957478722992120956766249317649971955192963197153532208515951869536832636438247754964879638077461186535930605110745765615280251037062323016792468240565262084603506405704871994461857796033083575216604090376837230639444532482950218590251456454487934647574790077878198040299126715513247803145796108548775209423624712758000990822581387119142040017719324535995689259193401576725610999800184712340302901843230111821245365685851957893391215330550378845413793955431030176659864236211686538439282869838040631380559582935377499626523727908375320679226529623582116900513259393812350255681227579323551655286889719880724507783420905760202071257109555957618298368507974416932853568579118149323608596823796094809748749358509212862461105925712222161161336277904449336685538832499059138313780940191585399089618994664892385692241835090032397921850681784132307516824381241672678605583
36731748539133663243651862058957234962281361100085052373855738773432972529285124458453311576088997489353933995968566904939637411159037884497920.000000
Doing CP It:-6110892645033682256337252711852045995486982415347792110886177594213665993470218171985733728434033033587033496356699725256440570264016656431698385745737884464543906958658299879007483863999332353110867870092786887207335856942329392643006019576502181256283885945035465978876472831944354395374476697335574440100913969004208979437429562438677774977071428939853534541475566471267849288046304198734754166816489965632689457386820297658428675099090853412911405636771391450031250674115123612619557337130249336382937620409401265463957689022391515650389931161503345110237497753094939731649355659964714257267804169098109870728723040764861336992280276580898652295780437425500632570729433442377566631411950434096222547411386458811123191812111714332983839623878205988539575062909922470855412160084978819241882902718261716530388263900607241152634232759461569517343048983412070339426621470310714167931097596516890271446417098644016034660822574004287594162691221897348942721047501664150162228770868255706100368034040796435428828639038807423772798402936065423263483411447648897941541627810012386044507472646611100208405994844963076102531395747872299212095676624931764997195519296319715353220851595186953683263643824775496487963807746118653593060511074576561528025103706232301679246824056526208460350640570487199446185779603308357521660409037683723063944453248295021859025145645448793464757479007787819804029912671551324780314579610854877520942362471275800099082258138711914204001771932453599568925919340157672561099980018471234030290184323011182124536568585195789339121533055037884541379395543103017665986423621168653843928286983804063138055958293537749962652372790837532067922652962358211690051325939381235025568122757932355165528688971988072450778342090576020207125710955595761829836850797441693285356857911814932360859682379609480974874935850921286246110592571222216116133627790444933668553883249905913831378094019158539908961899466489238569224183509003239792185068178413230751682438124167267860558336731
748539133663243651862058957234962281361100085052373855738773432972529285124458453311576088997489353933995968566904939637411159037884497920.000000
Doing CP It:-6110892645033682256337252711852045995486982415347792110886177594213665993470218171985733728434033033587033496356699725256440570264016656431698385745737884464543906958658299879007483863999332353110867870092786887207335856942329392643006019576502181256283885945035465978876472831944354395374476697335574440100913969004208979437429562438677774977071428939853534541475566471267849288046304198734754166816489965632689457386820297658428675099090853412911405636771391450031250674115123612619557337130249336382937620409401265463957689022391515650389931161503345110237497753094939731649355659964714257267804169098109870728723040764861336992280276580898652295780437425500632570729433442377566631411950434096222547411386458811123191812111714332983839623878205988539575062909922470855412160084978819241882902718261716530388263900607241152634232759461569517343048983412070339426621470310714167931097596516890271446417098644016034660822574004287594162691221897348942721047501664150162228770868255706100368034040796435428828639038807423772798402936065423263483411447648897941541627810012386044507472646611100208405994844963076102531395747872299212095676624931764997195519296319715353220851595186953683263643824775496487963807746118653593060511074576561528025103706232301679246824056526208460350640570487199446185779603308357521660409037683723063944453248295021859025145645448793464757479007787819804029912671551324780314579610854877520942362471275800099082258138711914204001771932453599568925919340157672561099980018471234030290184323011182124536568585195789339121533055037884541379395543103017665986423621168653843928286983804063138055958293537749962652372790837532067922652962358211690051325939381235025568122757932355165528688971988072450778342090576020207125710955595761829836850797441693285356857911814932360859682379609480974874935850921286246110592571222216116133627790444933668553883249905913831378094019158539908961899466489238569224183509003239792185068178413230751682438124167267860558336731
748539133663243651862058957234962281361100085052373855738773432972529285124458453311576088997489353933995968566904939637411159037884497920.000000

I'm unable to tell whether those lines indicate valid iteration numbers or not.

Neither task has anything I recognize as a checkpoint file.
13) Message boards : Number crunching : Very long wus (Message 978)
Posted 22 Jul 2016 by Robert Miles
Post:
Two tasks in progress for me that currently look like my run time will be around 30 hours each, even though they were much faster for wingmates.

http://denis.usj.es/denisathome/result.php?resultid=38327994
http://denis.usj.es/denisathome/result.php?resultid=38327983
14) Message boards : Number crunching : Computers running out of work - no work available?? (Message 880)
Posted 5 Jun 2016 by Robert Miles
Post:
Hello,

We understand you, and we are doing the best we can. These days have been really busy, and we have been occupied with other parts of the project (and other projects, the end of the academic year...), so we have spent less time here.

We are creating a new way to launch a vast number of simulations. It takes a lot of study to make it flexible. We hope to have it done and working in the near future.

Best regards, Joel.


You might want to include a feature I've seen on some other BOINC projects. Initially set up the list of simulations to be done in some way that takes up much less disk space than if all of them were ready to send to clients.

Then have a server program that first examines how much disk space is free and how many workunits are ready to be sent, and if the number of workunits is low but enough space is available, expand a workunit on the initial list into a ready to be sent form. Repeat until either an adequate number of workunits are ready, or until free disk space is too low. For best results, count each expanded workunit as reserving enough hard drive space to allow uploading it after it runs. It's not visible to users whether a similar server program condenses the files uploaded after the workunits run.
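The loop described above might look roughly like this in Python. Everything named here (top_up_ready_queue, its parameters, the expand callback) is hypothetical, and this is not how any particular BOINC server component is written. The caller would measure free space first, for example with shutil.disk_usage(path).free.

```python
from collections import deque


def top_up_ready_queue(pending, ready_count, free_bytes, *,
                       target_ready, bytes_per_workunit, min_free_bytes, expand):
    """Expand compact workunit descriptions until enough are ready or disk is low.

    pending            deque of compact descriptions still to be expanded
    ready_count        workunits already in ready-to-send form
    free_bytes         free disk space measured by the caller
    bytes_per_workunit space reserved per workunit, including its later upload
    expand             callback that turns one description into a ready workunit
    Returns the number of workunits expanded on this pass.
    """
    expanded = 0
    while (pending
           and ready_count < target_ready
           and free_bytes - bytes_per_workunit >= min_free_bytes):
        expand(pending.popleft())
        ready_count += 1
        free_bytes -= bytes_per_workunit  # reserve space for input and upload
        expanded += 1
    return expanded
```

Running this periodically keeps a modest number of workunits ready without expanding the whole list at once.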
15) Message boards : Number crunching : Computers running out of work - no work available?? (Message 858)
Posted 18 Apr 2016 by Robert Miles
Post:

Go to the HOME page.
Drop down menu DENIS@HOME at the top right of the page.
"MOUSE OVER" the DENIS@HOME page and select the first entry "PROJECT STATUS".

I think this URL will get you there.

http://denis.usj.es/denishome/project-status/


It does. Thank you.
16) Message boards : Number crunching : Computers running out of work - no work available?? (Message 856)
Posted 17 Apr 2016 by Robert Miles
Post:
Just checked my computers and one of them is down to its last 5 tasks. I then checked the server status and it seems that there is no more available work?
Can someone please let me know how long this will take to fix?


How do you check the server status? DENIS@home does not seem to offer a way to do it in the places I've seen on other BOINC projects.