Database error

Message boards : Number crunching : Database error

Michael H.W. Weber
Joined: 9 Apr 15
Posts: 11
Credit: 328,091
RAC: 106
Message 2268 - Posted: 8 Mar 2024, 9:29:37 UTC

Every day, your project is down for hours shortly after midnight.
Today, there was a database error for at least 7 hours.
Remember how you crashed the entire system several years ago without being able to recover it fully with results & stats being partly lost?
You need to thoroughly check what is going on there - just a friendly reminder...

Michael.
President of Rechenkraft.net, Principal Investigator of the RNA World distributed supercomputer.

ID: 2268 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2269 - Posted: 11 Mar 2024, 16:17:07 UTC - in response to Message 2268.  
Last modified: 11 Mar 2024, 16:22:19 UTC

Every day, your project is down for hours shortly after midnight.
Well, yes, the "system" is down for about one (1) hour (give or take 15 min), roughly at 3pm (15:00), now 4pm (16:00), here in LA, which looks like daily maintenance. It is certainly NOT "for hours"....
Today, there was a database error for at least 7 hours.
Remember how you crashed the entire system several years ago without being able to recover it fully with results & stats being partly lost?
You need to thoroughly check what is going on there - just a friendly reminder...
Ok, you might wanna try the hot mess that reigns supreme at World Community Grid for comparison. We have barely had more than 3-4 weeks in which the system was running stably.

For all I can tell, here at DENIS, the "system went down" sometime on Saturday (here in Los Angeles, before we switched to Daylight Saving Time on Sunday morning) and it seems to have barely come back up early this morning, with WUs uploading and reporting (though very slowly, which I take to be the result of clients all over the world all trying to upload/report).
This is now the second time within a month that this has happened, and for my part, some info from the system admins about what is/was going on and the current status would be nice. And if the website/forum is down, a post on FB and/or X/Twitter would be nice.
At this point, at least a quick status update would be appreciated, as it is a bit concerning that, with most of the pending WUs uploaded and reported, the system status page, as well as the personal account info page, still show the status from before the "crash", leaving the possibility that all those pending WUs could be lost...

Any update on the current situation would be highly appreciated...

PS: As soon as I had sent off the above post, the personal account Tasks page finally came up, and it shows that all the WUs uploaded and reported after the "crash" are marked as "pending validation", so nothing seems to be lost at this point; the system just has some catching up to do...
Ralf
ID: 2269 · Rating: 0
Jonathan
Joined: 29 Feb 24
Posts: 6
Credit: 152,187
RAC: 3,817
Message 2271 - Posted: 11 Mar 2024, 16:51:23 UTC

Server status pages for BOINC projects update only every so often; it's not continuous.

"Task data as of 10 Mar 2024, 10:05:47 UTC" when I checked here.
ID: 2271 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2272 - Posted: 11 Mar 2024, 17:11:54 UTC - in response to Message 2271.  

Server status pages for BOINC projects update only every so often; it's not continuous.
I am well aware of this. And you are missing the point...

Ralf
ID: 2272 · Rating: 0
[VENETO] boboviz
Joined: 9 Apr 15
Posts: 171
Credit: 1,409,776
RAC: 1,397
Message 2273 - Posted: 11 Mar 2024, 17:15:54 UTC - in response to Message 2269.  

Any update on the current situation would be highly appreciated...


I have over 130 WUs to report, and some news would be welcome during these downtimes.
ID: 2273 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2274 - Posted: 11 Mar 2024, 18:31:05 UTC - in response to Message 2273.  

I have over 130 WUs to report, and some news would be welcome during these downtimes.
Well, I have about 1100 WUs being added to "pending validation" and have yet to see a single new one being downloaded after uploading/reporting.

Some WUs, in the single digits, are apparently being validated as they are reported in the last couple of hours, but it doesn't look like any catch-up on those crunched during the outage has happened yet.
It's now past business hours in Spain, so I think this will be the observed status for the next 12-14h...


Ralf
ID: 2274 · Rating: 0
Greg_BE
Joined: 2 Aug 22
Posts: 38
Credit: 851,958
RAC: 2,523
Message 2275 - Posted: 12 Mar 2024, 7:23:58 UTC

What is going on?
You go down on the weekend, come back up for a bit and then crash again on Monday evening.

You do not put any news out in a broadcast message or a post here....so what is going on?

As to the comment about WCG....yeah no kidding. I am thinking of dropping that. Waste of time now.
ID: 2275 · Rating: 0
Michael H.W. Weber
Joined: 9 Apr 15
Posts: 11
Credit: 328,091
RAC: 106
Message 2276 - Posted: 12 Mar 2024, 7:56:25 UTC
Last modified: 12 Mar 2024, 8:02:20 UTC

Still massive issues with the server.
At my end, the last valid task dates to around 10 am two days ago. Since then, multiple machines have finished tasks one by one on the usual 24/7 basis, most of which first get stuck for hours in malfunctioning server upload attempts, finally upload, and end up in the pending queue.
No stats increase for a while now - it is plain fishy...

Michael.

P.S.: The server has also cancelled some recently delivered but not yet started tasks. That makes planning non-idle crunching difficult, to say the least, given that new tasks are handed out only occasionally.
President of Rechenkraft.net, Principal Investigator of the RNA World distributed supercomputer.

ID: 2276 · Rating: 0
[VENETO] boboviz
Joined: 9 Apr 15
Posts: 171
Credit: 1,409,776
RAC: 1,397
Message 2280 - Posted: 12 Mar 2024, 14:07:45 UTC - in response to Message 2274.  

I have over 130 WUs to report, and some news would be welcome during these downtimes.
Well, I have about 1100 WUs being added to "pending validation" and have yet to see a single new one being downloaded after uploading/reporting.


Yes, over 150 WUs still "pending validation".
Sooner or later they will be validated....
ID: 2280 · Rating: 0
Greg_BE
Joined: 2 Aug 22
Posts: 38
Credit: 851,958
RAC: 2,523
Message 2281 - Posted: 12 Mar 2024, 18:51:45 UTC
Last modified: 12 Mar 2024, 18:52:05 UTC

Still issues on the account side. Very slow to respond.
Since I share my resources with a bunch of other projects, I have only 261 pending validation, but the wingmen are also waiting.
Workunits waiting for validation: 114234

And it looks like we will be out of work soon if he doesn't release new work.
Tasks ready to send: 10

And then:
Tasks in progress: 40469
ID: 2281 · Rating: 0
Greg_BE
Joined: 2 Aug 22
Posts: 38
Credit: 851,958
RAC: 2,523
Message 2282 - Posted: 12 Mar 2024, 18:55:56 UTC - in response to Message 2269.  

on WCG...did you read their tweet from last week?
Basically on a wild guess it will be the end of the month or later before they are back.
ID: 2282 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2283 - Posted: 12 Mar 2024, 20:05:38 UTC - in response to Message 2282.  

on WCG...did you read their tweet from last week?
Basically on a wild guess it will be the end of the month or later before they are back.
Not quite sure to whom and what you are referring...
ID: 2283 · Rating: 0
Sgt.Joe
Joined: 5 Aug 22
Posts: 3
Credit: 1,924,847
RAC: 7,984
Message 2284 - Posted: 12 Mar 2024, 21:16:48 UTC - in response to Message 2283.  

I am getting a sporadic supply of work units, but that is pretty much expected from this project. At least they have a server status page. The supply from WCG is adequate for the moment. At least they seem to have a handle on the assimilation side. I have seen my numbers go down about 10,000 or so. They are making progress.

I have about 300 awaiting validation here, which is a little bit higher than normal after a bit of crunching, but I have seen the number slowly decreasing, probably until they dump another big batch into the system.

Cheers
ID: 2284 · Rating: 0
nealburns5
Joined: 9 Apr 23
Posts: 1
Credit: 10,206,409
RAC: 71,267
Message 2285 - Posted: 13 Mar 2024, 1:04:37 UTC

I have a big lump of tasks waiting for validation. It's about 3x as many as the usual peak. Are these going to get validated?

TY,
Neal
ID: 2285 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2286 - Posted: 13 Mar 2024, 2:13:07 UTC - in response to Message 2284.  

I am getting a sporadic supply of work units, but that is pretty much expected from this project. At least they have a server status page. The supply from WCG is adequate for the moment. At least they seem to have a handle on the assimilation side. I have seen my numbers go down about 10,000 or so. They are making progress.

I have about 300 awaiting validation here, which is a little bit higher than normal after a bit of crunching, but I have seen the number slowly decreasing, probably until they dump another big batch into the system.

Cheers
I have 1500+ WUs waiting to be validated, and that number is still slowly increasing; it seems only 1 out of 2 or 3 of those "sporadic" new WUs is being quickly validated.
And that server status page is of little help right now, as it is very infrequently updated. It has shown 18 WUs "ready to send" for hours now, and I have probably gotten more than that just on those 8 or 9 hosts that have DENIS enabled...

As for WCG, yeah, right now the number of valid WUs waiting to be purged has dropped steadily in the last few days, so much that I can actually refresh the Result page one time only, instead of getting those annoying progress bars repeatedly/for hours with nothing to show for it. Overall, I would have a different definition of "progress"; we are more catching up to the point where we used to be back in October of last year...

Ralf
ID: 2286 · Rating: 0
TPCBF
Joined: 11 Oct 23
Posts: 24
Credit: 3,008,042
RAC: 9,854
Message 2287 - Posted: 13 Mar 2024, 4:57:38 UTC - in response to Message 2286.  

Looks like validating WUs has picked up a bit of speed; there are now almost 200 validated in the last 2h....
ID: 2287 · Rating: 0
Greg_BE
Joined: 2 Aug 22
Posts: 38
Credit: 851,958
RAC: 2,523
Message 2288 - Posted: 13 Mar 2024, 7:32:39 UTC - in response to Message 2283.  

on WCG...did you read their tweet from last week?
Basically on a wild guess it will be the end of the month or later before they are back.
Not quite sure to whom and what you are referring...



Doesn't matter. Someone had that as part of their post, I didn't look at who specifically.
But that is irrelevant to this project.
ID: 2288 · Rating: 0
Sgt.Joe
Joined: 5 Aug 22
Posts: 3
Credit: 1,924,847
RAC: 7,984
Message 2295 - Posted: 14 Mar 2024, 2:27:41 UTC - in response to Message 2288.  

Doesn't matter. Someone had that as part of their post, I didn't look at who specifically.
But that is irrelevant to this project.


It is slightly relevant to this project if only for the fact that when WCG is down or spotty, more resources come to a project like this, at least for some time.

Cheers
ID: 2295 · Rating: 0
Greg_BE
Joined: 2 Aug 22
Posts: 38
Credit: 851,958
RAC: 2,523
Message 2302 - Posted: 15 Mar 2024, 20:41:25 UTC - in response to Message 2295.  

Doesn't matter. Someone had that as part of their post, I didn't look at who specifically.
But that is irrelevant to this project.


It is slightly relevant to this project if only for the fact that when WCG is down or spotty, more resources come to a project like this, at least for some time.

Cheers



Well, I started over at Baker Lab, but they moved most of their work into AI systems in their lab, and what they do send out for BOINC is sporadic. Looks like they are back now with some work. They used to be like here, lots of work, lots of information, but that all died. So it's refreshing to see a new and excited team working on a project.
ID: 2302 · Rating: 0