Server out of disk space

GPV67
Joined: 16 Sep 22
Posts: 3
Credit: 1,216,904
RAC: 3,687
Message 2656 - Posted: 30 Mar 2025, 13:05:29 UTC

Hi Jesús,

New issue:
Sun Mar 30 14:59:54 2025 | DENIS@home | [error] Error reported by file upload server: Server is out of disk space

Lots of work waiting for the team on Monday morning...

[VENETO] boboviz
Joined: 9 Apr 15
Posts: 207
Credit: 1,573,789
RAC: 272
Message 2658 - Posted: 30 Mar 2025, 14:47:58 UTC - in response to Message 2656.  

I think it's related to the validator problem...

Jean-David Beyer
Joined: 6 Mar 23
Posts: 61
Credit: 2,404,380
RAC: 5,328
Message 2664 - Posted: 30 Mar 2025, 19:33:54 UTC - in response to Message 2656.  

Problem seems fixed now.

[VENETO] boboviz
Joined: 9 Apr 15
Posts: 207
Credit: 1,573,789
RAC: 272
Message 2671 - Posted: 31 Mar 2025, 6:51:18 UTC - in response to Message 2664.  

"Problem seems fixed now."

But the validator is still blocked, and the server status page shows this message:
set_cached_data(): can't open ../cache/f2/server_status.php_job_status

Jean-David Beyer
Joined: 6 Mar 23
Posts: 61
Credit: 2,404,380
RAC: 5,328
Message 2887 - Posted: 17 Apr 2025, 12:43:58 UTC

Out of disk space again ...

Thu 17 Apr 2025 08:37:37 AM EDT | DENIS@home | Started upload of DENIS_Fiber_Beta_20250416030032669914_InitialTest_k_0-Test_16-conf_105_1_r1260245714_0
Thu 17 Apr 2025 08:37:37 AM EDT | DENIS@home | Started upload of DENIS_Fiber_Beta_20250416030032669914_InitialTest_k_0-Test_16-conf_163_0_r880802995_0
Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | [error] Error reported by file upload server: Server is out of disk space
Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | [error] Error reported by file upload server: Server is out of disk space
Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | Temporarily failed upload of

Lanius collurio
Joined: 5 Apr 25
Posts: 24
Credit: 57,807
RAC: 2,680
Message 2889 - Posted: 17 Apr 2025, 13:10:14 UTC

Curiously, the number of WUs waiting for validation dropped a bit. Was over 33k last time I checked, now it shows 32971. Maybe they paused the uploads to speed up validating older WUs?

Paul
Joined: 8 Jul 22
Posts: 36
Credit: 979,475
RAC: 698
Message 2891 - Posted: 17 Apr 2025, 14:16:17 UTC - in response to Message 2887.  

Same, at 13:55 UTC one WU uploaded, rest got space error.
Paul.

Paul
Joined: 8 Jul 22
Posts: 36
Credit: 979,475
RAC: 698
Message 2894 - Posted: 17 Apr 2025, 16:07:35 UTC - in response to Message 2891.  

2 more uploaded but rest still getting space error.
Paul.

Lanius collurio
Joined: 5 Apr 25
Posts: 24
Credit: 57,807
RAC: 2,680
Message 2895 - Posted: 17 Apr 2025, 16:13:16 UTC - in response to Message 2894.  
Last modified: 17 Apr 2025, 16:59:41 UTC

I found a couple of successful uploads in the log of one of my rigs about an hour ago.
My theory is that the validator is slowly crunching through the backlog, and when it frees up some space a few fresh WUs get uploaded. I still haven't gotten lucky with any validated WUs since 01:56 UTC today.

later edit: a few more successful uploads, but only on the same laptop as before

entity
Joined: 14 Apr 22
Posts: 25
Credit: 11,536,298
RAC: 18,353
Message 2897 - Posted: 17 Apr 2025, 20:46:06 UTC - in response to Message 2895.  
Last modified: 17 Apr 2025, 20:47:51 UTC

I have almost 300 waiting to upload and growing by the minute. My largest server isn't crunching Denis at the moment but will be in about an hour which will make the situation worse.

Maybe the wise move is to stop generating work right now and deal with the server issues. Once they are cleared, then generate new work. Remember, these are beta tasks.

Lanius collurio
Joined: 5 Apr 25
Posts: 24
Credit: 57,807
RAC: 2,680
Message 2898 - Posted: 17 Apr 2025, 20:50:17 UTC - in response to Message 2897.  
Last modified: 17 Apr 2025, 20:51:32 UTC

Seems like DENIS will run out of tasks to send in a couple of hours so eventually things might start to get better even without any intervention from the team. Unfortunate that this happened just before Easter. Worst case, I'll allow my rigs to get CPU tasks for Einstein until things get sorted here.

P.S. Sometimes retrying helps upload at least a few of the waiting WUs.

entity
Joined: 14 Apr 22
Posts: 25
Credit: 11,536,298
RAC: 18,353
Message 2903 - Posted: 18 Apr 2025, 1:11:31 UTC - in response to Message 2898.  

Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects.

William Albert
Joined: 29 Sep 24
Posts: 9
Credit: 15,646
RAC: 1,346
Message 2904 - Posted: 18 Apr 2025, 2:43:51 UTC - in response to Message 2903.  

"Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects."


In defense of DENIS, the application is clearly labeled as a beta, so some issues with the application (and the system surrounding it) are to be expected.

Those who are not interested in the beta testing process and who view it as "wasted CPU time" should not participate while it is in beta.

Speedy51
Joined: 17 Mar 23
Posts: 3
Credit: 592,367
RAC: 0
Message 2905 - Posted: 18 Apr 2025, 2:47:22 UTC - in response to Message 2903.  

"Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects."

If the work needs doing, I do not consider it a waste. It is all part of testing things.

[VENETO] boboviz
Joined: 9 Apr 15
Posts: 207
Credit: 1,573,789
RAC: 272
Message 2907 - Posted: 18 Apr 2025, 5:36:15 UTC - in response to Message 2905.  

"If the work needs doing I do not consider it a waste. It all is part of testing things"


I don't agree.
In the past I participated in various beta BOINC projects (Ralph@Home, DENIS@home itself, etc.), with app crashes, validation errors and so on, and I reported them on the forums when I could... But if you cannot report WUs (wrong or correct), you're doing nothing.
Are you testing the space on the server disks??

Lanius collurio
Joined: 5 Apr 25
Posts: 24
Credit: 57,807
RAC: 2,680
Message 2908 - Posted: 18 Apr 2025, 6:22:10 UTC

Validation queue dropped nicely overnight, got 22 of my own tasks validated (they were reported around noon on the 16th). If I keep retrying at random intervals, sometimes a few WUs do get uploaded.

entity
Joined: 14 Apr 22
Posts: 25
Credit: 11,536,298
RAC: 18,353
Message 2909 - Posted: 18 Apr 2025, 13:20:16 UTC - in response to Message 2908.  

I'll respond with this:

The APPLICATION is beta, not the infrastructure at the University. The infrastructure has been in place for many years, and there hasn't been any indication that there have been significant changes that need to be tested and problems worked out. The problems we are seeing are infrastructure-related, not an application issue. Several years ago, this same infrastructure had the same server disk space issues. When the application is listed as beta, I would expect the WUs themselves to have problems from time to time, but that isn't the case here. WUs are actually executing quite nicely. I understand that nothing is perfect, but at least try to mitigate the consequences when problems arise, such as by stopping work generation until the disk space is cleared.

Larry256
Joined: 3 Jan 25
Posts: 1
Credit: 62,879
RAC: 1,854
Message 2913 - Posted: 18 Apr 2025, 17:51:41 UTC

This is on the front page:

...If the tests go well, we will start running simulations within this new project. For now, they are all functional tests. We will start with manageable simulation sizes and gradually increase the number based on how the server responds and our post-processing capacity.

26 Mar 2025, 9:09:40 UTC

I think they are testing everything. The BOINC servers have not been here for a very long time.


Larry

entity
Joined: 14 Apr 22
Posts: 25
Credit: 11,536,298
RAC: 18,353
Message 2914 - Posted: 18 Apr 2025, 18:01:11 UTC - in response to Message 2913.  

They have been there for 10 years (Project started in 2015).

Here's my post from June 2023 for the same problem:
https://denis.usj.es/denisathome/forum_thread.php?id=264#2117

Jesús Carro
Project administrator
Project developer
Project scientist
Joined: 18 Mar 15
Posts: 317
Credit: 3,619,292
RAC: 17,757
Message 2934 - Posted: 22 Apr 2025, 9:19:06 UTC - in response to Message 2914.  

Hello everyone. Let me give some explanations on this topic that I think will help you understand it better. When we mark an application as beta, in some cases it is not just the application; it may be that the type of simulations we are sending is very different from what we had until now. That is, we test both the application and the server's response to the new application. And yes, this makes us push it to the limit, and more than once we have gone overboard. These new fiber simulations generate larger files (we have to adjust what and how much we can save and select it well based on the tests). On the other hand, the biggest performance issues we are observing are in the database, and although we have several possibilities, it is not clear why it is going slower. When it takes longer to process and write to the database, the server can start to fill up, and there is a critical point where everything starts to get worse. In this testing phase, we are also analyzing this.
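
As a rough illustration of the critical point described above, here is a minimal Python sketch of a disk that fills when results arrive faster than they are processed. It is only a toy model, not the project's server code: the function name `simulate_disk` and every number in it are hypothetical, chosen purely to show the effect.

```python
def simulate_disk(capacity_gb, upload_gb_per_hour, process_gb_per_hour, hours):
    """Toy model: return the disk usage in GB at the end of each hour."""
    used = 0.0
    usage = []
    for _ in range(hours):
        used += upload_gb_per_hour               # results arriving from volunteers
        used -= min(used, process_gb_per_hour)   # results processed and cleared
        used = min(used, float(capacity_gb))     # a full disk cannot take more
        usage.append(used)
    return usage


# Hypothetical numbers: processing keeps up with uploads.
print(simulate_disk(100, 10, 12, 5))
# Hypothetical numbers: processing falls behind, so usage climbs to the limit.
print(simulate_disk(10, 10, 6, 5))
```

In this model the disk stays empty as long as processing is at least as fast as the incoming uploads. Once processing drops below that rate, usage grows every hour until the disk is full, which is when clients would start seeing the "Server is out of disk space" error.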

As on previous occasions, we need you to be patient. The ultimate goal is to be able to do research, and you know that in this project that involves moments of downtime. There are other types of projects that can work continuously (for example, searching for prime numbers), but in our case, the work comes in bursts of a large number of simulations.

Best.
Jesús.
Jesús Carro
Universidad San Jorge
@InSilicoHeart