Server out of disk space
Author | Message |
---|---|
Joined: 16 Sep 22 Posts: 3 Credit: 1,216,904 RAC: 3,687 |
Hi Jesús, New issue: Sun Mar 30 14:59:54 2025 | DENIS@home | [error] Error reported by file upload server: Server is out of disk space Lots of work waiting for the team on Monday morning... |
Joined: 9 Apr 15 Posts: 207 Credit: 1,573,789 RAC: 272 |
I think it's related to the validator problem... |
Joined: 6 Mar 23 Posts: 61 Credit: 2,404,686 RAC: 5,306 |
Problem seems fixed now. |
Joined: 9 Apr 15 Posts: 207 Credit: 1,573,789 RAC: 272 |
Problem seems fixed now. But the validator is still blocked, and the server status page shows this message: set_cached_data(): can't open ../cache/f2/server_status.php_job_status |
Joined: 6 Mar 23 Posts: 61 Credit: 2,404,686 RAC: 5,306 |
Out of disk space again ... Thu 17 Apr 2025 08:37:37 AM EDT | DENIS@home | Started upload of DENIS_Fiber_Beta_20250416030032669914_InitialTest_k_0-Test_16-conf_105_1_r1260245714_0 Thu 17 Apr 2025 08:37:37 AM EDT | DENIS@home | Started upload of DENIS_Fiber_Beta_20250416030032669914_InitialTest_k_0-Test_16-conf_163_0_r880802995_0 Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | [error] Error reported by file upload server: Server is out of disk space Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | [error] Error reported by file upload server: Server is out of disk space Thu 17 Apr 2025 08:37:39 AM EDT | DENIS@home | Temporarily failed upload of |
Joined: 5 Apr 25 Posts: 24 Credit: 57,807 RAC: 2,680 |
Curiously, the number of WUs waiting for validation dropped a bit. Was over 33k last time I checked, now it shows 32971. Maybe they paused the uploads to speed up validating older WUs? |
Joined: 8 Jul 22 Posts: 36 Credit: 979,475 RAC: 698 |
Same, at 13:55 UTC one WU uploaded, rest got space error. Paul. |
Joined: 8 Jul 22 Posts: 36 Credit: 979,475 RAC: 698 |
2 more uploaded but rest still getting space error. Paul. |
Joined: 5 Apr 25 Posts: 24 Credit: 57,807 RAC: 2,680 |
I found a couple of successful uploads in the log of one of my rigs about an hour ago. My theory is that the validator is slowly crunching through the backlog, and when it frees up some space a few fresh WUs get to be uploaded. I still haven't gotten lucky with any validated WUs since 01:56 UTC today. Later edit: a few more successful uploads, but only on the same laptop as before. |
Joined: 14 Apr 22 Posts: 25 Credit: 11,536,298 RAC: 18,353 |
I have almost 300 waiting to upload and growing by the minute. My largest server isn't crunching Denis at the moment but will be in about an hour which will make the situation worse. Maybe the wise move is to stop generating work right now and deal with the server issues. Once they are cleared, then generate new work. Remember, these are beta tasks. |
Joined: 5 Apr 25 Posts: 24 Credit: 57,807 RAC: 2,680 |
Seems like DENIS will run out of tasks to send in a couple of hours, so eventually things might start to get better even without any intervention from the team. Unfortunate that this happened just before Easter. Worst case, I'll allow my rigs to get CPU tasks for Einstein until things get sorted here. P.S. Sometimes retrying helps upload at least a few of the WUs waiting. |
Joined: 14 Apr 22 Posts: 25 Credit: 11,536,298 RAC: 18,353 |
Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects. |
Joined: 29 Sep 24 Posts: 9 Credit: 15,646 RAC: 1,346 |
"Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects." In defense of DENIS, the application is clearly labeled as a beta, so some issues with the application (and the system surrounding it) are to be expected. Those who are not interested in the beta testing process and who view it as "wasted CPU time" should not participate while it is in beta. |
Joined: 17 Mar 23 Posts: 3 Credit: 592,367 RAC: 0 |
"Now have close to 600 WUs ready to upload. I have a nagging suspicion that A LOT of work is going to miss deadlines and a bunch of resends are going to be issued, resulting in a lot of wasted CPU time that could have been used toward other projects." If the work needs doing, I do not consider it a waste. It is all part of testing things. |
Joined: 9 Apr 15 Posts: 207 Credit: 1,573,789 RAC: 272 |
"If the work needs doing I do not consider it a waste. It all is part of testing things" I don't agree. In the past I participated in various beta BOINC projects (Ralph@Home, DENIS@home itself, etc.) with app crashes, validation errors, and so on, and I reported them on the forums when I could... But if you cannot report WUs (wrong or correct), you're doing nothing. Are you testing the space on the server disks?? |
Joined: 5 Apr 25 Posts: 24 Credit: 57,807 RAC: 2,680 |
Validation queue dropped nicely overnight, got 22 of my own tasks validated (they were reported around noon on the 16th). If I keep retrying at random intervals, sometimes a few WUs do get uploaded. |
Joined: 14 Apr 22 Posts: 25 Credit: 11,536,298 RAC: 18,353 |
I'll respond with this: the APPLICATION is beta, not the infrastructure at the University. The infrastructure has been in place for many years, and there hasn't been any indication of significant changes that need to be tested and have their problems worked out. The problems we are seeing are infrastructure related, not an application issue. Several years ago, this same infrastructure had the same server disk space issues. When the application is listed as beta, I would expect the WUs themselves to have problems from time to time, but that isn't the case here. WUs are actually executing quite nicely. I understand that nothing is perfect, but at least try to mitigate the consequences when problems arise, such as by stopping work generation until the disk space is cleared. |
Joined: 3 Jan 25 Posts: 1 Credit: 62,879 RAC: 1,854 |
This is on the front page: "...If the tests go well, we will start running simulations within this new project. For now, they are all functional tests. We will start with manageable simulation sizes and gradually increase the number based on how the server responds and our post-processing capacity." (26 Mar 2025, 9:09:40 UTC) I think they are testing everything. The BOINC servers have not been here for a very long time. Larry |
Joined: 14 Apr 22 Posts: 25 Credit: 11,536,298 RAC: 18,353 |
They have been there for 10 years (Project started in 2015). Here's my post from June 2023 for the same problem: https://denis.usj.es/denisathome/forum_thread.php?id=264#2117 |
Joined: 18 Mar 15 Posts: 317 Credit: 3,619,292 RAC: 17,757 |
Hello everyone. Let me give some explanations regarding this topic that I think will help to understand it better. When we mark an application as beta, in some cases it is not just the application; it may be that the type of simulations we are sending is very different from what we had until now. That is, we test both the application and the server's response to the new application. And yes, this makes us push it to the limit, and more than once we have gone overboard. These new fiber simulations generate larger files (we have to adjust what and how much we can save, and select it well based on the tests). On the other hand, the biggest performance issues we are observing are in the database, and although we have several possibilities, it is not clear why it is going slower. When it takes longer to process and write to the database, the server can start to fill up, and there is a critical point where everything starts to get worse. In this testing phase, we are also analyzing this. As on previous occasions, we need you to be patient. The ultimate goal is to be able to do research, and you know that in this project that involves moments of downtime. There are other types of projects that can work continuously (for example, searching for prime numbers), but in our case, the work comes in bursts of a large number of simulations. Best. Jesús. Jesús Carro Universidad San Jorge @InSilicoHeart |
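
The dynamic Jesús describes (results arriving faster than the validator and database can clear them, until the disk fills) can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `simulate_backlog`, the rates, and the capacity are made-up illustrative values, not DENIS@home measurements and not BOINC server code.

```python
def simulate_backlog(arrivals_per_hour, processed_per_hour, capacity, hours):
    """Toy model of result files waiting on a server disk, hour by hour.

    All parameters are hypothetical; nothing here reflects real
    DENIS@home figures or BOINC internals.
    """
    stored = 0      # files currently on disk awaiting validation
    rejected = 0    # uploads refused because the disk was full
    history = []
    for _ in range(hours):
        # The validator/database clears some files first, freeing space.
        stored -= min(stored, processed_per_hour)
        # New uploads are accepted only while there is free space.
        accepted = min(arrivals_per_hour, capacity - stored)
        rejected += arrivals_per_hour - accepted
        stored += accepted
        history.append(stored)
    return history, rejected


# Healthy case: processing keeps up with arrivals, nothing is rejected.
ok_history, ok_rejected = simulate_backlog(100, 120, 1000, 24)

# Slow database: processing drops below the arrival rate, the disk fills,
# and from then on uploads only succeed as fast as space is freed.
slow_history, slow_rejected = simulate_backlog(100, 60, 1000, 48)
print(ok_rejected, slow_rejected, slow_history[-1])
```

In the slow case the model reaches the capacity limit and then accepts only as many uploads per hour as the processing step frees, which is at least consistent with the pattern several posters reported: most uploads fail with a space error while a few occasionally get through.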