World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: pending verification |
Thread Status: Active Total posts in this thread: 14
|
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Have been running CE2 tasks without problem until I got the following message:
"Killing job because cpu time has been exceeded" Curious why it was terminated. Deadline for it was not until next month. I do leave tasks in memory when suspended (normally when tasks cycle). Did I just waste 12 hours of CPU time (more if you look at the actual time, not set back to last checkpoint)? |
||
|
captainjack
Advanced Cruncher Joined: Apr 14, 2008 Post Count: 140 Status: Offline Project Badges: |
The Clean Energy Project has a 12-hour limit on tasks. If the task finishes in less time, hooray. If the task is still running at 12 hours, the software stops it and the intermediate results are sent back to the scientists. The scientists are supposed to be able to tell whether the experiment is worth pursuing by looking at the intermediate results. No time was wasted.
Keep on crunchin' |
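A minimal sketch of the cut-off behaviour described in the post above, not the actual science application. The names `run_task`, `TaskResult` and `CPU_LIMIT_HOURS` are hypothetical, and the per-step CPU times are made-up inputs; the only thing taken from the thread is the 12-hour limit and the fact that a capped task still returns its intermediate results.

```python
from dataclasses import dataclass

# Assumption: the 12-hour CPU-time cap mentioned in the thread.
CPU_LIMIT_HOURS = 12.0


@dataclass
class TaskResult:
    cpu_hours: float   # CPU time charged to the task
    steps_done: int    # calculation steps that completed
    finished: bool     # False means the cap was hit and partial results are returned


def run_task(step_hours, cpu_limit=CPU_LIMIT_HOURS):
    """Run steps in order; stop when the next step would exceed the CPU cap."""
    cpu = 0.0
    done = 0
    for hours in step_hours:
        if cpu + hours > cpu_limit:
            # Cap reached mid-step: report the work completed so far
            # instead of discarding it.
            return TaskResult(cpu_limit, done, False)
        cpu += hours
        done += 1
    return TaskResult(cpu, done, True)


if __name__ == "__main__":
    print(run_task([3.0, 4.0]))        # finishes under the cap
    print(run_task([5.0, 5.0, 5.0]))   # cut off during the third step
```

In this sketch the capped run still reports two completed steps, which mirrors the point that the 12 hours are not wasted.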
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2955 Status: Offline Project Badges: |
At one time, there was talk of giving the users an option to 'up' this cut-off limit to 24 hours - although I haven't seen anything of that suggestion for a very long time now...
---------------------------------------- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Not going to happen. What the future upgrade of the science app brings, we'll learn when it's beta time... as always, it will likely come with hours' notice in the Beta forums.
|
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2955 Status: Offline Project Badges: |
"Not going to happen"
SekeRob, I did tend to realise that it wasn't going to happen, as, after all, it must be at least 12, if not 18, months since I last saw anything of that idea. |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
We are working on upgrading to a new version of QCHEM now and will make some changes to the workunits when that version is released.
However, for complex reasons that I encourage the Harvard researchers to describe fully (I would explain my understanding, but I would likely introduce some misinformation), the information created by running one workunit for 24 hours is less valuable than that from running 2 workunits for 12 hours each. As a result, it is not necessarily in the interests of advancing the science behind the project for us to change that limit. However, what Harvard has told us would advance their project is this: they have some jobs that they do not currently send us and have been running on powerful grids, because these jobs require more RAM, more CPU power, more I/O and more bandwidth. We looked at the higher-end devices connected to us, and they could certainly run these jobs. When we roll out the new version of QCHEM, we plan to provide a mechanism for people to sign up to receive these challenging jobs and allow Harvard to run more of these high-end jobs. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, trying again with 2 WUs. No other WUs are running from any project on my box, which is ONLY running BOINC and not used for anything else. Up to almost 5 hours of CPU time (same for run time) and both WUs haven't checkpointed in over 4 1/2 hours! CPU time at last checkpoint was right around 24 minutes.
How often is it supposed to checkpoint?!? Every 4 hours? 5 hours? More? If this is normal for CE2 WUs, then it seems that information should be included on the "requirements for projects" page so users can either opt out or increase the "switch between applications" time. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
There's -no- regular interval between checkpoints. There are a maximum of 16, of which #3 and #12 take the longest to reach: from 3 to 6 hours depending on device power. Some devices don't even get to checkpoint #3 before the 12-hour cut-off.
It is recommended, strongly, to -not- allow multiple CEP2 tasks to start at exactly the same moment. This compounds I/O bottlenecking so badly that even the first checkpoint/setup phase can take very long. There's a checkpoint Start Here FAQ which discusses the particulars for all the different sciences at WCG.
CEP2 is opt-in, not opt-out. A special configuration / recommendation page was compiled by the CEP2 scientists. You don't have to change the switch time: all BOINC clients since about v6 do not switch projects unless a checkpoint has been made. It is on the recommendation sheet to activate LAIM (Leave application in memory when suspended) when running CEP2, so that when tasks are interrupted, they can resume from where they left off.
edit: Obviously, if BOINC is restarted [unloaded from memory] or the system is rebooted, the task(s) resume from the last checkpoint.
----------------------------------------[Edit 1 times, last edit by Former Member at May 26, 2013 5:26:04 PM] |
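For anyone who prefers to set LAIM from a file rather than through the BOINC Manager preferences, here is a minimal sketch that only builds the XML text. It assumes the standard BOINC local override file `global_prefs_override.xml` and its `leave_apps_in_memory` element; check both against the documentation for your BOINC version before use. The function name `build_override` is hypothetical, and the script does not write anything to the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_override(leave_in_memory: bool = True) -> str:
    """Return XML text for a local preferences override that toggles LAIM.

    Assumption: BOINC reads <leave_apps_in_memory> inside <global_preferences>
    from global_prefs_override.xml in its data directory.
    """
    root = ET.Element("global_preferences")
    flag = ET.SubElement(root, "leave_apps_in_memory")
    flag.text = "1" if leave_in_memory else "0"
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the text so it can be reviewed before saving it anywhere.
    print(build_override())
```

The output can be reviewed and then saved by hand; the client has to re-read its local preferences (or be restarted) before the change takes effect.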
||
|
Randzo
Senior Cruncher Slovakia Joined: Jan 10, 2008 Post Count: 339 Status: Offline Project Badges: |
Thank you for the update Knreed.
Give us these challenging jobs, we like challenges. I will definitely opt in for them. |
||
|
SuDu2
Cruncher Joined: Nov 13, 2013 Post Count: 9 Status: Offline Project Badges: |
I think I have data files sitting in my Home folder (Ubuntu) that need to be fetched. I am new to Ubuntu and WCG, so I will need help. The files appear to be locked.
----------------------------------------[Edit 1 times, last edit by SuDu2 at Nov 17, 2013 5:14:02 PM] |
||
|
|