g__day

Pondering a workstation build to process astro images + occasionally play games


 

DASA is running a 2600K, which is a 4-core, 8-thread processor, with a modern SATA3 drive, and he hit I/O bottlenecks without maxing out the CPU.

We're talking double the cores/threads, and still overclockable.

 

As for RAM, perhaps stick with the 32GB and forget the RAM disk; PCI-E is plenty fast.

not for a while now

went to 3770k and now 6700k@4.7GHz

 

yeah, 32-64GB ram seems to be the go using a software ramdrive

from everything i have seen, an older multi-CPU system may not be too bad for the image processing side of the build

i'm curious how a multi-CPU system with quad channel ram to each cpu handles a ram drive - does it share bandwidth, or only draw on the ram attached to one cpu? may have to google it

 

 

i was going by your signature :P


hmm fair enough thought i had updated that

fixed

Edited by Dasa


Nice!!!

 

Well my point stands. Your example is still a 4/8 CPU, while my 'build' is an 8/16 CPU. If he's worried about the multithreaded nature of the program, I think you've basically proven there's no worry until you sort out IO, and even then, for how long?

I just think server gear is overkill, expensive, and really useless for this task.


damn, memory bandwidth on new dual cpu systems is insane

http://www.tweaktown.com/reviews/6866/asus-z10pe-d8-ws-dual-cpu-intel-c612-workstation-motherboard-review/index6.html


vs single cpu http://www.tweaktown.com/reviews/7725/intel-broadwell-core-i7-6950x-10-extreme-edition-cpu-review/index5.html


 

mind you, my overclocked dual channel system is competing with that 2133 quad channel bandwidth but with much lower latency, so an overclocked quad channel system would do ok too




Exactly!

I'm not stupid enough to claim server hardware is useless; double up the... everything... and you have double the pathways to send things along.

 

But in this instance, I feel it's pure overkill. You'll spend twice as much, which will mean you'll have to wait the same "far too long" to upgrade in the future!


There are a lot of great points in people's insights here. Server boards are built for throughput and reliability - but modern gaming boards have excellent reliability too!

 

Dual CPU server boards require double the investment in CPUs (a non-issue only thanks to older Xeon CPUs going for a song). They require a special PSU with extra mainboard connectors, and possibly extenders depending on case design. These boards need large, very well ventilated cases. The high end motherboards aren't cheap - the ASUS Z10PE-D16 might cost $900. Note it does offer a full eight PCIe 16x slots and very high speed, high bandwidth memory lanes. And yes, it supports M.2 for 10 Gbps connectivity speeds - if anything can deliver that!

 

A lot of general software and games aren't coded to utilise multi core hardware well. However, many astronomy programs appear to buck this trend. Deep Sky Stacker and CCDStack are fully dual-CPU optimised, and I believe PixInsight is too - but apparently PI is rather constrained by I/O swap space speed and asks you to run dual swap files across two or more independent disks (which it reports improves performance by 40%)! So it's an oversimplification to think that astronomy processing programs cannot take great advantage of dual CPU server boards. I still have to research how Sequence Generator Pro, Nebulosity, MaximDL, Photoshop CS and Registar perform - but these programs overlap each other greatly; if even one of them works well that is a huge boost!

 

Once you have stacked all the registered frames, processing operations that should be responsive can still take 45 seconds to redraw the screen for even simple changes.

 

A modern high clocked CPU with multiple cores will always do well across the board for most software.

 

I didn't follow how folks think you can get DirectX 12 support if you aren't on a board certified for Windows 10 Pro.

 

Speaking to the developer of CCDStack, GPU direct compute won't be available to help in the near future for several reasons, including that the cards don't have enough RAM and there is high overhead in moving the images in and out of video RAM for the limited operations a GPU can do.

 

So what I am also researching is exactly how much faster a dual multi-core Xeon system is compared to a high end i7. It is a very interesting topic!

Edited by g__day


So what I am also researching is exactly how much faster a dual multi-core Xeon system is compared to a high end i7. It is a very interesting topic!

 

You'll find it's negligible.

DASA is more clued up on back-to-back benchmarks than I am, so listen to him if he disagrees with me, but in reality, a Xeon chip is basically an i7 with an ECC memory controller and some more cache.

 

As for your software,

Depending on how 'up to date' and IT-nerd-specific the programmers/marketing department is, "able to utilize dual CPUs" can EASILY mean "able to use dual cores". In Windows, CORE == CPU (not for licensing, but for software).
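To see that distinction concretely, here's a minimal stdlib-only Python sketch; the psutil call in the comment is a real third-party alternative, shown only as a pointer:

```python
import os

# The OS scheduler counts LOGICAL processors ("threads"), not physical
# cores, so a 4-core/8-thread i7 reports 8 "CPUs" here. That's why
# "supports dual CPUs" in marketing copy can just mean "spawns two
# worker threads".
logical_cpus = os.cpu_count()
print(f"OS reports {logical_cpus} logical CPUs")

# To separate the two you need something like the third-party psutil
# package: psutil.cpu_count(logical=False) returns physical cores.
```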

 

Regardless, if you go with an ASUS board, they're usually REALLY good at extending CPU support to odd things; so chances are, once they're cheap, the 10-core/20-thread Xeon chips will drop into your "desktop" board anyway.

 

At the end of the day, what you're trying to do is compute things FAST. Graphics, Physics (well, math) and Data, basically, you're trying to 'game' without playing a game. The requirements are much the same.

As such, the same smart buying rules apply.

Minor upgrading in 2 years will always result in better performance than overkilling for today.

 

If you spend $8k on a server setup, that's it for probably, what, 5 years at least? Let's say new sensors or special telescopes come out and take even BETTER images again, or new software emerges. You have 0 budget for upgrades.

If however, you spend $4k now, and an additional $2k in 2 years, not only is it cheaper, but you'll have a more powerful system for longer.

 

Your money, and I don't mean to be nearly as forceful as I'm sounding; I'm just trying hard to stop you making a mistake, IMO.

 

From experience, one of my clients (an architect) has learnt that even his $10k architecture software, which is "fully multi core enabled", only touches his last 4 threads when Windows makes it.

You can watch it happen. Threads 1~4 ramp up instantly when rendering, but 5~8 only start seeing SOME load when Windows starts handing out tasks. Basically no software is good enough to calculate how much to throw at, say, 20 cores... and that makes logical sense. You'd have to be able to predict the future to know which tasks to send where, based on predicted time to complete, priority, and what data set it needs to interact with (in the case of Bulldozer CPUs with separate cache).

Edited by Master_Scythe


unfortunately i have never had the chance to use more than a quad core 8 thread cpu so my experience is limited in that regard

 

rather than running one big expensive dual cpu system for processing, how would it be to run a few $1-2k systems?

zen would probably be great for this option

maybe build a zen system for your main uses when it's out, then if you want some more processing power look at a 32 core naples later in the year - running both at once should really crunch some data and may even fit under the $6k budget



Still soaking all this up - but my grad degree was on parallel processing hardware and software designs - so I am very careful about the questions I ask - and yes - it's multi core, multi processor capable (how capable is still under investigation). The whole segue started from the observation that slower, high core count, high cache Xeon chips can be found very cheaply. That is not a thorough analysis of risks, costs and benefits by any means! I have other astronomers trying to get data points too.


unfortunately i have never had the chance to use more than a quad core 8 thread cpu so my experience is limited in that regard

 

rather than running one big expensive dual cpu system for processing, how would it be to run a few $1-2k systems?

zen would probably be great for this option

maybe build a zen system for your main uses when it's out, then if you want some more processing power look at a 32 core naples later in the year - running both at once should really crunch some data and may even fit under the $6k budget

 

I'm running a 6 core, 12 thread 2011-3 system, and I'm yet to find a single piece of software that actually ramps up all threads, unless it launches itself multiple times.

 

One such example is batch MP3 converter.

I wanted a few thousand MP3s dropped to 160kbps for car use (road noise hides quality).

I selected "12 threads", and bam, it did 12 MP3 tracks at a time.

Was rather cool actually, and the first time I've EVER seen software use all the threads properly.

 

That said, in task manager, there were 12 copies of the software running... So even then it's not actually multi-core aware, just able to use affinity assignment properly.

 

 

I've seen all 12 threads get used, don't misunderstand, but it's always a tell-tale sign it's the kernel doing it, and not the software (obvious because X threads jump to 100%, then come down a bit as the remaining Y threads take some load).
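That "launch N copies" pattern is plain process-level parallelism, and it's easy to sketch. This is an illustrative Python mock-up, not the converter's actual code; the track names and the convert() body are made up:

```python
from concurrent.futures import ProcessPoolExecutor

def convert(path):
    # Stand-in for a real transcode step (e.g. shelling out to an
    # encoder). The path names used below are hypothetical.
    return f"{path} -> 160kbps"

if __name__ == "__main__":
    tracks = [f"track_{i:04}.mp3" for i in range(24)]
    # max_workers=12 launches 12 separate worker PROCESSES - which is
    # exactly why Task Manager showed 12 copies of the converter: the
    # parallelism lives in the OS scheduler, not inside the program.
    with ProcessPoolExecutor(max_workers=12) as pool:
        for result in pool.map(convert, tracks):
            print(result)
```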

Edited by Master_Scythe


I would be really interested in what you observe with thread utilisation under the freeware DeepSkyStacker if it tried to stack even, say, ten copies of that image above under all default settings. DSS is about 12 MB and free to download. You simply say load light frames, point to say ten copies of the above image, select them all, then press stack and register and accept all the suggested default parameters.

 

That should be the simplest usage. It uses all four cores of my old quad core CPU and all eight threads of my son's rig. A simple stack of an identical picture should take one to three minutes to process - I'd be very interested to see what I/O, RAM and CPU usage folks observed during the run and what the total time of the run is. If you feel adventurous afterward, when it displays the final image just move any of the sliders - colour, or better still the six brightness sliders - around, then press the process button and see how quickly it redraws the final image. If it's less than thirty seconds that would be pretty good!
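For anyone wanting to set that test up quickly, a small Python helper can make the ten copies; the folder and file names here are arbitrary, and the source image is whatever frame you choose:

```python
import shutil
from pathlib import Path

def make_test_lights(source, n=10, out_dir="dss_test"):
    """Duplicate one light frame n times so DSS has a stack to
    register. Returns the list of paths to the copies."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copies = []
    for i in range(n):
        # Keep the original extension so DSS recognises the format.
        dest = out / f"light_{i:02}{Path(source).suffix}"
        shutil.copyfile(source, dest)
        copies.append(dest)
    return copies
```

Then point DSS's "Open picture files..." dialog at the output folder.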

 

If I wasn't stuck in Cooma for work I would be doing a lot more research. I find it really gratifying and helpful reading all the testing and analysis you guys have done just because I asked for help - many thanks!


maybe i'm doing something wrong, but it seems to be registering each image individually, taking ~1:17 each

i'm not sure what image you're using, so i went with this one - i suspect there could be a fair variance in time taken to process from one image to the next

http://wallpapersafari.com/w/gzQwOk/

 

there isn't really any io usage while registering the one image, just a single core is pegged

 

adjusting the 6 brightness sliders then clicking apply completed in 15 seconds, while only using ~8% of the cpu, no noticeable io

Edited by Dasa


A couple of random thoughts from a recent PC build I just completed...

 

This PC was mainly for the Adobe suite of products (picture & video), so multi-core was important as was the quantity of RAM and I/O throughput.

 

The CPU was an i7 6850K, a 6 core processor, so not as many cores as you possibly want. However, the interesting bit was that we achieved a 27% overclock (using the Asus overclocking tools) with the Corsair Hydro H60SE liquid cooler attached.

 

This cooler was very easy to install and was very good at keeping the temps down, or bringing them down quickly when they did get high. So you may want to look at some form of liquid cooling to get the most out of your CPU.

 

This box has 64GB of RAM, which Adobe loves, so no drama there.

 

As for storage, we went with a 3 tiered approach. I'm still not sure if this is the right way to go about it for Adobe, but the results were great.

 

The primary OS & program disk is a Samsung 950 PRO NVMe M.2 installed in a dedicated M.2 slot (PCIe x4). The Adobe 'scratch' drive was a Samsung SATA SSD and the spinny drives were WD blacks.

 

I've had a number of SSDs from OCZ and Samsung for quite a while and they are fantastic, but these M.2 beasts take it to another level and are worth thinking about.

 

It's by far the fastest thing I've ever built, and the first time I've played with any form of liquid cooling or M.2 storage but I can say I was impressed...



Sounds like a good build.

 

I'd have used WD Red drives for storage, and a PCIe SSD over a normal one for a dedicated 'scratch', but otherwise, makes total sense.


i wouldn't bother with an aio - they're expensive, more prone to failure, and loud for the level of performance they provide. until you get to the expensive ones made by swiftech/ek - while still prone to failure, they do manage a decent performance to noise ratio

this is assuming you have a case with half decent airflow - obviously aio have an advantage in being able to draw on fresh air in cases with crap airflow

http://www.relaxedtech.com/reviews/noctua/nh-d15-versus-closed-loop-liquid-coolers/2

[charts: NH-D15 vs closed-loop coolers - load temperature and load noise]

http://www.anandtech.com/show/5054/corsair-hydro-series-h60-h80-and-h100-reviewed/6


Edited by Dasa


i wouldn't bother with an aio - they're expensive, more prone to failure, and loud for the level of performance they provide

Can you point me to some failure rate info?

 

The main reason I haven't looked at this myself is because if liquid cooling fails it can do more damage.

 

As for noise & performance, from what I've read it's only the very top end air solutions that stack up and, as I've experienced, they are very dependent on mobo layout.



Mac Dude - on the storage side I would say you got that build right - pretty much what I was considering!



 

i wouldn't bother with an aio - they're expensive, more prone to failure, and loud for the level of performance they provide

Can you point me to some failure rate info?

 

The main reason I haven't looked at this myself is because if liquid cooling fails it can do more damage.

 

As for noise & performance, from what I've read it's only the very top end air solutions that stack up and, as I've experienced, they are very dependent on mobo layout.

 

no failure rate info, other than seeing a number of reports on forums of dead pumps and leaks

corsair has a good 5 year warranty on many of its aio which may also cover damaged parts

but i have never seen a hsf die, just fans (which aio use too), and it usually takes a long time for cheap sleeve bearing fans, let alone high quality ball bearing fans, to die

 

there are some half decent cheaper hsf like the cryorig h7 that fit just about anywhere an aio will - the performance may fall a little behind, but they're still in front on noise

 

big hsf like the nh-d15 can block the first pci-e slot on some mb, although the nh-d15s gets around this - and if you combine tall ram heatsinks with side panels close to the hsf height limit, you may have to forgo the second fan

 

 

 

g__day, any idea what i need to do differently to get comparable results with all cores in use with DeepSkyStacker?



Dasa,

 

To be sure I understand your query re DSS correctly

 

1. If you are asking how to get performance data - you have to run the Windows Task Manager - DSS doesn't capture performance benchmark data

2. If you are asking whether you are using DSS correctly to create high load - I will be home Saturday night and will capture a workflow and my timings - but broadly, as I described from memory on the previous page: load one image multiple times, register and accept defaults, press go and monitor CPU / RAM / I/O usage and timings for 1-4 minutes, then do a simple adjust - rather than the colour sliders, move the 6 sliders in the dim, mid and bright zones a bit left or right, then press go and monitor again.
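If you'd rather log utilisation than eyeball Task Manager, here's a rough Python sketch. It reads /proc/stat, so it's Linux-only; on Windows, Task Manager or the third-party psutil package does the same job:

```python
import time

def cpu_busy_fraction(interval=0.5):
    """Whole-system CPU utilisation over `interval` seconds, computed
    from the aggregate "cpu" line of /proc/stat (Linux only)."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait ticks
        return idle, sum(fields)

    idle0, total0 = snapshot()
    time.sleep(interval)
    idle1, total1 = snapshot()
    busy = 1.0 - (idle1 - idle0) / max(1, total1 - total0)
    return min(1.0, max(0.0, busy))

if __name__ == "__main__":
    # Run this in a loop from another terminal while DSS is stacking.
    print(f"CPU busy: {cpu_busy_fraction():.0%}")
```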

 

Cheers, Matt


Dasa,

 

To be sure I understand your query re DSS correctly

2. If you are asking whether you are using DSS correctly to create high load - I will be home Saturday night and will capture a workflow and my timings - but broadly, as I described from memory on the previous page: load one image multiple times, register and accept defaults, press go and monitor CPU / RAM / I/O usage and timings for 1-4 minutes, then do a simple adjust - rather than the colour sliders, move the 6 sliders in the dim, mid and bright zones a bit left or right, then press go and monitor again.

 

Cheers, Matt

that's basically what i did, i think

made 10 copies of the image and loaded them all

but it only used one thread and processed the same image one after the other rather than all at once


It will operate that way - choose the best image by star count and sky quality assessment - then add them one at a time. The adding should be done in grids linked to the number of cores. On a quad core, I posit the image is sliced into four and one quarter given to each core; with sixteen cores, each image is sliced into sixteen bits, etc...
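That per-core slicing is easy to illustrate. The NumPy sketch below averages frames strip by strip, with the strips processed sequentially; in a real stacker each strip would go to its own core. This is my guess at the scheme, not DSS's actual code:

```python
import numpy as np

def stack_mean(frames, n_tiles=4):
    """Mean-stack a list of equal-sized frames, one horizontal strip
    at a time. Each strip is an independent chunk of work, so a
    stacker can hand one strip to each core: 4 strips on a quad core,
    16 strips on sixteen cores, and so on."""
    height = frames[0].shape[0]
    bounds = np.linspace(0, height, n_tiles + 1, dtype=int)
    out = np.empty_like(frames[0], dtype=np.float64)
    for top, bottom in zip(bounds[:-1], bounds[1:]):
        # In a parallel stacker this loop body runs on its own core.
        out[top:bottom] = np.mean([f[top:bottom] for f in frames], axis=0)
    return out
```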


then add them one at a time. the adding should be done in grids

how? this is all i have been able to manage so far

 

[attached: DSS screenshots showing only a single core in use]

Edited by Dasa


Weird - when I try stacking just three frames all four cores go to 100% - and this is the same behaviour all my astronomer friends note.

 

The slicing of the stacking workload across CPU cores has always been automatic - unless DSS has been set to have affinity = CPU0, but I have never heard of that happening. I tried screen grabs during star registering and star stacking - all showed CPU at 100% for all four physical cores. If you select the Luminance pane after the final image appears, move the sliders and re-compute, that flogs all four of my CPUs too!

 

[attached: screenshots showing all four cores at 100% during registering and stacking]

Edited by g__day


nope, its affinity gives it access to all cores and it jumps about between them, but never using more than one at a time

afraid i'm not familiar enough with the software to work out why it's behaving this way for me
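For what it's worth, you can confirm from code that the affinity mask itself is wide open. This sketch is Linux-only; on Windows the equivalent is Task Manager's "Set affinity" dialog, or the third-party psutil package's Process.cpu_affinity():

```python
import os

# sched_getaffinity returns the set of logical CPUs a process is
# allowed to run on (Linux only). A full mask combined with only one
# core ever being busy - the symptom described above - means the
# program itself is single-threaded, not that the OS pinned it.
allowed = os.sched_getaffinity(0)  # 0 = the current process
print(f"process may run on CPUs: {sorted(allowed)}")
```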

