g__day

Pondering a workstation build to process astro images + occasionally play games


I sometimes imagine I have the most long-lived equipment of anyone here. I must have built over 20 PCs since right back when Atomic first started (or even well before, actually). My current rig is still powered by a 2.4 GHz Core 2 Quad (Conroe era), driving a GTX 1070 - which can still play games fluidly on my 30" Dell screen (plus I use two 24" screens in portrait mode flanking the Dell, but only for work purposes). My priorities have been elsewhere for many, many years.

 

But this year I would like to build a rig to handle astrophotography image-processing workloads that can also handle games. Astro image processing can require stacking hundreds of shots per night - each shot could be 10-40 MB - and there is a lot of mathematical processing (which no software I know of offloads to the GPU). So I was pondering getting a 10-core Intel beast and doing a rig built for astro first and games second... when a thought hit me that I first entertained a decade or two ago but never acted on: what about a dual-CPU server board? A simple Google and YouTube search showed me several folk have gone down this path and got great results - at a fraction of the price I was expecting - simply because old Xeon chips sell for about $200 for an 8-core 2.4 GHz chip.
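For a sense of what that mathematical processing involves, the core of a stacking pass is simple array arithmetic repeated over huge piles of data. Here is a toy numpy sketch - illustrative only, not how DSS or CCDStack actually implement it:

```python
import numpy as np

def stack_frames(frames, method="median"):
    """Combine equally-sized frames into one master image.

    Median stacking rejects outliers (hot pixels, satellite trails)
    better than a plain mean, at the cost of more memory and CPU -
    which is why many cores and fast I/O both matter for this workload.
    """
    cube = np.stack(frames).astype(np.float64)   # shape: (n_frames, H, W)
    return cube.mean(axis=0) if method == "mean" else np.median(cube, axis=0)

# Toy example: three tiny "frames", one with a hot pixel at (0, 1).
frames = [np.full((2, 2), 10.0),
          np.full((2, 2), 10.0),
          np.array([[10.0, 255.0], [10.0, 10.0]])]
master = stack_frames(frames)
print(master[0, 1])   # -> 10.0 (the median suppresses the hot pixel)
```

Scale those 2×2 toy frames up to a hundred 40 MB frames per night and it's obvious where the CPU time goes.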

 

Now, I only remember high-end Tyan motherboards (like the S7070) being dual-CPU, but a simple search shows many people are using the Asus Z10PE-D16 https://www.asus.com/Motherboards/Z10PED16_WS/ - which surprisingly supports DDR4 RAM (16 slots, up to a terabyte of RAM) and has 6 PCI-E Gen3 x16 slots, supporting NVIDIA in 3-way SLI or AMD in 4-way CrossFire.

 

These boards sell in Australia for about $850 - or you might find dated rigs on Gumtree built on these boards (variant unknown) for between $300 and $2,000.

 

So a check on YouTube and Gumtree shows folks selling old Xeon CPUs that are 8-core/16-thread, meaning this sort of build would give you a pair of 2.4-2.6 GHz CPUs with 16 cores and 32 threads in total, to handle workloads that like many-core solutions. I am yet to see whether any astronomer has used the programs I like - Deep Sky Stacker (freeware) or CCDStack - on a multi-core monster, but DSS does seem to keep all four cores maxed out on my (and my kids' latest quad-core i7) machines.

 

So, my long-winded question: what advice would folk have? I can guess that this rig could play games, but it would act much like a 4-core machine throttled at 2.4-2.8 GHz even if it was running a pair of 1080s in SLI. I would have to check that everything - OS (Win10 x64 is supported), memory types, power supplies, drivers, absolutely everything - would work. Again, there are folks building these rigs, and they scream on 3D rendering that can use all the cores, and they are decent at games (normally paired with an NVIDIA 980 or Titan card/s). YouTube has many examples of this working - e.g.:

 

 

 

 

Will it be the build of a lifetime, or will it end in tears? Interested in folks' thoughts!


If you have the software that'll use it, then I suppose go for it.

The cost of new multi-CPU gear, though... somewhat high. And don't forget the law of diminishing returns as you add more cores to a machine with only so many paths to memory and I/O devices.

 

I had the opportunity a while back to buy a 6-core first-gen i7 Xeon variant for $70. The LGA1366 motherboard I have would run it, but it was only a mid-2 GHz CPU; all up, though, it would have offered more computational power than the Xeon W3550 I have in it now.

 

Another interesting opportunity is virtualisation. Similarly to the videos you have, there are also ones showing multi-CPU beasts running 3 or more VMs with proper hardware segregation - which we can now accomplish thanks to architectural improvements like putting the memory and I/O controllers on the CPU, and virtualisation features like VT-x and VT-d.

 

Before taking the leap, it's probably worth investigating whether what you need to do has an alternative solution using loosely coupled distributed processing, i.e. the workload shared among multiple physical computers with high-speed network links. But in any case it should provide new computing adventures, well within the spirit of what Atomic MPC stands for.


Honestly, you're better off with a normal desktop system these days.

i7-6900Ks will take a modest 10% overclock without even learning "how to overclock".

That leaves you with an 8-core, 16-thread CPU @ 3.5 GHz.

If I can render an H.265 movie while ripping a DVD, playing DOTA, and listening to music on my 6-core/12-thread @ 4.8 GHz, I'm sure you'll be fine processing a few hundred images.

 

Worst case? Server boards usually DO support desktop chips (just not the other way around).

So you buy a server board, another i7-6900K, and go for a 16-core, 32-thread server.

 

If you ARE going to go server board, though, just get a single 22-core Xeon and get it over with.

 

Considering the lifespan of your last system, it's going to have a fairly modest 'cost per year'.

 

 

One last thing,

I'd be surprised if your current system's main bottleneck isn't RAM and storage.

Image processing, regardless of application, has become a well-optimised, trivial thing; a modern PC can edit a multi-hundred-MB RAW image as if it were a 120 KB JPEG from back in the day...

 

Get yourself some PCI-E storage, at least as a scratch disk. Max out the RAM, both speed and board capacity. Even better, if you have a UPS, make a RAM disk and work directly from there.
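Before committing to PCI-E storage or a RAM disk, it's worth measuring the candidates. A rough sequential-write timer like this (a sketch only - a dedicated benchmark tool such as CrystalDiskMark is more thorough) lets you compare any two scratch locations:

```python
import os
import time
import tempfile

def write_speed_mb_s(directory, size_mb=256):
    """Time a sequential write of size_mb MB into `directory`."""
    block = os.urandom(1024 * 1024)              # 1 MB of incompressible data
    path = os.path.join(directory, "scratch_test.bin")
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())                     # make sure it hit the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed

# Point this at an SSD folder, then at a mounted RAM disk, and compare.
print(f"{write_speed_mb_s(tempfile.gettempdir(), size_mb=64):.0f} MB/s")
```

If the RAM disk only doubles the SSD number for your actual file sizes, the extra RAM may be better spent on OS file caching.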

 

This could be fun - something designed to crunch big data and last a long time. Keep your budget realistic (high), considering the length of time you'll own it.

Edited by Master_Scythe


I have some time to plan this out - first call is posting to astronomers to see what rigs people have tried to address accelerating its specific workload. Please keep your ideas coming - loving the thoughts!


The E5-2670 is ~80% slower than the 7700K in single-threaded tasks and about equal in multithreaded, while using a fair bit less power.

But then two of them will obviously have a fair advantage in any software with good multi-CPU support.

http://cpu.userbenchmark.com/Compare/Intel-Xeon-E5-2670-vs-Intel-Core-i7-7700K/m18501vs3647

 

Q6600 vs 7700K: the 7700K is 412% faster at multithreaded tasks.

http://cpu.userbenchmark.com/Compare/Intel-Core2-Quad-Q6600-vs-Intel-Core-i7-7700K/1980vs3647

So whatever you do, it's going to be a massive improvement.

Edited by Dasa


An interim consideration: the Q6600 overclocks reasonably well... just upping the FSB to 300 or 333 would give a worthwhile improvement, and shouldn't stress the system too much given that later Core 2s had the higher FSB anyway.
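For context on those numbers: the Q6600 is multiplier-locked at 9×, so core clock is just FSB × 9. A quick check of the suggested bumps:

```python
MULTIPLIER = 9                      # Q6600 is locked at 9x
for fsb_mhz in (266, 300, 333):
    ghz = MULTIPLIER * fsb_mhz / 1000
    print(f"FSB {fsb_mhz} MHz -> {ghz:.1f} GHz")
# 266 MHz is stock (2.4 GHz); 333 MHz lands on a tidy 3.0 GHz.
```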


The E5-2670 is ~80% slower than the 7700K in single-threaded tasks and about equal in multithreaded, while using a fair bit less power.

But then two of them will obviously have a fair advantage in any software with good multi-CPU support.

http://cpu.userbenchmark.com/Compare/Intel-Xeon-E5-2670-vs-Intel-Core-i7-7700K/m18501vs3647

 

Q6600 vs 7700K: the 7700K is 412% faster at multithreaded tasks.

http://cpu.userbenchmark.com/Compare/Intel-Core2-Quad-Q6600-vs-Intel-Core-i7-7700K/1980vs3647

So whatever you do, it's going to be a massive improvement.

 

Exactly.

I still think the workload is being felt by the disks more than the CPU, though.

Don't get me wrong, I'm sure the CPUs are pinned at 95%+ while working, but I bet you'd get the full 100% pin if you could just feed the images fast enough.

 

 

Here's a random throw-together (minus GPU, as you said you have a 1070).

I'm sure there's a lot more tweaking that could be done, but this is just a basic example of what I'd use for what you've described.

 

Noctua NH-C12P SE14 CPU Cooler

$88.00

Antec One Gaming Case

$78.00

[Case and CPU cooler were chosen for high-stress workloads, not because they're "gaming". The C12P over the U12P ESPECIALLY, since it cools the RAM if you orient it correctly.]

 

Intel Core i7 5820K Haswell-E 6-Core LGA 2011-3 3.3GHz CPU Processor

$539.00

Gigabyte GA-X99P-SLI LGA2011 ATX

$359.00

[6-core and 12-thread, what's not to love? We could go 8-core/16-thread, but it adds an entire thousand dollars... The motherboard was chosen for the additional PCI-E lanes, to add some PCI-E storage down the track.]

 

 

Corsair 32GB (2 x 16GB) CMK32GX4M2B3000C15 DDR4 3000MHz Vengeance LPX Black

$299.00

Toshiba 960GB 7mm Q300 PC SSD

$355.00

[Lots of RAM at high speed, and lots of SSD storage, for obvious reasons]

 

Antec "EDGE" 550W 80 PLUS Gold, 100% Modular PSU. FDB LED Fan, 2x (6+2) PCI-E, 8x SATA, 3x Mo

$145.00

[80 PLUS Gold for efficiency while you're hammering it overnight, and modular for cooling - fewer cables blocking airflow.]

 

Order Total

$1863.00

 

[If you had, say, a $3,000 budget, I'd jump to an 8/16 X-series CPU, PCI-E storage, and another 32 GB of RAM.]

Edited by Master_Scythe


The new AMD Ryzen CPUs could be competitive on price if AMD gets the pricing right. The new Ryzens do have "hyperthreading" (SMT) as well.

 

That is actually a fair point.... But for this workload I wonder if it's worth the wait....


 

The new AMD Ryzen CPUs could be competitive on price if AMD gets the pricing right. The new Ryzens do have "hyperthreading" (SMT) as well.

 

That is actually a fair point.... But for this workload I wonder if it's worth the wait....

 

 

True - depends on how much of a hurry it's needed in.

Edited by Jeruselem


No hurry - stuck in Cooma planning for a client for two months with occasional trips home!

 

Budget - $3,000 sounds reasonable - but happy to go higher if needed. If I go above $6,000 I might have some serious explaining to do to my better half!

 

I am not sure yet the system is I/O-bound, as the files are being read from an SSD on both my own machine and my son's i5 7600 (3.5 GHz, 16 GB RAM, with an SSD): registering 4 files took about 3 minutes 20 seconds on my PC and 2 minutes 20 seconds on his, from memory. I guess it would be easy to run a test from a RAM drive, just to be sure whether it's total memory / scratch-file speed or I/O-path sensitive, as the final output from 4 × 10 MB RAW or FITS files will be a 100 MB 32-bit TIFF. Processing this for saturation, red/green/blue alignment and non-linear intensity adjustment can be slow - it took about 1 minute to update the screen when I moved the dark region of my image's intensity up about 5%. It divides the image into a matrix and works one square, or four squares, at a time. On my son's rig it is the same thing, just a tad faster!
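One cheap way to settle the I/O-versus-CPU question before buying anything: time the pure-compute portion with the data already sitting in RAM, and compare that against the end-to-end run that reads from disk. A minimal sketch - the `process` function here is a hypothetical stand-in for one stacking pass, not DSS's actual code:

```python
import time
import numpy as np

def timed(label, fn, *args):
    """Run fn(*args), print the elapsed wall time, return the result."""
    t0 = time.perf_counter()
    result = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.2f} s")
    return result

# Hypothetical stand-in for one register/stack pass over a set of frames.
def process(cube):
    return np.median(cube, axis=0)

# Frames pre-loaded into RAM, so this times compute only.
# If this is nearly as slow as the full run that reads from the SSD,
# the workload is CPU-bound and faster storage won't help much.
cube = np.random.rand(40, 512, 512)   # ~40 frames of 512x512 float64
master = timed("in-RAM stacking pass", process, cube)
```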

Edited by g__day


You should be able to use any resource monitor app you like to record a graph overnight.

Run it as hard as you can, and see where the bottleneck occurs.

 

You know, it's possible the software isn't very well multithreaded, and you might be better off triggering 3 or 4 copies of it, each rendering its own workload, to see best performance.

I just can't logically see where a system would struggle on that type of photography, when they don't on 100 MB RAW images.

I mean, when you get lazy and batch-'fix' a CF card full of RAW photos - easily 100 - the limiting factor is always I/O, and even then it takes minutes, not hours (on modern PCs).
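The "run several copies at once" idea is easy to prototype outside the app itself. A hedged sketch using Python's standard `concurrent.futures` - `stack_one_set` is a made-up stand-in for one independent stacking job (say, one filter channel or one exposure set):

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def stack_one_set(seed):
    """Stand-in for one independent stacking job on its own set of frames."""
    rng = np.random.default_rng(seed)
    frames = rng.random((20, 256, 256))          # 20 fake frames per set
    return float(np.median(frames, axis=0).mean())

if __name__ == "__main__":
    # Four independent jobs in parallel: if the app itself is poorly
    # threaded, launching one copy per set of frames is how you keep
    # all the cores busy.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(stack_one_set, range(4)))
    print(f"{len(results)} sets stacked")
```

The same pattern works with any command-line stacker by swapping the worker body for a `subprocess.run(...)` call, one per image set.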

 

I'm going to assume this software isn't freeware we can look at?

Edited by Master_Scythe


If you want to get rid of any I/O bottleneck, use an SSD connected directly to the motherboard over PCI-E (SATA 3 is too slow to support the full speed of modern SSDs).

These are NVMe SSDs, but they do come with different connector types.

 

This is the kind of thing you'd probably want (yeah, it's pricey):

https://www.pccasegear.com/products/37200?gclid=COfb5aeU6dECFYKWvQodB_UAuQ&gclsrc=aw.ds

Edited by Jeruselem


http://www.umart.com.au/newsite/goods.php?id=32608

That'd do. And it'd be what I'd use as a scratch disk.

Would that be enough storage for 1 night?

 

I'd buy two:

1 x for the raw images you're trying to combine.

1 x for the output file.

Then script its 'emptying' to some RAID1 WD RED drives each day?

 

Or actually... knock that down to one and use a big RAM disk.

 

 

------------

 

Knowing that $3k is 'OK' and you could stretch to 6 but don't want to, I've fallen somewhere in the middle.

 


[Antec One Gaming Case]
$78.00

[Antec "EDGE" 550W 80 PLUS Gold, 100% Modular PSU. FDB LED Fan, 2x (6+2) PCI-E, 8x SATA, 3x Mo]
$145.00

[Western Digital RED WD40EFRX RED NAS- 4TB/INTELLIPOWER/DDR2/3.5]
$225.00 X 2
$450.00

[Kingston HyperX Predator 480GB PCI-E SSD SHPM2280P2H/480G ]
$499.00

[Intel Core i7 6900K Eight Core LGA 2011-3 3.2GHz Unlocked CPU Processor]
$1499.00

[Noctua NH-C12P SE14 CPU Cooler]
$88.00

 

[Gigabyte GA-X99P-SLI LGA2011 ATX]
$359.00
[Corsair Vengeance LED 64GB (4x16GB) DDR4 3000MHz Unbuffered 15-17-17-35 1.35V XMP 2.0 Blue LED]
$569.00

Order Total
$3687.00


It's basically identical to above, with these changes:

- 8 Core, 16 Thread CPU

- 64GB of RAM (32GB for a scratch disk?)

- 1x PCI-E based Storage device (assuming you can make a 32GB RAM disk, and that's enough space to scratch from)

- 2x WD RED 4TB drives, to RAID1 (to offload finished product).

 

If that baulks at high-def image editing, even in the hundreds, I'll be damn shocked.

Being an unlocked CPU, it'll also easily hit 4 GHz without stability problems.

Edited by Master_Scythe

I did find a user on a forum who suggested Deep Sky Stacker supports not only multiple cores but also multiple CPUs.

But there was no indication as to how much it benefits from them.

Maybe you could send me a few files and tell me how to process them, and we'll see how they go on my 6700K.

Edited by Dasa

Deep Sky Stacker is freeware, about ten MB to download. If you were to take any astronomy image, clone it twenty times, then load all twenty copies as light frames and press register and stack, you'd see it do its thing. Then on the final image, change the light curves in the six zones to any percentages you like and you'd see it do some heavy calculations.

 

As far as I am aware - confidence say 80% - DSS likes lots of cores; I am seeking confirmation of this from astronomers. I am also trying to confirm it can utilise many cores on multiple processors, and not just the first. I will also see if I can contact the author to confirm this.

 

If I create a machine with a huge amount of RAM, say 256 GB, I could create a 128 GB RAM disk - if that works out cheaper than a 1 TB M.2 960 Pro.

 

More research will help define the best options. Interesting challenge!

Deepskystacker.free.fr is the download site (10 MB), and somewhere like Stevebb.com gives a simple tutorial on its use. What I haven't found is any advice on what hardware will optimally process its load.

On Cloudynights.com I found a post that said DSS is multithreaded and multi-processor aware, but may only be able to address 4 GB of RAM if it's 32-bit. So a RAM drive may be a good way forward too.
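The 4 GB ceiling for a 32-bit build is easy to sanity-check from the pointer size of whatever process you're running; in Python, for example:

```python
import struct

# A 32-bit process has 4-byte pointers, so its address space tops out
# at 2**32 bytes = 4 GB, no matter how much physical RAM is installed.
bits = struct.calcsize("P") * 8
print(f"This is a {bits}-bit process")
if bits == 32:
    print(f"Address-space ceiling: {2**32 / 2**30:.0f} GB")
```

(For a closed-source app like DSS, checking the executable's properties in Windows tells you the same thing.)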


Friends at Iceinspace.com.au have mentioned that one of the other major image-processing suites - PixInsight - has a benchmark site: http://pixinsight.com/benchmark/

 

A Xeon E5-2695 at 2.4 GHz (12 cores) leads the benchmarks by a long way, ahead of another Xeon, and the i7-6700 is in third place with a score half that of the first-place holder.


Another thought - does your software allow multiple instances e.g. to simultaneously process 2 sets of images?

 

In such a case multi-core would be very helpful.

I'm not sure you should overly worry about fast storage... buckets of RAM to allow caching everything would probably be way more beneficial.

RAM disk - not sure. I tried one recently and it performed pretty poorly.


Friends at Iceinspace.com.au have mentioned that one of the other major image-processing suites - PixInsight - has a benchmark site: http://pixinsight.com/benchmark/

 

A Xeon E5-2695 at 2.4 GHz (12 cores) leads the benchmarks by a long way, ahead of another Xeon, and the i7-6700 is in third place with a score half that of the first-place holder.

 

I'm going to try to benchmark this when I get home tonight (4.8 GHz OC, 6-core/12-thread).

There is NO reason a bloody Xeon should get to hold the crown these days...


Rybags - the software not only allows this; in practice you would do this for objects with a mixture of bright and dim regions - the Orion Nebula, M42, is a clear example of needing to do this.

 

The core of the nebula has several really bright stars - if you image too long, you blow out all the faint detail of the nebula, whilst the outer ends of the nebula are tens of thousands of times fainter than the core and need longer exposures to capture their fine detail. So one would probably take 20 shots at 30 seconds, 20 at 60 seconds, 20 at 120 seconds, 20 at 240 and 20 at 360 seconds, as an example. You could then use DSS multiple times to create separate masters for your 30, 60, 120, 240 and 360 second shots. You would then combine these in either Photoshop CS using layers, or in ImagesPlus, PixInsight, MaxIm DL, Nebulosity, etc. So this is non-linear editing - boosting the signal of very dim things whilst not blowing out really bright things. You try rigorously to maintain colour fidelity, but you likely move some light from the deep infrared spectrum into the visible-light spectrum (often using an artificial colour scheme, like say the Hubble colour palette for the Crab Nebula, M1).

 

The really lengthy part of the process is crunching data to combine all the images (processing 20 lights with their matching master darks, flat whites, flat darks and bias frames), which involves registering (finding 50 or so reference stars to align the whole image across the group), stacking (mathematically adding them after accounting for flats, darks and bias frames), averaging pixels, removing hot or cold pixels, then broadly honing the colour saturation and changing where the light and dark zones start and end (making too-dim things brighter and too-bright things dimmer). If you are imaging in mono, you have to do all the same for your Luminosity, Red, Green, Blue, Hydrogen-alpha, Oxygen III and Sulphur II channels - so potentially all the above seven times over. Then you combine these frames into masters per colour channel in Photoshop, then you stack and merge them through layers.
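For anyone following along, the calibration arithmetic described above (lights minus darks, divided by a normalised flat) reduces to a few lines of numpy. This is a generic sketch of standard CCD reduction, not DSS's exact pipeline:

```python
import numpy as np

def calibrate(light, master_dark, master_flat, master_bias):
    """(light - dark) / normalised(flat - bias): removes thermal
    signal and corrects vignetting and dust shadows."""
    flat = master_flat.astype(np.float64) - master_bias
    flat /= flat.mean()                      # normalise flat to mean 1.0
    return (light.astype(np.float64) - master_dark) / flat

# Toy 2x2 frame: uniform sky, one vignetted corner at half sensitivity.
light = np.array([[110.0, 110.0], [110.0, 60.0]])
dark  = np.full((2, 2), 10.0)                # 10 counts of thermal signal
bias  = np.zeros((2, 2))
flat  = np.array([[1.0, 1.0], [1.0, 0.5]])   # corner sees half the light
out = calibrate(light, dark, flat, bias)
print(out)   # every pixel recovers the same sky value (87.5)
```

Doing this across 20 frames per exposure set, then registering and median-combining the results, is where the hours go.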

 

So it's a bit complicated - but you can see that is potentially 5 sets of data per colour channel, so typically 5 × 5 = 25 processing runs of 20 files each. If the stacking takes 30 seconds versus 330 seconds on just its first pass, it all adds up! Hence you want a quick machine! Those two hours - per target - before you can even combine all your shots into a colour image of varying intensity is the bummer. I would love to bring that down to under 15 minutes.
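Taking the post's own figures at face value (25 runs, 330 s versus 30 s per stacking pass), the totals do line up with the two-hour and 15-minute marks:

```python
runs = 5 * 5                        # 5 exposure sets x 5 channels, per the post
slow_s, fast_s = 330, 30            # seconds per stacking pass
print(f"slow: {runs * slow_s / 3600:.1f} h")    # ~2.3 h per target
print(f"fast: {runs * fast_s / 60:.1f} min")    # 12.5 min, under the goal
```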

 

Even on faint subjects like, say, the Trifid Nebula M20, there are about three zones or levels of brightness - stars in the central core, the immediate surrounds and the outer layers. So you might image the core for stars at 180 seconds, the central nebula for 360 seconds and the outer nebula for 480 to 600 seconds. The more sensitive your gear, the shorter your images can be to get real detail. Stacking many of these images adds to fine detail.

Edited by g__day


OK, gave the benchmark a run.

It seems both I/O and CPU performance are very important.

The first run (down the bottom) was on a Samsung 830 64 GB SSD, which spent more time at 100% load than the CPU:

Sequential Read Speed: 520 MB/s
Sequential Write Speed: 160 MB/s
Random Read Speed: 75K IOPS

It spent most of the time waiting on the SSD, with short spikes in CPU load. The other three runs were after I swapped to the SanDisk Ultra II 960 GB:

Seq. Read (up to): 550 MB/s
Seq. Write (up to): 500 MB/s
Rnd. Read (up to): 95K IOPS
Rnd. Write (up to): 79K IOPS

On this drive the CPU was under 100% load much more often, but still dropped to 0% a fair bit; the SSD maxed out at 70% load, but that seemed to be its limit. Maybe a RAM drive would be a better option - my file speeds seem to be well below the others on the benchmark site.

[screenshot: PixInsight benchmark results]

 

Edit:

OK, created an 8 GB RAM disk using Primo Ramdisk.

It seems 5 GB wasn't big enough, but thankfully 8 GB was, because the software didn't support higher and I was running out of RAM.

With this, the CPU was under constant load, although not always 100%.

If you are going to do this, you probably want about 32 GB of RAM.

[screenshot: PixInsight benchmark results from the RAM disk run]

 

Not sure why, but it seems my RAM drive performance is well below other Skylake test systems, which score ~3x as high.

Edited by Dasa

Called it!

I/O bottleneck!

 

Will a 32 GB RAM disk be enough for scratch?

If not, 2× PCI-E drives will be your only option, and they'll keep up!

Edited by Master_Scythe


Many thanks for running that bench. A super-sized RAM drive on a massive-memory monster build is easily achievable. I asked the author of CCDStack what he recommended, and he said a lot of cores and very fast I/O, with sufficient memory.

One other factor I have to consider is whether server motherboards will support Windows 10 Pro - else how can I leverage DirectX 12-capable games when they emerge?

 

This is proving to be quite an interesting quest!


Many thanks for running that bench. A super-sized RAM drive on a massive-memory monster build is easily achievable. I asked the author of CCDStack what he recommended, and he said a lot of cores and very fast I/O, with sufficient memory.

One other factor I have to consider is whether server motherboards will support Windows 10 Pro - else how can I leverage DirectX 12-capable games when they emerge?

 

This is proving to be quite an interesting quest!

 

Yes, it will. But even the BEST programs, running a single instance, show diminishing returns beyond 4 cores.

Some are getting better and reaching 8 cores, but nothing is fully utilising the 16 threads of a modern i7 to their full potential.

(OK, that's a lie - some back-end server stuff is - but I highly doubt this will.)

Keep in mind, you DO need to sleep; so long as it gets all the work done WELL within your down-time, it's reaching the goals.

And if you're processing a single 'batch' while you wait, well, 16 threads is enough to get it done in acceptable 'waiting time'. I'd almost guarantee it.

 

DASA is running a 2600K, which is a 4-core, 8-thread processor, with a modern SATA3 drive, and he hit I/O bottlenecks without maxing out the CPU.

We're talking double the cores/threads, and still overclockable.

 

As for RAM, perhaps stick with the 32GB and forget the RAM disk, PCI-E is plenty fast.

 

I mean, OK, once you put PCI-E storage in the machine, the CPU may see more load, but unless you move to an unlimited budget, you're chasing bottlenecks forever.

SOMETHING is always going to be the weak link!

 

In a year or three, you can whack an 18-core, 36-thread Xeon in a desktop board too.

https://ark.intel.com/products/81061/Intel-Xeon-Processor-E5-2699-v3-45M-Cache-2_30-GHz

Just at the moment the price is bullshit, so you don't... but most 2011-3 motherboards support desktop AND server chips (even if they don't list them).

 

While I'm sure DASA is going to tweak my 'quickly thrown together' build on the previous page, it's exactly what I think you should be going for, for a few reasons:

 

1. Reliability.

While 'server' boards USED to be better, I think desktop/gaming boards have taken over.

I currently manage 150 sites with local servers. Given the number of issues these experience compared to my mate's old 150-PC LAN gaming centre, I'd say it's certainly swung the other way.

With solid-state caps, 'double-thickness copper', heatsinked VRMs, and various other tech they throw on modern gaming boards, I'm actually of the opinion reliability has gone the OTHER way now.

Server boards are picky, picky, picky.

Fail a fan it expects to be there: no boot. Break a RAID: no boot. And really, their range is limited.

 

2. Speed

As above, that limited range means it's rare to find the multiple x8/x16 PCI-E slots you're going to need for PCI-E storage. And I don't think I've seen a server board with NVMe/M.2 slots at all.

Not to mention, any slight overclocking is harder - if it's even possible - on a server board.

 

3. Price.

You used to pay for the 'stability' of a server board. As above, I'm well of the opinion that 'gaming boards' are now required to stay up during stressed loads (because there could be a million-dollar prize on the line); stability is stress-tested about the same between the two, so what on earth are you paying for?

 

4. Power

Desktop gear uses a fair bit less power than server gear, especially since you seem so keen on running dual CPUs for some reason.

IMO the use for dual CPUs is to have dedicated tasks; so if you intend to calculate your pictures WHILE you game a current AAA title, OK, it might be useful.

But if you intend to play some old-school game while you wait for the render, just steal a thread or a core and play away - no need for a dedicated CPU.

 

Anyway, to answer your question directly, yes.

Windows 10 will run on server hardware just fine.

 

 

I take a while to get to the point eh? :P

Edited by Master_Scythe


DASA is running a 2600K, which is a 4-core, 8-thread processor, with a modern SATA3 drive, and he hit I/O bottlenecks without maxing out the CPU.

We're talking double the cores/threads, and still overclockable.

 

As for RAM, perhaps stick with the 32GB and forget the RAM disk, PCI-E is plenty fast.

Not for a while now -

went to a 3770K, and now a 6700K @ 4.7 GHz.

Yeah, 32-64 GB of RAM seems to be the go, using a software RAM drive.

From everything I have seen, an older multi-CPU system may not be too bad for the image-processing side of the build.

I'm curious how a multi-CPU system with quad-channel RAM to each CPU handles a RAM drive - does it share bandwidth, or only draw on the RAM for one CPU? May have to Google it.

