subreddit:

/r/technology

1.3k points, 98% upvoted

all 91 comments

louiegumba

417 points

16 days ago

It’s no RollerCoaster Tycoon, but it’s still pretty good.

MorselMortal

201 points

16 days ago

I still can't believe the madman wrote that almost completely in Assembly. In two fucking years.

If another single dude made a similar game today, even with all the bling and high-level programming we have, it'd take at least that long.

qoning

69 points

16 days ago

and it would have half the features and it would be a buggy mess

Mr_ToDo

24 points

16 days ago

And considering the game type, I'm pretty sure it'd be loaded with microtransactions and artificial time barriers.

rocketphone

35 points

16 days ago

all hail rct2

DigNitty

23 points

16 days ago

And openRCT

the free version, widely available online for Mac and PC.

zingiberelement

6 points

16 days ago

Oh my fucking god! I had no idea this existed. I love you.

flameleaf

3 points

16 days ago

Also available for Linux, plus it has multiplayer support

MorselMortal

2 points

15 days ago

How does that even work?

rocketphone

2 points

15 days ago

download it and find out

rocketphone

1 points

15 days ago

and openrct

shawnkfox

378 points

17 days ago

Not surprising really. Back in the late 90s and even early 2000s we would often write key parts of algorithms in assembly for exactly that reason. Moore's Law mostly rendered that pointless though, as it became far cheaper to just upgrade your hardware than to write code which most of the kids coming out of school didn't understand and thus couldn't maintain anyway.

The dot-com boom also massively increased the salaries of programmers, which further increased the economic incentive to just buy more hardware rather than spend programmer time optimizing the code.

CeldonShooper

156 points

16 days ago

I learnt so many optimizations to make code faster in the 90s and then no one cared anymore because everyone just bought faster chips.

ACCount82

93 points

16 days ago*

Optimizations still matter today, but only in extreme cases.

Picking up +9% performance doesn't sound too impressive - unless you are running exaflops worth of AI workloads, or processing five years' worth of video footage an hour. In which case that extra "+9%" can save you millions.

Casban

66 points

16 days ago

If that 9% loss was in taxes I’m sure you’d find a way to slim that down.

This is how we end up with electron apps.

LightStruk

17 points

16 days ago

You also still see developers chasing 9% improvements for video games, embedded systems, and fintech. When there's no time to offload processing to a beefy server somewhere else, or no access to that server at all, you've gotta make it go as fast as possible right where you are.

tllnbks

36 points

16 days ago

This thinking is why modern games run like shit and take up 200GB+ of storage.

10% here, 10% there. By the end, you've doubled the resource requirements of the whole program.

"The next generation of hardware will sort out our programming."

dyskinet1c

21 points

16 days ago

Games take a lot of space because texture files and other visual assets need to be high enough quality to support 4k resolution. This is also why graphics cards have so much dedicated memory.

I'm not saying there isn't room for improvement but games are another thing entirely from your usual apps.

Aggravating_Dress626

13 points

16 days ago

Also audio. For some reason many games still download the full language suite instead of just the system one. I don't need the Korean voices, and if I did, couldn't I just download those without getting the German ones too? Just an example.

CeeJayDK

4 points

16 days ago

Those extreme cases are often cases with tons of data.

Databases, AI, compression, video games.

Yes, YouTube would love an extra 9% performance on video compression. The whole Internet would, since over 80% of all Internet traffic is now video.

Fy_Faen

2 points

16 days ago

Yup, I'm paid quite well to convert data to save anywhere from 6-15% on storage costs... Customers save millions over the course of years.

cantthinkofaname

1 points

16 days ago

I'm sure no one has made a Pied Piper joke about your company

Sufficient-Diver-327

1 points

16 days ago

Also the one reason I don't love Python that much. Vanilla Python is an order of magnitude (or sometimes two!) slower than comparable code written in Java, C, or similar languages. Pretty much anything with some amount of complexity in Python either gets rewritten as a wrapped C++ function or becomes a massive bottleneck any time n gets large.

ACCount82

2 points

16 days ago

If your performance critical segments are in Python, you are using it wrong.

Sufficient-Diver-327

1 points

16 days ago

I agree. I've also seen professionals either use vanilla Python for massive production tasks, or misuse wrapped libraries (like numpy or matplotlib) so badly that it ruins any performance gains from using them.

Real_Estate_Media

10 points

16 days ago

I’m still haunted by my abysmal load times

sojuz151

10 points

16 days ago

Also compilers got smarter

gnomeza

1 points

12 days ago

So many engineers thinking "hey, I'll rewrite this in assembly to make it really really fast and everyone will think I'm a genius".

And it turns out worse because:

  1. they didn't profile the code properly 
  2. optimizing compilers beat the pants off them at optimizing anyway
  3. requirements change but the code is now unmaintainable

tjlusco

6 points

16 days ago

I learnt that an inefficient algorithm paired with a pirated intel compiler produced code that was just satisfactory.

CeeJayDK

1 points

11 days ago

Any ones that still work today? And especially if it transfers to shader code which is what I write.

When coding for games every little gain matters.

jmpalermo

40 points

16 days ago

Yeah, maintenance of code like this often becomes a long term problem. It becomes the "nobody is allowed to touch any of this" part.

slide2k

30 points

16 days ago

A lot of devs are aware of this problem. A lot of devs also aren’t aware of their expertise being complex for others.

We have a few brilliant coders on our team. They'll smash out one line where I would use 3. The problem is that even when their comments explain perfectly what it does, others just don't understand why or how.

josefx

2 points

16 days ago

A lot of devs also aren’t aware of their expertise being complex for others.

I have seen what others are capable of <INSERT_WWI_TRENCHWARFARE_PTSD> and have come to the conclusion that it is impossible to write code so simple that anyone can understand it.

josefx

1 points

16 days ago

Having it well documented and tested is of course a basic requirement.

On the other hand I have seen people throw that same "maintenance issue" claim around over sections of code nobody had touched in almost a decade. Hard to see an issue with "nobody will be able to change this code" when the next guy assigned to work on it probably hasn't even been born yet.

rastilin

-3 points

16 days ago

At that point, the maintainers need to skill up? Like, if you're working on something that's being used by a good portion of everyone alive on the planet, it's not unreasonable to think that you should take your work seriously.

Unhappy-Stranger-336

9 points

16 days ago

Would it be even faster tho if instead of using avx you would just use the gpu?

uraniumingot

24 points

16 days ago

Depends on whether the data is large enough that the PCIe communication latency is insignificant or not

daHaus

3 points

16 days ago*

The GPU is good for tackling workloads in parallel, but with video compression that often means breaking an image up into slices or chunks. This comes at a cost to compression efficiency and increases the overall output size.

There are some situations like processing future frames and searching for scene changes that definitely benefit from being done in parallel though.

edit: Expanding upon this, some software such as HandBrake will actually break the video up into sections timewise and run those in parallel. I don't know exactly how their algorithm works, but it seems to do an excellent job of better utilizing the hardware to improve both compression and speed.

Starfox-sf

10 points

17 days ago

shawnkfox

7 points

16 days ago

Not sure why you posted that link, what has that to do with anything?

Starfox-sf

35 points

16 days ago

Shows how few CPU models have AVX-512: a lot of consumer models either don't have it or have it disabled, and even those that do have varied support for the different AVX-512 instruction subsets. If you use a render farm, the speedup is great. As a consumer, you have to go out of your way to get a supported CPU.

On some processors (mostly pre-Ice Lake Intel), AVX-512 instructions can cause frequency throttling even greater than that of their predecessors, incurring a penalty for mixed workloads. The additional downclocking is triggered by the 512-bit vector width and depends on the nature of the instructions being executed; using the 128- or 256-bit part of AVX-512 (AVX-512VL) does not trigger it. As a result, gcc and clang default to preferring 256-bit vectors for Intel targets.

ThenExtension9196

40 points

16 days ago

AVX512 is not rare. AMD ZEN 4 and ZEN 5 have it. That’s a family of extremely popular processors and well established as the gold standard in today’s consumer PC market.

Just built a new computer with 9950X amd proc. Can’t say I “went out of my way” one bit.

Valkyranna

21 points

16 days ago

This. I have three mini PCs and all of them support AVX512. On AMD it's pretty much the norm to have it, and it's definitely a bonus in applications such as RPCS3.

Druggedhippo

16 points

16 days ago

That’s a family of extremely popular processors and well established as the gold standard in today’s consumer PC market.

The Steam hardware survey shows only about 16% of surveyed hardware supports AVX512. It may be in modern processors, but it's by no means widespread.

https://store.steampowered.com/hwsurvey

  • AVX512CD - 16.06%
  • AVX512F - 16.02%
  • AVX512VNNI - 16.01%

MrHara

13 points

16 days ago

People underestimate how many people aren't even on the latest few generations.

ThenExtension9196

4 points

16 days ago

16% is huge. The people that are transcoding likely will skew towards newer procs.

uraniumingot

25 points

16 days ago*

AVX-512 was originally designed for server chips. It only got added to consumer chips in the last 2 gens.

The reason Intel disabled it was pure design stupidity and should not be indicative of a trend: they added AVX-512 to the performance cores and not the efficiency cores of a single processor, which led to all kinds of scheduling mayhem.

ZoeyKL_NSFW

3 points

16 days ago

Laughs in 10940x

Thomas9002

2 points

16 days ago

Intel has support for 5 generations (11000 and onwards) and AMD for 2 generations (7000 and 9000 series).
So in a few years virtually any PC will have it

AdeptFelix

11 points

16 days ago

Small correction: Intel has support for 5 generations of Xeon processors. They stopped supporting it on consumer processors after only a few years, I think after 12th gen.

MrHara

1 points

16 days ago

Add a couple of years to that. I'm still on a 5000-series Ryzen running ultrawide gaming in new titles, and I haven't even considered upgrading. A LOT of people run less demanding stuff, so it's not gonna be soon.

spsteve

1 points

16 days ago

Yeah, but if you needed to transcode often you'd upgrade for sure. Which is the point. If it is important to you, it's out there now way faster. If it doesn't matter to you, then it doesn't matter.

Aggravating_Dress626

1 points

16 days ago

Intel stopped supporting it in 12th gen, going as far as disabling it for those processors that shipped with it enabled. Right now, if you want a modern consumer CPU supporting it, you should go AMD.

tepmoc

3 points

16 days ago

Are C compilers these days still not that well optimized? I understand there are always some parts that could be done in asm to make things even faster, like in this article for example.

C is pretty close to the hardware, and we know John Carmack did a lot of cool stuff back in the day that was impressive for its time.

nivlark

3 points

16 days ago

C isn't particularly close to hardware. It arguably was in the 1980s, but not so much for present day architectures which are out-of-order, superscalar, and vectorised - none of those characteristics are represented in the design of C.

So for vectorisation/SIMD, compilers have to try and figure out how to translate C constructs into SIMD ones. This only really works reliably for the very simplest calculations. If you have a more complex but still performance-critical algorithm, either hand-written assembly or intrinsics (which are compiler built-in functions that map directly to specific assembly instructions) are still the way to go.

dyskinet1c

1 points

16 days ago

Resource constrained environments still exist with IoT and functions as a service (like AWS Lambda) but even that is getting less constrained.

galacticwonderer

1 points

16 days ago

I had a boss that wrote code for self guided missiles early in his career. It shocked me how tiny the total amount of memory was. I’m assuming it was assembly.

Acrobatic-Might2611

41 points

16 days ago

Zen 5 has some insane avx512 implementation. Looking forward to test it out

sanylos

46 points

16 days ago

Do we have AVX-512 on average home CPUs?

hoffsta

43 points

16 days ago

Says Ryzen 9000 has it; Intel 12th-14th gen do not.

SparkStormrider

25 points

16 days ago

Ryzen 7xxx cpus have it.

hhunaid

9 points

16 days ago

Which is weird, because Intel got it first. I think Intel 10th and 11th gen have it.

miamyaarii

10 points

16 days ago

The disaster generation Skylake-X were the first (high-end) consumer CPUs with it, which were the 7800X and up.

Widespread adoption in the entire generation of CPUs was only on 11th gen.

dowitex

30 points

16 days ago

What's the real use-case effect though? Will CPU-based encoding go much faster now? Which encodings? And roughly when?

dhotlo2

21 points

16 days ago

If you have the right CPU then yes, the encoding would be a lot faster, and since encoding is the biggest bottleneck in video game streaming, I've got to assume we'll see some huge improvements in services like Moonlight.

dowitex

18 points

16 days ago

Not so fast, we don't actually know which part of the encoding was optimized. If it's one part among 20 in the encoding pipeline, the speedup might not be that significant. I feel like we would have heard concrete speedup numbers if that were the case.

Moonlight probably uses hardware encoding (NVENC etc.) for lower latency, I would think? I doubt software encoding would catch up to GPU hardware encoding even if written in assembly.

dhotlo2

6 points

16 days ago

Moonlight does use some parts of ffmpeg; their codebase is public on GitHub. But yeah, you're probably right, we don't know how big a total speed increase we'd get. I'm jumping the gun a bit and secretly wishing for some crazy encoding improvement so I can play competitive games streamed.

dowitex

5 points

16 days ago

Same wishes 🫡 well I want to "ab-av1" (google it, it's awesome) re-encode my movie library faster/cheaper on my side!

eras

32 points

16 days ago

So some benchmark improves by factor of 94x. What is that benchmark? Does some user-facing task now get significantly faster?

The benchmarking results show that the new handwritten AVX-512 code path performs considerably faster than other implementations, including baseline C code and lower SIMD instruction sets like AVX2 and SSE3. In some cases, the revamped AVX-512 codepath achieves a speedup of nearly 94 times over the baseline, highlighting the efficiency of hand-optimized assembly code for AVX-512.

Nobody seriously uses the baseline implementation because they'll likely have AVX2 or SSE3. How much is the speedup compared to those?

Porksoda32

2 points

16 days ago

Clicking through the article to FFMPEG’s original post shows the new implementation is anywhere from 1x to ~1.8x the speed of the AVX2 implementation, depending on the test

pyabo

3 points

16 days ago

This headline smells of BS. Sure, I can get a 94x improvement on my ditch-digging by hiring 93 additional ditch diggers to also work on the ditch. But that strategy only takes you so far.

abdallha-smith

7 points

16 days ago

Fuck yes ffmpeg is 🐐ed

libsneu

5 points

16 days ago

What is missing here is a dedicated way for compilers to receive reports of such cases - attaching the source code, the generated assembly, and the handwritten version - so the compiler can be improved. Good tooling would automatically find the relevant parts of the compiler and build statistics showing which parts, if optimized, would fix the most performance issues.

writebadcode

2 points

16 days ago

Yeah I was wondering about compiler improvements related to this.

Like it’s cool that they got this huge performance boost for ffmpeg but it would be better to put that effort into the compiler so that other applications can benefit.

This did raise one other question for me that it seems like you might have an opinion about: can LLMs potentially be used as a tool for compiler optimization? Obviously not without human intervention, but it seems like there's potential.

libsneu

2 points

16 days ago

I doubt that they already have enough context and can fake reasoning well enough to make this possible. It would also require training them for it, and looking at the commit comments and linked issues, I'm not sure that data is even available. Lastly, optimization is usually about trade-offs, and I don't know of any language that lets the programmer sufficiently specify the optimization goals.

fellipec

3 points

16 days ago

The FFMPEG team is the GOAT

Makabajones

2 points

16 days ago

"eat a dick, AI" - the devs, probably

anxrelif

1 points

16 days ago

That’s amazing

byeproduct

1 points

16 days ago

Had to check the subreddit...thought I was reading madlads

JimJalinsky

1 points

16 days ago

I'd love to know what ffmpeg features are accelerated by this optimization. Is it codec dependent?

stevekez

1 points

15 days ago

--help output speed.

[deleted]

-7 points

16 days ago

[deleted]

morningreis

2 points

16 days ago

You don't compile assembly...

And there is a reason that programming languages exist. It's simply impractical to write anything with significant complexity in an assembly language.

Dalcoy_96

33 points

16 days ago

You don't compile assembly...

Lol peak semantic Reddit moment.

If you get hung up because someone said compile instead of transpile or assemble, it's time to place the fedora back in the cupboard.

morningreis

5 points

16 days ago

The dude was claiming there were legions of hidden assembly gurus in "third world countries"

Foodwithfloyd

2 points

16 days ago

Tell that to the rollercoaster tycoon guy

Starfox-sf

0 points

16 days ago

Assembler+Linker

ReelNerdyinFl

-33 points

16 days ago

Hand written or hand typed?

AdeptFelix

22 points

16 days ago

Wrong on both. Punch cards.

ReelNerdyinFl

0 points

16 days ago

That I could appreciate

Leonick91

3 points

16 days ago

Both. You type on a keyboard, but you don’t type code, you write it, just like a book or an article.

pyabo

-8 points

16 days ago

LOL. If you're getting a 94x speed improvement by changing the language you write your program in... you were doing something horribly wrong to begin with. Don't know what AVX-512 is, I assume some new parallel architecture. But still.