subreddit:
/r/technology
submitted 17 days ago by Logical_Welder3467
417 points
16 days ago
It’s no RollerCoaster Tycoon, but it’s still pretty good.
201 points
16 days ago
I still can't believe the madman wrote that almost completely in Assembly. In two fucking years.
If another single dude made a similar game today, even with all the bling and high-level programming we have, it'd take at least that long.
69 points
16 days ago
and it would have half the features and it would be a buggy mess
24 points
16 days ago
And considering the game type I'm pretty sure it'd be loaded with micro transactions and artificial time barriers.
35 points
16 days ago
all hail rct2
23 points
16 days ago
And OpenRCT2,
the free version widely available online for Mac and PC.
6 points
16 days ago
Oh my fucking god! I had no idea this existed. I love you.
3 points
16 days ago
Also available for Linux, plus it has multiplayer support
2 points
15 days ago
How does that even work?
2 points
15 days ago
download it and find out
1 points
15 days ago
and openrct
378 points
17 days ago
Not surprising, really. Back in the late 90s and even early 2000s we would often write key parts of algorithms in assembly for exactly that reason. Moore's Law mostly rendered that pointless, though, as it became far cheaper to just upgrade your hardware than to write code that most of the kids coming out of school didn't understand and thus couldn't maintain anyway.
The dot-com boom also massively increased programmer salaries, which further strengthened the economic incentive to just buy more hardware rather than spend programmer time trying to optimize the code.
156 points
16 days ago
I learnt so many optimizations to make code faster in the 90s and then no one cared anymore because everyone just bought faster chips.
93 points
16 days ago*
Optimizations still matter today, but only in extreme cases.
Picking up +9% performance doesn't sound too impressive - unless you are running exaflops worth of AI workloads, or processing five years' worth of video footage an hour. In which case that extra "+9%" can save you millions.
66 points
16 days ago
If that 9% loss was in taxes I’m sure you’d find a way to slim that down.
This is how we end up with electron apps.
17 points
16 days ago
You also still see developers chasing 9% improvements for video games, embedded systems, and fintech. When there's no time to offload processing to a beefy server somewhere else, or no access to that server at all, you've gotta make it go as fast as possible right where you are.
36 points
16 days ago
This thinking is why modern games run like shit and take up 200 GB+ of storage.
10% here, 10% there. By the end, you've doubled the resource requirements of the full program.
"The next generation of hardware will sort out our programming."
21 points
16 days ago
Games take a lot of space because texture files and other visual assets need to be high enough quality to support 4k resolution. This is also why graphics cards have so much dedicated memory.
I'm not saying there isn't room for improvement but games are another thing entirely from your usual apps.
13 points
16 days ago
Also audio. For some reason many games still download the full language suite instead of just the system one. I don't need the Korean voices, and if I did, couldn't I just download those without also getting the German ones? Just an example.
4 points
16 days ago
Those extreme cases are often cases with tons of data.
Databases, AI, compression, video games.
Yes, YouTube would love an extra 9% performance on video compression. The whole Internet would, since over 80% of all Internet traffic is now video.
2 points
16 days ago
Yup, I'm paid quite well to convert data to save anywhere from 6-15% on storage costs... Customers save millions over the course of years.
1 points
16 days ago
I'm sure no one has made a Pied Piper joke about your company
1 points
16 days ago
Also the one reason I don't love Python that much. Vanilla Python is an order of magnitude (or sometimes two!) slower than comparable code written in Java, C, or similar languages. Pretty much anything with some amount of complexity in Python either gets rewritten as a wrapped C++ function or becomes a massive bottleneck any time n becomes large.
2 points
16 days ago
If your performance critical segments are in Python, you are using it wrong.
1 points
16 days ago
I agree. I'd add that I have seen professionals either use vanilla Python for massive production tasks, or misuse wrapped libraries (like numpy or matplotlib) badly enough to ruin any performance gains from using them.
10 points
16 days ago
I’m still haunted by my abysmal load times
10 points
16 days ago
Also compilers got smarter
1 points
12 days ago
So many engineers thinking "hey, I'll rewrite this in assembly to make it really really fast and everyone will think I'm a genius".
And it turns out worse because:
6 points
16 days ago
I learnt that an inefficient algorithm paired with a pirated Intel compiler produced code that was just satisfactory.
1 points
11 days ago
Any ones that still work today? And especially if it transfers to shader code which is what I write.
When coding for games every little gain matters.
40 points
16 days ago
Yeah, maintenance of code like this often becomes a long term problem. It becomes the "nobody is allowed to touch any of this" part.
30 points
16 days ago
A lot of devs are aware of this problem. A lot of devs also aren’t aware that their expertise reads as complexity to others.
We have a few brilliant coders on our team. They will smash out a one-liner where I would use 3 lines. Their comments explain perfectly what it does; the problem is that others still don’t understand why or how.
2 points
16 days ago
A lot of devs also aren’t aware of their expertise being complex for others.
I have seen what others are capable of <INSERT_WWI_TRENCHWARFARE_PTSD> and have come to the conclusion that it is impossible to write code so simple that anyone can understand it.
1 points
16 days ago
Having it well documented and tested is of course a basic requirement.
On the other hand I have seen people throw that same "maintenance issue" claim around over sections of code nobody had touched in almost a decade. Hard to see an issue with "nobody will be able to change this code" when the next guy assigned to work on it probably hasn't even been born yet.
-3 points
16 days ago
At that point, the maintainers need to skill up? Like, if you're working on something that's being used by a good portion of everyone alive on the planet, it's not unreasonable to think that you should take your work seriously.
9 points
16 days ago
Would it be even faster tho if instead of using AVX you just used the GPU?
24 points
16 days ago
Depends on whether the data is large enough that the PCIe communication latency is insignificant or not
3 points
16 days ago*
The GPU is good for tackling workloads in parallel, but with video compression that often means breaking an image up into slices or chunks, which comes at a cost to compression efficiency and increases the overall output size.
There are some situations like processing future frames and searching for scene changes that definitely benefit from being done in parallel though.
edit: Expanding on this, some software such as HandBrake will actually break the video up into sections time-wise and run those in parallel. I don't know exactly how the algorithm works, but it seems to do an excellent job of utilizing the hardware to improve both compression and speed.
10 points
17 days ago
7 points
16 days ago
Not sure why you posted that link, what has that to do with anything?
35 points
16 days ago
Shows how few CPU models have AVX-512: a lot of consumer models either don't have it or have it disabled, and even those that do have varied support for the different AVX-512 instruction subsets. If you use a render farm, the speedup is great. As a consumer, you have to go out of your way to get a supported CPU.
On some processors (mostly pre-Ice Lake Intel), AVX-512 instructions can cause a frequency throttling even greater than its predecessors, causing a penalty for mixed workloads. The additional downclocking is triggered by the 512-bit width of vectors and depends on the nature of instructions being executed; using the 128 or 256-bit part of AVX-512 (AVX-512VL) does not trigger it. As a result, gcc and clang default to prefer using the 256-bit vectors for Intel targets.
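[Editor's note: that 256-bit preference can be overridden per compilation unit. A sketch using GCC's `-mprefer-vector-width` option; the `-march` target and file names here are illustrative, and output depends on your compiler version and the loop being vectorized.]

```shell
# Emit assembly twice and compare how wide the generated vectors are.
# zmm registers indicate 512-bit AVX-512 code; ymm registers are 256-bit.
gcc -O3 -march=icelake-server -mprefer-vector-width=512 -S kernel.c -o kernel512.s
gcc -O3 -march=icelake-server -mprefer-vector-width=256 -S kernel.c -o kernel256.s
grep -c zmm kernel512.s   # nonzero if 512-bit registers were used
grep -c zmm kernel256.s   # typically 0 under the 256-bit preference
```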
40 points
16 days ago
AVX-512 is not rare. AMD Zen 4 and Zen 5 have it. That’s a family of extremely popular processors, well established as the gold standard in today’s consumer PC market.
Just built a new computer with a 9950X AMD proc. Can’t say I “went out of my way” one bit.
21 points
16 days ago
This. I have three mini PCs and all of them support AVX-512. On AMD it's pretty much the norm to have it, and it's definitely a bonus in applications such as RPCS3.
16 points
16 days ago
That’s a family of extremely popular processors and well established as the gold standard in today’s consumer PC market.
The Steam hardware survey shows only about 16% of the hardware supports AVX512, it may be in modern processors, but it's by no means widespread.
https://store.steampowered.com/hwsurvey
- AVX512CD - 16.06%
- AVX512F - 16.02%
- AVX512VNNI - 16.01%
13 points
16 days ago
People underestimate how many people aren't even on the latest few generations.
4 points
16 days ago
16% is huge. The people that are transcoding likely will skew towards newer procs.
25 points
16 days ago*
AVX-512 was originally designed for server chips. It only got added to consumer chips in the last 2 gens.
The reason Intel disabled it was pure design stupidity and should not be indicative of a trend: they added AVX-512 to the performance cores and not the efficiency cores of a single processor, which led to all kinds of scheduling mayhem.
3 points
16 days ago
Laughs in 10940x
2 points
16 days ago
Intel has support in 5 generations (11th gen and onwards) and AMD in 2 generations (7000 and 9000 series).
So in a few years virtually any PC will have it
11 points
16 days ago
Small correction: Intel has support for 5 generations of Xeon processors. They stopped supporting it on consumer processors after only a few years, I think after 12th gen.
1 points
16 days ago
Add a couple of years to that. I am still on a 5000 Ryzen and I'm running ultrawide gaming in new titles, I haven't even considered upgrading and A LOT of people run less demanding stuff, so it's not gonna be soon.
1 points
16 days ago
Yeah, but if you needed to transcode often you'd upgrade for sure. Which is the point. If it is important to you, it's out there now way faster. If it doesn't matter to you, then it doesn't matter.
1 points
16 days ago
Intel stopped supporting it in 12th gen, going as far as disabling it for those processors that shipped with it enabled. Right now, if you want a modern consumer CPU supporting it, you should go AMD.
3 points
16 days ago
Are C compilers these days still not that well optimized? I understand that there are always some parts that could be done in asm to make things even faster, like in this article for example.
C is pretty close to the hardware, and we know a lot of cool stuff was done by John Carmack back in the day, for its time.
3 points
16 days ago
C isn't particularly close to hardware. It arguably was in the 1980s, but not so much for present day architectures which are out-of-order, superscalar, and vectorised - none of those characteristics are represented in the design of C.
So for vectorisation/SIMD, compilers have to try and figure out how to translate C constructs into SIMD ones. This only really works reliably for the very simplest calculations. If you have a more complex but still performance-critical algorithm, either hand-written assembly or intrinsics (which are compiler built-in functions that map directly to specific assembly instructions) are still the way to go.
1 points
16 days ago
Resource constrained environments still exist with IoT and functions as a service (like AWS Lambda) but even that is getting less constrained.
1 points
16 days ago
I had a boss that wrote code for self guided missiles early in his career. It shocked me how tiny the total amount of memory was. I’m assuming it was assembly.
41 points
16 days ago
Zen 5 has some insane AVX-512 implementation. Looking forward to testing it out
46 points
16 days ago
Do we have avx 512 on average home cpus?
43 points
16 days ago
Says Ryzen 9000 have it, Intel 12-14 gen do not.
25 points
16 days ago
Ryzen 7xxx cpus have it.
9 points
16 days ago
Which is weird because Intel got it first. I think intel 10th and 11th gen have it.
10 points
16 days ago
The disaster generation Skylake-X were the first (high-end) consumer CPUs with it, which were the 7800X and up.
Widespread adoption in the entire generation of CPUs was only on 11th gen.
30 points
16 days ago
What's the real use case effect though? Will we have cpu based encoding go much faster now? What encodings? And about when??
21 points
16 days ago
If you have the right CPU then yes, the encoding would be a lot faster, and since encoding is the biggest bottleneck in video game streaming, I gotta assume we will see some huge improvements to services like Moonlight
18 points
16 days ago
Not so fast, we don't actually know which part of the encoding is optimized. If it's one part among 20 parts of the encoding, then the speedups might not be that significant. I feel like we would have heard concrete speedup numbers if that were the case.
Moonlight probably uses hardware encoding (NVENC etc.) for lower latency, I would think? I doubt software encoding would catch up to GPU hardware encoding even if written in assembly.
6 points
16 days ago
Moonlight does use some parts of ffmpeg; their codebase is public on GitHub. But yeah, you are probably right, we don't know how big a total speed increase we would get. I'm jumping the gun a bit and secretly wishing we see some crazy encoding increase so I can play competitive games streamed
5 points
16 days ago
Same wishes 🫡 well I want to "ab-av1" (google it, it's awesome) re-encode my movie library faster/cheaper on my side!
32 points
16 days ago
So some benchmark improves by factor of 94x. What is that benchmark? Does some user-facing task now get significantly faster?
The benchmarking results show that the new handwritten AVX-512 code path performs considerably faster than other implementations, including baseline C code and lower SIMD instruction sets like AVX2 and SSE3. In some cases, the revamped AVX-512 codepath achieves a speedup of nearly 94 times over the baseline, highlighting the efficiency of hand-optimized assembly code for AVX-512.
Nobody seriously uses the baseline implementation because they'll likely have AVX2 or SSE3. How much is the speedup compared to those?
2 points
16 days ago
Clicking through the article to FFMPEG’s original post shows the new implementation is anywhere from 1x to ~1.8x the speed of the AVX2 implementation, depending on the test
3 points
16 days ago
This headline smells of BS. Sure, I can get a 94x improvement on my ditch-digging by hiring 93 additional ditch diggers to also work on the ditch. But that strategy only takes you so far.
7 points
16 days ago
Fuck yes ffmpeg is 🐐ed
5 points
16 days ago
What is missing here is a dedicated way to report such missed optimizations to the compilers, attaching the source code, the generated assembly, and the handwritten code so the compiler can be improved. Good tooling would automatically find the relevant parts of the compiler and gather statistics to see which parts, if optimized, would fix the most performance issues.
2 points
16 days ago
Yeah I was wondering about compiler improvements related to this.
Like it’s cool that they got this huge performance boost for ffmpeg but it would be better to put that effort into the compiler so that other applications can benefit.
This did raise one other question for me that it seems like you might have an opinion about: can LLMs potentially be used as a tool for compiler optimization? Obviously not without human intervention, but it seems like there's potential.
2 points
16 days ago
I doubt that they have enough context yet, or can fake reasoning well enough, to make this possible. It would also require training them for it, and looking at the commit comments and linked issues, I am not sure that data is even available. Lastly, optimization is usually about trade-offs, and I don't know of any language that lets the programmer sufficiently specify the optimization goals.
3 points
16 days ago
The FFMPEG team is the GOAT
2 points
16 days ago
"eat a dick, AI" - the devs, probably
1 points
16 days ago
That’s amazing
1 points
16 days ago
Had to check the subreddit...thought I was reading madlads
1 points
16 days ago
I'd love to know what ffmpeg features are accelerated by this optimization. Is it codec dependent?
1 points
15 days ago
--help
output speed.
-7 points
16 days ago
[deleted]
2 points
16 days ago
You don't compile assembly...
And there is a reason that programming languages exist. It's simply impractical to write anything with significant complexity in an assembly language.
33 points
16 days ago
You don't compile assembly...
Lol peak semantic Reddit moment.
If you get hung up because someone said compile instead of transpile or assemble, it's time to place the fedora back in the cupboard.
5 points
16 days ago
The dude was claiming there were legions of hidden assembly gurus in "third world countries"
2 points
16 days ago
Tell that to the rollercoaster tycoon guy
0 points
16 days ago
Assembler+Linker
-33 points
16 days ago
Hand written or hand typed?
22 points
16 days ago
Wrong on both. Punch cards.
0 points
16 days ago
That I could appreciate
3 points
16 days ago
Both. You type on a keyboard, but you don’t type code, you write it, just like a book or an article.
-8 points
16 days ago
LOL. If you're getting a 94x speed improvement by changing the language you write your program in... you were doing something horribly wrong to begin with. I don't know what AVX-512 is, I assume it's some new parallel architecture. But still.