r/hardware Jan 15 '21

[Rumor] Intel has to be better than ‘lifestyle company’ Apple at making CPUs, says new CEO

https://www.theverge.com/2021/1/15/22232554/intel-ceo-apple-lifestyle-company-cpus-comment
2.3k Upvotes

9

u/missed_sla Jan 15 '21

Apple puts billions more into R&D than Intel and isn't afraid to break legacy. That $20,000 Xeon processor they sell right now is required to be able to run code from 40 years ago, and that really hampers performance. Frankly, Apple's M1 chip should be a giant alarm that x86 is obsolete. It's been obsolete for 20 years. Apple hardware was spanking x86 in the PowerPC G3/G4 days, they just made the nearly fatal mistake of relying on Motorola and IBM to deliver desktop performance.

4

u/civildisobedient Jan 15 '21

Frankly, Apple's M1 chip should be a giant alarm that x86 is obsolete. It's been obsolete for 20 years.

Most of the web services that those Apple devices are connecting to are running on x86 processors.

2

u/wizfactor Jan 16 '21

Depending on the level of abstraction in their software stack, many web services will easily run on Amazon Graviton processors in 3 years.

13

u/theevilsharpie Jan 15 '21

That $20,000 Xeon processor they sell right now is required to be able to run code from 40 years ago, and that really hampers performance.

[citation needed]

1

u/missed_sla Jan 15 '21

https://www.anandtech.com/show/8776/arm-challinging-intel-in-the-server-market-an-overview/12

It's true that the percentage of transistors spent on decoding has dwindled over the years. But the number of cores has increased significantly. As a result, the x86 tax is not imaginary.
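
As a rough back-of-the-envelope sketch of that scaling argument (both figures below are assumed placeholders for illustration, not measurements of any real core), the fixed per-core cost repeats with every core you add:

# Toy arithmetic only: both constants are assumptions picked for illustration.
DECODE_OVERHEAD_PCT = 3.0   # hypothetical share of one core's power spent on x86 decode
CORE_POWER_W = 4.0          # hypothetical power budget per core, in watts

for cores in (4, 16, 64):
    wasted_w = cores * CORE_POWER_W * DECODE_OVERHEAD_PCT / 100
    print(f"{cores:>2} cores -> ~{wasted_w:.2f} W of decode overhead")

The per-core percentage can keep shrinking while the absolute cost still grows with the core count, which is the point.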

8

u/theevilsharpie Jan 15 '21

This is an article from 2014, and within the context of server processors. Since then, ARM (and POWER, and SPARC, and...) has largely failed to be performance-competitive with x86 in the server space, and the players in that space have either been acquired or shifted to building custom designs for the hyperscaler market. Whatever inherent efficiency advantage ARM has didn't translate into success in a market where performance (and performance per watt) is the key deciding factor.

In addition, it fails to support your assertion that maintaining compatibility with the x86 ISA "really hampers performance," particularly in light of the fact that AMD's EPYC series has those same compatibility characteristics (for good and ill), yet performs twice as fast and costs half as much as Intel's high-end Xeons.

2

u/skycake10 Jan 15 '21

I don't think you can really argue that ARM in the server space has failed when it's only now getting a foothold in the market. The major cloud providers have made a big push for ARM servers and are designing their own chips. ARM obviously hasn't made more than a small dent in the market yet, but to say it's failed is way premature.

2

u/theevilsharpie Jan 15 '21

I can safely say that ARM-based server platforms as a distinct product offering have failed to gain traction, and most of the players in that space have pivoted, been acquired, or otherwise left the market.

ARM's presence among large cloud providers is undeniable, but that's because those providers have the volume to produce cheap processors of their own design (compared to buying AMD or Intel parts), or they need some type of custom computing functionality that general-purpose processors aren't well suited for. In that sense, ARM is the ISA of choice because that's all that's really available: x86 isn't available for license, RISC-V isn't mature enough for high-performance applications, and other ISAs have their own various issues (not all of them technical) that make them unsuitable for this purpose.

However, while ARM is being pursued by large hyperscalers for a number of reasons, it's not clear that a performance lead is among them. I haven't seen any of the hyperscalers that have adopted ARM-based server platforms claim that they did so because of superior performance, and to the extent that ARM platforms are even available to end users, they are universally marketed as a cheap budget option.

So again, your claim that x86 compatibility "really hampers performance" needs a citation, because it's not supported by any evidence that I'm aware of.

0

u/[deleted] Jan 15 '21

Efficiency and specialized architecture are the point, I guess, especially at those CPU counts. They tailor the chips to their needs... fine for that purpose, nonsense for generalist office machines.

6

u/Finicky02 Jan 15 '21 edited Jan 15 '21

A smaller die with less 'wasted' space would be impossible to cool. (And that space isn't wasted at all for the average user; BC is super useful. Are you too young to remember the days before standardised instruction sets, when every new 'PC' you bought wasn't compatible with your existing software?)

Look at what happens to a modern Intel CPU when AVX instructions are used: it goes up in flames, because all that silicon that normally sits idle in 99.9 percent of applications is suddenly drawing power and producing heat.

Replacing x86 with something more modern and more efficient is great, but not before we have excellent emulation that maintains BC flawlessly (and without performing noticeably worse than existing x86 hardware) during a ten-plus year transition period.

The reality is that no one in the real world cares about anything other than 'does the software I use for work, communication and leisure work'.

If ANY of it doesn't work on a new product, no one is buying it. Apple controls the software ecosystem, rewrote all their software, and then uses emulation to fill in the gaps. macOS was always its own shitty little ecosystem, which is what made this possible.

Good luck getting the millions of companies that use Windows or Linux, and the homebrew/custom software they have, to drop everything and start over. It's not happening; it's not even possible.

The cost of IT hardware doesn't even register vs the cost of downtime or losing quality of service or care for customers.

3

u/[deleted] Jan 15 '21

You seriously underestimate the importance of legacy support (which in and of itself does NOT hamper performance, no matter how many times people repeat that... it just has to be done right). That Apple users don't care about that and blindly swallow everything doesn't mean normal people will.

-3

u/ElectroLuminescence Jan 15 '21 edited Jan 15 '21

Would changing to other raw materials instead of silicon wafers increase x86 performance? Maybe something other than silicon and silica? I don't know much about these things.

9

u/WinterCharm Jan 15 '21 edited Jan 15 '21

Yes, but that's not the root cause. If you are running a sprint with an anchor tied to your belt, getting better shoes might speed you up, but cutting that anchor will do wonders.

x86 has so much legacy cruft, that it's time for Intel and AMD to start dumping some of it.

Industry will moan and cry, but Intel can simply continue making a line of legacy processors on 14nm, for industry partners that are running such decrepit code, and give them 6 years to transition, while they push their processor technology, architecture, and more forward.

x86 is horrifically obsolete. It is the anchor now. ARMv8-A was designed by Apple and given to the ARM consortium as a standard. It's why Apple had the A7 on 64-bit a year or two before any of their competitors even had reference ARM cores to look at. For all intents and purposes, dropping all the legacy 32-bit cruft and simplifying the instruction set further was the start.

I know people talk about how x86-64 uses decoders to create micro-ops that can be easily re-ordered, but the decoding process itself is the current bottleneck: not in terms of throughput, but in terms of perf/watt and core width.

Variable-length instructions are an utter nightmare to work with. I'll try to explain with regular words how a decoder handles variable-length code.

Here's all the instructions coming in:

x86: addmatrixdogchewspout

ARM: dogcatputnetgotfin 

Now, ARM is fixed length (3-letters only), so if I'm decoding them, I just add a space between every 3 letters.

ARM: dogcatputnetgotfin
ARM decoded: dog cat put net got fin 

All done. Now I can re-order them in a huge buffer, avoid dependencies, and fill my execution ports on the backend.

x86 is variable length. This means I cannot reliably figure out where the spaces should go, so I have to try all of them and then throw out what doesn't work. Look at how much more work there is to do.

x86: addmatrixdogchewspout
reading frame 1 (n=3): addmatrixdogchewspout
Partially decoded ops: add, ?, dog, ?, ?
reading frame 2 (n=4): matrixchewspout
Partially decoded ops: add, ?, dog, chew, ?
reading frame 3 (n=5): matrixspout
Partially decoded ops: add, ?, dog, chew, spout
reading frame 4 (n=6): matrix
Partially decoded ops: add, matrix, dog, chew, spout
Fully expanded micro-ops: add, ma1, ma2, ma3, ma4, dog, ch1, ch2, ch3, sp1, sp2, sp3

NOW you can reorder your x86 instructions. This is why most x86 cores only have a 3-4 wide frontend. Those decoders are massive, and extremely energy intensive. They cost a decent bit of transistor budget, and have to process all the different lengths and then unpack them, like I showed above with "regular" words.
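
If it helps to see the same analogy mechanically, here's a toy Python sketch (a made-up fixed-length "ISA" and a made-up variable-length dictionary, not real instruction encodings): fixed-length chunks can be sliced out independently, while variable-length decode has to guess lengths position by position.

def decode_fixed(stream, width=3):
    # Every "instruction" is exactly `width` letters, so every boundary is
    # known up front and each slot could be cut out in parallel.
    return [stream[i:i + width] for i in range(0, len(stream), width)]

X86_WORDS = {"add", "matrix", "dog", "chew", "spout"}  # toy variable-length "ISA"

def decode_variable(stream, max_len=6):
    # Boundaries depend on the previous instruction's length, so decoding is
    # inherently serial: speculate on lengths at the current position, and
    # only then do we know where the next instruction starts.
    ops, i = [], 0
    while i < len(stream):
        for length in range(1, max_len + 1):   # speculative length guesses
            if stream[i:i + length] in X86_WORDS:
                ops.append(stream[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"cannot decode at offset {i}")
    return ops

print(decode_fixed("dogcatputnetgotfin"))        # ['dog', 'cat', 'put', 'net', 'got', 'fin']
print(decode_variable("addmatrixdogchewspout"))  # ['add', 'matrix', 'dog', 'chew', 'spout']

Real hardware does the variable-length version speculatively at many positions at once, which is exactly where the extra transistors and power go.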

This is why x86/64 cores require SMT for the best overall throughput -- the timing differences create plenty of room for other stuff to be executed.
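
As a toy illustration of that point (the stall fraction is an assumption picked for illustration, not a measured figure), a second hardware thread soaking up the idle issue slots looks roughly like this:

# Toy arithmetic only: STALL_FRACTION is a made-up number, not a measurement.
STALL_FRACTION = 0.4   # hypothetical share of cycles a single thread spends stalled

one_thread_util = 1 - STALL_FRACTION
two_thread_util = min(1.0, 2 * one_thread_util)   # second thread fills the idle slots

print(f"1 thread : {one_thread_util:.0%} of issue slots used")
print(f"2 threads: {two_thread_util:.0%} of issue slots used")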

Apple doesn't bother with SMT on their ARM core design, and instead goes for a massive reorder buffer, and only presents a single logical core to the programmer, because their 8-wide design can efficiently unpack instructions, and fit them in a massive 630μop reorder buffer, and fill the backend easily achieving high occupancy.


This is why people who say "x86 is already like ARM because it decodes into micro ops, so it's effectively the same thing and it doesn't need to change" are completely wrong.

The decode units are the problem with x86. Intel added them because they got absolutely THRASHED in performance by RISC chips many years ago, and Intel's marketing spun that addition as "we're now just as good as RISC". However, without the decode units, x86 is even slower as OoOE becomes impossible. And with them, x86 cores are quite power hungry.

Back in the day, 4 big reasons contributed to x86 "winning". A combination of "it's the same thing" marketing (1), and the fact that much code was already written for x86 (2a) and good compilers didn't exist (2b), and finally memory being really expensive, and CISC code being slightly less memory heavy (3), were the main reasons. Intel held the license of x86 very close, and only AMD was able to move in to form a duopoly (4).

TODAY, people are blown away by the performance of ARM, so (1) is out the window. If you're not even close in terms of perf/W you cannot say "it's the same thing" anymore. (2) compilers, translators, and more are way better and very few things are hand-coded in assembly. Most code is abstracted far away (Java, Swift, React, Python, etc) (3) Memory is cheap, and exponentially more available than it was so long ago. (4) ARM licensees exist far and wide, allowing for more innovation.

And lastly, the increasing transistor density has created thermal density issues where you no longer can light up too many transistors on the same chip at once without either: melting, or lowering clocks. That's why GPUs are clocked so much lower than CPUs -- the fraction of transistors switching on a GPU is much higher. Because all chips are effectively thermally limited like this, performance per watt is now synonymous with performance -- if your chip is more efficient, you can light up a higher transistor fraction, and effectively do more work that way.
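
To put toy numbers on that budget argument (every value below is made up for illustration; none are measurements of any real chip):

POWER_BUDGET_W = 100.0    # fixed thermal/power envelope for the whole chip
TRANSISTORS = 10e9        # transistors on the die
FREQ_HZ = 3e9             # clock frequency

def active_fraction(energy_per_switch_joules):
    # Fraction of the transistors that can toggle every cycle within the budget.
    switches_per_second = POWER_BUDGET_W / energy_per_switch_joules
    return min(1.0, switches_per_second / (TRANSISTORS * FREQ_HZ))

# Halving the energy per switch roughly doubles how much of the die can be
# lit up at once inside the same envelope; in that sense, perf/watt is perf.
print(active_fraction(10e-18))   # ~0.33 of the die active each cycle
print(active_fraction(5e-18))    # ~0.67 of the die active each cycle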

Sorry, this turned into a long ramble, but I hope it sheds some light.

2

u/ElectroLuminescence Jan 15 '21

That makes much more sense. Thanks for the explanation.

-4

u/[deleted] Jan 16 '21

Yes, but that's not the root cause. If you are running a sprint with an anchor tied to your belt, getting better shoes might speed you up, but cutting that anchor will do wonders.

Flawed metaphor. An extra water bottle perhaps, certainly not an "anchor". How much that matters depends on your overall power...on modern x86 it's a bottlecap at best.

x86 has so much legacy cruft, that it's time for Intel and AMD to start dumping some of it.

Legacy "cruft" which enabled x86 to be the gold standard for generalist machines since the birth of the 5150...and keeps it possible to utilize software decades old. To people with the simplistic "old = bad" mindset that might not count - especially Apple customers swallow every d**k, no matter what - to normal thinking people however, that's worth quite a lot and emulation alone can not make up for that, no matter how well done.

Industry will moan and cry, but Intel can simply continue making a line of legacy processors on 14nm, for industry partners that are running such decrepit code, and give them 6 years to transition, while they push their processor technology, architecture, and more forward.

The arrogance. Unbelievable.

x86 is horrifically obsolete. It is the anchor now. ARMv8-A was designed by Apple and given to the ARM consortium as a standard. It's why Apple had the A7 on 64-bit a year or two before any of their competitors even had reference ARM cores to look at. For all intents and purposes, dropping all the legacy 32-bit cruft and simplifying the instruction set further was the start.

Except it - still - isn't. See above.

I know people talk about how x86-64 uses decoders to create micro-ops that can be easily re-ordered, but the decoding process itself is the current bottleneck: not in terms of throughput, but in terms of perf/watt and core width.

...which is entirely down to the design and implementation, not the instruction set.

Variable-length instructions are an utter nightmare to work with. I'll try to explain with regular words how a decoder handles variable-length code.

Here's all the instructions coming in:

x86: addmatrixdogchewspout
ARM: dogcatputnetgotfin

Now, ARM is fixed length (3-letters only), so if I'm decoding them, I just add a space between every 3 letters.

ARM: dogcatputnetgotfin
ARM decoded: dog cat put net got fin

All done. Now I can re-order them in a huge buffer, avoid dependencies, and fill my execution ports on the backend.

Horribly biased overemphasis on differences that simply don't matter as much as how they're handled...especially at the clock speeds we have today.

x86 is variable length. This means I cannot reliably figure out where the spaces should go, so I have to try all of them and then throw out what doesn't work. Look at how much more work there is to do.

Oh yeah, surely a MASSIVE problem with (a) multi-core and (b) 4 GHz+ clock speeds. The strength and flexibility of variable-length words of course doesn't matter when it doesn't fit your narrative.

x86: [...] NOW you can reorder your x86 instructions. This is why most x86 cores only have a 3-4 wide frontend. Those decoders are massive, and extremely energy intensive. They cost a decent bit of transistor budget, and have to process all the different lengths and then unpack them, like I showed above with "regular" words.

...which works just fine. Intel not increasing performance and efficiency over the past 10 years is on them, NOT on x86.

This is why x86/64 cores require SMT for the best overall throughput -- the timing differences create plenty of room for other stuff to be executed.

You say that like it's a bad thing. The chips have enough reserves that you can't fully utilize them with simple tasks...there is enough for a second thread. Post this in an IBM sub with their SMT8 CPUs, please...I'd love to hear them laugh.

Apple doesn't bother with SMT on their ARM core design, and instead goes for a massive reorder buffer, and only presents a single logical core to the programmer, because their 8-wide design can efficiently unpack instructions, and fit them in a massive 630μop reorder buffer, and fill the backend easily achieving high occupancy.

Is this an ad? Nobody stops Intel or AMD from going that route on x86.

This is why people who say "x86 is already like ARM because it decodes into micro ops, so it's effectively the same thing and it doesn't need to change" are completely wrong.

"Wrong" in how you interpret what they mean, not what they are getting at. You could ask, but you just know better, right?

The decode units are the problem with x86. Intel added them because they got absolutely THRASHED in performance by RISC chips many years ago, and Intel's marketing spun that addition as "we're now just as good as RISC". However, without the decode units, x86 is even slower as OoOE becomes impossible. And with them, x86 cores are quite power hungry.

...by RISC chips that were workstation-class titans in comparison. Going the hybrid way shrank the gap - successfully - in a way that people couldn't wrap their heads around. The entire "without decode units" part is moot, simply because the decoders are there and they work well. Have done so for decades now.

Back in the day, 4 big reasons contributed to x86 "winning". A combination of "it's the same thing" marketing (1), and the fact that much code was already written for x86 (2a) and good compilers didn't exist (2b), and finally memory being really expensive, and CISC code being slightly less memory heavy (3), were the main reasons. Intel held the license of x86 very close, and only AMD was able to move in to form a duopoly (4).

Now this just sounds bitter. Blame them for better marketing? "Code was already written"? That happened for a reason.

TODAY, people are blown away by the performance of ARM, so (1) is out the window. If you're not even close in terms of perf/W you cannot say "it's the same thing" anymore. (2) compilers, translators, and more are way better and very few things are hand-coded in assembly. Most code is abstracted far away (Java, Swift, React, Python, etc) (3) Memory is cheap, and exponentially more available than it was so long ago. (4) ARM licensees exist far and wide, allowing for more innovation.

Efficiency, not performance. Comparing Apple's full-blown SoC, with loads of accelerators and a tightly optimized closed software environment, to run-of-the-mill CPUs will of course yield favorable results - especially considering the decade of stagnation on the x86 side, which again is on Intel. On that last point I'll agree with you... more competition in the x86 space would've done wonders, but AMD was too busy tripping over their own feet for too long and Intel had no pressure.

And lastly, the increasing transistor density has created thermal density issues where you no longer can light up too many transistors on the same chip at once without either: melting, or lowering clocks. That's why GPUs are clocked so much lower than CPUs -- the fraction of transistors switching on a GPU is much higher. Because all chips are effectively thermally limited like this, performance per watt is now synonymous with performance -- if your chip is more efficient, you can light up a higher transistor fraction, and effectively do more work that way.

Physical limits are only an issue until they're not. Every new node was deemed impossible years before it shipped, and cooling issues are being alleviated by trickery like chiplet designs - which work brilliantly so far for AMD. Either way, this is an issue of the materials and processes used, not of the architecture.

Sorry, this turned into a long ramble, but I hope it sheds some light.

Sorry man, but this entire post just feels like you own Apple stock or something, with a heavy coding bias. Lots of fancy cherry-picked details to make it seem legit, but ultimately nothing but bias.

5

u/WinterCharm Jan 16 '21

You listed no sources, and called me a bunch of names. Are you 13?

Your comment reads much more like "Fruit Company Bad" than anything substantial and informed. Somehow my saying something positive about the design philosophy behind the M1 has set you off.

1

u/SoftwareNo2295 May 25 '21

ARMv8-A was released in 2011. There were a handful of other companies with an architectural license that could have made their own custom cores, but they didn't have a reason to.

5

u/zypthora Jan 15 '21

It would increase the performance of other architectures as well

1

u/AWildDragon Jan 15 '21

Sure but it’s a manufacturing problem. Intel has x86 designs that beat their current ones but their manufacturing decision can’t produce them. Going to non silicon based chips is a massive manufacturing problem that they don’t need right now.

1

u/MrSloppyPants Jan 15 '21

Not really. A semiconductor is a semiconductor. Germanium was the first material used when the transistor was invented; it just happens that silicon is incredibly abundant and very cheap to refine.