M1 Mac Mini Scores Higher Than My RTX 2080Ti in TensorFlow Speed Test

37 points by jcytong 4 years ago | 13 comments
  • loser777 4 years ago
    At a few hundred microseconds per step, the benchmark steps start to approach the overhead of GPU kernel invocation and memory allocation. 1875 training steps per epoch implies a batch size of 32 on a 60,000-sample dataset (60,000 / 32 ≈ 1,875), which is hardly optimal given the extremely small model size here. The same effect is the reason why CPUs remain latency-competitive for batch-size-1 inference, especially for small models.
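
    To make the overhead visible, here's a minimal sketch (my own code, not OP's; it assumes a small Keras model like the one benchmarked) that times train_on_batch at a few batch sizes. Per-step time barely grows with batch size until the GPU actually has work to do:

      import time
      import numpy as np
      import tensorflow as tf

      # Tiny dense net, roughly the scale of the model under test.
      model = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation="relu"),
          tf.keras.layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

      x = np.random.rand(2048, 28, 28).astype("float32")
      y = np.random.randint(0, 10, 2048)

      for batch in (32, 256, 2048):
          model.train_on_batch(x[:batch], y[:batch])  # warm-up
          t0 = time.perf_counter()
          for _ in range(20):
              model.train_on_batch(x[:batch], y[:batch])
          print(f"batch={batch}: {(time.perf_counter() - t0) / 20 * 1e3:.2f} ms/step")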
    • visarga 4 years ago
      It's not a conclusive test because the net is too lightweight. Such networks don't come close to saturating the compute, so the benchmark doesn't show what either machine would do under heavy load. At least try a ResNet-50.
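
      Something like this would be a more meaningful load (a sketch using synthetic data, so no dataset download; the batch size and step count are arbitrary):

        import time
        import numpy as np
        import tensorflow as tf

        # ResNet-50 on random ImageNet-shaped data: heavy enough to
        # actually saturate a GPU, unlike a tiny MNIST-scale net.
        model = tf.keras.applications.ResNet50(weights=None)
        model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

        x = np.random.rand(32, 224, 224, 3).astype("float32")
        y = np.random.randint(0, 1000, 32)

        model.train_on_batch(x, y)  # warm-up
        t0 = time.perf_counter()
        for _ in range(10):
            model.train_on_batch(x, y)
        print(f"{(time.perf_counter() - t0) / 10 * 1e3:.1f} ms/step")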
    • lostmsu 4 years ago
      This is a very bad and unprofessional article. Given his test code, dataset size and training time, I am sure that if he checked his GPU load it would be under 4%, because nearly all the time is wasted moving data to and from the GPU.
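
      Easy to verify on the Nvidia side, e.g. by polling nvidia-smi while the benchmark runs (assumes nvidia-smi is on PATH):

        import subprocess

        # Prints GPU utilization and memory use once per second; a
        # GPU starved by host<->device copies will sit near 0%.
        subprocess.run([
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv",
            "-l", "1",
        ])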
      • riggsdk 4 years ago
        It is my understanding that the M1 chip has unified CPU/GPU memory, which means that Metal, as the underlying framework, might be clever enough to not copy the data at all. Not sure it applies to his use case though.
        • lostmsu 4 years ago
          I was mostly talking about the RTX 2080Ti, which he is comparing against.

          It's like moving just across the street, but loading every single box into a car, driving across, and unloading it, instead of just walking it over on foot. You need to drive further (bigger networks) and load more boxes at once (bigger batches) for the car to actually be useful in this scenario.

      • 205g0 4 years ago
        Another apples-to-oranges comparison. OP should have compared each system's dedicated ML accelerator (Apple's Neural Engine vs. Nvidia's tensor cores), not just the GPU. He should redo the benchmark with a proper setup as requested in the comments here and on Medium; otherwise his post is quite misleading.

        To make use of Nvidia's tensor cores, OP had to use Nvidia's own TF distribution/image and configure it explicitly, something PyTorch does out of the box. Nobody knows why Google doesn't do this; maybe they want to push their own Cloud TPUs.
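
        For reference, the explicit opt-in in stock TF is roughly this (a sketch assuming TF 2.4+, where the mixed-precision API is stable):

          import tensorflow as tf

          # Compute in float16 so matmuls/convs can hit the tensor
          # cores; variables stay float32.
          tf.keras.mixed_precision.set_global_policy("mixed_float16")

          model = tf.keras.Sequential([
              tf.keras.layers.Dense(4096, activation="relu", input_shape=(1024,)),
              tf.keras.layers.Dense(10),
              # Keep the final softmax in float32 for numerical stability.
              tf.keras.layers.Activation("softmax", dtype="float32"),
          ])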

        > Adding PyTorch support would be high on my list.

        Won't happen. PyTorch needs Apple's help because of the lack of docs; they've asked already, and Apple hasn't commented or promised any kind of support, nothing. That Apple chose TF over the current market leader doesn't give me much hope, and it might come from backroom deals we don't know of.

        Wondering why OP didn't invest the money in a 2nd 2080 Ti instead.

        • andromeduck 4 years ago
          Next up:

          - my arduino scores higher than my raspberry pi in gpio speed tests

          - my honda scores higher than my peterbilt in pizza delivery test

          • commandlinefan 4 years ago
            Has anybody here tried out the 8 GB model? Thinking about getting one for the kids but not sure if 8 GB will be enough. This blog seems to be a +1.
            • cbozeman 4 years ago
              Unless your kids are in university working on large models, I'm pretty sure the 8 gig version will more than suffice for general computing.

              I picked up the baseline model for $699 (8 GB unified RAM, the 8-CPU/8-GPU/16-Neural-Engine-core version, with 256 GB of storage) and I've mostly been tinkering, as I use Windows 10 Pro / Fedora 33 on my workstation / gaming computer. But I can tell you it runs World of Warcraft: Shadowlands at 60 FPS at 1440p with Ultra settings, which is pretty damn astounding... I mean, I know WoW is running on a nearly 20-year-old engine, but that engine has seen a hell of a lot of refinement over the years, and many advanced features have been added.

              The M1 is a testament to Apple's engineering team. I really look forward to seeing what they could do if they went buck wild and gave themselves a 95-180 watt TDP range to compete with Ryzen 5000 / Threadripper 3000 series parts.

              We've seen that the M1 is competitive in low-power scenarios... I want to know if it can be scaled up to stay competitive when power is no concern.

              • commandlinefan 4 years ago
                Thanks, I appreciate the feedback. I've tried to use Windows 10 on computers with 8 GB and they haven't been much more useful than a paperweight, so I was a little worried, but Apple does know hardware. Sounds like this will be perfect for what I need.
                • 205g0 4 years ago
                  If GP is looking for a general-purpose computer for their kids, the M1 is a clear yes, but for DL? No way, as long as PyTorch doesn't run on the M1 and an Nvidia GPU in some cheap PC shell has the same price tag.
              • olliej 4 years ago
                This seems to be really short training - I thought modern NNs took hours to train (at prod scale), even with hardware acceleration?