Intel Revises PCIe Cooling Driver To Reduce Link Speed When Running Too Hot


  • Intel Revises PCIe Cooling Driver To Reduce Link Speed When Running Too Hot

    Phoronix: Intel Revises PCIe Cooling Driver To Reduce Link Speed When Running Too Hot

    Since last year Intel's open-source software engineers have been working on a PCIe bandwidth controller driver for the Linux kernel to avoid thermal issues by being able to automatically reduce the PCIe link speed when needed. This driver still isn't over the finish line but today brought the fifth iteration of these patches...

  • #2
    What's this? Bus overheat? Bridge? Controller chip in device?

    HPC users are not going to be happy...

    • #3
      Originally posted by tildearrow
      HPC users are not going to be happy...
      Why not? As you can see, the feature is behind a Kconfig flag. If enabled, it can be adjusted or turned off via a sysfs knob.
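For anyone wondering what such a knob looks like in practice: assuming the driver registers with the kernel's generic thermal framework (the "cooling driver" name suggests it, but the patches may still change), it would show up as one of the cooling_deviceN entries under /sys/class/thermal. A minimal read-only Python sketch that lists those entries follows; the function names are made up for illustration, and nothing here is taken from the patch series itself.

```python
from pathlib import Path


def read_attr(path: Path) -> str:
    """Return the stripped contents of a sysfs attribute, or '' if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def list_cooling_devices(root: Path = Path("/sys/class/thermal")):
    """Collect type, cur_state and max_state for every cooling_device* entry."""
    devices = []
    if not root.is_dir():
        return devices  # no thermal sysfs here (e.g. not Linux)
    for dev in sorted(root.glob("cooling_device*")):
        devices.append({
            "name": dev.name,
            "type": read_attr(dev / "type"),
            "cur_state": read_attr(dev / "cur_state"),
            "max_state": read_attr(dev / "max_state"),
        })
    return devices


if __name__ == "__main__":
    for d in list_cooling_devices():
        print(f"{d['name']}: {d['type']} state {d['cur_state']}/{d['max_state']}")
```

Which type string the PCIe driver would report is not something I can confirm, so the sketch prints every cooling device and leaves picking out the right one to the reader.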

      • #4
        Originally posted by tildearrow
        HPC users are not going to be happy...
        Since they mostly use servers, HPC users have a handful of industrial-grade fans drawing 40W each in their systems, so they likely won't see throttling.

        Consumers, on the other hand, especially those with small-form-factor devices, may need this functionality. Heck, Intel CPUs nowadays already have a hard time not frying themselves. Adding a super-high-speed PHY to the package doesn't exactly make things better in this regard.

        • #5
          Originally posted by tildearrow
          What's this? Bus overheat? Bridge? Controller chip in device?

          HPC users are not going to be happy...
          They'll be less happy if they have to walk out into the middle of the datacenter to pull roasted chips.

          • #6
            Contemporary transistors are increasingly plagued by thermal constraints and performance bottlenecks. To address these challenges, it's imperative to innovate and develop new technologies that can be produced on a massive scale, enhancing or potentially supplanting the existing models.

            • #7
              Thermal throttling should be solved in hardware first and foremost. I.e. the hardware should not rely on software to prevent it from roasting. Software is a nice addition of course, but it should never be the primary "fix".

              http://www.dirtcellar.net

              • #8
                I am not sure about this feature. We might see the same errors and complaints as on Windows, where users report that their GPU's PCIe x16 link runs slower than the official specification says it should. The reason: the PCIe bus clocked down to a lower speed because of power-saving modes. It would be good to be able to set values manually in this new Intel driver, just in case.
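A quick way to check for this on Linux is to compare the current_link_speed and max_link_speed attributes that the kernel exposes for PCI devices under /sys/bus/pci/devices. The Python sketch below does that; the helper names are made up for illustration, and it assumes the attribute text starts with the speed in GT/s (for example "8.0 GT/s PCIe").

```python
from pathlib import Path
from typing import Optional


def parse_link_speed(text: str) -> Optional[float]:
    """Return the speed in GT/s from a string such as '8.0 GT/s PCIe', else None."""
    parts = text.split()
    if not parts:
        return None
    try:
        return float(parts[0])
    except ValueError:
        return None  # e.g. "Unknown"


def is_below_max(current: str, maximum: str) -> bool:
    """True when both speeds parse and the current one is lower than the maximum."""
    cur, top = parse_link_speed(current), parse_link_speed(maximum)
    return cur is not None and top is not None and cur < top


def scan(root: Path = Path("/sys/bus/pci/devices")):
    """List (device, current, max) for every device running below its maximum."""
    results = []
    if not root.is_dir():
        return results
    for dev in sorted(root.iterdir()):
        try:
            current = (dev / "current_link_speed").read_text().strip()
            maximum = (dev / "max_link_speed").read_text().strip()
        except OSError:
            continue  # device has no PCIe link attributes
        if is_below_max(current, maximum):
            results.append((dev.name, current, maximum))
    return results


if __name__ == "__main__":
    for name, current, maximum in scan():
        print(f"{name}: running at {current}, capable of {maximum}")
```

A lower current speed is not proof of throttling: the link also trains down when the slot or the upstream port supports less than the card, or when power management has idled it.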

                • #9
                  Originally posted by waxhead
                  Thermal throttling should be solved in hardware first and foremost. I.e. the hardware should not rely on software to prevent it from roasting. Software is a nice addition of course, but it should never be the primary "fix".
                  Well, no, because the thing heating up might not be the disk or GPU itself. Managing the thermals of a tightly packed device is difficult and takes a coordinated effort.

                  • #10
                    Originally posted by Avamander

                    Well, no, because the thing heating up might not be the disk or GPU itself. Managing the thermals of a tightly packed device is difficult and takes a coordinated effort.
                    The argument is basically the same one I complain about with Intel DPTF in hand-me-down mini-PCs using laptop chips... it shouldn't be a struggle to keep the temperature down for the duration of a memtest86+ run for lack of a special driver and shoddy UEFI that assumes Windows will always be there to babysit the thermals.

                    Your product shouldn't need to be the Therac-25 for you to be in the wrong for skimping on the hardware (or, in the case of a PC, firmware) interlocks.

                    Granted, given the thermald config file's terminology, I think the problem with my mini PCs' UEFI is that they boot up in "silent mode" and that, if I let them, the memtest86+ runs would probably have climbed to about 80°C and then throttled to maintain temperature equilibrium. (I run Debian on them to use them as low-measurement-noise benchmarking environments for my creations and modern Linux kernels plus thermald can manage the fans just fine.)
                    Last edited by ssokolow; 10 May 2024, 02:36 AM.
