<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>trofi - All posts</title>
    <link href="https://trofi.github.io/feed/atom.xml" rel="self" />
    <link href="https://trofi.github.io" />
    <id>https://trofi.github.io/feed/atom.xml</id>
    <author>
        <name>Sergei Trofimovich</name>
        
        <email>slyich@gmail.com</email>
        
    </author>
    <updated>2026-02-14T00:00:00Z</updated>
    <entry>
    <title>sequoia pgp</title>
    <link href="https://trofi.github.io/posts/346-sequoia-pgp.html" />
    <id>https://trofi.github.io/posts/346-sequoia-pgp.html</id>
    <published>2026-02-14T00:00:00Z</published>
    <updated>2026-02-14T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>If you are a <code>PGP</code> newbie (like me) then consider reading
<a href="https://book.sequoia-pgp.org/"><code>sq user documentation</code></a> book! It gave
me an idea of how to use the <code>sq</code> tool and how <code>PGP</code> concepts map to it.
The book also has a “Background” section that I sorely needed to get a
better grasp of the <code>PGP</code> model.</p>
<h2 id="story-mode">Story mode</h2>
<p>I created my first <code>PGP</code> key in 2008:</p>
<pre><code>$ gpg --list-public-keys slyich@gmail.com
pub   dsa1024/0x71A1EE76611FF3AA 2008-10-18 [SC] [revoked: 2018-07-04]
      9929AD151B96AF651958D35871A1EE76611FF3AA
uid                   [ revoked] Sergei Trofimovich &lt;slyfox@...&gt;
uid                   [ revoked] Sergei Trofimovich &lt;st@...&gt;
uid                   [ revoked] Sergei Trofimovich &lt;slyfox@...&gt;
uid                   [ revoked] Sergei Trofimovich &lt;slyich@...&gt;
uid                   [ revoked] Sergei Trofimovich &lt;slyfox@...&gt;
uid                   [ revoked] Sergei Trofimovich &lt;siarheit@...&gt;

pub   rsa4096/0x44FE231F3F3926E4 2018-07-03 [SC] [expires: 2027-08-18]
      62197C11C7C25A61C448E95644FE231F3F3926E4
uid                   [ultimate] Sergei Trofimovich &lt;slyich@...&gt;
sub   rsa4096/0xBA6C2FC245B4DF2C 2018-07-03 [E] [expires: 2027-08-18]
sub   rsa4096/0xED5E45E06F2AC293 2018-07-03 [S] [expires: 2027-08-18]</code></pre>
<p>You could tell I had no idea how to use it back then just by looking at
the key: no subkeys, a long list of attached identities (some of them
stale), and I revoked the <code>DSA-1024</code> key only years after the algorithm
was officially declared weak.</p>
<p>Looking back into my mail history I suspect I started using <code>PGP</code>
at Mikhail’s suggestion. Mikhail always introduced me to the new
fancy things he had recently found: be it <code>SoftICE</code>, <code>Gmail</code> beta, <code>XMPP</code>,
<code>Wave</code>, <code>GitHub</code> or an infinite list of other things I have already forgotten.</p>
<p>Fast forward to 2018: <code>Gentoo</code> updated <a href="https://www.gentoo.org/glep/glep-0063.html"><code>GLEP 63</code></a>
and started requiring all devs to have an <code>RSA</code> key with an
expiration date, and discouraged <code>DSA</code> key usage. This made my <code>2008</code> key
invalid. I had to generate a new key and picked <code>RSA-4096</code>. I either did
not follow an equivalent of the
<a href="https://wiki.gentoo.org/wiki/Project:Infrastructure/Generating_GLEP_63_based_OpenPGP_keys">modern guide</a>
or it did not exist at the time. As a result I got a very slow commit
signing experience :)</p>
<p>Even 10 years after I started using <code>PGP</code> I had almost no mental model
of what a key versus a subkey is, how both relate to the private key (or
keys?), how the Web of Trust works, how to make
sure you don’t export too much to the keyservers, or why editing the
expiration date does not change the key itself.
All I had read at the time was
<a href="https://www.gnupg.org/gph/en/manual.html">The GNU Privacy Handbook</a>
from 1999. From what I understand it has not been updated since.</p>
<p>I used <code>gnupg</code> for key management and occasional file decryption. Email
clients had decent enough <code>PGP</code> UI integration to be easily usable. But many
non-tech users were confused by signature attachments and tried to
download and unpack them, so I stopped signing email by default.</p>
<p>I felt that <code>gnupg</code> as a tool was not very user friendly: it has a ton
of options and interactive questions that I had no idea how to answer
confidently.</p>
<p>A recent <a href="https://lwn.net/Articles/1055053/"><code>Fedora</code> and <code>GPG</code> 2.5</a> <code>LWN</code>
article from 2026 tricked me into looking at <code>Sequoia PGP</code>. Having heard
a bit of the <code>LibrePGP</code> vs <code>OpenPGP</code> story, I wondered: would the <code>sq</code>
tool give me a better mental model of basic <code>PGP</code> concepts and the
ability to introspect keys and messages?</p>
<h2 id="trying-out-sq">Trying out <code>sq</code></h2>
<p>I read the <a href="https://book.sequoia-pgp.org/"><code>sq user documentation</code></a> and
strongly recommend reading it to get both an idea of <code>PGP</code> basics and the
<code>sq</code> specifics of how to do trivial things!</p>
<p>Here is what <code>sq</code> has to say about my (private) <code>PGP</code> keys:</p>
<pre><code>$ sq key list
 - Backend softkeys has no keys.

 - 62197C11C7C25A61C448E95644FE231F3F3926E4
   - user IDs:
     - Sergei Trofimovich &lt;slyich@...&gt; (authenticated)
     - Sergei Trofimovich &lt;slyfox@...&gt; (UNAUTHENTICATED) revoked
   - created 2018-07-03 08:06:04 UTC
   - will expire 2027-08-18T20:45:35Z
   - usable for signing and decryption
   - @gpg-agent/default: available, locked

   - B6E7C10B37726D7DF059BFE7BA6C2FC245B4DF2C
     - created 2018-07-03 08:06:04 UTC
     - will expire 2027-08-18T20:45:44Z
     - usable for signing and decryption
     - @gpg-agent/default: available, locked
   - FA0D7526A27870BE3842498DED5E45E06F2AC293
     - created 2018-07-03 19:19:15 UTC
     - will expire 2027-08-18T20:46:23Z
     - usable for signing and decryption
     - @gpg-agent/default: available, locked

 - 9929AD151B96AF651958D35871A1EE76611FF3AA
   - user IDs:
     - Sergei Trofimovich &lt;siarheit@...&gt; (UNAUTHENTICATED)
     - Sergei Trofimovich &lt;slyfox@...&gt; (UNAUTHENTICATED)
     - Sergei Trofimovich &lt;slyfox@...&gt; (UNAUTHENTICATED)
     - Sergei Trofimovich &lt;slyfox@...&gt; (UNAUTHENTICATED)
     - Sergei Trofimovich &lt;slyich@...&gt; (UNAUTHENTICATED)
     - Sergei Trofimovich &lt;st@...&gt; (UNAUTHENTICATED)
   - created 2008-10-18 10:28:05 UTC
   - revoked on 2018-07-04 19:35:27 UTC, Key is superseded: Migrated to new more secure key 62197C11C7C25A61C448E95644FE231F3F3926E4
   - not valid: Policy rejected asymmetric algorithm: DSA1024 is not considered secure since 2014-02-01T00:00:00Z
   - usable for signing
   - @gpg-agent/default: available, locked

   - BD21D77765C9B8A655EAC11B8F20BA89A99E563C
     - created 2008-10-18 10:28:05 UTC
     - usable for decryption
     - @gpg-agent/default: available, locked</code></pre>
<p>This command showed me right away which identities I had revoked in my
current key, and they were wrong! (I have since fixed that.) If nothing
else, that was a nice side effect of trying <code>sq</code>.</p>
<p>I find this verbose output slightly more readable, at least as a
first-time user: it shows a bit more detail on the algorithms advertised in
the keys and on security policies violated by outdated algorithms.</p>
<h2 id="other-random-bits">Other random bits</h2>
<p><code>sq network search</code> allows for a key lookup and gets keys into the cache
without any explicit assignment of trustworthiness to them.</p>
<p><code>sq pki link add</code> and <code>sq pki authenticate</code> allow for a lighter-weight
way of tracking key authenticity locally without the need to publish
your relationship to other identities.</p>
<p>I’ll not spend too much time here, but the book mentions nice things
like implementation bits of <code>WKD</code> and <code>DANE</code> to support <code>PGP</code>
infrastructure.</p>
<h2 id="introspection-sq-introspect-and-sq-dump">Introspection: <code>sq inspect</code> and <code>sq packet dump</code></h2>
<p><code>sq inspect</code> is a nice tool to explore <code>PGP</code> keys and <code>PGP</code> messages.</p>
<p>Encrypting the message:</p>
<pre><code>$ sq encrypt --signer-email=slyich@gmail.com --for-email slyich@gmail.com foo --output foo.pgp
Composing a message...

 - encrypted for Sergei Trofimovich &lt;slyich@gmail.com&gt; (authenticated)
   - using 62197C11C7C25A61C448E95644FE231F3F3926E4

 - signed by Sergei Trofimovich &lt;slyich@gmail.com&gt; (authenticated)
   - using 62197C11C7C25A61C448E95644FE231F3F3926E4</code></pre>
<p>Exploring the content:</p>
<pre><code>$ sq inspect foo.pgp
foo.pgp: Encrypted OpenPGP Message.

      Recipient: BA6C2FC245B4DF2C
        Associated certificate:
          62197C11C7C25A61C448E95644FE231F3F3926E4
          Sergei Trofimovich &lt;slyich@gmail.com&gt; (authenticated)</code></pre>
<p>I think the equivalent <code>gpg</code> command is <code>gpg --list-packets</code>:</p>
<pre><code>$ gpg --list-packets foo.pgp
gpg: encrypted with rsa4096 key, ID 0xBA6C2FC245B4DF2C, created 2018-07-03
      &quot;Sergei Trofimovich &lt;slyich@gmail.com&gt;&quot;

&lt;asks for password&gt;

gpg: using &quot;0xED5E45E06F2AC293&quot; as default secret key for signing
# off=0 ctb=c1 tag=1 hlen=3 plen=523 new-ctb
:pubkey enc packet: version 3, algo 1, keyid BA6C2FC245B4DF2C
        data: [4088 bits]
# off=526 ctb=d2 tag=18 hlen=3 plen=729 new-ctb
:encrypted data packet:
        length: 729
        mdc_method: 2
# off=548 ctb=c4 tag=4 hlen=2 plen=13 new-ctb
:onepass_sig packet: keyid 44FE231F3F3926E4
        version 3, sigclass 0x00, digest 10, pubkey 1, last=1
# off=563 ctb=cb tag=11 hlen=2 plen=10 new-ctb
:literal data packet:
        mode b (62), created 0, name=&quot;&quot;,
        raw data: 4 bytes
# off=575 ctb=c2 tag=2 hlen=3 plen=658 new-ctb
:signature packet: algo 1, keyid 44FE231F3F3926E4
        version 4, created 1771059670, md5len 0, sigclass 0x00
        digest algo 10, begin of digest f2 08
        critical hashed subpkt 2 len 4 (sig created 2026-02-14)
        hashed subpkt 16 len 8 (issuer key ID 44FE231F3F3926E4)
        hashed subpkt 20 len 70 (notation: salt@notations.sequoia-pgp.org=[not human readable])
        hashed subpkt 33 len 21 (issuer fpr v4 62197C11C7C25A61C448E95644FE231F3F3926E4)
        hashed subpkt 35 len 21 (?)
        data: [4096 bits]</code></pre>
<p>It’s a lot more verbose than <code>sq inspect</code> (that’s nice!). But is it
obvious what algorithm was used to encrypt the message? One probably
needs to know the <code>OpenPGP</code> algorithm numbers, like <code>algo 1</code> or
<code>digest algo 10</code>.</p>
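<p>For reference, those numeric identifiers come from the <code>OpenPGP</code> algorithm registries in <code>RFC 4880</code> (sections 9.1, 9.2 and 9.4). A minimal lookup table, with just a few common entries, is enough to decode the numbers in the <code>gpg --list-packets</code> output above:</p>

```python
# Partial OpenPGP algorithm registries from RFC 4880 (sections 9.1, 9.2, 9.4),
# enough to decode the "algo" numbers shown by gpg --list-packets above.
PUBKEY_ALGO = {1: "RSA", 16: "Elgamal", 17: "DSA"}
SYM_ALGO = {7: "AES-128", 8: "AES-192", 9: "AES-256"}
HASH_ALGO = {1: "MD5", 2: "SHA-1", 8: "SHA-256", 10: "SHA-512"}

# "algo 1" and "digest algo 10" from the signature packet:
print(PUBKEY_ALGO[1], HASH_ALGO[10])
# → RSA SHA-512
```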
<p>The <code>sq</code> equivalent is an <code>sq packet dump</code> invocation:</p>
<pre><code>$ sq packet dump foo.pgp
Public-Key Encrypted Session Key Packet, new CTB, 523 bytes
    Version: 3
    Recipient: BA6C2FC245B4DF2C
    Pk algo: RSA

Sym. Encrypted and Integrity Protected Data Packet, new CTB, 729 bytes
│   Version: 1
│   Session key: 477E1CA59418C21B6D95AB78B129EED8A53888447159D1660232F3EE03E151B9
│   Symmetric algo: AES-256
│   Decryption successful
│
├── One-Pass Signature Packet, new CTB, 13 bytes
│       Version: 3
│       Type: Binary
│       Pk algo: RSA
│       Hash algo: SHA512
│       Issuer: 44FE231F3F3926E4
│       Last: true
│
├── Literal Data Packet, new CTB, 10 bytes
│       Format: Binary data
│       Content: &quot;foo\n&quot;
│
├── Signature Packet, new CTB, 658 bytes
│       Version: 4
│       Type: Binary
│       Pk algo: RSA
│       Hash algo: SHA512
│       Hashed area:
│         Signature creation time: 2026-02-14 09:01:10 UTC (critical)
│         Issuer: 44FE231F3F3926E4
│           Sergei Trofimovich &lt;slyich@gmail.com&gt; (authenticated)
│         Notation: salt@notations.sequoia-pgp.org
│           00000000  25 33 44 74 e7 b8 1a 28  1c b1 56 bd f0 02 4e 02
│           00000010  26 fe dd 1f c8 8c ab 11  9d 18 f3 7b bd 39 0c ad
│         Issuer Fingerprint: 62197C11C7C25A61C448E95644FE231F3F3926E4
│           Sergei Trofimovich &lt;slyich@gmail.com&gt; (authenticated)
│         Intended Recipient: 62197C11C7C25A61C448E95644FE231F3F3926E4
│       Digest prefix: F208
│       Level: 0 (signature over data)
│
└── Modification Detection Code Packet, new CTB, 20 bytes
        Digest: 1A14AA0FFDCB56E6BD64E005BA088EF591344F8D
        Computed digest: 1A14AA0FFDCB56E6BD64E005BA088EF591344F8D
        Valid: true</code></pre>
<p>Here it’s slightly more obvious that the session key was encrypted with
<code>RSA</code>, the data was encrypted with <code>AES-256</code>, and the signature used <code>SHA-512</code>.</p>
<p><code>sq key export</code> (private) and <code>sq cert export</code> (public) are a nice
complement to <code>sq inspect</code> and <code>sq packet dump</code> for getting an idea of
what’s in the keys.</p>
<h2 id="parting-words">Parting words</h2>
<p>I found the <code>sq</code> UI usable enough to rekindle some <code>PGP</code> interest in
me. I even managed to fix the messed-up list of revoked identities on my
current key.</p>
<p>I don’t use anything advanced like smart cards for key storage or a
detached offline certification key, and I suspect <code>sq</code> has some
limitations there. But at least now I understand what those things
are and why they are useful!</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>nixpkgs and repology changes</title>
    <link href="https://trofi.github.io/posts/345-nixpkgs-and-repology-changes.html" />
    <id>https://trofi.github.io/posts/345-nixpkgs-and-repology-changes.html</id>
    <published>2026-01-25T00:00:00Z</published>
    <updated>2026-01-25T00:00:00Z</updated>
<summary type="html"><![CDATA[<h2 id="tldr-nixpkgs-changed-the-way-it-exports-data-to-repology-and-many-packages-need-fixing">TL;DR: <code>nixpkgs</code> changed the way it exports data to <code>repology</code> and many packages need fixing</h2>
<p>In <a href="https://github.com/NixOS/nixpkgs/pull/451424"><code>nixpkgs/451424</code></a> <code>nixpkgs</code>
substantially changed how it exports data about available packages.
As a result some packages, like <code>flare</code>, stopped being reported to <code>repology.org</code>
correctly. Luckily the fix is usually to use <code>pname</code> / <code>version</code> instead of
defining <code>name</code> directly. Example <a href="https://github.com/NixOS/nixpkgs/pull/483476">fix</a>
for the <code>flare</code> package:</p>
<pre class="diff"><code>--- a/pkgs/by-name/fl/flare/package.nix
+++ b/pkgs/by-name/fl/flare/package.nix
@@ -6,7 +6,8 @@
 }:

 buildEnv {
-  name = &quot;flare-1.14&quot;;
+  pname = &quot;flare&quot;;
+  version = &quot;1.14&quot;;

   paths = [
     (callPackage ./engine.nix { })</code></pre>
<h2 id="the-bug">The bug</h2>
<p>I happen to maintain the <a href="https://github.com/trofi/nix-olde"><code>nix-olde</code></a> program.
It’s a tool that shows you which outdated packages your system has installed.
The tool is hacky both in how it looks up what you have installed
in the system and in how it maps that information to the <code>repology.org</code> database.
It errs on the side of not printing possibly wrong or missing information.</p>
<p>A few days ago I casually ran <code>nix-olde</code> against my system and got very
odd outputs, like:</p>
<pre><code>        repology r:clock &quot;0.7.4&quot; | nixpkgs {&quot;0.8.4&quot;} {&quot;haskellPackages.clock&quot;}
        repology r:digest &quot;0.6.39&quot; | nixpkgs {&quot;0.0.2.1&quot;} {&quot;haskellPackages.digest&quot;}
        repology r:hedgehog &quot;0.2&quot; | nixpkgs {&quot;1.5&quot;} {&quot;haskellPackages.hedgehog&quot;}
        repology r:mmap &quot;0.6-23&quot; | nixpkgs {&quot;0.5.9&quot;} {&quot;haskellPackages.mmap&quot;}
        repology r:warp &quot;0.2.3&quot; | nixpkgs {&quot;3.4.9&quot;} {&quot;haskellPackages.warp&quot;}
        repology r:yaml &quot;2.3.12&quot; | nixpkgs {&quot;0.11.11.2&quot;} {&quot;haskellPackages.yaml&quot;}</code></pre>
<p>Here <code>nix-olde</code> says that it compared <code>R</code> language packages against
<code>Haskell</code> language packages and was unhappy about the version mismatch. That
looked very wrong. It turns out that <code>repology.org</code> changed the way it
reports <code>nixpkgs</code> packages for the <code>nixpkgs_unstable</code> repository.
Before the change <code>repology</code> reported the following package descriptions:</p>
<pre class="json"><code>&quot;python:networkx&quot;: [
  {
    &quot;repo&quot;: &quot;nixpkgs_unstable&quot;,
    &quot;srcname&quot;: &quot;python310Packages.networkx&quot;,
    &quot;visiblename&quot;: &quot;python3.10-networkx&quot;,
    &quot;version&quot;: &quot;2.8.6&quot;,
    &quot;status&quot;: &quot;outdated&quot;,
  },
]</code></pre>
<p>After the change it started producing the following output:</p>
<pre class="json"><code>&quot;python:networkx&quot;: [
  {
    &quot;repo&quot;: &quot;nixpkgs_unstable&quot;,
    &quot;srcname&quot;: &quot;python310Packages.networkx&quot;,
    &quot;visiblename&quot;: &quot;networkx&quot;,
    &quot;version&quot;: &quot;2.8.6&quot;,
    &quot;status&quot;: &quot;outdated&quot;,
  },
]</code></pre>
<p><code>nix-olde</code> used to rely on <code>visiblename</code>: it was conveniently not too
specific (it did not contain a version) and yet specific enough to contain
the package-ecosystem prefix. But now <code>visiblename</code> omits the ecosystem
entirely, which caused all the problematic reports.
I fixed it with <a href="https://github.com/trofi/nix-olde/commit/ec6313f5abf33ae2701afa6c165a8abc2087699e">this commit</a>.</p>
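<p>A hypothetical sketch of the kind of change involved: with the old data the ecosystem prefix could be read straight off <code>visiblename</code>, while with the new data something like the <code>srcname</code> attribute path has to be consulted instead (field names as in the <code>repology</code> examples above; the helper is illustrative, not <code>nix-olde</code>’s actual code):</p>

```python
# Hypothetical sketch: with the new repology data, "visiblename" no longer
# carries the ecosystem prefix, but the "srcname" attribute path still does.
old_entry = {"srcname": "python310Packages.networkx",
             "visiblename": "python3.10-networkx", "version": "2.8.6"}
new_entry = {"srcname": "python310Packages.networkx",
             "visiblename": "networkx", "version": "2.8.6"}

def ecosystem(entry):
    """Recover the ecosystem prefix from the attribute path, if any."""
    path = entry["srcname"].split(".")
    return path[0] if len(path) > 1 else None

print(ecosystem(new_entry), new_entry["visiblename"])
# → python310Packages networkx
```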
<h2 id="how-does-nixpkgs-export-data-to-repology">How does <code>nixpkgs</code> export data to repology?</h2>
<p><code>repology.org</code> is configured to fetch package lists from <code>nixpkgs</code>
<a href="https://github.com/repology/repology-updater/blob/85f1accfc4d6edd1c940a620b886b4b21c0c1fb8/repos.d/nixos.yaml#L65">here</a>:</p>
<pre class="yaml"><code>- name: nix_unstable
  type: repository
  desc: nixpkgs unstable
  statsgroup: nix
  family: nix
  ruleset: [nix, nix_name]
  color: '7eb2dd'
  minpackages: 120000
  default_maintainer: fallback-mnt-nix@repology
  sources:
    - name: packages-unstable.json
      fetcher:
        class: FileFetcher
        url: https://channels.nixos.org/nixos-unstable/packages.json.br
      parser:
        class: NixJsonParser
        use_pname: true</code></pre>
<p>Note that it was changed
<a href="https://github.com/repology/repology-updater/commit/e134ff779ed1f776784473403dca19bdd3a64a64">very recently</a> to enable <code>use_pname</code>.</p>
<p><code>nixpkgs</code> on its side generates <code>packages.json.br</code> with an equivalent of
<a href="https://github.com/NixOS/nixpkgs/blob/aa6e0f1bcb02bcb16084e5ec00d56df94e1e235e/pkgs/top-level/make-tarball.nix#L47">this</a>:</p>
<pre class="bash"><code>NIX_STATE_DIR=$TMPDIR NIX_PATH= nix-instantiate --eval --raw --expr &quot;import $src/pkgs/top-level/packages-info.nix {}&quot; | sed &quot;s|$src/||g&quot; | jq -c &gt; packages.json
brotli -9 &lt; packages.json &gt; packages.json.br</code></pre>
<p>Note that this also
<a href="https://github.com/NixOS/nixpkgs/commit/e6fd1262842edfd00c54523a4b18d1a16f5c0587">changed recently</a>.
Before the change the code used to use <code>nix-env</code>:</p>
<pre class="bash"><code>(
  echo -n '{&quot;version&quot;:2,&quot;packages&quot;:'
  NIX_STATE_DIR=$TMPDIR NIX_PATH= nix-env -f $src -qa --meta --json --show-trace --arg config 'import ${./packages-config.nix}'
  echo -n '}'
) | sed &quot;s|$src/||g&quot; | jq -c &gt; packages.json
brotli -9 &lt; packages.json &gt; packages.json.br</code></pre>
<h2 id="is-there-a-difference-introduced">Is there a difference introduced?</h2>
<p>Why does it matter? Conveniently, only <code>nixpkgs_unstable</code> currently has
the change. Previous releases like <code>nixos-25.11</code> still use the previous
mechanism. Let’s compare the outputs for <code>python313Packages.networkx</code>:</p>
<pre><code>$ curl -L https://channels.nixos.org/nixos-unstable/packages.json.br &gt; unstable-packages.json.br

$ brotli -d &lt;unstable-packages.json.br | jq '.packages.&quot;python313Packages.networkx&quot;|[.name, .pname, .version]'
[
  &quot;python3.13-networkx-3.5&quot;,
  &quot;networkx&quot;,
  &quot;3.5&quot;
]

$ curl -L https://channels.nixos.org/nixos-25.11/packages.json.br &gt; 25.11-packages.json.br

$ brotli -d &lt;25.11-packages.json.br | jq '.packages.&quot;python313Packages.networkx&quot;|[.name, .pname, .version]'
[
  &quot;python3.13-networkx-3.5&quot;,
  &quot;python3.13-networkx&quot;,
  &quot;3.5&quot;
]</code></pre>
<p>Here we see that <code>pname</code> changed from <code>python3.13-networkx</code> to <code>networkx</code>.
For this package it’s a reasonable change. Let’s look at <code>flare</code> instead:</p>
<pre><code>$ brotli -d &lt;unstable-packages.json.br | jq '.packages.flare|[.name, .pname, .version]'
[
  &quot;flare-1.14&quot;,
  &quot;flare-1.14&quot;,
  &quot;&quot;
]

$ brotli -d &lt;25.11-packages.json.br | jq '.packages.flare|[.name, .pname, .version]'
[
  &quot;flare-1.14&quot;,
  &quot;flare&quot;,
  &quot;1.14&quot;
]</code></pre>
<p>Here the change effectively broke both package name and version reporting
in <code>unstable</code>. As a result <code>flare-rpg</code> is missing its <code>unstable</code> entry on
<a href="https://repology.org/project/flare-rpg/versions">repology</a>:</p>
<pre><code>nixpkgs stable 23.11	flare	1.14
nixpkgs stable 24.05	flare	1.14
nixpkgs stable 24.11	flare	1.14
nixpkgs stable 25.05	flare	1.14
nixpkgs stable 25.11	flare	1.14</code></pre>
<p>Normally an <code>unstable</code> entry is present, as for
<a href="https://repology.org/project/re2c/versions"><code>re2c</code></a>:</p>
<pre><code>nixpkgs stable 23.11	re2c	3.1
nixpkgs stable 24.05	re2c	3.1
nixpkgs stable 24.11	re2c	3.1
nixpkgs stable 25.05	re2c	4.1
nixpkgs stable 25.11	re2c	4.3.1
nixpkgs unstable	re2c	4.4</code></pre>
<h2 id="why-does-it-happen">Why does it happen?</h2>
<p>Before the <a href="https://github.com/NixOS/nixpkgs/pull/451424"><code>nixpkgs/451424</code></a>
change the split from <code>&quot;flare-1.14&quot;</code> down to <code>&quot;flare&quot; &quot;1.14&quot;</code> was done
by <code>nix-env -qa</code> itself! It does not have any advanced heuristics; I
noticed that back when I had just started on <code>nix-olde</code>.
For example, in
<a href="https://github.com/NixOS/nix/issues/7540"><code>nix/7540</code></a> <code>nix</code> splits
<code>&quot;font-adobe-75dpi-1.0.3&quot;</code> in an unexpected way:</p>
<pre><code>$ nix-env -f. -qa --json | fgrep -A9  xorg.fontadobe75dpi
  &quot;xorg.fontadobe75dpi&quot;: {
    &quot;name&quot;: &quot;font-adobe-75dpi-1.0.3&quot;,
    &quot;outputName&quot;: &quot;out&quot;,
    &quot;outputs&quot;: {
      &quot;out&quot;: null
    },
    &quot;pname&quot;: &quot;font-adobe&quot;,
    &quot;system&quot;: &quot;x86_64-linux&quot;,
    &quot;version&quot;: &quot;75dpi-1.0.3&quot;
  },</code></pre>
<p>It’s <code>"font-adobe" "75dpi-1.0.3"</code>. But it should have been
<code>"font-adobe-75dpi" "1.0.3"</code> instead!</p>
<p>Normally all three of <code>name</code> / <code>pname</code> / <code>version</code> come directly from
the attributes of a package:</p>
<pre><code>$ nix repl -f '&lt;nixpkgs&gt;'

nix-repl&gt; with re2c; [ name pname version ]
[
  &quot;re2c-4.4&quot;
  &quot;re2c&quot;
  &quot;4.4&quot;
]</code></pre>
<p>For some packages not all of them are defined:</p>
<pre><code>nix-repl&gt; with flare; [ name pname version ]
[
  &quot;flare-1.14&quot;
  «error: undefined variable 'pname'»
  «error: undefined variable 'version'»
]</code></pre>
<p>The fix is usually simple: instead of defining <code>name</code> directly, use the
<code>pname</code> / <code>version</code> split. For <a href="https://github.com/NixOS/nixpkgs/pull/483476">example</a>:</p>
<pre class="diff"><code>--- a/pkgs/by-name/fl/flare/package.nix
+++ b/pkgs/by-name/fl/flare/package.nix
@@ -6,7 +6,8 @@
 }:

 buildEnv {
-  name = &quot;flare-1.14&quot;;
+  pname = &quot;flare&quot;;
+  version = &quot;1.14&quot;;

   paths = [
     (callPackage ./engine.nix { })</code></pre>
<h2 id="how-does-repology-handle-such-packages">How does repology handle such packages?</h2>
<p>Does <code>repology</code> do anything special about these <code>"version": ""</code> cases?
<code>repology-updater/repology/parsers/parsers/nix.py</code> handles versions
somewhere <a href="https://github.com/repology/repology-updater/blob/85f1accfc4d6edd1c940a620b886b4b21c0c1fb8/repology/parsers/parsers/nix.py#L160">here</a>.</p>
<p>It already has to work around cases of wrong version splits:</p>
<pre class="python"><code>    for verprefix in ['100dpi', '75dpi']:
        if packagedata['version'].startswith(verprefix):
            pkg.log('dropping &quot;{}&quot;, &quot;{}&quot; does not belong to version'.format(packagedata['name'], verprefix), severity=Logger.ERROR)
            skip = True
            break</code></pre>
<p>Otherwise, <code>"version"</code> is extracted
<a href="https://github.com/repology/repology-updater/blob/85f1accfc4d6edd1c940a620b886b4b21c0c1fb8/repology/parsers/parsers/nix.py#L180">as is</a>:</p>
<pre class="python"><code>    pname = packagedata['pname']
    version = packagedata['version']
    # This is temporary solution (see #854) which overrides pname and version with ones
    # (ambigiously) parsed from name. That's what nix currently does (instead of exposing
    # explicitly set pname and version), and we do the same instead of using pname/version
    # provided by them to avoid unexpected change in data when/if they change their logic
    # As soon as they do and changed data is verified, this block may be removed
    match = re.match('(.+?)-([0-9].*)$', packagedata['name'])
    if match is None:
        pkg.log('cannot parse name &quot;{}&quot;'.format(packagedata['name']), severity=Logger.ERROR)
        continue
    elif not self._use_pname:
        pname = match.group(1)
        version = match.group(2)</code></pre>
<p>Thus <code>_use_pname</code> (enabled for <code>nixpkgs_unstable</code>) effectively disables
the <code>name</code>-splitting heuristic.</p>
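<p>The mis-splits are easy to reproduce with the regular expression quoted from the <code>repology</code> parser above: the non-greedy prefix stops at the first <code>-&lt;digit&gt;</code> boundary, which goes wrong whenever a digit-led component is part of the package name itself:</p>

```python
import re

# The version-splitting regex quoted from the repology nix parser above.
NAME_RE = re.compile(r'(.+?)-([0-9].*)$')

def split_name(name):
    """Split a nix "name" into a (pname, version) guess, as the parser does."""
    m = NAME_RE.match(name)
    return m.groups() if m else None

print(split_name('flare-1.14'))              # → ('flare', '1.14'), correct
print(split_name('font-adobe-75dpi-1.0.3'))  # → ('font-adobe', '75dpi-1.0.3'), wrong
```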
<h2 id="are-there-any-more-affected-packages">Are there any more affected packages?</h2>
<p>I wondered how many more packages in <code>packages.json</code> have a
version embedded in <code>pname</code> and an empty <code>version</code> field:</p>
<pre><code>$ brotli -d &lt;unstable-packages.json.br | fgrep -B4 '&quot;version&quot;: &quot;&quot;' | fgrep pname | sort -u | grep -P -- '-[0-9]+(\.[0-9]+)*\&quot;' | wc -l
260</code></pre>
<p>Here I filtered only the <code>pname</code> values that seemingly end in a version.
I probably missed a few complicated cases, but it’s a good first pass. Here
are some example packages:</p>
<pre><code>$ brotli -d &lt;unstable-packages.json.br | fgrep -B4 '&quot;version&quot;: &quot;&quot;' | fgrep pname | sort -u | grep -P -- '-[0-9]+(\.[0-9]+)*\&quot;'
      &quot;pname&quot;: &quot;afro-graphics-theme-47.05&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-5.10.248&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-5.15.198&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.1.161&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.12.66&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.12.67&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.18.6&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.18.7&quot;,
      &quot;pname&quot;: &quot;ajantv2-module-17.5.0-6.6.121&quot;,
      &quot;pname&quot;: &quot;android-studio-for-platform-2024.2.2.13&quot;,
      &quot;pname&quot;: &quot;android-studio-for-platform-canary-2024.3.1.9&quot;,
      &quot;pname&quot;: &quot;auditable-cargo-1.92.0&quot;,
      &quot;pname&quot;: &quot;autoreiv-theme-47.01&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-5.10.248&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-5.15.198&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.1.161&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.12.66&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.12.67&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.18.6&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.18.7&quot;,
      &quot;pname&quot;: &quot;bbswitch-unstable-2021-11-29-6.6.121&quot;,
      &quot;pname&quot;: &quot;binary-black-2024-02-15&quot;,
      &quot;pname&quot;: &quot;binary-blue-2024-02-15&quot;,
      &quot;pname&quot;: &quot;binary-red-2024-02-15&quot;,
      &quot;pname&quot;: &quot;binary-white-2024-02-15&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-5.10.248&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-5.15.198&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.1.161&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.12.66&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.12.67&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.18.6&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.18.7&quot;,
      &quot;pname&quot;: &quot;broadcom-sta-6.30.223.271-59-6.6.121&quot;,
      &quot;pname&quot;: &quot;bundler-audit-0.9.2&quot;,
      &quot;pname&quot;: &quot;cabal2nix-2.21.0&quot;,
      &quot;pname&quot;: &quot;caribou-0.4.21&quot;,
      &quot;pname&quot;: &quot;catppuccin-frappe-2024-02-15&quot;,
      &quot;pname&quot;: &quot;catppuccin-latte-2024-02-15&quot;,
      &quot;pname&quot;: &quot;catppuccin-macchiato-2024-02-15&quot;,
      &quot;pname&quot;: &quot;catppuccin-mocha-2024-02-15&quot;,
      &quot;pname&quot;: &quot;cctools-binutils-darwin-dualas-1010.6&quot;,
      &quot;pname&quot;: &quot;cfn-nag-0.8.10&quot;,
      &quot;pname&quot;: &quot;chicken-base64-3.3.1&quot;,
      &quot;pname&quot;: &quot;chicken-defstruct-1.6&quot;,
      &quot;pname&quot;: &quot;chicken-http-client-0.18&quot;,
      &quot;pname&quot;: &quot;chicken-intarweb-1.7&quot;,
      &quot;pname&quot;: &quot;chicken-matchable-3.7&quot;,
      &quot;pname&quot;: &quot;chicken-sendfile-1.8.3&quot;,
      &quot;pname&quot;: &quot;chicken-simple-md5-0.0.1&quot;,
      &quot;pname&quot;: &quot;chicken-uri-common-1.4&quot;,
      &quot;pname&quot;: &quot;chicken-uri-generic-2.46&quot;,
      &quot;pname&quot;: &quot;cloudformation-0.9.64&quot;,
      &quot;pname&quot;: &quot;compass-1.0.3&quot;,
      &quot;pname&quot;: &quot;d1x-rebirth-full-2.0.0.7&quot;,
      &quot;pname&quot;: &quot;d2x-rebirth-full-2.0.0.7&quot;,
      &quot;pname&quot;: &quot;dbus-1&quot;,
      &quot;pname&quot;: &quot;deadbeef-with-plugins-1.10.0&quot;,
      &quot;pname&quot;: &quot;dejavu-fonts-2.37&quot;,
      &quot;pname&quot;: &quot;Dell-5130cdn-Color-Laser-1.3-1&quot;,
      &quot;pname&quot;: &quot;dfgraphics-theme-42.05&quot;,
      &quot;pname&quot;: &quot;distcc-masq-gcc-15.2.0&quot;,
      &quot;pname&quot;: &quot;docbook-sgml-3.1&quot;,
      &quot;pname&quot;: &quot;docbook-sgml-4.1&quot;,
      &quot;pname&quot;: &quot;dracula-2020-07-02&quot;,
      &quot;pname&quot;: &quot;drawio-headless-29.0.3&quot;,
      &quot;pname&quot;: &quot;eclipse-plugin-antlr-runtime-4.5.3&quot;,
      &quot;pname&quot;: &quot;eclipse-plugin-antlr-runtime-4.7.1&quot;,
      &quot;pname&quot;: &quot;ecm-7.0.6&quot;,
      &quot;pname&quot;: &quot;exact-audio-copy-1.8.0&quot;,
      &quot;pname&quot;: &quot;faust2alqt-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2alsa-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2csound-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2firefox-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2jack-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2jackrust-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2jaqt-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2ladspa-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2lv2-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2sc.py-2.83.1&quot;,
      &quot;pname&quot;: &quot;faust2sndfile-2.83.1&quot;,
      &quot;pname&quot;: &quot;fcitx5-with-addons-5.1.16&quot;,
      &quot;pname&quot;: &quot;flare-1.14&quot;,
      &quot;pname&quot;: &quot;fluentd-1.18.0&quot;,
      &quot;pname&quot;: &quot;foreman-0.87.2&quot;,
      &quot;pname&quot;: &quot;frogatto-unstable-2023-02-27&quot;,
      &quot;pname&quot;: &quot;geany-with-vte-2.1&quot;,
      &quot;pname&quot;: &quot;gear-2022-04-19&quot;,
      &quot;pname&quot;: &quot;gemset-theme-47.05&quot;,
      &quot;pname&quot;: &quot;gimp-with-plugins-2.10.38&quot;,
      &quot;pname&quot;: &quot;gimp-with-plugins-3.0.6&quot;,
      &quot;pname&quot;: &quot;git_fame-3.2.19&quot;,
      &quot;pname&quot;: &quot;gitweb-2.52.0&quot;,
      &quot;pname&quot;: &quot;glibc-iconv-2.42&quot;,
      &quot;pname&quot;: &quot;glibc-multi-2.42-47&quot;,
      &quot;pname&quot;: &quot;glob2-0.9.4.4&quot;,
      &quot;pname&quot;: &quot;gradient-grey-2018-10-20&quot;,
      &quot;pname&quot;: &quot;helm-3.19.1&quot;,
      &quot;pname&quot;: &quot;hiera-eyaml-4.3.0&quot;,
      &quot;pname&quot;: &quot;homesick-1.1.6&quot;,
      &quot;pname&quot;: &quot;html-proofer-5.0.8&quot;,
      &quot;pname&quot;: &quot;hxnodejs-6.9.0&quot;,
      &quot;pname&quot;: &quot;ibus-with-plugins-1.5.33&quot;,
      &quot;pname&quot;: &quot;idris-1.3.4&quot;,
      &quot;pname&quot;: &quot;idris-with-packages-1.3.4&quot;,
      &quot;pname&quot;: &quot;indi-full-2.1.6&quot;,
      &quot;pname&quot;: &quot;indi-full-nonfree-2.1.6&quot;,
      &quot;pname&quot;: &quot;indi-with-drivers-2.1.6&quot;,
      &quot;pname&quot;: &quot;inkscape-with-extensions-1.4.3&quot;,
      &quot;pname&quot;: &quot;ironhand-theme-47.05&quot;,
      &quot;pname&quot;: &quot;jolly-bastion-theme-47.04&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-5.10.248&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-5.15.198&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.1.161&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.12.66&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.12.67&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.18.6&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.18.7&quot;,
      &quot;pname&quot;: &quot;jool-4.1.14-6.6.121&quot;,
      &quot;pname&quot;: &quot;kakoune-2025.06.03&quot;,
      &quot;pname&quot;: &quot;keeagent-0.12.0&quot;,
      &quot;pname&quot;: &quot;keepass-charactercopy-1.0.0&quot;,
      &quot;pname&quot;: &quot;keepasshttp-1.8.4.2&quot;,
      &quot;pname&quot;: &quot;keepass-keetraytotp-0.108.0&quot;,
      &quot;pname&quot;: &quot;keepass-qrcodeview-1.0.4&quot;,
      &quot;pname&quot;: &quot;keepassrpc-1.16.0&quot;,
      &quot;pname&quot;: &quot;klibc-2.0.14&quot;,
      &quot;pname&quot;: &quot;legends-browser-1.19.2&quot;,
      &quot;pname&quot;: &quot;libidn2-2.3.8&quot;,
      &quot;pname&quot;: &quot;libxml2+py-2.15.1&quot;,
      &quot;pname&quot;: &quot;license_finder-7.0.1&quot;,
      &quot;pname&quot;: &quot;llvm-binutils-18.1.8&quot;,
      &quot;pname&quot;: &quot;llvm-binutils-19.1.7&quot;,
      &quot;pname&quot;: &quot;llvm-binutils-20.1.8&quot;,
      &quot;pname&quot;: &quot;llvm-binutils-21.1.8&quot;,
      &quot;pname&quot;: &quot;matrix-synapse-wrapped-1.145.0&quot;,
      &quot;pname&quot;: &quot;mayday-theme-47.05&quot;,
      &quot;pname&quot;: &quot;moonscape-2022-04-19&quot;,
      &quot;pname&quot;: &quot;mosaic-blue-2016-02-19&quot;,
      &quot;pname&quot;: &quot;mpv-with-scripts-0.41.0&quot;,
      &quot;pname&quot;: &quot;msp430-newlib-4.5.0.20241231&quot;,
      &quot;pname&quot;: &quot;nemo-with-extensions-6.6.3&quot;,
      &quot;pname&quot;: &quot;net-tools-1003.1-2008&quot;,
      &quot;pname&quot;: &quot;nineish-2019-12-04&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-frappe-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-frappe-alt-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-latte-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-latte-alt-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-macchiato-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-macchiato-alt-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-mocha-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-catppuccin-mocha-alt-2025-01-27&quot;,
      &quot;pname&quot;: &quot;nineish-dark-gray-2020-07-02&quot;,
      &quot;pname&quot;: &quot;nineish-dark-gray-2021-07-20&quot;,
      &quot;pname&quot;: &quot;nineish-dark-light-2021-07-20&quot;,
      &quot;pname&quot;: &quot;nix-generate-from-cpan-3&quot;,
      &quot;pname&quot;: &quot;nix-index-0.1.9&quot;,
      &quot;pname&quot;: &quot;nixops-2.0.0-unstable-2025-12-28&quot;,
      &quot;pname&quot;: &quot;obsidian-theme-47.05&quot;,
      &quot;pname&quot;: &quot;ocaml5.3.0-uucp-17.0.0&quot;,
      &quot;pname&quot;: &quot;ocaml5.3.0-vg-0.9.5&quot;,
      &quot;pname&quot;: &quot;ocaml5.4.0-uucp-17.0.0&quot;,
      &quot;pname&quot;: &quot;ocaml5.4.0-vg-0.9.5&quot;,
      &quot;pname&quot;: &quot;open-watcom-bin-1.9&quot;,
      &quot;pname&quot;: &quot;open-watcom-v2-0-unstable-2025-11-15&quot;,
      &quot;pname&quot;: &quot;otpkeyprov-2.6&quot;,
      &quot;pname&quot;: &quot;phoebus-theme-47.05&quot;,
      &quot;pname&quot;: &quot;plikd-1.3.7&quot;,
      &quot;pname&quot;: &quot;postgresql-plperl-14.20&quot;,
      &quot;pname&quot;: &quot;postgresql-plperl-15.15&quot;,
      &quot;pname&quot;: &quot;postgresql-plperl-16.11&quot;,
      &quot;pname&quot;: &quot;postgresql-plperl-17.7&quot;,
      &quot;pname&quot;: &quot;postgresql-plperl-18.1&quot;,
      &quot;pname&quot;: &quot;postgresql-plpython3-14.20&quot;,
      &quot;pname&quot;: &quot;postgresql-plpython3-15.15&quot;,
      &quot;pname&quot;: &quot;postgresql-plpython3-16.11&quot;,
      &quot;pname&quot;: &quot;postgresql-plpython3-17.7&quot;,
      &quot;pname&quot;: &quot;postgresql-plpython3-18.1&quot;,
      &quot;pname&quot;: &quot;postgresql-pltcl-14.20&quot;,
      &quot;pname&quot;: &quot;postgresql-pltcl-15.15&quot;,
      &quot;pname&quot;: &quot;postgresql-pltcl-16.11&quot;,
      &quot;pname&quot;: &quot;postgresql-pltcl-17.7&quot;,
      &quot;pname&quot;: &quot;postgresql-pltcl-18.1&quot;,
      &quot;pname&quot;: &quot;postgrey-1.37&quot;,
      &quot;pname&quot;: &quot;powerline-symbols-2.8.4&quot;,
      &quot;pname&quot;: &quot;procps-1003.1-2008&quot;,
      &quot;pname&quot;: &quot;python3.13-subunit-1.4.5&quot;,
      &quot;pname&quot;: &quot;python3.14-subunit-1.4.5&quot;,
      &quot;pname&quot;: &quot;python3-3.13.11-llm-0.28&quot;,
      &quot;pname&quot;: &quot;rally-ho-theme-47.05&quot;,
      &quot;pname&quot;: &quot;recursive-2022-04-19&quot;,
      &quot;pname&quot;: &quot;retroarch-with-cores-1.22.2&quot;,
      &quot;pname&quot;: &quot;roundcube-plugin-carddav-4.4.6&quot;,
      &quot;pname&quot;: &quot;roundcube-plugin-contextmenu-3.3.1&quot;,
      &quot;pname&quot;: &quot;roundcube-plugin-custom_from-1.6.6&quot;,
      &quot;pname&quot;: &quot;roundcube-plugin-persistent_login-5.3.0&quot;,
      &quot;pname&quot;: &quot;roundcube-plugin-thunderbird_labels-1.6.0&quot;,
      &quot;pname&quot;: &quot;run-npush-0.7&quot;,
      &quot;pname&quot;: &quot;scope-lite-0.2.0&quot;,
      &quot;pname&quot;: &quot;service-wrapper-19.04&quot;,
      &quot;pname&quot;: &quot;signwriting-1.1.4&quot;,
      &quot;pname&quot;: &quot;simple-blue-2016-02-19&quot;,
      &quot;pname&quot;: &quot;simple-dark-gray-2016-02-19&quot;,
      &quot;pname&quot;: &quot;simple-dark-gray-2018-08-28&quot;,
      &quot;pname&quot;: &quot;simple-dark-gray-bootloader-2018-08-28&quot;,
      &quot;pname&quot;: &quot;simple-light-gray-2016-02-19&quot;,
      &quot;pname&quot;: &quot;simple-red-2016-02-19&quot;,
      &quot;pname&quot;: &quot;stripes-2016-02-19&quot;,
      &quot;pname&quot;: &quot;stripes-logo-2016-02-19&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-5.10.248&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-5.15.198&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.1.161&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.12.66&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.12.67&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.18.6&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.18.7&quot;,
      &quot;pname&quot;: &quot;system76-io-module-1.0.4-6.6.121&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-5.10.248&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-5.15.198&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.1.161&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.12.66&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.12.67&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.18.6&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.18.7&quot;,
      &quot;pname&quot;: &quot;system76-module-1.0.17-6.6.121&quot;,
      &quot;pname&quot;: &quot;systemtap-5.4&quot;,
      &quot;pname&quot;: &quot;taffer-theme-47.04&quot;,
      &quot;pname&quot;: &quot;teamocil-1.4.2&quot;,
      &quot;pname&quot;: &quot;tectonic-wrapped-0.15.0&quot;,
      &quot;pname&quot;: &quot;tergel-theme-47.01&quot;,
      &quot;pname&quot;: &quot;travis-1.9.1&quot;,
      &quot;pname&quot;: &quot;tsm-client-8.1.27.1&quot;,
      &quot;pname&quot;: &quot;unicode-emoji-17.0.0&quot;,
      &quot;pname&quot;: &quot;usbip-linux-5.10.248&quot;,
      &quot;pname&quot;: &quot;usbip-linux-5.15.198&quot;,
      &quot;pname&quot;: &quot;usbip-linux-6.1.161&quot;,
      &quot;pname&quot;: &quot;usbip-linux-6.12.67&quot;,
      &quot;pname&quot;: &quot;usbip-linux-6.18.7&quot;,
      &quot;pname&quot;: &quot;usbip-linux-6.6.121&quot;,
      &quot;pname&quot;: &quot;usbip-linux-hardened-6.12.66&quot;,
      &quot;pname&quot;: &quot;usbip-linux-lqx-6.18.6&quot;,
      &quot;pname&quot;: &quot;usbip-linux-xanmod-6.12.66&quot;,
      &quot;pname&quot;: &quot;usbip-linux-xanmod-6.18.6&quot;,
      &quot;pname&quot;: &quot;usbip-linux-zen-6.18.6&quot;,
      &quot;pname&quot;: &quot;util-linux-1003.1-2008&quot;,
      &quot;pname&quot;: &quot;vdr-epgtableid0-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-hello-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdrift-unstable-2021-09-05-with-data-1446&quot;,
      &quot;pname&quot;: &quot;vdr-osddemo-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-pictures-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-servicedemo-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-skincurses-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-status-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-svdrpdemo-2.6.9&quot;,
      &quot;pname&quot;: &quot;vdr-with-plugins-2.6.9&quot;,
      &quot;pname&quot;: &quot;vettlingr-theme-47.05&quot;,
      &quot;pname&quot;: &quot;vscode-with-extensions-1.108.1&quot;,
      &quot;pname&quot;: &quot;wanderlust-theme-47.04&quot;,
      &quot;pname&quot;: &quot;waterfall-2022-04-19&quot;,
      &quot;pname&quot;: &quot;watersplash-2022-04-19&quot;,
      &quot;pname&quot;: &quot;wayfire-wrapped-0.10.1&quot;,</code></pre>
<p>At least <code>flare</code> is in the list. I think most of these require a similar
fix.</p>
<h2 id="parting-words">Parting words</h2>
<p><code>nixpkgs</code> now exposes slightly less mangled data to <code>repology.org</code> for version
comparison. Unfortunately <code>nixpkgs</code> itself is not fully switched to
<code>pname</code> / <code>version</code> everywhere, and thus a small set of data is now lost.
It should be easy to find and to restore those with a fix similar to
<a href="https://github.com/NixOS/nixpkgs/pull/483476"><code>flare</code></a>.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>zellij terminal emulator</title>
    <link href="https://trofi.github.io/posts/344-zellij-terminal-emulator.html" />
    <id>https://trofi.github.io/posts/344-zellij-terminal-emulator.html</id>
    <published>2025-12-28T00:00:00Z</published>
    <updated>2025-12-28T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="my-tmux-setup">my <code>tmux</code> setup</h2>
<p>I have been using <code>tmux</code> by default for all my terminal sessions since
around 2016 (almost 10 years!): on most days I switch from my local desktop
keyboard to a laptop and back to run/debug stuff on the desktop. I usually
have 4 sessions: various builders, a development session, a chat session and
various one-off investigations.</p>
<p>I have a moderate <a href="https://github.com/trofi/home/blob/master/.tmux.conf"><code>~/.tmux.conf</code></a>:</p>
<pre><code>set -ga terminal-overrides ',xterm*:smcup@:rmcup@'

set -sg escape-time 0

set -g mouse on
bind -T root WheelUpPane   if-shell -F -t = &quot;#{alternate_on}&quot; &quot;send-keys -M&quot; &quot;select-pane -t =; copy-mode -e; send-keys -M&quot;
bind -T root WheelDownPane if-shell -F -t = &quot;#{alternate_on}&quot; &quot;send-keys -M&quot; &quot;select-pane -t =; send-keys -M&quot;

set -g @scroll-speed-num-lines-per-scroll 3

# override default
set -g status-right-length 60 # was 40
set -g status-right '#h, %Y-%m-%d %H:%M' # was something like '&quot;#h&quot;, #S'

# Start windows and panes at 1, not 0
set -g base-index 1
setw -g pane-base-index 1

# default is ~2000
set-option -g history-limit 10000

# Allow title update from within tmux apps
set-option -g set-titles on

# Host-specific overrides
if-shell &quot;[ -e ~/.tmux.conf.local ]&quot; &quot;source-file ~/.tmux.conf.local&quot;

# Extend default variable list of:
#   &quot;DISPLAY SSH_ASKPASS SSH_AUTH_SOCK SSH_AGENT_PID SSH_CONNECTION WINDOWID XAUTHORITY&quot;
set-option -g update-environment &quot;DISPLAY SSH_ASKPASS SSH_AUTH_SOCK SSH_AGENT_PID SSH_CONNECTION WINDOWID XAUTHORITY   WAYLAND_DISPLAY SWAYSOCK I3SOCK&quot;

# Join current window into a previous one. Should act as inverse of Ctrl-b !
bind-key j &quot;join-pane -s !&quot;</code></pre>
<p>While a bit wordy it’s not a big or complicated setup. And it already
contains a few hacks like <code>set -sg escape-time 0</code> to make the <code>ESC</code> key
feel more responsive. Without it switching modes in <code>vim</code>-like editors
has a perceptible delay: I had to press <code>ESC</code> twice to get an instant
reaction out of <code>vim</code>.</p>
<h2 id="a-tmux-hickup">a <code>tmux</code> hiccup</h2>
<p>A month ago I casually <code>ssh</code>ed on my desktop and was not able to access
any of my running <code>tmux</code> sessions:</p>
<pre><code>$ tmux a
open terminal failed: not a terminal
$ tmux
open terminal failed: not a terminal</code></pre>
<p>This happened because <code>tmux</code> updated from version <code>3.5a</code> to <code>3.6</code> and
changed its protocol between the server (still running <code>3.5a</code>) and the
client (updated to <code>3.6</code>). The workaround was trivial: run a <code>3.5a</code>
client for a while until the machine gets scheduled for a reboot.</p>
<p>I debugged it a bit and found out it was an intentional change. I filed
<a href="https://github.com/tmux/tmux/issues/4711"><code>tmux issue #4711</code></a> to improve
the error message on the <code>tmux</code> side.
From what I understand, at least on <code>linux</code> the change that caused the
protocol break was entirely in the <code>tmux</code> source code base (around the
<code>compat/imsg.h</code> / <code>compat/imsg.c</code> files imported from <code>OpenBSD</code>).
The <code>tmux</code> author decided not to improve the error message.</p>
<h2 id="zellij"><code>zellij</code></h2>
<p>This event prompted me to wonder what other terminal
multiplexers are out there.</p>
<p>I tried <code>zellij</code> and have been using it for the past month. It’s not a drop-in
replacement for <code>tmux</code> but it gets very close for my use cases. Below I
collected a few niceties and a few snags I encountered while using
it.</p>
<h3 id="nice-zellij-retains-many-ctrl-b-tmux-style-keys-as-is">nice: <code>zellij</code> retains many <code>Ctrl-b</code> <code>tmux</code>-style keys as is</h3>
<p>The default <code>zellij</code> configuration is usable as is for a <code>tmux</code> user:
<code>Ctrl-b c</code> opens a new pane (as expected), <code>Ctrl-b ,</code> renames a tab and
so on. That makes it very easy to try <code>zellij</code> without any exploration
of the config file format.</p>
<h3 id="snag-some-keybindings-interfere-with-other-applications">snag: some keybindings interfere with other applications</h3>
<p><code>zellij</code> uses a few escape-style initial sequences, not just
<code>tmux</code>-style <code>Ctrl-b</code>, but also <code>Ctrl-p</code> (panes), <code>Ctrl-n</code> (resizes),
<code>Ctrl-h</code> (moves), <code>Ctrl-o</code> (session operations) and a bunch of <code>Alt-</code>
ones.</p>
<p>Sometimes these escapes interfere with rich applications like <code>mc</code>,
<code>vifm</code>, or the <code>vim</code> and <code>helix</code> editors.</p>
<p>On the bright side many of them are very convenient, like <code>Alt-n</code> /
<code>Alt-f</code> to get a short-lived pane.</p>
<p>The status bar always makes it clear that you got into one of <code>zellij</code>’s
modes. But I had to disable quite a few <code>Ctrl-</code> and <code>Alt-</code> based key
bindings in favor of deeper nested <code>Ctrl-b</code> ones in <code>tmux</code> style.</p>
<h3 id="nice-ctrl-g-to-disable-all-the-zellij-bindings">nice: <code>Ctrl-g</code> to disable all the <code>zellij</code> bindings</h3>
<p>Given the above, <code>zellij</code> has a nice <code>Ctrl-g</code> kill-switch to turn all
bindings off (except <code>Ctrl-g</code> itself).</p>
<h3 id="nice-scrollback-buffer-editing">nice: scrollback buffer editing</h3>
<p>When in the scrollback scrolling mode (<code>Ctrl-b [</code> in <code>tmux</code>) I sometimes
want to save part of the log (or all the contents) into a file. In <code>zellij</code>
it’s right there at the <code>e</code> key: it opens the default editor with the full contents.</p>
<h3 id="nice-fast-scrolling-on-copypaste-from-the-scrollback">nice: fast scrolling on copy/paste from the scrollback</h3>
<p>Very occasionally I want to copy 100-200 lines of scrollback and paste
them into the browser. I usually use the mouse, and in <code>tmux</code> that was very
slow for me: I did not always succeed in a <code>ChromeOS</code> terminal.</p>
<h3 id="nice-modern-features">nice: modern features</h3>
<p>I noticed that many features like link underscores and tooltip pop-ups
work in <code>zellij</code> without any configuration, just like they work in a host
terminal. <code>tmux</code> does not advertise some of them.</p>
<h3 id="snag-cpu-ram-usage-is-high">snag: <code>CPU</code> / <code>RAM</code> usage is high</h3>
<p><code>zellij</code> can use quite a bit of <code>CPU</code> (and <code>RAM</code>) if a program pipes
out a lot of text. For example the <code>cat -v /dev/zero</code> command used
<code>135%</code> <code>CPU</code> on my system with quite a bit of <code>RAM</code> usage (I <code>Ctrl-C</code>ed
at <code>15GiB</code>).</p>
<p>It’s not as bad on more typical multiline workloads.</p>
<p>There is an existing <a href="https://github.com/zellij-org/zellij/issues/3594">report</a>
to get it slightly better.</p>
<h3 id="snag-a-banner-in-the-status-line">snag: a banner in the status line</h3>
<p><code>zellij</code> keeps its verbatim name in the status bar as a bit of
advertisement, which takes away 10 bytes from the status bar
(<a href="https://github.com/zellij-org/zellij/issues/4504">the report</a>).</p>
<p>In theory it’s a one-liner change. But patching it out is not very
convenient as <code>zellij</code> implements plugins as <code>wasm</code> binaries and ships
the status bar as a precompiled <code>.wasm</code> file. I did not manage to rebuild
it locally yet.</p>
<h3 id="snag-no-tmux-style-mouse-drag-support">snag: no <code>tmux</code>-style mouse drag support</h3>
<p>I liked how <code>tmux</code> allows you to resize panes just by dragging them.
<code>zellij</code> did not implement it yet (<a href="https://github.com/zellij-org/zellij/issues/1262">the report</a>).</p>
<p><code>Ctrl-n</code> and arrow keys would have to do for now.</p>
<h3 id="snag-no-environment-variable-clobber-support">snag: no environment variable clobber support</h3>
<p><code>tmux</code> has a nice variable clobbering feature where <code>DISPLAY</code>,
<code>SSH_AUTH_SOCK</code> and a few other variables are updated with the values
from the most recently attached client. As a result <code>ssh-agent</code>,
<code>X11</code> sessions and other things Just Work in newly opened panes.
<code>zellij</code> does not have the feature yet
(<a href="https://github.com/zellij-org/zellij/issues/1637">the report</a>).</p>
<h3 id="snag-no-support-for-editing-keys-in-tab-editor">snag: no support for editing keys in tab editor</h3>
<p>In <code>tmux</code> when renaming a tab you can use editing keys like <code>Ctrl-w</code>
to delete the previous word. <code>zellij</code> just dumps a <code>[119;5u</code> escape.</p>
<h2 id="parting-words">parting words</h2>
<p>When I started using <code>zellij</code> I got a lot more than I expected:</p>
<ul>
<li>friendly UI that tells you what modes are there and which one is active</li>
<li>nice and short configuration file format</li>
<li>modern terminal features support (URL underscores, tooltips)</li>
<li>mostly compatible with <code>tmux</code> key bindings</li>
<li>intuitive text selection in the scrollback</li>
</ul>
<p>But it has quite a few snags as well:</p>
<ul>
<li><p>banner in the status line</p></li>
<li><p>no environment variable clobber support</p></li>
<li><p>default config needs some tweaking to be more usable:</p>
<ul>
<li>disable hello pop-up (<code>show_startup_tips false</code>)</li>
<li>disable key bindings that clash with <code>helix</code> editor</li>
<li>disable pane frames by default (<code>pane_frames false</code>)</li>
<li>disable session serialization (<code>session_serialization false</code>)</li>
</ul></li>
</ul>
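<p>For reference, the tweaks above map to a handful of lines in
<code>~/.config/zellij/config.kdl</code>. A sketch (the exact <code>unbind</code> entries
depend on which bindings clash for you; <code>Ctrl h</code> here is just the
&quot;move&quot; mode prefix mentioned earlier):</p>
<pre><code>// ~/.config/zellij/config.kdl
show_startup_tips false
pane_frames false
session_serialization false

keybinds {
    // drop a prefix that clashes with an application,
    // e.g. the Ctrl-h &quot;move&quot; mode entry:
    unbind &quot;Ctrl h&quot;
}</code></pre>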
<p>So far <code>zellij</code> is a nice alternative to <code>tmux</code> for me.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>AoC of 2025</title>
    <link href="https://trofi.github.io/posts/343-AoC-of-2025.html" />
    <id>https://trofi.github.io/posts/343-AoC-of-2025.html</id>
    <published>2025-12-17T00:00:00Z</published>
    <updated>2025-12-17T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>This time I managed to finish all <code>AoC</code> challenges within a week of
the problems being published.
My solutions: <a href="https://github.com/trofi/AoC/tree/main/2025" class="uri">https://github.com/trofi/AoC/tree/main/2025</a>.
The first 9 problems I managed to do on the day they were published. But the
<a href="https://adventofcode.com/2025/day/10">10th</a> one was tough.</p>
<p>As usual I tried to solve the problems within 24 hours of publish
time and get the source code under <code>4K</code>. I used <code>rust</code> again. I did not
use any external crates.
This time I also attempted to handle errors in a more graceful way to
avoid <code>.unwrap()</code> / <code>.expect("")</code> calls. I found a few nice patterns
like collecting into <code>Result&lt;Vec&lt;_&gt;&gt;</code>:</p>
<pre class="rust"><code>$ evcxr

&gt;&gt; #[derive(Debug)] struct E{}

&gt;&gt; [Ok(1),Ok(3),Ok(5)].into_iter().collect::&lt;Result&lt;Vec&lt;_&gt;, E&gt;&gt;()
Ok([1, 3, 5])</code></pre>
<p>Or reducing <code>Result</code> values:</p>
<pre class="rust"><code>&gt;&gt; [Ok(1),Ok(3),Ok(5)].into_iter().sum::&lt;Result&lt;isize, E&gt;&gt;()
Ok(9)</code></pre>
<p>I felt I did a bit too many <code>.map_err()</code> conversions to capture more
error context. I also relied on <code>Debug</code> instances instead of <code>Display</code>
as I allowed the error type to be returned from <code>main()</code>. There
probably are better mechanisms to achieve the same. But otherwise error
propagation is quite pleasant in <code>rust</code>.</p>
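<p>For illustration, here is a minimal self-contained sketch of that
pattern (the <code>AppError</code> type and the input are made up for this
example):</p>
<pre class="rust"><code>use std::num::ParseIntError;

// A tiny made-up error type; deriving `Debug` is enough
// for `main()` to print it on failure.
#[derive(Debug)]
enum AppError {
    Parse(ParseIntError),
}

fn sum_input(input: &amp;str) -&gt; Result&lt;isize, AppError&gt; {
    // Map the library error into our own type and reduce
    // the per-line `Result`s into a single one.
    input
        .lines()
        .map(|l| l.parse::&lt;isize&gt;().map_err(AppError::Parse))
        .sum()
}

fn main() -&gt; Result&lt;(), AppError&gt; {
    let total = sum_input(&quot;1\n3\n5&quot;)?;
    println!(&quot;total = {total}&quot;);
    Ok(())
}</code></pre>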
<h2 id="funniest-problems">Funniest problems</h2>
<p>No pencil-and-paper problems this time. All of them required writing a
program. I did solve a few examples from <code>Day 10</code> by hand to explore it a
bit better. A few of the problems had very interesting problem statements:</p>
<ul>
<li><a href="https://adventofcode.com/2025/day/8">Day 8: Playground</a> is a nice
concise definition of a large input graph.</li>
<li><a href="https://adventofcode.com/2025/day/10">Day 10: Factory</a> tricked me
into searching for a brute force solution. After failing that I managed
to see a nice system of equations and a function to minimize.
I did not solve the minimization part nicely, but I liked the equation
solver.</li>
</ul>
<p>It felt like the first few problems had very gnarly corner cases to handle.
I was afraid the problem complexity would really go up. But it was just
about right for me.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Zero Hydra Failures towards 25.11 NixOS release</title>
    <link href="https://trofi.github.io/posts/342-Zero-Hydra-Failures-towards-25.11-NixOS-release.html" />
    <id>https://trofi.github.io/posts/342-Zero-Hydra-Failures-towards-25.11-NixOS-release.html</id>
    <published>2025-11-03T00:00:00Z</published>
    <updated>2025-11-03T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>It is November again! The usual plan is to have a <code>NixOS-25.11</code> release
on the 30th (<a href="https://github.com/NixOS/nixpkgs/issues/443568">full schedule</a>).</p>
<p>Yesterday the schedule got to the
<a href="https://github.com/NixOS/nixpkgs/issues/457852"><code>ZHF phase</code></a> where no
major changes are accepted to the <code>master</code> branch and the focus is on fixing
build failures.</p>
<p>It’s a good time to fix easy build failures or remove long-broken
packages. <a href="https://github.com/NixOS/nixpkgs/issues/457852" class="uri">https://github.com/NixOS/nixpkgs/issues/457852</a> contains
detailed step-by-step instructions to identify interesting packages.</p>
<p>This year <code>nixpkgs</code> has an especially large list of failures to sort out.
It feels like most build failures are either <code>cmake-4</code> or <code>qt-6.10</code>
related.</p>
<h2 id="an-example-package-fix">an example package fix</h2>
<p>Let’s try to fix a single package for <code>ZHF</code>. I’ll pick
<a href="https://hydra.nixos.org/build/310538459"><code>diskscan</code></a>. Its build log
is typical of a <code>cmake-4</code> failure:</p>
<pre><code>...
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake &lt; 3.5 has been removed from CMake.

  Update the VERSION argument &lt;min&gt; value.  Or, use the &lt;min&gt;...&lt;max&gt; syntax
  to tell CMake that the project requires at least &lt;min&gt; but has been updated
  to work with policies introduced by &lt;max&gt; or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.</code></pre>
<p>I know little to nothing about <code>cmake</code>. But this failure is the result
of <code>cmake</code> dropping support for pre-<code>cmake-3.5</code> behavior. Chances are
upstream already fixed the problem and we can use the patch as is.</p>
<p>Let’s find the source repository by inspecting the package’s definition:</p>
<pre><code>$ EDITOR=cat nix edit -f '&lt;nixpkgs&gt;' diskscan</code></pre>
<pre class="nix"><code>{
  lib,
  stdenv,
  fetchFromGitHub,
  cmake,
  ncurses,
  zlib,
}:

stdenv.mkDerivation rec {
  pname = &quot;diskscan&quot;;
  version = &quot;0.21&quot;;

  src = fetchFromGitHub {
    owner = &quot;baruch&quot;;
    repo = &quot;diskscan&quot;;
    rev = version;
    sha256 = &quot;sha256-2y1ncPg9OKxqImBN5O5kXrTsuwZ/Cg/8exS7lWyZY1c=&quot;;
  };

  buildInputs = [
    ncurses
    zlib
  ];

  nativeBuildInputs = [ cmake ];

  meta = with lib; {
    homepage = &quot;https://github.com/baruch/diskscan&quot;;
    description = &quot;Scan HDD/SSD for failed and near failed sectors&quot;;
    platforms = with platforms; linux;
    maintainers = with maintainers; [ peterhoeg ];
    license = licenses.gpl3;
    mainProgram = &quot;diskscan&quot;;
  };
}</code></pre>
<p>Easy! <a href="https://github.com/baruch/diskscan" class="uri">https://github.com/baruch/diskscan</a> displayed nothing related to a
<code>cmake-4</code> fix. Let’s write one! Trying to reproduce the failure locally
against the upstream <code>master</code> branch:</p>
<pre><code>$ git clone https://github.com/baruch/diskscan
$ cd diskscan

$ nix build --impure --expr 'with import &lt;nixpkgs&gt; {}; diskscan.overrideAttrs (oa: { src = builtins.fetchGit ./.; })' -L
...
diskscan&gt; CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
diskscan&gt;   Compatibility with CMake &lt; 3.5 has been removed from CMake.</code></pre>
<p>Yay! Same failure! For this particular case the fix is trivial:</p>
<pre class="diff"><code>--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required(VERSION 3.0.2)
+cmake_minimum_required(VERSION 3.10)
 project(diskscan
         VERSION 0.19)
</code></pre>
<p>Testing the fix:</p>
<pre><code>$ nix build --impure --expr 'with import &lt;nixpkgs&gt; {}; diskscan.overrideAttrs (oa: { src = builtins.fetchGit ./.; })' -L
warning: Git tree '/tmp/diskscan' is dirty
# done!

$ find result/
result/
result/bin
result/bin/diskscan
result/share
result/share/man
result/share/man/man1
result/share/man/man1/diskscan.1.gz</code></pre>
<p>You can run the result and see if it does what’s expected. I proposed
this trivial fix upstream as <a href="https://github.com/baruch/diskscan/pull/77"><code>PR#77</code></a>.</p>
<p>Now we can use that to craft the <code>nixpkgs</code> fix! Let’s check if the bug
is still there:</p>
<pre><code>$ git clone https://github.com/NixOS/nixpkgs
$ cd nixpkgs

$ nix build -f. diskscan -L
...
diskscan&gt; CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
diskscan&gt;   Compatibility with CMake &lt; 3.5 has been removed from CMake.</code></pre>
<p>Still there. Crafting the patch against <code>nixpkgs</code>:</p>
<pre class="diff"><code>--- a/pkgs/by-name/di/diskscan/package.nix
+++ b/pkgs/by-name/di/diskscan/package.nix
@@ -2,6 +2,7 @@
   lib,
   stdenv,
   fetchFromGitHub,
+  fetchpatch,
   cmake,
   ncurses,
   zlib,
@@ -18,6 +19,16 @@ stdenv.mkDerivation rec {
     sha256 = &quot;sha256-2y1ncPg9OKxqImBN5O5kXrTsuwZ/Cg/8exS7lWyZY1c=&quot;;
   };

+  patches = [
+    # cmake-4 support:
+    #   https://github.com/baruch/diskscan/pull/77
+    (fetchpatch {
+      name = &quot;cmake-4.patch&quot;;
+      url = &quot;https://github.com/baruch/diskscan/commit/6e342469dcab32be7a33109a4d394141d5c905b5.patch?full_index=1&quot;;
+      hash = &quot;sha256-05ctYPmGWTJRUc4aN35fvb0ITwIZlQdIweH7tSQ0RjA=&quot;;
+    })
+  ];
+
   buildInputs = [
     ncurses
     zlib</code></pre>
<p>And testing the build:</p>
<pre><code>$ nix build -f. diskscan -L
...

$ find result/
result/
result/bin
result/bin/diskscan
result/share
result/share/man
result/share/man/man1
result/share/man/man1/diskscan.1.gz</code></pre>
<p>All good! Proposed the fix as
<a href="https://github.com/NixOS/nixpkgs/pull/458258"><code>PR#458258</code></a>.</p>
<h2 id="parting-words">parting words</h2>
<p>If you are thinking of contributing to <code>nixpkgs</code> and never did, <code>ZHF</code> is a
good time to start!</p>
<p>The <code>cmake-4</code> set of failures has its own seemingly infinite list of
<a href="https://github.com/NixOS/nixpkgs/issues/445447">failures</a> waiting to be
fixed just like the example above.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>AoC of 2024</title>
    <link href="https://trofi.github.io/posts/341-AoC-of-2024.html" />
    <id>https://trofi.github.io/posts/341-AoC-of-2024.html</id>
    <published>2025-11-01T00:00:00Z</published>
    <updated>2025-11-01T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Almost a year later I finally finished <code>AoC</code> 2024:
<a href="https://adventofcode.com/2024" class="uri">https://adventofcode.com/2024</a>.
My solutions: <a href="https://github.com/trofi/AoC/tree/main/2024" class="uri">https://github.com/trofi/AoC/tree/main/2024</a>.</p>
<p>Similar to last year I managed to solve most of the problems within a day of
them being published, with the single exception of
<a href="https://adventofcode.com/2024/day/21">problem 21</a>. That one took me
most of 2025 :)</p>
<p>As usual, the problems appeared once a day at 5 AM from Dec 1 to Dec 25.
Nowadays I get up at 6 AM. Sometimes I had a chance to at least read a
problem description in the morning and try to solve it in the evening.</p>
<p>My personal goal was to solve each problem within 24 hours of publish
time and keep the source code under <code>4K</code>. I missed that goal on a few
problems. Again, I used <code>rust</code>. I tried not to use external crates, but I
started using <code>cargo</code> and <code>workspaces</code> to make <code>rust-analyzer</code> and <code>cargo run</code>
Just Work in the source directory. I only enabled <code>cargo</code> very late, once
I got stuck on <code>problem 21</code>.</p>
<h2 id="funniest-problems">Funniest problems</h2>
<p>Most of the problems felt slightly easier than last year’s ones (where
I did not get past <code>problem 21</code>).</p>
<p>Again, there are no miracles in my solutions. If they ran in a
few seconds I did not do much to tune them. But some of them
required a pencil-and-paper style solution. Those were great!</p>
<p>Here is my list of fun problems I remembered:</p>
<ul>
<li><a href="https://adventofcode.com/2024/day/3">Day 3: Mull It Over</a>:
slightly unusual problem where I had the chance to use
<a href="https://re2c.org/"><code>re2c</code></a> lexer generator.</li>
<li><a href="https://adventofcode.com/2024/day/12">Day 12: Garden Groups</a>:
while being one of the typical graph traversal problems I liked
how part 2 could be solved entirely with primitives built from part 1.</li>
<li><a href="https://adventofcode.com/2024/day/17">Day 17: Chronospatial Computer</a>:
an opportunity to write a CPU emulator! I could not resist. Part 2
required a bit of pencil and paper.</li>
<li><a href="https://adventofcode.com/2024/day/20">Day 20: Race Condition</a>:
it’s a nice form of shortest-path search in a maze with a warp
twist.</li>
<li><a href="https://adventofcode.com/2024/day/21">Day 21: Keypad Conundrum</a>:
a simple “control the robot with a keypad” problem. Part 2 has a great
twist.</li>
<li><a href="https://adventofcode.com/2024/day/24">Day 24: Crossed Wires</a>:
part 2 is a deceptively simple problem that I found easier
to solve by staring at <code>graphviz</code> output.</li>
</ul>
<p>Looks like this year we had even more graph traversal problems: a quick
grep for <code>visited</code> says there were <code>8</code> of them.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>profiling binutils linkers in nixpkgs</title>
    <link href="https://trofi.github.io/posts/340-profiling-binutils-linkers-in-nixpkgs.html" />
    <id>https://trofi.github.io/posts/340-profiling-binutils-linkers-in-nixpkgs.html</id>
    <published>2025-10-11T00:00:00Z</published>
    <updated>2025-10-11T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="background">background</h2>
<p>I’ve been using <code>binutils-2.45</code> against a local <code>nixpkgs</code> checkout for a
while to weed out minor problems in other packages. So far I have encountered
my old friend, the
<a href="https://github.com/NixOS/nixpkgs/pull/438714"><code>guile</code> over-stripping issue</a>.</p>
<p><a href="https://en.wikipedia.org/wiki/Gold_(linker)"><code>GNU gold</code> linker</a> was
<a href="https://lists.gnu.org/archive/html/info-gnu/2025-02/msg00001.html">deprecated</a>
in <code>binutils</code> upstream as it does not have developer
power behind it, while the <code>bfd</code> linker (the default) still gets maintenance
attention. <code>binutils-2.45</code> intentionally does not ship the <code>gold</code> sources
to nudge users off <code>gold</code>.</p>
<p>To fix the rare <code>nixpkgs</code> package build failures that rely on <code>ld.gold</code> I
trivially pointed all the <code>ld.gold</code> links at <code>ld.bfd</code> locally and built my
system. No major problems found.</p>
<h2 id="an-ld.gold-removal-obstacle">an <code>ld.gold</code> removal obstacle</h2>
<p>In a <a href="https://discourse.nixos.org/t/removing-gold-from-nixpkgs/70496/8">recent discussion thread</a>
the question was raised if/how <code>nixpkgs</code> could switch to <code>gold</code>-less
<code>binutils</code>. One of the interesting points of the thread is that <code>ld.bfd</code>
is occasionally ~<code>3x</code> <a href="https://github.com/NixOS/nixpkgs/pull/418735#issuecomment-2993624063">slower than <code>ld.gold</code></a>
on files that already take multiple seconds to link with <code>ld.gold</code>.
For me it was quite a surprise, as <code>ld.bfd</code> does
<a href="https://www.youtube.com/watch?v=h5pXt_YCwkU">get speed improvements</a>
from time to time. Thus, I suspected some kind of serious bug
on the <code>ld.bfd</code> side.
I wondered if I could reproduce such a big performance
drop and find a low-hanging fix. Or at least report a bug to
<code>binutils</code> upstream.</p>
<p>I used the same <code>pandoc</code> <code>nixpkgs</code> package to do the linker testing. It’s
a nice example as it builds only 4 small <code>haskell</code> source files and
links in a huge number of static <code>haskell</code> libraries. A perfect linker
load test. Preparing the baseline:</p>
<pre><code># pulling in already built package into cache
$ nix build --no-link -f. pandoc
# pulling in all the build dependencies into cache
$ nix build --no-link -f. pandoc --rebuild

# timing the build:
$ time nix build --no-link -f. pandoc --rebuild
error: derivation '/nix/store/az6dbzm341jc7n4sw7w0ifspxgsm4093-pandoc-cli-3.7.0.2.drv' may not be deterministic: output '/nix/store/znmj21k8nrqc3hcax6yfy446g8bgk7z3-pandoc-cli-3.7.0.2' differs

real    0m12,850s
user    0m0,719s
sys     0m0,105s</code></pre>
<p>Do not mind the determinism error. The package build took about
<code>13 seconds</code>, which seems to match the original timing. Then I tried
<code>ld.bfd</code> by passing an extra <code>--ghc-option=-optl-fuse-ld=bfd</code> option to
<code>./Setup configure</code>:</p>
<pre><code>$ time nix build --impure --expr 'with import &lt;nixpkgs&gt; {};
    pandoc.overrideAttrs (oa: {
        configureFlags = oa.configureFlags ++ [&quot;--ghc-option=-optl-fuse-ld=bfd&quot;];})'
...
real    0m37,391s
user    0m0,691s
sys     0m0,120s</code></pre>
<p>37 seconds! At least a <code>3x</code> slowdown. I was glad to see such a simple
reproducer.</p>
<h2 id="linker-performance-profiles">linker performance profiles</h2>
<p>I dropped into a development shell to explore individual <code>ld</code> commands:</p>
<pre><code>$ nix develop --impure --expr 'with import &lt;nixpkgs&gt; {};
    pandoc.overrideAttrs (oa: {
        configureFlags = oa.configureFlags ++ [&quot;--ghc-option=-optl-fuse-ld=bfd&quot; ];})'
$$ genericBuild
...
[4 of 4] Linking dist/build/pandoc/pandoc
^C

$$ # ready for interactive exploration</code></pre>
<p>I ran <code>./Setup build -v</code> to extract the exact <code>ghc --make ...</code> invocation:</p>
<pre><code>$$ ./Setup build -v
...
Linking...
Running: &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make -fbuilding-cabal-package -O -split-sections -static -outputdir dist/build/pandoc/pandoc-tmp -odir dist/build/pandoc/pandoc-tmp ...</code></pre>
<p>And ran <code>ld.bfd</code> under <code>perf</code>:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... '-optl-fuse-ld=bfd' -fforce-recomp</code></pre>
<p>Then I built the <a href="https://github.com/brendangregg/FlameGraph"><code>flamegraph</code></a> picture:</p>
<pre><code>$ perf script &gt; out.perf
$ perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded
$ perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; bfd.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/bfd.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/bfd.svg" title="ld.bfd profile on pandoc" alt="bfd.svg" /></a></p>
<p>You can click on the pictures and explore them interactively.
Does it look fine to you? Anything odd?
We already see a tiny hint: <code>_bfd_elf_gc_mark()</code> takes a suspiciously large
amount of space on the picture.
Let’s build the same picture for <code>gold</code> using the <code>-optl-fuse-ld=gold</code> option:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... '-optl-fuse-ld=gold' -fforce-recomp
$ perf script &gt; out.perf
$ perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded
$ perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; gold.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/gold.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/gold.svg" title="ld.gold profile on pandoc" alt="gold.svg" /></a></p>
<p>The profile looks more balanced. The staircase around sorting looks
peculiar. Most of the time is spent
on section sorting in <code>gold::Output_section::sort_attached_input_sections</code>.
It’s also a great hint: <strong>how many sections must there be for sorting
alone to take <code>25%</code> of the link time</strong>?</p>
<h2 id="section-count">section count</h2>
<p>Where do these numerous sections come from?
<code>nixpkgs</code> enables <a href="https://github.com/NixOS/nixpkgs/blob/54a538734a5e77bca43dcbe4ad20357b5eb1cffd/pkgs/development/haskell-modules/generic-builder.nix#L349C20-L349C45"><code>-split-sections</code></a>
by default on <code>linux</code>, which in turn uses <code>ghc</code>’s <a href="https://downloads.haskell.org/ghc/9.12.2/docs/users_guide/phases.html#ghc-flag-fsplit-sections"><code>-fsplit-sections</code></a>.
Those are very close in spirit to <code>gcc</code>’s <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ffunction-sections"><code>-ffunction-sections</code></a>
feature. Both place each function in a separate <code>ELF</code> section, assuming
that the user will pass the <code>-Wl,--gc-sections</code> linker option to
garbage-collect unreferenced sections in the final executable and make
binaries smaller.</p>
<p>So how many sections do you normally expect per object file?</p>
<p>In <code>C</code> (with <code>-ffunction-sections</code>) I would expect the section count to
be close to the function count in the source
file. Some optimization passes duplicate (clone) or inline functions,
but the expansion should not be too large (famous last words). Hopefully
not <code>100x</code>, but closer to <code>1.5x</code> maybe? <code>C++</code> might be trickier to
reason about.</p>
<p>In <code>haskell</code> it’s a lot more complicated: the lazy evaluation model creates
numerous smaller functions out of one source function, and aggressive
cross-module inlining brings in many expressions.
Here is an example of a one-liner’s compilation and its section count:</p>
<pre class="haskell"><code>-- # cat Main.hs
main = print &quot;hello&quot;</code></pre>
<p><strong>Quiz question: how many sections do you expect to see for this source
file after <code>ghc -c Main.hs</code>?</strong></p>
<p>Building and checking unsplit form first:</p>
<pre><code>$ ghc -c Main.hs -fforce-recomp

$ size Main.o
   text    data     bss     dec     hex filename
    378     304       0     682     2aa Main.o

$ readelf -SW Main.o
There are 13 section headers, starting at offset 0xb50:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 00013a 00  AX  0   0  8
  [ 2] .rela.text        RELA            0000000000000000 000708 0001c8 18   I 10   1  8
  [ 3] .data             PROGBITS        0000000000000000 000180 000130 00  WA  0   0  8
  [ 4] .rela.data        RELA            0000000000000000 0008d0 000210 18   I 10   3  8
  [ 5] .bss              NOBITS          0000000000000000 0002b0 000000 00  WA  0   0  1
  [ 6] .rodata.str       PROGBITS        0000000000000000 0002b0 000010 01 AMS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        0000000000000000 0002c0 000000 00      0   0  1
  [ 8] .comment          PROGBITS        0000000000000000 0002c0 00000c 01  MS  0   0  1
  [ 9] .note.gnu.property NOTE            0000000000000000 0002d0 000030 00   A  0   0  8
  [10] .symtab           SYMTAB          0000000000000000 000300 000228 18     11   5  8
  [11] .strtab           STRTAB          0000000000000000 000528 0001dd 00      0   0  1
  [12] .shstrtab         STRTAB          0000000000000000 000ae0 00006e 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)</code></pre>
<p>378 bytes of code, 12 sections. 6 of them are relevant: <code>.text</code>, <code>.rela.text</code>,
<code>.data</code>, <code>.rela.data</code>, <code>.bss</code>, <code>.rodata.str</code>. All very close to a typical <code>C</code> program.</p>
<p>Now let’s throw in <code>-fsplit-sections</code>.</p>
<p><strong>Quiz question: guess how many more sections there will be? 0? 1? 10? 100? 1000?</strong></p>
<pre><code>$ ghc -c Main.hs -fforce-recomp -fsplit-sections

$ size Main.o
   text    data     bss     dec     hex filename
    365     304       0     669     29d Main.o

$ readelf -SW Main.o
There are 39 section headers, starting at offset 0xd90:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000040 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000040 000000 00  WA  0   0  1
  [ 4] .rodata.str..LrKs_bytes PROGBITS        0000000000000000 000040 000005 01 AMS  0   0  1
  [ 5] .rodata.str..LrKq_bytes PROGBITS        0000000000000000 000045 000005 01 AMS  0   0  1
  [ 6] .rodata.str.cKA_str PROGBITS        0000000000000000 00004a 000006 01 AMS  0   0  1
  [ 7] .data..LsKw_closure PROGBITS        0000000000000000 000050 000028 00  WA  0   0  8
  [ 8] .rela.data..LsKw_closure RELA            0000000000000000 0007c8 000030 18   I 36   7  8
  [ 9] .data..LuKL_srt   PROGBITS        0000000000000000 000078 000020 00  WA  0   0  8
  [10] .rela.data..LuKL_srt RELA            0000000000000000 0007f8 000048 18   I 36   9  8
  [11] .text..LsKu_info  PROGBITS        0000000000000000 000098 000062 00  AX  0   0  8
  [12] .rela.text..LsKu_info RELA            0000000000000000 000840 000090 18   I 36  11  8
  [13] .data..LsKu_closure PROGBITS        0000000000000000 000100 000020 00  WA  0   0  8
  [14] .rela.data..LsKu_closure RELA            0000000000000000 0008d0 000018 18   I 36  13  8
  [15] .data..LuL2_srt   PROGBITS        0000000000000000 000120 000028 00  WA  0   0  8
  [16] .rela.data..LuL2_srt RELA            0000000000000000 0008e8 000060 18   I 36  15  8
  [17] .text.Main_main_info PROGBITS        0000000000000000 000148 000069 00  AX  0   0  8
  [18] .rela.text.Main_main_info RELA            0000000000000000 000948 0000a8 18   I 36  17  8
  [19] .data.Main_main_closure PROGBITS        0000000000000000 0001b8 000020 00  WA  0   0  8
  [20] .rela.data.Main_main_closure RELA            0000000000000000 0009f0 000018 18   I 36  19  8
  [21] .data..LuLj_srt   PROGBITS        0000000000000000 0001d8 000020 00  WA  0   0  8
  [22] .rela.data..LuLj_srt RELA            0000000000000000 000a08 000048 18   I 36  21  8
  [23] .text.ZCMain_main_info PROGBITS        0000000000000000 0001f8 000062 00  AX  0   0  8
  [24] .rela.text.ZCMain_main_info RELA            0000000000000000 000a50 000090 18   I 36  23  8
  [25] .data.ZCMain_main_closure PROGBITS        0000000000000000 000260 000020 00  WA  0   0  8
  [26] .rela.data.ZCMain_main_closure RELA            0000000000000000 000ae0 000018 18   I 36  25  8
  [27] .data..LrKr_closure PROGBITS        0000000000000000 000280 000010 00  WA  0   0  8
  [28] .rela.data..LrKr_closure RELA            0000000000000000 000af8 000030 18   I 36  27  8
  [29] .data..LrKt_closure PROGBITS        0000000000000000 000290 000010 00  WA  0   0  8
  [30] .rela.data..LrKt_closure RELA            0000000000000000 000b28 000030 18   I 36  29  8
  [31] .data.Main_zdtrModule_closure PROGBITS        0000000000000000 0002a0 000020 00  WA  0   0  8
  [32] .rela.data.Main_zdtrModule_closure RELA            0000000000000000 000b58 000048 18   I 36  31  8
  [33] .note.GNU-stack   PROGBITS        0000000000000000 0002c0 000000 00      0   0  1
  [34] .comment          PROGBITS        0000000000000000 0002c0 00000c 01  MS  0   0  1
  [35] .note.gnu.property NOTE            0000000000000000 0002d0 000030 00   A  0   0  8
  [36] .symtab           SYMTAB          0000000000000000 000300 0002e8 18     37  13  8
  [37] .strtab           STRTAB          0000000000000000 0005e8 0001dd 00      0   0  1
  [38] .shstrtab         STRTAB          0000000000000000 000ba0 0001ea 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)</code></pre>
<p>38 sections! If we ignore the 6 irrelevant sections it’s 32 relevant sections
compared to 6 relevant sections before.
Our actual <code>main</code> top-level function is hiding in the <code>.*Main_main.*</code>
sections. You will notice a lot of them. And on top of that there is whatever
<code>ghc</code> managed to “float out” of the function.</p>
<p>This is unoptimized code. If we throw in <code>-O2</code> we will get this:</p>
<pre><code>$ ghc -c Main.hs -fforce-recomp -fsplit-sections -O2

$ size Main.o
   text    data     bss     dec     hex filename
    306     304       0     610     262 Main.o

$ readelf -SW Main.o
There are 45 section headers, starting at offset 0x10d0:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000040 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000040 000000 00  WA  0   0  1
  [ 4] .rodata.str.Main_zdtrModule2_bytes PROGBITS        0000000000000000 000040 000005 01 AMS  0   0  1
  [ 5] .rodata.str.Main_zdtrModule4_bytes PROGBITS        0000000000000000 000045 000005 01 AMS  0   0  1
  [ 6] .rodata.str.Main_main5_bytes PROGBITS        0000000000000000 00004a 000006 01 AMS  0   0  1
  [ 7] .data.Main_main4_closure PROGBITS        0000000000000000 000050 000028 00  WA  0   0  8
  [ 8] .rela.data.Main_main4_closure RELA            0000000000000000 000a60 000030 18   I 42   7  8
  [ 9] .data..Lu1AB_srt  PROGBITS        0000000000000000 000078 000020 00  WA  0   0  8
  [10] .rela.data..Lu1AB_srt RELA            0000000000000000 000a90 000048 18   I 42   9  8
  [11] .text.Main_main3_info PROGBITS        0000000000000000 000098 000062 00  AX  0   0  8
  [12] .rela.text.Main_main3_info RELA            0000000000000000 000ad8 000090 18   I 42  11  8
  [13] .data.Main_main3_closure PROGBITS        0000000000000000 000100 000020 00  WA  0   0  8
  [14] .rela.data.Main_main3_closure RELA            0000000000000000 000b68 000018 18   I 42  13  8
  [15] .data.Main_main2_closure PROGBITS        0000000000000000 000120 000020 00  WA  0   0  8
  [16] .rela.data.Main_main2_closure RELA            0000000000000000 000b80 000048 18   I 42  15  8
  [17] .text.Main_main1_info PROGBITS        0000000000000000 000140 000032 00  AX  0   0  8
  [18] .rela.text.Main_main1_info RELA            0000000000000000 000bc8 000060 18   I 42  17  8
  [19] .data.Main_main1_closure PROGBITS        0000000000000000 000178 000028 00  WA  0   0  8
  [20] .rela.data.Main_main1_closure RELA            0000000000000000 000c28 000060 18   I 42  19  8
  [21] .text.Main_main_info PROGBITS        0000000000000000 0001a0 00001d 00  AX  0   0  8
  [22] .rela.text.Main_main_info RELA            0000000000000000 000c88 000030 18   I 42  21  8
  [23] .data.Main_main_closure PROGBITS        0000000000000000 0001c0 000010 00  WA  0   0  8
  [24] .rela.data.Main_main_closure RELA            0000000000000000 000cb8 000018 18   I 42  23  8
  [25] .text.Main_main6_info PROGBITS        0000000000000000 0001d0 000024 00  AX  0   0  8
  [26] .rela.text.Main_main6_info RELA            0000000000000000 000cd0 000030 18   I 42  25  8
  [27] .data.Main_main6_closure PROGBITS        0000000000000000 0001f8 000020 00  WA  0   0  8
  [28] .rela.data.Main_main6_closure RELA            0000000000000000 000d00 000048 18   I 42  27  8
  [29] .text.ZCMain_main_info PROGBITS        0000000000000000 000218 00001d 00  AX  0   0  8
  [30] .rela.text.ZCMain_main_info RELA            0000000000000000 000d48 000030 18   I 42  29  8
  [31] .data.ZCMain_main_closure PROGBITS        0000000000000000 000238 000010 00  WA  0   0  8
  [32] .rela.data.ZCMain_main_closure RELA            0000000000000000 000d78 000018 18   I 42  31  8
  [33] .data.Main_zdtrModule3_closure PROGBITS        0000000000000000 000248 000010 00  WA  0   0  8
  [34] .rela.data.Main_zdtrModule3_closure RELA            0000000000000000 000d90 000030 18   I 42  33  8
  [35] .data.Main_zdtrModule1_closure PROGBITS        0000000000000000 000258 000010 00  WA  0   0  8
  [36] .rela.data.Main_zdtrModule1_closure RELA            0000000000000000 000dc0 000030 18   I 42  35  8
  [37] .data.Main_zdtrModule_closure PROGBITS        0000000000000000 000268 000020 00  WA  0   0  8
  [38] .rela.data.Main_zdtrModule_closure RELA            0000000000000000 000df0 000048 18   I 42  37  8
  [39] .note.GNU-stack   PROGBITS        0000000000000000 000288 000000 00      0   0  1
  [40] .comment          PROGBITS        0000000000000000 000288 00000c 01  MS  0   0  1
  [41] .note.gnu.property NOTE            0000000000000000 000298 000030 00   A  0   0  8
  [42] .symtab           SYMTAB          0000000000000000 0002c8 000378 18     43   2  8
  [43] .strtab           STRTAB          0000000000000000 000640 00041a 00      0   0  1
  [44] .shstrtab         STRTAB          0000000000000000 000e38 000295 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)</code></pre>
<p>44 sections! A lot.</p>
<p>Back to our <code>pandoc</code> binary. It just calls into the <code>libHSpandoc.a</code> library
(<code>nixpkgs</code> uses static linking for <code>haskell</code> bits today). The linker has to
wade through all its sections to pick only the used things.</p>
<p><strong>Quiz question: how many sections does the <code>pandoc</code> library have?
100? 1000? 10000? A million? What is your guess?</strong></p>
<p>Let’s count! On my system <code>pandoc</code> library happens to hide at
<code>&lt;&lt;NIX&gt;&gt;-pandoc-3.7.0.2/lib/ghc-9.10.3/lib/x86_64-linux-ghc-9.10.3-2870/pandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs/libHSpandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs.a</code>.
I’ll just use that ugly path.</p>
<pre><code># dumping section count per individual object file in the archive:
$ readelf -h &lt;&lt;NIX&gt;&gt;-pandoc-3.7.0.2/lib/ghc-9.10.3/lib/x86_64-linux-ghc-9.10.3-2870/pandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs/libHSpandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs.a |
    grep 'Number of section'
  Number of section headers:         18
  ...
  Number of section headers:         26207
  Number of section headers:         16755
  ...
  Number of section headers:         8898
  ..
  Number of section headers:         1047
  Number of section headers:         1324
  ...
  Number of section headers:         687
  Number of section headers:         228

# summing up all section counts:
$ readelf -h &lt;&lt;NIX&gt;&gt;-pandoc-3.7.0.2/lib/ghc-9.10.3/lib/x86_64-linux-ghc-9.10.3-2870/pandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs/libHSpandoc-3.7.0.2-Af80LA3Iq30D5LRTMZUszs.a |
    grep 'Number of section' | awk '{ size += $5 } END { print size }'
494450</code></pre>
<p><code>494</code> <code>thousand</code> sections! Almost half a million. And that’s just
one (the largest) of many <code>pandoc</code> dependencies. That’s why <code>ld.gold</code> takes
a considerable amount of time just to sort through all these sections.</p>
<h2 id="performance-hog-clues">performance hog clues</h2>
<p><code>ld.bfd</code> has an even harder time getting through such a big list of sections.
But why exactly? The names
of the <code>_bfd_elf_gc_mark / _bfd_elf_gc_mark_reloc</code> functions in the profiles
hint that they track unreferenced sections.</p>
<p>To trigger section garbage collection
<code>ghc</code> <a href="https://github.com/ghc/ghc/blob/f9790ca81deb8b14ff2eabf701aecbcfd6501963/compiler/GHC/Linker/Static.hs#L241">uses <code>-Wl,--gc-sections</code></a>
linker option.
Note that <code>ghc</code> only enables garbage collection for <code>GNU ld</code> (I think
both <code>ld.gold</code> and <code>ld.bfd</code> count as GNU). <code>lld</code> and <code>mold</code>
both support <code>-Wl,--gc-sections</code> option as well.</p>
<p>If we look at the implementation
of <code>_bfd_elf_gc_mark</code> we see a suspicious
<a href="https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=91c77c211ef065a77883004eb696adacd92a00be;hb=815d9a14cbbb3b81843f7566222c87fb22e7255d#l14063">list traversal</a>
and some recursive descent:</p>
<pre class="c"><code>bool
_bfd_elf_gc_mark_reloc (struct bfd_link_info *info,
                        asection *sec,
                        elf_gc_mark_hook_fn gc_mark_hook,
                        struct elf_reloc_cookie *cookie)
{
  asection *rsec;
  bool start_stop = false;

  rsec = _bfd_elf_gc_mark_rsec (info, sec, gc_mark_hook, cookie, &amp;start_stop);
  while (rsec != NULL)
    {
      if (!rsec-&gt;gc_mark)
        {
          if (bfd_get_flavour (rsec-&gt;owner) != bfd_target_elf_flavour
              || (rsec-&gt;owner-&gt;flags &amp; DYNAMIC) != 0)
            rsec-&gt;gc_mark = 1;
          else if (!_bfd_elf_gc_mark (info, rsec, gc_mark_hook))
            return false;
        }
      if (!start_stop)
        break;
      rsec = bfd_get_next_section_by_name (rsec-&gt;owner, rsec);
    }
  return true;
}</code></pre>
<p>If we do such marking for each section it probably has quadratic complexity.
But maybe not. To get something simpler to explore I tried to craft a
trivial example with <code>1 million</code> sections:</p>
<pre><code>$ for (( i=0; i&lt;1000000; i++ )); do \
      printf &quot;int var_$i __attribute__ ((section (\&quot;.data.$i\&quot;))) = { $i };\n&quot;; \
  done &gt; main.c; \
  printf &quot;int main() {}&quot; &gt;&gt; main.c; \
  gcc -c main.c -o main.o; \
  echo &quot;bfd:&quot;;  time gcc main.o -o main -fuse-ld=bfd; \
  echo &quot;gold:&quot;; time gcc main.o -o main -fuse-ld=gold
bfd:

real    0m6,123s
user    0m5,384s
sys     0m0,701s
gold:

real    0m1,107s
user    0m0,844s
sys     0m0,242s</code></pre>
<p>This test generates the following boilerplate code:</p>
<pre class="c"><code>// $ head -n 5 main.c
int var_0 __attribute__ ((section (&quot;.data.0&quot;))) = { 0 };
int var_1 __attribute__ ((section (&quot;.data.1&quot;))) = { 1 };
int var_2 __attribute__ ((section (&quot;.data.2&quot;))) = { 2 };
int var_3 __attribute__ ((section (&quot;.data.3&quot;))) = { 3 };
int var_4 __attribute__ ((section (&quot;.data.4&quot;))) = { 4 };
// ...
// $ tail -n 5 main.c
int var_999996 __attribute__ ((section (&quot;.data.999996&quot;))) = { 999996 };
int var_999997 __attribute__ ((section (&quot;.data.999997&quot;))) = { 999997 };
int var_999998 __attribute__ ((section (&quot;.data.999998&quot;))) = { 999998 };
int var_999999 __attribute__ ((section (&quot;.data.999999&quot;))) = { 999999 };
int main() {}</code></pre>
<p>We are seeing a <code>6x</code> time difference between the linkers. Could it be our case?
Let’s check with <code>perf</code> whether we hit the same hot paths as in the <code>pandoc</code> case:</p>
<pre><code>$ perf record -g gcc main.o -o main -fuse-ld=bfd
$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; try1.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try1.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try1.svg" title="ld.bfd profile on synthetic 1M independent data sections" alt="try1.svg" /></a></p>
<p>The profile does not look too bad: no large <code>.*_gc_.*</code> bits seen anywhere.
Not exactly our problem then. Let’s add more references across
sections to see if we start seeing the symbol traversals:</p>
<pre><code>$ printf &quot;int var_0 __attribute__ ((section (\&quot;.data.0\&quot;))) = { $i };\n&quot; &gt; main.c; \
  for (( i=1; i&lt;1000000; i++ )); do \
      printf &quot;void * var_$i __attribute__ ((section (\&quot;.data.$i\&quot;))) = { &amp;var_$((i-1)) };\n&quot;; \
  done &gt;&gt; main.c; \
  printf &quot;int main() { return (long)var_99999; }&quot; &gt;&gt; main.c; \
  gcc -c main.c -o main.o; \
  echo &quot;bfd:&quot;;  time gcc main.o -o main -fuse-ld=bfd  -Wl,--gc-sections -Wl,--no-as-needed; \
  echo &quot;gold:&quot;; time gcc main.o -o main -fuse-ld=gold -Wl,--gc-sections -Wl,--no-as-needed
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report, with preprocessed source (by using -freport-bug).
See &lt;https://gcc.gnu.org/bugs/&gt; for instructions.</code></pre>
<p>Whoops, crashed <code>gcc</code>. Filed <a href="https://gcc.gnu.org/PR122198"><code>PR122198</code></a>.
It’s a stack overflow. Throwing more stack at the problem with <code>ulimit -s unlimited</code>:</p>
<pre><code>$ printf &quot;int var_0 __attribute__ ((section (\&quot;.data.0\&quot;))) = { $i };\n&quot; &gt; main.c; \
  for (( i=1; i&lt;1000000; i++ )); do \
      printf &quot;void * var_$i __attribute__ ((section (\&quot;.data.$i\&quot;))) = { &amp;var_$((i-1)) };\n&quot;; \
  done &gt;&gt; main.c; \
  printf &quot;int main() { return (long)var_99999; }&quot; &gt;&gt; main.c; \
  gcc -c main.c -o main.o; \
  echo &quot;bfd:&quot;;  time gcc main.o -o main -fuse-ld=bfd  -Wl,--gc-sections -Wl,--no-as-needed; \
  echo &quot;gold:&quot;; time gcc main.o -o main -fuse-ld=gold -Wl,--gc-sections -Wl,--no-as-needed
bfd:

real    0m5.296s
user    0m4.572s
sys     0m0.714s
gold:

real    0m1.172s
user    0m0.874s
sys     0m0.295s</code></pre>
<p>This test generates slightly different form:</p>
<pre class="c"><code>// $ head -n 5 main.c
int var_0 __attribute__ ((section (&quot;.data.0&quot;))) = { 1000000 };
void * var_1 __attribute__ ((section (&quot;.data.1&quot;))) = { &amp;var_0 };
void * var_2 __attribute__ ((section (&quot;.data.2&quot;))) = { &amp;var_1 };
void * var_3 __attribute__ ((section (&quot;.data.3&quot;))) = { &amp;var_2 };
void * var_4 __attribute__ ((section (&quot;.data.4&quot;))) = { &amp;var_3 };
// ...
// $ tail -n 5 main.c
void * var_999996 __attribute__ ((section (&quot;.data.999996&quot;))) = { &amp;var_999995 };
void * var_999997 __attribute__ ((section (&quot;.data.999997&quot;))) = { &amp;var_999996 };
void * var_999998 __attribute__ ((section (&quot;.data.999998&quot;))) = { &amp;var_999997 };
void * var_999999 __attribute__ ((section (&quot;.data.999999&quot;))) = { &amp;var_999998 };
int main() { return (long)var_99999; }</code></pre>
<p>The main difference compared to the previous attempt is that the sections are
actually used and thus can’t be removed by the section garbage collector. Is this
profile closer to the <code>pandoc</code> case? Let’s look at the trace:</p>
<pre><code>$ perf record -g gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections -Wl,--no-as-needed
$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; try2.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try2.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try2.svg" title="ld.bfd profile on 1M interdependent data sections" alt="try2.svg" /></a></p>
<p>It’s not too close to the <code>pandoc</code> case, but the huge vertical tower on the
left is a good sign that we at least start hitting the <code>gc</code> traversal
code. Now we need to increase its share of the profile somehow.
Perhaps more symbols per section would help? I tried to simulate code
references instead of data references to see what happens:</p>
<pre><code>$ printf &quot;int f_0() __attribute__ ((section (\&quot;.text.0\&quot;))); int f_0() { return 0; };\n&quot; &gt; main.c; for (( i=1; i&lt;20000; i++ )); do printf &quot;int f_$i() __attribute__ ((section (\&quot;.text.$i\&quot;))); int f_$i() { return f_$((i-1))(); };\n&quot;; done &gt;&gt; main.c; printf &quot;int main() { return f_19999(); }&quot; &gt;&gt; main.c; gcc -O0 -c main.c -o main.o; echo &quot;bfd:&quot;; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections -Wl,--no-as-needed; echo &quot;gold:&quot;; time gcc main.o -o main -fuse-ld=gold -Wl,--gc-sections -Wl,--no-as-needed
bfd:

real    0m5,627s
user    0m2,567s
sys     0m3,047s
gold:

real    0m0,053s
user    0m0,041s
sys     0m0,012s</code></pre>
<p>This test generates the following file:</p>
<pre class="c"><code>// $ head -n 5 main.c
int f_0() __attribute__ ((section (&quot;.text.0&quot;))); int f_0() { return 0; };
int f_1() __attribute__ ((section (&quot;.text.1&quot;))); int f_1() { return f_0(); };
int f_2() __attribute__ ((section (&quot;.text.2&quot;))); int f_2() { return f_1(); };
int f_3() __attribute__ ((section (&quot;.text.3&quot;))); int f_3() { return f_2(); };
int f_4() __attribute__ ((section (&quot;.text.4&quot;))); int f_4() { return f_3(); };
// ..
// $ tail -n 5 main.c
int f_19996() __attribute__ ((section (&quot;.text.19996&quot;))); int f_19996() { return f_19995(); };
int f_19997() __attribute__ ((section (&quot;.text.19997&quot;))); int f_19997() { return f_19996(); };
int f_19998() __attribute__ ((section (&quot;.text.19998&quot;))); int f_19998() { return f_19997(); };
int f_19999() __attribute__ ((section (&quot;.text.19999&quot;))); int f_19999() { return f_19998(); };
int main() { return f_19999(); }</code></pre>
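<p>The one-liner is dense. Here is a more readable equivalent of the same
generator (a sketch; it writes the identical <code>main.c</code>):</p>
<pre><code># readable version of the 20K-function generator one-liner:
printf 'int f_0() __attribute__ ((section (&quot;.text.0&quot;))); int f_0() { return 0; };\n' &gt; main.c
for (( i=1; i&lt;20000; i++ )); do
  printf 'int f_%s() __attribute__ ((section (&quot;.text.%s&quot;))); int f_%s() { return f_%s(); };\n' \
    &quot;$i&quot; &quot;$i&quot; &quot;$i&quot; &quot;$((i-1))&quot;
done &gt;&gt; main.c
printf 'int main() { return f_19999(); }' &gt;&gt; main.c</code></pre>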
<p>That time difference is more interesting! Note that <code>ld.gold</code> spends just
<code>50ms</code> on the input while <code>ld.bfd</code> does something for <code>5 seconds</code>!
Getting the profile:</p>
<pre><code>$ perf record -g gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections -Wl,--no-as-needed
$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; try3.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try3.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/try3.svg" title="ld.bfd trace on 20K dependent text sections" alt="try3.svg" /></a></p>
<p>At last! We managed to hit exactly <code>_bfd_elf_gc_mark ()</code>.
The final input object file is not too large:</p>
<pre><code>$ ls -lh main.o
-rw-r--r-- 1 slyfox users 5.7M Oct  8 20:58 main.o</code></pre>
<p>I filed the <a href="https://sourceware.org/PR33530"><code>PR33530</code></a> upstream bug report
hoping that the fix would not be too complicated and started writing this
blog post.</p>
<h2 id="testing-the-patch">testing the patch</h2>
<p>My plan was to figure out more details about the <code>binutils</code> <code>gc</code> in this post
and try to fix it myself. Alas, even before I got to it, H.J. had already prepared
<a href="https://sourceware.org/PR33530#c1">the fix</a>!</p>
<p>The synthetic test showed great results:</p>
<pre><code>$ printf &quot;int f_0() __attribute__ ((section (\&quot;.text.0\&quot;))); int f_0() { return 0; };\n&quot; &gt; main.c; for (( i=1; i&lt;20000; i++ )); do printf &quot;int f_$i() __attribute__ ((section (\&quot;.text.$i\&quot;))); int f_$i() { return f_$((i-1))(); };\n&quot;; done &gt;&gt; main.c; printf &quot;int main() { return f_19999(); }&quot; &gt;&gt; main.c; gcc -O0 -c main.c -o main.o; echo &quot;bfd:&quot;; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
bfd:

real    0m0,119s
user    0m0,080s
sys     0m0,038s</code></pre>
<p><code>120ms</code> compared to the previous <code>5s</code> is a <code>40x</code> speedup. It’s still
about twice as slow as the <code>50ms</code> of <code>ld.gold</code>, but such an absolute time is way
harder to notice. I tried to find a new degradation point by adding more
functions:</p>
<pre><code># 100K sections:
$ printf &quot;int f_0() __attribute__ ((section (\&quot;.text.0\&quot;))); int f_0() { return 0; };\n&quot; &gt; main.c; for (( i=1; i&lt;100000; i++ )); do printf &quot;int f_$i() __attribute__ ((section (\&quot;.text.$i\&quot;))); int f_$i() { return f_$((i-1))(); };\n&quot;; done &gt;&gt; main.c; printf &quot;int main() { return f_99999(); }&quot; &gt;&gt; main.c; gcc -O0 -c main.c -o main.o; echo &quot;bfd:&quot;; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
bfd:

real    0m0.628s
user    0m0.472s
sys     0m0.154s

# 1M sections:
$ printf &quot;int f_0() __attribute__ ((section (\&quot;.text.0\&quot;))); int f_0() { return 0; };\n&quot; &gt; main.c; for (( i=1; i&lt;1000000; i++ )); do printf &quot;int f_$i() __attribute__ ((section (\&quot;.text.$i\&quot;))); int f_$i() { return f_$((i-1))(); };\n&quot;; done &gt;&gt; main.c; printf &quot;int main() { return f_999999(); }&quot; &gt;&gt; main.c; gcc -O0 -c main.c -o main.o; echo &quot;bfd:&quot;; time gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
bfd:

real    0m8.697s
user    0m6.956s
sys     0m1.726s

$ size main.o
   text    data     bss     dec     hex filename
43000115              0       0 43000115        2902133 main.o</code></pre>
<p><code>8s</code> on a file with <code>1M</code> sections sounds quite fast! Let’s see where
<code>ld.bfd</code> spends its time now:</p>
<pre><code>$ perf record -g gcc main.o -o main -fuse-ld=bfd -Wl,--gc-sections
$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; fixed.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/fixed.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/fixed.svg" title="fixed ld.bfd profile on 1M dependent text sections" alt="fixed.svg" /></a></p>
<p>The profile looks a lot more balanced now. Yay! How about the real
<code>pandoc</code>?</p>
<pre><code>$ time nix build --no-link -f. pandoc --rebuild

real    0m17,013s
user    0m0,672s
sys     0m0,123s</code></pre>
<p><code>17s</code> is a bit slower than the <code>13s</code> of <code>ld.gold</code>, but not as bad as it used
to be. And its profile:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... '-optl-fuse-ld=bfd' -fforce-recomp

$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; fixed-pandoc.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/fixed-pandoc.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/fixed-pandoc.svg" title="fixed ld.bfd profile on pandoc" alt="fixed-pandoc.svg" /></a></p>
<h2 id="bonus-other-linkers">bonus: other linkers</h2>
<p>Both <code>ld.bfd</code> and <code>ld.gold</code> are quite fast at handling the synthetic tests now,
but both still spend a few seconds on <code>pandoc</code>. How about other linkers?
Let’s add <code>lld</code> and <code>mold</code> to the mix. I’ll measure
<code>ghc --make '-optl-fuse-ld=$linker' -fforce-recomp</code> execution time:</p>
<pre><code># without the fix
$ time &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=bfd -fforce-recomp
real    0m30.589s user    0m24.576s sys     0m6.447s

# with the fix
$ time &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=bfd -fforce-recomp
real    0m8.676s user    0m6.484s sys     0m2.584s

$ time &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=gold -fforce-recomp
real    0m5.929s user    0m5.543s sys     0m0.829s

$ time &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=lld -fforce-recomp
real    0m1.413s user    0m1.754s sys     0m1.509s

$ time &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=mold -fforce-recomp
real    0m1.209s user    0m0.424s sys     0m0.215s</code></pre>
<p>Note: we measure not just the linker time, but also the <code>ghc</code> code generation
time. The actual link time is only a part of it. The same values in a table:</p>
<table>
<thead>
<tr>
<th>linker</th>
<th><code>real</code> (sec)</th>
<th><code>user</code> (sec)</th>
<th><code>sys</code> (sec)</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>ld.bfd</code> (orig)</td>
<td>30.6</td>
<td>24.6</td>
<td>6.4</td>
</tr>
<tr>
<td><code>ld.bfd</code> (fixed)</td>
<td>8.7</td>
<td>6.5</td>
<td>2.6</td>
</tr>
<tr>
<td><code>ld.gold</code></td>
<td>5.9</td>
<td>5.5</td>
<td>0.8</td>
</tr>
<tr>
<td><code>lld</code></td>
<td>1.4</td>
<td>1.8</td>
<td>1.5</td>
</tr>
<tr>
<td><code>mold</code></td>
<td>1.2</td>
<td>0.4</td>
<td>0.2</td>
</tr>
</tbody>
</table>
<p><code>lld</code> and <code>mold</code> link times are impressive! They are <code>6x-7x</code> faster
than the fixed version of <code>ld.bfd</code>. Let’s look at their profiles to see where
they spend their time. <code>lld</code> goes first:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=lld -fforce-recomp

$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; lld-pandoc.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/lld-pandoc.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/lld-pandoc.svg" title="lld profile on pandoc" alt="lld-pandoc.svg" /></a></p>
<p>The only things that stick out in this profile are <code>malloc()</code> (about <code>20%</code>
of the time?) and <code>memmove()</code> calls (about <code>6%</code> of the time). Otherwise,
we see about equal time spent on reading (<code>linkerDriver</code> part on the right),
relocation processing (<code>RelocationScanner</code> in the middle) and writing
(<code>HashTableSection::writeTo</code> slightly to the left).</p>
<p>I wonder: if memory management were optimized in <code>lld</code>, would it be
as fast as <code>mold</code> on this input? And now the <code>mold</code> profile:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=mold -fforce-recomp

$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; mold-pandoc.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/mold-pandoc.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/mold-pandoc.svg" title="mold profile on pandoc" alt="mold-pandoc.svg" /></a></p>
<p>The profile is not very readable as we mainly see the <code>tbb</code> threading
runtime dispatching chunks of work. If I scroll around I see
things like <code>scan_abs_relocations</code>, <code>apply_reloc_alloc</code>, <code>resolve_symbols</code>,
<code>split_contents</code>, <code>scan_relocations</code>. It looks like half the time
is spent on managing the parallelism. Playing a bit with the
<code>mold</code> parameters I noticed that <code>-Wl,--threads=4</code> is the maximum thread
count at which I still get any speed improvement. Anything above that
clutters the CPU usage profile with <code>sched_yield</code> “busy” wait threads.</p>
<p>To get an idea of where the CPU time is actually spent it might be more informative
to look at a single-threaded profile using <code>-optl-Wl,--no-threads</code>:</p>
<pre><code>$ perf record -g &lt;&lt;NIX&gt;&gt;-ghc-9.10.3/bin/ghc --make ... pandoc ... -optl-fuse-ld=mold -optl-Wl,--no-threads -fforce-recomp

$ perf script &gt; out.perf &amp;&amp; perl ~/dev/git/FlameGraph/stackcollapse-perf.pl out.perf &gt; out.folded &amp;&amp; perl ~/dev/git/FlameGraph/flamegraph.pl out.folded &gt; mold-no-threads.svg</code></pre>
<p><a href="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/mold-pandoc-no-threads.svg"><img src="https://trofi.github.io/posts.data/340-profiling-binutils-linkers-in-nixpkgs/mold-pandoc-no-threads.svg" title="mold profile on pandoc single thread" alt="mold-pandoc-no-threads.svg" /></a></p>
<p>I’ll leave the interpretation of the picture to the reader.</p>
<h2 id="parting-words">parting words</h2>
<p><code>ld.gold</code> is about to be removed from <code>binutils</code>.</p>
<p>In <code>-split-sections</code> mode the <code>ghc</code> code generator produces a huge number of
sections per <code>ELF</code> file, which manages to strain both the <code>ld.bfd</code> and
<code>ld.gold</code> linkers. <code>ghc</code> should probably be fixed to produce one section
per strongly connected component of functions that refer to one another.
<code>libHSpandoc</code> consists of almost half a million <code>ELF</code> sections, with an
average section size of about 250 bytes.</p>
<p><code>ld.bfd</code>, while being slower than <code>ld.gold</code>, still has simple performance
bugs in obscure scenarios, just like today’s
<a href="https://sourceware.org/PR33530"><code>PR33530</code></a> example.</p>
<p><code>binutils</code> upstream was very fast to come up with a possible fix to test.</p>
<p>Even the fixed <code>ld.bfd</code> is still quite a bit slower than <code>ld.gold</code> on the
synthetic test with a huge number of sections. But at least it’s not exponentially
worse.</p>
<p><code>lld</code> and <code>mold</code> are still way faster than either <code>ld.bfd</code> or <code>ld.gold</code>:
about 6-7 times on the <code>pandoc</code> test. <code>ld.bfd</code> still has a lot of room for
improvement :)</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc-16 devirtualization changes</title>
    <link href="https://trofi.github.io/posts/339-gcc-16-devirtualization-changes.html" />
    <id>https://trofi.github.io/posts/339-gcc-16-devirtualization-changes.html</id>
    <published>2025-09-27T00:00:00Z</published>
    <updated>2025-09-27T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="a-quiz">A quiz</h2>
<p>Let’s start with a quiz: is this single-file program a valid <code>c++</code>
program? Will it always build and run?</p>
<pre class="cpp"><code>void d_impl(void); /* has no definition at all! */

struct B { virtual void f(void) {} };

struct D : public B { virtual void f(void) { d_impl(); } };

void do_f(struct B* o) { o-&gt;f(); }

int main(void) { return 0; }</code></pre>
<h2 id="running-gcc">Running <code>gcc</code></h2>
<p>It feels like this whole program is just an obfuscated version of
<code>int main(){}</code> and thus should Just Work, right? And <code>gcc-15</code> would agree:</p>
<pre><code>$ g++-15 a.cc -o a -fopt-info -O2
$ ./a</code></pre>
<p>But with <code>gcc-16</code> it fails to link:</p>
<pre><code>$ g++-16 a.cc -o a -fopt-info -O2
a.cc:11:30: optimized: speculatively devirtualizing call in void do_f(B*)/3 to virtual void B::f()/1
a.cc:11:30: optimized: speculatively devirtualizing call in void do_f(B*)/3 to virtual void D::f()/2
a.cc:11:30: optimized: devirtualized call in void do_f(B*)/3 to 2 targets
a.cc:11:30: optimized:  Inlined virtual void B::f()/10 into void do_f(B*)/3 which now has time 12.400000 and size 11, net change of -2.
a.cc:11:30: optimized:  Inlined virtual void D::f()/11 into void do_f(B*)/3 which now has time 12.160000 and size 10, net change of -1.

ld: /tmp/nix-shell.QW52Fh/ccNR2yWI.o: in function `do_f(B*)':
a.cc:(.text+0x29): undefined reference to `d_impl()'
ld: /tmp/nix-shell.QW52Fh/ccNR2yWI.o: in function `D::f()':
a.cc:(.text._ZN1D1fEv[_ZN1D1fEv]+0x1): undefined reference to `d_impl()'
collect2: error: ld returned 1 exit status</code></pre>
<p>Note: it fails to find an implementation of the <code>void d_impl(void)</code> function.</p>
<h2 id="devirtualization-mechanics">devirtualization mechanics</h2>
<p>Why did <code>gcc</code> not notice the missing reference before?
<code>-fopt-info</code> gives us a hint: <code>gcc</code> “devirtualized” the virtual call
in <code>void do_f(struct B* o) { o-&gt;f(); }</code> into non-virtual calls and gained
extra references in the code.
<code>-fdump-tree-all</code> can show us the result of these transformations:</p>
<pre><code>$ g++ a.cc -o a -fopt-info -O2 -fdump-tree-all
...

# I removed a bit of unrelated detail manually
$ cat a.cc.273t.optimized

void B::f (struct B * const this) { return; }

void D::f (struct D * const this) { d_impl (); }

void do_f (struct B * o)
{
  int (*) () * _1;
  int (*) () _2;
  void * PROF_6;
  void * PROF_8;

  _1 = o_4(D)-&gt;_vptr.B;
  _2 = *_1;
  PROF_6 = [obj_type_ref] OBJ_TYPE_REF(_2;(struct B)o_4(D)-&gt;0B);
  if (PROF_6 == D::f) {
    d_impl (); [tail call]
    return;
  }

  PROF_8 = [obj_type_ref] OBJ_TYPE_REF(_2;(struct B)o_4(D)-&gt;0B);
  if (PROF_8 == B::f)
    return;

  OBJ_TYPE_REF(_2;(struct B)o_4(D)-&gt;0B) (o_4(D)); [tail call]
}

int main () { return 0; }</code></pre>
<p>Here <code>gcc-16</code> expanded the unknown <code>o-&gt;f()</code> call in
<code>void do_f(struct B* o) { o-&gt;f(); }</code> into calls to a few known
implementations, <code>o-&gt;B::f()</code> and <code>o-&gt;D::f()</code>, selected by
comparing the function address from the vtable. This
allowed <code>gcc</code> to inline <code>B::f()</code> and <code>D::f()</code>. Pseudocode of the result:</p>
<pre class="cpp"><code>// before
void do_f(struct B* o) { o-&gt;f(); }

// after
void do_f(struct B* o) {

  if (o-&gt;f == D::f) {
    // inlined D::f()
    d_impl(); // our new reference!
    return;
  }

  if (o-&gt;f == B::f) {
    // inlined B::f()
    return;
  }

  // other types
  o-&gt;f();
}</code></pre>
<p><code>gcc-15</code> did not do this kind of transformation. It’s a recent
change added in the <a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9ee937b2f92a930eb5407260a56e5fe0fa137e85">commit <code>Add --param max-devirt-targets</code></a>.
It extended the existing devirtualization optimization to consider not
just one possible devirtualization target (as before the patch), but at most
<code>3</code>.</p>
<h2 id="fixes-and-workarounds">Fixes and workarounds</h2>
<p>Now we can even work around the problem in the original example and build it:</p>
<pre><code>$ g++ a.cc -o a -fopt-info -O2 --param=max-devirt-targets=1
$ ./a</code></pre>
<p>Is the above a completely hypothetical scenario? Why would you have such
code lying around? Well, I initially noticed it as a
<a href="https://gitlab.kitware.com/cmake/cmake/-/issues/27256"><code>cmake</code> build failure</a>:
the <code>./bootstrap</code> script failed to build the initial <code>cmake</code> on
<code>gcc-16</code>. The <code>./bootstrap</code> code uses only a subset of the <code>cmake</code> source code,
but it had a few <code>#include</code>s that refer to code that does not
get compiled/linked in <code>./bootstrap</code>. The devirtualization change exposed
it. The <a href="https://gitlab.kitware.com/cmake/cmake/-/merge_requests/11243/diffs?commit_id=ea04e19daf7010781d0df980b9683a642093e381">fix</a>
was to <code>#ifdef</code> out the code that has no chance to execute in <code>./bootstrap</code>.
Translated back to our example, the fix looks like this:</p>
<pre class="cpp"><code>void d_impl(void); /* has no definition at all! */

struct B { virtual void f(void) {} };

#if 0
struct D : public B { virtual void f(void) { d_impl(); } };
#endif

void do_f(struct B* o) { o-&gt;f(); }

int main(void) { return 0; }</code></pre>
<pre><code>$ g++ a.cc -o a -fopt-info -O2
a.cc:9:30: optimized: speculatively devirtualizing call in void do_f(B*)/2 to virtual void B::f()/1
a.cc:9:30: optimized:  Inlined virtual void B::f()/6 into void do_f(B*)/2 which now has time 7.200000 and size 9, net change of -2.

$ ./a</code></pre>
<p>All good now!</p>
<h2 id="parting-words">Parting words</h2>
<p>The initial example was not quite correct and caused link failures once
devirtualization kicked in. Including headers for code that is never linked in
does not always work.</p>
<p>Devirtualization does sometimes bloat the code a bit with references that
have no chance to execute in real programs. Profile-guided optimization
helps a lot to avoid generating completely dead code by getting better
estimates of the observed behavior.</p>
<p><code>cmake</code> is <a href="https://gitlab.kitware.com/cmake/cmake/-/merge_requests/11243/diffs?commit_id=ea04e19daf7010781d0df980b9683a642093e381">fixed</a>
and can now be built with <code>gcc-16</code>!</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>nix-build in tmpfs</title>
    <link href="https://trofi.github.io/posts/338-nix-build-in-tmpfs.html" />
    <id>https://trofi.github.io/posts/338-nix-build-in-tmpfs.html</id>
    <published>2025-08-22T00:00:00Z</published>
    <updated>2025-08-22T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>I build a lot of <code>nix</code> packages locally. Until the <code>nix-2.30</code> release the
<code>nix-build</code> command ran builds in the <code>/tmp</code> directory by default.
As <code>/tmp</code> is not a <code>tmpfs</code> by default I used to enable
<code>boot.tmp.useTmpfs = true;</code> to force all of <code>/tmp</code> into <code>tmpfs</code> and get
slightly faster builds.</p>
<p>In <code>nix-2.30</code> <code>nix</code> <a href="https://discourse.nixos.org/t/nix-2-30-0-released/66449">switched</a>
its default build directory to <code>/nix/var/nix/builds</code>:</p>
<pre><code>... `build-dir` no longer defaults to `$TMPDIR` ...</code></pre>
<p>This made my builds slow again. It’s especially noticeable when a few
huge tarballs start unpacking on disk in parallel. Here is my new
workaround to get that directory to <code>tmpfs</code> as well:</p>
<pre class="nix"><code># cat /etc/nixos/tmpfs.nix
{ lib, ... }:
{
  systemd.mounts = [{
    wantedBy = [ &quot;nix-daemon.service&quot; ];
    what = &quot;tmpfs&quot;;
    where = &quot;/nix/var/nix/builds&quot;;
    type = &quot;tmpfs&quot;;
    mountConfig.Options = lib.concatStringsSep &quot;,&quot; [
      &quot;mode=0755&quot;
      &quot;strictatime&quot;
      &quot;rw&quot;
      &quot;nosuid&quot;
      &quot;nodev&quot;
      &quot;size=100G&quot; # WARNING: you might want to change this value
    ];
  }];
}</code></pre>
<p>It creates a <code>systemd</code> <code>mount</code> unit, makes it a prerequisite of
<code>nix-daemon.service</code>, and mounts the <code>tmpfs</code> just before <code>nix-daemon</code> starts.</p>
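<p>As an aside: <code>systemd</code> derives the mount unit name from the mount point by
escaping the path. For this simple path that amounts to dropping the leading
slash and turning <code>/</code> into <code>-</code> (real <code>systemd</code> escaping
handles more special characters). A simplified sketch of the mapping, with the
checks I would run on the live system left as comments:</p>
<pre><code># simplified name escaping: drop leading '/', replace '/' with '-'
path=&quot;/nix/var/nix/builds&quot;
unit=&quot;$(printf '%s' &quot;${path#/}&quot; | tr '/' '-').mount&quot;
echo &quot;$unit&quot; # nix-var-nix-builds.mount

# on the running system:
#   systemctl status nix-var-nix-builds.mount
#   findmnt -t tmpfs /nix/var/nix/builds</code></pre>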
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>four years on NixOS</title>
    <link href="http://trofi.github.io/posts/337-four-years-on-nixos.html" />
    <id>http://trofi.github.io/posts/337-four-years-on-nixos.html</id>
    <published>2025-08-20T00:00:00Z</published>
    <updated>2025-08-20T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>It’s another yearly instance of my <code>NixOS</code> journey
(<a href="https://trofi.github.io/posts/316-three-years-on-nixos.html">2024 instance</a>). I meant to
write it around <code>25.05</code> but completely forgot!</p>
<h2 id="system-maintenance">system maintenance</h2>
<p>As usual I don’t remember what I did to my system over the past year,
so I’m looking at the <code>git log</code> of <code>/etc/nixos</code> as it contains all the changes:</p>
<ul>
<li>follow <code>hardware.opengl</code> -&gt; <code>hardware.graphics</code> rename</li>
<li>follow <code>hardware.opengl.driSupport{,32Bit}</code> removal</li>
<li>follow rename of <code>gnome.adwaita-icon-theme</code></li>
<li>follow <code>hardware.pulseaudio</code> -&gt; <code>services.pulseaudio</code> rename</li>
<li>follow <code>okular</code> -&gt; <code>kdePackages.okular</code> rename</li>
<li>drop deprecated <code>i18n.supportedLocales</code></li>
<li>follow <code>networking.wireless.iwd.settings.General</code> -&gt; <code>networking.wireless.iwd.settings.DriverQuirks</code> rename</li>
<li>follow <code>services.postfix.config</code> -&gt; <code>services.postfix.settings.main</code> rename</li>
<li>fix <code>services.postfix.settings.main.mynetworks</code> type (changed from <code>string</code> to array)</li>
</ul>
<p>That is quite a few more renames than last year. I think the locale changes
actually broke my locales at runtime and I had to figure out what to
change to get them back.</p>
<p>I did not have major package build failures that required any local
changes.</p>
<p>This time I had the following non-trivial problems in upstream packages:</p>
<ul>
<li><code>duperemove</code> would hang up on <code>NoCOW</code> files: <a href="https://github.com/markfasheh/duperemove/pull/376" class="uri">https://github.com/markfasheh/duperemove/pull/376</a>.
I had to bisect the regression and fix it upstream. It was easy as I
am somewhat familiar with <code>duperemove</code> implementation.</li>
<li><code>nix</code> started stripping too much in <code>gcc-14</code> warning logs: <a href="https://github.com/NixOS/nix/pull/13109" class="uri">https://github.com/NixOS/nix/pull/13109</a>.
It came up after <code>nixpkgs</code> switched to <code>gcc-14</code>. Took some time to figure
out what is so special about <code>gcc</code> warnings. Ended up being quite easy.</li>
<li><code>perf</code> started to hang on my system: <a href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c21986d33d6beb269a35b38dcb8adaa5bd228527" class="uri">https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c21986d33d6beb269a35b38dcb8adaa5bd228527</a>.
It came up after I ran a system-wide profiler to see what <code>wine</code> does to
use 100% CPU on an ancient game. The fix was trivial once I looked at it.
I don’t remember what changed to expose the hangs. Maybe I just never
did it before? Ended up being easy as well.</li>
<li><code>mpv</code> started failing GPU rendering due to <code>libplacebo</code> / <code>shaderc</code>
incompatibility: <a href="https://code.videolan.org/videolan/libplacebo/-/issues/335" class="uri">https://code.videolan.org/videolan/libplacebo/-/issues/335</a>.
Most arcane bug of all: had to bisect the whole system to find the
package first, then had to bisect down to the commit. Upstream eventually
fixed it. But if <code>nixpkgs</code> had up-to-date <code>shaderc</code> we would not stumble
on this bug. The only non-trivial bug from the whole list.</li>
</ul>
<h2 id="community-support">Community support</h2>
<p>I still feel that <code>NixOS</code> community is a welcoming place for newcomers,
experimenters and people who do grunt maintenance work. <code>NixOS</code> community
now had elected their first Steering Committee who can help resolving
high-level conflicts.</p>
<p>Some of the amusing things I did over the past year:</p>
<ul>
<li><a href="https://trofi.github.io/posts/330-another-nix-language-nondeterminism-example.html"><code>nix</code> language non-determinism in <code>sort</code> built-in</a></li>
<li><code>stdenv</code> fix to handle root directories that start with dash (<code>-</code>): <a href="https://github.com/NixOS/nixpkgs/pull/317106" class="uri">https://github.com/NixOS/nixpkgs/pull/317106</a>.
<code>diffoscope-269</code> release was a great stress test for <code>nixpkgs</code> <code>bash</code> code :)</li>
<li>Found and fixed ~50 more eval failures in <code>nixpkgs</code> found by <a href="https://trofi.github.io/posts/309-listing-all-nixpkgs-packages.html">the hack</a>.
This hack was also the trigger that exposed <code>sort</code> non-determinism above.</li>
<li>Fixed <code>nixpkgs</code> <code>isMachO</code> helper: <a href="https://github.com/NixOS/nixpkgs/pull/432097" class="uri">https://github.com/NixOS/nixpkgs/pull/432097</a>.
Reading 4 bytes from the file in pure <code>bash</code>. How hard could it be?</li>
</ul>
<p>Just like last year I managed to get about 800 commits into <code>nixpkgs</code>
this year.</p>
<p>I stopped reading any Matrix channels completely and only skim through
<a href="https://discourse.nixos.org/">discourse</a> and read <code>github</code> notifications.</p>
<h2 id="home-server-experience">Home server experience</h2>
<p>I did not have to adapt anything for the past year. Things still Just Work.</p>
<h2 id="local-experiments">Local experiments</h2>
<p>I switched to <a href="https://trofi.github.io/posts/331-trying-out-helix-editor.html"><code>helix</code> editor</a>
and to <a href="https://trofi.github.io/posts/333-to-chromium.html"><code>chromium</code> browser</a>. Both were quite
smooth transitions.</p>
<p>I continued <code>gcc</code> testing. This year it was <code>gcc-15</code> branch. <code>nixpkgs</code>
still manages to serve as a reasonable vehicle to
<a href="https://trofi.github.io/posts/332-gcc-15-bugs-pile-2.html">find bugs</a>. Just like last year I
found about 50 compiler bugs. Did not manage to fix any myself.</p>
<h2 id="parting-words">Parting words</h2>
<p><code>NixOS</code> still works for me.</p>
<p>Give <code>NixOS</code> a go if you did not yet :)</p>]]></summary>
</entry>
<entry>
    <title>nix and guix for Gentoo in 2025</title>
    <link href="https://trofi.github.io/posts/336-nix-and-guix-for-gentoo-in-2025.html" />
    <id>https://trofi.github.io/posts/336-nix-and-guix-for-gentoo-in-2025.html</id>
    <published>2025-08-19T00:00:00Z</published>
    <updated>2025-08-19T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Two years have passed since the <a href="https://trofi.github.io/posts/287-nix-and-guix-for-gentoo-in-2023.html">last issue</a>
of <a href="https://github.com/trofi/nix-guix-gentoo"><code>::nix-guix</code></a> overlay updates.
The overlay still ships latest <code>nix-2.30.2</code> and <code>guix-1.4.0</code>
packages. <strong>One notable addition is <code>lix-2.93.3</code>!</strong></p>
<p>Our list of contributors over past 2 years is:</p>
<pre><code>dependabot[bot]
G-Src
Jiajie Chen
Kris Scott
Sergei Trofimovich
Vincent de Phily</code></pre>
<p>There are no major user-visible changes. But a few things to note are:</p>
<ul>
<li><code>sys-apps/lix</code> was added to the family of <code>nix</code>-like package managers</li>
<li><code>sys-apps/guix</code> is not masked any more as <code>guile-3</code> was unmasked in
<code>::gentoo</code>!</li>
<li>old pre-<code>meson</code> versions of <code>nix</code> were dropped</li>
<li><code>sys-apps/nix</code> does not enable fallback if user namespaces fail to
initialize. This should guard users from accidentally building
non-hermetic packages (they are very likely to break on Gentoo for
various reasons)</li>
<li>added <code>USE=allocate-build-users</code> to <code>sys-apps/nix</code> to use fully dynamic
build users (instead of requiring a static set of <code>acct-user/</code> users)</li>
<li><code>sys-apps/guix</code> <code>ebuild</code> was ported to <code>guile-single.eclass</code></li>
</ul>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>sort by key in coreutils</title>
    <link href="https://trofi.github.io/posts/335-sort-by-key-in-coreutils.html" />
    <id>https://trofi.github.io/posts/335-sort-by-key-in-coreutils.html</id>
    <published>2025-05-08T00:00:00Z</published>
    <updated>2025-05-08T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>This post is about the <code>sort</code> tool from <code>GNU coreutils</code>. Until today I
foolishly thought that to sort a file by the second (and only the second)
column you just need to use the <code>sort -k2</code> option.</p>
<p>Indeed, that does seem to work for a simple case:</p>
<pre><code>$ printf &quot;1 2\n2 1\n&quot;
1 2
2 1</code></pre>
<pre><code>$ printf &quot;1 2\n2 1\n&quot; | sort -k2
2 1
1 2</code></pre>
<p>But today I attempted a slightly more complicated sort by sorting commit
history:</p>
<pre><code>abcd foo: commit z
bcde bar: commit a
cdef foo: commit y
defg bar: commit b</code></pre>
<p>I wanted to sort these by the <code>foo:</code> / <code>bar:</code> component while preserving
the order of commits within each component. I wanted this outcome:</p>
<pre><code>bcde bar: commit a
defg bar: commit b
abcd foo: commit z
cdef foo: commit y</code></pre>
<p>How do you achieve that? My naive attempt was to use <code>sort -k2 --stable</code>:</p>
<pre><code>$ sort -k2 --stable &lt;l
bcde bar: commit a
defg bar: commit b
cdef foo: commit y
abcd foo: commit z</code></pre>
<p>Note how <code>foo:</code> commits were unexpectedly reordered.</p>
<p>Quick quiz: why did the extra reordering happen? How would you fix it?</p>
<h2 id="the-answer">The answer</h2>
<p><code>sort --help</code> has the answer: it describes what <code>-k2</code> key selection
actually does. But the <code>--debug</code> option is even better at illustrating what
is being compared. Let’s use that:</p>
<pre><code>$ LANG=C sort -k2 --stable --debug &lt;l
sort: text ordering performed using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying 'b'
bcde bar: commit a
    ______________
defg bar: commit b
    ______________
cdef foo: commit y
    ______________
abcd foo: commit z
    ______________</code></pre>
<p>The <code>______________</code> underscores show the actual compared key: it’s not
just <code>foo:</code> or <code>bar:</code> but the whole rest of the line, starting at the
whitespace right before <code>foo:</code> and <code>bar:</code>. The fix is to tweak the selector:</p>
<pre><code>$ LANG=C sort -k2,2 --stable --debug &lt;l
sort: text ordering performed using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying 'b'
bcde bar: commit a
    _____
defg bar: commit b
    _____
abcd foo: commit z
    _____
cdef foo: commit y
    _____</code></pre>
<p>or with <code>-b</code> if leading spaces look confusing:</p>
<pre><code>$ LANG=C sort -k2,2 --stable --debug -b &lt;l
sort: text ordering performed using simple byte comparison
bcde bar: commit a
     ____
defg bar: commit b
     ____
abcd foo: commit z
     ____
cdef foo: commit y
     ____</code></pre>
<p>This way the sorting works as expected:</p>
<pre><code>$ LANG=C sort -k2,2 --stable -b &lt;l
bcde bar: commit a
defg bar: commit b
abcd foo: commit z
cdef foo: commit y</code></pre>
<h2 id="parting-words">parting words</h2>
<p><code>sort -k</code> is tricky: it’s not a field number but a field range. The <code>--debug</code>
option is great at showing the sorting key actually used (or keys, if <code>--stable</code>
is not used).</p>
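<p>As a self-contained illustration of the range semantics (the two input lines below are made up for this sketch): <code>-k2</code> compares everything from field 2 to the end of the line, while <code>-k2,2</code> compares field 2 alone.</p>

```shell
# -k2 selects a field *range*: fields 2 through the end of the line,
# so the trailing "z" / "a" still take part in the comparison.
printf 'x b z\ny b a\n' | LANG=C sort -k2 --stable
# prints:
#   y b a
#   x b z

# -k2,2 limits the key to field 2 only; the keys tie and --stable
# keeps the tied lines in input order.
printf 'x b z\ny b a\n' | LANG=C sort -k2,2 --stable
# prints:
#   x b z
#   y b a
```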
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Zero Hydra Failures towards 25.05 NixOS release</title>
    <link href="https://trofi.github.io/posts/334-Zero-Hydra-Failures-towards-25.05-NixOS-release.html" />
    <id>https://trofi.github.io/posts/334-Zero-Hydra-Failures-towards-25.05-NixOS-release.html</id>
    <published>2025-05-01T00:00:00Z</published>
    <updated>2025-05-01T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>It’s May 1 and that means <code>NixOS-25.05</code> is almost
<a href="https://github.com/NixOS/nixpkgs/issues/390768">there</a>. Today the
release entered <a href="https://github.com/NixOS/nixpkgs/issues/390768"><code>ZHF</code> phase</a>
(<code>Zero Hydra Failures</code>) where the main focus
is to squash as many build failures as possible before the release.</p>
<p>It’s a good time to fix easy build failures or remove long broken
packages. <a href="https://github.com/NixOS/nixpkgs/issues/390768" class="uri">https://github.com/NixOS/nixpkgs/issues/390768</a> contains
detailed step-by-step instructions for identifying interesting packages.</p>
<h2 id="an-example-package-fix">an example package fix</h2>
<p>I usually try to fix at least one package during <code>ZHF</code>. This time I
picked <a href="https://hydra.nixos.org/build/294989234"><code>hheretic</code></a>. The
failure does not look too cryptic:</p>
<pre><code>...
checking for OpenGL support... no
configure: error: *** OpenGL not found!</code></pre>
<p>To get a bit more detail I usually use <code>nix develop</code>:</p>
<pre><code>$ nix develop -f. hheretic
$$ genericBuild
checking for OpenGL support... no
configure: error: *** OpenGL not found!
...
Running phase: buildPhase
no Makefile or custom buildPhase, doing nothing
...</code></pre>
<p>Here I ran <code>genericBuild</code> to start a build process similar to what a
<code>nix build -f. hheretic</code> would do.
I got the expected error (and a bit of extra output). Now I can peek at
<code>config.log</code> to check why <code>OpenGL</code> was not detected:</p>
<pre><code>$ cat config.log
...
configure:5413: checking for OpenGL support
configure:5429: gcc -o conftest  -Wall -O2 -ffast-math -fomit-frame-pointer   conftest.c -lm  -Lno -lGL -lGLU &gt;&amp;5
conftest.c:30:10: fatal error: GL/gl.h: No such file or directory
   30 | #include &lt;GL/gl.h&gt;
      |          ^~~~~~~~~
compilation terminated.</code></pre>
<p>The compiler does not see the <code>GL/gl.h</code> header: a missing dependency. The
first thing I tried was this patch:</p>
<pre class="diff"><code>--- a/pkgs/by-name/hh/hheretic/package.nix
+++ b/pkgs/by-name/hh/hheretic/package.nix
@@ -4,6 +4,8 @@
   fetchFromGitHub,
   SDL,
   SDL_mixer,
+  libGL,
+  libGLU,
   autoreconfHook,
   gitUpdater,
 }:
@@ -27,6 +29,8 @@ stdenv.mkDerivation (finalAttrs: {
   buildInputs = [
     SDL
     SDL_mixer
+    libGL
+    libGLU
   ];

   strictDeps = true;</code></pre>
<p>Running <code>nix build -f. hheretic</code> against it makes the package build
successfully. The change is proposed as a
<a href="https://github.com/NixOS/nixpkgs/pull/403458"><code>PR#403458</code></a> now.
As a bonus let’s figure out when the package broke. In the
<a href="https://hydra.nixos.org/job/nixos/trunk-combined/nixpkgs.hheretic.x86_64-linux">history tab</a>
we can see that:</p>
<ul>
<li><a href="https://hydra.nixos.org/build/292311010" class="uri">https://hydra.nixos.org/build/292311010</a> was the last successful build</li>
<li><a href="https://hydra.nixos.org/build/293013734" class="uri">https://hydra.nixos.org/build/293013734</a> was the first failing build</li>
</ul>
<p>Both links have an <code>Inputs</code> tab from which we can extract the <code>nixpkgs</code> commits that
correspond to each build. That is enough for bisection:</p>
<pre><code>$ git clone https://github.com/NixOS/nixpkgs
$ cd nixpkgs/
$ git bisect start 81b934af6399c868c693a945415bd59771f41718 316f79657ec153b51bee287fb1fb016b104af9ef
    Bisecting: 2949 revisions left to test after this (roughly 12 steps)
    [8490862820028f5c371ac0a7fde471990ff6ad80] evcc: 0.200.9 -&gt; 0.201.0 (#390530)
$ git bisect run nix build -f. hheretic
running 'nix' 'build' '-f.' 'hheretic'
Bisecting: 1476 revisions left to test after this (roughly 11 steps)
...
Bisecting: 0 revisions left to test after this (roughly 1 step)
[e24f567a68111784e81cdda85e3784dd977f2ef8] Merge master into staging-next
running 'nix' 'build' '-f.' 'hheretic'
e47403cf2a2c76ae218bbf519c538b0ed419fa5f is the first bad commit
commit e47403cf2a2c76ae218bbf519c538b0ed419fa5f
Date:   Tue Mar 11 09:41:21 2025 +0100

    SDL: point alias to SDL_compat

 pkgs/top-level/all-packages.nix | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
bisect found first bad commit</code></pre>
<p>Looking at <a href="https://github.com/NixOS/nixpkgs/commit/e47403cf2a2c76ae218bbf519c538b0ed419fa5f" class="uri">https://github.com/NixOS/nixpkgs/commit/e47403cf2a2c76ae218bbf519c538b0ed419fa5f</a>
the <code>GitHub</code> UI says it corresponds to
<a href="https://github.com/NixOS/nixpkgs/pull/389106"><code>PR#389106</code></a>.
I added
<a href="https://github.com/NixOS/nixpkgs/pull/389106#issuecomment-2845845704">a comment</a>
there to get the attention of the relevant authors.</p>
<h2 id="parting-words">parting words</h2>
<p>The <code>ZHF</code> event is a good way to contribute to <code>nixpkgs</code>. If you never did
but were waiting for an occasion, this is a good one to try!</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>To chromium</title>
    <link href="https://trofi.github.io/posts/333-to-chromium.html" />
    <id>https://trofi.github.io/posts/333-to-chromium.html</id>
    <published>2025-04-20T00:00:00Z</published>
    <updated>2025-04-20T00:00:00Z</updated>
<summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>I switched from <code>firefox</code> to <code>chromium</code> as a primary web browser on my
desktop.</p>
<h2 id="on-firefox">On <code>firefox</code></h2>
<p>I had been a happy <code>firefox</code> user since the <code>1.5</code> release. The internet says it was
released in 2005, which made it a 20-year run for me. The web changed so
much since then. Adobe Flash went away and web 2.0 <code>javascript</code>-heavy
applications took its place. At some point I had to start using content
filtering extensions to be able to browse the web.</p>
<p><code>firefox</code> was able to keep up with the times most of the time. It felt
like at some point UI became too sluggish. Subjectively <code>quantum</code> 2017
release made it snappy again.</p>
<p>Fast forward to 2025: <code>firefox</code> mostly meets my needs, but there are a
few performance warts I don’t know how to deal with:</p>
<ul>
<li>Some web-based instant messengers are very slow in <code>firefox</code>: when
I switch to their tab it freezes the whole of <code>firefox</code> for a few
seconds (and it happens every time I switch to the tab, not just the
first time).</li>
<li>Branch selection (drop-down menu) at pull request creation time on
<code>github</code> is visibly slow on repositories with many branches (<code>200+</code>
in <code>nixpkgs</code>).</li>
<li><code>firefox</code> startup time on mostly empty user profiles on <code>HDDs</code> is very
slow: tens of seconds.</li>
</ul>
<p>Over the past few years I have encountered a few widespread bugs in <code>firefox</code>:</p>
<ul>
<li><code>100%</code> CPU usage on <code>HTTP3</code>: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1749914" class="uri">https://bugzilla.mozilla.org/show_bug.cgi?id=1749914</a></li>
<li><code>tab crashes due to LLVM bug</code>: <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1741454" class="uri">https://bugzilla.mozilla.org/show_bug.cgi?id=1741454</a></li>
</ul>
<h2 id="on-chrome">On <code>chrome</code></h2>
<p>I had already used <code>chrome</code> at work for about 10 years. And 3 years ago I
started using <code>chrome</code> on a <code>chromebook</code> laptop for some personal
things. But for a personal desktop my strong preference is not to
use proprietary software.</p>
<p>With the recent shift to AI and advertising at Mozilla I wondered what
alternatives to <code>firefox</code> there are and whether I should give <code>chromium</code> a
proper try.</p>
<h2 id="on-chromium">On <code>chromium</code></h2>
<p>After about 2 months of using <code>chromium</code> I can say that it is very
pleasant to use. Subjectively fonts look a bit better in <code>chromium</code> and
most performance hiccups I encountered in <code>firefox</code> disappeared (but a
new one appeared, mentioned below).</p>
<p>Helper pages like <code>chrome://about</code>, <code>chrome://flags</code> and <code>chrome://gpu</code>
are a reasonable substitute for <code>firefox</code> <code>about:config</code>.</p>
<p>I also discovered a few bugs/warts/known-issues as well:</p>
<ul>
<li><p><code>wayland</code> backend is not enabled by default and needs either a flag
like <code>--enable-features=UseOzonePlatform --ozone-platform=wayland</code> or
an option selected at
<code>chrome://flags &gt; Preferred Ozone platform &gt; Wayland</code>.</p>
<p>While it’s a one-off setup it feels like <code>wayland</code> might not be the
primary target for <code>linux</code> desktops.</p></li>
<li><p>The <code>pdf</code> viewer is quite a bit slower than in <code>firefox</code>. A 350-page doc
makes <code>chromium</code> visibly struggle to scroll around; it might be the known
<a href="https://issues.chromium.org/issues/345117890" class="uri">https://issues.chromium.org/issues/345117890</a>.</p>
<p>I have to fall back to local viewers for larger docs.</p>
<p><strong>UPDATE</strong>: installing <a href="https://github.com/mozilla/pdf.js?tab=readme-ov-file#getting-started"><code>pdf.js</code></a>
ended up being even better solution.</p></li>
<li><p><code>chromium</code> syncs to disk somewhat frequently. There is a 15-year-old
<a href="https://issues.chromium.org/issues/41198599" class="uri">https://issues.chromium.org/issues/41198599</a> that mentions that all
the actions a user performs are synced to disk from time to time. I don’t think
it’s a real problem for modern SSDs, but it still feels quite wasteful.</p></li>
<li><p><code>sway</code> sometimes crashes completely when I visit certain utility
provider sites with a message like:</p>
<pre><code>00:00:25.696 [sway/sway_text_node.c:110] cairo_image_surface_create failed: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
00:00:25.696 [sway/sway_text_node.c:110] cairo_image_surface_create failed: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
sway: render/pass.c:23: wlr_render_pass_add_texture: Assertion `box-&gt;x &gt;= 0 &amp;&amp; box-&gt;y &gt;= 0 &amp;&amp; box-&gt;x + box-&gt;width &lt;= options-&gt;texture-&gt;width &amp;&amp; box-&gt;y + box-&gt;height &lt;= options-&gt;texture-&gt;height' failed.</code></pre>
<p>There are a bunch of open bugs with related error messages. It looks
like those are usually <code>sway</code> or <code>wlroots</code> robustness bugs. <code>chromium</code>
is probably also at fault here, trying to create surfaces of
unreasonable dimensions.</p>
<p>Installing <code>sway</code> and <code>wlroots</code> from <code>git</code> <code>master</code> fixed all crashes
for me.</p></li>
</ul>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc-15 bugs, pile 2</title>
    <link href="https://trofi.github.io/posts/332-gcc-15-bugs-pile-2.html" />
    <id>https://trofi.github.io/posts/332-gcc-15-bugs-pile-2.html</id>
    <published>2025-04-19T00:00:00Z</published>
    <updated>2025-04-19T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>8 more months have passed since my previous
<a href="https://trofi.github.io/posts/323-gcc-15-bugs-pile-1.html">pile report</a>. <code>gcc-15</code> was
<a href="https://gcc.gnu.org/pipermail/gcc/2025-April/245943.html">branched off</a>
from <code>master</code> and will receive only regression fixes. <code>master</code> is called
<code>gcc-16</code> now.</p>
<p>It’s a good time to look at the compiler bugs I encountered.</p>
<h2 id="summary">summary</h2>
<p>I got about 30 of those:</p>
<ul>
<li><a href="https://gcc.gnu.org/PR116516"><code>rtl-optimization/116516</code></a>: ICE on
<code>linux-6.10</code> due to inability to handle some address calculation
expressions.</li>
<li><a href="https://gcc.gnu.org/PR116797"><code>middle-end/116797</code></a>: ICE on
<code>libvpx-1.14.1</code> due to a vectorizer bug that tried to access outside an
array boundary.</li>
<li><a href="https://gcc.gnu.org/PR116814"><code>middle-end/116814</code></a>: ICE on
<code>libjack2-1.9.22</code> due to <code>gcc</code>’s inability to generate code for
saturated subtraction.</li>
<li><a href="https://gcc.gnu.org/PR116817"><code>tree-optimization/116817</code></a>: ICE on
<code>libajantv2-16.2</code>: the <code>gcc</code> vectorizer broke on a loop invariant.</li>
<li><a href="https://gcc.gnu.org/PR116857"><code>libstdc++/116857</code></a>: <code>mingw32</code> build
failure, was exposed after re-enabling most warnings on <code>gcc</code> headers.</li>
<li><a href="https://gcc.gnu.org/PR116880"><code>c++/116880</code></a>: <code>co_await</code> use-after-free
on <code>nix-2.24.8</code> code. A <code>gcc</code> bug in coroutine lifetime management.</li>
<li><a href="https://gcc.gnu.org/PR116911"><code>c++/116911</code></a>: <code>qt5.qtbase</code> build
failure due to <code>gcc</code> regression in assigning external linkage to local
variables.</li>
<li><a href="https://gcc.gnu.org/PR117039"><code>bootstrap/117039</code></a>: <code>-Werror=</code> <code>libcpp</code>
<code>gcc</code> build failure due to format string problems.</li>
<li><a href="https://gcc.gnu.org/PR117114"><code>c++/117114</code></a>: <code>-Woverloaded-virtual</code>
false positives due to a <code>gcc</code> bug in how it tracks methods in the case of
multiple inheritance.</li>
<li><a href="https://gcc.gnu.org/PR117141"><code>middle-end/117141</code></a>: duplicate pattern
definitions for subtraction-with-saturation primitive. A build warning.</li>
<li><a href="https://gcc.gnu.org/PR117177"><code>c/117177</code></a>: wrong code on global arrays
used by <code>python-3.12.7</code> and others. <code>gcc</code> generated invalid bytes that
represent the array.</li>
<li><a href="https://gcc.gnu.org/PR117190"><code>c/117190</code></a>: ICE on <code>linux-6.11.3</code>,
another case of <code>gcc</code>’s inability to generate static const arrays,
similar to the previous entry.</li>
<li><a href="https://gcc.gnu.org/PR117194"><code>target/117194</code></a>: wrong code on
<code>highway-1.2.0</code> in vectorizer code. <code>gcc</code> used incorrect order of
operands in <code>ANDN</code> primitive.</li>
<li><a href="https://gcc.gnu.org/PR117220"><code>libstdc++/117220</code></a>: <code>stl_iterator</code> and
<code>clang</code> incompatibility: <code>gcc</code> allows slightly different mix of
<code>[[..]]</code> and <code>__attribute((..))</code> style of attributes ordering than
<code>clang</code>.</li>
<li><a href="https://gcc.gnu.org/PR117288"><code>lto/117288</code></a>: <code>lto</code> ICE on <code>wolfssl</code>,
constant arrays are not handled by <code>gcc</code>. This time in <code>LTO</code> bytecode.</li>
<li><a href="https://gcc.gnu.org/PR117306"><code>tree-optimization/117306</code></a>: <code>-O3</code>
vectorizer ICE on <code>netpbm-11.8.0</code> on certain <code>bool</code> calculation patterns.</li>
<li><a href="https://gcc.gnu.org/PR117378"><code>middle-end/117378</code></a>: <code>waybar</code> ICE on
<code>c++</code> due to a <code>gcc</code> bug in expansion of ternary operators.</li>
<li><a href="https://gcc.gnu.org/PR117476"><code>rtl-optimization/117476</code></a>: wrong code
on <code>grep</code> and <code>libgcrypt</code> in a code that handles zero-extension.</li>
<li><a href="https://gcc.gnu.org/PR117496"><code>middle-end/117496</code></a>: infinite recursion
on <code>cdrkit</code> due to an <code>a | b</code> pattern generating a still-foldable result.</li>
<li><a href="https://gcc.gnu.org/PR117843"><code>bootstrap/117843</code></a>: <code>fortran</code> bootstrap
build failure (<code>-Werror</code>). A missing enum entry handling.</li>
<li><a href="https://gcc.gnu.org/PR117980"><code>c++/117980</code></a>: ICE on <code>nix-2.25.2</code> where
<code>gcc</code> transformation broke the type of underlying expression.</li>
<li><a href="https://gcc.gnu.org/PR118124"><code>c++/118124</code></a>: ICE on <code>nss</code>, <code>c++</code>
constant arrays were not handled in <code>initializer_list&lt;...&gt;</code>.</li>
<li><a href="https://gcc.gnu.org/PR118168"><code>preprocessor/118168</code></a>: slow <code>mypy</code>
compilation on <code>-Wmisleading-indentation</code>. <code>gcc</code> parsed the whole file
multiple times to resolve locations.</li>
<li><a href="https://gcc.gnu.org/PR118205"><code>tree-optimization/118205</code></a>: <code>libdeflate</code>
wrong code, fails <code>libtiff</code> tests due to a <code>gcc</code> bug in handling
certain form of <code>PHI</code> modes.</li>
<li><a href="https://gcc.gnu.org/PR118409"><code>tree-optimization/118409</code></a>: <code>gas</code> is
compiled incorrectly due to <code>gcc</code> bug in handling <code>xor</code> on sub-byte
bit fields.</li>
<li><a href="https://gcc.gnu.org/PR118856"><code>c++/118856</code></a>: <code>mesonlsp-4.3.7</code> ICE
and wrong code due to too early temporary destruction for arrays.</li>
<li><a href="https://gcc.gnu.org/PR119138"><code>c++/119138</code></a>: <code>mingw32</code> bootstrap
failure due to a <code>gcc</code> regression in attribute tracking for pointers.</li>
<li><a href="https://gcc.gnu.org/PR119226"><code>middle-end/119226</code></a>: <code>vifm-0.14</code> ICE on
<code>strcspn()</code> due to a bad folding recently added to <code>gcc</code> just for this
function.</li>
<li><a href="https://gcc.gnu.org/PR119278"><code>analyzer/119278</code></a>: <code>gnutls</code> <code>-fanalyzer</code>
ICE due to lack of handling of a new type for static const arrays.</li>
<li><a href="https://gcc.gnu.org/PR119428"><code>target/119428</code></a>: <code>e2fsprogs-1.47.2</code>
wrong code on bit reset due to a wrong <code>btr</code> pattern.</li>
<li><a href="https://gcc.gnu.org/PR119646"><code>c++/119646</code></a>: <code>lix</code> ICE on coroutine
code where coroutine types and values cause <code>gcc</code> to fail to handle
more complicated (but allowed by standard) cases.</li>
</ul>
<h2 id="fun-bugs">fun bugs</h2>
<h3 id="e2fsprogs-bug"><code>e2fsprogs</code> bug</h3>
<p>The <a href="https://gcc.gnu.org/PR119428"><code>e2fsprogs bug</code></a> was an interesting
case of wrong code. This was enough to trigger it:</p>
<pre class="c"><code>// $ cat bug.c
__attribute__((noipa, optimize(1)))
void bug_o1(unsigned int nr, void * addr)
{
        unsigned char   *ADDR = (unsigned char *) addr;

        ADDR += nr &gt;&gt; 3;
        *ADDR &amp;= (unsigned char) ~(1 &lt;&lt; (nr &amp; 0x07));
}

__attribute__((noipa, optimize(2)))
void bug_o2(unsigned int nr, void * addr)
{
        unsigned char   *ADDR = (unsigned char *) addr;

        ADDR += nr &gt;&gt; 3;
        *ADDR &amp;= (unsigned char) ~(1 &lt;&lt; (nr &amp; 0x07));
}

int main() {
  void * bmo1 = __builtin_malloc(1024);
  void * bmo2 = __builtin_malloc(1024);
  for (unsigned bno = 0; bno &lt; 1024 * 8; ++bno) {
    __builtin_memset(bmo1, 0xff, 1024);
    __builtin_memset(bmo2, 0xff, 1024);
    bug_o1(bno, bmo1);
    bug_o2(bno, bmo2);
    if (__builtin_memcmp(bmo1, bmo2, 1024) != 0)
      __builtin_trap();
  }
}</code></pre>
<p>Crashing as:</p>
<pre><code>$ gcc bug.c -o bug -O0 &amp;&amp; ./bug
Illegal instruction (core dumped)</code></pre>
<p>The <code>gcc</code> <a href="https://gcc.gnu.org/cgit/gcc/commit/?id=584b346a4c7a6e6e77da6dc80968401a3c08161d">fix</a>
amends mask calculation as:</p>
<pre class="diff"><code>--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18168,7 +18168,8 @@
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0)
        (any_rotate:SWI (match_dup 4)
-		       (subreg:QI (match_dup 2) 0)))]
+		       (subreg:QI
+			 (and:SI (match_dup 2) (match_dup 3)) 0)))]
  &quot;operands[4] = gen_reg_rtx (&lt;MODE&gt;mode);&quot;)
 
 (define_insn_and_split &quot;*&lt;insn&gt;&lt;mode&gt;3_mask_1&quot;
@@ -18202,7 +18203,8 @@
   == GET_MODE_BITSIZE (&lt;MODE&gt;mode) - 1&quot;
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0)
-       (any_rotate:SWI (match_dup 4) (match_dup 2)))]
+       (any_rotate:SWI (match_dup 4)
+		       (and:QI (match_dup 2) (match_dup 3))))]
  &quot;operands[4] = gen_reg_rtx (&lt;MODE&gt;mode);&quot;)
 
 (define_insn_and_split &quot;*&lt;insn&gt;&lt;mode&gt;3_add&quot;</code></pre>
<p>Here <code>gcc</code> incorrectly compiled <code>bug_o2()</code> into a single <code>btr</code>
instruction. <code>gcc</code> assumed that <code>btr</code> performs a typical 8-bit mask on
a register operand like other instructions do. But in the case of <code>btr</code> it’s
a 3/4/5-bit mask (for 8/16/32-bit offsets).</p>
<h3 id="mesonlsp-bug"><code>mesonlsp</code> bug</h3>
<p>The <a href="https://gcc.gnu.org/PR118856"><code>mesonlsp</code> bug</a> was also interesting.
This seemingly trivial code:</p>
<pre class="cpp"><code>// $ cat bug.cpp
#include &lt;string&gt;
#include &lt;vector&gt;

int main(){
  for (const auto &amp;vec : std::vector&lt;std::vector&lt;std::string&gt;&gt;{
           {&quot;aaa&quot;},
       }) {
  }
}</code></pre>
<p>crashed at runtime:</p>
<pre><code># ok
$ g++ bug.cpp -o bug -fsanitize=address
$ ./bug

# bad:
$ g++ bug.cpp -o bug -fsanitize=address -std=c++23
$ ./bug

=================================================================
==3828042==ERROR: AddressSanitizer: heap-use-after-free on address 0x7ba90dbe0040 at pc 0x000000404279 bp 0x7ffd9db5c110 sp 0x7ffd9db5c108
READ of size 8 at 0x7ba90dbe0040 thread T0
    #0 0x000000404278 in std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;::_M_data() const (bug+0x404278)
...

0x7ba90dbe0040 is located 0 bytes inside of 32-byte region [0x7ba90dbe0040,0x7ba90dbe0060)
freed by thread T0 here:
    #0 0x7f790f1180c8 in operator delete(void*, unsigned long) (/&lt;&lt;NIX&gt;&gt;/gcc-15.0.1-lib/lib/libasan.so.8+0x1180c8)
    #1 0x000000406a4b in std::__new_allocator&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;::deallocate(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;*, unsigned long) (bug+0x406a4b)
...

previously allocated by thread T0 here:
    #0 0x7f790f1171a8 in operator new(unsigned long) (/&lt;&lt;NIX&gt;&gt;/gcc-15.0.1-lib/lib/libasan.so.8+0x1171a8)
    #1 0x000000404c9f in std::__new_allocator&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; &gt;::allocate(unsigned long, void const*) (bug+0x404c9f)
...

SUMMARY: AddressSanitizer: heap-use-after-free (bug+0x404278) in std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt;::_M_data() const
Shadow bytes around the buggy address:
  0x7ba90dbdfd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ba90dbdfe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ba90dbdfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ba90dbdff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7ba90dbdff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=&gt;0x7ba90dbe0000: fa fa 00 00 00 fa fa fa[fd]fd fd fd fa fa fd fd
  0x7ba90dbe0080: fd fa fa fa fd fd fd fd fa fa fa fa fa fa fa fa
  0x7ba90dbe0100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7ba90dbe0180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7ba90dbe0200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7ba90dbe0280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==3828042==ABORTING</code></pre>
<p>It’s a use-after-free bug caused by <code>gcc</code> bugs in temporary
variable lifetime tracking. The <code>gcc</code> fixes
(<a href="https://gcc.gnu.org/cgit/gcc/commit/?id=e96e1bb69c7b46db18e747ee379a62681bc8c82d">one</a>,
<a href="https://gcc.gnu.org/cgit/gcc/commit/?id=720c8f685210af9fc9c31810e224751102f1481e">two</a>)
are not small, so I will not post them here.</p>
<h2 id="histograms">histograms</h2>
<p>As usual, which subsystems did we find the bugs in?</p>
<ul>
<li><code>c++</code>: 8</li>
<li><code>middle-end</code>: 6</li>
<li><code>tree-optimization</code>: 4</li>
<li><code>bootstrap</code>: 2</li>
<li><code>c</code>: 2</li>
<li><code>libstdc++</code>: 2</li>
<li><code>lto</code>: 2</li>
<li><code>rtl-optimization</code>: 2</li>
<li><code>target</code>: 2</li>
<li><code>analyzer</code>: 1</li>
<li><code>preprocessor</code>: 1</li>
</ul>
<p>Surprisingly this time <code>c++</code> is at the top of the list. It feels like
coroutine-related bugs moved the needle. Otherwise, <code>middle-end</code> and
<code>tree-optimization</code> following it are expected.</p>
<h2 id="parting-words">parting words</h2>
<p>Of the bugs above I reported only 18 myself; the other 13
were already reported by others.</p>
<p>Optimized handling of global constant arrays (<code>#embed</code>-style code) caused
numerous bugs in various subsystems from compiler crashes to wrong code.</p>
<p>The most disruptive change probably is the switch to
<a href="https://trofi.github.io/posts/326-gcc-15-switched-to-c23.html"><code>c23</code></a>.</p>
<p>The past month was very quiet in terms of <code>gcc</code> bugs. <code>gcc-15</code> is in
good shape to be released.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Trying out helix editor</title>
    <link href="https://trofi.github.io/posts/331-trying-out-helix-editor.html" />
    <id>https://trofi.github.io/posts/331-trying-out-helix-editor.html</id>
    <published>2025-02-15T00:00:00Z</published>
    <updated>2025-02-15T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>This is another February story about text editors similar to the
<a href="https://trofi.github.io/posts/277-from-mcedit-to-vim.html"><code>vim</code> one</a>. You might want to
ignore this one as well :)</p>
<h2 id="tldr">Tl;DR</h2>
<p><code>helix</code> is a nice program: I switched to it from <code>vim</code> as my default text
editor. If you have never heard of the <code>helix</code> editor and are a <code>vim</code> or <code>nvim</code>
user, I suggest you take a look at it. <code>hx --tutor</code> is short and yet
it covers a few cool things. <a href="https://helix-editor.com/" class="uri">https://helix-editor.com/</a> has a nice
<code>asciinema</code> intro that shows the expected look and feel.</p>
<h2 id="background">Background</h2>
<p>I have been a happy <code>vim</code> user for 2 years with a simple
<a href="https://github.com/trofi/home/blob/master/.vimrc"><code>~/.vimrc</code> config</a>.
Strong <code>vim</code> features for me are:</p>
<ul>
<li>startup speed</li>
<li>UI speed</li>
<li>basic spell checking</li>
<li>syntax highlighting for many languages</li>
<li>tab/space whitespace highlighting</li>
<li>configurable color scheme (ideally a blue one)</li>
<li><code>emacs</code>-style page scrolling and line editing when in insert mode</li>
</ul>
<p>I still manage to use <code>vim</code> without any external plugins.</p>
<p>The weak <code>vim</code> points for me are:</p>
<ul>
<li><code>vim</code>-specific configuration language (I don’t know how to read anything
beyond trivial <code>set</code> assignments)</li>
<li><code>vim</code>-specific regex language extensions (<code>:h /magic</code>)</li>
<li>lack of language server protocol (<code>LSP</code>) support without
external plugins</li>
<li>defaults keep compatibility with old versions of <code>vim</code> which sometimes
don’t make sense to me as a new user</li>
<li>it’s written in an ancient style of <code>C</code> known to trip common
safety checks, which then have to be disabled with hacks like
<a href="https://github.com/vim/vim/issues/5581"><code>-D_FORTIFY_SOURCE=1</code></a></li>
<li>“backwards” model of many actions in normal mode</li>
</ul>
<p>To expand a bit on the “backwards” model, here is a simple example: in <code>vim</code>
the key sequence <code>df&lt;</code> will delete (<code>d</code>) everything from the current
position to the <code>&lt;</code> symbol inclusive. But you will not see what exactly
<code>vim</code> is about to delete until you press <code>&lt;</code>. It would be nice to see
what <code>f&lt;</code> selects first and only then press <code>d</code> with more confidence. As a
result I rarely use such shortcuts despite them being very convenient
for a common editing use case. <code>vim</code>’s very own <code>vf&lt;d</code> command sequence
is a lot closer to what I would expect, but it requires switching
to visual mode (the <code>v</code> prefix).</p>
<h2 id="a-fun-problem">A fun problem</h2>
<p>From time to time I use <code>vim</code> to write <code>markdown</code> files (such as this
blog post). I like pasting code snippets here and there for better
illustration. One day I idly wondered if <code>vim</code> could be taught to
support syntax highlighting of the code snippets within the <code>markdown</code>
files:</p>
<pre class="markdown"><code># Would it not be magic if it just worked?

An example `c` snippet within `markdown`:

```c
// What's up here with the highlight?
int like_this_one(long long);
```</code></pre>
<p><code>vim</code> does not do any special highlighting in a <code>c</code> block.</p>
<p>Many days later I encountered a mastodon thread that mentioned a <code>vim</code>
proposal to use <code>TextMate</code> grammars:
<a href="https://github.com/vim/vim/issues/9087"><code>Issue#9087</code></a>. It discussed
various options of different highlighter engines, their pros, cons, and
what <code>vim</code> should use longer term.
<a href="https://tree-sitter.github.io/tree-sitter/"><code>Tree-sitter</code></a> was
specifically mentioned multiple times there as The Solution to all
the highlighting problems an editor could have. It’s a long discussion
with many branches.
From there I learned about (and started using) a few <code>tree-sitter</code> based
programs, like <a href="https://github.com/sharkdp/bat"><code>bat</code></a>,
<a href="https://difftastic.wilfred.me.uk/"><code>difftastic</code></a> and
<a href="https://helix-editor.com/"><code>helix</code></a>.</p>
<p>The <code>helix</code> editor was mentioned there as one of the <code>tree-sitter</code> users. I had
heard about <code>helix</code> before from a friend, but at the time I had just started
my <code>vim</code> journey and did not give <code>helix</code> a serious try.
This time I felt I was able to compare both.</p>
<h2 id="my-setup">My setup</h2>
<p>I got <code>helix</code> up and running to a state where I could use it by default in
two short evenings. After that I did a few incremental tweaks. My whole
configuration right now is one page long:</p>
<pre class="toml"><code># $ cat config.toml
[editor]
auto-pairs = false
bufferline = &quot;always&quot;
rulers = [73]
true-color = true # TMUX term does not always agree

[editor.statusline]
# added &quot;file-type&quot;, &quot;position-percentage&quot;
right = [&quot;file-type&quot;, &quot;diagnostics&quot;, &quot;selections&quot;, &quot;register&quot;, &quot;position&quot;, &quot;position-percentage&quot;, &quot;file-encoding&quot;]

[editor.whitespace.render]
space = &quot;all&quot;
tab = &quot;all&quot;
nbsp = &quot;all&quot;
nnbsp = &quot;all&quot;

[editor.whitespace.characters]
tab = &quot;&gt;&quot;
tabpad = &quot;-&quot;

[keys.insert]
C-c = &quot;normal_mode&quot;
C-up = [&quot;scroll_up&quot;, &quot;move_visual_line_up&quot;]
C-down = [&quot;scroll_down&quot;, &quot;move_visual_line_down&quot;]

[keys.normal]
C-c = &quot;normal_mode&quot;
C-up = [&quot;scroll_up&quot;, &quot;move_visual_line_up&quot;]
C-down = [&quot;scroll_down&quot;, &quot;move_visual_line_down&quot;]
ins = [&quot;insert_mode&quot;]

[keys.select]
C-c = &quot;normal_mode&quot;
C-up = [&quot;scroll_up&quot;, &quot;move_visual_line_up&quot;]
C-down = [&quot;scroll_down&quot;, &quot;move_visual_line_down&quot;]</code></pre>
<p>On top of that I enabled a few language servers not configured in
<code>helix</code> by default:</p>
<pre class="toml"><code># $ cat languages.toml

# spell checker
[language-server.harper-ls]
command = &quot;harper-ls&quot;
args = [&quot;--stdio&quot;]

[[language]]
name = &quot;markdown&quot;
language-servers = [&quot;marksman&quot;, &quot;harper-ls&quot;]
auto-format = false

[language-server.rust-analyzer.config]
check.command = &quot;clippy&quot;</code></pre>
<h2 id="niceties">Niceties</h2>
<p>I was surprised to discover how much <code>helix</code> already provides without
much extra configuration:</p>
<ul>
<li>24-bit colors in the terminal emulator (I use <code>alacritty</code> most of the
time and I appreciate finer grained colors)</li>
<li>a ton of pre-configured <code>LSP</code> servers: <code>hx --health</code> reports 273 lines</li>
<li>helpful pop-ups when prefix keys are pressed, like <code>&lt;space&gt;</code>, <code>:</code>, or
<code>m</code></li>
<li><code>tree-sitter</code>-based syntax highlighting makes highlighting more
consistent across languages</li>
</ul>
<h3 id="toml-configuration-language"><code>toml</code> configuration language</h3>
<p>I grew to like <code>toml</code> compared to other custom <code>.ini</code>-like formats.
Custom configuration formats are sometimes not general enough. For example,
in the <code>nix.conf</code> config there is no way (to my knowledge) to add trailing
whitespace to:</p>
<pre class="ini"><code>bash-prompt-suffix = dev&gt;</code></pre>
<p><code>toml</code>, on the other hand, is more predictable in this regard. It is quite
common and expressive enough to encode simple arrays and strings with
any contents.</p>
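<p>As a quick illustration (a hypothetical snippet of my own in Python, not part
of any real config): <code>toml</code> basic strings make trailing whitespace explicit
and unambiguous, which the standard <code>tomllib</code> parser preserves exactly:</p>
<pre class="python"><code>import tomllib  # Python &gt;= 3.11 standard library

# Quoting makes the trailing space part of the value, no guesswork needed.
cfg = tomllib.loads('bash-prompt-suffix = "dev&gt; "')
print(repr(cfg["bash-prompt-suffix"]))  # 'dev&gt; '</code></pre>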
<p>I’m a bit wary of configurations that are full programming languages.
They are not too bad when they are general-purpose languages with
good error messages, well understood semantics, and introspection for
the available options and helpers.</p>
<h3 id="color-themes-can-be-set-in-rgb">Color themes can be set in <code>RGB</code></h3>
<p>Using full <code>RGB</code> range to define color elements is great. <code>helix</code> comes
with a nice dark default theme suitable for long editing sessions.</p>
<p>The only caveat of the default theme is that some colors are not distinct
where it matters. For example, to work around a bug in the default
theme I needed to pick a different color for the secondary selection:</p>
<pre class="toml"><code># $ cat themes/sf.toml
inherits = &quot;default&quot;

# workaround https://github.com/helix-editor/helix/issues/12601
&quot;ui.selection&quot; = { bg = &quot;#540020&quot; }
&quot;ui.selection.primary&quot; = { bg = &quot;#540099&quot; }</code></pre>
<p>Using <code>RGB</code> is so much better than picking a pre-defined color. I did
not know I needed it until I tried :)</p>
<h3 id="selection-and-multi-selection-feels-intuitive">Selection and multi-selection feels intuitive</h3>
<p><code>helix</code> shows what navigation commands select before I perform
an action. An example would be the <code>vim</code> <code>df"</code> sequence compared to
the <code>helix</code> <code>f"d</code> sequence. Before pressing <code>d</code> I am more confident about
what it is about to delete.</p>
<p>After using <code>helix</code> for a while I am actually more comfortable using
<code>vim</code> <code>f</code> / <code>t</code> (and similar) navigation commands because I understand
better what they actually do.</p>
<p>Multi-selection is also very natural: you create a bunch of cursors
based on your search (<code>s</code> command) in your selection (or by extending a
column with the <code>C</code> command) and start modifying text interactively at each
active cursor at the same time. In multi-selection mode it’s more
natural to use navigation commands like <code>f</code>, <code>t</code> and <code>w</code> to do bulk
edits. In <code>vim</code> I used to rely more on arrow keys and did not see much use
for more complex navigation commands. But now that I’m used to them I’m
using them in <code>vim</code> as well.</p>
<h3 id="ide-experience-is-unexpectedly-good">IDE experience is unexpectedly good</h3>
<p>The <code>LSPs</code> provide navigation, hints, symbol search and so much
more. It’s so easy to explore new and existing projects for various
cross-references. Before jumping to the target you can look at a bit
of context in the preview, and that might be enough for the thing you are
looking for!</p>
<p>Even in this post <code>&lt;space&gt;s</code> (symbol lookup) provides a Table Of
Contents output with a preview.</p>
<p>For development projects <code>&lt;space&gt;f</code> provides you a file picker with a
fuzzy search. Now I have to rely a lot less on mashing <code>&lt;TAB&gt;</code> in the
shell to get to a file I want to edit. I even installed <code>fzf</code> to emulate
similar fuzzy search experience when I need it in <code>bash</code> to pass a file
to other programs.</p>
<p>To make <code>clangd</code> (a <code>C</code>/<code>C++</code> LSP) work one needs
a <code>compile_commands.json</code> file. <code>meson</code>-based projects create it
unconditionally, <code>cmake</code>-based projects do it once the
<code>-DCMAKE_EXPORT_COMPILE_COMMANDS=YES</code> option is enabled, and for
<code>autotools</code>-based projects there is the
<a href="https://github.com/rizsotto/Bear"><code>bear</code></a> hack to wrap <code>make</code> and
extract the commands after the build.</p>
<p><code>bear</code> is not able to handle projects like <code>gcc</code> where the locally built
<code>gcc</code> is used to compile most of the code. But smaller projects work well
enough.</p>
<p><a href="https://github.com/helix-editor/helix/wiki/Language-Server-Configurations" class="uri">https://github.com/helix-editor/helix/wiki/Language-Server-Configurations</a>
has tips for many more language servers.</p>
<h2 id="snags">Snags</h2>
<p>Over the past month of using <code>helix</code> as my default editor I encountered a
few limitations that I had to work around or just accept.</p>
<h3 id="trailing-whitespace-highlighting-minor">Trailing whitespace highlighting (minor)</h3>
<p>Whitespace highlighting is a bit blunt: I would prefer spaces to be
highlighted only in trailing context while highlighting tabs everywhere.
<a href="https://github.com/helix-editor/helix/issues/2719" class="uri">https://github.com/helix-editor/helix/issues/2719</a>.</p>
<p>But it’s not a big deal. I just need to be careful to copy text into the
clipboard buffer not with a mouse selection but via <code>&lt;space&gt;y</code>.</p>
<h3 id="spell-checking-minor">Spell checking (minor)</h3>
<p>I was surprised to see that <code>helix</code> does not yet have spell checking
integration and was afraid I could not use it, as I make a huge number of
typos and rely on <code>aspell</code> heavily. The integration is tracked by
<a href="https://github.com/helix-editor/helix/issues/11660" class="uri">https://github.com/helix-editor/helix/issues/11660</a>.</p>
<p>But luckily there are a few language servers that do implement spell
checking. I’m using <code>harper</code> as:</p>
<pre class="toml"><code>[language-server.harper-ls]
command = &quot;harper-ls&quot;
args = [&quot;--stdio&quot;]</code></pre>
<p>It works reasonably well for English but does not support anything else.
Having an <code>LSP</code> based on something like <code>aspell</code> would be nice.</p>
<h3 id="default-theme-colors-minor">Default theme colors (minor)</h3>
<p>The <code>helix</code> tutorial has a section that demonstrates primary and secondary
selections via <code>(</code> and <code>)</code> navigation. But the colors of both selection
types are identical in the default theme. That was very confusing. The
issue is 2.5 years old:
<a href="https://github.com/helix-editor/helix/issues/12601" class="uri">https://github.com/helix-editor/helix/issues/12601</a>.</p>
<p>Luckily the workaround is trivial: you can inherit from the default theme
and override just the selection colors:</p>
<pre class="toml"><code>inherits = &quot;default&quot;

&quot;ui.selection&quot; = { bg = &quot;#540020&quot; }
&quot;ui.selection.primary&quot; = { bg = &quot;#540099&quot; }</code></pre>
<h3 id="saving-last-cursor-position-in-the-file">Saving last cursor position in the file</h3>
<p>I like saving and restoring the last cursor position in an edited file. It
was a default <code>vim</code> behavior. <code>helix</code> does not have an equivalent yet:
<a href="https://github.com/helix-editor/helix/issues/1133" class="uri">https://github.com/helix-editor/helix/issues/1133</a></p>
<h3 id="buffer-search-only-highlights-current-search-not-all">Buffer search only highlights current search, not all</h3>
<p>In <code>vim</code> I was using the <code>set hlsearch</code> option to highlight (and keep
highlighted) all the occurrences that match the search, not just the
current one. <code>helix</code> is yet to implement it:
<a href="https://github.com/helix-editor/helix/issues/1733" class="uri">https://github.com/helix-editor/helix/issues/1733</a>.</p>
<h3 id="generic-autocomplete">Generic autocomplete</h3>
<p>I like autocompletion of arbitrary words in a text file.
So far <code>helix</code> only does <code>LSP</code>-based autocompletion.</p>
<p><a href="https://github.com/helix-editor/helix/issues/1063" class="uri">https://github.com/helix-editor/helix/issues/1063</a> tracks the addition
of a simple keyword-based completer.</p>
<h2 id="parting-words">Parting words</h2>
<p><code>helix</code> feels like a good modern <code>vim</code> successor for my use cases. Its
extensive use of <code>RGB</code> colors and Unicode characters gives it the look and
feel of a program beyond a typical terminal application. A ton of pre-configured
<code>LSPs</code> makes it a nice lightweight code navigator on par with the IDE
experience.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Another Nix Expression Language non-determinism example</title>
    <link href="https://trofi.github.io/posts/330-another-nix-language-nondeterminism-example.html" />
    <id>https://trofi.github.io/posts/330-another-nix-language-nondeterminism-example.html</id>
    <published>2024-12-26T00:00:00Z</published>
    <updated>2024-12-26T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Today I found another source of non-determinism in the <code>nix expression language</code>.
This time it’s the
<a href="https://github.com/NixOS/nix/issues/12106"><code>builtins.sort</code></a> primitive!
How do you break <code>sort</code>?
Compared to the
<a href="https://trofi.github.io/posts/292-nix-language-nondeterminism-example.html">previous non-determinism instance</a>
this one is not as arcane.</p>
<h2 id="a-working-sort-example">a working <code>sort</code> example</h2>
<p>Before triggering the problematic condition let’s look at a working sort:</p>
<pre><code>$ nix repl
nix-repl&gt; builtins.sort builtins.lessThan [ 4 3 2 1 ]
[
  1
  2
  3
  4
]

nix-repl&gt; builtins.sort (a: b: a &lt; b) [ 4 3 2 1 ]
[
  1
  2
  3
  4
]</code></pre>
<p>All nice and good: we pass the comparison predicate and get a result
back. In the first case we pass a builtin comparator; in the
second case we write a lambda that implements <code>&lt;</code>. Nothing fancy.
Normally a <code>sort</code> function expects a few properties from the passed
predicate, like
<a href="https://en.wikipedia.org/wiki/Weak_ordering#Strict_weak_orderings">“strict weak ordering”</a>,
to return something that looks sorted.</p>
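<p>The same contract exists outside of <code>nix</code> too. As a rough sketch in
Python (my own illustration, not code from <code>nix</code>):
<code>functools.cmp_to_key</code> turns a three-way comparator into a sort key, and
only a well-behaved comparator guarantees a sorted result:</p>
<pre class="python"><code>from functools import cmp_to_key

# A well-behaved three-way comparator, analogous to `builtins.lessThan`:
def less_than(a, b):
    return (a &gt; b) - (a &lt; b)  # -1, 0 or 1

print(sorted([4, 3, 2, 1], key=cmp_to_key(less_than)))  # [1, 2, 3, 4]</code></pre>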
<h2 id="suspicious-sort-call">suspicious <code>sort</code> call</h2>
<p>But what happens if we pass a predicate that does not satisfy that
property? On vanilla <code>nix</code> that would be:</p>
<pre><code>nix-repl&gt; builtins.sort (a: b: true) [ 4 3 1 2 ]
[ 2 1 3 4 ]</code></pre>
<p>The result is not sensible, but at least it did not crash. All good?</p>
<p><strong>Quiz question:</strong> Is this returned order guaranteed to be the same across <code>nix</code> implementations
on different platforms?</p>
<h2 id="triggering-the-non-determinism">triggering the non-determinism</h2>
<p>Today I tried to build the <code>nix</code> package with <code>gcc</code>’s <code>STL</code>
<a href="https://gcc.gnu.org/onlinedocs/libstdc++/manual/debug_mode_using.html#debug_mode.using.mode">debugging enabled</a>.
In theory it’s simple: you pass <code>-D_GLIBCXX_DEBUG</code> via
<code>CXXFLAGS</code> and you get your debugging for free.
I was chasing an unrelated <code>nix</code> memory corruption bug and did just that.
I hoped for a simple case like the past <a href="https://github.com/NixOS/nix/pull/8825"><code>PR#8825</code></a>.
To my surprise <code>nixpkgs</code> evaluation started triggering <code>libstdc++</code>
assertions. For the above “suspicious sort” example the execution was:</p>
<pre><code>$ nix eval --expr 'builtins.sort (a: b: true) [ 4 3 2 1 ]'

/nix/store/L89IQC7AM6I60Y8VK507ZWRZXF0WCD3V-gcc-14-20241116/include/c++/14-20241116/bits/stl_algo.h:5027:
In function:
    void std::stable_sort(_RAIter, _RAIter, _Compare) [with _RAIter =
    nix::Value**; _Compare = nix::prim_sort(EvalState&amp;, PosIdx, Value**,
    Value&amp;)::&lt;lambda(nix::Value*, nix::Value*)&gt;]

Error: comparison doesn't meet irreflexive requirements, assert(!(a &lt; a)).

Objects involved in the operation:
    instance &quot;functor&quot; @ 0x7ffd7d2fdb00 {
      type = nix::prim_sort(nix::EvalState&amp;, nix::PosIdx, nix::Value**, nix::Value&amp;)::{lambda(nix::Value*, nix::Value*)#1};
    }
    iterator::value_type &quot;ordered type&quot;  {
      type = nix::Value*;
    }
Aborted (core dumped)</code></pre>
<p>Uh-oh. A crash where there was none before. Note how <code>libstdc++</code> tells us
that our comparator is not expected to return <code>true</code> for <code>a &lt; a</code>.</p>
<h2 id="builtins.sort-implementation"><code>builtins.sort</code> implementation</h2>
<p>Looking at the <code>nix</code> implementation around the crash reveals that
<code>nix</code> uses <code>std::stable_sort</code> to implement <code>builtins.sort</code>
(<a href="https://github.com/NixOS/nix/blob/bff9296ab997269d703c5222b7e17d67a107aeed/src/libexpr/primops.cc#L3642">link</a>) with no predicate validation:</p>
<pre class="cpp"><code>static void prim_sort(EvalState &amp; state, const PosIdx pos, Value * * args, Value &amp; v)
{
    state.forceList(*args[1], pos, &quot;while evaluating the second argument passed to builtins.sort&quot;);

    auto len = args[1]-&gt;listSize();
    if (len == 0) {
        v = *args[1];
        return;
    }

    state.forceFunction(*args[0], pos, &quot;while evaluating the first argument passed to builtins.sort&quot;);

    auto list = state.buildList(len);
    for (const auto &amp; [n, v] : enumerate(list))
        state.forceValue(*(v = args[1]-&gt;listElems()[n]), pos);

    auto comparator = [&amp;](Value * a, Value * b) {
        /* Optimization: if the comparator is lessThan, bypass
           callFunction. */
        if (args[0]-&gt;isPrimOp()) {
            auto ptr = args[0]-&gt;primOp()-&gt;fun.target&lt;decltype(&amp;prim_lessThan)&gt;();
            if (ptr &amp;&amp; *ptr == prim_lessThan)
                return CompareValues(state, noPos, &quot;while evaluating the ordering function passed to builtins.sort&quot;)(a, b);
        }

        Value * vs[] = {a, b};
        Value vBool;
        state.callFunction(*args[0], vs, vBool, noPos);
        return state.forceBool(vBool, pos, &quot;while evaluating the return value of the sorting function passed to builtins.sort&quot;);
    };

    /* FIXME: std::sort can segfault if the comparator is not a strict
       weak ordering. What to do? std::stable_sort() seems more
       resilient, but no guarantees... */
    std::stable_sort(list.begin(), list.end(), comparator);

    v.mkList(list);
}</code></pre>
<p>Here <code>comparator()</code> passes the user-supplied function written in the
<code>nix expression language</code> (ignoring the performance special
case) directly into <code>std::stable_sort()</code>. The comment suggests that <code>std::sort()</code>
was already crashing here.
This means that today <code>builtins.sort</code> semantics follow <code>c++</code>
<code>std::stable_sort()</code> along with its undefined behavior and
instability for a non-conformant <code>comparator()</code> predicate.</p>
<h2 id="tracking-down-bad-predicates">tracking down bad predicates</h2>
<p><code>nixpkgs</code> is a vast codebase. It’s quite hard to figure out which part
of the <code>nix expression language</code> code triggers this condition from a <code>C++</code>
stack trace. I added the following hack to my local <code>nix</code> to convert
those violations into nix-level exceptions:</p>
<pre class="diff"><code>--- a/src/libexpr/primops.cc
+++ b/src/libexpr/primops.cc
@@ -3633,6 +3633,24 @@ static void prim_sort(EvalState &amp; state, const PosIdx pos, Value * * args, Value
                 return CompareValues(state, noPos, &quot;while evaluating the ordering function passed to builtins.sort&quot;)(a, b);
         }

+        /* Validate basic ordering requirements for comparator: */
+        {
+            Value * vs[] = {a, a};
+            Value vBool;
+            state.callFunction(*args[0], vs, vBool, noPos);
+            bool br = state.forceBool(vBool, pos, &quot;while evaluating the return value of the sorting function passed to builtins.sort&quot;);
+            if (br)
+                state.error&lt;EvalError&gt;(&quot;!(a &lt; a) assert failed&quot;).atPos(pos).debugThrow();
+        }
+        {
+            Value * vs[] = {b, b};
+            Value vBool;
+            state.callFunction(*args[0], vs, vBool, noPos);
+            bool br = state.forceBool(vBool, pos, &quot;while evaluating the return value of the sorting function passed to builtins.sort&quot;);
+            if (br)
+                state.error&lt;EvalError&gt;(&quot;!(b &lt; b) assert failed&quot;).atPos(pos).debugThrow();
+        }
+
         Value * vs[] = {a, b};
         Value vBool;
         state.callFunction(*args[0], vs, vBool, noPos);</code></pre>
<p>Here, before calling <code>compare(a,b)</code> against two different list
elements, we make sure that <code>compare(a,a)</code> and <code>compare(b,b)</code> do
not return <code>true</code>.
Now the error is a bit less intimidating:</p>
<pre><code>nix-repl&gt; builtins.sort (a: b: true) [ 4 3 2 1 ]
error:
       … while calling the 'sort' builtin
         at «string»:1:1:
            1| builtins.sort (a: b: true) [ 4 3 2 1 ]
             | ^

       error: !(a &lt; a) assert failed</code></pre>
<p>On a <code>nixpkgs</code> input the evaluation now fails as:</p>
<pre><code>$ nix-instantiate -A colmapWithCuda --show-trace
error:
       … while calling a functor (an attribute set with a '__functor' attribute)
         at pkgs/top-level/all-packages.nix:5843:20:
         5842|   colmap = libsForQt5.callPackage ../applications/science/misc/colmap { inherit (config) cudaSupport; };
         5843|   colmapWithCuda = colmap.override { cudaSupport = true; };
             |                    ^
         5844|
...
         at pkgs/development/cuda-modules/generic-builders/multiplex.nix:88:17:
           87|   # perSystemReleases :: List Package
           88|   allReleases = lib.pipe releaseSets [
             |                 ^
           89|     (lib.attrValues)</code></pre>
<p>This points us at
<a href="https://github.com/NixOS/nixpkgs/blob/1557114798a3951db0794379f26b68a5fdf68b12/pkgs/development/cuda-modules/generic-builders/multiplex.nix#L83"><code>cuda-modules/generic-builders/multiplex.nix</code></a>:</p>
<pre class="nix"><code>  preferable =
    p1: p2: (isSupported p2 -&gt; isSupported p1) &amp;&amp; (strings.versionAtLeast p1.version p2.version);

  # ...

  newest = builtins.head (builtins.sort preferable allReleases);</code></pre>
<p>Can you quickly tell if <code>preferable</code> satisfies the <code>lessThan</code> requirements?
<code>left &gt;= right</code> is generally problematic for sorts:</p>
<pre><code>nix-repl&gt; builtins.sort (a: b: a &gt;= b) [ 4 3 3 1 ]
error:
       … while calling the 'sort' builtin
         at «string»:1:1:
            1| builtins.sort (a: b: a &gt;= b) [ 4 3 3 1 ]
             | ^

       error: !(a &lt; a) assert failed</code></pre>
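<p>The property is easy to check outside of <code>nix</code> as well. A small Python
sketch of my own (<code>irreflexive</code> is a hypothetical helper name; the
property itself is the one <code>libstdc++</code> asserts): a predicate built from
<code>&gt;=</code> claims <code>x &lt; x</code> for equal elements, while the strict
<code>b &lt; a</code> form does not:</p>
<pre class="python"><code># A strict "less than" predicate must never say that x is less than x.
def irreflexive(pred, xs):
    return all(not pred(x, x) for x in xs)

xs = [4, 3, 3, 1]
print(irreflexive(lambda a, b: a &gt;= b, xs))  # False: 3 &gt;= 3 holds
print(irreflexive(lambda a, b: b &lt; a, xs))   # True: strict comparison</code></pre>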
<p>To make the comparator strict it should use a strict inequality,
like <code>b &lt; a</code> or equivalently <code>!(b &gt;= a)</code>:</p>
<pre><code>nix-repl&gt; builtins.sort (a: b: b &lt; a) [ 4 3 3 1 ]
[
  4
  3
  3
  1
]

nix-repl&gt; builtins.sort (a: b: !(b &gt;= a)) [ 4 3 3 1 ]
[
  4
  3
  3
  1
]</code></pre>
<p>I proposed a seemingly trivial change as <a href="https://github.com/NixOS/nixpkgs/pull/368366"><code>PR#368366</code></a>:</p>
<pre class="diff"><code>--- a/pkgs/development/cuda-modules/generic-builders/multiplex.nix
+++ b/pkgs/development/cuda-modules/generic-builders/multiplex.nix
@@ -81,7 +81,7 @@ let
   redistArch = flags.getRedistArch hostPlatform.system;

   preferable =
-    p1: p2: (isSupported p2 -&gt; isSupported p1) &amp;&amp; (strings.versionAtLeast p1.version p2.version);
+    p1: p2: (isSupported p2 -&gt; isSupported p1) &amp;&amp; (strings.versionOlder p2.version p1.version);

   # All the supported packages we can build for our platform.
   # perSystemReleases :: List Package
</code></pre>
<p>I’m not sure it’s correct.</p>
<p><code>cuda-modules</code> is not the only <code>sort</code> <code>lessThan</code> property violation. The
next failure is the <code>stan</code> package:</p>
<pre><code>$ nix build --no-link -f. cmdstan --show-trace
...
       … while calling the 'sort' builtin
         at pkgs/build-support/coq/meta-fetch/default.nix:115:55:
          114|     if (isString x &amp;&amp; match &quot;^/.*&quot; x == null) then
          115|       findFirst (v: versions.majorMinor v == x) null (sort versionAtLeast (attrNames release))
             |                                                       ^
          116|     else

       error: !(a &lt; a) assert failed</code></pre>
<p>Here you can already see that the pattern is suspiciously similar:
<code>sort versionAtLeast</code> probably does not do what it’s expected to do.
I proposed a similar fix as <a href="https://github.com/NixOS/nixpkgs/pull/368429"><code>PR#368429</code></a>.</p>
<p>Other packages affected:</p>
<ul>
<li><code>mathematica</code>: <a href="https://github.com/NixOS/nixpkgs/pull/368433">PR#368433</a></li>
</ul>
<p>More stuff to fix!</p>
<h2 id="parting-words">Parting words</h2>
<p>The <code>nix expression language</code> used in the <code>nix</code> package manager is of
a minimalistic kind: it does not have much syntax sugar,
hoping to achieve high performance and predictability of evaluation.
And yet it manages to surprise me time and time again, to the point where
I have to debug both the <code>nix expression language</code> and its underlying
<code>c++</code> implementation.</p>
<p>Sorting is tricky if you allow a user-supplied sorting predicate.</p>
<p><code>nixpkgs</code> has a few more sorting predicate violations that need to be
fixed. I found at least <a href="https://github.com/NixOS/nixpkgs/pull/368366"><code>cuda</code></a>,
<a href="https://github.com/NixOS/nixpkgs/pull/368429"><code>coq</code></a> and
<a href="https://github.com/NixOS/nixpkgs/pull/368433"><code>mathematica</code></a>.</p>
<p>Examples found after the first version of the post was published:</p>
<ul>
<li><a href="https://github.com/NixOS/nixpkgs/pull/418946"><code>coqPackages_8_20</code> used <code>sort (&lt;=)</code></a></li>
</ul>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Rebasing past reformats</title>
    <link href="https://trofi.github.io/posts/329-rebasing-past-reformats.html" />
    <id>https://trofi.github.io/posts/329-rebasing-past-reformats.html</id>
    <published>2024-12-12T00:00:00Z</published>
    <updated>2024-12-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>Did you ever have to deal with a huge list of conflicts on rebase caused
by automatic reformatting of an upstream codebase?
If you got into a similar situation you might be able to automatically
recreate your changes with <code>git filter-branch --tree-filter</code> and a
<code>git commit --allow-empty</code> trick.</p>
<h2 id="story-mode">story mode</h2>
<p>I have a local fork of the <code>staging</code> branch of the
<a href="https://github.com/NixOS/nixpkgs/"><code>nixpkgs</code></a> <code>git</code> repository to do
various tests against experimental upstream packages (like <code>gcc</code> from the
<code>master</code> branch) or experimental <code>nix</code> features (like <code>ca-derivations</code>).
I have about 350 patches in the fork. I sync this forked branch roughly
daily against the upstream <code>nixpkgs/staging</code>. Most of the time
<code>git pull --rebase</code> is enough and there are no conflicts. Once a month
there are one or two files to tweak. Not a big deal.</p>
<p>A few days ago <code>nixpkgs</code> landed a partial source code reformatting
patch as <a href="https://github.com/NixOS/nixpkgs/pull/322537"><code>PR#322537</code></a>. It
automatically re-indents ~21000 <code>.nix</code> files in the repository with a
<code>nixfmt</code> tool. My <code>git pull --rebase</code> generated conflicts on the first few
patches of my branch. I aborted it with <code>git rebase --abort</code>.
I would not be able to manually resolve such a huge pile of conflicts and I
wondered if I could somehow regenerate my patches against the reformatted
source.
In theory rebasing past such a change should be a mechanical operation: I
have the source tree before the patch and after the patch. All I need to
do is to autoformat both the <code>before</code> and <code>after</code> trees and then <code>diff</code>
them.
I managed to do it with the help of <code>git commit --allow-empty</code> and
<code>git filter-branch --tree-filter</code>.</p>
<h3 id="actual-commands">actual commands</h3>
<p>Here is the step-by-step I did to rebase my local <code>staging</code> branch past
the source reformatting
<a href="https://github.com/NixOS/nixpkgs/commit/667d42c00d566e091e6b9a19b365099315d0e611"><code>667d42c00d566e091e6b9a19b365099315d0e611</code> commit</a>
to avoid conflicts:</p>
<ol type="1">
<li><p>Create an empty commit (to absorb initial formatting later):</p>
<pre><code>$ git commit --allow-empty -m &quot;EMPTY commit: will absorb relevant formatting changes&quot;</code></pre></li>
<li><p>Move the last empty commit in the patch queue to the beginning of
the patch queue:</p>
<pre><code>$ git rebase -i --keep-base</code></pre>
<p>In the edit menu move the
<code>"EMPTY commit: will absorb relevant formatting changes"</code> entry from
last line of the list to the first line.</p></li>
<li><p>Get files in the branch affected by the formatting change:</p>
<p>The formatting change is <code>667d42c00d566e091e6b9a19b365099315d0e611</code>.</p>
<pre><code>$ FORMATTED_FILES=$(git diff --name-only \
    667d42c00d566e091e6b9a19b365099315d0e611^..667d42c00d566e091e6b9a19b365099315d0e611 \
    -- $(git diff --name-only origin/staging...staging) | tr $'\n' ' ')</code></pre>
<p>This will populate <code>FORMATTED_FILES</code> shell variable with affected
files.</p></li>
<li><p>Reformat the <code>$FORMATTED_FILES</code> files:</p>
<pre><code>$ FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --tree-filter &quot;nixfmt $FORMATTED_FILES&quot; -- $(git merge-base origin/staging staging)..
...
Rewrite 6fc0a951e9b7a7e3f80628ca0a6c4c9f54fd2dd6 (56/327) (65 seconds passed, remaining 314 predicted)
...
Rewrite c20df82da66da6521f355af508bfedc047cffa64 (326/326) (1183 seconds passed, remaining 0 predicted)
Ref 'refs/heads/staging' was rewritten</code></pre>
<p>This command will populate our empty commit with the reformatting changes
and rebase the rest of the commits on top of it without manual intervention.</p></li>
<li><p>Rebase past the formatting as usual:</p>
<pre><code>$ git rebase -i</code></pre>
<p>Here <code>git rebase -i</code> will tell you that the first commit became empty.
You can either skip it or keep it as an empty commit. I skipped it with
<code>git rebase --skip</code>.</p></li>
</ol>
<p>Done!</p>
<p>Once I executed the above I got just one trivial conflict unrelated to
reformatting.</p>
<h2 id="parting-words">parting words</h2>
<p><code>git filter-branch --tree-filter</code> is a great tool to mangle the
repository! But before using it make sure you back up your local tree: it’s
very easy to get it to “destroy” all your work (<code>git reflog</code> will still
be able to save your past commits).</p>
<p>It took <code>git filter-branch --tree-filter</code> about 20 minutes to rebase
<code>326</code> commits that touch ~200 files. My understanding is that most of the
time is spent in the <code>nixfmt</code> utility itself and not in <code>git</code> operations.
<code>nixfmt</code> is not very fast: it takes about a minute to reformat the whole
of <code>nixpkgs</code> (<code>~300MB</code> of <code>.nix</code> files).</p>
<p><code>nixpkgs</code> plans to reformat even more sources in the future. I will likely
be using this tip a few more times.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>C union initialization and gcc-15</title>
    <link href="https://trofi.github.io/posts/328-c-union-init-and-gcc-15.html" />
    <id>https://trofi.github.io/posts/328-c-union-init-and-gcc-15.html</id>
    <published>2024-12-01T00:00:00Z</published>
    <updated>2024-12-01T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="a-contrived-example">a contrived example</h2>
<p>Let’s start with a quiz. What do you think this program will print:</p>
<pre class="c"><code>#include &lt;stdio.h&gt;

__attribute__((noipa)) static void use_stack(void) {
    volatile int foo[] = { 0x40, 0x41, 0x42, 0x43, };
}

__attribute__((noipa)) static int do_it(void) {
    // use 'volatile' to inhibit constant propagation
    volatile union {
        int dummy;
        struct { int fs[4]; } s;
    } v = { 0 };
    return v.s.fs[3];
}

int main(void) {
    use_stack();
    int r = do_it();
    printf(&quot;v.s:\n&quot;);
    printf(&quot;  .fs[3] = %#08x\n&quot;, r);
}</code></pre>
<p>The program initializes the <code>v</code> union with <code>{ 0 }</code>, which should be
equivalent to <code>v.dummy = 0;</code>. Then the program accesses <code>v.s.fs[3]</code>.
That element does not overlap in memory with <code>v.dummy</code>. What should it
do?</p>
<p>One of the possible answers is: <code>v.s.fs[3]</code> is a garbage value.</p>
<p>Let’s try to run it on <code>gcc-14</code>:</p>
<pre><code>$ gcc-14 a.c -o a -O2 &amp;&amp; ./a
v.s:
  .fs[3] = 00000000</code></pre>
<p>The value is all zeros. Is it a coincidence? <code>valgrind</code> does not
complain either. Let’s have a peek at the disassembly dump:</p>
<pre class="asm"><code>; $ objdump --no-addresses --no-show-raw-insn -d a
&lt;use_stack&gt;:
        movdqa 0xea8(%rip),%xmm0 ; load the constant from memory
        movaps %xmm0,-0x18(%rsp) ; store the constant on stack
        ret
        xchg   %ax,%ax

&lt;do_it&gt;:
        pxor   %xmm0,%xmm0       ; zero-initialize 16 bytes
        movaps %xmm0,-0x18(%rsp) ; store all 16 bytes of zeros on stack
        mov    -0xc(%rsp),%eax   ; read 32-bits of zeros (part of 16-byte
                                 ; zeroing one line above)
        ret</code></pre>
<p><code>gcc-14</code> implements <code>v = { 0 };</code> as a 128-bit (16-byte)
zero initialization of <code>sizeof(v)</code> via
<code>pxor %xmm0,%xmm0; movaps %xmm0,-0x18(%rsp)</code>.</p>
<p>How about <code>gcc-15</code>?</p>
<pre><code>$ gcc a.c -o a -O2 &amp;&amp; ./a
v.s:
  .fs[3] = 0x000043</code></pre>
<p>Whoops. That is clearly an uninitialized value left over from the <code>use_stack()</code>
execution. <code>valgrind</code> is also not happy about it:</p>
<pre><code>$ valgrind --quiet --track-origins=yes ./a
v.s:
Use of uninitialised value of size 8
   at 0x48B954A: _itoa_word (in ...-glibc-2.40-36/lib/libc.so.6)
   by 0x48C43EB: __printf_buffer (in ...-glibc-2.40-36/lib/libc.so.6)
   by 0x48C6300: __vfprintf_internal (in ...-glibc-2.40-36/lib/libc.so.6)
   by 0x48BA71E: printf (in ...-glibc-2.40-36/lib/libc.so.6)
   by 0x401074: main (a.c:20)
 Uninitialised value was created by a stack allocation
   at 0x401190: do_it (a.c:12)</code></pre>
<p>Disassembly:</p>
<pre><code>; $ objdump --no-addresses --no-show-raw-insn -d a
&lt;use_stack&gt;:
        movabs $0x4100000040,%rax ; load 64-bit part 1
        movabs $0x4300000042,%rdx ; load 64-bit part 2
        mov    %rax,-0x18(%rsp)   ; store part 1 on stack
        mov    %rdx,-0x10(%rsp)   ; store part 2 on stack
        ret
        nop

&lt;do_it&gt;:
        movl   $0x0,-0x18(%rsp)   ; zero-initialize first 32 bits of a union
        mov    -0xc(%rsp),%eax    ; read uninitialized 32-bit value at 12-byte
                                  ; offset from a union start
        ret</code></pre>
<p><code>gcc-15</code> implements <code>v = { 0 }</code> as a single 32-bit store as if it was
<code>v.dummy = 0;</code> and leaves the rest of the union intact.</p>
<p>Is it a bug?</p>
<p><code>gcc-15</code> intentionally changed the initialization to zero less memory in
<a href="https://gcc.gnu.org/PR116416"><code>PR116416</code></a> with
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=0547dbb725b6d8e878a79e28a2e171eafcfbc1aa">this commit</a>
in order to generate more optimal code.</p>
<p>Fun fact: the patch also adds a <code>-fzero-init-padding-bits=unions</code>
option to enable the old behavior.</p>
<h2 id="the-real-bug">the real bug</h2>
<p>The above example might sound theoretical, but I extracted it from an
<code>mbedtls</code> test suite failure. After a recent <code>gcc-15</code> update the tests
are now failing as:</p>
<pre><code>The following tests FAILED:
        91 - psa_crypto-suite (Failed)
       113 - psa_crypto_storage_format.v0-suite (Failed)</code></pre>
<p>I initially thought it was a compiler bug related to arithmetic. But
exploring the failing test I found the following pattern:</p>
<pre class="c"><code>// at tests/src/psa_exercise_key.c:
  psa_mac_operation_t operation = PSA_MAC_OPERATION_INIT;

// library/psa_crypto.c:

  if (operation.hash_ctx.id != 0) { return error; }
  //...

// include/psa/crypto_struct.h:
  #define PSA_MAC_OPERATION_INIT { 0, 0, 0, { 0 } }

// include/psa/crypto.h:
  typedef struct psa_mac_operation_s psa_mac_operation_t;

// include/psa/crypto_struct.h:
  struct psa_mac_operation_s {
    unsigned int id;
    uint8_t mac_size;
    unsigned int is_sign : 1;
    psa_driver_mac_context_t ctx;
  };

// include/psa/crypto_driver_contexts_composites.h:
  typedef union {
    unsigned dummy; /* Make sure this union is always non-empty */
    mbedtls_psa_mac_operation_t mbedtls_ctx;
  } psa_driver_mac_context_t;

// include/psa/crypto_builtin_composites.h:
  typedef struct {
    psa_algorithm_t alg;
    union {
        unsigned dummy; /* Make the union non-empty even with no supported algorithms. */
        mbedtls_psa_hmac_operation_t hmac;
        mbedtls_cipher_context_t cmac;
    } ctx;
  } mbedtls_psa_mac_operation_t;

// include/psa/crypto_builtin_composites.h
  typedef struct {
    /** The HMAC algorithm in use */
    psa_algorithm_t alg;
    /** The hash context. */
    struct psa_hash_operation_s hash_ctx;
    /** The HMAC part of the context. */
    uint8_t opad[PSA_HMAC_MAX_HASH_BLOCK_SIZE];
  } mbedtls_psa_hmac_operation_t;

// include/psa/crypto_types.h
  typedef uint32_t psa_algorithm_t;

// include/psa/crypto_struct.h
  struct psa_hash_operation_s {
    /** Unique ID indicating which driver got assigned to do the
     * operation. Since driver contexts are driver-specific, swapping
     * drivers halfway through the operation is not supported.
     * ID values are auto-generated in psa_driver_wrappers.h.
     * ID value zero means the context is not valid or not assigned to
     * any driver (i.e. the driver context is not active, in use). */
    unsigned int id;
    psa_driver_hash_context_t ctx;
  };</code></pre>
<p>It’s quite a bit of indirection, but if we compress it into a single
<code>struct</code> definition and remove the irrelevant bits we get something
like this:</p>
<pre class="c"><code>  struct {
    unsigned int id; // initialized below
    uint8_t mac_size; // initialized below
    unsigned int is_sign : 1; // initialized below
    union {
      unsigned dummy; // initialized below
      struct {
        uint32_t alg; // initialized below, alias of `dummy`

        // anything below is NOT initialized

        union {
          unsigned dummy;
          struct {
              uint32_t alg;
              struct {
                  unsigned int id; // &lt;- we are about to use this field
                  psa_driver_hash_context_t ctx;
              } hash_ctx;
              uint8_t opad[PSA_HMAC_MAX_HASH_BLOCK_SIZE];
          } hmac;
          // ..
       } mbedtls_ctx;
    } ctx;
  } operation = {
    0, // id
    0, // mac_size
    0, // is_sign
    { 0 } // ctx.dummy
  };

  if (operation.hash_ctx.id != 0) { return error; }</code></pre>
<p><code>valgrind</code> complains about the use of an uninitialized value as:</p>
<pre><code>$ valgrind --track-origins=yes --trace-children=yes --num-callers=50 --track-fds=yes --leak-check=full --show-reachable=yes --malloc-fill=0xE1 --free-fill=0xF1 tests/test_suite_psa_crypto
...
==2758824== Conditional jump or move depends on uninitialised value(s)
==2758824==    at 0x483C6B: psa_hash_setup (psa_crypto.c:2298)
==2758824==    by 0x490ADA: psa_hmac_setup_internal (psa_crypto_mac.c:90)
==2758824==    by 0x490ADA: psa_mac_setup (psa_crypto_mac.c:299)
==2758824==    by 0x48412C: psa_driver_wrapper_mac_sign_setup (psa_crypto_driver_wrappers.h:2297)
==2758824==    by 0x48412C: psa_mac_setup (psa_crypto.c:2619)
==2758824==    by 0x4083BC: test_mac_key_policy (test_suite_psa_crypto.function:2192)
==2758824==    by 0x408877: test_mac_key_policy_wrapper (test_suite_psa_crypto.function:2264)
==2758824==    by 0x429F4E: dispatch_test (main_test.function:170)
==2758824==    by 0x42A813: execute_tests (host_test.function:676)
==2758824==    by 0x40247A: main (main_test.function:263)
==2758824==  Uninitialised value was created by a stack allocation
==2758824==    at 0x40822C: test_mac_key_policy (test_suite_psa_crypto.function:2167)</code></pre>
<p>Unfortunately I don’t think there is a simple fix for that (apart from
enabling the new <code>-fzero-init-padding-bits=unions</code> compiler flag if it’s
supported).
I filed the issue upstream as
<a href="https://github.com/Mbed-TLS/mbedtls/issues/9814"><code>Issue #9814</code></a> hoping to
get some guidance.</p>
<h2 id="parting-words">parting words</h2>
<p><code>gcc-15</code> will be more efficient at handling partial union initialization.</p>
<p>It will likely come at the expense of exposing real code bugs like the
<a href="https://github.com/Mbed-TLS/mbedtls/issues/9814"><code>mbedtls</code></a> one to
users. It’s a bit scary to discover this first in a security library.</p>
<p>At least <code>valgrind</code> is able to detect trivial cases of uninitialized
use of partially initialized unions.</p>
<p><code>gcc-15</code> also provides <code>-fzero-init-padding-bits=unions</code> to flip the old
behavior back on. This will allow nailing down bugs using a single
compiler version instead of comparing to <code>gcc-14</code>.</p>
<p>I suspect <code>gcc</code> historically zero-initialized whole unions to stay
closer to the incomplete <code>struct</code> initialization
<a href="https://en.cppreference.com/w/c/language/struct_initialization">rule</a>.
But now that causes performance problems when the union members differ
in size.</p>
<p>I suspect we’ll see a few more projects affected by this change.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>ski 1.5.0 is out</title>
    <link href="https://trofi.github.io/posts/327-ski-1.5.0-is-out.html" />
    <id>https://trofi.github.io/posts/327-ski-1.5.0-is-out.html</id>
    <published>2024-11-23T00:00:00Z</published>
    <updated>2024-11-23T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>TL;DR: <a href="https://github.com/trofi/ski/releases/tag/v1.5.0"><code>ski-1.5.0</code></a> is
available for download!</p>
<p>It is primarily a maintenance release that completely removes the <code>motif</code> and
<code>gtk</code> backends, fixes building against <code>C23</code> toolchains and adds a small
<code>-initramfs</code> option to supply a separate <code>initramfs</code> file to be used along
with the emulated kernel.</p>
<p>The <a href="https://trofi.github.io/posts/255-ski-1.4.0-is-out.html"><code>1.4.0</code></a> announcement has a few hints
on how to run it.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc-15 switched to C23</title>
    <link href="https://trofi.github.io/posts/326-gcc-15-switched-to-c23.html" />
    <id>https://trofi.github.io/posts/326-gcc-15-switched-to-c23.html</id>
    <published>2024-11-17T00:00:00Z</published>
    <updated>2024-11-17T00:00:00Z</updated>
<summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>In November <code>gcc</code>
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=55e3bd376b2214e200fa76d12b67ff259b06c212">merged</a>
the switch from <code>C17</code> (<code>-std=gnu17</code>) to <code>C23</code> (<code>-std=c23</code>) language
standard used by default for <code>C</code> code.
This will cause quite a few build failures in projects written in
<code>C</code>. A few example fixes:</p>
<ul>
<li><a href="https://github.com/libffi/libffi/pull/861/files"><code>libffi</code></a>: optional
<code>va_start</code> parameter.</li>
<li><a href="https://github.com/vapier/ncompress/pull/40/files"><code>ncompress</code></a>:
<code>void foo()</code> changed the meaning to <code>void foo(void)</code>.</li>
<li><a href="https://lore.kernel.org/ell/20241117001814.2149181-1-slyich@gmail.com/T/#t"><code>ell</code></a>
<code>bool</code>, <code>true</code> and <code>false</code> are new keywords. And specifically <code>false</code>
is not equal to <code>0</code> or <code>NULL</code>.</li>
</ul>
<h2 id="more-words">more words</h2>
<p><code>C23</code> has a few high-visibility breaking changes compared to <code>C17</code>.</p>
<h3 id="bool-true-and-false-are-unconditionally-defined-now"><code>bool</code>, <code>true</code>, and <code>false</code> are unconditionally defined now</h3>
<p><code>true</code> and <code>false</code> are now predefined constants (instead of being a
part of <code>&lt;stdbool.h&gt;</code> macros and <code>typedefs</code>). Thus, code like below does
not compile any more:</p>
<pre class="c"><code>enum { false = 0 };
typedef int bool;</code></pre>
<p>Error messages:</p>
<pre><code>$ printf 'enum { false = 0 };' | gcc -std=c17 -c -x c -
$ printf 'enum { false = 0 };' | gcc -c -x c -
&lt;stdin&gt;:1:8: error: expected identifier before 'false'

$ printf 'typedef int bool;' | gcc -std=c17 -c -x c -
$ printf 'typedef int bool;' | gcc -c -x c -
&lt;stdin&gt;:1:13: error: two or more data types in declaration specifiers
&lt;stdin&gt;:1:1: warning: useless type name in empty declaration</code></pre>
<p>The fix is usually to use <code>&lt;stdbool.h&gt;</code> or to avoid the name collisions.
An example of an affected project is <code>linux</code>.</p>
<h3 id="partially-defined-int-function-prototypes-are-just-int-void-now">partially defined <code>int (*)()</code> function prototypes are just <code>int (*)(void)</code> now</h3>
<p>This one is trickier to fix when intentionally used. <code>C</code> happened to
allow the following code:</p>
<pre class="c"><code>// $ cat a.c
typedef int (*PF)();

static int f0(void)  { return 42; }
static int f1(int a) { return 42 + a; }

int main() {
    PF pf;

    // 0-argument function pointer
    pf = f0;
    pf();

    // 1-argument function pointer
    pf = f1;
    pf(42);

    // 3-argument function pointer: an odd one, but happens to work
    pf(42,42,42);
}</code></pre>
<p>But not any more:</p>
<pre><code>$ gcc -std=c17 -c a.c
$ gcc -c a.c
a.c: In function 'main':
a.c:15:8: error: assignment to 'PF' {aka 'int (*)(void)'} from incompatible pointer type 'int (*)(int)' [-Wincompatible-pointer-types]
   15 |     pf = f1;
      |        ^
a.c:16:5: error: too many arguments to function 'pf'
   16 |     pf(42);
      |     ^~
a.c:19:5: error: too many arguments to function 'pf'
   19 |     pf(42,42,42);
      |     ^~</code></pre>
<p>This hack is used intentionally at least in <code>ski</code>, <code>ghc</code> and <code>ncompress</code>. But more
frequently its use is accidental (<code>ell</code>, <code>iwd</code>, <code>bash</code> and a few others).</p>
<h2 id="parting-words">parting words</h2>
<p>Quick quiz: the above changes look like they tickle some very obscure
case. How many packages are affected on a typical desktop system? What
would be your guess? 1? 5? 100? 1000?</p>
<p>So far on my system (~2000 installed packages) I observed the failures
of the following projects:</p>
<ul>
<li><code>linux</code></li>
<li><code>speechd</code></li>
<li><code>vde2</code></li>
<li><code>sane-backends</code></li>
<li><code>timidity</code></li>
<li><code>neovim</code></li>
<li><code>bluez</code></li>
<li><code>samba</code></li>
<li><code>weechat</code></li>
<li><code>iwd</code></li>
<li><code>protobuf</code></li>
<li><code>netpbm</code></li>
<li><code>mariadb-connector-c</code></li>
<li><code>liblqr1</code></li>
<li><code>sqlite-odbc-driver</code></li>
<li><code>python:typed-ast</code></li>
<li><code>python2</code></li>
<li><code>perl:XS-Parse-Keyword</code></li>
<li><code>pgpdump</code></li>
<li><code>ell</code></li>
<li><code>SDL-1</code></li>
<li><code>ruby-3.1</code></li>
<li><code>dnsmasq</code></li>
<li><code>ghc</code></li>
<li><code>gnupg</code></li>
<li><code>ghostscript</code></li>
<li><code>procmail</code></li>
<li><code>jq</code></li>
<li><code>libsndfile</code></li>
<li><code>ppp</code></li>
<li><code>time</code></li>
<li><code>postfix</code></li>
<li><code>mcpp</code></li>
<li><code>xmlrpc-c</code></li>
<li><code>unifdef</code></li>
<li><code>hotdoc</code></li>
<li><code>mypy</code></li>
<li><code>rustc</code></li>
<li><code>xorg:libXt</code></li>
<li><code>rsync</code></li>
<li><code>oniguruma</code></li>
<li><code>ltrace</code></li>
<li><code>sudo</code></li>
<li><code>lsof</code></li>
<li><code>lv</code></li>
<li><code>dbus-glib</code></li>
<li><code>argyllcms</code></li>
<li><code>valgrind</code></li>
<li><code>postgresql-14</code></li>
<li><code>gdb</code></li>
<li><code>git</code></li>
<li><code>ncompress</code></li>
<li><code>w3m</code></li>
<li><code>freeglut</code></li>
<li><code>xcur2png</code></li>
<li><code>vifm</code></li>
<li><code>p11-kit</code></li>
<li><code>cyrus-sasl</code></li>
<li><code>xvidcore</code></li>
<li><code>guile</code></li>
<li><code>editline</code></li>
<li><code>e2fsprogs</code></li>
<li><code>gsm</code></li>
<li><code>libconfig</code></li>
<li><code>db</code></li>
<li><code>libtirpc</code></li>
<li><code>nghttp2</code></li>
<li><code>libkrb5</code></li>
<li><code>libgpg-error</code></li>
<li><code>cpio</code></li>
<li><code>sharutils</code></li>
<li><code>gpm</code></li>
<li><code>expect</code></li>
<li><code>ncurses</code></li>
<li><code>yasm</code></li>
<li><code>texinfo-6.7</code></li>
<li><code>gettext</code></li>
<li><code>unzip</code></li>
<li><code>gdbm</code></li>
<li><code>m4</code></li>
<li><code>binutils</code></li>
<li><code>ed</code></li>
<li><code>gmp</code></li>
<li><code>bash</code></li>
</ul>
<p>That’s more than 80 packages, or about 4% of all the packages I have
installed.</p>
<p>Looks like <code>gcc-15</code> will be a disruptive release (just like <code>gcc-14</code>)
that will require quite a few projects to adapt to new requirements
(either by fixing code or by slapping <code>-std=gnu17</code> as a requirement).</p>
<p>Most of the build failures above are not yet fixed upstream. They could make
good first contributions if you are thinking of making one.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>Zero Hydra Failures towards 24.11 NixOS release</title>
    <link href="https://trofi.github.io/posts/325-Zero-Hydra-Failures-towards-24.11-NixOS-release.html" />
    <id>https://trofi.github.io/posts/325-Zero-Hydra-Failures-towards-24.11-NixOS-release.html</id>
    <published>2024-11-04T00:00:00Z</published>
    <updated>2024-11-04T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p><code>ZHF</code> (or Zero Hydra Failures) is the time when most build failures are
squashed before the final <code>NixOS-24.11</code> release
(see <a href="https://github.com/NixOS/nixpkgs/issues/352882">full release schedule</a>).</p>
<p>To follow the tradition let’s fix one bug for <code>ZHF</code>.
I picked the <a href="https://hydra.nixos.org/build/276690936"><code>xorg.libAppleWM</code></a> build
failure. It’s not a very popular package.
The failure looks trivial:</p>
<pre><code>make[2]: Entering directory '/build/libapplewm-be972ebc3a97292e7d2b2350eff55ae12df99a42/src'
  CC       applewm.lo
gcc: error: unrecognized command-line option '-iframeworkwithsysroot'</code></pre>
<p>The build was happening for the <code>x86_64-linux</code> target, while this package
is <code>MacOS</code>-specific: it uses Darwin APIs and links to its libraries
directly. There is no reason to try to build it on <code>x86_64-linux</code>.
The fix is to constrain the package to <code>darwin</code> targets (the default
platform set for <code>xorg</code> packages is <code>unix</code>):</p>
<pre class="diff"><code>--- a/pkgs/servers/x11/xorg/overrides.nix
+++ b/pkgs/servers/x11/xorg/overrides.nix
@@ -171,6 +171,9 @@ self: super:
   libAppleWM = super.libAppleWM.overrideAttrs (attrs: {
     nativeBuildInputs = attrs.nativeBuildInputs ++ [ autoreconfHook ];
     buildInputs =  attrs.buildInputs ++ [ xorg.utilmacros ];
+    meta = attrs.meta // {
+      platforms = lib.platforms.darwin;
+    };
   });

   libXau = super.libXau.overrideAttrs (attrs: {</code></pre>
<p>This fix is now known as
<a href="https://github.com/NixOS/nixpkgs/pull/353618"><code>PR#353618</code></a>.</p>
<h2 id="parting-words">Parting words</h2>
<p>I picked a very lazy example of a broken package.
<a href="https://github.com/NixOS/nixpkgs/issues/352882" class="uri">https://github.com/NixOS/nixpkgs/issues/352882</a> contains more links and
hints on how to find and fix known failures.</p>
<p>As usual contributing towards <code>ZHF</code> is very easy. Give it a try!</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>xmms2 0.9.4 is out</title>
    <link href="https://trofi.github.io/posts/324-xmms2-0.9.4-is-out.html" />
    <id>https://trofi.github.io/posts/324-xmms2-0.9.4-is-out.html</id>
    <published>2024-10-07T00:00:00Z</published>
    <updated>2024-10-07T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>TL;DR: <code>xmms2-0.9.4</code> is out and you can get it at
<a href="https://github.com/xmms2/xmms2-devel/releases/tag/0.9.4" class="uri">https://github.com/xmms2/xmms2-devel/releases/tag/0.9.4</a>!</p>
<p><a href="https://github.com/xmms2"><code>xmms2</code></a> is still a music player
daemon with various plugins to support stream decoding and
transformation. See
<a href="https://trofi.github.io/posts/244-xmms2-0.9.1-is-out.html">older announcement</a> on how to
get started with <code>xmms2</code>.</p>
<h2 id="highlights">Highlights</h2>
<p>It’s a small maintenance release. The only notable change is support for
<code>ffmpeg-7</code> as a build dependency.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc-15 bugs, pile 1</title>
    <link href="https://trofi.github.io/posts/323-gcc-15-bugs-pile-1.html" />
    <id>https://trofi.github.io/posts/323-gcc-15-bugs-pile-1.html</id>
    <published>2024-08-25T00:00:00Z</published>
    <updated>2024-08-25T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>About 4 months have passed since <code>gcc-14.1.0</code> release. Around the same
time <code>gcc-15</code> development has started and a few major changes were
merged into the <code>master</code> development branch.</p>
<h2 id="summary">summary</h2>
<p>This time I waited to collect about 20 bug reports I encountered:</p>
<ul>
<li><a href="https://gcc.gnu.org/PR114933"><code>c++/114933</code></a>: <code>mcfgthread-1.6.1</code>
type check failure. Ended up being <code>mcfgthread</code> bug caused by stronger
<code>gcc</code> checks.</li>
<li><a href="https://gcc.gnu.org/PR114872"><code>tree-optimization/114872</code></a>: <code>sagemath</code>
<code>SIGSEGV</code>ed due to broken assumptions around <code>setjmp()</code> / <code>longjmp()</code>.
Not a <code>gcc</code> bug either.</li>
<li><a href="https://gcc.gnu.org/PR115115"><code>target/115115</code></a>: <code>highway-1.0.7</code> test
suite expected too specific <code>_mm_cvttps_epi32()</code> semantics. A <code>gcc-12</code>
regression!</li>
<li><a href="https://gcc.gnu.org/PR115146"><code>target/115146</code></a>: <code>highway-1.0.7</code> test
suite exposed <code>gcc-15</code> bug in vectoring <code>bswap16()</code>-like code.</li>
<li><a href="https://gcc.gnu.org/PR115227"><code>tree-optimization/115227</code></a>: <code>libepoxy</code>,
<code>p11-kit</code> and <code>doxygen</code> can’t fit into the address space of 32-bit <code>gcc</code>
due to a memory leak in the value range propagation subsystem.</li>
<li><a href="https://gcc.gnu.org/PR115397"><code>target/115397</code></a>: <code>numpy</code> ICE for <code>-m32</code>:
<code>gcc</code> code generator generated a constant pool memory reference and
crashed in instruction selection.</li>
<li><a href="https://gcc.gnu.org/PR115403"><code>c++/115403</code></a>: <code>highway</code> build failure
due to wrong scope handling of <code>#pragma GCC target</code> by <code>gcc</code>.</li>
<li><a href="https://gcc.gnu.org/PR115602"><code>tree-optimization/115602</code></a>:
<code>liblapack-3.12.0</code> ICE in <code>slp</code> pass. <code>gcc</code> generated a self-reference
cycle after applying common sub-expression elimination.</li>
<li><a href="https://gcc.gnu.org/PR115655"><code>bootstrap/115655</code></a>: <code>gcc</code> bootstrap
failure on <code>-Werror=unused-function</code>.</li>
<li><a href="https://gcc.gnu.org/PR115797"><code>libstdc++/115797</code></a>: <code>gcc</code> failed to
compile <code>extern "C" { #include &lt;math.h&gt; }</code> code. <code>&lt;math.h&gt;</code> was fixed
to survive such imports.</li>
<li><a href="https://gcc.gnu.org/PR115863"><code>middle-end/115863</code></a>: wrong code on
<code>zlib</code> when handling saturated logic. A bug in truncation handling.</li>
<li><a href="https://gcc.gnu.org/PR115916"><code>rtl-optimization/115916</code></a>: wrong code on
<code>highway</code>. Bad arithmetic shift <code>ubsan</code>-related fix in <code>gcc</code>’s own code.</li>
<li><a href="https://gcc.gnu.org/PR115961"><code>middle-end/115961</code></a>: wrong code on <code>llvm</code>,
bad bit field truncation handling for sub-byte bitfield sizes. Saturated
truncation arithmetics handling was applied too broadly.</li>
<li><a href="https://gcc.gnu.org/PR115991"><code>tree-optimization/115991</code></a>: ICE on
<code>linux-6.10</code>. Caused by too broad acceptance of sub-register use in an
instruction. Ended up selecting invalid instructions.</li>
<li><a href="https://gcc.gnu.org/PR116037"><code>rtl-optimization/116037</code></a>: <code>python3</code>
hang due to an <code>-fext-dce</code> bug.</li>
<li><a href="https://gcc.gnu.org/PR116200"><code>rtl-optimization/116200</code></a>: crash during
<code>gcc</code> bootstrap, wrong code on <code>libgcrypt</code>. A bug in <code>RTL</code> constant pool
handling.</li>
<li><a href="https://gcc.gnu.org/PR116353"><code>rtl-optimization/116353</code></a>: ICE on
<code>glibc-2.39</code>. Another <code>RTL</code> bug where <code>gcc</code> instruction selector was
presented with invalid value reference.</li>
<li><a href="https://gcc.gnu.org/PR116411"><code>middle-end/116411</code></a>: ICE on
<code>readline-8.2p13</code>. Conditional operation was incorrectly optimized for
some of built-in functions used in branches.</li>
<li><a href="https://gcc.gnu.org/PR116412"><code>tree-optimization/116412</code></a>: ICE on
<code>openblas-0.3.28</code>. Similar to the above: conditional operation was
incorrectly optimized for complex types.</li>
</ul>
<h2 id="fun-bug">fun bug</h2>
<p>The <a href="https://gcc.gnu.org/PR115863"><code>zlib</code> bug</a> is probably the most
unusual one. Due to a typo in a newly introduced set of optimizations
<code>gcc</code> managed to convert <code>a &gt; b ? b : a</code> expressions into an
equivalent of <code>b &gt; a ? b : a</code>. But it did so only for <code>b = INT_MAX</code>
style arguments (the saturation case).</p>
<p>As a result it only broke the <code>zlib</code> test suite, which specifically tests
that out-of-range accesses cause <code>SIGSEGV</code>. For well-behaved inputs it never
caused any problems. The <code>gcc</code> fix
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=aae535f3a870659d1f002f82bd585de0bcec7905">was trivial</a>:</p>
<pre class="diff"><code>--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9990,7 +9990,7 @@
   rtx sat = force_reg (DImode, GEN_INT (GET_MODE_MASK (&lt;MODE&gt;mode)));
   rtx dst;

-  emit_insn (gen_cmpdi_1 (op1, sat));
+  emit_insn (gen_cmpdi_1 (sat, op1));

   if (TARGET_CMOVE)
     {
@@ -10026,7 +10026,7 @@
   rtx sat = force_reg (SImode, GEN_INT (GET_MODE_MASK (&lt;MODE&gt;mode)));
   rtx dst;

-  emit_insn (gen_cmpsi_1 (op1, sat));
+  emit_insn (gen_cmpsi_1 (sat, op1));

   if (TARGET_CMOVE)
     {
@@ -10062,7 +10062,7 @@
   rtx sat = force_reg (HImode, GEN_INT (GET_MODE_MASK (QImode)));
   rtx dst;

-  emit_insn (gen_cmphi_1 (op1, sat));
+  emit_insn (gen_cmphi_1 (sat, op1));

   if (TARGET_CMOVE)
     {</code></pre>
<p>The fix swaps the argument order to restore the original intent.</p>
<h2 id="histograms">histograms</h2>
<p>Where did most <code>gcc</code> bugs come from?</p>
<ul>
<li><code>tree-optimization</code>: 4</li>
<li><code>rtl-optimization</code>: 4</li>
<li><code>middle-end</code>: 3</li>
<li><code>target</code>: 3</li>
<li><code>c++</code>: 1</li>
<li><code>bootstrap</code>: 1</li>
<li><code>libstdc++</code>: 1</li>
</ul>
<p>As usual <code>tree-optimization</code> is the subsystem causing the most trouble.
But this time <code>rtl-optimization</code> came close to it as well.</p>
<p><code>highway</code> managed to yield 4 new bugs while <code>llvm</code> gave us just one
new bug.</p>
<h2 id="parting-words">parting words</h2>
<p><code>gcc-15</code> got a few very nice optimizations (and bugs) related to
saturated truncation, zero/sign-extension elimination, and constant folding
in <code>RTL</code>.</p>
<p>I saw at least 5 bugs related to wrong code generation (I’m also
slowly reducing another one in the background). <code>middle-end</code> ones
were easy to reduce and explore, <code>RTL</code> ones were very elusive.</p>
<p>The most disruptive change is probably a removal of <code>#include &lt;cstdint&gt;</code>
from one of <code>libstdc++</code> headers. That requires quite a few upstream
fixes to add missing headers (<a href="https://github.com/google/cppdap/pull/133"><code>cppdap</code></a>,
<a href="https://github.com/google/woff2/pull/176"><code>woff2</code></a>,
<a href="https://github.com/silnrsi/graphite/pull/91"><code>graphite</code></a>,
<a href="https://github.com/KhronosGroup/glslang/pull/3684"><code>glslang</code></a>,
<a href="https://github.com/widelands/widelands/pull/6522"><code>widelands</code></a>,
<a href="https://github.com/wesnoth/wesnoth/pull/9250"><code>wesnoth</code></a> and many others).</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc-15 template checking improvements</title>
    <link href="https://trofi.github.io/posts/322-gcc-15-template-checking-improvements.html" />
    <id>https://trofi.github.io/posts/322-gcc-15-template-checking-improvements.html</id>
    <published>2024-07-22T00:00:00Z</published>
    <updated>2024-07-22T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>On 18 Jul <code>gcc</code>
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=313afcfdabeab3e6705ac0bd1273627075be0023">merged</a>
extended correctness checks for template functions. This will cause some
incorrect unused code to fail to compile. Consider fixing or deleting
the code. I saw at least two projects affected by it:</p>
<ul>
<li><a href="https://github.com/GNUAspell/aspell/pull/650"><code>aspell</code></a></li>
<li><a href="https://sourceforge.net/p/mjpeg/patches/63/"><code>mjpegtools</code></a></li>
</ul>
<h2 id="more-words">more words</h2>
<p><code>c++</code> is a complex language with a statically checked type system.
Most of the time checking type correctness is easy for both humans and
the compiler. But sometimes it’s less trivial. Namespaces and function
arguments can bring various declarations into scope. Template code
splits a single definition point into two: the template definition point
and the template instantiation point.</p>
<p>Let’s look at a simple example:</p>
<pre class="cpp"><code>template &lt;typename T&gt; struct S {
    int foo(void) { return bar(); }
};

int bar() { return 42; }

int main() {
    S&lt;int&gt; v;
    return v.foo();
}</code></pre>
<p>This fails to build on all recent <code>gcc</code> as:</p>
<pre><code>$ g++ -c a.cc
a.cc: In member function 'int S&lt;T&gt;::foo()':
a.cc:2:28: error: there are no arguments to 'bar' that depend on a
  template parameter, so a declaration of 'bar' must be available [-fpermissive]
    2 |     int foo(void) { return bar(); }
      |                            ^~~
a.cc:2:28: note: (if you use '-fpermissive', G++ will accept your code,
  but allowing the use of an undeclared name is deprecated)</code></pre>
<p><code>gcc</code> really wants <code>bar</code> to be visible at template instantiation
time. But what if we don’t call <code>foo</code> at all?</p>
<pre class="cpp"><code>template &lt;typename T&gt; struct S {
    int foo(void) { return bar(); }
};

int main() {}</code></pre>
<p>Still fails the same:</p>
<pre><code>$ g++ -c a.cc
a.cc: In member function 'int S&lt;T&gt;::foo()':
a.cc:2:28: error: there are no arguments to 'bar' that depend on a
  template parameter, so a declaration of 'bar' must be available [-fpermissive]
    2 |     int foo(void) { return bar(); }
      |                            ^~~
a.cc:2:28: note: (if you use '-fpermissive', G++ will accept your code,
  but allowing the use of an undeclared name is deprecated)</code></pre>
<p>That is neat: even if you never try to instantiate a function <code>gcc</code>
still tries to do basic checks on it.</p>
<p>But what if we call <code>foo()</code> via <code>this</code> pointer explicitly?</p>
<pre class="cpp"><code>template &lt;typename T&gt; struct S {
    int foo(void) { return this-&gt;bar(); }
};

int main() {}</code></pre>
<p>Is it valid <code>c++</code>?</p>
<p><code>gcc-14</code> says it’s fine:</p>
<pre><code>$ g++-14 -c a.cc
&lt;ok&gt;</code></pre>
<p>Is there a way to somehow make <code>bar()</code> available via <code>this</code>? Maybe, via
inheritance? Apparently, no. <code>gcc-15</code> now flags the code above as
unconditionally invalid:</p>
<pre><code>$ g++-15 -c a.cc
a.cc: In member function 'int S&lt;T&gt;::foo()':
a.cc:2:34: error: 'struct S&lt;T&gt;' has no member named 'bar'
    2 |     int foo(void) { return this-&gt;bar(); }
      |                                  ^~~</code></pre>
<p>To get it to work you need something like a
<a href="https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern#Static_polymorphism"><code>CRTP</code></a>
pattern:</p>
<pre class="cpp"><code>// Assume Derived::bar() will be provided.
template &lt;typename Derived&gt; struct S {
    int foo(void) { return static_cast&lt;Derived*&gt;(this)-&gt;bar(); }
};

int main() {}</code></pre>
<p>Interestingly, the above problem pops up from time to time in real projects,
in template code that was not exercised after refactors. One such example is an
<a href="https://github.com/GNUAspell/aspell/pull/650"><code>aspell</code> bug</a>:</p>
<pre class="cpp"><code>  template&lt;class Parms&gt;
  void VectorHashTable&lt;Parms&gt;::recalc_size() {
    size_ = 0;
    for (iterator i = begin(); i != this-&gt;e; ++i, ++this-&gt;_size);
  }</code></pre>
<p><code>gcc-14</code> built it just fine. <code>gcc-15</code> started rejecting the build as:</p>
<pre><code>In file included from modules/speller/default/readonly_ws.cpp:51:
modules/speller/default/vector_hash-t.hpp:
  In member function 'void aspeller::VectorHashTable&lt;Parms&gt;::recalc_size()':
modules/speller/default/vector_hash-t.hpp:186:43:
  error: 'class aspeller::VectorHashTable&lt;Parms&gt;' has no member named 'e'
  186 |     for (iterator i = begin(); i != this-&gt;e; ++i, ++this-&gt;_size);
      |                                           ^
modules/speller/default/vector_hash-t.hpp:186:59:
  error: 'class aspeller::VectorHashTable&lt;Parms&gt;' has no member named '_size'; did you mean 'size'?
  186 |     for (iterator i = begin(); i != this-&gt;e; ++i, ++this-&gt;_size);
      |                                                           ^~~~~
      |                                                           size</code></pre>
<p><code>VectorHashTable</code> does not contain a <code>_size</code> field, but it does contain
<code>size_</code> (used just a line before). The <code>e</code> field is not a thing either.</p>
<p>The change is simple:</p>
<pre class="diff"><code>--- a/modules/speller/default/vector_hash-t.hpp
+++ b/modules/speller/default/vector_hash-t.hpp
@@ -183,7 +183,7 @@ namespace aspeller {
   template&lt;class Parms&gt;
   void VectorHashTable&lt;Parms&gt;::recalc_size() {
     size_ = 0;
-    for (iterator i = begin(); i != this-&gt;e; ++i, ++this-&gt;_size);
+    for (iterator i = begin(), e = end(); i != e; ++i, ++size_);
   }

 }</code></pre>
<p>Or you could also delete the function if it was broken like that for a
while.</p>
<p>Another example is <a href="https://sourceforge.net/p/mjpeg/patches/63/"><code>mjpegtools</code> bug</a>:</p>
<pre class="cpp"><code>// The commented-out method prototypes are methods to be implemented by
// subclasses.  Not all methods have to be implemented, depending on
// whether it's appropriate for the subclass, but that may impact how
// widely the subclass may be used.
template &lt;class INDEX, class SIZE&gt;
class Region2D
{
  public:
    // ...

    template &lt;class REGION, class REGION_O, class REGION_TEMP&gt;
    void UnionDebug (Status_t &amp;a_reStatus,
        REGION_O &amp;a_rOther, REGION_TEMP &amp;a_rTemp);

    // bool DoesContainPoint (INDEX a_tnY, INDEX a_tnX);

    // ...
}

template &lt;class INDEX, class SIZE&gt;
template &lt;class REGION, class REGION_TEMP&gt;
void
Region2D&lt;INDEX,SIZE&gt;::UnionDebug (Status_t &amp;a_reStatus, INDEX a_tnY,
    INDEX a_tnXStart, INDEX a_tnXEnd, REGION_TEMP &amp;a_rTemp)
{
    // ...
            if (!((rHere.m_tnY == a_tnY
                &amp;&amp; (tnX &gt;= a_tnXStart &amp;&amp; tnX &lt; a_tnXEnd))
            || this-&gt;DoesContainPoint (rHere.m_tnY, tnX)))
                goto error;
    // ...
}</code></pre>
<p>Here <code>mjpegtools</code> assumes that <code>DoesContainPoint</code> will come from a
derived type. But modern <code>c++</code> just does not allow it to be used like that:</p>
<pre><code>In file included from SetRegion2D.hh:12,
                 from MotionSearcher.hh:15,
                 from newdenoise.cc:19:
Region2D.hh: In member function 'void Region2D&lt;INDEX, SIZE&gt;::UnionDebug(Status_t&amp;, INDEX, INDEX, INDEX, REGION_TEMP&amp;)':
Region2D.hh:439:34: error: 'class Region2D&lt;INDEX, SIZE&gt;' has no member named 'DoesContainPoint'
  439 |                         || this-&gt;DoesContainPoint (rHere.m_tnY, tnX)))
      |                                  ^~~~~~~~~~~~~~~~</code></pre>
<p>The <a href="https://sourceforge.net/p/mjpeg/Code/3513/">fix</a> just deleted these
unusable functions. An alternative fix would look closer to the
<code>CRTP</code> tweak in our contrived example, but it’s a more invasive change.</p>
<h2 id="parting-words">parting words</h2>
<p><code>gcc-15</code> will reject more invalid unusable <code>c++</code> code in uninstantiated
templates. The simplest code change might be to just delete the broken code.
A more involved fix would require some knowledge of the codebase to fix
the declaration lookups (or to fix obvious typos).</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>seekwatcher 0.15</title>
    <link href="https://trofi.github.io/posts/321-seekwatcher-0.15.html" />
    <id>https://trofi.github.io/posts/321-seekwatcher-0.15.html</id>
    <published>2024-07-07T00:00:00Z</published>
    <updated>2024-07-07T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p><a href="https://github.com/trofi/seekwatcher/releases/tag/v0.15"><code>seekwatcher-0.15</code> is here</a>!</p>
<p><code>seekwatcher</code> is a tool to visualize access to the block device.</p>
<p>It’s been 2.5 years since the <a href="https://trofi.github.io/posts/234-seekwatcher-0.14.html"><code>seekwatcher-0.14</code> release</a>.
The only change is the switch from the <code>mencoder</code> tool to <code>ffmpeg</code>. While at
it the default codec was switched from <code>MPEG2</code> to <code>H264</code>.</p>
<p>As usual here is the program’s result run against <code>btrfs scrub</code> on my
device:</p>
<pre><code>$ seekwatcher -t scrub.trace -p 'echo 3 &gt; /proc/sys/vm/drop_caches; sync; btrfs scrub start -B /' -d /dev/nvme1n1p2
$ seekwatcher -t scrub.trace -o scrub.mpeg --movie
$ seekwatcher -t scrub.trace -o scrub.png</code></pre>
<p>Outputs:</p>
<ul>
<li><a href="https://trofi.github.io/posts.data/321-seekwatcher/scrub.png">image</a> (127K)</li>
<li><a href="https://trofi.github.io/posts.data/321-seekwatcher/scrub.mpeg">video</a> (926K)</li>
</ul>
<p><code>H264</code> makes video size comparable to the image report size.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>blog tweaks</title>
    <link href="https://trofi.github.io/posts/320-blog-tweaks.html" />
    <id>https://trofi.github.io/posts/320-blog-tweaks.html</id>
    <published>2024-07-06T00:00:00Z</published>
    <updated>2024-07-06T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="tldr">TL;DR</h2>
<p>A few changes happened to this blog in the past few weeks:</p>
<ul>
<li><p><code>RSS</code> feed and web pages no longer embed <code>svg</code> images into <code>&lt;html&gt;</code>
and instead include them via <code>&lt;img src="..."&gt;</code>.</p>
<p>This fixes <code>RSS</code> readers like <code>miniflux</code> but might break others. At
least now there should be an icon in place of a missing picture
instead of just stripped tags.</p>
<p>As a small bonus the <code>RSS</code> feed should be smaller to download.</p></li>
<li><p><code>RSS</code> feed now includes source code snippets without syntax
highlighting.</p>
<p>I never included the <code>css</code> style into the <code>rss</code> feed, and <code>highlighting-kate</code>
uses various tags and decorates them with links heavily. This change fixes
source code rendering in <code>liferea</code>.</p></li>
<li><p><code>RSS</code> feed now embeds <code>https://</code> self-links instead of <code>http://</code>
(except for a few recent entries to avoid breaking reading history).</p></li>
</ul>
<h2 id="more-words">More words</h2>
<p>I started this blog in 2010. In 2013 I moved it to
<a href="https://jaspervdj.be/hakyll/"><code>hakyll</code></a> static site generator. The
initial version was just
<a href="https://github.com/trofi/trofi.github.io.gen/blob/7ed816cf5515a47703f8cb2c804244a569bba30f/src/site.hs">88 lines of <code>haskell</code> code</a>.</p>
<p>I did not know much about <code>hakyll</code> back then and I kept it that way for
about 10 years: it just worked for me. The only things I missed were
tag-based <code>RSS</code> feeds and article breakdown per tag. That prevented the
blog from being added to thematic <code>RSS</code> aggregators like
<a href="https://planet.gentoo.org/"><code>Planet Gentoo</code></a>. But it was not a big deal.
I thought I would add it “soon” and never did.</p>
<p>The only “non-trivial” tweaks I did were
<a href="https://trofi.github.io/posts/300-inline-graphviz-dot-in-hakyll.html"><code>dot</code> support</a>
and <a href="https://trofi.github.io/posts/318-inline-gnuplot.html"><code>gnuplot</code> support</a>.</p>
<p>Fast forward to 2024: a few weeks ago I boasted to a friend about how cool my
new <code>gnuplot</code> embeddings are. The response was “What pictures?”.
Apparently <code>miniflux</code> does not like <code>&lt;svg&gt;</code> tags embedded into <code>&lt;html&gt;</code>
and strips them away, leaving only bits of <code>&lt;title&gt;</code> tags that almost
look like the original <code>graphviz</code> input :)</p>
<p>That meant my cool hack with <code>svg</code> embedding did not quite work for
<code>RSS</code> feed. I moved all the embeddings into separate <code>.svg</code> files with
<a href="https://github.com/trofi/trofi.github.io.gen/commit/12812bab87ce4bdff91227527d543ee3ac2161a9">this change</a>.</p>
<p>It’s not a big change, but it does violate some <code>hakyll</code> assumptions.
Apparently <code>hakyll</code> can produce only one destination file per source
file. For example, <code>foo.md</code> can only produce <code>foo.html</code> and not <code>foo.html</code>
plus an indefinite number of pictures. There is
<a href="https://jaspervdj.be/hakyll/tutorials/06-versions.html">version support</a>
in <code>hakyll</code>, but it assumes that we know the number of outputs upfront. It’s
not really usable for cases like <code>N</code> unknown outputs from an input. To
work around it I’m writing all the auxiliary files without the <code>hakyll</code>
dependency tracker’s knowledge. I do it by defining a <code>Writable</code> instance:</p>
<pre class="haskell"><code>data PWI = PWI {
    pandoc :: H.Item String
  , inlines :: [(String, H.Item DBL.ByteString)]
} deriving (GG.Generic)

deriving instance DB.Binary PWI

instance H.Writable PWI where
    write path item = do
        -- emit page itself:
        let PWI pand inls = H.itemBody item
        H.write path pand
        -- emit inlines nearby:
        CM.forM_ inls $ \(fp, contents) -&gt; do
            H.makeDirectories fp
            H.write fp contents</code></pre>
<p>Here <code>inlines</code> is the list of pairs of filenames and their contents to
write on disk and <code>pandoc</code> is the primary content one would normally
write as <code>H.Item String</code>.</p>
<p>While at it, I disabled syntax highlighting in the <code>RSS</code> feed as <code>liferea</code>
rendered highlighted source as an unreadable mess. And <code>miniflux</code> just
stripped out all the links and styles. <a href="https://github.com/trofi/trofi.github.io.gen/commit/1dc9d5a9d6b54db928f3fdaef1c0dcb4b6d567df">The change</a>
is somewhat long, but its gist is a single extra <code>writerHighlightStyle</code>
option passed to the <code>pandoc</code> renderer:</p>
<pre class="haskell"><code>pandocRSSWriterOptions :: TPO.WriterOptions
pandocRSSWriterOptions = pandocWriterOptions{
    -- disable highlighting
    TPO.writerHighlightStyle = Nothing
}</code></pre>
<p>The last thing I changed was to switch from <code>http://</code> links to
<code>https://</code> links by default. In theory it’s a
<a href="https://github.com/trofi/trofi.github.io.gen/commit/cfc80bb575c1b131225c43c1fed47ff639540bd9">one-character change</a>.
In practice that would break unread history for all <code>RSS</code> users. I worked
around it by restoring the <code>http://</code> root link for current <code>RSS</code> entries
with a <a href="https://github.com/trofi/trofi.github.io.gen/commit/6b1883a1b23f6965314bfd2b55cb3e9e6a42ec16">metadata change</a>.</p>
<p>That way all new posts should contain <code>https://</code> root links and all
site-local links should automatically become <code>https://</code> links.</p>
<p>Still no tag support. Maybe later.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>probabilities are hard</title>
    <link href="http://trofi.github.io/posts/319-probabilities-are-hard.html" />
    <id>http://trofi.github.io/posts/319-probabilities-are-hard.html</id>
    <published>2024-06-23T00:00:00Z</published>
    <updated>2024-06-23T00:00:00Z</updated>
    <summary type="html"><![CDATA[<h2 id="make---shuffle-background"><code>make --shuffle</code> background</h2>
<p><a href="https://trofi.github.io/posts/238-new-make-shuffle-mode.html">A while ago</a> I added <code>--shuffle</code>
mode to <code>GNU make</code> to shake out missing dependencies in build rules of
<code>make</code>-based build systems. It managed to find
<a href="https://trofi.github.io/posts/249-an-update-on-make-shuffle.html">a few bugs</a> since.</p>
<h2 id="the-shuffling-algorithm">the shuffling algorithm</h2>
<p>The core function of <code>--shuffle</code> is to generate one random permutation
of prerequisites for a target. I did not try to implement anything
special. I searched for “random shuffle” and got
<a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Fisher–Yates shuffle</a>
link from <code>wikipedia</code>, skimmed the page and came up with this algorithm:</p>
<pre class="c"><code>/* Shuffle array elements using RAND().  */
static void
random_shuffle_array (void **a, size_t len)
{
  size_t i;
  for (i = 0; i &lt; len; i++)
    {
      void *t;

      /* Pick random element and swap. */
      unsigned int j = rand () % len;
      if (i == j)
        continue;

      /* Swap. */
      t = a[i];
      a[i] = a[j];
      a[j] = t;
    }
}</code></pre>
<p>The diagram of a single step looks this way:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-0.gv.svg" />
<p>The implementation looked so natural: we attempt to shuffle each element
with another element chosen randomly using equal probability (assuming
<code>rand () % len</code> is unbiased). At least it seemed to produce random
results.</p>
<p><strong>Quiz question</strong>: do you see the bug in this implementation?</p>
<p>This version was shipped in <code>make-4.4.1</code>.
I ran <code>make</code> from <code>git</code> against <code>nixpkgs</code> and discovered a ton of
parallelism bugs. I could not have been happier with that. I never got to
actually testing the quality of the permutation probabilities.</p>
<h2 id="bias-in-initial-implementation">bias in initial implementation</h2>
<p>Artem Klimov had a closer look at it and discovered a bug in the
algorithm above! The algorithm has a common implementation error for
Fisher–Yates
<a href="https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Implementation_errors">documented</a>
on the very page I looked at before /o\. Artem demonstrated problems of
permutation quality on the following trivial <code>Makefile</code>:</p>
<pre class="makefile"><code>all: test1 test2 test3 test4 test5 test6 test7 test8;

test%:
	mkdir -p tests
	echo $@ &gt; tests/$@

test8:
	# no mkdir
	echo 'override' &gt; tests/$@</code></pre>
<p>This test was supposed to fail <code>12.5%</code> of the time in <code>--shuffle</code> mode:
only when <code>test8</code> is scheduled as the first to execute. Alas, the test,
when run over thousands of runs, failed with <code>10.1%</code> probability. That is
<code>2%</code> too low.</p>
<p>Artem also provided a fixed version of the shuffle implementation:</p>
<pre class="c"><code>static void
random_shuffle_array (void **a, size_t len)
{
  size_t i;
  for (i = len - 1; i &gt;= 1; i--)
    {
      void *t;

      /* Pick random element and swap. */
      unsigned int j = make_rand () % (i + 1);

      /* Swap. */
      t = a[i];
      a[i] = a[j];
      a[j] = t;
    }
}</code></pre>
<p>The diagram of a single step looks this way:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-1.gv.svg" />
<p>Note how this version makes sure that shuffled indices (“gray” color)
never get considered for future shuffle iterations.</p>
<p>At least for me it’s more obvious to see why this algorithm does not
introduce any biases. But then again I did not suspect problems in the
previous one either. I realized I don’t have a good intuition on why the
initial algorithm manages to produce biases. Where does bias come from
if we pick the target element with equal probability from all the
elements available?</p>
<h2 id="a-simple-test">a simple test</h2>
<p>To get the idea how the bias looks like I wrote a tiny program:</p>
<pre class="c"><code>// $ cat a.c
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;time.h&gt;

#define LEN 3
static int a[LEN];

static void random_shuffle_array (void) {
  for (size_t i = 0; i &lt; LEN; i++) {
      unsigned int j = rand () % LEN;
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void random_shuffle_array_fixed (void) {
  for (size_t i = LEN - 1; i &gt;= 1; i--) {
      unsigned int j = rand () % (i + 1);
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

static void do_test(const char * name, void(*shuffler)(void)) {
    size_t hist[LEN][LEN];
    memset(hist, 0, sizeof(hist));

    size_t niters = 10000000;

    printf(&quot;%s shuffle probability over %zu iterations:\n&quot;, name, niters);
    for (size_t iter = 0; iter &lt; niters; ++iter) {
        // Initialize array `a` with { `0`,  ..., `LEN - 1` }.
        for (size_t i = 0; i &lt; LEN; ++i) a[i] = i;
        shuffler ();
        for (size_t i = 0; i &lt; LEN; ++i) hist[i][a[i]] += 1;
    }

    int prec_digits = 3; /* 0.??? */
    int cell_width = 3 + prec_digits; /* &quot; 0.???&quot; */

    printf(&quot;%*s  &quot;, cell_width, &quot;&quot;);
    for (size_t j = 0; j &lt; LEN; ++j)
        printf(&quot;%*zu&quot;, cell_width, j);
    puts(&quot;&quot;);

    for (size_t i = 0; i &lt; LEN; ++i) {
        printf(&quot;%*zu |&quot;, cell_width, i);
        for (size_t j = 0; j &lt; LEN; ++j)
            printf(&quot; %.*f&quot;, prec_digits, (double)(hist[i][j]) / (double)(niters));
        puts(&quot;&quot;);
    }
}

int main() {
    srand(time(NULL));
    do_test(&quot;broken&quot;, &amp;random_shuffle_array);
    puts(&quot;&quot;);
    do_test(&quot;fixed&quot;, &amp;random_shuffle_array_fixed);
}</code></pre>
<p>Here the program implements both the current (broken) and the new (fixed)
shuffle implementations. The histogram is collected over 10 million runs.
Then it prints the probability of each element being found at each location.
We shuffle an array of <code>LEN = 3</code> elements: <code>{ 0, 1, 2, }</code>.
Here is the output of the program:</p>
<pre><code>$ gcc a.c -o a -O2 -Wall &amp;&amp; ./a
broken shuffle probability over 10000000 iterations:
             0     1     2
     0 | 0.333 0.370 0.296
     1 | 0.333 0.297 0.370
     2 | 0.334 0.333 0.333

fixed shuffle probability over 10000000 iterations:
             0     1     2
     0 | 0.333 0.333 0.334
     1 | 0.333 0.334 0.333
     2 | 0.333 0.333 0.333</code></pre>
<p>Here the program tells us that:</p>
<ul>
<li>the broken version of the shuffle puts element <code>1</code> at position <code>0</code> <code>37%</code> of the time</li>
<li>the broken version puts element <code>2</code> at position <code>0</code> only <code>29.6%</code> of the time</li>
<li>the fixed version is much closer to the uniform distribution and has roughly
<code>33.3%</code> in every cell</li>
</ul>
<p>The same data above in plots:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-2.gp.svg" />
<h2 id="a-bit-of-arithmetic">a bit of arithmetic</h2>
<p>To get a bit better understanding of the bias let’s get exact probability
value for each element move for 3-element array.</p>
<h3 id="broken-version">broken version</h3>
<p>To recap the implementation we are looking at here is:</p>
<pre class="c"><code>void random_shuffle_array (void) {
  for (size_t i = 0; i &lt; LEN; i++) {
      unsigned int j = rand () % LEN;
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}</code></pre>
<p>Let’s start from the broken shuffle with its <code>1/N</code> swap probability.</p>
<p>Our initial array state is <code>{ 0, 1, 2, }</code> with probability <code>1/1</code>
(or <code>100%</code>) for each already assigned value:</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>1/1</code></li>
<li>value <code>1</code>: <code>0/1</code></li>
<li>value <code>2</code>: <code>0/1</code></li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>0/1</code></li>
<li>value <code>1</code>: <code>1/1</code></li>
<li>value <code>2</code>: <code>0/1</code></li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>0/1</code></li>
<li>value <code>1</code>: <code>0/1</code></li>
<li>value <code>2</code>: <code>1/1</code></li>
</ul></li>
</ul>
<p>On each iteration <code>i</code> we perform the actions below:</p>
<ul>
<li>at position <code>i</code>: <code>1/3</code> probability of ending up with any of the possible elements</li>
<li>at non-<code>i</code> positions: <code>2/3</code> probability of keeping an old element (and <code>1/3</code>
probability of absorbing the value at position <code>i</code> mentioned in the previous bullet)</li>
</ul>
<p>Thus after first shuffle step at <code>i=0</code> our probability state will be:</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>1.0</code>)</li>
<li>value <code>1</code>: <code>1/3</code> (was <code>0.0</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>0.0</code>)</li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>0.0</code>)</li>
<li>value <code>1</code>: <code>2/3</code> (was <code>1.0</code>)</li>
<li>value <code>2</code>: <code>0/3</code> (was <code>0.0</code>)</li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>0.0</code>)</li>
<li>value <code>1</code>: <code>0/3</code> (was <code>0.0</code>)</li>
<li>value <code>2</code>: <code>2/3</code> (was <code>1.0</code>)</li>
</ul></li>
</ul>
<p>So far so good: element <code>0</code> has even probability among all 3 elements,
and elements <code>1</code> and <code>2</code> decreased their initial probabilities from <code>1/1</code>
down to <code>2/3</code>.</p>
<p>Let’s trace through next <code>i=1</code> step. After that the updated state will be:</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>3/9</code> (was <code>1/3</code>)</li>
<li>value <code>1</code>: <code>4/9</code> (was <code>1/3</code>)</li>
<li>value <code>2</code>: <code>2/9</code> (was <code>1/3</code>)</li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>3/9</code> (was <code>1/3</code>)</li>
<li>value <code>1</code>: <code>3/9</code> (was <code>2/3</code>)</li>
<li>value <code>2</code>: <code>3/9</code> (was <code>0/3</code>)</li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>3/9</code> (was <code>1/3</code>)</li>
<li>value <code>1</code>: <code>2/9</code> (was <code>0/3</code>)</li>
<li>value <code>2</code>: <code>4/9</code> (was <code>2/3</code>)</li>
</ul></li>
</ul>
<p>Again, magically, the current (<code>i=1</code>) index got a perfect balance. Zero
probabilities are gone by now.</p>
<p>Final <code>i=2</code> step yields this:</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>9/27</code> (was <code>3/9</code>)</li>
<li>value <code>1</code>: <code>10/27</code> (was <code>4/9</code>)</li>
<li>value <code>2</code>: <code>8/27</code> (was <code>2/9</code>)</li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>9/27</code> (was <code>3/9</code>)</li>
<li>value <code>1</code>: <code>8/27</code> (was <code>3/9</code>)</li>
<li>value <code>2</code>: <code>10/27</code> (was <code>3/9</code>)</li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>9/27</code> (was <code>3/9</code>)</li>
<li>value <code>1</code>: <code>9/27</code> (was <code>2/9</code>)</li>
<li>value <code>2</code>: <code>9/27</code> (was <code>4/9</code>)</li>
</ul></li>
</ul>
<p>The same state sequence in diagrams:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-3.gv.svg" />
<p>Note that final probabilities differ slightly: <code>8/27</code>, <code>9/27</code> and <code>10/27</code>
are probabilities where all should have been <code>9/27</code> (or <code>1/3</code>). This
matches observed values above!</p>
<p>The bias comes from the fact that each shuffle step affects the probabilities
of all cells, not just the cells immediately picked for a particular swap.
That was very hard for me to grasp just by glancing at the algorithm!</p>
<h3 id="fixed-version">fixed version</h3>
<p>To recap the implementation we are looking at here is:</p>
<pre class="c"><code>void random_shuffle_array_fixed (void) {
  for (size_t i = LEN - 1; i &gt;= 1; i--) {
      unsigned int j = rand () % (i + 1);
      int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}</code></pre>
<p>Now let’s look at a shuffle with <code>1/(i+1)</code> probability.
Our initial state is the same <code>{ 0, 1, 2, }</code> with probabilities <code>1/1</code>:</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>1/1</code></li>
<li>value <code>1</code>: <code>0/1</code></li>
<li>value <code>2</code>: <code>0/1</code></li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>0/1</code></li>
<li>value <code>1</code>: <code>1/1</code></li>
<li>value <code>2</code>: <code>0/1</code></li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>0/1</code></li>
<li>value <code>1</code>: <code>0/1</code></li>
<li>value <code>2</code>: <code>1/1</code></li>
</ul></li>
</ul>
<p>As the algorithm iterates over the array backwards we start from <code>i=2</code>
(<code>N=3</code>).</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>2/3</code> (was <code>1/1</code>)</li>
<li>value <code>1</code>: <code>0/3</code> (was <code>0/1</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>0/1</code>)</li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>0/3</code> (was <code>0/1</code>)</li>
<li>value <code>1</code>: <code>2/3</code> (was <code>1/1</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>0/1</code>)</li>
</ul></li>
<li>probability at index <code>2</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>0/1</code>)</li>
<li>value <code>1</code>: <code>1/3</code> (was <code>0/1</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>1/1</code>)</li>
</ul></li>
</ul>
<p>As expected the probabilities are the mirror image of the first step of
the broken implementation.</p>
<p>The next step though is a bit different: <code>i=1</code> (<code>N=2</code>). It effectively
averages probabilities at index <code>0</code> and index <code>1</code>.</p>
<ul>
<li>probability at index <code>0</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>2/3</code>)</li>
<li>value <code>1</code>: <code>1/3</code> (was <code>0/3</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>1/3</code>)</li>
</ul></li>
<li>probability at index <code>1</code>:
<ul>
<li>value <code>0</code>: <code>1/3</code> (was <code>0/3</code>)</li>
<li>value <code>1</code>: <code>1/3</code> (was <code>2/3</code>)</li>
<li>value <code>2</code>: <code>1/3</code> (was <code>1/3</code>)</li>
</ul></li>
<li>probability at index <code>2</code> (unchanged):
<ul>
<li>value <code>0</code>: <code>1/3</code></li>
<li>value <code>1</code>: <code>1/3</code></li>
<li>value <code>2</code>: <code>1/3</code></li>
</ul></li>
</ul>
<p>Or the same in diagrams:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-4.gv.svg" />
<p>The series is a lot simpler than in the broken version: on each step
the handled element always ends up with identical expected probabilities.
It’s so much simpler!</p>
<h2 id="element-bonus">30-element bonus</h2>
<p>Let’s look at the probability table for an array of 30 elements. The
only change I made to the program above was to change <code>LEN</code> from <code>3</code> to
<code>30</code>:</p>
<img src="https://trofi.github.io/posts.data.inline/319-probabilities-are-hard/fig-5.gp.svg" />
<p>This plot shows a curious <code>i == j</code> cut-off line where the probability
changes drastically:</p>
<ul>
<li><code>15-&gt;15</code> (or any <code>i-&gt;i</code>) shuffle probability is lowest and is about <code>2.8%</code></li>
<li><code>15-&gt;16</code> (or any <code>i-&gt;i+1</code>) shuffle probability is highest and is about <code>4.0%</code></li>
</ul>
<h2 id="make---shuffle-bias-fix"><code>make --shuffle</code> bias fix</h2>
<p>I posted Artem’s fix upstream for inclusion in
<a href="https://mail.gnu.org/archive/html/bug-make/2024-06/msg00008.html">this email</a>:</p>
<pre class="diff"><code>--- a/src/shuffle.c
+++ b/src/shuffle.c
@@ -104,12 +104,16 @@ static void
 random_shuffle_array (void **a, size_t len)
 {
   size_t i;
-  for (i = 0; i &lt; len; i++)
+
+  if (len &lt;= 1)
+    return;
+
+  for (i = len - 1; i &gt;= 1; i--)
     {
       void *t;

       /* Pick random element and swap. */
-      unsigned int j = make_rand () % len;
+      unsigned int j = make_rand () % (i + 1);
       if (i == j)
         continue;
</code></pre>
<h2 id="parting-words">parting words</h2>
<p>Artem Klimov found, fixed and explained the bias in the <code>make --shuffle</code>
implementation. Thank you, Artem!</p>
<p>Probabilities are hard! I managed to get a seemingly very simple
algorithm wrong. The bias is not too bad: <code>make --shuffle</code> is still able
to produce all possible permutations of the targets. But some of them are
slightly less frequent than the others.</p>
<p>The bias has a curious structure:</p>
<ul>
<li>the least likely permutation candidate is the <code>i-&gt;i</code> “identity” shuffle</li>
<li>the most likely permutation candidate is the <code>i-&gt;i+1</code> “right shift” shuffle</li>
</ul>
<p>At least the initial implementation was not completely broken and was
still able to generate all the permutations.</p>
<p>With luck <a href="https://mail.gnu.org/archive/html/bug-make/2024-06/msg00008.html">the fix</a>
will be accepted upstream and we will get a fairer <code>--shuffle</code> mode.</p>
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>inline gnuplot</title>
    <link href="http://trofi.github.io/posts/318-inline-gnuplot.html" />
    <id>http://trofi.github.io/posts/318-inline-gnuplot.html</id>
    <published>2024-06-22T00:00:00Z</published>
    <updated>2024-06-22T00:00:00Z</updated>
<summary type="html"><![CDATA[<p>From time to time I find myself needing to plot histograms and
approximations in occasional posts.
Similar to the <a href="https://trofi.github.io/posts/300-inline-graphviz-dot-in-hakyll.html">inline <code>graphviz</code></a>
support, today I added <code>gnuplot</code> <code>svg</code> inlining support to this blog.
A trivial example looks like this:</p>
<img src="https://trofi.github.io/posts.data.inline/318-inline-gnuplot/fig-0.gp.svg" />
<p>The above is generated using the following <code>.md</code> snippet:</p>
<pre><code>```{render=gnuplot}
plot [-pi:pi] sin(x)
```</code></pre>
<p><code>hakyll</code> <a href="https://github.com/trofi/trofi.github.io.gen/commit/4fb830628c6923873c0b21b2ac444a73d4d47cee">integration</a>
is also straightforward:</p>
<pre class="haskell"><code>inlineGnuplot :: TP.Block -&gt; Compiler TP.Block
inlineGnuplot cb@(TP.CodeBlock (id, classes, namevals) contents)
  | (&quot;render&quot;, &quot;gnuplot&quot;) `elem` namevals
  = TP.RawBlock (TP.Format &quot;html&quot;) . DT.pack &lt;$&gt; (
      unixFilter &quot;gnuplot&quot;
          [ &quot;--default-settings&quot;
          , &quot;-e&quot;, &quot;set terminal svg&quot;
          , &quot;-&quot;]
          (DT.unpack contents))
inlineGnuplot x = return x</code></pre>
<p>Here we call <code>gnuplot --default-settings -e "set terminal svg" -</code> and
pass our script over <code>stdin</code>. Easy!
For those who wonder what <code>gnuplot</code> is capable of, have a look at the
<a href="http://www.gnuplot.info/demo_svg_4.6/"><code>gnuplot.info</code> demo page</a>.
As a bonus, here is the time chart of my commits to <code>nixpkgs</code>:</p>
<img src="https://trofi.github.io/posts.data.inline/318-inline-gnuplot/fig-1.gp.svg" />
<p>Have fun!</p>]]></summary>
</entry>
<entry>
    <title>gcc simd intrinsics bug</title>
    <link href="http://trofi.github.io/posts/317-gcc-simd-intrinsics-bug.html" />
    <id>http://trofi.github.io/posts/317-gcc-simd-intrinsics-bug.html</id>
    <published>2024-06-16T00:00:00Z</published>
    <updated>2024-06-16T00:00:00Z</updated>
<summary type="html"><![CDATA[<p><code>highway</code> keeps yielding very interesting <code>gcc</code> bugs. Some of them are
so obscure that I don’t even understand <code>gcc</code> developers’ comments on
where the bug lies: in <code>highway</code> or in <code>gcc</code>. In this post I’ll explore
the <a href="https://gcc.gnu.org/PR115161"><code>PR115161</code></a> report as an example of
how <code>gcc</code> handles <code>simd</code> intrinsics.</p>
<h2 id="simplest-xmm-intrinsics-example">simplest <code>xmm</code> intrinsics example</h2>
<p>Let’s start with an example based on another closely related bug:</p>
<pre class="c"><code>#include &lt;emmintrin.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;

int main(void) {
    const __m128i  iv = _mm_set1_epi32(0x4f000000); // 1
    const __m128   fv = _mm_castsi128_ps(iv);       // 2
    const __m128i riv = _mm_cvttps_epi32(fv);       // 3

    uint32_t r[4];
    memcpy(r, &amp;riv, sizeof(r));
    printf(&quot;%#08x %#08x %#08x %#08x\n&quot;, r[0], r[1], r[2], r[3]);
}</code></pre>
<p>The above example implements a vectored form of the <code>(int)2147483648.0</code>
conversion using the following steps:</p>
<ol type="1">
<li>Place 4 identical 32-bit integer <code>0x4f000000</code> values into 128-bit
<code>iv</code> variable (likely an <code>xmm</code> register).</li>
<li>Bit cast <code>4 x 0x4f000000</code> into <code>4 x 2147483648.0</code> of 32-bit <code>floats</code>.</li>
<li>Convert <code>4 x 2147483648.0</code> 32-bit <code>floats</code> into <code>4 x int32_t</code> by
truncating the fractional part and keeping the integer part.</li>
<li>Print the conversion result in hexadecimal form.</li>
</ol>
<p>Or the same in pictures:</p>
<img src="https://trofi.github.io/posts.data.inline/317-gcc-simd-intrinsics-bug/fig-0.gv.svg" />
<p>Note: <code>2147483648.0</code> is exactly 2<sup>31</sup>. Maximum <code>int32_t</code> can hold is
2<sup>31</sup>-1, or <code>2147483647</code> (one less than our value at hand).</p>
<p><strong>Quick quiz: What should this example return? Does it depend on the
compiler options?</strong></p>
<p>In theory those <code>_mm*()</code> compiler intrinsics are tiny wrappers over
corresponding <code>x86_64</code> instructions.
<a href="https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html">Intel guide</a>
says that <code>_mm_cvttps_epi32()</code> is a <code>cvttps2dq</code> instruction.</p>
<p>Running the example:</p>
<pre><code>$ gcc -Wall a.c -o a0 -O0 &amp;&amp; ./a0
0x80000000 0x80000000 0x80000000 0x80000000

$ gcc -Wall a.c -o a1 -O1 &amp;&amp; ./a1
0x7fffffff 0x7fffffff 0x7fffffff 0x7fffffff</code></pre>
<p>Optimization levels do change the behaviour of the code when
overflow happens: sometimes the result is 2<sup>31</sup> and sometimes it’s
2<sup>31</sup>-1. Uh-oh. Let’s have a peek at the assembly of both cases.</p>
<p><code>-O0</code> case:</p>
<pre><code>$ rizin ./a0
[0x00401050]&gt; aaaa
[0x00401050]&gt; s main
[0x00401136]&gt; pdf
            ; DATA XREF from entry0 @ 0x401068
; int main(int argc, char **argv, char **envp);
; ...
          movl  $0x4f000000, var_8ch
          movl  var_8ch, %eax
; ...
          movl  %eax, var_80h
          movd  var_80h, %xmm1
          punpckldq %xmm1, %xmm0
; ...
          movaps %xmm0, var_48h
          cvttps2dq var_48h, %xmm0
          movaps %xmm0, var_78h
          movq  var_78h, %rax
          movq  var_70h, %rdx
          movq  %rax, var_28h
          movq  %rdx, var_20h
          movl  var_1ch, %esi
          movl  var_20h, %ecx
          movl  var_24h, %edx
          movl  var_28h, %eax
          leaq  str.08x___08x___08x___08x, %rdi      ; 0x402004 ; &quot;%#08x %#08x %#08x %#08x\n&quot; ; const char *format
          movl  %esi, %r8d
          movl  %eax, %esi
          movl  $0, %eax
          callq sym.imp.printf                       ; sym.imp.printf ; int printf(const char *format)
; ...</code></pre>
<p>While it’s a lot of superfluous code, we do see the <code>cvttps2dq</code>
instruction there and a <code>printf()</code> call against its result.</p>
<p><code>-O1</code> case:</p>
<pre><code>$ rizin ./a1
[0x00401040]&gt; aaaa
[0x00401040]&gt; s main
[0x00401126]&gt; pdf
            ; DATA XREF from entry0 @ 0x401058
; int main(int argc, char **argv, char **envp);
          subq  $8, %rsp
          movl  $0x7fffffff, %r9d
          movl  $0x7fffffff, %r8d
          movl  $0x7fffffff, %ecx
          movl  $0x7fffffff, %edx
          leaq  str.08x___08x___08x___08x, %rsi      ; 0x402004 ; &quot;%#08x %#08x %#08x %#08x\n&quot;
          movl  $2, %edi
          movl  $0, %eax
          callq sym.imp.__printf_chk                 ; sym.imp.__printf_chk
          movl  $0, %eax
          addq  $8, %rsp
          retq</code></pre>
<p>Here we don’t see <code>cvttps2dq</code> at all! <code>gcc</code> just puts <code>0x7fffffff</code>
constants into registers and calls <code>printf()</code> directly.
For completeness let’s try to find out the exact optimization pass that
performs this constant folding. Normally I would expect it to be a tree
optimization, and thus <code>-fdump-tree-all</code> would tell me where the magic
happens. Alas:</p>
<pre class="c"><code>// $ gcc a.c -o a -O2 -fdump-tree-all &amp;&amp; ./a
// $ cat a.c.265t.optimized

;; Function main (main, funcdef_no=574, decl_uid=6511, cgraph_uid=575, symbol_order=574) (executed once)

int main ()
{
  unsigned int _2;
  vector(4) int _3;
  unsigned int _4;
  unsigned int _5;
  unsigned int _6;

  &lt;bb 2&gt; [local count: 1073741824]:
  _3 = __builtin_ia32_cvttps2dq ({ 2.147483648e+9, 2.147483648e+9, 2.147483648e+9, 2.147483648e+9 });
  _2 = BIT_FIELD_REF &lt;_3, 32, 96&gt;;
  _6 = BIT_FIELD_REF &lt;_3, 32, 64&gt;;
  _4 = BIT_FIELD_REF &lt;_3, 32, 32&gt;;
  _5 = BIT_FIELD_REF &lt;_3, 32, 0&gt;;
  __printf_chk (2, &quot;%#08x %#08x %#08x %#08x\n&quot;, _5, _4, _6, _2);
  return 0;

}</code></pre>
<p>Here we see that <code>_mm_set1_epi32()</code> and <code>_mm_castsi128_ps()</code> were
“folded” into a <code>2.147483648e+9</code> constant successfully, but <code>_mm_cvttps_epi32()</code>
was not. And yet the final assembly does not contain the conversion. Let’s
look at the <code>RTL</code> passes that usually follow the <code>tree</code> ones as part
of the optimization:</p>
<pre><code>$ gcc a.c -o a -O2 -fdump-rtl-all-slim &amp;&amp; ./a
$ ls -1 *r.*
a.c.266r.expand
a.c.267r.vregs
a.c.268r.into_cfglayout
a.c.269r.jump
a.c.270r.subreg1
a.c.271r.dfinit
a.c.272r.cse1
a.c.273r.fwprop1
a.c.274r.cprop1
a.c.275r.pre
a.c.277r.cprop2
a.c.280r.ce1
a.c.281r.reginfo
a.c.282r.loop2
a.c.283r.loop2_init
a.c.284r.loop2_invariant
a.c.285r.loop2_unroll
a.c.287r.loop2_done
a.c.290r.cprop3
a.c.291r.stv1
a.c.292r.cse2
a.c.293r.dse1
a.c.294r.fwprop2
a.c.296r.init-regs
a.c.297r.ud_dce
a.c.298r.combine
a.c.300r.stv2
a.c.301r.ce2
a.c.302r.jump_after_combine
a.c.303r.bbpart
a.c.304r.outof_cfglayout
a.c.305r.split1
a.c.306r.subreg3
a.c.308r.mode_sw
a.c.309r.asmcons
a.c.314r.ira
a.c.315r.reload
a.c.316r.postreload
a.c.319r.split2
a.c.320r.ree
a.c.321r.cmpelim
a.c.322r.pro_and_epilogue
a.c.323r.dse2
a.c.324r.csa
a.c.325r.jump2
a.c.326r.compgotos
a.c.328r.peephole2
a.c.329r.ce3
a.c.331r.fold_mem_offsets
a.c.332r.cprop_hardreg
a.c.333r.rtl_dce
a.c.334r.bbro
a.c.335r.split3
a.c.336r.sched2
a.c.338r.stack
a.c.340r.zero_call_used_regs
a.c.341r.alignments
a.c.343r.mach
a.c.344r.barriers
a.c.349r.shorten
a.c.350r.nothrow
a.c.351r.dwarf2
a.c.352r.final
a.c.353r.dfinish</code></pre>
<p>It’s a long list of passes! Let’s look at the first <code>266r.expand</code>:</p>
<pre><code>$ cat a.c.266r.expand
;;
;; Full RTL generated for this function:
;;
    1: NOTE_INSN_DELETED
    3: NOTE_INSN_BASIC_BLOCK 2
    2: NOTE_INSN_FUNCTION_BEG
    5: r106:V4SF=vec_duplicate([`*.LC1'])
    6: r105:V4SF=r106:V4SF
      REG_EQUAL const_vector
    7: r104:V4SI=fix(r105:V4SF)

    8: r99:V4SI=r104:V4SI
    9: r108:V4SI=vec_select(r99:V4SI,parallel)
   10: r107:SI=vec_select(r108:V4SI,parallel)
   11: r110:V4SI=vec_select(vec_concat(r99:V4SI,r99:V4SI),parallel)
   12: r109:SI=vec_select(r110:V4SI,parallel)
   13: r112:V4SI=vec_select(r99:V4SI,parallel)
   14: r111:SI=vec_select(r112:V4SI,parallel)
   15: r113:SI=vec_select(r99:V4SI,parallel)
   16: r114:DI=`*.LC2'
   17: r9:SI=r107:SI
   18: r8:SI=r109:SI
   19: cx:SI=r111:SI
   20: dx:SI=r113:SI
   21: si:DI=r114:DI
   22: di:SI=0x2
   23: ax:QI=0
   24: ax:SI=call [`__printf_chk'] argc:0
      REG_CALL_DECL `__printf_chk'
   25: r103:SI=0
   29: ax:SI=r103:SI
   30: use ax:SI</code></pre>
<p>Here <code>V4SF</code> means the vector type of 4 floats, <code>V4SI</code> is a vector type
of 4 <code>int</code>, <code>SI</code> is an <code>int</code> type, <code>DI</code> is a <code>long</code> type. It looks like
our <code>float-&gt;int32_t</code> conversion happens in a few early <code>RTL</code> instructions:</p>
<pre><code>    5: r106:V4SF=vec_duplicate([`*.LC1'])
    6: r105:V4SF=r106:V4SF
      REG_EQUAL const_vector
    7: r104:V4SI=fix(r105:V4SF)</code></pre>
<p>The rest of the <code>RTL</code> code extracts that result into <code>printf()</code>
arguments. It’s a lot of superfluous data moves. Later optimizations
should clean it up and assign “hardware” registers like <code>r9</code> to virtual
registers like <code>r108</code>. For completeness the final <code>353r.dfinish</code> looks
this way:</p>
<pre><code>$ cat a.c.353r.dfinish

;; Function main (main, funcdef_no=574, decl_uid=6511, cgraph_uid=575, symbol_order=574) (executed once)

    1: NOTE_INSN_DELETED
    3: NOTE_INSN_BASIC_BLOCK 2
    2: NOTE_INSN_FUNCTION_BEG
   34: {sp:DI=sp:DI-0x8;clobber flags:CC;clobber [scratch];}
      REG_UNUSED flags:CC
      REG_CFA_ADJUST_CFA sp:DI=sp:DI-0x8
   35: NOTE_INSN_PROLOGUE_END
   19: cx:SI=0x7fffffff
   20: dx:SI=0x7fffffff
   44: {ax:DI=0;clobber flags:CC;}
      REG_UNUSED flags:CC
   17: r9:SI=0x7fffffff
   18: r8:SI=0x7fffffff
   22: di:SI=0x2
   32: si:DI=`*.LC2'
      REG_EQUIV `*.LC2'
   24: ax:SI=call [`__printf_chk'] argc:0
      REG_DEAD r9:SI
      REG_DEAD r8:SI
      REG_DEAD di:SI
      REG_DEAD si:DI
      REG_DEAD cx:SI
      REG_DEAD dx:SI
      REG_UNUSED ax:SI
      REG_CALL_DECL `__printf_chk'
   45: {ax:DI=0;clobber flags:CC;}
      REG_UNUSED flags:CC
   46: NOTE_INSN_EPILOGUE_BEG
   37: {sp:DI=sp:DI+0x8;clobber flags:CC;clobber [scratch];}
      REG_UNUSED flags:CC
      REG_CFA_ADJUST_CFA sp:DI=sp:DI+0x8
   30: use ax:SI
   38: simple_return
   41: barrier
   33: NOTE_INSN_DELETED</code></pre>
<p>Here we don’t have <code>fix()</code> calls any more. The <code>printf()</code> call already
contains immediate <code>r8:SI=0x7fffffff</code> constants. All registers are
resolved to real register names. Searching for <code>fix()</code> in all the pass
files I found that <code>272r.cse1</code> was the last pass that mentioned it.
<code>a.c.273r.fwprop1</code> already has the constants inlined. Looking at
<code>272r.cse1</code> in <code>-fdump-rtl-all-all</code> we can see the details <code>cse1</code>
inferred about the <code>fix()</code> <code>RTL</code> instruction:</p>
<pre><code>(insn 7 6 8 2 (set (reg:V4SI 104)
        (fix:V4SI (reg:V4SF 106))) &quot;...-gcc-15.0.0/lib/gcc/x86_64-unknown-linux-gnu/15.0.0/include/emmintrin.h&quot;:863:19 4254 {
fix_truncv4sfv4si2}
     (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 2147483647 [0x7fffffff]) repeated x4
            ])
        (expr_list:REG_DEAD (reg:V4SF 105)
            (nil))))</code></pre>
<p><code>fix_truncv4sfv4si2</code> is the name of the pattern that implements the
conversion of the <code>fix()</code> call down to the lower-level instructions. And
it looks like the <code>fix()</code> expansion also derived that the final result is
a constant:
<code>(expr_list:REG_EQUAL (const_vector:V4SI [ (const_int 2147483647 [0x7fffffff]) repeated x4])</code>.
The next <code>fwprop1</code> pass will use that constant value everywhere <code>r104</code>
is used.</p>
<p><a href="https://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html"><code>gcc</code> internals</a>
documentation says that <code>fix_trunc</code> is a <code>float-to-int</code> conversion. Note
that this conversion does not look specific to our intrinsic: any code
that casts floats would use the same helper. That explains why
<code>_mm_cvttps_epi32()</code> semantics around the overflow are not honored and
generic floating conversion code is performed by <code>gcc</code> as if it was
written as <code>(int)(2147483648.0f)</code>. Apparently both <code>0x7fffffff</code> and
<code>0x80000000</code> values are correct under that assumption.</p>
<p>The problem is that <code>_mm_cvttps_epi32()</code> is more specific than any valid
<code>float-&gt;int</code> conversion. The <code>intel</code> manual says so explicitly in the
<a href="https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html"><code>CVTTPS2DQ</code> description</a>
in <code>"Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4"</code>:</p>
<pre><code>Description
...
When a conversion is inexact, a truncated (round toward zero) value is
returned. If a converted result is larger than the maximum signed
doubleword integer, the floating-point invalid exception is raised, and
if this exception is masked, the indefinite integer value (80000000H) is
returned.</code></pre>
<p>Thus, <code>0x80000000</code> would be a correct value here and not <code>0x7fffffff</code>.</p>
<h2 id="avoiding-the-_mm_cvttps_epi32-non-determinism">avoiding the <code>_mm_cvttps_epi32()</code> non-determinism</h2>
<p>OK, so <code>gcc</code> treats the overflow condition as something it may fold
freely. That should be easy to work around by first checking whether our
value is in range, right? Say, something like the following pseudocode:</p>
<pre class="c"><code>float v = 2147483648.0f;
int32_t result;
if (v &gt;= 2147483648.0f) {
    result = 0x7fffffff;
} else {
    result = fix(v);
}</code></pre>
<p>In vectorized code branching is problematic, thus one needs to be
creative and use masking instead. That is what <code>highway</code> did in the
<a href="https://github.com/google/highway/commit/9dc6e1ecb0748df78398b037d6a8a89e667702e7"><code>avoid GCC "UB" in truncating cases</code></a>
commit. It’s a lot of code, but its idea is to mask away values
computed from overflowing inputs:</p>
<pre class="diff"><code>@@ -10884,7 +10869,11 @@ HWY_API VFromD&lt;D&gt; ConvertInRangeTo(D /*di*/, VFromD&lt;RebindToFloat&lt;D&gt;&gt; v) {
 // F32 to I32 ConvertTo is generic for all vector lengths
 template &lt;class D, HWY_IF_I32_D(D)&gt;
 HWY_API VFromD&lt;D&gt; ConvertTo(D di, VFromD&lt;RebindToFloat&lt;D&gt;&gt; v) {
-  return detail::FixConversionOverflow(di, v, ConvertInRangeTo(di, v));
+  const RebindToFloat&lt;decltype(di)&gt; df;
+  // See comment at the first occurrence of &quot;IfThenElse(overflow,&quot;.
+  const MFromD&lt;D&gt; overflow = RebindMask(di, Ge(v, Set(df, 2147483648.0f)));
+  return IfThenElse(overflow, Set(di, LimitsMax&lt;int32_t&gt;()),
+                    ConvertInRangeTo(di, v));
 }</code></pre>
<p>If we amend our original example with this tweak we will get the
following equivalent code:</p>
<pre class="c"><code>// $ cat bug.cc
#include &lt;stdint.h&gt;
#include &lt;string.h&gt;
#include &lt;emmintrin.h&gt;

__attribute__((noipa))
static void assert_eq_p(void * l, void * r) {
    char lb[16];
    char rb[16];

    __builtin_memcpy(lb, l, 16);
    __builtin_memcpy(rb, r, 16);

    if (__builtin_memcmp(lb, rb, 16) != 0) __builtin_trap();
}

#if 0
#include &lt;stdio.h&gt;
__attribute__((noipa))
static void d_i(const char * prefix, __m128i p) {
    uint64_t v[2];
    memcpy(v, &amp;p, 16);

    fprintf(stderr, &quot;%10s(i): %#016lx %#016lx\n&quot;, prefix, v[0], v[1]);
}
#endif

__attribute__((noipa))
static void assert_eq(__m128i l, __m128i r) { assert_eq_p(&amp;l, &amp;r); }

int main() {
  const __m128i su = _mm_set1_epi32(0x4f000000);
  const __m128  sf = _mm_castsi128_ps(su);

  const __m128  overflow_mask_f32 = _mm_cmpge_ps(sf, _mm_set1_ps(2147483648.0f));
  const __m128i overflow_mask = _mm_castps_si128(overflow_mask_f32);

  const __m128i conv = _mm_cvttps_epi32(sf);
  const __m128i yes = _mm_set1_epi32(INT32_MAX);

  const __m128i a = _mm_and_si128(overflow_mask, yes);
  const __m128i na = _mm_andnot_si128(overflow_mask, conv);

  const __m128i conv_masked = _mm_or_si128(a, na);

  const __m128i actual = _mm_cmpeq_epi32(conv_masked, _mm_set1_epi32(INT32_MAX));
  const __m128i expected = _mm_set1_epi32(-1);

  assert_eq(expected, actual);
}</code></pre>
<p>Here <code>_mm_and_si128()</code> and <code>_mm_andnot_si128()</code> are used to mask away
conversion results for inputs not below <code>2147483648.0f</code>.
In diagram form it looks like this (I collapsed vector values
into <code>... x4</code> form as all the values should be identical):</p>
<img src="https://trofi.github.io/posts.data.inline/317-gcc-simd-intrinsics-bug/fig-1.gv.svg" />
<p>Here the <code>conv -&gt; na</code> green arrow shows where we throw away all the
indefinite values. They all get substituted with the <code>yes = 0x7FFFffff x4</code>
value. Thus, the program should finally be deterministic, right? Let’s check:</p>
<pre><code>$ gcc bug.cc -O0 -o a &amp;&amp; ./a

$ gcc bug.cc -O2 -o a &amp;&amp; ./a
Illegal instruction (core dumped)</code></pre>
<p>It does not. Only the <code>-O0</code> case works (just like before). Let’s look
at the assembly again, just <code>-O2</code> this time:</p>
<pre><code>$ rizin ./a
; [0x004010a0]&gt; aaaa
; [0x004010a0]&gt; s main
; [0x00401040]&gt; pdf
            ; DATA XREF from entry0 @ 0x4010a8
            ;-- section..text:
/ int main(int argc, char **argv, char **envp);
|           ; arg uint64_t arg7 @ xmm0
|                 subq  $8, %rsp                             ; [13] -r-x section size 483 named .text
|                 movss data.00402004, %xmm1                 ; [0x402004:4]=0x4f000000
|                 movss data.00402008, %xmm3                 ; [0x402008:4]=0x7fffffff
|                 shufps $0, %xmm1, %xmm1
|                 movaps %xmm1, %xmm2
|                 cvttps2dq %xmm1, %xmm0
|                 shufps $0, %xmm3, %xmm3
|                 cmpleps %xmm1, %xmm2
|                 movdqa %xmm2, %xmm1
|                 andps %xmm3, %xmm2
|                 pandn %xmm0, %xmm1
|                 por   %xmm2, %xmm1
|                 pcmpeqd %xmm0, %xmm1                       ; arg7
|                 pcmpeqd %xmm0, %xmm0                       ; arg7
|                 callq sym.assert_eq_int64_t___vector_2___int64_t___vector_2 ; sym.assert_eq_int64_t___vector_2___int64_t___vector_2
|                 xorl  %eax, %eax
|                 addq  $8, %rsp
\                 retq</code></pre>
<p>At first glance the <code>cvttps2dq</code> instruction is present, so <code>gcc</code> was
not able to completely constant-fold it away, and it’s not immediately
obvious why the result is incorrect. Let’s have a look at the control flow
diagram reconstructed from the assembly:</p>
<img src="https://trofi.github.io/posts.data.inline/317-gcc-simd-intrinsics-bug/fig-2.gv.svg" />
<p>In practice the <code>pcmpeqd %xmm0, %xmm1</code> instruction that was supposed to
implement <code>_mm_cmpeq_epi32(conv_masked, _mm_set1_epi32(INT32_MAX))</code> gets
<code>INT32_MAX</code> not as a constant (say, from <code>%xmm3</code>), but as the <code>%xmm0</code>
register, assuming it already holds the expected value. The red line shows
where the assumption is introduced and the brown dotted line shows what it
is removing.</p>
<p>The optimizer was not able to constant-fold all the arithmetic operations,
but it was able to fold just enough to introduce a discrepancy between the
assumed and the actual value of <code>cvttps2dq</code>.</p>
<p>To remove this overly specific assumption <code>gcc-15</code> updated the <code>fix()</code>
folding code not to assume a particular value on overflow in
<a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=b05288d1f1e4b632eddf8830b4369d4659f6c2ff">this patch</a>:</p>
<pre class="diff"><code>--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -2246,7 +2246,18 @@ fold_convert_const_int_from_real (enum tree_code code, tree type, const_tree arg
   if (! overflow)
     val = real_to_integer (&amp;r, &amp;overflow, TYPE_PRECISION (type));

-  t = force_fit_type (type, val, -1, overflow | TREE_OVERFLOW (arg1));
+  /* According to IEEE standard, for conversions from floating point to
+     integer. When a NaN or infinite operand cannot be represented in the
+     destination format and this cannot otherwise be indicated, the invalid
+     operation exception shall be signaled. When a numeric operand would
+     convert to an integer outside the range of the destination format, the
+     invalid operation exception shall be signaled if this situation cannot
+     otherwise be indicated.  */
+  if (!flag_trapping_math || !overflow)
+    t = force_fit_type (type, val, -1, overflow | TREE_OVERFLOW (arg1));
+  else
+    t = NULL_TREE;
+
   return t;
 }

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 5caf1dfd957f..f6b4d73b593c 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2256,14 +2256,25 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
       switch (code)
 	{
 	case FIX:
+	  /* According to IEEE standard, for conversions from floating point to
+	     integer. When a NaN or infinite operand cannot be represented in
+	     the destination format and this cannot otherwise be indicated, the
+	     invalid operation exception shall be signaled. When a numeric
+	     operand would convert to an integer outside the range of the
+	     destination format, the invalid operation exception shall be
+	     signaled if this situation cannot otherwise be indicated.  */
 	  if (REAL_VALUE_ISNAN (*x))
-	    return const0_rtx;
+	    return flag_trapping_math ? NULL_RTX : const0_rtx;
+
+	  if (REAL_VALUE_ISINF (*x) &amp;&amp; flag_trapping_math)
+	    return NULL_RTX;

 	  /* Test against the signed upper bound.  */
 	  wmax = wi::max_value (width, SIGNED);
 	  real_from_integer (&amp;t, VOIDmode, wmax, SIGNED);
 	  if (real_less (&amp;t, x))
-	    return immed_wide_int_const (wmax, mode);
+	    return (flag_trapping_math
+		    ? NULL_RTX : immed_wide_int_const (wmax, mode));

 	  /* Test against the signed lower bound.  */
 	  wmin = wi::min_value (width, SIGNED);
@@ -2276,13 +2287,17 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,

 	case UNSIGNED_FIX:
 	  if (REAL_VALUE_ISNAN (*x) || REAL_VALUE_NEGATIVE (*x))
-	    return const0_rtx;
+	    return flag_trapping_math ? NULL_RTX : const0_rtx;
+
+	  if (REAL_VALUE_ISINF (*x) &amp;&amp; flag_trapping_math)
+	    return NULL_RTX;

 	  /* Test against the unsigned upper bound.  */
 	  wmax = wi::max_value (width, UNSIGNED);
 	  real_from_integer (&amp;t, VOIDmode, wmax, UNSIGNED);
 	  if (real_less (&amp;t, x))
-	    return immed_wide_int_const (wmax, mode);
+	    return (flag_trapping_math
+		    ? NULL_RTX : immed_wide_int_const (wmax, mode));

 	  return immed_wide_int_const (real_to_integer (x, &amp;fail, width),
 				       mode);</code></pre>
<p>It fixes both the tree and the <code>RTL</code> optimizations not to assume a
specific value on known overflows.
After the fix <code>gcc</code> generates something that passes the test at hand:</p>
<pre><code>$ g++ bug.cc -o bug -O2 &amp;&amp; ./bug</code></pre>
<p>The <code>highway</code> test suite passes as well.
For completeness the generated code now looks like this:</p>
<pre><code>$ rizin ./a
; [0x004010a0]&gt; aaaa
; [0x004010a0]&gt; s main
; [0x00401040]&gt; pdf
            ; DATA XREF from entry0 @ 0x4010b8
            ;-- section..text:
/ int main(int argc, char **argv, char **envp);
|           ; arg uint64_t arg8 @ xmm1
|                 subq  $8, %rsp                             ; [13] -r-x section size 499 named .text
|                 movss data.00402004, %xmm0                 ; [0x402004:4]=0x4f000000
|                 shufps $0, %xmm0, %xmm0
|                 movaps %xmm0, %xmm2
|                 cmpleps %xmm0, %xmm2
|                 cvttps2dq %xmm0, %xmm0
|                 movdqa %xmm2, %xmm1
|                 pandn %xmm0, %xmm1
|                 movss data.00402008, %xmm0                 ; [0x402008:4]=0x7fffffff
|                 shufps $0, %xmm0, %xmm0
|                 andps %xmm0, %xmm2
|                 pcmpeqd %xmm0, %xmm0
|                 por   %xmm1, %xmm2
|                 pcmpeqd %xmm1, %xmm1                       ; arg8
|                 psrld $1, %xmm1
|                 pcmpeqd %xmm2, %xmm1                       ; arg8
|                 callq sym.assert_eq_int64_t___vector_2___int64_t___vector_2 ; sym.assert_eq_int64_t___vector_2___int64_t___vector_2
|                 xorl  %eax, %eax
|                 addq  $8, %rsp
\                 retq</code></pre>
<p>This code looks slightly closer to the originally written <code>C</code> code: <code>%xmm2</code>
collects the masked result of <code>cvttps2dq</code> and <code>%xmm1</code> contains the <code>0x7FFFffff</code>
value.</p>
<h2 id="parting-words">Parting words</h2>
<p>While not as powerful as the tree passes, the <code>RTL</code> passes are capable
of folding constants, propagating assumed values and removing dead code.</p>
<p><code>highway</code> uncovered an old <code>gcc</code> <a href="https://gcc.gnu.org/PR115161">bug</a> in
a set of <code>float-&gt;int</code> conversion <code>x86</code> intrinsics. This bug was not seen
as frequently until <code>gcc</code> implemented more constant folding cases for
intrinsics in <a href="https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f2449b55fb2d32">this change</a>.</p>
<p><code>gcc</code> still has a few places where it could constant-fold a lot more:</p>
<ul>
<li>handle <code>_mm_cvttps_epi32(constant)</code></li>
<li>eliminate redundant <code>movaps %xmm0, %xmm2; cmpleps %xmm0, %xmm2</code> and
below</li>
</ul>
<p>But <code>gcc</code> does not do it today.</p>
<p>If <code>gcc</code> thinks that some intrinsic returns a value that differs from
reality, it’s very hard to reliably convince <code>gcc</code> to assume something
else. Sometimes it’s easier to use inline assembly to get the desired
result as a short-term workaround.</p>
<p>Have fun!</p>]]></summary>
</entry>

</feed>
