Git Bisect and Security Advisories


How to compute a vulnerability exposure window with git bisect.

When a security vulnerability is found in software, it helps to know how long the software was vulnerable and which versions were affected. This information lets maintainers tell consumers of the software to upgrade, and it bounds the known window of vulnerability (which is relevant to investigations that look for indicators of abuse). Additionally, an accurate range of affected versions reduces false positives in security scanning tools.

Recently, I submitted a security advisory for Rust’s nalgebra crate. While the discussion provided the version number that fixed the vulnerability, I still needed to find the version that introduced it in order to populate the unaffected field in the advisory format. This post describes my methodology for finding when the vulnerability was introduced, without any prior knowledge of the nalgebra crate.

Understanding the vulnerability

First, we need to understand the vulnerability well enough to evaluate whether a given version of the code contains that vulnerability. The vulnerability in question results from a failure to validate input in a deserialize implementation for the VecStorage struct. In this case, the deserialize function was automatically generated by a macro.

#[cfg_attr(feature = "serde-serialize", derive(Serialize, Deserialize))]
pub struct VecStorage<T, R: Dim, C: Dim> {
    data: Vec<T>,
    nrows: R,
    ncols: C,
}

The first line of the code snippet is the attribute that generates the serialization and deserialization logic (derive(Serialize, Deserialize)). The cfg_attr wrapper applies the derive only when the serde-serialize feature flag is enabled, which lets users opt in to the serialization functionality if they need it.

In the following lines, we see the struct members. data is a vector that is expected to have length nrows * ncols. However, the automatically generated deserializer does not enforce that invariant. A specially crafted input can therefore pair a short data vector with large dimensions, which allows memory access past the end of data’s allocated buffer.[1]
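To make the missing check concrete, here is a minimal sketch of a constructor that enforces the length invariant. CheckedStorage and try_new are hypothetical names used only for illustration, and the dimensions are plain usize values rather than nalgebra's Dim types; this is not the fix that nalgebra actually shipped.

```rust
/// Hypothetical stand-in for a matrix storage type; not nalgebra's code.
#[derive(Debug, PartialEq)]
pub struct CheckedStorage {
    data: Vec<f64>,
    nrows: usize,
    ncols: usize,
}

impl CheckedStorage {
    /// Builds the storage only if `data.len() == nrows * ncols`.
    pub fn try_new(data: Vec<f64>, nrows: usize, ncols: usize) -> Result<Self, String> {
        // checked_mul guards against overflow in the expected length itself.
        let expected = nrows
            .checked_mul(ncols)
            .ok_or_else(|| "nrows * ncols overflows usize".to_string())?;
        if data.len() != expected {
            return Err(format!("expected {} elements, got {}", expected, data.len()));
        }
        Ok(Self { data, nrows, ncols })
    }

    /// Column-major lookup; in bounds because the constructor checked the length.
    pub fn get(&self, row: usize, col: usize) -> Option<f64> {
        if row >= self.nrows || col >= self.ncols {
            return None;
        }
        self.data.get(col * self.nrows + row).copied()
    }
}
```

A derived deserializer fills the three fields directly and never runs a check like this, so an input that claims 100 rows and 100 columns but carries a single element would be accepted. With the sketch above, CheckedStorage::try_new(vec![1.0], 100, 100) returns an error instead.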

We now know what to look for: if VecStorage has an automatically derived Deserialize implementation, then the version is vulnerable.

Finding the range of affected commits

Now that we understand how to check a version of the software for the vulnerability, we need an efficient strategy to scan the history of the project to find when the vulnerability was introduced. The nalgebra Git history has nearly two thousand commits, so evaluating each commit by hand is unreasonable.

The process can be accelerated using a binary search to determine which commits to evaluate. Git implements this binary search functionality with a sub-command called bisect. With the bisect tool, you tell Git whether a given commit should be labeled as “good” or “bad.” With those labels, Git will attempt to narrow in on the exact commit that caused the change from “good” to “bad.” In our case, we define “good” to mean “not vulnerable” and “bad” to mean “vulnerable.”
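The idea behind the search can be sketched in a few lines. The function below is an illustration of a binary search over a linear history, not Git's implementation (real histories have merges, which Git handles differently). The name first_bad and the closure-based predicate are assumptions made for this example. It relies on the same precondition as bisect: every commit before the first bad one is good, and every commit from it onward is bad.

```rust
/// Returns the index of the first "bad" item and the number of checks made.
/// Assumes the history is ordered oldest to newest, that the last item is
/// bad, and that once an item is bad all later items are bad too.
pub fn first_bad<T>(
    history: &[T],
    mut is_bad: impl FnMut(&T) -> bool,
) -> Option<(usize, usize)> {
    if history.is_empty() {
        return None;
    }
    let mut lo = 0; // earliest candidate for the first bad item
    let mut hi = history.len() - 1; // assumed bad, never checked
    let mut checks = 0;
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        checks += 1;
        if is_bad(&history[mid]) {
            hi = mid; // the first bad item is at mid or earlier
        } else {
            lo = mid + 1; // the first bad item is after mid
        }
    }
    Some((lo, checks))
}
```

Each check halves the remaining range, so a history of 2,000 commits needs at most 11 checks rather than up to 2,000.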

[Figure: animation of git bisect performing a binary search]

The following steps demonstrate how to run git bisect on the nalgebra repository:

  1. Find the most recent vulnerable commit. If a fix is available, the commit immediately preceding it is likely to be the most recent vulnerable commit. If a fix is not available, the latest commit on the main branch may suffice. In this case, the nalgebra project fixed the issue in 5bff536. So, we run git checkout 5bff536^, where the ^ suffix selects the parent of the fix commit.
     git clone https://github.com/dimforge/nalgebra.git
     cd nalgebra/
     git checkout 5bff536^
    
  2. Start the bisect process. Because we know the current commit is vulnerable, we mark it as “bad” with git bisect bad.
     git bisect start
     git bisect bad
    
  3. Choose a known-good commit. If additional information is available to indicate a previously “good” commit, you can provide that information to reduce the search space. In this case, we do not have such information and must assume the vulnerability could have been introduced at any earlier point.
     # Without knowledge of a previous good commit, this will tell
     # `git bisect` to search the entire commit history.
     git bisect next
    
     # Alternatively, if a known-good commit is available
     # (123ABCD is a placeholder):
     git checkout 123ABCD
     git bisect good
    
  4. Step through commits: At this point, Git will begin automatically stepping through commits. When you mark a commit as good or bad, it will automatically advance to the next commit. As you continue marking commits, Git will narrow the search space to find which commit introduced the vulnerability.

    At each step, we run a recursive search with ripgrep to check if the derived Deserialize implementation is present. The -B 1 flag prints the line preceding each match, which is where the derive attribute appears:

     rg 'struct VecStorage' -B 1
    

    When we see output like the following, we know the commit is vulnerable.

     27-#[cfg_attr(feature = "serde-serialize", derive(Serialize, Deserialize))]
     28:pub struct VecStorage<T, R: Dim, C: Dim> {
    

    And we mark those commits as bad:

     git bisect bad
    

    When there is no output from ripgrep, we know that VecStorage is not present in that commit, so we mark it as good:

     git bisect good
    
  5. Continue stepping through commits: Repeat the instructions from step #4 until you see output like the following:
     Bisecting: 7 revisions left to test after this (roughly 3 steps)
     [0f66403cbbe9eeac15cedd8a906c0d6a3d8841f2] Rename `MatrixVec` to `VecStorage`.
    

    If at any point you make a mistake, you can start over by running:

     git bisect reset
     git checkout 5bff536^
    

Relocated code

We were able to identify commit 0f66403 after only a few iterations. However, this is not quite the commit that introduced the vulnerability. From the commit message, we see that VecStorage was renamed from MatrixVec:

Bisecting: 7 revisions left to test after this (roughly 3 steps)
[0f66403cbbe9eeac15cedd8a906c0d6a3d8841f2] Rename `MatrixVec` to `VecStorage`.

Running git show 0f66403cbbe9eeac15cedd8a906c0d6a3d8841f2 displays the diff for this commit. We can see that the struct was only renamed. The derive attribute was already present under the old name, which means the code before this commit is vulnerable too:

 #[cfg_attr(feature = "serde-serialize", derive(Serialize, Deserialize))]
-pub struct MatrixVec<N, R: Dim, C: Dim> {
+pub struct VecStorage<N, R: Dim, C: Dim> {
     data: Vec<N>,
     nrows: R,
     ncols: C,
 }

To continue searching backwards from this point, we need to restart the bisect operation at this new commit:

git bisect reset
git checkout 0f66403cbbe9eeac15cedd8a906c0d6a3d8841f2
git bisect bad
git bisect next

And modify our checks with the new struct name:

rg 'struct MatrixVec' -B 1

Modified code

We are quickly closing in on the point where the vulnerability was introduced. Along the way, you might notice that in older commits the feature flag logic is absent. However, Deserialize is still present inside the #[derive(...)] attribute, which means we can mark such a commit as “bad” and continue:

#[derive(Eq, Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct MatrixVec<N, R: Dim, C: Dim> {
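The manual check can be expressed as a small function that handles both struct names and both attribute forms. This is a sketch that mirrors the ripgrep command by inspecting only the single line before the struct declaration; classify and Verdict are names invented for this example.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// The struct is declared and the preceding line derives Deserialize.
    Bad,
    /// The struct is absent, or present without a derived Deserialize.
    Good,
}

/// Mirrors `rg 'struct <name>' -B 1`: find the struct declaration and check
/// whether the line just before it derives `Deserialize`.
pub fn classify(source: &str, struct_names: &[&str]) -> Verdict {
    let lines: Vec<&str> = source.lines().collect();
    for (i, line) in lines.iter().enumerate() {
        let declares = struct_names
            .iter()
            .any(|name| line.contains(format!("struct {}", name).as_str()));
        if declares && i > 0 {
            let previous = lines[i - 1];
            // Matches both the plain derive and the cfg_attr-wrapped derive.
            if previous.contains("derive(") && previous.contains("Deserialize") {
                return Verdict::Bad;
            }
        }
    }
    Verdict::Good
}
```

Passing both "VecStorage" and "MatrixVec" as struct_names avoids restarting the search when the rename is crossed. Like the -B 1 flag, this would miss a derive that sits more than one line above the struct, so it is worth confirming the result by reading the diff.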

Found, at last

Finally, we have reached the end and have identified commit 086e6e7.

086e6e719f53fecba6dadad2e953a487976387f5 is the first bad commit
commit 086e6e719f53fecba6dadad2e953a487976387f5
Date:   Sun Feb 12 18:17:09 2017 +0100

    Doc + slerp + conversions.

:100644 100644 37a437170a0652cdea2b1fa9acc432b0c36d0238 398f0db9bd785f614dce61b12189118ee94f0243 M      Cargo.toml
:100644 100644 f9586c02d4bb6566c368f4af1355ce0ddeeb4468 00457c113ed7b92e490db1c0fc40369ce9bee791 M      Makefile
:100644 100644 c860659cb7e48ed3fe1b02f10eb477e6dde753f7 7d6a99fa156eac406835d7ed97573ed5f5266061 M      README.md
:000000 040000 0000000000000000000000000000000000000000 1ec3a39daebd14edf61757bf8bd97def798d24b4 A      examples
:040000 040000 75d90764b1b2611d068ac8924988cefb425c05f3 2f2d71b9b97c4b8c339ae3397e67d550afa4a027 M      src
:040000 040000 6daa48d65a69f7450442bf4dd3f35fa2549d8f5e 3224ca40928811a275336a1d7f47784b3962d876 M      tests

Running git show 086e6e719f53fecba6dadad2e953a487976387f5, we scan the diff and find that this is where the Deserialize implementation was added. We found the commit where this vulnerability was introduced!

-#[derive(Eq, Debug, Clone, PartialEq)]
+#[derive(Eq, Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub struct MatrixVec<N, R: Dim, C: Dim> {

Converting to versions

There is one last step to our process. We need to find the first version of this crate that was released with vulnerable code. We know that 086e6e719f53fecba6dadad2e953a487976387f5 first introduced the vulnerability, so we can run the following command to see which tags contain that commit in their history:

git tag --contains 086e6e719f53fecba6dadad2e953a487976387f5

This outputs a list of tag names. v0.11.0 is the earliest version containing the vulnerable Deserialize implementation. We now have all the information we need to complete the advisory report.
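When the list of tags is long, picking the earliest one by eye is error-prone, because a plain text sort does not order version numbers numerically. The sketch below selects the smallest version from tags shaped like v0.11.0. The function names are hypothetical, and tags that do not match that shape are skipped.

```rust
/// Parses a tag such as "v0.11.0" into numeric components. Returns None for
/// tags that do not follow that shape.
fn parse_version(tag: &str) -> Option<Vec<u64>> {
    let rest = tag.strip_prefix('v')?;
    rest.split('.').map(|part| part.parse::<u64>().ok()).collect()
}

/// Returns the tag with the smallest version, ignoring unparseable tags.
pub fn earliest_tag<'a>(tags: &[&'a str]) -> Option<&'a str> {
    tags.iter()
        .filter_map(|tag| parse_version(tag).map(|version| (version, *tag)))
        // Vec<u64> compares component by component, so 0.11.0 < 0.100.0.
        .min_by(|a, b| a.0.cmp(&b.0))
        .map(|(_, tag)| tag)
}
```

For a made-up list such as ["v0.12.0", "v0.100.0", "v0.11.0"], a text sort would put v0.100.0 first, while earliest_tag returns v0.11.0.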

  1. This is known as a buffer over-read when the out-of-bounds access is a read, or a buffer overflow when it is a write.