infoSource is a cybersecurity newsletter. By subscribing to infoSource you will remain up-to-date on the latest in communication, computer and software cybersecurity issues.

It's almost always delicious, especially with Eggs.

Should you trust the hash? 

I like breakfast with eggs and hash, but that's not the question. As a food recipe, hash is an amalgam of various delicious elements brought together. Meat, potatoes and fried onions: that's food "hashery." A chef can get really fancy with a culinary hash recipe, and this paradigm also includes the algorithms used by a cybersecurity chef, of sorts.

"It's not always what you know. It's sometimes how you come to know it."

~ InfoBro

 A hash function in relation to cybersecurity (and math, there's a good "bit" of mathematics) should have these important security properties:

Image: Elise Bauer
  1. preimage resistance - meaning that it is computationally infeasible to work backward from an output value to an input that produces it. In other words, you can't fake an input based on knowing what the result is without a major time and resource investment.
  2. second preimage resistance - meaning that, given one input, finding a different input that produces the same output is also computationally infeasible. In other words, good luck with that too (unless the chosen hashing algorithm is fundamentally flawed or weak). This is closely related to a hashing vulnerability referred to as a collision. So, finally also...
  3. collision resistance - which, circling back to the first two properties, means it should be computationally infeasible to find any two (keeping it simple, but also, or more) different inputs that produce the same output. The difference is that here the attacker is free to choose both inputs.
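
To get a feel for what "computationally infeasible" means, here is a minimal sketch that deliberately weakens SHA-256 by keeping only its first 16 bits and then brute-forces a second preimage. The helper names (`truncated_hash`, `find_second_preimage`), the 16-bit size and the candidate strings are my own assumptions for illustration. With 16 bits the search takes tens of thousands of attempts on average; each additional bit doubles the work, which is why the full 256 bits are out of reach.

```python
import hashlib


def truncated_hash(data: bytes, bits: int = 16) -> int:
    """Return only the first `bits` bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_second_preimage(original: bytes, bits: int = 16):
    """Brute-force a different input with the same truncated hash."""
    target = truncated_hash(original, bits)
    attempts = 0
    while True:
        candidate = f"candidate-{attempts}".encode()
        attempts += 1
        if candidate != original and truncated_hash(candidate, bits) == target:
            return candidate, attempts


original = b"eggs and hash"
forged, attempts = find_second_preimage(original, bits=16)
print(f"{forged!r} matches {original!r} after {attempts} attempts")
```

Try raising `bits` to 24 or 32 and watch the run time climb.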

    There's No Perfect Hash (Function)

    In the simplest terms, a hash value is usually a fixed-length value that, ideally, is associated with only one input. Think of fixed length in this context as a series of ones and zeroes (bits) that is always the same number of units long no matter how much information you provide as an input. The output is often referred to as a digest because an arbitrarily large input value is represented by a fixed-length output, a digest of what was gobbled up and processed by the function.
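
A quick way to see the fixed-length idea is to feed inputs of very different sizes to one hash function. This sketch uses SHA-256 from Python's standard `hashlib` module; the sample inputs are arbitrary choices of mine.

```python
import hashlib

# Inputs ranging from empty to one million bytes.
inputs = [b"", b"hash", b"hash browns", b"x" * 1_000_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # Each hex character carries 4 bits, so 64 hex characters = 256 bits.
    print(f"{len(data):>9} bytes in -> {len(digest) * 4} bits out: {digest[:16]}...")

# Collect the digest length (in bytes) seen across all inputs.
digest_lengths = {len(hashlib.sha256(data).digest()) for data in inputs}

# The same input always yields the same digest (the function is deterministic).
same_input_twice = hashlib.sha256(b"hash").digest() == hashlib.sha256(b"hash").digest()
```

Every line reports 256 bits out, whether the function gobbled up nothing at all or a megabyte.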

    Every fixed-length hash function is imperfect because collisions are inevitable: the number of possible inputs is unlimited, while the number of fixed-length digests is finite. The smaller the number of bits used to produce the digest (the "output"), the more likely it is that two different inputs will produce the same output. That is, unless the hash function is capable of producing variable-length outputs. For perspective, however, consider that a 256-bit digest has 2^256 possible values, roughly 1.16 x 10^77, which is within a few orders of magnitude of common estimates for the number of atoms in the observable universe. The length of the output is not the only limitation imposed on collision resistance, and the simplest demonstration, in terms of a deterministic algorithm, is the so-called "birthday paradox": a collision among randomly chosen inputs is expected after roughly the square root of the number of possible digests, far sooner than the digest count alone suggests.
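
The birthday effect is easy to watch on a shrunken hash. The sketch below keeps only the first 24 bits of SHA-256 (my assumption, purely so it finishes quickly) and remembers every value seen until two different messages collide. The helper names and message strings are illustrative. The estimate uses the standard approximation that a 50% collision chance arrives after about sqrt(2 * ln 2 * N) random inputs, where N is the number of possible digests.

```python
import hashlib
import math


def truncated_hash(data: bytes, bits: int) -> int:
    """Return only the first `bits` bits of SHA-256(data) as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int):
    """Hash distinct messages until two of them share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        message = f"message-{counter}".encode()
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1


def birthday_estimate(bits: int) -> float:
    """Approximate number of random inputs for a 50% collision chance."""
    return math.sqrt(2 * math.log(2) * 2 ** bits)


first, second, tries = find_collision(bits=24)
print(f"{first!r} and {second!r} collide after {tries} messages")
print(f"24-bit digest space: {2 ** 24} values, "
      f"birthday estimate: about {birthday_estimate(24):.0f} messages")
print(f"2**256 is about {float(2 ** 256):.3e}")
```

The search is guaranteed to stop because there are more distinct messages than 24-bit values, but in practice it stops after a few thousand messages rather than millions. The same square-root rule is why a 256-bit digest is usually credited with about 128 bits of collision resistance.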

     Why is this, or hash functions generally, important?

    Maybe I'm just a nerd, that's highly likely. :-) The importance of hash functions is related to data management in a number of different, although related, ways. The hash function was invented to provide a solution to the problem of indexing and searching large collections of information efficiently. It began with a relatively simple checksum formula to verify the validity of 10-digit numbers, where the algorithm provides a 'check' for the validity of the input. This evolved when it was suggested that putting information into "buckets" was a faster way of searching, if all you had to search for were the bucket numbers. Essentially, using a formula to determine the bucket number, and then using that number to find the bucket where the data is stored, was more efficient. Additionally, such formulas evolved into deterministic algorithms that can be used to validate the integrity of information such as credit card numbers, account numbers, phone numbers, etc. Anytime you perform an online transaction, hash functions are used to ensure the validity and integrity of nearly everything from the moment the communication or actions are engaged.
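
Both ideas fit in a few lines. The sketch below is an illustration under my own assumptions: the text above does not name the checksum formula, so I use the Luhn algorithm, a well-known check-digit scheme used for credit card numbers, as a stand-in. The bucket functions, the bucket count of 8 and the sample records are likewise hypothetical.

```python
import hashlib


def luhn_valid(number: str) -> bool:
    """Check a digit string with the Luhn check-digit formula."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) > 0 and total % 10 == 0


def bucket_index(key: str, bucket_count: int) -> int:
    """Use a formula (here SHA-256) to turn a key into a bucket number."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % bucket_count


def build_buckets(records: dict, bucket_count: int = 8) -> list:
    """Store each record in the bucket its key hashes to."""
    buckets = [[] for _ in range(bucket_count)]
    for key, value in records.items():
        buckets[bucket_index(key, bucket_count)].append((key, value))
    return buckets


def lookup(buckets: list, key: str):
    """Search only the one bucket the key hashes to, not the whole collection."""
    for stored_key, value in buckets[bucket_index(key, len(buckets))]:
        if stored_key == key:
            return value
    return None


records = {"alice": "account 1001", "bob": "account 1002", "carol": "account 1003"}
buckets = build_buckets(records)
print(lookup(buckets, "bob"))
print(luhn_valid("79927398713"), luhn_valid("79927398710"))
```

The lookup never scans all the records; it computes the bucket number and checks only that bucket. The Luhn check catches a single mistyped digit, but it is not a security feature, since anyone can compute a valid check digit.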

    Here's why I'm telling you this.

    Recently (well, not so recently, as it's been a long-standing legal conundrum), a number of platforms *cough* Apple *cough* have delved into an effort to monitor data that is being stored in their online services, specifically for potentially illicit content. This is not new. Online platforms have attempted to identify and report illegal and illicit content uploaded by users since the olden times, even the times of the online Bulletin Board System (BBS), a simple system of sharing ideas and content. In the present day, the methods of sharing ideas are as robust as imagination allows, from text to images and videos.

    Every individual bit of information that lands on a server, that is shared without prior encryption, can also be compared to that which is known or thought to be known, by the use of algorithms that can parcel the information by way of hash functions. A ruling by the U.S. Court of Appeals for the Ninth Circuit determined that a submission of information from Google to the National Center for Missing and Exploited Children (NCMEC), based on a private search, was inadmissible. The submission was predicated on a purported predetermination by Google. (Being careful here: the determination was made through Google's procedure, which is proprietary and opaque, and there's a bigger issue here.) There were additional issues at play regarding the Fourth Amendment restrictions that the doctrine places on the Government's use of private search based submissions. The turning point, and I'm not a lawyer, seems to have been that a Google employee hadn't looked at the allegedly illegal images, but relied on what seems to have been a hash value match.

    Apple's methodology to detect illicit content, known as Child Sexual Abuse Material (CSAM), is powered by a system of mathematical transformations known as Private Set Intersection (PSI) that is based heavily on systematic hashing. Whether content is reported for review is based on a validation methodology known as threshold secret sharing. The scheme aims to keep the information "private" as it is collected. This may suffer the same pitfalls as the Google case; however, Apple has stated reports will be manually reviewed.
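
To make "threshold secret sharing" less abstract, here is a minimal sketch of Shamir's secret sharing, the textbook threshold scheme. To be clear, this is not Apple's implementation, and I'm only assuming it captures the general idea: a secret is split into shares so that any `threshold` of them recover it, while fewer reveal nothing useful. The prime, the function names and the sample numbers are all my own choices. It needs Python 3.8 or later for `pow(x, -1, p)`.

```python
import secrets

# Arithmetic is done modulo a prime; 2**127 - 1 is a Mersenne prime.
PRIME = 2 ** 127 - 1


def split_secret(secret: int, threshold: int, share_count: int) -> list:
    """Split `secret` into shares; any `threshold` of them can recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= share_count:
        raise ValueError("need 1 <= threshold <= share_count")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, share_count + 1):
        y = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            y = (y * x + coefficient) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the secret back."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        term = y_i * numerator * pow(denominator, -1, PRIME)
        secret = (secret + term) % PRIME
    return secret


secret = 123456789
shares = split_secret(secret, threshold=3, share_count=5)
print(recover_secret(shares[:3]) == secret)   # any three shares are enough
print(recover_secret(shares[2:]) == secret)   # a different three also work
```

In a reporting system, the rough idea would be that each match releases one share, so nothing can be decrypted for review until the number of matches crosses the threshold.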

    The Bottom Line

    The validity of hash algorithms, in the eyes of the court, may be at play (and in jeopardy) for identifying illicit content, unless there is a human observer. Can the sensitivities of an analyst, whatever the ilk, be preserved? Can an analyst rely upon the scrutiny of an algorithm to make a decisive determination about the calculated content, rather than physically observing it before making a recommendation for reporting? Time will tell, but this ruling has wide-ranging implications for the submission of information that has not been physically observed.

    Stay safe. Find evil. Subscribe. Give Feedback and Stay Tuned.

    Monday, 22 July 2024
