My application generates a data field from a hash function, so the resulting data is a list with no duplicates. However, I want the data to reflect the case where some of the input is duplicated, as if certain keys were repeated before the hash was applied. Ideally, I would like to control both the number of keys that are repeated (i.e. how many distinct hashes appear more than once) and how many times each repeated key occurs (i.e. how many copies of the same hash appear in the resulting data).
Should I simply generate a dataset of keys, manually duplicate some of them, then feed that data into the hash function to populate the resulting data field?
Yep, you’ve got the right idea… generate a set of keys, duplicate some of the values, upload it as a dataset, then use the digest formula function to generate the hash values.
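If it helps, here’s a minimal sketch of that workflow in Python, assuming SHA-256 as the digest. The function and parameter names (`make_hashed_data`, `n_duplicated`, `copies_per_key`) are purely illustrative, not from any particular library:

```python
import hashlib
import random

def make_hashed_data(n_unique, n_duplicated, copies_per_key, seed=0):
    """Generate keys, duplicate a controlled subset, then hash.

    n_unique       -- total number of distinct keys to generate
    n_duplicated   -- how many of those keys get repeated
    copies_per_key -- total copies of each repeated key
    """
    rng = random.Random(seed)
    keys = [f"key-{i}" for i in range(n_unique)]
    # Pick which keys will be duplicated.
    repeated = rng.sample(keys, n_duplicated)
    # Add copies_per_key - 1 extra copies of each chosen key,
    # as if the key had been repeated before hashing.
    dataset = keys + [k for k in repeated for _ in range(copies_per_key - 1)]
    rng.shuffle(dataset)
    # Hash every entry; duplicated keys yield duplicated hashes.
    return [hashlib.sha256(k.encode()).hexdigest() for k in dataset]

# Example: 1000 distinct keys, 50 of which appear 5 times each.
hashes = make_hashed_data(n_unique=1000, n_duplicated=50, copies_per_key=5)
```

The two parameters map directly onto the two knobs you asked about: `n_duplicated` controls how many distinct hashes are repeated, and `copies_per_key` controls how many copies of each repeated hash end up in the output.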