Can I lookup a value in one dataset using a field in another?

Basically, I’ve got a whole lot of data to work from (too much for my prototyping needs), and I want to be able to create all sorts of derivative datasets from it. But the data I’ve got is meaningfully related. So, while I’m happy to pull a thousand random rows from dataset A, when I pull from dataset B, for certain fields, I only want rows that match the data pulled from dataset A. Does that make sense?

It does. I actually got a formula function that would allow you to lookup values from a dataset working in development last night. I should have it pushed up tonight. The syntax will be:

from_dataset('Countries', 'name', id: country_id)

Does that look like it would work for your purposes?

1 Like

Oooh. I think so!

So, what that does is look in the (external) dataset called “Countries” for a row with an id equal to the country_id that is returned from a different field (in this schema), and it results the field name from that row. Right?

Out of curiosity, how will this handle cases in which the lookup results multiple matches? For example, what if instead of a unique id, I try to match on something less unique? Or would it just pick the first match? Or randomly select from multiple? (My QA past is rearing its head here, and I wanna make sure this does explode or overly tax your DB.)

Also, how tough would multivariate lookup be at this point? It’s been a few years since I’ve played with Rails, but, given your syntax, it feels like this wouldn’t be a big jump.

Yep, you’ve got it.

Right now it just picks the last record matching the criteria. The third argument to the function is a Hash, so looking up based on multiple variables is a simple as:

`from_dataset(‘Countries’, ‘name’, col1: field1, col2: field2)

1 Like

The new from_dataset formula function is now available on mockaroo.com.

2 Likes

You’re amazing! Thank you so much.

Taking this one step further, could it be possible to nest this type of functionality?

from_dataset("Countries", "name", id: from_dataset("Contacts", "country_id", id: id))

In my example, none of the IDs are auto-generated (they come from my datasets). So, I’m looking to grab the country_id of a contact that matches the id of randomly selected contact for this row. And then, I’ll use this country_id to lookup the name of the country in the Countries dataset.

This is hardly an essential feature, since I can do this simply by adding a country_id field to this schema and doing this:

from_dataset("Countries", "name", id: country_id)

But it’d be nice to not have to spit out the extra (unnecessary) fields. (Which I’m going to start another thread about.)

It should be pretty easy for me to add nesting. I’ll take a look at it soon. You can hide any field from the output simply by naming it starting with two underscores. So for example, “__myHiddenField”.

1 Like

Perhaps I’m screwing something up, but this doesn’t seem to be working when I download the data.

I’m getting this error (in the field values):

error: Could not access blank value: Use || to provide a default value for blank fields.

But I’m fairly certain I solved for that already, and it doesn’t occur when I preview. And, just to be clear, it was happening in my previews, until I fixed the formulas.

Can you post the share link at the bottom of your schema? I’ll have a look.

Should this feature work within a JSON Array’s values?

For example:

(Where id in that from_dataset is the id of the current row.)

It should now :slight_smile: Good catch! You should also be able to nest calls to from_dataset to join across multiple datasets.

1 Like

“Right now it just picks the last record matching the criteria.”

Is there anyway you could modify this behavior such that rather than picking the last record matching the criteria, it picks a random record matching the criteria?

Possibly. I’ll look into it tonight…

Cheers. Thanks a lot for your product.

I quickly realized a work around for my particular situation that inspired the question, so no rush. I could still see the value in it though if it wouldn’t require too much of your time to implement.

Any chance you’ve had any luck with this? I’ve come across another situation in which it would be very valuable. Thanks in advance.

@mockaroo Did you come up with a way to get a random entry rather than the last record matching the criteria?

@benallen002 could you post your work around ?

Yes, thank you for reminding me! I just updated the site so that it randomly picks a value if multiple matches are found.

1 Like

Thank you for the fast turn-around.

Is it possible to add an option to prevent duplicates? So, I can make a JSON array of n random values from another dataset and know there won’t be repeats within it.