Can I look up a value in one dataset using a field in another?

yoni · April 13, 2016, 5:20pm

Basically, I’ve got a whole lot of data to work from (too much for my prototyping needs), and I want to be able to create all sorts of derivative datasets from it. But the data I’ve got is meaningfully related. So, while I’m happy to pull a thousand random rows from dataset A, when I pull from dataset B, for certain fields, I only want rows that match the data pulled from dataset A. Does that make sense?

mockaroo · April 13, 2016, 5:34pm

It does. I actually got a formula function working in development last night that lets you look up values from a dataset. I should have it pushed up tonight. The syntax will be:

from_dataset('Countries', 'name', id: country_id)

Does that look like it would work for your purposes?

yoni · April 13, 2016, 5:49pm

Oooh. I think so!

So, what that does is look in the (external) dataset called “Countries” for a row with an id equal to the country_id that is returned from a different field (in this schema), and it returns the name field from that row. Right?

Out of curiosity, how will this handle cases in which the lookup returns multiple matches? For example, what if instead of a unique id, I try to match on something less unique? Would it just pick the first match? Or randomly select from multiple? (My QA past is rearing its head here, and I wanna make sure this doesn’t explode or overly tax your DB.)

Also, how tough would multivariate lookup be at this point? It’s been a few years since I’ve played with Rails, but, given your syntax, it feels like this wouldn’t be a big jump.

mockaroo · April 13, 2016, 6:03pm

Yep, you’ve got it.

Right now it just picks the last record matching the criteria. The third argument to the function is a Hash, so looking up based on multiple variables is as simple as:

from_dataset('Countries', 'name', col1: field1, col2: field2)
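
To make the matching rules concrete, here is a minimal plain-Ruby sketch of a lookup that takes a Hash of criteria and returns the last matching row's value. It is not Mockaroo's implementation: the `COUNTRIES` data, the `region` and `currency` columns, and the `lookup_last` helper are all made up for illustration.

```ruby
# Hypothetical in-memory stand-in for a dataset; not Mockaroo's code.
COUNTRIES = [
  { id: 1, name: "France",  region: "EU",   currency: "EUR" },
  { id: 2, name: "Germany", region: "EU",   currency: "EUR" },
  { id: 3, name: "Japan",   region: "APAC", currency: "JPY" }
].freeze

# Return `column` from the LAST row whose fields equal every criterion,
# or nil when no row matches.
def lookup_last(rows, column, **criteria)
  matches = rows.select do |row|
    criteria.all? { |key, value| row[key] == value }
  end
  last_match = matches.last
  last_match && last_match[column]
end

lookup_last(COUNTRIES, :name, id: 3)                          # single criterion
lookup_last(COUNTRIES, :name, region: "EU", currency: "EUR")  # two criteria, two matching rows
```

With two criteria, both France and Germany match, and this sketch returns the later row, mirroring the "last record" behavior described above.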

mockaroo · April 14, 2016, 1:56am

The new from_dataset formula function is now available on mockaroo.com.

yoni · April 14, 2016, 4:56pm

You’re amazing! Thank you so much.

Taking this one step further, could it be possible to nest this type of functionality?

from_dataset("Countries", "name", id: from_dataset("Contacts", "country_id", id: id))

In my example, none of the IDs are auto-generated (they come from my datasets). So, I’m looking to grab the country_id of a contact that matches the id of the randomly selected contact for this row. And then, I’ll use this country_id to look up the name of the country in the Countries dataset.

This is hardly an essential feature, since I can do this simply by adding a country_id field to this schema and doing this:

from_dataset("Countries", "name", id: country_id)

But it’d be nice to not have to spit out the extra (unnecessary) fields. (Which I’m going to start another thread about.)
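
As a rough illustration of why the nested call and the two-step version should resolve to the same value, here is a plain-Ruby sketch. The sample rows, the `DATASETS` table, and `from_dataset_sketch` are hypothetical stand-ins, not Mockaroo's code, and the ids in the sample data are assumed to be unique.

```ruby
# Hypothetical rows standing in for the "Contacts" and "Countries" datasets.
CONTACT_ROWS = [
  { id: 10, country_id: 2 },
  { id: 11, country_id: 1 }
].freeze
COUNTRY_ROWS = [
  { id: 1, name: "France" },
  { id: 2, name: "Germany" }
].freeze
DATASETS = { "Contacts" => CONTACT_ROWS, "Countries" => COUNTRY_ROWS }.freeze

# Simplified stand-in for from_dataset: dataset name, column name, keyword criteria.
def from_dataset_sketch(dataset, column, **criteria)
  rows = DATASETS.fetch(dataset)
  row = rows.select { |r| criteria.all? { |key, value| r[key] == value } }.last
  row && row[column.to_sym]
end

id = 10

# Nested: the inner call supplies the criterion for the outer call.
nested = from_dataset_sketch("Countries", "name",
                             id: from_dataset_sketch("Contacts", "country_id", id: id))

# Two-step: the intermediate value lives in its own field first.
country_id = from_dataset_sketch("Contacts", "country_id", id: id)
two_step = from_dataset_sketch("Countries", "name", id: country_id)

puts nested == two_step
```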

mockaroo · April 14, 2016, 5:01pm

It should be pretty easy for me to add nesting. I’ll take a look at it soon. You can hide any field from the output simply by naming it starting with two underscores. So for example, “__myHiddenField”.
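
The effect of the double-underscore convention can be pictured with a small plain-Ruby sketch. This only mimics the described outcome (helper fields feed formulas but are left out of the output); the row contents and the `visible_fields` helper are hypothetical.

```ruby
# Hypothetical generated row; "__country_id" is a helper field used only by formulas.
row = { "id" => 10, "__country_id" => 2, "country" => "Germany" }

# Keep only fields whose names do not start with two underscores.
def visible_fields(row)
  row.reject { |name, _value| name.start_with?("__") }
end

p visible_fields(row)
```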

yoni · April 14, 2016, 6:36pm

Perhaps I’m screwing something up, but this doesn’t seem to be working when I download the data.

I’m getting this error (in the field values):

error: Could not access blank value: Use || to provide a default value for blank fields.

But I’m fairly certain I solved for that already, and it doesn’t occur when I preview. And, just to be clear, it was happening in my previews, until I fixed the formulas.
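
For reference, the `||` fallback that the error message points to behaves like this in plain Ruby. This is only a sketch: it assumes a blank field shows up as nil, and `country_key` and its arguments are hypothetical names.

```ruby
# Plain Ruby, not Mockaroo internals. A blank field is modeled as nil (assumption).
def country_key(country_id, fallback_id)
  # `||` returns the left side unless it is nil or false.
  country_id || fallback_id
end

country_key(nil, 1) # blank field: falls back to 1
country_key(7, 1)   # present field: stays 7
```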

mockaroo · April 14, 2016, 7:27pm

Can you post the share link at the bottom of your schema? I’ll have a look.

yoni · April 14, 2016, 10:46pm

Should this feature work within a JSON Array’s values?

For example:

(Where id in that from_dataset is the id of the current row.)

mockaroo · April 14, 2016, 11:52pm

It should now. Good catch! You should also be able to nest calls to from_dataset to join across multiple datasets.

benallen002 · June 16, 2016, 6:38pm

“Right now it just picks the last record matching the criteria.”

Is there any way you could modify this behavior so that, rather than picking the last record matching the criteria, it picks a random record matching the criteria?

mockaroo · June 16, 2016, 9:20pm

Possibly. I’ll look into it tonight…

benallen002 · June 16, 2016, 9:23pm

Cheers. Thanks a lot for your product.

I quickly realized a workaround for the particular situation that inspired the question, so no rush. I could still see the value in it, though, if it wouldn’t require too much of your time to implement.

benallen002 · June 20, 2016, 5:38pm

Any chance you’ve had any luck with this? I’ve come across another situation in which it would be very valuable. Thanks in advance.

ken · September 29, 2016, 2:58am

@mockaroo Did you come up with a way to get a random entry rather than the last record matching the criteria?

ken · September 29, 2016, 2:59am

@benallen002 could you post your workaround?

mockaroo · September 29, 2016, 3:44am

Yes, thank you for reminding me! I just updated the site so that it randomly picks a value if multiple matches are found.
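
In plain-Ruby terms, the change amounts to picking at random among all matching rows rather than taking the last one. The sketch below shows that idea only; it is not the site's actual code, and `REGION_ROWS` and `lookup_random` are hypothetical.

```ruby
# Hypothetical rows; two of them share the same region.
REGION_ROWS = [
  { id: 1, name: "France",  region: "EU" },
  { id: 2, name: "Germany", region: "EU" },
  { id: 3, name: "Japan",   region: "APAC" }
].freeze

# Return `column` from a randomly chosen matching row, or nil when nothing matches.
# Pass a seeded Random to make the choice repeatable.
def lookup_random(rows, column, random: Random.new, **criteria)
  matches = rows.select do |row|
    criteria.all? { |key, value| row[key] == value }
  end
  picked = matches.sample(random: random)
  picked && picked[column]
end

lookup_random(REGION_ROWS, :name, region: "EU") # either "France" or "Germany"
```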

ken · September 29, 2016, 4:05am

Thank you for the fast turn-around.

yoni · November 11, 2016, 8:05pm

Is it possible to add an option to prevent duplicates? So, I can make a JSON array of n random values from another dataset and know there won’t be repeats within it.
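
What I have in mind is sampling without replacement. In plain Ruby it would look something like the sketch below; the `TAG_ROWS` data and the `sample_without_repeats` helper are made up, and this is not an existing Mockaroo option as far as this thread shows.

```ruby
# Hypothetical dataset rows; note the duplicate "red" value.
TAG_ROWS = [
  { tag: "red" }, { tag: "green" }, { tag: "blue" }, { tag: "red" }
].freeze

# Pick up to `count` distinct values of `column`, with no repeats in the result.
def sample_without_repeats(rows, column, count, random: Random.new)
  distinct_values = rows.map { |row| row[column] }.uniq
  # Array#sample(n) never returns the same element twice.
  distinct_values.sample(count, random: random)
end

p sample_without_repeats(TAG_ROWS, :tag, 2)
```

If `count` is larger than the number of distinct values, the result is simply shorter than requested.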