from_dataset causing timeout errors

Hi,

I’ve got a dataset with ~42,000 rows, and I’m trying to look up a value using the following formula:

from_dataset("Time Distribution", "hour", location_id: location_id, year: year, month: month, week: week, dow: dow)

It’s fairly obvious what this does: I’m shaping a time series distribution for each location based on the year, month, week and day of week (dow).

With this formula in place, I cannot generate ANY data. Even a request for a single row errors out, and I need to generate around 5,000 rows.

Additionally, when I call generate in my API handler, I can’t pass 800 + rand(201) as the row count to get between 800 and 1,000 rows. That causes a generic “Something went wrong” / 500 response from the server.
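The expression itself is sound arithmetic, assuming the handler evaluates it as Ruby: rand(201) returns an integer from 0 to 200 inclusive, so 800 + rand(201) spans 800 to 1,000. Below is a small Python sketch of the same range, computed up front as a plain integer before it is handed to whatever generates the rows. The helper name pick_row_count is hypothetical and not part of Mockaroo.

```python
import random


def pick_row_count(low=800, high=1000, rng=None):
    """Return a random row count in [low, high], inclusive.

    Mirrors the Ruby expression 800 + rand(201): rand(n) yields 0..n-1,
    so the offset covers 0..(high - low).
    """
    chooser = rng if rng is not None else random
    return low + chooser.randrange(high - low + 1)


counts = [pick_row_count() for _ in range(10_000)]
print(min(counts), max(counts))
```

Computing the count into a variable first also makes it easy to log the value and confirm the failure is not caused by the number itself.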

For clarity, the frequency column in my dataset is grouped by all those time periods, including hour, so that call should return a value for hour based on the frequency. I can’t shrink my dataset any further unless I partition it by location, and even when I do that (removing all but one location from the dataset), it still times out.
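The lookup the formula is expected to perform can be sketched as follows. This assumes, as described above, one row per location/year/month/week/dow/hour combination, with the frequency column acting as the weight. It only illustrates the intended semantics; it is not how Mockaroo implements from_dataset, and the sample rows and helper names are made up. Grouping the rows once by the join key turns each lookup into a dictionary access rather than a scan over all ~42,000 rows.

```python
import random
from collections import defaultdict

# Hypothetical rows shaped like the "Time Distribution" dataset:
# (location_id, year, month, week, dow, hour, frequency)
ROWS = [
    (1, 2023, 1, 1, 1, 8, 5),
    (1, 2023, 1, 1, 1, 9, 15),
    (1, 2023, 1, 1, 1, 17, 0),
    (2, 2023, 1, 1, 1, 12, 10),
]


def build_index(rows):
    """Group (hour, frequency) pairs by the join key, done once up front."""
    index = defaultdict(list)
    for location_id, year, month, week, dow, hour, frequency in rows:
        index[(location_id, year, month, week, dow)].append((hour, frequency))
    return index


def lookup_hour(index, location_id, year, month, week, dow, rng=None):
    """Pick an hour for the key, weighted by frequency; None if nothing matches."""
    chooser = rng if rng is not None else random
    key = (location_id, year, month, week, dow)
    # Drop zero-frequency rows: they can never be chosen, and an all-zero
    # weight list would make random.choices raise ValueError.
    candidates = [(h, f) for h, f in index.get(key, []) if f > 0]
    if not candidates:
        return None
    hours = [h for h, _ in candidates]
    weights = [f for _, f in candidates]
    return chooser.choices(hours, weights=weights, k=1)[0]


index = build_index(ROWS)
print(lookup_hour(index, 1, 2023, 1, 1, 1, rng=random.Random(42)))
```

For location 1, hour 9 should come back about three times as often as hour 8, and hour 17 never appears because its frequency is zero.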

I’ve just tried this in the schema Preview (to take the API out of the equation), and with only 20% of the dataset loaded it shows the generic “Something went wrong” message after 5 minutes, even though I’m only asking it to generate 1 row.

So something in the from_dataset function is definitely running slowly: if I remove the formula and use a constant value instead, it returns instantly.

Any thoughts @mockaroo?

I’m having the same timeout issues with a data set of only 8k records, and I just purchased the Silver plan to enable large data sets. If I can’t get this working, it’s unusable for my needs. Can we get a fix or an update, @mockaroo?

I’d like to investigate your schema in particular. Please contact support through the website and include a link to your schema in the message and I will troubleshoot.