Lookup using from_dataset() regularly times out

I have a data set of 5 columns and about 87,000 lines, and I am trying to look up codes in it using the Diagnosis Code column. With only one such column in my schema, it often outputs correctly, but sometimes I get time-out errors. When I have two such columns, it always times out:

ClaimId,LineNumber,Icd10ProcedureCode,HcpcsCptCode
1000003,1,Error: Timed out,Error: Timed out
1000003,2,Error: Timed out,Error: Timed out
1001002,1,Error: Timed out,Error: Timed out
1001002,2,Error: Timed out,Error: Timed out
1002002,1,Error: Timed out,Error: Timed out
...

Sometimes both columns time out, and other times only one does.

Here is the schema with two similar lookups using from_dataset():
Test Claim Line

Here is a simpler schema with only one column that usually times out, though less often:
Test Procedure Code Lookup

UPDATE:

I have reduced the lookup data set to 10,000 rows, and the problem has subsided. I hope this post helps others avoid this problem. Of course I also had to regenerate/upload all the data sets for which I was performing the lookup, as they were also generated using the same data set. I wonder what the true high limit is for lookup table size.
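For anyone who wants to shrink a lookup file the same way, here is a minimal Python sketch of one way to do it offline before uploading. It is only an illustration: the function name `trim_lookup_csv`, the `Code` and `Description` column names, and the sample rows are all hypothetical, and dropping duplicate keys is an extra assumption on my part, not something described above.

```python
import csv
import io


def trim_lookup_csv(text, key_column, max_rows=10_000):
    """Return CSV text with the header and at most max_rows rows,
    keeping only the first row seen for each key (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        if len(seen) >= max_rows:
            break  # row budget reached; ignore the rest of the file
        key = row[key_column]
        if key in seen:
            continue  # duplicate key: the first occurrence wins
        seen.add(key)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data, not taken from the real lookup file.
sample = "Code,Description\nA01,First\nA01,Duplicate\nB02,Second\nC03,Third\n"
trimmed = trim_lookup_csv(sample, "Code", max_rows=2)
print(trimmed)
```

For a real file, read it with `open(path, newline="")` and pass the contents in; the 10,000 default only mirrors the size that worked for me, it is not a documented limit.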

Did the time-out errors occur from the first row on, or did they start somewhere in the middle of the dataset?

In the Preview, it was always from the first row all the way down. I might have downloaded it once to see if it was only a side effect of the Preview mode, and I think it was throughout the whole file.

That makes sense. There is a timeout to prevent bad user scripts from hogging processing power, but the initial indexing of large datasets for lookup might exceed that under heavy load. I’m going to try exempting that work from the timeout and see if it cures the issue. I’ll post here once that’s available. Thanks for reporting!
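To illustrate why the first row is the expensive one, here is a generic Python sketch of a keyed lookup index. It is not the service's actual implementation; `build_index`, `lookup`, and the sample data are hypothetical names chosen for the example. The whole dataset is read once to build the index, and after that each lookup is a single dictionary access.

```python
import csv
import io


def build_index(text, key_column):
    """Read every row once and map each key to its first matching row.
    This one-time pass grows with the size of the dataset."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        index.setdefault(row[key_column], row)
    return index


def lookup(index, key, column, default=None):
    """Return one column of the row stored under key, or default if absent."""
    row = index.get(key)
    return row[column] if row is not None else default


# Made-up sample data for illustration only.
sample = "Code,Description\nA01,First\nB02,Second\nC03,Third\n"
index = build_index(sample, "Code")       # paid once, before the first row
print(lookup(index, "B02", "Description"))
print(lookup(index, "ZZZ", "Description", default="not found"))
```

If the indexing step is counted against a per-row time budget, it would fail on the very first row, which would be consistent with the errors starting at the top of the output.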


Just deployed an update which bumps up the timeout for loading/indexing datasets to 1 minute. That should allow for much larger datasets.
