I have a data set of 5 columns and about 87,000 rows, and I am trying to look up codes in it using the Diagnosis Code column. With only one lookup column in my schema, the output is usually correct, but I sometimes get time-out errors. With two lookup columns, it always times out:
ClaimId,LineNumber,Icd10ProcedureCode,HcpcsCptCode
1000003,1,Error: Timed out,Error: Timed out
1000003,2,Error: Timed out,Error: Timed out
1001002,1,Error: Timed out,Error: Timed out
1001002,2,Error: Timed out,Error: Timed out
1002002,1,Error: Timed out,Error: Timed out
...
Sometimes both columns time out, and other times only one does.
Here is the schema with two similar lookups using from_dataset():
Test Claim Line
Here is a simpler schema with only one lookup column, which also times out, though less often:
Test Procedure Code Lookup
UPDATE:
I have reduced the lookup data set to 10,000 rows, and the time-outs have stopped. I hope this post helps others avoid this problem. Of course, I also had to regenerate and re-upload every data set that I was performing the lookup for, since those had been generated from the same original data set. I wonder what the actual upper limit is for lookup table size.
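In case it helps anyone trying the same workaround, here is a minimal Python sketch of one way to cut a lookup CSV down to a fixed number of rows before uploading it. It is only an illustration: the function name trim_csv, the sample column names, and the choice to keep the first N rows (rather than a random sample) are my assumptions, not anything the tool requires.

```python
import csv
import io
from itertools import islice


def trim_csv(src, dst, max_rows=10_000):
    """Copy the header plus at most max_rows data rows from src to dst.

    src and dst are text file-like objects. Returns the number of data
    rows written. (Hypothetical helper, not part of any tool's API.)
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        return 0  # empty input: nothing to copy

    writer.writerow(header)
    written = 0
    # islice stops after max_rows rows without reading the rest of the file.
    for row in islice(reader, max_rows):
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo with made-up column names and values.
    sample = io.StringIO("Code,Description\nA,one\nB,two\nC,three\n")
    trimmed = io.StringIO()
    kept = trim_csv(sample, trimmed, max_rows=2)
    print(kept)
    print(trimmed.getvalue(), end="")
```

For real files, pass handles opened with newline="" (for example open(path, newline="") for reading and open(path, "w", newline="") for writing). Any data set that was generated from the full lookup table will contain codes that are no longer in the trimmed one, so it has to be regenerated as well.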