New join processor - Too many records in the remote dataset

Question

Hi everyone,

First, I want to thank all the OpenDataSoft developers for adding the new join processor. It looks very promising!

I tried to identify a list of French companies along with their addresses using the Sirene dataset (https://public.opendatasoft.com/explore/dataset/economicref-france-sirene-v3). Unfortunately, I’m unable to publish the resulting dataset: for some addresses, there are too many companies. For instance, I get the following error message:

“Local key value 'paris, 155, rue, de charonne' matches too many records in the remote dataset. Reduce the join scope.”

There are 67 companies at that address. Obviously, I can't reduce the scope, since all those companies share the exact same address.

What can I do to successfully join the two datasets and publish the result, even if it means ignoring some addresses?

Thanks in advance for your help!

Valentine · Answer

Hello Antoine,

We are pleased to hear that you enjoy the new join processor!

We generally advise against joining on addresses, because address matching is not reliable and a single address can return many matches. We recommend using join keys such as codes or identifiers instead.

Nevertheless, we are going to improve the current behaviour so that you will still be able to publish the dataset when too many matches are found (the results for keys with too many matches will be truncated).
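To illustrate the difference between the two kinds of join key, here is a minimal Python sketch. It is not the platform's implementation: the function `join_with_cap`, the field names (`siret`, `address`, `label`), the sample records, and the cap of 10 matches are all assumptions made for the example. It shows an address key fanning out to 67 records and being truncated, while an identifier key matches exactly one record.

```python
from collections import defaultdict


def join_with_cap(local_rows, remote_rows, key, max_matches):
    """Inner-join local_rows to remote_rows on `key`, keeping at most
    `max_matches` remote records per local key value (hypothetical cap)."""
    # Index the remote records by their key value.
    index = defaultdict(list)
    for remote in remote_rows:
        index[remote[key]].append(remote)

    joined, truncated_keys = [], []
    for local in local_rows:
        matches = index.get(local[key], [])
        if len(matches) > max_matches:
            # Too many matches: remember the key and keep only the first few.
            truncated_keys.append(local[key])
            matches = matches[:max_matches]
        for remote in matches:
            joined.append({**remote, **local})
    return joined, truncated_keys


# Made-up remote data: 67 companies sharing one address, each with a unique id.
address = "paris, 155, rue, de charonne"
remote = [{"siret": f"{i:014d}", "address": address} for i in range(67)]

# Joining on the address: one local key matches 67 remote records.
local_by_address = [{"address": address, "label": "site A"}]
rows_addr, truncated_addr = join_with_cap(local_by_address, remote, "address", 10)

# Joining on an identifier: one local key matches a single remote record.
local_by_id = [{"siret": "00000000000003", "label": "site A"}]
rows_id, truncated_id = join_with_cap(local_by_id, remote, "siret", 10)

print(len(rows_addr), truncated_addr)
print(len(rows_id), truncated_id)
```

With an identifier as the key, nothing is truncated, which is why a code or identifier column is the safer choice whenever both datasets have one.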
