
Hi everyone,

First, I want to thank all the OpenDataSoft developers for adding the new join processor — it looks very promising!

I tried to identify a list of French companies along with their addresses using the Sirene dataset (https://public.opendatasoft.com/explore/dataset/economicref-france-sirene-v3). Unfortunately, I’m unable to publish the resulting dataset: for some addresses, there are too many companies. For instance, I get the following error message: “Local key value 'paris, 155, rue, de charonne' matches too many records in the remote dataset. Reduce the join scope.”

There are 67 companies at that address. Obviously, I can't reduce the scope, since all those companies share the exact same address.

What can I do to successfully join the two datasets and publish the result — even if it means ignoring some addresses?

Thanks in advance for your help!

Hello Antoine,

We are pleased to hear that you enjoy the new join processor!
We generally advise against joining on addresses, because address matching is unreliable and tends to return many matches. We recommend using join keys such as codes or identifiers instead.
Nevertheless, we are going to improve the current behaviour, so that if too many matches are found, you will still be able to publish the dataset (the results that contain too many matches will be truncated).
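
To make the planned truncation concrete, here is a minimal sketch in plain Python. It is not the join processor's actual implementation: the function name `join_with_truncation`, the field names, and the cap `MAX_MATCHES` are all assumptions made for illustration, and the real limit is not stated in this thread.

```python
from collections import defaultdict

MAX_MATCHES = 3  # hypothetical cap; the processor's real limit is not given here


def join_with_truncation(local_rows, remote_rows, key, max_matches=MAX_MATCHES):
    """Left-join local_rows to remote_rows on `key`, keeping at most
    max_matches remote records per key value instead of failing."""
    # Index the remote records by their join key value.
    index = defaultdict(list)
    for row in remote_rows:
        index[row[key]].append(row)

    joined, truncated_keys = [], []
    for local in local_rows:
        matches = index.get(local[key], [])
        if len(matches) > max_matches:
            # Too many matches: remember the key and keep only the first few.
            truncated_keys.append(local[key])
            matches = matches[:max_matches]
        if not matches:
            # No match: keep the local record as-is.
            joined.append(dict(local))
        for remote in matches:
            # Local fields win if both sides share a field name.
            joined.append({**remote, **local})
    return joined, truncated_keys


# Illustrative data only: 67 made-up remote records sharing one address.
address = "paris, 155, rue, de charonne"
local = [{"address": address, "label": "my record"}]
remote = [{"address": address, "company_id": str(i)} for i in range(67)]

joined, truncated_keys = join_with_truncation(local, remote, "address")
print(len(joined), truncated_keys)
```

Returning the list of truncated keys alongside the result makes it easy to see which addresses lost matches, which would matter if you need every company at a given address.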


Thank you for your message and for the improvements you're planning to bring to the join processor.

I would indeed prefer to perform joins using an identifier, as recommended. However, due to the way my datasets are currently structured, this is unfortunately not possible.

I really appreciate you taking my request into account.

