We are setting up a process to update the resources for our datasets using the Automation API. I am looking for best-practice advice. Here is what we currently do, starting with a dataset_id and a file that we want to upload (a rough Python sketch of these calls follows the list):

  1. Retrieve the dataset_uid using the endpoint /api/explore/v2.1/catalog/datasets/{dataset_id}.

  2. Retrieve the currently used resource_uid of the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources, with the dataset_uid from step 1.

  3. Upload the file to the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources/files, again using the dataset_uid from step 1. From the response I get the file_uid of the uploaded file.

  4. Update the dataset resource using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}, with the dataset_uid from step 1, the resource_uid from step 2, and the file_uid from step 3.

  5. Republish the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/publish, with the dataset_uid from step 1.

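For reference, this is roughly what our script does (Python with requests). The endpoints are the ones listed above; the HTTP verbs, the auth header, and field names such as "dataset_uid", "file_uid" and the "datasource" payload are just how I currently read the docs, so please correct me if any of them are off:

```python
import requests

DOMAIN = "https://example.opendatasoft.com"         # placeholder portal URL
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}   # placeholder credentials

def update_resource(dataset_id: str, file_path: str) -> None:
    # 1. Resolve the dataset_uid from the dataset_id (Explore API).
    r = requests.get(f"{DOMAIN}/api/explore/v2.1/catalog/datasets/{dataset_id}",
                     headers=HEADERS)
    r.raise_for_status()
    dataset_uid = r.json()["dataset_uid"]            # field name is my assumption

    # 2. Get the resource_uid of the resource currently attached to the dataset.
    r = requests.get(f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources",
                     headers=HEADERS)
    r.raise_for_status()
    resource_uid = r.json()["results"][0]["uid"]     # response shape is my assumption

    # 3. Upload the new file; the response contains the file_uid.
    with open(file_path, "rb") as f:
        r = requests.post(
            f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/files",
            headers=HEADERS, files={"file": f})
    r.raise_for_status()
    file_uid = r.json()["file_uid"]                  # field name is my assumption

    # 4. Point the existing resource at the newly uploaded file.
    r = requests.put(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
        headers=HEADERS,
        # Payload structure is my assumption from the docs.
        json={"datasource": {"type": "uploaded_file", "file": {"uid": file_uid}}})
    r.raise_for_status()

    # 5. Republish the dataset so the new data goes live.
    r = requests.post(f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/publish",
                      headers=HEADERS)
    r.raise_for_status()
```
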
This process works so far. However, I'm not quite sure if there's a more efficient way. I'm especially interested in the following two points:

  • Can I somehow combine steps 3 and 4 by directly uploading the file to /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}? I see in the documentation that instead of a dataset_uid, a DatasetFile is also mentioned. However, I'm not quite sure how I would go about doing that.

  • Once I upload a new resource file in step 3, I can't find the new file anywhere in the Opendatasoft GUI. Is there a way to manage all the resource files I’ve uploaded so that I don’t clutter the space? Or should I just use the endpoint to clean the cache of the resource, assuming that all uploaded files not referenced in a resource are removed?

Of course, if there are other best practices to streamline the process of updating dataset resources, I would be happy to hear about them. 😊

Many thanks,
Johannes

Hello,

You are using the Automation API the right way, and there isn't much room for optimization:

  • You can't combine steps 3 and 4: the file must be uploaded first, then associated with the dataset resource in a separate API call. The mention of the "DatasetFile" object in the documentation refers to the ability to expand the datasource when making a GET call to request resource information, in which case you will get more information about the file associated with the resource. For example: GET /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}?expand=datasource.file (see the sketch after this list).
  • You can't list or manage the files associated with the dataset resources. We didn't implement this feature because it would be complex and is usually not needed: uploaded files that aren't used by the dataset are automatically cleaned up on our side after a while. Note that cleaning the cache of the resource has no effect for a resource using uploaded files: it is useful when the source is an FTP directory, files have been removed on the FTP server, and you also want to remove them on the Opendatasoft side (which keeps a cache).
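A quick sketch of that expanded GET call in Python (placeholder domain, key and uids; the exact response shape may differ, so check the API reference):

```python
import requests

DOMAIN = "https://example.opendatasoft.com"          # your portal URL
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}   # your credentials
dataset_uid = "YOUR_DATASET_UID"
resource_uid = "YOUR_RESOURCE_UID"

# Ask the Automation API to expand the datasource so the response includes
# details about the file currently attached to the resource.
r = requests.get(
    f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
    headers=HEADERS,
    params={"expand": "datasource.file"},
)
r.raise_for_status()
print(r.json().get("datasource"))  # exact shape may differ, see the API reference
```
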


Best regards,
Hugo


Hi Hugo
Thank you very much. :-)

