We are setting up a process to update the resources for our datasets using the Automation API. I am looking for best practice advice. Here is what we currently do, starting with a dataset_id and a file that we want to upload (a rough sketch of the full sequence follows the list):

  1. Retrieve the dataset_uid using the endpoint /api/explore/v2.1/catalog/datasets/{dataset_id}.

  2. Retrieve the currently used resource_uid of the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources, with the dataset_uid from step 1.

  3. Upload the file to the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources/files, again using the dataset_uid from step 1. From the response I get the file_uid of the uploaded file.

  4. Update the dataset resource using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}, with the dataset_uid from step 1, the resource_uid from step 2, and the file_uid from step 3.

  5. Republish the dataset using the endpoint /api/automation/v1.0/datasets/{dataset_uid}/publish, with the dataset_uid from step 1.
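
For context, here is roughly what our script looks like in Python with the requests library. It is only a sketch: the domain, the Apikey header, the multipart field name, the response field names, and especially the update payload in step 4 are simplified on my side and should be checked against the API reference.

```python
import requests

DOMAIN = "https://yourdomain.opendatasoft.com"      # your portal domain
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}   # API key with the required permissions

dataset_id = "my-dataset"                            # the dataset to update
file_path = "data.csv"                               # the new file to upload

# 1. Retrieve the dataset_uid from the Explore API.
r = requests.get(f"{DOMAIN}/api/explore/v2.1/catalog/datasets/{dataset_id}", headers=HEADERS)
r.raise_for_status()
dataset_uid = r.json()["dataset_uid"]

# 2. Retrieve the resource_uid currently used by the dataset.
r = requests.get(f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources", headers=HEADERS)
r.raise_for_status()
resource = r.json()["results"][0]                    # assuming a single resource; pick the right one otherwise
resource_uid = resource["uid"]

# 3. Upload the new file; the response contains the file_uid.
with open(file_path, "rb") as f:
    r = requests.post(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/files",
        headers=HEADERS,
        files={"file": f},
    )
r.raise_for_status()
file_uid = r.json()["uid"]

# 4. Update the resource so its datasource points at the uploaded file
#    (payload shape approximated; see the Automation API reference for the exact schema).
resource["datasource"] = {"type": "uploaded_file", "file": {"uid": file_uid}}
r = requests.put(
    f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
    headers=HEADERS,
    json=resource,
)
r.raise_for_status()

# 5. Republish the dataset so the new data goes live.
r = requests.post(f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/publish", headers=HEADERS)
r.raise_for_status()
```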

This process works so far. However, I'm not quite sure if there's a more efficient way. I'm especially interested in the following two points:

  • Can I somehow combine steps 3 and 4 by uploading the file directly to /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}? I see in the documentation that a DatasetFile is also mentioned instead of a dataset_uid, but I'm not quite sure how I would go about using that.

  • Once I upload a new resource file in step 3, I can't find the new file anywhere in the Opendatasoft GUI. Is there a way to manage all the resource files I’ve uploaded so that I don’t clutter the space? Or should I just use the endpoint to clean the cache of the resource, assuming that all uploaded files not referenced in a resource are removed?

Of course, if there are other best practices to streamline the process of updating dataset resources, I would be happy to hear about them. 😊

Many thanks,
Johannes

Hello,

You are using the Automation API the right way, and there isn’t much room for optimization:

  • You can’t combine steps 3 and 4: the file must be uploaded first and then associated with the dataset resource in a separate API call. The mention of the “DatasetFile” object in the documentation refers to the ability to expand the datasource when making a GET API call to request resource information, in which case you will get more information about the file associated with the resource. For example: GET /api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}?expand=datasource.file (see the sketch after this list).
  • You can’t list or manage the files associated with the dataset (resources). We didn’t implement this feature because it would be complex and is usually not needed: uploaded files that aren’t used by the dataset are automatically cleaned up on our side after a while. Note that cleaning the cache of the resource has no effect for a resource that uses uploaded files: it is useful when the source is an FTP directory, files have been removed on the FTP server, and you also want to remove them on the Opendatasoft side (which keeps a cache).
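
For example, in Python (the domain, the API key header, and the uids below are placeholders to adapt to your setup):

```python
import requests

DOMAIN = "https://yourdomain.opendatasoft.com"
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}

dataset_uid = "da_xxxxxx"    # placeholders: use your own uids
resource_uid = "re_xxxxxx"

# Expanding the datasource returns details about the file currently backing the resource.
r = requests.get(
    f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
    headers=HEADERS,
    params={"expand": "datasource.file"},
)
r.raise_for_status()
print(r.json())
```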


Best regards,
Hugo


Hi Hugo,
Thank you very much. :-)

