We are setting up a process to update the resources for our datasets using the Automation API, and I am looking for best-practice advice. Here is what we currently do, starting with a `dataset_id` and a file that we want to upload (a rough Python sketch of the whole flow follows the list):
1. Retrieve the `dataset_uid` using the endpoint `/api/explore/v2.1/catalog/datasets/{dataset_id}`.
2. Retrieve the currently used `resource_uid` of the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources`, with the `dataset_uid` from step 1.
3. Upload the file to the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources/files`, again with the `dataset_uid` from step 1. From the response I get the `file_uid` of the uploaded file.
4. Update the dataset resource using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}`, with the `dataset_uid` from step 1, the `resource_uid` from step 2, and the `file_uid` from step 3.
5. Republish the dataset using the endpoint `/api/automation/v1.0/datasets/{dataset_uid}/publish`, with the `dataset_uid` from step 1.
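For reference, here is a minimal sketch of the flow in Python with the `requests` library. The endpoints are the ones listed above, but the domain, the `Apikey` authorization header, and all response field names (`dataset_uid`, `uid`, the `results` list, the datasource structure in step 4) are assumptions on my part rather than verified details from the API reference:

```python
import requests

DOMAIN = "https://example.opendatasoft.com"         # hypothetical tenant
HEADERS = {"Authorization": "Apikey YOUR_API_KEY"}  # auth scheme assumed


def update_dataset_resource(dataset_id: str, file_path: str) -> None:
    # Step 1: resolve the dataset_uid from the public dataset_id.
    r = requests.get(
        f"{DOMAIN}/api/explore/v2.1/catalog/datasets/{dataset_id}",
        headers=HEADERS,
    )
    r.raise_for_status()
    dataset_uid = r.json()["dataset_uid"]  # field name assumed

    # Step 2: find the resource currently attached to the dataset.
    r = requests.get(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources",
        headers=HEADERS,
    )
    r.raise_for_status()
    resource = r.json()["results"][0]  # response shape assumed; single resource
    resource_uid = resource["uid"]     # field name assumed

    # Step 3: upload the new file; the response carries its file_uid.
    with open(file_path, "rb") as fh:
        r = requests.post(
            f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/files",
            headers=HEADERS,
            files={"file": fh},
        )
    r.raise_for_status()
    file_uid = r.json()["uid"]  # field name assumed

    # Step 4: point the existing resource at the newly uploaded file.
    # Body shape assumed: re-send the resource with its file reference swapped.
    resource["datasource"]["file"]["uid"] = file_uid  # structure assumed
    r = requests.put(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
        headers=HEADERS,
        json=resource,
    )
    r.raise_for_status()

    # Step 5: republish so the new data goes live.
    r = requests.post(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/publish",
        headers=HEADERS,
    )
    r.raise_for_status()


update_dataset_resource("my-dataset-id", "data/latest.csv")
```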
This process works so far. However, I'm not quite sure if there's a more efficient way. I'm especially interested in the following two points:
- Can I combine steps 3 and 4 by uploading the file directly to `/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}`? I see in the documentation that instead of a `dataset_uid`, a `DatasetFile` is also mentioned, but I'm not sure how I would go about that (a rough sketch of what I imagine is below this list).
- Once I upload a new resource file in step 3, I can't find the new file anywhere in the Opendatasoft GUI. Is there a way to manage all the resource files I've uploaded so that I don't clutter the space? Or should I just use the endpoint that cleans the resource's cache, assuming that all uploaded files not referenced by a resource are removed?
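To make the first question concrete, this is roughly what I imagine a combined call could look like, reusing the names from the sketch above. I have not verified that the resource endpoint accepts a multipart upload or how a `DatasetFile` should be passed, so the request shape below is pure guesswork on my part:

```python
# Hypothetical only: combining steps 3 and 4 in a single request.
# I have NOT confirmed that this endpoint accepts a file body / DatasetFile.
with open("data/latest.csv", "rb") as fh:
    r = requests.put(
        f"{DOMAIN}/api/automation/v1.0/datasets/{dataset_uid}/resources/{resource_uid}",
        headers=HEADERS,
        files={"file": fh},  # assumed; the docs mention a DatasetFile instead
    )
    r.raise_for_status()
```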
Of course, if there are other best practices to streamline the process of updating dataset resources, I would be happy to hear about them.
Many thanks,
Johannes