About the series
As you know, we at Humans in the Loop have a great love and appreciation for a well-designed annotation tool. After the great feedback on the reviews of the best platforms on the market that we published here and here, we decided it’s time for a deep dive into some of our all-time favorites!
This article is the seventh in a series of 10 reviews, all of which you can access on our blog.
The whole series is based on the premise of transparency and honesty, and none of these reviews are sponsored. They are simply our way to give props to the best teams out there working on making annotation easier for AI teams, and to share some of the know-how we have accumulated over the past few years as a professional annotation company.
As in previous reviews, our parameters are:
- project management
If you have additional questions or want to get in touch with us to beta test or feature your tool in an upcoming article, feel free to email us at firstname.lastname@example.org!
Created in 2018 as an annotation platform by California-based tech entrepreneur Anthony Sarkis, Diffgram has evolved into a dataset management platform for AI, making it easy to control, store, and retrieve data. Over the past couple of years Diffgram has made great progress, always with its customers in mind, and it may well have found its unique niche.
Currently the platform offers a free Explorer plan with a limited number of labels, as well as Teams and Enterprise plans which come with additional automation features and priority technical support. Pricing is monthly and is available on request.
The supported formats include both images and video (up to 4K), while export can be done both in JSON format and as TF Records of the images and annotations. Both operations can be performed from the UI as well as through the SDK/API.
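To give a feel for what working with a JSON export looks like, here is a minimal sketch that tallies labels in an export file. The schema below is a hypothetical simplification of our own making, purely for illustration; Diffgram’s actual export format may differ.

```python
import json

# Hypothetical JSON annotation export (simplified, illustrative schema --
# not Diffgram's exact format): one image file with two bounding boxes.
EXPORT = """
{
  "files": [
    {
      "name": "street_01.jpg",
      "instances": [
        {"label": "car", "type": "box", "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 90},
        {"label": "person", "type": "box", "x_min": 200, "y_min": 40, "x_max": 240, "y_max": 160}
      ]
    }
  ]
}
"""

def count_labels(export_json: str) -> dict:
    """Tally how many instances of each label appear in an export."""
    data = json.loads(export_json)
    counts: dict = {}
    for f in data["files"]:
        for inst in f["instances"]:
            counts[inst["label"]] = counts.get(inst["label"], 0) + 1
    return counts

print(count_labels(EXPORT))  # {'car': 1, 'person': 1}
```

Once annotations sit in plain JSON like this, they are easy to post-process into whatever training format your pipeline expects, including TF Records.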
Managing annotation projects through Diffgram is quite straightforward. User roles include Admins, Editors, Viewers, and Annotators (even though there is no way to set up ‘Supervisor’ or ‘QC’ roles). In addition, Diffgram can be integrated with external annotation workforce providers as ‘Controllers’.
Data is available to annotators from a shared pool, so faster annotators get access to more work. They can either ‘complete’ a task or ‘defer’ it, which is quite useful in difficult annotation projects. The process for annotator management includes adding ‘Guides’ (written in Markdown) for annotators working on a specific task, as well as setting up ‘Awards’, which can be either ‘required’ in order to get access to a task (e.g. ‘Bounding boxes level 1’) or ‘granted’ upon completing the task.
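The shared-pool model with complete/defer can be sketched in a few lines. This is our own mock of the idea, not Diffgram’s internals: a deferred task simply goes back into the pool for another annotator to pick up.

```python
from collections import deque

# Our own illustrative mock of a shared task pool (not Diffgram internals):
# annotators pull from a common queue, and a deferred task is returned
# to the pool so someone else can take it.
class TaskPool:
    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.completed = []

    def pull(self):
        """Hand the next task to whichever annotator asks first."""
        return self.pending.popleft() if self.pending else None

    def complete(self, task):
        self.completed.append(task)

    def defer(self, task):
        """Return a difficult task to the back of the shared pool."""
        self.pending.append(task)

pool = TaskPool(["img_001", "img_002", "img_003"])
t = pool.pull()             # fastest annotator grabs "img_001"
pool.defer(t)               # too difficult -- back into the pool
pool.complete(pool.pull())  # "img_002" gets completed
print(pool.completed, list(pool.pending))  # ['img_002'] ['img_003', 'img_001']
```

The key property is that no task is ever assigned up front, so throughput naturally follows each annotator’s speed.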
Statistics are highly customizable, and reports can be generated at the level of instances, files, events, and tasks, which can be grouped by date, user, task, and label.
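Conceptually, a grouped report like this is just a tally of annotation events along a chosen dimension. The event log below is hypothetical data of our own, used only to illustrate the grouping idea:

```python
from collections import Counter

# Hypothetical annotation-event log (made-up data for illustration).
events = [
    {"date": "2021-05-01", "user": "ana", "task": "t1", "label": "car"},
    {"date": "2021-05-01", "user": "ben", "task": "t1", "label": "person"},
    {"date": "2021-05-02", "user": "ana", "task": "t2", "label": "car"},
]

def report(rows, group_by):
    """Count events grouped by one dimension: date, user, task, or label."""
    return dict(Counter(row[group_by] for row in rows))

print(report(events, "user"))   # {'ana': 2, 'ben': 1}
print(report(events, "label"))  # {'car': 2, 'person': 1}
```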
Diffgram’s mission is to make data management easier for data science teams, considering the amount of time spent on compiling datasets, getting them annotated, extracting and transforming the various versions of the datasets, and iterating through the whole process again.
Through key integrations with Google GCP and Amazon AWS, users can connect Diffgram to their data source and set up long-term syncing, so that any new files added to the dataset go directly to annotation. In the same way, data annotated in one task can be funneled to another task by copying or moving it.
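The core idea behind long-term syncing can be sketched as follows. This is our own mock, not Diffgram’s actual sync mechanism: compare the current bucket listing against files already seen, and route anything new straight into an annotation queue.

```python
# Our own mock of the syncing idea (not Diffgram's implementation):
# each sync pass diffs the bucket listing against files already seen
# and sends new arrivals straight to an annotation queue.
seen = set()
annotation_queue = []

def sync(bucket_listing):
    """Route files that appeared since the last sync into the annotation queue."""
    for path in bucket_listing:
        if path not in seen:
            seen.add(path)
            annotation_queue.append(path)

sync(["imgs/a.jpg", "imgs/b.jpg"])
sync(["imgs/a.jpg", "imgs/b.jpg", "imgs/c.jpg"])  # one new file arrived
print(annotation_queue)  # ['imgs/a.jpg', 'imgs/b.jpg', 'imgs/c.jpg']
```

In practice the “listing” would come from a GCP or AWS bucket notification rather than polling, but the pattern is the same: new file in, annotation task out.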
The event-triggered Sync feature looks quite promising, because in the future it might enable customized non-linear data flows as well as conditional relationships between datasets and tasks. The possibilities for setting up the perfect data flow for each user’s needs then become endless – so we will be closely following Diffgram’s updates!
Hope this was helpful! If you are working on an AI project and are currently reviewing which tool might be the most appropriate for it, get in touch with us and we would be happy to have a call and advise you on the best way to build your pipeline.