Overview
As part of my new role within the Microsoft Graph product group, I am responsible for helping our partners and customers be successful with the Microsoft Graph. Specifically, I focus on Microsoft Graph Data Connect. Microsoft Graph Data Connect, or MGdc for short, is a component that lets you extract Microsoft 365 data in bulk. In the past, customers have developed creative approaches to extract large amounts of data from the Graph. While the Graph supports batching requests via the /$batch endpoint, applications are still subject to throttling limits. MGdc takes a different approach: it lets you export your Microsoft 365 data in bulk without your app having to make thousands of calls to the APIs. Exported data is stored in an Azure Storage account or Azure Data Lake Storage as raw JSON files (sample datasets are linked below).
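To make the batching point concrete, here is a minimal Python sketch that groups per-user `GET /users/{id}/messages` calls into JSON batch payloads. It relies on the Graph limit of 20 requests per `/$batch` call; the user list and the `build_batches` helper are hypothetical, and authentication and sending are omitted. Even when batched, each inner request is still evaluated against throttling limits, and each mailbox's messages come back one page at a time, so a full-tenant export still means a large number of round trips.

```python
import json
from typing import Iterator

# Microsoft Graph JSON batching accepts at most 20 requests per /$batch call.
MAX_BATCH_SIZE = 20


def build_batches(user_ids: list[str]) -> Iterator[dict]:
    """Yield /$batch request bodies, one GET /users/{id}/messages per user."""
    for start in range(0, len(user_ids), MAX_BATCH_SIZE):
        chunk = user_ids[start:start + MAX_BATCH_SIZE]
        yield {
            "requests": [
                {
                    "id": str(i + 1),  # ids only need to be unique within a batch
                    "method": "GET",
                    "url": f"/users/{uid}/messages",  # relative to the Graph version root
                }
                for i, uid in enumerate(chunk)
            ]
        }


if __name__ == "__main__":
    # Hypothetical tenant with 45 users.
    users = [f"user{n}@contoso.com" for n in range(45)]
    batches = list(build_batches(users))
    print(len(batches))  # 3
    print(json.dumps(batches[0]["requests"][0]))
```

Note that this only fetches the first page of each mailbox; following every `@odata.nextLink` multiplies the call count further.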
Microsoft Graph Data Connect makes it easier for organizations to build insights dashboards, train machine learning models and do forecasting based on the large amounts of data acquired. Once the data has been exported, developers, data analysts and system integrators can ingest it with tools such as Power BI, Azure Cognitive Services and Azure Synapse Analytics to build powerful and complex solutions.
Support for the following data sets is currently available in MGdc, with support for Teams Chat coming soon:
- Calendar Views (Schema Information | Sample Dataset)
- Contacts (Schema Information | Sample Dataset)
- Direct Reports (Schema Information | Sample Dataset)
- Events (Schema Information | Sample Dataset)
- Mailbox Settings (Schema Information | Sample Dataset)
- Mail Folders (Schema Information | Sample Dataset)
- Messages (Schema Information | Sample Dataset)
- Sent Items (Schema Information | Sample Dataset)
- Users (Schema Information | Sample Dataset)
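Each dataset above lands in your storage account as raw JSON. The following Python sketch streams such a file record by record, under the assumption that it is line-delimited (one JSON object per line); check the sample dataset for the exact layout before relying on this. The `read_exported_records` helper and the field names in the example are hypothetical.

```python
import io
import json
from typing import Iterable, Iterator


def read_exported_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse an exported dataset file, assuming one JSON object per line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err


if __name__ == "__main__":
    # Hypothetical two-record sample; field names are illustrative only.
    sample = io.StringIO(
        '{"displayName": "Alice", "mail": "alice@contoso.com"}\n'
        '{"displayName": "Bob", "mail": "bob@contoso.com"}\n'
    )
    for record in read_exported_records(sample):
        print(record["displayName"])
```

Because the helper accepts any iterable of lines, the same call works on a file opened with `open(path, encoding="utf-8")` after downloading it from the storage account.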
Graph REST APIs vs Microsoft Graph Data Connect
The first question that comes to everyone’s mind when assessing MGdc is why you should use it instead of the Graph REST APIs that have been around for a long time. The answer is that you don’t necessarily have to choose one over the other: they can be complementary, since each serves a very different purpose. The REST APIs were not designed to let organizations capture large amounts of data in bulk, such as all emails in an organization. Trying to do so with them will take a very long time and will expose your solution to throttling limits. On the other hand, if you are trying to obtain information about events as they happen, Data Connect has some overhead and can take up to 45 minutes to export an entire dataset. You could, however, build a solution that leverages Data Connect to do an initial export of all emails in your organization up to a certain date, and then uses the REST APIs to capture new emails moving forward in a transactional fashion.
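For the hybrid pattern described above, the REST side only needs messages received after the export cutoff. A minimal Python sketch, assuming the cutoff is the timestamp your Data Connect export covered up to, builds the Graph REST URL with an OData `$filter` on the message `receivedDateTime` property. The `messages_since` helper is hypothetical, authentication is omitted, and results would still need to be paged via `@odata.nextLink`.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def messages_since(user_id: str, cutoff: datetime) -> str:
    """Build a Graph REST URL for a user's messages received at or after `cutoff`.

    `cutoff` should be timezone-aware; it is converted to UTC for the filter.
    """
    stamp = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Keep "$" and ":" literal; spaces in the filter expression become %20.
    query = urlencode(
        {"$filter": f"receivedDateTime ge {stamp}"},
        safe="$:",
        quote_via=quote,
    )
    return f"{GRAPH_BASE}/users/{quote(user_id, safe='@')}/messages?{query}"


if __name__ == "__main__":
    # Hypothetical cutoff: the export covered everything before June 1, 2021 (UTC).
    cutoff = datetime(2021, 6, 1, tzinfo=timezone.utc)
    print(messages_since("alice@contoso.com", cutoff))
```

Using `ge` on the same timestamp the export stopped at may return a message that was also exported; deduplicating on the message `id` is a simple way to make the hand-off safe.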
Another important difference between the Graph REST APIs and MGdc is how they implement privacy around the data being retrieved. If an application is granted access to read emails via the Graph REST APIs, it is automatically allowed to retrieve all fields pertaining to those emails, including the content and information about the attachments. With MGdc, when you request access to a dataset, you specify which properties you wish to retrieve, and the customer has to explicitly approve the request. For example, you could specify that only the From and To fields of email messages be extracted, enforcing security on your data by giving you full control over what is exposed to an application. It is also very important to note that the data never leaves your tenant: it is exported from your Microsoft 365 environment and stored in the associated Azure tenant’s storage, in the region you specify.
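The effect of property selection can be illustrated with a hypothetical `project` helper in Python, which keeps only the approved properties of a message record. In MGdc this filtering is enforced by the service based on the approved request, not by your code; the sketch only models the outcome. The property names follow the Graph message resource (`from`, `toRecipients`, `subject`, `body`).

```python
from typing import Any, Iterable


def project(record: dict[str, Any], approved: Iterable[str]) -> dict[str, Any]:
    """Return a copy of `record` containing only the approved properties."""
    allowed = set(approved)
    return {key: value for key, value in record.items() if key in allowed}


if __name__ == "__main__":
    # Hypothetical message record; only "from" and "toRecipients" were approved.
    message = {
        "from": {"emailAddress": {"address": "alice@contoso.com"}},
        "toRecipients": [{"emailAddress": {"address": "bob@contoso.com"}}],
        "subject": "Quarterly results",
        "body": {"contentType": "text", "content": "Confidential draft"},
    }
    print(project(message, ["from", "toRecipients"]))
```

By contrast, the Graph REST `$select` query parameter trims a response in the same way, but the app chooses it on each call; nothing prevents the app from requesting `body` on the next call, because its permission covers the whole entity.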
The table below summarizes when you should consider using one over the other:
Scenario | Graph REST APIs | Microsoft Graph Data Connect | Rationale |
---|---|---|---|
Need to retrieve information about a specific user or group | X | X | The Graph REST APIs support delegated permissions, allowing your app to retrieve information only about the signed-in user. MGdc also lets you target specific groups to extract information from. |
Need to retrieve information about all users in your tenant | | X | While you could achieve this with the Graph REST APIs, you would hit throttling limits and it would likely take a very long time to complete. By default, MGdc lets you extract information for all users in your tenant. |
Need to retrieve large amounts of data | | X | MGdc is optimized for big data extractions and is not subject to any throttling. |
Need to update large amounts of data | | | MGdc only supports extracting information (GET operations). While you can update data with the Graph REST APIs, updating large amounts of data with them is not recommended due to the throttling limits in place. |
Need to ensure data privacy and prevent access to certain properties | | X | MGdc supports selecting which fields are extracted from the entities, and only the specified fields are accessible. With the REST APIs, access is granted at the entity level, so an app automatically gets access to all fields. |
Data cannot leave the customer’s tenant and GoLocal region | | X | When using MGdc, the data is only extracted to an Azure Storage account located in the customer’s tenant. |
For additional information on Microsoft Graph Data Connect, including pricing details, please refer to https://azure.microsoft.com/en-us/services/graph-data-connect/. Expect to see more blog posts on this topic over the next few weeks.