As explained in my previous post, What is Microsoft Graph Data Connect & When to Use It, MGdc allows organizations to export Office 365 data in bulk without facing any throttling limits. In this blog post, I will guide you through the minimal set of steps required to extract your first data set from Office 365 using Microsoft Graph data connect. If you wish to automate the deployment steps covered in this article, there is a one-click deployment button at the bottom of the article which deploys everything with an ARM template.
Prerequisites
Before you can start leveraging Microsoft Graph data connect within your organization, there are a few things you need to have in place first. While we won’t be covering each one in detail as part of this article, it is very important that all three prerequisites listed below be in place before attempting to get started with Microsoft Graph data connect.
P1 – Create an Approver Azure Active Directory Group
With Microsoft Graph data connect, the first time you want to initiate a data export process (referred to as a pipeline), someone with access to your tenant needs to approve the request. This is done via what we refer to as an Approver Group, which is essentially an Azure Active Directory security group containing the users who are allowed to approve requests for data coming from MGdc. IMPORTANT NOTE: All users in this group, no matter what their assigned roles are, will be able to approve MGdc requests to export data using the Exchange Online Management Shell (PowerShell). See this link for additional details on how to approve a request using the Approve-ElevatedAccessRequest cmdlet.
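If you go the PowerShell route, the approval flow looks roughly like the following sketch. It assumes the ExchangeOnlineManagement module is installed; the user principal name and request ID are placeholders for your own tenant:

```powershell
# Connect to Exchange Online as a member of the approvers group
# (requires the ExchangeOnlineManagement module)
Connect-ExchangeOnline -UserPrincipalName "AdeleV@yourtenant.onmicrosoft.com"

# List the pending elevated access (PAM) requests
Get-ElevatedAccessRequest | Where-Object { $_.RequestStatus -eq 'Pending' }

# Approve a specific request by its ID (taken from the output above)
Approve-ElevatedAccessRequest -RequestId "<request-id>" -Comment "Approved MGdc data export"
```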
For the purposes of this article, however, I am going to show you how to approve these requests via the web-based Office 365 admin portal. In order for a user to be able to approve a request using the admin portal, they need to have either the Global Reader or Global Administrator role in your tenant, and they need to be assigned an Office 365 license.
In my case, I have created a new mail-enabled security group named MGdc Approvers from the Office 365 admin portal, in which I added two users as members:
| User | License | Assigned Roles |
|---|---|---|
| Adele Vance | Office 365 E3 | Global Reader |
| Nestor Wilke | Office 365 E5 | Global Administrator |
This group will be used at Step 1 below when we activate Microsoft Graph data connect at the tenant level.
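If you prefer scripting this step, a mail-enabled security group can also be created with Exchange Online PowerShell. The following is a minimal sketch, where the member UPNs are placeholders for your own tenant:

```powershell
# Connect to Exchange Online (requires the ExchangeOnlineManagement module)
Connect-ExchangeOnline

# Create the mail-enabled security group for MGdc approvers
New-DistributionGroup -Name "MGdc Approvers" -Alias "MGdcApprovers" -Type Security

# Add the two approvers as members
Add-DistributionGroupMember -Identity "MGdc Approvers" -Member "AdeleV@yourtenant.onmicrosoft.com"
Add-DistributionGroupMember -Identity "MGdc Approvers" -Member "NestorW@yourtenant.onmicrosoft.com"
```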
P2 – Create an Azure Active Directory Application Registration
Microsoft Graph data connect will store the exported data in an Azure Storage Account (or Azure Data Lake Storage), but in order to do so it needs to authenticate to it using the Service Principal of an Azure Active Directory application in the tenant. Feel free to reuse any existing application you have registered in your tenant; in my case, I will be creating a brand new App Registration called MGdcAPP. The application itself does not require any permissions whatsoever. We will simply be granting its Service Principal contributor rights to our storage location later on. Make sure you take note of the Application (Client) ID for your app registration.
You will also be required to create a secret for your app registration. Make sure you take note of it as well, since we will need to provide it along with the Application ID when configuring the linked services for our Microsoft Graph data connect pipeline later on.
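If you would rather script the app registration, the Az PowerShell module can do it in a few lines. A minimal sketch, assuming a recent Az version (the Microsoft Graph-based cmdlets) and that you are already signed in with Connect-AzAccount:

```powershell
# Create the app registration (MGdcAPP is the name used in this article)
$app = New-AzADApplication -DisplayName "MGdcAPP"

# Create a client secret; capture the value now, it cannot be retrieved later
$secret = New-AzADAppCredential -ApplicationId $app.AppId -EndDate (Get-Date).AddYears(1)
$secret.SecretText

# Create the Service Principal that will later be granted access to the storage account
New-AzADServicePrincipal -ApplicationId $app.AppId

# Take note of the Application (Client) ID
$app.AppId
```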
P3 – Create an Azure Storage Account & Associated Container
Microsoft Graph data connect will need to store the exported data somewhere. As mentioned previously, that location can be either an Azure Storage Account (Blob) or Azure Data Lake Storage (Gen1 or Gen2). For the purposes of this blog article, we will create an Azure Storage account called mgdcoverviewstore, in which we will create a private container named mgdcdatasetstore.
One thing that is very important with the storage location is that you need to grant the Service Principal of the Azure Active Directory Application Registration we created above access to your storage account. To do so, go to the Access Control (IAM) section of your storage account and grant your application registration the Storage Blob Data Contributor role.
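For reference, here is how the storage account, container, and role assignment could be scripted with the Az module. This is a sketch using the names from this article; the resource group name and region are assumptions:

```powershell
$rg = "MGdcOverview"   # resource group name is a placeholder
New-AzResourceGroup -Name $rg -Location "East US"

# Create the storage account and the private container
$storage = New-AzStorageAccount -ResourceGroupName $rg -Name "mgdcoverviewstore" `
    -Location "East US" -SkuName Standard_LRS -Kind StorageV2
New-AzStorageContainer -Name "mgdcdatasetstore" -Context $storage.Context -Permission Off

# Grant the app registration's Service Principal the Storage Blob Data Contributor role
New-AzRoleAssignment -ApplicationId $app.AppId `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $storage.Id
```

Note that `$app.AppId` carries over from the previous sketch; you can substitute your Application ID directly.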
Step 1 – Enable Microsoft Graph Data Connect on your Tenant
The first step is for us to enable Microsoft Graph data connect at the tenant level. To do this, navigate to your Office 365 admin portal, go to Settings > Org settings from the left menu, and from the list, select Microsoft Graph data connect.
This will bring up the Microsoft Graph data connect blade on the right-hand side of the screen. To enable it, simply check the Turn Microsoft Graph data connect on or off for your entire organization and choose a default approval group checkbox. When checking the box, a new people picker field will automatically show up. In that field, select the Azure Active Directory group with the MGdc request approvers you created as a prerequisite. Once the group has been selected, click on the Save button at the bottom.
Step 2 – Create an Azure Data Factory
Microsoft Graph data connect (MGdc) in itself is not a tool. In order to export data from your Office 365 tenant, you need to use Azure Data Factory (ADF) and create a pipeline. Think of an Azure Data Factory pipeline as a workflow which defines a set of sequential or parallel activities to execute. One of the activities available as part of Azure Data Factory is the Copy Data activity, which allows us to pull data from a given location, referred to as the Source, and store it in another one, referred to as the Sink. There is a direct correlation between the Source and the Sink inside the Copy Data activity, meaning that depending on the type of Source you select, the list of available Sinks is filtered accordingly.
If I had to summarize in one sentence what Microsoft Graph data connect is from a technical standpoint, my definition would be the following: “Microsoft Graph data connect is an available Source for the Copy Data activity in Azure Data Factory, which allows you to export data from Office 365.” Of course it is much more than that, but from a purely technical lens, that’s how I would describe it.
In order to create a new Azure Data Factory instance, navigate to your Azure portal and in the top search bar, type in data factories. Select Data factories from the results.
On the Data factories screen, click on the + Add button. In the Create Data Factory screen that appears, fill in the required information on the Basics tab.
Switch to the Git configuration tab and check the Configure Git later checkbox. The process of connecting your Azure Data Factory to a Git repository is out of scope for this blog article. Once the box is checked, click on the Review + create button at the bottom of the blade.
On the creation review screen, click on the Create button.
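Alternatively, if you prefer scripting, the same data factory can be created with the Az.DataFactory module. A one-liner sketch, where the factory name and resource group are placeholders:

```powershell
# Create the Azure Data Factory instance (names are placeholders)
New-AzDataFactoryV2 -ResourceGroupName "MGdcOverview" -Name "mgdc-overview-adf" -Location "East US"
```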
Step 3 – Create a Pipeline
Now that we have our Azure Data Factory created, we need to create an actual pipeline. When the creation process is completed for the data factory, click on the Go to resource button.
Once on the Overview page of your new data factory, click on the Author & Monitor button in the middle of the content section. This will launch the ADF workbench editor in a new tab.
From the workbench, click on the Author (pencil icon) in the left navigation.
From the Author screen, click on the + button beside the filter textbox, and select Pipeline from the menu.
On the new pipeline’s canvas, expand the Move & transform menu, select the Copy data activity and drag it onto the canvas.
Make sure the new Copy data activity you added to the canvas is selected so that the properties panel is displayed at the bottom. In that panel, navigate to the Source tab. In the tab, click on the + New button.
On the New dataset blade, search for Office 365, select the item, and click on the Continue button.
Back on the Source tab, click on the Open button beside the newly created Office 365 dataset.
From the dataset panel, click on the + New button beside the Linked Service drop down.
In the New linked service (Office 365) panel, in the Service principal ID field, provide the Application ID of the Azure Active Directory application registration you created as a prerequisite, and in the Service principal key field, provide the Application Secret you also created as a prerequisite. When done, click on the Create button.
Back on the Office 365 dataset panel, pick the dataset you wish to extract from the Table drop down menu. In my case, I will be selecting the BasicDataSet_v0.MailboxSettings_v0 dataset, which represents the settings of every mailbox in my tenant. Note: a few datasets require you to set a date filter in order to export them (e.g. Messages, Events, etc.); that is not the case for the Mailbox Settings dataset. For a complete list of supported datasets, along with information on their schema and sample data, please refer to my previous blog article: What is Microsoft Graph Data Connect & When to Use It.
Go back to the pipeline’s properties panel, and on the Source tab, click on the Import schema button to load the list of fields to export from the selected dataset.
Now that our Source is configured, we need to go and configure our Sink. To do so, at the top of the workbench, flip back to your pipeline’s tab. Again, make sure that the Copy data activity is selected in the canvas, and in the properties blade at the bottom, select the Sink tab. From that tab, click on the + New button beside the Sink dataset dropdown menu.
In the New dataset blade that appears on the right hand side of the workbench, select the destination storage type. In my case I will select Azure Blob Storage. Once selected, click on the Continue button at the bottom of the blade.
On the Select format blade, select the only available option which is Binary and once again, click on the Continue button at the bottom.
On the Set properties blade, expand the Linked service dropdown menu and select the + New option from it.
On the New linked service (Azure Blob Storage) blade, expand the Authentication method dropdown menu and select Service Principal. Note that this is the only supported authentication method for Microsoft Graph data connect. In the same blade, in the Storage account name dropdown menu, select the Azure Storage Account you’ve created as a prerequisite. For the Service principal ID field, provide the Application ID of the Azure Active Directory application registration you created as a prerequisite, and use the application secret you created for it as the value for the Service principal key field. When all fields are completed, click on the Create button at the bottom of the blade.
In the Set properties blade, select the container you created in your storage account as a prerequisite and click on the OK button at the bottom.
Our pipeline is now complete. All that is left to do is to publish it by clicking on the Publish All button at the top of the workbench, and then to click on the Publish button on the review blade that will open on the right hand side of the workbench.
Step 4 – Trigger the Pipeline
To initiate the data export process, we need to trigger our pipeline. You can of course define automated triggers that will initiate the process at a defined time or interval, but in our case we are simply going to trigger the pipeline manually. To do so, click on the Add trigger button at the top of the pipeline’s canvas and select Trigger now. Then, on the pipeline run blade, click on the OK button at the bottom.
To monitor your pipeline’s execution, you can navigate to the Pipeline Monitoring page by clicking on the Monitor (radar icon) in the left navigation bar of the Azure Data Factory workbench.
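The pipeline can also be triggered and monitored from PowerShell with the Az.DataFactory module. A sketch, where the factory and pipeline names are placeholders matching whatever you created above:

```powershell
# Trigger the pipeline manually; the cmdlet returns the run ID
$runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName "MGdcOverview" `
    -DataFactoryName "mgdc-overview-adf" -PipelineName "pipeline1"

# Poll the run status; an MGdc run stays in progress until the PAM request is approved
Get-AzDataFactoryV2PipelineRun -ResourceGroupName "MGdcOverview" `
    -DataFactoryName "mgdc-overview-adf" -PipelineRunId $runId
```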
Step 5 – Approve the Request
The last step of the process is to approve what we call the Privileged Access Management (PAM for short) request. This essentially grants our Azure Data Factory access to export our Office 365 data in bulk. One very important thing here is that the user who triggered the pipeline cannot be the one approving the PAM request, even though they are part of the assigned approvers group; someone else from that group needs to do it. In my case, I will be connecting to my Office 365 admin portal as Adele Vance, one of the two users in the Approvers group we created as part of the prerequisites.
Just like when I enabled Microsoft Graph data connect for my tenant in Step 1, from the Office 365 admin portal I need to navigate to Settings > Org settings. From the list, select Microsoft Graph data connect. From the Microsoft Graph data connect blade, click on the Privileged Access Management portal link.
On the Privileged Access Management portal, you will see a list of all PAM requests, including a pending one for the pipeline execution we just triggered. Click on that entry to launch the Data Access Request blade. On that blade, review the details of the request and click on the Approve button at the bottom. This will allow the pipeline’s execution to finish.
Step 6 – Review the Exported Data
A pipeline execution in Microsoft Graph data connect takes between 35 and 45 minutes to complete. You also need to factor in the time it takes to do the initial PAM request approval, which is only required the first time the pipeline gets executed. You can always watch the pipeline’s execution status on the pipeline monitoring page of your Azure Data Factory instance.
Once your pipeline successfully completes, navigate to your Azure Storage Account to view the raw data exported as a binary file containing JSON.
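To pull the exported files down locally for a quick look, something like the following sketch works; the local destination path is a placeholder:

```powershell
# Grab the storage context and download every blob from the container
$ctx = (Get-AzStorageAccount -ResourceGroupName "MGdcOverview" -Name "mgdcoverviewstore").Context
Get-AzStorageBlob -Container "mgdcdatasetstore" -Context $ctx | ForEach-Object {
    Get-AzStorageBlobContent -Blob $_.Name -Container "mgdcdatasetstore" `
        -Context $ctx -Destination "C:\MGdcExport\"
}
```

Each downloaded file contains the exported rows as JSON, which you can inspect with any text editor or parse from PowerShell.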
The next logical step would then be for you to build an application or some analytics and insights dashboards that consumes this raw data!
Azure Resource Management Template
If you wish to automate the deployment of the above solution, you can use the following One-Click deployment ARM template:
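If you would rather run the template yourself than use the button, here is a deployment sketch with the Az module, assuming you have saved the template locally (the file path is a placeholder):

```powershell
# Deploy the ARM template into an existing resource group
New-AzResourceGroupDeployment -ResourceGroupName "MGdcOverview" `
    -TemplateFile ".\azuredeploy.json"
```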