Content Type Hub Packages

Every now and then I like to spend some time understanding the internals of some of the various components that make up SharePoint. This week, while troubleshooting an issue at a customer’s, I decided to crack open the Content Type Hub to see how it is exactly that Content Types get published down to subscriber site collections. In this article I will explain in details the process involved in publishing a Content Type from the Content Type Hub to the various site collections that will be consuming them.

First off, let us be clear. The Content Type Hub is nothing more than a regular site collection onto which the Content Type Syndication Hub feature has been activated on (site collection feature).

The moment you activate this feature onto your site collection, a new hidden list called “Shared Packages” will be created in the root web of that same site collection.

The moment you activate the feature, that list will be empty and won’t contain any entries. You can view the list by navigating to http://Content Type Hub Url/Lists/PackageList/AllItems.aspx.

However, the moment you publish a Content Type in the Hub, you will see an entry for that Content Type appear.
Publish a SharePoint Content Type
SharePoint Content Type Hub Package

In the two figures above, we can see that we have publish Content Type “Message” which has an ID of 0x0107. Therefore the entry that gets created in the Shared Packages list has that same ID value (0x0107) set as its “Published Package ID” column. “Pulished Package ID” will always represent the ID of the Content Type that was recently Published/Updated/Unpublished and which is waiting to by synchronized down to the subscriber site collections. The “Taxonomy Service Store ID” column, contains the ID of the Managed Metadata Service that is used to syndicate the Content Type Hub changes to the subscriber sites. In my case, if I was to navigate to my instance of the Managed Metadata Service that takes care of the syndication process and look at its URL, I should be able to see that its ID matches the value of that column (see the two figures below).

SharePoint Taxonomy Service Store ID
Managed Metadata Service Application ID

The “Taxonomy Service Name” is self explanatory, it represents the display name of the Manage Metadata Service Application instance represented by the “Taxonomy Service Store ID” (see the two figures below).

SharePoint Taxonomy Service Name
SharePoint Managed Metadata Service Application Name

The “Published Package Type” column will always have a value of “{B4AD3A44-D934-4C91-8D1F-463ACEADE443}” which means it is a “Content Type Syndication Change”.
SharePoint Published Package Type

The last column, “Unpublished”, is a Boolean value (true or false) that indicates whether or not the operation that was added to the queue is a Publish/Update, in which case that value will be set to “No”, or if it was an “Unpublish” operation, in which case the value would be set to “Yes”. The two figures below show the results of sending an “Unpublish” operation on a previously published Content Type to the queue.
SharePoint Unpublish a Content Type

Now what is really interesting, is that even after the subscriber job (Content Type Subscriber timer job) has finished running, entries in the “Shared Packages” list persist. In fact, these are required for certain operations in the Hub to work as expected. For example, when you navigate to the Content Type Publishing page, if there are no entries in the “Shared Packages” list for that content type, you would never get “Republish” and “Unpublished” as an option. The page queries the list to see if there is an entry for the given Content Type that proves it was published at some point before offering the option to undo the publishing.

To better demonstrate what I am trying to explain, take the following scenario. Go to your Content Type Hub and publish the “Picture” Content Type. Now once published, simply go back to that same “Content Type Publishing” page. You will be presented with only two options: Republish or Unpublish. The “Publish” option will be greyed out, because SharePoint assumes that because you have an entry in the Shared Packages list marked with “Unpublished = No”, that the Content Type has already been published. Therefore you can only “Republish” or “Unpublish” it. No navigate to the “Shared Packages” list and delete the entry for that Content Type. Once the entry has been deleted, navigate back to the Content Type Publishing page for the Picture Content Type. The “Publish” option is now enabled, and the “Republish” and “Unpublish” ones are disabled. That is because SharePoint couldn’t find a proof in the “Shared Packages” list that this Content type has been published in the past.

Also, if you were to publish a Content Type, and later unpublish it, you would see two entries in the “Shared Packages” list (one for publish one for unpublish). The Unpublish operation simply updates the existing entry in the “Shared Packages” list and sets its “Unpublished” flag to “Yes”.

If you were to create a Custom Content Type, publish it, and then delete it. SharePoint is not going to automatically remove its associated entry in the “Shared Packages” list. Instead, the next time the “Content Type Hub” timer job runs, it will update the associated entry to set its “Unpublished” flag to “false”. Meaning that we want to make sure that deleted Content Type never makes it down to the Subscriber Site Collections.

How does synchronization Works

By now you are properly wondering how the Synchronization process works between the Hub and the Subscriber Site Collections if entries are always persisted in the “Shared Packages” list. The way this process works is actually quite simple. The “Content Type Subscriber” timer job is the one responsible for that operation. By default that timer job runs on an hourly basis, and indirectly (via Web Services) queries the “Shared Packages” list to retrieve all changes that have to be synchronized. The root web of every Site Collection that subscribes to the Content Type Hub exposes a property called “metadatatimestamp” that represents the last time the Content Type Gallery for that given Site Collection was synchronized. The following PowerShell script can help you obtain that value for any given subscriber Site Collection.

$url = Read-Host "URL for your subscriber Site Collection"
$site = Get-SPSite $url
Write-Host $site.RootWeb.Properties["metadatatimestamp"]

When the “Content Type Subscriber” timer job runs, it goes through every Subscriber Site Collection, retrieve its “metadatatimestamp” value, and queries the “Shared Packages” list passing that date stamp. The queries then returns only the list of entries that have their “Last Modified” date older than that time stamp. Upon receiving the list of changes to sync, the timer job retrieves the Content Type information associated with the listed changes from the Hub and applies them locally to the Subscriber Site Collection’s Content Type gallery. Once it finished synchronizing a Site Collection, it updates its “metadatatimestamp” to reflect the new timestamp.

If you really wanted to, you can make sure that every single Content Type listed in the “Shared Packages” list be synchronized against a given Site Collection by emptying the “metadatatimestamp” property on that given site. As an example, when you create a new Site Collection, that root web won’t have that value set and therefore every Content Type that was ever published in the Hub will make its way down to that new Site Collection. Using the interface, you can also blank out that property by going to the Content Type Publishing page and selecting the option to “Refresh all published content types on next update”. All that this option does is empty the value for that property on the site.

Refresh all published content types on next update

Document Sets

Let’s now look at a more complex scenario (which is really why I started taking a closer look at the Publishing process in the first place). Let us investigate what happens if we publish Content Types that inherit from the Document Set Content Type. In my example, I’ve created a new custom Content Type named “DocSetDemo”. This custom Content Type, since it inherits from its Document Set parent, defines a list of Allowed Content Types. In my case, I only allow for “Picture” and “Image” as Allowed Content Types.
Allowed Content Types

The moment you go and try to publish this custom Content Type, you will get the following prompt, which let’s you know that every Allowed Content Types identified within your custom Content Type will also be published.

What that truly means is that not only is SharePoint going to create an entry in the “Shared Packages” list for your custom Content Type, it will also create one for every Allowed Content Type identified. In the figure below, we can see that after publishing my custom Content Type “DocSetDemo”, which has an ID of 0x0120D5200049957D530FA0554787CFF785C7C5C693, there are 3 entries in the list. One for my custom Content Type itself, one for the Image Content Type (ID of 0x0101009148F5A04DDD49CBA7127AADA5FB792B00AADE34325A8B49CDA8BB4DB53328F214) and one for the Picture Content Type (ID of 0x010102).

How to use the ReverseDSC Core

The ReverseDSC.Core module is the heart of the ReverseDSC process. This module defines several functions that will help you dynamically extract the DSC configuration script for each resource within a DSC module. The ReverseDSC Core is generic, meaning that it applies to any technology, not only SharePoint. In this blog article I will describe in details how you can start using the ReverseDSC Core module today and integrate it into your existing solutions. To better illustrate the process, I will be using an example where I will be extracting the properties of a given user within Active Directory using the ReverseDSC Core.

Getting Started

If you were to take a look at the content of the ReverseDSC.Core.psm1 module (https://github.com/NikCharlebois/SharePointDSC.Reverse/blob/master/ReverseDSC.Core.psm1), you would see about a dozen functions defined. The one we are truly interested in is the Export-TargetResource one. This method takes in two mandatory parameters, the name of the DSC resource we wish to “Reverse”, and the list of mandatory parameter for the Get-TargetResource of that same resource. The mandatory parameters are essential because without them, the Get-TargetResource is not able to determine what instance of the resource we wish to obtain the current state for. The third optional parameter lets you define a DependsOn clause in the case the current instance depends on another one. However, let us not worry about that parameter for our current example.

As mentioned previously, for the sake of our example, we want to extract the information about the various users in our Active Directory. Active Directory users are represented by the MSFT_xADUser resource, so you will need to make sure the xActiveDirectory module is properly installed on the machine you are about to extract the information from.

Let us now take a look at the Get-TargetResource function of the MSFT_xADUser resource. The function only requires two mandatory parameters: DomainName and UserName.

Therefore we need to pass these two mandatory parameters to our Export-TargetResource function. Now, in my case, I do know for a fact that I have a user in my Active Directory named “John Smith”, who has a username of “contoso\JSmith”. In my case, I have a local copy of the ReverseDSC.Core.psm1 module located under c:\temp I can then initiate the ReverseDSC process for that user by calling the following lines of PowerShell:

Import-Module -Name "C:\temp\ReverseDSC.Core.psm1" -Force
$mandatoryParameters = @{DomainName="contoso.com"; UserName="JSmith"}
Export-TargetResource -ResourceName xADUser -MandatoryParameters $mandatoryParameters

Executing these lines of code will produce the following output:

Since the Export-TargetResource function simply outputs the resulting DSC resource block as a string, you would need to capture it in a variable somewhere and manually build your resulting DSC configuration. The following modifications to our script will allow us to build the resulting Desired Configuration Script and save it locally on disk, in my case under C:\temp\:

Import-Module -Name "C:\temp\ReverseDSC.Core.psm1" -Force
$output = "Configuration ReverseDSCDemo{`r`n    Import-DSCResource -ModuleName xActiveDirectory`r`n    Node localhost{`r`n"
$mandatoryParameters = @{DomainName="contoso.com"; UserName="JSmith"}
$output += Export-TargetResource -ResourceName xADUser -MandatoryParameters $mandatoryParameters
$output += "    }`r`n}`r`nReverseDSCDemo"
$output | Out-File "C:\Temp\ReverseDSCDemo.ps1"

Running this will generate the following DSC Configuration script:

Configuration ReverseDSCDemo{
Import-DSCResource -ModuleName xActiveDirectory
Node localhost{
xADUser baf586dd-3c2c-4131-9267-d4d8fb1d5d01
{
CannotChangePassword = $True;
HomePage = "http://Nikcharlebois.com";
DisplayName = "John Smith";
Description = "John's Account";
Notes = "These are my notes";
Office = "Basement of the building";
State = "Quebec";
Fax = "";
JobTitle = "Aquatic Plant Watering";
Country = "";
Division = "";
Initials = "";
POBox = "23";
HomeDirectory = "";
EmployeeID = "";
LogonScript = "";
GivenName = "John";
EmployeeNumber = "";
UserPrincipalName = "jsmith@contoso.com";
ProfilePath = "";
StreetAddress = "55 Mighty Suite";
CommonName = "John Smith";
Path = "CN=Users,DC=contoso,DC=com";
HomePhone = "555-555-5555";
City = "Gatineau";
Manager = "CN=Nik Charlebois,CN=Users,DC=contoso,DC=com";
MobilePhone = "";
Pager = "";
Company = "Contoso inc.";
HomeDrive = "";
OfficePhone = "555-555-5555";
Surname = "Smith";
Enabled = $True;
DomainController = "";
PostalCode = "J8P 2A9";
IPPhone = "";
EmailAddress = "JSmith@contoso.com";
PasswordNeverExpires = $True;
UserName = "JSmith";
DomainName = "contoso.com";
Ensure = "Present";
Department = "Plants";
}
}
}
ReverseDSCDemo

Executing the resulting ReverseDSCDemo.ps1 script will generate a MOF file that can be used with PowerShell Desired State Configuration.

Recap

The ReverseDSC Core allows you to easily extract all the parameters of an instance of a resource by simply specifying the few mandatory parameters its Get-TargetResource function requires. It does not take care of scanning all instance on an environment, this should be left to the user to create that script. It is not to say that we will not be looking at ways of automating this in the future, but the current focus is to keep it as unit calls.

For a DSC Resource to work with the ReverseDSC Core, you need to ensure it has a well written Get-TargetResource function that returns the proper parameters. This should already be the case for any well written resources out there, but it is not always the case. In the past, most of the development effort for new DSC Resource was put on the Set-TargetResource function to ensure the “Forward” DSC route was working well. However, in order for the whole DSC process to properly work, it is crucial that your Get-TargetResource function be as complete as possible. After all, the Test-TargetResource function also depends on it to check whether or not your machine has shifted away from its desired state.