Sitecore API: Processing a big collection of secured items can harm your performance

May 19, 2015
Ruben De Leon

Have you ever dealt with long waits while loading your Sitecore pages, for instance the ones that render items using a GridView or a Repeater? When you work with Sitecore almost everything is an Item, and beyond displaying information from a single item you will often want to list a collection of items, depending on your requirements. And what if these delays only happen in your QA/Production environment and not on your local workstation? Well, in most cases this kind of issue comes down to security and the way we process items through Sitecore's API.

In this article I'll try to give you some ideas on how to avoid those long delays when security is applied to Sitecore items. We'll also take a quick walk through some of Sitecore's API methods to understand why security can actually harm performance and then, with that knowledge, find the best-optimized code for our custom routine(s).

Case Study

There is an existing page on our website that lists all the articles by year and month using a Repeater control. The repeater and its associated logic live in a sublayout called "ArticlesCtrl.ascx", a UserControl associated with our Articles.aspx page. The articles are rendered in an accordion-like fashion: when you come to the page you first see a list of years, and when you expand one of those rows you get the list of months right underneath. You can then see the articles for a given month by expanding one or more month rows. One thing to be aware of is that some articles are for internal employees only and others are for external customers, so the article items are secured using Sitecore's security. The client reported a bug for that page, saying it takes around 30 seconds to load, which is not acceptable under our requirements.

Solution

The first thing we want to do is rule out that the delay is caused by resources other than the articles page itself (i.e. JavaScript libraries, images, Flash files, etc.). We can do this by using our web browser's DevTools. If we find that the aspx page itself accounts for most of the loading time, then we want to start looking at the sublayouts.

The next step is to measure the performance of our routines by placing timing checkpoints within our sublayout's code: divide the routines into blocks and add a couple of lines of code that write the recorded timing to the browser console/page or to a log file. For example:

DateTime startDate = DateTime.Now;
...
.. code to be measured.. 
...
// TimeSpan custom formats need the colons escaped, otherwise ToString throws a FormatException
Response.Write("Block#1: " + (DateTime.Now - startDate).ToString(@"hh\:mm\:ss"));
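
The same checkpoint can also be wrapped in a small helper. This is only a minimal sketch: it assumes you prefer System.Diagnostics.Stopwatch, which is designed for measuring elapsed time, over subtracting DateTime.Now values. BlockTimer, Measure and Format are hypothetical names used for illustration; they are not part of Sitecore or ASP.NET.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing a block of code; not a Sitecore or ASP.NET type.
public static class BlockTimer
{
    // Runs the given block and returns how long it took.
    public static TimeSpan Measure(Action block)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        block();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Builds a checkpoint line: label, then hours:minutes:seconds.milliseconds.
    public static string Format(string label, TimeSpan elapsed)
    {
        // Literal characters in a custom TimeSpan format must be escaped.
        return label + ": " + elapsed.ToString(@"hh\:mm\:ss\.fff");
    }
}
```

Inside the sublayout it could be used as Response.Write(BlockTimer.Format("Block#1", BlockTimer.Measure(() => { /* code to be measured */ })));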

The above is the simplest way to find the routine (or piece of a routine) that is causing the longest delay. It doesn't look like a complex implementation and I'm pretty sure most of you have done something like it before, but if not, give it a try and you will see it's not a big deal at all. So at this point, let's say we find a piece of code like the one below that is causing a delay of 25 seconds:

// Get the articles' root item; its children are the articles we need to process
Item articlesRootItem = Sitecore.Context.Database.GetItem(ARTICLES_ROOT_ITEM_ID);

// This executes a LINQ query over the articles and gets all the distinct years
var yearsList = GetYears(articlesRootItem);

repeaterCtrl.DataSource = yearsList;
repeaterCtrl.DataBind();

So at this point you may be wondering: why is this taking so long when we only have around 2000 article items? It works just fine on my local workstation; even with double that number of items the page loads almost instantly for me. Let's go through the possible issues that could be causing this:

1. Security:

When you have security enabled for your website, and therefore for your Sitecore items, the CMS performs validations while retrieving/creating the item objects for you. In most cases those validations involve Active Directory users and groups, and they usually take a little while. So without security applied you will get quicker loading times. What can we do about it? The first thing we can do is enclose the whole block of code within the following using clause:

        using (new Sitecore.SecurityModel.SecurityDisabler())
        {
        ....
        }

With the clause above we are just asking Sitecore to skip any security check on the item(s) we access. As a diagnostic step, by running the app in the environment where the issue showed up, we can tell whether the timing decreases and therefore be sure that the delay is due to security. At this point we know the security check is our main issue, and we need to figure out how to fix it. How? Well, I would say the solution basically rests on: a) understanding how Sitecore retrieves/creates items, and b) writing code that is as optimized as we can make it (we will look at this in the next topic, "Non Efficient Code").

How Sitecore retrieves/creates items:
When we execute the GetChildren() method on any item, what it pretty much does is:

 

You probably noticed the securityCheck variable, the one that is passed through when requesting the creation of the child collection. Then, if you continue going deeper, you will get to the following method:

 

So in the code above we can see how another loop joins the party for our retrieval routine, along with some other checks that you will meet if you go even deeper. So yes, there is a lot more going on than a simple database query for a couple of items, and if you add the fact that you could be dealing with a synced Active Directory with thousands of users and groups, it could get even worse.

 

2. Non Efficient Code

What we do most often while coding is underestimate things, and we run into performance issues almost every day. In the best-case scenario we are fine, because we have a good server with plenty of RAM and processing power, but in the common/worst scenario we do hit serious issues like the case study above. We are dealing with about 2000 articles, and let's say we know how they are going to grow (i.e. one article a week), which makes us think there will be roughly 50 to 60 new articles a year, so retrieving that number of items and displaying them on a page should not be a problem at all. But things went bad when we missed one secret "variable": Sitecore security!
 
So how can we turn things around and make the above routine work efficiently?
 
Well, the very first issue with the code above, and one we perhaps don't see at first glance, is that while we think we are dealing with a routine of O(N) complexity (i.e. getting a list of N items and binding them to a single repeater), we are in fact dealing with three nested levels of work, O(N^3) in the worst case, because there are another two nested repeaters (Months and Articles). The point is that during the data binding of the "repeaterCtrl" repeater, each row representing a year triggers two other nested routines, one for the months and another for the articles.
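
To get a feel for how quickly the nested repeaters multiply the work, here is a minimal sketch that counts item reads without touching Sitecore at all. Article, CountingStore and AccessCounter are hypothetical stand-ins, and the sketch assumes that every item returned by the store costs one security check and that each nested level goes back to the store, which may not match the real sublayout exactly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a secured article item; not a Sitecore type.
public sealed class Article
{
    public int Year;
    public int Month;
    public string Title;
}

// Hypothetical store that counts every item it hands out,
// mimicking one security check per item returned by the API.
public sealed class CountingStore
{
    private readonly List<Article> articles;
    public int ItemReads;

    public CountingStore(List<Article> articles)
    {
        this.articles = articles;
    }

    public List<Article> GetAll()
    {
        ItemReads += articles.Count;
        return articles;
    }
}

public static class AccessCounter
{
    // Nested data binding: the years, each year's months and each month's
    // articles are all resolved by going back to the store.
    public static int NestedReads(CountingStore store)
    {
        store.ItemReads = 0;
        List<int> years = store.GetAll().Select(a => a.Year).Distinct().ToList();
        foreach (int year in years)
        {
            List<int> months = store.GetAll()
                .Where(a => a.Year == year)
                .Select(a => a.Month).Distinct().ToList();
            foreach (int month in months)
            {
                store.GetAll().Where(a => a.Year == year && a.Month == month).ToList();
            }
        }
        return store.ItemReads;
    }

    // Single pass: read the items once, then group them in memory.
    public static int SinglePassReads(CountingStore store)
    {
        store.ItemReads = 0;
        store.GetAll()
            .GroupBy(a => a.Year)
            .ToDictionary(g => g.Key, g => g.ToLookup(a => a.Month));
        return store.ItemReads;
    }
}
```

With four articles spread over two years and two months each, the nested version reads 28 items (4 for the years, 8 for the months, 16 for the articles) while the single pass reads 4, and the gap widens as years and months are added.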
 
Now that we have realized the above, we can deduce that we need to "avoid as much as possible any API access to retrieve secured items" from our Sitecore CM. So at this point it is basically a matter of improving our code, and there are lots of ways to do so. A few of them could be:
- Retrieve items on demand: 
This is the most common strategy out there and it's far from a bad one. In the code above we retrieve the whole Years/Months/Articles collection at once (i.e. on page load), while we could easily retrieve only the years and then, when the user expands one of those rows, retrieve its months, and so on. Yes, you may be right, we are basically just splitting up the hard work, and that may not work well for every scenario, but it can be better in some cases.
- Using the Session/View State: 
We can use this either alone or as a complement to the option above: every time we retrieve a fresh collection of articles for a Year/Month, we can store it in our Session/View State in order to avoid any further API access for those already processed items.
- Splitting up public articles from secure ones: 
You may have a collection of articles that are going to be tagged as "public", and perhaps this collection is the largest and most important one, so if we can get rid of the security checks for those items we would improve the Articles page a lot. So let's put those public articles in a folder hierarchy separate from the secure one and then, when we retrieve them from our code-behind, we can use the "using (new SecurityDisabler()){...}" clause and save ourselves a lot of processing time.
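
The first two options above can be combined in a small per-user cache, sketched below under a few assumptions. MonthCache and its loader delegate are hypothetical names; the loader stands for whatever routine actually reads the secured items for one month, and the sketch stores plain titles rather than item objects. In a real page the dictionary would live in the Session so that each user only sees the items their own security allowed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical on-demand cache of article titles keyed by year and month.
public sealed class MonthCache
{
    private readonly Dictionary<string, IList<string>> cache =
        new Dictionary<string, IList<string>>();
    private readonly Func<int, int, IList<string>> loader;

    // Number of times the (expensive) loader was actually invoked.
    public int LoaderCalls { get; private set; }

    public MonthCache(Func<int, int, IList<string>> loader)
    {
        this.loader = loader;
    }

    // Called when the user expands a month row: hits the loader only
    // the first time a given year/month is requested.
    public IList<string> GetArticles(int year, int month)
    {
        string key = year + "-" + month;
        IList<string> titles;
        if (!cache.TryGetValue(key, out titles))
        {
            titles = loader(year, month);
            LoaderCalls++;
            cache[key] = titles;
        }
        return titles;
    }
}
```

Expanding the same month twice then costs a single trip through the API, and months the user never opens cost nothing.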

 

In the end...

Sitecore is an awesome CMS that gives us tons of flexibility for any kind of website, but that does not mean it's magical or perfect ;-). We always need to pay attention to the details, and for the purposes of this article we need to be aware that when working with a collection of "secured items" we will probably run into performance issues. So we need to design our best implementation strategy for those scenarios. In summary, there are three important tips you want to remember when working with the Sitecore API and secured items: 

  1. You want to avoid, as much as possible, any API access to a collection of items when they are secured.
  2. You need to be careful with your code; try to think a little bit "out of the box" for every single (easy-looking) piece of code you are about to write.
  3. APIs, Sitecore's in this case, tend to be a bunch of easy-to-use libraries that reduce our coding time/complexity but, and this is important, you want to know as much as possible about how they work on the inside in order to use them the best way you can.