Friday, May 18, 2007 10:29 PM
S.DS and large data sets
I’m not a great programmer, but from time to time I have to write a piece of code, mostly for my day job: MIIS projects or ad hoc tools for Active Directory projects. Most of these are built with .NET 2.0, since that is the environment we work with, and to be honest it is a very comfortable environment for building such solutions.
When working with AD, I have mostly used the System.DirectoryServices (S.DS) library, which provides a rich interface for working with directory data. It is built on top of ADSI and gives the developer a rich set of classes for performing operations against Active Directory and ADAM.
Using S.DS, we built an Extended MA for MIIS that manages Terminal Services settings and group membership for users (the first version was created by a subcontractor; I later redesigned it substantially and added functionality). To be honest, I think it is a nice piece of work, and I shared it with some other consultants. We have successfully deployed this agent in mid-sized organizations (15,000+ managed objects); however, one of the consultants (sorry about this, Mike) tried to use this code to read data about 500,000 (half a million) objects and ran into really serious performance problems.
I debugged my code and found a few areas to optimize, but it still consumed a lot of memory, so I started looking at the problem from a different angle. After asking a few wise people, I tried the same task without S.DS, using System.DirectoryServices.Protocols (S.DS.P) code that ~Eric had shared with me as an example.
S.DS.P is a new namespace introduced in .NET 2.0. It does not use ADSI under the hood; instead, it calls the LDAP API directly for directory operations. Because of this, writing the code gets slightly more complicated, but it removes the dependency on ADSI and allows the code to work with any LDAP directory.
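To give an idea of what "slightly more complicated" means, here is a minimal sketch of a paged search with S.DS.P, assuming a reachable AD domain controller and a bind with the current Windows credentials. The server name `dc01.example.com`, the base DN `DC=example,DC=com`, the filter and the page size of 1000 are hypothetical placeholders, not the values from my test. Paging is done with the standard LDAP paged results control (`PageResultRequestControl` / `PageResultResponseControl`), and the loop only keeps a reference to the current page of entries.

```csharp
using System;
using System.DirectoryServices.Protocols;

class PagedDnSearch
{
    static void Main()
    {
        // Hypothetical server name: replace with your own domain controller.
        using (LdapConnection connection = new LdapConnection("dc01.example.com"))
        {
            connection.SessionOptions.ProtocolVersion = 3;
            connection.Bind(); // Negotiate bind with the current Windows credentials

            // Hypothetical base DN and filter; only the DN attribute is requested.
            SearchRequest request = new SearchRequest(
                "DC=example,DC=com",
                "(objectClass=user)",
                SearchScope.Subtree,
                "distinguishedName");

            // LDAP paged results control; 1000 matches the default AD MaxPageSize.
            PageResultRequestControl pageRequest = new PageResultRequestControl(1000);
            request.Controls.Add(pageRequest);

            long count = 0;
            while (true)
            {
                SearchResponse response = (SearchResponse)connection.SendRequest(request);

                // Only the entries of the current page are referenced here.
                foreach (SearchResultEntry entry in response.Entries)
                {
                    string dn = entry.DistinguishedName; // "do something" with the DN
                    count++;
                }

                // Look for the paging cookie returned by the server.
                PageResultResponseControl pageResponse = null;
                foreach (DirectoryControl control in response.Controls)
                {
                    pageResponse = control as PageResultResponseControl;
                    if (pageResponse != null)
                        break;
                }

                // An empty cookie means the server has no more pages.
                if (pageResponse == null || pageResponse.Cookie.Length == 0)
                    break;

                pageRequest.Cookie = pageResponse.Cookie;
            }

            Console.WriteLine("Entries read: {0}", count);
        }
    }
}
```

The extra work compared to S.DS is mostly the cookie handling: the client has to send the server's cookie back with each request until the server returns an empty one.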
So I wrote two versions of the same code, one with S.DS and one with S.DS.P, each performing a simple task:
- Query the directory (Windows 2003 AD, though that doesn’t matter) with a simple query that returns about 150k objects
- For each object, request only the DN attribute
- Enumerate the result set, just to do something with the results.
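For comparison, a minimal sketch of the S.DS side of this test might look like the following, assuming the code runs on a domain-joined machine. The LDAP path `LDAP://DC=example,DC=com` and the filter are hypothetical placeholders. `PageSize` is set because, without it, AD stops returning results at its MaxPageSize limit (1000 by default); `SearchResultCollection` wraps unmanaged ADSI resources, which is why it sits in a `using` block.

```csharp
using System;
using System.DirectoryServices;

class SdsDnSearch
{
    static void Main()
    {
        // Hypothetical LDAP path: replace with your own naming context.
        using (DirectoryEntry root = new DirectoryEntry("LDAP://DC=example,DC=com"))
        using (DirectorySearcher searcher = new DirectorySearcher(root))
        {
            searcher.Filter = "(objectClass=user)"; // hypothetical filter
            searcher.SearchScope = SearchScope.Subtree;
            searcher.PropertiesToLoad.Add("distinguishedName"); // DN only
            searcher.PageSize = 1000; // let ADSI page past the server's size limit

            long count = 0;
            // Dispose the collection to release its unmanaged ADSI resources.
            using (SearchResultCollection results = searcher.FindAll())
            {
                foreach (SearchResult result in results)
                {
                    string dn = (string)result.Properties["distinguishedName"][0];
                    count++;
                }
            }

            Console.WriteLine("Entries read: {0}", count);
        }
    }
}
```

Note that disposing the collection only helps once the enumeration is finished; it does not reduce memory usage while the results are being enumerated.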
What I found is not surprising once you think about it for a while: S.DS is very heavy on memory for this task, while S.DS.P’s memory usage is very reasonable. Below you will find Performance Monitor data showing the working set size of the applications written with S.DS and S.DS.P, respectively, during the execution of this task.
I looked for a way to free some memory while performing this operation with S.DS, but I couldn’t find one.
A bug? No, it’s not. What you have to remember is that S.DS uses ADSI and COM objects under the hood, which means that for every S.DS object some COM objects are created as well. In this case, my query returned a large number of SearchResult objects, and each of them needs an underlying ADSI object. Because of the way SearchResultCollection is designed, these objects remain available to the developer for the whole lifetime of the collection. This is what causes the memory usage you can see on the graph.
To be clear: I think S.DS is a great library and I don’t intend to stop using it. The lesson for me: when a query may return a large number of directory objects, S.DS might not be a good solution, and you have to test carefully how your code will handle it (maybe we missed this during the development phase; our fault). In such cases it is better to spend some more time developing with S.DS.P and avoid the performance problems that may otherwise occur. As in many other situations, you have to know the tool, its advantages and its limitations, and then decide when to use it. I’m describing this here just to bring it to the attention of others who may find themselves in the same situation.