The Engine Room
The engine room is a place for people to contribute ideas on the development of Sharepoint, Visual Studio and information technology.

Team Build and BVTs: The Gotchas

There are a couple of things that people forget or miss when setting up Team Build to run a series of Build Verification Tests.

·         Test tools need to be installed on the Build Server: Get you Visual Studio installation media. The only option that needs to be installed is “Team Developer and Tester tools”.

·         Once you have setup a series of tests and they are added to your “Test Manager”, you still need to create the BVTs in a new “Test List” or the build you configure won’t pick them up.

And just in case, you can run tests without an .vsmdi file. Check out this blog

Data Mining 101: Fact Tables

The information in these posts is a summation from the book Professional SQL Server Analysis Services 2005 with MDX (Harinath, S., and S.R. Quinn) - credit where credit is due.

The biggest confusion I had when reading about Data Mining was about Facts and Dimensions. When jumping in head first with some of the SQL Server Analysis Services (SSAS) examples, the tables in the example databases are nicely suffixed with _Dim and _Fact where appropriate, allowing you to select Dimensions Facts easily when prompted. But that doesn't necessarily translate to understanding what they are.

A key part of this comes from the last post about how the data sources used for data mining are actually re-architected (i.e. do not have the same schema as the original transactional system). The data is structured “in such a way as to facilitate later numeric manipulation" (Harinath & Quinn).

Fact tables are at the core of data analysis because they contain the values which will actually be 'crunched' - literally such as how many bikes were sold or records of outgoing expenses in the flat. Usually, as one would expect, these values are 'numeric' in order to do number crunching, but this depends on the type of data you are analysing.

I'll touch on how Fact tables differ from Dimension tables next week, but for the moment, the key idea is that Fact tables contain indisputable 'facts'. Even if you added more information to a row in the Fact table (e.g. qualifying that the $10 withdrawn from the Flat account on 23/7/07 was in fact to pay a bill), it only provides more information, but does not alter the unique 'fact' expressed by the data in the row (e.g. $10 Debit on 23/7/07)

Fact tables are often 'summarised' by pre-aggregating the data in some form (e.g. aggregating the total debits and credits per month would result in rows of $25.35 Debit in August for example). This allows improved query performance as the data has already been aggregated and the analysis engine does not have to perform that action. The Microsoft MSDN article linked below discusses some of the options about aggregating data and where it should be stored in the database.

Some useful links that have some more information and good examples:

Microsoft MSDN: Designing a Data Warehouse - Fact Tables: http://msdn2.microsoft.com/en-us/library/aa905984(SQL.80).aspx

DatumResources.com: Data Warehousing Objects - Fact Table, Dimension Table, Heirarchy Table: http://datumresources.com/moodle/mod/forum/discuss.php?d=36

Wikipedia: Fact table: http://en.wikipedia.org/wiki/Fact_table

Active Reports - .Value vs .Text Properties

It seems that some of the basics in constructing a good ActiveReport are often misunderstood, resulting in frustration on behalf of the developer in the form of 'Why won't this work?'. One of the most common mistakes I come across in ActiveReports is the misuse of the .Text property when dynamically assigning numerical values to textboxes with an OutputFormat property assigned.

 

About the OutputFormat Property

When constructing an ActiveReport, use the OutputFormat property of the textbox to specify any formatting for percentages, dates, currency or general numbers.

e.g.

Percentage     :    0.0%        formats as     78.5%

Currency    :    $#,##0.00    formats as    $52,187.23

Number    :    #,##0        formats as     2,548

         

This will apply the formatting string to the value which has been placed in the textbox AFTER the value is assigned, guaranteeing a consistent formatting result regardless of the value in the textbox. In the example above, I have specified a two decimal place currency format, complete with commas.

 

The Problem

Having already declared an OutputFormat property of a textbox specifying formatting, the calculated value is assigned to the textbox in the behind-code using a txtMyTextBox.Text statement.

 

int iResult = Employee.GetEmployeeTotals();

txtMyTextBox.Text = iResult.ToString();

 

This code shows a typical example of retrieving the value we want, and placing it in the text box. However, because the property .Text was used, this will literally remove all text in the box and replace it with the string. Any formatting strings, such as that specified in the OutputFormat, will be overridden and therefore not applied.

In this example, our result would be displayed without any decimal places, a currency sign or any commas.

To the frustration of the developer, it would appear as if the specified OutputFormat string has not been applied. Even adding a line in the code re-declaring the OutputFormat will not achieve anything.

The next inevitable step is modifying the code as follows.

 

int iResult = Employee.GetEmployeeTotals();

txtMyTextBox.Text = ((double)iResult).ToString("c");

 

This converts the integer result to a double to give us our two decimal places, and then uses the C# number formatting specifier of "c" which gives us currently complete with dollar sign and commas. We have now duplicated our ActiveReport OutputFormat property in the C#.

 

The Solution

So instead of using the .Text property, use the .Value property, yes, even with working with strings.

The difference is that the OutputFormat formatting string will be applied post value-assigning, ensuring a consistent format, and removing the need to place redundant string formatting in the C#.

In addition, it ensures that the formatting style you have declared in the ActiveReport is the one in use. Otherwise you could declare a formatting style with two decimal places in the OutputFormat, but format the string to a non-decimal format in the C#. In the end you would wonder why the textbox keeps displaying a 0 decimal answer, when you have specified two decimals.

Storing files in SQL Server 2008

There are many occasions where you need to store files along with metadata inside a database. An example of this would be a document repository where metadata and documents are all stored and managed by a system backed with a database.

Before SQL 2008 there were two methods to store the actual binary data for a file that you want to be managed by your system. Storing the file in the database as a binary object, or storing a URL to the file in the database which points to the file.

There are pros and cons for both which I will quickly touch on.

Storing binary objects in the database: 

Pros:     

·         Database backups will backup files stored in the database.

·         Transactional consistency for your stored files.

Cons:

·         Can potentially take up a large amount of room on the database server.

·         Large amount of disk IO to access file data can slow performance over the rest of the database.

·         Higher cost as database disk storage is usually more expensive than fileserver storage.

·         Harder to access the stored files.

 

Storing a pointer in the database to a file on a file system:

Pros:

·         Database can be kept small.

·         Separate file server for serving files will speed performance.

·         Lower load on the database server.

·         Can access the files through explorer.

Cons:

·         Need to manage two servers.

·         Separate backup.

·         Can lead to more data inconsistencies if files are ever moved/renamed.

 

A general rule for deciding between the two options is to store files inside the database for small systems, and move files out to a separate file system for medium to large systems. Here small systems are database server with medium to low load, and storing less than 10 gigabytes in files.

With the new VARBINARY(MAX) FILESTREAM data type in SQL Server 2008 we should be able to get the best of both worlds.

With the File stream data type, SQL Server will store files on an external file system, but these files will be managed by SQL Server. Your application can access these files as you would a SQL BLOB, with INSERT, UPDATE, and DELETE statements all working as expected. At the same time you get benefits of cheaper storage, the ability to access the data on the file system, and database backups will catch the file data.

Attachments in a WSS List

Sharepoint list attachments

 

Using out of the box sharepoint functionality, it is easy to add and view attachments to list items.

 

However, a common issue in WSS 3.0 is that customised list views do not support attachments ... or, "attachement".  The error "This form was customized not working with attachement" is displayed.

 

The only supported way of storing attachments to a list item, is to use the default newform.aspx page, associated with the given list.

This works, but has a number of restrictions:

Branding

It is difficult to brand or customise the upload form. Even though an upload.aspx file is associated with the list, it is not used (by default). The actual upload form which is used, is located in the 'Layouts' directory, on the webserver; any customisation of this would change the uploading form for all sites in the site collection. This is not desirable.

Views

The default newform.aspx used a listview to display the fields associated with the list. Unfortunately, there is no easy way to change the fields that are displayed to the user; all fields associated with the list are shown by default.

Customisation

With respect to general customisation of the newform.aspx, it is also possible to view the data using dataview webparts, and custom list forms. Unfortunately, both of these approaches appear to strip support for attachments; dataview webparts do not support an upload control, and custom list forms throw the error in the screenshot above.

 

Solution

 

In order to overcome the above issues, it is fairly simple to implement a custom webpart using the Visual Studio extensions for WSS.

The general process is as follows:

 

Setup the project

Firstly, create a new webpart project, using the VSeWSS. Then, change the class declaration to inherit from

 

Microsoft.SharePoint.WebPartPages.WebPart

 

The existing class definition inherits from the standard ASP.NET 2.0 WebPart class – this ensures interoperability with existing .NET webparts, but does not provide access to all of the sharepoint WebPart features.

 

Create the code structure

The default WebPart project contains a single overridable method "Render". The render method outputs straight html, via the use of a HtmlTextWriter. This method is not useful for our purposes: delete it. Create two new methods: CreateChildControls(), and RenderWebPart() – both should be defined as "protected override void".

 

The CreateChildControls() method instantiates the ASP.net controls that you want to use – in this case, a HtmlInputFile (for the attachment). Make sure you declare the controls globally. The RenderWebPart() method adds the controls to the Html, using the format

 

myControl.RenderControl(output)

 

Add your submission eventhandler

 

In your CreateChildControls() method, add a button with an event handler – this will be used to submit the entire form. Make sure you also render this button in the RenderWebPart() method.

In the eventhandler, create a SPList object pointing to the list you want to add an item to, and add an SPListItem to the list, using:

 

    SPListItem myItem = myList.items.add()

 

Now, add any properties to the listitem using:

 

    myItem["Title"] = "{title here}";

 

And, add the attachment, using:

 

    Stream fstream = myHtmlInputFile.PostedFile.InputStream;

    byte[] contents = new byte[fstream.length];

    fstream.Read(contents, 0, (int)fstream.length);

    fstream.Close();

    fstream.Dispose();

 

    myItem.Attachments.Add(Path.GetFileName(myHtmlInputFile.PostedFile.FileName), contents);

 

Finally, make sure you commit these changes to the listitem, using

 

    myItem.Update();

 

Summary

 

Hopefully, a future sharepoint release will solve the current implementation issues with list attachments. In the meantime, you may need to either use the default sharepoint behaviour (without customisation), or create a custom webpart. If you choose to follow the latter route, the preceding instructions should provide you with all of the tools you need to get started.

 

TF30162: Task "SharePointPortal" from Group "Portal" failed

This is one of the things that has been keeping me awake at night. Out of the blue (and yes, I know there will be a reason, but right know... I’m still investigating) I lost the ability to create Team Projects. Not only it fails but it locks my account.

I know it is permissions related, I know it is WSS related but I don’t seem to find exactly where the root of the problem is. At the MSDN forums, I got the following response from:

A Team Foundation Server administrator must be a member of the following groups:

1)      Team Foundation Administrators.

2)      SharePoint Administration group in SharePoint Central Administration.

3)      SQL Server Reporting Services Content Manager

4)      SQL Server Reporting Services System Administrator.

I am a member of the Administrators group of the machine, so that shouldn’t be a problem.

1)      Checked.

2)      I made myself secondary owner of the site collection

3)      I made myself a content manager in SQL Reporting

4)      I made myself a Reporting Services Sys Admin.

And the result was: FAIL! I still have the same problem. Will keep posting until I figure out the issue.

InfoPath Data Connection Options

InfoPath 2003 provided a great way of creating forms that could connect and store data via a database, web services or XML files. InfoPath 2007 extends these options by providing the ability to use data connections stored in Microsoft SharePoint Server (MOSS), and the ability to submit data to a SharePoint list/document library, a hosting application such as a Windows form or via email.


This post provides a basic comparison of the data connection options that you should be aware of when designing your form using the InfoPath 2007 version.


Main vs. Secondary Data Connections


The first thing to note is that InfoPath allows one main data connection and multiple secondary data connections. The main data connection defines the main data source for the form, and is created automatically when you create a form based on an external data source. By default the data stored in the main data source is sent when the form is submitted. Combinations of the data connections may be used for secondary data sources. For example you might use an XML schema as your main data source but use a secondary database connection to retrieve information, and submit data through a web service. The options for your main data connection are described in the following sections.


Web Services


Using Web Services to retrieve and/or submit data adds an additional layer to your solution that can be used to perform additional validation or processing that cannot be provided within the InfoPath forms.


Pros/Cons

  • Processing logic can be maintained in a central location – no need to re-deploy forms if application logic is updated (or if data store location changes)
  • Supports complex data types such as rich text, file attachments and images
  • No direct database access


Usage scenarios


Where MOSS is not an option, using Web Services as a data connection provides the most functional solution.


Database


A direct database connection is not suitable for most sizable business applications, but may be used to provide a quick and easy way to maintain small sets of data.


Pros/Cons

  • Does not support complex data types including rich text (think bold, italics), file attachments, picture and ink controls.
  • Requires ‘code behind’ if any processing is needed between the form and the database. For example, implementing concurrency or reading in CSV files.
  • Cannot be used when submitting data from browser based forms
  • Directly runs queries and updates on the database


Usage scenarios


This is ideal for applications that have basic data entry requirements, or for displaying information.


XML/Schema


Storing data in an XML file makes it easy to integrate the form’s data with other applications that may be part of your solution.


Pros/Cons

  • May also require custom code to achieve desired functionality
  • Supports complex data types such as rich text, file attachments and images.
  • Works well when form is hosted in an custom application


Usage scenarios


This option works well if you are submitting data to the host application (Windows or Web form), a SharePoint document library or via email.


Data Connection Library


Data connection libraries allow you to share data connections, so this is useful if multiple forms use the same or similar data connections.


Pros/Cons

  • Data connections are maintained in a single location – if your data source changes you only need to update this in one place
  • A browser based form without full trust can connect to a computer in another domain
  • Requires Microsoft Office SharePoint Server


Usage scenarios


This option provides a great way of managing the data connections when Microsoft Office SharePoint Server is available.


Additional Submit Options


In addition to the data connections mentioned above, InfoPath 2007 provides the ability to submit data to an application on a Web server, to a document library on a server running Microsoft Windows SharePoint Services, or in an e-mail message.


Summary


In this post I’ve given a brief description of the data connection options available in InfoPath 2007 including database vs. Web Services vs. XML. The next post in this InfoPath development series
I’ll cover off the hosting environment options available, and how these can be used with the data connections described above.


For more information see the “Introduction to data connections” 
article on Microsoft Office Online

Visual Studio Team Edition: Database Professionals - Review

A few Provokers and myself went along a while ago to a presentation on Visual Studio 2005 Team Edition for Database Professionals, given by Clive Trott to the Wellington SQL Server User Group.

Part of Microsoft’s latest offerings of Visual Studio 2005, Visual Studio 2005 Team Edition (VSTE) comes in different flavours –Architect, Software Developer, Tester and Database Professional.

If you want to know about VSTE: DB Pro and what it does etc, have a look at Trott’s presentation (see above) or the Microsoft product page here.

In a nutshell, VSTE: DB Pro will supposedly take us away from using SQL Server Management Studio for the standard development chores, letting us do all of our work in Visual Studio.

True, I loved the ‘proper’ version control finally available to a databases and the ‘Schema’ view you could use to view objects (as opposed to only the script view given in the Database Project available in VS2005).

However, I came away with two key disappointments about the product which means that my conversion from good ol’ SQL Server Management Studio to VSTE: DB Pro is rather unlikely.

The first was a lack of GUI interface for Schema objects.

In Management Studio you have the age old ability to view a GUI version of the table, its columns, the tick box if it’s null etc., right out of working with Microsoft Access. It was an easy way to add columns, set identities etc using an interface instead of remembering the SQL syntax.

Unfortunately, VSTE:DB Pro lets you view Schema objects alright – but only in it’s SQL format, so to edit a table you are back to editing a CREATE TABLE statement.

Now some might argue that editing and changing tables using scripts is the ‘true’ way of SQL development. But regardless of your opinions on that matter, it is inarguably less user-friendly. And I can’t help but feel that it’s just going to be a right pain to use in every day development.

The second, and more significant, is licensing.

As Trott mentioned in his presentation, in small-medium sized software development companies, there isn’t a distinct segregation of database development vs. software development. As a developer, we all do a bit of coding and a bit of SQL. There is no DBA, we all act as hybrids.

However, VSTE: Software Developer (my obvious requirement for those C# programming requirements) and VSTE: DB Pro (my other requirement for that SQL development under Source Control) are separate products, separate prices and sold separately. They’re about as separate as you can get.

So as a ‘hybrid’ developer, I would require license for both programs, or otherwise one person is put in charge of all the database development for a project in order to use one of a few VSTE: DB Pro licenses available.

Somehow I don’t think the former is financially realistic and the latter seems equally improbable in a smaller software house when you have fewer personnel.

The alternative is a wacky combination of developers using Management Studio for development and storing changes in VSTE: DB Pro when done for checking in. Which needless to say is just asking for trouble.

For the meantime, most of us developers will probably continue to use Management Studio, and still be without one of the key advantages that VSTE: DB Pro offers – decent source control.

Data Mining: Part One

When I first started getting interested in Business Intelligence, one of the obvious paths to go down in a Microsoft shop is looking at Analysis Services 2005. However, I found pretty quickly that while one thinks they understand what 'data mining' is, that might not actually be the case. I realised I had no clue. So, from a distinctly non-data-mining background, learning what data mining actually was is quite an eye opener.

I found what I learnt so interesting and rather revelational about how I viewed data and data mining that the next few posts will be a series on the basics of what data mining actually is. Hopefully this might help other people out there like me.

PART ONE: DATA WAREHOUSES

'Data mining' is a term chucked around a bit and I think most people understand its general meaning of along the lines of 'deriving meaningful information from data'. And likewise, the term 'data warehouse' is general understood to be a database containing trillions of GB of information... or something thereabouts.

I always considered our normal transactional database running our web application a data warehouse. But it’s not.

Our normal Transactional database with tens of thousands or millions of financial transactions (for example) has data which needs querying for performance, cost and trend analysis – in other words Business Intelligence.

But if you want to query the database there are two key things to consider:

Use a copy

It would be inconceivable to run cubes over live databases from a pure performance point of view. For two reasons – one it would affect the performance of your dependant web application due to demand on the database server, two as the data continually changes it will actually affect the performance of your cube. So use a copy. This means the data is not volatile like your production database.

However, more interestingly:

De-normalise the data

A normal database with transactional data is normalised, to reduce data duplication and data integrity. As is well known however, this can sometimes reduce optimisation during query time due to a potentially high number of joins.

Analytical databases (or OLAP databases) should be the opposite. Analytical data sources are not interested in optimised data updates or inserts, nor holding redundant data. Therefore a key part of setting up a data warehouse for querying through the use of cubes etc. is to actually specifically re-architect the data to a de-normalised state. This will provide faster querying time for those multi-million row aggregations.

 

It strikes me that de-normalising a database is going to be harder than it looks and should be done by a trained DBA familiar with data mining, so kids don’t try this one at home.

Brokered Authentication using MLLSTS

Microsoft Live Labs Security Token Service is an experimental Identity Provider service that can make up part of a secure and interoperable brokered authentication solution.  Inside the STS, it integrates with Windows Cardspace and provides brokered authentication with web sites or services that support authentication from the STS.

How does it work?

A client sends an authentication request, with accompanying credentials, to the STS.  The STS verifies the credentials presented by the client and verifies them against the Windows Cardspace Managed information card, and then in response, it issues a security token that provides proof that the client has authenticated with the STS.  The client presents the security token to the Web service.  The Web service verifies that the token was issued by a trusted STS (in this case MLLSTS), which proves that the client has successfully authenticated with the STS.