Category: Rant

Developing To Specifications

I’m a DBA. As a class of people, DBAs have a tendency to rant a little about developers, and I would certainly count myself in that category. The trouble is that most of the time I don’t think it is the developers’ fault; ranting at them is just a case of shooting the messenger.

As an example, let’s look at a new database release that was recently requested.

The biggest issue was that the code was provided so late in the cycle that the issues found (and there were a great many of them) could not be fixed. Why could they not be fixed? Because an arbitrary release date had been provided to the customer and that date could not slip for any reason whatsoever. Artificial deadlines are one of the worst things that devs and DBAs have to deal with.

The developers agreed to get the code fixes added to their backlog and to get it into a sprint for fixing in the next month. So after much discussion and a firm commitment we decided to move ahead with the release.

My next question to the dev team was “how are you going to get the data out? You have a few procs here for loading data into tables, but nothing for consuming that data afterwards.”

The response was a little stunning:

The only requirement to get done by is to have the data written to a database. After this deploy, we are going to create a way to get the data out

Outstanding. Way to develop to requirements.

In this instance I cannot really place blame on the dev team; they are just doing what is asked of them. I think the bigger problem sits with the folks who are gathering the business requirements and translating those, along with timelines, up to the business.

I think that it might be time for DBAs to stop pointing fingers at devs and start holding accountable those who are causing these problems, namely the Business Analysts and Project Managers.

Who’s with me on this?

SCCM & SQL Server – A DBA’s Worst Nightmare

Microsoft puts out some great products, no really, they do. There are any number of applications and tools available to let you do pretty much anything. One thing gaining popularity recently is System Center Configuration Manager (SCCM), which can be used to provide patch management, software distribution, inventory management, server provisioning and more.

Freddy & SCCM – a nightmare double feature

SCCM is great for businesses that are growing and need to maintain control over the devices used and maintain compliance across the enterprise.

This is all great. Businesses use it, businesses need it. SCCM has been designed to provide a relatively straightforward deployment that does not require any strong level of expertise. This is where SCCM falls down for me, as a DBA.

What is the problem?

SCCM does its own database management. It is a set-it-and-forget-it kind of thing. This is done so that an enterprise without SQL Server DBAs can go ahead and perform the deployment and management without any specialist knowledge.

This is all well and good, except when you do have a SQL Server DBA on staff; you have multiple deployments of SQL; and you like to consolidate servers wherever possible.

SCCM does some things that go completely against my wishes as a production DBA (a quick way to check for some of these follows the list):

  • Requires sysadmin on SQL Server to both install and run the application
  • Requires Windows admin rights on the SQL Server
  • Installs software on Windows to perform backups of SQL Server
  • Adjusts SQL Server configuration settings (CLR & max text repl size)
  • Enables the TRUSTWORTHY option for the SCCM database
  • Sets the database recovery model to SIMPLE
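
If you suspect SCCM has already been pointed at an instance you care about, the last three items are easy to verify from the instance itself. This is a minimal sketch; the SCCM database name below is a placeholder, so swap in whatever your site database is actually called.

-- Instance-level settings SCCM is known to adjust
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN (N'clr enabled', N'max text repl size (B)');

-- Database-level settings: TRUSTWORTHY and the recovery model
SELECT name,
       is_trustworthy_on,       -- 1 means TRUSTWORTHY has been switched on
       recovery_model_desc      -- SIMPLE vs FULL
FROM sys.databases
WHERE name = N'CM_ABC';         -- placeholder SCCM site database name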

Fortunately I found a lot of this information up front and decided that there was no way I was going to try to consolidate this database with any other in my environment. The security model is lacking in the worst fashion, and there is not much worse than taking all control away from a DBA.

I was glad that I made this choice, as SCCM decided to restart SQL as part of the installation process. That would have caused a production outage if I had attempted to co-locate it with other lightly used databases.

Short recommendation

Being brief: if your sysadmins are looking to deploy SCCM in your environment, ask for a dedicated VM for SQL Server. Any attempt to consolidate this database will leave you open to massive security holes and production outages.

Windows Hotfix KB 2661254 Breaks Reporting Services

I have spent the last 3 weeks troubleshooting an issue with Reporting Services for SQL Server 2008 R2 Service Pack 2 failing to start on a server, and have come to discover that a Windows hotfix is causing the issue.

It makes no difference whether you install a slipstreamed version of SQL Server 2008 R2 with SP2 or install SQL Server 2008 R2 and then apply SP2 on top of it: either way, if KB 2661254 is installed the Reporting Services service will fail to start. You will not get an error indicating the reason for the failure, just a message that it failed (way to go with the pertinent error messages there, Microsoft).

The Windows hotfix KB 2661254 is an update for the minimum certificate key length, preventing the use of any certificate keys that are less than 1024 bits long. This is a security measure to help prevent brute force attacks against private keys. Why this breaks SSRS I do not know. The patch can be safely applied to systems running SQL Server 2008 R2 SP1.

For now I have passed along word to the sysadmins not to deploy this particular patch to any Windows machine that runs SQL Server, and have created a Microsoft Connect item in the hope that they provide a resolution to the issue. Please try this in your own test environment, then upvote and mark that you are able to reproduce the problem on Connect.

Fun With Recruiters

I love it when I get those special kinds of emails from recruitment agencies who claim they have the perfect position. I got one of those kinds of emails last week, I thought I would share it (as well as my response).

 

Title: Front End Web Development Lead
Position Type: Direct Placement
Location: Bothell, WA, United States
Description:

Duration: 0-6 month(s)
Job Description:
Front-End Web Development Lead – Bothell, WA
Every day over 19,000 Amdocs employees, serving customers in more than 60 countries, collaborate to help our customers realize their vision. We have a 30-year track record of ensuring service providers’ success by embracing their most complex, mission-critical challenges. 100% of Fortune’s Global 500 quad-play providers rely on Amdocs to help them run their businesses better.
Amdocs is a “can do” company that leads the industry, is fully accountable and most importantly, always delivers. This is our DNA. Our success has been sparked and sustained by hiring exceptional people. If this sounds like you— if you have the drive, focus and passion to succeed in a fast-paced, delivery-focused, global environment– then Amdocs would like to talk with you. Amdocs: Embrace Challenge, Experience Success.
– Please Note: All applicants must be currently authorized to work in the United States without employer sponsorship now or in the future.
Role Overview:
We are looking for a Front-End Web Development Lead to be a team lead directing a multi-shore group of developers tasked with providing issue resolution support for a very large-scale web retail store. Some of the responsibilities and duties include, but are not limited to:
Interface with defect assurance team to accept inbound production issues for resolution
Direct and coordinate work of offshore development team to ensure accurate and timely resolution of front-end production issues
Interface with customer development, business, and other teams as needed to provide good service, promote team visibility and positive perception
As team grows, evaluate potential additional team candidates and support Amdocs executive management by providing expert advice as required to grow our presence with the customer and provide continuous improvement
Provide analytical support to identify, develop, and drive strategic improvement initiatives involving functionality improvements, innovation solutions, and development and implementation methodologies
Serve as trusted advisor to management and client
Work day-to-day with key client management, development fulfillment partner, QA testing organization, providing expert support to each as needed and appropriate
Support development of improved governance of production defect management, including definitions of severity, criteria for prioritization, and defect management lifecycle processes.
Requirements:
5+ years front-end web development experience
5+ years hands on experience with the following key technologies: JSP Integration, HTML / HTML 5, AJAX, CSS, JavaScript, JSON, XML, JQuery
Strong leadership skills
Preferences:
Large scale /enterprise web retail experience
Integration with ATG Commerce
Integration with Adobe CQ
Experience with other industry standard integration technologies (e.g. WebLogic)
Technical leadership experiences in relevant technologies
Telecom experience
All Amdocs roles require strong verbal and written communications skills, position-appropriate mentoring/leadership abilities, ability to quickly master new systems and/or processes, capacity to stay organized while managing competing priorities, and a deep customer service orientation, both internally and externally.

 

I’m a database guy; I’ve never been a developer, let alone a dev lead, so I replied…

 

As a solutions provider I would expect you to have some great analytics. This leads me to ask what part of my skillset or background leads you, or anyone at your company, to believe that I would be a good fit for, or would consider, the opportunity that you list below.

 

If I ever get a response I’ll be sure to post it.

Do You Trust Your Application Admins?

I was sitting at my desk, happily minding my own business, when an alert came through that a database backup had failed. OK, backups fail; I just figured one of the transaction log backups had hiccupped (we’ve been having some problems the last few days).

When I looked at the failure it was a backup trying to write to the C drive on the server.

I NEVER back up to C. It can easily fill the drive and take down the system.

A bigger indicator that something was up was that all of our backups are done across a 10Gb network to a centralized location for ease of tape backup. This meant that someone, not a DBA, had the access to run a SQL backup.

I trawled through the permissions on the server and nobody had that level of access, so I couldn’t figure out who had done this or how.
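
One thing that helps in this situation: any backup that completes is recorded in msdb, along with the login that issued it and the destination it wrote to, so a quick look at the backup history will usually show backup activity that did not come from the DBA team. A minimal sketch:

-- Recent backups, who ran them, and where they were written
SELECT TOP (20)
       bs.database_name,
       bs.backup_start_date,
       bs.user_name,                 -- login that issued the BACKUP command
       bmf.physical_device_name      -- destination (e.g. a path on C:)
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
     ON bmf.media_set_id = bs.media_set_id
ORDER BY bs.backup_start_date DESC;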

 

So What Happened?

Looking through the SQL logs I saw multiple failed attempts by a contractor to log in to SQL; then, about 5 minutes later, the backup error came through. Interesting stuff, so I walked over to the contractor and asked what was going on.
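
For anyone wanting to do the same digging without scrolling through the log by hand, the failed logins are easy to pull out with sp_readerrorlog (undocumented, but widely used). A sketch; the second search string is a hypothetical login name:

-- Read the current error log (0), SQL Server log (1), filtered on two strings
EXEC sp_readerrorlog 0, 1, N'Login failed', N'CONTOSO\contractor';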

After he was unable to log in, he went to the application admin, who helped him out with access… using the application service account.

One of the third party applications from some unnamed vendor has a database on that server. Due to the nature of the well-designed code, the database owner has to be the same as the service account of the application. The application admin knows this password (not my doing).

After logging this contractor in as the application service account, the app admin walked away and left him to his own devices. As a result this contractor was dbo on a database that manages security for the entire company. We should just consider ourselves lucky that all this guy did was attempt to perform a backup.

 

Preventative Actions

In order to try to prevent this kind of thing in the future, I am looking at implementing a logon trigger for the service account that checks the host and application connecting and denies access to anything not on a specifically approved list. There is also a conversation going on about disabling interactive logons for the service account using a group policy at the domain level.
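
For the curious, the shape of the trigger I have in mind is below. This is a minimal sketch, not the final version; the login, host and application names are placeholders, and the usual logon trigger caveats apply (a bad one can lock everyone out, so keep the DAC handy).

-- Deny the service account from anywhere except the approved host/application
CREATE TRIGGER trg_restrict_app_service_account
ON ALL SERVER
FOR LOGON
AS
BEGIN
    IF ORIGINAL_LOGIN() = N'DOMAIN\AppServiceAccount'        -- placeholder login
       AND (HOST_NAME() NOT IN (N'APPSERVER01')              -- approved host list
            OR APP_NAME() NOT LIKE N'ExpectedAppName%')      -- approved application
    BEGIN
        ROLLBACK;   -- rolling back inside a logon trigger denies the connection
    END
END;

Worth remembering that HOST_NAME() and APP_NAME() are supplied by the client and can be spoofed, so this is a guardrail against casual misuse rather than a real security boundary, which is why the group policy conversation is happening as well.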

 

It is a Matter of Trust

While the application admin is obviously at serious fault here, it raises the question: how well do you trust your admin team?

Domain admins will be able to access your SQL Servers (get over it; there is no way you can keep them out, and if they really want in there are numerous ways for them to do so).

Anyone with a password could share that with someone else and allow them to access your servers.

Ultimately you have to trust those that you work with to do the right thing. It’s always sad when those people let you down.

Vendor Support–The Good And Bad

When you go out and buy yourself new hardware or software you have the option of purchasing maintenance agreements at the same time. For software this generally provides the ability to constantly upgrade to the latest and greatest product. For hardware this tends to provide onsite support for when things go wrong and an SLA around that support arriving and the hardware being fixed.

I’ve been dealing with hardware and software vendors for years, so I thought I’d share a couple of stories that really depict excellent service and the stuff that you never want to deal with as a customer.

 

The bad

When I started at one of my previous positions I walked into a Dell shop. If you aren’t familiar with that term, it means that all hardware was purchased through Dell, which gave us steeper discounts on the hardware we bought.

I was put in charge of the Windows team pretty early at this company, and we started to go through a hardware refresh. I sat down with the team and started asking questions about how things were with Dell. To a person, they liked the hardware and what it delivered; however, they hated the service. There were common problems with SLAs not being met, the wrong replacement parts being delivered, and phone support being unable to provide decent assistance.

I brought these issues to the Dell account rep and explained that we were looking at a fairly significant budget spend the next year on hardware (>$1m) and that I needed to see better results from the support team over there if I was going to spend any of that money with them.

Over the next 6 months I thoroughly documented every engagement with their support staff. These support engagements included:

  • Server down – customer impact
  • Hardware problem, replacement part needed – non-customer impacting
  • General troubleshooting assistance required – non-customer impacting

I’m sad to say that Dell was only able to meet the 4 x 7 x 365 agreement we had for hardware support in 10% of the cases that we opened. Techs would show up late (or not at all), parts would be incorrect even when the tech was onsite in time (techs did not bring the parts; they were delivered separately), and we would have trouble getting anything above a level 2 tech person on the phone, whose troubleshooting ability seemed to be limited to “have you tried turning it off and back on again”.

This was several years ago and Dell might have significantly improved their support since then, however when I left the company we did not have a single Dell server in any of the three datacenters I had built out.

 

The good

Software has bugs. We all know that and have experienced problems with vendor applications, but what happens when you run into a significant issue and how does the vendor respond?

Recently SQLSentry released a new version of their Performance Advisor tool for monitoring and tuning SQL Server. I performed an upgrade to the new version and resumed monitoring without running into any issues or problems.

A couple of days later I got a call from our Windows folks stating they had an alarm on high memory utilization on the monitoring server. I logged in to take a look and was shocked to see the SQLSentry monitoring service had consumed over 5GB of memory. I bounced the service and it reset itself. Over the next couple of days memory usage increased again, causing me to restart the service.

At this point I engaged the support folks, in particular Jason Hall (blog|twitter). We started triaging the issue.

We started up perfmon and captured a few counters to file to try to localize the memory leak. This allowed us to discover that the leak was in unmanaged code, making the trouble a lot tougher to track down.

The next step was to install the Windows Debugging Tools from Microsoft. With these deployed and a set of symbols downloaded we used UMDH to capture the before and after log heap allocations for the monitoring service. One comparison log later and we were able to track the issue down to a leak in Microsoft’s managed wrapper for the VDS (Virtual Disk Service) subsystem which is used by SQLSentry to monitor mount points.

I’m running several multi-node, multi-instance SQL Server Failover Clustered Instances and make extensive use of mount points (current count is 136 monitored mount points).
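
As an aside, if you are not sure how many mount points your own instances touch, SQL Server can tell you directly. A sketch using sys.dm_os_volume_stats (available from SQL Server 2008 R2 SP1 onwards):

-- Distinct volumes/mount points that host any database file on this instance
SELECT DISTINCT
       vs.volume_mount_point,
       vs.total_bytes / 1073741824     AS total_gb,
       vs.available_bytes / 1073741824 AS free_gb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
ORDER BY vs.volume_mount_point;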

To test and confirm that VDS was the actual issue, one of the SQLSentry development team threw together a very small 50-line application that I could hit a couple of buttons on and watch memory usage. It took a bit of a tired mouse finger, but I was able to verify quickly that VDS was indeed the problem.

Now fully understanding the problem at hand, the SQLSentry team quickly built their own COM wrapper to handle mount point monitoring and provided me with a new build of the product. I went through a standard deployment and started the services back up again. A week later, the service is still running at around 500MB.

 

Throughout problem triage, issue identification and resolution, the support was highly engaged, with an appropriate level of urgency at each step. Everything was handled to completion and I have been very happy with the support I received. That’s why my maintenance for this product will be renewed next year. I know that the money spent is worth it.

 

TL;DR

In the past I have spent a lot of money on very high levels of support from Dell and received nothing but poor service. As a result they lost several million dollars of business.

On the flip side I spent a small amount of money on maintenance with SQLSentry and received excellent support and levels of engagement which will help retain me as a long term customer.

 

I’d be interested to hear about your experiences with these vendors, or any others.

The Importance Of Good Documentation

Believe it or not I’m not actually talking about server documentation here (for an excellent post on that go read Colleen Morrow’s The Importance of a SQL Server Inventory).

I have spent the last 12 days dealing with a single production release. It is being considered a significant release, but to be honest it really isn’t. The biggest challenge has been the way the release documentation was provided and the fashion in which the scripts were built.

 

What I got

Here’s a brief example of a change request I’ve seen:

  • Change Request:
    • Update database – products (this links to a Sharepoint page)
    • Use code from this location (links to a file share)
  • Sharepoint page
    • Go to this location (but replace the middle part of the link with the link from the change request page)
    • Copy this subfolder to your machine
    • Follow the process on Sharepoint page 2 to deploy the code
    • Once Sharepoint page 2 is complete run script X
  • Sharepoint page 2
    • run script 1
    • run script 2
    • run script 3

 

Pretty painful, right? Now multiply that by 8 for each of the database code deployments that needed to be completed. No fun, no fun at all.

 

What do I want?

It’s going to be a work in progress but we’ll be working with this particular dev team to put together a unified document to simplify the release structure.

Here’s what I want to see (a sketch of how the scripts could be stitched together follows the list):

  • Change Request:
    • Update database – products – deployment instructions attached
  • Attachment
    • Deploy script 1 (link to script)
    • Deploy script 2 (link to script)
    • Deploy script 3 (link to script)
    • Deploy script X (link to script)
    • Rollback script (link to script)
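
On the script side, one simple way to make “run script 1, 2, 3, X” foolproof is a single SQLCMD-mode driver script linked from the attachment, so the whole deployment runs in order from one place. A minimal sketch, with placeholder file names and path:

-- Run in SQLCMD mode; stop at the first error instead of ploughing on
:on error exit
:setvar DeployPath "C:\Releases\ProductsRelease"
:r $(DeployPath)\Deploy_Script_1.sql
:r $(DeployPath)\Deploy_Script_2.sql
:r $(DeployPath)\Deploy_Script_3.sql
:r $(DeployPath)\Deploy_Script_X.sql
-- Rollback.sql sits alongside and is only run if something above fails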

 

The difference?

Instead of having to reference several different Sharepoint locations in addition to a change control document I now have a single document, attached to the change, which clearly defines the process for the release, the order for scripts to be executed, a link to each of those scripts and the relevant rollback information.

It’s not something that I think is too out of line to provide, but I’ve found the folks who have been providing releases this way are extremely resistant to change. I can understand that, but to be fair, they aren’t the ones under the gun trying to put something into a production environment in a consistent and stable manner.

I’ve got lots of fun meetings coming up to talk about this.

 

What about you?

How do you get your change control documentation? Is it something plainly written and easy to follow? Or do you have to have a degree in cryptography to get code into production?