Category: Rant

Fun With Recruiters

I love it when I get those special kinds of emails from recruitment agencies who claim they have the perfect position. I got one of those kinds of emails last week, I thought I would share it (as well as my response).

 

Title: Front End Web Development Lead
Position Type: Direct Placement
Location: Bothell, WA, United States
Description:

Duration: 0-6 month(s)
Job Description:
Front-End Web Development Lead – Bothell, WA
Every day over 19,000 Amdocs employees, serving customers in more than 60 countries, collaborate to help our customers realize their vision. We have a 30-year track record of ensuring service providers¿ success by embracing their most complex, mission-critical challenges. 100% of Fortune¿s Global 500 quad-play providers rely on Amdocs to help them run their businesses better.
Amdocs is a ¿can do¿ company that leads the industry, is fully accountable and most importantly, always delivers. This is our DNA. Our success has been sparked and sustained by hiring exceptional people. If this sounds like you— if you have the drive, focus and passion to succeed in a fast-paced, delivery-focused, global environment– then Amdocs would like to talk with you. Amdocs: Embrace Challenge, Experience Success.
– Please Note: All applicants must be currently authorized to work in the United States without employer sponsorship now or in the future.
Role Overview:
We are looking for a Front-End Web Development Lead to be a team lead directing a multi-shore group of developers tasked with providing issue resolution support for a very large-scale web retail store. Some of the responsibilities and duties include, but are not limited to:
Interface with defect assurance team to accept inbound production issues for resolution
Direct and coordinate work of offshore development team to ensure accurate and timely resolution of front-end production issues
Interface with customer development, business, and other teams as needed to provide good service, promote team visibility and positive perception
As team grows, evaluate potential additional team candidates and support Amdocs executive management by providing expert advice as required to grow our presence with the customer and provide continuous improvement
Provide analytical support to identify, develop, and drive strategic improvement initiatives involving functionality improvements, innovation solutions, and development and implementation methodologies
Serve as trusted advisor to management and client
Work day-to-day with key client management, development fulfillment partner, QA testing organization, providing expert support to each as needed and appropriate
Support development of improved governance of production defect management, including definitions of severity, criteria for prioritization, and defect management lifecycle processes.
Requirements:
5+ years front-end web development experience
5+ years hands on experience with the following key technologies: JSP Integration, HTML / HTML 5, AJAX, CSS, JavaScript, JSON, XML, JQuery
Strong leadership skills
Preferences:
Large scale /enterprise web retail experience
Integration with ATG Commerce
Integration with Adobe CQ
Experience with other industry standard integration technologies (e.g. WebLogic)
Technical leadership experiences in relevant technologies
Telecom experience
All Amdocs roles require strong verbal and written communications skills, position-appropriate mentoring/leadership abilities, ability to quickly master new systems and/or processes, capacity to stay organized while managing competing priorities, and a deep customer service orientation, both internally and externally.

 

I’m a database guy, I’ve never been a developer let alone a dev lead, and so I replied…

 

As a solutions provider I would expect you have have some great analytics. This leads me to ask the question as to what part of my skillset or background leads you, or anyone at your company to believe that I would be a good fit for, or consider the opportunity that you list below.

 

If I ever get a response I’ll be sure to post it.

Do You Trust Your Application Admins?

I was sitting at my desk, happily minding my own business when an alert came through that a database backup had failed. Ok, backups fail, I just figured one of the transaction log backups hiccupped (we’ve been having some problems the last few days).

When I looked at the failure it was a backup trying to write to the C drive on the server.

I NEVER backup to C. It can easily fill the drive and take down the system.

A bigger indicator that something was up was that all of our backups are done across a 10Gb network to a centralized location for ease of tape backup. This indicated that someone, not a DBA, had the access to run a SQL backup.

I trawled through the permissions on the server and nobody has that level of access so I couldn’t figure out who had done this and how.

 

So What Happened?

Looking through the SQL logs I saw multiple attempts by a contractor to login to SQL, all of which failed, then about 5 minutes after the backup error came through. Interesting stuff, so I walked over to the contractor and asked what was going on.

After he was unable to login he went to the application admin who helped him out with access…using the application service account.

One of the third party applications from Microsoft some unnamed vendor has a database on that server. Due to the nature of the well designed code the database owner has to be the same as the service account of the application. The application admin knows this password (not my doing).

After logging this contractor in as the application service account the app admin walked away and left him to his own devices. As a result this contractor was dbo on a database which manages security for the entire company. We should just consider ourselves lucky all this guy did was attempt to perform a backup.

 

Preventative Actions

In order to try and prevent this kind of thing in the future I am looking at implementing a login trigger for the service account which checks the host and application connecting and denying access to anything not in a specifically approved list. There is also a conversation going on to possibly disable interactive logons for the service account using a group policy at the domain level.

 

It is a Matter of Trust

While the application admin is obviously at serious fault here it leads to a question of how well do you trust your admin team?

Domain admins will be able to access your SQL Servers (get over it, there is no way you can keep them out, if they really want in there are numerous ways for them to do so).

Anyone with a password could share that with someone else and allow them to access your servers.

Ultimately you have to trust those that you work with to do the right thing. It’s always sad when those people let you down.

Vendor Support–The Good And Bad

When you go out and buy yourself new hardware or software you have the option of purchasing maintenance agreements at the same time. For software this generally provides the ability to constantly upgrade to the latest and greatest product. For hardware this tends to provide onsite support for when things go wrong and an SLA around that support arriving and the hardware being fixed.

I’ve been dealing with hardware and software vendors for years, I thought I’d share a couple of stories the really depict excellent service and the stuff that you never want to deal with as a customer.

 

The bad

When I started at one of my previous positions I walked into a Dell shop. If you aren’t familiar with that term it means that all hardware purchased was through Dell and that it gave us steeper discounts on the hardware that we would purchase.

I was put in charge of the Windows team pretty early at this company and we started to go through a hardware refresh. I sat down with the team and started asking questions about how things were with Dell. To a person they liked the hardware and what it delivered however they hated the service. There were common problems with SLAs not being met, the wrong replacement parts being delivered and phone support being unable to provide decent assistance.

I brought these issues to the Dell account rep and explained that we were looking at a fairly significant budget spend the next year on hardware (>$1m) and that I needed to see better results from the support team over there if I was going to spend any of that money with them.

Over the next 6 months I thoroughly documented every engagement with their support staff. This support engagements included:

  • Server down – customer impact
  • Hardware problem, replacement part needed – non-customer impacting
  • General troubleshooting assistance required – non-customer impacting

I’m sad to say that Dell was only able to meet the 4 x 7 x 365 agreement we had for hardware support in 10% of the cases that we opened. Techs would show up late (or not at all), parts would be incorrect even when the tech was onsite in time (techs did not bring the parts, they would be delivered separately) and we would have trouble getting anything above a level 2 tech person on the phone who’s troubleshooting ability seemed to be limited to “have you tried turning it off and back on again”.

This was several years ago and Dell might have significantly improved their support since then, however when I left the company we did not have a single Dell server in any of the three datacenters I had built out.

 

The good

Software has bugs. We all know that and have experienced problems with vendor applications, but what happens when you run into a significant issue and how does the vendor respond?

Recently SQLSentry released a new version of their Performance Advisor for SQL Server tool which is for monitoring and tuning SQL Server. I performed an upgrade to the new version and resumed monitoring, I didn’t run into any issues or problems.

A couple of days later I got a call from our Windows folks stating they had an alarm on high memory utilization on the monitoring server. I logged in to take a look and was shocked to see the SQLSentry monitoring service had consumed over 5GB of memory. I bounced the service and it reset itself. Over the next couple of days memory usage increased again, causing me to restart the service.

At this point I engaged the support folks, in particular Jason Hall (blog|twitter). We started triaging the issue.

We started up perfmon and captured a few counters to file to try and localize the memory leak. This allowed us to discover the leak was in unmanaged code, making the trouble a lot tougher to track down.

The next step was to install the Windows Debugging Tools from Microsoft. With these deployed and a set of symbols downloaded we used UMDH to capture the before and after log heap allocations for the monitoring service. One comparison log later and we were able to track the issue down to a leak in Microsoft’s managed wrapper for the VDS (Virtual Disk Service) subsystem which is used by SQLSentry to monitor mount points.

I’m running several multi-node, multi-instance SQL Server Failover Clustered Instances and make extensive use of mount points (current count is 136 monitored mount points).

To test and confirm that VDS was the actual issue one of the SQLSentry development team threw together a very small 50 line application that I could hit a couple of buttons on an watch memory usage. It took a bit of a tired mouse finger, but I was able to verify quickly that VDS was indeed the problem.

Now fully understanding the problem in hand the SQLSentry team quickly built their own COM wrapper to handle mount point monitoring and provided me with a new build of the product. I went through a standard deployment and started the services back up again. A week later and the service is still running at around 500MB.

 

Throughout the process of problem triage, issue identification and resolution it was a very engaged support process with an appropriate level of urgency for each of the steps. Everything was handled to completion and I have been very happy with the support I received. That’s why my maintenance for this product will be renewed next year. I know that the money spent is worth it.

 

TL;DR

In the past I have spent a lot of money on very high levels of support from Dell and received nothing but poor service. As a result they lost several million dollars of business.

On the flip side I spent a small amount of money on maintenance with SQLSentry and received excellent support and levels of engagement which will help retain me as a long term customer.

 

I’d be interested to hear about your experiences with these vendors,  or any other.

The Importance Of Good Documentation

Believe it or not I’m not actually talking about server documentation here (for an excellent post on that go read Colleen Morrow’s The Importance of a SQL Server Inventory).

I have spent the last 12 days dealing with a single production release. It is being considered a significant release, but to be honest it really isn’t. The biggest challenge has been to do with the way that the release documentation has been provided and the fashion in which the scripts have been built.

 

What I got

Here’s a brief example of a change request I’ve seen:

  • Change Request:
    • Update database – products (this links to a Sharepoint page)
    • Use code from this location (links to a file share)
  • Sharepoint page
    • Go to this location (but replace the middle part of the link with the link from the change request page)
    • Copy this subfolder to your machine
    • Follow the process on Sharepoint page 2 to deploy the code
    • Once Sharepoint page 2 is complete run script X
  • Sharepoint page 2
    • run script 1
    • run script 2
    • run script 3

 

Pretty painful right? Now multiply that by 8 for each of the database code deployments that needed to be completed. No fun, no fun at all.

 

What do I want?

It’s going to be a work in progress but we’ll be working with this particular dev team to put together a unified document to simplify the release structure.

Here’s what I want to see:

  • Change Request:
    • Update database – products – deployment instructions attached
  • Attachment
    • Deploy script 1 (link to script)
    • Deploy script 2 (link to script)
    • Deploy script 3 (link to script)
    • Deploy script X (link to script)
    • Rollback script (link to script)

 

The difference?

Instead of having to reference several different Sharepoint locations in addition to a change control document I now have a single document, attached to the change, which clearly defines the process for the release, the order for scripts to be executed, a link to each of those scripts and the relevant rollback information.

It’s not something that I think is too out of line to provide, but I’ve found the folks who have been providing releases in this method are extremely resistant to change. I can understand that, but to be fair, they aren’t the ones under the gun trying to put something in to a production environment in a consistent and stable manner.

I’ve lots of fun meetings coming up to talk about this.

 

What about you?

How do you get your change control documentation? Is it something plainly written and easy to follow? Or do you have to have a degree in cryptography to get code in to production?

Stop Bad Database Design

Every year that goes by I sit in hope that I won’t see bad database design.

Every year I am disappointed.

 

As an example here is a table create statement that I saw the other day (table and column names have been changed to protect the innocent)

CREATE TABLE [dbo].BestestTableEVAR(

 Col1 [int] IDENTITY(1,1) NOT NULL,

 Col2 [uniqueidentifier] NULL,

 Col3 [uniqueidentifier] NOT NULL,

 Col4 [smallint] NULL,

 Col5 [smallint] NOT NULL,

 Col6 [bit] NOT NULL,

 Col7 [xml] NULL,

 Col8 [xml] NULL,

 ColA [xml] NULL,

 ColB [xml] NULL,

 ColC [datetime2](2) NULL,

 ColD [datetime2](2) NULL,

 COlE [datetime2](2) NULL,

 ColF [datetime2](2) NULL,

 CONSTRAINT [PK_BestestTableEVAR] PRIMARY KEY CLUSTERED 

(

 Col3 ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]

) ON [PRIMARY]

 

So what’s wrong with this?

The clustered primary key on this table is a GUID. Is that bad? That’s an unequivocal YES! Read all that Kimberly Tripp (blog|twitter) has to say about GUIDs in database design.

What makes this all the more crazy is that the table has an identity column. That’s a natural clustering key ready and waiting to be used and yet for some reason it’s not.

This table is going to fragment like crazy, it won’t scale and performance will be hideous. Additionally, thanks to the XML columns this table can’t even be rebuilt online meaning there’s no way to help the fragmentation or performance without actually taking the table offline to do it, meaning it can’t handle any transactions. This is a problem in a table on an OLTP system.

 

I would go back and change some things. Let’s say you wanted to keep the table structure the same, that’s fine, but let’s be smart about the keys and indexes.

It would make sense to change the identity column to be clustered (I would also make this the primary key) and then, to ensure uniqueness on Col2 which is the current primary key a unique index is warranted.

CREATE TABLE [dbo].BestestTableEVAR(

 Col1 [int] IDENTITY(1,1) NOT NULL,

 Col2 [uniqueidentifier] NULL,

 Col3 [uniqueidentifier] NOT NULL,

 Col4 [smallint] NULL,

 Col5 [smallint] NOT NULL,

 Col6 [bit] NOT NULL,

 Col7 [xml] NULL,

 Col8 [xml] NULL,

 ColA [xml] NULL,

 ColB [xml] NULL,

 ColC [datetime2](2) NULL,

 ColD [datetime2](2) NULL,

 COlE [datetime2](2) NULL,

 ColF [datetime2](2) NULL,

 CONSTRAINT [PK_BestestTableEVAR] PRIMARY KEY CLUSTERED 

(

 Col1 ASC

)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]

) ON [PRIMARY];

 

CREATE UNIQUE NONCLUSTERED INDEX UI_BestestTableEVAR_Col2 on BestestTableEVAR (Col2);

Sure, we still won’t be able to rebuild the indexes online, but we won’t have the same crazy levels of fragmentation that we would have had before.

 

I know I’ll be seeing a lot of bad design this year and I know that I’ll be forced to push that bad design into production. Doesn’t stop me trying to change things however. Help your devs out, let them know when their design is a problem. Who knows, maybe you’ll change things for the better.

Your “Five Nines” Means Nothing To Me

There’s a lot of talk on High Availability. I know I’m a huge fan of it, in particular clustering (but I know it’s not for all situations and the changes that SQL 2012 with AlwaysOn Groups may mean that traditional clustering is used less and less). There are of course other HA solutions out there like Log Shipping, Mirroring and SAN replication technologies (sorry folks, I disagree on transactional replication as an HA concept unless you are talking downstream reporting infrastructure behind load balancers).

People constantly push to achieve the magical five nines availability number. What does that mean?

Five nines means that your SQL Server must be available 99.999% of the time. This means you can have only 5.26 minutes of downtime a year.

So let’s say that you are running clustering and mirroring; that you have load balancing with peer-to-peer replication; that you are running SAN replication between two datacenters and have log shipping setup ready to go live at the flick of a switch. You have all your hardware ducks in a row and are ready for any emergency that may crop up.

I commend you for all of that.

Now let’s say that your manager is going to bonus you on the downtime that you have. You are the DBA, a single link in the chain, and your entire bonus is going to based on the number of nines that you can get over the year.

Time to sit down with your manager and ask some questions:

 

  • Does this five nines bonus plan include maintenance time?
  • How are you going to measure uptime?
  • What tools are going to be used to measure it?
  • What are the tolerances on those tools?
  • What about upstream items?

 

 Let’s take a look at these one at a time.

 

Is maintenance time included?

There may be bugs in the database code for the application that would require an app being offline while changes are made. You may want to patch SQL Server with the latest service pack in order to maintain supportability. You may want to test failover and DR. If none of these things are excluded then chances are you’ll have trouble making three nines (8.76 hours of downtime in a year) and none at all of making five nines.

 

How will uptime be measured?

Is uptime calculated based upon SQL Server being up and running? Or maybe on that there are successful connections? How about certain queries getting returned? Or a website returning a page within a certain time period?

So let’s say you are running a query. What if that query gets blocked or deadlocked? SQL is up and processing other transactions but that single query is having a problem.

Successful connections? Great, doesn’t mean that the database is available that they are going to use. Same issue with checking if SQL Server is available. Heck, are you measuring server uptime or database uptime here?

Website returning a page? What if there’s an issue with the web server? Or a networking problem between there and the SQL database? 

 

What tool will you use?

Are you going to buy something off the shelf? Then who’s going to spend the time configuring it for your particular environment? Isn’t that time better spent on solidifying  things,  performance analysis? Query tuning?

You’re going to make your own tool? Great, how are you going to do that, what technologies? How is the information going to be reported upwards? Who has the last say on possible exceptions? You have the time to write this yourself? Awesome, I’m glad that you do.

 

What is your tool tolerance?

Tolerance? Yup. Let’s say that you’ve figured out all the other stuff and have agreements in place for everything and got a tool that will do the job. How does that tool work?

Let’s say it runs a query and fails. It says that the database is down. When will it next attempt to query the database? In 15 minutes? If that’s the case then any bad poll from your tool just put you outside of that five nines criteria and your bonus has been dinged.

Let’s just say that there was an actual problem and that happened 30 seconds after the last poll but you resolved it before the next one. Everything was down for 14 minutes but your tool didn’t capture that.

Some serious flaws there.

 

Upstream items

Are you the Windows, network, firewall, and datacenter admin? Are you responsible for generating the electricity that comes into the building? How about the HVAC system, is that yours? Are you checking the diesel levels in your backup generators weekly?

Each one of these upstream items can and will affect peoples ability to access your SQL Server and its databases. Unless you are responsible for them all then how can you be held accountable for perceived uptime on SQL Server?

SQL can be up, able to process data and all the databases available, but without a funcitoning network nobody would know that and your monitoring wouldn’t pick it up.

 

 

All of these things add up to not being able to accurately measure the mythical five nines uptime creature. If indeed we can’t get those kinds of accurate measurements and I can’t own those upstream processes then why should I be held accountable for a number.

Here’s a better idea. Let’s look at the processes, let’s look at whether or not things are available from a users perspective. Let’s gauge customer interactions, build outs, deployments. Let’s track tickets and issues. Let’s talk about the things we need to do to make things better. Let’s not go pulling a number out of the ether, it does nobody any favors.

Be A Courteous Blogger

Today I received an email, nothing unusual about that. This email happened to be from a SQL blogger. It’s one of those newletter type things, it even said at the bottom that I had opted into receiving this email. The only thing is that I, at no point, signed up to receive a newsletter. In fact this blog is not one I have visited other than on a couple of ocassions. I believe I may have commented on one of the posts I saw and that is how the blog owner got my email address.

My guess is that the blogger in question went through the WordPress database (yes WordPress site) and pulled emails for everyone that had posted comments and sent out this unsolicited email.

That is absolutely the wrong way to do something like this. A strongly worded email is on its way to the blog owner. Small mercies…they included an unsubscribe link on the email, so hopefully I won’t get another.

 

The right way to have done this was to do the exact same thing that Paul Randal (blog|twitter), Robert Davis (blog|twitter) and Brent Ozar PLF (blog|twitter) have done. That is let people know you have a newsletter and give folks the opportunity to opt in and receive it.

All you bloggers out there, please do the right thing.