
High Availability–It’s Not All About The Hardware/Software

As I write this I’m sitting in my house, working with the last hour of juice left in my laptop. We had a major snowstorm here in the Seattle area, followed by a big ice storm and some heavy winds, all of which coincided to knock out power to over 200,000 homes. I’ve been without power for over 75 hours now, and it’s given me a lot of time to think about multiple facets of High Availability (HA) and Disaster Recovery (DR).

HA/DR is a critical item for all companies: HA in varying degrees, but most certainly DR and the ability to provide some kind of business continuity model. Different companies, large and small, have different requirements around HA/DR. Some could easily remain down for a couple of days with no lingering effects; others cannot sustain downtime on their most critical systems of more than a couple of minutes. Based upon those requirements (and any budget constraints) a company could implement HA/DR using Failover Clustering, Mirroring, Log Shipping, SAN Replication, Peer-to-Peer Replication, vMotion, or some other technology.

Where HA/DR often falls down

This is all well and good, but what about the people you need in order to get things up and running: the folks who provide ongoing support in the event that you have to go to your DR site, or who support that critical system where a mirror has failed?

In the last 80 hours I have been called by the NOC at work stating that they have an issue. I don’t have power at home, so my internet access is out. I don’t have sufficient power in my phone to tether it (don’t tell the wireless company I can do that), and it would take 4 hours sitting in the car to charge it enough to allow me to do so. I do have a mobile broadband USB device, but I’m not getting coverage in my house right now from that carrier, and I have been unable to get out of my road. In other words, in the whole HA/DR scheme of things, I’m a single point of failure. Luckily I have a colleague with power, whom I was able to point the NOC to in order to get the issue resolved.

My colleague and I live less than 5 miles apart; we are just luckily on different circuits and substations for the power company. If neither of us had been able to dial in, the problem would not have been resolved.

If you truly want to implement HA/DR you need to go further than just looking at your hardware infrastructure or the way that your software is set up. It means more than your DNS configuration and your Active Directory setup. There’s more to it than setting a Failover Partner in your application connection string. It’s about the people that you need to support these things!
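
That connection string piece, incidentally, really is the easy part. For a mirrored database it is a single setting; a minimal sketch, with hypothetical server and database names, looks like this:

    Server=SQLPROD01;Failover Partner=SQLPROD02;Database=AppDB;Integrated Security=True;

The hard part is everything the rest of this post is about: making sure a person is actually reachable when that failover happens.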

Artificial limitations

I’ve been a DBA for about 12 years now. I’ve not been hands-on with a server since doing datacenter implementation work about 5 years ago. Other than in very small shops it’s rare that a DBA touches a server; most don’t even have access to the datacenter in which their database servers reside. There has been a huge move towards colocation datacenters (CoLos) over the last few years, meaning that more often than not the datacenter where your servers reside is probably not even in the same physical building that you are. There is also the move to the cloud, which is gaining some traction. Given all of this information I ask you this:

Why are DBAs so often tied to a physical location for work?

This is a serious question. There are so few telecommuting options for DBAs out there; you can look on the job boards and see virtually none. There are a lot of relocation opportunities if you want to move to another state, but that doesn’t make a lot of sense. If you are going to geographically disperse your hardware due to HA/DR needs, why would you not do the same for the teams that have to support it?

So many companies point-blank refuse to even entertain the idea of telecommuters from another state (heck, the vast majority won’t even contemplate allowing IT support staff to work from home a couple of days a week).

The same companies that happily outsource all development processes to an unknown entity in India or China refuse to entertain the idea of allowing Windows/Unix Admins or DBAs to telecommute.

I find it somewhat ironic that the same people who expect you to come to the office and be tied to a physical location also have you be on call, so that you can provide remote support when something goes wrong.

Moving beyond the office

Forcing IT support staff to come to a physical location 5 days a week makes little to no sense to me. I can understand the desire to have folks in the same physical location, but as your organization grows it really needs to find a way to get past that mentality. There are various options, such as LiveMeeting or GoToMeeting, that can provide a telepresence in meetings for people who work remotely. Instant messaging can be used to keep people constantly in touch. Services such as Yammer can be used to quickly and easily provide collaborative thought outside of meetings.

Looking at my current situation, if my colleague and I had both been without power, we would have had no way to provide support. Our servers sit in a CoLo that we don’t have access to, so even if we could have gotten there, we would have had no hands-on way to work with the servers.

Now imagine we had a third person on the team who lived and worked in another state (or that the second person lived in another state and telecommuted in). Even with power down for me (or the entire state), we would still have someone available to provide support (unless, by extremely unusual circumstances, they were also down). With a remote DR site they could have effected whatever steps were necessary to bring it online and check that things were working OK.

Let’s look at this from another angle. I’m in the Pacific Time Zone and usually get to work 7:00-7:30am and leave 4:30-5:00pm. If we had someone working the same hours in the Eastern Time Zone, we would have coverage (not counting emergencies handled by being on call) from 4am PST to 4:30pm PST. That’s an extra 3 hours of standard DBA coverage. Imagine how much more work could get done, and how much earlier overnight problems could be identified and resolved.

We live in a global society, and the company that I work for does business internationally. Would it not make a lot more sense to provide a wider margin of coverage? Think about it; it would also improve the quality of life for anyone who has to be on call. For me the worst time to get called out is about 4am, as I’ll never get back to sleep. If there was someone already in the office on the east coast, I wouldn’t need to get called out. The same goes for them with evening plans: they could go out to dinner or a movie knowing that they wouldn’t have to worry about missing a call.

TL;DR

It’s the 21st century; there are many options to prevent the need for anyone to be physically tied to an office location (certainly amongst IT staff). In order to provide higher levels of support, companies really need to embrace telecommuting options beyond just providing emergency support.

Late Note: After 80 hours PSE was able to restore power to my neighborhood. Throughout the outage PSE did a fantastic job of handling things. Ultimately over 375,000 homes were without power at one time or another. PSE provided constant information wherever possible via their Twitter feed and website, and they brought in crews from around the country to assist in the restoration efforts. All in all I think they handled things really well, and I have no complaints with how they handled any of this. (I just wish I’d been closer to the top of the restore list.)

Who Needs a DBA?

I was perusing my RSS feeds this morning and came across an article by Steve Jones (b|t) about Changing Times. In that post Steve talks about Microsoft not having any DBAs on their cloud team, and how it is all managed by DevOps folks. This message came from a talk given at SQLBits and summarized by Victoria Holt.

[Image: MS DevOps facts]

Wow, this looks amazing, and really seems to ring the death knell for the DBA…or does it?

Let’s consider what we are looking at here, and for the purposes of this, I am going to focus on Azure SQL Database (or whatever we’re calling it this week).

There are 1.7 million production databases that do not have DBAs. Well, that’s flat-out wrong: there might not be DBAs at Microsoft, but you can be pretty sure that there are DBAs out there managing a portion of those. I guess we should really focus on the Microsoft side of things.

So there are 1.7 million databases; what is Microsoft’s role in these databases? Other than keeping up the infrastructure that the databases are based upon, not much really. They have no posted performance metrics to which they must aspire. They are not going to be in the business of ensuring that your backups are taken; your HA requirements are based upon you setting up the services and regions correctly; and they don’t care how fast or slow your query is. There is an SLA for uptime, and don’t worry, if it gets below 99% you get a 25% discount.

Given that there is no requirement to perform any of the work that a typical DBA would do, why would they need a DBA at all? They wouldn’t. It is this kind of information that makes people who work as DBAs question their existence and future, and frankly, there is no need to do that.

As a DBA you will find that the cloud will become a significant part of your future (if it is not already), and your role will shift, but there will always be a place for the on-premises product (for as long as it is made), and that means DBAs to manage it. Heck, if you want to see what the world without DBAs looks like, just skim through dba.stackexchange sometime.

I feel that a lot of the messaging coming out of Microsoft puts a not insignificant level of fear into some people, as they attempt to set an example where the only people who really matter are developers, and developers sitting in a DevOps role. Is Microsoft going to change their tune around any of this? Nope, I don’t see that happening at any point. One thing you could question is this: given that there are all these Azure databases that don’t have DBAs, why does Microsoft have any DBAs working for them at all? Surely DevOps people would be all that they need.

In-Memory OLTP – Shiny, But Useful?

SQL Server 2014 introduces In-Memory OLTP to the product (although I prefer the name Hekaton). It is a significant leap forward so far as the technology goes, and it can lead to significant performance improvements when utilized, but is it going to be a technology for the masses? Maybe eventually, but right now I see limited use cases for it.

Limitations on compiled tables

First up, there are limitations around what you can do with In-Memory tables. There’s no option to perform a TRUNCATE or MERGE into one. You also cannot perform a cross-database join with one, or access one from a CLR module using context connections (see Accessing Memory-Optimized Tables Using Interpreted Transact-SQL).

Interested in using contained databases for portability? Forget using an In-Memory table then; it’s just not going to happen. You can’t put one on a particular filegroup or partition either, or use computed columns. Like to use IDENTITY? Not anymore. DRI? Nope, not supported, nor are clustered primary keys.

That being said, you can do pretty much anything else with these tables, just so long as you don’t want to use a LOB data type or datetimeoffset, or potentially have a row overflow (see Supported Data Types). Then again, were that the case, you shouldn’t be doing those kinds of things in your OLTP system anyway. We’ve never seen anyone do that, right?

You could also have problems with transactions against these tables (conflicting writes fail rather than wait on locks), so you have to be very careful about setting up retry logic.
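
As a rough illustration, the retry wrapper ends up looking something like this (a minimal sketch; the procedure name is hypothetical, and 41302, 41305, and 41325 are among the write-conflict and validation errors that In-Memory OLTP raises):

    DECLARE @retries INT = 3;

    WHILE @retries > 0
    BEGIN
        BEGIN TRY
            -- Hypothetical procedure that writes to an In-Memory table
            EXEC dbo.usp_UpdateHotTable @Id = 42;
            SET @retries = 0;  -- success, stop looping
        END TRY
        BEGIN CATCH
            -- Retry only conflict/validation errors, and only while retries remain
            IF ERROR_NUMBER() IN (41302, 41305, 41325) AND @retries > 1
                SET @retries -= 1;
            ELSE
                THROW;
        END CATCH
    END;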

You might want to be careful when creating your tables too, because once they are there you can’t alter them; you actually have to drop and recreate the table.
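
For reference, creating one of these tables looks something like the following (a minimal sketch, assuming the database already has a MEMORY_OPTIMIZED_DATA filegroup; the table and column names are hypothetical):

    CREATE TABLE dbo.SessionState
    (
        -- No clustered PK allowed: a NONCLUSTERED HASH index with a
        -- fixed bucket count is the standard choice
        SessionId INT NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        UserName  NVARCHAR(128) NOT NULL,
        LastTouch DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Note that the BUCKET_COUNT is fixed at creation time, which is part of why any change means a drop and recreate.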

Limitations on indexes

Unique indexes are a no-go, and so are filtered ones. Want to drop or alter an index? Well, that’s a table drop-and-recreate right there. Those NULL columns you have? Well, you can’t have those in an index. Maybe it’s time to start listening to Joe Celko and get rid of those pesky nullable columns.

Limitations on compiled procedures

On the bright side, your developers will no longer be able to use cursors, so it’s not all doom and gloom. Set-based items are a little more challenging though; you cannot do a multi-row insert using INSERT…VALUES, but that’s pretty minor and just requires more typing.

You won’t be getting too recursive: CTEs aren’t supported, nor are subqueries.

DELETE and UPDATE with a FROM clause aren’t allowed, nor are CASE statements. Forget about using UDFs, or the OUTPUT clause to get information back.

Want to get data from more than one table? UNION won’t get you there, nor will INTERSECT or EXCEPT. OUTER JOIN and APPLY aren’t supported either.

Filtering your results? Don’t expect to use (OR IN (..)) as a construct, or NOT, and you had better be sure what you are looking for, because LIKE isn’t allowed either.

By no means is this a complete list; there are lots more limitations.
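
To give a flavor of what does work, here is a minimal sketch of a natively compiled procedure against the hypothetical table from earlier; the NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS, and BEGIN ATOMIC boilerplate is all required by the SQL Server 2014 syntax:

    CREATE PROCEDURE dbo.usp_TouchSession
        @SessionId INT
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH
        (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
        -- Plain single-table DML is fine; it is the constructs listed
        -- above that are off-limits
        UPDATE dbo.SessionState
        SET LastTouch = SYSUTCDATETIME()
        WHERE SessionId = @SessionId;
    END;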

So many limitations, so little time

In-Memory is fast, that’s been demonstrated, but there are so many restrictions around its usage that for most companies it is not going to be something that is even worth looking at. The use cases for it are probably too extreme. Yes, it will get used, and used to great effect by some, but I think we are looking at less than 1% of the install base that could actually see true use cases and be able to adjust their development methods and code base to support it (at least in this version).

So while this is an exciting breakthrough in performance, you’ll have to excuse me while I keep working on tuning my code and indexes for performance, and setting up my AGs for DR.

For a full list of Unsupported Transact-SQL Constructs for In-Memory OLTP, visit TechNet.

So Long MCM

The MCM cert is dead. So are its replacement, the MCSM, and the MCA. That’s according to an email that was distributed tonight by Shelby Grieve, the Director of Certification Product Management.

The end of the program comes in a short email that clearly talks of the end of the certs:

Microsoft will no longer offer Masters and Architect level training rotations and will be retiring the Masters level certification exams as of October 1, 2013.

That’s pretty clear information right there.  

This from an email that came out at 10pm on the Friday of a long holiday weekend in the US.

Microsoft appears to be attempting to bury this when people aren’t around, hoping that nobody will notice and that the furor will have calmed down some by the time people start checking back in on Tuesday.

Attaining an MCM is no easy feat. There is a reason that there are only a couple of hundred people with that cert. Taking this away really pains everyone who has worked so hard to get it: those who have given up many hours and, for some, many thousands of dollars.

I guess it just means being a part of an exclusive club now, one that will not get any larger.  

That is a really sad thing, because there are people out there who should have this cert, who are actively working towards it, and now it is going to be taken away.

Follow Up 8/31/13 08:23

Jen Stirrup (blog | @jenstirrup) has created a Connect item, “Please don’t get rid of the MCM and MCA programs”. If these are certs that you have any interest in at all, I highly recommend you go and vote it up. Being honest, it won’t mean much, but it might at least let folks understand the deep level of dissatisfaction that this killing has created.

Developing To Specifications

I’m a DBA. As a class of people, you will find that DBAs have a tendency to rant a little about developers, and I would certainly be someone you would find in that category. The trouble is that most of the time I don’t think it is the developers’ fault; it is just a case of shooting the messenger.

As an example, let’s look at a new database release that was being asked for.

The biggest issue was that the code was provided so late in the cycle that the issues found (and there were a great many of them) could not be fixed. Why could they not be fixed? Because an arbitrary release date had been provided to the customer, and this date could not slip for any reason whatsoever. Artificial deadlines: one of the worst things that devs and DBAs have to deal with.

The developers agreed to get the code fixes added to their backlog and into a sprint for fixing in the next month. So after much discussion and a firm commitment, we decided to move ahead with the release.

My next question to the dev team was “how are you going to get the data out? You have a few procs here for loading data into tables, but nothing for being able to consume that data afterwards.”

The response was a little stunning:

The only requirement to get done by is to have the data written to a database. After this deploy, we are going to create a way to get the data out

Outstanding. Way to develop to requirements.

In this instance I cannot really place blame on the dev team; they are just doing what is asked of them. I think the bigger problem sits with the folks who are gathering the business requirements and translating those, along with timelines, up to the business.

I think that it might be time for DBAs to stop pointing fingers at devs and start holding accountable those who are causing these problems, namely the Business Analysts and Project Managers.

Who’s with me on this?

How To Log Off of Server 2012

I’m finding little things that are petty annoyances when working with Windows Server 2012.


The most recent one of these is the lack of a logoff option when connected to a server via a remote desktop session. When using the charm you get options to Disconnect, Shut down, or Restart the server, but not to log off.

It turns out that the solution is pretty simple, but not intuitive.

Just open up PowerShell and type logoff.
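
If you want to see what you are logging off first, the old Remote Desktop Services commands still work too (the session ID shown below is hypothetical):

    # List the sessions on the server and note your session ID
    query session

    # Log off the current session
    logoff

    # Or log off a specific session by its ID
    logoff 2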

SCCM & SQL Server – A DBA’s Worst Nightmare

Microsoft puts out some great products, no really, they do. There are any number of applications and tools available for you to be able to do pretty much anything. One thing gaining popularity recently is System Center Configuration Manager (SCCM), which can be used to provide patch management, software distribution, inventory management, server provisioning, and more.

Freddy & SCCM – a nightmare double feature

SCCM is great for businesses that are growing and need to maintain control over the devices used and maintain compliance across the enterprise.

This is all great. Businesses use it, businesses need it. SCCM has been designed to provide a relatively straightforward deployment that does not require any strong level of expertise. This is where SCCM falls down for me, as a DBA.

What is the problem?

SCCM does its own database management. It is a set-it-and-forget-it kind of thing. This is done so that an enterprise without SQL Server DBAs can go ahead and perform the deployment and management without any specialist knowledge.

This is all good and well, except when you do have a SQL Server DBA on staff, you have multiple deployments of SQL Server, and you like to consolidate servers wherever possible.

SCCM does some things which go completely against my wishes as a production DBA:

  • Requires sysadmin on SQL Server to both install and run the application
  • Requires Windows admin rights on the SQL Server
  • Installs software on Windows to perform backups of SQL Server
  • Adjusts SQL Server configuration settings (CLR & max text repl size)
  • Enables the TRUSTWORTHY option for the SCCM database
  • Sets the database recovery model to SIMPLE
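
If you want to see the evidence for yourself after an SCCM installation, a couple of quick queries will show most of it (a minimal sketch; check whichever database is your actual site database):

    -- Instance-level settings that SCCM adjusts
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name IN ('clr enabled', 'max text repl size (B)');

    -- TRUSTWORTHY and the recovery model, per database
    SELECT name, is_trustworthy_on, recovery_model_desc
    FROM sys.databases;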

Fortunately I found a lot of this information up front and decided that there was no way I was going to try to consolidate this database with any other in my environment. The security model is lacking in the worst fashion, and there is not much worse than taking all control away from a DBA.

I was glad that I made this choice, as SCCM decided to restart SQL Server as part of the installation process. That would have caused a production outage if I had attempted to co-locate it with other low-use databases.

Short recommendation

Being brief: if your sysadmins are looking to deploy SCCM in your environment, ask for a dedicated VM for SQL Server. Any attempt to consolidate this database will leave you open to massive security holes and production outages.

Fun With Recruiters

I love it when I get those special kinds of emails from recruitment agencies who claim they have the perfect position. I got one of those emails last week, and I thought I would share it (as well as my response).

Title: Front End Web Development Lead
Position Type: Direct Placement
Location: Bothell, WA, United States
Description:

Duration: 0-6 month(s)
Job Description:
Front-End Web Development Lead – Bothell, WA
Every day over 19,000 Amdocs employees, serving customers in more than 60 countries, collaborate to help our customers realize their vision. We have a 30-year track record of ensuring service providers’ success by embracing their most complex, mission-critical challenges. 100% of Fortune’s Global 500 quad-play providers rely on Amdocs to help them run their businesses better.
Amdocs is a “can do” company that leads the industry, is fully accountable and most importantly, always delivers. This is our DNA. Our success has been sparked and sustained by hiring exceptional people. If this sounds like you— if you have the drive, focus and passion to succeed in a fast-paced, delivery-focused, global environment– then Amdocs would like to talk with you. Amdocs: Embrace Challenge, Experience Success.
– Please Note: All applicants must be currently authorized to work in the United States without employer sponsorship now or in the future.
Role Overview:
We are looking for a Front-End Web Development Lead to be a team lead directing a multi-shore group of developers tasked with providing issue resolution support for a very large-scale web retail store. Some of the responsibilities and duties include, but are not limited to:
Interface with defect assurance team to accept inbound production issues for resolution
Direct and coordinate work of offshore development team to ensure accurate and timely resolution of front-end production issues
Interface with customer development, business, and other teams as needed to provide good service, promote team visibility and positive perception
As team grows, evaluate potential additional team candidates and support Amdocs executive management by providing expert advice as required to grow our presence with the customer and provide continuous improvement
Provide analytical support to identify, develop, and drive strategic improvement initiatives involving functionality improvements, innovation solutions, and development and implementation methodologies
Serve as trusted advisor to management and client
Work day-to-day with key client management, development fulfillment partner, QA testing organization, providing expert support to each as needed and appropriate
Support development of improved governance of production defect management, including definitions of severity, criteria for prioritization, and defect management lifecycle processes.
Requirements:
5+ years front-end web development experience
5+ years hands on experience with the following key technologies: JSP Integration, HTML / HTML 5, AJAX, CSS, JavaScript, JSON, XML, JQuery
Strong leadership skills
Preferences:
Large scale /enterprise web retail experience
Integration with ATG Commerce
Integration with Adobe CQ
Experience with other industry standard integration technologies (e.g. WebLogic)
Technical leadership experiences in relevant technologies
Telecom experience
All Amdocs roles require strong verbal and written communications skills, position-appropriate mentoring/leadership abilities, ability to quickly master new systems and/or processes, capacity to stay organized while managing competing priorities, and a deep customer service orientation, both internally and externally.

I’m a database guy. I’ve never been a developer, let alone a dev lead, and so I replied…

As a solutions provider I would expect you to have some great analytics. This leads me to ask what part of my skillset or background leads you, or anyone at your company, to believe that I would be a good fit for, or would consider, the opportunity that you list below.

If I ever get a response I’ll be sure to post it.

Do You Trust Your Application Admins?

I was sitting at my desk, happily minding my own business, when an alert came through that a database backup had failed. OK, backups fail; I just figured one of the transaction log backups had hiccupped (we’ve been having some problems the last few days).

When I looked at the failure it was a backup trying to write to the C drive on the server.

I NEVER back up to C. It can easily fill the drive and take down the system.

A bigger indicator that something was up was that all of our backups are done across a 10Gb network to a centralized location for ease of tape backup. This indicated that someone, not a DBA, had the access to run a SQL backup.
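
For context, a backup in that kind of setup looks something like this (the share path here is hypothetical), which is why one aimed at the C drive stood out immediately:

    -- All backups go across the 10Gb network to a central share
    BACKUP DATABASE AppDB
    TO DISK = N'\\BackupTarget\SQLBackups\AppDB\AppDB_Full.bak'
    WITH COMPRESSION, CHECKSUM;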

I trawled through the permissions on the server, and nobody had that level of access, so I couldn’t figure out who had done this and how.

So What Happened?

Looking through the SQL logs I saw multiple attempts by a contractor to log in to SQL, all of which failed; then, about 5 minutes later, the backup error came through. Interesting stuff, so I walked over to the contractor and asked what was going on.
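
If you ever need to do the same kind of digging, the undocumented but widely used sp_readerrorlog makes it quick (the parameters are the log number, where 0 is the current log; the log type, where 1 is the SQL Server error log; and a search string):

    -- Search the current SQL Server error log for failed logins
    EXEC sp_readerrorlog 0, 1, 'Login failed';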

After he was unable to log in, he went to the application admin, who helped him out with access…using the application service account.

One of the third-party applications from Microsoft… er, some unnamed vendor, has a database on that server. Due to the nature of the well-designed code, the database owner has to be the same as the service account of the application. The application admin knows this password (not my doing).

After logging this contractor in as the application service account, the app admin walked away and left him to his own devices. As a result, this contractor was dbo on a database which manages security for the entire company. We should just consider ourselves lucky that all this guy did was attempt to perform a backup.

Preventative Actions

In order to try to prevent this kind of thing in the future, I am looking at implementing a logon trigger for the service account, which checks the host and application connecting and denies access to anything not in a specifically approved list. There is also a conversation going on about possibly disabling interactive logons for the service account using a group policy at the domain level.
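
The shape of that trigger would be something like the sketch below. This is illustrative only: the login, host, and application names are hypothetical, and both HOST_NAME() and APP_NAME() are client-supplied values that can be spoofed, so this raises the bar rather than providing a hard guarantee.

    CREATE TRIGGER trg_RestrictServiceAccount
    ON ALL SERVER
    FOR LOGON
    AS
    BEGIN
        IF ORIGINAL_LOGIN() = N'DOMAIN\AppServiceAccount'
           AND (HOST_NAME() NOT IN (N'APPSERVER01', N'APPSERVER02')
                OR APP_NAME() NOT LIKE N'ApprovedApp%')
        BEGIN
            ROLLBACK;  -- rolls back the login, denying the connection
        END
    END;

If you try this, test it carefully: a buggy logon trigger can lock everyone out, and you may need the DAC to remove it.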

It is a Matter of Trust

While the application admin is obviously at serious fault here, it leads to a question: how well do you trust your admin team?

Domain admins will be able to access your SQL Servers (get over it; there is no way you can keep them out, and if they really want in there are numerous ways for them to do so).

Anyone with a password could share that with someone else and allow them to access your servers.

Ultimately you have to trust those that you work with to do the right thing. It’s always sad when those people let you down.