Three-hour fiber outage this morning

US Internet decided to do some service-interrupting scheduled maintenance this morning.

Apparently they swapped out some fiber switches.

All in all I was down for three hours. Needless to say, I was not happy. So I sent them an email:

At approximately 9:40 this morning my fiber connection stopped working.

I was at work so I couldn’t do any troubleshooting right away.

I headed home around noon, and since my connection was still down at 12:20 or so, I called the support line. I did get connected almost immediately to a tech, and he discovered the problem right away, so kudos for that.

The connection came back up at approximately 12:45 pm.

But.

The outage appears to have been service-interrupting scheduled maintenance. In the middle of the day. With no notification to your customers.

I am paying elevated prices for “Business Class” service. The term “Business Class” implies to me that it should include business class uptime, during business hours. I’m not expecting five nines of uptime. But with today’s three-hour outage you are barely going to make three nines.

When I asked the tech if there was an email list I could get on for maintenance announcements he told me that he wished there were and that he would like to be on it too. So apparently you do scheduled maintenance and don’t even tell your front-line support people so they can answer questions.

I’ve been in IT for over 14 years now, and if I did this kind of stuff I’d get fired. This is the kind of stuff I would expect from “Jim Bob’s Internet and Bait Company”, not a large company like US Internet.

I understand that most of your customers are residential and you would expect them to be at work at 9:40 am. But you did force me into “Business Class” service since I host some web sites, and this outage did impact me. 9:40 am is pretty much prime-time for businesses.

I also understand that it’s difficult to separate business customers from residential customers in a mainly residential area. But there are quite a few small businesses in the area you serve, and it would surprise me if some of them don’t have your service and if they were not also impacted by this outage.

Service-interrupting maintenance should always be scheduled for off-hours, typically around 2:00 am. Nobody likes to do work at 2:00 am, but as an IT professional I can’t count the number of times I’ve been up swapping hardware or performing software upgrades in the early morning hours. It comes with the profession.

Additionally, no matter when you are going to perform service-interrupting maintenance, you should have a method of notifying your customers. Mailing list servers are easy to set up and run, and I’d rather have you over-communicate than not communicate at all.

Up until now I’ve been very happy with the USI Fiber service. It’s fast, it has low latency, and until today it just worked. I was happy to recommend it to my friends and brag about the price and the speed.

Today I’m disappointed.

I’m hoping that you will forward this email to someone who is in a position to make some better policy decisions about maintenance in the future.

At the least you should set up a notification email list that your customers can subscribe to.
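
For anyone curious about the “three nines” remark in the email, here is a rough back-of-the-envelope check. It is only a sketch and it assumes availability is measured over a full calendar year, which the email does not actually say; the function names are made up for this example.

```python
# Rough availability arithmetic, assuming a one-year measurement window.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


def downtime_budget(target, period_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in the period for a target availability."""
    return (1.0 - target) * period_hours


outage_hours = 3.0
print(f"Availability after one 3-hour outage: {availability(outage_hours):.4%}")
print(f"Three nines budget: {downtime_budget(0.999):.2f} hours/year")
print(f"Five nines budget:  {downtime_budget(0.99999) * 60:.1f} minutes/year")
```

Under that assumption a single three-hour outage already uses up about a third of the 8.76 hours a year that three nines allows, so a couple more like it would blow the budget. Measured over a single month instead, three hours would fall short of three nines.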

Circle-Diamond-Square Redux

If you have been following along, you know that I’m trying to get my ShapeOko to be fairly accurate.

One of the tests I’m using is the Circle-Diamond-Square. I ran this test a while ago, and was not very pleased with the results.

I ran it again today. And I’m still not pleased.

[Image: cds2]

It’s better than last time, but the circle is still an egg and the diamond is off a bit too.

But this time I put a square on the square and determined that it’s not square! Aha!

I need to figure out why it’s not square. As far as I can measure, the X and Y axes are reasonably square. Which I guess means I need to figure out a better way to measure it.
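
One possible way to put a number on it, offered only as a sketch: measure the two diagonals of the cut square and compare them. This assumes the axes are straight but not perpendicular, so the cut shape is a parallelogram, in which case the difference of the squared diagonals equals 4 times the product of the sides times the cosine of the corner angle. The function name and the measurements below are invented for illustration; they are not from my machine.

```python
import math


def skew_degrees(side_a, side_b, diag_1, diag_2):
    """Estimate how far a corner is from 90 degrees.

    Assumes the measured shape is a parallelogram with sides side_a and
    side_b, where diag_1 and diag_2 are its two diagonals. Uses the
    identity diag_1**2 - diag_2**2 == 4 * side_a * side_b * cos(theta),
    with theta the angle between the sides.
    """
    cos_theta = (diag_1 ** 2 - diag_2 ** 2) / (4.0 * side_a * side_b)
    theta = math.degrees(math.acos(cos_theta))
    return 90.0 - theta


# Hypothetical measurements in millimetres for a nominal 50 mm square.
print(f"Skew: {skew_degrees(50.0, 50.0, 71.2, 70.2):.2f} degrees")
```

Equal diagonals give zero skew, and the sign of the result says which diagonal is the long one. How well this works depends entirely on how accurately the diagonals can be measured.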

So stay tuned.


Running total costs.

All Shapeoko CNC Mill Posts.

Self-inflicted DNS Outage

So now that I’m back online I can confess to the self-inflicted DNS outage that took down my web sites and email lists from around noon on Saturday until now (around 1900 on Tuesday).

First off: I am hosting my domains at 1&1. I cannot recommend them and will be moving all my domains away from them shortly.

The sequence of events is kind of messy, but I’ll try and lay it out here as it happened.

I host my own DNS on a server in my basement. This server is the master for the domain zone file. There are three secondaries configured for the domain: one at the company where I used to work and two hosted by a friend.

Since I changed ISPs (from CenturyLink née Qwest to US Internet Fiber) I had to change my IP addresses. This meant that I needed to contact the admins for the secondary servers and have them update their configs to perform transfers from the new IP address. It also meant that the IP address for my name server would be changing.

This should not be a problem, except for two issues:

1. When you run a name server that lives in the same domain it is serving (i.e. my domain is anansi-web.com and my name server is ns1.anansi-web.com) you have to set up something called a “glue record” at the registrar. This solves the chicken-and-egg problem of trying to look up the name server for a domain when that name server is in the domain itself. I was under the impression that I had done this in the past at 1&1, but when I tried to figure out how to change it, the web site told me they don’t support glue records. WTF. This was the primary trigger for the steps I took that caused the extended outage.

2. The company that I used to work for told me that they would like to stop hosting DNS as a secondary for me. I have no issue with this; it’s understandable. But that means I need to remove their DNS server from all my domains. Again, not an issue, but still a few more changes to make.
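
To make the glue record point from the first item concrete, here is a minimal sketch of the check involved: a name server needs glue when its own hostname sits inside the zone it serves. The helper names are made up for this example, and the Route 53 style hostname is a hypothetical placeholder, not one of my actual name servers.

```python
def normalize(name):
    """Lower-case a DNS name and drop any trailing root dot."""
    return name.rstrip(".").lower()


def needs_glue(zone, nameserver):
    """Return True if the name server's hostname lives inside the zone.

    In that case a resolver cannot look up the name server's address
    without already knowing it, so the parent zone has to carry the
    address as a glue record.
    """
    zone = normalize(zone)
    nameserver = normalize(nameserver)
    return nameserver == zone or nameserver.endswith("." + zone)


# My own name server is inside the domain it serves, so it needs glue.
print(needs_glue("anansi-web.com", "ns1.anansi-web.com"))

# A name server hosted under some other domain does not.
# (Hypothetical hostname, for illustration only.)
print(needs_glue("anansi-web.com", "ns-123.awsdns-45.org"))
```

This is why moving the name servers out of the domain makes the glue record problem go away.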

I decided that since I could not set up a new glue record I would just move the DNS hosting to the Amazon Web Services Route 53 service. It’s $0.50/month per domain hosted there and I figure that is worth the price so I don’t have to mess around with glue records and the like. So I set up anansi-web.com at Route 53 and it was working great.

In addition I told the friend hosting the other secondaries for me that they could stop doing so, since I was moving to Route 53.

I also decided (at the same time) that since I was annoyed at 1&1 I would start the transfer of my domain from 1&1 back to GoDaddy. I know, I know. I moved off GoDaddy to 1&1 as a protest for some dumb stuff GoDaddy was doing, but I know their registrar stuff works okay, and they are cheap.

This last change was the root cause of the extended outage.

When you initiate a domain transfer from 1&1, they cease all updates to the domain records. Which means that when I changed the name servers for the domain to point at Route 53, the change never went through.

When I first contacted 1&1 about the issue, they told me that it can take 24 to 48 hours for the changes to go through, so I should just wait. Which used to be true, back in 2001. But these days it is usually about 15-30 minutes before the updates hit the TLD servers.

When I contacted them again today, after 48 hours had passed, that’s when they decided to tell me that the changes would not go through since I had initiated a transfer for the domain.

The transfer email from 1&1 stated that the transfer would be completed on 2013-11-28 19:35:40. That’s still two days from now!

So the situation stood thus:

  1. I can’t change the name servers at 1&1 to point to Route 53.
  2. The secondary at the company I used to work for is still listed as a name server on my domain, but the IP address for my master server has changed and the secondary will not pull the zone from it. And I can’t remove it at 1&1, which means it’s still handing out old information.
  3. The other two secondaries are turned off.

I’m pretty much dead in the water. What to do?

Today I figured it out.

  1. Since I have not yet turned off the old ISP, I can send an update to the old secondary from the original IP! So I set up BIND on my Ubuntu laptop, loaded the updated zone file with the new IP addresses, plugged the laptop into the router for the old ISP, configured the laptop IP address to be the same as the old DNS server address and sent out a notify. Then I watched the logs while the old secondary pulled the zone file! Eureka!
  2. I asked my friend to set up the other two old secondaries for me again. Luckily he had just commented them out in his config files (smart man). So I updated the zone file on my server and they updated too.

Now we are back in business!

On Thursday when the domain transfers to GoDaddy we should see no blip. I believe that I have it set up to just start pointing to the Route 53 DNS servers, so it should just work. I will be watching though, so if there is an issue it should be an easy fix.

What did I learn from this fun excursion?

Well, it’s a lesson that I seem to keep forgetting: Only make one change at a time!

If I had left well enough alone and not started the transfer to GoDaddy then I could have changed the DNS servers to point to Route 53 and there would have been minimal downtime. But no, I had to make multiple changes at the same time, and that always causes trouble.

So to the people who use my mailing lists, I’m sorry. This was entirely self-inflicted, and I’m annoyed with myself for causing such a long outage.

RS Resurrection – Hiatus

hi·a·tus

noun \hī-ˈā-təs\

: a period of time when something (such as an activity or program) is stopped


Unfortunately, I live in Minne-snow-ta. The weather here has been cold and kind of yucky already. (Well, it is almost November.)

So the RS Resurrection has been put on hiatus until spring. I have a couple of items I’m going to try and get done during the winter: the battery cage needs de-rusting and painting, and the seat needs recovering. But all other work will come to a halt.

The RS has been parked against the back wall of the garage and buried again. But this time I promise to get it out in the spring and get it running!

Bad bearing in my trim router

I was all set to do some more work with my ShapeOko tonight, but when I turned on the Ridgid trim router it made a terrible noise…

Sounded like a bad bearing. So I had to take it apart.

I found a new bearing online in multiple places, for anywhere from $2 to $7. But with shipping I ended up paying about $10.

Unfortunately I managed to mangle the fan a bit while I was pulling off the old bearing, so I’ll have to try and repair that as well.

Guess I need to delay my projects now. Pewp.

