Lawks, another derp durr moment.
A plumber with a blowtorch is the enemy of the data centre
Welcome again to On-Call, our regular week-ender in which readers share their tales of possibly-career-ending errors. This week, reader “Harry” tells us he's done support for the same global company for the last thirteen years and in one role found himself doing network support. One cold, cold winter's day, that gig took him …
COMMENTS
-
-
Friday 2nd September 2016 13:29 GMT Anonymous Coward
Building Codes
Apparently where "Harry" works, they don't have building codes.
Where I live/work, it's a violation of building electrical codes to run copper data cables between buildings, due to both power ground loop issues and lightning issues. All data cables run between buildings have to be optical fiber (or wireless connection). Even between trailers parked next to each other.
-
-
Friday 2nd September 2016 17:53 GMT Phil O'Sophical
Re: Building Codes
Here in the UK, most of the internet to homes and businesses is by cable and twisted pairs. These pairs and cables go from one building to another!
Twisted pair is OK since it's isolated from ground, but running co-ax between buildings on different mains supplies has problems, especially if they don't have a common earth/neutral wire. The difference in earth potential might only be 2-3 volts, but a few hundred metres of co-ax might have 0.01 ohm or so of resistance via the sheath. 2V across 0.01 ohms leaves 200A flowing in the cable sheath, which will melt most co-ax.
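As a rough check of that arithmetic, here is a minimal Python sketch using Ohm's law. The 2 V and 0.01 ohm figures are the ones quoted in the post; the function names and the extra power calculation are illustrative assumptions, and real sheath resistance will vary with cable type and length.

```python
def sheath_current(potential_difference_v: float, sheath_resistance_ohm: float) -> float:
    """Current in amps driven through the cable sheath (I = V / R)."""
    if sheath_resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return potential_difference_v / sheath_resistance_ohm


def sheath_power(potential_difference_v: float, sheath_resistance_ohm: float) -> float:
    """Heat in watts dissipated along the sheath (P = V^2 / R)."""
    if sheath_resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return potential_difference_v ** 2 / sheath_resistance_ohm


# Figures from the post: about 2 V of earth potential difference, 0.01 ohm of sheath.
current = sheath_current(2.0, 0.01)
power = sheath_power(2.0, 0.01)
print(f"{current:.0f} A through the sheath, {power:.0f} W of heating")
```

Even at the low end of the quoted 2-3 volt range, the current is far beyond what a co-ax braid is built to carry.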
-
-
-
-
-
-
Friday 2nd September 2016 08:11 GMT foo_bar_baz
Been there
Oh look, a cable with a connector.
Oh look, a box with blinkenlights and sockets.
Perhaps they fit...
Perhaps they should be connected ...
Why is the internet broken?
Lessons learned:
1) Use "proper" managed switches with loop protection, even at the edge.
1.1) No desktop switches.
1.2) Keep the switches locked up.
2) Monitoring - SNMP is your friend.
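For anyone auditing a patch list rather than waiting for the broadcast storm, the "perhaps they fit" cable can be spotted on paper. What follows is only a sketch: the switch names and the patch list are made up, and it uses a plain union-find over the documented links, which is not how a managed switch's loop protection works internally.

```python
def find_loop_links(links):
    """Return the links that close a loop, in the order they were patched."""
    parent = {}

    def find(node):
        # Each switch starts out as its own group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    loops = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends were already reachable from each other: this cable makes a loop.
            loops.append((a, b))
        else:
            parent[root_a] = root_b
    return loops


# Hypothetical patch list for one office.
patching = [
    ("core", "edge-1"),
    ("core", "edge-2"),
    ("edge-1", "desk-switch"),
    ("desk-switch", "edge-2"),  # the "perhaps they fit" cable
]
print(find_loop_links(patching))
```

With the example list, only the last cable is reported, since everything it joins was already connected through the core.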
-
Friday 2nd September 2016 08:48 GMT Alan Brown
Re: Been there
> 1.1) No desktop switches.
On a lot of managed switches you can limit the number of clients to _1_ - that prevents unauthorised desktop switches or APs from working.
If you've got a RADIUS-controlled setup (PacketFence, etc) then you can enforce 802.1x or lock out unauthorised MACs (this doesn't protect against MAC spoofing, but not all kit supports 802.1x).
-
Friday 2nd September 2016 09:15 GMT Lee D
Re: Been there
I *still* get cabling engineers etc. who talk to me about network loops on large networks.
I have to educate them all the time.
I have had to say "You run the cables, I will take responsibility for the networking 'loops'" to at least one cable-runner who thought they knew better.
Seriously, who the hell DOESN'T run spanning-tree nowadays? Hands up? You're idiots.
And then when you try to explain LACP where you WANT several cables or fibres all going between the same two switches or devices, you just have to give up and say "Look, just do it."
Who doesn't use LACP on their main backbone with all their spare fibre cores? Hands up? Guess what...
(a) Redundancy (great when you need to repatch a cabinet without actually taking it offline - just disconnect only one patch lead at a time on the uplinks).
(b) Accidents.
(c) Bandwidth increase.
(d) Just about any managed switch in the last 15 years supports it just fine.
-
Sunday 4th September 2016 19:13 GMT Alan Brown
Re: Been there
"Seriously, who the hell DOESN'T run spanning-tree nowadays? Hands up? You're idiots."
Spanning tree was (and is) great for what it's intended to do, but the stories of spanning tree storms are legion as networks grow well beyond their original designs.
TRILL is cheap enough that if you need mission-critical networking or exceeding 6 spans it should be included automatically in your core - and it has the nice side effect that you no longer need to mess with LACP between the switches.
-
-
-
-
Friday 2nd September 2016 14:56 GMT Lee D
Re: Been there
But if you'd run another cable, any other cable, even in a circle around the site?
Then one cable cut wouldn't be capable of cutting you off, and "network loops" can't happen on modern systems.
Wire your sites in circle topologies and it takes two independent cuts of critical cables - plus all their spares, extra fibre cores and parallel cables - in order to actually stop everything talking to everything.
Think that costs more than necessary? Then consider if you're using VoIP, your network is how people dial 999. And if you have any kind of linear or star topology, it's easy to run additional connections between existing switches that aren't directly connected to get the same effect.
"But my cable was cut" isn't an answer. Always have a backup.
You put in redundant disks, redundant servers, redundant storage, etc. And then you put one cable between those systems and the people who need them? Put in a redundant cable. And if you use it properly it will boost inter-switch bandwidth under normal operation (e.g. LACP) or it will provide alternate routing in the event of a cable cut (STP/RSTP).
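The single-cut argument is easy to check for any site layout. This is a rough sketch, assuming four made-up switches named A to D and treating every link as equal; it only tests reachability and says nothing about how STP or LACP actually fail over.

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: can every switch be reached from the first one?"""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


def fatal_single_cuts(nodes, links):
    """Links whose loss, on its own, splits the network."""
    fatal = []
    for i, link in enumerate(links):
        remaining = links[:i] + links[i + 1:]
        if not is_connected(nodes, remaining):
            fatal.append(link)
    return fatal


# Hypothetical site: four switches wired in a line, then with the circle closed.
switches = ["A", "B", "C", "D"]
chain = [("A", "B"), ("B", "C"), ("C", "D")]
ring = chain + [("D", "A")]

print("chain:", fatal_single_cuts(switches, chain))
print("ring: ", fatal_single_cuts(switches, ring))
```

In the chain every cable is a single point of failure; once the D to A link closes the circle, no single cut isolates anything.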
-
Saturday 3rd September 2016 07:51 GMT Anonymous Coward
Re: Been there
A University in Hampshire has gone fully VoIP, and with one swift stroke lost their entire safety net, even the highest risk labs are VoIP only, the trades unions have asked for hard wired twisted pair safety phones that won't fail because someone cuts a cable with a digger and kills the entire network (yes it is that bad), but moving the data centre off site was more important.
Then there's the Trend 'engineer' who took down an entire microelectronics building by plugging a lead into a switch in a Trend outstation, and they didn't even know he was onsite.
As for stupidly running cables, the Welsh electrical wizards working for a contractor, I'll call them Bloody Liars Limited, ignored all the planned raceways above a new clean room and used the shortest routes, the service walkways, so now servicing the HEPA filters is all but impossible, access in some parts is by flat down belly crawl, but they saved loads of cable for spidge... And the maths prof DVC in charge of the project hailed it as a great success, the explosive plumbing is another issue.
-
Saturday 3rd September 2016 10:48 GMT John Brown (no body)
Re: Been there
"risk labs are VoIP only, the trades unions have asked for hard wired twisted pair safety phones that won't fail because someone cuts a cable with a digger and kills the entire network"
And when it's all resolved, the money allocated and the work orders raised, someone will think it's cheaper and easier to put the POTS wire down the same ducts as everything else. For extra bonus points, break the network while putting the POTS in.
-
-
-
-
Friday 2nd September 2016 08:11 GMT Dazed and Confused
At least the switch was still there
About 10 years ago I got a call from a customer I looked after, the guy on the phone was complaining his Internet connection wasn't working (major panic it must have cut his porn feed).
I couldn't ping the site, not even the BT router.
So I jumped in the car and drove down to take a look.
The door next to the guy with the missing link was the one into the data centre, I peered through the window, nothing, I mean nothing, there was nothing in the data centre, it had all gone!
Err John where's everything gone?
Oh they came and took it all away yesterday.
Well why do you think your Internet connection doesn't work?
-
Friday 2nd September 2016 10:26 GMT Chris King
Re: At least the switch was still there
BT took all of their kit away ? Did somebody not pay the bill or something ?!
Getting BT to haul away their old crap usually means having to dump it on them next time they turn up to fix something, and it's usually followed by howls of "Awwwww, do I have to ?"
-
Saturday 3rd September 2016 19:17 GMT Dazed and Confused
Re: At least the switch was still there
OK, on closer inspection the computer room wasn't quite empty.
The BT router was leaning against the wall, it wasn't connected to anything and the cable had just been pulled physically out of the connector.
There was also a VA disk array which presumably no one wanted enough to pay the shipping costs.
The "they" were the people who owned all the kit, "the bosses from the US"
I suspect that the John was part of the reason they'd taken all the kit away.
(PS. Oh bo!!065s the spelling checker in this copy of Firefox is F*&^ed)
-
-
-
Friday 2nd September 2016 08:22 GMT Simon 4
Had this once.
Builder was replacing flat roof while I was on holiday.
Came back home and found wifi wasn't giving any traffic.
Looked out the bathroom window to find the Ethernet cable from garage to loft had been blow-torched.
IP65 box and punch-down junction box solved the problem without having to re-run the cable.
-
Friday 2nd September 2016 08:26 GMT Joe Werner
Cable woes...
One of my mates is doing support on one of the higher tiers of an infrastructure provider - the guys 'n' gals you talk to if you are $big_customer (and quite up from the 'restart the machine' ..."help"). His favourite is when whole segments go dark - because copper thieves pinch the fibre. Happened quite a lot a few years back when copper prices were way up.
Copper thieves also stole piping out of my university. Unfortunately this was connected to the He supplies (we have a closed system to re-liquify the stuff for cooling purposes). Damage in copper was low. We lost a lot of He (which is *f'iing expensive*).
(icon 'cause that's how they try to extract the copper from fibre cables)
-
Friday 2nd September 2016 10:20 GMT Triggerfish
Re: Cable woes...
Quite a few years back, when doing some surveying work for Nynex, I recall a large fibre cable being cut. As it was explained to me, workmen in the building had encountered it as an obstruction: a large cable that was apparently armoured and buried with extra protective sheathing. Their solution to the protection was a bigger saw.
-
Friday 2nd September 2016 11:37 GMT Anonymous Coward
Re: Cable woes...
(and quite up from the 'restart the machine' ..."help")...
I have dealt with these types of provider and I have been asked if it would be a lot of trouble to reboot the mainframe of a multinational merchant bank in order to test whether an out-of-country WAN link really wasn't working...
I suggested this was not a good idea and it would be great if perhaps one of the engineers could take a look...
I have not noted much difference in the approach of $big_provider in my experience...
-
Friday 2nd September 2016 13:58 GMT Skoorb
Re: Cable woes...
I once made the mistake of trying to explain to someone that you don't even "boot" a mainframe, you execute an IPL. And anyway, that isn't going to help with what you are describing as an OS problem as the thing has to run multiple OSs in multiple LPARs just to actually turn on properly.
"But that makes no sense, it has to be the hypervisor, just reboot the thing".
ಠ_ಠ
-
-
Sunday 4th September 2016 19:18 GMT Alan Brown
Re: Cable woes...
"Unfortunately this was connected to the He supplies (we have a closed system to re-liquify the stuff for cooling purposes). Damage in copper was low. We lost a lot of He"
Too bad you can't have a continuously circulating (or just capped) set of pipes containing hydrazine running alongside.
(Who me? Evil death dealing b'stard?)
-
-
Friday 2nd September 2016 08:32 GMT Sir Runcible Spoon
DC Deluge
Back in the late 90's I was hopping around Europe evaluating DC's to extend our network to and encountered the most bizarre setups you could imagine, one place even had a swimming pool on the roof that was plumbed into the cooling system - genius :)
However, this wasn't the place that flooded. That place was the one where we had put some pilot kit and were prepping the whole place with raised flooring etc. when we got a full on DC outage alert - including power.
When we inquired as to the cause we were told the place had flooded, like it was under a foot of water! This was a surprise because the place was fairly high up compared to the surrounding area.
Turns out that a moronic truck driver had driven into the corner of the building, causing a small section of it to collapse. Along the way he had also managed to take the head off a water point (the kind fire engines connect to) which was now merrily spouting water upwards and then being diverted into the newly opened hole in the wall by the underside of the truck, the front of which was lodged 3 feet inside the DC!
It was so bizarre I was convinced they were winding me up, but they sent pictures to prove it :)
-
Friday 2nd September 2016 08:52 GMT Alan Brown
Re: DC Deluge
" Along the way he had also managed to take the head off a water point"
This has been frequently cited to me by fire brigades and various civil engineers as an outstanding reason why water points should always be under a pavement access plate.
Of course if one was to do this in the USA, dog bladders would explode.
-
Friday 2nd September 2016 12:57 GMT I ain't Spartacus
Re: DC Deluge
This has been frequently cited to me by fire brigades and various civil engineers as an outstanding reason why water points should always be under a pavement access plate.
Of course if one was to do this in the USA, dog bladders would explode.
Not just that. Think of all the Hollywood car chases that would be ruined!
With any change, one must assess the all-important unintended consequences.
We once dealt with a flood where the period of the waves in a 200,000 litre water tank managed to exactly line up, so that enough water would disappear down the other end to cause the float-valve to open - and that slug of incoming water nicely added to the returning wave in a lovely feedback loop. All until the many tonnes of water hit the end of the tank so hard it simply flew off across the room. The basement suddenly became a very wet place indeed...
-
Friday 2nd September 2016 13:52 GMT W4YBO
Re: DC Deluge
"...why water points should always be under a pavement access plate.
Of course if one was to do this in the USA, dog bladders would explode."
That nut on top of a US style fire hydrant connects to a shaft extending down to a valve that's mounted in the (well buried) street water line. Except in the movies, a vehicle running over a hydrant rarely causes a gusher. Although, I did see one fairly pulled out of the ground by a fire truck that ran over it slowly.
-
Saturday 3rd September 2016 22:17 GMT david 12
Re: DC Deluge
Where I live, we have a mixture of ground-level and above-ground stand-pipes. Both constructed in the usual way, with a ball valve underground on the main pipe. The ONLY reason I ever see a gusher is a vehicle running over a hydrant. Invariably it is replaced with a ground-level connection.
I also sometimes see cracked pipe-- typically an even greater water flow, but not fountaining up into the air.
-
-
-
Friday 2nd September 2016 08:40 GMT Anonymous Coward
A Remote Job Entry device was temporarily installed on the floor above the computer room comms unit. The connection was full-duplex using 2400bps over two lengths of domestic twin flat cable. One day it stopped working - although the lights on the RJE showed it was being polled and was apparently replying. This was in the days before network monitors - so rather primitive diagnostics in the mainframe established that the RJE's replies were not reaching the comms unit.
The pair of twin flat cables came into the computer room via a hole in the ceiling - then trailed some distance across the floor to the comms cabinet. It was eventually spotted that the cables suddenly went under a floor tile - and reappeared on its other side.
The false floor construction was unusual as it was a matrix of steel bars that supported the tiles along all four edges. The floor tiles also had an underside of sheet steel.
Someone had obviously lifted the floor tile and then ignored the wires they had trapped when replacing it. To get the tile flush they would have had to jump on it very hard. The steel bar and sheet then acted as a guillotine - shearing the return signal's cable.
-
Friday 2nd September 2016 09:54 GMT Peter Gathercole
Fun with Serial lines.
Two examples, both from the same place.
First. Wired corridor in a Polytechnic for Acorn Econet (bussed RS422 serial network used for BBC Micros). One day, the network stops. We check each of the access ports (very basic 5 pin DIN connector in a box soldered directly to the wires in the cable to reduce contact resistance in connectors). All OK. Terminator, OK. Meter across the wires shows a short. Eventually tracked it down to a staple carelessly driven through the cable to 'tidy it up'.
Secondly. Camtec X.29 PAD used as an RS-232 terminal switch. In order to get its attention from the PDP-11, it was necessary to generate a communication break (data line connected to ground for a second or so). DZ-11 or the comms software (can't remember which) could not do this, so I created an interposer that consisted of a 25 pin male D-shell connected to a 25 pin female D-shell with soldered wires between the pins, and the two held together with long bolts with several nuts holding everything in place, and a press-to-make, release-to-break switch between pin 7 and pin 2 (or was it 3). One day, PDP-11 gets slower and slower, and eventually stops. Reboot, everything OK for a while, then the same thing happens. Looking in the log, it was reporting data over-runs on one of the DZ-11 ports.
Turns out some vibration had loosened the nuts, one of the D-Shells had moved, causing a short from pin 2 to pin 3 (data out and data in or vice versa) in the wiring to the press switch. Login banner sent from PDP-11 came back as if typed from the terminal, which generated errors and a new login banner. Eventually system was so busy fielding exponential amounts of data that it ground to a halt.
Moral. Good wiring is important, good soldering equally so.
-
Friday 2nd September 2016 12:30 GMT Martin an gof
Re: Fun with Serial lines.
Slightly OT, sorry.
Wired corridor in a Polytechnic for Acorn Econet
Had a room full of Econet at the Poly I attended in the late 1980s(*). Place was mostly VMS but had a few rooms with alternatives in - some XTs, some '286es (optical mice on optical mats!), some Apollos, and a room in the maths department full of Archimedes.
The Econet was in the engineering block, and was used to teach 6809 programming. Acorn buffs may be wondering how, given that the BBC Micro was 6502 and although there were several second-processor units available, the 6809 was not one of them (as far as I'm aware).
The lecturer in question had built his own second processor. Full marks for initiative, but the power supplies in the second processor units were so flaky that if you had to - erm - "hard reset" one of them for any reason, you had to make sure everyone else in the room had saved their work first. I'm not entirely sure how, but resetting a second processor unit would at the very least "take out" the Econet segment (usually meaning that people couldn't reach the file server any more), and would often (presumably via a mains spike) cause other 6809s to freeze.
It wasn't a heavily-used lab, and it paid to get friendly with a technician who could let you in when it wasn't being used so that you had the place to yourself. Not that I ever did.
M.
(*)When said Poly converted to a university after I left, the sign on the nearest dual carriageway read - for a couple of years - "University of Glam". I wish I'd taken a photo.
-
Friday 2nd September 2016 14:06 GMT Anonymous Coward
Re: Fun with Serial lines.
"[...] the sign on the nearest dual carriageway read - for a couple of years - "University of Glam". I wish I'd taken a photo."
When the ICL brand name was finally taken out of service all the old buildings' external signs were changed. During the change there was an ICL sign on the ground - alongside a neat vertical stack of individual letters ready to form "Fujitsu".
A little later it was noticed that the letter "F" was now on its own on the ground - exposing the "U" at the top of the pile. All in a neat row reading from above - "F" "U" "ICL".
-
-
Friday 2nd September 2016 13:32 GMT Anonymous Coward
Re: Fun with Serial lines.
"Turns out some vibration had loosened the nuts"
A video terminal on a naval air base kept losing its sessions intermittently. It was connected to the network by "thin" Ethernet. It was eventually found that there was a poor connection inside the wall socket.
That sounds simple - but the vibration had to be quite excessive to produce the fault. Someone finally twigged that it only happened when a Harrier jet was doing a vertical take-off/landing on the concrete pad outside that building.
-
-
-
Friday 2nd September 2016 19:07 GMT Anonymous Coward
"An up vote is yours for the mention of RJE."
The aforementioned RJE terminal was being tested preparatory to being installed in an office about a mile away. It was intended to replace the existing system - where a man on a bicycle with a traditional big basket on the front would ferry cards and printout.
It eventually transpired that the man on his bicycle was usually quicker than this latest bit of IT kit.
-
-