Are you having trouble upgrading your hardware?

Are you running out of disk space, or are you completely lost in your computer and don’t know how to fix your problem? Well, let me tell you, I have yet to find a problem or task so complicated that, given enough time, I could not overcome it. Below is the ongoing task of a complete redesign of a web and database structure for a very large client of the company I work for, all while the client is using the computers to generate revenue and serve their own clients. If you think upgrading the single hard drive in your computer is difficult, stay tuned to this post to see how this complex and unique task turns out.

And on a side note, even people who work in IT / network structures cannot comprehend HOW to make this migration / upgrade.

The task:
Upgrade server from 250GB to 2TB hard drive on 2 separate servers. Estimated time to completion, 3 hours.

The Structure:
Server 1 (xen1) – the primary server
Running CentOS as Virtual Host (VH or VS), with 3 Virtual Machines (VM). Each machine has 2 logical volumes (LV) to create separate drives for each VM. Two LVs are connected using DRBD.
The web server is running and is the live environment, meaning that the client is using this VM to generate revenue.
The SQL server is running and is a live environment; this VM is cloned using DRBD onto Server 2.

Server 2 (xen2) – the backup server
Running CentOS as Virtual Host (VH or VS), with 3 Virtual Machines (VM). Each machine has 2 logical volumes (LV) to create separate drives for each VM. Two LVs are connected using DRBD.
The web server is running and is used for testing.
The SQL server is DRBD cloned and is not running; it is used for manual fail-over.

My assessment:
The overall structure and layout above is completely wrong, and is useless in any event. The manual fail-over allows for downtime, which is not good practice in a real-time RAID environment. So a complete redesign was in order just to perform the task of upgrading the hard drives.

New Task:
Design a complete fail-over, load-balanced, redundant structure to allow for optimal uptime and for ease of upgrading hardware in the future. The new design is outlined in the following image. Estimated time for completion: 5 weeks

And here are the steps I performed in order to make this structure design change happen, in real time with minimal downtime.

1. Verify no traffic to test server at all. ntop, iftop, tcpdump, etc. tools were used.
2. Turn down test web server and wait a few days. (Let’s see if any calls come in about that machine)
3. Turn down the physical machine.
4. Add 2TB drive to second bay, and boot using Gentoo.
5. dd if=/dev/sda of=/dev/sdb bs=1024; and wait
6. Turn up physical machine.
7. Create new volume groups (VGs) and logical volumes (LVs) for the new device structures.
8. dd if=/dev/mapper/VG_test_os of=/dev/mapper/VG_new_test_os
9. dd if=/dev/mapper/VG_test_data of=/dev/mapper/vg_new_test_data
10. Modify config files to use new devices in test os
11. Boot up the new test vm…modify.. reboot..
12. Inform client of new test machine IP, and wait for their conclusion to continue the migration/upgrade
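
The dd copies in steps 5, 8 and 9 give no feedback on whether the clone actually matches the source, so here is a minimal sketch of a clone-and-compare helper. The function name clone_and_verify and the 1 MiB block size are my own assumptions, not what was run on the servers, and the -n option assumes GNU cmp.

```shell
#!/bin/sh
# Sketch only: clone SRC to DST with dd, then confirm the copy matches.
# clone_and_verify is a hypothetical helper name, not part of the original steps.
clone_and_verify() {
    src=$1
    dst=$2

    # A 1 MiB block size is an assumption; it is usually faster than bs=1024.
    dd if="$src" of="$dst" bs=1048576 2>/dev/null || return 1

    # Work out how many bytes the source holds. Block devices are asked
    # directly; regular files (handy for a trial run) are counted with wc.
    if [ -b "$src" ]; then
        size=$(blockdev --getsize64 "$src")
    else
        size=$(( $(wc -c < "$src") ))
    fi

    # Compare only the first $size bytes, because the new volume
    # may be larger than the one it was cloned from.
    cmp -n "$size" "$src" "$dst"
}
```

A call would look like clone_and_verify /dev/mapper/VG_test_os /dev/mapper/VG_new_test_os, and a non-zero exit status means the copy should not be trusted. Trying it on two scratch files first is a cheap way to check the helper before pointing it at real volumes.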

This completes the hard drive upgrade on the backup server. One point to note is that since I cloned the original drive over to the 2TB drive, everything is working as before. The manual redundant SQL Server is still using the DRBD structure for full cloning; however, this will be replaced with a clustered DB structure on a larger LVM. (This one is going to be tricky… no more DRBD). The test server was upgraded to a larger LVM and, once booted, the drives were expanded from within the Windows OS to use the complete drive.

Once confirmation from the client is received that the temp server is ready to go, we move onto the fun stuff, which kinda moves pretty quickly into the new structure.

FAQ:
Some people will of course have questions that they want to ask, so beforehand, I’ll go ahead and answer some of the major questions here. If you have a question and it is not listed / answered below, shoot me an email at dscott at ucann2 dot org.
Q. How does the change in structure affect the developers and how code is developed and rolled into production?
A. Nothing changes on the end-user side. The IP that developers use to roll code out into production, and the process for doing so, will remain the same.

TO DO:
1. adjust proxy to hit temp machine
2. turn down prod machine
3. clone through the network prod(xen1) to prod(xen2)
4. turn up prod(xen2) + expand drives
5. rotate sql server from xen1 to xen2 (shut down sql on xen1; drbd secondary sql-xen1; drbd primary sql-xen2; turn up sql-xen2) [WOW!]
6. TEST LIKE HELL! (remember this is a production environment!)
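
Because the order in step 5 matters (the old primary has to be demoted before the new one is promoted), here is a dry-run sketch that only prints the sequence instead of running it. Everything specific in it is an assumption on my part: that the hosts use the xm and drbdadm tools, plus the guest name, DRBD resource name and config path passed in as arguments.

```shell
#!/bin/sh
# Dry-run sketch: print the SQL fail-over order from step 5, run nothing.
# failover_plan is a hypothetical helper. The guest name, DRBD resource
# name and config path are placeholders, not values from the real servers.
failover_plan() {
    domain=$1   # Xen guest name (assumed)
    res=$2      # DRBD resource name (assumed)
    cfg=$3      # guest config file on xen2 (assumed)

    # 1. Stop the guest on the old primary and wait for it to go down.
    echo "xen1# xm shutdown -w $domain"
    # 2. Demote the old primary so the other side can take over.
    echo "xen1# drbdadm secondary $res"
    # 3. Check that the replica is connected and up to date.
    echo "xen2# cat /proc/drbd"
    # 4. Promote the backup server.
    echo "xen2# drbdadm primary $res"
    # 5. Start the guest on the backup server.
    echo "xen2# xm create $cfg"
}
```

Running failover_plan sql sql /etc/xen/sql prints five lines to read through (or paste into a change ticket) before touching production.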

COMMANDS to remember

`nc` changed: you no longer need -p when using -l, so it is “nc -l PORT” NOT “nc -l -p PORT”
On the Receiving server
root@xen2 $ nc -l 9901 | dd of=/dev/device_name

On the sending server
root@xen1 $ dd if=/dev/device_name | nc RECEIVING_HOST 9901
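
A raw nc pipe does not tell you whether every byte arrived, so a checksum on both ends is a cheap sanity check after the transfer. This is a minimal sketch: image_sum is a hypothetical helper name, and it assumes sha256sum is installed on both servers.

```shell
#!/bin/sh
# Sketch only: print the SHA-256 of a device or file so that the sending
# and receiving servers can be compared by eye after the transfer.
# image_sum is a hypothetical helper name.
#   $1 = device or file to read
#   $2 = optional number of MiB to read (useful when the target is larger)
image_sum() {
    dev=$1
    mib=$2
    if [ -n "$mib" ]; then
        dd if="$dev" bs=1048576 count="$mib" 2>/dev/null
    else
        dd if="$dev" bs=1048576 2>/dev/null
    fi | sha256sum | cut -d' ' -f1
}
```

Run image_sum /dev/device_name on xen1, then the same on xen2. If the receiving volume is bigger than the source, pass the source size in MiB as the second argument on xen2 so only the copied part is hashed. The two hashes should be identical.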

 

Bochs image the hard way…

So, I’ve been playing around with some emulators running on my iBook. My iBook is a PowerPC G4 with 1.5 gigs of RAM.
Not many emulators will run on ppc, but there are 2 that work real well. QEMU aka Q and Bochs (pronounced box).

So I played around with QEMU and found that 1 cpu is the same as a 7 MHz processor; up the processors to 4 and you get 700 MHz. But the performance on a G4 is sloppy, to say the least: installing Windows took around 14 days for a complete install, and another day and a half to actually get into the start menu… Bah Humbug!

So I ran across Bochs. But I never got the preinstalled images to work, except for the DOS one. Well.. that’s not going to work, I need a linux box (don’t ask me why, as I don’t know why). So I tried booting from cd, no go; tried to dd the image to an ISO, and again no go.. ok, so here’s how I did it.. and it is a pain.. BUT it is a full blown 20 gig Linux installation with working X11 and mouse.

First things first, you need the right devices…

1. laptop hard drive
2. usb connection for the laptop hard drive
3. a working pc (not a mac) with a bootable cd drive, and with a drive that can be used to store the image temporarily.
4. your favorite os installation (linux for my methods)

And now the steps….
WARNING, this will erase the hard drive of ALL contents
1) Connect the hard drive using the usb connection to the laptop.
2) Start up the laptop using the bootable CD in the drive to start the installation.
Since I am using linux, I have to hit enter at the boot prompt.
3) Do the installation onto the USB device (using Slackware).
a. Log in to the install disk as root, and bypass everything till you get to a command prompt.
b. fdisk /dev/sda or cfdisk /dev/sda [ usb devices are listed as scsi devices under slackware, sda sdb so on ]
— you may have to unplug and replug the usb device in for slack to recognize the device.
c. in fdisk, you are going to make 2 partitions.
1. /dev/sda1 bootable taking at least 10 gigs.
2. /dev/sda2 swap taking at least 1 gig.
3. write the partition table and quit back to a prompt.
d. Now on slackware you will type install. Following the prompts along the way, and remember to install to /dev/sda1 not /dev/hda1
e. Reboot, put the disc back in and boot to disc again.

f. Once you are at the prompt of the “Live CD”, now for the real fun.
since this is an image to be used in Bochs, the image once booted will be using hda1 not sda1. (confused yet)

g. mount your usb device.
1. mount -t filesys /dev/sda1 /mnt/
h. edit /mnt/etc/fstab with pico, vim, or some other editor (pico is easier)
1. look for the line that reads
/dev/sda1 / filesys defaults 0 0 or something like that.
2. change the sda1 to hda1
3. save and quit.

I. Now for fixing the boot area.
1. edit /mnt/etc/lilo.conf
2. look for sda or sda1 change all sda to hda
3. save and quit.

J. Now you need to install the boot loader, again.
1. at the command prompt type
chroot /mnt /bin/bash
2. now type lilo

K. type exit to leave the chroot.
L. umount the usb device, then mount the temp drive where you can store the image.
should be something like mount /dev/hda1 /mnt (note your device may be different: /dev/hda1, /dev/hdb1, etc…)

M. Once the device is mounted you are ready to make the image, and this can take a while depending on how large the device is. Issue the command dd if=/dev/sda of=/mnt/linux_bochs_bootable.img

N. Once the image is done, you can copy it over to your Mac so you can use it. Also, once the image is on your Mac, you can make any changes to the image without having to go back to the device. You will need to connect your Mac and the machine you are using to make the image to a network; I just used a cat5 ethernet cable and issued the command scp linux_bochs_bootable.img ppcg4:/Users/dscott/Desktop/linux_bochs_bootable.img This can take a while also.

O. Once the image was on the Mac, I could finish setting up Bochs. You will need to play around with the bochsrc.txt file to make sure that the image is used. Do not worry about the swap partition: it is inside the image, and the system will automatically use the swap device listed in fstab.
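
The sda to hda edits in steps h and I can also be done without opening an editor. This is a minimal sketch using sed. The helper name retarget_disk is hypothetical, it keeps a .bak copy of each file, and it renames every /dev/sda reference (so the swap line in fstab becomes /dev/hda2 too, which is what the image needs under Bochs).

```shell
#!/bin/sh
# Sketch only: rewrite every /dev/sda reference in a config file to
# /dev/hda, keeping a backup next to the original.
# retarget_disk is a hypothetical helper name.
retarget_disk() {
    file=$1
    # Keep the untouched original as FILE.bak in case lilo complains.
    cp "$file" "$file.bak" || return 1
    # Read from the backup and overwrite the original in place.
    sed 's|/dev/sda|/dev/hda|g' "$file.bak" > "$file"
}
```

With the usb device mounted on /mnt, that would be retarget_disk /mnt/etc/fstab followed by retarget_disk /mnt/etc/lilo.conf, and then the chroot and lilo steps as before.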

NOTES:
1. On my machine, Slackware did run extremely slow, and X11 was not very functional, but for what I wanted to do, I got it to work. If your machine is faster than 2 GHz and you have more than 2 gigs of ram, then you will have no problems running your full blown slackware linux server from within bochs. And since Bochs is command line driven, there is no overhead dealing with a GUI.

2. It took me over one month to get this to work. I know it is kinda useless, but if anyone wants to image their Windows to run on a Mac, let me know; I can get it done now in about 3 days.

3. I know about mistakes on the page, so don’t bother me with typos; however, if something is missing or you don’t understand some of the steps, let me know, I can help you.. (I am not good at writing tuts….)

DISCLAIMER:
This is just a copy from memory of the steps I took; some steps may be missing or out of place. These steps should only be used as a guideline, and should not be quoted as 100% accurate. I cannot be held responsible for any attempt you make at doing this, or for the loss of any information that you would consider vital. The author of this post takes no responsibility for any other actions based on the reading of this material.

SHORT:
Don’t blame me, if it doesn’t work for you.