rsync and the Unsung Command Line

How to use rsync to keep data on your Unix computers synchronized perfectly

(LinuxWorld) -- This week's topic is a salute to the command line. It was inspired by a reader named Kevin, who recently brought to my attention some interesting limitations of Windows XP's new feature called "fast user switching." In case you missed the hype, "fast user switching" is Microsoft's name for "multi-user system." It lets more than one user log into Windows XP at the same time on the same machine.

For more details on Kevin's observations, see the resources section for a link to his comments posted on VarLinux.org. You'll also find a link to a column I wrote about fast user switching, in which I assumed that Microsoft had finally delivered a multi-user system with Windows XP. I've also written up some speculation on the matter in my Computerworld column for January 14, so when the 14th rolls around you may want to pay a visit to www.computerworld.com and browse through the columnists section for that particular article. The bottom line is that I admit I was wrong to assume fast user switching is Microsoft's delivery of multi-user capabilities. Based on Kevin's observations, it's obvious Microsoft still hasn't turned Windows NT into a true multi-user system.

How does this relate to the command line? Rather tangentially, I must admit. Bear with me as I walk you through the twisted thought process that led me there.

One of Kevin's observations is that you cannot use fast user switching if you are also using the Offline Folders feature of Windows XP. Offline Folders is a Microsoft Exchange feature that works something like the Briefcase in Windows 9x. It allows you to work on documents that are normally stored on a network even when you are disconnected from the network. When you reconnect, the documents are synchronized through local replication. You can check the resources section for links to the full descriptions of Offline Folders on Microsoft's site, but to quote the relevant section (OST refers to an Offline Storage file):

"If both the offline information store and the information store on the server have changed at the time they are synchronized, changed data on the OST is first copied to the server, and then changed data on the server is copied to the OST. If this is an automatic synchronization -- that is, it is occurring because the user has reconnected to the server -- data is copied in only one direction from the OST to the server, regardless of whether data on the server has changed."

 

Money for nothing

There are two things that strike me as odd about the Microsoft approach to this problem as compared to Linux or any other Unix. First, why does one have to purchase Microsoft Exchange to get this feature? I can accomplish the same task several different ways in Unix with free software that isn't even remotely related to a message store or even a database.

The obvious answer is that Microsoft is using the Offline Folders feature as one of its many crowbars, designed to crack open its customers' doors (not to mention their wallets) and shove Exchange into their enterprises.

There is no such agenda in Linux, so there is no need to complicate the process of synchronizing documents by wedging the folder replication feature into a message store. There are several ways you can accomplish the same goal in Linux, but the one that seems closest to duplicating the function of Microsoft's Offline Folders is rsync, which is the utility Kevin mentioned in his VarLinux.org comment.

I don't often find the need to work offline, but I happen to use rsync to synchronize my entire home directory across three machines. The rsync utility is handy for this purpose because it's very quick after the first synchronization is done. That's because rsync only copies information that has changed. This is similar to using the command cp -u, which only copies files that have been updated, except that cp relies on file dates, while rsync uses its checksum algorithm to detect changes and transfer only the portions of each file that actually differ.
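A minimal sketch of that kind of synchronization, assuming ssh as the transport and a hypothetical second machine named laptop: the -a switch bundles the usual preserve-everything options (-rlptgoD), and the trailing slashes mean "the contents of this directory."

rsync -a -e ssh /home/me/ laptop:/home/me/

Run the same command with source and destination swapped after you've worked on the other machine, and the changes come back the other way.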

I also use rsync to make a local backup of the entire Documents directory tree on my file server. While I don't happen to work in disconnected mode very often, I could certainly do so thanks to this rsync process. If for some reason my file server failed, I could continue to write new columns and modify old ones with the assurance that all these files would be synchronized properly when my server came back up.

 

How it is done

One of the advantages of using command-line utilities over GUI configuration tools is that GUI tools tend to confine your options to whatever the GUI designer imagined you would want. The command line gives you almost unlimited options as to how you want to manage any given administrative task.

For example, I could create a shell script that synchronizes the Documents directory tree and place that script in my startup folder for KDE. That way I would be assured that the local Documents would always be synchronized before I could start up the KDE word processor, KWord.
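A sketch of that approach, assuming the script is saved as ~/bin/syncdocs (a hypothetical name) and a KDE of this vintage, which runs everything it finds in ~/.kde/Autostart at login:

chmod +x ~/bin/syncdocs
ln -s ~/bin/syncdocs ~/.kde/Autostart/syncdocs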

If you're only interested in synchronizing the files once per day, you could instead create a cron job that runs the script once daily. In the case of Debian (and probably many other distributions), you can simply place the shell script in the directory /etc/cron.daily.
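Installing it there is a one-liner; Debian runs these scripts through run-parts, which requires them to be executable (syncdocs is again a hypothetical name for the script below):

install -m 755 syncdocs /etc/cron.daily/syncdocs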

The shell script might look something like this:

 

#!/bin/bash

PATH=/usr/bin:/usr/sbin:/bin:/sbin
USER="me"
RSYNC_PASSWORD="secret-password"
# rsync reads these from the environment, so they must be exported
export PATH USER RSYNC_PASSWORD

echo Synchronizing Documents

rsync -bHlpogtr /var/Documents/* myserver::Documents
rsync -bHlpogtr myserver::Documents/* /var/Documents

This script does a two-way synchronization that mimics the behavior of Microsoft's Offline Folders. It first copies any new files or changed files from the client to the server, and then copies anything that has changed at the server back to the client. Personally, I do the synchronization in reverse of this order, since I tend to work with files on the server and only store copies on my client as a backup.

The long list of command-line switches tells rsync to make backups of files that would otherwise be overwritten (-b), preserve hard links (-H), copy symlinks as symlinks (-l), preserve permissions, owner, group, and modification times (-p, -o, -g, -t), and recurse through directories (-r). You can browse through the various options with the command man rsync.

You may be uncomfortable with the fact that the rsync password is integrated into the script itself, and justifiably so.

This is only necessary because the process of synchronization is automated. If you use rsync interactively, you probably want to use secure shell (SSH) as your transport, but rsync does not let you automate password entry over SSH (or even rsh, the remote shell); you need to be there to type the password in yourself. You can only automate the process of entering a password if you are using rsync to talk to an rsync server.
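For the interactive case, a sketch of the same copy done over SSH instead of an rsync server (myserver and the paths are the same hypothetical names used elsewhere in this article; -e tells rsync which remote shell to use):

rsync -bHlpogtr -e ssh /var/Documents/* me@myserver:/var/stuff/Documents/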

If you're going to automate the process, you'll have to set up an rsync server at the other end, configure the rsync server to recognize passwords, and then store the password somewhere on your local machine. Fortunately, rsync has an option called --password-file that allows you to store the password in a file that you can restrict to root access and hide somewhere. That isn't a perfectly secure solution, but you may prefer it to including the password in the cron script itself. If so, then you probably want to configure your script to look more like the following:

 

#!/bin/bash

PATH=/usr/bin:/usr/sbin:/bin:/sbin USER="me"

echo Synchronizing Documents

rsync --password-file=/home/me/.rsyncpwd -bHlpogtr /var/Documents/* myserver::Documents
rsync --password-file=/home/me/.rsyncpwd -bHlpogtr myserver::Documents/* /var/Documents

In the above example, you'll have to create a file called /home/me/.rsyncpwd that contains the text secret-password.
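Assuming that file name, something like this creates the password file and restricts it so only your user can read it; rsync may refuse a password file that other users can read, and it's prudent in any case:

echo "secret-password" > /home/me/.rsyncpwd
chmod 600 /home/me/.rsyncpwd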

 

The rest of the rsync configuration

I only set out to make a point about the superiority of command-line processes over GUI administration, but in case you're interested in using rsync and haven't yet learned it, here's the rest of what you'd have to do to make the above script work.

First, you need to make sure that your server (called "myserver" in this example) is set up to run rsync as a daemon. You can simply run rsync from an initialization script with the --daemon option, but I prefer to use the inetd approach. If you do, too, here's the line you want to add to /etc/inetd.conf (assuming, of course, that rsync is located in the /usr/bin directory).

 

rsync   stream  tcp     nowait  root   /usr/bin/rsync rsyncd --daemon
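One assumption worth checking: inetd can only start the service if /etc/services maps the name rsync to its well-known port, 873. Most distributions ship that entry already, but if yours doesn't, add it:

rsync           873/tcp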

You need two more files to make this work: /etc/rsyncd.conf and /etc/rsyncd.secrets. The first file will look something like this (the auth users line is what actually forces clients to authenticate; without it, rsync allows anonymous access and never consults the secrets file -- run man rsyncd.conf for more details):

 

[Documents]
uid = me
gid = me
path = /var/stuff/Documents
comment = All server-stored documents
auth users = me
secrets file = /etc/rsyncd.secrets

Then you'll need to specify a user and password for the rsync user called me in rsyncd.secrets. The entry should look something like this:

 

me:secret-password
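The rsync daemon will refuse to use a secrets file that other users can read (unless you turn off strict modes in rsyncd.conf), so lock it down:

chmod 600 /etc/rsyncd.secrets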

That's all there is to it. Restart inetd and you should be able to run your synchronization script at the client.
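You don't actually have to kill inetd; sending it a hangup signal makes it reread its configuration:

killall -HUP inetd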

 

Getting really twisted

My coverage of rsync was inspired by the need to duplicate what Microsoft offers through Exchange, but it's hardly an ideal example of how flexible and powerful the command line can be when compared to a GUI administration tool. You can do so much more at the Unix command line. While it may be possible to duplicate the functions by designing a flexible GUI interface, I can't imagine why anyone would bother doing so, since it would require a significant effort that wouldn't pay off in the end.

For example, here's a command I used to extract a list of unique host names of computers that probed my Web servers for nimda and similar Windows Internet Information Server security holes. (My servers run Linux, so they are immune to such probes, but I was curious as to how many probes I was receiving per day. During the height of nimda's popularity, I received at least 30,000 probes the first weekend.)

 

egrep --regexp="^.*\.(exe|dll|ida).*"  \
/var/log/apache/access.log | cut -f 1 -d ' ' | sort | uniq

The above command searches the file /var/log/apache/access.log for log entries that contain any of the following file extension strings: .exe, .dll, or .ida. If it finds a match, it will output only the first field from that log entry, which is the host name of the computer that probed the web site for vulnerability to the nimda worm. It then sorts the output of all those sites, and eliminates any duplicate entries.

The power here lies in the ability to pipe the output of one command to another, then to another, and so on, so that you end up with the results you want using a single command.

One of the coolest portions of this command line is the cut -f 1 -d ' ' part. As the egrep command finds matching text lines from the access.log file, this cut command cuts out the desired "field" from the text line. The -f 1 tells it to grab the first field, which is where the host name can be found. The -d ' ' portion tells it that all the fields in this text line are delimited by a space. Obviously, cut is a very powerful command, since it allows you to grab just about any information imaginable from a text line as long as you know how that text line is formatted. And if cut doesn't cut it for you, then there's always sed, a more powerful stream editor that lets you apply very complex search conditions to extract text from the output of a prior command.
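As a sketch of what that looks like, here's the same extraction done with sed instead of egrep and cut (the \| alternation is a GNU sed extension; the address selects the matching lines, and the substitution strips everything after the first space, leaving just the host name):

sed -n '/\.\(exe\|dll\|ida\)/s/ .*//p' /var/log/apache/access.log | sort | uniq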

In conclusion, I'll admit that GUIs are wonderful and I enjoy using the extremely powerful KDE for most of my work. I'll even confess that I'm a sucker for things like the "mosfet" theme for KDE that adds features like translucent menus and Macintosh-like liquid components. (For information on how to get these features, see resources.) When it comes to getting serious work done, there's no administration tool that compares to the Unix command line.

More Stories By Nicholas Petreley

Nicholas Petreley is a computer consultant and author in Asheville, NC.
